Paper notes on ConceptSeg-R1 — RL-style reasoning over SAM 3's prompt space, using Meta-GRPO to infer transferable segmentation rules from a few visual demonstrations.
Paper notes on NVIDIA's LocateAnything — a 3B vision-language grounding model that treats boxes as atomic units via Parallel Box Decoding, hitting 10× the throughput of comparable VLMs.
A hands-on plan for MiniMind-O — a ~0.1B end-to-end omni model that takes text/audio/image in and emits text + streaming speech, with the mini pipeline running in ~2 hours on one RTX 3090.
Notes on OpenBMB's UltraData collection — an L0–L4 data pyramid (Ultra-FineWeb, UltraData-Math, UltraData-SFT) battle-tested on MiniCPM5-1B and released open.
A workflow note for running Apple's ml-sharp monocular 3D Gaussian Splatting model inside ComfyUI via the ComfyUI-Sharp custom node. Setup, graph, gotchas.
Paper notes on QuCo-RAG — using entity frequency and co-occurrence in a 4T-token corpus as a retrieval trigger, instead of the model's own unreliable uncertainty.
A 30-question Chinese-market workplace personality test, 28 public types plus 2 hidden branches, all on Cloudflare with edge-rendered share posters. Live at wbtilab.xyz.