Paper notes on ConceptSeg-R1 — RL-style reasoning over SAM 3's prompt space, using Meta-GRPO to infer transferable segmentation rules from a few visual demonstrations.
Paper notes on NVIDIA's LocateAnything — a 3B vision-language grounding model that treats boxes as atomic units via Parallel Box Decoding, hitting 10× the throughput of comparable VLMs.
Paper notes on QuCo-RAG — using entity frequency and co-occurrence in a 4T-token corpus as a retrieval trigger, instead of the model's own unreliable uncertainty.