MiniMind-O: training a from-scratch omni model over a weekend
Weekend build
A hands-on plan for MiniMind-O, a ~0.1B-parameter end-to-end omni model that takes text, audio, and image input and emits text plus streaming speech. The mini pipeline runs in ~2 hours on a single RTX 3090.