3-year AI video generation progress comparison (Modelscope vs Grok Imagine v1)
3 years of AI video generation, side by side.
Min Choi posted a comparison this week that stopped me cold. Modelscope on the left from 2023. Grok Imagine v1 on the right from today.
The difference is not incremental. It looks like a decade of progress compressed into 36 months.
Modelscope was genuinely impressive when it dropped. Blurry, yes. Artifacts everywhere, sure. But it moved. It generated. People were amazed. I was amazed.
Now look at what Grok Imagine is producing. Clean motion, coherent subjects, realistic lighting. The kind of output that, three years ago, required a team, a render farm, and a serious budget.
Here’s what I keep thinking about.
We’re terrible at internalizing exponential curves while we’re inside them. Each individual step feels modest. “Oh, the new model is a bit better.” But when you stack those steps and look back even 36 months, it’s jarring.
The engineers and researchers doing this work have essentially rebuilt the ceiling every single year. And the ceiling keeps moving.
Here's what this means practically for anyone building with AI right now: the thing you ship today will look quaint in 18 months. That's not a reason to wait. It's a reason to build fast, learn from real users, and stay close to what the models can actually do today, not what you read about them doing six months ago.
The teams that win in this environment are the ones that stay in motion. Not the ones that wait for the “mature” version of the technology.
The 2023-to-2026 video gap also tells you something about where image and audio quality are heading. If video moved this fast from a blurry starting point, modalities that began from a higher baseline are going to become genuinely disorienting to track.
I’m not saying that to hype the space. I’m saying it because the planning horizon for any AI product is shorter than most people are treating it.
Build accordingly.
#AI #GenerativeAI #MachineLearning #AIVideo #BuildingWithAI
