3-year AI video generation progress comparison (Modelscope vs Grok Imagine v1)

Three Years. One Comparison. No Going Back.

Min Choi posted a side-by-side video comparison this week that genuinely stopped me mid-scroll. Left side: Modelscope, 2023. Right side: Grok Imagine v1, April 2026. Same basic prompt, same general idea, three years apart. If you haven’t seen it, go find it at https://x.com/minchoi/status/2043743669258174918 and give it 30 seconds. I’ll wait.

The gap between those two clips does not look like three years of software iteration. It looks like a different civilization made the second one.

What Modelscope Actually Was

Let’s be fair to where we started. When Modelscope dropped in 2023, it was genuinely remarkable. Blurry motion, flickering artifacts, subjects that morphed into abstract soup if you looked too long. But it moved. It generated coherent-ish video from a text prompt, on consumer hardware, for free. People lost their minds, and rightfully so. I remember running it and thinking “okay, this is real now.” The bar was low, but clearing it still mattered.

Modelscope was also a signal about where compute was heading. Vast.ai was already renting RTX 3090s with 24GB VRAM for fractions of a dollar per hour. The infrastructure for video generation was quietly getting commoditized while everyone was still arguing about image generators.

What Grok Imagine v1 Is Producing

The right side of Min Choi’s comparison is a different conversation entirely. Clean motion. Subjects that hold their shape across frames. Lighting that behaves like actual physics. The kind of output that, three years ago, required a render farm, a compositing team, and a budget that most independent creators could never touch.

I want to be precise here: this isn’t “better Modelscope.” It’s a qualitative phase shift. The artifacts aren’t reduced, they’re largely gone. Motion isn’t smoother, it’s coherent. Those are different things. One is improvement on a curve, the other is crossing a threshold.

The Exponential Trap We Keep Falling Into

Here’s what I actually want to talk about, because the comparison is almost a secondary point.

We are terrible at internalizing exponential progress while we are inside it. Each individual model release feels incremental. “Oh, the new version handles hands slightly better.” “Motion blur is a bit more realistic.” Each step reads as modest. Then you look at a three-year side-by-side and your brain short-circuits.

This is the same cognitive failure that made people consistently underestimate transformer scaling, underestimate image generation quality, underestimate LLM reasoning. We anchor to last month’s benchmark and miss the slope entirely.
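The slope-blindness is easy to demonstrate with toy arithmetic. The numbers below are purely illustrative, not measured benchmarks: assume each monthly release feels like a modest 5% improvement and see what that compounds to over the 36 months separating the two clips.

```python
# Toy illustration: modest per-release gains compound into a huge 3-year gap.
# The 5% monthly figure is a hypothetical number, not a measured benchmark.

monthly_gain = 1.05   # each release feels "only 5% better" than the last
months = 36           # Modelscope (2023) -> Grok Imagine v1 (2026)

total = monthly_gain ** months
print(f"Perceived step per release: +{(monthly_gain - 1) * 100:.0f}%")
print(f"Compounded over {months} months: {total:.1f}x")  # roughly 5.8x
```

Real model progress is nothing like this clean, and quality isn't a scalar. The point is the shape of the curve: steps that each read as incremental still multiply into a gap that feels discontinuous when you jump straight from endpoint to endpoint.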

The 36 months between Modelscope and Grok Imagine v1 compressed what would have taken a decade of traditional software development. That rate is not slowing down. If anything, the compute curves and architectural improvements suggest the next 36 months will feel even more compressed in retrospect.

What This Actually Means for Video Production

The practical consequences are already landing. A solo creator today can generate b-roll, concept visualizations, and short narrative clips that would have required production infrastructure two years ago. That's not a distant possibility; it's happening now.

The more uncomfortable implication is for mid-tier production work. The kind of video that didn't need a Hollywood budget but did need a real crew. Corporate explainers, product demos, event recaps. That category is being hollowed out quickly, and the people in it need to be honest with themselves about the timeline.

I’m not saying human creative direction becomes irrelevant. I’m saying the execution layer, the part between idea and rendered output, is getting radically compressed. The value shifts toward taste, judgment, and knowing what to make, not how to make it render.

Three Years Is Nothing

That’s what this comparison should make viscerally clear. Three years in AI video is nothing. It’s a rounding error on a research timeline. And the gap between 2023 and 2026 is already borderline incomprehensible.

Whatever you think video generation will look like in 2029, you’re probably undershooting. The people building on top of these tools right now, understanding their failure modes and their strengths, are the ones who will have the clearest picture when the next comparison screenshot circulates and everyone else acts surprised again.

🎬

#AIVideo #GenerativeAI #VideoGeneration #MachineLearning #ArtificialIntelligence
