How a Chinese Food Delivery Giant Built World-Class AI Models Faster Than Big Tech

How a Chinese Food Delivery Company Quietly Became an AI Powerhouse

Author: Aswin Anil

At first glance, it sounds like a tech meme.

A Chinese smartphone brand builds a large language model. That tracks. A Chinese Pinterest clone launches its own AI lab. Still believable. But a Chinese food delivery company releasing open-source AI models that rival Meta and DeepSeek?

That is where most people pause, laugh, and assume hype.

Except this time, the hype is backed by papers, benchmarks, and working systems.

In late 2025, Meituan—China’s largest food delivery and local services platform—quietly launched an AI research lab called LongCat. Within just four months, the lab released a base language model, a reasoning model, a multimodal model, an audio encoder, an image model, a video generation model, multiple datasets, and a new benchmark.

Some AI labs struggle to ship one serious model per year.

So what is really going on?

The “Chinese DoorDash” Label Misses the Point

Calling Meituan the “Chinese DoorDash” undersells the company by a massive margin.

Meituan generates nearly four times the revenue of DoorDash and operates at a scale closer to Uber than to any single-category delivery app. Founded in 2010 by Wang Xing, Meituan started as a group-buying platform before pivoting—twice—into the food delivery and local services giant it is today.

Its entry into food delivery even slightly predates Uber Eats.

More importantly, Meituan is not new to deep technical work. Since 2013, the company has published engineering blogs covering large-scale systems, databases, infrastructure, and machine learning. In 2019, it began publishing peer-reviewed AI research at top conferences such as NeurIPS, ICML, and CVPR.

As of 2025, Meituan-affiliated researchers have published 80+ papers on arXiv, excluding internal reports and engineering blogs. That output exceeds the combined total of several well-known Chinese AI startups.

LongCat did not appear overnight. It emerged from a decade-long research culture.

Why LongCat’s Speed Shocked the AI World

In September 2025, Meituan officially introduced the LongCat AI Lab. Four months later, the lab had already released a portfolio that would normally take years.

Even more surprising, LongCat trained its first large language model—LongCat-Flash-Chat—in just 30 days.

This was not a shallow demo model.

The accompanying technical report spans 36 pages and documents everything from data curation to distributed training strategies and inference cost optimization. The paper dedicates eight full pages to deployment efficiency, including how the model achieves inference costs as low as $0.50 per million tokens.

Very few labs, open or closed, share this level of operational detail.

A Serious Language Model, Not a Copy-Paste Clone

LongCat-Flash-Chat does not reinvent transformers. That would be unrealistic.

Instead, it builds on proven ideas, including Mixture-of-Experts (MoE) and Multi-Head Latent Attention, similar to techniques used by DeepSeek and other frontier models.

The real innovation lies in how LongCat handles computation.

The team introduced a context-aware dynamic computation mechanism. In simple terms, not all tokens are equally difficult to predict. Some require deep reasoning, while others are trivial.

So why spend the same compute on both?

LongCat solves this by adding “zero-computation experts” into its MoE routing system. Easy tokens route to experts that do nothing. Hard tokens route to real experts that perform full computation.

The result is a model that dynamically adjusts its compute budget per token—without breaking the existing MoE pipeline.
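To make the idea concrete, here is a minimal sketch of top-1 MoE routing with zero-computation experts mixed into the expert pool. The dimensions, expert counts, and identity-style zero experts are illustrative assumptions, not LongCat's actual implementation.

```python
# Minimal sketch: MoE routing where some "experts" do no computation at all.
# All sizes and the top-1 routing rule are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ZeroComputeMoE(nn.Module):
    def __init__(self, d_model=64, n_real_experts=4, n_zero_experts=2):
        super().__init__()
        # Zero-computation experts are extra routing slots with no parameters.
        self.n_total = n_real_experts + n_zero_experts
        self.router = nn.Linear(d_model, self.n_total)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_real_experts)
        ])

    def forward(self, x):  # x: (n_tokens, d_model)
        logits = self.router(x)                    # score every expert slot, real or zero
        expert_idx = logits.argmax(dim=-1)         # top-1 routing for simplicity
        gate = F.softmax(logits, dim=-1).gather(1, expert_idx[:, None])
        out = x.clone()                            # tokens routed to zero experts pass through unchanged
        for e, expert in enumerate(self.experts):  # only real experts spend FFN compute
            mask = expert_idx == e
            if mask.any():
                out[mask] = x[mask] + gate[mask] * expert(x[mask])
        return out

tokens = torch.randn(8, 64)
print(ZeroComputeMoE()(tokens).shape)  # torch.Size([8, 64])
```

Easy tokens that land on a zero-computation slot simply skip the feed-forward work, which is where the per-token compute savings come from.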

This idea feels obvious in hindsight, which is usually the sign of good research.

How It Compares to Other Open Models

A fair comparison for LongCat-Flash-Chat is DeepSeek V3.1, released in August 2025. Both models use latent attention and MoE architectures, though they differ in parameter counts.

On public benchmarks, LongCat shows strong performance in:

  • Instruction following
  • Agentic tool use
  • Long-context coherence

Its reasoning and knowledge scores remain competitive, though not dominant.

Where LongCat truly stands out is efficiency. Serving at around 100 tokens per second on H100 GPUs, the model balances speed and cost in a way that makes real-world deployment practical.

The main reason it has not been widely adopted yet is simple: its custom MoE++ design (the zero-computation-expert routing described above) requires infrastructure changes most users have not implemented.

LongCat Video: Open Innovation in a Closed Field

If the language model impressed researchers, LongCat Video surprised them.

High-quality video generation remains one of the most closed areas in AI. Most top-tier models remain proprietary, and public details are scarce.

LongCat broke that pattern.

The lab introduced a 3D block-sparse attention mechanism tailored for video latents. This approach cuts attention compute to less than 10% of what dense attention would require, while preserving near-lossless quality.

They also shared a practical implementation using ring-based block sparse attention for efficient context parallelism.
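As a rough illustration of the general idea, the sketch below applies block-sparse attention over flattened video latent tokens: each query block attends only to its highest-scoring key blocks. The block size, the pooled selection rule, and the shapes are assumptions for demonstration; LongCat's 3D kernel and its ring-parallel variant are considerably more involved.

```python
# Rough sketch of block-sparse attention over flattened video latent tokens.
# Block size and the top-k block-selection rule are illustrative assumptions.
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block=16, keep_blocks=4):
    # q, k, v: (seq, dim); seq is assumed to be a multiple of `block`
    n_blocks = q.shape[0] // block
    qb = q.view(n_blocks, block, -1)
    kb = k.view(n_blocks, block, -1)
    vb = v.view(n_blocks, block, -1)

    # Score each (query block, key block) pair with pooled features,
    # then keep only the highest-scoring key blocks per query block.
    block_scores = qb.mean(1) @ kb.mean(1).T                  # (n_blocks, n_blocks)
    keep = block_scores.topk(min(keep_blocks, n_blocks), -1).indices

    out = torch.empty_like(qb)
    for i in range(n_blocks):
        ks = kb[keep[i]].reshape(-1, k.shape[-1])             # selected keys only
        vs = vb[keep[i]].reshape(-1, v.shape[-1])             # selected values only
        attn = F.softmax(qb[i] @ ks.T / ks.shape[-1] ** 0.5, dim=-1)
        out[i] = attn @ vs                                    # dense attention inside kept blocks
    return out.view_as(q)

q = k = v = torch.randn(128, 32)
print(block_sparse_attention(q, k, v).shape)  # torch.Size([128, 32])
```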

Even more interesting, LongCat adapted Group Relative Policy Optimization (GRPO)—a technique popularized by DeepSeek—for flow-matching video models.
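The heart of GRPO is simple enough to show in a few lines: advantages are computed relative to a group of samples drawn for the same prompt, with no value network. The sketch below covers only that group-relative step, with made-up reward values; how LongCat wires it into flow-matching video training is not shown here.

```python
# Minimal sketch of GRPO's group-relative advantage computation.
# Rewards and group size are fabricated for illustration only.
import torch

def group_relative_advantages(rewards):
    # rewards: (n_prompts, group_size) — one reward per sampled generation
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True).clamp_min(1e-6)
    return (rewards - mean) / std  # each sample is judged against its own group

# Example: 2 prompts, 4 sampled videos each, scored by some reward model.
rewards = torch.tensor([[0.2, 0.8, 0.5, 0.1],
                        [0.9, 0.4, 0.7, 0.6]])
print(group_relative_advantages(rewards))  # positive above group average, negative below
```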

Few labs publish this level of detail. Almost none do it for video.

Unified Inputs: A Step Toward World Models

LongCat’s video model also unified:

  • Text-to-video
  • Image-to-video
  • Video continuation

Instead of relying on heavy cross-attention hacks, all three tasks share a single input format.

This matters.

Many current video models rely on brute-force cross-attention to bolt features together. That approach scales poorly and limits generalization.

Unified inputs move closer to true world modeling, where the system understands time, space, and causality in a consistent way.
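One way to picture a unified input format: every request becomes "conditioning frames plus frames to generate", so text-to-video supplies zero conditioning frames, image-to-video supplies one, and continuation supplies many. The sketch below is purely illustrative and does not reflect LongCat's actual interface.

```python
# Hypothetical unified request format covering all three video tasks.
# The dataclass, field names, and frame counts are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class VideoRequest:
    prompt: str
    cond_frames: list = field(default_factory=list)  # latent frames given as context
    frames_to_generate: int = 16

# Text-to-video: no conditioning frames, generate everything from the prompt.
t2v = VideoRequest(prompt="a cat riding a delivery scooter")

# Image-to-video: a single conditioning frame (the input image's latent).
i2v = VideoRequest(prompt="the cat starts moving", cond_frames=["img_latent"])

# Video continuation: many conditioning frames from the existing clip.
cont = VideoRequest(prompt="continue the scene",
                    cond_frames=[f"frame_{i}" for i in range(8)])

for r in (t2v, i2v, cont):
    print(len(r.cond_frames), "conditioning frames ->", r.frames_to_generate, "new frames")
```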

The Omnimodal Model That Quietly Raised the Bar

LongCat’s most recent release may be its most underappreciated.

The lab extended LongCat-Flash into an omnimodal model that supports text, vision, and audio inputs, with both text and audio outputs.

While it is not fully symmetric—few models are—it performs on par with leading systems such as:

  • Qwen Omni
  • GPT-4o-style multimodal stacks
  • Gemini 2.5-series models

The real contribution lies in infrastructure.

LongCat published detailed explanations of modality-decoupled parallelism, chunk-based modality bridges, and optimized streaming pipelines.

No other lab has shared this much about multimodal system design.
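For readers unfamiliar with the pattern, here is one generic way a chunk-based bridge can work: streaming audio features are buffered into fixed-size chunks, projected into the language model's embedding space, and interleaved with text tokens as they arrive. The class, sizes, and projection below are assumptions for illustration, not LongCat's published design.

```python
# Generic sketch of a chunk-based modality bridge for streaming audio input.
# Feature dimensions, chunk length, and the linear projection are assumptions.
import torch
import torch.nn as nn

class ChunkBridge(nn.Module):
    """Buffers streaming audio features and emits LLM-space embeddings chunk by chunk."""
    def __init__(self, audio_dim=80, model_dim=512, chunk_len=25):
        super().__init__()
        self.chunk_len = chunk_len
        self.proj = nn.Linear(audio_dim, model_dim)
        self.buffer = []

    def push(self, frame):
        # Accumulate one audio frame; emit a projected chunk once enough frames arrive.
        self.buffer.append(frame)
        if len(self.buffer) == self.chunk_len:
            chunk = torch.stack(self.buffer)  # (chunk_len, audio_dim)
            self.buffer = []
            return self.proj(chunk)           # (chunk_len, model_dim), ready to interleave with text tokens
        return None

bridge = ChunkBridge()
for t in range(60):                           # simulate a streaming audio feature sequence
    out = bridge.push(torch.randn(80))
    if out is not None:
        print(f"frame {t}: emitted chunk of shape {tuple(out.shape)}")
```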

Why Meituan Could Move So Fast

The answer is not secrecy or shortcuts.

Meituan has spent over a decade solving large-scale optimization problems: routing millions of deliveries, predicting demand in real time, and operating one of China’s most complex logistics platforms.

Those skills transfer directly to AI infrastructure.

Efficiency, cost control, and system-level thinking define LongCat’s work. Most of its innovations focus on compute reduction, parallelism, and deployment—not flashy demos.

This explains why the lab shipped so much, so fast.

Why LongCat Matters in 2026

LongCat represents a shift.

It shows that frontier AI research no longer belongs only to dedicated AI startups or Western tech giants. Companies with deep engineering DNA and real-world scale can compete with them, and sometimes outpace them.

For researchers, LongCat’s papers are a goldmine. They offer practical insights into large-scale training, multimodal systems, and inference optimization.

For the industry, LongCat proves that open research still has a place in a world moving toward closed models.

Final Thoughts

It is easy to laugh at the idea of a food delivery company building world-class AI.

It is harder to ignore 36-page papers, working models, and reproducible benchmarks.

LongCat did not win by magic. It won by doing the unglamorous work—systems, optimization, and efficiency—better than almost anyone else.

In 2026, this is one AI lab worth watching closely.


Sources:

  • Meituan Technical Blog (tech.meituan.com)
  • arXiv.org – Meituan & LongCat research papers
  • DeepMind “Chinchilla” Scaling Laws Paper (2022)
  • DeepSeek V3 Technical Reports