I Tested 48 Chinese Internet Companies for llms.txt: Zero of the 17 LLM Labs Got It Right

GEO (Generative Engine Optimization) has been the SEO-industry topic of the past six months in the West. WP Engine has published five separate articles about it in the last month, Cloudflare has shipped supporting bot-traffic dashboards, and Mintlify has built a one-click llms.txt feature into its product.

Inside China, the picture is different. The big SEO blogs have published almost no long-form Chinese articles on llms.txt. Baidu, ByteDance, and the other major search providers have shipped no "AI-friendly site" guides. Even introductory explainers are rare.

What makes this stranger: Chinese LLM companies — the people who, in theory, most understand what it means to be cited by a large language model — have not implemented llms.txt on their own websites. Not one of them.

I ran a script that requests /llms.txt from 48 mainstream Chinese internet companies to find out what was actually there. Here is what I found.

TL;DR: Of 48 sites, 5 (10.4%) had a real llms.txt. 17 returned an HTML fallback (a SPA framework catching the route, with the site's team unaware). 22 returned 404. Among the 17 Chinese LLM labs tested, zero had llms.txt — DeepSeek, Kimi, Zhipu, Tongyi (Qwen), Wenxin (Ernie), MiniMax, Tencent Hunyuan, iFlytek Spark, SenseTime, 01.ai, Baichuan — all absent. The most complete implementation is Alibaba Cloud's help documentation (53.9 KB, with four-language cross-links and nested per-product llms.txt sub-files). The least conformant is CSDN, which has shipped a llms.txt written in robots.txt syntax.

The candidate set

I split 48 sites across six buckets:

Chinese LLM labs (17): Baidu Wenxin, Zhipu Qingyan, Kimi / Moonshot, Tongyi Qianwen, DeepSeek, MiniMax, SenseTime, 01.ai, Baichuan, Tencent Hunyuan, iFlytek Spark, and others.
Chinese cloud providers (8): Alibaba Cloud, Tencent Cloud, Huawei Cloud, ByteDance Volcengine, Baidu Cloud, Qiniu, Upyun.
Chinese collaboration SaaS (9): Feishu, DingTalk, WeCom, Shimo, Yuque, Tencent Docs, WPS, Jinshuju.
Chinese developer communities (9): Gitee, CSDN, Juejin, SegmentFault, InfoQ China, Geekbang Shichang, Aliyun Developer, Tencent Cloud Developer.
Chinese hosting / webmaster tooling (3): West.cn, CNDNS, Aoyou.
Chinese docs platforms (2): Kdocs, Wolai.

For each site I did a plain HTTP GET to https://<domain>/llms.txt with a desktop browser User-Agent, no authentication. The body was inspected to determine whether it was actual markdown or an HTML shell from a SPA's catch-all route. As simple as it gets — exactly what any LLM crawler would do.
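For readers who want to rerun something similar, here is a minimal sketch of such a probe. It is not the archived script: the User-Agent string, the 64 KB read cap, the bucket names, and the HTML-detection heuristic are all assumptions made for illustration.

```python
import urllib.error
import urllib.request

# Hypothetical desktop User-Agent; the exact string used in the audit is not given.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"


def classify(status: int, body: str) -> str:
    """Bucket a response into the four outcomes used in the results table."""
    if status == 404:
        return "404"
    if status != 200:
        return "other"
    head = body.lstrip()[:500].lower()
    # An SPA catch-all answers 200 with an HTML document, not markdown.
    if head.startswith("<!doctype html") or head.startswith("<html") or "<head" in head:
        return "html_fallback"
    # Any non-empty, non-HTML 200 body counts as a real file here,
    # whether or not it follows the llms.txt format.
    return "real" if head else "other"


def probe(domain: str, timeout: float = 10.0) -> str:
    """GET https://<domain>/llms.txt and classify the result."""
    req = urllib.request.Request(
        f"https://{domain}/llms.txt", headers={"User-Agent": USER_AGENT}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read(65536).decode("utf-8", errors="replace")
            return classify(resp.status, body)
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions in urllib.
        return classify(err.code, "")
    except OSError:
        # DNS failures, TLS errors, and timeouts all land here.
        return "other"
```

Calling `probe("help.aliyun.com")` would return one of the four bucket names. Keeping `classify` free of network code makes it easy to re-run against saved responses.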

Overall: Chinese adoption is a quarter of international adoption

Outcome	China-side (48)	International reference (70)
Real `llms.txt`	5 (10.4%)	31 (44%)
HTML fallback	17 (35%)	9 (13%)
404	22 (46%)	22 (31%)
403 / timeout / other	4 (8%)	8 (11%)

The international reference data comes from the same probe run a week earlier against AI labs, docs platforms, and the Stripe / Cloudflare tier of Western tech companies. Side by side:

China-side adoption is roughly 4× lower than international.
China-side HTML-fallback rate is 2.7× the international rate — modern SPA frameworks are widely adopted, but nobody is monitoring what /llms.txt actually returns.
404 rates are not great either way. The absolute counts are identical (22 and 22), but on the smaller China-side sample that is a higher rate: 46% versus 31%.
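The ratios can be re-derived from the raw counts in the table. The sketch below only restates that arithmetic; the dictionary keys are labels chosen for illustration.

```python
# Counts copied from the results table.
CHINA = {"real": 5, "html_fallback": 17, "404": 22, "other": 4}
INTL = {"real": 31, "html_fallback": 9, "404": 22, "other": 8}


def rates(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw counts into shares of the sample."""
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


china, intl = rates(CHINA), rates(INTL)
for outcome in CHINA:
    ratio = china[outcome] / intl[outcome]
    print(
        f"{outcome:14s} China {china[outcome]:6.1%}  "
        f"intl {intl[outcome]:6.1%}  China/intl {ratio:.2f}x"
    )
```

Working from unrounded shares, the HTML-fallback ratio comes out slightly above the 2.7× obtained from the rounded percentages, and the adoption gap slightly above 4×.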

A 10% adoption rate isn't catastrophic in itself — the standard is still early. But against the backdrop of "international peers treat GEO as a KPI," the gap is striking.

The 0/17 result: the most counterintuitive data point

The 17 LLM-lab domains I tested (11 companies, several with more than one domain), one by one:

Company	Domain	Result
Baidu Wenxin Yiyan	yiyan.baidu.com	HTML fallback
Zhipu Qingyan	chatglm.cn / zhipuai.cn	HTML fallback × 2
Kimi / Moonshot	kimi.moonshot.cn / moonshot.cn	HTML fallback × 2
Tongyi Qianwen	tongyi.aliyun.com / qianwen.aliyun.com	404 × 2
DeepSeek	deepseek.com / chat / api-docs	404 / 202 / 404
MiniMax	minimaxi.com / platform	HTML fallback / 404
Tencent Hunyuan	hunyuan.tencent.com	HTML fallback
iFlytek Spark	xinghuo.xfyun.cn	404
SenseTime	sensetime.com	404
01.ai	01.ai	404
Baichuan	baichuan-ai.com	404

Zero implemented llms.txt.

The paradox is that these companies' core business is the LLM. They understand, better than anyone, what "being cited in an answer" means. Their support teams field "why did Claude cite Perplexity and not us" questions daily. Yet not a single second of engineering attention has gone into the file on their own sites.

Flipping the question: if even the model providers haven't done this, do they know something the SEO industry doesn't — namely, that llms.txt doesn't actually do anything?

That's an open question. I'll come back to it.

But one thing is certain: no Chinese LLM company currently treats "getting cited by AI answer engines" as a measurable goal. Otherwise the result wouldn't be a unanimous zero.

The one site that nailed it: Alibaba Cloud help

The five Chinese sites that have a real llms.txt:

Site	Size	Notes
Alibaba Cloud Help (help.aliyun.com)	53.9 KB	Industrial-grade, multilingual + nested per-product sub-files
Feishu Open Platform (open.feishu.cn)	11.4 KB	Structured by OpenAPI
Qiniu Cloud (qiniu.com)	10.5 KB	Bilingual Chinese/English
Gitee (gitee.com)	6.5 KB	English, positioned against GitHub
CSDN Blog (blog.csdn.net)	1.4 KB	Wrong format (covered below)

Alibaba Cloud's is worth a close look. The opening of help.aliyun.com/llms.txt:

# Alibaba Cloud Documentation

> Alibaba Cloud is one of the world's leading cloud computing and AI companies.
> ... This file provides a structured index of Alibaba Cloud's official
> documentation for LLMs and AI Agents.

## Available sites and languages

- Mainland (zh) - [llms.txt](https://help.aliyun.com/zh/llms.txt)
- Mainland (en) - [llms.txt](https://help.aliyun.com/en/llms.txt)
- International (zh) - [llms.txt](https://www.alibabacloud.com/help/zh/llms.txt)
- International (en) - [llms.txt](https://www.alibabacloud.com/help/en/llms.txt)
- International (ja) - [llms.txt](https://www.alibabacloud.com/help/ja/llms.txt)
- International (id) - [llms.txt](https://www.alibabacloud.com/help/id/llms.txt)

Two sites, four languages, six variants, all cross-linked. Below this header, the file is organized by product category (AI/ML, compute, storage, databases, security, networking), and every product has its own llms.txt sub-file. Products like Bailian (their model platform), PAI, and DashVector each index their own documentation. The sub-files then link to the markdown source of every documentation page.
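To make the nesting concrete, here is a rough sketch of how a consumer might discover sub-indexes from a parent file. The regex and function names are illustrative assumptions, and nothing here describes how any real crawler behaves; the sample text is the excerpt quoted above.

```python
import re
from urllib.parse import urlparse

# Inline markdown links of the form [label](https://...).
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def extract_links(markdown: str) -> list[tuple[str, str]]:
    """Return (label, url) pairs for every inline markdown link."""
    return LINK_RE.findall(markdown)


def nested_indexes(markdown: str) -> list[str]:
    """URLs whose path ends in /llms.txt, i.e. sub-indexes to fetch next."""
    return [
        url
        for _, url in extract_links(markdown)
        if urlparse(url).path.endswith("/llms.txt")
    ]


SAMPLE = """\
- Mainland (zh) - [llms.txt](https://help.aliyun.com/zh/llms.txt)
- Mainland (en) - [llms.txt](https://help.aliyun.com/en/llms.txt)
- International (zh) - [llms.txt](https://www.alibabacloud.com/help/zh/llms.txt)
- International (en) - [llms.txt](https://www.alibabacloud.com/help/en/llms.txt)
- International (ja) - [llms.txt](https://www.alibabacloud.com/help/ja/llms.txt)
- International (id) - [llms.txt](https://www.alibabacloud.com/help/id/llms.txt)
"""

hosts = sorted({urlparse(url).netloc for url in nested_indexes(SAMPLE)})
print(len(nested_indexes(SAMPLE)), hosts)
```

Repeating `nested_indexes` on each fetched sub-file is what turns a flat index into the recursive structure described here.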

This "recursive sitemap" pattern is rare even internationally — only Anthropic, PostHog, and Stripe do it at this depth. Alibaba Cloud's is the only Chinese implementation that meets that bar.

Of the remaining four, Feishu Open Platform and Qiniu are clean, conformant, no overreach. Gitee writes its file in English — a deliberate move to be referenceable by GitHub Copilot-class tools.

And then there's CSDN, which deserves its own section.

CSDN wrote a robots.txt and called it llms.txt

CSDN Blog does have a /llms.txt file. Opening it:

# llms.txt for https://blog.csdn.net/
# Last updated: 2025-10-22
# Purpose: Define access and usage rules for large language model (LLM) crawlers

########################################
# 1. General Rules
########################################

Disallow: /images/
Disallow: /content/
Disallow: /ui/
Disallow: /js/

Allow: /article/
Allow: /column/
Allow: /tag/

# 2. Usage Policy
Policy: Summarization with Source Attribution
Policy: No Redistribution of Raw Files

# 3. Attribution and Licensing
Citation: https://blog.csdn.net/
License: CC BY-NC-ND 4.0
Contact: gaoyang@csdn.net

Notice the fields: Disallow: / Allow: — that's robots.txt syntax. The Policy: / License: / Citation: fields are from a different proposal entirely (an "AI training authorization" file sometimes called ai.txt).

What is llms.txt? llmstxt.org itself defines it cleanly:

A # Title H1.
A > blockquote introduction.
## H2 sections.
Markdown links pointing to your content.

CSDN's file has zero markdown links. There's no index pointing to "what's worth reading." The whole document is a list of what AI is not allowed to do.
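The four elements listed above are easy to check mechanically. The following is a heuristic sketch, not a validator endorsed by llmstxt.org: the check names and regexes are assumptions, and the two sample files are abbreviated (the second is invented for contrast).

```python
import re


def check_llms_txt(text: str) -> dict[str, bool]:
    """Rough structural checks for an llms.txt body."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return {
        "h1_title": bool(lines) and re.match(r"#\s+\S", lines[0]) is not None,
        "blockquote_summary": any(ln.startswith(">") for ln in lines),
        "h2_sections": any(re.match(r"##\s+\S", ln) for ln in lines),
        "markdown_links": any(re.search(r"\[[^\]]+\]\([^)]+\)", ln) for ln in lines),
        # Allow:/Disallow: lines suggest robots.txt thinking, not an index.
        "robots_directives": any(
            re.match(r"(?i)(allow|disallow)\s*:", ln) for ln in lines
        ),
    }


# Abbreviated from the CSDN file quoted earlier.
CSDN_STYLE = """# llms.txt for https://blog.csdn.net/
Disallow: /images/
Allow: /article/
Policy: Summarization with Source Attribution
License: CC BY-NC-ND 4.0
"""

# Hypothetical conformant file for comparison.
INDEX_STYLE = """# Example Docs

> A structured index of Example's documentation.

## Guides

- [Quickstart](https://example.com/docs/quickstart.md): first steps
"""

for name, body in [("csdn-style", CSDN_STYLE), ("index-style", INDEX_STYLE)]:
    print(name, check_llms_txt(body))
```

One caveat: a leading `#` comment line passes the H1 check by accident, because `#` means "comment" in robots.txt and "heading" in markdown. The absence of links and the presence of directives are the more telling signals.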

This is a classic case of mental-model transplant. The CSDN team translated their robots.txt mindset directly onto the new file. They assumed llms.txt was a "rules for AI crawling" file, but the actual spec is "actively tell LLMs where the good content is" — a sibling of sitemap, not of robots.txt.

If ChatGPT, Claude, or Kimi ever start picking sources via llms.txt, all CSDN's file tells them is "summarize with attribution, do not redistribute." The LLM gets the access rules. It doesn't get told that CSDN has good articles about PyTorch fundamentals or Go concurrency.

CSDN has shipped a wall, thinking it shipped a doorway.

HTML fallback is nearly 3× worse in China than internationally

International samples had 9 HTML-fallback cases (13%). The China side has 17 (35%), roughly 2.7× the rate.

The China-side fallback list is painful to read:

AI labs: Baidu Wenxin, Zhipu (two domains), Kimi / Moonshot (two domains), MiniMax, Tencent Hunyuan
Cloud: Tencent Cloud, Volcengine, Upyun
SaaS: DingTalk, Jinshuju
Developer communities: Aliyun codeup, Juejin, InfoQ China, Geekbang Shichang
Webmaster tooling: CNDNS

If any of these companies' SRE teams ran a curl against their own /llms.txt, they'd see HTTP 200 OK. CDN dashboards would show green. Nothing is broken according to any monitoring tool. But the body is a Vue or React shell — <div class="..."> and a list of JavaScript bundle URLs.

An LLM crawler reading the response gets "this is a webpage." It gets no index of the site's content. From the crawler's perspective, this is indistinguishable from a 404. From the site owner's perspective, they think they've done the right thing.

Why is China-side HTML fallback so much worse? Two guesses:

Chinese SaaS apps predominantly run Vue, Nuxt, or in-house SPA stacks. The default catch-all route hands every unknown path to the frontend router, which renders the app shell. No one specifically handles a new path like /llms.txt.
Chinese operations monitoring focuses on 200/404 ratio, not response body content. A green 200 passes the check.

The cost to fix? Effectively zero. A single config rule in nginx / Next.js / Nuxt to either serve a real file or 404 the path explicitly. One line of code. But nobody's looking, so nobody's changing it.
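The principle is the same in any stack: match /llms.txt explicitly before the catch-all route gets a chance to claim it. Rather than guess at any particular site's nginx or framework config, here is the routing order as a small Python WSGI sketch. `make_app`, `SPA_SHELL`, and the file location are hypothetical names for illustration.

```python
from pathlib import Path

# Stand-in for the app shell an SPA catch-all would return.
SPA_SHELL = b'<!doctype html><html><body><div id="app"></div></body></html>'


def make_app(llms_path: Path):
    """Build a WSGI app that answers /llms.txt before the SPA catch-all."""

    def app(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        if path == "/llms.txt":
            if llms_path.is_file():
                # Serve the real file as text.
                body = llms_path.read_bytes()
                status = "200 OK"
            else:
                # No file yet: say so honestly instead of returning the shell.
                body = b"llms.txt not found\n"
                status = "404 Not Found"
            start_response(status, [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Catch-all: every other path still gets the app shell.
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [SPA_SHELL]

    return app
```

To try it locally, the app can be handed to the standard library's `wsgiref.simple_server.make_server`. Either branch of the `/llms.txt` handler is an improvement over the fallback, because the status code and the body finally agree.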

Does any of this actually work?

The hard part of writing this audit is admitting: there is no public evidence that llms.txt has measurable impact on LLM citations.

I looked specifically for:

Data showing that sites with llms.txt get cited more often in Perplexity, ChatGPT, Claude, Baidu Search AI, or Tongyi Answers than equivalent sites without it. Not found.
Evidence that LLM crawlers actually request /llms.txt. Cloudflare publishes some bot-traffic data, but not for this specific path.
Any A/B-tested case study showing that adding llms.txt changed citation share or referral traffic. Mostly absent; what exists reads like vendor marketing.
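The second question is at least answerable for your own site, since your access logs record every request for the path. A minimal sketch, assuming logs in the common "combined" format; the regex and function name are illustrative, and it says nothing about what any given crawler does with the file once fetched.

```python
import re
from collections import Counter

# Assumes the "combined" access-log layout:
# ip - - [time] "METHOD path HTTP/x" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def llms_txt_hits(log_lines) -> Counter:
    """Count requests for /llms.txt, grouped by (user agent, status)."""
    hits = Counter()
    for line in log_lines:
        m = LINE_RE.search(line)
        # Ignore any query string when comparing the path.
        if m and m["path"].split("?", 1)[0] == "/llms.txt":
            hits[(m["ua"], int(m["status"]))] += 1
    return hits


# Made-up log lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.7 - - [16/May/2026:10:00:00 +0800] "GET /llms.txt HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.7 - - [16/May/2026:10:05:00 +0800] "GET /llms.txt?v=2 HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.9 - - [16/May/2026:10:06:00 +0800] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

for (ua, status), count in llms_txt_hits(SAMPLE_LOG).items():
    print(count, status, ua)
```

Grouping by the raw User-Agent string avoids hard-coding a list of crawler names that would go stale.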

Back to the open question: Chinese LLM labs at 0/17 — is this because they know it doesn't work, or because they're as confused as everyone else?

I lean toward the latter. The international picture supports that: Mintlify sells llms.txt as a feature, and its own marketing site returns 404. Jeremy Howard proposed the file, and his own site fast.ai is 404. The entire industry is writing tutorials, and almost no one is actually doing it. That looks less like "we have data" and more like "we're all waiting for someone else to go first."

One thing is certain: the cost to do this is near zero. A correctly formatted plain-text file of a few hundred to a few thousand bytes can be written in under fifteen minutes.
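To show how little is involved, here is a small sketch that renders a file in the H1, blockquote, H2, link-list shape described earlier. The function name, argument layout, and example URLs are all hypothetical.

```python
def render_llms_txt(title, summary, sections):
    """Render an llms.txt body.

    sections maps an H2 heading to a list of (label, url, note) tuples;
    note may be an empty string.
    """
    out = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        out.append(f"## {heading}")
        out.append("")
        for label, url, note in links:
            suffix = f": {note}" if note else ""
            out.append(f"- [{label}]({url}){suffix}")
        out.append("")
    # Exactly one trailing newline.
    return "\n".join(out).rstrip("\n") + "\n"


text = render_llms_txt(
    "Example Docs",
    "A structured index of Example's documentation.",
    {
        "Guides": [
            ("Quickstart", "https://example.com/docs/quickstart.md", "first steps"),
            ("API reference", "https://example.com/docs/api.md", ""),
        ]
    },
)
print(text)
```

Writing `text` to the path your server maps to /llms.txt is the whole deployment, provided the route actually serves the file rather than the app shell.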

A few observations

Not advice; the internet already has too much llms.txt advice. Just things I noticed after staring at 48 China-side samples and 70 international ones for a day.

1. The China-side GEO conversation is six to twelve months behind international. Western SEO blogs are debating "is GEO the new SEO." Chinese SEO blogs have almost no long-form articles about llms.txt. That's either an opportunity (move first, claim the territory) or a signal (the people closest to SEO have done the math and skipped it). Take your pick.

2. The biggest China-side problem isn't 404, it's HTML fallback. A 35% HTML-fallback rate is harder to catch than a 46% 404 rate, because nothing on the first group's dashboards looks wrong and they think they've done it. Any China-side team running Vue, Nuxt, or Next.js: curl your own /llms.txt today and read the bytes.

3. Alibaba Cloud's implementation is directly copyable. Multi-language cross-links plus per-product nested sub-files: this is the same architecture Anthropic and PostHog use internationally. Any Chinese documentation site can adopt the pattern as is.

4. CSDN's robots-flavored llms.txt is likely to be copied. llms.txt and ai.txt are different files solving different problems — the former says "here's what I have," the latter says "here's what AI may do." Neither is a W3C standard yet, neither has an official arbiter. But if your goal is "be cited by an LLM," what you write is llms.txt, not robots-style access rules.

5. The 0/17 result among Chinese LLM labs is the biggest free signal Chinese SEO has. If they're all sitting it out, ordinary site owners need not panic about being late. On the other hand, if you're a Chinese SaaS, WordPress site owner, or hosting reseller, doing this puts you ahead of 80%+ of your industry. Cost is near zero. Downside risk is near zero.

About the data

48 Chinese candidate sites probed on 2026-05-16 with vanilla HTTP GET and a desktop browser User-Agent. 5 returned real llms.txt. 17 returned HTML fallback. 22 returned 404. 4 returned 403, timed out, or failed in some other way. The probe script, raw responses, and per-site analysis are archived.

Explicit limitations:

Only the root /llms.txt path was tested. Some sites may have real files on subdomains (like docs.xxx.com) or versioned paths.
The optional /llms-full.txt variant from the spec was not tested.
Some sites block automated requests by IP or User-Agent, which may have caused false-positive HTML fallback or 403 results.
Only 48 Chinese candidates were sampled, not the full set of top Chinese internet companies.

But the 10% China-side adoption number, and the unanimous-zero result for the 17 LLM labs, should be directionally clear. If anyone runs the same probe and gets meaningfully different results — or has published data showing llms.txt actually drives AI citations — I'd genuinely like to see it.

Editor

微码宝 Expert Team

Focused on WHMCS and WordPress deep customization, with 500+ enterprise projects delivered.