FRIDAY · 01 MAY 2026

Michael English

Clonmel · Co. Tipperary · Ireland
Open source · Topic 04

The Chinese open-source AI stack — a practical guide

DeepSeek, Qwen, GLM, MiniMax, Kimi, Yi. The open-source side of the AI market in 2026 is overwhelmingly Chinese. Here is how to use it sensibly without giving up anything that matters.

Most Western teams in 2026 still treat the Chinese open-source model ecosystem as either an academic curiosity or a vague compliance risk. Both framings cost teams money. The reality on the ground is that the open-source frontier — the models you can pull, run on your own hardware, and put into production with no per-call fees — is overwhelmingly Chinese. If you are not running at least one of these models for at least one workload, you are paying frontier-API prices for work that does not need a frontier API.

This is a practical guide. We will not argue about geopolitics. The question is which model you should be reaching for next Wednesday morning when an engineer asks where the workload should run.

The lineage tree, briefly

The open-source side has consolidated around a handful of model families. The lineage matters because the architecture decisions and the training-data choices propagate down the tree, and the failure modes are inherited.

When each one is the right choice

Below is the cheat-sheet we use at IMPT.io. It is opinionated, and your mileage will vary, but it should accelerate your first three weeks.

Where they aren't the right choice

Three honest caveats. First, the absolute frontier of capability still belongs to the Western models — Claude 4.7 Opus, GPT-class flagship, and a handful of others — for the genuinely hard tasks where capability per token is the constraint. If the work is “reason about a novel architecture under uncertainty,” the frontier wins. Second, the safety profiles of the open-weight models are different. They will refuse different things and accept different things, and your evaluation harness needs to test for what your specific use case requires. Third, fine-tuning quality across the families is uneven. The mature ones (Qwen, DeepSeek) are easy to fine-tune well; some of the others are not, and you'll spend more time on the tuning loop than expected.

Where they run

The whole point is that you can run them anywhere. Three sensible deployment patterns:

1. On your own GPU box, on your own premises

The cheapest pattern at volume and the one we use most heavily at IMPT.io. A single mid-tier inference server can serve a 70B-class model at production latencies for a small team. The capex pays back inside a year if the workload is non-trivial.
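That payback claim is easy to sanity-check with a back-of-envelope model. Every number below is a placeholder you should replace with your own quotes; the point is the shape of the sum, not the specific figures.

```python
# Back-of-envelope payback model for an on-prem inference server.
# All figures are hypothetical assumptions, not vendor prices.
server_capex_eur = 40_000          # mid-tier inference server, one-off
api_price_per_mtok_eur = 3.0       # blended per-million-token API price
tokens_per_day = 50_000_000        # a non-trivial production workload
hosting_opex_per_month_eur = 500   # power, rack space, maintenance

# What the same volume would cost through a metered API each month.
api_cost_per_month = tokens_per_day * 30 / 1e6 * api_price_per_mtok_eur

# Monthly saving from self-hosting, and months until the capex is recovered.
saving_per_month = api_cost_per_month - hosting_opex_per_month_eur
payback_months = server_capex_eur / saving_per_month

print(f"API cost/month: EUR {api_cost_per_month:,.0f}")
print(f"Payback: {payback_months:.1f} months")
```

With these assumed numbers the server pays for itself in ten months; halve the token volume and it stretches to roughly two years, which is why "non-trivial workload" is doing real work in the sentence above.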

2. On a European GPU cloud

If you don't want to own hardware, several European providers (Scaleway, Hetzner with H100s, OVHcloud) host these models behind well-priced inference endpoints. You keep workloads in EU jurisdiction. We use this for spillover.

3. Behind a managed API in Europe or the US

Together AI, Fireworks, Groq, Cerebras and similar managed providers give you the cheap-per-token economics of the open models with no operational overhead. Your data leaves your network, but it goes to a US or EU jurisdiction of your choice — not the model's country of origin.

For most regulated EU workloads, pattern (1) or (2) is the answer. Pattern (3) is fine for non-sensitive work and faster to start with.

Data residency and the China question

The most common reason teams give for not using these models is data residency. The reason does not survive contact with how the models actually work. The model weights are open. You download them once. After that, every inference call happens on hardware you control or rent. None of your data goes to the country that produced the model. The models have no telemetry to phone home — they are static blobs of weights — and you can verify this by running them in a network-isolated environment, which is in fact how we run several of our most sensitive workloads.
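If you load the weights through the Hugging Face stack, you can enforce the no-network property at the library layer as well as at the firewall. The environment variables below are real knobs honoured by `huggingface_hub` and `transformers`; an egress-blocking firewall rule remains the actual guarantee, and this only covers the library layer.

```python
import os

# Offline flags for the Hugging Face stack: with these set, the libraries
# load from the local cache and raise rather than touch the network.
OFFLINE_ENV = {
    "HF_HUB_OFFLINE": "1",        # huggingface_hub: never contact the Hub
    "TRANSFORMERS_OFFLINE": "1",  # transformers: local cache only
}

def enforce_offline() -> dict:
    """Set the offline flags for this process and return what was set."""
    os.environ.update(OFFLINE_ENV)
    return {k: os.environ[k] for k in OFFLINE_ENV}
```

Call `enforce_offline()` before any model-loading code runs (or export the variables in the service's unit file), then confirm at the network layer that the host genuinely has no route out.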

The legitimate concerns are about training data and licence terms. Read the licences. Most of the major open-weights models from these labs ship under permissive commercial licences; a couple have non-commercial restrictions or revenue thresholds. None of this is exotic, but it does need a five-minute legal review per family. We will do that review in the workshop.

What we'll do in the workshop

Day 2 morning of the Clonmel workshop is hands-on. We will pull DeepSeek, Qwen, and Yi onto laptops and onto a shared GPU instance, run identical workloads through each, and look at the outputs side-by-side. By lunchtime everyone in the room will have a working open-source inference setup on their machine and a clear sense of which family fits which slot in their existing stack.
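The side-by-side exercise needs nothing fancier than this shape of harness. `run` below is a stand-in you would replace with your actual inference client; the family names and the prompt are illustrative.

```python
FAMILIES = ["deepseek", "qwen", "yi"]

def run(family: str, prompt: str) -> str:
    """Stand-in for a real inference call; replace with your own client."""
    return f"<{family} answer to {prompt!r}>"

def side_by_side(results: dict) -> str:
    """Format {family: answer} into an aligned comparison block."""
    width = max(len(name) for name in results)
    return "\n".join(f"{name.ljust(width)} | {answer}"
                     for name, answer in results.items())

prompt = "Extract the total from: 'Invoice 204, total EUR 1,250.00'"
print(side_by_side({f: run(f, prompt) for f in FAMILIES}))
```

Running the same prompt through each family and reading the answers in one block is most of the exercise; the rest is swapping prompts until the differences between families stop surprising you.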


Reserve a seat in Clonmel

This topic is one of seven covered in the AI Brain workshops. Two open weekends — 25–26 July & 29–30 August 2026. Free admission. All welcome.

Register