Tweet-GPT

A transformer you can see through.

Role: Design & build
When: 2026
Stack:
  • PyTorch
  • ONNX
  • TypeScript

Most language models are black boxes: you put text in, text comes out, and whatever happened in between is invisible. I wanted to build one small enough to see all the way through — so I trained a character-level GPT from scratch on public tweets, exported it to ONNX, and wired it into this page. It runs entirely in your browser. No server, no API call.

As it generates a tweet one character at a time, it shows you two things: the probability it assigned to every character it could have picked next, and the attention it cast back over its own text while making that choice. That's the glass box — not a metaphor, the actual numbers.
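
To make the first of those concrete: a model like this emits one raw score (a logit) per vocabulary entry, and a softmax turns those scores into the probabilities shown for each candidate character. The sketch below is a minimal TypeScript illustration rather than the page's actual code. The names `softmax` and `topCandidates`, the temperature parameter, and the three-character toy vocabulary are all assumptions made for the example.

```typescript
// Illustrative sketch only: function names, the temperature parameter, and
// the toy vocabulary are assumptions, not taken from the page's source.

/** Convert raw logits into probabilities that sum to 1. */
function softmax(logits: number[], temperature = 1.0): number[] {
  const scaled = logits.map((x) => x / temperature);
  // Subtract the max before exponentiating for numerical stability.
  const max = Math.max(...scaled);
  const exps = scaled.map((x) => Math.exp(x - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

/** Pair each character with its probability and keep the k most likely. */
function topCandidates(
  vocab: string[],
  probs: number[],
  k: number
): { char: string; prob: number }[] {
  return vocab
    .map((char, i) => ({ char, prob: probs[i] }))
    .sort((a, b) => b.prob - a.prob)
    .slice(0, k);
}

// Toy example: three candidate characters with made-up logits.
const toyVocab = ["a", "b", "c"];
const toyProbs = softmax([2.0, 1.0, 0.1]);
const ranked = topCandidates(toyVocab, toyProbs, 2); // "a" first, then "b"
```

In the real model the logit vector would have 153 entries, one per vocabulary item, and the ranked list is the kind of thing a probability bar chart could be drawn from.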

How it works

  • The model has ~0.85M parameters: 4 transformer layers, 4 attention heads, 128-dimensional embeddings, and a 128-character context window.
  • The vocabulary is 152 frequent characters plus one end-of-tweet token, 153 entries in total. The "tokenizer" is a ~150-entry lookup dict; no library involved.
  • Trained on ~93k public tweets from the cardiffnlp/tweet_eval dataset. Best validation loss is ~1.60 nats per character (cross-entropy), against a uniform-guess baseline of ln(153) ≈ 5.03.
  • The trained weights were exported to ONNX and run in the browser via onnxruntime-web (WASM backend). After the initial load, all inference happens on your machine; nothing is sent to a server.
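
The figures above can be sanity-checked with a little arithmetic. The sketch below counts parameters for a decoder-only transformer of the stated size and computes the uniform-guess baseline. The layer layout is an assumption: a nanoGPT-style block with biases on every linear layer, a 4x-wide MLP, learned position embeddings, and an output head that is not tied to the token embedding. The write-up does not spell out these details, so the real model's exact total may differ slightly.

```typescript
// Back-of-the-envelope check. The layer layout is assumed, not confirmed.
const vocabSize = 153;
const nEmbd = 128;
const nLayer = 4;
const blockSize = 128;
// The head count (4) does not change the total: heads split the same
// 128-dimensional projections rather than adding new ones.

const linear = (nIn: number, nOut: number): number => nIn * nOut + nOut; // weights + bias
const layerNorm = (n: number): number => 2 * n;                          // scale + shift

// Token embedding table plus learned position embedding table.
const embeddings = vocabSize * nEmbd + blockSize * nEmbd;

// One transformer block: fused Q/K/V projection, output projection,
// a two-layer MLP with a 4x hidden width, and two layer norms.
const attention = linear(nEmbd, 3 * nEmbd) + linear(nEmbd, nEmbd);
const mlp = linear(nEmbd, 4 * nEmbd) + linear(4 * nEmbd, nEmbd);
const perLayer = attention + mlp + 2 * layerNorm(nEmbd);

// Final layer norm plus the projection back onto the vocabulary.
const head = layerNorm(nEmbd) + linear(nEmbd, vocabSize);

const totalParams = embeddings + nLayer * perLayer + head; // 849049, i.e. ~0.85M

// Loss of a model that assigns every one of the 153 entries equal probability.
const uniformBaseline = Math.log(vocabSize); // ≈ 5.03 nats

// A loss of 1.60 nats corresponds to a perplexity of exp(1.60) ≈ 4.95.
const perplexity = Math.exp(1.6);
```

Read this way, a validation loss of ~1.60 means the model is, on average, about as uncertain as a uniform choice among roughly five characters, compared with 153 for blind guessing.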

The architecture follows Andrej Karpathy's nanoGPT / "Zero to Hero" series — a minimal decoder-only transformer, implemented from first principles in PyTorch, then ported to the browser. The interesting work was not the training; it was deciding what to expose and building the visualizer around the raw logits and attention weights the model already computes.
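
As a rough picture of how those pieces could fit together, the sketch below runs a character-by-character generation loop and records, at each step, the probabilities and attention weights a visualizer would draw. Everything in it is hypothetical: the `StepFn` type stands in for a call into the ONNX session, and `buildVocab`, `generate`, `StepTrace`, the single attention row per step, and the toy model in the usage lines are illustrative names and simplifications, not the page's actual code.

```typescript
// Hypothetical sketch: every name here is illustrative.

/** What one forward pass is assumed to return for the last position. */
interface StepOutput {
  logits: number[];    // one raw score per vocabulary entry
  attention: number[]; // one weight per context position (one head, for brevity)
}

/** Stand-in for the real model call, for example a run of an ONNX session. */
type StepFn = (context: number[]) => StepOutput;

/** What gets recorded for the visualizer at each generated character. */
interface StepTrace { char: string; probs: number[]; attention: number[]; }

/** Character-level "tokenizer": a plain lookup table in both directions. */
function buildVocab(chars: string[]) {
  const itos = [...chars, "<eot>"]; // the last id marks end of tweet
  const stoi = new Map(itos.map((c, i) => [c, i] as [string, number]));
  return { itos, stoi, eot: itos.length - 1 };
}

function toProbs(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

/** Draw one index from a probability distribution. */
function sample(probs: number[], rand: () => number): number {
  let r = rand();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r < 0) return i;
  }
  return probs.length - 1; // guard against floating-point round-off
}

function generate(
  step: StepFn,
  vocab: ReturnType<typeof buildVocab>,
  prompt: string,
  maxChars: number,
  blockSize = 128,
  rand: () => number = Math.random
): { text: string; trace: StepTrace[] } {
  const ids = Array.from(prompt).map((c) => {
    const id = vocab.stoi.get(c);
    // Assumption: unknown characters are rejected; the source does not say.
    if (id === undefined) throw new Error(`character not in vocabulary: ${c}`);
    return id;
  });
  const trace: StepTrace[] = [];
  for (let n = 0; n < maxChars; n++) {
    const context = ids.slice(-blockSize); // keep only the most recent characters
    const { logits, attention } = step(context);
    const probs = toProbs(logits);
    const next = sample(probs, rand);
    if (next === vocab.eot) break; // the model chose to end the tweet
    trace.push({ char: vocab.itos[next], probs, attention });
    ids.push(next);
  }
  return { text: ids.map((i) => vocab.itos[i]).join(""), trace };
}

// Usage with a toy model that strongly prefers "the next id" and attends uniformly.
const vocab = buildVocab(["h", "i", "!"]);
const toyStep: StepFn = (context) => {
  const last = context[context.length - 1];
  return {
    logits: vocab.itos.map((_, i) => (i === last + 1 ? 10 : 0)),
    attention: context.map(() => 1 / context.length),
  };
};
const result = generate(toyStep, vocab, "h", 10, 128, () => 0.5);
```

Passing the model in as a function keeps the loop independent of the runtime, so the same tracing logic could sit on top of an ONNX session in the browser or a stub in a test. Injecting `rand` makes a run reproducible.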