🧬 Carbon Helix — DNA continuation in 3D

🧬 What is Carbon?

Think of it as autocomplete, but for DNA instead of English.

DNA is a long string of just 4 letters: A, T, G, C. Every living thing's instructions are written in those letters — humans, mice, oak trees, viruses, bacteria.
Carbon-3B is to DNA what ChatGPT is to English — same kind of transformer, trained on real genomes instead of internet text. It learned the “spelling rules” of natural DNA.
You hand Carbon a starting DNA snippet. It predicts what would come next, six letters at a time. The app draws each predicted letter as a colored bead on a 3D double helix as it's emitted.
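
The "six letters at a time" idea can be sketched as follows. This is a minimal illustration that assumes the chunks are non-overlapping and read left to right, which the page implies but does not state. `split_into_tokens` and the example seed are hypothetical and are not part of Carbon.

```python
def split_into_tokens(dna: str, k: int = 6) -> list[str]:
    """Split a DNA string into consecutive, non-overlapping k-letter chunks.

    Assumption: chunks do not overlap. Letters left over at the end
    are returned as a shorter final chunk.
    """
    dna = dna.upper()
    bad = set(dna) - set("ATGC")
    if bad:
        raise ValueError(f"not DNA letters: {sorted(bad)}")
    return [dna[i:i + k] for i in range(0, len(dna), k)]


seed = "ATGGCCAAGCTT"  # hypothetical 12-letter seed
tokens = split_into_tokens(seed)
print(tokens)  # ['ATGGCC', 'AAGCTT']
```

Under this assumption, a 12-letter seed corresponds to two chunks, and each generated chunk adds six beads to the helix.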

What you can do

Watch DNA appear in real time. Each bead is one letter Carbon just predicted.
See how confident the model is. Glowing, bright beads = Carbon was very sure. Dim or red-flickering beads = Carbon was guessing.
Steer generation toward a kind of organism or genomic region using the metadata tag chips (vertebrate_mammalian, protein_coding_region).
Try meaningful seeds:
- ATG-start — the universal “begin protein” signal cells use to read the start of a gene.
- TATA box — a real promoter motif; cells look for this pattern to find where genes begin.
- Random — gibberish letters. Carbon's confidence should drop because random isn't natural.
Adjust randomness with the Temperature slider. 0 = always pick the most likely letter (deterministic). Higher = take risks, more variety.
Hover over any bead to see its exact letter, log-probability, percentage probability, and position in the sequence. Great for inspecting why a particular bead is flickering red.
Move the helix: by default, drag rotates it; scroll or pinch to zoom; right-click drag (or two-finger drag on touch) pans it. If you'd rather have left-drag = pan, click the ✻ button (bottom-right) to toggle drag-pan mode.
Zoom buttons (+ / − / ⌖, bottom-right) zoom in, out, and reset the view if you get lost.
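
For readers curious how the Temperature slider and the Top-p control interact, here is a minimal sketch of temperature scaling followed by top-p (nucleus) filtering. It is a generic illustration of the technique, not Carbon's actual sampling code. The function name `sample_token` and the example scores are made up.

```python
import math
import random
from typing import Optional


def sample_token(logits: dict[str, float], temperature: float = 0.7,
                 top_p: float = 0.9,
                 rng: Optional[random.Random] = None) -> str:
    """Pick one token from raw scores using temperature and top-p."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Temperature 0: always take the single most likely token.
        return max(logits, key=logits.get)

    # Divide scores by the temperature, then softmax into probabilities.
    scaled = {tok: s / temperature for tok, s in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}

    # Top-p: keep the most likely tokens until their mass reaches top_p.
    kept, mass = [], 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break

    # Draw from the kept tokens in proportion to their probability.
    tokens, token_probs = zip(*kept)
    return rng.choices(tokens, weights=token_probs, k=1)[0]


# Made-up scores for four candidate 6-letter tokens.
example_logits = {"ATGGCC": 3.0, "ATGGCA": 2.0, "TTTTTT": 0.5, "GCGCGC": -1.0}
print(sample_token(example_logits, temperature=0.0))  # ATGGCC
```

A higher temperature flattens the probabilities so unlikely tokens are drawn more often. A lower top-p trims the unlikely tail before drawing.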

Reading the visualization

Beads = the four DNA letters (A, T, G, C), each drawn in its own color.
Cyan + purple wires = the two backbones of the DNA double helix. Real DNA has two strands twisting around each other.
Faint gray crossbars = base pairs (A↔T and G↔C) holding the two strands together.
Bead glow = how confident Carbon was about that letter.
Bottom bar chart = rolling confidence over the last 80 letters. Green = confident, amber = uncertain, red = surprising.
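
The rolling bar chart can be thought of as a fixed-size window over recent log-probabilities. The sketch below shows one way to keep such a window and map a value to a color. The thresholds (-1.0 and -3.0) and the names `RollingConfidence` and `color_for` are hypothetical, since the page does not say where the app draws the lines between green, amber, and red.

```python
from collections import deque


class RollingConfidence:
    """Track the mean log-probability over the most recent `window` values."""

    def __init__(self, window: int = 80):
        self.values = deque(maxlen=window)  # old entries fall off automatically

    def add(self, log_p: float) -> float:
        """Record one value and return the mean of the current window."""
        self.values.append(log_p)
        return sum(self.values) / len(self.values)


def color_for(log_p: float, green_above: float = -1.0,
              amber_above: float = -3.0) -> str:
    """Map a log-probability to a bar color. Thresholds are hypothetical."""
    if log_p >= green_above:
        return "green"
    if log_p >= amber_above:
        return "amber"
    return "red"


ticker = RollingConfidence(window=3)  # tiny window so the effect is visible
for lp in [-0.1, -0.2, -4.0, -5.0]:
    mean = ticker.add(lp)
print(round(mean, 2), color_for(mean))
```

With a window of 3, the first value has already dropped out by the time the fourth arrives, so two surprising letters are enough to pull the mean into the red range.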

The HUD numbers (top-right)

tokens — 6-letter chunks generated so far.
bases — total DNA letters (tokens × 6).
mean log-p — average confidence. Closer to 0 is more confident. Around −2 to −3 is typical for free generation; −0.1 with temperature 0 means the model is very sure.
tok / sec — generation speed on the GPU.
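
The HUD arithmetic is simple enough to sketch. The example below assumes log-probabilities are natural logarithms with one value per 6-letter token, which the page does not state explicitly. `hud_stats` is a hypothetical helper, not the app's code.

```python
import math


def hud_stats(token_log_probs: list[float], letters_per_token: int = 6) -> dict:
    """Compute HUD-style numbers from a list of per-token log-probabilities.

    Assumption: values are natural logs, one per 6-letter token.
    """
    n = len(token_log_probs)
    mean_log_p = sum(token_log_probs) / n if n else float("nan")
    return {
        "tokens": n,
        "bases": n * letters_per_token,
        "mean_log_p": mean_log_p,
        # exp(mean log-p) is the geometric-mean probability per token.
        "typical_prob_percent": 100 * math.exp(mean_log_p) if n else float("nan"),
    }


print(hud_stats([-0.1] * 128)["bases"])  # 768
print(round(100 * math.exp(-0.1), 1))    # 90.5
print(round(100 * math.exp(-2.5), 1))    # 8.2
```

Under the natural-log assumption, a mean log-p of -0.1 corresponds to roughly 90% probability per token, while -2.5 corresponds to roughly 8%.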

Try this experiment

Set seed to TATA box, temperature 0, both metadata tags ON. Hit Generate. Note the mean log-p.
Now set seed to Random, same settings. Compare.
The TATA box run should show a mean log-p noticeably closer to 0, because Carbon recognizes a real biological signal but not gibberish.

Why this is interesting beyond being pretty

Carbon's confidence on each letter is a real scientific signal. If you typed a real human gene and deliberately changed one letter (a mutation), the confidence at that position would drop sharply. That's how labs use models like Carbon to predict which mutations are likely to cause disease — it's been shown to match ClinVar-grade variant scoring. The glowing helix you see is the same math, just made watchable.
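
One common way to turn "confidence drops at a mutation" into a number is to compare the model's log-likelihood of the mutated sequence with that of the original. The sketch below illustrates that comparison with a tiny stand-in model that predicts each letter from the two before it. `ToyDnaModel` is not Carbon and says nothing about Carbon's real interface. It and the repetitive example sequence exist only so the scoring logic can run.

```python
import math
from collections import Counter, defaultdict


class ToyDnaModel:
    """Stand-in for a DNA language model: predicts a letter from the 2 before it.

    This is NOT Carbon; it only exists so the scoring logic below can run.
    """

    def __init__(self, training_dna: str, context: int = 2):
        self.context = context
        self.counts = defaultdict(Counter)
        for i in range(context, len(training_dna)):
            self.counts[training_dna[i - context:i]][training_dna[i]] += 1

    def log_prob(self, prefix: str, letter: str) -> float:
        seen = self.counts[prefix[-self.context:]]
        # Add-one smoothing so unseen letters still get a small probability.
        return math.log((seen[letter] + 1) / (sum(seen.values()) + 4))


def sequence_log_prob(model: ToyDnaModel, dna: str) -> float:
    """Sum the log-probability of every letter that has a full context."""
    return sum(model.log_prob(dna[:i], dna[i])
               for i in range(model.context, len(dna)))


def mutation_score(model: ToyDnaModel, dna: str, pos: int, alt: str) -> float:
    """Log-likelihood of the mutated sequence minus that of the original.

    Negative means the model finds the mutated sequence less natural.
    """
    mutated = dna[:pos] + alt + dna[pos + 1:]
    return sequence_log_prob(model, mutated) - sequence_log_prob(model, dna)


reference = "ATGGCC" * 20  # made-up repetitive sequence for illustration
model = ToyDnaModel(reference)
print(mutation_score(model, reference, pos=30, alt="T") < 0)  # True
```

The more negative the score, the more the change disrupts patterns the model learned. A real variant-scoring pipeline would use the actual model and validated thresholds instead of this toy.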

Controls

Seed DNA (12 bp): ATG·start, TATA box, or Random.
Metadata tags: vertebrate_mammalian, protein_coding_region.
Max tokens: 128 (≈768 bp).
Temperature: 0.70.
Top-p: 0.90.
