CARBON HELIX Carbon-3B
A T G C

🧬 What is Carbon?

Think of it as autocomplete, but for DNA instead of English.

  • DNA is a long string of just 4 letters: A, T, G, C. Every living thing's instructions are written in those letters — humans, mice, oak trees, viruses, bacteria.
  • Carbon-3B is to DNA what ChatGPT is to English — same kind of transformer, trained on real genomes instead of internet text. It learned the “spelling rules” of natural DNA.
  • You hand Carbon a starting DNA snippet. It predicts what would come next, six letters at a time. The app draws each predicted letter as a colored bead on a 3D double helix as it's emitted.
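
For the curious, the "six letters at a time" loop can be sketched in a few lines of Python. This is only an illustration: `toy_next_token` is a made-up stand-in for the real model, and the vocabulary of all 4096 possible six-letter chunks is an assumption about how the tokens are laid out, not something this page confirms.

```python
from itertools import product

BASES = "ATGC"
LETTERS_PER_TOKEN = 6  # Carbon emits six letters per step, per the text above

# Assumed vocabulary: every possible six-letter chunk (4**6 = 4096 entries).
VOCAB = ["".join(p) for p in product(BASES, repeat=LETTERS_PER_TOKEN)]


def toy_next_token(context: str) -> str:
    """Hypothetical stand-in for the model: picks a chunk from the context length."""
    return VOCAB[len(context) % len(VOCAB)]


def generate(seed: str, n_tokens: int) -> str:
    """Autoregressive loop: each new chunk is chosen from everything so far."""
    sequence = seed
    for _ in range(n_tokens):
        sequence += toy_next_token(sequence)
    return sequence


out = generate("ATG", 3)
print(out, len(out))
```

A real model would replace `toy_next_token` with a prediction over the vocabulary; the loop around it stays the same shape.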

What you can do

  • Watch DNA appear in real time. Each bead is one letter Carbon just predicted.
  • See how confident the model is. Glowing, bright beads = Carbon was very sure. Dim or red-flickering beads = Carbon was guessing.
  • Steer generation toward a particular organism group or region type using the metadata tag chips (vertebrate_mammalian, protein_coding_region).
  • Try meaningful seeds:
    • ATG-start: the start codon, the near-universal “begin protein” signal that tells a cell where to start building a protein.
    • TATA box: a real promoter motif; cells look for this pattern to find where a gene's transcription begins.
    • Random: gibberish letters. Carbon's confidence should drop because random sequences don't look like natural DNA.
  • Adjust randomness with the Temperature slider. 0 = always pick the most likely letter (deterministic). Higher = take risks, more variety.
  • Hover over any bead to see its exact letter, log-probability, percentage probability, and position in the sequence. Great for inspecting why a particular bead is flickering red.
  • Move the helix: by default, drag rotates it; scroll or pinch to zoom; right-click drag (or two-finger drag on touch) pans it. If you'd rather have left-drag = pan, click the button (bottom-right) to toggle drag-pan mode.
  • Zoom buttons (+ / − / ⌖, bottom-right) zoom in, out, and reset the view if you get lost.
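
If you want to see what the Temperature slider and the hover numbers mean in code, here is a minimal sketch. It assumes the model produces one raw score (a "logit") per candidate token, and that the reported log-probability is taken from the temperature-scaled distribution; the function names are hypothetical, and the app may do this differently.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng=random):
    """Return (index, log_probability) for one chosen token."""
    # Temperature 0 means greedy choice, so leave the scores unscaled.
    scaled = logits if temperature == 0 else [x / temperature for x in logits]

    # Numerically stable softmax: subtract the max before exponentiating.
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    if temperature == 0:
        index = max(range(len(probs)), key=probs.__getitem__)  # most likely
    else:
        index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, math.log(probs[index])


def to_percent(log_p):
    """Convert a log-probability into the percentage shown on hover."""
    return 100.0 * math.exp(log_p)


index, log_p = sample_with_temperature([2.0, 0.5, 0.1, -1.0], temperature=0)
print(index, round(log_p, 3), round(to_percent(log_p), 1))
```

Raising the temperature flattens the distribution, so less likely tokens get picked more often and their beads come out dimmer.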

Reading the visualization

  • Beads — A   T   G   C
  • Cyan + purple wires = the two backbones of the DNA double helix. Real DNA has two strands twisting around each other.
  • Faint gray crossbars = base pairs (A↔T and G↔C) holding the two strands together.
  • Bead glow = how confident Carbon was about that letter.
  • Bottom bar chart = rolling confidence over the last 80 letters. Green = confident, amber = uncertain, red = surprising.
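
The crossbar pairing rule (A↔T, G↔C) is simple enough to write down. A small sketch, with hypothetical helper names, showing how the second strand follows from the first:

```python
# Watson-Crick pairing: each letter on one strand fixes its partner on the other.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Partner letter for each position, in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """The partner strand read in its own direction (strands run antiparallel)."""
    return complement(strand)[::-1]


def crossbars(strand: str):
    """One (letter, partner) tuple per crossbar in the helix."""
    return [(base, PAIR[base]) for base in strand]


print(complement("ATGC"))
print(reverse_complement("ATGC"))
print(crossbars("AT"))
```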

The HUD numbers (top-right)

  • tokens — 6-letter chunks generated so far.
  • bases — total DNA letters (tokens × 6).
  • mean log-p — average confidence. Closer to 0 is more confident. Around −2 to −3 is typical for free generation; −0.1 with temperature 0 means the model is very sure.
  • tok / sec — generation speed on the GPU.
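
As a rough sketch of how these four numbers relate, assuming the app records one log-probability per generated token (the `HudStats` class and its method names are invented for illustration):

```python
from collections import deque


class HudStats:
    """Running totals behind a HUD like the one described above."""

    def __init__(self, letters_per_token=6, window=80):
        self.letters_per_token = letters_per_token
        self.tokens = 0
        self.sum_log_p = 0.0
        # Keeps only the most recent `window` values for a rolling view.
        self.recent = deque(maxlen=window)

    def record(self, log_p):
        """Call once per generated token."""
        self.tokens += 1
        self.sum_log_p += log_p
        self.recent.append(log_p)

    @property
    def bases(self):
        return self.tokens * self.letters_per_token

    @property
    def mean_log_p(self):
        return self.sum_log_p / self.tokens if self.tokens else None

    def rolling_mean(self):
        return sum(self.recent) / len(self.recent) if self.recent else None

    def tokens_per_second(self, elapsed_seconds):
        return self.tokens / elapsed_seconds if elapsed_seconds > 0 else 0.0


stats = HudStats()
for log_p in (-0.1, -2.0, -3.0):
    stats.record(log_p)
print(stats.tokens, stats.bases, stats.mean_log_p, stats.tokens_per_second(1.5))
```

Before anything is generated, `mean_log_p` is `None`, which matches a HUD that shows a placeholder instead of a number.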

Try this experiment

  1. Set seed to TATA box, temperature 0, both metadata tags ON. Hit Generate. Note the mean log-p.
  2. Now set seed to Random, same settings. Compare.
  3. The TATA box run should score noticeably better (a mean log-p closer to 0), because Carbon recognizes a real biological signal vs. gibberish.

Why this is interesting beyond being pretty

Carbon's confidence on each letter is a real scientific signal. If you typed a real human gene and deliberately changed one letter (a mutation), the confidence at that position would drop sharply. That's how labs use models like Carbon to predict which mutations are likely to cause disease — it's been shown to match ClinVar-grade variant scoring. The glowing helix you see is the same math, just made watchable.
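
One common way to turn that confidence drop into a number is a log-likelihood ratio between the mutated letter and the original one. The sketch below is a toy under loud assumptions: the probabilities are made up, the scoring is per letter (Carbon itself works in 6-letter chunks, so a real pipeline would compare chunk or whole-sequence likelihoods), and `variant_score` is a hypothetical helper, not Carbon's API.

```python
import math


def variant_score(position_probs, ref, alt):
    """Log-likelihood ratio: log p(alt) - log p(ref) at one position.

    Strongly negative means the model finds the mutated letter far less
    plausible than the original one.
    """
    return math.log(position_probs[alt]) - math.log(position_probs[ref])


# Made-up model output for one position where the real gene has an "A".
probs = {"A": 0.90, "T": 0.04, "G": 0.03, "C": 0.03}

score = variant_score(probs, ref="A", alt="G")
print(round(score, 2))
```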
