Core concepts
The mental model
Before any command, the idea. Tamper Signal borrows three things you already trust in the physical world: a receipt, a tamper-evident seal, and a chain of custody. Hold those pictures in your head and the rest of the tool reads like common sense.
Tamper Signal proves continuity, not correctness. It can prove nobody changed your data between the source and the dashboard. It cannot tell you the source was right to begin with. Keep that line bright and everything else falls into place.
The receipt
Every time a stage of your pipeline runs, it hands you a receipt: a small signed record that says "here is the data I received, here is the code I ran on it, and here is exactly what I produced." It is a plain file on disk. No database, no server, no account.
Think of it like
A receipt from a shop. It does not promise the milk was fresh. It promises that on this date, this register, run by this cashier, rang up these exact items for this exact total. Months later you can hold it up and check the arithmetic.
data received = the items rung up · code that ran = the cashier · data produced = the printed total · signature = the register's stamp that says this slip is genuine
A receipt captures four things: a fingerprint of the data going in, a fingerprint of the code, a fingerprint of the data coming out, and a set of human-legible totals. The whole record is then signed so it cannot be quietly edited later.
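As a rough sketch of what such a record might hold, the Python below assembles an unsigned receipt from raw bytes. The field names (stage, input_hash, code_hash, output_hash, totals), the choice of SHA-256, and the JSON layout are assumptions made for illustration, not Tamper Signal's actual schema, and the signing step is left out.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Return a hex SHA-256 digest (assumed hash; the real tool may differ)."""
    return hashlib.sha256(data).hexdigest()


def build_receipt(stage: str, data_in: bytes, code: bytes,
                  data_out: bytes, totals: dict) -> dict:
    """Assemble the four things a receipt captures (hypothetical field names)."""
    return {
        "stage": stage,
        "input_hash": fingerprint(data_in),
        "code_hash": fingerprint(code),
        "output_hash": fingerprint(data_out),
        "totals": totals,
    }


def receipt_bytes(receipt: dict) -> bytes:
    """Serialize deterministically so a signature would cover stable bytes."""
    return json.dumps(receipt, sort_keys=True, separators=(",", ":")).encode("utf-8")


receipt = build_receipt(
    stage="clean",
    data_in=b"id,amount\n1,10\n2,20\n",
    code=b"def clean(rows): return rows",
    data_out=b"id,amount\n1,10\n2,20\n",
    totals={"row_count": 2, "sum_amount": 30},
)
print(receipt_bytes(receipt).decode("utf-8"))
```

Sorting the keys before serializing means the same record always yields the same bytes, which is what a signature would need to cover.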
The chain
One receipt is interesting. A chain of them is the whole point. The receipts link because each stage's input fingerprint must equal the previous stage's output fingerprint. Stage two received exactly what stage one produced, stage three received exactly what stage two produced, all the way to the dashboard.
Think of it like
A chain of custody for evidence. Every person who touches the bag signs and dates the label, and each signature has to match the handoff before it. If one link is missing or the seal was broken between two signatures, the custody is compromised, and you know the exact handoff where it happened.
each handoff = a pipeline stage · matching signatures = input hash equals prior output hash · a broken seal = a hash that does not match · the chain has exactly two honest endings: intact or broken at a named link
To verify, the tool walks the chain link by link, checking every signature and every join. The chain file (chain.json) is just the ordered list of receipts plus the public key they were signed with. The first receipt is special: it is the source.
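The join check can be sketched in a few lines of Python. This is a minimal illustration that assumes each receipt is a dict with hypothetical input_hash and output_hash fields; it checks only the joins and skips signature verification entirely.

```python
from typing import Optional


def first_broken_link(receipts: list[dict]) -> Optional[int]:
    """Return the index of the first receipt whose input does not match
    the previous receipt's output, or None if every join holds."""
    for i in range(1, len(receipts)):
        if receipts[i]["input_hash"] != receipts[i - 1]["output_hash"]:
            return i
    return None


# The first receipt is the source, so it has no previous output to match.
chain = [
    {"stage": "source", "input_hash": None, "output_hash": "aaa"},
    {"stage": "clean", "input_hash": "aaa", "output_hash": "bbb"},
    {"stage": "aggregate", "input_hash": "bbb", "output_hash": "ccc"},
]

tampered = [dict(r) for r in chain]
tampered[2]["input_hash"] = "xxx"  # stage three did not receive what stage two produced

print(first_broken_link(chain))     # None
print(first_broken_link(tampered))  # 2
```

Returning the index rather than a bare true or false is what lets a verdict name the exact handoff that failed.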
Two fingerprints: evidence and semantic
A "fingerprint" here is a hash: a short string that changes completely if the thing it describes changes by even one byte. Tamper Signal takes two fingerprints of your data, because there are two different questions worth answering.
Evidence hash: is this the same file?
The evidence hash is taken once, at ingest, over the raw bytes of the original export, exactly as it left the source system. It is never recomputed downstream. It pins the original artifact in place.
Semantic hash: is this the same data?
The semantic hash is taken over the meaning of the data after a careful canonicalization, so it stays stable even if you re-save the file or convert formats. An xlsx export and a CSV copy of the same rows produce the same semantic hash. Row order is deliberately not part of it (rows are sorted before hashing), and numeric-looking text hashes as the number it parses to, so a typeless format like CSV cannot move the hash on its own.
Think of it like
Two ways to recognize a person. The evidence hash is a photograph of them in the exact coat they wore that day: change the coat and the photo no longer matches. The semantic hash is their actual fingerprint: the same person in any coat, any lighting, still matches. You want both. One proves you have the original file; the other proves you have the same data even after it has been re-wrapped.
evidence hash answers "is this the same file" · semantic hash answers "is this the same data"
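To make the difference concrete, here is a toy Python sketch of both fingerprints. The canonicalization rules below (sorted keys, sorted rows, numbers normalized through Decimal, SHA-256) are assumptions chosen to mirror the behavior described above; the real rulebook is whatever the spec version defines.

```python
import hashlib
from decimal import Decimal, InvalidOperation


def evidence_hash(raw: bytes) -> str:
    """Hash the original export byte for byte."""
    return hashlib.sha256(raw).hexdigest()


def canonical_cell(value) -> str:
    """Render a cell so numeric-looking text and real numbers agree."""
    try:
        number = Decimal(str(value).strip())
    except InvalidOperation:
        return "s:" + str(value)
    if not number.is_finite():
        return "s:" + str(value)
    # normalize() drops trailing zeros, so "20.50" and 20.5 render the same
    return "n:" + format(number.normalize(), "f")


def semantic_hash(rows: list[dict]) -> str:
    """Hash canonical rows; sorting makes row order irrelevant."""
    lines = []
    for row in rows:
        cells = [f"{key}={canonical_cell(row[key])}" for key in sorted(row)]
        lines.append("|".join(cells))
    canonical = "\n".join(sorted(lines)).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Typed values as a spreadsheet reader might return them, and the same
# rows as untyped text in a different order, as a CSV reader might.
from_xlsx = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.5}]
from_csv = [{"id": "2", "amount": "20.50"}, {"id": "1", "amount": "10"}]

print(semantic_hash(from_xlsx) == semantic_hash(from_csv))  # True
print(evidence_hash(b"id,amount\n1,10\n") == evidence_hash(b"id,amount\r\n1,10\r\n"))  # False
```

The second comparison differs only in line endings, which is enough to move the evidence hash while leaving the data itself untouched.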
Control totals
A hash is binary. It says the data changed; it will not say how. That is what control totals are for: a handful of human-legible aggregates recorded in every receipt. Row counts, sums of numeric columns, date ranges, null counts.
Think of it like
The subtotal line at the bottom of the receipt. The register stamp tells you the slip is genuine. The subtotal tells you the order came to forty-eight dollars, not fifty. When a chain breaks, the hashes say broken, and the totals say by how much: row_count 48212 -> 48190 (-22). Twenty-two rows went missing at this exact stage.
hashes say the chain is broken · control totals say how broken, in numbers a human can read
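A small Python sketch shows how totals turn "broken" into a number. The key names (row_count, sum_amount, nulls_amount) and the helper functions are hypothetical; only the "old -> new (diff)" output style is taken from the example above.

```python
def control_totals(rows: list[dict], numeric_columns: list[str]) -> dict:
    """Compute a few human-legible aggregates (hypothetical key names)."""
    totals = {"row_count": len(rows)}
    for column in numeric_columns:
        values = [row[column] for row in rows if row.get(column) is not None]
        totals[f"sum_{column}"] = sum(values)
        totals[f"nulls_{column}"] = len(rows) - len(values)
    return totals


def totals_delta(before: dict, after: dict) -> list[str]:
    """Describe every total that moved, in 'old -> new (diff)' style."""
    changes = []
    for key in sorted(before.keys() & after.keys()):
        if before[key] != after[key]:
            diff = after[key] - before[key]
            changes.append(f"{key} {before[key]} -> {after[key]} ({diff:+})")
    return changes


stage_in = [{"amount": 10}, {"amount": 20}, {"amount": None}, {"amount": 5}]
stage_out = stage_in[:2]  # two rows silently dropped

before = control_totals(stage_in, ["amount"])
after = control_totals(stage_out, ["amount"])
for line in totals_delta(before, after):
    print(line)
# nulls_amount 1 -> 0 (-1)
# row_count 4 -> 2 (-2)
# sum_amount 35 -> 30 (-5)
```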
The light
All of that machinery collapses into one glanceable verdict: the light. It is the user-facing answer, expressed as a traffic light, and it is the only thing most people will ever look at.
Green. Every link verifies and every signature checks. The data made it from the source to the dashboard unchanged. The light is green, the data is clean.
Yellow. Verifiable, but with caveats worth a human glance: a gap in receipt coverage, a signing key you did not pre-trust, or control-total drift you asked to be warned about. The light is yellow, a human should look.
Red. The chain is broken at a specific link, and you get the stage, the expected and found fingerprints, and the totals delta. The light is red, the chain is broken.
Yellow never blames; it asks. Red does not panic; it points. The same three states show up everywhere: as CLI exit codes (0 green, 2 yellow, 1 red), as the badge on a dashboard, and as the inline status light.
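The reduction to a single verdict might look like the Python sketch below. The exit codes are the ones listed above; the Light enum, the decide_light function, and the rule that red takes precedence over yellow are assumptions for illustration.

```python
from enum import Enum


class Light(Enum):
    """Verdicts, valued by the CLI exit code each one maps to."""
    GREEN = 0   # every link verifies
    YELLOW = 2  # verifiable, with caveats
    RED = 1     # broken at a named link


def decide_light(broken_link: bool, caveats: list[str]) -> Light:
    """Assumed precedence: red wins over yellow, yellow wins over green."""
    if broken_link:
        return Light.RED
    if caveats:
        return Light.YELLOW
    return Light.GREEN


print(decide_light(False, []).value)                        # 0
print(decide_light(False, ["untrusted signing key"]).value)  # 2
print(decide_light(True, ["untrusted signing key"]).value)   # 1
```

Because the codes are distinct, a script can branch on the exit status alone, for example failing a build on red while only logging a warning on yellow.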
The signing key
A receipt is only trustworthy because it is signed. Signing uses an Ed25519 keypair: a private key that only you hold and that produces signatures, and a public key that anyone can use to recognize them. The private key never leaves your control and never belongs in the repository; the public key is safe to publish.
Think of it like
A wax seal and the signet ring that makes it. Only the ring's owner can press a genuine seal, but anyone who has seen it can recognize one at a glance. The ring stays on your hand (the private key); a picture of the seal goes to everyone who needs to check your letters (the public key).
private key = the signet ring, kept on your hand · public key = the picture of the seal you hand out · rotating keys just means handing out a new picture while old letters still verify
The anchor
The local key is the day-to-day root of trust, but it has one gap: whoever holds the private key could re-sign a whole new chain and call it the original. An anchor closes that gap for the moments that matter. It records your chain in an external, append-only public log (a Sigstore transparency log) under your identity, witnessed independently of your key.
Think of it like
Getting a document notarized and time-stamped at the courthouse. The notary does not vouch that the contract is wise. They vouch that this exact document existed, in this exact form, on this date, witnessed by someone who is not you. Later, even you cannot quietly swap in a different version and pretend it was always there.
anchor = a public, independent witness that this exact chain existed at this time · it proves existence-at-a-time, never correctness, and says nothing about the moments before it
Spec version and golden vectors
Two supporting ideas keep the fingerprints honest across time and across languages.
The spec version is the edition of the canonicalization rulebook, recorded in every receipt. If the rules for turning data into bytes ever change, the version bumps and receipts regenerate. Old chains still verify internally, and the recorded version explains any before-and-after difference.
Golden vectors are the reference standard: known inputs paired with the exact canonical bytes and hash they must produce. Every implementation (the Python core, the Node port, the in-browser verifier) has to reproduce them byte for byte. If a port ever drifts, a test fails loudly instead of a user seeing a false tamper verdict.
Think of it like
The reference kilogram kept near Paris, which defined the unit until 2019. Every scale in the world had to agree with that one cylinder. Golden vectors are the reference weight for canonicalization: Python, JavaScript, and the browser all weigh against the same standard, so a chain signed in one stack verifies in another.
spec version = which edition of the rulebook was used · golden vectors = the shared reference standard every implementation must match
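A golden-vector check can be sketched as a table of inputs and expected results. The canonical_bytes function here is a toy stand-in, not the real rulebook, and the vector layout is hypothetical. The two digests are the standard published SHA-256 values for empty input and for the bytes "abc", so the expectations are not invented.

```python
import hashlib


def canonical_bytes(rows: list[list[str]]) -> bytes:
    """Toy canonicalization: tab-join cells, sort rows, newline-join rows."""
    lines = sorted("\t".join(row) for row in rows)
    return "\n".join(lines).encode("utf-8")


# Each vector pins an input to the exact bytes and digest it must produce.
GOLDEN_VECTORS = [
    {
        "rows": [],
        "canonical": b"",
        "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    },
    {
        "rows": [["abc"]],
        "canonical": b"abc",
        "sha256": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
    },
]


def check_vectors() -> list[int]:
    """Return the indexes of vectors this implementation fails to reproduce."""
    failures = []
    for index, vector in enumerate(GOLDEN_VECTORS):
        produced = canonical_bytes(vector["rows"])
        digest = hashlib.sha256(produced).hexdigest()
        if produced != vector["canonical"] or digest != vector["sha256"]:
            failures.append(index)
    return failures


print(check_vectors())  # an empty list means every vector was reproduced
```

A port in another language would load the same vectors and run the same comparison, so any drift in its canonicalization shows up as a failing index rather than as a false tamper verdict.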
Putting it together
A source file is ingested and gets its evidence and semantic fingerprints plus a source receipt. Each transform is wrapped so it signs a receipt linking its input to the previous output. The receipts form a chain. Verification walks the chain and reduces it to a light: green if every link holds, yellow if something deserves a glance, red at the exact link that broke, with the totals delta to show what moved. Optionally, an anchor witnesses the whole chain in public. Through all of it, the claim never changes: Tamper Signal cannot tell you the data is right, but it can prove nobody changed it.
Receipt: a signed record of one stage.
Chain: the linked sequence.
Evidence hash: the original file's fingerprint.
Semantic hash: the data's fingerprint, stable across formats.
Control totals: the human-legible aggregates.
The light: the traffic-light verdict.
Anchor: the public witness.
Spec version: the rulebook edition.
Golden vectors: the cross-language reference standard.