The first time someone explained vector clocks to me, I came away thinking they were a clever trick for ordering writes in a Dynamo-style database. That is true, but it sells them short. Vector clocks are interesting because they force you to be precise about a question you usually avoid: which of these two events knew about which?
Happens-before, restated
Given two events a and b, exactly one of these is true: a → b, b → a, or they are concurrent. Scalar Lamport timestamps collapse cases 1 and 2 onto a total order but lose case 3 entirely. If a has timestamp 4 and b has timestamp 7, you cannot tell whether b actually saw a or whether they are concurrent events that happened to land at those numbers.
Vector clocks preserve the partial order exactly. If neither VC(a) < VC(b) nor VC(b) < VC(a), the events are concurrent. That third case is the whole point.
Two things that took me too long to internalize
- Vector clocks tell you that two events are concurrent. They do not tell you what to do about it. The merge policy is a separate decision.
- Comparison cost is O(N) in replicas. For systems with churn, this is the actual reason people move to dotted version vectors, not theoretical elegance.
I keep coming back to vector clocks because they are a small, sharp tool for asking a question most systems pretend does not exist.