Do UUIDs Collide? Implementation and Operational Patterns That Invite Duplicates

Tags: UUID, Identifiers, Distributed Systems, Data Design, Implementation

You used UUIDs as primary keys, and one day a duplicate key error shows up. At that moment, there is a good chance someone says, “So UUIDs do collide after all.”

In practice, however, most UUID duplicates are not a problem with the UUID specification itself but cases where the implementation or operations break the generation conditions the spec assumes. Under RFC 9562, UUIDv4 has 122 bits of random space, and UUIDv7 is also defined on the assumption that the 74 bits outside the timestamp are used for randomness or counters that provide uniqueness. UUIDv8, on the other hand, is explicitly described as implementation-specific, and uniqueness must not be assumed.[1][2][3] The Python standard library also documents that uuid4() is generated using a cryptographically secure method, so as long as you “use a proper implementation in the normal way,” the guarantees on the UUID side are quite strong.[4]
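To put the 122-bit figure in perspective, here is a minimal sketch of the standard birthday-bound approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)). It assumes all b bits are uniformly random and independent, which is exactly the condition that the patterns in this article break. The function name collision_probability is illustrative and not part of any library.

```python
import math


def collision_probability(n: int, random_bits: int = 122) -> float:
    """Approximate probability that n uniformly random IDs contain a duplicate."""
    # Birthday bound: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).
    # expm1 keeps precision when the exponent is extremely small.
    exponent = n * (n - 1) / 2.0 ** (random_bits + 1)
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # One trillion IDs with the full 122 random bits of UUIDv4.
    print(collision_probability(10**12))
    # One hundred thousand IDs when only 32 random bits survive.
    print(collision_probability(100_000, random_bits=32))
```

Under these assumptions the first value is on the order of 1e-13, while the second is well above one half. The gap between the two is why the rest of this article looks at how the effective number of random bits gets reduced, rather than at the specification.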

In this article, we organize the typical patterns where incorrect operations or implementations make UUIDs collide, together with measures to prevent recurrence. The content is based on RFC 9562, the official Python documentation, and the official PostgreSQL documentation as verifiable as of March 2026.[5][4][6]

1. The Conclusion First

To summarize up front, these are the dangerous patterns.

| Pattern | What happens | First countermeasure |
| --- | --- | --- |
| Hand-rolling UUIDv4-like values with a fixed seed or weak PRNG | The same sequence is reproduced in another process or node | Use the OS / runtime standard UUID API |
| Carrying over generator state as is after fork, VM snapshot, or container cloning | Random or counter state rewinds and duplicates appear | Re-seed after fork, re-initialize after clone, review how persistent state is handled |
| Using UUIDv3 / v5 under the misconception that they yield “a new ID every time” | The same UUID is regenerated from the same namespace and same name | Understand they are deterministic IDs and restrict their use |
| Implementing UUIDv1 / v6 / v7 / v8 yourself and handling clock rollback or node/counter carelessly | Duplicates become likely under high-frequency generation or across multiple nodes | Use existing libraries and reduce custom generators |
| Truncating UUIDs midway or squashing them into another format | You throw away the original 128-bit uniqueness yourself | Store and compare at full length |
| Not placing UNIQUE / PRIMARY KEY on the DB side | Duplicates slip in silently and root-cause analysis is delayed | Keep a uniqueness constraint at the storage layer |

In short, rather than “the UUID collided,” it is usually that the uniqueness you expected from the UUID was shaved away somewhere in the design.

2. Suspect the Generation and Operations First, Not the “Math of UUIDs”

UUID discussions get confusing because the properties differ by version.

  • UUIDv4 is random-based. Under RFC 9562, the 122 bits other than version / variant are filled with random data.[1]
  • UUIDv7 has a structure that sorts well chronologically; in addition to a Unix millisecond timestamp, the rest is composed of randomness or a carefully seeded counter.[2]
  • UUIDv3 / v5 are name-based. Given the same namespace and the same canonical name, producing the same UUID is the correct behavior.[7]
  • UUIDv8 is for experimental and vendor-specific use, and its uniqueness is implementation-specific. The RFC says uniqueness must not be assumed.[3]

So even if you say “we use UUIDs,” the story changes completely depending on whether it is

  • the standard library’s uuid4()
  • a homemade timestamp + random
  • uuid5(namespace, name)
  • or a custom format that merely looks like UUIDv8.

[Diagram] UUID duplicate is found → Where did the same value really come from? → Weak generator / State was rewound / Misuse of name-based UUID / Truncated at storage time / No uniqueness constraint on the DB side / Implementation mistake

In practice, working from the right side of this diagram is faster.
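A quick first step in that triage is to parse the duplicated value and see which version it claims to be. The following is a small sketch using Python's standard uuid module; the helper name describe is hypothetical.

```python
import uuid


def describe(value: str) -> dict:
    """Parse a UUID string and report the fields that hint at how it was generated."""
    parsed = uuid.UUID(value)  # raises ValueError if the input is not a well-formed UUID
    return {
        "canonical": str(parsed),
        "version": parsed.version,  # None when the variant is not the RFC 4122 / 9562 one
        "variant": parsed.variant,
    }


if __name__ == "__main__":
    print(describe(str(uuid.uuid4())))
    print(describe(str(uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/users/42"))))
```

A version of 3 or 5 points to name-based generation, while 4 with a suspicious history points to the generator or its state. Note that the version field only reports what the value claims; a hand-rolled generator can stamp any version it likes.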

3. Pattern 1: Calling It UUIDv4 While Actually Using a Weak PRNG

This is the most common one.

  • Building 128 bits with a general-purpose PRNG equivalent to Math.random()
  • Seeding at startup with time() or the PID
  • Hand-assembling “32 hex digits that look like UUID format”

It may look like a UUID, but if the random source is weak, the same sequence is reproduced in another process or another node.
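The reproduction is easy to demonstrate. The sketch below builds UUIDv4-shaped values from Python's general-purpose random.Random, seeded with the same startup timestamp on two simulated nodes. The helper weak_uuid4 and the seed value are made up for illustration; this is the pattern to avoid, not a recommendation.

```python
import random
import uuid


def weak_uuid4(rng: random.Random) -> uuid.UUID:
    """Anti-pattern: a UUIDv4-shaped value drawn from a seedable, general-purpose PRNG."""
    # version=4 only stamps the version and variant bits; it adds no randomness.
    return uuid.UUID(int=rng.getrandbits(128), version=4)


# Two nodes that both seed with the same startup time (for example, time() in seconds).
node_a = random.Random(1_700_000_000)
node_b = random.Random(1_700_000_000)

ids_a = [weak_uuid4(node_a) for _ in range(3)]
ids_b = [weak_uuid4(node_b) for _ in range(3)]

# Every value is a well-formed version 4 UUID, and the two sequences are identical.
same_sequence = ids_a == ids_b

# The standard API needs no seed handling at all.
safe_id = uuid.uuid4()
```

Both lists pass any format validation, which is what makes this failure hard to spot from the values alone.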

RFC 9562 says a CSPRNG should be used, both for UUID uniqueness and for unguessability. This is a recommendation (SHOULD), so exceptions can be designed for some use cases, but if you hand-roll UUIDs with a general-purpose PRNG you should be able to explain why. Furthermore, it states that the CSPRNG state should be properly re-seeded upon state changes such as a process fork.[8] Python’s uuid.uuid4() is likewise documented as generating random UUIDs in a cryptographically secure way.[4]

The practical conclusion here is simple.

  • Do not hand-roll UUIDs
  • Do not fiddle with random seeds by hand
  • Use the standard library or a widely used implementation as is

Keeping a custom generator around “because it’s lightweight” or “because we’ve always used it” is what ends up costing the most later.

4. Pattern 2: Rewinding Generator State via fork, Snapshot, or Clone

The second most dangerous thing is operations where the generator state gets duplicated or rewound.

RFC 9562 explicitly recommends re-seeding after fork, and explains that implementations without stable storage have to generate clock sequences, counters, and random data more frequently, which raises the probability of duplicates.[8][9]

A practical line of reasoning follows naturally from this.

  • Restoring multiple instances of the same image after taking a VM snapshot
  • A custom generator starting from the same initial state every time a container image boots
  • Sharing PRNG state or counter state across worker forks

Under these operations, the UUID generation sequence can be unintentionally reproduced. The RFC does not literally say “snapshots are dangerous,” but this is a very practical caution you can derive from its notes on re-seeding after fork and on handling generator state.[8][9]

Countermeasures look like this.

  • Do not hold custom UUID generator state for long
  • Re-initialize immediately after fork / clone / restore
  • Where possible, lean on implementations that draw OS-provided randomness every time
  • For high-frequency generators, document the state management and re-seeding specification explicitly
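As a rough illustration of these countermeasures, the sketch below models a custom generator that keeps PRNG state in memory, shows how a copied state replays the same ID, and registers a re-seed hook for forked children. StatefulGenerator is a hypothetical class, and copy.deepcopy is only a stand-in for a VM snapshot or container clone. The simpler option remains calling uuid.uuid4() and keeping no state at all.

```python
import copy
import os
import random
import uuid


class StatefulGenerator:
    """Hypothetical custom generator that keeps PRNG state in process memory."""

    def __init__(self) -> None:
        self._rng = random.Random()
        self.reseed()

    def reseed(self) -> None:
        # Discard whatever state was inherited and draw fresh OS entropy.
        self._rng.seed(os.urandom(32))

    def next_id(self) -> uuid.UUID:
        return uuid.UUID(int=self._rng.getrandbits(128), version=4)


generator = StatefulGenerator()

# Re-seed automatically in every forked child (os.register_at_fork is Unix-only).
if hasattr(os, "register_at_fork"):
    os.register_at_fork(after_in_child=generator.reseed)

# deepcopy stands in for a snapshot or clone that duplicates memory as is.
restored = copy.deepcopy(generator)
replayed = generator.next_id() == restored.next_id()  # duplicated state, same ID

restored.reseed()  # what a restore / clone hook should do before issuing IDs
diverged = generator.next_id() != restored.next_id()
```

Snapshots and container clones have no equivalent of the fork hook inside the process, so in those cases the re-initialization has to be part of the startup or restore procedure itself.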

5. Pattern 3: Misreading UUIDv3 / v5 as “a New ID Every Time”

UUIDv3 / v5 are not random IDs that resist collision. They are deterministic IDs that can regenerate the same ID from the same name.

RFC 9562 states that UUIDs generated from the same name in canonical format within the same namespace must be equal.[7] So with usage like the following, duplicates are not an accident; they are the specified behavior.

  • Using uuid5(NAMESPACE_URL, "https://example.com/users/42") as “fresh ID assignment” every time
  • Issuing IDs from a namespace shared across all customers plus an email, without putting the tenant into the namespace
  • Assuming that re-issuing the same logical name on each retry will yield a different ID

Conversely, if the canonicalization of the name is inconsistent, you get different UUIDs for the same subject. The RFC repeatedly stresses the handling of the canonical representation.[7][10]

What matters in this family is three things:

  • UUIDv3 / v5 are not “collision-free ID assignment” but “same input, same ID”
  • Do not leave the namespace design vague
  • Specify the canonicalization of names explicitly
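These three points can be sketched with Python's uuid.uuid5(). The root namespace, the per-tenant namespace scheme, and the email canonicalization rule below are all assumptions made for illustration; a real system has to choose and document its own.

```python
import uuid

# Hypothetical root namespace for this application; any fixed, documented UUID works.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "ids.example.com")


def canonical_email(email: str) -> str:
    # Assumed canonicalization rule: trim whitespace and lower-case the whole address.
    return email.strip().lower()


def tenant_namespace(tenant_id: str) -> uuid.UUID:
    # One namespace per tenant, so equal names in different tenants do not collide.
    return uuid.uuid5(APP_NAMESPACE, tenant_id)


def user_id(tenant_id: str, email: str) -> uuid.UUID:
    return uuid.uuid5(tenant_namespace(tenant_id), canonical_email(email))


# Same input, same ID: this is the specified behavior, not a collision.
first = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/users/42")
second = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/users/42")
deterministic = first == second

# Same email in two tenants yields two different IDs.
separated = user_id("tenant-a", "alice@example.com") != user_id("tenant-b", "alice@example.com")

# Different spellings of the same name map to one ID only because of canonicalization.
stable = user_id("tenant-a", " Alice@Example.com ") == user_id("tenant-a", "alice@example.com")
```

Whether lower-casing an email is the right canonical form is a product decision; the point is that the rule is written down and applied before every call.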

6. Pattern 4: Hand-Implementing Time-Based UUIDs or UUIDv8

UUIDv1 / v6 / v7 / v8 are dangerous to imitate by appearance only.

6.1 Handling node or clock sequence carelessly in UUIDv1 / v6

Under RFC 9562, UUIDv6 is a field-reordered UUIDv1 designed to improve DB locality, and it deals with clock sequences and nodes. The RFC also carries multiple cautions about node collision resistance in distributed environments and about state retention.[11][9][12]

Moreover, the RFC goes as far as saying that with the advent of virtual machines and containers, the uniqueness of MAC addresses can no longer be guaranteed.[5]

So designs like

  • assuming “it’s a MAC address, so it must be unique”
  • replicating a node ID baked into an image
  • resetting the clock sequence to a fixed value on every restart

are dangerous.
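If a time-based UUID with a node field is unavoidable, one hedged alternative to trusting the MAC address is to draw a random node ID and clock sequence when the process starts. The sketch below uses Python's uuid.uuid1(), which accepts node and clock_seq arguments; the helper names are hypothetical, and setting the multicast bit follows the convention for randomly generated node IDs so the value cannot be confused with a real hardware address.

```python
import secrets
import uuid


def random_node_id() -> int:
    """Random 48-bit node ID with the multicast bit (lowest bit of the first octet) set."""
    return secrets.randbits(48) | (1 << 40)


# Drawn once per process start, never baked into an image or config template.
NODE_ID = random_node_id()
CLOCK_SEQ = secrets.randbits(14)  # random rather than reset to a fixed value


def new_time_based_id() -> uuid.UUID:
    return uuid.uuid1(node=NODE_ID, clock_seq=CLOCK_SEQ)


example = new_time_based_id()
```

This reduces the chance that cloned instances share a node ID, but it does not remove the need for the clock and state handling the RFC describes.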

6.2 Hand-rolling UUIDv7 and ignoring counter rollover or clock rollback

UUIDv7 is quite practical, but the RFC is careful about monotonicity and counter handling under high-frequency generation. It also explicitly states that implementations must not knowingly return duplicates on clock rollback or counter rollover.[2][13]

Which means implementations like

  • issuing large volumes within the same millisecond with no counter design
  • continuing to generate without doing anything when the clock goes backwards
  • multiple processes each initializing the same internal counter independently

are risky.
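To show how much a correct generator has to handle, here is a sketch of one possible in-process UUIDv7 generator. It assumes a 12-bit counter stored in the rand_a field, keeps the last timestamp when the clock goes backwards, and moves the timestamp forward when the counter overflows. The class name, the injectable clock parameter, and these design choices are all illustrative; a maintained library or a standard uuid7() is still the better default.

```python
import os
import threading
import time
import uuid


class UUIDv7Generator:
    """Illustrative single-process UUIDv7 generator with a 12-bit counter in rand_a."""

    def __init__(self, clock=lambda: time.time_ns() // 1_000_000) -> None:
        self._clock = clock  # returns Unix time in milliseconds
        self._lock = threading.Lock()
        self._last_ms = -1
        self._counter = 0

    @staticmethod
    def _fresh_counter() -> int:
        # Random start with the top bit clear, leaving headroom before rollover.
        return int.from_bytes(os.urandom(2), "big") & 0x7FF

    def generate(self) -> uuid.UUID:
        with self._lock:
            now = self._clock()
            if now > self._last_ms:
                self._last_ms = now
                self._counter = self._fresh_counter()
            else:
                # Same millisecond or clock rollback: keep the last timestamp.
                self._counter += 1
                if self._counter > 0xFFF:
                    # Counter rollover: advance the timestamp instead of reusing values.
                    self._last_ms += 1
                    self._counter = self._fresh_counter()
            ts, counter = self._last_ms, self._counter
        rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
        value = (
            ((ts & ((1 << 48) - 1)) << 80)  # unix_ts_ms
            | (0x7 << 76)                   # version 7
            | (counter << 64)               # rand_a used as counter
            | (0b10 << 62)                  # variant
            | rand_b                        # 62 random bits
        )
        return uuid.UUID(int=value)
```

The lock and counter only protect a single process. Across processes or nodes, uniqueness in this sketch rests on the 62 random bits in rand_b, which is why the source of that randomness matters just as much as it does for UUIDv4.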

6.3 Treating UUIDv8 lightly, as if it were just “the new UUID spec”

UUIDv8 looks convenient, but RFC 9562 is quite clear: the uniqueness of UUIDv8 is implementation-specific and must not be assumed.[3]

So a “company-proprietary UUID” that

  • embeds a timestamp
  • embeds a shard ID
  • embeds some business meaning
  • and fills the rest with whatever randomness

means that design document is itself your UUID uniqueness specification. It is far too dangerous to introduce without review.

7. Pattern 5: Shortening the UUID Along the Way

Even if generation is done correctly, things can be broken at the storage or comparison stage.

Typical examples:

  • Using only the first 8 characters as a stand-in for a foreign key
  • Squashing a 128-bit UUID into a 64-bit integer
  • A string column too short, so the tail gets cut off
  • Treating the shortened representation used in logs or on screen as the unique key

What matters here is that changing the representation is not inherently bad.

  • Removing hyphens
  • Normalizing to lower / upper case
  • Storing as 16 binary bytes

Transformations like these, which do not drop any of the 128 bits, are fine. What is dangerous is a transformation that shaves away the very material of uniqueness.
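The difference between the two kinds of transformation is easy to check in code. The sketch below uses only Python's standard uuid module; the variable names are illustrative.

```python
import uuid

original = uuid.uuid4()

# Lossless representations: all 128 bits survive and parse back to the same value.
no_hyphens = original.hex            # 32 hex digits
upper_case = str(original).upper()   # case-normalized text form
raw_bytes = original.bytes           # 16 binary bytes

round_trips = (
    uuid.UUID(no_hyphens) == original
    and uuid.UUID(upper_case) == original
    and uuid.UUID(bytes=raw_bytes) == original
)

# Lossy representations: the discarded bits cannot be recovered.
prefix_8_chars = str(original)[:8]           # keeps only 32 of the 128 bits
as_int64 = original.int & ((1 << 64) - 1)    # keeps only the low 64 bits
```

An 8-character prefix leaves 32 bits, so by the birthday approximation duplicates become likely once the table holds on the order of tens of thousands of rows.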

In particular, a design where a separate “human-friendly short ID” was created and then quietly started taking precedence over the real UUID is prone to incidents.

8. Pattern 6: No Uniqueness Constraint on the DB Side

And this one is especially important.

Even if UUIDs are sufficiently collision-resistant, if you truly cannot tolerate duplicates, the storage destination should also carry a uniqueness constraint.

The official PostgreSQL documentation explains that a unique constraint guarantees that the value of a column or group of columns is unique across the whole table, and that a primary key is a row identifier that is unique and not null.[6]

RFC 9562 also says that while UUIDs can provide sufficient uniqueness in practice, true global uniqueness can never be absolutely guaranteed, and that for uses where the impact of a collision is high, stronger countermeasures should be taken.[14]

In practice, this combination is the baseline.

  • Use UUIDs as IDs that are unlikely to collide
  • Keep UNIQUE / PRIMARY KEY in the DB as the last line of defense
  • Design retry / idempotency / incident logging for the duplicate case
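A minimal sketch of that combination follows. It uses Python's built-in sqlite3 module as a stand-in so the example runs anywhere; with PostgreSQL the same idea applies with a PRIMARY KEY on a uuid column. The table name, the insert_order helper, and the generator label are hypothetical.

```python
import logging
import sqlite3
import uuid

logger = logging.getLogger("id_audit")


def create_schema(conn: sqlite3.Connection) -> None:
    # The PRIMARY KEY is the last line of defense against duplicate IDs.
    conn.execute("CREATE TABLE orders (id BLOB PRIMARY KEY, note TEXT NOT NULL)")


def insert_order(conn: sqlite3.Connection, order_id: uuid.UUID, note: str, generator: str) -> bool:
    """Insert a row; on a constraint violation, log where the ID came from and return False."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (id, note) VALUES (?, ?)",
                (order_id.bytes, note),  # full 16 bytes, nothing truncated
            )
        return True
    except sqlite3.IntegrityError:
        # Do not swallow silently: record which generator / node produced the value.
        logger.error("duplicate id %s from generator=%s", order_id, generator)
        return False


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_schema(connection)
    new_id = uuid.uuid4()
    print(insert_order(connection, new_id, "first", "api-node-1"))
    print(insert_order(connection, new_id, "replayed", "api-node-2"))
```

Whether the caller then retries with a new ID or treats the request as an idempotent replay depends on the operation; the constraint only guarantees that the duplicate is seen rather than stored.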

Using UUIDs is not a reason to omit uniqueness constraints.

9. A Practical Checklist

Finally, here is a form you can use directly for adoption or audits.

  1. Check whether you are generating UUIDs yourself. If you can move to standard APIs like uuid4() / uuid7(), do that first.
  2. Decide the UUID version as part of the specification. State explicitly that v4 is random-based, v7 is a timestamp plus randomness or a counter, v3/v5 are deterministic, and v8 is a custom specification.
  3. Inventory how seeds and generator state are handled. Make sure the same state is not carried over after fork, worker restart, snapshot, or clone.
  4. Confirm full length is preserved at storage time. Do not use prefix comparison or shortened displays as the actual key.
  5. Place UNIQUE / PRIMARY KEY in the DB. A UUID is a mechanism that lowers the probability; it is not a constraint itself.
  6. Make duplicates observable. Do not swallow duplicate key errors; make it traceable which generator / node / deployment produced them.

10. Summary

UUID collision incidents usually start not because the UUID is weak, but because the implementation or operations break the assumptions the UUID relies on.

  • Hand-rolling with weak randomness
  • Rewinding state after fork or snapshot
  • Using name-based UUIDs for fresh ID assignment
  • Casually hand-implementing v7 or v8
  • Truncating along the way and discarding uniqueness
  • Omitting the uniqueness constraint on the DB side

Do any of these, and it is almost the same as actively constructing a situation where collisions become likely.

When you find a duplicate, what you should suspect first is not the math of UUIDs but the generator, state management, storage format, and constraint design. Look in that order, and the cause usually narrows down quickly.

12. References

  1. IETF RFC 9562, Section 5.4 UUID Version 4. On the 122-bit random space of UUIDv4.

  2. IETF RFC 9562, Section 5.7 UUID Version 7. On the design of UUIDv7’s timestamp, random bits, and counters.

  3. IETF RFC 9562, Section 5.8 UUID Version 8. On UUIDv8 uniqueness being implementation-specific and not to be assumed.

  4. Python 3.14 documentation, uuid module. On uuid4()’s cryptographically secure generation, uuid5()’s deterministic behavior, and the properties of uuid7() / uuid8().

  5. IETF RFC 9562, Universally Unique IDentifiers (UUIDs). The baseline document for the UUID format, each version, and best practices overall.

  6. PostgreSQL documentation, Constraints. On guaranteeing uniqueness via UNIQUE constraints and PRIMARY KEY.

  7. IETF RFC 9562, Section 6.5 Name-Based UUID Generation. On the same namespace + same name yielding the same UUID, and the importance of canonicalization.

  8. IETF RFC 9562, Section 6.9 Unguessability. On CSPRNG use and re-seeding after fork.

  9. IETF RFC 9562, Section 6.3 UUID Generator States. On handling stable storage and generator state.

  10. IETF RFC 9562, Section 5.5 UUID Version 5. On the specification of name-based UUIDs built from a namespace plus a canonical name.

  11. IETF RFC 9562, Section 5.6 UUID Version 6. On UUIDv6’s node / clock sequence / DB locality.

  12. IETF RFC 9562, Section 6.4 Distributed UUID Generation. On node collision resistance in distributed environments.

  13. IETF RFC 9562, Section 6.2 Monotonicity and Counters. On cautions around clock rollback, counter rollover, and batch generation.

  14. IETF RFC 9562, Sections 6.7 and 6.8. On the thinking behind collision resistance and global uniqueness.

Author Profile


Go Komura

Representative of KomuraSoft LLC

Focused on Windows software development, technical consulting, and investigations into failures that are difficult to reproduce.
