Do UUIDs Collide? Implementation and Operational Patterns That Invite Duplicates
· Go Komura · UUID, Identifiers, Distributed Systems, Data Design, Implementation
You used UUIDs as primary keys, and one day a duplicate key error shows up.
At that moment, there is a good chance someone says, “So UUIDs do collide after all.”
In practice, however, most UUID duplicates are not a problem with the UUID specification itself but cases where the implementation or operations break the generation conditions the spec assumes. Under RFC 9562, UUIDv4 has 122 bits of random space, and UUIDv7 is also defined on the assumption that the 74 bits outside the timestamp are used for randomness or counters that provide uniqueness. UUIDv8, on the other hand, is explicitly described as implementation-specific, and uniqueness must not be assumed.[1][2][3]
The Python standard library also documents that `uuid4()` is generated using a cryptographically secure method, so as long as you “use a proper implementation in the normal way,” the guarantees on the UUID side are quite strong.[4]
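To put the 122-bit figure in perspective, the sketch below applies the standard birthday approximation, p ≈ 1 − exp(−n(n−1) / 2N) with N = 2^122. It assumes every random bit is drawn uniformly and independently, which is precisely the assumption the patterns in this article break; the function name is ours, not part of any library.

```python
import math

UUID4_RANDOM_BITS = 122  # bits left after version and variant (RFC 9562)


def collision_probability(n_ids: int, random_bits: int = UUID4_RANDOM_BITS) -> float:
    """Birthday approximation: chance that at least two of n_ids collide.

    Assumes each ID is drawn uniformly and independently
    from 2**random_bits possible values.
    """
    space = 2 ** random_bits
    exponent = n_ids * (n_ids - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for n in (10**6, 10**9, 10**12):
        print(f"{n:>20,} IDs -> p = {collision_probability(n):.3e}")
```

Even at a billion IDs the result stays far below anything an incident report could plausibly blame, which is why the rest of the investigation should focus elsewhere.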
In this article, we organize the typical patterns where incorrect operations or implementations make UUIDs collide, together with measures to prevent recurrence. The content is based on RFC 9562, the official Python documentation, and the official PostgreSQL documentation as verifiable as of March 2026.[5][4][6]
1. The Conclusion First
To summarize up front, these are the dangerous patterns.
| Pattern | What happens | First countermeasure |
|---|---|---|
| Hand-rolling UUIDv4-like values with a fixed seed or weak PRNG | The same sequence is reproduced in another process or node | Use the OS / runtime standard UUID API |
| Carrying over generator state as is after fork, VM snapshot, or container cloning | Random or counter state rewinds and duplicates appear | Re-seed after fork, re-initialize after clone, review how persistent state is handled |
| Using UUIDv3 / v5 under the misconception that they yield “a new ID every time” | The same UUID is regenerated from the same namespace and same name | Understand they are deterministic IDs and restrict their use |
| Implementing UUIDv1 / v6 / v7 / v8 yourself and handling clock rollback or node/counter carelessly | Duplicates become likely under high-frequency generation or across multiple nodes | Use existing libraries and reduce custom generators |
| Truncating UUIDs midway or squashing them into another format | You throw away the original 128-bit uniqueness yourself | Store and compare at full length |
| Not placing UNIQUE / PRIMARY KEY on the DB side | Duplicates slip in silently and root-cause analysis is delayed | Keep a uniqueness constraint at the storage layer |
In short, rather than “the UUID collided,” it is usually that the uniqueness you expected from the UUID was shaved away somewhere in the design.
2. Suspect the Generation and Operations First, Not the “Math of UUIDs”
UUID discussions get confusing because the properties differ by version.
- UUIDv4 is random-based. Under RFC 9562, the 122 bits other than version / variant are filled with random data.[1]
- UUIDv7 has a structure that sorts well chronologically; in addition to a Unix millisecond timestamp, the rest is composed of randomness or a carefully seeded counter.[2]
- UUIDv3 / v5 are name-based. Given the same namespace and the same canonical name, producing the same UUID is the correct behavior.[7]
- UUIDv8 is for experimental and vendor-specific use, and its uniqueness is implementation-specific. The RFC says uniqueness must not be assumed.[3]
So even if you say “we use UUIDs,” the story changes completely depending on whether it is
- the standard library’s `uuid4()`
- a homemade `timestamp + random`
- `uuid5(namespace, name)`
- or a custom format that merely looks like UUIDv8.
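When it is unclear which of these a system actually uses, the version field of the stored values is a quick first check. The following is a minimal sketch using Python's `uuid` module; the sample values and the function name are made up for illustration, and a value that merely carries a given version number proves nothing about how it was generated.

```python
import uuid
from collections import Counter


def version_histogram(values):
    """Count the version field of each UUID string.

    The key None collects strings that do not parse as a UUID
    (and values whose variant is not the RFC one).
    """
    counts = Counter()
    for text in values:
        try:
            counts[uuid.UUID(text).version] += 1
        except ValueError:
            counts[None] += 1
    return counts


samples = [
    str(uuid.uuid4()),
    str(uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/users/42")),
    "not-a-uuid",
]
histogram = version_histogram(samples)
print(histogram)
```

A table that is supposed to hold only v4 values but shows a mix of versions, or many unparseable strings, is a strong hint that more than one generator is writing to it.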
flowchart TD
A[UUID duplicate is found] --> B{Where did the same value really come from}
B --> C[Weak generator]
B --> D[State was rewound]
B --> E[Misuse of name-based UUID]
B --> F[Truncated at storage time]
B --> G[No uniqueness constraint on the DB side]
C --> H[Implementation mistake]
D --> H
E --> H
F --> H
G --> H
In practice, it is faster to start from the concrete causes in this diagram (generator, state, name-based misuse, truncation, constraints) than from the UUID math.
3. Pattern 1: Calling It UUIDv4 While Actually Using a Weak PRNG
This is the most common one.
- Building 128 bits with a general-purpose PRNG equivalent to `Math.random()`
- Seeding at startup with `time()` or the PID
- Hand-assembling “32 hex digits that look like UUID format”
It may look like a UUID, but if the random source is weak, the same sequence is reproduced in another process or another node.
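The effect is easy to reproduce. The sketch below is a deliberately bad generator, written only to show the failure mode: `homemade_uuid4` is a hypothetical helper, and the fixed seed stands in for "same startup second" or "same PID" on two nodes.

```python
import random
import uuid


def homemade_uuid4(rng: random.Random) -> uuid.UUID:
    """Anti-pattern: a UUIDv4-shaped value from a seedable, general-purpose PRNG."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)


# Two "nodes" that happen to seed with the same value.
node_a = random.Random(1_700_000_000)
node_b = random.Random(1_700_000_000)

ids_a = [homemade_uuid4(node_a) for _ in range(3)]
ids_b = [homemade_uuid4(node_b) for _ in range(3)]

print(ids_a == ids_b)  # True: both nodes emit the identical sequence

# The standard API has no seed to get wrong.
print(uuid.uuid4() == uuid.uuid4())
```

Every value in `ids_a` is a perfectly valid-looking version 4 UUID, which is exactly why this kind of bug survives code review.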
RFC 9562 says a CSPRNG should be used, both for UUID uniqueness and for unguessability. This is a recommendation (SHOULD), so exceptions can be designed for some use cases, but if you hand-roll UUIDs with a general-purpose PRNG you should be able to explain why. Furthermore, it states that the CSPRNG state should be properly re-seeded upon state changes such as a process fork.[8]
Python’s `uuid.uuid4()` is likewise documented as generating random UUIDs in a cryptographically secure way.[4]
The practical conclusion here is simple.
- Do not hand-roll UUIDs
- Do not fiddle with random seeds by hand
- Use the standard library or a widely used implementation as is
Keeping a custom generator around “because it’s lightweight” or “because we’ve always used it” is what ends up costing the most later.
4. Pattern 2: Rewinding Generator State via fork, Snapshot, or Clone
The second most dangerous thing is operations where the generator state gets duplicated or rewound.
RFC 9562 explicitly recommends re-seeding after fork, and explains that implementations without stable storage have to generate clock sequences, counters, and random data more frequently, which raises the probability of duplicates.[8][9]
A practical line of reasoning follows naturally from this.
- Restoring multiple instances of the same image after taking a VM snapshot
- A custom generator starting from the same initial state every time a container image boots
- Sharing PRNG state or counter state across worker forks
Under these operations, the UUID generation sequence can be unintentionally reproduced. The RFC does not literally say “snapshots are dangerous,” but this is a very practical caution you can derive from its notes on re-seeding after fork and on handling generator state.[8][9]
Countermeasures look like this.
- Do not hold custom UUID generator state for long
- Re-initialize immediately after fork / clone / restore
- Where possible, lean on implementations that draw OS-provided randomness every time
- For high-frequency generators, document the state management and re-seeding specification explicitly
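The rewind and the fix can both be simulated in a single process. In the sketch below, `StatefulGenerator` is a hypothetical custom generator, and copying its PRNG state stands in for a VM snapshot or a cloned container image. It still uses `random.Random`, which is not a CSPRNG; the point here is only the state handling.

```python
import os
import random
import uuid


class StatefulGenerator:
    """Hypothetical custom generator that keeps PRNG state in memory."""

    def __init__(self) -> None:
        self._rng = random.Random(os.urandom(32))

    def reseed(self) -> None:
        # Pull fresh entropy from the OS; call after fork / clone / restore.
        self._rng.seed(os.urandom(32))

    def snapshot(self):
        return self._rng.getstate()

    def restore(self, state) -> None:
        self._rng.setstate(state)

    def next(self) -> uuid.UUID:
        return uuid.UUID(int=self._rng.getrandbits(128), version=4)


original = StatefulGenerator()
image = original.snapshot()  # stands in for a VM snapshot or container image

clone_1, clone_2 = StatefulGenerator(), StatefulGenerator()
clone_1.restore(image)
clone_2.restore(image)
duplicated = clone_1.next() == clone_2.next()  # same state, same "UUID"

clone_1.restore(image)
clone_2.restore(image)
clone_1.reseed()
clone_2.reseed()
diverged = clone_1.next() != clone_2.next()  # re-seeding breaks the tie

print(duplicated, diverged)

# On POSIX, the same method can be hooked to run in every forked child.
if hasattr(os, "register_at_fork"):
    os.register_at_fork(after_in_child=original.reseed)
```

A generator that reads from the OS on every call has no `snapshot()` worth taking, which is the simplest way to make this whole class of problem disappear.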
5. Pattern 3: Misreading UUIDv3 / v5 as “a New ID Every Time”
UUIDv3 / v5 are not random IDs that resist collision. They are deterministic IDs that can regenerate the same ID from the same name.
RFC 9562 states that UUIDs generated from the same name in canonical format within the same namespace must be equal.[7] So with usage like the following, duplicates are not an accident; they are the specified behavior.
- Using `uuid5(NAMESPACE_URL, "https://example.com/users/42")` as “fresh ID assignment” every time
- Issuing IDs from a namespace shared across all customers plus an email, without putting the tenant into the namespace
- Assuming that re-issuing the same logical name on each retry will yield a different ID
Conversely, if the canonicalization of the name is inconsistent, you get different UUIDs for the same subject. The RFC repeatedly stresses the handling of the canonical representation.[7][10]
What matters in this family is three things:
- UUIDv3 / v5 are not “collision-free ID assignment” but “same input, same ID”
- Do not leave the namespace design vague
- Specify the canonicalization of names explicitly
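These three points can be sketched with Python's `uuid.uuid5()`. The tenant names, the `tenants.example.com` root, and the "trim and lowercase" canonicalization rule are assumptions made for this example; a real system has to decide its own rule and write it down.

```python
import uuid


def canonical_email(raw: str) -> str:
    # Assumed canonicalization rule for this sketch: trim and lowercase.
    return raw.strip().lower()


def tenant_namespace(tenant_id: str) -> uuid.UUID:
    # One namespace per tenant, derived from a hypothetical root name.
    root = uuid.uuid5(uuid.NAMESPACE_DNS, "tenants.example.com")
    return uuid.uuid5(root, tenant_id)


def user_id(tenant_id: str, email: str) -> uuid.UUID:
    return uuid.uuid5(tenant_namespace(tenant_id), canonical_email(email))


# Same tenant, same subject, differently typed input: same ID.
same = user_id("acme", "Alice@Example.com ") == user_id("acme", "alice@example.com")

# Same email under another tenant: different ID.
different = user_id("acme", "alice@example.com") != user_id("globex", "alice@example.com")

print(same, different)  # True True
```

Calling `user_id` again on a retry returns the same value by design, so it suits idempotency keys and is unsuitable wherever each call must produce a new row.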
6. Pattern 4: Hand-Implementing Time-Based UUIDs or UUIDv8
UUIDv1 / v6 / v7 / v8 are dangerous to imitate by appearance only.
6.1 Handling node or clock sequence carelessly in UUIDv1 / v6
Under RFC 9562, UUIDv6 is a field-reordered UUIDv1 designed to improve DB locality, and it deals with clock sequences and nodes. The RFC also carries multiple cautions about node collision resistance in distributed environments and about state retention.[11][9][12]
Moreover, the RFC goes as far as saying that with the advent of virtual machines and containers, the uniqueness of MAC addresses can no longer be guaranteed.[5]
So designs like
- assuming “it’s a MAC address, so it must be unique”
- replicating a node ID baked into an image
- resetting the clock sequence to a fixed value on every restart
are dangerous.
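If a time-based version is still required, one option is a node ID drawn at random when the process starts, with the multicast bit set so it cannot be confused with a real unicast MAC address. The sketch below uses Python's `uuid.uuid1()`; `random_node_id` is a hypothetical helper, and leaving `clock_seq` to the library is an assumption about what is acceptable for your workload.

```python
import secrets
import uuid


def random_node_id() -> int:
    """48-bit random node ID with the multicast bit set.

    The multicast bit is the least significant bit of the first octet,
    which is bit 40 of the 48-bit value.
    """
    return secrets.randbits(48) | (1 << 40)


# Draw once per process start, never once per image build.
NODE_ID = random_node_id()

u = uuid.uuid1(node=NODE_ID)  # clock_seq is left to the library
print(u.version, hex(u.node))
```

Because the value is drawn at runtime, two containers started from the same image no longer share a node ID by construction.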
6.2 Hand-rolling UUIDv7 and ignoring counter rollover or clock rollback
UUIDv7 is quite practical, but the RFC is careful about monotonicity and counter handling under high-frequency generation. It also explicitly states that implementations must not knowingly return duplicates on clock rollback or counter rollover.[2][13]
Which means implementations like
- issuing large volumes within the same millisecond with no counter design
- continuing to generate without doing anything when the clock goes backwards
- multiple processes each initializing the same internal counter independently
are risky.
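To make the required bookkeeping concrete, here is a sketch of the state a single-process UUIDv7 generator has to track. It is an illustration, not a recommendation to write your own: `SketchV7` and its injectable `clock_ns` parameter are hypothetical, it uses the 12 bits after the version as a counter, and it does nothing about multiple processes. Where a library `uuid7()` is available, prefer it.

```python
import os
import threading
import time
import uuid


class SketchV7:
    """Illustrative UUIDv7 generator with a 12-bit counter (single process only)."""

    def __init__(self, clock_ns=time.time_ns) -> None:
        self._clock_ns = clock_ns
        self._lock = threading.Lock()
        self._last_ms = -1
        self._counter = 0

    @staticmethod
    def _fresh_counter() -> int:
        # Random start with the top bit clear, leaving headroom to increment.
        return int.from_bytes(os.urandom(2), "big") & 0x7FF

    def next(self) -> uuid.UUID:
        with self._lock:
            now_ms = self._clock_ns() // 1_000_000
            if now_ms > self._last_ms:
                self._last_ms = now_ms
                self._counter = self._fresh_counter()
            else:
                # Same millisecond, or the clock went backwards:
                # keep the last timestamp and advance the counter.
                self._counter += 1
                if self._counter > 0xFFF:
                    # Counter rollover: move the timestamp ahead instead of wrapping.
                    self._last_ms += 1
                    self._counter = self._fresh_counter()
            rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
            value = (
                ((self._last_ms & ((1 << 48) - 1)) << 80)  # unix_ts_ms
                | (0x7 << 76)                              # version
                | (self._counter << 64)                    # counter in rand_a
                | (0b10 << 62)                             # variant
                | rand_b
            )
            return uuid.UUID(int=value)


# A frozen clock simulates a burst inside one millisecond.
frozen = SketchV7(clock_ns=lambda: 1_700_000_000_000_000_000)
burst = [frozen.next() for _ in range(10_000)]
print(len(set(burst)) == len(burst), burst == sorted(burst))
```

Remove the `else` branch and the same burst silently relies on 62 random bits alone and loses its ordering, which is the failure mode the three bullets above describe.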
6.3 Treating UUIDv8 lightly, as if it were just “the new UUID spec”
UUIDv8 looks convenient, but RFC 9562 is quite clear: the uniqueness of UUIDv8 is implementation-specific and must not be assumed.[3]
So a “company-proprietary UUID” that
- embeds a timestamp
- embeds a shard ID
- embeds some business meaning
- and fills the rest with whatever randomness
means that design document is itself your UUID uniqueness specification. It is far too dangerous to introduce without review.
7. Pattern 5: Shortening the UUID Along the Way
Even if generation is done correctly, things can be broken at the storage or comparison stage.
Typical examples:
- Using only the first 8 characters as a stand-in for a foreign key
- Squashing a 128-bit UUID into a 64-bit integer
- A string column too short, so the tail gets cut off
- Treating the shortened representation used in logs or on screen as the unique key
What matters here is that changing the representation is not inherently bad.
- Removing hyphens
- Normalizing to lower / upper case
- Storing as 16 binary bytes
Transformations like these, which do not drop any of the 128 bits, are fine. What is dangerous is a transformation that shaves away the very material of uniqueness.
In particular, a design where a separate “human-friendly short ID” was created and then quietly started taking precedence over the real UUID is prone to incidents.
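The difference between the two kinds of transformation can be checked directly. The sketch below first round-trips a UUID through the lossless forms, then uses the birthday bound to estimate how quickly a shortened key stops being unique; `ids_until_even_odds` is a hypothetical helper and assumes the retained bits are uniformly random, which is the best case (the leading bits of a UUIDv7 are a timestamp and fare worse).

```python
import math
import uuid

u = uuid.uuid4()

# Lossless: all 128 bits survive each round trip.
assert uuid.UUID(u.hex) == u            # hyphens removed
assert uuid.UUID(str(u).upper()) == u   # case changed
assert uuid.UUID(bytes=u.bytes) == u    # 16 raw bytes
assert len(u.bytes) == 16


def ids_until_even_odds(bits: int) -> float:
    """Approximate number of IDs at which a collision becomes 50% likely
    when only `bits` uniformly random bits are kept (birthday bound)."""
    return math.sqrt(2 * math.log(2) * 2 ** bits)


print(round(ids_until_even_odds(32)))   # first 8 hex characters
print(round(ids_until_even_odds(64)))   # at most 64 bits kept
print(round(ids_until_even_odds(122)))  # full UUIDv4
```

With only the first 8 hex characters, even odds of a collision arrive at roughly 77,000 IDs, a volume many tables reach within days.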
8. Pattern 6: No Uniqueness Constraint on the DB Side
And this one is especially important.
Even if UUIDs are sufficiently collision-resistant, if you truly cannot tolerate duplicates, the storage destination should also carry a uniqueness constraint.
The official PostgreSQL documentation explains that a unique constraint guarantees that the value of a column or group of columns is unique across the whole table, and that a primary key is a row identifier that is unique and not null.[6]
RFC 9562 also says that while UUIDs can provide sufficient uniqueness in practice, true global uniqueness can never be absolutely guaranteed, and that for uses where the impact of a collision is high, stronger countermeasures should be taken.[14]
In practice, this combination is the baseline.
- Use UUIDs as IDs that are unlikely to collide
- Keep UNIQUE / PRIMARY KEY in the DB as the last line of defense
- Design retry / idempotency / incident logging for the duplicate case
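The combination can be sketched end to end as follows. To keep the example self-contained it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL; the `orders` table, the `insert_order` helper, and the retry limit are all assumptions for illustration.

```python
import logging
import sqlite3
import uuid

log = logging.getLogger("uuid_audit")


def create_schema(conn: sqlite3.Connection) -> None:
    # The primary key is the last line of defense against duplicates.
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, payload TEXT)")


def insert_order(conn, payload, new_id=uuid.uuid4, max_attempts=3) -> str:
    """Insert a row, retrying with a fresh ID if the key already exists."""
    for attempt in range(1, max_attempts + 1):
        candidate = str(new_id())
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO orders (id, payload) VALUES (?, ?)",
                    (candidate, payload),
                )
            return candidate
        except sqlite3.IntegrityError:
            # Do not swallow this: record enough to trace the generator.
            log.warning(
                "duplicate id %s on attempt %d (generator=%r)",
                candidate, attempt, new_id,
            )
    raise RuntimeError(f"no unique id after {max_attempts} attempts")


conn = sqlite3.connect(":memory:")
create_schema(conn)

# Force a duplicate with a generator that repeats itself once.
fixed = uuid.uuid4()
sequence = iter([fixed, fixed, uuid.uuid4()])
first = insert_order(conn, "a", new_id=lambda: next(sequence))
second = insert_order(conn, "b", new_id=lambda: next(sequence))
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(first != second, row_count)
```

The retry keeps the request alive, but the warning is the part that matters operationally: a healthy generator should essentially never trigger it, so a single occurrence is a reason to investigate.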
Using UUIDs is not a reason to omit uniqueness constraints; the two are separate decisions.
9. A Practical Checklist
Finally, here is a form you can use directly for adoption or audits.
- Check whether you are generating UUIDs yourself. If you can move to standard APIs like `uuid4()` / `uuid7()`, do that first.
- Decide the UUID version as part of the specification. State explicitly that v4 is random-based, v7 is time-ordered with random or counter bits, v3/v5 are deterministic, and v8 is a custom specification.
- Inventory how seeds and generator state are handled. Make sure the same state is not carried over after fork, worker restart, snapshot, or clone.
- Confirm full length is preserved at storage time. Do not use prefix comparison or shortened displays as the actual key.
- Place UNIQUE / PRIMARY KEY in the DB. A UUID is a mechanism that lowers the probability; it is not a constraint itself.
- Make duplicates observable. Do not swallow duplicate key errors; make it traceable which generator / node / deployment produced them.
10. Summary
UUID collision incidents usually start not because the UUID is weak, but because the implementation or operations break the assumptions the UUID relies on.
- Hand-rolling with weak randomness
- Rewinding state after fork or snapshot
- Using name-based UUIDs for fresh ID assignment
- Casually hand-implementing v7 or v8
- Truncating along the way and discarding uniqueness
- Removing the uniqueness constraint on the DB side
Do any of these, and it is almost the same as actively constructing a situation where collisions become likely.
When you find a duplicate, what you should suspect first is not the math of UUIDs but the generator, state management, storage format, and constraint design. Look in that order, and the cause usually narrows down quickly.
11. Related Articles
- How to Use FileSystemWatcher Safely - Pitfalls of Missed Events, Duplicate Notifications, and Completion Detection
- Fundamentals of Exclusive Control in File Integration - Best Practices for File Locking and Atomic Claims
12. References
- [1] IETF RFC 9562, Section 5.4 UUID Version 4. On the 122-bit random space of UUIDv4.
- [2] IETF RFC 9562, Section 5.7 UUID Version 7. On the design of UUIDv7’s timestamp, random bits, and counters.
- [3] IETF RFC 9562, Section 5.8 UUID Version 8. On UUIDv8 uniqueness being implementation-specific and not to be assumed.
- [4] Python 3.14 documentation, `uuid` module. On `uuid4()`’s cryptographically-secure generation, `uuid5()`’s deterministic behavior, and the properties of `uuid7()` / `uuid8()`.
- [5] IETF RFC 9562, Universally Unique IDentifiers (UUIDs). The baseline document for the UUID format, each version, and best practices overall.
- [6] PostgreSQL documentation, Constraints. On guaranteeing uniqueness via UNIQUE constraints and PRIMARY KEY.
- [7] IETF RFC 9562, Section 6.5 Name-Based UUID Generation. On the same namespace + same name yielding the same UUID, and the importance of canonicalization.
- [8] IETF RFC 9562, Section 6.9 Unguessability. On CSPRNG use and re-seeding after fork.
- [9] IETF RFC 9562, Section 6.3 UUID Generator States. On handling stable storage and generator state.
- [10] IETF RFC 9562, Section 5.5 UUID Version 5. On the specification of name-based UUIDs built from a namespace plus a canonical name.
- [11] IETF RFC 9562, Section 5.6 UUID Version 6. On UUIDv6’s node / clock sequence / DB locality.
- [12] IETF RFC 9562, Section 6.4 Distributed UUID Generation. On node collision resistance in distributed environments.
- [13] IETF RFC 9562, Section 6.2 Monotonicity and Counters. On cautions around clock rollback, counter rollover, and batch generation.
- [14] IETF RFC 9562, Sections 6.7 and 6.8. On the thinking behind collision resistance and global uniqueness.
Author Profile
Go Komura
Representative of KomuraSoft LLC
Focused on Windows software development, technical consulting, and investigations into failures that are difficult to reproduce.