You are designing a table and the ID column has to be something. You open the docs and see UUID, nanoid, ULID, SHA-256, auto-increment, snowflake, random string. They all produce "a unique-ish string", they all look vaguely the same in a column, and most posts on the internet will just tell you "use UUID" without saying which kind or why.
This is the decision framework I wish I had when I first picked an ID strategy. It covers the three real categories — deterministic IDs, random IDs, sequential IDs — how UUID, nanoid, and hashes fit into each, the collision math that matters in practice, and the specific anti-patterns (MD5 for security, UUID v1 in public data, SHA-1 in 2026) that you want to avoid.
If you only remember one rule: a primary key is a random or sequential ID, never a hash. Hashes are for content-addressed lookups — the same input must always give the same ID. Once that distinction clicks, most of the other choices get easy.
The three categories of identifier
Every ID you will ever generate falls into one of three buckets. The category decides what the ID can do for you.
1. Random IDs
Drawn from a cryptographically random source, with enough bits that collisions are astronomically unlikely. Examples: UUID v4 (122 bits of randomness), nanoid (126 bits at the default length), a random string of 24 alphanumeric characters. Random IDs have no meaning, carry no information, and cannot be regenerated: if you lose one, it is gone.
Use cases: database primary keys, API tokens, session IDs, one-off links, anything that needs to be unguessable and unique.
2. Sequential or time-based IDs
Not purely random — they encode a timestamp or a monotonically increasing counter, often with a random tail. Examples: UUID v7 (48-bit timestamp + random), ULID, Twitter snowflake, SERIAL/AUTOINCREMENT in SQL, nanoid with a timestamp prefix. The key property is that later IDs sort after earlier ones, which is great for database index performance and chronological ordering.
Use cases: high-write databases, event logs, anywhere you care about insertion order or want fast range queries by time.
3. Deterministic IDs (hashes)
Derived from content — the same input always produces the same ID. Examples: SHA-256 of a file's contents, MD5 of a URL, a cache key built from function arguments. They are not "generated"; they are computed. And because the same input gives the same output, hashes are the opposite of random IDs — you cannot use them as primary keys for rows that contain the same content twice, because both rows would get the same ID.
Use cases: content-addressed storage (git commits, IPFS), cache keys, deduplication keys, file integrity checks, ETags.
UUID in detail
UUID is the default answer for "give me a unique ID" in most ecosystems. It is a 128-bit value, usually written as 32 hexadecimal digits in 5 dash-separated groups. There are several versions, and the version you pick matters more than people realise.
UUID v4 — pure random
The most common variant. Each v4 has 6 bits reserved for version and variant markers, leaving 122 bits of pure randomness. A typical one looks like 9b2a3c4d-5e6f-4a7b-8c9d-0e1f2a3b4c5d — the 4 at the start of the third group marks it as v4.
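As a quick check of the layout described above, the sketch below uses Python's standard `uuid` module to generate a v4 UUID and look at where the version and variant markers sit. The variable names are illustrative only.

```python
import uuid

# Generate a random (version 4) UUID.
new_id = uuid.uuid4()
text = str(new_id)

# Canonical form: 32 hex digits in 5 dash-separated groups (8-4-4-4-12).
groups = text.split("-")
group_lengths = [len(g) for g in groups]

# The first hex digit of the third group carries the version.
version_digit = groups[2][0]

# The first hex digit of the fourth group carries the variant (8, 9, a or b).
variant_digit = groups[3][0]

print(text, new_id.version, version_digit, variant_digit)
```

The version nibble (4 bits) and the variant (2 bits) are the 6 reserved bits; everything else is random.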
The collision math is reassuring. With 2^122 possible values, the birthday bound puts a 50% chance of a single collision at roughly 2^61 (2.3 quintillion) generated UUIDs; the more careful figure is about 1.18 × 2^61, or 2.7 quintillion. For context, if you generated a billion UUIDs per second, it would take about 73 years just to produce 2^61 of them. For every practical application, UUID v4 collisions do not happen.
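The arithmetic can be reproduced with the standard birthday approximation, n ≈ sqrt(2 · N · ln(1 / (1 − p))), where N is the size of the ID space. The function name below is made up for this sketch. Note that the 2^61 figure is the plain square-root shortcut; the fuller formula adds a factor of about 1.18, which pushes the time estimate at a billion IDs per second from roughly 73 years to the mid-80s.

```python
import math

def ids_for_collision_probability(bits: int, probability: float = 0.5) -> float:
    """Approximate count of random IDs needed to reach a given collision probability."""
    space = 2.0 ** bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - probability)))

# UUID v4 has 122 random bits.
n_half = ids_for_collision_probability(122)

# Time to generate that many IDs at one billion per second.
seconds = n_half / 1e9
years = seconds / (365.25 * 24 * 3600)

print(f"{n_half:.2e} IDs for a 50% collision chance, about {years:.0f} years")
```

Either way the conclusion is the same: the threshold is far beyond any realistic workload.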
To see the format for yourself, generate a few with the UUID Generator.
UUID v1 — timestamp plus MAC address (avoid)
v1 uses the current time and the network card's MAC address to build a unique ID. That sounds clever, but it has a privacy problem: anyone who sees a v1 UUID in your data can recover the MAC address of the machine that generated it and the exact moment it was generated. Do not ship v1 UUIDs in anything public — they leak infrastructure details.
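To make the leak concrete, here is a small sketch that generates a v1 UUID with Python's `uuid` module and then reads the node and timestamp straight back out of it. `FAKE_MAC` is a hypothetical address used so the example does not expose a real one; by default the module would try to use the machine's actual hardware address.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Hypothetical MAC address, fixed so the example does not reveal a real one.
FAKE_MAC = 0x001A2B3C4D5E

leaky = uuid.uuid1(node=FAKE_MAC)

# The node field is the low 48 bits of the UUID: format it as a MAC address.
recovered_mac = ":".join(
    f"{(leaky.node >> shift) & 0xFF:02x}" for shift in range(40, -1, -8)
)

# The timestamp counts 100-nanosecond intervals since 1582-10-15 (Gregorian epoch).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)
recovered_time = GREGORIAN_EPOCH + timedelta(microseconds=leaky.time // 10)

print(leaky)
print("node:", recovered_mac)
print("generated at:", recovered_time.isoformat())
```

No key or secret is needed for this; anyone holding the UUID string can do the same.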
UUID v7 — time-ordered random (the new default)
v7, standardised in RFC 9562 (2024), fixes the one real downside of v4 for databases. A v4 UUID is random throughout, which means newly inserted rows scatter across the whole B-tree index, causing page splits and cache churn on high-write tables. v7 puts a millisecond timestamp in the first 48 bits, so new IDs sort after older ones (at millisecond granularity, and assuming reasonably synchronised clocks) and inserts land at the right-hand edge of the index instead of all over it.
UUID v4: c1a8f3d2-6b7e-4d9f-8c01-2e3f4a5b6c7d
UUID v7: 01934e5a-7c2d-7b8f-8a9c-1d2e3f4a5b6c
         ^^^^^^^^^^^^^ timestamp (ms since epoch)
If you are picking a primary key type in 2026 for a new table, v7 is the answer unless you have a specific reason to prefer v4. You trade some randomness for the timestamp (74 random bits instead of 122), which is still enough to prevent collisions and guessing in practice, and you get index-friendly insertion order for free.
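The bit layout is simple enough to assemble by hand. The following is a minimal sketch, not a reference implementation: `uuid7_sketch` is a hypothetical helper, it fills all 74 non-reserved bits with fresh randomness, and it makes no attempt to keep IDs monotonic within the same millisecond. Some runtimes and libraries ship their own v7 generator, which is the better choice in production.

```python
import os
import time
import uuid

def uuid7_sketch(unix_ms=None):
    """Build a version 7 UUID: 48-bit ms timestamp, then version, variant, random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000

    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits, 74 are used
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80      # bits 127..80: timestamp
    value |= 0x7 << 76                             # bits 79..76: version 7
    value |= rand_a << 64                          # bits 75..64: random
    value |= 0b10 << 62                            # bits 63..62: variant
    value |= rand_b                                # bits 61..0: random
    return uuid.UUID(int=value)

earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
print(earlier, later, str(earlier) < str(later))
```

Because the timestamp occupies the most significant bits, plain string or byte comparison orders IDs by creation time, which is what keeps the index append-mostly.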
Random strings: nanoid and friends
UUIDs are 36 characters with dashes, which is ugly in URLs. When you need something shorter — a short link, a public invite code, a user-facing ID — a custom random string is usually better.
nanoid
Nanoid is the modern default. The library generates strings from a URL-safe 64-character alphabet (A-Z, a-z, 0-9, _, -). The default 21-character nanoid has 126 bits of entropy — more than UUID v4 — in 15 fewer characters and with no dashes. That is why short-link services, invite codes, and modern framework session IDs increasingly use nanoid-style strings.
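The idea is easy to approximate with the standard library. This is a nanoid-style sketch rather than the nanoid library's own algorithm: `random_id` and `URL_SAFE_ALPHABET` are names invented here, and the alphabet order is arbitrary.

```python
import secrets

# A 64-character URL-safe alphabet: A-Z, a-z, 0-9, underscore and hyphen.
URL_SAFE_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789_-"
)

def random_id(length: int = 21, alphabet: str = URL_SAFE_ALPHABET) -> str:
    """Return a random string drawn uniformly from `alphabet` using a CSPRNG."""
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(random_id())    # 21 characters, 6 bits each
print(random_id(12))  # shorter ID for public links
```

The important detail is `secrets` rather than `random`: the latter is not suitable for anything that has to be unguessable.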
Length vs entropy trade-off
Length directly controls the chance of collision. The rule of thumb: for a 62-character alphanumeric alphabet, each character adds about 6 bits. For a 36-character lowercase-plus-digits alphabet, each character adds about 5 bits. Some reference points:
- 8 chars (alphanumeric, ~48 bits): fine for short-lived things with low volume, bad for database keys.
- 12 chars (alphanumeric, ~71 bits): safe for moderate-volume public IDs like shortlinks.
- 21 chars (nanoid's 64-character alphabet, 126 bits): the nanoid default. Safe for basically everything.
- 32 chars (alphanumeric, ~190 bits): overkill but common for API tokens.
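These reference points come from one formula: entropy in bits is the length multiplied by log2 of the alphabet size. A small sketch (the function name is illustrative) reproduces them, using 62 symbols for the alphanumeric rows and 64 for the nanoid default:

```python
import math

def entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of a uniformly random string: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)

for length, size in [(8, 62), (12, 62), (21, 64), (32, 62)]:
    print(f"{length} chars over {size} symbols: {entropy_bits(length, size):.1f} bits")
```

Running the same function with your own alphabet size is the quickest way to see what a custom alphabet costs you.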
The Random String Generator lets you pick an alphabet and length and generate bulk strings for testing or seeding. For URL slugs specifically, the Slug Generator is better — it produces readable, lowercase, hyphenated output suitable for SEO.
Custom alphabets
Removing ambiguous characters (0/O, 1/l/I) is worth it when humans have to type the ID — invite codes, license keys, auth codes. Crockford's Base32 alphabet (0123456789ABCDEFGHJKMNPQRSTVWXYZ) is a good template. Never remove characters from an alphabet you did not author without recalculating the entropy — shrinking the alphabet shrinks the ID space.
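A sketch of what this looks like for a human-typed code, assuming a 10-character code shown in two groups of five. `make_code` and `normalize_code` are hypothetical helpers; the normalisation step folds the look-alike letters back onto the digits they resemble, in the spirit of Crockford's scheme.

```python
import math
import secrets

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # 32 symbols: no I, L, O or U

def make_code(length: int = 10, group: int = 5) -> str:
    """Generate a random code and split it into dash-separated groups for display."""
    raw = "".join(secrets.choice(CROCKFORD) for _ in range(length))
    return "-".join(raw[i:i + group] for i in range(0, length, group))

def normalize_code(typed: str) -> str:
    """Undo common typing variations: case, dashes, spaces, look-alike letters."""
    cleaned = typed.upper().replace("-", "").replace(" ", "")
    return cleaned.translate(str.maketrans({"O": "0", "I": "1", "L": "1"}))

code = make_code()
bits = 10 * math.log2(len(CROCKFORD))  # 5 bits per character

print(code, f"({bits:.0f} bits)")
print(normalize_code("o1il-ab"))
```

At 5 bits per character, a 10-character code carries 50 bits, noticeably less than the same length over 62 symbols, so pair short typed codes with rate limiting or expiry.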
Hashes: deterministic IDs
A hash function maps any input to a fixed-length output, always producing the same output for the same input. That property is exactly what makes hashes useless as random IDs (two identical records would share an ID) and exactly what makes them perfect for content addressing.
SHA-256: the default in 2026
SHA-256 produces a 256-bit (64 hex character) fingerprint. It is collision-resistant — finding two inputs that hash to the same output is computationally infeasible with today's hardware. It is not encryption and it cannot be reversed; given a SHA-256 you cannot recover the input except by brute-forcing candidate inputs and checking each.
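A minimal sketch of a SHA-256 fingerprint using Python's `hashlib`, reading the input in chunks so large files do not need to fit in memory. `sha256_hex` is a name chosen for this example, and `io.BytesIO` stands in for a real file opened in binary mode.

```python
import hashlib
import io

def sha256_hex(stream, chunk_size: int = 65536) -> str:
    """Hash a binary stream incrementally and return the 64-character hex digest."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

first = sha256_hex(io.BytesIO(b"hello world"))
second = sha256_hex(io.BytesIO(b"hello world"))
changed = sha256_hex(io.BytesIO(b"hello world!"))

print(first == second)   # same input, same fingerprint
print(first == changed)  # one extra byte, unrelated fingerprint
```

With a real file the call would look like `sha256_hex(open(path, "rb"))`, ideally inside a `with` block.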
Generate hashes with the Hash Generator for a file fingerprint, a cache key, or a deduplication marker. For anything security-related — signing, integrity proofs, content IDs — SHA-256 is the minimum in 2026.
Do not use MD5 or SHA-1 for security
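MD5 and SHA-1 are both broken for collision resistance. MD5 collisions can be computed in seconds on a laptop. A SHA-1 collision was demonstrated publicly in 2017 (the SHAttered attack, by Google and CWI Amsterdam), and such attacks are well within reach of any motivated attacker today. If you see either in a security context (signatures, integrity checks on anything an attacker controls), replace it with SHA-256 or SHA-3. If you see either used for password hashing, replace it with bcrypt, scrypt, or argon2 rather than with a faster general-purpose hash.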
MD5 is fine for non-adversarial uses where you only need a short stable hash: cache keys, database partition keys, content-addressed deduplication where the inputs are trusted. But the moment an attacker could craft an input, MD5 is not safe.
When a hash is the right ID
- Content-addressed storage (git commits, Docker image digests, IPFS CIDs).
- Deduplication keys — if two rows have the same content, you want the same ID for both.
- Cache keys derived from inputs.
- ETags and HTTP conditional requests.
- File integrity fingerprints.
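Two of the cases above can be sketched in a few lines: a toy content-addressed store that deduplicates identical uploads, and a cache key derived from function arguments. `ContentStore` and `cache_key` are hypothetical names, and the store is an in-memory dict purely for illustration.

```python
import hashlib
import json

class ContentStore:
    """Toy in-memory content-addressed store: the key is the SHA-256 of the bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)  # identical content is stored once
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)

def cache_key(func_name: str, **kwargs) -> str:
    """Stable key from a function name and JSON-serialisable keyword arguments."""
    payload = json.dumps(
        {"fn": func_name, "args": kwargs}, sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

store = ContentStore()
key_a = store.put(b"same bytes")
key_b = store.put(b"same bytes")
print(key_a == key_b, len(store))
print(cache_key("fetch", user=1, page=2) == cache_key("fetch", page=2, user=1))
```

Sorting the keys before hashing matters: without it, the same arguments in a different order would produce a different cache key.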
Format comparison at a glance
Here are the same three categories side by side, in the formats you will actually see in code:
UUID v4      9b2a3c4d-5e6f-4a7b-8c9d-0e1f2a3b4c5d    (36 chars, 122 random bits)
UUID v7      01934e5a-7c2d-7b8f-8a9c-1d2e3f4a5b6c    (36 chars, 74 random bits + 48-bit timestamp)
nanoid 21    V1StGXR8_Z5jdHi6B-myT                   (21 chars, 126 bits)
nanoid 12    aB3dEf9hI2kL                            (12 chars, 72 bits)
SHA-256 hex  e3b0c44298fc1c149afbf4c8996fb92427ae... (64 chars, deterministic)
Decision table: if you need X, use Y
Skip the theory for a moment. Here is the mapping for the most common cases.
- Database primary key on a high-write table → UUID v7 (or snowflake/ULID if your stack supports them natively).
- Database primary key on a low-write table → UUID v4 is fine; v7 is nicer.
- Public-facing short ID (shortlinks, invite codes, public resource IDs) → nanoid 10-12 characters.
- API access token → 32+ character random string, or a signed JWT if you need claims.
- Session ID → framework default (usually 128+ bits random).
- Deduplication of file uploads → SHA-256 of the file bytes.
- Cache key → SHA-256 (or MD5 if collision risk is acceptable) of the stringified inputs.
- URL slug → Slug Generator from the title, not a random ID.
- Password storage → NEVER use any of these. Use bcrypt, scrypt, or argon2.
- User-typed code (invite, 2FA backup) → custom alphabet without ambiguous characters, 8-12 chars.
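Two rows of this table are worth seeing in code because they are the ones most often confused: tokens are random, passwords go through a slow key-derivation function. The sketch below uses only the standard library; the helper names are invented, the scrypt parameters are illustrative rather than a tuned recommendation, and `hashlib.scrypt` is only available when Python is built against a suitable OpenSSL.

```python
import hashlib
import hmac
import secrets

def new_api_token() -> str:
    """32 random bytes, URL-safe base64 encoded (43 characters)."""
    return secrets.token_urlsafe(32)

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Derive a password hash with scrypt and a fresh random salt."""
    salt = secrets.token_bytes(16)
    derived = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )
    return salt, derived

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    derived = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )
    return hmac.compare_digest(derived, expected)

token = new_api_token()
salt, stored = hash_password("correct horse battery staple")
print(len(token), verify_password("correct horse battery staple", salt, stored))
```

Store the salt alongside the derived hash; neither a UUID nor a bare SHA-256 has any place in this flow.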
Anti-patterns and how to avoid them
Four specific mistakes worth naming, because they all pass code review and break later.
MD5 for security
MD5 is broken for collision resistance and should never appear in a security context. If your file signature or API signature uses MD5, migrate to SHA-256 before the next audit; if your login flow hashes passwords with MD5, migrate to bcrypt, scrypt, or argon2 instead. The Hash Generator supports SHA-256 and SHA-512 side by side for quick comparison.
Auto-increment IDs in distributed systems
A single SERIAL counter in Postgres is fine when one database writes all rows. The moment you shard, replicate, or add a second region that also writes, auto-increment becomes a coordination problem — every insert either blocks on a single node or risks reusing an ID. This is exactly why UUID v7, ULID, and snowflake exist. If your architecture is distributed, do not build it on AUTO_INCREMENT.
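To show how these schemes avoid coordination, here is a snowflake-style sketch: each writer gets its own worker ID, and an ID is the timestamp, worker ID, and a per-millisecond sequence packed into one integer. The 10-bit worker and 12-bit sequence widths follow the layout commonly described for snowflake IDs; the class name, the custom epoch, and the injectable clock are assumptions made for this example.

```python
import threading
import time

class SnowflakeSketch:
    CUSTOM_EPOCH_MS = 1_700_000_000_000  # arbitrary epoch chosen for this example
    WORKER_BITS = 10
    SEQUENCE_BITS = 12

    def __init__(self, worker_id: int, clock=None):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the sequence, wrapping at 4096.
                self._sequence = (self._sequence + 1) & ((1 << self.SEQUENCE_BITS) - 1)
                if self._sequence == 0:
                    # Sequence exhausted: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - self.CUSTOM_EPOCH_MS
            return (
                (timestamp << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self.worker_id << self.SEQUENCE_BITS)
                | self._sequence
            )

generator = SnowflakeSketch(worker_id=3)
print(generator.next_id(), generator.next_id())
```

Two nodes with different worker IDs can never produce the same value, so no insert has to wait on a shared counter. The cost is that worker IDs must be assigned uniquely and clocks must not run backwards.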
UUID v1 in public data
v1 embeds the generating machine's MAC address and the exact generation time. Posting v1 UUIDs in URLs, public APIs, or log exports leaks infrastructure fingerprints that attackers can use for reconnaissance. Switch to v4 or v7 for any identifier a non-trusted party will see.
SHA-1 in 2026
SHA-1 has been considered insecure since 2005 and was demonstrably broken in 2017 (the SHAttered attack). It persists in old git internals and some legacy protocols, but you should never choose it for new code. If you see SHA-1 in a security-relevant code path during refactoring, upgrade it. SHA-256 is available in every modern library and is usually a one-line change in code, though the digest grows from 20 to 32 bytes, so check any stored columns, wire formats, or fixed-length fields that hold it.
Tool kit
The tools below cover every generator you need for IDs, tokens, and content-addressed hashes. Bookmark the ones you reach for most.
ID and hash generators
For quick throwaway test data, the Random String Generator in bulk mode gives you 100 or 1000 IDs in one click. For secrets that have to survive brute-force, the Password Generator uses cryptographic randomness and a larger alphabet than nanoid.
Summary
Pick the category first, then the algorithm.
- Random IDs (UUID v4, nanoid, random string) for things that need to be unique and unguessable — primary keys, tokens, session IDs.
- Time-ordered IDs (UUID v7, ULID, snowflake) for high-write databases where index locality matters.
- Deterministic hashes (SHA-256) for content-addressed keys — deduplication, cache keys, file fingerprints.
- Never any of them for passwords. Use bcrypt, scrypt, or argon2.
If you remember nothing else: UUID v7 for new primary keys, nanoid for public short IDs, SHA-256 for content hashing, and never MD5 or SHA-1 in a security context. Those four rules cover 90% of the ID decisions you will make in a career.
Frequently Asked Questions
What is the difference between a UUID and a random string?
A UUID is a specific 128-bit format with standardised versions and dashes (like `xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx`). A random string is any sequence of characters from an alphabet you choose, with whatever length you pick. Both can be cryptographically random; UUIDs are just a fixed format so every language and database understands them out of the box.
Should I use UUID v4 or v7 for my database primary keys?
Prefer v7 in 2026. v7 puts a millisecond timestamp in the first 48 bits so new IDs sort after old ones, which keeps your B-tree indexes efficient and makes time-range queries easier. v4 is still fine on low-write tables, but v7 is the better fit for high-write workloads. RFC 9562 itself recommends v7 over the older time-based versions (v1 and v6) where possible.
Are UUID v4 collisions possible?
In theory yes, in practice effectively no. With 122 bits of randomness you would need to generate about 2^61 UUIDs to hit a 50% chance of a single collision. At a billion UUIDs per second that takes about 73 years. For any realistic application, UUID v4 collisions should never be part of your threat model.
Can I use a hash as a primary key?
Only if you want identical rows to share the same ID, which is what content-addressed systems like git or Docker image registries rely on. For normal application data where two rows can legitimately have the same attributes but are still different entities, a hash is the wrong choice — use a UUID or random string instead.
Is MD5 still safe to use?
Not for security. MD5 collisions can be generated in seconds on a laptop, which breaks signatures and integrity checks, and MD5 is far too fast to protect passwords. MD5 is acceptable for non-adversarial uses like cache keys and partition hashing where the inputs are trusted. For anything an attacker could control, use SHA-256 at minimum, and a dedicated password hash (bcrypt, scrypt, or argon2) for passwords.
Why should I never use UUID v1 in public data?
UUID v1 encodes the machine's MAC address and the exact generation timestamp in the UUID. That means anyone who sees the UUID can recover which machine generated it and when. For public-facing IDs — URLs, API responses, log exports — use v4 or v7 instead to avoid leaking infrastructure details.
How long should a nanoid be?
The default 21 characters (126 bits of entropy) is safe for basically any use case. For short public IDs like shortlinks, 10-12 characters (60-72 bits) is enough for moderate volumes, provided the column has a unique constraint and you retry on the rare collision: at 10 characters a collision becomes likely somewhere around a billion IDs, while 12 characters leaves far more headroom. For API tokens or anything adversarial, lean toward 24+ characters to stay well ahead of brute-force budgets.
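A rough way to sanity-check a length against an expected volume is the birthday approximation, p ≈ 1 − exp(−n(n − 1) / 2N). The sketch below assumes 6 bits per character (a 64-symbol alphabet); the function name is illustrative.

```python
import math

def collision_probability(count: int, bits: float) -> float:
    """Approximate chance of at least one collision among `count` random IDs."""
    space = 2.0 ** bits
    return -math.expm1(-count * (count - 1) / (2.0 * space))

for chars in (10, 12, 21):
    bits = chars * 6
    for count in (1_000_000, 1_000_000_000):
        p = collision_probability(count, bits)
        print(f"{chars} chars, {count:>13,} IDs: p = {p:.3g}")
```

At 10 characters the probability is negligible for a million IDs but climbs to roughly a third by a billion, which is why short IDs should always sit behind a unique constraint with a retry.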