Databases
Learn about relational databases, ACID properties, transactions, and how databases ensure data consistency and reliability in system design
What is a Relational Database?
A relational database stores data in a structured way using tables with rows and columns.
Think of it like a spreadsheet where:
- Each row is one record (like one user)
- Each column is one property (like name, email, age)
Example:
Imagine a User table:
| ID | Name | Age | |
|---|---|---|---|
| 1 | Rajesh | rajesh@email.com | 25 |
| 2 | Priya | priya@email.com | 30 |
| 3 | Amit | amit@email.com | 28 |
Each row is one user, and each column is one piece of information about that user.
Why Use Relational Databases?
Relational databases follow ACID properties to make sure your data is:
- Safe — no data gets lost
- Correct — data stays consistent
- Reliable — works even if something fails
ACID Properties
ACID stands for four important rules that databases follow:
- A — Atomicity
- C — Consistency
- I — Isolation
- D — Durability
| Property | Meaning | Example |
|---|---|---|
| Atomicity | All or nothing | If one step fails, everything is undone |
| Consistency | Data follows rules | Age must be 18-100, or it is rejected |
| Isolation | Transactions don't interfere | Two users buying same item — only one wins |
| Durability | Data survives crashes | Power goes out, but data is still saved |
A — Atomicity
Atomicity means: All or nothing.
A transaction (a group of operations) must either complete fully or not happen at all. There is no middle ground.
Example Without Atomicity (Bad)
Imagine you are creating a user profile and updating a user count:
Step 1: Create Profile
└─> Success ✓
Step 2: Increase User Count
└─> FAILS ✗
Result: Profile created but count not updated
→ Data is inconsistent!Problem: The profile exists, but the user count is wrong. This creates inconsistent data.
Example With Atomicity (Good)
When you wrap both operations in a transaction:
Transaction Start
│
├─> Step 1: Create Profile
│ └─> Success ✓
│
└─> Step 2: Increase User Count
└─> FAILS ✗
│
Transaction Rollback (Undo everything)
└─> Profile creation is undone
└─> Count stays the same
Result: Nothing changed → Data stays consistent ✓Solution: If any step fails, everything is undone. Your data stays consistent.
C — Consistency
Consistency means: Data follows the rules.
The database will never allow data that breaks the rules you set.
Example
Rule: A user's age must be between 18 and 100.
Good Data:
┌──────┬──────┐
│ Name │ Age │
├──────┼──────┤
│ John │ 25 │ ✓ (Follows rule)
└──────┴──────┘
Bad Data:
┌──────┬──────┐
│ Name │ Age │
├──────┼──────┤
│ John │ 15 │ ✗ (Breaks rule - too young)
└──────┴──────┘
Database says: "No! Age must be 18-100"
→ Transaction rejectedResult: The database rejects invalid data, keeping everything consistent.
I — Isolation
Isolation means: Transactions don't interfere with each other.
Even if many users are doing things at the same time, each transaction works as if it is the only one running.
Example
Scenario: Two users trying to buy the last item in stock.
Without Isolation (Bad):
| Time | User A | User B |
|---|---|---|
| T1 | Reads stock = 1 | |
| T2 | Reads stock = 1 | |
| T3 | Purchases → stock = 0 | |
| T4 | Purchases → stock = 0 (over-sell!) |
Result: Both users think they bought the item → Stock oversold
With Isolation (Good):
| Time | User A | User B |
|---|---|---|
| T1 | Reads stock = 1 | |
| T2 | Locks row + purchase | |
| T3 | Commit success ✓ | |
| T4 | Reads stock = 0 | |
| T5 | Purchase denied ✘ |
Result: Only one user successfully buys → No overselling
Result: Each transaction sees a consistent view of the data, even when many happen at once.
D — Durability
Durability means: Once saved, data stays saved.
Even if the database crashes or the power goes out, your data is safe.
Example
Step 1: You save data
└─> Database writes to disk ✓
Step 2: Power goes out ✗
└─> Server shuts down
Step 3: Power comes back
└─> Database restarts
Result: Your data is still there! ✓Result: The database saves data to permanent storage (like a hard drive), so it survives crashes.
Database Isolation Levels
These levels control how much data is protected when many transactions (group of operations) run at the same time.
PostgreSQL supports 4 levels:
- Read Uncommitted
- Read Committed (default in PostgreSQL)
- Repeatable Read
- Serializable (strongest)
What these terms mean:
- Dirty Reads: Reading data that another transaction is still changing and might undo later.
- Non-Repeatable Reads: Reading the same data twice in one transaction and getting different values each time.
- Phantom Reads: Seeing new rows appear in your query results that were not there when you started your transaction.
| Level | Dirty Reads | Non-Repeatable Reads | Phantom Reads | Speed |
|---|---|---|---|---|
| Read Uncommitted | ✗ Yes | ✗ Yes | ✗ Yes | Fast |
| Read Committed | ✓ No | ✗ Yes | ✗ Yes | Fast |
| Repeatable Read | ✓ No | ✓ No | ✗ Yes | Medium |
| Serializable | ✓ No | ✓ No | ✓ No | Slow |
Rule: Higher isolation = safer data, but slower performance.
1. Read Uncommitted
Read Uncommitted is the weakest isolation level.
You can read data that another transaction is still changing. This data might not be saved yet.
Example
Two users are working with a bank account balance:
Initial Balance: ₹1000
Transaction A (User A):
├─> Reads balance: ₹1000
├─> Adds ₹500
└─> Balance now: ₹1500 (NOT saved yet)
Transaction B (User B):
├─> Reads balance: ₹1500 (reads uncommitted data!)
└─> Sees ₹1500
Transaction A:
└─> FAILS and rolls back → Balance back to ₹1000
Result: User B saw ₹1500, but real balance is ₹1000
→ This is called a "Dirty Read" ✗Diagram:
| Time | Transaction A | Transaction B | Database |
|---|---|---|---|
| T1 | Read: ₹1000 | ₹1000 | |
| T2 | Update: ₹1500 | ₹1000 | |
| T3 | (not saved) | Read: ₹1500 ✗ | ₹1000 |
| T4 | ROLLBACK | ₹1000 | |
| T5 | Sees wrong data! | ₹1000 |
Problem: You can read data that might be undone later. This causes dirty reads.
2. Read Committed
Read Committed is the default level in PostgreSQL.
You can only read data that has been saved (committed). You cannot read data that is still being changed.
Example
Same bank account scenario:
Initial Balance: ₹1000
Transaction A (User A):
├─> Reads balance: ₹1000
├─> Adds ₹500
└─> Balance now: ₹1500 (NOT saved yet)
Transaction B (User B):
├─> Tries to read balance
└─> Waits... (sees old ₹1000 until A finishes)
Transaction A:
└─> COMMITS → Balance saved: ₹1500 ✓
Transaction B:
└─> Now reads: ₹1500 ✓ (only committed data)Diagram:
| Time | Transaction A | Transaction B | Database |
|---|---|---|---|
| T1 | Read: ₹1000 | ₹1000 | |
| T2 | Update: ₹1500 | ₹1000 | |
| T3 | (not saved) | Read: ₹1000 ✓ (waits for commit) | ₹1000 |
| T4 | COMMIT | ₹1500 | |
| T5 | Read: ₹1500 ✓ | ₹1500 |
Good: No dirty reads. You only see saved data.
Problem: If you read the same data twice in one transaction, you might see different values.
3. Repeatable Read
Repeatable Read is stronger than Read Committed.
If you read data once, you will see the same data every time you read it again in the same transaction. Even if another transaction changes it.
Example
Reading account balance twice in one transaction:
Initial Balance: ₹1000
Transaction A (User A):
├─> Read balance: ₹1000
├─> (doing some work...)
└─> Read balance again: ₹1000 ✓ (same value)
Transaction B (User B):
├─> Updates balance to ₹1500
└─> COMMITS ✓
Transaction A:
└─> Still sees ₹1000 (not ₹1500)
→ Consistent view throughout transactionDiagram:
| Time | Transaction A | Transaction B | Database |
|---|---|---|---|
| T1 | Read: ₹1000 | ₹1000 | |
| T2 | (working...) | Update: ₹1500 | ₹1000 |
| T3 | COMMIT | ₹1500 | |
| T4 | Read again: ₹1000 ✓ (same as first read) | ₹1500 |
Good: No dirty reads. No non-repeatable reads. Same data every time.
Problem: You might see "phantom rows" — new rows that appear during your transaction.
4. Serializable
Serializable is the strongest isolation level.
Transactions run as if they happened one after another, not at the same time. This prevents all problems.
Example
Two transactions trying to update the same account:
Initial Balance: ₹1000
Transaction A (User A):
├─> Wants to add ₹500
└─> Locks the account row
Transaction B (User B):
├─> Wants to add ₹300
└─> Waits... (cannot access locked row)
Transaction A:
├─> Updates: ₹1000 + ₹500 = ₹1500
└─> COMMITS → Unlocks row
Transaction B:
├─> Now can read: ₹1500
├─> Updates: ₹1500 + ₹300 = ₹1800
└─> COMMITS ✓
Final Result: ₹1800 (correct!)Diagram:
| Time | Transaction A | Transaction B | Database |
|---|---|---|---|
| T1 | Lock row | ₹1000 | |
| T2 | Read: ₹1000 | ₹1000 | |
| T3 | Update: ₹1500 | Waits... | ₹1000 |
| T4 | COMMIT (unlock) | ₹1500 | |
| T5 | Lock row | ₹1500 | |
| T6 | Read: ₹1500 | ₹1500 | |
| T7 | Update: ₹1800 | ₹1500 | |
| T8 | COMMIT | ₹1800 |
Good: No dirty reads. No non-repeatable reads. No phantom rows. Everything is safe.
Problem: Slower because transactions must wait for each other.
Database Scaling
Database scaling means making your database handle more users and more data without slowing down.
Vertical Scaling
Vertical Scaling - You upgrade the existing database to make it more powerful.
- Add more RAM, disk, storage to database
- Require down time during reboot
Example
- You add more RAM, more CPU, and faster storage to the same database server.
- Like upgrading a mobile from 4GB RAM to 8GB RAM so apps run faster.
Horizontal Scaling
Horizontal Scaling - You add more databases to share the load.
1. Read Replica
A Read Replica is a copy of your main (master) database that is used only for reading (not writing).
Why do we need it?
- When our apps become bigger, if we do the same read and write operations on one database then it will become slow. That is the reason why you use a read replica.
How it works?
| Type | Purpose |
|---|---|
| Main Database (Primary) | Insert, Update, Delete (writes) |
| Read Replica | Select (reads only) |
Data from the main database is automatically copied to the replicas.
- Faster performance
- Primary DB is free for writes
- Good for apps with many read operations
Diagram
Write Queries
(Insert/Update/Delete)
|
v
Primary Database
|
Auto Copy / Replication
|
v
Read Replica(s)
/ \
User A Reading User B Reading2. Replication
Replication means making copies of your database and keeping them updated automatically.
Types of Replication
1. Synchronous
- The replica receives the change immediately
- Primary waits until replica confirms data is copied
- Both are always in sync
- Safe but Slow
Diagram
Write Operation (Insert/Update/Delete)
|
v
Primary Database
|
(Waits for confirmation before saying "Done")
|
Replica DB2. Asynchronous
- Primary does not wait
- Replica receives updates after some delay
- Very fast performance
- Good for apps with millions of reads
Example
Posting a photo on Instagram
- When you upload → It appears immediately to you (Primary)
- Your friends may see it after a tiny delay (Replica syncing)
Diagram
Write Operation (Insert/Update/Delete)
|
v
Primary Database
|
(Primary responds "Done" instantly)
|
Replica DB (Updates after delay)3. Sharding
Sharding means splitting your big database into smaller parts (called shards) so each part stores only a portion of the data.
- Instead of one huge database
- You have many smaller databases working together
Why do we need sharding?
When the database becomes too big:
- Too much data
- Too many users
- Too many queries
One server cannot handle everything, so we divide the data.
Example
Assume a users table with millions of rows:
Before sharding (one big DB)
DB1
| Users |
| millions of rows |After sharding (split by UserId)
Shard 1 → UserID 1–1M
Shard 2 → UserID 1M–2M
Shard 3 → UserID 2M–3MEach shard is smaller, faster, and easier to manage.
Diagram
Application
|
┌──────────┼──────────┐
| | |
SHARD 1 SHARD 2 SHARD 3
(userId 1-1000) (1001-2000) (2001-3000)How do we divide data into shards?
Common methods:
- User ID range (1–1000, 1001–2000…)
- Hashing (hash(userId) % number_of_shards)
- Geo-based (India users → Shard A, US users → Shard B)
Sharding and Partitioning
Partitioning
- Break one big table into small pieces
- Still on the same database server
Example (users table)
Before (no partition)
Currently we have 10 millions users data in signle user table becuase of this query become slows.
users
------
id | name | country
1
2
3
...
10000000After partitioning
Here we did partitioning by user country now we have multiple table based on country in single server
users_india
users_usa
users_europeWe can do partitioning in MYSQL using PARTITION method.
Sharding
- Split data across multiple servers or many databases
- Each server has only a part of the data
Example(users table)
Shard by country
Server 1 (India)
users_india
Server 2 (USA)
users_usa
Server 3 (Europe)
users_europeMonolithic & Microservices
Compare monolithic vs microservices architectures - understand when to use each, deployment strategies, scaling approaches, and database design patterns
tRPC
Complete guide to tRPC - type-safe APIs without REST, routers and procedures, context vs middleware, and when to use tRPC vs REST/GraphQL