Learn about relational databases, ACID properties, transactions, and how databases ensure data consistency and reliability in system design

What is a Relational Database?

A relational database stores data in a structured way using tables with rows and columns.

Think of it like a spreadsheet where:

Each row is one record (like one user)
Each column is one property (like name, email, age)

Example:

Imagine a User table:

ID	Name	Email	Age
1	Rajesh	rajesh@email.com	25
2	Priya	priya@email.com	30
3	Amit	amit@email.com	28

Each row is one user, and each column is one piece of information about that user.

Why Use Relational Databases?

Relational databases follow ACID properties to make sure your data is:

Safe — no data gets lost
Correct — data stays consistent
Reliable — works even if something fails

ACID Properties

ACID stands for four important rules that databases follow:

A — Atomicity
C — Consistency
I — Isolation
D — Durability

Property	Meaning	Example
Atomicity	All or nothing	If one step fails, everything is undone
Consistency	Data follows rules	Age must be 18-100, or it is rejected
Isolation	Transactions don't interfere	Two users buying same item — only one wins
Durability	Data survives crashes	Power goes out, but data is still saved

A — Atomicity

Atomicity means: All or nothing.

A transaction (a group of operations) must either complete fully or not happen at all. There is no middle ground.

Example Without Atomicity (Bad)

Imagine you are creating a user profile and updating a user count:

Step 1: Create Profile
   └─> Success ✓

Step 2: Increase User Count
   └─> FAILS ✗

Result: Profile created but count not updated
        → Data is inconsistent!

Problem: The profile exists, but the user count is wrong. This creates inconsistent data.

Example With Atomicity (Good)

When you wrap both operations in a transaction:

Transaction Start
   │
   ├─> Step 1: Create Profile
   │      └─> Success ✓
   │
   └─> Step 2: Increase User Count
          └─> FAILS ✗
   │
Transaction Rollback (Undo everything)
   └─> Profile creation is undone
   └─> Count stays the same

Result: Nothing changed → Data stays consistent ✓

Solution: If any step fails, everything is undone. Your data stays consistent.

C — Consistency

Consistency means: Data follows the rules.

The database will never allow data that breaks the rules you set.

Example

Rule: A user's age must be between 18 and 100.

Good Data:
┌──────┬──────┐
│ Name │ Age  │
├──────┼──────┤
│ John │  25  │ ✓ (Follows rule)
└──────┴──────┘

Bad Data:
┌──────┬──────┐
│ Name │ Age  │
├──────┼──────┤
│ John │  15  │ ✗ (Breaks rule - too young)
└──────┴──────┘

Database says: "No! Age must be 18-100"
→ Transaction rejected

Result: The database rejects invalid data, keeping everything consistent.

I — Isolation

Isolation means: Transactions don't interfere with each other.

Even if many users are doing things at the same time, each transaction works as if it is the only one running.

Example

Scenario: Two users trying to buy the last item in stock.

Without Isolation (Bad):

Time	User A	User B
T1	Reads stock = 1
T2		Reads stock = 1
T3	Purchases → stock = 0
T4		Purchases → stock = 0 (over-sell!)

Result: Both users think they bought the item → Stock oversold

With Isolation (Good):

Time	User A	User B
T1	Reads stock = 1
T2	Locks row + purchase
T3	Commit success ✓
T4		Reads stock = 0
T5		Purchase denied ✘

Result: Only one user successfully buys → No overselling

Result: Each transaction sees a consistent view of the data, even when many happen at once.

D — Durability

Durability means: Once saved, data stays saved.

Even if the database crashes or the power goes out, your data is safe.

Example

Step 1: You save data
   └─> Database writes to disk ✓

Step 2: Power goes out ✗
   └─> Server shuts down

Step 3: Power comes back
   └─> Database restarts

Result: Your data is still there! ✓

Result: The database saves data to permanent storage (like a hard drive), so it survives crashes.

Database Isolation Levels

These levels control how much data is protected when many transactions (group of operations) run at the same time.

PostgreSQL supports 4 levels:

Read Uncommitted
Read Committed (default in PostgreSQL)
Repeatable Read
Serializable (strongest)

What these terms mean:

Dirty Reads: Reading data that another transaction is still changing and might undo later.
Non-Repeatable Reads: Reading the same data twice in one transaction and getting different values each time.
Phantom Reads: Seeing new rows appear in your query results that were not there when you started your transaction.

Level	Dirty Reads	Non-Repeatable Reads	Phantom Reads	Speed
Read Uncommitted	✗ Yes	✗ Yes	✗ Yes	Fast
Read Committed	✓ No	✗ Yes	✗ Yes	Fast
Repeatable Read	✓ No	✓ No	✗ Yes	Medium
Serializable	✓ No	✓ No	✓ No	Slow

Rule: Higher isolation = safer data, but slower performance.

1. Read Uncommitted

Read Uncommitted is the weakest isolation level.

You can read data that another transaction is still changing. This data might not be saved yet.

Example

Two users are working with a bank account balance:

Initial Balance: ₹1000

Transaction A (User A):
├─> Reads balance: ₹1000
├─> Adds ₹500
└─> Balance now: ₹1500 (NOT saved yet)

Transaction B (User B):
├─> Reads balance: ₹1500 (reads uncommitted data!)
└─> Sees ₹1500

Transaction A:
└─> FAILS and rolls back → Balance back to ₹1000

Result: User B saw ₹1500, but real balance is ₹1000
        → This is called a "Dirty Read" ✗

Diagram:

Time	Transaction A	Transaction B	Database
T1	Read: ₹1000		₹1000
T2	Update: ₹1500		₹1000
T3	(not saved)	Read: ₹1500 ✗	₹1000
T4	ROLLBACK		₹1000
T5		Sees wrong data!	₹1000

Problem: You can read data that might be undone later. This causes dirty reads.

2. Read Committed

Read Committed is the default level in PostgreSQL.

You can only read data that has been saved (committed). You cannot read data that is still being changed.

Example

Same bank account scenario:

Initial Balance: ₹1000

Transaction A (User A):
├─> Reads balance: ₹1000
├─> Adds ₹500
└─> Balance now: ₹1500 (NOT saved yet)

Transaction B (User B):
├─> Tries to read balance
└─> Waits... (sees old ₹1000 until A finishes)

Transaction A:
└─> COMMITS → Balance saved: ₹1500 ✓

Transaction B:
└─> Now reads: ₹1500 ✓ (only committed data)

Diagram:

Time	Transaction A	Transaction B	Database
T1	Read: ₹1000		₹1000
T2	Update: ₹1500		₹1000
T3	(not saved)	Read: ₹1000 ✓ (waits for commit)	₹1000
T4	COMMIT		₹1500
T5		Read: ₹1500 ✓	₹1500

Good: No dirty reads. You only see saved data.

Problem: If you read the same data twice in one transaction, you might see different values.

3. Repeatable Read

Repeatable Read is stronger than Read Committed.

If you read data once, you will see the same data every time you read it again in the same transaction. Even if another transaction changes it.

Example

Reading account balance twice in one transaction:

Initial Balance: ₹1000

Transaction A (User A):
├─> Read balance: ₹1000
├─> (doing some work...)
└─> Read balance again: ₹1000 ✓ (same value)

Transaction B (User B):
├─> Updates balance to ₹1500
└─> COMMITS ✓

Transaction A:
└─> Still sees ₹1000 (not ₹1500)
    → Consistent view throughout transaction

Diagram:

Time	Transaction A	Transaction B	Database
T1	Read: ₹1000		₹1000
T2	(working...)	Update: ₹1500	₹1000
T3		COMMIT	₹1500
T4	Read again: ₹1000 ✓ (same as first read)		₹1500

Good: No dirty reads. No non-repeatable reads. Same data every time.

Problem: You might see "phantom rows" — new rows that appear during your transaction.

4. Serializable

Serializable is the strongest isolation level.

Transactions run as if they happened one after another, not at the same time. This prevents all problems.

Example

Two transactions trying to update the same account:

Initial Balance: ₹1000

Transaction A (User A):
├─> Wants to add ₹500
└─> Locks the account row

Transaction B (User B):
├─> Wants to add ₹300
└─> Waits... (cannot access locked row)

Transaction A:
├─> Updates: ₹1000 + ₹500 = ₹1500
└─> COMMITS → Unlocks row

Transaction B:
├─> Now can read: ₹1500
├─> Updates: ₹1500 + ₹300 = ₹1800
└─> COMMITS ✓

Final Result: ₹1800 (correct!)

Diagram:

Time	Transaction A	Transaction B	Database
T1	Lock row		₹1000
T2	Read: ₹1000		₹1000
T3	Update: ₹1500	Waits...	₹1000
T4	COMMIT (unlock)		₹1500
T5		Lock row	₹1500
T6		Read: ₹1500	₹1500
T7		Update: ₹1800	₹1500
T8		COMMIT	₹1800

Good: No dirty reads. No non-repeatable reads. No phantom rows. Everything is safe.

Problem: Slower because transactions must wait for each other.

Database Scaling

Database scaling means making your database handle more users and more data without slowing down.

Vertical Scaling

Vertical Scaling - You upgrade the existing database to make it more powerful.

Add more RAM, disk, storage to database
Require down time during reboot

Example

You add more RAM, more CPU, and faster storage to the same database server.
Like upgrading a mobile from 4GB RAM to 8GB RAM so apps run faster.

Horizontal Scaling

Horizontal Scaling - You add more databases to share the load.

1. Read Replica

A Read Replica is a copy of your main (master) database that is used only for reading (not writing).

Why do we need it?

When our apps become bigger, if we do the same read and write operations on one database then it will become slow. That is the reason why you use a read replica.

How it works?

Type	Purpose
Main Database (Primary)	Insert, Update, Delete (writes)
Read Replica	Select (reads only)

Data from the main database is automatically copied to the replicas.

Faster performance
Primary DB is free for writes
Good for apps with many read operations

Diagram

          Write Queries
        (Insert/Update/Delete)
               |
               v
         Primary Database
               |
     Auto Copy / Replication
               |
               v
        Read Replica(s)
           /        \
   User A Reading  User B Reading

2. Replication

Replication means making copies of your database and keeping them updated automatically.

Types of Replication

1. Synchronous

The replica receives the change immediately
Primary waits until replica confirms data is copied
Both are always in sync
Safe but Slow

Diagram

     Write Operation (Insert/Update/Delete)
                    |
                    v
              Primary Database
                    |
   (Waits for confirmation before saying "Done")
                    |
                Replica DB

2. Asynchronous

Primary does not wait
Replica receives updates after some delay
Very fast performance
Good for apps with millions of reads

Example

Posting a photo on Instagram

When you upload → It appears immediately to you (Primary)
Your friends may see it after a tiny delay (Replica syncing)

Diagram

     Write Operation (Insert/Update/Delete)
                    |
                    v
              Primary Database
                    |
    (Primary responds "Done" instantly)
                    |
        Replica DB (Updates after delay)

3. Sharding

Sharding means splitting your big database into smaller parts (called shards) so each part stores only a portion of the data.

Instead of one huge database
You have many smaller databases working together

Why do we need sharding?

When the database becomes too big:

Too much data
Too many users
Too many queries

One server cannot handle everything, so we divide the data.

Example

Assume a users table with millions of rows:

Before sharding (one big DB)

DB1
| Users |
| millions of rows |

After sharding (split by UserId)

Shard 1 → UserID 1–1M
Shard 2 → UserID 1M–2M
Shard 3 → UserID 2M–3M

Each shard is smaller, faster, and easier to manage.

Diagram

           Application
                 |
      ┌──────────┼──────────┐
      |          |           |
 SHARD 1     SHARD 2     SHARD 3
(userId 1-1000) (1001-2000) (2001-3000)

How do we divide data into shards?

Common methods:

User ID range (1–1000, 1001–2000…)
Hashing (hash(userId) % number_of_shards)
Geo-based (India users → Shard A, US users → Shard B)

Sharding and Partitioning

Partitioning

Break one big table into small pieces
Still on the same database server

Example (users table)

Before (no partition)

Currently we have 10 millions users data in signle user table becuase of this query become slows.
users
------
id | name | country
1
2
3
...
10000000

After partitioning

Here we did partitioning by user country now we have multiple table based on country in single server

users_india
users_usa
users_europe

We can do partitioning in MYSQL using PARTITION method.

Sharding

Split data across multiple servers or many databases
Each server has only a part of the data

Example(users table)

Shard by country

Server 1 (India)
  users_india

Server 2 (USA)
  users_usa

Server 3 (Europe)
  users_europe

Databases

On this page