View Portfolio
System Design

Databases

Learn about relational databases, ACID properties, transactions, and how databases ensure data consistency and reliability in system design

What is a Relational Database?

A relational database stores data in a structured way using tables with rows and columns.

Think of it like a spreadsheet where:

  • Each row is one record (like one user)
  • Each column is one property (like name, email, age)

Example:

Imagine a User table:

IDNameEmailAge
1Rajeshrajesh@email.com25
2Priyapriya@email.com30
3Amitamit@email.com28

Each row is one user, and each column is one piece of information about that user.

Why Use Relational Databases?

Relational databases follow ACID properties to make sure your data is:

  • Safe — no data gets lost
  • Correct — data stays consistent
  • Reliable — works even if something fails

ACID Properties

ACID stands for four important rules that databases follow:

  • A — Atomicity
  • C — Consistency
  • I — Isolation
  • D — Durability
PropertyMeaningExample
AtomicityAll or nothingIf one step fails, everything is undone
ConsistencyData follows rulesAge must be 18-100, or it is rejected
IsolationTransactions don't interfereTwo users buying same item — only one wins
DurabilityData survives crashesPower goes out, but data is still saved

A — Atomicity

Atomicity means: All or nothing.

A transaction (a group of operations) must either complete fully or not happen at all. There is no middle ground.

Example Without Atomicity (Bad)

Imagine you are creating a user profile and updating a user count:

Step 1: Create Profile
   └─> Success

Step 2: Increase User Count
   └─> FAILS

Result: Profile created but count not updated
 Data is inconsistent!

Problem: The profile exists, but the user count is wrong. This creates inconsistent data.

Example With Atomicity (Good)

When you wrap both operations in a transaction:

Transaction Start

   ├─> Step 1: Create Profile
> Success

   └─> Step 2: Increase User Count
          └─> FAILS

Transaction Rollback (Undo everything)
   └─> Profile creation is undone
   └─> Count stays the same

Result: Nothing changed Data stays consistent

Solution: If any step fails, everything is undone. Your data stays consistent.

C — Consistency

Consistency means: Data follows the rules.

The database will never allow data that breaks the rules you set.

Example

Rule: A user's age must be between 18 and 100.

Good Data:
┌──────┬──────┐
 Name Age
├──────┼──────┤
 John  25 (Follows rule)
└──────┴──────┘

Bad Data:
┌──────┬──────┐
 Name Age
├──────┼──────┤
 John  15 (Breaks rule - too young)
└──────┴──────┘

Database says: "No! Age must be 18-100"
 Transaction rejected

Result: The database rejects invalid data, keeping everything consistent.

I — Isolation

Isolation means: Transactions don't interfere with each other.

Even if many users are doing things at the same time, each transaction works as if it is the only one running.

Example

Scenario: Two users trying to buy the last item in stock.

Without Isolation (Bad):

TimeUser AUser B
T1Reads stock = 1
T2Reads stock = 1
T3Purchases → stock = 0
T4Purchases → stock = 0 (over-sell!)

Result: Both users think they bought the item → Stock oversold

With Isolation (Good):

TimeUser AUser B
T1Reads stock = 1
T2Locks row + purchase
T3Commit success ✓
T4Reads stock = 0
T5Purchase denied ✘

Result: Only one user successfully buys → No overselling

Result: Each transaction sees a consistent view of the data, even when many happen at once.

D — Durability

Durability means: Once saved, data stays saved.

Even if the database crashes or the power goes out, your data is safe.

Example

Step 1: You save data
   └─> Database writes to disk

Step 2: Power goes out
   └─> Server shuts down

Step 3: Power comes back
   └─> Database restarts

Result: Your data is still there!

Result: The database saves data to permanent storage (like a hard drive), so it survives crashes.

Database Isolation Levels

These levels control how much data is protected when many transactions (group of operations) run at the same time.

PostgreSQL supports 4 levels:

  1. Read Uncommitted
  2. Read Committed (default in PostgreSQL)
  3. Repeatable Read
  4. Serializable (strongest)

What these terms mean:

  • Dirty Reads: Reading data that another transaction is still changing and might undo later.
  • Non-Repeatable Reads: Reading the same data twice in one transaction and getting different values each time.
  • Phantom Reads: Seeing new rows appear in your query results that were not there when you started your transaction.
LevelDirty ReadsNon-Repeatable ReadsPhantom ReadsSpeed
Read Uncommitted✗ Yes✗ Yes✗ YesFast
Read Committed✓ No✗ Yes✗ YesFast
Repeatable Read✓ No✓ No✗ YesMedium
Serializable✓ No✓ No✓ NoSlow

Rule: Higher isolation = safer data, but slower performance.

1. Read Uncommitted

Read Uncommitted is the weakest isolation level.

You can read data that another transaction is still changing. This data might not be saved yet.

Example

Two users are working with a bank account balance:

Initial Balance: ₹1000

Transaction A (User A):
├─> Reads balance: ₹1000
├─> Adds ₹500
└─> Balance now: ₹1500 (NOT saved yet)

Transaction B (User B):
├─> Reads balance: ₹1500 (reads uncommitted data!)
└─> Sees ₹1500

Transaction A:
└─> FAILS and rolls back Balance back to ₹1000

Result: User B saw ₹1500, but real balance is ₹1000
 This is called a "Dirty Read"

Diagram:

TimeTransaction ATransaction BDatabase
T1Read: ₹1000₹1000
T2Update: ₹1500₹1000
T3(not saved)Read: ₹1500 ✗₹1000
T4ROLLBACK₹1000
T5Sees wrong data!₹1000

Problem: You can read data that might be undone later. This causes dirty reads.

2. Read Committed

Read Committed is the default level in PostgreSQL.

You can only read data that has been saved (committed). You cannot read data that is still being changed.

Example

Same bank account scenario:

Initial Balance: ₹1000

Transaction A (User A):
├─> Reads balance: ₹1000
├─> Adds ₹500
└─> Balance now: ₹1500 (NOT saved yet)

Transaction B (User B):
├─> Tries to read balance
└─> Waits... (sees old ₹1000 until A finishes)

Transaction A:
└─> COMMITS Balance saved: ₹1500

Transaction B:
└─> Now reads: ₹1500 (only committed data)

Diagram:

TimeTransaction ATransaction BDatabase
T1Read: ₹1000₹1000
T2Update: ₹1500₹1000
T3(not saved)Read: ₹1000 ✓
(waits for commit)
₹1000
T4COMMIT₹1500
T5Read: ₹1500 ✓₹1500

Good: No dirty reads. You only see saved data.

Problem: If you read the same data twice in one transaction, you might see different values.

3. Repeatable Read

Repeatable Read is stronger than Read Committed.

If you read data once, you will see the same data every time you read it again in the same transaction. Even if another transaction changes it.

Example

Reading account balance twice in one transaction:

Initial Balance: ₹1000

Transaction A (User A):
├─> Read balance: ₹1000
├─> (doing some work...)
└─> Read balance again: ₹1000 (same value)

Transaction B (User B):
├─> Updates balance to ₹1500
└─> COMMITS

Transaction A:
└─> Still sees ₹1000 (not ₹1500)
 Consistent view throughout transaction

Diagram:

TimeTransaction ATransaction BDatabase
T1Read: ₹1000₹1000
T2(working...)Update: ₹1500₹1000
T3COMMIT₹1500
T4Read again: ₹1000 ✓
(same as first read)
₹1500

Good: No dirty reads. No non-repeatable reads. Same data every time.

Problem: You might see "phantom rows" — new rows that appear during your transaction.

4. Serializable

Serializable is the strongest isolation level.

Transactions run as if they happened one after another, not at the same time. This prevents all problems.

Example

Two transactions trying to update the same account:

Initial Balance: ₹1000

Transaction A (User A):
├─> Wants to add ₹500
└─> Locks the account row

Transaction B (User B):
├─> Wants to add ₹300
└─> Waits... (cannot access locked row)

Transaction A:
├─> Updates: ₹1000 + ₹500 = ₹1500
└─> COMMITS Unlocks row

Transaction B:
├─> Now can read: ₹1500
├─> Updates: ₹1500 + ₹300 = ₹1800
└─> COMMITS

Final Result: ₹1800 (correct!)

Diagram:

TimeTransaction ATransaction BDatabase
T1Lock row₹1000
T2Read: ₹1000₹1000
T3Update: ₹1500Waits...₹1000
T4COMMIT (unlock)₹1500
T5Lock row₹1500
T6Read: ₹1500₹1500
T7Update: ₹1800₹1500
T8COMMIT₹1800

Good: No dirty reads. No non-repeatable reads. No phantom rows. Everything is safe.

Problem: Slower because transactions must wait for each other.

Database Scaling

Database scaling means making your database handle more users and more data without slowing down.

Vertical Scaling

Vertical Scaling - You upgrade the existing database to make it more powerful.

  • Add more RAM, disk, storage to database
  • Require down time during reboot

Example

  • You add more RAM, more CPU, and faster storage to the same database server.
  • Like upgrading a mobile from 4GB RAM to 8GB RAM so apps run faster.

Horizontal Scaling

Horizontal Scaling - You add more databases to share the load.

1. Read Replica

A Read Replica is a copy of your main (master) database that is used only for reading (not writing).

Why do we need it?

  • When our apps become bigger, if we do the same read and write operations on one database then it will become slow. That is the reason why you use a read replica.

How it works?

TypePurpose
Main Database (Primary)Insert, Update, Delete (writes)
Read ReplicaSelect (reads only)

Data from the main database is automatically copied to the replicas.

  • Faster performance
  • Primary DB is free for writes
  • Good for apps with many read operations

Diagram

          Write Queries
        (Insert/Update/Delete)
               |
               v
         Primary Database
               |
     Auto Copy / Replication
               |
               v
        Read Replica(s)
           /        \
   User A Reading  User B Reading

2. Replication

Replication means making copies of your database and keeping them updated automatically.

Types of Replication

1. Synchronous

  • The replica receives the change immediately
  • Primary waits until replica confirms data is copied
  • Both are always in sync
  • Safe but Slow

Diagram

     Write Operation (Insert/Update/Delete)
                    |
                    v
              Primary Database
                    |
   (Waits for confirmation before saying "Done")
                    |
                Replica DB

2. Asynchronous

  • Primary does not wait
  • Replica receives updates after some delay
  • Very fast performance
  • Good for apps with millions of reads

Example

Posting a photo on Instagram

  • When you upload → It appears immediately to you (Primary)
  • Your friends may see it after a tiny delay (Replica syncing)

Diagram

     Write Operation (Insert/Update/Delete)
                    |
                    v
              Primary Database
                    |
    (Primary responds "Done" instantly)
                    |
        Replica DB (Updates after delay)

3. Sharding

Sharding means splitting your big database into smaller parts (called shards) so each part stores only a portion of the data.

  • Instead of one huge database
  • You have many smaller databases working together

Why do we need sharding?

When the database becomes too big:

  • Too much data
  • Too many users
  • Too many queries

One server cannot handle everything, so we divide the data.

Example

Assume a users table with millions of rows:

Before sharding (one big DB)

DB1
| Users |
| millions of rows |

After sharding (split by UserId)

Shard 1 UserID 1–1M
Shard 2 UserID 1M–2M
Shard 3 UserID 2M–3M

Each shard is smaller, faster, and easier to manage.

Diagram

           Application
                 |
      ┌──────────┼──────────┐
      |          |           |
 SHARD 1     SHARD 2     SHARD 3
(userId 1-1000) (1001-2000) (2001-3000)

How do we divide data into shards?

Common methods:

  • User ID range (1–1000, 1001–2000…)
  • Hashing (hash(userId) % number_of_shards)
  • Geo-based (India users → Shard A, US users → Shard B)

Sharding and Partitioning

Partitioning

  • Break one big table into small pieces
  • Still on the same database server

Example (users table)

Before (no partition)

Currently we have 10 millions users data in signle user table becuase of this query become slows.
users
------
id | name | country
1
2
3
...
10000000

After partitioning

Here we did partitioning by user country now we have multiple table based on country in single server

users_india
users_usa
users_europe

We can do partitioning in MYSQL using PARTITION method.

Sharding

  • Split data across multiple servers or many databases
  • Each server has only a part of the data

Example(users table)

Shard by country

Server 1 (India)
  users_india

Server 2 (USA)
  users_usa

Server 3 (Europe)
  users_europe