MySQL DBA Interview Questions

February 18, 2026

MySQL DBA Interview Questions

1. MySQL Architecture & Internals (Advanced)

Explain the internal architecture of InnoDB buffer pool, change buffer, double-write buffer, redo logs, undo logs, and adaptive hash index.
What happens internally when you run this query?
UPDATE users SET balance = balance + 100 WHERE id = 10;
Explain the full journey of a transaction from BEGIN → COMMIT inside MySQL.
How does the MySQL optimizer choose an execution plan?
Why does MySQL sometimes prefer a full table scan even when an index exists?
Explain InnoDB crash recovery phases in detail.
Why is InnoDB redo logging physical and binlogging logical?
Explain MVCC implementation in InnoDB (hidden transaction IDs, rollback pointers).
What is the difference between persistent statistics and histograms?
What happens in memory and on disk when a page split occurs?

🟦 2. Advanced Storage Engine Concepts (Senior Level)

Why does InnoDB sometimes hold gap locks even for equality conditions?
Explain next-key locking and how it prevents phantom reads.
How does InnoDB purge process work? What causes purge lag?
Explain fragmentation types in InnoDB and how to fix each.
Why does InnoDB not reclaim space after DELETE or TRUNCATE in certain cases?
What is the importance of log sequence number (LSN)?
How is insert buffer merging handled during crash recovery?
How does InnoDB handle index cardinality and stats updates?

🟦 3. Replication (Architect-Level)

Explain GTID lifecycle in MySQL during write, relay, and apply stages.
What is an “errant transaction”? How do you detect and fix it?
Why can multi-threaded replication still run single-threaded?
Explain replication lag root causes and deep-dive diagnosis steps.
What is WRITESET-based parallel replication? When should you use it?
What is the difference between logical clock and commit order?
Why does DDL break replication? How do you prevent DDL drift?
What are the challenges of multi-source replication?
How do you rebuild a replica without stopping writes on master?
Explain binlog crash recovery and binlog checksum.

🟦 4. High Availability, Group Replication, Failover

Explain the internal working of MySQL Group Replication (GCS, certification layer, applier threads).
Why do multi-primary clusters generate conflicts? How to prevent them?
What causes cluster “expelled member” events?
How does quorum calculation work in Group Replication?
What is flow control and why can it freeze your cluster?
How does MySQL Router detect primary role changes?
What is a split-brain scenario? How to prevent it?
When does Group Replication lose consistency guarantees?
How do you design zero-downtime failover with GTID replication?
How do you avoid data loss during failover?

🟦 5. Performance Optimization (Expert Level)

What are the top 10 signals from Performance Schema that indicate bottlenecks?
Why does MySQL perform poorly on write-heavy workloads? How to fix it?
What is the impact of redo log size on performance?
How do you detect CPU-based bottlenecks using PFS wait events?
What leads to excessive temporary table creation?
Why can an index degrade performance instead of improving it?
How do you tune MySQL for:
- 90% reads / 10% writes
- 80% writes / 20% reads
What is the impact of page flush, checkpoint age, and I/O capacity?
How do you detect and fix plan regression in MySQL 8?
How do histograms help the optimizer?

🟦 6. Query Optimization (Hard)

Explain the scenarios where EXPLAIN output is misleading.
How do you optimize multi-table JOIN with millions of rows?
Why is ORDER BY + LIMIT slow even with index?
How do you optimize queries using window functions?
Why does MySQL use filesort? How do you eliminate it?
What causes BLOATED B+Tree index? How to fix it?
Why does MySQL ignore a composite index?
How do you drastically reduce temp-table creation for analytic queries?
Why does MySQL store temp tables on disk even when memory is available?

🟦 7. Locking, Deadlocks, Concurrency

Explain in detail InnoDB deadlock internal detection.
Why do deadlocks increase after adding an index?
What is metadata lock? How does it impact production?
How do you track down a long-running transaction holding locks?
What causes “Waiting for flush” or “Waiting for table metadata lock”?
Why do online ALTER operations cause application timeouts?
How does MySQL handle lock escalation?

🟦 8. Partitioning, VLDB, Scaling

Why does partition pruning fail?
Should you partition or shard? When?
What happens to indexes when table is partitioned?
How do you scale MySQL beyond 1B rows?
Why do partitions degrade write performance?
What is the impact of global index in partitioning?
How do you migrate a 5TB table with near-zero downtime?

🟦 9. Backup, Restore, and DR (Senior Level)

Explain the architecture of XtraBackup.
Why can a partial prepare stage make backup unrecoverable?
How do you perform cross-region PITR?
How do you restore a single table from physical backup?
Explain backup consistency under heavy write load.
How does MySQL handle encrypted backup and restore?
What causes binlog corruption? How do you fix it?

🟦 10. Disaster Recovery, Failover, Chaos Testing

Master crashes with unsent transactions — how do you avoid data loss?
How do you detect and fix GTID gaps after failover?
How do you simulate crash recovery for DR testing?
How do you handle MySQL crash with half-written pages?
How do you perform cluster-wide failover testing without downtime?

🟦 11. Cloud, Aurora, RDS MySQL

When will Aurora replica lag even though hardware is superior?
Why does RDS failover take 40–90 seconds?
How is aurora redo logging different from InnoDB?
How do you recover an RDS instance stuck in storage full state?
How do you fix “InnoDB: Write throttled” in cloud environments?
How does Aurora Global DB replication actually work?
Why does RDS performance degrade after large parameter change?

🟦 12. Senior-Level Scenario Questions

CPU suddenly spikes to 90% on master — walk through your step-by-step investigation.
Replication lag increases from 1 sec → 3600 sec — what do you do?
MySQL crashes during heavy writes — how do you diagnose crash cause?
Application performance drops suddenly — how do you isolate DB bottleneck?
A 400GB table ALTER is stuck for 12 hours — how do you recover?
A table reports corruption in production — what are immediate next steps?
Failover happens but replicas refuse writes — why?
EXPLAIN shows correct index but query still slow — what next?
MySQL slows down at peak time every day — how to find the reason?
Your MySQL instances show “too many connections” — how to fix root cause?

Comments