DBMS Interview Concepts: Normalisation, Indexing, Transactions, and Everything Else That Gets Asked
DBMS questions trip up students because they memorise definitions without understanding why the concepts exist. This guide explains every major DBMS concept from first principles β with the exact depth interviewers test.
DBMS is tested at virtually every tech company in India. TCS, Infosys, and Wipro ask definitional questions. Zoho and Freshworks ask you to apply normalisation and explain indexing. Oracle and database-focused roles go deep into transactions and query optimisation.
This guide covers everything from fundamentals to advanced concepts β with the actual explanations that work in interviews.
1. ACID Properties
Every database interview starts here. ACID is what makes transactions reliable.
Atomicity: A transaction either fully succeeds or fully fails. No partial updates. If you transfer βΉ1000 from account A to B, either both the debit and credit happen, or neither does.
Consistency: A transaction brings the database from one valid state to another. All rules (constraints, foreign keys, business rules) must hold before and after.
Isolation: Concurrent transactions don't interfere with each other. The result is the same as if they'd run sequentially.
Durability: Once a transaction is committed, it survives system crashes. Data is persisted to disk.
Interview answer:
"ACID properties ensure that transactions are reliable. Atomicity means all-or-nothing β you can't have half a bank transfer. Consistency means constraints are never violated. Isolation means concurrent transactions don't see each other's intermediate states. Durability means committed data survives crashes β achieved through write-ahead logging."
2. Normalisation β The Why, Not Just the Rules
Normalisation reduces redundancy and update anomalies.
Update anomaly: If a student's address is stored in every row of their enrollment records, changing their address requires updating 50 rows. Miss one and data is inconsistent.
Insertion anomaly: Can't insert a department without a student enrolled in it.
Deletion anomaly: Deleting the last student in a department deletes the department's information too.
First Normal Form (1NF)
Rule: Each column contains atomic (indivisible) values. No repeating groups.
Violation:
| StudentID | Courses | |---|---| | 1 | Math, Physics, Chemistry |
Fixed:
| StudentID | Course | |---|---| | 1 | Math | | 1 | Physics | | 1 | Chemistry |
Second Normal Form (2NF)
Rule: 1NF + every non-key attribute is fully dependent on the entire primary key (no partial dependency).
Only relevant when the primary key is composite (multiple columns).
Violation: Primary key is (StudentID, CourseID), but CourseName depends only on CourseID, not on the full composite key.
| StudentID | CourseID | CourseName | Grade | |---|---|---|---| | 1 | C1 | Math | A |
Fixed: Separate the courses table:
- Enrollments(StudentID, CourseID, Grade)
- Courses(CourseID, CourseName)
Third Normal Form (3NF)
Rule: 2NF + no transitive dependencies (non-key attribute depends on another non-key attribute).
Violation:
| EmployeeID | DepartmentID | DepartmentName | |---|---|---|
DepartmentName depends on DepartmentID, which depends on EmployeeID. Transitive dependency.
Fixed: Separate departments:
- Employee(EmployeeID, DepartmentID)
- Department(DepartmentID, DepartmentName)
BCNF (Boyce-Codd Normal Form)
Stricter than 3NF. Every determinant must be a candidate key. Rare in practice; understood conceptually is enough for most interviews.
Interview format:
"We normalise to remove redundancy and prevent anomalies. 1NF ensures atomic values. 2NF removes partial dependencies on composite keys. 3NF removes transitive dependencies. In practice, we normalise to 3NF β it covers the vast majority of real-world anomaly scenarios without making the schema overly complex."
3. Indexing β The Most Practically Useful DBMS Topic
An index is a data structure that speeds up data retrieval. Think of it as the index at the back of a textbook β instead of reading every page, you find the topic and go directly to the right page.
Without index: Full table scan β O(n) for every lookup.
With index: B-tree lookup β O(log n).
B-Tree Index (the default)
Most database indexes are B-trees. Balanced tree structure β O(log n) search, insert, and delete. Works for:
- Equality:
WHERE id = 5 - Range:
WHERE age BETWEEN 20 AND 30 - Prefix:
WHERE name LIKE 'Kumar%'
Hash Index
Ideal for exact equality only. O(1) average lookup. Doesn't support range queries.
Clustered vs Non-Clustered Index
Clustered index: The actual data rows are stored in the order of the index. A table can have only one clustered index (the data can only be physically sorted one way). Primary key is usually the clustered index.
Non-clustered index: A separate structure that stores the indexed column values and pointers to the actual data rows. A table can have many non-clustered indexes.
When to Add an Index
Add index on:
- Columns frequently in WHERE clauses
- Columns used in JOIN conditions
- Columns in ORDER BY (avoids sorting at query time)
- Columns with high cardinality (many distinct values β indexing a boolean column rarely helps)
Don't over-index:
- Every INSERT, UPDATE, DELETE must also update all indexes
- Indexes consume disk space
- Too many indexes actually slow down write-heavy tables
Interview answer:
"I'd add an index on columns frequently used in WHERE, JOIN, and ORDER BY clauses. But I'd be careful β indexes slow down writes because the index must be maintained on every modification. On a write-heavy table like event logs, too many indexes can cause more harm than the read performance gain."
4. Transactions and Isolation Levels
Transaction Problems Without Isolation
Dirty read: Transaction A reads data written by Transaction B that hasn't committed yet. B rolls back β A read data that never officially existed.
Non-repeatable read: Transaction A reads a row. Transaction B updates and commits that row. A reads the same row again and gets a different value.
Phantom read: Transaction A reads a set of rows matching a condition. B inserts a new row matching that condition and commits. A reads again and sees the new "phantom" row.
Isolation Levels (SQL Standard)
| Level | Dirty Read | Non-Repeatable | Phantom | |---|---|---|---| | READ UNCOMMITTED | Possible | Possible | Possible | | READ COMMITTED | Prevented | Possible | Possible | | REPEATABLE READ | Prevented | Prevented | Possible | | SERIALIZABLE | Prevented | Prevented | Prevented |
Higher isolation = fewer anomalies but more locking = lower concurrency.
PostgreSQL's default is READ COMMITTED. MySQL/InnoDB defaults to REPEATABLE READ.
Interview answer:
"Serializable gives the strongest guarantees but uses the most locking, hurting performance under high concurrency. Most applications use READ COMMITTED β it prevents dirty reads while still allowing good concurrency. You'd use SERIALIZABLE for financial operations where phantoms matter, like checking account balance before transfer."
5. Joins β Every Type With When to Use It
Table A: Student (id, name) Table B: Enrollment (student_id, course)
1 Alice 1 Math
2 Bob 1 Physics
3 Charlie 3 Chemistry
4 Diana (no enrollment)
INNER JOIN: Only rows that match in both tables.
-- Returns Alice (Math), Alice (Physics), Charlie (Chemistry)
SELECT s.name, e.course
FROM Student s
INNER JOIN Enrollment e ON s.id = e.student_id;
LEFT JOIN: All rows from left table, matched rows from right (NULL if no match).
-- Also returns Bob (NULL) and Diana (NULL)
SELECT s.name, e.course
FROM Student s
LEFT JOIN Enrollment e ON s.id = e.student_id;
RIGHT JOIN: All rows from right table. Rarely used β you can always rewrite as LEFT JOIN by swapping tables.
FULL OUTER JOIN: All rows from both tables, NULL where no match. Not supported directly in MySQL (use UNION of LEFT and RIGHT JOIN).
SELF JOIN: Join a table with itself. Used for hierarchies.
-- Find employees and their manager's name
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
CROSS JOIN: Every row from A combined with every row from B. Output has AΓB rows. Rarely useful except for generating combinations.
6. Stored Procedures, Views, and Triggers
View: A saved SELECT query that behaves like a virtual table.
CREATE VIEW active_students AS
SELECT * FROM students WHERE is_active = TRUE;
-- Now: SELECT * FROM active_students; works like any table
Use for: hiding complexity, security (users see the view, not underlying tables), frequently reused queries.
Stored Procedure: Pre-compiled SQL code stored in the database. Called like a function.
CREATE PROCEDURE get_student_courses(IN student_id INT)
BEGIN
SELECT c.name FROM courses c
JOIN enrollments e ON c.id = e.course_id
WHERE e.student_id = student_id;
END;
-- Call: CALL get_student_courses(42);
Use for: complex business logic that should live near the data, batch operations, reducing network round-trips.
Trigger: SQL code that automatically fires before or after INSERT, UPDATE, or DELETE.
CREATE TRIGGER update_modified_time
BEFORE UPDATE ON students
FOR EACH ROW
SET NEW.modified_at = NOW();
Use carefully: triggers run invisibly, making debugging harder.
7. CAP Theorem (For System Design Context)
In a distributed database, you can guarantee at most two of:
Consistency: Every read receives the most recent write.
Availability: Every request receives a response (not necessarily the most recent data).
Partition Tolerance: The system continues despite network partitions.
Network partitions always happen in distributed systems. So in practice, you choose between CP (consistent + partition-tolerant) and AP (available + partition-tolerant).
- PostgreSQL, MySQL: CP β consistent, may become unavailable during partition
- Cassandra, DynamoDB: AP β always available, eventually consistent
- MongoDB: configurable
8. Quick-Answer Questions
What is a foreign key? A column that references the primary key of another table. Enforces referential integrity β you can't have an enrollment record for a student that doesn't exist.
What is a deadlock in databases? Two transactions each hold a lock the other needs. Both wait forever. Databases detect this by tracking lock dependency graphs and kill one transaction to break the cycle.
What is sharding? Horizontally partitioning data across multiple database servers β user IDs 1-10M on shard 1, 10M-20M on shard 2. Allows write scaling beyond a single machine.
What is replication? Copying data from a primary database to one or more replica databases. Reads go to replicas, writes go to primary. Increases read throughput and provides failover.
What is an ORM? Object-Relational Mapper β maps database rows to programming language objects. Prisma, SQLAlchemy, Hibernate. Convenient but can generate inefficient queries if not careful.
Practice: The best way to prepare for DBMS questions is hands-on SQL. Use SCS AI Tutor to quiz yourself: "Ask me DBMS interview questions and tell me when my answers are incomplete."
Ready to practice what you just learned?
Apply these concepts with AI-powered tools built for CS students.