Top 100 SQL Interview Questions

Top 100 SQL Interview Questions | Interview Questions

Top 100 SQL Interview Questions

Top 100 SQL Interview Questions

Basic SQL Questions

What is SQL?

Explain the SELECT statement.

  • The SELECT statement is used to retrieve data from a database. It can be used to retrieve specific columns or all columns from a table.

What is the difference between SQL and MySQL?

  • SQL is a language used for managing and querying databases, while MySQL is a relational database management system (RDBMS) that uses SQL.

Explain the WHERE clause in SQL.

  • The WHERE clause is used to filter records based on a specified condition.

What is a primary key?

  • A primary key is a unique identifier for each record in a table. It must contain unique values and cannot be null.

Intermediate SQL Questions

Explain the JOIN clause in SQL.

  • JOIN is used to combine rows from two or more tables based on a related column between them.

What is a foreign key?

  • A foreign key is a field that refers to the primary key in another table. It establishes a link between two tables.

What is normalization?

  • Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity.

Explain GROUP BY and HAVING clauses.

  • GROUP BY is used to group rows based on the values of one or more columns. HAVING is used to filter results after grouping.

What is an index?

  • An index is a data structure that improves the speed of data retrieval operations on a database table.

Explain the difference between INNER JOIN and LEFT JOIN.

  • INNER JOIN returns only matching rows from both tables, while LEFT JOIN returns all rows from the left table and the matching rows from the right table.

What is a subquery?

  • A subquery is a query nested inside another query. It can be used to retrieve data that will be used by the main query.

Explain the difference between UNION and UNION ALL.

  • UNION combines the result sets of two SELECT statements and removes duplicates, while UNION ALL includes all rows, including duplicates.

What is a stored procedure?

  • A stored procedure is a set of SQL statements that can be stored in the database and executed later as a single unit.

Explain the concept of ACID properties in a database.

  • ACID stands for Atomicity, Consistency, Isolation, and Durability, which are properties that guarantee database transactions are processed reliably.

Advanced SQL Questions

What is a trigger?

  • A trigger is a set of instructions that are automatically executed or fired when a certain event occurs in the database.

Explain the concept of a view.

  • A view is a virtual table based on the result of a SELECT statement. It does not store the data itself but provides a way to represent it.

What is the difference between DELETE and TRUNCATE?

  • DELETE is used to delete specific rows from a table based on a condition, while TRUNCATE is used to remove all rows from a table.

Explain the concept of a cursor in SQL.

  • A cursor is a database object used to process a result set one row at a time.

What is the purpose of the COMMIT and ROLLBACK statements?

  • COMMIT is used to save changes made during the current transaction, while ROLLBACK is used to undo changes made during the current transaction.

Explain the concept of a deadlock.

  • A deadlock occurs when two or more transactions are blocked indefinitely, each waiting for the other to release a lock.

What is the difference between a clustered and non-clustered index?

  • A clustered index determines the physical order of data rows in a table, while a non-clustered index does not.

Explain the concept of normalization forms.

  • Normalization forms are rules that define how to organize data in a relational database to reduce redundancy and improve data integrity.

What is the purpose of the CASE statement in SQL?

  • The CASE statement is used to perform conditional logic within a SQL query, similar to an IF-THEN-ELSE statement.

Explain the concept of a materialized view.

  • A materialized view is a database object that contains the results of a query and is stored as a physical table.

Write a SQL query to find the second highest salary from an Employee table.

SELECT MAX(salary) 
FROM Employee 
WHERE salary < (SELECT MAX(salary) FROM Employee);

Write a SQL query to count the number of rows in a table.

SELECT COUNT(*) 
FROM TableName;

Write a SQL query to find duplicate records in a table.

SELECT column1, column2, COUNT(*)
FROM TableName
GROUP BY column1, column2
HAVING COUNT(*) > 1;

Write a SQL query to retrieve the top N records from a table.

SELECT * 
FROM TableName 
LIMIT N;

Find the second-highest salary from an “employees” table.

SELECT MAX(salary) 
FROM employees 
WHERE salary < (SELECT MAX(salary) FROM employees);
  • Explanation: This query uses a subquery to find the highest salary and then retrieves the next highest salary by filtering salaries less than the maximum.

Calculate the running total of sales for each month in a “sales” table.

SELECT sale_date, 
       amount, 
       SUM(amount) OVER (ORDER BY sale_date) AS running_total
FROM sales;
  • Explanation: The query uses the window function SUM() OVER to calculate the running total of sales, ordered by the sale date.

Retrieve the top 3 most sold products from a “products” table.

SELECT product_id, 
       product_name, 
       sold_units
FROM (
    SELECT product_id, 
           product_name, 
           SUM(units_sold) AS sold_units
    FROM sales
    JOIN products ON sales.product_id = products.product_id
    GROUP BY product_id, product_name
    ORDER BY sold_units DESC
    LIMIT 3
) AS top_products;
  • Explanation: This query joins the “sales” and “products” tables, calculates the total sold units for each product, orders them in descending order, and selects the top 3.

Identify customers who have made at least three consecutive purchases.

SELECT customer_id, 
       MIN(sale_date) AS start_date, 
       MAX(sale_date) AS end_date,
       COUNT(*) AS consecutive_purchases
FROM (
    SELECT customer_id, 
           sale_date, 
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY sale_date) - 
           ROW_NUMBER() OVER (PARTITION BY customer_id, DATE_ADD(sale_date, INTERVAL 1 DAY) ORDER BY sale_date) AS grp
    FROM sales
) AS purchase_groups
GROUP BY customer_id, grp
HAVING consecutive_purchases >= 3;
  • Explanation: This query uses window functions to create groups of consecutive purchases and then identifies customers with at least three consecutive purchases.

Calculate the moving average of the “temperature” column for a weather table over a 7-day window.

SELECT date, 
       temperature, 
       AVG(temperature) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg
FROM weather;

Explanation: The query calculates the moving average of the “temperature” column using the window function AVG() OVER with a specified window of the last 7 days.

Update the “rank” column in a “students” table based on their scores, assigning the same rank to students with equal scores.

UPDATE students
SET rank = (
    SELECT COUNT(DISTINCT score) + 1
    FROM students s2
    WHERE s2.score > students.score
);
  • Explanation: This query uses a correlated subquery to update the “rank” column based on the count of distinct scores greater than the current student’s score.

Pivot the “sales” table to show total sales for each product in each month.

SELECT product_id,
       product_name,
       COALESCE(SUM(CASE WHEN MONTH(sale_date) = 1 THEN amount END), 0) AS january_sales,
       COALESCE(SUM(CASE WHEN MONTH(sale_date) = 2 THEN amount END), 0) AS february_sales,
       -- Repeat for other months
       COALESCE(SUM(CASE WHEN MONTH(sale_date) = 12 THEN amount END), 0) AS december_sales
FROM sales
JOIN products ON sales.product_id = products.product_id
GROUP BY product_id, product_name;

This query uses conditional aggregation (CASE statements) to pivot the “sales” table and display total sales for each product in each month.

Identify the longest consecutive sequence of login dates for each user in a “logins” table.

SELECT user_id,
       MIN(login_date) AS start_date,
       MAX(login_date) AS end_date,
       COUNT(*) AS consecutive_days
FROM (
    SELECT user_id,
           login_date,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) -
           ROW_NUMBER() OVER (PARTITION BY user_id, DATE_ADD(login_date, INTERVAL 1 DAY) ORDER BY login_date) AS grp
    FROM logins
) AS login_groups
GROUP BY user_id, grp
ORDER BY consecutive_days DESC
LIMIT 1;
  • Explanation: This query identifies the longest consecutive sequence of login dates for each user by creating groups of consecutive logins using window functions and then selecting the maximum consecutive days.

SQL Query Questions

What is an SQL query?

  • Explanation: An SQL query is a statement used to retrieve, insert, update, or delete data from a database. It is written using the SQL language and can be executed against a relational database management system (RDBMS).

Explain the difference between SQL and T-SQL.

  • Explanation: SQL (Structured Query Language) is a standard language for managing and manipulating relational databases. T-SQL (Transact-SQL) is Microsoft’s proprietary extension of SQL used with SQL Server and Azure SQL Database.

How do you retrieve all columns from a table named “employees”?

  • Explanation: Use the SELECT * FROM employees; query to retrieve all columns from the “employees” table.

What is the purpose of the WHERE clause in a SQL query?

  • Explanation: The WHERE clause is used to filter the rows returned by a query based on a specified condition. It is used in SELECT, UPDATE, and DELETE statements.

Write a SQL query to find the total number of rows in a table named “orders”.

  • Explanation: SELECT COUNT(*) FROM orders;

Explain the difference between INNER JOIN and LEFT JOIN.

  • Explanation: INNER JOIN returns only matching rows from both tables, while LEFT JOIN returns all rows from the left table and the matching rows from the right table.

Write a SQL query to retrieve unique values from a column named “category” in a table named “products”.

  • Explanation: SELECT DISTINCT category FROM products;

What is the purpose of the GROUP BY clause in SQL?

  • Explanation: GROUP BY is used to group rows based on the values of one or more columns. It is often used with aggregate functions like COUNT, SUM, AVG, etc.

Explain the HAVING clause in SQL.

  • Explanation: HAVING is used in combination with the GROUP BY clause to filter the results of aggregate functions based on a specified condition.

Write a SQL query to find the highest salary from an “employees” table.

  • Explanation: SELECT MAX(salary) FROM employees;

What is the purpose of the ORDER BY clause in SQL?

  • Explanation: ORDER BY is used to sort the result set of a query in ascending or descending order based on one or more columns.

Write a SQL query to update the “status” column to ‘Approved’ in a “orders” table where the “amount” is greater than 1000.

  • Explanation: UPDATE orders SET status = 'Approved' WHERE amount > 1000;

Explain the concept of a subquery.

  • Explanation: A subquery is a query nested inside another query. It can be used to retrieve data that will be used by the main query.

Write a SQL query to find the average salary for each department in a table named “employees”.

  • Explanation: SELECT department, AVG(salary) FROM employees GROUP BY department;

What is the purpose of the LIMIT clause in SQL?

  • Explanation: The LIMIT clause is used to restrict the number of rows returned by a query. It is often used for pagination.

Write a SQL query to delete all records from a table named “customers” where the “last_purchase_date” is older than a year.

  • Explanation: DELETE FROM customers WHERE last_purchase_date < NOW() - INTERVAL 1 YEAR;

Explain the concept of a self-join.

  • Explanation: A self-join occurs when a table is joined with itself. It is used to combine rows in a table with related rows in the same table.

Write a SQL query to find the top 5 highest-paid employees from an “employees” table.

  • Explanation: SELECT * FROM employees ORDER BY salary DESC LIMIT 5;

What is the purpose of the DISTINCT keyword in a SELECT statement?

  • Explanation: DISTINCT is used to retrieve unique values from a specified column in the result set of a query.

Write a SQL query to calculate the total sales for each month in a table named “sales”.

  • Explanation: SELECT MONTH(sale_date) AS month, SUM(amount) AS total_sales FROM sales GROUP BY MONTH(sale_date);

FQAs

Window Functions:

Q: What are window functions in SQL, and when would you use them?

  • A: Window functions perform calculations across a specified range of rows related to the current row. They are used for tasks like ranking, moving averages, and cumulative sums.

Q: Explain the difference between the ROW_NUMBER() and RANK() functions.

  • A: ROW_NUMBER() assigns a unique number to each row, while RANK() assigns a rank, with ties receiving the same rank and leaving gaps.

Q: How can you use the LAG() and LEAD() functions in a query?

  • A: LAG() accesses data from a previous row, and LEAD() accesses data from a subsequent row, both relative to the current row.

Common Table Expressions (CTEs):

Q: What is a Common Table Expression (CTE), and how is it different from a subquery?

  • A: A CTE is a named temporary result set defined within a SELECT, INSERT, UPDATE, or DELETE statement. It enhances readability and can be referenced multiple times.

Q: Provide an example where using a CTE improves the readability and performance of a query.

  • A: Using a CTE to calculate aggregates or recursive queries can enhance readability and simplify complex logic.

Stored Procedures and Functions:

Q: Compare and contrast stored procedures and functions. When would you prefer one over the other?

  • A: Both are reusable blocks of code, but stored procedures do not have a return value, while functions return a value. Use stored procedures for actions, functions for calculations.

Q: How do you handle errors within a stored procedure?

  • A: Use a combination of TRY, CATCH blocks to handle errors gracefully within a stored procedure.

Indexes and Query Optimization:

Q: Explain the importance of indexes in a database. When should you create composite indexes?

  • A: Indexes improve query performance by allowing the database engine to quickly locate rows. Composite indexes are beneficial when queries involve multiple columns.

Q: Discuss the concept of covering indexes and their impact on query performance.

  • A: Covering indexes include all columns needed for a query, eliminating the need to access the actual table and improving performance.

Transactions and Concurrency Control:

Q: What is a transaction in the context of a relational database?

  • A: A transaction is a sequence of one or more SQL statements treated as a single unit, ensuring data consistency and integrity.

Q: How does isolation level affect the behavior of transactions in a multi-user environment?

  • A: Isolation level defines the degree to which one transaction must be isolated from the effects of other concurrent transactions, impacting consistency and performance.

Advanced Joins:

Q: Explain the differences between INNER JOIN, LEFT JOIN, and RIGHT JOIN. Provide examples of when to use each.

  • A: INNER JOIN returns matching rows, LEFT JOIN returns all from the left table and matching from the right, and RIGHT JOIN is the opposite. Choose based on data requirements.

Q: How can you achieve the same result as a FULL JOIN using other join types?

  • A: You can simulate a FULL JOIN using a combination of LEFT JOIN and RIGHT JOIN with UNION.

Materialized Views:

Q: What is a materialized view, and how does it differ from a regular view?

  • A: A materialized view is a physical copy of the result set of a query, stored for faster retrieval. It differs from a regular view, which is a virtual table.

Q: In what scenarios would you choose to use a materialized view?

  • A: Materialized views are useful for complex queries with large datasets where precomputing and storing results can significantly improve performance.

Recursive Queries:

Q: Describe a scenario where a recursive query is necessary. Provide an example.

  • A: Recursive queries are useful for handling hierarchical data, such as organizational charts or category hierarchies.

Q: How does a recursive common table expression work?

  • A: A recursive CTE refers to itself in the SELECT statement, allowing iterative processing until a specified condition is met.

Database Security:

Q: What are some best practices for securing a database?

  • A: Best practices include using strong passwords, limiting access with roles and permissions, encrypting sensitive data, and regularly updating and patching the database system.

Q: Explain the concept of role-based access control (RBAC) in the context of database security.

  • A: RBAC assigns permissions to roles, and users are then assigned to roles. This simplifies permission management and ensures consistent access control.

Advanced Subqueries:

Q: How do correlated subqueries differ from non-correlated subqueries?

  • A: Correlated subqueries refer to columns in the outer query, while non-correlated subqueries can run independently of the outer query.

Q: Provide an example where a subquery is more appropriate than a join.

  • A: Subqueries are often more suitable when the inner query result depends on the outer query.

JSON in SQL:

Q: How does SQL handle JSON data? What functions can you use to manipulate JSON?

  • A: SQL has functions like JSON_VALUE, JSON_QUERY, and JSON_MODIFY to extract and manipulate data within JSON documents.

Q: Demonstrate how to extract specific information from a JSON column.

  • A: Using JSON_VALUE or JSON_QUERY functions along with the path to the desired data.

Temporal Tables:

Q: Explain the purpose of temporal tables. How do they simplify historical data tracking?

  • A: Temporal tables track changes to data over time, making it easy to retrieve historical versions of records.

Q: Provide a query that retrieves data as of a specific point in time using temporal tables.

  • A: Use the AS OF SYSTEM TIME clause to query temporal tables at a specific timestamp.
Three-Schema Architecture of DBMS

What is three scheme architecture of DBMS. Explain this architecture with the help of diagram. Also explain physical data independence and logical data independence with the help of example and structure.

Three-Schema Architecture of DBMS

The Three-Schema Architecture is a database architecture proposed by the ANSI/X3/SPARC committee that provides a clear separation between the user applications and the database system. It consists of three levels or schemas: the External Schema, the Conceptual Schema, and the Internal Schema.

1. External Schema:

The External Schema represents the user view of the data. It defines how individual users or user groups perceive the organization of the data. Each user or application can have its own external schema, customized to meet its specific requirements. This level shields users from changes in the database structure, providing a level of abstraction.

2. Conceptual Schema:

The Conceptual Schema represents the overall logical structure of the entire database as seen by the database administrator. It is an abstraction that defines the relationships between different entities and the constraints on the data. Changes to the database structure at this level impact all users but are transparent to them since their external schemas remain unchanged.

3. Internal Schema:

The Internal Schema defines how the data is stored in the physical storage devices, such as hard drives. It includes details about data structures, file organization, indexing mechanisms, and access paths. Changes at this level are transparent to both the conceptual and external schemas, ensuring that modifications to the physical storage do not affect the logical structure or user views.

Three-Schema Architecture Diagram

Three-Schema Architecture of DBMS
  • User Level (External Schema): Represents various user views or applications (e.g., user A and user B), each with its own customized view of the data.
  • Logical Level (Conceptual Schema): Represents the logical structure of the entire database, including relationships between entities and constraints.
  • Physical Level (Internal Schema): Represents the physical storage details, such as file organization, indexing, and access paths.

Physical Data Independence

Physical data independence is one of the key advantages provided by the Three-Schema Architecture. It refers to the ability to modify the physical storage structure of the database without affecting the conceptual or external schemas. This independence ensures that changes made to improve performance or storage efficiency do not require alterations to how users perceive or interact with the data.

Example of Physical Data Independence

Let’s consider a scenario where the internal schema needs optimization for storage space. Suppose the data is initially stored in a table with a fixed-length format:

CREATE TABLE employees (
    employee_id INT,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    hire_date DATE
);

Later, to save storage space, the database administrator decides to compress the data and store it in a more space-efficient manner, perhaps using a different storage format or a different file organization.

-- Modified Internal Schema for Compression
CREATE TABLE employees_compressed (
    employee_id INT,
    full_name VARCHAR(100),
    hire_date DATE
);

Despite this internal change, the conceptual schema and external schemas remain unaffected. Users can still interact with the data using the same queries and applications as before, and the logical structure of the database hasn’t changed.

Logical Data Independence

Logical data independence is another critical aspect of the Three-Schema Architecture. It refers to the ability to modify the conceptual schema without affecting the external schemas or the applications built on top of them. This independence allows for changes in the logical organization of the data without disrupting user views.

Example of Logical Data Independence

Let’s consider an example where the company decides to add a new attribute, “email,” to the employees. Initially, the conceptual schema looks like this:

-- Initial Conceptual Schema
CREATE TABLE employees (
    employee_id INT,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    hire_date DATE
);

Now, the company wants to add an “email” attribute to the conceptual schema:

-- Modified Conceptual Schema
CREATE TABLE employees (
    employee_id INT,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    hire_date DATE,
    email VARCHAR(100)
);

This modification is transparent to the users operating at the external schema level. They can continue to use their applications without any changes because the external schema remains the same.

Conclusion

The Three-Schema Architecture provides a clear and organized approach to database design by separating user views (external schema), the logical structure of the database (conceptual schema), and the physical storage details (internal schema). Physical and logical data independence ensure flexibility and adaptability to changes in the storage structure or logical organization of the data without impacting users and applications. This architecture is foundational for building robust and scalable database systems.