DISTINCT Keyword in SQL
Structured Query Language (SQL) is a ubiquitous and powerful language used to manipulate and retrieve data stored in relational databases. Among the multitude of functionalities SQL provides is the ability to filter data based on unique values or remove duplicate records, a feature that comes in handy while dealing with large data sets where duplicate entries are likely. The DISTINCT keyword in SQL is the key tool in our toolbox to achieve this functionality. In this blog post, we will delve deep into understanding the DISTINCT keyword and explore how to effectively use it in your SQL queries.
Understanding the DISTINCT Keyword
In SQL, DISTINCT is a keyword used in the SELECT statement to eliminate duplicate records and fetch only unique records. When the SELECT DISTINCT statement is used, the returned result set will only include unique rows. This feature is extremely helpful when you are interested in unique values in a certain column or set of columns.
Here is the basic syntax for using the DISTINCT keyword:
SELECT DISTINCT column_name1, column_name2, ..., column_nameN FROM table_name;
This will return all the unique combinations of column_name1, column_name2, …, column_nameN from the table specified.
Using DISTINCT on a Single Column
Suppose we have a customers table as shown below:
| CustomerID | CustomerName | Country |
|---|---|---|
| 1 | Alfreds Futterkiste | Germany |
| 2 | Ana Trujillo Emparedados | Mexico |
| 3 | Antonio Moreno Taquería | Mexico |
| 4 | Around the Horn | UK |
| 5 | Berglunds snabbköp | Sweden |
And we want to get a list of all unique countries in the Country column. The SQL query would be:
SELECT DISTINCT Country FROM customers;
The output will be:
| Country |
|---|
| Germany |
| Mexico |
| UK |
| Sweden |
Using DISTINCT on Multiple Columns
In the case where you need to select unique combinations of multiple columns, you can include all the column names in your SELECT DISTINCT statement.
For instance, consider the following orders table:
| OrderID | CustomerID | OrderDate |
|---|---|---|
| 1 | 3 | 2023-01-01 |
| 2 | 1 | 2023-02-01 |
| 3 | 1 | 2023-02-01 |
| 4 | 2 | 2023-03-01 |
| 5 | 5 | 2023-03-01 |
If you want to select the unique combinations of CustomerID and OrderDate, your SQL query would look like this:
SELECT DISTINCT CustomerID, OrderDate FROM orders;
The output will be:
| CustomerID | OrderDate |
|---|---|
| 3 | 2023-01-01 |
| 1 | 2023-02-01 |
| 2 | 2023-03-01 |
| 5 | 2023-03-01 |
Using DISTINCT with Aggregate Functions
The DISTINCT keyword can also be used in conjunction with SQL aggregate functions like COUNT, SUM, AVG, etc. This can be extremely useful for getting the count of distinct values in a column or computing the sum or average of unique values in a column.
Here is an example of using DISTINCT with the COUNT function. Suppose we want to find out how many unique countries our customers come from using the customers table from before:
SELECT COUNT(DISTINCT Country) ASDistinctCountries FROM customers;
This SQL statement will return the count of unique countries.
Understanding the Difference Between DISTINCT and GROUP BY
It is important to understand the difference between DISTINCT and GROUP BY as they can sometimes yield the same result but are not always interchangeable.
The DISTINCT keyword allows you to filter out duplicate rows, giving you a result set with unique values. GROUP BY, on the other hand, groups your results by specified column or columns. While this grouping may result in unique values, GROUP BY is typically used with aggregate functions (COUNT, SUM, AVG, etc.) to perform operations on each group of rows.
Let's illustrate this difference using an example:
Suppose we have a sales table:
| SalesPerson | Region | Sales |
|---|---|---|
| Alice | West | 500 |
| Bob | East | 300 |
| Alice | West | 200 |
| Bob | East | 500 |
| Charlie | North | 700 |
If we want to see the total sales per person, we use GROUP BY:
SELECT SalesPerson, SUM(Sales) as TotalSales FROM sales GROUP BY SalesPerson;
This will give us:
| SalesPerson | TotalSales |
|---|---|
| Alice | 700 |
| Bob | 800 |
| Charlie | 700 |
If we want to see unique combinations of SalesPerson and Region, we would use DISTINCT:
SELECT DISTINCT SalesPerson, Region FROM sales;
This will give us:
| SalesPerson | Region |
|---|---|
| Alice | West |
| Bob | East |
| Charlie | North |
Frequently Asked Questions (FAQs)
Q: Can I use the DISTINCT keyword with more than one column?
Yes, you can use DISTINCT with more than one column. The result will be the unique combinations of those columns.
Q: What is the difference between DISTINCT and GROUP BY?
While DISTINCT and GROUP BY may sometimes give the same result, they serve different purposes. DISTINCT is used to filter out duplicate rows, while GROUP BY is used to group the rows by certain column or columns, typically used with aggregate functions (COUNT, SUM, AVG, etc.).
Q: Can I use DISTINCT in a subquery?
Yes, DISTINCT can be used in a subquery. It will return the unique values from the column or columns specified in the subquery.
Q: How does DISTINCT handle NULL values?
In SQL, DISTINCT treats all NULL values as the same. Therefore, if multiple rows contain NULL values, DISTINCT will consider them as duplicates and will only return one NULL value.
Q: Can DISTINCT be used with ORDER BY?
Yes, DISTINCT can be used with ORDER BY. The query will first eliminate duplicate rows based on the DISTINCT clause, and then order the resulting unique rows based on the ORDER BY clause.
Understanding the power of the DISTINCT keyword in SQL opens up a host of possibilities in data manipulation and retrieval. It's an essential tool for any SQL developer, data analyst, or anyone working with data in a relational database environment. Make sure you practice using DISTINCT in your queries to gain proficiency and improve your SQL skills.
Sharing is caring
Did you like what Mehul Mohan wrote? Thank them for their work by sharing it on social media.
No comments so far
Curious about this topic? Continue your journey with these coding courses: