DISTINCT Keyword in SQL
Structured Query Language (SQL) is a ubiquitous and powerful language used to manipulate and retrieve data stored in relational databases. Among the multitude of functionalities SQL provides is the ability to filter data based on unique values or remove duplicate records, a feature that comes in handy while dealing with large data sets where duplicate entries are likely. The DISTINCT
keyword in SQL is the key tool in our toolbox to achieve this functionality. In this blog post, we will delve deep into understanding the DISTINCT
keyword and explore how to effectively use it in your SQL queries.
Understanding the DISTINCT Keyword
In SQL, DISTINCT
is a keyword used in the SELECT
statement to eliminate duplicate records and fetch only unique records. When the SELECT DISTINCT
statement is used, the returned result set will only include unique rows. This feature is extremely helpful when you are interested in unique values in a certain column or set of columns.
Here is the basic syntax for using the DISTINCT
keyword:
SELECT DISTINCT column_name1, column_name2, ..., column_nameN FROM table_name;
This will return all the unique combinations of column_name1, column_name2, …, column_nameN from the table specified.
Using DISTINCT on a Single Column
Suppose we have a customers
table as shown below:
CustomerID | CustomerName | Country |
---|---|---|
1 | Alfreds Futterkiste | Germany |
2 | Ana Trujillo Emparedados | Mexico |
3 | Antonio Moreno Taquería | Mexico |
4 | Around the Horn | UK |
5 | Berglunds snabbköp | Sweden |
And we want to get a list of all unique countries in the Country
column. The SQL query would be:
SELECT DISTINCT Country FROM customers;
The output will be:
Country |
---|
Germany |
Mexico |
UK |
Sweden |
Using DISTINCT on Multiple Columns
In the case where you need to select unique combinations of multiple columns, you can include all the column names in your SELECT DISTINCT
statement.
For instance, consider the following orders
table:
OrderID | CustomerID | OrderDate |
---|---|---|
1 | 3 | 2023-01-01 |
2 | 1 | 2023-02-01 |
3 | 1 | 2023-02-01 |
4 | 2 | 2023-03-01 |
5 | 5 | 2023-03-01 |
If you want to select the unique combinations of CustomerID
and OrderDate
, your SQL query would look like this:
SELECT DISTINCT CustomerID, OrderDate FROM orders;
The output will be:
CustomerID | OrderDate |
---|---|
3 | 2023-01-01 |
1 | 2023-02-01 |
2 | 2023-03-01 |
5 | 2023-03-01 |
Using DISTINCT with Aggregate Functions
The DISTINCT
keyword can also be used in conjunction with SQL aggregate functions like COUNT
, SUM
, AVG
, etc. This can be extremely useful for getting the count of distinct values in a column or computing the sum or average of unique values in a column.
Here is an example of using DISTINCT
with the COUNT
function. Suppose we want to find out how many unique countries our customers come from using the customers
table from before:
SELECT COUNT(DISTINCT Country) ASDistinctCountries FROM customers;
This SQL statement will return the count of unique countries.
Understanding the Difference Between DISTINCT and GROUP BY
It is important to understand the difference between DISTINCT
and GROUP BY
as they can sometimes yield the same result but are not always interchangeable.
The DISTINCT
keyword allows you to filter out duplicate rows, giving you a result set with unique values. GROUP BY
, on the other hand, groups your results by specified column or columns. While this grouping may result in unique values, GROUP BY
is typically used with aggregate functions (COUNT
, SUM
, AVG
, etc.) to perform operations on each group of rows.
Let's illustrate this difference using an example:
Suppose we have a sales
table:
SalesPerson | Region | Sales |
---|---|---|
Alice | West | 500 |
Bob | East | 300 |
Alice | West | 200 |
Bob | East | 500 |
Charlie | North | 700 |
If we want to see the total sales per person, we use GROUP BY
:
SELECT SalesPerson, SUM(Sales) as TotalSales FROM sales GROUP BY SalesPerson;
This will give us:
SalesPerson | TotalSales |
---|---|
Alice | 700 |
Bob | 800 |
Charlie | 700 |
If we want to see unique combinations of SalesPerson and Region, we would use DISTINCT
:
SELECT DISTINCT SalesPerson, Region FROM sales;
This will give us:
SalesPerson | Region |
---|---|
Alice | West |
Bob | East |
Charlie | North |
Frequently Asked Questions (FAQs)
Q: Can I use the DISTINCT
keyword with more than one column?
Yes, you can use DISTINCT
with more than one column. The result will be the unique combinations of those columns.
Q: What is the difference between DISTINCT
and GROUP BY
?
While DISTINCT
and GROUP BY
may sometimes give the same result, they serve different purposes. DISTINCT
is used to filter out duplicate rows, while GROUP BY
is used to group the rows by certain column or columns, typically used with aggregate functions (COUNT
, SUM
, AVG
, etc.).
Q: Can I use DISTINCT
in a subquery?
Yes, DISTINCT
can be used in a subquery. It will return the unique values from the column or columns specified in the subquery.
Q: How does DISTINCT
handle NULL values?
In SQL, DISTINCT
treats all NULL
values as the same. Therefore, if multiple rows contain NULL
values, DISTINCT
will consider them as duplicates and will only return one NULL
value.
Q: Can DISTINCT
be used with ORDER BY
?
Yes, DISTINCT
can be used with ORDER BY
. The query will first eliminate duplicate rows based on the DISTINCT
clause, and then order the resulting unique rows based on the ORDER BY
clause.
Understanding the power of the DISTINCT
keyword in SQL opens up a host of possibilities in data manipulation and retrieval. It's an essential tool for any SQL developer, data analyst, or anyone working with data in a relational database environment. Make sure you practice using DISTINCT
in your queries to gain proficiency and improve your SQL skills.
Sharing is caring
Did you like what Mehul Mohan wrote? Thank them for their work by sharing it on social media.
No comments so far
Curious about this topic? Continue your journey with these coding courses: