DISTINCT Keyword in SQL

Structured Query Language (SQL) is a ubiquitous and powerful language used to manipulate and retrieve data stored in relational databases. Among the multitude of functionalities SQL provides is the ability to filter data based on unique values or remove duplicate records, a feature that comes in handy while dealing with large data sets where duplicate entries are likely. The DISTINCT keyword in SQL is the key tool in our toolbox to achieve this functionality. In this blog post, we will delve deep into understanding the DISTINCT keyword and explore how to effectively use it in your SQL queries.

Understanding the DISTINCT Keyword

In SQL, DISTINCT is a keyword used in the SELECT statement to eliminate duplicate records and fetch only unique records. When the SELECT DISTINCT statement is used, the returned result set will only include unique rows. This feature is extremely helpful when you are interested in unique values in a certain column or set of columns.

Here is the basic syntax for using the DISTINCT keyword:

SELECT DISTINCT column_name1, column_name2, ..., column_nameN 
FROM table_name;

This will return all the unique combinations of column_name1, column_name2, …, column_nameN from the table specified.

Using DISTINCT on a Single Column

Suppose we have a customers table as shown below:

CustomerID	CustomerName	Country
1	Alfreds Futterkiste	Germany
2	Ana Trujillo Emparedados	Mexico
3	Antonio Moreno Taquería	Mexico
4	Around the Horn	UK
5	Berglunds snabbköp	Sweden

And we want to get a list of all unique countries in the Country column. The SQL query would be:

SELECT DISTINCT Country FROM customers;

The output will be:

Country
Germany
Mexico
UK
Sweden

Using DISTINCT on Multiple Columns

In the case where you need to select unique combinations of multiple columns, you can include all the column names in your SELECT DISTINCT statement.

For instance, consider the following orders table:

OrderID	CustomerID	OrderDate
1	3	2023-01-01
2	1	2023-02-01
3	1	2023-02-01
4	2	2023-03-01
5	5	2023-03-01

If you want to select the unique combinations of CustomerID and OrderDate, your SQL query would look like this:

SELECT DISTINCT CustomerID, OrderDate FROM orders;

The output will be:

CustomerID	OrderDate
3	2023-01-01
1	2023-02-01
2	2023-03-01
5	2023-03-01

Using DISTINCT with Aggregate Functions

The DISTINCT keyword can also be used in conjunction with SQL aggregate functions like COUNT, SUM, AVG, etc. This can be extremely useful for getting the count of distinct values in a column or computing the sum or average of unique values in a column.

Here is an example of using DISTINCT with the COUNT function. Suppose we want to find out how many unique countries our customers come from using the customers table from before:

SELECT COUNT(DISTINCT Country) ASDistinctCountries FROM customers;

This SQL statement will return the count of unique countries.

Understanding the Difference Between DISTINCT and GROUP BY

It is important to understand the difference between DISTINCT and GROUP BY as they can sometimes yield the same result but are not always interchangeable.

The DISTINCT keyword allows you to filter out duplicate rows, giving you a result set with unique values. GROUP BY, on the other hand, groups your results by specified column or columns. While this grouping may result in unique values, GROUP BY is typically used with aggregate functions (COUNT, SUM, AVG, etc.) to perform operations on each group of rows.

Let's illustrate this difference using an example:

Suppose we have a sales table:

SalesPerson	Region	Sales
Alice	West	500
Bob	East	300
Alice	West	200
Bob	East	500
Charlie	North	700

If we want to see the total sales per person, we use GROUP BY:

SELECT SalesPerson, SUM(Sales) as TotalSales
FROM sales
GROUP BY SalesPerson;

This will give us:

SalesPerson	TotalSales
Alice	700
Bob	800
Charlie	700

If we want to see unique combinations of SalesPerson and Region, we would use DISTINCT:

SELECT DISTINCT SalesPerson, Region
FROM sales;

This will give us:

SalesPerson	Region
Alice	West
Bob	East
Charlie	North

Frequently Asked Questions (FAQs)

Q: Can I use the DISTINCT keyword with more than one column?

Yes, you can use DISTINCT with more than one column. The result will be the unique combinations of those columns.

Q: What is the difference between DISTINCT and GROUP BY?

While DISTINCT and GROUP BY may sometimes give the same result, they serve different purposes. DISTINCT is used to filter out duplicate rows, while GROUP BY is used to group the rows by certain column or columns, typically used with aggregate functions (COUNT, SUM, AVG, etc.).

Q: Can I use DISTINCT in a subquery?

Yes, DISTINCT can be used in a subquery. It will return the unique values from the column or columns specified in the subquery.

Q: How does DISTINCT handle NULL values?

In SQL, DISTINCT treats all NULL values as the same. Therefore, if multiple rows contain NULL values, DISTINCT will consider them as duplicates and will only return one NULL value.

Q: Can DISTINCT be used with ORDER BY?

Yes, DISTINCT can be used with ORDER BY. The query will first eliminate duplicate rows based on the DISTINCT clause, and then order the resulting unique rows based on the ORDER BY clause.

Understanding the power of the DISTINCT keyword in SQL opens up a host of possibilities in data manipulation and retrieval. It's an essential tool for any SQL developer, data analyst, or anyone working with data in a relational database environment. Make sure you practice using DISTINCT in your queries to gain proficiency and improve your SQL skills.