Complete guide to matplotlib scatter in Python (with examples)

Scatter plots play an essential role in data visualization, particularly when it comes to illustrating the relationship between two numerical variables. By representing data points on a two-dimensional graph, scatter plots help in identifying trends, clusters, and potential outliers within datasets. This makes them an invaluable tool for exploratory data analysis, allowing researchers and analysts to glean insights and draw conclusions about their data.


Matplotlib is a powerful plotting library in Python that offers a wide range of functionalities for creating static, animated, and interactive visualizations. Among its many features, scatter plots stand out as a fundamental aspect of data visualization, providing a simple yet effective way to visualize the relationship between two variables. The ability to customize these plots allows for the creation of highly informative and visually appealing graphics.

Getting Started with Matplotlib

To begin creating visualizations with Matplotlib, first make sure it is installed in your Python environment. Installation takes a single pip command, so it is accessible to beginners and experienced programmers alike.

Installation Process

Matplotlib can be installed using pip, Python’s package installer. Simply run the following command in your terminal or command prompt to install Matplotlib:

pip install matplotlib

This command downloads and installs the latest Matplotlib package along with its dependencies.
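To confirm the installation worked, one quick check is to import the package and print its version. This is a minimal sketch; the version number printed depends on your environment.

```python
import matplotlib

# If this import succeeds, Matplotlib is available in the current environment
print(matplotlib.__version__)
```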

Importing Necessary Libraries

Once Matplotlib is installed, you can start using it by importing it into your Python script. The module you will use most for plotting is matplotlib.pyplot, which is conventionally imported as follows:

import matplotlib.pyplot as plt

In addition to matplotlib.pyplot, you might need to import other libraries depending on your specific requirements, such as numpy for numerical operations:

import numpy as np

Basics of Scatter Plots

Scatter plots are a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data. The data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis. Scatter plots are widely used to observe and show relationships between two numeric variables.

Simple Scatter Plot Example

Creating your first scatter plot with Matplotlib is straightforward. Here’s how you can plot a simple scatter plot showing the relationship between two variables:

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# Creating scatter plot
plt.scatter(x, y)
plt.xlabel('X-axis label')
plt.ylabel('Y-axis label')
plt.title('Simple Scatter Plot')
plt.show()  # display the figure

This code snippet creates a basic scatter plot of the data points defined by x and y. The plt.xlabel and plt.ylabel functions label the x-axis and y-axis, respectively, while plt.title adds a title to the plot.
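The same plot can also be written with Matplotlib's object-oriented interface, where you work with explicit Figure and Axes objects instead of the implicit pyplot state. The sketch below reuses the sample data from above; the variable name points is only illustrative.

```python
import matplotlib.pyplot as plt

# Same sample data as the pyplot example
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# Create a figure with a single Axes, then draw on that Axes
fig, ax = plt.subplots()
points = ax.scatter(x, y)
ax.set_xlabel('X-axis label')
ax.set_ylabel('Y-axis label')
ax.set_title('Simple Scatter Plot')

# In a script, call plt.show() here to display the figure
```

This style tends to scale better once a figure contains several subplots, because each call names the Axes it applies to.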

Customizing Scatter Plots

Matplotlib allows for extensive customization of scatter plots to enhance their visual appeal and make them more informative.

Changing Marker Style, Color, and Size

You can customize the appearance of the markers in a scatter plot by changing their style, color, and size. Here’s an example:

plt.scatter(x, y, marker='o', color='red', s=100) # 's' sets the marker area in points squared
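Marker size does not have to be a single number. As a rough sketch, the example below passes a list of sizes so each point gets its own marker area, and adds transparency and a black outline. The sizes list is made up for illustration.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
sizes = [20, 60, 100, 140, 180]  # hypothetical per-point marker areas (points squared)

fig, ax = plt.subplots()
points = ax.scatter(
    x, y,
    s=sizes,             # one size per point
    marker='^',          # triangle markers
    color='red',         # fill color
    edgecolors='black',  # outline color
    alpha=0.6,           # partial transparency
)

# In a script, call plt.show() here to display the figure
```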

Using Colormap for a Set of Data Points

Applying a colormap to differentiate a set of data points based on a third variable can add another dimension of information to your scatter plot. Here’s how you can apply a colormap:

z = [10, 20, 30, 40, 50] # Example third variable, one value per point
plt.scatter(x, y, c=z, cmap='viridis') # 'c' is the array of values to color-code, 'cmap' specifies the colormap
plt.colorbar() # To show the color scale
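For a fuller picture, here is a self-contained sketch that colors randomly generated points by a third variable and labels the colorbar. The data and the choice of x + y as the third variable are assumptions made purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)  # fixed seed so the figure is reproducible
x = rng.uniform(0, 10, size=50)
y = rng.uniform(0, 10, size=50)
z = x + y  # hypothetical third variable, used only for coloring

fig, ax = plt.subplots()
points = ax.scatter(x, y, c=z, cmap='viridis')

# Attach the colorbar to the scatter collection and describe what it encodes
cbar = fig.colorbar(points, ax=ax)
cbar.set_label('x + y')

# In a script, call plt.show() here to display the figure
```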

Plotting Multiple Data Sets

Including multiple data sets in a single scatter plot allows for comparison and contrast between different data groups.

Adding Multiple Data Sets in One Scatter Plot

To plot multiple data sets in a single scatter plot and customize their appearance, you can simply call the plt.scatter function multiple times before calling plt.show():

# Second set of data
x2 = [2, 3, 4, 5, 6]
y2 = [5, 6, 8, 10, 13]

# Plotting both sets of data
plt.scatter(x, y, color='blue', label='Dataset 1')
plt.scatter(x2, y2, color='green', label='Dataset 2')
plt.legend()  # Draw the legend from the 'label' values above
plt.show()

This code plots two different data sets on the same scatter plot with different colors and includes a legend to differentiate between the two data sets.
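When there are more than a couple of groups, a loop keeps the code short. The following is a sketch under the assumption that the groups are stored in a dictionary mapping a name to its x and y values; the groups variable is hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical container: group name -> (x values, y values)
groups = {
    'Dataset 1': ([1, 2, 3, 4, 5], [2, 3, 5, 7, 11]),
    'Dataset 2': ([2, 3, 4, 5, 6], [5, 6, 8, 10, 13]),
}

fig, ax = plt.subplots()
for name, (xs, ys) in groups.items():
    # Each call gets the next color from the default color cycle
    ax.scatter(xs, ys, label=name)

legend = ax.legend()

# In a script, call plt.show() here to display the figure
```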

Incorporating a Third Dimension (3D Scatter Plots)

3D scatter plots add depth to your visualizations, allowing you to explore relationships between three variables. Matplotlib’s mpl_toolkits.mplot3d module enables these sophisticated visualizations. Here’s how to create a 3D scatter plot:

from mpl_toolkits.mplot3d import Axes3D  # Registers the 3D projection on older Matplotlib versions
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 9]
z = [9, 8, 7, 6, 5]

ax.scatter(x, y, z)
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
plt.show()

This example plots points in a 3D space, with each axis representing a different dimension (X, Y, Z). Customizing labels as shown enhances readability.
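Depth can be hard to judge in a static 3D figure, so one option is to also encode the z value as color. The sketch below assumes a reasonably recent Matplotlib, where projection='3d' works without importing Axes3D explicitly.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 9]
z = [9, 8, 7, 6, 5]

fig = plt.figure()
ax = fig.add_subplot(projection='3d')

# Color each point by its z value so height is readable from color as well
points = ax.scatter(x, y, z, c=z, cmap='viridis')
cbar = fig.colorbar(points, ax=ax)
cbar.set_label('Z value')

ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')

# In a script, call plt.show() here to display the figure
```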

Interactive Scatter Plots

Interactive scatter plots let users zoom, pan, and inspect data points closely, which helps with complex datasets. In a Jupyter Notebook, the %matplotlib notebook magic switches Matplotlib to an interactive backend; in JupyterLab and recent Notebook releases, %matplotlib widget (provided by the ipympl package) is the usual equivalent. For web applications, libraries like Plotly and Bokeh can be used, but for simplicity, we’ll focus on Matplotlib’s capabilities.

%matplotlib notebook
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 9]

fig, ax = plt.subplots()
sc = ax.scatter(x, y)


Running this in a Jupyter Notebook renders an interactive plot, allowing zooming and panning.
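Beyond zooming and panning, Matplotlib's event system can react to clicks on individual points. The following sketch registers a pick_event handler that prints the clicked point; it only has a visible effect with an interactive backend, and the handler name on_pick is an arbitrary choice.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 9]

fig, ax = plt.subplots()
# picker=True makes the scatter points respond to mouse clicks
points = ax.scatter(x, y, picker=True)

def on_pick(event):
    # event.ind lists the indices of the points under the cursor
    for i in event.ind:
        print(f'Picked point {i}: ({x[i]}, {y[i]})')

# Register the handler; the returned id can be used to disconnect it later
callback_id = fig.canvas.mpl_connect('pick_event', on_pick)

# In a script, call plt.show() here to open the interactive window
```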

Adding Annotations and Labels

Annotations and labels turn basic scatter plots into informative visualizations.

How to Add Text Labels to Individual Data Points

Adding text labels to data points can highlight significant information:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 9]

fig, ax = plt.subplots()
sc = ax.scatter(x, y)

# Label each point with its index
for i, (xi, yi) in enumerate(zip(x, y)):
    ax.annotate(str(i), (xi, yi))

This labels each point with its index, aiding in data point identification.
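Labels drawn exactly at a point's coordinates tend to overlap the marker. One way around this, sketched below, is to offset each label by a few points using the xytext and textcoords arguments of annotate. The labels list is hypothetical.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 9]
labels = ['A', 'B', 'C', 'D', 'E']  # hypothetical names for the points

fig, ax = plt.subplots()
ax.scatter(x, y)

annotations = []
for label, xi, yi in zip(labels, x, y):
    # Place the text 5 points right of and 5 points above the marker
    ann = ax.annotate(label, (xi, yi), xytext=(5, 5), textcoords='offset points')
    annotations.append(ann)

# In a script, call plt.show() here to display the figure
```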

Customizing Axes Labels and Plot Title

Enhancing readability and aesthetics is crucial for effective communication:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 9]

plt.scatter(x, y)
plt.title('My Scatter Plot')
plt.xlabel('X Axis Label')
plt.ylabel('Y Axis Label')

This code snippet adds a title and custom axes labels, significantly improving the plot’s readability.

Analyzing Real-world Data with Scatter Plots

Scatter plots become powerful when applied to real-world data, revealing insights and trends.

Selecting a Real-world Dataset

Datasets abound, but some reputable sources include Kaggle, UCI Machine Learning Repository, and government databases. Choose datasets that interest you and are relevant to your questions.

Loading Data with Pandas and Visualizing with Scatter Plots

Pandas, a data manipulation library, works seamlessly with Matplotlib:

import pandas as pd
import matplotlib.pyplot as plt

# Loading dataset
data = pd.read_csv('path/to/your/dataset.csv')

# Visualizing
plt.scatter(data['Column1'], data['Column2'])

This simple workflow can uncover complex relationships and patterns in your data.
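To try the workflow without a CSV file on hand, the sketch below builds a small DataFrame in memory, drops rows with missing values, and plots two columns with pandas' own scatter helper. The column names and values are invented for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data standing in for a CSV file; one height is missing
data = pd.DataFrame({
    'height_cm': [150, 160, None, 180, 170],
    'weight_kg': [50, 58, 65, 80, 68],
})

# Remove rows where either plotted column is missing
data = data.dropna(subset=['height_cm', 'weight_kg'])

# DataFrame.plot.scatter draws with Matplotlib and returns the Axes
ax = data.plot.scatter(x='height_cm', y='weight_kg')
ax.set_title('Height vs. weight')

# In a script, call plt.show() here to display the figure
```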

Best Practices for Scatter Plots

  • Keep it simple; avoid cluttering.
  • Use colors and markers effectively to differentiate data points or groups.
  • Ensure your plot is accessible by adding labels and annotations where necessary.
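As one illustration of the first point, dense data can be made less cluttered by shrinking the markers and adding transparency, so overlapping regions appear darker. The data here is randomly generated and the specific size and alpha values are just a starting point.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)  # fixed seed so the figure is reproducible
x = rng.normal(size=2000)
y = x + rng.normal(scale=0.5, size=2000)  # hypothetical correlated variable

fig, ax = plt.subplots()
# Small, semi-transparent markers reduce overplotting in dense regions
points = ax.scatter(x, y, s=8, alpha=0.3)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Dense data with transparency')

# In a script, call plt.show() here to display the figure
```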

Written by Vishnupriya.

