How to get column names in a Python pandas data frame?

When working with databases or data frames, one of the most common terms you will hear is column name, also called attribute name. Column names give meaning to the values stored under each column, which makes them a big asset when trying to understand the data.

Introduction

With the importance of column names in mind, let’s retrieve them from a data frame using the pandas library. pandas offers several ways to fetch column names, and the upcoming sections cover each of them. Working with column names has many real-life use cases, especially in data science and machine learning.

Do read the pandas docs in general; the pandas.DataFrame.columns docs are the main focus of this blog.
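
The snippets in this post all operate on a data frame named df. Its construction is not shown here, so the following is only a minimal sketch of a compatible setup: the two column names match the sample outputs in this post, while the rows are assumed purely for illustration.

```python
import pandas as pd

# Illustrative data only: the column names match the sample outputs in this
# post, but the rows are assumed so that the example is runnable.
data = {
    "Web frameworks": ["Django", "Flask", "FastAPI"],
    "License": ["BSD-3-Clause", "BSD-3-Clause", "MIT"],
}

# Dictionary keys become the column names, in insertion order.
df = pd.DataFrame(data)
print(df)
```

Any data frame with these two columns would work equally well for the examples that follow.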

Different Methods

Now let’s look at the different ways the pandas library offers to fetch column names. They are as follows:

Using the columns attribute of a data frame object

This is the most straightforward approach for getting the column names of a data frame. We use the {dataframe_name}.columns attribute, which returns the column names as a pandas Index object. The command used is:

print(df.columns)

The output type is <class 'pandas.core.indexes.base.Index'>. An Index can be iterated and indexed directly, but it is not a plain Python list, so the other methods below are useful when a list or an array is needed.

## Sample output
Index(['Web frameworks', 'License'], dtype='object')

Note that a small addition to the above command proves very useful. Let’s see the difference in action:

print(df.columns.values)

With the object-dtype columns shown above, this results in a NumPy array of the column names rather than a Python list, which is why the printed output has no commas between the names.

## Sample output
['Web frameworks' 'License']
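
To make the difference between these results more concrete, here is a small sketch that inspects both objects. The one-row data frame is an assumed stand-in for df; only the column names come from this post.

```python
import pandas as pd

# Assumed stand-in for df: same column names, illustrative row.
df = pd.DataFrame({"Web frameworks": ["Django"], "License": ["BSD-3-Clause"]})

cols = df.columns          # pandas Index
arr = df.columns.values    # array of the same labels

print(type(cols).__name__)
print(type(arr).__name__)

# The Index itself already supports the usual sequence operations.
first = cols[0]                  # positional access
has_license = "License" in cols  # membership test
n = len(cols)                    # number of columns
print(first, has_license, n)
```

In other words, a conversion is only needed when some other code specifically expects a list or an array.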

Iterating over the columns

This is a simple approach that uses a loop such as for or while. It increases the flexibility of using the column names in whatever way we need. It is very useful in data science, especially in the data preprocessing step of separating categorical and numerical columns. The code to perform the task is:

for col in df.columns:
    print(col)

The loop visits each column name in order. This pattern proves useful in later steps where each column needs to be handled individually. The output is shown below:

## Sample output
Web frameworks
License
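
The preprocessing use case mentioned above, separating categorical and numerical columns, could be sketched as follows. This is one possible approach rather than the only one; the "Release year" column is a hypothetical addition so that the example has a numeric column to find.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# "Release year" is a hypothetical extra column added for this example.
df = pd.DataFrame({
    "Web frameworks": ["Django", "Flask"],
    "License": ["BSD-3-Clause", "BSD-3-Clause"],
    "Release year": [2005, 2010],
})

numeric_cols = []
categorical_cols = []

# Sort each column name into a bucket based on the dtype of its values.
for col in df.columns:
    if is_numeric_dtype(df[col]):
        numeric_cols.append(col)
    else:
        categorical_cols.append(col)

print(numeric_cols)
print(categorical_cols)
```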

Converting the columns attribute to a list

We use the columns attribute to get the column names and explicitly convert them to a list with list(). Its usage is as follows:

print(list(df.columns))

The output of the above code execution is shown below:

## Sample Output
['Web frameworks', 'License']

Using the keys() method

This is one of the shortest commands. We just call this built-in data frame method to get the column names. The implementation is shown below.

print(df.keys())

This yields the same output as the first method. The keys() name is better known from the dictionary data type, where it is widely used.

## Sample output
Index(['Web frameworks', 'License'], dtype='object')

It is not preferred when dealing with data frames, even though they are structured like dictionaries.
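
The dictionary analogy can be checked with a short sketch. The mapping below is assumed sample data; the point is only that keys() on the data frame and keys() on the dictionary it was built from give the same labels.

```python
import pandas as pd

# Assumed sample data with the column names used in this post.
mapping = {
    "Web frameworks": ["Django", "Flask"],
    "License": ["BSD-3-Clause", "BSD-3-Clause"],
}
df = pd.DataFrame(mapping)

# keys() on a data frame returns the same labels as the columns attribute.
same_labels = df.keys().equals(df.columns)

# The labels also match the keys of the source dictionary.
dict_keys = list(mapping.keys())
frame_keys = list(df.keys())
print(same_labels, dict_keys == frame_keys)
```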

Using the tolist() method to list column names

This provides an explicit way of converting the column names to a list. tolist() can be called on the Index itself or on the array returned by values. Its syntax is as follows:

print(df.columns.values.tolist())

This form is rarely used in practice because the syntax is lengthy; df.columns.tolist() gives the same result more concisely. Long chains like this also make code harder to follow in large codebases. The output for the above code snippet is:

## Sample output
['Web frameworks', 'License']
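
As a quick comparison, the sketch below checks that the different list conversions agree. The data frame is an assumed stand-in for df with the same column names.

```python
import pandas as pd

# Assumed stand-in for df: same column names, illustrative row.
df = pd.DataFrame({"Web frameworks": ["Django"], "License": ["BSD-3-Clause"]})

via_list = list(df.columns)               # built-in list()
via_tolist = df.columns.tolist()          # Index.tolist()
via_to_list = df.columns.to_list()        # alias of tolist()
via_values = df.columns.values.tolist()   # array.tolist()

# All four produce the same plain Python list of column names.
all_equal = via_list == via_tolist == via_to_list == via_values
print(via_list, all_equal)
```

Which spelling to use is mostly a matter of taste; the shorter forms are usually easier to read.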

Using the sorted() function

Last but not least, we can use the built-in sorted() function to get the column names as a list sorted in ascending order. Its implementation is as follows:

print(sorted(df))

It returns a list and the output is shown below:

## Sample output
['License', 'Web frameworks']
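
A small sketch of how sorted() behaves here, using an assumed stand-in for df: iterating over a data frame yields its column names, sorted() returns a new list, and the data frame keeps its original column order.

```python
import pandas as pd

# Assumed stand-in for df: same column names, illustrative row.
df = pd.DataFrame({"Web frameworks": ["Django"], "License": ["BSD-3-Clause"]})

ascending = sorted(df)                         # iterates over column names
descending = sorted(df.columns, reverse=True)  # same names, reversed order
original_order = list(df.columns)              # the data frame is unchanged

print(ascending)
print(descending)
print(original_order)
```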

Conclusion

All the methods listed above help us fetch the column names. The main difference lies in how we fetch the names and the type of the final output: a pandas Index, an array, a list, or individual strings. We can also convert from one type to another as needed. Keep the syntax of each approach in mind; when in doubt, go through the documentation of the respective method, which gives a clear overview of its functionality.

This is the link to the codedamn playground of all the above methods in individual files and the main file containing the code to create the data frame.

Frequently Asked Questions (FAQs)

What are the column names in a Python pandas data frame?

https://nodegoat.net/guide.p/448.m/22/upload-a-csv-file — CSV File / Data frame

In the data frame from the source above, the first row, highlighted in dark green, holds the column names. Here they are Family Name, Given Name, and VIAF ID.

How can I get the column names in Python Pandas dataframes?

We have six different methods to get the column names using the pandas library in Python. They are:

Iterative method
Using data frame objects
Using keys() function
Using the column values
Using tolist() method
Using sorted() method

They are discussed above in detail.

Do Python Pandas data frames have column names?

Yes, Python pandas data frames have column names. Duplicate column names are allowed by default; to disallow them, use the .set_flags(allows_duplicate_labels=False) method, which raises an error if the data frame contains duplicate labels. The implementation is as follows:

df = pd.DataFrame(data).set_flags(allows_duplicate_labels=False)
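
To see what this flag does in practice, here is a minimal sketch with a deliberately duplicated column name. The data frame dup is made up for the example, and set_flags is assumed to be available, which requires pandas 1.2 or later.

```python
import pandas as pd

# Made-up frame with the same column name used twice.
dup = pd.DataFrame([[1, 2]], columns=["License", "License"])

# duplicated() marks every repeat of an earlier label.
has_duplicates = bool(dup.columns.duplicated().any())

# Disallowing duplicates on a frame that already has them raises an error.
try:
    dup.set_flags(allows_duplicate_labels=False)
    raised = False
except pd.errors.DuplicateLabelError:
    raised = True

print(has_duplicates, raised)
```

Checking with duplicated() first is a gentler option when an exception is not wanted.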