What is Cython? How to use it? Making a small project with it
Python is best known for being beginner friendly and highly readable, which makes even complex projects approachable. It is also infamous for being slow: popular compiled languages such as C, C++, and Java usually run faster. Cython offers a way to speed up a Python project while keeping the code base nearly the same. This article discusses what Cython is and how to use it.
Why is Python slow?
Python is slow due to several factors and some are more complex than others. Two of the main factors are:
Python is an Interpreted Language
Unlike C, C++, and Java, Python is an interpreted language. C, C++, and Java are examples of compiled languages: the whole program is compiled ahead of time, and the compiled output is what gets executed.
In languages like Python and JavaScript, an “interpreter” runs the program statement by statement, in the order the statements appear in the file. Although this makes execution slower than in compiled languages, interpreted languages have their own advantages: programs are platform-independent, easy to debug, and generally have a much smaller code base.
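As a rough illustration of what the interpreter works through, the standard-library `dis` module can list the bytecode instructions that CPython executes for a function. This is only a minimal sketch: the function name `add` is an example, and the exact instruction names vary between Python versions.

```python
import dis

def add(a, b):
    """A tiny function whose bytecode we want to inspect."""
    return a + b

# Collect the names of the bytecode instructions CPython runs for add().
opnames = [instruction.opname for instruction in dis.get_instructions(add)]

for name in opnames:
    print(name)
```

Each of these instructions is dispatched by the interpreter at run time, which is part of the overhead that ahead-of-time compiled code avoids.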
Python is a dynamically typed language
In a dynamically typed language, a variable does not have a fixed data type. It accepts whatever value is assigned to it, and its type is simply the type of the value it currently holds. For example, if an integer is stored in a variable, the variable behaves as an integer. If a string or an entirely different object is later assigned to that same variable, it now refers to the new object, stored at a different memory location, and its data type changes accordingly.
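The behaviour described above can be sketched in a few lines. The variable name `value` is arbitrary; `type()` and `id()` are built-ins that report the current type and identity of the object a name refers to.

```python
value = 42
first_type = type(value).__name__    # the name currently refers to an int
first_id = id(value)

value = "forty-two"
second_type = type(value).__name__   # the same name now refers to a str
second_id = id(value)

print(first_type, second_type)
print("same object:", first_id == second_id)
```

The name stays the same, but it ends up pointing at a different object of a different type.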
Disadvantages of Python being dynamically typed
- The interpreter does all the hard work of allocating (assigning) and deallocating (unassigning) the memory as per the value and variable.
- Additional mechanisms keep a reference count for each memory block used in the program. When a memory block’s reference count hits zero, the program has no way to access that memory block again. Hence, the interpreter treats its contents as garbage and frees it.
- Once a certain allocation threshold is crossed, the interpreter also runs the garbage collector to remove values and objects in memory that are no longer in use.
- When using operators inside a program, the interpreter has to check the data types of all the incoming values. The operation is then decided according to the variables’ data types. The interpreter raises an exception when there is no available operation for that combination of data types.
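The reference counting mentioned in the list above can be observed in CPython with `sys.getrefcount`. The sketch below is specific to CPython, and the names `payload` and `aliases` are made up for the illustration. Only the differences between the counts are meaningful, because the call itself temporarily adds a reference.

```python
import sys

payload = ["some", "data"]            # hypothetical object to track
baseline = sys.getrefcount(payload)

aliases = [payload, payload]          # two more references to the same list
with_aliases = sys.getrefcount(payload)

del aliases                           # dropping the container releases both
after_release = sys.getrefcount(payload)

print(with_aliases - baseline, after_release - baseline)
```

The interpreter performs this bookkeeping for every object on every assignment and deletion, which is one source of the overhead.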
These checks and additional processes slow the execution of a Python program. Unless a variable is bound to a single data type, the interpreter has to keep re-checking its type and allocating memory for its new values. If this happened only a few times it would not be a problem, but inside a loop the overhead adds up and the whole operation takes a long time. This is where Cython helps.
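As a small sketch of the run-time type checks on operators, the hypothetical function `combine` below applies `+` to different operand types. The interpreter picks the operation from the types it finds at run time and raises an exception when none exists.

```python
def combine(a, b):
    """The meaning of + is decided at run time from the operand types."""
    return a + b

print(combine(2, 3))          # integer addition
print(combine("2", "3"))      # string concatenation
print(combine([2], [3]))      # list concatenation

try:
    combine(2, "3")           # no + is defined for int and str
except TypeError as error:
    mismatch = type(error).__name__
    print("raised", mismatch)
```

With static C types, as Cython allows, this decision is made once at compile time instead of on every call.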
What is Cython?
Cython is a programming language built on top of Python as a superset of it. It adds the static type system of C and C++, and its compiler translates Cython code into C or C++, which is then compiled into an extension module.
Note that Cython programs cannot be run directly the way normal Python programs are. Instead, a compiled Cython program acts as a library that other Python programs import.
Difference between Cython and CPython
There is a common misconception that Cython is the same as CPython. It is not.
- CPython: CPython is the default interpreter for Python 2 and Python 3, written in C. It is what normally runs Python programs.
- Cython: Cython is a separate language built as a superset of Python. It shares Python’s syntax, adds some keywords, and compiles with a static type system similar to that of C/C++.
Advantages of using Cython
Cython combines the best of both worlds: it gives developers the flexibility of Python and the speed and performance of C/C++. Apart from the performance boost, Cython has several other advantages:
- Cython code can use Python libraries as well as C libraries, calling the latter directly without going through Python.
- Cython libraries use the same garbage collector as that of Python.
- Cython automatically checks for runtime errors that can arise in C.
- Cython uses the Global Interpreter Lock of Python.
- Cython is more secure than Python. Software modules and libraries that require more protection and safety against popular vulnerabilities use Cython.
Many Python libraries, such as NumPy and spaCy, use Cython under the hood for parts of their code instead of pure Python to boost performance.
How to use Cython?
Cython is installable as a library through the “pip” package manager. To install Cython, you need Python and pip installed on your computer. Compiling Cython code also requires a C compiler such as GCC, Clang, or MSVC.
Install it using this command:
pip install Cython
A Cython program uses the “.pyx” file extension instead of the regular “.py” extension of Python programs. Compile the Cython program using the command:
cythonize -i [file_name].pyx
The “-i” switch builds the program “in place”: the compiled extension module is placed next to the source file, so it can be imported directly from that directory.
Beginners also use this command:
cythonize -a -i [file_name].pyx
The “-a” switch generates an “annotation” file, which is an HTML file. It shows the Python/Cython source with the generated C code underneath each line, and highlights the lines that interact with the Python interpreter.
Another way to compile the code is to use a setup.py file. Add the following code to a setup.py file in the same directory as the Cython file:
# setup.py
# setuptools is used here because distutils was removed in Python 3.12
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize("cython_insertion_sort.pyx"),
)
After that, run the setup.py file using the following command:
python setup.py build_ext --inplace
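After an in-place build, the compiled module appears next to the .pyx file with a platform-specific suffix. As a small sketch, the standard-library list `importlib.machinery.EXTENSION_SUFFIXES` shows which suffixes the current interpreter accepts. The helper name `is_extension_file` is made up for this illustration.

```python
import importlib.machinery

# Suffixes this interpreter recognizes for compiled extension modules,
# for example ones ending in ".so" on Linux or ".pyd" on Windows.
SUFFIXES = importlib.machinery.EXTENSION_SUFFIXES

def is_extension_file(filename):
    """Return True if filename looks like a compiled extension module."""
    return any(filename.endswith(suffix) for suffix in SUFFIXES)

print(SUFFIXES)
print(is_extension_file("cython_insertion_sort" + SUFFIXES[0]))
print(is_extension_file("cython_insertion_sort.pyx"))
```

A file with one of these suffixes is what `import cython_insertion_sort` actually loads, not the .pyx source.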
Building a small project
What better way to compare two languages than running a nested loop? In this project, we will create two snippets with the same logic, one in Cython and the other in Python. Instead of a bare nested loop, we will implement the Insertion Sort algorithm.
Setting Up the project directory
Create a new folder and add two files, “cython_insertion_sort.pyx” and “python_insertion_sort.py”, for the code in the respective languages. Create a main.py file to run and compare these two snippets. The final project’s directory map will look something like this:
.
├── cython_insertion_sort.pyx
├── main.py
├── python_insertion_sort.py
├── setup.py
├── utils.py
└── venv
    └── ...
Setting Up a Virtual Environment
When starting a new project, it is good practice to set up a new virtual environment. A virtual environment isolates the project’s packages from the system-wide installation, so the project starts with freshly installed packages and avoids conflicts with other projects.
First, install the virtual environment software with your Python (steps not included here as it varies with the machine and operating system). Then set it up using the following command:
virtualenv [name of the environment: eg. venv]
Then start the virtual environment using the following:
# For windows
.\venv\Scripts\activate
# For Unix-based OS
source ./venv/bin/activate
Installing Cython Packages
Cython is installable as a Python package:
pip install Cython
Coding the Python program
Given below is the Insertion Sort algorithm using Python:
# python_insertion_sort.py
def sort(arr):
    n = len(arr)
    for i in range(1, n):
        j = i - 1
        while j >= 0 and arr[j] > arr[j+1]:
            arr[j], arr[j+1] = arr[j+1], arr[j]
            j -= 1
    return arr
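Before timing anything, it is worth confirming that the function sorts correctly. The sketch below repeats the function under the assumed name `insertion_sort` so that it is self-contained, and compares its output with the built-in `sorted` on random input. The list size and value range are arbitrary choices.

```python
import random

def insertion_sort(arr):
    """Sort arr in place with insertion sort and return it."""
    n = len(arr)
    for i in range(1, n):
        j = i - 1
        # Swap the new element leftwards until it is in position.
        while j >= 0 and arr[j] > arr[j + 1]:
            arr[j], arr[j + 1] = arr[j + 1], arr[j]
            j -= 1
    return arr

sample = [random.randint(0, 1000) for _ in range(200)]
is_correct = insertion_sort(sample.copy()) == sorted(sample)
print("matches sorted():", is_correct)
```

The function sorts the list it is given in place, which is why a copy is passed in.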
Coding in Cython
Cython’s syntax is very similar to Python’s, so a few small changes can make a huge difference. The main change is adding static typing: the program declares the types of its variables up front, and the remaining code stays almost the same.
# cython: language_level=3
# To tell cython we are using Python3
# cython_insertion_sort.pyx
import array
from cpython cimport array

cpdef list sort(list nums):
    cdef int n = len(nums)
    cdef int i, j
    # Local variables are declared with cdef; cpdef is meant for functions.
    # The list is copied into a C int array, viewed through a typed memoryview.
    cdef int[:] arr = array.array("i", nums)
    for i in range(1, n):
        j = i - 1
        while j >= 0 and arr[j] > arr[j+1]:
            arr[j], arr[j+1] = arr[j+1], arr[j]
            j -= 1
    return list(arr)
Creating the main program
In the main.py file import the sorting algorithm from both programs, and time them both.
import python_insertion_sort as p_is

try:
    import cython_insertion_sort as c_is
except ModuleNotFoundError:
    # No Cython module found, compiling now.
    # This is another method to compile Cython programs from within a snippet
    # setuptools is used here because distutils was removed in Python 3.12
    from setuptools import setup
    from Cython.Build import cythonize
    setup(
        ext_modules=cythonize("cython_insertion_sort.pyx"),
        # build_dir="build",
        script_args=["build_ext", "--inplace"]
    )
    import cython_insertion_sort as c_is

from utils import generate_nums, timeit

def main():
    values = generate_nums(n=10000)
    result1, t1 = timeit(c_is.sort, values.copy())
    result2, t2 = timeit(p_is.sort, values.copy())
    print(f"Python takes {t2:.4f} seconds")
    print(f"Cython takes {t1:.4f} seconds")
    print(f"Cython is {t2/t1:.2f}x faster!!")
    print("Both the sorted arrays are the same:", result1 == result2)

if __name__ == "__main__":
    print()
    main()
    print("Done------------------------------------------\n")
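main.py imports `generate_nums` and `timeit` from utils.py, which is not shown in this article. The following is only a guess at what such helpers could look like, inferred from how main.py calls them: `generate_nums(n=...)` returns a list of integers, and `timeit(func, *args)` returns a `(result, seconds)` tuple. The `low` and `high` bounds and the use of `time.perf_counter` are assumptions.

```python
# utils.py (hypothetical sketch; the article's real utils.py is not shown)
import random
import time

def generate_nums(n, low=0, high=1_000_000):
    """Return a list of n random integers; low and high are assumed bounds."""
    return [random.randint(low, high) for _ in range(n)]

def timeit(func, *args, **kwargs):
    """Call func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

if __name__ == "__main__":
    nums = generate_nums(n=5)
    ordered, seconds = timeit(sorted, nums)
    print(ordered, f"{seconds:.6f}s")
```

The assumed upper bound is kept well within the range of a C `int`, since the Cython version stores the values in an array of that type.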
Result
Python takes 4.3736 seconds
Cython takes 0.0665 seconds
Cython is 65.77x faster!!
Both the sorted arrays are the same: True
Done------------------------------------------
Conclusion
Cython cannot replace Python; it works alongside it to speed up programs. Cython is used to build libraries and modules that are imported rather than run directly. It takes on the heavy processing and leaves the remaining logic to the base program.
This repository contains all the code from this article.
Did you like what Aman Ahmed Siddiqui wrote? Thank them for their work by sharing it on social media.