Using Cython and working around its pitfalls

A project required optimizing existing Python code. CPython currently compiles Python source into bytecode instructions, and the interpreter then executes those instructions one by one, dispatching each to the C code layer. Skipping the instruction-interpretation stage and entering the C layer directly would be more efficient. Rewriting the Python code by hand in C with the Python C API and shipping it as a built-in module would be an enormous amount of work with no guarantee of correctness, so that approach is not realistic. Cython fits this scenario well: it converts existing Python code into C code, which can then be built into Python as a built-in module.
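
As a rough illustration of the bytecode stage described above, the standard dis module can list the instructions CPython generates for a function. The function add_one is a made-up example, and the exact instructions printed vary between Python versions.

```python
import dis

def add_one(x):
    return x + 1

# The compiled bytecode is stored on the function's code object.
bytecode = add_one.__code__.co_code

# Print a readable listing of the instructions the interpreter will execute.
dis.dis(add_one)
```

Each line of the listing is one instruction that the interpreter loop has to decode and dispatch, which is the overhead that compiling to C aims to avoid.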

Versions used:

Python 2.7.13 ( CPython )

Cython 0.25.2

Introduction to file types in Python:

.py Python source code file

.pyc Bytecode generated by compiling the Python source when it is imported

.pyo Optimized bytecode compiled from the Python source. A .pyo file is barely more optimized than a .pyc file; it mainly removes assert statements

.pyd Python dynamic link library (Windows platform)

.py, .pyc, and .pyo files run at almost the same speed. The only differences are that .pyc and .pyo files load faster, cannot be read in a text editor, and are not easy to decompile.
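
A minimal sketch of producing a .pyc file by hand with the standard py_compile module, instead of waiting for an import to do it. The file name demo.py and the temporary directory are assumptions made for the example.

```python
import os
import py_compile
import tempfile

# Write a tiny source file into a throwaway directory.
work_dir = tempfile.mkdtemp()
source_path = os.path.join(work_dir, "demo.py")
with open(source_path, "w") as source_file:
    source_file.write("def say_hello():\n    return 'hello world'\n")

# Compile it to bytecode at an explicit location; doraise=True turns
# compile errors into exceptions instead of printing them.
compiled_path = os.path.join(work_dir, "demo.pyc")
py_compile.compile(source_path, cfile=compiled_path, doraise=True)

pyc_exists = os.path.exists(compiled_path)
print(pyc_exists)
```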

 

The goal of this article is to generate test.c from test.py, compile test.c as part of the Python source tree when rebuilding Python, and then use the test module directly with import test.

Basic introduction to Cython:

The documentation summarizes Cython like this:

Cython is an optimising static compiler for both the Python programming language and the extended Cython programming language (based on Pyrex). It makes writing C extensions for Python as easy as Python itself.

In other words, Cython is a compiler for the Python language that makes writing C extensions as easy as writing Python code.

Its most important feature is:

  • write Python code that calls back and forth from and to C or C++ code natively at any point.

That is, it translates Python code into C code. The generated C code can then be used just like the hand-written C extension module for Python introduced in the previous article.

 

Basic usage of Cython:

 Compiling Python code with Cython requires a C/C++ compiler, so make sure one is installed. The development environment for this article has Visual Studio 2015 installed.

1. Install the Cython library

   pip install Cython

 It really is that simple.

2. Write a test file, test.py, and save it as D:/test/test.py

def say_hello():
    print "hello world"

Then in the same directory, create a new setup.py file with the following content:

from distutils.core import setup
from Cython.Build import cythonize

setup(ext_modules = cythonize("test.py"))

cythonize() is the API Cython provides for converting Python code into C code.

setup() is the distutils function Python provides for building and distributing Python modules.

3. Compile the Python code using the command line:

python setup.py build_ext  --inplace

If the build fails at this point, the C compiler configuration is probably not set up correctly. On Windows, Microsoft Visual Studio is generally used, and each VS version needs a different setting.

  • Visual Studio 2010 (VS10): SET VS90COMNTOOLS=%VS100COMNTOOLS%
  • Visual Studio 2012 (VS11): SET VS90COMNTOOLS=%VS110COMNTOOLS%
  • Visual Studio 2013 (VS12): SET VS90COMNTOOLS=%VS120COMNTOOLS%
  • Visual Studio 2015 (VS14): SET VS90COMNTOOLS=%VS140COMNTOOLS%
  • Visual Studio 2017 (VS15): SET VS90COMNTOOLS=%VS150COMNTOOLS%

Here VS2015 is used as the C compiler.

Enter SET VS90COMNTOOLS=%VS140COMNTOOLS% on the command line.

Then run the compile command: python setup.py build_ext --inplace

The build produces the following results.

In the D:/test/ directory:

test.c is the C code generated from test.py. Note that test.c is very large!

test.pyd is the Python dynamic link library that is loaded when we run import test.

The build directory holds the temporary files generated during compilation.

The newly generated test module can be used like any other Python module:

 

A brief explanation of the command line python setup.py build_ext --inplace:

build_ext tells Python to build C/C++ extension modules (compile and link them into the build directory).

--inplace places the compiled extension module in the same directory as test.py.

 

The overall Cython workflow is shown in the following figure:

It consists of two steps:

1) Cython compiles the .py file into a .c file;

2) A C compiler builds the .c file into a .pyd (Windows) or .so (Linux) file.
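
The file extension produced in step 2 depends on the platform. A minimal sketch for asking the running interpreter which suffix it expects for extension modules follows. The config variable is named EXT_SUFFIX on Python 3 and SO on Python 2.7, and the fallback order is an assumption meant to cover both.

```python
import sysconfig

# EXT_SUFFIX is set on Python 3; older interpreters such as 2.7 expose SO instead.
extension_suffix = (
    sysconfig.get_config_var("EXT_SUFFIX")
    or sysconfig.get_config_var("SO")
)

# The value ends with .pyd on Windows and with .so on Linux.
print(extension_suffix)
```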

 In addition to this common usage, you can add static type declarations at selected places in the Python code to improve efficiency further. These are small tricks.

For example:

def say_hello(int s):
    cdef int a = 2
    print s + a

The variables s and a are declared directly as int, so no dynamic type inference is needed at run time.

 

A small test:

import math
import time

def f():
    time1 = time.time()
    for i in range(100000000):
        x = math.sqrt(i)
    time2 = time.time()
    print time2 - time1

The plain Python version runs in 13.17 seconds, and the Cython-compiled version runs in 9.36 seconds, a reduction of roughly 30% in running time. This is about the level of improvement generally claimed for Cython.
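
To check the figure quoted above, the relative reduction and the speedup factor can be computed from the two timings. The small timeit harness that follows is only a sketch with a much smaller loop count than the original test (an assumption made so that it finishes quickly), and it measures the plain interpreter only.

```python
import math
import timeit

python_seconds = 13.17   # plain Python timing reported above
cython_seconds = 9.36    # Cython-compiled timing reported above

time_reduction = (python_seconds - cython_seconds) / python_seconds
speedup_factor = python_seconds / cython_seconds
print("time reduction: %.1f%%" % (time_reduction * 100))
print("speedup factor: %.2fx" % speedup_factor)

def sqrt_loop(n):
    # Same loop body as f() above, with the iteration count as a parameter.
    for i in range(n):
        x = math.sqrt(i)

# Time three runs of a reduced loop; timeit returns the total in seconds.
elapsed = timeit.timeit(lambda: sqrt_loop(100000), number=3)
print("3 runs of 100000 iterations: %.3f s" % elapsed)
```

Running the same harness against the compiled module would give a like-for-like comparison on a given machine.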

 

Pitfalls in Cython

This section discusses some pitfalls in Cython and how to work around them. Cython's official documentation already lists some unsupported Python features, some of which there are no plans to fix. The workarounds below come from a specific project scenario.

Specific project requirement: translate the Python modules that need optimizing into C code, add that code to the project, compile and link it, and use the result as built-in Python modules.

Therefore only the conversion to C code is needed. The distutils module provided by Python is not required, only the cythonize function provided by Cython.

1. Extract the installed Cython directory from Python's site-packages and make it standalone. The tool is meant for other people to use, and if they run pip install cython themselves, the version may differ and cause problems.

The Cython directory consists of the Cython source code and cython.py under Python2.7/Lib/site-packages, namely:

CythonTool is a .py script that wraps the conversion to C code.

To use it, set sys.path so that our standalone Cython module is found on import.

import sys

# Put the standalone Cython directory first so it is found before site-packages
sys.path.insert(0, cython_path)
from Cython.Build import cythonize
from Cython.Compiler import Options

Adding cython_path to the head of sys.path ensures that a Cython installed in Python's site-packages does not shadow our standalone Cython module.
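
The effect of inserting at the head of sys.path can be seen without Cython at all. In this sketch a throwaway module named bundled_demo (a hypothetical name, standing in for the standalone Cython copy) is written to a temporary directory, and the import resolves to that directory because it comes first on the path.

```python
import importlib
import os
import sys
import tempfile

# Create a stand-in for the standalone copy in a temporary directory.
bundle_dir = tempfile.mkdtemp()
with open(os.path.join(bundle_dir, "bundled_demo.py"), "w") as module_file:
    module_file.write("VERSION = 'bundled'\n")

# Entries at the head of sys.path are searched before site-packages.
sys.path.insert(0, bundle_dir)
module = importlib.import_module("bundled_demo")

# Confirm where the module was actually loaded from.
loaded_dir = os.path.realpath(os.path.dirname(module.__file__))
loaded_from_bundle = loaded_dir == os.path.realpath(bundle_dir)
print(loaded_from_bundle)
```

Checking the __file__ attribute of the imported Cython package in the same way is a quick test of which copy is in use.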

2. When compiling Python code to C code, specify the output path for the C files. Cython writes them to the Python script's directory by default, which mixes the .py and .c files together and easily gets messy.

There are currently three working directories:

LibDir: the directory containing the Python scripts to be optimized

CfileDir: the directory where the generated C files are written

ToolDir: the directory containing the wrapper script for Cython. The script converts the Python modules in LibDir into C code and writes the output to CfileDir

The wrapper script therefore runs with ToolDir as its working directory. The core of the script is this call:

cythonize(pyfilePath, build_dir=CfileDir)

The build_dir parameter specifies the output directory for the C code.

This looks perfect, but the Cython source code has a pitfall here.

When build_dir is specified, both pyfilePath and CfileDir are absolute paths, and the working directory of the Cython script differs from the directory of pyfilePath, cythonize sets the output directory to the directory containing pyfilePath, so the generated C file never reaches CfileDir.

The fix is to call os.chdir(LibDir) in the wrapper script and switch back to the original working directory once the conversion is complete. Keep in mind that Cython's working directory should be the same as the directory of the Python script being optimized.
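
The chdir-and-restore step can be wrapped in a small context manager so that the original working directory is restored even if the conversion raises. The names working_directory and run_in_lib_dir are hypothetical helpers, not part of Cython; the callable passed in would be the cythonize call from the wrapper script.

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the process working directory to `path`."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always switch back, even if the conversion raises.
        os.chdir(previous)

def run_in_lib_dir(lib_dir, convert):
    """Call `convert()` with lib_dir as the working directory."""
    with working_directory(lib_dir):
        return convert()

# Intended use inside the wrapper script (requires Cython, so left as a comment):
# run_in_lib_dir(LibDir, lambda: cythonize(pyfilePath, build_dir=CfileDir))

# Demonstration with a stand-in callable that just reports the working directory.
demo_dir = tempfile.mkdtemp()
start_dir = os.getcwd()
seen_dir = run_in_lib_dir(demo_dir, os.getcwd)
```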

Reason: the implementation of cythonize contains the following piece of code (shown in a debugging session):

 

In the part marked with the red box, the problem above occurs when c_file is an absolute file name. c_file is an absolute file name because Cython's working directory does not match the directory of the script being optimized.

 

 3. Stock Cython's support for Python packages is insufficient. This is a big pitfall!

It can only be fixed by modifying the Cython source code.

After stock Cython compiles a Python file, the generated C code has two key places. Taking the test module as an example:

The initialization function of the test module is defined here; it contains the code that creates the test module:

On import, the Python interpreter calls this function, initializes the test module, and adds the name test to sys.builtin_module_names.

Testing showed that for D:/Lib/mypackage/test.py, the generated C code is no different from the code generated for D:/Lib/test.py. The package mypackage is ignored, so the generated C code carries no package information.

Reading through the code, I traced the problem to Cython/Compiler/ModuleNode.py and modified two functions in that file:

1) The function that generates the module init code: replace env.module_name with full_module_name, that is, replace init_test with initmypackage_test

2) The module name passed in when creating the module. To handle mypackage/__init__.py, __path__ must also be added to the module object to mark it as a package rather than an ordinary Python module.
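
To make the naming rule concrete, here is a rough sketch of deriving a dotted module name from a source path relative to the library directory, and of detecting the __init__.py case. The helper full_module_name below is hypothetical and written for illustration only; it is not the code in ModuleNode.py.

```python
import os

def full_module_name(lib_dir, py_path):
    """Return (dotted_name, is_package) for a source file under lib_dir."""
    relative = os.path.relpath(py_path, lib_dir)
    stem, _extension = os.path.splitext(relative)
    parts = stem.split(os.sep)
    # mypackage/__init__.py represents the package itself, not a submodule.
    is_package = parts[-1] == "__init__"
    if is_package:
        parts = parts[:-1]
    return ".".join(parts), is_package

module_info = full_module_name(
    "Lib", os.path.join("Lib", "mypackage", "test.py"))
package_info = full_module_name(
    "Lib", os.path.join("Lib", "mypackage", "__init__.py"))
print(module_info)
print(package_info)
```

The is_package flag corresponds to the case where __path__ has to be set on the created module object.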

 

 4. A deep pitfall related to inspect and types.

The inspect module has various type-checking functions, such as isfunction, ismethod, and ismodule. The pitfall is this:

The type of a cythonized function is cython_function_or_method, while the type of an ordinary Python function is function. If the Python script being optimized calls inspect.isfunction(func), which checks against types.FunctionType, it returns True when func is an ordinary Python function and False when func is a cythonized function. Besides functions, generators are affected too, and the type of FunctionType.func_globals is also inconsistent.

For now, a trick has been added to isfunction in inspect.py, which additionally checks type(func).__name__ == "cython_function_or_method". As long as the types.py module is not cythonized, calling inspect.isfunction(func) then works for both ordinary Python functions and cythonized functions.

Calling isinstance(func, types.FunctionType) directly still fails, however: types.FunctionType only matches ordinary Python functions.

In short, a Python type may differ from its counterpart after cythonization. I surveyed most of the Python types, and a few of them are inconsistent after cythonization:

There is no good solution. Either rewrite the inspect module and also make sure the Python code never uses the types module directly, or modify the implementation of isinstance in the Python source code.
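
The type-name check described above can be sketched as a standalone helper instead of a patch to inspect.py. The helper is_function_like is a hypothetical name, and because Cython is not assumed to be installed, the demonstration uses a stand-in class that only mimics the type name of a cythonized function.

```python
import types

CYTHON_FUNCTION_TYPE_NAME = "cython_function_or_method"

def is_function_like(obj):
    """True for plain Python functions and for objects whose type name
    matches the one Cython gives to compiled functions."""
    if isinstance(obj, types.FunctionType):
        return True
    return type(obj).__name__ == CYTHON_FUNCTION_TYPE_NAME

# Stand-in class that only mimics the type name of a cythonized function.
FakeCythonFunction = type(CYTHON_FUNCTION_TYPE_NAME, (object,), {})

def plain_function():
    return None

results = (
    is_function_like(plain_function),        # ordinary Python function
    is_function_like(FakeCythonFunction()),  # simulated cythonized function
    is_function_like(len),                   # built-in, neither of the above
)
print(results)
```

Code under your own control can call such a helper in place of isinstance(func, types.FunctionType); third-party code that does the isinstance check itself is still affected.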

5. Pitfalls listed in the official documentation

1) Nested tuple argument unpacking is not supported. It is a Python 2 feature that Python 3 removed, so Cython does not support it directly either.

2) Variable name not found: You can disable the latter behaviour by setting "error_on_unknown_names" to

 Solution:

3) Stack frames.

 Cython does not support Python stack frames.
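
For the first item in the list above, the usual rewrite is to accept each tuple as a single parameter and unpack it inside the function body, which is valid in Python 2, Python 3, and Cython alike. The function name segment_length is made up for illustration.

```python
# Python 2 only (tuple parameter unpacking, removed in Python 3):
#
#     def segment_length((x1, y1), (x2, y2)):
#         ...

import math

def segment_length(start, end):
    # Unpack the tuples explicitly in the body instead of in the signature.
    x1, y1 = start
    x2, y2 = end
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

length = segment_length((0, 0), (3, 4))
print(length)
```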

 

Summary: Cython is worth considering for optimizing relatively simple Python projects. In very complex scenarios, some syntax features are unsupported, and you will hit pitfalls that cannot be worked around.

 

References:

https://github.com/cython/cython

https://mdqinc.com/blog/2011/08/statically-linking-python-with-cython-generated-modules-and-packages/

 
