[python] Why compile Python code?

Why would you compile a Python script? You can run it directly from the .py file and it works fine, so is there a performance advantage or something?

I also notice that some files in my application get compiled into .pyc while others do not, why is this?

This question is related to python compilation

The answer is


There's certainly a performance difference when running a compiled script. If you run a normal .py script, the interpreter compiles it to bytecode every time it is run, and this takes time. On modern machines this is hardly noticeable, but as the script grows it may become more of an issue.


Something not touched upon is source-to-source compiling. For example, Nuitka translates Python code to C/C++ and compiles it to binary code that runs directly on the CPU, instead of Python bytecode that runs on the slower virtual machine.

This can lead to significant speedups, and it also lets you work with Python when your environment depends on C/C++ code.
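A minimal invocation looks roughly like this (this is from memory of Nuitka's command-line interface, so treat it as an assumption; myscript.py is a hypothetical file):

python -m nuitka myscript.py

This should leave a native executable next to the source, which you then run directly instead of handing the .py file to the Python interpreter.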


Pluses:

First: mild, defeatable obfuscation.

Second: if compilation results in a significantly smaller file, you will get faster load times. Nice for the web.

Third: Python can skip the compilation step. Faster at initial load. Nice for the CPU and the web.

Fourth: the more you comment, the smaller the .pyc or .pyo file will be in comparison to the source .py file (a small sketch measuring this follows the list).

Fifth: an end user with only a .pyc or .pyo file in hand is much less likely to present you with a bug they caused by an un-reverted change they forgot to tell you about.

Sixth: if you're aiming at an embedded system, obtaining a smaller file to embed may represent a significant plus, and the environment is fixed, so drawback one, detailed below, does not come into play.
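To put a number on the size claims above, here is a minimal sketch; it assumes a heavily commented source file called myscript.py exists in the current directory (the name is hypothetical):

import os
import py_compile

src = 'myscript.py'            # hypothetical, comment-heavy source file
pyc = py_compile.compile(src)  # byte-compile it; returns the path of the written cache file
print(src, os.path.getsize(src), 'bytes')
print(pyc, os.path.getsize(pyc), 'bytes')

On Python 3 the compiled file lands under __pycache__ with a version tag in its name rather than next to the source.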

Top level compilation

It is useful to know that you can compile a top level python source file into a .pyc file this way:

python -m py_compile myscript.py

This removes comments. It leaves docstrings intact. If you'd like to get rid of the docstrings as well (you might want to seriously think about why you're doing that) then compile this way instead...

python -OO -m py_compile myscript.py

...and you'll get a .pyo file instead of a .pyc file; equally distributable in terms of the code's essential functionality, but smaller by the size of the stripped-out docstrings (and harder to understand for later reuse if it had decent docstrings in the first place). But see drawback three, below.
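Note that .pyo files were dropped in Python 3.5; on a modern Python the equivalent of the command above writes a file named something like myscript.cpython-312.opt-2.pyc under __pycache__. The same thing can be done from Python itself; a short sketch (myscript.py is a hypothetical file):

import py_compile

# optimize=2 corresponds to the -OO flag: asserts and docstrings are stripped
py_compile.compile('myscript.py', optimize=2)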

Note that python uses the .py file's date, if it is present, to decide whether it should execute the .py file as opposed to the .pyc or .pyo file --- so edit your .py file, and the .pyc or .pyo is obsolete and whatever benefits you gained are lost. You need to recompile it in order to get the .pyc or .pyo benefits back, such as they may be.
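If you need to recompile a whole directory tree rather than one file, the standard library's compileall module does exactly that (the directory name here is hypothetical):

python -m compileall myproject/

or, from Python code:

import compileall

# force=True recompiles even if the cached files still look up to date
compileall.compile_dir('myproject/', force=True)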

Drawbacks:

First: There's a "magic number" at the start of .pyc and .pyo files that records the version of Python bytecode they were compiled for. If you distribute one of these files into an environment running a different Python version, it will break. If you distribute the .pyc or .pyo without the associated .py to recompile or touch so it supersedes the .pyc or .pyo, the end user can't fix it, either.

Second: If docstrings are skipped with the use of the -OO command line option as described above, no one will be able to get at that information, which can make use of the code more difficult (or impossible.)

Third: Python's -OO option also applies the optimizations enabled by the -O command line option; this may result in changes in behaviour. Known effects (demonstrated in the sketch after this list) are:

  • sys.flags.optimize is set (1 under -O, 2 under -OO)
  • assert statements are skipped
  • __debug__ = False
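A minimal script to see these effects for yourself (save it under a name of your choosing, say check_opt.py, and run it with and without -O or -OO):

import sys

print('optimize flag:', sys.flags.optimize)   # 0 normally, 1 under -O, 2 under -OO
print('__debug__ is', __debug__)              # False under -O and -OO
assert False, 'this assert is skipped under -O/-OO'
print('only reached when asserts are disabled')

Run it as python check_opt.py and the assert fires; run it as python -O check_opt.py or python -OO check_opt.py and the final line prints instead.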

Fourth: if you had intentionally made your python script executable with something on the order of #!/usr/bin/python on the first line, this is stripped out in .pyc and .pyo files and that functionality is lost.

Fifth: somewhat obvious, but if you compile your code, not only can its use be impacted, but the potential for others to learn from your work is reduced, often severely.


As already mentioned, you can get a performance increase from having your python code compiled into bytecode. This is usually handled by python itself, for imported scripts only.

Another reason you might want to compile your python code could be to protect your intellectual property from being copied and/or modified.

You can read more about this in the Python documentation.


We use compiled code to distribute to users who do not have access to the source code. Basically to stop inexperienced programmers accidentally changing something or fixing bugs without telling us.


There is a performance increase in running compiled python. However, when you run a .py file as an imported module, python will compile and store it, and as long as the .py file does not change it will always use the compiled version.

With any interpreted language, when the file is used the process looks something like this:
1. The file is processed by the interpreter.
2. The file is compiled.
3. The compiled code is executed.

Obviously, by using pre-compiled code you can eliminate step 2; this applies to Python, PHP and others.

Here's an interesting blog post explaining the differences: http://julipedia.blogspot.com/2004/07/compiled-vs-interpreted-languages.html
And here's an entry that explains the Python compile process http://effbot.org/zone/python-compile.htm


The .pyc file is Python that has already been compiled to byte-code. Python automatically runs a .pyc file if it finds one with the same name as a .py file you invoke.

"An Introduction to Python" says this about compiled Python files:

A program doesn't run any faster when it is read from a ‘.pyc’ or ‘.pyo’ file than when it is read from a ‘.py’ file; the only thing that's faster about ‘.pyc’ or ‘.pyo’ files is the speed with which they are loaded.

The advantage of running a .pyc file is that Python doesn't have to incur the overhead of compiling it before running it. Since Python would compile to byte-code before running a .py file anyway, there shouldn't be any performance improvement aside from that.

How much improvement can you get from using compiled .pyc files? That depends on what the script does. For a very brief script that simply prints "Hello World," compiling could constitute a large percentage of the total startup-and-run time. But the cost of compiling a script relative to the total run time diminishes for longer-running scripts.

The script you name on the command-line is never saved to a .pyc file. Only modules loaded by that "main" script are saved in that way.


Yep, performance is the main reason and, as far as I know, the only reason.

If some of your files aren't getting compiled, maybe Python isn't able to write the .pyc file, perhaps because of directory permissions or something. Or perhaps the uncompiled files just aren't ever getting imported... (modules only get byte-compiled when they are first imported; the script you run directly is not cached).
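A quick way to see which files get a cached compile is to ask importlib where the cache for a module would live; a minimal sketch (json is used only as an example of an importable module):

import os
import importlib.util

spec = importlib.util.find_spec('json')
cache = importlib.util.cache_from_source(spec.origin)
print('source:   ', spec.origin)
print('cached as:', cache)
print('cache written?', os.path.exists(cache))

On Python 3 the cache path points into a __pycache__ directory next to the module's source; the script you launch on the command line never gets such a file written for it.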


Beginners assume Python is compiled because of .pyc files. The .pyc file is the compiled bytecode, which is then interpreted. So if you've run your Python code before and have the .pyc file handy, it will start up faster the second time, as it doesn't have to re-compile the bytecode.

Compiler: a compiler is a piece of code that translates a high-level language into machine language.

Interpreter: an interpreter also converts a high-level language into a machine-readable form. Each time it is given high-level code to execute, it converts the code into an intermediate form before executing it. Each part of the code is interpreted and executed separately in sequence, and if an error is found in one part, interpretation stops there without translating the rest of the code.
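Both stages can be observed from inside Python itself: the built-in compile() performs the "compile to bytecode" step, and the dis module shows the instructions the interpreter then executes. A minimal sketch:

import dis

source = "x = 2 + 3\nprint(x)"
code = compile(source, '<example>', 'exec')  # compile the source to a bytecode object
dis.dis(code)                                # list the bytecode instructions
exec(code)                                   # interpret/execute the bytecode; prints 5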

Sources: http://www.toptal.com/python/why-are-there-so-many-pythons http://www.engineersgarage.com/contribution/difference-between-compiler-and-interpreter

