[python] How to obfuscate Python code effectively?

I am looking for how to hide my Python source code.

print "Hello World!" 

How can I encode this example so that it isn't human-readable? I've been told to use base64 but I'm not sure how.

This question is related to python

The answer is


I would mask the code like this:

def MakeSC():
    c = raw_input(" Encode: ")
    sc = "\\x" + "\\x".join("{0:x}".format(ord(c)) for c in c)
    print "\n shellcode =('" + sc + "'); exec(shellcode)"; MakeSC();

Cleartext:

import os; os.system("whoami")

Encoded:

Payload = ('\x69\x6d\x70\x6f\x72\x74\x20\x6f\x73\x3b\x20\x6f\x73\x2e\x73\x79\x73\x74\x65\x6d\x28\x22\x77\x68\x6f\x61\x6d\x69\x22\x29'); exec(Payload);

I know it is an old question. Just want to add my funny obfuscated "Hello world!" in Python 3 and some tips ;)

#//'written in c++'

#include <iostream.h>
#define true false
import os
n = int(input())
_STACK_CALS=  [ ];
_i_CountCals__= (0x00)
while os.urandom(0x00 >> 0x01) or (1 & True):
  _i_CountCals__+= 0o0;break;# call shell command echo "hello world" > text.txt
""#print'hello'
__cal__= getattr( __builtins__  ,'c_DATATYPE_hFILE_radnom'[ 0x00 ]+'.h'[-1]+'getRndint'[3].lower() )
_o0wiXSysRdrct   =eval (  __cal__(0x63) + __cal__(104) + 'r_RUN_CALLER'[0] );
_i1CLS_NATIVE=  getattr (__builtins__ ,__cal__(101)+__cal__(118  )+_o0wiXSysRdrct ( 0b1100001 )+'LINE 2'[0].lower( ))#line 2 kernel call
__executeMAIN_0x07453320abef  =_i1CLS_NATIVE ( 'map');
def _Main():
    raise 0x06;return 0 # exit program with exit code 0
def _0o7af():_i1CLS_NATIVE('_int'.replace('_', 'programMain'[:2]))(''.join(  __executeMAIN_0x07453320abef( _o0wiXSysRdrct ,_STACK_CALS)));return;_Main()
for _INCREAMENT in [0]*1024:
    _STACK_CALS= [0x000 >> 0x001 ,True&False&True&False ,'c++', 'h', 'e', 'l', 'o',' ', 'w', 'o', 'r', 'l', 'd']
   
#if
for _INCREAMENT in [0]*1024:
    _STACK_CALS= [40, 111, 41, 46, 46] * n
    
""""""#print'word'
while True:
    break;
_0o7af();
while os.urandom(0x00 >> 0xfa) or (1 & True): # print "Hello, world!"
  _i_CountCals__-= 0o0;break;
  while os.urandom(0x00 >> 0x01) or (1 & True):
      _i_CountCals__ += 0o0;
      break;

It is possible to do manually, my tips are:

  • use eval and/or exec with encrypted strings

  • use [ord(i) for i in s] / ''.join(map(chr, [list of chars goes here])) as simple encryption/decryption

  • use obscure variable names

  • make it unreadable

  • Don't write just 1 or True, write 1&True&0x00000001 ;)

  • use different number systems

  • add confusing comments like "line 2" on line 10 or "it returns 0" on while loop.

  • use __builtins__

  • use getattr and setattr


I'll write my answer in a didactic manner...

First type into your Python interpreter:

import this

then, go and take a look to the file this.py in your Lib directory within your Python distribution and try to understand what it does.

After that, take a look to the eval function in the documentation:

help(eval)

Now you should have found a funny way to protect your code. But beware, because that only works for people that are less intelligent than you! (and I'm not trying to be offensive, anyone smart enough to understand what you did could reverse it).


There are multiple ways to obfuscate code. Here's just one example:

(lambda _, __, ___, ____, _____, ______, _______, ________:
    getattr(
        __import__(True.__class__.__name__[_] + [].__class__.__name__[__]),
        ().__class__.__eq__.__class__.__name__[:__] +
        ().__iter__().__class__.__name__[_____:________]
    )(
        _, (lambda _, __, ___: _(_, __, ___))(
            lambda _, __, ___:
                chr(___ % __) + _(_, __, ___ // __) if ___ else
                (lambda: _).func_code.co_lnotab,
            _ << ________,
            (((_____ << ____) + _) << ((___ << _____) - ___)) + (((((___ << __)
            - _) << ___) + _) << ((_____ << ____) + (_ << _))) + (((_______ <<
            __) - _) << (((((_ << ___) + _)) << ___) + (_ << _))) + (((_______
            << ___) + _) << ((_ << ______) + _)) + (((_______ << ____) - _) <<
            ((_______ << ___))) + (((_ << ____) - _) << ((((___ << __) + _) <<
            __) - _)) - (_______ << ((((___ << __) - _) << __) + _)) + (_______
            << (((((_ << ___) + _)) << __))) - ((((((_ << ___) + _)) << __) +
            _) << ((((___ << __) + _) << _))) + (((_______ << __) - _) <<
            (((((_ << ___) + _)) << _))) + (((___ << ___) + _) << ((_____ <<
            _))) + (_____ << ______) + (_ << ___)
        )
    )
)(
    *(lambda _, __, ___: _(_, __, ___))(
        (lambda _, __, ___:
            [__(___[(lambda: _).func_code.co_nlocals])] +
            _(_, __, ___[(lambda _: _).func_code.co_nlocals:]) if ___ else []
        ),
        lambda _: _.func_code.co_argcount,
        (
            lambda _: _,
            lambda _, __: _,
            lambda _, __, ___: _,
            lambda _, __, ___, ____: _,
            lambda _, __, ___, ____, _____: _,
            lambda _, __, ___, ____, _____, ______: _,
            lambda _, __, ___, ____, _____, ______, _______: _,
            lambda _, __, ___, ____, _____, ______, _______, ________: _
        )
    )
)

Maybe you can try on pyconcrete

encrypt .pyc to .pye and decrypt when import it

encrypt & decrypt by library OpenAES

Usage

Full encrypted

  • convert all of your .py to *.pye

    $ pyconcrete-admin.py compile --source={your py script}  --pye
    $ pyconcrete-admin.py compile --source={your py module dir} --pye
    
  • remove *.py *.pyc or copy *.pye to other folder

  • main.py encrypted as main.pye, it can't be executed by normal python. You must use pyconcrete to process the main.pye script. pyconcrete(exe) will be installed in your system path (ex: /usr/local/bin)

    pyconcrete main.pye
    src/*.pye  # your libs
    

Partial encrypted (pyconcrete as lib)

  • download pyconcrete source and install by setup.py

    $ python setup.py install \
      --install-lib={your project path} \
      --install-scripts={where you want to execute pyconcrete-admin.py and pyconcrete(exe)}
    
  • import pyconcrete in your main script

  • recommendation project layout

    main.py       # import pyconcrete and your lib
    pyconcrete/*  # put pyconcrete lib in project root, keep it as original files
    src/*.pye     # your libs
    

Opy

https://github.com/QQuick/Opy

Opy will obfuscate your extensive, real world, multi module Python source code for free! And YOU choose per project what to obfuscate and what not, by editing the config file:

You can recursively exclude all identifiers of certain modules from obfuscation.
You can exclude human readable configuration files containing Python code.
You can use getattr, setattr, exec and eval by excluding the identifiers they use.
You can even obfuscate module file names and string literals.
You can run your obfuscated code from any platform.

Unlike some of the other options posted, this works for both Python 2 and 3! It is also free / opensource, and it is not an online only tool (unless you pay) like some of the others out there.

I am admittedly still evaluating this myself, but all of initial tests of it worked perfectly. It appears this is exactly what I was looking for!

The official version runs as a standalone utility, with the original intended design being that you drop a script into the root of the directory you want to obfuscate, along with a config file to define the details/options you want to employ. I wasn't in love with that plan, so I added a fork from project, allowing you to import and utilize the tool from a library instead. That way, you can roll this directly into a more encompassing packaging script. (You could of course wrap multiple py scripts in bash/batch, but I think a pure python solution is ideal). I requested my fork be merged into the original work, but in case that never happens, here's the url to my revised version:

https://github.com/BuvinJT/Opy


Try pasting your hello world python code to the following site:

http://enscryption.com/encrypt-and-obfuscate-scripts.html

It will produce a complex encrypted and obfuscated, but fully functional script for you. See if you can crack the script and reveal the actual code. Or see if the level of complexity it provides satisfies your need for peace of mind.

The encrypted script that is produced for you through this site should work on any Unix system that has python installed.

If you would like to encrypt another way, I strongly suggest you write your own encryption/obfuscation algorithm (if security is that important to you). That way, no one can figure out how it works but you. But, for this to really work, you have to spend a tremendous amount of time on it to ensure there aren't any loopholes that someone who has a lot of time on their hands can exploit. And make sure you use tools that are already natural to the Unix system... i.e. openssl or base64. That way, your encrypted script is more portable.


You can use the base64 module to encode strings to stop shoulder surfing, but it's not going to stop someone finding your code if they have access to your files.

You can then use the compile() function and the eval() function to execute your code once you've decoded it.

>>> import base64
>>> mycode = "print 'Hello World!'"
>>> secret = base64.b64encode(mycode)
>>> secret
'cHJpbnQgJ2hlbGxvIFdvcmxkICEn'
>>> mydecode = base64.b64decode(secret)
>>> eval(compile(mydecode,'<string>','exec'))
Hello World!

So if you have 30 lines of code you'll probably want to encrypt it doing something like this:

>>> f = open('myscript.py')
>>> encoded = base64.b64encode(f.read())

You'd then need to write a second script that does the compile() and eval() which would probably include the encoded script as a string literal encased in triple quotes. So it would look something like this:

import base64
myscript = """IyBUaGlzIGlzIGEgc2FtcGxlIFB5d
              GhvbiBzY3JpcHQKcHJpbnQgIkhlbG
              xvIiwKcHJpbnQgIldvcmxkISIK"""
eval(compile(base64.b64decode(myscript),'<string>','exec'))

Check out these tools for obfuscation and minification of python code:

  • pyarmor, https://pypi.org/project/pyarmor/ - full obfuscation with hex-encoding; apparently doesn't allow partial obfuscation of variable/function names only
  • python-minifier, https://pypi.org/project/python-minifier/ - minifies the code and obfuscates function/variable names (although not as intensely as pyminifier below)
  • pyminifier, https://pypi.org/project/pyminifier/ - does a good job in obfuscating names of functions, variables, literals; can also perform hex-encoding (compression) similar as pyarmor. Problem: after obfuscation the code may contain syntax errors and not run.

Example .py output from pyminifier when run with --obfuscate and --gzip:

$ pyminifier --obfuscate --gzip /tmp/tumult.py

#!/usr/bin/env python3
import zlib, base64
exec(zlib.decompress(base64.b64decode('eJx1kcFOwzAMhu95ClMO66apu0/KAQEbE5eJC+IUpa27haVJ5Ljb+vakLYJx4JAoiT/7/+3c3626SKvSuBW6M4Sej96Jq9y1wRM/E3kSexnIOBZObrSNKI7Sl59YsWDq1wLMiEKNrenoYCqB1woDwzXF9nn2rskZd1jDh+9mhOD8DVvAQ8WdtrZfwg74aNwp7ZpnMXHUaltk878ybR/ZNKbSjP8JPWk6wdn72ntodQ8lQucIrdGlxaHgq3QgKqtjhCY/zlN6jQ0oZZxhpfKItlkuNB3icrE4XYbDwEBICRP6NjG1rri3YyzK356CtsGwZuNd/o0kYitvrBd18qgmj3kcwoTckYPtJPAyCVzSKPCMNErs85+rMINdp1tUSspMqVYbp1Q2DWKTJpcGURRDr9DIJs8wJFlKq+qzZRaQ4lAnVRuJgjFynj36Ol7SX/iQXr8ANfezCw==')))
# Created by pyminifier.py (https://github.com/liftoff/pyminifier)

This output corresponds to a 40-line original input script as shown here.


Cython

It seems that the goto answer for this is Cython. I'm really surprised no one else mentioned this yet? Here's the home page: https://cython.org

In a nutshell, this transforms your python into C and compiles it, thus making it as well protected as any "normal" compiled distributable C program.

There are limitations though. I haven't explored them in depth myself, because as I started to read about them, I dropped the idea for my own purposes. But it might still work for yours. Essentially, you can't use Python to the fullest, with the dynamic awesomeness it offers. One major issue that jumped out at me, was that keyword parameters are not usable :( You must write function calls using positional parameters only. I didn't confirm this, but I doubt you can use conditional imports, or evals. I'm not sure how polymorphism is handled...

Anyway, if you aren't trying to obfuscate a huge code base after the fact, or ideally if you have the use of Cython in mind to begin with, this is a very notable option.


This is only a limited, first-level obfuscation solution, but it is built-in: Python has a compiler to byte-code:

python -OO -m py_compile <your program.py>

produces a .pyo file that contains byte-code, and where docstrings are removed, etc. You can rename the .pyo file with a .py extension, and python <your program.py> runs like your program but does not contain your source code.

PS: the "limited" level of obfuscation that you get is such that one can recover the code (with some of the variable names, but without comments and docstrings). See the first comment, for how to do it. However, in some cases, this level of obfuscation might be deemed sufficient.

PPS: If your program imports modules obfuscated like this, then you need to rename them with a .pyc suffix instead (I'm not sure this won't break one day), or you can work with the .pyo and run them with python -O ….pyo (the imports should work). This will allow Python to find your modules (otherwise, Python looks for .py modules).


I recently stumbled across this blogpost: Python Source Obfuscation using ASTs where the author talks about python source file obfuscation using the builtin AST module. The compiled binary was to be used for the HitB CTF and as such had strict obfuscation requirements.

Since you gain access to individual AST nodes, using this approach allows you to perform arbitrary modifications to the source file. Depending on what transformations you carry out, resulting binary might/might not behave exactly as the non-obfuscated source.


As other answers have stated, there really just isn't a way that's any good. Base64 can be decoded. Bytecode can be decompiled. Python was initially just interpreted, and most interpreted languages try to speed up machine interpretation more than make it difficult for human interpretation.

Python was made to be readable and shareable, not obfuscated. The language decisions about how code has to be formatted were to promote readability across different authors.

Obfuscating python code just doesn't really mesh with the language. Re-evaluate your reasons for obfuscating the code.


There are 2 ways to obfuscate python scripts

  • Obfuscate byte code of each code object
  • Obfuscate whole code object of python module

Obfuscate Python Scripts

  • Compile python source file to code object

    char * filename = "xxx.py";
    char * source = read_file( filename );
    PyObject *co = Py_CompileString( source, filename, Py_file_input );
    
  • Iterate code object, wrap bytecode of each code object as the following format

    0   JUMP_ABSOLUTE            n = 3 + len(bytecode)    
    3
    ...
    ... Here it's obfuscated bytecode
    ...
    
    n   LOAD_GLOBAL              ? (__armor__)
    n+3 CALL_FUNCTION            0
    n+6 POP_TOP
    n+7 JUMP_ABSOLUTE            0
    
  • Serialize code object and obfuscate it

    char *original_code = marshal.dumps( co );
    char *obfuscated_code = obfuscate_algorithm( original_code  );
    
  • Create wrapper script "xxx.py", ${obfuscated_code} stands for string constant generated in previous step.

    __pyarmor__(__name__, b'${obfuscated_code}')
    

Run or Import Obfuscated Python Scripts

When import or run this wrapper script, the first statement is to call a CFunction:

int __pyarmor__(char *name, unsigned char *obfuscated_code) 
{
  char *original_code = resotre_obfuscated_code( obfuscated_code );
  PyObject *co = marshal.loads( original_code );
  PyObject *mod = PyImport_ExecCodeModule( name, co );
}

This function accepts 2 parameters: module name and obfuscated code, then

  • Restore obfuscated code
  • Create a code object by original code
  • Import original module (this will result in a duplicated frame in Traceback)

Run or Import Obfuscated Bytecode

After module imported, when any code object in this module is called first time, from the wrapped bytecode descripted in above section, we know

  • First op JUMP_ABSOLUTE jumps to offset n

  • At offset n, the instruction is to call a PyCFunction. This function will restore those obfuscated bytecode between offset 3 and n, and place the original bytecode at offset 0

  • After function call, the last instruction jumps back to offset 0. The real bytecode is now executed

Refer to Pyarmor


You could embed your code in C/C++ and compile Embedding Python in Another Application

embedded.c

#include <Python.h>

int
main(int argc, char *argv[])
{
  Py_SetProgramName(argv[0]);  /* optional but recommended */
  Py_Initialize();
  PyRun_SimpleString("print('Hello world !')");
  Py_Finalize();
  return 0;
}

In Ubuntu/Debian

$ sudo apt-get install python-dev

In Centos/Redhat/Fedora

$ sudo yum install python-devel

compile with

$ gcc -o embedded -fPIC -I/usr/include/python2.7 -lpython2.7 embedded.c

run with

$ chmod u+x ./embedded
$ time ./embedded
Hello world !

real  0m0.014s
user  0m0.008s
sys 0m0.004s

initial script: hello_world.py:

print('Hello World !')

run the script

$ time python hello_world.py
Hello World !

real  0m0.014s
user  0m0.008s
sys 0m0.004s

however some strings of the python code may be found in the compiled file

$ grep "Hello" ./embedded
Binary file ./embedded matches

$ grep "Hello World" ./embedded
$

In case you want an extra bit of obfuscation you could use base64

...
PyRun_SimpleString("import base64\n"
                  "base64_code = 'your python code in base64'\n"
                  "code = base64.b64decode(base64_code)\n"
                  "exec(code)");
...

e.g:

create the base 64 string of your code

$ base64 hello_world.py
cHJpbnQoJ0hlbGxvIFdvcmxkICEnKQoK

embedded_base64.c

#include <Python.h>

int
main(int argc, char *argv[])
{
  Py_SetProgramName(argv[0]);  /* optional but recommended */
  Py_Initialize();
  PyRun_SimpleString("import base64\n"
                    "base64_code = 'cHJpbnQoJ0hlbGxvIFdvcmxkICEnKQoK'\n"
                    "code = base64.b64decode(base64_code)\n"
                    "exec(code)\n");
  Py_Finalize();
  return 0;
}

all commands

$ gcc -o embedded_base64 -fPIC -I/usr/include/python2.7 -lpython2.7 ./embedded_base64.c
$ chmod u+x ./embedded_base64

$ time ./embedded_base64
Hello World !

real  0m0.014s
user  0m0.008s
sys 0m0.004s

$ grep "Hello" ./embedded_base64
$

update:

this project (pyarmor) might also help:

https://pypi.org/project/pyarmor/


The best way to do this is to first generate a .c file, and then compile it with tcc to a .pyd file
Note: Windows-only

Requirements

  1. tcc
  2. pyobfuscate
  3. Cython

Install:

sudo pip install -U cython

To obfuscate your .py file:

pyobfuscate.py myfile.py >obfuscated.py

To generate a .c file,

  1. Add an init<filename>() function to your .py file Optional

  2. cython --embed file.py

  3. cp Python.h tcc\include

  4. tcc file.c -o file.pyd -shared -I\path\to\Python\include -L\path\to\Python\lib

  5. import .pyd file into app.exe


so that it isn't human-readable?

i mean all the file is encoded !! when you open it you can't understand anything .. ! that what i want

As maximum, you can compile your sources into bytecode and then distribute only bytecode. But even this is reversible. Bytecode can be decompiled into semi-readable sources.

Base64 is trivial to decode for anyone, so it cannot serve as actual protection and will 'hide' sources only from complete PC illiterates. Moreover, if you plan to actually run that code by any means, you would have to include decoder right into the script (or another script in your distribution, which would needed to be run by legitimate user), and that would immediately give away your encoding/encryption.

Obfuscation techniques usually involve comments/docs stripping, name mangling, trash code insertion, and so on, so even if you decompile bytecode, you get not very readable sources. But they will be Python sources nevertheless and Python is not good at becoming unreadable mess.

If you absolutely need to protect some functionality, I'd suggest going with compiled languages, like C or C++, compiling and distributing .so/.dll, and then using Python bindings to protected code.


Try this python obfuscator:

pyob.oxyry.com pyob.oxyry.c

__all__ = ['foo']

a = 'a'
_b = 'b'

def foo():
    print(a)

def bar():
    print(_b)

def _baz():
    print(a + _b)

foo()
bar()
_baz()

will translated to

__all__ =['foo']#line:1
OO00OO0OO0O00O0OO ='a'#line:3
_O00OO0000OO0O0O0O ='b'#line:4
def foo ():#line:6
    print (OO00OO0OO0O00O0OO )#line:7
def O0000000OOOO00OO0 ():#line:9
    print (_O00OO0000OO0O0O0O )#line:10
def _OOO00000O000O0OOO ():#line:12
    print (OO00OO0OO0O00O0OO +_O00OO0000OO0O0O0O )#line:13
foo ()#line:15
O0000000OOOO00OO0 ()#line:16
_OOO00000O000O0OOO ()#line:17

Well if you want to make a semi-obfuscated code you make code like this:

import base64
import zlib
def run(code): exec(zlib.decompress(base64.b16decode(code)))
def enc(code): return base64.b16encode(zlib.compress(code))

and make a file like this (using the above code):

f = open('something.py','w')
f.write("code=" + enc("""
print("test program")
print(raw_input("> "))"""))
f.close()

file "something.py":

code = '789CE352008282A2CCBC120DA592D4E212203B3FBD28315749930B215394581E9F9957500A5463A7A0A4A90900ADFB0FF9'

just import "something.py" and run run(something.code) to run the code in the file.

One trick is to make the code hard to read by design: never document anything, if you must, just give the output of a function, not how it works. Make variable names very broad, movie references, or opposites example: btmnsfavclr = 16777215 where as "btmnsfavclr" means "Batman's Favorite Color" and the value is 16777215 or the decimal form of "ffffff" or white. Remember to mix different styles of naming to keep those pesky people of of your code. Also, use tips on this site: Top 11 Tips to Develop Unmaintainable Code.


Maybe you should look into using something simple like a truecrypt volume for source code storage as that seems to be a concern of yours. You can create an encrypted file on a usb key or just encrypt the whole volume (provided the code will fit) so you can simply take the key with you at the end of the day.

To compile, you could then use something like PyInstaller or py2exe in order to create a stand-alone executable. If you really wanted to go the extra mile, look into a packer or compression utility in order to add more obfuscation. If none of these are an option, you could at least compile the script into bytecode so it isn't immediately readable. Keep in mind that these methods will merely slow someone trying to debug or decompile your program.