[python] How to obfuscate Python code effectively?

There are 2 ways to obfuscate python scripts

  • Obfuscate byte code of each code object
  • Obfuscate whole code object of python module

Obfuscate Python Scripts

  • Compile python source file to code object

    char * filename = "xxx.py";
    char * source = read_file( filename );
    PyObject *co = Py_CompileString( source, filename, Py_file_input );
    
  • Iterate code object, wrap bytecode of each code object as the following format

    0   JUMP_ABSOLUTE            n = 3 + len(bytecode)    
    3
    ...
    ... Here it's obfuscated bytecode
    ...
    
    n   LOAD_GLOBAL              ? (__armor__)
    n+3 CALL_FUNCTION            0
    n+6 POP_TOP
    n+7 JUMP_ABSOLUTE            0
    
  • Serialize code object and obfuscate it

    char *original_code = marshal.dumps( co );
    char *obfuscated_code = obfuscate_algorithm( original_code  );
    
  • Create wrapper script "xxx.py", ${obfuscated_code} stands for string constant generated in previous step.

    __pyarmor__(__name__, b'${obfuscated_code}')
    

Run or Import Obfuscated Python Scripts

When import or run this wrapper script, the first statement is to call a CFunction:

int __pyarmor__(char *name, unsigned char *obfuscated_code) 
{
  char *original_code = resotre_obfuscated_code( obfuscated_code );
  PyObject *co = marshal.loads( original_code );
  PyObject *mod = PyImport_ExecCodeModule( name, co );
}

This function accepts 2 parameters: module name and obfuscated code, then

  • Restore obfuscated code
  • Create a code object by original code
  • Import original module (this will result in a duplicated frame in Traceback)

Run or Import Obfuscated Bytecode

After module imported, when any code object in this module is called first time, from the wrapped bytecode descripted in above section, we know

  • First op JUMP_ABSOLUTE jumps to offset n

  • At offset n, the instruction is to call a PyCFunction. This function will restore those obfuscated bytecode between offset 3 and n, and place the original bytecode at offset 0

  • After function call, the last instruction jumps back to offset 0. The real bytecode is now executed

Refer to Pyarmor