Python Reverse Engineering Basics: Bytecode & Decompilation

Python is often viewed as a simple scripting language, but its compilation model is surprisingly elegant. When you run a script, the interpreter converts the source code into bytecode, which is executed by the Python Virtual Machine (PVM). Understanding how this bytecode works is the key to reverse engineering protected Python applications.

In this introductory guide, we’ll explore the foundation of Python reverse engineering, standard decompilation tools, and how obfuscated packages are analyzed by security experts.

The Structure of Python Bytecode

Python bytecode is a stream of opcodes and their arguments, which CPython caches on disk in compiled .pyc files. A .pyc file contains:

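  1. A Magic Number: 4 bytes identifying the Python version whose compiler produced the file.
  2. Modification Time & Size: Metadata recording the source file's last-modification timestamp and size, which the interpreter uses to detect a stale cache. Since Python 3.7 a 4-byte flags field precedes these values, and hash-based .pyc files store a hash of the source instead.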
  3. The Marshal Object: The serialized Code Object containing variable tables, constant arrays, and the raw bytecode instruction stream.
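
The layout above can be checked with nothing but the standard library. The following is a minimal sketch, assuming the 16-byte header used by CPython 3.7 and later; the helper name read_pyc and the throwaway demo module are illustrative, not part of any tool.

```python
import importlib.util
import marshal
import py_compile
import struct
import tempfile
from pathlib import Path


def read_pyc(path):
    """Split a .pyc file into header fields and its unmarshalled code object.

    Assumes the 16-byte header layout of CPython 3.7 and later.
    """
    data = Path(path).read_bytes()
    flags = struct.unpack("<I", data[4:8])[0]
    header = {"magic": data[:4], "flags": flags}
    if flags & 0x1:
        # Hash-based .pyc: the next 8 bytes are a hash of the source file.
        header["source_hash"] = data[8:16]
    else:
        # Timestamp-based .pyc: source mtime and source size, 4 bytes each.
        mtime, size = struct.unpack("<II", data[8:16])
        header["source_mtime"] = mtime
        header["source_size"] = size
    # Everything after the header is the marshalled module code object.
    code = marshal.loads(data[16:])
    return header, code


# Demo: compile a throwaway module, then take the resulting .pyc apart.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "demo.py"
    src.write_text("def greet(name):\n    return 'hello ' + name\n")
    pyc_path = py_compile.compile(str(src), cfile=str(Path(tmp) / "demo.pyc"))
    header, code = read_pyc(pyc_path)

# The magic number should match the running interpreter's own.
print(header["magic"] == importlib.util.MAGIC_NUMBER)
print(header)
print(code.co_names, code.co_varnames)
```

Note that marshal can only load code objects written by the same Python version, so a .pyc from another release has to be read with a matching interpreter.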

We can dissect a Python function's internal bytecode instructions using the built-in dis module:

# Python disassembly demonstration
import dis

def verify_serial(key):
    if key == "KCRACKER-SECURE-KEY":
        return True
    return False

dis.dis(verify_serial)

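On CPython 3.6 through 3.9 the disassembly looks like the listing below; newer releases emit different opcodes and offsets. Each row is one stack operation of the virtual machine, and the leftmost column is the source line number, shown here as if the function started on line 1: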

  2           0 LOAD_FAST                0 (key)
              2 LOAD_CONST               1 ('KCRACKER-SECURE-KEY')
              4 COMPARE_OP               2 (==)
              6 POP_JUMP_IF_FALSE       12

  3           8 LOAD_CONST               2 (True)
             10 RETURN_VALUE

  4     >>   12 LOAD_CONST               3 (False)
             14 RETURN_VALUE
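
The listing shows why a hard-coded comparison is weak protection: the secret sits in the constant table in plain form. As a rough sketch, the same information can be pulled out programmatically with dis.get_instructions; the helper name string_constants is made up for this example.

```python
import dis


def verify_serial(key):
    if key == "KCRACKER-SECURE-KEY":
        return True
    return False


def string_constants(func):
    """Return the string constants that the function's bytecode loads."""
    found = []
    for instr in dis.get_instructions(func):
        # argval holds the resolved constant for constant-loading opcodes.
        if instr.opname.startswith("LOAD_CONST") and isinstance(instr.argval, str):
            found.append(instr.argval)
    return found


print(string_constants(verify_serial))  # ['KCRACKER-SECURE-KEY']
```

Walking the instructions rather than reading co_consts directly keeps the order in which constants are used, which helps when matching them against nearby COMPARE_OP instructions.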

Standard Decompilation Toolchains

To recover source code from standard .pyc files, reverse engineers rely on automated decompilers. These tools parse the compiled code objects, rebuild an abstract syntax tree (AST) from the instruction stream, and emit approximately equivalent source code.
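
Before any reconstruction can happen, a decompiler has to find every code object in the module, since functions and classes are stored as constants nested inside their parent. The sketch below shows only that traversal step, under the assumption that the code object came from compile() rather than from a file; walk_code is an illustrative name and this is not how any particular decompiler is implemented.

```python
import types

SOURCE = '''
def outer(x):
    def inner(y):
        return x + y
    return inner
'''


def walk_code(code, depth=0):
    """Yield (depth, code_object) for a code object and everything nested in it."""
    yield depth, code
    for const in code.co_consts:
        # Nested functions, classes and comprehensions live in co_consts.
        if isinstance(const, types.CodeType):
            yield from walk_code(const, depth + 1)


module_code = compile(SOURCE, "<demo>", "exec")
names = [(depth, code.co_name) for depth, code in walk_code(module_code)]

for depth, name in names:
    print("  " * depth + name)
```

Each yielded code object can then be handed to dis.dis or to a decompiler's own instruction parser.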

Analyzing Obfuscated Modules

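When standard decompilers run into obfuscated scripts, they often crash or return incomplete output. Common protections put several barriers in the analyst's way, such as remapped opcode tables, virtualized instruction streams, and deliberately tangled control flow.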

Solving these challenges requires resolving custom opcode mappings, tracing virtual instruction pointers in debuggers, and cleaning control-flow graphs manually or with custom solver scripts.
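
To make the first of those steps concrete, here is a toy sketch of undoing an opcode remapping. It assumes the 2-byte instruction format of CPython 3.6 and later, and it assumes the mapping table is already known; the shuffled table, the seed, and the function names are all hypothetical. In practice the table has to be recovered first, for example by comparing the protected build's bytecode for a known function against a stock interpreter's.

```python
import random


def make_opcode_table(seed):
    """Build a hypothetical shuffled opcode table (standard -> protected)."""
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table


def invert(table):
    """Turn a standard -> protected table into protected -> standard."""
    inverse = [0] * 256
    for standard, protected in enumerate(table):
        inverse[protected] = standard
    return inverse


def remap(bytecode, table):
    """Translate the opcode byte of every 2-byte instruction through table."""
    out = bytearray(bytecode)
    for offset in range(0, len(out), 2):
        out[offset] = table[out[offset]]  # argument byte is left untouched
    return bytes(out)


def check(key):
    return key == "demo"


original = check.__code__.co_code
table = make_opcode_table(seed=1234)
scrambled = remap(original, table)           # stand-in for protected bytecode
restored = remap(scrambled, invert(table))   # what the analyst recovers

print(restored == original)  # True
```

On Python 3.8 and later, the restored bytes can be attached to a code object with check.__code__.replace(co_code=restored) and passed to dis.dis for inspection.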