Packed Python Executables: Extracting Source from PyInstaller

Distributing Python applications is difficult because end users normally need a Python interpreter installed along with every required library. To avoid this, developers use packaging tools such as PyInstaller, py2exe, or cx_Freeze. These tools do not compile Python to native code. They bundle the interpreter, the application's compiled bytecode, and its dependencies into a distributable package, which in PyInstaller's one-file mode is a single executable (.exe on Windows).

However, there is a common misconception that these tools encrypt or secure your code. In reality, packed executables are merely self-extracting archive wrappers. In this article, we’ll dive into the anatomy of a packed PyInstaller executable and show how to recover readable source code from the bytecode it contains.

Anatomy of a PyInstaller Executable

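When PyInstaller processes a Python project, it creates a structured bundle containing:

  1. A compiled bootloader, the native program that runs first.
  2. An archive appended to the bootloader (the overlay), holding the Python shared library, binary extension modules, data files, and the compiled entry-point script.
  3. A compressed `.pyz` archive inside that overlay, holding the compiled bytecode of the pure-Python modules.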

The PyInstaller Startup Process

When an end user launches a one-file executable:

  1. The compiled bootloader starts.
  2. It creates a temporary directory in the system temp directory (usually named _MEIxxxxxx).
  3. It extracts the embedded shared libraries, extension modules, and data files from its overlay into this directory.
  4. It initializes the Python runtime from the files in the temporary folder.
  5. It reads and executes the compiled main script from the internal archive. Pure-Python modules are imported directly from the embedded `.pyz` archive, while binary extensions load from the temporary directory.
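
From inside a running application, the result of this startup sequence is visible through two attributes the PyInstaller bootloader sets: `sys.frozen` and `sys._MEIPASS` (the path of the unpacked bundle directory). The following is a minimal sketch; `is_frozen` and `resource_root` are hypothetical helper names chosen for illustration, not part of PyInstaller, and the fallback to the current working directory is an assumption made to keep the example simple.

```python
import sys
from pathlib import Path


def is_frozen() -> bool:
    """Return True when running from a PyInstaller bundle.

    The bootloader sets sys.frozen and sys._MEIPASS before the
    main script runs; a normal interpreter has neither attribute.
    """
    return bool(getattr(sys, "frozen", False)) and hasattr(sys, "_MEIPASS")


def resource_root() -> Path:
    """Return the directory holding the bundled files (hypothetical helper)."""
    if is_frozen():
        # In one-file mode this is the temporary _MEIxxxxxx directory.
        return Path(sys._MEIPASS)
    # Assumption for this sketch: unfrozen runs use the working directory.
    return Path.cwd()


if __name__ == "__main__":
    print("frozen:", is_frozen())
    print("resource root:", resource_root())
```

Printing `resource_root()` from a packed build is a quick way to see where the bootloader unpacked its files while the program is still running.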

How to Extract PyInstaller Binaries

Because all file content is stored directly within the executable, unpacking is straightforward with dedicated utilities.

1. Extracting the Archive with pyinstxtractor

A standard tool for this job is pyinstxtractor.py (PyInstaller Extractor). It locates the archive appended to the executable, reads its table of contents, extracts each embedded file, then unpacks the `.pyz` archive and writes the compiled bytecode (.pyc) files to disk. The output goes to a directory named after the input file with an `_extracted` suffix.

# Command to extract a packed executable
python pyinstxtractor.py target_application.exe
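
To make the first step of that extraction less opaque, here is a hedged sketch of how the embedded archive can be located. It assumes the cookie layout used by PyInstaller 2.1 and later (an 8-byte magic followed by the package length, table-of-contents offset, table-of-contents length, Python version, and Python library name); older builds use a shorter layout that this sketch does not handle. `read_cookie` is a hypothetical function name, not part of pyinstxtractor.

```python
import struct

# Magic bytes that mark the archive cookie near the end of the file.
COOKIE_MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"
# Assumed layout (PyInstaller 2.1+), big-endian:
# magic, package length, TOC offset, TOC length, Python version, library name.
COOKIE_FORMAT = "!8sIIii64s"
COOKIE_SIZE = struct.calcsize(COOKIE_FORMAT)  # 88 bytes


def read_cookie(data: bytes) -> dict:
    """Locate and decode the archive cookie in the raw bytes of an executable."""
    pos = data.rfind(COOKIE_MAGIC)  # the cookie sits at the end, so search backwards
    if pos == -1:
        raise ValueError("no PyInstaller cookie found")
    if pos + COOKIE_SIZE > len(data):
        raise ValueError("cookie is truncated")
    _, pkg_len, toc_offset, toc_len, pyver, pylib = struct.unpack_from(
        COOKIE_FORMAT, data, pos
    )
    return {
        "cookie_offset": pos,
        "package_length": pkg_len,
        "toc_offset": toc_offset,
        "toc_length": toc_len,
        "python_version": pyver,  # e.g. 38 for Python 3.8, 310 for Python 3.10
        "python_library": pylib.rstrip(b"\x00").decode("ascii", "replace"),
    }
```

A typical call would be `read_cookie(open("target_application.exe", "rb").read())`. The reported Python version matters later, because it determines which magic number and which decompiler apply.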

2. Restoring the Missing Magic Numbers

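PyInstaller stores the entry-point script as a bare marshalled code object, without the header that a normal .pyc file carries. That header begins with the magic number identifying the Python version that produced the bytecode. Decompilers like uncompyle6 or pycdc rely on it, so they cannot parse a headerless file as-is. Recent releases of pyinstxtractor add the header automatically; with older releases the fix is manual.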

To fix this by hand, open the extracted main file and one of the extracted standard library files (which still has its header) in a hex editor, then copy the header bytes from the library file to the start of the main file:

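# Conceptual hex representation of header restoration
# (12-byte layout used by Python 3.3 to 3.6; the magic shown is Python 3.6)
Standard Bytecode File Header (.pyc):
Magic Number (4 bytes) | Timestamp (4 bytes) | Size (4 bytes) | Marshal Blob...
[33 0D 0D 0A]            [00 00 00 00]        [A3 04 00 00]    [E3 01...]

Stripped PyInstaller Bytecode File (.pyc):
Marshal Blob... (Stripped header)
[E3 01...]  <-- Decompiler fails here until we prepend [33 0D 0D 0A 00 00 00 00 A3 04 00 00]

Python 3.7 and later insert a 4-byte flags field after the magic number, giving a 16-byte header. The magic number must match the Python version that built the executable.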
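
The same repair can be scripted instead of done in a hex editor. The sketch below rests on several assumptions: the script runs under the same Python version that built the executable (so `importlib.util.MAGIC_NUMBER` is the right magic by default), and zeroed flags, timestamp, and size fields are acceptable to the decompiler. `build_pyc_header` and `restore_header` are hypothetical names used for illustration.

```python
import importlib.util
import struct
import sys


def build_pyc_header(magic: bytes = importlib.util.MAGIC_NUMBER,
                     py_version: tuple = sys.version_info[:2]) -> bytes:
    """Return a placeholder .pyc header for the given magic number.

    Python 3.7+ uses 16 bytes (magic, flags, timestamp, source size);
    Python 3.3 to 3.6 use 12 bytes (magic, timestamp, source size).
    All fields after the magic are zeroed in this sketch.
    """
    if len(magic) != 4:
        raise ValueError("magic number must be exactly 4 bytes")
    extra_fields = 3 if tuple(py_version) >= (3, 7) else 2
    return magic + struct.pack("<" + "I" * extra_fields, *([0] * extra_fields))


def restore_header(raw_code: bytes, header: bytes) -> bytes:
    """Prepend the header unless the data already starts with the same magic."""
    if raw_code[:4] == header[:4]:
        return raw_code  # header already present, leave the file alone
    return header + raw_code
```

For a target built with a different interpreter, pass its magic and version explicitly, for example `build_pyc_header(b"\x55\x0d\x0d\x0a", (3, 8))` for Python 3.8. The result of `restore_header` can then be written back to disk with a `.pyc` extension.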

3. Decompiling to Python Source

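With the headers restored, decompilers such as uncompyle6 or pycdc can parse the files and reconstruct Python source. How well this works depends on the Python version that produced the bytecode, since each decompiler supports only a limited range of versions. Names and string literals come back intact, but comments and the original formatting are not stored in bytecode and cannot be recovered.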
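
Before handing a file to a decompiler, it can help to confirm that the repaired .pyc actually holds a valid code object. The sketch below uses only the standard library and assumes it runs under the same Python version that produced the bytecode, since `marshal` data is not portable across versions. The 16-byte default header size assumes Python 3.7 or later, and the function names are hypothetical.

```python
import dis
import marshal
import types


def load_code(pyc_bytes: bytes, header_size: int = 16) -> types.CodeType:
    """Skip the .pyc header and unmarshal the top-level code object."""
    obj = marshal.loads(pyc_bytes[header_size:])
    if not isinstance(obj, types.CodeType):
        raise ValueError("payload is not a code object")
    return obj


def summarize(code: types.CodeType) -> dict:
    """Collect details that survive compilation and show the file is intact."""
    nested = [c.co_name for c in code.co_consts if isinstance(c, types.CodeType)]
    return {
        "filename": code.co_filename,   # original source path recorded at compile time
        "names": list(code.co_names),   # globals, imports, and attributes referenced
        "nested_code": nested,          # functions and classes defined at top level
    }


def disassemble(code: types.CodeType) -> str:
    """Return a human-readable bytecode listing as a fallback view."""
    return dis.Bytecode(code).dis()
```

If `summarize` shows the expected module names and function definitions, the header and payload are consistent. When no decompiler supports the target version, the listing from `disassemble` is still readable enough to follow the program's logic.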