Distributing Python applications is notoriously difficult because end users must have the Python interpreter installed along with all required library dependencies. To solve this, developers use packaging frameworks like PyInstaller, py2exe, or cx_Freeze to bundle everything into a single, convenient executable (.exe) file.
However, there is a common misconception that these tools encrypt or secure your code. In reality, packed executables are merely self-extracting archive wrappers. In this article, we’ll dive into the anatomy of a packed PyInstaller executable and show how to extract the original source code.
Anatomy of a PyInstaller Executable
When PyInstaller processes a Python project, it creates a structured bundle containing:
- A Compiled Bootloader: A native system executable compiled in C that manages application startup.
- Dependency Libraries: Dynamic link libraries (like python39.dll and C extension modules) required by your program.
- A PYZ Archive: A zlib-compressed archive (PyInstaller's own format, not a standard ZIP file) containing the compiled bytecode of the pure-Python modules your program imports.
- Compiled Entry Script: The main Python script compiled into bytecode and embedded outside the PYZ archive.
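As a rough illustration of how this bundle can be recognized, the sketch below searches a file for the 8-byte marker that PyInstaller writes in the archive trailer appended to the bootloader. The helper names `find_archive_cookie` and `looks_like_pyinstaller` are hypothetical, and the check is only a heuristic: a modified bootloader could use a different marker.

```python
from typing import Optional

# Marker bytes that PyInstaller places in the archive trailer ("cookie").
PYINSTALLER_MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"


def find_archive_cookie(data: bytes) -> Optional[int]:
    """Return the offset of the last marker occurrence, or None if absent."""
    offset = data.rfind(PYINSTALLER_MAGIC)
    return offset if offset != -1 else None


def looks_like_pyinstaller(path: str) -> bool:
    """Hypothetical helper: True if the file at `path` contains the marker."""
    with open(path, "rb") as handle:
        return find_archive_cookie(handle.read()) is not None
```

Calling `looks_like_pyinstaller("target_application.exe")` gives a quick first check before reaching for a full extractor.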
The PyInstaller Startup Process
When an end user clicks the .exe:
- The compiled bootloader starts.
- It creates a temporary directory in the system temp directory (usually named _MEIxxxxxx).
- It decompresses all embedded dependencies and library modules from its overlay space into this directory.
- It initializes the Python runtime inside the temporary folder.
- It reads and executes the compiled main script from the internal archive, loading packages from the local directory dynamically.
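From inside the running program, this unpacking step is visible through two attributes that PyInstaller's bootloader sets on the `sys` module: `sys.frozen` and `sys._MEIPASS` (the path of the temporary directory). The following is a minimal sketch; the function names and the `default` parameter are illustrative choices, not part of PyInstaller.

```python
import os
import sys


def is_frozen() -> bool:
    """True when running from a frozen bundle (sys.frozen is set)."""
    return bool(getattr(sys, "frozen", False))


def bundle_dir(default: str = ".") -> str:
    """Return the unpack directory, or `default` when run as plain Python.

    sys._MEIPASS is only present inside a PyInstaller bundle, so we fall
    back to an absolute version of `default` otherwise.
    """
    return getattr(sys, "_MEIPASS", os.path.abspath(default))
```

In a one-file build, `bundle_dir()` would typically return a path ending in a `_MEI`-prefixed folder, which is where bundled data files can be found at runtime.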
How to Extract PyInstaller Binaries
Because all file content is stored directly within the executable, unpacking is straightforward with purpose-built utilities.
1. Extracting the Archive with pyinstxtractor
A standard Python reverse engineering tool is pyinstxtractor.py (PyInstaller Extractor). It locates the archive appended to the executable, reads its table of contents, extracts the bundled files, unpacks the `.pyz` archive, and writes the compiled .pyc bytecode files to disk.
# Commands to extract a packed executable
python pyinstxtractor.py target_application.exe
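After extraction, it helps to get an inventory of the bytecode files that were written. The sketch below walks an output folder and lists every .pyc file. It assumes the extractor wrote its results to a folder such as `target_application.exe_extracted`; that folder name and the function name are assumptions, so adjust them to whatever your run produced.

```python
from pathlib import Path
from typing import List


def list_extracted_bytecode(extracted_dir: str) -> List[Path]:
    """Return all .pyc files under `extracted_dir`, sorted by path."""
    root = Path(extracted_dir)
    return sorted(p for p in root.rglob("*.pyc") if p.is_file())


# Example usage (the directory name is an assumption):
# for path in list_extracted_bytecode("target_application.exe_extracted"):
#     print(path)
```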
2. Restoring the Missing Magic Numbers
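When PyInstaller packages your code, it stores the entry script as a bare marshaled code object, without the header that normally begins a .pyc file. That header includes the magic number identifying the Python version that produced the bytecode. Decompilers like uncompyle6 or pycdc cannot parse a headerless file. Recent releases of pyinstxtractor restore the header automatically, so the manual steps here apply only when it is missing.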
To fix this, we must open the extracted main file and one of the standard library files (which preserves its header) in a hex editor:
- Copy the header from a valid library file: 16 bytes for Python 3.7 and later, 12 bytes for Python 3.3 through 3.6.
- Open the stripped main file and prepend these copied header bytes to the beginning of the file.
- Save the modified file as main_restored.pyc.
# Conceptual hex representation of header restoration (Python 3.9 layout)
Standard Bytecode File Header (.pyc):
Magic Number (4 bytes) | Flags (4 bytes) | Timestamp (4 bytes) | Size (4 bytes) | Marshal Blob...
[61 0D 0D 0A] [00 00 00 00] [00 00 00 00] [A3 04 00 00] [E3 00...]
Stripped PyInstaller Bytecode File (.pyc):
Marshal Blob... (Stripped header)
[E3 00...] <-- Decompiler fails here until we prepend [61 0D 0D 0A 00 00 00 00 00 00 00 00 A3 04 00 00]
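The same repair can be scripted instead of done by hand in a hex editor. The sketch below assumes the 16-byte header layout used by Python 3.7 and later, and it defaults to the magic number of the interpreter running the script, so it should be run with the same Python version that built the executable. The function name `restore_header` is hypothetical; the flags, timestamp, and size fields are zero-filled, which decompilers generally tolerate.

```python
import importlib.util

# Python 3.7+ header: magic, flags, timestamp, source size (4 bytes each).
HEADER_SIZE = 16


def restore_header(raw_code: bytes,
                   magic: bytes = importlib.util.MAGIC_NUMBER) -> bytes:
    """Prepend a zero-filled .pyc header to a bare marshaled code object."""
    if len(magic) != 4:
        raise ValueError("magic number must be exactly 4 bytes")
    header = magic + b"\x00" * (HEADER_SIZE - len(magic))
    return header + raw_code


# Example usage (file names are placeholders):
# with open("main", "rb") as src, open("main_restored.pyc", "wb") as dst:
#     dst.write(restore_header(src.read()))
```

To target a different Python version, pass the magic bytes copied from one of the extracted library files as the `magic` argument.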
3. Decompiling to Python Source
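With the headers restored, decompilers such as uncompyle6 or pycdc can parse the files and reconstruct readable Python source. Results depend on the Python version that produced the bytecode, since decompiler support for newer versions may be incomplete, and comments and original formatting are not recoverable because they are never stored in bytecode.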
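When a decompiler cannot handle a particular file, the standard library can still show what the bytecode does. The sketch below is a fallback, not a decompiler: it skips the header, loads the code object with `marshal`, and prints a disassembly with `dis`. It assumes a 16-byte header and must be run under the same Python version that produced the file, because the marshal format is version specific. Only load files you are willing to treat as untrusted input, since `marshal` is not hardened against malicious data. The function names are illustrative.

```python
import dis
import marshal
from types import CodeType


def load_code(pyc_bytes: bytes, header_size: int = 16) -> CodeType:
    """Skip the .pyc header and unmarshal the top-level code object."""
    code = marshal.loads(pyc_bytes[header_size:])
    if not isinstance(code, CodeType):
        raise ValueError("payload is not a code object")
    return code


def disassemble(pyc_bytes: bytes) -> str:
    """Return a human-readable disassembly of the top-level code object."""
    return dis.Bytecode(load_code(pyc_bytes)).dis()


# Example usage (file name is a placeholder):
# with open("main_restored.pyc", "rb") as handle:
#     print(disassemble(handle.read()))
```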