
How to Protect Python Source Code from Reverse Engineering

Python ships as readable text. Here's how to change that — from basic obfuscation to custom virtual machines.


The Problem

Python is distributed as source code. Even when bundled into an EXE, well-known extraction tools recover the .pyc bytecode, and decompilers turn it back into readable Python in seconds.

# An attacker extracts your source in under a minute:
> extractor your_app.exe
> decompiler extracted/your_code.pyc
# Output: readable Python. Function and variable names, string literals, docstrings. Only comments are lost.

This isn’t a Python flaw — it’s inherent to how interpreted languages work. Protection means adding layers that each make reverse engineering more expensive.
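To see why decompilation is so effective, it helps to look at what a compiled code object actually stores. The following is a minimal sketch using only the standard library; the function `check_license` and the literal `LICENSE-1234` are made up for illustration.

```python
import dis
import marshal

# Hypothetical source, standing in for a module you ship.
SOURCE = '''
def check_license(key):
    secret = "LICENSE-1234"
    return key == secret
'''

# Compile to a code object, the same kind of object a .pyc file stores.
module_code = compile(SOURCE, "your_code.py", "exec")

# Round-trip through marshal, the serialization format used inside .pyc files.
restored = marshal.loads(marshal.dumps(module_code))

# The function body is a nested code object among the module's constants.
func_code = next(c for c in restored.co_consts if hasattr(c, "co_code"))

print(func_code.co_name)      # the function's name
print(func_code.co_varnames)  # argument and local variable names
print(func_code.co_consts)    # constants, including the string literal
dis.dis(func_code)            # human-readable bytecode listing
```

Names and literals survive compilation intact, which is what gives a decompiler enough material to rebuild near-original source.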


The Four Levels of Protection

Level 1: Obfuscation

What it does: Makes the code harder for humans to read.

Techniques: name obfuscation, control flow flattening, and comment stripping.

What it stops: Casual reading. A curious developer opening your .py file.

What it doesn’t stop: Anyone with a debugger. Names are cosmetic.
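As a rough illustration of name obfuscation, the sketch below renames function arguments and local variables with the standard `ast` module. It is not how any particular tool works; the `RenameLocals` class and the sample function are hypothetical, and it assumes Python 3.9+ for `ast.unparse`. A real obfuscator also has to handle keyword-argument callers, globals, attributes, and nested scopes.

```python
import ast

class RenameLocals(ast.NodeTransformer):
    """Rename arguments and assigned variables to meaningless names."""

    def __init__(self):
        self.mapping = {}

    def _alias(self, name):
        # Hand out _v0, _v1, ... in order of first appearance.
        if name not in self.mapping:
            self.mapping[name] = f"_v{len(self.mapping)}"
        return self.mapping[name]

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        return node

    def visit_Name(self, node):
        # Rename assignment targets, and reads of names already renamed.
        # Builtins such as min() are never assigned, so they stay untouched.
        if isinstance(node.ctx, ast.Store) or node.id in self.mapping:
            node.id = self._alias(node.id)
        return node

# Hypothetical input function.
SOURCE = '''
def compute_discount(price, loyalty_years):
    rate = min(loyalty_years * 0.02, 0.2)
    discounted = price * (1 - rate)
    return discounted
'''

tree = RenameLocals().visit(ast.parse(SOURCE))
obfuscated = ast.unparse(tree)
print(obfuscated)

# The renamed code still behaves the same.
namespace = {}
exec(obfuscated, namespace)
print(namespace["compute_discount"](100, 5))
```

The logic is unchanged and fully visible; only the hints carried by the names are gone, which is why this level stops casual readers and little else.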

Level 2: Native Code Compilation

What it does: Converts Python to native x64 machine code.

Unlike EXE bundling (which just packages .pyc bytecode), native compilation translates Python to C, then compiles to a .pyd (Windows) or .so (Linux) file — a native shared library.

.py source → C translation → .c file → C compiler → .pyd (x64 machine code)

What it stops: Decompilation. There’s no bytecode to decompile. An attacker gets assembly, not Python.

What it doesn’t stop: String extraction tools still find every string literal. An experienced reverse engineer can trace the assembly back to the original logic with enough time.
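String extraction needs no knowledge of the machine code at all. The sketch below is a simplified stand-in for that kind of tool: it scans raw bytes for runs of printable ASCII. The `fake_binary` value and the URL inside it are invented for the example; a real .pyd would simply be read from disk with `open(path, "rb")`.

```python
import re

def extract_strings(data: bytes, min_length: int = 6) -> list[str]:
    """Return every run of printable ASCII at least min_length bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [match.decode("ascii") for match in re.findall(pattern, data)]

# Stand-in for a compiled module: opaque bytes surrounding an embedded literal.
fake_binary = (
    bytes([0x48, 0x89, 0xE5, 0x00, 0x01])
    + b"https://api.example.com/license"
    + bytes([0x00, 0xC3, 0x90])
)

print(extract_strings(fake_binary))
```

API endpoints, error messages, and license keys stored as plain literals all surface this way, which is the gap the next level addresses.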

Level 3: Encryption

What it does: Encrypts the sensitive parts of your code so they don’t exist in plaintext anywhere.

Techniques: string literal encryption and whole-module encryption.

What it stops: Static analysis tools, string extraction, anyone peeking at files on disk.

What it doesn’t stop: Memory dumps. When the module is loaded, the decrypted content exists in RAM.
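The idea behind string encryption can be sketched with the standard library alone. Note the substitution: to stay dependency-free, this example uses a toy SHA-256 counter keystream, not AES, and it should not be treated as production cryptography. The function names and the sample URL are hypothetical.

```python
import hashlib
import os

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + nonce + counter. NOT AES."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        out += block
        counter += 1
    return bytes(out[:length])

def encrypt_literal(text: str, key: bytes) -> bytes:
    """Build step: turn a string literal into an opaque blob."""
    nonce = os.urandom(16)
    data = text.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))

def decrypt_literal(blob: bytes, key: bytes) -> str:
    """Run time: recover the literal just before it is used."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")

key = os.urandom(32)
blob = encrypt_literal("https://api.example.com/license", key)

print(blob.hex())                  # what a file scan sees
print(decrypt_literal(blob, key))  # what exists in RAM once decrypted
```

The second `print` is the weakness described above: once decrypted, the plaintext is an ordinary Python string in memory, and the key has to ship with the program in some form.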

Level 4: Virtualization

What it does: Replaces Python bytecode with a custom bytecode that only a custom virtual machine understands.

The VM’s instruction set is randomized per build — the opcode for “add” in build #1 is 0x4F; in build #2 it’s 0x93. An attacker who reverses one build learns nothing about the next.

What it stops: All standard Python reverse engineering tools. The bytecode format is unknown.

What it doesn’t stop: A dedicated reverse engineer who traces the VM interpreter itself.
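Per-build opcode randomization can be illustrated with a toy stack machine. This is a sketch under simplifying assumptions, not the design of any real protector: the four-instruction set, the `build_seed` parameter, and the single-byte operands are all invented for the example.

```python
import random

OPERATIONS = ["PUSH", "ADD", "MUL", "RET"]

def make_opcode_table(build_seed: int) -> dict[str, int]:
    """Assign each operation a distinct random byte value for this build."""
    rng = random.Random(build_seed)
    codes = rng.sample(range(256), len(OPERATIONS))
    return dict(zip(OPERATIONS, codes))

def assemble(program, table) -> bytes:
    """Encode (operation, operand) pairs using this build's opcode table."""
    out = bytearray()
    for op, arg in program:
        out.append(table[op])
        if op == "PUSH":
            out.append(arg)  # single-byte operand, 0..255
    return bytes(out)

def run(bytecode: bytes, table) -> int:
    """Interpret bytecode with the matching table."""
    decode = {code: op for op, code in table.items()}
    stack, pc = [], 0
    while pc < len(bytecode):
        op = decode[bytecode[pc]]
        pc += 1
        if op == "PUSH":
            stack.append(bytecode[pc])
            pc += 1
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "RET":
            return stack.pop()
    raise RuntimeError("program ended without RET")

# (2 + 3) * 4, assembled for two different builds.
PROGRAM = [("PUSH", 2), ("PUSH", 3), ("ADD", None),
           ("PUSH", 4), ("MUL", None), ("RET", None)]

table_a = make_opcode_table(build_seed=1)
table_b = make_opcode_table(build_seed=2)
code_a = assemble(PROGRAM, table_a)
code_b = assemble(PROGRAM, table_b)

print(code_a.hex(), run(code_a, table_a))
print(code_b.hex(), run(code_b, table_b))
```

Both builds compute the same result from different byte sequences. The `run` function is also the weak point noted above: whoever recovers the interpreter's dispatch logic for a given build recovers that build's table.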


What No Tool Can Do

Claim | Reality
“Unbreakable protection” | Nothing is unbreakable. Any code can be reversed.
“Hide all logic” | The CPU must execute the logic. A debugger can trace it.
“Prevent piracy” | Protection makes copying harder, not impossible.
“Zero performance impact” | Obfuscation and encryption add overhead.

The realistic goal: more expensive to break than to build.


The Labyrinx Approach: All Four Levels

Labyrinx chains all four levels in a single build pipeline:

1. AST Transform → Name obfuscation, control flow flattening
2. String encryption → AES-256 encrypt all string literals
3. Module encryption → AES-256 encrypt the module payload
4. Native compilation → .c → MSVC → .pyd (x64 machine code)
5. VM pass → Convert critical code to custom VM bytecode
6. Anti-debug → Embed debugger detection at multiple points
7. Output folder → python313.dll + .pyd files + launcher

Each layer independently raises the cost of reverse engineering. Together, they make it prohibitively expensive for most attackers.
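For a sense of what the anti-debug step in the list above involves, here is a minimal sketch of one possible check. It only detects Python-level tracers installed through `sys.settrace`, which is how `pdb` and similar debuggers traditionally hook in; native debuggers attached at the OS level need platform-specific checks that are not shown. The `guarded` decorator and `load_license_rules` function are hypothetical names.

```python
import functools
import sys

def tracer_attached() -> bool:
    """True if a Python-level trace function (e.g. pdb) is installed."""
    return sys.gettrace() is not None

def guarded(func):
    """Refuse to run the wrapped function while a tracer is active."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if tracer_attached():
            raise RuntimeError("debugger detected")
        return func(*args, **kwargs)
    return wrapper

@guarded
def load_license_rules():
    # Placeholder for sensitive logic.
    return {"max_seats": 5}

if __name__ == "__main__":
    print(load_license_rules())
```

A single check like this is easy to patch out, which is presumably why the pipeline places detection at multiple points rather than one.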


Getting Started

Immediate (free): Use name obfuscation and comment stripping. Better than nothing and takes seconds.

Next step (free): Use a Python-to-C compiler to translate your core modules to C and build them into .pyd files. Even without obfuscation, native code is far harder to reverse than bytecode. Requires a C compiler (MSVC on Windows).

Full protection: Labyrinx automates all four levels. A one-click build produces an output folder that ships to customers. The Freemium tier is free; Enterprise unlocks the full 6-level pipeline.

