Labyrinx — Python Code Obfuscation & Protection

Python Obfuscation Techniques: From Basic to Advanced

Six levels of Python code obfuscation explained — what each layer does, how it works, and what it protects against.

What Is Code Obfuscation?

Obfuscation transforms readable code into functionally equivalent but hard-to-understand code. Unlike encryption (which requires decryption to execute), obfuscated code is still valid code — it just looks like gibberish to a human reader.

# Before obfuscation
def calculate_discount(price, user_tier):
    if user_tier == "premium":
        return price * 0.8
    return price

# After name obfuscation
def _x7f3a(_a1, _a2):
    if _a2 == _l_s(0x17):
        return _a1 * 0.8
    return _a1

Obfuscation is one part of a defense-in-depth strategy. Combine it with native compilation, AES-256 encryption, and virtualization for effective protection.

Layer 1: Name Obfuscation

What it does: Renames every identifier — variables, functions, classes, methods, parameters — to meaningless tokens.

How it works:

Parse the Python source into an AST (Abstract Syntax Tree)
Walk the tree and collect all identifier names
Replace each with a random token
Track references so the same identifier gets the same token everywhere
Emit the transformed AST back to source

What it protects against: Casual code readers, competitors opening your .py files, automated code plagiarism detection.

Limitations: Does nothing against decompilation. inspect module and stack traces may reveal original names.

Layer 2: String Encryption

What it does: Replaces every string literal with an encrypted blob decoded at runtime.

How it works:

Scan the AST for all string literals
Encrypt each string with AES-256 using an embedded key
Replace "Hello World" with a call to the runtime decryptor
At runtime, the decryptor uses the key (compiled into the .pyd) to decrypt on-demand

What stops working:

String extraction tools find zero readable strings
Grep for API keys, URLs, SQL queries finds nothing
Static analysis tools that scan binaries for secrets

Before:

> strings.exe my_app.pyd | grep "api_key"
sk-proj-4f8a3b2c1d9e6f7a8b3c2d1e9f6a7b8c

After:

> strings.exe my_app.pyd | grep "api_key"
# Nothing. Every string is AES-256 encrypted at rest.

Layer 3: Control Flow Flattening

What it does: Restructures a function’s logic into a flat switch-case loop that jumps between blocks.

Before:

def process_order(order):
    validate(order)
    if order.total > 100:
        apply_discount(order)
    charge(order)
    send_receipt(order)

After:

def process_order(order):
    _s = 0
    while True:
        if _s == 0:
            validate(order)
            _s = 1 if order.total > 100 else 2
        elif _s == 1:
            apply_discount(order)
            _s = 2
        elif _s == 2:
            charge(order)
            _s = 3
        elif _s == 3:
            send_receipt(order)
            return

What this breaks: Linear reading of the code, automated control flow analysis, decompilers expecting structured if/else/for.

Layer 4: Module Encryption

What it does: Encrypts entire Python modules so they’re unreadable on disk.

Two variants:

Compression: Module is zlib-compressed and wrapped in a loader stub
AES-256: Module is AES-256 encrypted. Without the key (embedded in the .pyd runtime), it’s opaque ciphertext

What it protects against: Disk-based analysis, automated scanners looking for Python source patterns, accidental exposure in backups or file shares.

Layer 5: Code Virtualization

What it does: Replaces Python bytecode with a custom bytecode interpreted by a custom virtual machine.

How it works:

Take critical functions (encryption, license checking, proprietary algorithms)
Compile them to a custom instruction set — not x64, not Python bytecode — Labyrinx VM bytecode
The instruction set is randomized per build: opcodes, register mappings, and encoding all shuffle
Bundle the VM interpreter (compiled to native x64 via Cython) with the app
At runtime, the VM interpreter reads and executes the custom bytecode

What makes this strong:

No standard tooling works (IDA Pro, Ghidra understand x64 — not Labyrinx VM bytecode)
Per-build randomization (reversing one build teaches nothing about the next)
Register-indirect addressing (VM registers point to other registers that point to values)

Layer 6: Anti-Debug

What it does: Detects when the process is being debugged and responds.

Techniques:

Debugger presence check (Windows API)
Debug port detection
Hardware breakpoint detection (reads debug registers)
Timing checks (debugger stepping makes execution anomalously slow)
TLS callback (anti-debug code runs before the debugger can set breakpoints)

Honest note: Anti-debug is a cat-and-mouse game. Every check can be patched. The goal is making the attacker spend hours before they can start actual reverse engineering.

Layer 7: PYD Integrity Hashes

What it does: Each compiled .pyd file includes a cryptographic hash stored in a signed manifest. At load time, the runtime verifies each .pyd hasn’t been tampered with.

What it protects against: Patching (modifying a .pyd to disable checks), substitution (replacing a protected .pyd with an unprotected version).

Putting It All Together

Layer	What It Stops	Performance Cost
1. Name Obfuscation	Casual reading	Near zero
2. String Encryption	Static analysis, string extraction	< 5%
3. Control Flow Flattening	Linear code reading	5-15%
4. Module Encryption	Disk analysis	< 5%
5. Code Virtualization	All standard RE tools	5-50x (virtualized code only)
6. Anti-Debug	Debugger-based analysis	Near zero
7. PYD Integrity	Tampering, patching	< 1%

Defense in depth. Each layer independently raises the cost. Together, they make RE a weeks-long project rather than a 30-second decompile.

← Back to Labyrinx · How to Protect Python Code →