Labyrinx — Python Code Obfuscation & Protection

Python Obfuscation Techniques: From Basic to Advanced

Six levels of Python code obfuscation explained — what each layer does, how it works, and what it protects against.


What Is Code Obfuscation?

Obfuscation transforms readable code into functionally equivalent but hard-to-understand code. Unlike encryption (which requires decryption to execute), obfuscated code is still valid code — it just looks like gibberish to a human reader.

# Before obfuscation
def calculate_discount(price, user_tier):
    if user_tier == "premium":
        return price * 0.8
    return price

# After name obfuscation
def _x7f3a(_a1, _a2):
    if _a2 == _l_s(0x17):
        return _a1 * 0.8
    return _a1

Obfuscation is one part of a defense-in-depth strategy. Combine it with native compilation, AES-256 encryption, and virtualization for effective protection.


Layer 1: Name Obfuscation

What it does: Renames every identifier — variables, functions, classes, methods, parameters — to meaningless tokens.

How it works:

  1. Parse the Python source into an AST (Abstract Syntax Tree)
  2. Walk the tree and collect all identifier names
  3. Replace each with a random token
  4. Track references so the same identifier gets the same token everywhere
  5. Emit the transformed AST back to source

What it protects against: Casual code readers, competitors opening your .py files, automated code plagiarism detection.

Limitations: Does nothing against decompilation. inspect module and stack traces may reveal original names.


Layer 2: String Encryption

What it does: Replaces every string literal with an encrypted blob decoded at runtime.

How it works:

  1. Scan the AST for all string literals
  2. Encrypt each string with AES-256 using an embedded key
  3. Replace "Hello World" with a call to the runtime decryptor
  4. At runtime, the decryptor uses the key (compiled into the .pyd) to decrypt on-demand

What stops working:

Before:

> strings.exe my_app.pyd | grep "api_key"
sk-proj-4f8a3b2c1d9e6f7a8b3c2d1e9f6a7b8c

After:

> strings.exe my_app.pyd | grep "api_key"
# Nothing. Every string is AES-256 encrypted at rest.

Layer 3: Control Flow Flattening

What it does: Restructures a function’s logic into a flat switch-case loop that jumps between blocks.

Before:

def process_order(order):
    validate(order)
    if order.total > 100:
        apply_discount(order)
    charge(order)
    send_receipt(order)

After:

def process_order(order):
    _s = 0
    while True:
        if _s == 0:
            validate(order)
            _s = 1 if order.total > 100 else 2
        elif _s == 1:
            apply_discount(order)
            _s = 2
        elif _s == 2:
            charge(order)
            _s = 3
        elif _s == 3:
            send_receipt(order)
            return

What this breaks: Linear reading of the code, automated control flow analysis, decompilers expecting structured if/else/for.


Layer 4: Module Encryption

What it does: Encrypts entire Python modules so they’re unreadable on disk.

Two variants:

What it protects against: Disk-based analysis, automated scanners looking for Python source patterns, accidental exposure in backups or file shares.


Layer 5: Code Virtualization

What it does: Replaces Python bytecode with a custom bytecode interpreted by a custom virtual machine.

How it works:

  1. Take critical functions (encryption, license checking, proprietary algorithms)
  2. Compile them to a custom instruction set — not x64, not Python bytecode — Labyrinx VM bytecode
  3. The instruction set is randomized per build: opcodes, register mappings, and encoding all shuffle
  4. Bundle the VM interpreter (compiled to native x64 via Cython) with the app
  5. At runtime, the VM interpreter reads and executes the custom bytecode

What makes this strong:


Layer 6: Anti-Debug

What it does: Detects when the process is being debugged and responds.

Techniques:

Honest note: Anti-debug is a cat-and-mouse game. Every check can be patched. The goal is making the attacker spend hours before they can start actual reverse engineering.


Layer 7: PYD Integrity Hashes

What it does: Each compiled .pyd file includes a cryptographic hash stored in a signed manifest. At load time, the runtime verifies each .pyd hasn’t been tampered with.

What it protects against: Patching (modifying a .pyd to disable checks), substitution (replacing a protected .pyd with an unprotected version).


Putting It All Together

Layer What It Stops Performance Cost
1. Name Obfuscation Casual reading Near zero
2. String Encryption Static analysis, string extraction < 5%
3. Control Flow Flattening Linear code reading 5-15%
4. Module Encryption Disk analysis < 5%
5. Code Virtualization All standard RE tools 5-50x (virtualized code only)
6. Anti-Debug Debugger-based analysis Near zero
7. PYD Integrity Tampering, patching < 1%

Defense in depth. Each layer independently raises the cost. Together, they make RE a weeks-long project rather than a 30-second decompile.

← Back to Labyrinx  ·  How to Protect Python Code →