  ____ _               _     _ ____  
 / ___| |__   ___  ___| |_  | / ___| 
| |  _| '_ \ / _ \/ __| __| | \___ \ 
| |_| | | | | (_) \__ \ |_ _| |___) |
 \____|_| |_|\___/|___/\__(_)_|____/ 

made by lazy_sharaf

Python 3.7+ | Regex | Entropy Math | Subdomain Crawling
soraf@kali:~/tools/GhostJS$ python3 ghostjs.py target.com --max-depth 2
[+] Initializing GhostJS Scanner v1.2
[+] Target: target.com | Depth: 2

[~] Crawling (Level 1)... Found 14 links
[~] Extracting JS assets... Found 6 .js files
[~] Applying Entropy Analysis & Regex Pattern Matching...

[!] HIGH [AWS_KEY] found in https://target.com/static/js/main.0a4b.js
Line 104: const AWS_ACCESS_KEY = "AKIAIOSFODNN7EXAMPLE";

[!] MED [JWT_TOKEN] found in https://target.com/assets/auth.js
Line 22: authorization: "Bearer eyJhbGciOiJIUzI1Ni..."

[✓] Scan Complete!
Stats: 2 secrets discovered across 6 scripts.
Report saved to: ghostjs_reports/target.json

01. The Objective

Modern web applications rely heavily on large, obfuscated JavaScript bundles, and developers often leak sensitive infrastructure data inside them by accident. I built GhostJS to automate the tedious manual work of crawling a site and analyzing its JavaScript, so researchers can quickly hunt for AWS keys, database connection strings, and exposed APIs across large target scopes.

02. Technical Architecture

GhostJS is a command-line tool written in Python. It crawls a domain recursively with multiple threads, down to a configurable depth (the --max-depth flag). As it extracts each JavaScript file, it applies two layers of analysis. First, it matches the source against an extensive dictionary of high-fidelity regular expressions. Second, it calculates the Shannon entropy of random-looking strings to flag likely encoded secrets (such as JWTs) that the regex layer misses.
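The two-layer analysis described above can be sketched roughly as follows. This is a minimal illustration, not GhostJS's actual source: the two patterns, the `QUOTED` candidate regex, the `HIGH_ENTROPY` label, the 4.0 threshold, and the function names are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical rule set for illustration; the real dictionary is larger.
PATTERNS = {
    "AWS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "JWT_TOKEN": re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"),
}

# Quoted runs of 20+ token-like characters are candidates for the entropy pass.
QUOTED = re.compile(r"""["']([A-Za-z0-9+/=_.-]{20,})["']""")


def shannon_entropy(text):
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def scan_source(source, threshold=4.0):
    """Return (line_number, label, match) tuples for one JavaScript source."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        matched = set()
        # Layer 1: known key formats.
        for label, pattern in PATTERNS.items():
            for m in pattern.finditer(line):
                matched.add(m.group(0))
                findings.append((lineno, label, m.group(0)))
        # Layer 2: high-entropy strings the patterns did not already catch.
        for m in QUOTED.finditer(line):
            token = m.group(1)
            if token not in matched and shannon_entropy(token) >= threshold:
                findings.append((lineno, "HIGH_ENTROPY", token))
    return findings
```

Running the regex layer first keeps the labels specific (AWS_KEY rather than a generic entropy hit), and the entropy layer only reports strings the patterns left unclaimed.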

Detection Engine

Custom regex patterns combined with Shannon entropy analysis for token detection.

Target Vectors

API keys, JWT authentication tokens, AWS access credentials, private keys, hardcoded URIs.

Reporting

Real-time colorized CLI output with batch export options to both JSON and structured Markdown.
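As a rough sketch of the JSON and Markdown export, the snippet below serializes a list of findings both ways. The field names (`severity`, `type`, `url`, `line`), the report layout, and the function names are assumptions for illustration; the actual GhostJS report schema is not shown on this page. The sample findings mirror the demo output above.

```python
import json

# Sample findings taken from the demo run; the dict keys are hypothetical.
FINDINGS = [
    {"severity": "HIGH", "type": "AWS_KEY",
     "url": "https://target.com/static/js/main.0a4b.js", "line": 104},
    {"severity": "MED", "type": "JWT_TOKEN",
     "url": "https://target.com/assets/auth.js", "line": 22},
]


def to_json(target, findings):
    """Serialize findings as a JSON document with a small summary header."""
    report = {"target": target, "total": len(findings), "findings": findings}
    return json.dumps(report, indent=2)


def to_markdown(target, findings):
    """Render findings as a Markdown table, one row per finding."""
    lines = [
        f"# GhostJS report: {target}",
        "",
        "| Severity | Type | URL | Line |",
        "| --- | --- | --- | --- |",
    ]
    for f in findings:
        lines.append(f"| {f['severity']} | {f['type']} | {f['url']} | {f['line']} |")
    return "\n".join(lines) + "\n"
```

The sketch leaves the matched secret value out of both formats, which is one way to avoid copying live credentials into report files.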

03. Challenges & Solutions

A major hurdle when scanning bundled JavaScript is the sheer number of false positives: benign random strings that trigger alerts. I addressed this by tuning my regex sets to target specific key formats (e.g. the AKIA... prefix for AWS) and by adjusting the entropy thresholds to separate harmless hashes from high-value production secrets.
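One way to picture the threshold tuning: a lowercase hex digest uses only 16 distinct symbols, so its Shannon entropy cannot exceed 4 bits per character, while a base64-style secret can go higher. The sketch below uses that gap. The 4.5 threshold, the digest-length check, the category names, and `classify` itself are illustrative assumptions, not the values GhostJS ships with.

```python
import math
import re
from collections import Counter

AWS_KEY_ID = re.compile(r"^AKIA[0-9A-Z]{16}$")
HEX_ONLY = re.compile(r"^[0-9a-fA-F]+$")
DIGEST_LENGTHS = (32, 40, 64)  # MD5, SHA-1, SHA-256 hex lengths


def shannon_entropy(text):
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def classify(token, threshold=4.5):
    """Sort a candidate string into secret, hash, suspect, or benign."""
    # A strict format match is trusted regardless of entropy.
    if AWS_KEY_ID.match(token):
        return "secret"
    # Hex strings of digest length are most likely content hashes.
    if HEX_ONLY.match(token) and len(token) in DIGEST_LENGTHS:
        return "hash"
    # Everything else has to clear the entropy threshold to be reported.
    if shannon_entropy(token) >= threshold:
        return "suspect"
    return "benign"
```

Raising the threshold trades recall for fewer alerts, so the format-specific patterns stay in front as the high-confidence path.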
