  ___                   ____                    _
 / _ \  ___  _ __  ___/ ___|_ __ __ ___      _| | ___ _ __
| | | |/ _ \| '_ \/ __| |   | '__/ _` \ \ /\ / / |/ _ \ '__|
| |_| | (_) | |_) \__ \ |___| | | (_| |\ V  V /| |  __/ |
 \___/ \___/| .__/|___/\____|_|  \__,_| \_/\_/ |_|\___|_|
            |_|
made by lazy_sharaf
Scaling web applications inevitably leads to "link rot" — dead ends, broken assets, and blocked paths that frustrate users and hurt SEO. OopsCrawler was born from a need to automate site health checks. Instead of clicking every link manually, OopsCrawler dives deep into a domain, validating every single anchor tag to find those "Oops" moments before a user does.
OopsCrawler navigates internal links with a recursive depth-first search (DFS). It uses BeautifulSoup4 to parse each page's DOM and Requests to check the status of every discovered URL. To keep long crawls pleasant, it shows an animated terminal spinner, and it rate-limits its own requests and whitelists trusted external domains so it behaves politely rather than like a malicious bot.
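The recursive DFS can be sketched roughly as below. This is a minimal stand-in, not the project's actual code: it uses only the standard library (an `html.parser` subclass in place of BeautifulSoup4, and an injected `fetch` callable in place of Requests), and all names here are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collects href values from anchor tags (stand-in for BeautifulSoup4)."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(base_url, html):
    """Return absolute URLs for every anchor found in the page."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]

def crawl(url, fetch, visited=None, domain=None):
    """Recursive DFS over internal links. `fetch(url)` returns (status, html)."""
    if visited is None:
        visited = set()
    if domain is None:
        domain = urlparse(url).netloc
    if url in visited:           # cycle guard: never re-enter a page
        return visited
    visited.add(url)
    status, html = fetch(url)
    if status >= 400:            # an "Oops" moment: dead or blocked link
        print(f"Oops: {url} returned {status}")
        return visited
    for link in extract_links(url, html):
        if urlparse(link).netloc == domain:   # stay on the target domain
            crawl(link, fetch, visited, domain)
    return visited
```

Injecting `fetch` keeps the traversal testable offline; in the real tool that callable would wrap a rate-limited Requests session.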
Handling massive single-page applications (SPAs) and cyclical link structures was the primary challenge. I maintain a "visited" URL set so the crawler never re-enters a page it has already processed, and I whitelist major external domains (like GitHub or LinkedIn) to cut unnecessary network noise while keeping the crawl focused strictly on the target domain's internal health.
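The filtering described above could look something like this sketch; the `should_check` helper and the whitelist contents are hypothetical names for illustration:

```python
from urllib.parse import urlparse

# Hypothetical whitelist of well-known external hosts we never re-validate.
WHITELIST = {"github.com", "www.github.com", "linkedin.com", "www.linkedin.com"}

def should_check(url, target_domain, visited):
    """Decide whether a discovered URL is worth a network request."""
    if url in visited:                 # visited set: prevents cyclical crawling
        return False
    host = urlparse(url).netloc
    if host in WHITELIST:              # trusted externals: skip, reduce noise
        return False
    return host == target_domain       # follow only the target's internal links
```

Checking the visited set before the network call means each page costs at most one request, no matter how tangled the site's link graph is.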