Skip to content

index-oss/Spiderweb-Crawler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Spiderweb-Crawler

Web Reconnaissance Crawler

Project Overview

Web Reconnaissance Crawler is an ethical, non-intrusive security tool designed to assist authorized penetration testers during the reconnaissance phase of security assessments.

The tool focuses on mapping the application attack surface by discovering endpoints, parameters, forms, and basic security misconfigurations — without performing active exploitation.

⚠️ This project is intended strictly for educational purposes and authorized testing only.


Why This Project (Interview Perspective)

Reconnaissance is the first and most critical phase of penetration testing

Manual recon is time-consuming and error-prone

This tool automates safe information gathering, helping testers plan effective manual testing


Key Features

  1. Smart Crawling

Same-domain crawling only

Crawl depth control

Duplicate URL prevention

Optional robots.txt compliance

Request rate limiting

  1. Endpoint Discovery

Discovers reachable URLs

Identifies dynamic endpoints with parameters

Differentiates GET and POST endpoints

  1. Parameter & Form Mapping

Extracts URL parameters

Detects HTML forms

Maps input fields, methods, and actions

  1. Passive Security Checks (Non-Exploitative)

Missing security headers detection

HTTP → HTTPS redirection issues

Directory exposure indicators (status-based)

Input reflection identification (no payloads)

  1. Technology Fingerprinting

Server header analysis

Framework and CMS identification (heuristic-based)

JavaScript library detection

  1. Reporting

Structured JSON output

CSV export for manual review

Sitemap-style endpoint listing


System Architecture (High Level)

Target URL ↓ Crawler Controller ├── URL Queue & Depth Manager ├── HTTP Request Engine (Rate Limited) ├── Response Analyzer │ ├── Link Extractor │ ├── Form & Parameter Parser │ └── Header Analyzer ├── Passive Security Check Module └── Report Generator (JSON / CSV)


Tech Stack

Language: Python 3

Libraries: requests, BeautifulSoup, urllib3

Optional: Playwright (for JS-rendered pages)


Ethical Design Choices

No payload injection

No vulnerability exploitation

Scope-limited crawling

Explicit authorization disclaimer


How to Explain in Interview (Ready Line)

“This tool automates the reconnaissance phase of penetration testing by safely mapping endpoints, parameters, and security misconfigurations, allowing testers to focus on deeper manual analysis.”


Limitations (Honest Cons)

Does not perform active vulnerability exploitation

Limited JavaScript rendering without headless browser

Risk detection is heuristic, not confirmation-based


Future Improvements

Authenticated crawling support

Visual dashboard

API endpoint classification

Risk scoring based on exposure


Status: Interview-ready • Ethical • Practical

About

A web crawler designed to assist penetration testers by mapping attack surfaces, discovering endpoints, parameters, and common security misconfigurations.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages