Fetches web pages and documents (PDF, Office, HTML), parses their content, and extracts cloud storage links embedded inside
Features • Installation • Usage • Recon Workflow • Limitations • Future Scope • Contributing • License
- Multiple cloud services: Google Drive, SharePoint, Dropbox, OneDrive, Box, iCloud
- Multiple formats: Fetches and extracts text from HTML pages, PDFs, PPTX, DOCX, XLSX, ODT, ODS, and TXT files
- Concurrent processing: Configurable worker pool for parallel fetching
- Reliable: HTTP retries with exponential backoff, connection pooling, timeouts
- Flexible output: Text (default) or JSON for scripting (a combined example follows this list)
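All of these features map directly onto command-line flags (documented under Usage below). One illustrative, non-prescriptive way to combine them:

```
# Illustrative only — tune workers/timeout/retries to suit the target
./snooper --file urls.txt --snoop all --workers 10 --timeout 30 --retries 3 --output json
```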
Snooper fits in the recon phase of bug bounty / pentesting, right after URL discovery. You feed it URLs to pages and documents. It fetches each one, parses the content (HTML, PDF, Office, etc.), and extracts any cloud storage links embedded inside.
```
┌─────────────────────┐     ┌─────────────────────┐     ┌─────────────────────┐
│    URL Discovery    │     │       Snooper       │     │    Manual Review    │
│   (pages, PDFs,     │────▶│    Fetch → Parse    │────▶│  Check permissions  │
│    Office docs)     │     │   → Extract links   │     │  Test for exposure  │
└─────────────────────┘     └─────────────────────┘     └─────────────────────┘
           │                           │
           ▼                           ▼
     waybackurls              Drive, SharePoint,
     gau, httpx               Dropbox, OneDrive,
     crawlers                 Box, iCloud links
```
| Step | Tool / Phase | Output |
|---|---|---|
| 1 | waybackurls, gau, crawlers | URLs to web pages, PDFs, presentations, etc. |
| 2 | Snooper | Fetches each URL, parses the document, extracts cloud links found in the content |
| 3 | Manual review | Misconfigs, exposed content, IDOR |
Use `-` with `--file` to read URLs from stdin. Snooper will fetch each URL, parse the page or document, and extract cloud storage links from the content:
```
# Wayback URLs (pages, PDFs, etc.) → Snooper fetches & parses each → extracts cloud links
echo "target.com" | waybackurls | snooper --file - --snoop all

# GAU (GetAllUrls) → Snooper
echo "target.com" | gau | snooper --file - --snoop all --workers 10

# Filter live URLs first, then Snooper parses their content for cloud links
echo "target.com" | waybackurls | httpx -mc 200 -silent | snooper --file - --snoop all --output json
```

Snooper does not discover URLs itself. It takes URLs as input, fetches the content at each URL, and extracts cloud links from within that content.
- No JavaScript execution - Fetches raw HTML. Best for static content, PDFs, and Office documents. Links loaded dynamically in SPAs may not be found.
- Unauthenticated only - No cookies or auth headers. Targets public exposure.
- Rate limiting - Use `--delay` when scanning live targets to avoid WAF blocks or bans (see the example below).
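For example, a 500 ms per-worker delay (the value is illustrative; pick what the target tolerates):

```
./snooper --file urls.txt --snoop all --delay 500
```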
1. Clone the repository:

   ```
   git clone https://github.com/nyxragon/Snooper.git
   ```

2. Navigate to the project directory:

   ```
   cd Snooper
   ```

3. Build the tool:

   ```
   go build -o snooper ./cmd/snooper/
   ```
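To confirm the build works, you can point the binary at any reachable page (flags are documented under Usage below):

```
./snooper --url "https://example.com" --snoop all
```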
Provide URLs to web pages or documents (HTML, PDF, Office files, etc.). Snooper fetches each URL, parses the content, and extracts cloud storage links embedded inside.
| Flag | Short | Default | Description |
|---|---|---|---|
| `--url` | `-u` | | Comma-separated URLs to pages or documents to fetch and parse |
| `--file` | `-f` | | Path to a file containing URLs to pages/documents (one per line). Use `-` to read from stdin |
| `--snoop` | `-s` | `drive` | Services to extract: `drive`, `sharepoint`, `dropbox`, `onedrive`, `box`, `icloud`, or `all` |
| `--workers` | `-w` | `5` | Number of concurrent workers |
| `--timeout` | `-t` | `60` | HTTP read timeout in seconds |
| `--retries` | `-r` | `3` | HTTP retry attempts |
| `--output` | `-o` | `text` | Output format: `text` or `json` |
| `--delay` | `-d` | `0` | Delay in ms between requests per worker (use for polite scanning) |
| `--user-agent` | `-a` | Firefox-like | HTTP User-Agent header |
| `--check-access` | `-c` | `false` | HEAD each discovered link to check accessibility |
- Parse documents from a URL list (Snooper fetches each, parses content, extracts links):

  ```
  ./snooper --snoop dropbox --file path/to/urls.txt
  ```

- Parse a PDF and a presentation (extracts cloud links from inside the documents):

  ```
  ./snooper --snoop all --url "https://example.com/file1.pdf","https://example.com/file2.pptx"
  ```

- Parse a web page (extracts cloud links from the HTML content):

  ```
  ./snooper --snoop drive,sharepoint --url "https://example.com/page.html" --workers 10
  ```

- JSON output for scripting:

  ```
  ./snooper --snoop all --file urls.txt --output json
  ```

- Polite scanning with delay and access check:

  ```
  ./snooper --file urls.txt --snoop all --delay 200 --check-access
  ```
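For scripting against the JSON output, a jq pipeline along these lines can flatten discovered links into a deduplicated list; note that the field name (`links`) is an assumption here, so inspect your build's actual JSON schema first:

```
# Field name below is hypothetical — check the real JSON output first
./snooper --file urls.txt --snoop all --output json | jq -r '.[].links[]?' | sort -u
```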
- Crawling support: Extract links from nested pages (configurable depth)
- OCR: Optional Tesseract integration for scanned PDFs
Contributions are welcome! Please fork the repository and submit a pull request for any changes or enhancements.
This project is licensed under the MIT License - see the LICENSE file for details.