A professional forensic tool designed to extract comprehensive metadata from Google Drive documents and folders. Ideal for legal cases, compliance audits, and forensic investigations where document chronology, ownership, and modification history are critical evidence.
Metadata Sniffer provides a complete solution for extracting and analyzing metadata from Google Drive files. It generates legally admissible reports with deterministic forensic hashing, ensuring data integrity and reproducibility for court proceedings and legal documentation.
- 🔍 Complete Metadata Extraction: Comprehensive extraction of all file metadata including dates, ownership, permissions, and file properties
- 📅 Forensic Date Tracking: Creation dates, modification dates, and last viewed timestamps
- 🔐 Deterministic Forensic Hashing: SHA-256 hashing of immutable metadata ensures legal validity and reproducibility
- 📊 Multiple Export Formats: CSV for analysis, JSON for programmatic access, and PDF for court-ready reports
- 🌐 Web-Based Interface: Modern, user-friendly web application with real-time progress tracking
- ⏸️ Pause/Resume/Stop Controls: Full control over extraction process with ability to pause, resume, or stop operations
- 📁 Flexible Scanning: Extract from specific folders or entire Google Drive
- 🔗 Shared Folder Support: Scan shared folders using shared links - perfect for collaborative workspaces and client folders
- 📂 Recursive Scanning: Automatically scans all subfolders recursively, ensuring complete metadata extraction from nested folder structures
- 👤 Permission Analysis: Detailed owner and permission information
- 🔒 Secure Authentication: OAuth 2.0 authentication with read-only access
- Python 3.8 or higher
- Google account with access to Google Drive
- Google Cloud Console project with Google Drive API enabled
git clone https://github.com/E1DIGITALPF/Metadata-Sniffer.git
cd Metadata-Sniffer
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
- Navigate to Google Cloud Console
- Create a new project or select an existing one
- Name your project (e.g., "metadata-sniffer")
The Google Drive API may not appear in the default API list. Follow these steps:
- In the side menu, navigate to APIs & Services > Library
- Important: Use the search box at the top of the page (not the category filters)
- Type exactly: "Google Drive API" (without quotes)
- Select "Google Drive API" from the search results (identified by the Drive icon)
- Click the Enable button
Alternative Direct Link:
- Go directly to: https://console.cloud.google.com/apis/library/drive.googleapis.com
- Click Enable
Note: Ensure you're working in the correct project (verify using the project selector at the top of the page)
- Navigate to APIs & Services > Credentials
- Click + CREATE CREDENTIALS > OAuth client ID
- If this is your first time, you'll need to configure the OAuth consent screen:
- Application type: External
- App name: "Metadata Sniffer" (or your preferred name)
- User support email: Your email address
- Developer contact information: Your email address
- Click Save and Continue through all steps
- For the OAuth client:
- Application type: Desktop app
- Name: "Metadata Sniffer Desktop"
- Click Create
- Download the JSON credentials file
- Rename the file to credentials.json
- Place it in the project root directory
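Before the first run, it can help to confirm that the downloaded file looks like a Desktop app client. The following is a minimal sketch, not part of the tool: the function names are hypothetical, and it relies only on the fact that Desktop app client files downloaded from Google Cloud Console keep their settings under a top-level "installed" key.

```python
import json
from pathlib import Path

# Keys a Desktop-app OAuth client file is expected to contain.
REQUIRED_KEYS = ("client_id", "client_secret")


def check_client_config(data):
    """Return a list of problems in parsed client JSON (empty if none)."""
    if not isinstance(data, dict) or not isinstance(data.get("installed"), dict):
        return ['no "installed" section (was the client created as a Desktop app?)']
    return [f'missing "{key}"' for key in REQUIRED_KEYS if key not in data["installed"]]


def check_credentials(path="credentials.json"):
    """Return a list of problems with the credentials file at `path`."""
    file = Path(path)
    if not file.is_file():
        return [f"{path} not found"]
    try:
        data = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    return check_client_config(data)


if __name__ == "__main__":
    for problem in check_credentials():
        print("credentials.json:", problem)
```

An empty result means the file exists, parses as JSON, and has the expected shape; it does not prove that Google will accept the client.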
If you encounter an "access_denied" error, you must add your email as a test user:
- Navigate to APIs & Services > OAuth consent screen
- Scroll to the "Test users" section
- Click + ADD USERS
- Enter your Google account email address
- Click ADD
- Wait 2-3 minutes for changes to propagate
Important Notes:
- You can add up to 100 test users
- Each user must accept the consent screen once
- Changes may take a few minutes to take effect
- If you're the app owner, you should have access, but adding yourself as a test user ensures reliability
Launch the web-based interface for the easiest user experience:
python main.py
The application will:
- Start a local web server (typically on port 5000)
- Automatically open your default web browser
- Display the extraction configuration interface
Features:
- Real-time progress tracking with progress bar
- Pause/Resume/Stop controls
- Automatic browser opening with results viewer
- Download generated files directly from the interface
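Pause/Resume/Stop controls of this kind are commonly built on cooperative flags that the extraction loop checks between files. The sketch below shows that general pattern with `threading.Event`; the class and function names are hypothetical and this is not necessarily how `web_app.py` implements it.

```python
import threading


class ExtractionControl:
    """Cooperative pause/resume/stop flags shared by the UI and the worker."""

    def __init__(self):
        self._running = threading.Event()
        self._running.set()  # set means "not paused"
        self._stopped = threading.Event()

    def pause(self):
        self._running.clear()

    def resume(self):
        self._running.set()

    def stop(self):
        self._stopped.set()
        self._running.set()  # wake a paused worker so it can exit

    def checkpoint(self):
        """Block while paused; return False once a stop was requested."""
        self._running.wait()
        return not self._stopped.is_set()


def process_files(file_ids, control, handle):
    """Call handle(file_id) for each ID until finished or stopped; return the count."""
    done = 0
    for file_id in file_ids:
        if not control.checkpoint():
            break
        handle(file_id)
        done += 1
    return done
```

Because the worker only checks the flags between files, a pause or stop takes effect after the file currently being processed.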
For automated or scripted extractions:
python main.py --cli --output forensic_report
python main.py --cli --folder-id <FOLDER_ID> --output folder_report
python main.py --cli --include-trashed --output complete_report
python main.py --help
Command Line Options:
- --cli: Run in command-line mode (instead of web interface)
- --folder-id <ID>: Extract from specific folder (leave empty for entire Drive)
- --output <NAME>: Base name for output files
- --include-trashed: Include files in trash
- --format <csv|json|pdf>: Export format (default: all formats)
- --workers <N>: Number of parallel workers (default: 1 for stability)
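For readers scripting around the CLI, the options above could be declared with `argparse` roughly as follows. This is an illustrative sketch only: the flag names and the documented defaults come from this README, while `build_parser` and the use of `None` for undocumented defaults are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the command-line options documented above."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--cli", action="store_true",
                        help="run in command-line mode instead of the web interface")
    parser.add_argument("--folder-id", metavar="ID", default=None,
                        help="folder ID or shared link (omit for the entire Drive)")
    # Assumption: the real default base name is not documented here.
    parser.add_argument("--output", metavar="NAME", default=None,
                        help="base name for output files")
    parser.add_argument("--include-trashed", action="store_true",
                        help="include files in trash")
    # None stands for "all formats", the documented default.
    parser.add_argument("--format", choices=("csv", "json", "pdf"), default=None,
                        help="export format (default: all formats)")
    parser.add_argument("--workers", metavar="N", type=int, default=1,
                        help="number of parallel workers (default: 1)")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```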
You can extract metadata from a specific folder in two ways:
- Open Google Drive in your web browser
- Navigate to the desired folder
- Examine the URL: https://drive.google.com/drive/folders/ABC123XYZ...
- The folder ID is the string after /folders/ (e.g., ABC123XYZ...)
You can also use shared links directly. This is useful for scanning folders shared with you by clients, colleagues, or collaborators. Paste the shared link and the tool extracts the folder ID and scans all contents recursively.
Supported link formats:
- https://drive.google.com/drive/folders/FOLDER_ID
- https://drive.google.com/drive/folders/FOLDER_ID?usp=sharing
- https://drive.google.com/drive/folders/FOLDER_ID?usp=drive_link
- https://drive.google.com/open?id=FOLDER_ID
- Direct folder ID: FOLDER_ID
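The link formats above can all be reduced to a folder ID with a small amount of URL parsing. The following sketch shows one way to do it using only the standard library; `extract_folder_id` is a hypothetical name and the tool's own parsing may differ.

```python
import re
from urllib.parse import parse_qs, urlparse


def extract_folder_id(value):
    """Return the folder ID from a shared link, or the value itself if it is a bare ID."""
    value = value.strip()
    if not value:
        raise ValueError("empty folder ID or link")
    if "://" not in value:
        return value  # already a bare folder ID
    parsed = urlparse(value)
    # Covers .../drive/folders/FOLDER_ID with or without a ?usp=... suffix.
    match = re.search(r"/folders/([A-Za-z0-9_-]+)", parsed.path)
    if match:
        return match.group(1)
    # Covers .../open?id=FOLDER_ID.
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0]
    raise ValueError(f"no folder ID found in: {value}")
```

Raising on an unrecognized link, rather than passing it through, keeps a mistyped URL from being sent to the API as if it were an ID.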
Key Features for Shared Folders:
- ✅ Automatic ID Extraction: Just paste the shared link - no need to extract the folder ID manually
- ✅ Recursive Scanning: Automatically scans all subfolders and nested directories within the shared folder
- ✅ Works with Any Access Level: Viewer, commenter, or editor permissions - as long as you can see the folder
- ✅ Public & Private Folders: Works with both publicly shared folders and privately shared folders (with your access)
Important Notes:
- You still need your own Google Drive API credentials (OAuth); the tool accesses the shared folder with your credentials, not the folder owner's
- You must have access to the shared folder (viewer, commenter, or editor permission); this applies to both public and private shared folders
- Recursive scanning: All files in all subfolders are automatically included in the extraction
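Recursive scanning amounts to a tree traversal over folders. The sketch below illustrates the idea breadth-first with an injected `list_children` callable, which is a hypothetical stand-in for a Drive API v3 `files.list` call filtered with a query such as `'<folder_id>' in parents`; it is not the tool's actual extractor.

```python
from collections import deque

# MIME type that Google Drive uses for folders.
FOLDER_MIME = "application/vnd.google-apps.folder"


def walk_folder(root_id, list_children):
    """Yield every item under root_id, descending into subfolders breadth-first.

    list_children(folder_id) must return an iterable of dicts with at least
    "id" and "mimeType" keys.
    """
    queue = deque([root_id])
    seen = {root_id}  # guards against visiting the same folder twice
    while queue:
        folder_id = queue.popleft()
        for item in list_children(folder_id):
            yield item
            if item["mimeType"] == FOLDER_MIME and item["id"] not in seen:
                seen.add(item["id"])
                queue.append(item["id"])
```

Injecting the listing function keeps the traversal testable without network access, for example against an in-memory dictionary of folders.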
Examples:
Using folder ID:
python main.py --cli --folder-id ABC123XYZ --output folder_report
Using shared link:
python main.py --cli --folder-id "https://drive.google.com/drive/folders/ABC123XYZ?usp=sharing" --output folder_report
In the web interface, simply paste the shared link in the "Folder ID or Shared Link" field.
All generated files are saved in the output/ directory.
CSV:
- Tabular data suitable for Excel, Google Sheets, or data analysis tools
- All metadata fields in columns
- Sorted by file ID for consistency
JSON:
- Complete structured data with extraction metadata
- Includes forensic integrity hash (SHA-256)
- Suitable for programmatic processing and integration
PDF:
- Court-ready forensic report
- Statistical summary
- Detailed file information (all files included)
- Forensic integrity hash section
- Professional formatting for legal presentation
- Forensic footer on every page: SHA-256 hash and page numbering (X/Y format) for complete traceability
The tool extracts the following metadata for each file:
- File Identification
  - Unique Google Drive ID
  - File name
  - File type (MIME type)
  - Complete path in Drive hierarchy
- Temporal Information
  - Creation date (raw ISO format and formatted)
  - Last modification date (raw ISO format and formatted)
  - Last viewed date (if available)
- File Properties
  - File size (bytes and human-readable format)
  - MD5 checksum (if available)
  - Version number
- Ownership & Permissions
  - Owner email and name
  - Last modifier email and name
  - Sharing status
  - Permission count
  - Share link URL
- Additional Information
  - Trash status
  - Starred status
  - Description
  - Parent folder IDs
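Most of the fields listed above correspond to properties of a Drive API v3 file resource. As a rough illustration, the sketch below flattens such a resource (a plain dict) into one record; the input keys are real Drive v3 field names, while the function names and output keys are hypothetical. The full path is omitted because it requires walking parent folders.

```python
def human_size(num_bytes):
    """Format a byte count using binary multiples, e.g. 1536 -> '1.5 KB'."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1024 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1024


def flatten_file(resource):
    """Map a Drive API v3 file resource to a flat metadata record."""
    owner = (resource.get("owners") or [{}])[0]
    modifier = resource.get("lastModifyingUser") or {}
    raw_size = resource.get("size")  # returned as a string; may be absent
    size = int(raw_size) if raw_size is not None else None
    return {
        "id": resource["id"],
        "name": resource.get("name"),
        "mime_type": resource.get("mimeType"),
        "created_time": resource.get("createdTime"),
        "modified_time": resource.get("modifiedTime"),
        "viewed_time": resource.get("viewedByMeTime"),
        "size_bytes": size,
        "size_human": human_size(size) if size is not None else None,
        "md5": resource.get("md5Checksum"),
        "version": resource.get("version"),
        "owner_email": owner.get("emailAddress"),
        "owner_name": owner.get("displayName"),
        "modifier_email": modifier.get("emailAddress"),
        "modifier_name": modifier.get("displayName"),
        "shared": resource.get("shared", False),
        "permission_count": len(resource.get("permissions") or []),
        "share_link": resource.get("webViewLink"),
        "trashed": resource.get("trashed", False),
        "starred": resource.get("starred", False),
        "description": resource.get("description"),
        "parents": list(resource.get("parents") or []),
    }
```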
Metadata Sniffer implements a deterministic forensic hash (SHA-256) that:
- Changes only when files are actually modified: The hash reflects real changes (additions, deletions, modifications, renames)
- Remains constant for unchanged data: Viewing files or changing permissions does not affect the hash
- Ensures legal validity: Same Drive content = same hash, regardless of extraction time
- Provides reproducibility: Any party can verify the integrity of extracted data
The hash is calculated on immutable forensic fields only:
- File IDs, names, types
- Creation and modification dates
- File sizes and MD5 checksums
- File descriptions
- Trash status
Excluded from hash (to ensure determinism):
- Last viewed dates (changes on access)
- Share links (can change without file modification)
- Permission details (order may vary)
- Path information (can change if files are moved)
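The scheme described above can be sketched as follows: keep only the hashed fields, sort records by file ID, serialize them canonically, and hash the result. This is a minimal illustration under stated assumptions, not the tool's exact algorithm: the field names are Drive API v3 names chosen to match the list above, and the JSON canonicalization is an assumption, so hashes from this sketch will not necessarily match the tool's output.

```python
import hashlib
import json

# Fields included in the hash, mirroring the list above (Drive API v3 names).
HASHED_FIELDS = (
    "id", "name", "mimeType", "createdTime", "modifiedTime",
    "size", "md5Checksum", "description", "trashed",
)


def forensic_hash(files):
    """Return a SHA-256 hex digest over the hashed fields of all files."""
    # Drop everything outside HASHED_FIELDS (views, links, permissions, paths).
    records = [{field: f.get(field) for field in HASHED_FIELDS} for f in files]
    # Sorting by file ID makes the digest independent of listing order.
    records.sort(key=lambda record: record["id"])
    # Canonical serialization: sorted keys, fixed separators, ASCII-only output.
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this construction, two extractions of the same unchanged content yield the same digest regardless of when they run or in what order the API returns files, while a rename or a new modification date changes it.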
- Labor Disputes: Demonstrate work chronology with creation dates from hundreds or thousands of documents
- Evidence Documentation: Provide court-admissible metadata reports with forensic integrity hashing
- Document Chronology: Establish timeline of document creation and modification
- Forensic Audits: Complete metadata extraction for compliance reviews
- Data Governance: Track file ownership, sharing, and access patterns
- Document Verification: Verify document authenticity and modification history
- Drive Organization Analysis: Understand file distribution, types, and ownership
- Storage Optimization: Identify large files and unused documents
- Access Pattern Analysis: Review viewing and modification patterns
Solution:
- Verify you've downloaded the credentials from Google Cloud Console
- Ensure the file is named exactly credentials.json (case-sensitive)
- Confirm the file is in the project root directory
The "access_denied" error is the most common one. It indicates that your email is not in the test users list.
Solution:
- Navigate to APIs & Services > OAuth consent screen in Google Cloud Console
- Scroll to the "Test users" section
- Click + ADD USERS
- Add your Google account email (the one you're using to sign in)
- Click ADD
- Wait 2-3 minutes for changes to propagate
- Delete token.json if it exists: rm token.json
- Run the application again
Quick Fix:
# Delete the old token
rm token.json
# Run again (will prompt for authorization)
python main.py
Note: For production use, you can publish the app (requires Google verification). For personal/testing use, adding test users is the fastest solution.
Solution:
- Delete the token.json file: rm token.json
- Run the application again
- The authorization window will open automatically
- Re-authorize the application
Solution:
- Use 1 worker (sequential processing) for maximum stability
- In the web interface, select "1 worker (Sequential - Recommended for Stability)"
- This is the default setting and recommended for most use cases
Solution:
- The application automatically resets all progress values when Stop is pressed
- If you see incorrect values, refresh the browser page
- The state is completely cleared on the server side
metadata-sniffer/
├── main.py # Main entry point
├── web_app.py # Web application (Flask)
├── requirements.txt # Python dependencies
├── credentials.json # Google OAuth credentials (not in repo)
├── token.json # OAuth token (generated, not in repo)
├── output/ # Generated reports directory
├── src/
│ ├── auth.py # Google Drive authentication
│ ├── extractor.py # Metadata extraction logic
│ ├── exporters.py # CSV, JSON, PDF export
│ ├── helpers.py # Utility functions
│ └── web_viewer.py # Web viewer templates
└── README.md # This file
- google-api-python-client: Google Drive API client
- google-auth-oauthlib: OAuth 2.0 authentication
- flask: Web application framework
- reportlab: PDF generation
- pandas: Data processing
- tqdm: Progress bars
See requirements.txt for complete version specifications.
- Read-Only Access: The application requests read-only access to Google Drive
- Local Processing: All processing occurs locally on your machine
- No Data Transmission: Metadata is not transmitted to external servers
- Secure Storage: OAuth tokens are stored locally in token.json
- User Control: You can revoke access at any time through Google Account settings
This tool should only be used with documents to which you have legal access and proper authorization. Ensure you have the necessary permissions before extracting metadata from any document. The tool is provided "as-is" without warranty. Users are responsible for compliance with applicable laws and regulations regarding data access and privacy.
MIT License
For issues, questions, or contributions, please visit the project repository.
Made with ❤️ by E1DIGITAL
Version: 1.0.0
Last Updated: 2026