This repo contains a set of tools to help you organize your photo and movie collections.
The eventual goal is to consolidate different photo and movie archives (such as Google Photos, WhatsApp, and digital cameras) into one archive. The reasons for doing this are:
- Have all your media together, easily accessible in chronological order.
- Reduce costs for cloud storage, such as Apple's and Google's, by taking (part of) your media offline.
- Create an offline backup on any kind of storage device (e.g. a USB flash drive or external hard disk).
- Create compilations of all your media.
To achieve this goal, I have made several programs:
Fortunately, Google allows you to download your photos and movies, but the metadata is stored in external JSON files. This metadata contains important fields like the date and time when the photo was taken and the geographical location. This program finds the corresponding JSON file for each media file, which turned out to be more difficult than it seems because there are many naming conventions for the JSON files.
This program fixes two things:
- The filenames of the media files, which should follow the convention `PhotoTakenTimestamp-AlbumName-OriginalImageName.ext`, e.g. `20210130T203201-Vacation Italy-IMG021.JPG`. This is fixed by making a copy of each media file in the target folder.
- The EXIF metadata of the media files.
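As an illustration, the rename convention can be sketched like this (the helper name `build_target_name` is hypothetical, not the program's actual function):

```python
from datetime import datetime

def build_target_name(photo_taken: datetime, album: str, original_name: str) -> str:
    """Build a PhotoTakenTimestamp-AlbumName-OriginalImageName.ext filename."""
    timestamp = photo_taken.strftime("%Y%m%dT%H%M%S")  # e.g. 20210130T203201
    return f"{timestamp}-{album}-{original_name}"

name = build_target_name(datetime(2021, 1, 30, 20, 32, 1), "Vacation Italy", "IMG021.JPG")
print(name)  # 20210130T203201-Vacation Italy-IMG021.JPG
```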
Identifies duplicate files based on the exact same create datetime and the exact same size. Outputs a CSV file with duplicates that can be removed by calling `delete_duplicate_files.py`.
- Creates a CSV file (`JSON-mapping.csv`) in the `Metadata` subfolder containing:
  - Original media filename
  - JSON filename (original associated metadata file)
The program handles various JSON file naming patterns used by Google Takeout with flexible pattern matching:
- Direct match (`.json`, `.suppl.json`, `.supplemental-metadata.json`)
- Double dots (`image.jpg..json` - a Google Takeout bug)
- Duplicate files (`image(1).jpg` → `image.jpg.json` or `image.HEIC.supplemental-metadata.json`)
- Edited images (`image-bewerkt.jpg` → `image.jpg.json`)
- Different extensions (`IMG_0001.MP4` → `IMG_0001.HEIC.supplemental-metadata.json`)
- Truncated supplemental-metadata - matches ANY truncation of "supplemental-metadata": `supplemental-metadata.json`, `supplemental-meta.json`, `supplemental-me.json`, `supplemental-m.json`, `supplemental.json`, `supplemen.json`, `supplem.json`, `supple.json`, `suppl.json`, and any other truncation starting with "suppl"
- Truncated filenames (handles names longer than 46 characters by removing trailing characters)
- JSON files with duplicate markers (`image.mov.supplemental-met(1).json`)
- Case-insensitive matching (HEIC vs heic, JPG vs jpg)
- Direct match without extension (`IMG_NIGHT_xxx.mov` → `IMG_NIGHT_xxx.json`)
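The truncation rule above boils down to a simple check: a suffix qualifies when it starts with "suppl" and is a prefix of the full "supplemental-metadata" string. A minimal sketch of that idea (the function name is hypothetical):

```python
def is_supplemental_suffix(suffix: str) -> bool:
    """True for 'supplemental-metadata' or any truncation of it starting with 'suppl'."""
    full = "supplemental-metadata"
    return suffix.startswith("suppl") and full.startswith(suffix)

print(is_supplemental_suffix("supplemental-met"))  # True
print(is_supplemental_suffix("supplementary"))     # False: not a truncation of the full string
```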
The matching system uses multiple strategies:
- Direct file path matching (fastest)
- Regex pattern matching (flexible)
- Fallback pattern matching (catches edge cases)
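A simplified sketch of how such a strategy chain could look, assuming an in-memory set of JSON filenames (the function `find_json` and the concrete strategies are illustrative, not the program's actual code):

```python
import re
from typing import Optional

def find_json(media: str, json_files: set) -> Optional[str]:
    """Try matching strategies in order, fastest first."""
    # Strategy 1: direct match, e.g. image.jpg -> image.jpg.json
    if f"{media}.json" in json_files:
        return f"{media}.json"
    # Strategy 2: regex match, e.g. strip a duplicate marker: image(1).jpg -> image.jpg.json
    stripped = re.sub(r"\(\d+\)(?=\.)", "", media)
    if f"{stripped}.json" in json_files:
        return f"{stripped}.json"
    # Strategy 3: fallback, direct match without extension: clip.mov -> clip.json
    stem = media.rsplit(".", 1)[0]
    if f"{stem}.json" in json_files:
        return f"{stem}.json"
    return None

json_files = {"image.jpg.json", "clip.json"}
print(find_json("image(1).jpg", json_files))  # image.jpg.json
print(find_json("clip.mov", json_files))      # clip.json
```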
Some images may not have corresponding JSON files. Common reasons:
- Split archives - images and JSON files may be in different archives (a Google Takeout export can be split into several .tar.gz files).
- Deleted photos - photos deleted before export may not have metadata files.
The program will report which images are missing JSON files and provide analysis.
Updates EXIF data in image files.
- DateTimeOriginal: uses `photoTakenTime` (when the photo was taken)
- DateTimeDigitized: uses `creationTime` (when the photo was digitized/created in Google Photos)
- GPS coordinates (latitude, longitude, altitude)
- Camera information (make, model)
- Photo settings (focal length, aperture, ISO, exposure time)
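Two conversions are typically needed when writing this data: Google's JSON stores times as Unix timestamps while EXIF wants `YYYY:MM:DD HH:MM:SS` strings, and EXIF stores GPS coordinates as degree/minute/second rationals. A rough sketch of both (the helper names are hypothetical; the commented piexif calls show where the values would go):

```python
from datetime import datetime, timezone

def exif_datetime(unix_ts: int) -> str:
    """Format a Unix timestamp (the shape of photoTakenTime/creationTime) for EXIF."""
    return datetime.fromtimestamp(unix_ts, tz=timezone.utc).strftime("%Y:%m:%d %H:%M:%S")

def deg_to_dms_rational(value: float):
    """Convert a decimal GPS coordinate to EXIF degree/minute/second rationals."""
    value = abs(value)
    deg = int(value)
    minutes_float = (value - deg) * 60
    minutes = int(minutes_float)
    seconds = round((minutes_float - minutes) * 60 * 100)  # hundredths of a second
    return ((deg, 1), (minutes, 1), (seconds, 100))

# With piexif these values would be written into the EXIF/GPS IFDs, roughly:
#   exif_dict["Exif"][piexif.ExifIFD.DateTimeOriginal] = exif_datetime(taken).encode()
#   exif_dict["GPS"][piexif.GPSIFD.GPSLatitude] = deg_to_dms_rational(lat)
#   piexif.insert(piexif.dump(exif_dict), "photo.jpg")
```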
- Copies media files to a target folder using this rename pattern: `PhotoTakenTimestamp-AlbumName-OriginalImageName.ext`, e.g. `20231215T143022-MyAlbum-IMG_1234.jpg`
- Copies JSON metadata files to the `Metadata/Google` subfolder
- Handles duplicate filenames automatically
- Supports an `--overwrite` flag to control whether existing files are skipped or overwritten.
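The skip-or-overwrite behaviour can be sketched as follows (a minimal illustration; `copy_media` is a hypothetical name, not the program's actual function):

```python
import shutil
from pathlib import Path

def copy_media(src: Path, target_dir: Path, new_name: str, overwrite: bool = False) -> bool:
    """Copy src into target_dir under new_name; returns False when the copy is skipped."""
    target_dir.mkdir(parents=True, exist_ok=True)
    dest = target_dir / new_name
    if dest.exists() and not overwrite:
        return False  # existing file kept, mirroring the default (no --overwrite) behaviour
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return True
```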
- Scans a folder for duplicate media files.
- Two media files are considered duplicates if:
- They have the exact same create datetime (in the EXIF metadata)
- They have the exact same size
- Creates `duplicates.csv` containing the duplicate filenames
- Creates `duplicates.html` with a listing of all duplicates in two columns, options to select all or a subset of the duplicates, and a button to deduplicate the selected files (the first occurrence will be kept)
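The grouping rule can be sketched in a few lines, assuming each file is represented as a (path, create datetime, size) tuple (a hypothetical shape for illustration):

```python
from collections import defaultdict

def find_duplicates(files):
    """Group files by (create datetime, size); groups with more than one entry are duplicates."""
    groups = defaultdict(list)
    for path, created, size in files:
        groups[(created, size)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]

files = [
    ("a/IMG_1.jpg", "2023:12:15 14:30:22", 1024),
    ("b/IMG_1 copy.jpg", "2023:12:15 14:30:22", 1024),  # same datetime and size -> duplicate
    ("a/IMG_2.jpg", "2023:12:15 14:31:00", 2048),
]
print(find_duplicates(files))  # [['a/IMG_1.jpg', 'b/IMG_1 copy.jpg']]
```

The first path in each group would be the occurrence that is kept; the rest could be written to a CSV for review and deletion.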
- Install Python 3.9 or higher
- Install dependencies from `requirements.json`:
  `python install.py`
Or install manually:
`pip install piexif pillow`
The program automatically creates a log file in the Metadata folder:
- Filename format: `log_YYYYMMDD_HHMMSS.txt`
- Contains all console output (with ANSI color codes removed)
- Useful for reviewing processing history and debugging
- Log file path is displayed at the end of execution
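One common way to implement this kind of logging is a "tee" object that writes to the console as-is and to the file with ANSI escape sequences stripped. A minimal sketch, assuming a hypothetical `TeeLogger` class (not the program's actual implementation):

```python
import re
import sys

ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")  # matches ANSI color codes like \x1b[31m

class TeeLogger:
    """Send console output to both a stream and a log file, minus ANSI color codes."""
    def __init__(self, log_path, stream=sys.stdout):
        self.stream = stream
        self.log = open(log_path, "w", encoding="utf-8")
    def write(self, text):
        self.stream.write(text)                # keep colors on the console
        self.log.write(ANSI_RE.sub("", text))  # strip them for the log file
    def flush(self):
        self.stream.flush()
        self.log.flush()

# Usage: redirect stdout so every print() also lands in the log, e.g.
#   sys.stdout = TeeLogger("Metadata/log_20240101_120000.txt")
```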