How to automatically remove duplicate images from your Pictures folder (2024 Edition)

I have previously written two articles about automatically removing duplicate photos from your library (here and here). Still, I felt that Bookworm and the new Pi3D PictureFrame update deserved a complete rewrite.

So here is the 2024 version with automatic Telegram notification and a refined method of recognizing duplicates and deciding which ones to eliminate.

Tested on a Raspberry Pi 5 with OS Bookworm (November 2024).

Please note: This solution might not work as expected if you use Google Photos to sync photos into your Pictures folders, because the sync only runs in one direction. I guess Google Photos will simply put the moved duplicate back, but at least you will be notified and can delete it yourself in Google Photos. I haven’t tried it, so please let me know if you have.

How it works

Sharing your picture frame library with your family or friends is an excellent idea because you will be (most of the time pleasantly) surprised by new photos that appear in the frame.

However, you will eventually encounter duplicates you may want to clean up.

This should be straightforward, but sometimes you have almost identical images, and whether they are recognized as duplicates depends on the algorithm.

An example is when someone adds a photo that was shared on WhatsApp and someone else adds the original photo to the library. The two files will differ in size and quality because WhatsApp compresses images heavily.

Or someone has slightly cropped a photo or made some other edits.
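The script relies on imagehash.average_hash for exactly this case. As I understand it, the idea is to shrink the picture to an 8x8 grayscale grid and set one bit per cell depending on whether that cell is brighter than the average. The sketch below reproduces the idea in plain Python on made-up grayscale pixel grids, so you can see why a smaller, noisier WhatsApp-style copy still gets the same hash. It is an illustration under simplifying assumptions: it uses plain block averaging instead of the resampling filter imagehash uses, and the test images and the random noise standing in for JPEG compression are hypothetical.

```python
import random

def box_downscale(pixels, size=8):
    """Shrink a grayscale image (a list of rows) to size x size by averaging blocks."""
    h, w = len(pixels), len(pixels[0])
    small = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small

def average_hash(pixels, size=8):
    """Return a size*size-bit integer: bit = 1 where a cell is brighter than the mean."""
    cells = [v for row in box_downscale(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | int(v > mean)
    return bits

def hamming(a, b):
    """Number of bits in which two hashes differ."""
    return bin(a ^ b).count("1")

# Hypothetical 320x240 "photo": a 4x4 pattern of bright (200) and dark (50) areas
W, H = 320, 240
original = [[200 if (x * 4 // W + y * 4 // H) % 2 == 0 else 50 for x in range(W)]
            for y in range(H)]

# Hypothetical "WhatsApp copy": half the resolution plus noise in place of JPEG artifacts
rng = random.Random(42)
whatsapp_copy = [[original[2 * y][2 * x] + rng.randint(-25, 25) for x in range(W // 2)]
                 for y in range(H // 2)]

# A genuinely different picture: a left-to-right brightness gradient
other_photo = [[x * 255 // W for x in range(W)] for y in range(H)]

print(hamming(average_hash(original), average_hash(whatsapp_copy)))  # prints 0
print(hamming(average_hash(original), average_hash(other_photo)))    # prints 32
```

A distance of 0 means identical hashes, which is what the script needs to put two files in the same group. A crop, on the other hand, shifts the content relative to the 8x8 grid and may flip some bits, so cropped copies are not guaranteed to be caught.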

Not every decision can be made by an algorithm because taste is a subjective issue, but this script can help you spot duplicates and leave you with the final decision.

In this script, the higher-quality photo, measured by resolution (total pixel count), is kept, and the others are moved to a duplicates folder (/home/pi/Duplicate_Images).

Whenever the script encounters a duplicate, you also get a Telegram notification.
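The keep-or-move decision boils down to a sort by pixel count. Here is a minimal sketch of that rule, assuming the width and height of each file in a duplicate group are already known; the function name and the example paths are hypothetical.

```python
def plan_duplicate_group(group):
    """Given (path, width, height) tuples for one duplicate group,
    return (path_to_keep, paths_to_move). Resolution = width * height."""
    ranked = sorted(group, key=lambda item: item[1] * item[2], reverse=True)
    keep = ranked[0][0]
    move = [path for path, _, _ in ranked[1:]]
    return keep, move

# Hypothetical group: the original from a phone and its WhatsApp copy
group = [
    ("/home/pi/Pictures/whatsapp/IMG-20240101-WA0001.jpg", 1600, 1200),
    ("/home/pi/Pictures/anna/IMG_1234.jpg", 4032, 3024),
]
keep, move = plan_duplicate_group(group)
print("keep:", keep)
print("move:", move)
```

Because Python's sort is stable (also with reverse=True), two copies with identical resolution keep their original order, so the first one found wins. The script's duplicates.sort(..., reverse=True) behaves the same way, with the order coming from os.walk.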

Basic package requirements

For the script to work, you need to install the following additional packages:

source venv_picframe/bin/activate

pip install pillow watchdog requests imagehash

This assumes that you have installed Pi3D PictureFrame in the virtual environment venv_picframe as described here, but I guess most people on this blog did that anyway.
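To check that the packages really ended up in the virtual environment, you could run a small script like this with /home/pi/venv_picframe/bin/python. It is only a sketch, and the file name you save it under is up to you. Note that the pillow package is imported as PIL.

```python
import importlib.util

# pip package name -> module name it is imported as
REQUIRED = {
    "pillow": "PIL",
    "watchdog": "watchdog",
    "requests": "requests",
    "imagehash": "imagehash",
}

def missing_packages(required=REQUIRED):
    """Return the pip names whose modules cannot be found by the current interpreter."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]

if __name__ == "__main__":
    missing = missing_packages()
    print("All packages found." if not missing else "Missing: " + ", ".join(missing))
```

If anything is listed as missing, activate the environment again and repeat the pip install.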

The duplicates detection Python script

I wanted some extra features, so the script ended up with these:

  • It waits 15 minutes before running duplicate detection. Every change in the folder (for example, a newly added file) restarts this 15-minute countdown, so longer copy operations can finish before any analysis starts.
  • Duplicates are detected with perceptual hashing (imagehash.average_hash). Unlike a byte-by-byte comparison, this still matches images after small changes such as resizing, recompression or minor edits, which are common when people upload pictures from different devices, as long as the change does not alter the 64-bit hash (the script groups images only when their hashes are identical). Hash-based methods are also much faster than more advanced methods like SSIM or deep learning, which I also considered. So, it’s ideal for a family photo frame!
  • A built-in file name collision avoidance feature: each moved file’s name is prefixed with a short hash of its original path. If you have several directories, you may end up with files that share the same name, and this ensures they do not overwrite each other in the duplicates folder. The original file names remain recognizable, which makes it easier for you to review duplicates later.
  • Finally, whenever a duplicate is detected and moved, you get a Telegram notification. Of course, this is nice to have, but most of the pleasures in life are nice to have!
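The 15-minute wait in the first point is a debounce: every event cancels the pending timer and starts a new one, so detection only runs once the folder has been quiet for the whole delay. The script does this with a global threading.Timer. The sketch below wraps the same cancel-and-restart pattern in a small, hypothetical Debouncer class with a lock, using a 0.3-second delay so you can watch it work without waiting 15 minutes.

```python
import threading
import time

class Debouncer:
    """Run `action` once, `delay` seconds after the most recent trigger()."""

    def __init__(self, delay, action):
        self.delay = delay
        self.action = action
        self._timer = None
        self._lock = threading.Lock()  # events may arrive from several threads

    def trigger(self):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # a newer event restarts the countdown
            self._timer = threading.Timer(self.delay, self.action)
            self._timer.daemon = True  # don't keep the process alive on exit
            self._timer.start()

calls = []
debouncer = Debouncer(0.3, lambda: calls.append(time.monotonic()))
for _ in range(5):   # five file events in quick succession
    debouncer.trigger()
    time.sleep(0.05)
time.sleep(0.6)      # wait longer than the delay
print(len(calls))    # prints 1: only one run despite five events
```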

So here it is.

Create a new file in your pi home directory with

nano duplicate_detection.py

and paste the text below in it:

import os
import time
import shutil
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
from threading import Timer
from PIL import Image
import imagehash
import requests
import hashlib
import random
from collections import defaultdict

# Paths
SOURCE_DIR = "/home/pi/Pictures"
DUPLICATE_DIR = "/home/pi/Duplicate_Images"

# Ensure the Duplicate_Images directory exists
os.makedirs(DUPLICATE_DIR, exist_ok=True)

# Delay timer
DELAY_SECONDS = 15 * 60  # 15 minutes

# Timer to debounce multiple events
process_timer = None

# Telegram Bot Configuration
BOT_TOKEN = "your_bot_token"  # Replace with your bot's API token
CHAT_ID = "your_chat_id"      # Replace with your chat ID

# Functions
def send_telegram_message(file_name, original_path, destination_path):
    """Send a humorous Telegram message about duplicate detection."""
    # A list of humorous comments
    comments = [
        "Caught red-handed! This duplicate won't clutter your albums anymore.",
        "One less duplicate to worry about. Your photo library thanks you!",
        "Another sneaky duplicate bites the dust!",
        "Detective Bot strikes again! This file has been safely relocated.",
        "Oops, you uploaded this twice! No worries, I’ve got it sorted.",
        "Cleaning up your duplicates, one file at a time!"
    ]

    # Choose a random comment
    comment = random.choice(comments)

    message = (
        f"📷 Duplicate Detected!\n\n"
        f"🖼️ File: `{file_name}`\n"
        f"📂 Original Location: `{original_path}`\n"
        f"🚮 Moved to: `{destination_path}`\n\n"
        f"{comment}"
    )

    url = f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage"
    payload = {"chat_id": CHAT_ID, "text": message, "parse_mode": "Markdown"}

    try:
        response = requests.post(url, data=payload, timeout=10)  # don't hang if the network is down
        if response.status_code == 200:
            print("Telegram message sent!")
        else:
            print(f"Failed to send Telegram message: {response.status_code}")
    except Exception as e:
        print(f"Error sending Telegram message: {e}")

def get_image_resolution(image_path):
    """Return the resolution of an image as a pixel count (width * height)."""
    with Image.open(image_path) as img:
        return img.size[0] * img.size[1]

def make_unique_filename(file_path, target_dir):
    """Generate a unique file name based on its original path."""
    base_name = os.path.basename(file_path)
    file_hash = hashlib.md5(file_path.encode()).hexdigest()[:8]  # Short hash of file path
    unique_name = f"{file_hash}_{base_name}"
    return os.path.join(target_dir, unique_name)

def handle_duplicate_group(duplicates):
    """Handle a group of duplicate images, keeping the best version."""
    # Sort duplicates by resolution (largest first)
    duplicates.sort(key=lambda img: get_image_resolution(img), reverse=True)

    # Keep the first (highest quality) image, move the rest
    master_image = duplicates[0]
    for duplicate in duplicates[1:]:
        # Generate a unique name and move the duplicate
        dest = make_unique_filename(duplicate, DUPLICATE_DIR)
        shutil.move(duplicate, dest)
        print(f"Moved {duplicate} to {dest}")

        # Send a Telegram notification
        file_name = os.path.basename(duplicate)
        send_telegram_message(file_name, duplicate, dest)

def find_duplicates_grouped(source_dir):
    """Find groups of duplicate images and handle them together."""
    image_paths = []
    for root, _, files in os.walk(source_dir):
        for file in files:
            if file.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif')):
                image_paths.append(os.path.join(root, file))

    # Dictionary to group duplicates based on their hash
    hash_groups = defaultdict(list)

    # Generate perceptual hashes and group images by hash
    for image_path in image_paths:
        try:
            with Image.open(image_path) as img:  # close the file again after hashing
                image_hash = str(imagehash.average_hash(img))
            hash_groups[image_hash].append(image_path)
        except Exception as e:
            print(f"Error processing {image_path}: {e}")

    # Handle each group of duplicates
    for hash_value, duplicates in hash_groups.items():
        if len(duplicates) > 1:  # Only process if there are duplicates
            handle_duplicate_group(duplicates)

# Watchdog event handler
class ImageFolderHandler(FileSystemEventHandler):
    """Handles events in the monitored directory."""

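    # Only changes to the folder should restart the timer. Newer watchdog
    # versions also report read-only events ("opened", "closed_no_write"),
    # so the frame displaying photos, or this script reading them, would
    # otherwise keep restarting the countdown.
    RELEVANT_EVENTS = {"created", "modified", "moved", "deleted", "closed"}

    def on_any_event(self, event):
        """Trigger duplicate detection after a delay on any relevant directory change."""
        global process_timer

        if event.event_type not in self.RELEVANT_EVENTS:
            return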

        if process_timer:
            process_timer.cancel()

        # Start a new timer
        process_timer = Timer(DELAY_SECONDS, run_duplicate_detection)
        process_timer.daemon = True  # don't keep the process alive after Ctrl+C
        process_timer.start()

def run_duplicate_detection():
    """Run duplicate detection."""
    print("Starting duplicate detection using perceptual hashing...")
    find_duplicates_grouped(SOURCE_DIR)
    print("Duplicate detection completed.")

if __name__ == "__main__":
    # Start watching the directory
    print(f"Watching directory: {SOURCE_DIR}")
    event_handler = ImageFolderHandler()
    observer = Observer()
    observer.schedule(event_handler, path=SOURCE_DIR, recursive=True)
    observer.start()

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
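To see the collision avoidance at work, here is make_unique_filename from the script on its own, applied to two hypothetical files that share a name but live in different folders.

```python
import hashlib
import os

def make_unique_filename(file_path, target_dir):
    """Same logic as in duplicate_detection.py: prefix the base name
    with the first 8 hex characters of the MD5 hash of the full path."""
    base_name = os.path.basename(file_path)
    file_hash = hashlib.md5(file_path.encode()).hexdigest()[:8]
    return os.path.join(target_dir, f"{file_hash}_{base_name}")

# Two hypothetical files with the same name in different folders
a = make_unique_filename("/home/pi/Pictures/anna/IMG_0001.jpg", "/home/pi/Duplicate_Images")
b = make_unique_filename("/home/pi/Pictures/tom/IMG_0001.jpg", "/home/pi/Duplicate_Images")
print(a)  # different 8-character prefixes, same IMG_0001.jpg ending
print(b)
```

Keep in mind that the prefix depends only on the original path. If a second duplicate later arrives under exactly the same path and gets moved as well, it receives the same destination name, and on Linux the move would replace the earlier file in Duplicate_Images.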

Of course, you can have some fun customizing the Telegram messages: the script picks one of the entries in the comments list at random.

Once you’re finished, save and close. Later, you will come back to this script and insert the Telegram credentials.

Make it executable with

chmod +x duplicate_detection.py

Install the Telegram notifications

Telegram notifications are a free way of sending alerts. Setting them up looks a bit tricky, but it really isn’t, and they require absolutely no maintenance.

This is how you do it.

Create a Telegram bot: open Telegram, search for “BotFather”, and enter “/newbot”.

Give your bot a name like “Duplicate Finder” so that you remember which bot does what.

Follow the prompts to choose a username for your bot (it must be unique across Telegram and end in “bot”) and receive your API token (e.g., 7337278999:AAHl7d5cro0wWQfCHyh0qlaQfGSslSFZnVY).

Copy the token from Telegram and insert it into the Python script as BOT_TOKEN.

Open the chat with your new bot by clicking the “t.me/..” link in BotFather’s welcome message:

Done! Congratulations on your new bot. You will find it at t.me/Pi3DuplicatesBot.

Start a chat by typing “hello”.

Open the following URL in a browser, replacing <Your-Bot-Token> with your own bot token, to get your chat ID:

https://api.telegram.org/bot<Your-Bot-Token>/getUpdates

Note: Don’t forget the “bot” in front of “<Your-Bot-Token>”. The first time I set it up, I copied the URL without “bot”, which resulted in an error.

You should get a response like this (excerpt):

"message":{"message_id":8,"from":{"id":801573728,"is_bot":false,"first_name":"Wolfgang","last_name":"M\u00e4nnel","username":"wolfgang123","language_code":"de"},"chat":{"id":801573728,"first_name":"Wolfgang","last_name":"M\u00e4nnel","username":"wolfgang123","type":"private"},"date":1731855148,"text":"Hello Frame"}}]}

Pick out the “id” value inside the “chat” object (801573728 in this example) and insert it into the Python script as CHAT_ID:

# Telegram Bot Configuration
BOT_TOKEN = "7837997029:AAHl7d5cro0pWQnsHyh0qlaQfGSIlSFZnVY"  # Replace with your bot's API token
CHAT_ID = "801573728"      # Replace with your chat ID

This is it. It might look a bit tricky, but it’s very easy.
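If you would rather not dig through the JSON by hand, a few lines of Python can list the chat IDs for you. This is a sketch using only the standard library: fetch_updates calls the same getUpdates URL as above, and chat_ids (a hypothetical helper name) pulls the chat id and first name out of each message. getUpdates only returns recent updates, so send your bot a fresh “hello” first.

```python
import json
import urllib.request

def fetch_updates(bot_token):
    """Call the Bot API getUpdates method and return the decoded JSON response."""
    url = f"https://api.telegram.org/bot{bot_token}/getUpdates"  # note the "bot" prefix
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)

def chat_ids(updates):
    """Map each chat id found in a getUpdates response to the sender's first name."""
    found = {}
    for update in updates.get("result", []):
        chat = update.get("message", {}).get("chat", {})
        if "id" in chat:
            found[chat["id"]] = chat.get("first_name", "")
    return found

# Example (replace with your real token):
# print(chat_ids(fetch_updates("your_bot_token")))
```

For the response shown above, this would print {801573728: 'Wolfgang'}.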

Make a service file to start the script at boot

Finally, to always keep this script active in the background, create a service file with

sudo nano /etc/systemd/system/duplicate_detection.service

Paste the following text into the file

[Unit]
Description=Duplicate Detection and Notification Script
After=network.target

[Service]
ExecStart=/home/pi/venv_picframe/bin/python -u /home/pi/duplicate_detection.py
WorkingDirectory=/home/pi
Restart=always
User=pi

[Install]
WantedBy=multi-user.target

Save and close.

Now reload systemd, then enable and start the service (one command at a time):

sudo systemctl daemon-reload

sudo systemctl enable duplicate_detection.service

sudo systemctl start duplicate_detection.service

Finally, check if everything works fine with

sudo systemctl status duplicate_detection.service

Conclusion

Now duplicate a few files in your Pictures folder, wait fifteen minutes, and see what happens!
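If you have no real duplicates at hand, a small Pillow script can create a test pair for you: a synthetic 1600x1200 picture and a half-size, more heavily compressed copy of it, roughly what a WhatsApp round trip does. This is a sketch; the function name, the file names and the dup_test folder are hypothetical. The shapes are aligned to the 8x8 hash grid, so the two files should normally end up with the same average hash, and the script should then keep the larger one and move the copy.

```python
from pathlib import Path
from PIL import Image, ImageDraw

def make_test_pair(target_dir):
    """Create a synthetic photo and a smaller, recompressed copy of it in target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    img = Image.new("RGB", (1600, 1200), "skyblue")
    draw = ImageDraw.Draw(img)
    draw.rectangle([0, 750, 1600, 1200], fill="forestgreen")  # "ground", lower 3/8
    draw.rectangle([1000, 150, 1400, 450], fill="gold")       # "sun"

    original = target / "dup_test_original.jpg"
    copy = target / "dup_test_small_copy.jpg"
    img.save(original, quality=95)
    img.resize((800, 600)).save(copy, quality=40)  # half size, heavier compression
    return original, copy

# Example: make_test_pair("/home/pi/Pictures/dup_test")
```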

It may be a bit of an overkill solution to a simple problem, but it is great tinkering!
