It’s important to note that web scraping should be done ethically and in compliance with Google’s terms of service.
In this guide, we’ll walk you through the steps to scrape Google Images using Python, focusing on the popular requests library for making HTTP requests and BeautifulSoup for parsing HTML.
Prerequisites
Before we start, ensure you have Python installed on your machine and the following libraries:
- requests – for making HTTP requests.
- BeautifulSoup – for parsing HTML and XML documents.
- pillow – for handling image files (optional, but useful for saving images).
You can install these libraries using pip:
```bash
pip install requests beautifulsoup4 pillow
```
Step 1: Import Required Libraries
First, import the necessary libraries in your Python script:
```python
import os
import requests
from bs4 import BeautifulSoup
from PIL import Image
from io import BytesIO
```
Step 2: Define Your Search Query
Set up your search query and the URL format for Google Images:
```python
query = "cats"  # Replace with your search query
url = f"https://www.google.com/search?hl=en&tbm=isch&q={query}"
```
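If your query contains spaces or other special characters, it should be URL-encoded before being placed in the URL. A minimal variant of the snippet above, using the standard library’s urllib.parse.quote_plus, might look like this:

```python
from urllib.parse import quote_plus

query = "black cats"  # queries with spaces or special characters must be encoded
url = f"https://www.google.com/search?hl=en&tbm=isch&q={quote_plus(query)}"
```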
Step 3: Fetch the HTML Content
Use the requests library to fetch the HTML content of the search results page:
```python
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
```
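If the request fails or Google serves a block page, the later parsing steps will silently find nothing, so it can help to check the response before parsing. One simple, optional check uses requests’ built-in raise_for_status:

```python
# Raise an exception for 4xx/5xx responses instead of silently parsing an error page
response.raise_for_status()
```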
Step 4: Extract Image URLs
Google Images’ HTML structure contains image URLs in img tags. Extract these URLs:
```python
def get_image_urls(soup):
    image_urls = []
    img_tags = soup.find_all("img")
    for img in img_tags:
        try:
            img_url = img["src"]
            if img_url.startswith("http"):
                image_urls.append(img_url)
        except KeyError:
            continue
    return image_urls

image_urls = get_image_urls(soup)
print(f"Found {len(image_urls)} images.")
```
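Keep in mind that the static HTML returned to a plain requests call typically contains only small thumbnails, and some of them may be embedded as base64 data: URIs rather than http URLs, so the filter above skips them. If you also want those embedded thumbnails, a sketch like the following (using only the standard library’s base64 module) could decode them; the exact page structure is not guaranteed and may change, and the helper name here is just for illustration:

```python
import base64

def get_embedded_thumbnails(soup):
    """Decode thumbnails embedded as data: URIs in img tags (page structure may vary)."""
    images = []
    for img in soup.find_all("img"):
        src = img.get("src", "")
        if src.startswith("data:image") and "base64," in src:
            # Everything after "base64," is the raw image payload
            images.append(base64.b64decode(src.split("base64,", 1)[1]))
    return images
```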
Step 5: Download and Save Images
To download and save the images, use the following function:
```python
def download_images(image_urls, folder="images"):
    if not os.path.exists(folder):
        os.makedirs(folder)
    for i, url in enumerate(image_urls):
        try:
            response = requests.get(url)
            img = Image.open(BytesIO(response.content))
            # Convert to RGB so PNG/WebP images with transparency can be saved as JPEG
            img = img.convert("RGB")
            img.save(os.path.join(folder, f"image_{i + 1}.jpg"))
            print(f"Downloaded image_{i + 1}.jpg")
        except Exception as e:
            print(f"Failed to download {url}: {e}")

download_images(image_urls)
```
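If you download more than a handful of images, it is safer (and more polite) to pause between requests. A rough variant of the function above, assuming the same imports and image_urls list and using the standard library’s time.sleep, could look like this; the function name and delay_seconds parameter are just illustrative:

```python
import time

def download_images_politely(image_urls, folder="images", delay_seconds=1.0):
    """Same as download_images, but waits between requests to avoid hammering the server."""
    os.makedirs(folder, exist_ok=True)
    for i, url in enumerate(image_urls):
        try:
            response = requests.get(url, timeout=10)
            img = Image.open(BytesIO(response.content)).convert("RGB")
            img.save(os.path.join(folder, f"image_{i + 1}.jpg"))
        except Exception as e:
            print(f"Failed to download {url}: {e}")
        time.sleep(delay_seconds)  # pause between requests
```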
Step 6: Run the Script
Run the script to scrape and download images:
```bash
python your_script_name.py
```
Important Considerations
- Respect Copyrights: Ensure that you have the right to use the images you scrape. Many images are copyrighted and using them without permission can lead to legal issues.
- API Alternatives: For more reliable and ethical access to image data, consider using Google’s Custom Search JSON API, which provides a structured way to access image search results (see the sketch after this list).
- Rate Limiting: Avoid making too many requests in a short period to prevent getting blocked by Google. Implement delays between requests if scraping a large number of images.
- Ethics and Compliance: Always check and adhere to the website’s robots.txt file and terms of service to ensure your scraping activities are compliant.
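As a rough illustration of the API route mentioned above, the sketch below queries the Custom Search JSON API for image results. It assumes you have created an API key and a Programmable Search Engine ID (the API_KEY and CX placeholders below); the request parameters and response fields follow the API’s documented format, and the API returns at most 10 results per request.

```python
import requests

API_KEY = "YOUR_API_KEY"          # placeholder: create one in the Google Cloud console
CX = "YOUR_SEARCH_ENGINE_ID"      # placeholder: Programmable Search Engine ID

def search_images(query, num=10):
    """Return image URLs for a query via the Custom Search JSON API."""
    params = {
        "key": API_KEY,
        "cx": CX,
        "q": query,
        "searchType": "image",  # restrict results to images
        "num": num,             # at most 10 results per request
    }
    response = requests.get("https://www.googleapis.com/customsearch/v1", params=params)
    response.raise_for_status()
    return [item["link"] for item in response.json().get("items", [])]

print(search_images("cats"))
```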