Exercise 8 AI Data Cleaner and Categorizer with Google Sheets + Gemini

Source code https://github.com/lsvekis/Google-Apps-Script-Gemini-Projects/tree/main/AI%20Data%20Cleaner

Managing messy product data is a constant headache — especially when you’re working with spreadsheets full of inconsistent titles, missing brands, and vague descriptions.
In this exercise, we’ll build an AI-powered Google Sheets add-on that automatically cleans, classifies, and enriches product information using the Gemini API and Google Apps Script.

By the end, you’ll be able to:

✅ Normalize messy product names
✅ Detect brands and categories automatically
✅ Generate short marketing descriptions
✅ Tag items with AI-suggested keywords
✅ Run it directly from your Sheet — no coding knowledge required


🚀 Overview

We’ll connect Google Sheets to the Gemini 2.5 Flash model through a simple Apps Script project.
Once set up, you can select any range of rows and the script will:

  1. Send each product name + description to Gemini.
  2. Ask for a JSON response matching a strict schema.
  3. Parse and display clean data in the next columns.

The result is a ready-to-use AI data-cleaning tool built right into Google Sheets.


🧩 Step 1 — Set up the Script Project

  1. Open a Google Sheet.
  2. Go to Extensions → Apps Script.
  3. Create three files:
File namePurpose
Code.gsThe full Apps Script logic
Sidebar.htmlA small UI for testing single products
appsscript.jsonManifest with required scopes

Paste in the full code from this project (see below).

  1. In Project Settings → Script properties, add your API key:
    • Name: GEMINI_API_KEY
    • Value: your Gemini API key from ai.google.dev
  2. Save and run the onOpen() function once to authorize the script.

🧠 Step 2 — How the AI Cleaner Works

The core function is cleanOneRecord(name, desc):

  1. Prompt construction:
    It tells Gemini exactly which fields to return (clean name, brand, category, tags, short description, confidence, notes).
    It also warns the model to return “RAW JSON ONLY on a single line” — no markdown fences.
  2. Gemini request:
    The function uses UrlFetchApp.fetch() to call
    https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent.
  3. Robust JSON parsing:
    A helper safeParseJsonLoose() cleans up stray “`json fences and safely recovers incomplete objects if Gemini truncates output.
  4. Normalization:
    The code trims strings, clamps the confidence 0-1, shortens the description ≤ 120 characters, and joins tags into a readable list.
  5. Output:
    Each cleaned record writes into seven new columns:
    • Clean Name
    • Brand
    • Category
    • Tags
    • Short Description
    • Confidence
    • Notes

⚙️ Step 3 — Test It with Sample Data

From the new 🧠 AI Tools menu in your Sheet:

  1. Choose Insert Sample Data — this populates demo rows (iPhone, Nike shoes, Sony headphones…).
  2. Select columns A and B (Product Name and Description).
  3. Choose Clean Selection → next columns.
  4. Watch Gemini fill in clean, organized data instantly.

You can also open Open Sidebar and test individual items interactively.


💡 How the Prompt Keeps Gemini Consistent

The trick is the strict JSON schema and explicit rules:

{
  "clean_name": "string",
  "brand": "string",
  "category": "string",
  "tags": ["string"],
  "short_description": "string",
  "confidence": "number (0..1)",
  "notes": "string"
}

By enforcing this structure and rejecting anything that isn’t valid JSON, you get predictable, machine-readable output every time — perfect for automation.


🔍 Example Result

Input:

Product NameDescription
Apple iPhone 15 Pro Max 256 GB – Titan GreyFlagship phone with A17 Pro chip, 6.7″ display, excellent camera. Unlocked.

Output:

Clean NameBrandCategoryTagsShort DescriptionConfidenceNotes
Apple iPhone 15 Pro Max 256 GBAppleSmartphonesmartphone, apple, iphone, flagship, 256gbFlagship phone with A17 Pro chip and excellent camera.0.97

🧰 Step 4 — Optional: Add the Custom Function

You can also clean individual cells directly:

=AI_CLEAN_ROW(A2,B2)

This returns a full JSON string for the given product row.


📦 Download & Reuse

All ready-to-use files are available in this repo (or ZIP):

  • Code.gs
  • Sidebar.html
  • appsscript.json

Import them into a new Google Sheet’s Apps Script editor
→ Add your Gemini API key
→ You’re ready to build your own AI data-cleaning pipeline!


✨ What You Learned

  • How to integrate Gemini with Google Sheets using REST calls
  • How to design strict, self-validating JSON prompts
  • How to build a sidebar UI and custom functions in Apps Script
  • How to safely parse imperfect AI responses

This technique is a foundation for more complex automations — from cleaning e-commerce listings to categorizing support tickets or research data.