Building a Web Article Summarizer and Citation Extractor with Google Sheets Gemini AI

GitHub https://github.com/lsvekis/Google-Apps-Script-Gemini-Projects/tree/main/summarizer

🌐 Building a Web Article Summarizer & Citation Extractor with Google Sheets + Gemini AI

Have you ever wished you could drop a list of article URLs into Google Sheets and instantly get structured summaries — complete with titles, key quotes, named entities, and estimated reading time?
In this exercise, we’ll create exactly that using Google Apps Script and Gemini AI.


🧠 Project Overview

This project transforms your spreadsheet into a lightweight research assistant.
You’ll feed it article URLs, and the script will:

  • Fetch and clean each webpage
  • Summarize its main points in bullet form
  • Extract people, organizations, and locations
  • Highlight key quotes
  • Suggest topic tags and reading time
  • Output everything into the next columns automatically

It’s perfect for anyone who wants quick, AI-powered overviews of online content — students, journalists, analysts, or content creators.


⚙️ How It Works

The magic happens inside Google Apps Script, which acts as a bridge between Google Sheets and Gemini’s REST API.

Step 1 — Fetch the Web Page

Each URL is retrieved with UrlFetchApp.fetch().
Scripts, styles, and HTML tags are stripped away, leaving clean readable text.

body = body
  .replace(/<script[\s\S]*?<\/script>/gi, ' ')
  .replace(/<style[\s\S]*?<\/style>/gi, ' ')
  .replace(/<[^>]+>/g, ' ')
  .trim();

The result is cached for six hours to avoid hitting rate limits.


Step 2 — Chunk and Summarize

Long pages are split into ~6000-character chunks.
Each chunk is sent to Gemini with a simple instruction:

“Summarize this web page chunk in 4–6 concise bullet points focusing on facts.”

This ensures we don’t exceed token limits and still get accurate partial summaries.


Step 3 — Merge and Structure

Once all chunk summaries return, we send them again to Gemini — this time asking for a strict JSON object:

{
  "title": "string",
  "summary": ["string"],
  "key_quotes": ["string"],
  "entities": {
    "people": ["string"],
    "organizations": ["string"],
    "locations": ["string"]
  },
  "topics": ["string"],
  "reading_time_minutes": 3,
  "source_url": "string"
}

The AI consolidates everything into a clean data structure ready for Sheets.


Step 4 — Display Results in Sheets

Each row of URLs expands into seven columns:

| Title | Summary (bullets) | Entities (People | Orgs | Locs) | Key Quotes | Topics | Reading Minutes | Source URL |
|:——|:——————|:—————–|:————-|:——–|:—————-|:———–|
| AI Updates from Google | • Gemini 2.5 launch … | Sundar Pichai | Google | “AI for everyone” | ai, research | 6 | https://blog.google/… |


🧩 Code Highlights

  • fetchUrlText(url) — cleans HTML into readable text
  • summarizeUrl(url) — handles chunking, API calls, and JSON consolidation
  • AI_URL_SUMMARY(url) — custom sheet function returning the full JSON
  • summarizeUrlSelectionToRight() — bulk action for selected URLs
  • Sidebar UI — allows quick one-off summarization inside a small HTML panel

🧪 Try It Yourself

  1. Open a new Google Sheet → Extensions → Apps Script
  2. Create these files:
    • Code.gs
    • Sidebar.html
    • appsscript.json
  3. Paste in the code from the Exercise 7 GitHub package
  4. In Script Properties, add your Gemini API Key (GEMINI_API_KEY).
  5. Run onOpen() once and reload the Sheet.

Then, use the menu:

🧠 AI Tools → Insert Sample URLs
Select column A → Summarize URL Selection → next columns

or try a single-cell formula:

=AI_URL_SUMMARY(A2)

🧩 Tips & Troubleshooting

  • If you see ❌ Set GEMINI_API_KEY… → add your key in Script Properties.
  • Some news sites block fetch requests — try public blogs first.
  • Long pages? The script summarizes the first four chunks (≈ 24 K chars).
  • Need finer control? Adjust CHUNK_CHAR_LIMIT or MERGE_TOKENS.

🎯 Why This Matters

This exercise demonstrates multi-stage AI orchestration inside a low-code environment.
You’re combining:

  1. Data fetching (UrlFetchApp)
  2. Pre-processing and caching
  3. Multi-prompt reasoning (chunk → merge)
  4. Structured JSON outputs
  5. Instant visualization in Sheets

In short — a real-world example of AI-assisted knowledge extraction without any external server.