🚀 Automatically Convert PDFs to Text Files Using Google Apps Script (With OCR Support!)

Working with PDFs is painful. They’re great for sharing, but not so great when you actually need the text inside them. Manually copying and pasting from dozens—or hundreds—of PDFs is slow, error-prone, and exhausting.

So today, I’m sharing a practical automation that takes most of this work off your hands:

✅ Convert only PDF files from a Google Drive folder

✅ Use Google’s built-in OCR to extract real text (even from scanned PDFs!)

✅ Save the extracted content as clean `.txt` files

✅ 100% free, runs entirely inside Google Apps Script

✅ No API keys, no add-ons, no third-party services

This is perfect for:

Educators processing assignments
Researchers digitizing documents
Developers preparing data for AI/LLMs
Writers and content creators migrating material
Anyone who wants clean text from PDFs—fast

Let’s walk through the solution.

✨ Why This Script Is Better Than Most

Most PDF-to-text scripts rely on:

paid APIs
complex OCR libraries
the Advanced Drive Service (an extra setup step)
fragile regex tricks

This script doesn’t.

It uses Google Drive’s native OCR engine, the same one used when you upload a scanned PDF and open it as a Google Doc. The script triggers that conversion by sending each PDF to the Drive API’s upload endpoint with a Google Docs target type, reads the text from the resulting Doc, and stores it in a .txt file for you.

It also moves the temporary Google Docs to the trash so your Drive stays clean.

📁 What You Need Before You Start

Just two Google Drive folders:

Source folder → where you upload PDFs
Output folder → where .txt files will be saved

That’s it.

No API keys.
No billing.
No Advanced Services.

🛠️ Step-by-Step Setup

1️⃣ Open Google Apps Script

Go to:
https://script.google.com

Click New Project.

2️⃣ Replace the default code with the script below

(Scroll down for the full script.)

3️⃣ Insert your folder IDs

You can find a folder ID in the URL:

https://drive.google.com/drive/folders/1AbCdEFG123xyz

Everything after /folders/ (up to any ? query string) is the ID.
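
If you would rather paste whole folder URLs than trim them by hand, a small helper can pull the ID out for you. This is only a sketch: `extractFolderId` is a hypothetical name that is not part of the script in this post, and it assumes folder IDs contain only letters, digits, hyphens, and underscores.

```javascript
/**
 * Hypothetical helper (not part of the original script): return the folder ID
 * from a Drive folder URL, or the trimmed input if no /folders/ segment is found.
 * Assumption: IDs contain only letters, digits, "-" and "_".
 */
function extractFolderId(urlOrId) {
  const value = String(urlOrId);
  const match = value.match(/\/folders\/([A-Za-z0-9_-]+)/);
  return match ? match[1] : value.trim();
}
```

You could then write `const SOURCE_FOLDER_ID = extractFolderId('https://drive.google.com/drive/folders/1AbCdEFG123xyz');` and leave the URL as copied from the browser.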

4️⃣ Run the script

Select:

convertPdfsToText

and click Run.

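Authorize the script when prompted (it needs access to Drive and Docs, plus permission to call external services), then check the execution log for the results.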

📜 The Script: Convert PDF → OCR → .TXT (PDF-Only Processing)

/**
 * Convert ONLY PDF files in a Google Drive folder into clean text (.txt) files.
 * Uses Google Drive OCR. No Advanced Services required.
 */

const SOURCE_FOLDER_ID = 'PUT_SOURCE_FOLDER_ID_HERE';
const OUTPUT_FOLDER_ID = 'PUT_OUTPUT_FOLDER_ID_HERE';
const DELETE_TEMP_DOC_AFTER = true;

function convertPdfsToText() {
  const sourceFolder = DriveApp.getFolderById(SOURCE_FOLDER_ID);
  const outputFolder = DriveApp.getFolderById(OUTPUT_FOLDER_ID);

  const files = sourceFolder.getFiles();
  let processedCount = 0;

  while (files.hasNext()) {
    const file = files.next();

    // ✅ STRICT: Only process PDFs
    if (file.getMimeType() !== MimeType.PDF) {
      Logger.log("Skipping non-PDF file: " + file.getName());
      continue;
    }

    const pdfName = file.getName();
    Logger.log("Processing PDF: " + pdfName);

    try {
      const text = extractTextFromPdf_(file);

      if (!text || !text.trim()) {
        Logger.log("No text extracted from: " + pdfName);
        continue;
      }

      // Save as .txt file
      const txtName = pdfName.replace(/\.pdf$/i, "") + ".txt";
      outputFolder.createFile(txtName, text, MimeType.PLAIN_TEXT);

      Logger.log("Created text file: " + txtName);
      processedCount++;

    } catch (err) {
      Logger.log("Error processing " + pdfName + ": " + err);
    }
  }

  Logger.log("Completed. PDFs processed: " + processedCount);
}


/**
 * Perform OCR by uploading the PDF to Drive as a Google Doc
 */
function extractTextFromPdf_(pdfFile) {
  const ocrUrl =
    "https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart";

  const metadata = {
    name: pdfFile.getName().replace(/\.pdf$/i, ""),
    mimeType: "application/vnd.google-apps.document"
  };

  const boundary = "xxxxxxxxxxxxxx";
  const delimiter = "\r\n--" + boundary + "\r\n";
  const closeDelimiter = "\r\n--" + boundary + "--";

  const pdfBlob = pdfFile.getBlob();
  const pdfBase64 = Utilities.base64Encode(pdfBlob.getBytes());

  // Multipart request for OCR
  const multipartRequestBody =
    delimiter +
    "Content-Type: application/json; charset=UTF-8\r\n\r\n" +
    JSON.stringify(metadata) +
    delimiter +
    "Content-Type: application/pdf\r\n" +
    "Content-Transfer-Encoding: base64\r\n\r\n" +
    pdfBase64 +
    closeDelimiter;

  const params = {
    method: "post",
    contentType: "multipart/related; boundary=" + boundary,
    payload: multipartRequestBody,
    headers: { Authorization: "Bearer " + ScriptApp.getOAuthToken() },
    muteHttpExceptions: true
  };

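  const response = UrlFetchApp.fetch(ocrUrl, params);
  const code = response.getResponseCode();
  const body = response.getContentText();

  // With muteHttpExceptions on, HTTP errors come back as ordinary responses,
  // so check the status before trying to parse the body as JSON.
  if (code < 200 || code >= 300) {
    throw new Error("OCR failed (HTTP " + code + "): " + body);
  }

  const result = JSON.parse(body);
  if (!result.id) {
    throw new Error("OCR failed: " + body);
  }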

  // Read the OCR-converted Google Doc
  const doc = DocumentApp.openById(result.id);
  const text = doc.getBody().getText();

  // Optional cleanup: move the temporary Google Doc to the trash
  if (DELETE_TEMP_DOC_AFTER) {
    DriveApp.getFileById(result.id).setTrashed(true);
  }

  return text;
}
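
If your PDFs are not in English, you may get better results by giving Drive a language hint. The Drive API v3 upload call accepts an `ocrLanguage` query parameter (an ISO 639-1 code such as `de` or `fr`). Below is a minimal sketch of building the upload URL with that hint; `buildOcrUploadUrl` is a hypothetical helper name, and whether the hint improves your particular documents is something to test rather than assume.

```javascript
// Same upload endpoint the script above uses.
const UPLOAD_BASE = "https://www.googleapis.com/upload/drive/v3/files";

/**
 * Hypothetical helper: build the multipart upload URL, optionally adding
 * an OCR language hint (ISO 639-1 code, e.g. "de").
 */
function buildOcrUploadUrl(ocrLanguage) {
  let url = UPLOAD_BASE + "?uploadType=multipart";
  if (ocrLanguage) {
    url += "&ocrLanguage=" + encodeURIComponent(ocrLanguage);
  }
  return url;
}
```

To try it, replace the hard-coded `ocrUrl` string in `extractTextFromPdf_` with something like `buildOcrUploadUrl("de")`.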

📌 What Happens When You Run It

For every PDF in the source folder:

Google Apps Script uploads it to Drive with OCR
Drive converts it into a Google Doc
The script extracts the document’s text
A .txt file with the same base name is created in the output folder
The temporary document is moved to the trash (optional)

Result: clean text files ready for analysis, reuse, or AI processing.
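
One practical caveat for large folders: Apps Script stops a run after a fixed execution time (six minutes per execution, as far as I know), and running the script again converts everything from scratch. The sketch below shows one way to make runs resumable by skipping PDFs that already have a .txt file and stopping early when time is short. `nextAction` and `TIME_BUDGET_MS` are hypothetical names, and the five-minute budget is a guess you should tune.

```javascript
// Assumed budget: stop before the execution limit is reached (tune as needed).
const TIME_BUDGET_MS = 5 * 60 * 1000;

/**
 * Hypothetical planner: decide what to do with the next PDF.
 * outputFolder only needs getFilesByName(name) returning an iterator with
 * hasNext(), which DriveApp folders provide.
 * Returns "stop", "skip", or "convert".
 */
function nextAction(pdfName, outputFolder, startedAtMs, nowMs) {
  // Out of time: let the next run pick up where this one left off.
  if (nowMs - startedAtMs > TIME_BUDGET_MS) {
    return "stop";
  }
  // Already converted on a previous run: nothing to do.
  const txtName = pdfName.replace(/\.pdf$/i, "") + ".txt";
  if (outputFolder.getFilesByName(txtName).hasNext()) {
    return "skip";
  }
  return "convert";
}
```

Inside the `while` loop of `convertPdfsToText`, you would record `const startedAt = Date.now();` before the loop, call `nextAction(file.getName(), outputFolder, startedAt, Date.now())` for each PDF, then `break` on "stop" and `continue` on "skip".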

💡 Ways You Can Extend This Script

You can extend the script to:

Create Markdown files instead of text
Add headings and formatting automatically
Save files with timestamps
Process subfolders recursively
Log results into a Google Sheet
Zip all extracted text into one downloadable file
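
As a sketch of the recursive option in the list above, the helper below walks a folder and all of its subfolders and calls a function for each file. `forEachFileRecursive` is a hypothetical name, not part of the original script; it relies only on the `getFiles()` and `getFolders()` iterators that DriveApp folders expose.

```javascript
/**
 * Hypothetical helper: visit every file in a folder and in all of its
 * subfolders, depth-first. `visit` receives the file and its parent folder.
 */
function forEachFileRecursive(folder, visit) {
  // Files directly inside this folder.
  const files = folder.getFiles();
  while (files.hasNext()) {
    visit(files.next(), folder);
  }
  // Then descend into each subfolder.
  const subfolders = folder.getFolders();
  while (subfolders.hasNext()) {
    forEachFileRecursive(subfolders.next(), visit);
  }
}
```

In `convertPdfsToText`, the body of the `while` loop could move into the `visit` callback, with `forEachFileRecursive(sourceFolder, ...)` replacing the loop. Note that every .txt file would still land in the single output folder, so PDFs with the same name in different subfolders would produce files with identical names.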


🎉 Final Thoughts

Google Apps Script is one of the most underrated automation tools available today. With roughly 100 lines of code, you now have an automated PDF-to-text pipeline that can work through a whole folder of files at no cost, subject to Apps Script’s usual quotas and execution-time limit.

No more manual extraction.
No more copy–paste.
Just fast, accurate, Google-powered OCR conversion.