🚀 Automatically Convert PDFs to Text Files Using Google Apps Script (With OCR Support!)
Working with PDFs is painful. They’re great for sharing, but not so great when you actually need the text inside them. Manually copying and pasting from dozens—or hundreds—of PDFs is slow, error-prone, and exhausting.
So today, I’m sharing a practical automation that solves this problem completely:
✅ Convert only PDF files from a Google Drive folder
✅ Use Google’s built-in OCR to extract real text (even from scanned PDFs!)
✅ Save the extracted content as clean .txt files
✅ 100% free, runs entirely inside Google Apps Script
✅ No API keys, no add-ons, no third-party services
This is perfect for:
- Educators processing assignments
- Researchers digitizing documents
- Developers preparing data for AI/LLMs
- Writers and content creators migrating material
- Anyone who wants clean text from PDFs—fast
Let’s walk through the solution.
✨ Why This Script Is Better Than Most
Most PDF-to-text scripts rely on:
- paid APIs
- complex OCR libraries
- the older Drive API v2 Advanced Service
- fragile regex tricks
This script doesn’t.
It uses Google Drive’s native OCR engine, the same one used when you upload a scanned PDF and open it as a Google Doc. Apps Script triggers this OCR process automatically, extracts the text, and stores it in a .txt file for you.
It even moves the temporary Google Doc to the trash so your Drive stays clean.
📁 What You Need Before You Start
Just two Google Drive folders:
- Source folder → where you upload PDFs
- Output folder → where .txt files will be saved
That’s it.
No API keys.
No billing.
No Advanced Services.
🛠️ Step-by-Step Setup
1️⃣ Open Google Apps Script
Go to:
https://script.google.com
Click New Project.
2️⃣ Replace the default code with the script below
(Scroll down for the full script.)
3️⃣ Insert your folder IDs
You can find a folder ID in the URL:
https://drive.google.com/drive/folders/1AbCdEFG123xyz
Everything after /folders/ is the ID.
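Shared links often carry extras such as ?usp=sharing, so "everything after /folders/" can be more than the ID. Here is a minimal sketch of a helper that isolates the ID. It assumes IDs contain only letters, digits, hyphens, and underscores, and extractFolderId is a hypothetical name that is not part of the main script.

```javascript
// Hypothetical helper (not part of the original script): pull the folder ID
// out of a Drive folder URL, ignoring any query string or trailing path.
// Assumption: IDs contain only letters, digits, "-" and "_".
function extractFolderId(urlOrId) {
  const match = /\/folders\/([A-Za-z0-9_-]+)/.exec(urlOrId);
  // No "/folders/" segment: assume the caller already passed a bare ID.
  return match ? match[1] : urlOrId.trim();
}
```

For example, you could paste a full link into extractFolderId(...) once in the editor, then copy the result into SOURCE_FOLDER_ID.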
4️⃣ Run the script
Select:
convertPdfsToText
and click Run.
Authorize once, and you’re done.
📜 The Script: Convert PDF → OCR → .TXT (PDF-Only Processing)
/**
 * Convert ONLY PDF files in a Google Drive folder into clean text (.txt) files.
 * Uses Google Drive OCR. No Advanced Services required.
 */
const SOURCE_FOLDER_ID = 'PUT_SOURCE_FOLDER_ID_HERE';
const OUTPUT_FOLDER_ID = 'PUT_OUTPUT_FOLDER_ID_HERE';
const DELETE_TEMP_DOC_AFTER = true;

function convertPdfsToText() {
  const sourceFolder = DriveApp.getFolderById(SOURCE_FOLDER_ID);
  const outputFolder = DriveApp.getFolderById(OUTPUT_FOLDER_ID);
  const files = sourceFolder.getFiles();
  let processedCount = 0;

  while (files.hasNext()) {
    const file = files.next();

    // ✅ STRICT: Only process PDFs
    if (file.getMimeType() !== MimeType.PDF) {
      Logger.log("Skipping non-PDF file: " + file.getName());
      continue;
    }

    const pdfName = file.getName();
    Logger.log("Processing PDF: " + pdfName);

    try {
      const text = extractTextFromPdf_(file);
      if (!text || !text.trim()) {
        Logger.log("No text extracted from: " + pdfName);
        continue;
      }

      // Save as .txt file
      const txtName = pdfName.replace(/\.pdf$/i, "") + ".txt";
      outputFolder.createFile(txtName, text, MimeType.PLAIN_TEXT);
      Logger.log("Created text file: " + txtName);
      processedCount++;
    } catch (err) {
      Logger.log("Error processing " + pdfName + ": " + err);
    }
  }

  Logger.log("Completed. PDFs processed: " + processedCount);
}

/**
 * Perform OCR by uploading the PDF to Drive as a Google Doc.
 * Returns the extracted text; throws if the upload or conversion fails.
 */
function extractTextFromPdf_(pdfFile) {
  const ocrUrl =
    "https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart";
  const metadata = {
    name: pdfFile.getName().replace(/\.pdf$/i, ""),
    mimeType: "application/vnd.google-apps.document"
  };

  const boundary = "xxxxxxxxxxxxxx";
  const delimiter = "\r\n--" + boundary + "\r\n";
  const closeDelimiter = "\r\n--" + boundary + "--";

  const pdfBlob = pdfFile.getBlob();
  const pdfBase64 = Utilities.base64Encode(pdfBlob.getBytes());

  // Multipart request for OCR: JSON metadata part, then the base64-encoded PDF
  const multipartRequestBody =
    delimiter +
    "Content-Type: application/json; charset=UTF-8\r\n\r\n" +
    JSON.stringify(metadata) +
    delimiter +
    "Content-Type: application/pdf\r\n" +
    "Content-Transfer-Encoding: base64\r\n\r\n" +
    pdfBase64 +
    closeDelimiter;

  const params = {
    method: "post",
    contentType: "multipart/related; boundary=" + boundary,
    payload: multipartRequestBody,
    headers: { Authorization: "Bearer " + ScriptApp.getOAuthToken() },
    muteHttpExceptions: true
  };

  // With muteHttpExceptions on, a failed upload returns an error body
  // instead of throwing, so the missing id is what signals failure.
  const response = UrlFetchApp.fetch(ocrUrl, params);
  const result = JSON.parse(response.getContentText());
  if (!result.id) {
    throw new Error("OCR failed: " + response.getContentText());
  }

  try {
    // Read the OCR-converted Google Doc
    return DocumentApp.openById(result.id).getBody().getText();
  } finally {
    // Optional cleanup; runs even if reading the Doc throws
    if (DELETE_TEMP_DOC_AFTER) {
      DriveApp.getFileById(result.id).setTrashed(true);
    }
  }
}
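
If your PDFs are not in English, a language hint may help the OCR step. The Drive API v3 upload endpoint accepts an ocrLanguage query parameter (an ISO 639-1 code). The sketch below shows one way to add it. buildOcrUrl is a hypothetical helper name, and how much the hint improves results will depend on your documents.

```javascript
// Hypothetical helper: build the upload URL, optionally adding the Drive API
// v3 `ocrLanguage` query parameter (an ISO 639-1 code such as "de" or "ja").
function buildOcrUrl(ocrLanguage) {
  const base =
    "https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart";
  if (!ocrLanguage) {
    return base; // No hint: leave language detection to Drive.
  }
  return base + "&ocrLanguage=" + encodeURIComponent(ocrLanguage);
}
```

Inside extractTextFromPdf_, a line such as const ocrUrl = buildOcrUrl("de"); would then stand in for the hard-coded URL.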
📌 What Happens When You Run It
For every PDF in the source folder:
- Google Apps Script uploads it to Drive with OCR
- Drive converts it into a Google Doc
- The script extracts the document’s text
- A .txt file is created with the same name
- The temporary document is moved to the trash (optional)
Result: clean text files ready for analysis, reuse, or AI processing.
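One thing to watch: Drive allows several files with the same name, so running the script twice creates a second copy of every .txt file. A minimal sketch of a guard against that is shown here. alreadyConverted is a hypothetical helper, and it assumes that matching by file name alone is good enough for your folder.

```javascript
// Hypothetical helper: true if a file with this name already exists in the
// output folder. Folder.getFilesByName() returns an iterator, so hasNext()
// tells us whether there is at least one match.
function alreadyConverted(outputFolder, txtName) {
  return outputFolder.getFilesByName(txtName).hasNext();
}
```

To use it, you would compute txtName before calling extractTextFromPdf_ in the main loop and continue when alreadyConverted(outputFolder, txtName) is true, which also skips the OCR upload for files that are already done.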
💡 Ways You Can Extend This Script
If you want, I can generate enhanced versions that:
- Create Markdown files instead of text
- Add headings and formatting automatically
- Save files with timestamps
- Process subfolders recursively
- Log results into a Google Sheet
- Zip all extracted text into one downloadable file
Just tell me what you want — the automation possibilities are endless.
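Two of these ideas are small enough to sketch right away: recursive subfolder processing and timestamped file names. Both helpers below are hypothetical and not part of the original script. collectPdfs assumes it receives DriveApp Folder objects, and timestampedTxtName formats the time in UTC.

```javascript
// Hypothetical helper: gather every PDF in a folder and all of its subfolders.
// Works with DriveApp Folder objects (getFiles / getFolders return iterators).
function collectPdfs(folder, found) {
  found = found || [];
  const files = folder.getFiles();
  while (files.hasNext()) {
    const file = files.next();
    // "application/pdf" is the value of MimeType.PDF in Apps Script.
    if (file.getMimeType() === "application/pdf") {
      found.push(file);
    }
  }
  const subfolders = folder.getFolders();
  while (subfolders.hasNext()) {
    collectPdfs(subfolders.next(), found);
  }
  return found;
}

// Hypothetical helper: "Report.pdf" -> "Report_2024-01-02-03-04-05.txt" (UTC).
function timestampedTxtName(pdfName, date) {
  const stamp = date.toISOString().slice(0, 19).replace(/[:T]/g, "-");
  return pdfName.replace(/\.pdf$/i, "") + "_" + stamp + ".txt";
}
```

In convertPdfsToText, you could loop over collectPdfs(sourceFolder) instead of sourceFolder.getFiles(), and pass timestampedTxtName(pdfName, new Date()) to createFile.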
🎉 Final Thoughts
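Google Apps Script is one of the most underrated automation tools available today. With fewer than 80 lines of code, you now have an automated PDF processing system that can work through a whole folder of files, completely free. For very large folders, keep in mind that Apps Script limits how long a single run can last, so you may need to split the work across several runs.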
No more manual extraction.
No more copy–paste.
Just fast, accurate, Google-powered OCR conversion.
