How to Automatically Create Sample PDFs and Extract Totals Using Google Apps Script

📘 Introduction

Have you ever wanted to generate PDFs automatically and extract key details from them, like invoice totals? With Google Apps Script, you can automate this process. In this guide, you’ll learn how to:

  • Create sample PDFs with mock invoice data.
  • Extract totals from PDFs using OCR (Optical Character Recognition).
  • Email a summary of the totals directly to your inbox.

This is useful for managing invoices, receipts, and other financial documents. By the end, you’ll have a system that can generate, scan, and extract key data from PDFs.


📘 Prerequisites

  1. Google Drive: Create a folder where the PDFs will be stored.
  2. Enable Drive API:
    • Go to Extensions → Advanced Google Services.
    • Enable the Drive API.
    • Go to Google Cloud Console and ensure the Drive API is enabled.

📘 Key Features

  1. Create Sample PDF Files
    Generate mock invoices with data like item descriptions, prices, subtotals, taxes, and totals.
  2. Extract and Email Totals
    Extract the total from each PDF using OCR and send a summary email with the totals.
  3. Automated Cleanup
    Temporary files are automatically deleted after processing to keep your Drive organized.

📘 The Code

Here’s the complete Google Apps Script. Paste it into the Apps Script editor, follow the instructions, and run it.


📗 1. Create Sample PDFs

This function creates 3 sample PDF invoices and stores them in a folder in Google Drive.

function createSamplePDFs() {
const folderId = 'YOUR_FOLDER_ID_HERE'; // Replace with your folder ID
const folder = DriveApp.getFolderById(folderId);

const pdfData = [
{
fileName: 'Sample Invoice 1.pdf',
content: `
INVOICE
Invoice Number: 1001
Date: 2024-12-09
Item: Widget A - $25.00
Item: Widget B - $50.00
Subtotal: $75.00
Tax: $7.50
Total: $82.50
`
},
{
fileName: 'Sample Invoice 2.pdf',
content: `
RECEIPT
Receipt Number: 2001
Date: 2024-12-08
Description: Service Charge - $150.00
Total: $150.00
`
},
{
fileName: 'Sample Invoice 3.pdf',
content: `
INVOICE
Invoice Number: 3001
Date: 2024-12-07
Product: Gadget A - $100.00
Product: Gadget B - $200.00
Subtotal: $300.00
Discount: $30.00
Tax: $27.00
Total: $297.00
`
}
];

pdfData.forEach(data => {
const doc = DocumentApp.create(data.fileName.replace('.pdf', ''));
const body = doc.getBody();
body.appendParagraph(data.content);
doc.saveAndClose();

const pdfBlob = DriveApp.getFileById(doc.getId()).getAs('application/pdf');
folder.createFile(pdfBlob.setName(data.fileName));
DriveApp.getFileById(doc.getId()).setTrashed(true);
});

Logger.log('Sample PDFs have been created in the folder: ' + folder.getName());
}

📗 2. Extract and Email Totals from PDFs

This function processes each PDF, extracts the total from the invoice using OCR, and emails the totals to the active user.

function extractAndEmailTotalsFromPDFFolder() {
const folderId = 'YOUR_FOLDER_ID_HERE'; // Replace with your folder ID
const folder = DriveApp.getFolderById(folderId);
const files = folder.getFilesByType(MimeType.PDF);
const userEmail = Session.getActiveUser().getEmail();
let emailBody = 'Here are the total values extracted from the PDF files in your folder:\n\n';

while (files.hasNext()) {
const file = files.next();
const fileName = file.getName();
Logger.log(`Processing file: ${fileName}`);

try {
const blob = file.getBlob().setContentType('application/pdf');
const text = extractTextFromPDF(blob);
const totalValue = extractTotalFromText(text);
Logger.log(`File: ${fileName} - Total Value: ${totalValue}`);
emailBody += `File: ${fileName} - Total Value: ${totalValue}\n`;
} catch (e) {
Logger.log(`Failed to extract text from file: ${fileName}. Error: ${e}`);
emailBody += `File: ${fileName} - Error extracting total\n`;
}
}

MailApp.sendEmail({
to: userEmail,
subject: 'Extracted Totals from PDF Files',
body: emailBody
});

Logger.log('Email sent to ' + userEmail);
}

function extractTextFromPDF(blob) {
const resource = { title: 'Temporary OCR File', mimeType: blob.getContentType() };
const options = { ocr: true, ocrLanguage: "en" };
const file = Drive.Files.insert(resource, blob, options);
const docId = file.id;
const text = DocumentApp.openById(docId).getBody().getText();
DriveApp.getFileById(docId).setTrashed(true);
return text;
}

function extractTotalFromText(text) {
const cleanText = text.replace(/\n/g, ' ');
const totalRegex = /total[^0-9]*([\d,]+\.\d{2})/i;
const match = cleanText.match(totalRegex);
return match && match[1] ? match[1] : 'Total value not found';
}

📘 How to Run

  1. Create a Folder in Google Drive:
    • Right-click and Get link.
    • Paste the folder ID in the folderId variable.
  2. Run createSamplePDFs():
    • This creates 3 PDF invoices in the folder.
  3. Run extractAndEmailTotalsFromPDFFolder():
    • Extracts totals from the PDFs.
    • Emails a summary of the totals.

📘 Email Example

Subject: Extracted Totals from PDF Files

Body:

Here are the total values extracted from the PDF files in your folder:

File: Sample Invoice 1.pdf - Total Value: 82.50
File: Sample Invoice 2.pdf - Total Value: 150.00
File: Sample Invoice 3.pdf - Total Value: 297.00

📘 What You’ll Learn

  • How to Generate PDFs from scratch using Google Docs.
  • How to Extract Text from PDFs using OCR.
  • How to Send Emails Automatically with the extracted totals.

📘 Customization Ideas

  • Add More Invoices: Add more data to the pdfData array.
  • Improve the Email: Include attachments of the PDFs in the email.
  • Advanced Extraction: Extract other details from the invoice, like item names, quantities, and prices.

📘 Conclusion

With this Google Apps Script, you now have a tool to:

  • Create PDFs dynamically.
  • Extract totals from PDFs using OCR.
  • Email the extracted totals in a clean summary.

This is a great tool for managing invoices, automating receipt processing, and streamlining your business workflows.