Automatically Extract Key Invoice Data with an AI Agent (Free n8n Workflow + Video + Tutorial + Download)
This step-by-step guide shows you how to automate the extraction of key data from PDF invoices using a pre-built AI agent in n8n. In seconds, turn any invoice into a clean, structured record — no manual data entry or tedious processing required.
You’ll get a ready-to-use workflow you can copy, test for free, and customize to fit your accounting or data analysis process — no coding needed.
Hack’celeration: The No-Code Automation Secrets Experts Never Share — We Give Them to You.
How n8n Automation Helps You Auto-Extract Key PDF Invoice Data with an AI Agent — No Manual Work Needed
Still wasting time manually copying data from PDF invoices every day? This n8n automation puts an end to that.
This ready-to-use workflow uses an AI Agent to automatically analyze each invoice and extract key details: invoice number, date, amount, customer, supplier, IBAN, line items, and more.
Just download the plug-and-play workflow, connect it to your inbox, Google Drive, or ERP, and within seconds your invoice data is automatically sent wherever you need it — Google Sheets, Notion, Airtable, your CRM or accounting software.
The result: no more manual input, no more copy-paste errors, and a clean, structured database — all while saving hours every week.
To make your life easier, the workflow comes fully documented and ready to use with step-by-step notes directly inside n8n. You’ll instantly understand how the AI Agent processes and extracts invoice data.
You also get a complete video tutorial and a step-by-step guide to walk you through setup and automation. Use it as is or easily adapt it to your stack.
The goal: automate invoice data entry with zero code, while giving you the flexibility to connect the AI Agent to Gmail, Google Drive, your ERP, or accounting tools.
Tutorial Video – Extract Invoice Data with an AI Agent in n8n
Step-by-Step Guide to the n8n Workflow: AI-Powered Invoice Data Extraction with Screenshots
Requirement: Use a Self-Hosted n8n Instance with Terminal Access
To run this workflow, you need a self-hosted n8n instance with terminal access. This setup enables local execution of the command that extracts text from your invoice PDFs.
The automation relies on the command-line tool pdftotext, which converts each PDF invoice into plain text so the AI agent can analyze and structure the data (invoice number, date, amount, IBAN, client info, etc.).
- If you’re comfortable with the terminal, you can install pdftotext yourself: it ships with the Poppler utilities.
- If not, just ask ChatGPT how to install pdftotext on your system (Ubuntu, Mac, Docker…).
- Need help? Contact us using the form and we’ll guide you step-by-step.
Important: This step is essential — without it, your AI agent won’t be able to read and extract data from your invoices.
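Before building the rest of the workflow, it can help to confirm that pdftotext is actually reachable on the server. The following is a minimal standalone Python sketch, not part of the n8n workflow itself; the function names are illustrative. On Ubuntu the tool usually comes from the poppler-utils package, and on macOS from the Homebrew poppler formula.

```python
import shutil
import subprocess
from typing import Optional


def find_tool(name: str = "pdftotext") -> Optional[str]:
    """Return the absolute path of a command-line tool, or None if it is not on PATH."""
    return shutil.which(name)


def tool_version(name: str = "pdftotext") -> str:
    """Return the version banner printed by `<name> -v`.

    pdftotext normally writes this banner to stderr, so stderr is checked first.
    """
    result = subprocess.run([name, "-v"], capture_output=True, text=True, check=False)
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("pdftotext not found: install the Poppler utilities first.")
    else:
        print(f"pdftotext found at {path}")
        print(tool_version())
```

If the script reports that the tool is missing, the text extraction step later in the workflow will fail for the same reason.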
Step 1: Launch the Workflow (Manual Trigger)
This initial step lets you manually test your PDF invoice processing. The Manual Trigger node in n8n is perfect for simulating the automation step-by-step and making sure every field (amount, supplier, IBAN, etc.) is correctly extracted.
It’s the best way to confirm that your AI agent reads each file and pulls the correct data before adding a real trigger like new email received, file added to drive, or API call.
➡️ Settings:
- Trigger Type: Manual Trigger
- Usage: Manually launch the workflow to test one or several invoice PDFs
Start your first test by clicking “Execute Workflow” in the n8n editor.
Step 2: Retrieve PDF Invoices from Google Drive
This step automatically scans a specific folder in your Google Drive to fetch all PDF invoices ready for processing. Each file will then be analyzed individually by the AI agent.
💡 Tip: To find your Drive folder ID, open the folder in your browser; the ID appears in the URL after /folders/.
➡️ Settings:
- Module: Google Drive
- Operation: List all files in a folder
- Folder: ID of the folder containing your invoice PDFs
- Authentication: Your Google Drive account connected to n8n
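To illustrate where the folder ID sits in the URL, here is a small standalone Python sketch. The helper name and the sample URL are made up for the example, and the pattern assumes the ID consists only of letters, digits, hyphens, and underscores.

```python
import re
from typing import Optional

# Assumption: the folder ID is the run of URL-safe characters right after "/folders/".
FOLDER_ID_PATTERN = re.compile(r"/folders/([A-Za-z0-9_-]+)")


def extract_folder_id(url: str) -> Optional[str]:
    """Return the Drive folder ID found in a folder URL, or None if there is none."""
    match = FOLDER_ID_PATTERN.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    sample = "https://drive.google.com/drive/folders/1AbC_dEf-123?usp=sharing"
    print(extract_folder_id(sample))
```

The extracted value is what goes into the Folder setting of the Google Drive node.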
Step 3: Process Each Invoice with a Loop Node
This step uses a Loop node in n8n to process each invoice PDF individually. It ensures that every file is analyzed one by one, preventing data overlap or workflow collisions.
Looping over the list of files lets your automation treat every invoice as a separate item—from text extraction to AI analysis and data structuring.
➡️ Settings:
- Module: Loop
- Operation: Iterate through the list of PDF files
- Purpose: Ensure that each invoice is processed in isolation
Step 4: Download the Invoice PDF from Google Drive
This step automatically downloads the invoice PDF file from your Google Drive folder, using the dynamic file ID retrieved during the previous loop step.
➡️ Settings:
- Module: Google Drive
- Operation: Download file
- File: Dynamic file ID (from the loop)
- Authentication: Your connected Google account in n8n
💡 You can also replace Google Drive with a Gmail module, a webhook trigger, or your ERP’s API if invoices come from another source.
Step 5: Save the PDF Invoice Locally
The downloaded invoice is written to a temporary server folder (/tmp/doc.pdf) with the ReadWriteFileFromDisk node. This step prepares the file for automated text extraction, since the terminal command can only read files that exist on disk.
➡️ Settings:
- File path: /tmp/doc.pdf
- Content: Binary data from the downloaded PDF invoice
This method works with any type of PDF: customer invoice, vendor invoice, credit note, purchase order, etc.
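For readers who want to see what this step amounts to outside n8n, the sketch below writes binary data to the same path after a basic header check. It is a standalone Python illustration with hypothetical function names. The check only looks for the usual %PDF- marker at the start of the file and is not a full validation.

```python
from pathlib import Path


def looks_like_pdf(data: bytes) -> bool:
    """Basic sanity check: PDF files normally begin with the marker %PDF-."""
    return data.startswith(b"%PDF-")


def save_pdf(data: bytes, path: str = "/tmp/doc.pdf") -> Path:
    """Write binary PDF data to disk and return the path that was written."""
    if not looks_like_pdf(data):
        raise ValueError("data does not look like a PDF (missing %PDF- header)")
    target = Path(path)
    target.write_bytes(data)  # overwrites any file left over from a previous run
    return target
```

Because the path is fixed, each invoice overwrites the previous one, which is one reason the workflow processes files one at a time.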
Step 6: Extract Text from the Invoice (PDFtoText)
In this step, we run the pdftotext command (part of the Poppler utilities) to extract the plain text content of the invoice PDF into a text file. The AI agent needs this plain-text version to analyze and structure the invoice data. Note: running the command requires a self-hosted n8n instance.
➡️ Command executed: pdftotext /tmp/doc.pdf /tmp/doc.txt
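This method extracts the text layer of the invoice, which typically includes the number, date, line items, VAT, total amount, IBAN, and more. Scanned invoices that contain only images have no text layer and need OCR first.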
Not sure how to install pdftotext? Ask ChatGPT depending on your system (Ubuntu, Docker, Mac…) or contact us.
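The same command can be tried outside n8n to check that a given invoice yields usable text. Below is a minimal Python sketch under the assumption that pdftotext is installed and on PATH; the function names are illustrative. The optional -layout flag is a standard pdftotext option that keeps the physical layout of the page, which sometimes helps with tables of line items.

```python
import subprocess
from typing import List


def build_pdftotext_command(pdf_path: str, txt_path: str, keep_layout: bool = False) -> List[str]:
    """Build the argument list for pdftotext without running it."""
    command = ["pdftotext"]
    if keep_layout:
        command.append("-layout")  # preserve the original physical layout
    command += [pdf_path, txt_path]
    return command


def extract_text(pdf_path: str = "/tmp/doc.pdf", txt_path: str = "/tmp/doc.txt",
                 keep_layout: bool = False) -> None:
    """Run pdftotext and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_pdftotext_command(pdf_path, txt_path, keep_layout), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if a file path ever contains spaces.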
Step 7: Read the Extracted Text File
In this step, we use the Read File from Disk node to load the plain text content previously extracted from the PDF invoice. This data will then be passed to the AI agent for analysis and structured extraction.
➡️ Parameters:
- File Path: /tmp/doc.txt
- Encoding: UTF-8
This step is crucial to ensure that the AI receives clean, readable input data for accurate processing.
Step 8: Prepare the Text for Analysis
The raw text extracted from the invoice may contain unwanted line breaks, extra spaces, or repeated headers. This step is used to clean and standardize the content so it can be properly interpreted by the AI agent.
The cleaned output is stored in $json.data, ready to be passed to the analysis step. This ensures the AI model receives clear, usable input such as dates, invoice numbers, product lines, VAT, totals, and more.
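The exact cleaning logic lives inside the workflow and is not shown here, so the following is only a sketch of what such a cleanup might look like, written as standalone Python with a hypothetical function name. It normalizes line endings, turns the form feeds that pdftotext places between pages into line breaks, collapses repeated spaces and tabs, and squeezes runs of blank lines. It does not attempt to remove repeated headers.

```python
import re


def clean_invoice_text(raw: str) -> str:
    """Normalize whitespace in text extracted from a PDF."""
    # Unify Windows and old Mac line endings.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # pdftotext separates pages with form feed characters.
    text = text.replace("\f", "\n")
    # Collapse runs of spaces and tabs, then trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    cleaned = []
    for line in lines:
        if line == "" and (not cleaned or cleaned[-1] == ""):
            continue  # skip leading blank lines and collapse blank-line runs
        cleaned.append(line)
    return "\n".join(cleaned).strip()


if __name__ == "__main__":
    sample = "Invoice   No:  123\r\n\r\n\r\n  Total:\t 45.00 \f\n"
    print(clean_invoice_text(sample))
```

Keeping single blank lines preserves the rough block structure of the invoice, which can help the model tell header, line items, and totals apart.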
Step 9: Analyze the Invoice with an AI Agent (GPT-4o)
The cleaned text is sent to an AI agent powered by GPT-4o, using LangChain. The agent is instructed through its prompt to extract all key invoice data: invoice number, date, client, vendor, subtotal, total with tax, IBAN, product lines, and more.
➡️ Prompt: JSON-formatted output with standardized fields, optimized for Google Sheets (e.g. using apostrophes to prevent number formatting issues).
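Language models occasionally wrap JSON in a Markdown code fence or leave fields empty, so it can be worth validating the reply before using it. The sketch below is a standalone Python illustration, not the workflow's own logic. The function names and the REQUIRED_FIELDS list are assumptions chosen for the example.

```python
import json
from typing import Any, Dict, List

FENCE = "`" * 3  # Markdown code fence marker
# Hypothetical minimum set of fields; adjust to match your own prompt.
REQUIRED_FIELDS = ["invoice_number", "invoice_date", "total_amount"]


def strip_code_fence(text: str) -> str:
    """Remove a surrounding Markdown code fence (with optional language tag), if present."""
    text = text.strip()
    if text.startswith(FENCE) and text.endswith(FENCE):
        text = text[len(FENCE):-len(FENCE)]
        first_line, _, rest = text.partition("\n")
        if first_line.strip() == "" or first_line.strip().isalpha():
            text = rest  # drop the language tag line, e.g. "json"
    return text.strip()


def parse_agent_reply(reply: str) -> Dict[str, Any]:
    """Parse the model's reply as a JSON object; raises ValueError on bad input."""
    data = json.loads(strip_code_fence(reply))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


def missing_fields(data: Dict[str, Any], required: List[str] = REQUIRED_FIELDS) -> List[str]:
    """Return the required field names that are absent, None, or empty strings."""
    return [name for name in required if data.get(name) in (None, "")]
```

A non-empty result from missing_fields is a reasonable signal to flag the invoice for manual review instead of writing an incomplete row.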
Step 10: Flatten the Extracted Data
The JSON generated by the AI agent is transformed into a flat structure, with standardized fields (e.g., invoice_number, invoice_date, total_amount, client_name, etc.) for direct integration into Google Sheets.
💡 You can easily adapt this structure for other tools like Notion, Airtable, or your accounting database depending on your stack.
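As an illustration of the flattening idea, here is a short standalone Python sketch. The nested input shape in the demo is hypothetical, since the agent's actual JSON structure depends on the prompt. Nested keys are joined with underscores, and list items get a 1-based index.

```python
from typing import Any, Dict


def flatten(data: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts and lists into a single-level dict with joined keys."""
    flat: Dict[str, Any] = {}
    for key, value in data.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        elif isinstance(value, list):
            for index, item in enumerate(value, start=1):
                item_key = f"{new_key}{sep}{index}"
                if isinstance(item, dict):
                    flat.update(flatten(item, item_key, sep))
                else:
                    flat[item_key] = item
        else:
            flat[new_key] = value
    return flat


if __name__ == "__main__":
    # Hypothetical agent output, for illustration only.
    nested = {
        "invoice_number": "INV-1",
        "client": {"name": "Acme"},
        "line_items": [{"description": "Widget", "amount": 10}],
    }
    print(flatten(nested))
```

With this convention a nested client name becomes client_name, matching the flat field naming used for the spreadsheet columns.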
Step 11: Insert Structured Data into Google Sheets
The extracted invoice information (amount, date, client, supplier, IBAN, etc.) is automatically added as a new row in a Google Sheet. Each column corresponds to a clearly defined field.
➡️ Connection: Google Sheets linked to your account
You can easily replace this output with Notion, Airtable, an ERP, an invoicing tool, or a SQL database depending on your needs.
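To make the column mapping concrete, the sketch below turns a flat record into an ordered row. It is a standalone Python illustration. The column list and function names are assumptions, and the apostrophe prefix only has the intended effect when the sheet interprets values as if a user typed them, which depends on how the row is written.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical column order; it should match the header row of your sheet.
COLUMNS = ["invoice_number", "invoice_date", "client_name", "total_amount"]


def as_sheets_text(value: Any) -> str:
    """Prefix a value with an apostrophe so Google Sheets can treat it as plain text."""
    text = "" if value is None else str(value)
    if text == "" or text.startswith("'"):
        return text
    return "'" + text


def build_row(record: Dict[str, Any],
              columns: Sequence[str] = COLUMNS,
              text_columns: Sequence[str] = ("invoice_number",)) -> List[Any]:
    """Return the record's values in column order, with missing fields left blank."""
    row: List[Any] = []
    for column in columns:
        value = record.get(column, "")
        row.append(as_sheets_text(value) if column in text_columns else value)
    return row
```

Forcing identifiers such as invoice numbers to text keeps leading zeros intact, while amounts stay numeric so they can still be summed.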
Step 12: Clean Up the Server
To keep your server clean and avoid unnecessary storage usage, this step automatically deletes the temporary files created during the invoice processing (/tmp/doc.pdf and /tmp/doc.txt).
➡️ Command: rm -rf /tmp/doc.pdf /tmp/doc.txt
You can customize this path depending on your storage system or if you want to archive files in a different location.
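If you prefer to handle cleanup from a script rather than a shell command, a cautious equivalent might look like the standalone Python sketch below. The function name is illustrative. Unlike a recursive delete, it only removes regular files and skips paths that do not exist.

```python
from pathlib import Path
from typing import Iterable, List


def remove_temp_files(paths: Iterable[str] = ("/tmp/doc.pdf", "/tmp/doc.txt")) -> List[str]:
    """Delete the given files if they exist and return the paths actually removed."""
    removed: List[str] = []
    for raw_path in paths:
        path = Path(raw_path)
        if path.is_file():  # ignore missing paths and never touch directories
            path.unlink()
            removed.append(str(path))
    return removed
```

Returning the list of removed paths makes it easy to log what was cleaned up after each invoice.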
Why Automatically Extracting Invoice Data is a Game-Changer for Your Admin Workflow
Keeping incoming invoices up to date in your CRM, ERP, or Google Sheets is essential to a reliable admin workflow. Manually reviewing PDF invoices is time-consuming and error-prone, and it slows down follow-ups and accounting processes.
Common issues with manual invoice data entry:
- Missing or incorrect information (invoice number, date, amount, client, etc.).
- Time wasted opening each PDF and copying the data manually.
- Risk of duplicates or incorrect amounts entered.
- Difficulty centralizing and using data for tracking or follow-up.
Benefits of automating extraction with an AI agent:
- Instantly structured and standardized billing information.
- Significant time savings on administrative tasks.
- Seamless integration with Google Sheets, Notion, Airtable, or accounting tools.
- Automated triggers (notifications, archiving, follow-ups, accounting sync, etc.).
By automating the extraction of data from PDF invoices using an AI agent, you eliminate repetitive tasks, improve data accuracy, and boost productivity. This n8n scenario becomes a powerful asset to scale your admin operations effortlessly.