Automatically Extract Key Invoice Data with an AI Agent (Free n8n Workflow + Video + Tutorial + Download)

This step-by-step guide shows you how to automate the extraction of key data from PDF invoices using a pre-built AI agent in n8n. In seconds, turn any invoice into a clean, structured record — no manual data entry or tedious processing required.

You’ll get a ready-to-use workflow you can copy, test for free, and customize to fit your accounting or data analysis process — no coding needed.

This free n8n workflow shows how an AI agent can automatically extract key data from a PDF invoice (invoice number, date, total, supplier, IBAN…) and send it directly into a Google Sheets database.

How n8n Automation Helps You Auto-Extract Key PDF Invoice Data with an AI Agent — No Manual Work Needed

Still wasting time manually copying data from PDF invoices every day? This n8n automation puts an end to that.

This ready-to-use workflow uses an AI Agent to automatically analyze each invoice and extract key details: invoice number, date, amount, customer, supplier, IBAN, line items, and more.

Just download the plug-and-play workflow, connect it to your inbox, Google Drive, or ERP, and within seconds your invoice data is automatically sent wherever you need it — Google Sheets, Notion, Airtable, your CRM or accounting software.

The result: no more manual input, no more copy-paste errors, and a clean, structured database — all while saving hours every week.

To make your life easier, the workflow comes fully documented and ready to use with step-by-step notes directly inside n8n. You’ll instantly understand how the AI Agent processes and extracts invoice data.

You also get a complete video tutorial and a step-by-step guide to walk you through setup and automation. Use it as is or easily adapt it to your stack.

The goal: automate invoice data entry with zero code, while giving you the flexibility to connect the AI Agent to Gmail, Google Drive, your ERP, or accounting tools.

Tutorial Video – Extract Invoice Data with an AI Agent in n8n

Step-by-Step Guide to the n8n Workflow: AI-Powered Invoice Data Extraction with Screenshots

This video walks through the setup of a Linux terminal to prepare a self-hosted n8n environment. It covers all prerequisites to automate PDF invoice data extraction using an AI agent, including installing pdftotext.

Requirement: Use a Self-Hosted n8n Instance with Terminal Access

To run this workflow, you need a self-hosted n8n instance with terminal access. This setup enables local execution of the command that extracts text from your invoice PDFs.

The automation relies on the command-line tool pdftotext, which converts each PDF invoice into plain text so the AI agent can analyze and structure the data (invoice number, date, amount, IBAN, client info, etc.).

  • If you’re comfortable with the terminal, install pdftotext yourself: it ships with Poppler’s command-line utilities (the poppler-utils package on Debian/Ubuntu, or poppler via Homebrew on macOS).
  • If not, just ask ChatGPT how to install pdftotext based on your system (Ubuntu, Mac, Docker…).
  • Need help? Contact us using the form and we’ll guide you step-by-step.

Important: This step is essential — without it, your AI agent won’t be able to read and extract data from your invoices.
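
Before running the workflow, it can help to confirm that pdftotext is actually reachable on the server. The following is a minimal Python sketch, assuming Python 3 is available on the same machine; the helper name find_tool is ours and is not part of the workflow.

```python
import shutil
from typing import Optional


def find_tool(name: str = "pdftotext") -> Optional[str]:
    """Return the full path of an executable found on PATH, or None if missing."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("pdftotext not found: install Poppler's command-line utilities first")
    else:
        print(f"pdftotext found at {path}")
```

If n8n runs in Docker, run the check inside the n8n container, since that is where the command will be executed.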

This video demonstrates how to manually trigger your n8n workflow using the Manual Trigger node. It’s the very first step to test the automatic PDF invoice extraction with an AI agent.

Step 1: Launch the Workflow (Manual Trigger)

This initial step lets you manually test your PDF invoice processing. The Manual Trigger node in n8n is perfect for simulating the automation step-by-step and making sure every field (amount, supplier, IBAN, etc.) is correctly extracted.

It’s the best way to confirm that your AI agent reads each file and pulls the correct data before adding a real trigger like new email received, file added to drive, or API call.

➡️ Settings:

  • Trigger Type: Manual Trigger
  • Usage: Manually launch the workflow to test one or several invoice PDFs

Start your first test by clicking “Execute Workflow” in the n8n editor.

In this step, the workflow automatically detects PDF invoices stored in a Google Drive folder. The video walks through setting up the Google Drive node to filter and retrieve the correct files.

Step 2: Retrieve PDF Invoices from Google Drive

This step automatically scans a specific folder in your Google Drive to fetch all PDF invoices ready for processing. Each file will then be analyzed individually by the AI agent.

💡 Tip: To find your Drive folder ID, open the folder in your browser—the ID appears in the URL after /folders/.
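
If you would rather not pick the ID out of the URL by hand, the same rule can be sketched in a few lines of Python. This is an illustration only: the function name and the example URL are made up, and the pattern simply takes whatever follows /folders/.

```python
import re
from typing import Optional

# Matches the path segment that follows "/folders/" in a Drive folder URL.
_FOLDER_ID = re.compile(r"/folders/([A-Za-z0-9_-]+)")


def drive_folder_id(url: str) -> Optional[str]:
    """Return the folder ID found after /folders/, or None if the URL has none."""
    match = _FOLDER_ID.search(url)
    return match.group(1) if match else None


# Hypothetical URL for illustration; the ID below is made up.
example = "https://drive.google.com/drive/folders/1AbC_dEf-123?usp=sharing"
print(drive_folder_id(example))  # 1AbC_dEf-123
```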

➡️ Settings:

  • Module: Google Drive
  • Operation: List all files in a folder
  • Folder: ID of the folder containing your invoice PDFs
  • Authentication: Your Google Drive account connected to n8n

This step shows how the Loop node in n8n processes each invoice PDF individually. It ensures smooth and isolated execution of every file in the automation, preventing overlap and errors.

Step 3: Process Each Invoice with a Loop Node

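This step uses a Loop node (named Loop Over Items in recent n8n versions) to pass the files to the rest of the workflow one at a time. Because every iteration writes to the same temporary paths (/tmp/doc.pdf and /tmp/doc.txt), handling one invoice per pass keeps one file’s data from overwriting another’s.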

Looping over the list of files lets your automation treat every invoice as a separate item—from text extraction to AI analysis and data structuring.

➡️ Settings:

  • Module: Loop
  • Operation: Iterate through the list of PDF files
  • Purpose: Ensure that each invoice is processed in isolation

This step shows how to automatically download a PDF invoice from Google Drive using the Google Drive node in n8n. The configuration retrieves the target file for processing in the workflow.

Step 4: Download the Invoice PDF from Google Drive

This step automatically downloads the invoice PDF file from your Google Drive folder, using the dynamic file ID retrieved during the previous loop step.

➡️ Settings:

  • Module: Google Drive
  • Operation: Download file
  • File: Dynamic file ID (from the loop)
  • Authentication: Your connected Google account in n8n

💡 You can also replace Google Drive with a Gmail module, a webhook trigger, or your ERP’s API if invoices come from another source.

This video shows how to save the downloaded invoice PDF to your local server using the Read/Write Files from Disk node. It’s a key step to prepare the file for automated text extraction via a terminal command.

Step 5: Save the PDF Invoice Locally

The invoice is saved in PDF format inside a temporary server folder (/tmp/doc.pdf). This step is required to make the file accessible for text extraction using a terminal command.

➡️ Settings:

  • File path: /tmp/doc.pdf
  • Content: Binary data from the downloaded PDF invoice

This method works with any type of PDF: customer invoice, vendor invoice, credit note, purchase order, etc.
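
Outside n8n, the same step amounts to writing the downloaded bytes to the temporary path. The Python sketch below is an illustration rather than the node’s actual implementation; the save_pdf helper and its signature check are our own additions.

```python
from pathlib import Path


def save_pdf(data: bytes, path: str = "/tmp/doc.pdf") -> Path:
    """Write downloaded bytes to disk after a basic PDF signature check."""
    # PDF files normally start with the "%PDF-" header.
    if not data.startswith(b"%PDF-"):
        raise ValueError("downloaded content does not look like a PDF")
    target = Path(path)
    target.write_bytes(data)
    return target
```

The signature check is optional, but it catches the common case where the download returned an HTML error page instead of the invoice.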

This step uses n8n’s Execute Command node to run the pdftotext command in the server’s terminal and extract the plain text content from the invoice PDF. Note: this requires a self-hosted n8n instance.

Step 6: Extract Text from the Invoice (PDFtoText)

In this step, we use the pdftotext command (included in the Poppler library) to convert the PDF invoice into a plain text file. This format is essential for the AI agent to analyze and structure the information extracted from the invoice.

➡️ Command executed: pdftotext /tmp/doc.pdf /tmp/doc.txt

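This method extracts the text embedded in the PDF, which typically covers every visible field on a digitally generated invoice: number, date, line items, VAT, total amount, IBAN, and more. Scanned invoices that contain only an image have no text layer, so pdftotext returns little or nothing for them and an OCR step is needed first.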
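
To try the extraction outside n8n, the same command can be run from a short script. This is a sketch assuming pdftotext is installed and on the PATH; the two function names are hypothetical, and the paths are the ones used in this guide.

```python
import subprocess
from typing import List


def pdftotext_command(pdf_path: str = "/tmp/doc.pdf",
                      txt_path: str = "/tmp/doc.txt") -> List[str]:
    """Build the same command the workflow runs, as an argument list."""
    return ["pdftotext", pdf_path, txt_path]


def extract_text(pdf_path: str = "/tmp/doc.pdf",
                 txt_path: str = "/tmp/doc.txt") -> None:
    """Run pdftotext and raise CalledProcessError if it exits with an error."""
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)
```

Passing the command as a list instead of a single string avoids quoting problems if you later switch to file names that contain spaces.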

Not sure how to install pdftotext? Ask ChatGPT depending on your system (Ubuntu, Docker, Mac…) or contact us.

The Read File from Disk node is used to load the plain text content extracted from a PDF invoice. This step is essential to prepare the data for processing by the AI agent.

Step 7: Read the Extracted Text File

In this step, we use the Read File from Disk node to load the plain text content previously extracted from the PDF invoice. This data will then be passed to the AI agent for analysis and structured extraction.

➡️ Parameters:

  • File Path: /tmp/doc.txt
  • Encoding: UTF-8

This step is crucial to ensure that the AI receives clean, readable input data for accurate processing.

The Information Extractor node uses an AI agent powered by OpenAI to analyze the text from a PDF invoice and automatically extract key details: invoice number, date, amount, client name, and more.

Step 8: Prepare the Text for Analysis

The raw text extracted from the invoice may contain unwanted line breaks, extra spaces, or repeated headers. This step is used to clean and standardize the content so it can be properly interpreted by the AI agent.

The cleaned output is stored in $json.data, ready to be passed to the analysis step. This ensures the AI model receives clear, usable input such as dates, invoice numbers, product lines, VAT, totals, and more.
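
The kind of cleanup described here can be sketched in Python as follows. This is not the workflow’s own logic, only one plausible version: it collapses spaces and tabs, drops blank lines, and keeps a single copy of any header line you list explicitly. The function name, the headers parameter, and the sample text are all assumptions.

```python
import re
from typing import Iterable


def clean_invoice_text(raw: str, headers: Iterable[str] = ()) -> str:
    """Normalize whitespace, drop blank lines, keep each known header once."""
    header_set = set(headers)
    seen_headers = set()
    cleaned = []
    # pdftotext separates pages with a form feed; treat it as a line break.
    for line in raw.replace("\f", "\n").splitlines():
        # Collapse runs of spaces and tabs, then trim both ends.
        line = re.sub(r"[ \t]+", " ", line).strip()
        if not line:
            continue  # skip empty lines left over from the PDF layout
        if line in header_set:
            if line in seen_headers:
                continue  # repeated page header: keep only the first copy
            seen_headers.add(line)
        cleaned.append(line)
    return "\n".join(cleaned)


# Made-up sample: a two-page invoice with the company name repeated per page.
sample = "ACME Corp\n\nInvoice   No:\t INV-001 \n\fACME Corp\nTotal:  120.00\n"
print(clean_invoice_text(sample, headers=["ACME Corp"]))
```

Only lines named in headers are de-duplicated. Dropping every repeated line would risk removing two identical line items.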

This video shows the AI agent running inside n8n: the prompt is dynamically generated and the response is returned as a clean, structured JSON – ready to be reused in your workflow.

Step 9: Analyze the Invoice with an AI Agent (GPT-4o)

The cleaned text is sent to an AI agent powered by GPT-4o, using LangChain. The agent is prompted (not specifically trained) to extract all key invoice data: invoice number, date, client, vendor, subtotal, total with tax, IBAN, product lines, and more.

➡️ Prompt: asks for JSON output with standardized field names, formatted for Google Sheets (e.g. a leading apostrophe so that Sheets keeps a value as text instead of reformatting it as a number).
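
Since the agent’s reply is plain JSON text, it is worth checking it before passing it on. The Python sketch below shows one way to do that. The field names are the examples listed in Step 10 of this guide, and the parse_agent_reply helper is hypothetical; adjust both to match your own prompt.

```python
import json

# Example field names from Step 10 of this guide; adjust to your own prompt.
REQUIRED_FIELDS = ("invoice_number", "invoice_date", "total_amount", "client_name")


def parse_agent_reply(reply: str) -> dict:
    """Parse the agent's JSON reply and check that required fields are present."""
    data = json.loads(reply)  # raises a ValueError subclass on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return data
```

Failing early on a missing field makes it easier to spot an invoice the agent could not read than discovering an empty cell in the sheet later.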

The Set node is used to organize the extracted invoice fields: invoice number, date, total amount, client… Each field is formatted for seamless integration into Google Sheets, Notion, or your preferred tool.

Step 10: Flatten the Extracted Data

The JSON generated by the AI agent is transformed into a flat structure, with standardized fields (e.g., invoice_number, invoice_date, total_amount, client_name, etc.) for direct integration into Google Sheets.

💡 You can easily adapt this structure for other tools like Notion, Airtable, or your accounting database depending on your stack.
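
For readers who want to see what flattening means in practice, here is a minimal Python sketch. It assumes the agent returns nested objects and joins nested keys with underscores; the nested example is invented, and the real structure depends on your prompt.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with underscores."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical nested reply; the real structure depends on your prompt.
nested = {
    "invoice": {"number": "INV-001", "date": "2024-01-31"},
    "client": {"name": "ACME"},
    "total_amount": "120.00",
}
print(flatten(nested))
```

Lists such as line items are left untouched by this sketch. They would need to be joined into one cell or written to a separate sheet.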

This step shows the automated export of extracted invoice data to Google Sheets. Each column (amount, client, date…) is mapped manually in the node for clean, structured entries.

Step 11: Insert Structured Data into Google Sheets

The extracted invoice information (amount, date, client, supplier, IBAN, etc.) is automatically added as a new row in a Google Sheet. Each column corresponds to a clearly defined field.

➡️ Connection: Google Sheets linked to your account

You can easily replace this output with Notion, Airtable, an ERP, an invoicing tool, or a SQL database depending on your needs.
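
If you switch to a destination that expects a plain list of values, the column mapping can be sketched like this in Python. The column order, the choice of invoice_number as a text column, and the to_row helper are all assumptions for illustration. The leading apostrophe follows the convention mentioned in Step 9.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical column order; it must match the header row of your sheet.
COLUMNS = ("invoice_number", "invoice_date", "total_amount", "client_name", "iban")

# Assumed columns to force to text with a leading apostrophe.
TEXT_COLUMNS = frozenset({"invoice_number"})


def to_row(record: Dict[str, Any], columns: Sequence[str] = COLUMNS) -> List[str]:
    """Return cell values in column order, using "" for missing fields."""
    row = []
    for name in columns:
        value = record.get(name)
        cell = "" if value is None else str(value)
        # Prefix text columns once so values like 000123 keep leading zeros.
        if cell and name in TEXT_COLUMNS and not cell.startswith("'"):
            cell = "'" + cell
        row.append(cell)
    return row
```

Keeping the column order in one place means the sheet header and the mapping cannot drift apart unnoticed.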

Final step: the server automatically deletes the temporary PDF and text files created for the invoice once the data has been extracted and stored. This keeps the environment clean and stable for each iteration.

Step 12: Clean Up the Server

To keep your server clean and avoid unnecessary storage usage, this step automatically deletes the temporary files created during the invoice processing (/tmp/doc.pdf and /tmp/doc.txt).

➡️ Command: rm -rf /tmp/doc.pdf /tmp/doc.txt

You can customize this path depending on your storage system or if you want to archive files in a different location.
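
The same cleanup can be written in Python if you prefer a script over a shell command. This is a minimal sketch, assuming Python 3.8 or later; the remove_temp_files helper is our own name, and the default paths are the ones used in this guide.

```python
from pathlib import Path
from typing import Iterable

# Temporary files used in this guide; change them if you customized the paths.
TEMP_FILES = ("/tmp/doc.pdf", "/tmp/doc.txt")


def remove_temp_files(paths: Iterable[str] = TEMP_FILES) -> None:
    """Delete each file if it exists; missing files are ignored, like rm -f."""
    for path in paths:
        Path(path).unlink(missing_ok=True)  # missing_ok requires Python 3.8+
```

Unlike a recursive delete, unlink only removes individual files, so a mistyped path cannot wipe a whole directory.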

Why Automatically Extracting Invoice Data is a Game-Changer for Your Admin Workflow

Keeping incoming invoices organized in your CRM, ERP, or Google Sheets is essential to a reliable admin workflow. Reviewing PDF invoices by hand is time-consuming and error-prone, and it slows down follow-ups and accounting.

Common issues with manual invoice data entry:
  • Missing or incorrect information (invoice number, date, amount, client, etc.).
  • Time wasted opening each PDF and copying the data manually.
  • Risk of duplicates or incorrect amounts entered.
  • Difficulty centralizing and using data for tracking or follow-up.

Benefits of automatically extracting invoice data:
  • Instantly structured and standardized billing information.
  • Significant time savings on administrative tasks.
  • Seamless integration with Google Sheets, Notion, Airtable, or accounting tools.
  • Automated triggers (notifications, archiving, follow-ups, accounting sync, etc.).

By automating the extraction of data from PDF invoices using an AI agent, you eliminate repetitive tasks, improve data accuracy, and boost productivity. This n8n scenario becomes a powerful asset to scale your admin operations effortlessly.