Building an Automated Financial Document Information Extraction Pipeline with n8n and Groq API: Automating 10K, 10Q Report Analysis and Key Metric Extraction
Are you still extracting figures by hand from complex financial reports such as 10K and 10Q filings? Combining n8n with the Groq API lets you automate the process and pull out key financial metrics quickly. This article walks through how to build the pipeline and where it can be applied.
1. The Challenge / Context
Corporate 10K and 10Q reports are key sources of information for investment decisions, financial analysis, and competitor research. Extracting the relevant figures by hand from such long filings is slow and error-prone: most of the content is unstructured text, and it relies heavily on specialized terminology. Faster, more accurate extraction supports better investment decisions and quicker identification of market trends. Existing rule-based systems adapt poorly to changing report formats and are costly to maintain, which is why automated extraction pipelines are in demand.
2. Deep Dive: Groq API
The Groq API is an LLM (Large Language Model) API built for very fast inference. It can generate responses to complex queries much faster than typical GPU-based APIs, which makes it suitable for latency-sensitive applications such as real-time analysis, chatbots, and information extraction. Groq's LPU (Language Processing Unit) architecture is designed specifically to run large language models efficiently. The API is accessed through plain HTTP requests on an OpenAI-compatible endpoint, so it is easy to call from most programming languages. Its key features are as follows:
- Low Latency: Provides near real-time response speeds.
- High Throughput: Can process a large number of requests simultaneously.
- Ease of Use: Can be easily integrated via REST API.
- Cost-Effectiveness: Reduces overall computing costs due to fast processing speeds.
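Because the API is reached over plain HTTP, a call can be sketched in a few lines of JavaScript. The example below is a minimal sketch rather than an official client. It assumes Node.js 18 or later (for the built-in fetch) and the chat completions endpoint and model name used later in this article. The helper names buildGroqRequest and askGroq are hypothetical.

```javascript
// Endpoint and default model are the ones used later in this article;
// available models change over time, so treat the default as an assumption.
const GROQ_URL = "https://api.groq.com/openai/v1/chat/completions";

// Hypothetical helper: builds the fetch() options for one chat completion request.
function buildGroqRequest(apiKey, prompt, model = "mixtral-8x7b-32768") {
  return {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // JSON.stringify escapes quotes and newlines inside the prompt.
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
      temperature: 0,
      max_tokens: 1024,
    }),
  };
}

// Hypothetical helper: sends the request and returns the model's text reply.
async function askGroq(apiKey, prompt) {
  const response = await fetch(GROQ_URL, buildGroqRequest(apiKey, prompt));
  if (!response.ok) {
    throw new Error(`Groq API returned HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}
```

With a key stored in an environment variable (GROQ_API_KEY is a name chosen here for illustration), a call would look like askGroq(process.env.GROQ_API_KEY, "Summarize this filing").then(console.log).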
3. Step-by-Step Guide / Implementation
Now, let's look at how to build an automated financial document information extraction pipeline using n8n and Groq API, step by step.
Step 1: n8n Installation and Setup
First, you need to install and run n8n. n8n can be installed via Docker, npm, or cloud services. The simplest method is to use Docker.
docker run -d -p 5678:5678 -v ~/.n8n:/home/node/.n8n n8nio/n8n
Once n8n is running, you can access the n8n interface by navigating to http://localhost:5678 in your web browser. Complete the initial setup and create a user account.
Step 2: Groq API Key Setup
To use the Groq API, you must first obtain an API key from Groq. Sign up on the Groq website and generate an API key. Keep the issued API key secure and configure it for use in your n8n workflow. In n8n, create 'Credentials' and select 'Groq API' type to save your API key.
Step 3: Create Workflow
Create a new workflow in the n8n interface. The workflow consists of the following nodes:
- HTTP Request Node: Receives the URL of a 10K or 10Q report and downloads its content.
- HTML Extract Node (Optional): If the report is in HTML format, use this node to extract only the text content. Configure it to extract text within the body tag.
- Function Node: Cleans up the downloaded report content and generates a prompt to be sent to the Groq API.
- HTTP Request Node (Groq API): Sends a request to the Groq API and receives a response.
- Function Node: Parses the Groq API response and organizes the extracted information.
- Google Sheets or Database Node: Saves the extracted information to a spreadsheet or database.
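The response-parsing node in the list above can be sketched as a small function. This is a minimal sketch, assuming the model answers with one "Label: value" line per metric and that the response follows the OpenAI-compatible shape (choices[0].message.content). The helper name parseMetrics is hypothetical and the sample values are made up for illustration.

```javascript
// Hypothetical helper: turns "Label: value" lines from the model's reply into an object.
function parseMetrics(groqResponse) {
  const text = groqResponse?.choices?.[0]?.message?.content ?? "";
  const metrics = {};
  for (const line of text.split("\n")) {
    const separator = line.indexOf(":");
    if (separator === -1) continue; // skip lines that carry no label
    const label = line.slice(0, separator).trim();
    const value = line.slice(separator + 1).trim();
    if (label && value) metrics[label] = value;
  }
  return metrics;
}

// Made-up response used only to demonstrate the expected shape.
const sample = {
  choices: [
    {
      message: {
        content:
          "Total Revenue: $10.5 billion\nNet Income: $2.1 billion\nEarnings per Share (EPS): $3.42",
      },
    },
  ],
};

console.log(parseMetrics(sample));
```

Models do not always follow the requested format, so a production workflow would likely validate the parsed values before writing them to a spreadsheet or database.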
Step 4: HTTP Request Node Setup (Report Download)
Set up the first HTTP Request node to download the 10K or 10Q report. Enter the report URL in the URL field and set Method to GET. In the Response tab, set Response Format to String.
Step 5: Function Node Setup (Prompt Generation)
Use the Function node (named the Code node in recent n8n versions) to generate the prompt that is sent to the Groq API. Here is example code:
// Report text from the HTTP Request node (Code node, "Run Once for All Items" mode).
// Text responses usually arrive in "data"; "body" is used when the full response is returned.
const source = $input.first().json;
const reportContent = source.data ?? source.body ?? '';
const prompt = `The following is a portion of a company's 10K report. Extract the following information from this report:\n\nTotal Revenue:\nNet Income:\nEarnings per Share (EPS):\n\nReport Content:\n${reportContent}\n\nExtracted Information:`;
return [{ json: { prompt } }];
This code stores the report content received from the HTTP Request node in the reportContent variable and generates a prompt to be sent to the Groq API. The prompt clearly instructs the Groq API what information to extract.
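The workflow outline also says this node cleans up the report content, and a full 10K is often longer than a model's context window. The sketch below shows one way to handle both before building the prompt. It is a minimal sketch under stated assumptions: the helper names cleanReportText and truncateText are hypothetical, the regular expressions are a rough tag stripper rather than a full HTML parser, and the 60,000-character budget is an arbitrary illustrative value, not a documented limit.

```javascript
// Hypothetical helper: strips markup and collapses whitespace.
function cleanReportText(raw) {
  return raw
    .replace(/<script[\s\S]*?<\/script>/gi, " ") // drop inline scripts
    .replace(/<style[\s\S]*?<\/style>/gi, " ")   // drop inline styles
    .replace(/<[^>]+>/g, " ")                    // drop remaining tags
    .replace(/&nbsp;/gi, " ")                    // common HTML entity in filings
    .replace(/\s+/g, " ")                        // collapse runs of whitespace
    .trim();
}

// Hypothetical helper: keeps the text under a character budget (assumed value).
function truncateText(text, maxChars = 60000) {
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}

const html =
  "<html><body><p>Total revenue was&nbsp;$10.5 billion.</p><script>x()</script></body></html>";
console.log(truncateText(cleanReportText(html)));
// prints "Total revenue was $10.5 billion."
```

Truncating from the start is the simplest choice. Because financial statements often appear late in a filing, selecting the relevant section before truncating may give better results.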
Step 6: HTTP Request Node Setup (Groq API)
Set up the second HTTP Request node to send a request to the Groq API. Enter the Groq API endpoint (e.g., https://api.groq.com/openai/v1/chat/completions) in the URL field and set Method to POST. In the Headers tab, add an Authorization header and set it to Bearer YOUR_GROQ_API_KEY (replace YOUR_GROQ_API_KEY with your actual API key). In the Body tab, select JSON and enter the following JSON payload.
{
  "model": "mixtral-8x7b-32768",
  "messages": [
    {
      "role": "user",
      "content": {{ JSON.stringify($json.prompt) }}
    }
  ],
  "temperature": 0.0,
  "max_tokens": 1024
}
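The prompt contains newlines and may contain double quotes, so it has to be escaped before it is placed inside a JSON string. The sketch below illustrates the difference in plain JavaScript. The sample prompt is made up, and the variable names are illustrative only.

```javascript
// Made-up prompt containing a double quote pair and a newline.
const prompt = 'Revenue was "strong".\nNet income rose.';

// Naive substitution: the raw quotes and newline land inside the JSON string.
const naive = `{"content": "${prompt}"}`;
let naiveIsValid = true;
try {
  JSON.parse(naive);
} catch (err) {
  naiveIsValid = false;
}

// Escaped: JSON.stringify emits the surrounding quotes and all required escapes.
const escaped = `{"content": ${JSON.stringify(prompt)}}`;
const roundTripped = JSON.parse(escaped).content;

console.log(naiveIsValid);            // false
console.log(roundTripped === prompt); // true
```

In the n8n body field, the same idea is commonly written as "content": {{ JSON.stringify($json.prompt) }}, assuming the field is evaluated as an expression.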
The model field specifies the language model to use. The messages field


