Back to Integrations
integration integration
integration

Integrate Information Extractor with 500+ apps and services

Unlock Information Extractor’s full potential with n8n, connecting it to similar AI apps and over 1000 other services. Automate AI workflows by integrating, training, and deploying models across various platforms. Create adaptable and scalable workflows between Information Extractor and your stack. All within a building experience you will love.

Create workflows with Information Extractor integrations

799 integrations
Sort by:
Popularity
NameOldestNewest

Popular ways to use Information Extractor integration

HTTP Request node
Google Drive node
+7

Transcribing Bank Statements To Markdown Using Gemini Vision AI

This n8n workflow demonstrates an approach to parsing bank statement PDFs with multimodal LLMs as an alternative to traditional OCR. This allows for much more accurate data extraction from the document especially when it comes to tables and complex layouts. Multimodal Parsing is better than traditiona OCR because: It reduces complexity and overhead by avoiding the need to preprocess the document into text format such as markdown before passing to the LLM. It handles non-standard PDF formats which may produce garbled output via traditional OCR text conversion. It's orders of magnitude cheaper than premium OCR models that still require post-processing cleanup and formatting. LLMs can format to any schema or language you desire! How it works You can use the example bank statement created specifically for this workflow here: https://drive.google.com/file/d/1wS9U7MQDthj57CvEcqG_Llkr-ek6RqGA/view?usp=sharing A PDF bank statement is imported via Google Drive. For this demo, I've created a mock bank statement which includes complex table layouts of 5 columns. Typically, OCR will be unable to align the columns correctly and mistake some deposits for withdrawals. Because multimodal LLMs do not accept PDFs directly, well have to convert the PDF to a series of images. We can achieve this by using a tool such as Stirling PDF. Stirling PDF is self-hostable which is handy for sensitive data such as bank statements. Stirling PDF will return our PDF as a series of JPGs (one for each page) in a zipped file. We can use n8n's decompress node to extract the images and ensure they are ordered by using the Sort node. Next, we'll resize each page using the Edit Image node to ensure the right balance between resolution limits and processing speed. Each resized page image is then passed into the Basic LLM node which will use our multimodal LLM of choice - Gemini 1.5 Pro. In the LLM node's options, we'll add a "user message" of type binary (data) which is how we add our image data as an input. Our prompt will instruct the multimodal LLM to transcribe each page to markdown. Note, you do not need to do this - you can just ask for data points to extract directly! Our goal for this template is to demonstrate the LLMs ability to accurately read the page. Finally, with our markdown version of all pages, we can pass this to another LLM node to extract required data such as deposit line items. Requirements Google Gemini API for Multimodal LLM. Google Drive access for document storage. Stirling PDF instance for PDF to Image conversion Customising the workflow At time of writing, Gemini 1.5 Pro is the most accurate in text document parsing with a relatively low cost. If you are not using Google Gemini however you can switch to other multimodal LLMs such as OpenAI GPT or Antrophic Claude. If you don't need the markdown, simply asking what to extract directly in the LLM's prompt is also acceptable and would save a few extra steps. Not parsing any bank statements any time soon? This template also works for Invoices, inventory lists, contracts, legal documents etc.
jimleuk
Jimleuk
HTTP Request node
Webhook node
Respond to Webhook node
+8

Ultimate Scraper Workflow for n8n

What this template does The Ultimate Scraper for n8n uses Selenium and AI to retrieve any information displayed on a webpage. You can also use session cookies to log in to the targeted webpage for more advanced scraping needs. ⚠️ Important: This project requires specific setup instructions. Please follow the guidelines provided in the GitHub repository: n8n Ultimate Scraper Setup : https://github.com/Touxan/n8n-ultimate-scraper/tree/main. The workflow version on n8n and the GitHub project may differ; however, the most up-to-date version will always be the one available on the GitHub repository : https://github.com/Touxan/n8n-ultimate-scraper/tree/main. How to use Deploy the project with all the requirements and request your webhook. Example of request: curl -X POST http://localhost:5678/webhook-test/yourwebhookid \ -H "Content-Type: application/json" \ -d '{ "subject": "Hugging Face", "Url": "github.com", "Target data": [ { "DataName": "Followers", "description": "The number of followers of the GitHub page" }, { "DataName": "Total Stars", "description": "The total numbers of stars on the different repos" } ], "cookie": [] }' Or to just scrap a url : curl -X POST http://localhost:5678/webhook-test/67d77918-2d5b-48c1-ae73-2004b32125f0 \ -H "Content-Type: application/json" \ -d '{ "Target Url": "https://github.com", "Target data": [ { "DataName": "Followers", "description": "The number of followers of the GitHub page" }, { "DataName": "Total Stars", "description": "The total numbers of stars on the different repo" } ], "cookies": [] }' `
pablobarrier
Pablo
Airtable node
HTTP Request node
Merge node
+24

Scale Deal Flow with a Pitch Deck AI Vision, Chatbot and QDrant Vector Store

Are you a popular tech startup accelerator (named after a particular higher order function) overwhelmed with 1000s of pitch decks on a daily basis? Wish you could filter through them quickly using AI but the decks are unparseable through conventional means? Then you're in luck! This n8n template uses Multimodal LLMs to parse and extract valuable data from even the most overly designed pitch decks in quick fashion. Not only that, it'll also create the foundations of a RAG chatbot at the end so you or your colleagues can drill down into the details if needed. With this template, you'll scale your capacity to find interesting companies you'd otherwise miss! Requires n8n v1.62.1+ How It Works Airtable is used as the pitch deck database and PDF decks are downloaded from it. An AI Vision model is used to transcribe each page of the pitch deck into markdown. An Information Extractor is used to generate a report from the transcribed markdown and update required information back into pitch deck database. The transcribed markdown is also uploaded to a vector store to build an AI chatbot which can be used to ask questions on the pitch deck. Check out the sample Airtable here: https://airtable.com/appCkqc2jc3MoVqDO/shrS21vGqlnqzzNUc How To Use This template depends on the availability of the Airtable - make a duplicate of the airtable (link) and its columns before running the workflow. When a new pitchdeck is received, enter the company name into the Name column and upload the pdf into the File column. Leave all other columns blank. If you have the Airtable trigger active, the execution should start immediately once the file is uploaded. Otherwise, click the manual test trigger to start the workflow. When manually triggered, all "new" pitch decks will be handled by the workflow as separate executions. Requirements OpenAI for LLM Airtable For Database and Interface Qdrant for Vector Store Customising This Workflow Extend this starter template by adding more AI agents to validate claims made in the pitch deck eg. Linkedin Profiles, Page visits, Reviews etc.
jimleuk
Jimleuk
Google Sheets node
HTTP Request node
OpenAI Chat Model node
+2

AI Powered Web Scraping with Jina, Google Sheets and OpenAI : the EASY way

Purpose of workflow: The purpose of this workflow is to automate scraping of a website, transforming it into a structured format, and loading it directly into a Google Sheets spreadsheet. How it works: Web Scraping: Uses the Jina AI service to scrape website data and convert it into LLM-friendly text. Information Extraction: Employs an AI node to extract specific book details (title, price, availability, image URL, product URL) from the scraped data. Data Splitting: Splits the extracted information into individual book entries. Google Sheets Integration: Automatically populates a Google Sheets spreadsheet with the structured book data. Step by step setup: Set up Jina AI service: Sign up for a Jina AI account and obtain an API key. Configure the HTTP Request node: Enter the Jina AI URL with the target website. Add the API key to the request headers for authentication. Set up the Information Extractor node: Use Claude AI to generate a JSON schema for data extraction. Upload a screenshot of the target website to Claude AI. Ask Claude AI to suggest a JSON schema for extracting required information. Copy the generated schema into the Information Extractor node. Configure the Split node: Set it up to separate the extracted data into individual book entries. Set up the Google Sheets node: Create a Google Sheets spreadsheet with columns for title, price, availability, image URL, and product URL. Configure the node to map the extracted data to the appropriate columns.
derekcheungsa
Derek Cheung
HTTP Request node
Discord node
+3

Send daily translated Calvin and Hobbes Comics to Discord

How it works Automates the retrieval of Calvin and Hobbes daily comics. Extracts the comic image URL from the website. Translates comic dialogues to English and Korean. Posts the comic and translations to Discord daily. Set up steps Estimated setup time: ~10-15 minutes. Use a Schedule Trigger to automate the workflow at 9 AM daily. Add nodes for parameter setup, HTTP request, data extraction, and integration with Discord. Add detailed notes to each node in the workflow for easy understanding.
datapopcorn
Nathan Lee
HTTP Request node
Merge node
+13

Automate Sales Meeting Prep with AI & APIFY Sent To WhatsApp

This n8n template builds a meeting assistant that compiles timely reminders of upcoming meetings filled with email history and recent LinkedIn activity of other people on the invite. This is then discreetly sent via WhatsApp ensuring the user is always prepared, informed and ready to impress! How it works A scheduled trigger fires hourly to check for upcoming personal meetings. When found, the invite is analysed by an AI agent to pull email and LinkedIn details of the other invitees. 2 subworkflows are then triggered for each invitee to (1) search for last email correspondence with them and (2) scrape their LinkedIn profile + recent activity for social updates. Using both available sources, another AI agent is used to summarise this information and generate a short meeting prep message for the user. The notification is finally sent to the user's WhatsApp, allowing them ample time to review. How to use There are a lot of moving parts in this template so in it's current form, it's best to use this for personal rather than team calendars. The LinkedIn scraping method used in this workflow requires you to paste in your LinkedIn cookies from your browser which essentially let's n8n impersonate you. You can retrieve this from dev console or ask someone technical for help! Note: It may be wise to switch to other LinkedIn scraping approaches which do not impersonate your own account for production. Requirements OpenAI for LLM Gmail for Email Google Calendar for upcoming events WhatsApp Business account for notifications Customising this workflow Try adding information sources which are relevant to you and your invitees. Such as company search, other social media sites etc. Create an on-demand version which doesn't rely on the scheduled trigger. Sometimes you want to know prepare for meetings hours or days in advance where this could help immensely.
jimleuk
Jimleuk

About Information Extractor

Related categories

Similar integrations

  • Wikipedia node
  • OpenAI Chat Model node
  • Zep Vector Store node
  • Postgres Chat Memory node
  • Pinecone Vector Store node
  • Embeddings OpenAI node
  • Supabase: Insert node
  • OpenAI node

Over 3000 companies switch to n8n every single week

Connect Information Extractor with your company’s tech stack and create automation workflows

FAQ about Information Extractor integrations

  • How can I set up Information Extractor integration in n8n?

      To use Information Extractor integration in n8n, start by adding the Information Extractor node to your workflow. You'll need to authenticate your Information Extractor account using supported authentication methods. Once connected, you can choose from the list of supported actions or make custom API calls via the HTTP Request node, for example: to complete the setup, configure the necessary parameters for the actions you wish to perform, such as defining input data formats and specifying output options. Test the node to ensure it's correctly processing your data before finalizing your workflow. You can monitor and adjust settings as needed to optimize performance.

  • Do I need any special permissions or API keys to integrate Information Extractor with n8n?

  • Can I combine Information Extractor with other apps in n8n workflows?

  • What are some common use cases for Information Extractor integrations with n8n?

  • How does n8n’s pricing model benefit me when integrating Information Extractor?

We're using the @n8n_io cloud for our internal automation tasks since the beta started. It's awesome! Also, support is super fast and always helpful. 🤗

Last week I automated much of the back office work for a small design studio in less than 8hrs and I am still mind-blown about it.

n8n is a game-changer and should be known by all SMBs and even enterprise companies.

in other news I installed @n8n_io tonight and holy moly it’s good

it’s compatible with EVERYTHING

Implement complex processes faster with n8n

red icon yellow icon red icon yellow icon