Integrate HTML Extract with 500+ apps and services

Unlock HTML Extract’s full potential with n8n, connecting it to other Core Nodes apps and over 1000 other services. Create adaptable and scalable workflows between HTML Extract and your stack, all within a building experience you will love.

The HTML Extract integration has been replaced by the HTML integration

The HTML node replaces the HTML Extract node from version 0.199.0 onwards. Check out the HTML node!

Popular ways to use the HTML Extract integration

HTTP Request node

Send trending "Show HN" to email

The workflow triggers every day at 1 pm, gets the current content from Hacker News, gets all the different submission items, extracts the rank, title and URL of each, checks whether it is a "Show HN" submission, combines the matching items into a simple email text, and sends an email with that text.
jan
Jan Oberhauser
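The filter-and-combine part of the "Send trending Show HN to email" workflow above can be sketched in plain JavaScript, the language n8n's Function nodes use. This is a minimal sketch, assuming each extracted item already has `rank`, `title` and `url` fields; the helper names `isShowHn` and `buildEmailText` and the sample data are hypothetical, not taken from the actual template.

```javascript
// Hypothetical helpers approximating the "Checks if it is a Show HN submission"
// and "Combines the items into a simple email text" steps.

// "Show HN" posts conventionally start their title with "Show HN".
function isShowHn(item) {
  return item.title.trim().startsWith("Show HN");
}

// Keep only Show HN items and join them into one plain-text email body.
function buildEmailText(items) {
  return items
    .filter(isShowHn)
    .map((item) => `${item.rank} ${item.title}\n${item.url}`)
    .join("\n\n");
}

// Made-up sample data shaped like the extracted rank/title/url fields.
const sample = [
  { rank: "1.", title: "Show HN: A tiny static site generator", url: "https://example.com/ssg" },
  { rank: "2.", title: "Ask HN: How do you back up your laptop?", url: "https://example.com/ask" },
  { rank: "3.", title: "Show HN: Terminal-based RSS reader", url: "https://example.com/rss" },
];

console.log(buildEmailText(sample));
```

Keeping the filter and the text assembly as separate functions mirrors the separate nodes in the workflow, so each step can be tested on its own.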
HTTP Request node

Extract post titles from a blog

This workflow uses n8n to extract the names of all the posts from the Hackernoon homepage.
sm-amudhan
amudhan
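In the "Extract post titles from a blog" workflow, the HTML Extract node pulls the title text out of the fetched page. The sketch below approximates that step without any library, assuming (hypothetically) that each title sits in an `<h2>` element; check the real Hackernoon markup for the right tag or CSS selector. A regular expression like this only copes with simple, predictable markup, whereas the HTML Extract node works with CSS selectors.

```javascript
// Hypothetical helper: collect the text of every <tag>...</tag> element.
// Assumption: titles are wrapped in <h2> elements (not verified against the real page).
function extractTitles(html, tag = "h2") {
  const pattern = new RegExp(`<${tag}\\b[^>]*>([\\s\\S]*?)</${tag}>`, "gi");
  const titles = [];
  for (const match of html.matchAll(pattern)) {
    // Strip nested tags such as <a>, then collapse whitespace.
    const text = match[1].replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim();
    if (text) titles.push(text);
  }
  return titles;
}

// Made-up markup standing in for the fetched homepage.
const page = `
  <article><h2 class="title"><a href="/p1">First post</a></h2></article>
  <article><h2 class="title"><a href="/p2">Second
    post</a></h2></article>
`;

console.log(extractTitles(page)); // [ 'First post', 'Second post' ]
```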
HTTP Request node

Track changes of product prices

This workflow automatically tracks changes on specific websites, typically in e-commerce, where you want to be informed about price changes.

Prerequisites: basic knowledge of HTML and JavaScript.

Nodes:

Execute Command nodes create a file named kopacky.json in the /data/ folder and clean data. Make sure that n8n has permission to make changes to this folder in your setup.

Cron node triggers the workflow at regular intervals (every 15 minutes by default), depending on how often you want to crawl the URLs of your watchers.

Function Item node (Change me) adds the URL watchers. You can add as many URLs (watchers) as you want by changing the JavaScript code in the node. Each watcher has four properties:

|Property|Meaning|
|-|-|
|slug|Unique identifier for the watcher.|
|link|URL of the website where you want to track changes.|
|selector|CSS selector of the HTML tag where your price is placed. You can use your browser's developer tools to get a specific selector.|
|currency|Currency code in which your price is set.|

Function Item node (Init item) saves all required data from each watcher to the kopacky.json file.

HTTP Request node fetches data from the website.

HTML Extract node extracts the required information from the webpage.

Send Email nodes (NotifyBetterPrice) send you an email when a better price is available, and when there is an issue getting the price (this could happen if the website is down, the product you are tracking is no longer available, or the website owner changed the selector or HTML).

IF nodes filter the incoming data and route the workflow.

Move Binary Data nodes convert the JSON file to binary data.

Write Binary File nodes write the product prices to the file.

NOTE: This is the first (beta) version of this workflow, so it could have some issues. For example, it may fail to get content from websites whose owners block calls from unknown external services, which is a typical protection against crawlers.
stehos
sthosstudio
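The price tracker's core decision, whether to send the better-price email or the error email, can be sketched as follows. The four watcher properties (slug, link, selector, currency) come from the workflow description; the sample values, the `lastPrice` argument, the helper names `parsePrice` and `comparePrice`, and the two-decimal-digit parsing rule are assumptions for illustration, not the template's actual code.

```javascript
// Watcher list as it might look in the "Change me" Function Item node.
// Property names come from the workflow description; the values are made up.
const watchers = [
  {
    slug: "example-boots", // unique identifier for the watcher
    link: "https://shop.example.com/boots", // page to track (hypothetical URL)
    selector: ".product-price", // CSS selector of the price element (hypothetical)
    currency: "EUR", // currency code of the price
  },
];

// Turn extracted text such as "1 299,90 €" or "$1,299.90" into a number.
// Assumption: a "," or "." followed by exactly two final digits is the decimal separator.
function parsePrice(text) {
  if (typeof text !== "string") return NaN;
  let s = text.replace(/[^\d.,]/g, ""); // keep digits and separators only
  const m = s.match(/^(.*)[.,](\d{2})$/);
  s = m ? m[1].replace(/[.,]/g, "") + "." + m[2] : s.replace(/[.,]/g, "");
  return s === "" ? NaN : Number(s);
}

// Decide which branch the workflow should take for one watcher.
// `lastPrice` is the price stored on the previous run (undefined on the first run).
function comparePrice(lastPrice, priceText) {
  const price = parsePrice(priceText);
  if (Number.isNaN(price)) return { status: "error", price: null }; // page down, selector changed, ...
  if (lastPrice === undefined) return { status: "initial", price };
  if (price < lastPrice) return { status: "better", price }; // would trigger the better-price email
  return { status: "no-improvement", price };
}

console.log(watchers[0].slug, comparePrice(100, "89,90 €"));
```

Returning a status string keeps the branching explicit, which maps naturally onto the IF nodes that route the workflow.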
HTTP Request node

Scrape and store data from multiple website pages

This workflow extracts data from a website that spans multiple pages. The workflow:

1) Starts at the country list at https://www.theswiftcodes.com/browse-by-country/.
2) Loads every country page (for example, https://www.theswiftcodes.com/albania/).
3) Paginates through every page of each country.
4) Extracts data from the country page.
5) Saves the data to MongoDB.
6) Paginates through all pages in all countries.

It uses the getWorkflowStaticData('global') method to recover the next page (saved while processing the previous page) and continues through all the pages.

In the first section, the list of countries is recovered and extracted. Next, I check whether a locally cached copy of the page is available and, if so, read it from disk. Finally, I save the data to MongoDB and paginate through all the pages of each country and across all countries.

I have applied a cache system that saves each visited page to n8n's local disk. If I relaunch the workflow, it checks whether a cache file exists to avoid unnecessary requests to the website. If the data on the website changes, you can add a Cron node to check the website once per week.

Before inserting data into MongoDB, the best way to avoid duplicates is to check that the swift_code (the primary value of the collection) doesn't already exist.

I recommend using a proxy for all requests to avoid IP blocks. A good solution for proxying plus IP rotation is scrapoxy.io. This workflow is perfect for small data requirements. If you need to scrape dynamic data, you can use a headless browser or another service. If you want to scrape huge lists of URIs, I recommend using Scrapy + Scrapoxy.
mcolomer
Miquel Colomer
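The pagination and de-duplication logic of the SWIFT-code scraper can be sketched outside n8n. In a Function node the "next page" state would live in the object returned by getWorkflowStaticData('global'); here a plain object stands in for it, an in-memory array stands in for the MongoDB collection, and a small fake site replaces the HTTP requests and cache lookups. The page paths, the `nextPage` field, the made-up SWIFT codes and the helper names are all hypothetical.

```javascript
// Stand-in for getWorkflowStaticData('global'): state that survives between pages.
const staticData = {};

// Fake site: each page lists records and points to the next page (or null).
// Paths and SWIFT codes are made up for illustration.
const fakeSite = {
  "/albania/": {
    records: [{ swift_code: "AAAAALTX" }, { swift_code: "BBBBALTX" }],
    nextPage: "/albania/page/2/",
  },
  "/albania/page/2/": {
    records: [{ swift_code: "BBBBALTX" }, { swift_code: "CCCCALTX" }],
    nextPage: null,
  },
};

// Insert a record only if its swift_code is not stored yet (duplicate check).
function insertIfNew(collection, record) {
  if (collection.some((r) => r.swift_code === record.swift_code)) return false;
  collection.push(record);
  return true;
}

// Follow nextPage links until none is left, saving each page's records.
function crawl(startPage, collection) {
  staticData.nextPage = startPage;
  while (staticData.nextPage) {
    const page = fakeSite[staticData.nextPage]; // workflow: cache file or HTTP Request
    for (const record of page.records) insertIfNew(collection, record);
    staticData.nextPage = page.nextPage; // remember where to continue
  }
  return collection;
}

const stored = crawl("/albania/", []);
console.log(stored.map((r) => r.swift_code)); // [ 'AAAAALTX', 'BBBBALTX', 'CCCCALTX' ]
```

Storing the next page in shared state rather than a local variable is what lets each step pick up where the previous page left off.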
HTTP Request node
Webhook node
Notion node

Add articles to a Notion list by accessing a Discord slash command

This workflow allows you to add articles to a Notion reading list by using a Discord slash command.

Prerequisites: a Notion account and credentials, and a reading list similar to this template; a Discord account and credentials, and a Discord slash command connected to n8n.

Nodes:

Webhook node triggers the workflow whenever the Discord slash command is issued.

IF node checks the interaction type sent by Discord. If the type is not equal to 1, it returns true; otherwise, it returns false.

HTTP Request node makes an HTTP call to the link and gets the HTML of the webpage.

HTML Extract node extracts the title from the HTML, which the next node uses.

Notion node adds the link to your Notion reading list.

Set nodes set the reply values for Discord and register the Interaction Endpoint URL.
harshil1712
ghagrawal17
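The IF and Set nodes in the Discord workflow boil down to answering two kinds of interactions. Discord sends a PING (interaction type 1) when you register the Interaction Endpoint URL and expects a reply of type 1 (PONG); a slash command arrives as type 2 and can be answered with type 4, which posts a message in the channel. This minimal sketch shows only that branching; the `link` option name, the reply wording and the helper name `buildDiscordReply` are hypothetical, and the request signature check that Discord requires is left out.

```javascript
// Hypothetical helper mirroring the IF node (type === 1?) and the Set nodes (reply body).
function buildDiscordReply(interaction, savedTitle) {
  if (interaction.type === 1) {
    return { type: 1 }; // PONG: confirms the endpoint during registration
  }
  return {
    type: 4, // CHANNEL_MESSAGE_WITH_SOURCE: reply with a message
    data: { content: `Added "${savedTitle}" to your reading list.` },
  };
}

// Made-up slash-command payload with a single option named "link" (hypothetical).
const command = {
  type: 2,
  data: { name: "read", options: [{ name: "link", value: "https://example.com/post" }] },
};

// The link the HTTP Request node would fetch.
const link = command.data.options.find((o) => o.name === "link").value;

console.log(link);
console.log(buildDiscordReply({ type: 1 }));
console.log(buildDiscordReply(command, "Example post"));
```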
HTTP Request node
Merge node

Parse Ycombinator news page

Extract data from a webpage (the Ycombinator news page) and create a nice list using the itemList node. It seems that the current version of n8n (0.141.1) requires extracting each variable one by one. Hopefully, in the future it will be possible to create the table using just one itemList node.

Another nice feature of the workflow is the automatically generated file name for the resulting table. Check out the "fileName" option of the Spreadsheet File node:

"Ycombinator_news_{{new Date().toISOString().split('T', 1)[0]}}.{{$parameter[\"fileFormat\"]}}"

The resulting table is saved as an .xls file and delivered via email.
eduard
Eduard
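The fileName expression from the Ycombinator workflow can be checked as plain JavaScript. In n8n the parts inside `{{ }}` are evaluated and `$parameter["fileFormat"]` reads the node's file-format setting; in this sketch a fixed date and a hard-coded extension stand in for both, and the helper name `buildFileName` is hypothetical.

```javascript
// Rebuilds the fileName expression outside n8n.
// `date` replaces `new Date()` and `fileFormat` replaces $parameter["fileFormat"].
function buildFileName(date, fileFormat) {
  // toISOString() returns e.g. "2021-10-05T13:45:00.000Z"; keep the part before "T".
  const day = date.toISOString().split("T", 1)[0];
  return `Ycombinator_news_${day}.${fileFormat}`;
}

console.log(buildFileName(new Date("2021-10-05T13:45:00Z"), "xls"));
// Ycombinator_news_2021-10-05.xls
```

Note that `toISOString()` always reports the date in UTC, so a workflow that runs shortly before or after midnight local time may get the previous or next day in the file name.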

Over 3000 companies switch to n8n every single week

Connect HTML Extract with your company’s tech stack and create automation workflows

Last week I automated much of the back office work for a small design studio in less than 8hrs and I am still mind-blown about it.

n8n is a game-changer and should be known by all SMBs and even enterprise companies.

in other news I installed @n8n_io tonight and holy moly it’s good

it’s compatible with EVERYTHING

We're using the @n8n_io cloud for our internal automation tasks since the beta started. It's awesome! Also, support is super fast and always helpful. 🤗

Need help setting up your HTML Extract integration?

Discover our community's latest recommendations and join the discussions about the HTML Extract integration.

Implement complex processes faster with n8n
