HTTP Request node
Code node

Vector Database as a Big Data Analysis Tool for AI Agents [3/3 - anomaly]

Published 27 days ago

Created by

mrscoopers
Jenny

Categories

Template description

Vector Database as a Big Data Analysis Tool for AI Agents

Workflows from the webinar "Build production-ready AI Agents with Qdrant and n8n".

This series of workflows shows how to build big data analysis tools for production-ready AI agents with the help of vector databases. These pipelines are adaptable to any dataset of images, hence, many production use cases.

  1. Uploading (image) datasets to Qdrant
  2. Set up meta-variables for anomaly detection in Qdrant
  3. Anomaly detection tool
  4. KNN classifier tool

For anomaly detection

  1. The first pipeline to upload an image dataset to Qdrant.
  2. The second pipeline is to set up cluster (class) centres & cluster (class) threshold scores needed for anomaly detection.
    3. This is the third pipeline --- the anomaly detection tool, which takes any image as input and uses all preparatory work done with Qdrant to detect if it's an anomaly to the uploaded dataset.

For KNN (k nearest neighbours) classification

  1. The first pipeline to upload an image dataset to Qdrant.
  2. The second is the KNN classifier tool, which takes any image as input and classifies it on the uploaded to Qdrant dataset.

To recreate both

You'll have to upload crops and lands datasets from Kaggle to your own Google Storage bucket, and re-create APIs/connections to Qdrant Cloud (you can use Free Tier cluster), Voyage AI API & Google Cloud Storage.

[This workflow] Anomaly Detection Tool

This is the tool that can be used directly for anomalous images (crops) detection.
It takes as input (any) image URL and returns a text message telling if whatever this image depicts is anomalous to the crop dataset stored in Qdrant.

  • An Image URL is received via the Execute Workflow Trigger, which is used to generate embedding vectors using the Voyage AI Embeddings API.
  • The returned vectors are used to query the Qdrant collection to determine if the given crop is known by comparing it to threshold scores of each image class (crop type).
  • If the image scores lower than all thresholds, then the image is considered an anomaly for the dataset.

Share Template

More AI workflow templates

OpenAI Chat Model node
SerpApi (Google Search) node

AI agent chat

This workflow employs OpenAI's language models and SerpAPI to create a responsive, intelligent conversational agent. It comes equipped with manual chat triggers and memory buffer capabilities to ensure seamless interactions. To use this template, you need to be on n8n version 1.50.0 or later.
n8n-team
n8n Team
HTTP Request node
Merge node
+7

Scrape and summarize webpages with AI

This workflow integrates both web scraping and NLP functionalities. It uses HTML parsing to extract links, HTTP requests to fetch essay content, and AI-based summarization using GPT-4o. It's an excellent example of an end-to-end automated task that is not only efficient but also provides real value by summarizing valuable content. Note that to use this template, you need to be on n8n version 1.50.0 or later.
n8n-team
n8n Team
HTTP Request node
Markdown node
+5

AI agent that can scrape webpages

⚙️🛠️🚀🤖🦾 This template is a PoC of a ReAct AI Agent capable of fetching random pages (not only Wikipedia or Google search results). On the top part there's a manual chat node connected to a LangChain ReAct Agent. The agent has access to a workflow tool for getting page content. The page content extraction starts with converting query parameters into a JSON object. There are 3 pre-defined parameters: url** – an address of the page to fetch method** = full / simplified maxlimit** - maximum length for the final page. For longer pages an error message is returned back to the agent Page content fetching is a multistep process: An HTTP Request mode tries to get the page content. If the page content was successfuly retrieved, a series of post-processing begin: Extract HTML BODY; content Remove all unnecessary tags to recude the page size Further eliminate external URLs and IMG scr values (based on the method query parameter) Remaining HTML is converted to Markdown, thus recuding the page lengh even more while preserving the basic page structure The remaining content is sent back to an Agent if it's not too long (maxlimit = 70000 by default, see CONFIG node). NB: You can isolate the HTTP Request part into a separate workflow. Check the Workflow Tool description, it guides the agent to provide a query string with several parameters instead of a JSON object. Please reach out to Eduard is you need further assistance with you n8n workflows and automations! Note that to use this template, you need to be on n8n version 1.19.4 or later.
eduard
Eduard
Merge node
Telegram node
Telegram Trigger node
+2

Telegram AI Chatbot

The workflow starts by listening for messages from Telegram users. The message is then processed, and based on its content, different actions are taken. If it's a regular chat message, the workflow generates a response using the OpenAI API and sends it back to the user. If it's a command to create an image, the workflow generates an image using the OpenAI API and sends the image to the user. If the command is unsupported, an error message is sent. Throughout the workflow, there are additional nodes for displaying notes and simulating typing actions.
eduard
Eduard
Google Sheets node
HTTP Request node
Merge node
+4

OpenAI GPT-3: Company Enrichment from website content

Enrich your company lists with OpenAI GPT-3 ↓ You’ll get valuable information such as: Market (B2B or B2C) Industry Target Audience Value Proposition This will help you to: add more personalization to your outreach make informed decisions about which accounts to target I've made the process easy with an n8n workflow. Here is what it does: Retrieve website URLs from Google Sheets Extract the content for each website Analyze it with GPT-3 Update Google Sheets with GPT-3 data
lempire
Lucas Perret
Google Drive node
Binary Input Loader node
Embeddings OpenAI node
OpenAI Chat Model node
+5

Ask questions about a PDF using AI

The workflow first populates a Pinecone index with vectors from a Bitcoin whitepaper. Then, it waits for a manual chat message. When received, the chat message is turned into a vector and compared to the vectors in Pinecone. The most similar vectors are retrieved and passed to OpenAI for generating a chat response. Note that to use this template, you need to be on n8n version 1.19.4 or later.
davidn8n
David Roberts

More SecOps workflow templates

HTTP Request node
Merge node
Slack node
+4

Phishing Analysis - URLScan.io and VirusTotal

This n8n workflow automates the analysis of email messages received in a Microsoft Outlook inbox to identify indicators of compromise (IOCs), specifically suspicious URLs. It can be triggered manually or scheduled to run daily at midnight. The workflow begins by retrieving up to 100 read email messages from the Outlook inbox. However, there seems to be a configuration issue as it should retrieve unread messages, not read ones. It then marks these messages as read to avoid processing them again in the future. The messages are then split into individual items using the Split In Batches node for sequential processing. For each email, the workflow analyzes its content to find URLs, which are considered potential IOCs. If URLs are found, the workflow proceeds to check these URLs for potential threats using two services, URLScan.io and VirusTotal, in parallel. In the first path, URLScan.io scans each URL, and if there are no errors, the results from URLScan.io and VirusTotal are merged. If there are errors, the workflow waits 1 minute before attempting to retrieve the URLScan results again. The loop then continues for the next email. In the second path, VirusTotal is used to scan the URLs, and the results are retrieved. Finally, the workflow checks if the data field is not empty, filtering out items where no data was found. It then sends a summarized Slack message to report details about the analyzed email, including the subject, sender, date, URLScan report URL, and VirusTotal verdict for URLs that were reported as malicious. Potential issues during setup include configuring the Outlook node to retrieve unread messages, resolving a configuration issue in the VirusTotal node, and handling authentication and API keys for both URLScan.io and VirusTotal nodes. Additionally, proper error handling and testing with various email content types and URLs are essential to ensure the workflow accurately identifies IOCs and reports them to the Slack channel.
n8n-team
n8n Team
Google Drive node
+4

Automate Image Validation Tasks using AI Vision

This n8n workflow shows how using multimodal LLMs with AI vision can tackle tricky image validation tasks which are near impossible to achieve with code and often impractical to be done by humans at scale. You may need image validation when users submitted photos or images are required to meet certain criteria before being accepted. A wine review website may require users only submit photos of wine with labels, a bank may require account holders to submit scanned documents for verification etc. In this demonstration, our scenario will be to analyse a set of portraits to verify if they meet the criteria for valid passport photos according to the UK government website (https://www.gov.uk/photos-for-passports). How it works Our set of portaits are jpg files downloaded from our Google Drive using the Google Drive node. Each image is resized using the Edit Image node to ensure a balance between resolution and processing speed. Using the Basic LLM node, we'll define a "user message" option with the type of binary (data). This will allow us to pass our portrait to the LLM as an input. With our prompt containing the criteria pulled off the passport photo requirements webpage, the LLM is able to validate the photo does or doesn't meet its criteria. A structured output parser is used to structure the LLM's response to a JSON object which has the "is_valid" boolean property. This can be useful to further extend the workflow. Requirements Google Gemini API key Google Drive account Customising this workflow Not using Gemini? n8n's LLM node works with any compatible multimodal LLM so feel free to swap Gemini out for OpenAI's GPT4o or Antrophic's Claude Sonnet. Don't need to validate portraits? Try other use cases such as document classification, security footage analysis, people tagging in photos and more.
jimleuk
Jimleuk
Cortex node
TheHive node

Analyze emails with S1EM

With workflow, you analyze Email with TheHive/Cortex https://github.com/V1D1AN/S1EM/wiki/Soar-guide
v1d1an
v1d1an
HTTP Request node
Merge node
Slack node
+7

URL and IP lookups through Greynoise and VirusTotal

This n8n workflow serves as a powerful cybersecurity and threat intelligence tool to look up URLs or IP addresses through industry standard threat intelligence vendors. It starts with either a form submission or a webhook trigger, allowing users to input data, URLs or IPs that require analysis. The workflow then splits into two paths depending on whether the input data is an IP or URL. If an IP was given, it sets the ip variable to the IP; however if a URL was given the workflow will perform a DNS lookup using Google Public DNS and sets the ip variable based on the results from Google. The workflow then checks the obtained IP addresses against GreyNoise services, with one branch utilizing GreyNoise RIOT IP Lookup to assess IP reputation and association with known benign services, and the other using GreyNoise IP Context to evaluate potential threats. The results from both GreyNoise services are merged to create a comprehensive analysis which includes the IP, classification (benign, malicious, or unknown), IP location, tags to identify activity or malware, category, and trust level. In parallel, a VirusTotal scan is initiated for the URL/IP to identify if it is malicious. A 5-second wait ensures proper processing, and the workflow subsequently polls the scan result to determine when the analysis is complete. The workflow then summarizes the analysis including the overall security vendor analysis results, blockList analysis, OpenPhish analysis, the URL, and the IP. Finally, the workflow combines the summarized intelligence from both GreyNoise and VirusTotal to provide a thorough analysis of the URL/IP. This summarized intelligence can then be emailed to the user that filled out the form via Gmail or it can be sent to the user via a Slack message. Setting up this workflow may require proper configuration of the form submission or webhook trigger, and ensuring that the GreyNoise and VirusTotal API credentials are correctly integrated. Users should also consider the potential volume of data and API rate limits, as excessive requests could lead to issues. Proper documentation and validation of input data are crucial to ensure accurate and meaningful results in the final report.
n8n-team
n8n Team
HTTP Request node
Webhook node
Respond to Webhook node
+3

Authenticate a user in a workflow with openid connect

Intro This workflow needs a user to authenticate by using an openid connect provider in order to call the webhook. If the user is not authenticated, it starts a login process by using an Authorization Code with PKCE https://datatracker.ietf.org/doc/html/rfc7636, a standard way to authenticate users with openid connect. Then, after the user logs in, the webhook is refreshed and gets the user's token from a cookie. With this token, all details about the user are requested through the userinfo endpoint on the identity provider. How to set up with Keycloak Keycloak Keycloak is an open source identity and access management solution. Feel free to get a demo realm at https://please-open.it or get your own Keycloak server up and running. After creating a realm, go to "Realm Settings" and click on "OpenID Endpoint Configuration" Retrieve authorization_endpoint, token_endpoint and userinfo_endpoint values. Set those variables in the "Set variables" node. In Keycloak, create a new client (name it as you want) Disable the client authentication, check only "standard flow" : At the third step, put the webhook url in "valid redirect URIs", fill "Web origins" with a "+". You're done, open the webhook and it asks you to authenticate. Usage User informations The userinfo node returns this structure about the user has logged in : [ { "sub":"73a6543f-f420-4fa6-9811-209e903c348b", "email_verified":true, "preferred_username": "mathieu.passenaud@please-open.it", "email": "mathieu.passenaud@please-open.it" } ] I can use those infos in my workflow for custom operations. APIs calls the "code" node returns me a cookie named "n8n-custom-auth" which is the access_token returned by the identity provider. This access_token can be used to call APIs connected to this identity provider (for example, we call userinfo API with this token). Example : asks a user to log in with his Google account then call an API (Gmail, drive...) with his own token. How it works We published a blog post about this flow, how it works and how you can use it : https://blog.please-open.it/n8n-openid-client/
please-open-it
please-open.it
HTTP Request node
Merge node
Webhook node
+3

Analyze Email Headers for IPs and Spoofing

This n8n workflow is designed to analyze email headers received via a webhook. The workflow splits into two main paths based on the presence of the received and authentication results headers. In the first path, if received headers are present, the workflow extracts IP addresses from these headers and then queries the IP Quality Score API to gather information about the IP addresses, including fraud score, abuse history, organization, and more. Geolocation data is also obtained from the IP-API API. The workflow collects and aggregates this information for each IP address. In the second path, if authentication-results headers are present, the workflow extracts SPF, DKIM, and DMARC authentication results. It then evaluates these results and sets fields accordingly (e.g., SPF pass/fail/neutral). The paths merge their results, and the workflow responds to the original webhook with the aggregated analysis, including IP information and authentication results. Potential issues during setup include ensuring proper configuration of the webhook calls with header authentication, handling authentication and API keys for the IP Quality Score API, and addressing any discrepancies or errors in the logic nodes, such as handling SPF, DKIM, and DMARC results correctly. Additionally, thorough testing with various email header formats is essential to ensure accurate analysis and response.
n8n-team
n8n Team

Implement complex processes faster with n8n

red icon yellow icon red icon yellow icon