Easy Image Captioning with Gemini 1.5 Pro

Published 6 months ago

Created by

Jimleuk

Template description

This n8n workflow demonstrates how to automate image captioning tasks using Gemini 1.5 Pro - a multimodal LLM which can accept and analyse images. This is a really simple example of how easy it is to build and leverage powerful AI models in your repetitive tasks.

How it works

For this demo, we'll import a public image from a popular stock photography website, Pexel.com, into our workflow using the HTTP request node.
With multimodal LLMs, there is little do preprocess other than ensuring the image dimensions fit within the LLMs accepted limits. Though not essential, we'll resize the image using the Edit image node to achieve fast processing.
The image is used as an input to the basic LLM node by defining a "user message" entry with the binary (data) type.
The LLM node has the Gemini 1.5 Pro language model attached and we'll prompt it to generate a caption title and text appropriate for the image it sees.
Once generated, the generated caption text is positioning over the original image to complete the task. We can calculate the positioning relative to the amount of characters produced using the code node.

An example of the combined image and caption can be found here: https://res.cloudinary.com/daglih2g8/image/upload/f_auto,q_auto/v1/n8n-workflows/l5xbb4ze4wyxwwefqmnc

Requirements

Google Gemini API Key.
Access to Google Drive.

Customising the workflow

Not using Google Gemini? n8n's basic LLM node supports the standard syntax for image content for models that support it - try using GPT4o, Claude or LLava (via Ollama).
Google Drive is only used for demonstration purposes. Feel free to swap this out for other triggers such as webhooks to fit your use case.

Share Template

More Product workflow templates

+10

Create a Branded AI-Powered Website Chatbot

Create a Branded AI Website Chatbot Engage website visitors with an intelligent chat widget powered by OpenAI. This template includes: 💬 Natural conversation handling 📅 Microsoft Outlook calendar integration 📝 Lead capture and information gathering 🔄 Human handoff capabilities Simply add a JavaScript snippet to your website and configure the workflow to match your needs. Follow our detailed setup guide to get started in minutes. > Note: Widget includes a "Powered By" affiliate link

Wayne Simpson

Chat with Postgresql Database

Who is this template for? This workflow template is designed for any professionals seeking relevent data from database using natural language. How it works Each time user ask's question using the n8n chat interface, the workflow runs. Then the message is processed by AI Agent using relevent tools - Execute SQL Query, Get DB Schema and Tables List and Get Table Definition, if required. Agent uses these tool to form and run sql query which are necessary to answer the questions. Once AI Agent has the data, it uses it to form answer and returns it to the user. Set up instructions Complete the Set up credentials step when you first open the workflow. You'll need a Postgresql Credentials, and OpenAI api key. Template was created in n8n v1.77.0

KumoHQ

+13

AI Agent To Chat With Files In Supabase Storage

Video Guide I prepared a detailed guide explaining how to set up and implement this scenario, enabling you to chat with your documents stored in Supabase using n8n. Youtube Link Who is this for? This workflow is ideal for researchers, analysts, business owners, or anyone managing a large collection of documents. It's particularly beneficial for those who need quick contextual information retrieval from text-heavy files stored in Supabase, without needing additional services like Google Drive. What problem does this workflow solve? Manually retrieving and analyzing specific information from large document repositories is time-consuming and inefficient. This workflow automates the process by vectorizing documents and enabling AI-powered interactions, making it easy to query and retrieve context-based information from uploaded files. What this workflow does The workflow integrates Supabase with an AI-powered chatbot to process, store, and query text and PDF files. The steps include: Fetching and comparing files to avoid duplicate processing. Handling file downloads and extracting content based on the file type. Converting documents into vectorized data for contextual information retrieval. Storing and querying vectorized data from a Supabase vector store. File Extraction and Processing: Automates handling of multiple file formats (e.g., PDFs, text files), and extracts document content. Vectorized Embeddings Creation: Generates embeddings for processed data to enable AI-driven interactions. Dynamic Data Querying: Allows users to query their document repository conversationally using a chatbot. Setup N8N Workflow Fetch File List from Supabase: Use Supabase to retrieve the stored file list from a specified bucket. Add logic to manage empty folder placeholders returned by Supabase, avoiding incorrect processing. Compare and Filter Files: Aggregate the files retrieved from storage and compare them to the existing list in the Supabase files table. Exclude duplicates and skip placeholder files to ensure only unprocessed files are handled. Handle File Downloads: Download new files using detailed storage configurations for public/private access. Adjust the storage settings and GET requests to match your Supabase setup. File Type Processing: Use a Switch node to target specific file types (e.g., PDFs or text files). Employ relevant tools to process the content: For PDFs, extract embedded content. For text files, directly process the text data. Content Chunking: Break large text data into smaller chunks using the Text Splitter node. Define chunk size (default: 500 tokens) and overlap to retain necessary context across chunks. Vector Embedding Creation: Generate vectorized embeddings for the processed content using OpenAI's embedding tools. Ensure metadata, such as file ID, is included for easy data retrieval. Store Vectorized Data: Save the vectorized information into a dedicated Supabase vector store. Use the default schema and table provided by Supabase for seamless setup. AI Chatbot Integration: Add a chatbot node to handle user input and retrieve relevant document chunks. Use metadata like file ID for targeted queries, especially when multiple documents are involved. Testing Upload sample files to your Supabase bucket. Verify if files are processed and stored successfully in the vector store. Ask simple conversational questions about your documents using the chatbot (e.g., "What does Chapter 1 say about the Roman Empire?"). Test for accuracy and contextual relevance of retrieved results.

Mark Shcherbakov

Build Your First AI Data Analyst Chatbot

Enhance your data analysis by connecting an AI Agent to your dataset, using n8n tools. This template teaches you how to build an AI Data Analyst Chatbot that is capable of pulling data from your sources, using tools like Google Sheets or databases. It's designed to be easy and efficient, making it a good starting point for AI-driven data analysis. You can easily replace the current Google Sheets tools for databases like Postgres or MySQL. How It Works The core of the workflow is the AI Agent. It's connected to different data retrieval tools, to get data from Google Sheets (or your preferred database) in many different ways. Once the data is retrieved, the Calculator tool allows the AI to perform mathematical operations, making your data analysis precise. Who is this template for Data Analysts & Researchers:** Pull data from different sources and perform quick calculations. Developers & AI Enthusiasts:** Learn to build your first AI Agent with easy dataset access. Business Owners:** Streamline your data analysis with AI insights and automate repetitive tasks. Automation Experts:** Enhance your automation skills by integrating AI with your existing databases. How to Set Up You can find detailed instructions in the workflow itself. Check out my other templates 👉 https://n8n.io/creators/solomon/

Solomon

🚀 Boost your customer service with this WhatsApp Business bot!

This n8n workflow demonstrates how to automate customer interactions and appointment management via WhatsApp Business bot. After submitting a Google Form, the user receives a notification via WhatsApp. These notifications are sent via a template message. In case user sends a message to the bot, the text and user data is stored in Google Sheets. To reply back to the user, fill in the ReplyText column and change the Status to 'Ready'. In a few seconds n8n will fetch the unsent replies and deliver them one by one via WhatsApp Business node. Customize this workflow to fit your specific needs, connect different online services and enhance your customer communication! 🎉 Setup Instructions To get this workflow up and running, you'll need to: 👇 Create a WhatsApp template message on the Meta Business portal. Obtain an Access Token and WhatsApp Business Account ID from the Meta Developers Portal. This is needed for the WhatsApp Business Node to send messages. Set up a WhatsApp Trigger node with App ID and App Secret from the Meta Developers Portal. Right after that copy the WhatsApp Trigger URL and add it as a Callback URL in the Meta Developers Portal. This trigger is needed to receive incoming messages and their status updates. Connect your Google Sheets account for data storage and management. Check out the documentation page. ⚠️ Important Notes WhatsApp allows automatic custom text messages only within 24 hours of the last user message. Outside with time frame only approved template messages can be sent. The workflow uses a Google Sheet to manage form submissions, incoming messages and prepare responses. You can replace these nodes and connect the WhatsApp bot with other systems.

Eduard

✨ Vision-Based AI Agent Scraper - with Google Sheets, ScrapingBee, and Gemini

Important Notes: Check Legal Regulations: This workflow involves scraping, so ensure you comply with the legal regulations in your country before getting started. Better safe than sorry! Workflow Description: 😮‍💨 Tired of struggling with XPath, CSS selectors, or DOM specificity when scraping ? This AI-powered solution is here to simplify your workflow! With a vision-based AI Agent, you can extract data effortlessly without worrying about how the DOM is structured. This workflow leverages a vision-based AI Agent, integrated with Google Sheets, ScrapingBee, and the Gemini-1.5-Pro model, to extract structured data from webpages. The AI Agent primarily uses screenshots for data extraction but switches to HTML scraping when necessary, ensuring high accuracy. Key Features: Google Sheets Integration**: Manage URLs to scrape and store structured results. ScrapingBee**: Capture full-page screenshots and retrieve HTML data for fallback extraction. AI-Powered Data Parsing**: Use Gemini-1.5-Pro for vision-based scraping and a Structured Output Parser to format extracted data into JSON. Token Efficiency**: HTML is converted to Markdown to optimize processing costs. This template is designed for e-commerce scraping but can be customized for various use cases.

Dataki

More AI workflow templates

AI agent chat

This workflow employs OpenAI's language models and SerpAPI to create a responsive, intelligent conversational agent. It comes equipped with manual chat triggers and memory buffer capabilities to ensure seamless interactions. To use this template, you need to be on n8n version 1.50.0 or later.

n8n Team

Scrape and summarize webpages with AI

This workflow integrates both web scraping and NLP functionalities. It uses HTML parsing to extract links, HTTP requests to fetch essay content, and AI-based summarization using GPT-4o. It's an excellent example of an end-to-end automated task that is not only efficient but also provides real value by summarizing valuable content. Note that to use this template, you need to be on n8n version 1.50.0 or later.

n8n Team

+10

Building Your First WhatsApp Chatbot

This n8n template builds a simple WhatsApp chabot acting as a Sales Agent. The Agent is backed by a product catalog vector store to better answer user's questions. This template is intended to help introduce n8n users interested in building with WhatsApp. How it works This template is in 2 parts: creating the product catalog vector store and building the WhatsApp AI chatbot. A product brochure is imported via HTTP request node and its text contents extracted. The text contents are then uploaded to the in-memory vector store to build a knowledgebase for the chatbot. A WhatsApp trigger is used to capture messages from customers where non-text messages are filtered out. The customer's message is sent to the AI Agent which queries the product catalogue using the vector store tool. The Agent's response is sent back to the user via the WhatsApp node. How to use Once you've setup and configured your WhatsApp account and credentials First, populate the vector store by clicking the "Test Workflow" button. Next, activate the workflow to enable the WhatsApp chatbot. Message your designated WhatsApp number and you should receive a message from the AI sales agent. Tweak datasource and behaviour as required. Requirements WhatsApp Business Account OpenAI for LLM Customising this workflow Upgrade the vector store to Qdrant for persistance and production use-cases. Handle different WhatsApp message types for a more rich and engaging experience for customers.

Jimleuk

AI agent that can scrape webpages

⚙️🛠️🚀🤖🦾 This template is a PoC of a ReAct AI Agent capable of fetching random pages (not only Wikipedia or Google search results). On the top part there's a manual chat node connected to a LangChain ReAct Agent. The agent has access to a workflow tool for getting page content. The page content extraction starts with converting query parameters into a JSON object. There are 3 pre-defined parameters: url** – an address of the page to fetch method** = full / simplified maxlimit** - maximum length for the final page. For longer pages an error message is returned back to the agent Page content fetching is a multistep process: An HTTP Request mode tries to get the page content. If the page content was successfuly retrieved, a series of post-processing begin: Extract HTML BODY; content Remove all unnecessary tags to recude the page size Further eliminate external URLs and IMG scr values (based on the method query parameter) Remaining HTML is converted to Markdown, thus recuding the page lengh even more while preserving the basic page structure The remaining content is sent back to an Agent if it's not too long (maxlimit = 70000 by default, see CONFIG node). NB: You can isolate the HTTP Request part into a separate workflow. Check the Workflow Tool description, it guides the agent to provide a query string with several parameters instead of a JSON object. Please reach out to Eduard is you need further assistance with you n8n workflows and automations! Note that to use this template, you need to be on n8n version 1.19.4 or later.

Eduard

Telegram AI Chatbot

The workflow starts by listening for messages from Telegram users. The message is then processed, and based on its content, different actions are taken. If it's a regular chat message, the workflow generates a response using the OpenAI API and sends it back to the user. If it's a command to create an image, the workflow generates an image using the OpenAI API and sends the image to the user. If the command is unsupported, an error message is sent. Throughout the workflow, there are additional nodes for displaying notes and simulating typing actions.

Eduard

Ask questions about a PDF using AI

The workflow first populates a Pinecone index with vectors from a Bitcoin whitepaper. Then, it waits for a manual chat message. When received, the chat message is turned into a vector and compared to the vectors in Pinecone. The most similar vectors are retrieved and passed to OpenAI for generating a chat response. Note that to use this template, you need to be on n8n version 1.19.4 or later.

David Roberts

More Marketing workflow templates

Scrape business emails from Google Maps without the use of any third party APIs

Who is this template for? This workflow template is designed for sales, marketing, and business development professionals who want a cost-effective and efficient way to generate leads. By leveraging n8n core nodes, it scrapes business emails from Google Maps without relying on third-party APIs or paid services, ensuring there are no additional costs involved. Ideal for small business owners, freelancers, and agencies, this template automates the process of collecting contact information for targeted outreach, making it a powerful tool for anyone looking to scale their lead generation efforts without incurring extra expenses. You can watch the video tutorial here: https://youtu.be/HaiO-UeiKBA How it works This template streamlines email scraping from Google Maps using only n8n core nodes, ensuring a completely free and self-contained solution. Here’s how it operates: Input Queries You provide a list of queries, each consisting of keywords related to the type of business you want to target and the specific region or subregion you’re interested in. Iterates through Queries The workflow processes each query one at a time. For each query, it triggers a sub-workflow dedicated to handling the scraping tasks. Scrapes Google Maps for URLs Using these queries, the workflow scrapes Google Maps to collect URLs of business listings matching the provided criteria. Fetches HTML Content The workflow then fetches the HTML pages of the collected URLs for further processing. Extracts Emails Using a Code Node with custom JavaScript, the workflow runs regular expressions on the HTML content to extract business email addresses. Setup Add Queries: Open the first node, "Run Workflow" and input a list of queries, each containing the business keywords and the target region. Configure the Google Sheets Node: Open the Google Sheets node and select a document and specific sheet where the scraped results will be saved. Run the workflow: Click on "Test workflow" and watch your Google Sheets document gradually receive business email addresses. Customize as Needed: You can adjust the regular expressions in the Code Node to refine the email extraction logic or add logic to extract other kinds of information.

Akram Kadri

Automated Web Scraping: email a CSV, save to Google Sheets & Microsoft Excel

How it works: The workflow starts by sending a request to a website to retrieve its HTML content. It then parses the HTML extracting the relevant information The extracted data is storted and converted into a CSV file. The CSV file is attached to an email and sent to your specified address. The data is simultaneously saved to both Google Sheets and Microsoft Excel for further analysis or use. Set-up steps: Change the website to scrape in the "Fetch website content" node Configure Microsoft Azure credentials with Microsoft Graph permissions (required for the Save to Microsoft Excel 365 node) Configure Google Cloud credentials with access to Google Drive, Google Sheets and Gmail APIs (the latter is required for the Send CSV via e-mail node).

Mihai Farcas

AI-Powered Social Media Content Generator & Publisher

AI-Powered Social Media Content Generator & Publisher 🚀 This AI-driven n8n workflow automates social media content creation and publishing across LinkedIn, Instagram, Facebook, and Twitter (X). It generates engaging, platform-optimized posts using Google Gemini AI, based on user inputs such as a post title, keywords, and an uploaded image. The workflow ensures seamless content generation and publishing, making it a perfect tool for marketers, business owners, influencers, and content creators. 🌟 Features & Benefits ✅ AI-Generated Social Media Posts – Uses Google Gemini AI to create high-quality, optimized content. ✅ Multi-Platform Support – Automatically generates posts for LinkedIn, Instagram, Facebook, and Twitter (X). ✅ Hashtag & SEO Optimization – Includes trending hashtags to enhance visibility and engagement. ✅ Image Upload & Processing – Allows image uploads for Instagram and Facebook using imgbb and Facebook Graph API. ✅ Automated Publishing – Posts are automatically published on all selected platforms. ✅ Custom Call-to-Action – Each platform's post is optimized with CTAs for better engagement. ✅ User-Friendly Form Submission – Easy-to-use form where users can enter post titles, keywords, links, and images. ✅ Performance Tracking – Provides confirmation and tracking links for published posts. 📌 How It Works 1️⃣ User Submission Form Fill out the form with Post Title, Keywords, and an Optional Link. Upload an image for Instagram & Facebook posts. 2️⃣ AI Content Generation Google Gemini AI generates optimized content for each platform. The AI ensures professional, engaging, and audience-specific content. 3️⃣ Content Review Users review and approve the AI-generated content before publishing. 4️⃣ Automated Publishing Posts are automatically published on LinkedIn, Facebook, Instagram, and Twitter (X). Uses Facebook Graph API, LinkedIn API, Twitter API, and Instagram API. 5️⃣ Post Confirmation & Tracking Get links to track published posts on each platform. 🛠️ Prerequisites Before using this workflow, ensure you have: ✅ n8n Instance (Cloud or Self-Hosted) ✅ Social Media API Credentials (Facebook, Instagram, LinkedIn, Twitter API) ✅ Google Gemini AI API Key ✅ imgbb API Key (for image hosting) Buy My Book: Mastering n8n on Amazon Full Courses & Tutorials: http://lms.syncbricks.com 📺 YouTube Video Tutorial 🎥 Watch the step-by-step tutorial on how to set up and use this n8n workflow template: 🔗 YouTube Tutorial - AI-Powered Social Media Posting in n8n 🎯 Use Cases 📌 Marketing Agencies – Automate client content scheduling. 📌 Businesses & Brands – Maintain a consistent brand presence on social media. 📌 Content Creators & Influencers – Generate high-quality posts quickly. 📌 E-commerce & Startups – Promote products and services effortlessly. 📌 Corporate & Enterprise Teams – Streamline internal and external communications. 👨‍💻 Creator Information 👤 Developed by: Amjid Ali 🌐 Website: SyncBricks 📧 Email: info@syncbricks.com 💼 LinkedIn: Amjid Ali 📺 YouTube: SyncBricks 💡 Support & Contributions If you find this workflow helpful, consider supporting my work: 👉 Donate via PayPal For full courses on * AI Automation*, visit: 📚 SyncBricks LMS 📚 AI and Auotmation Course 👉 Get Started with N8N

Amjid Ali

OpenAI GPT-3: Company Enrichment from website content

Enrich your company lists with OpenAI GPT-3 ↓ You’ll get valuable information such as: Market (B2B or B2C) Industry Target Audience Value Proposition This will help you to: add more personalization to your outreach make informed decisions about which accounts to target I've made the process easy with an n8n workflow. Here is what it does: Retrieve website URLs from Google Sheets Extract the content for each website Analyze it with GPT-3 Update Google Sheets with GPT-3 data

Lucas Perret

⚡AI-Powered YouTube Video Summarization & Analysis

-- Disclaimer: This workflow uses a community node and therefore only works for self-hosted n8n users -- Transform YouTube videos into comprehensive summaries and structured analysis instantly. This n8n workflow automatically extracts, processes, and analyzes video transcripts to deliver clear, organized insights without watching the entire video. Time-Saving Features 🚀 Instant Processing Simply provide a YouTube URL and receive a structured summary within seconds, eliminating the need to watch lengthy videos. Perfect for research, learning, or content analysis. 🤖 AI-Powered Analysis Leverages GPT-4o-mini to analyze video transcripts, organizing key concepts and insights into a clear, hierarchical structure with main topics and essential points. Smart Processing Pipeline 📝 Automated Transcript Extraction Supports public YouTube video Handles multiple URL formats Extracts complete video transcripts automatically 🧠 Intelligent Content Organization Breaks down content into main topics Highlights key concepts and terminology Maintains technical accuracy while improving clarity Structures information logically with markdown formatting Perfect For 📚 Researchers & Students Quick comprehension of educational content and lectures without watching entire videos. 💼 Business Professionals Efficient analysis of industry talks, presentations, and training materials. 🎯 Content Creators Rapid research and competitive analysis of video content in your niche. Technical Implementation 🔄 Workflow Components Webhook endpoint for URL submission YouTube API integration for video details Transcript extraction system GPT-4 powered analysis engine Telegram notification system (optional) Transform your video content consumption with an intelligent system that delivers structured, comprehensive summaries while saving hours of viewing time.

Joseph LePage

Flux AI Image Generator

Easily generate images with Black Forest's Flux Text-to-Image AI models using Hugging Face’s Inference API. This template serves a webform where you can enter prompts and select predefined visual styles that are customizable with no-code. The workflow integrates seamlessly with Hugging Face's free tier, and it’s easy to modify for any Text-to-Image model that supports API access. Try it Curious what this template does? Try a public version here: https://devrel.app.n8n.cloud/form/flux Set Up Watch this quick set up video 👇 Accounts required Huggingface.co account (free) Cloudflare.com account (free - used for storage; but can be swapped easily e.g. GDrive) Key Features: Text-to-Image Creation**: Generates unique visuals based on your prompt and style. Hugging Face Integration**: Utilizes Hugging Face’s Inference API for reliable image generation. Customizable Visual Styles**: Select from preset styles or easily add your own. Adaptable**: Swap in any Hugging Face Text-to-Image model that supports API calls. Ideal for: Creators**: Rapidly create visuals for projects. Marketers**: Prototype campaign visuals. Developers**: Test different AI image models effortlessly. How It Works: You submit an image prompt via the webform and select a visual style, which appends style instructions to your prompt. The Hugging Face Inference API then generates and returns the image, which gets hosted on Cloudflare S3. The workflow can be easily adjusted to use other models and styles for complete flexibility.

Max Tkacz

Implement complex processes faster with n8n

Get started

Easy Image Captioning with Gemini 1.5 Pro

Created by

Categories

Template description

How it works

Requirements

Customising the workflow

Share Template

More Product workflow templates

Create a Branded AI-Powered Website Chatbot

Chat with Postgresql Database

AI Agent To Chat With Files In Supabase Storage

Build Your First AI Data Analyst Chatbot

🚀 Boost your customer service with this WhatsApp Business bot!

✨ Vision-Based AI Agent Scraper - with Google Sheets, ScrapingBee, and Gemini

More AI workflow templates

AI agent chat

Scrape and summarize webpages with AI

Building Your First WhatsApp Chatbot

AI agent that can scrape webpages

Telegram AI Chatbot

Ask questions about a PDF using AI

More Marketing workflow templates

Scrape business emails from Google Maps without the use of any third party APIs

Automated Web Scraping: email a CSV, save to Google Sheets & Microsoft Excel

AI-Powered Social Media Content Generator & Publisher

OpenAI GPT-3: Company Enrichment from website content

⚡AI-Powered YouTube Video Summarization & Analysis

Flux AI Image Generator