This n8n workflows builds another example of creating a knowledgebase assistant but demonstrates how a more deliberate and targeted approach to ingesting the data can produce much better results for your chatbot.
In this example, a government tax code policy document is used. Whilst we could split the document into chunks by content length, we often lose the context of chapters and sections which may be required by the user.
Our approach then is to first split the document into chapters and sections before importing into our vector store. Additionally, using metadata correctly is key to allow filtering and scoped queries.
Human: "Tell me about what the tax code says about cargo for intentional commerce?"
AI: "Section 11.25 of the Texas Property Tax Code pertains to "MARINE CARGO CONTAINERS USED EXCLUSIVELY IN INTERNATIONAL COMMERCE." In this section, a person who is a citizen of a foreign country or an en..."
Depending on your use-case, consider returning actual PDF pages (or links) to the user for the extra confirmation and to build trust.
Not using Mistral? You are able to replace but note to match the distance and dimension size of Qdrant collection to your chosen embedding model.
Implement complex processes faster with n8n