Knowledge Base (RAG)

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that improves AI replies by giving the AI relevant excerpts from your own business information before it writes a response. Here is the simple version:

  1. You upload your company documents, FAQs, product guides, and other reference material to MailTrixy.
  2. When an email comes in, MailTrixy searches those documents for the most relevant information.
  3. The relevant snippets are included in the AI prompt alongside the email, so the AI can write a reply grounded in real facts — not guesses.
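
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not MailTrixy's implementation. The function names `retrieve` and `build_prompt` and the prompt wording are hypothetical, and the word-overlap scoring is only a stand-in for the meaning-based search described under Embeddings.

```python
import re


def words(text):
    """Return the lowercased words of text, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, documents, k=2):
    """Step 2 (stand-in): rank documents by how many words they share with the question."""
    q = words(question)
    ranked = sorted(documents, key=lambda doc: len(q & words(doc)), reverse=True)
    return ranked[:k]


def build_prompt(email, snippets):
    """Step 3: place the retrieved snippets next to the incoming email."""
    facts = "\n".join(f"- {s}" for s in snippets)
    return (
        "Reply to the customer using the facts below.\n"
        f"Facts:\n{facts}\n\n"
        f"Customer email:\n{email}\n"
    )


docs = [  # Step 1: illustrative uploaded content
    "Refunds are issued within 14 days of purchase.",
    "Our office is open Monday to Friday.",
    "Standard shipping takes 3 to 5 business days.",
]
email = "How many days do refunds take?"
print(build_prompt(email, retrieve(email, docs, k=1)))
```

Only the most relevant snippet reaches the prompt, which keeps the AI focused on facts that actually relate to the email.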

Without RAG, the AI relies only on general knowledge. With RAG, it can draw on your products, your policies, your pricing, and your terminology, so its replies are more accurate and specific to your business.

Supported File Types

You can upload the following file types to your Knowledge Base:

Format          Extension   Notes
PDF             .pdf        Text-based PDFs work best. Scanned image PDFs require OCR (not included).
Word Document   .docx       Modern Word format. Legacy .doc files should be converted first.
Plain Text      .txt        Simple text files. Great for pasting raw content.
CSV             .csv        Spreadsheet data. Each row becomes a searchable entry.
Excel           .xlsx       First sheet is processed by default. Multi-sheet support available.
Markdown        .md         Ideal for structured documentation. Headings are preserved as context.

Maximum file size: 25 MB per file. There is no limit on the number of files you can upload (subject to your plan's storage quota).
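
If you are preparing many files, you may want to check them against the table and size limit above before uploading. The Python sketch below is a hypothetical pre-upload helper, not part of MailTrixy. It assumes "25 MB" means 25 * 1024 * 1024 bytes, which may differ from how the limit is actually enforced.

```python
from pathlib import Path

# Extensions from the Supported File Types table.
SUPPORTED = {".pdf", ".docx", ".txt", ".csv", ".xlsx", ".md"}
# Assumption: the 25 MB limit is interpreted as 25 MiB.
MAX_BYTES = 25 * 1024 * 1024


def upload_problem(filename, size_bytes):
    """Return None if the file looks uploadable, otherwise a reason string."""
    ext = Path(filename).suffix.lower()
    if ext == ".doc":
        return "Legacy .doc file: convert it to .docx first."
    if ext not in SUPPORTED:
        return f"Unsupported file type: {ext or '(no extension)'}"
    if size_bytes > MAX_BYTES:
        return "File is larger than 25 MB."
    return None


for name, size in [("pricing.PDF", 2_000_000), ("terms.doc", 50_000), ("export.csv", 30_000_000)]:
    print(name, "->", upload_problem(name, size) or "OK")
```

Extensions are compared case-insensitively, so a file named `pricing.PDF` is treated the same as `pricing.pdf`.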

Website Scraping

Instead of manually uploading documents, you can point MailTrixy at your website and let it automatically extract content from your pages.

  1. Go to Knowledge Base → Add Source → Website.
  2. Enter your website URL (e.g., https://yourcompany.com).
  3. Choose the crawl depth (how many levels of links to follow).
  4. Click Start Scraping.

MailTrixy will crawl the pages, extract the text content (ignoring navigation, footers, and boilerplate), and add it to your Knowledge Base. You can exclude specific URL patterns (e.g., /blog/*) if needed.
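
Crawl depth and URL exclusions interact in a way that is easy to picture with a small breadth-first crawl. In this Python sketch the site is an in-memory map of page links instead of real HTTP requests. It assumes depth 0 is the starting page and that excluded pages are neither indexed nor followed; MailTrixy's crawler may handle these details differently, and the `crawl` function is hypothetical.

```python
from collections import deque
from fnmatch import fnmatch
from urllib.parse import urlparse


def crawl(start_url, links, max_depth, exclude_patterns):
    """Breadth-first crawl that follows at most max_depth levels of links.

    `links` maps each URL to the URLs it links to (a stand-in for fetching pages).
    """
    indexed = []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        path = urlparse(url).path or "/"
        if any(fnmatch(path, pattern) for pattern in exclude_patterns):
            continue  # excluded: not indexed, and its links are not followed
        indexed.append(url)
        if depth == max_depth:
            continue  # chosen crawl depth reached: do not follow links further
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return indexed


site = {  # illustrative link structure
    "https://yourcompany.com/": ["https://yourcompany.com/pricing", "https://yourcompany.com/blog/news"],
    "https://yourcompany.com/pricing": ["https://yourcompany.com/pricing/enterprise"],
    "https://yourcompany.com/blog/news": ["https://yourcompany.com/about"],
}
print(crawl("https://yourcompany.com/", site, max_depth=2, exclude_patterns=["/blog/*"]))
```

Under these assumptions, excluding `/blog/*` also hides any page reachable only through the blog (here, `/about`), so check that important pages are linked from somewhere outside the excluded sections.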

Tip: Re-scrape periodically to keep your Knowledge Base in sync with website changes. You can schedule automatic re-scraping on a weekly or monthly basis.

Q&A Pairs

For precise control over AI responses to common questions, you can add manual Q&A pairs. These are question-answer combinations that the AI will prioritize when a matching question is detected.

  1. Go to Knowledge Base → Q&A Pairs.
  2. Click Add Pair.
  3. Enter the question (or multiple variations of the same question).
  4. Enter the exact answer you want the AI to use.

Q&A pairs take priority over document-based context. If a customer asks "What is your refund policy?" and you have a Q&A pair for that exact question, the AI will use your specified answer verbatim rather than searching through uploaded documents.
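
The priority rule can be sketched as a lookup that runs before any document search. This Python sketch assumes a "match" means the incoming question equals one of the stored variations after lowercasing and removing punctuation; MailTrixy's actual matching is not described here and is likely more flexible. The names `answer_for` and `search_documents` are hypothetical.

```python
import re


def normalize(text):
    """Lowercase and strip punctuation so "Refund policy?" matches "refund policy"."""
    return " ".join(re.findall(r"\w+", text.lower()))


def answer_for(question, qa_pairs, search_documents):
    """Return a Q&A answer verbatim when a variation matches, else search documents."""
    key = normalize(question)
    for variations, answer in qa_pairs:
        if key in {normalize(v) for v in variations}:
            return ("qa_pair", answer)
    return ("documents", search_documents(question))


qa_pairs = [
    (["What is your refund policy?", "How do refunds work?"],
     "Refunds are available within 30 days of purchase."),  # illustrative answer
]
print(answer_for("what is your REFUND policy", qa_pairs, lambda q: []))
```

Storing several variations per pair (step 3 above) widens the set of questions that hit the exact answer instead of falling through to document search.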

How Chunking and Embeddings Work

When you upload a document, MailTrixy does not store it as one giant block of text. Instead, it goes through two steps:

Chunking

The document is split into smaller pieces called chunks. Each chunk is typically 500–1000 tokens (roughly a few paragraphs of text). Consecutive chunks overlap slightly, so information that falls on a boundary between two chunks is not cut off. Think of it like cutting a book into index cards, where each card holds a self-contained piece of information.
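
Here is a minimal Python sketch of fixed-size chunking with overlap. It splits on whitespace as a stand-in for a real tokenizer, and the 500-token size and 50-token overlap are illustrative defaults, not MailTrixy's exact settings.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token list into windows of chunk_size that overlap by `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # each chunk starts this many tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


text = "Our return window is 30 days. Items must be unused. Refunds go to the original card."
for chunk in chunk_tokens(text.split(), chunk_size=8, overlap=2):
    print(" ".join(chunk))
```

The last `overlap` tokens of each chunk reappear at the start of the next one, which is what keeps a sentence near a boundary readable in at least one chunk when the overlap is large enough.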

Embeddings

Each chunk is converted into a numerical representation called an embedding — a long list of numbers that captures the meaning of the text. When an email arrives, the email is also converted into an embedding, and the system finds the chunks whose embeddings are most similar. This is how the AI "searches" your knowledge base: not by keyword matching, but by meaning matching.

For example, if a customer asks about "returning a product" and your document mentions "refund process", the embeddings will recognize these are related concepts even though the exact words differ.
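
Meaning matching is commonly done by ranking chunks by the cosine similarity between their embeddings and the email's embedding (the Pinecone setup below also recommends the cosine metric). This Python sketch uses hand-made three-number vectors in place of real embeddings, which have hundreds or thousands of dimensions; the chunk labels and values are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """1.0 means the vectors point the same way; values near 0 mean unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_chunks(query_vec, chunk_vecs, k=3):
    """Return the IDs of the k chunks whose embeddings are closest to the query."""
    return sorted(chunk_vecs, key=lambda cid: cosine_similarity(query_vec, chunk_vecs[cid]), reverse=True)[:k]


# Toy vectors: "returning a product" is placed close to "refund process" on purpose.
query = [0.9, 0.1, 0.0]  # "returning a product"
chunks = {
    "refund process": [0.8, 0.2, 0.1],
    "office hours": [0.0, 0.1, 0.9],
}
print(top_chunks(query, chunks, k=1))
```

With real embeddings, the model itself places related phrases close together, so the same ranking code finds "refund process" even though it shares no words with "returning a product".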

Pinecone Setup (Optional)

Pinecone is a cloud-hosted vector database that stores and searches embeddings at scale. If you have a large Knowledge Base (thousands of documents), Pinecone provides faster and more accurate search results compared to the built-in MySQL fallback.

  1. Create a free account at pinecone.io.
  2. Create a new index whose dimension matches the length of the embeddings your provider returns: 1536 for OpenAI embeddings, or 1024 for other providers. Use the cosine similarity metric.
  3. Copy your API Key and Environment values.
  4. In MailTrixy, go to Settings → AI Configuration → Vector Database.
  5. Select Pinecone, paste your API key and environment, and enter the index name.
  6. Click Test Connection to verify.

Note: Pinecone is optional. If you do not configure it, MailTrixy stores vectors in MySQL, which works well for most use cases (up to a few hundred documents).
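
A common setup mistake is an index dimension that does not match the embedding model, because a vector index only accepts vectors of the length it was created with. This hypothetical Python check compares the settings from step 2 with one sample embedding before you save them. It does not call the Pinecone API, and `check_index_settings` is an illustrative name.

```python
def check_index_settings(index_dimension, index_metric, sample_embedding):
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    if len(sample_embedding) != index_dimension:
        problems.append(
            f"Index dimension {index_dimension} does not match embedding length {len(sample_embedding)}."
        )
    if index_metric != "cosine":
        problems.append(f"Metric is {index_metric!r}; this guide recommends 'cosine'.")
    return problems


sample = [0.0] * 1536  # stand-in for one embedding returned by your provider
print(check_index_settings(1024, "cosine", sample))
```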

MySQL Fallback

By default, MailTrixy stores embeddings directly in your MySQL database. This requires no additional setup and works out of the box. The trade-offs compared to Pinecone:

  • Pros: Zero additional cost, no external service dependency, simpler setup.
  • Cons: Slower search at very large scale (10,000+ chunks), higher database storage usage.

For most businesses with up to a few hundred documents, MySQL is more than sufficient. Consider switching to Pinecone only if you notice slow AI reply times with a very large Knowledge Base.
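
The slowdown at scale is easiest to see in a brute-force search, where every stored embedding is read and scored for each query, so the work grows in proportion to the number of chunks. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL; the `chunks` table layout and the JSON encoding of vectors are assumptions, not MailTrixy's schema.

```python
import json
import math
import sqlite3


def make_store():
    """Create an in-memory table holding chunk text and its embedding as JSON."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, embedding TEXT)")
    return conn


def add_chunk(conn, text, embedding):
    conn.execute("INSERT INTO chunks (text, embedding) VALUES (?, ?)", (text, json.dumps(embedding)))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(conn, query_vec, k=3):
    """Load and score every row: cost grows linearly with the number of chunks."""
    rows = conn.execute("SELECT text, embedding FROM chunks").fetchall()
    ranked = sorted(rows, key=lambda row: cosine(query_vec, json.loads(row[1])), reverse=True)
    return [text for text, _ in ranked[:k]]


conn = make_store()
add_chunk(conn, "refund process", [0.8, 0.2, 0.1])
add_chunk(conn, "office hours", [0.0, 0.1, 0.9])
print(search(conn, [0.9, 0.1, 0.0], k=1))
```

A dedicated vector database such as Pinecone avoids this full scan by using an index built for nearest-neighbor search, which is why it pays off mainly for very large Knowledge Bases.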

Testing Knowledge Base Quality

After uploading your documents, you should test whether the AI can find and use the right information. MailTrixy provides a built-in testing tool:

  1. Go to Knowledge Base → Test.
  2. Type a sample customer question in the test input.
  3. Click Search to see which chunks are retrieved.
  4. Click Generate Reply to see the full AI response using those chunks.

If the retrieved chunks are irrelevant, consider:

  • Re-uploading the document with clearer headings and structure.
  • Adding Q&A pairs for common questions.
  • Breaking large documents into smaller, topic-focused files.
  • Adjusting the chunk size in advanced settings (smaller chunks = more precise matching, larger chunks = more context per match).
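
When you try the suggestions above, it helps to measure whether retrieval actually improved instead of judging from a single question. This Python sketch computes a simple hit rate: the share of sample questions whose expected chunk appears among the top k results. The `retrieve` function passed in is hypothetical; you could fill in its results by hand from the Search step of the test tool.

```python
def hit_rate(test_cases, retrieve, k=3):
    """Fraction of (question, expected_chunk) pairs where the chunk is in the top k."""
    if not test_cases:
        return 0.0
    hits = sum(1 for question, expected in test_cases if expected in retrieve(question)[:k])
    return hits / len(test_cases)


# Illustrative retrieval results, e.g. copied from the test tool.
results = {
    "What is your refund policy?": ["refunds", "shipping", "pricing"],
    "Do you ship to Canada?": ["pricing", "about", "contact"],
}
cases = [("What is your refund policy?", "refunds"), ("Do you ship to Canada?", "shipping")]
print(hit_rate(cases, lambda q: results[q], k=3))
```

Run the same questions before and after restructuring documents or changing the chunk size; a higher hit rate on the same set suggests the change helped.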

Last updated 25/03/2026