0% Security Worries! Build a Free AI Chatbot with Dify RAG

Terin

Blog Manager


0% Security Worries! Build a Free 'Custom Company AI Chatbot' (Dify RAG Guide)


Introduction

Hello, creators! Welcome to a2set's AI Tutorial.

AI is incredibly smart these days, isn't it? However, it has two fatal flaws. First, it doesn't know your 'private information', such as your company's refund policy or the latest posts on your blog. Second, it suffers from 'Hallucination', sometimes making up plausible answers for things it doesn't actually know. On top of that, uploading highly sensitive company documents to an AI service can raise worries about data leaks.

The technology taking the global IT industry by storm as the answer to these issues is 'RAG (Retrieval-Augmented Generation)'.


Today, without any difficult Python coding, we will use the open-source AI platform 'Dify' to build a flawless RAG system using just drag-and-drop. By the end of this tutorial, you will be able to attach a 'perfect custom chatbot' to your website that finds answers exclusively within the hundreds of PDFs or Notion pages you've uploaded.

It sounds a bit DEEP, but once you understand the principles, you'll enter the world of infinitely applicable, practical AI. Shall we dive in?


🧠 Deep Dive: What exactly is 'RAG'?

Before we start building, let's understand the concept of RAG (Retrieval-Augmented Generation), the hottest keyword right now, in just one minute.


  • Standard ChatGPT (Closed Book Exam):
    This is like taking a test blindfolded, relying solely on memorization. Since it only remembers the past data it was trained on, it gives irrelevant answers when asked about the latest information or internal company rules.


  • RAG AI (Open Book Exam):
    This is an 'Open Book Test'. When a question comes in, before the AI answers, it first 'Retrieves' information from the PDF or document database you uploaded in advance. Then, based on the exact information (cheat sheet) it found, it 'Generates' the sentences. Because the answer is grounded in the retrieved text, the AI is far less likely to make things up!
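
To make the open-book idea concrete, here is a minimal sketch of the 'Retrieve' half in plain Python. Everything in it is an assumption for illustration: the function names are made up, the documents are abridged lines from the practice refund policy used later in this tutorial, and simple word overlap stands in for a real search engine. It is not how Dify works internally.

```python
import re

# Toy document store: abridged lines from the practice refund policy (illustrative only).
DOCUMENTS = [
    "Refund Period: Refunds are only possible if registered with customer service within 7 days of receiving the product.",
    "Return Shipping Fee: For a simple change of mind, the customer must bear the round-trip shipping cost (6,000 KRW).",
    "Processing Time: Refunds are processed within 3-5 business days after inspection is completed.",
]

def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Return the top_k documents that share the most words with the question."""
    question_words = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(question_words & tokenize(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context_chunks):
    """Put the retrieved text (the 'cheat sheet') in front of the question."""
    context = "\n".join(context_chunks)
    return f"[Context]\n{context}\n\n[Question]\n{question}"

question = "How much is the shipping cost for a change of mind?"
prompt = build_prompt(question, retrieve(question, DOCUMENTS))
print(prompt)  # this prompt would then go to the LLM for the 'Generate' step
```

The LLM only ever sees the question together with the retrieved text, which is why its answer tends to stay close to your documents.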


Dify is the tool that lets you build this complex open-book test system without writing any code.


Step 1: Setting up the Dify Workspace & Connecting the Brain (API)

While you can install Dify on your local computer, we will use the cloud version, which has a free plan.



  1. Open your web browser, go to Dify.ai, click [Get Started], and log in for free using your GitHub or Google account.

  2. We need to set up the base 'Brain (LLM)' this chatbot will use to write. Click your profile in the top right corner and go to [Settings] -> [Model Provider].

  3. If you have a paid API key for OpenAI (GPT-4o) or Anthropic, enter it here. If you don't, select one of the free open-source models, or a model that comes with free credits offered by Dify.


Step 2: Building Your Own Library (Mastering Chunking & Index Settings)

This is the core process where you upload your 'custom data' for the chatbot to reference during its open-book test, and refine it so the AI can read it easily.



  1. From the top menu, click the [Knowledge] tab and press [Create Knowledge].

  2. Upload the materials (PDFs, Text files) you want the chatbot to reference.

    • [Practice Copy-Paste Data] If you don't have a document ready right now, copy the virtual refund policy below, save it as a Refund_Policy.txt file, and upload it!


[Virtual Mall 'BlueMarket' Refund Policy]


  1. Refund Period: Refunds are only possible if registered with customer service within 7 days of receiving the product.

  2. Non-Refundable Reasons: Refunds are strictly prohibited for damaged packaging due to a change of mind, signs of use, fresh food, and clearance sale items.

  3. Return Shipping Fee: Fully covered by us in case of product defects. For a simple change of mind, the customer must bear the round-trip shipping cost (6,000 KRW).

  4. Processing Time: Automatic refunds to the original payment method will be processed within 3-5 business days after the returned product arrives and inspection is completed.


  3. Click [Next], and a detailed settings window will appear asking how you want to process the data. Let's select Custom and set it up like a pro!


โš™๏ธ 1. Chunk Settings

If you give an AI a 100-page book all at once, it gets confused. This is the process of chopping the document into smaller paragraph units (Chunks).



  • Maximum chunk length (Default 1024): The maximum length of a text snippet the AI reads at one time. Anywhere between 1024 and 2000 is safe.

  • Chunk overlap (Default 50): If you cut snippets too sharply, the context between them can break. Therefore, this acts as a 'context glue' by overlapping the last 50 characters of the previous paragraph with the beginning of the next one.
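
The two settings above can be sketched in a few lines of Python. This is a simplified illustration, assuming purely character-based splitting. The function name `chunk_text` is hypothetical, and Dify's own splitter may also break text on delimiters such as blank lines.

```python
def chunk_text(text, max_length=1024, overlap=50):
    """Split text into pieces of at most max_length characters.

    Every piece after the first begins with the last `overlap` characters
    of the previous piece, so context is not cut off at the boundary.
    """
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must be >= 0 and smaller than max_length")
    step = max_length - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_length])
        if start + max_length >= len(text):
            break  # this piece already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(sample, max_length=10, overlap=3)
print(pieces)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Notice how 'hij' closes the first piece and opens the second: that shared tail is the 'context glue'.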


💰 2. Index Method - 🚨 Crucial for Free Users!

This determines how the chopped text is stored in the library.
If you haven't linked a paid embedding API (like OpenAI), you must pay attention here!


  • Economical (For Free Users): Extracts and stores only 10 keywords per chunk instead of calling an embedding model. If you don't have a paid API linked, you MUST select this option to avoid errors! The API token cost is $0.

  • High Quality (For Paid APIs): Uses a paid embedding API like text-embedding-3-large to store the 'meaning itself' of the sentences as mathematical coordinates (Vectors). The accuracy is overwhelmingly higher, but a small API cost applies.
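
To get a feel for what a keyword-based index looks like, here is a rough Python sketch in the spirit of the Economical option. It is only an assumption-laden illustration: the stop-word list, the function names, and the frequency-based keyword choice are all made up for this example and are not Dify's actual extractor.

```python
import re
from collections import Counter, defaultdict

# Small, hypothetical stop-word list; a real extractor would use a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "is", "are", "for",
              "and", "by", "if", "with", "be", "will", "must", "only"}

def extract_keywords(chunk, limit=10):
    """Return up to `limit` of the most frequent non-stop-words in a chunk."""
    words = [w for w in re.findall(r"[a-z]+", chunk.lower()) if w not in STOP_WORDS]
    return [word for word, _ in Counter(words).most_common(limit)]

def build_keyword_index(chunks, limit=10):
    """Map each keyword to the positions of the chunks that contain it."""
    index = defaultdict(set)
    for position, chunk in enumerate(chunks):
        for keyword in extract_keywords(chunk, limit):
            index[keyword].add(position)
    return index

chunks = [
    "Refund Period: Refunds are only possible within 7 days of receiving the product.",
    "Return Shipping Fee: Fully covered by us in case of product defects.",
]
index = build_keyword_index(chunks)
print(sorted(index["product"]))  # [0, 1]
```

A lookup in this kind of index is just a dictionary access, which is why no embedding API (and no token cost) is involved. The trade-off is that it can only match words that literally appear in the text.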


🔎 3. Retrieval Setting

This setting chooses how the library finds the correct answer when a user asks a question. (Some features are limited if Economical is selected.)


  • Full-Text Search: If the user types "Refund," it only fetches paragraphs containing the exact word "Refund." (Mainly used with the Economical setting.)

  • Vector Search: Understands the context. Even if you search "Give me my money back," it smartly finds the "Refund Policy" paragraph with a similar meaning. (Requires a High Quality index.)

  • Hybrid Search (Recommended): Runs keyword matching and contextual (vector) search together and combines the results, so it catches both exact terms and similar meanings. If you connect a paid API later, definitely use this option!
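
The difference between the three modes can be sketched with a toy scoring function. Please read this as an illustration under heavy assumptions: the 2-D vectors are hand-made stand-ins for real embeddings, the function names are hypothetical, and a weighted blend is just one common way to merge scores, not necessarily how Dify ranks results.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query, text):
    """Fraction of query words that appear verbatim in the text."""
    query_words = query.lower().split()
    if not query_words:
        return 0.0
    text_words = set(text.lower().split())
    return sum(w in text_words for w in query_words) / len(query_words)

def hybrid_score(query, query_vec, text, text_vec, weight=0.5):
    """Blend both scores; `weight` is the share given to the vector score."""
    vector_part = cosine_similarity(query_vec, text_vec)
    return weight * vector_part + (1 - weight) * keyword_score(query, text)

# Hand-made 2-D vectors standing in for real embeddings (purely illustrative).
chunks = {
    "refund policy for returned products": [0.9, 0.1],
    "store opening hours": [0.1, 0.9],
}
query, query_vec = "give me my money back", [0.8, 0.2]

best = max(chunks, key=lambda text: hybrid_score(query, query_vec, text, chunks[text]))
print(best)  # refund policy for returned products
```

Here the query shares no words with either chunk, so the keyword score is 0 for both and the vector score alone picks the refund chunk. With a query like "refund", the keyword part would contribute as well.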

  4. If you are a free user, select Economical in the Index Method, and click [Save and Process] at the bottom right! Your custom data library is now complete.


Step 3: AI Agent (Chatbot) Studio Setup & Knowledge Connection

Now that we've built the library, let's create the 'Chatbot App' that will act as the librarian.



  1. Navigate to the [Studio] tab from the top menu and click the [Create from Blank] button.

  2. 🚨 Warning (Latest UI Update): By default, you will prominently see 'Workflow' and 'Chatflow' on the screen. Don't panic! Click the small text right below them that says MORE BASIC APP TYPES >!

  3. From the expanded menu, select [Chatbot]. (If you choose Chatflow, you will get a complex node-design screen, so you must select Chatbot.)

  4. For the App Name, write 'BlueMarket Support AI' to match our uploaded text, and click the [Create] button at the bottom right.



  5. After clicking Create, you will enter the 'Orchestrate' screen to set up the chatbot's brain. This is the most crucial prompt and knowledge connection step!

  6. Find the Knowledge area around the center of the screen, click the [+ Add] button on the right, and select the library document you created in Step 2 to link it. (Now the chatbot can use your document as a cheat sheet!)

  7. In the largest text box right above it, INSTRUCTIONS, write down the AI's code of conduct. This is the core of RAG.


[The Magical RAG Prompt]

You are a friendly customer service CS agent for BlueMarket.
You must answer questions based strictly on the information provided in the [Context] document.
If the information requested by the user is not found in the [Context]



Step 4: Test & Embed on Your Website

Now, let's have a conversation in the 'Debug & Preview' chat window on the right. Ask it: "How much is the return shipping fee for a simple change of mind?"



The AI will spot the exact information from your uploaded document and accurately reply: "For a simple change of mind, the customer must bear the round-trip shipping cost (6,000 KRW)." Below the answer, it clearly displays 'which document and which paragraph it pulled this information from (Citation Source)'. Perfect, right?


You can't keep this chatbot all to yourself. Let's embed it into your shopping mall, blog, or company intranet.


  1. Click the blue [Publish] button in the top right corner and click [Update].

  2. After that, click the [Embed on site] button.

  3. Just like embedding a YouTube video, a short HTML code in the form of an <iframe> or <script> tag will be generated.

  4. Simply copy this code and paste it into the HTML code of your WordPress, Shopify, or company website! A round chatbot icon will appear in the bottom right corner of the screen, opening a perfectly customized, 24/7 AI customer service center!


Conclusion

In the past, building a custom AI chatbot connected to your company's database required forming a specialized development team and spending tens of thousands of dollars. It involved the grueling process of parsing documents, saving them to vector DBs, and coding with LangChain.

However, using Dify.ai as we did today, even non-developers can build an enterprise-grade RAG system in just 10 minutes.

It's time to move beyond simply chatting with standard ChatGPT and build 'your very own AI fed and raised on the data of your life and your business'. Try uploading your thickest PDF manuals or the practice text above to Dify right now!

In the next a2set tutorial, we will cover an advanced course where this Dify chatbot goes beyond simply answering questions to becoming a 'Tool-integrated Agent' that can directly send emails and register events on Google Calendar. Stay tuned!