Bangla QnA Bot (RAG + LangChain)

Overview

Bangla QnA Bot is a Bengali question‑answering system built with Retrieval‑Augmented Generation (RAG). It indexes a document corpus, retrieves the most relevant passages for a query, and then generates grounded answers in Bangla via an LLM orchestrated with LangChain.

Key Features

End‑to‑end RAG pipeline: loaders → chunking → embeddings → vector store → retriever → generator.
Bengali‑aware preprocessing for cleaner sentence boundaries and better retrieval.
Configurable backends: swap embedding models and LLMs without changing app logic.
Grounded answering: retrieved passages are passed to the LLM as context to keep responses faithful to the source.
CLI / API ready: simple interfaces for local testing or app integration.
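
One way the "configurable backends" idea could look is sketched below, as an illustration only and not the project's actual code. The names Embedder, LengthEmbedder, EMBEDDERS, and make_embedder are hypothetical, and the toy embedder stands in for a real SBERT or Hugging Face model.

```python
from typing import Callable, Dict, List, Protocol


class Embedder(Protocol):
    """Hypothetical interface: anything with this method can be plugged in."""

    def embed(self, texts: List[str]) -> List[List[float]]: ...


class LengthEmbedder:
    """Toy stand-in for a real embedding model (assumption, illustration only)."""

    def embed(self, texts: List[str]) -> List[List[float]]:
        # One-dimensional "embedding": the character length of each text.
        return [[float(len(t))] for t in texts]


# Registry mapping a config string to a factory; a real backend would be added here.
EMBEDDERS: Dict[str, Callable[[], Embedder]] = {"toy-length": LengthEmbedder}


def make_embedder(name: str) -> Embedder:
    """Look up a backend by name so app logic never imports a concrete model."""
    try:
        return EMBEDDERS[name]()
    except KeyError:
        raise ValueError(f"unknown embedder: {name!r}") from None
```

With this shape, switching models is a one-line configuration change (for example, make_embedder("toy-length")), and the same pattern could apply to the LLM wrapper.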

Tech Stack

LangChain for chains, retrievers, and prompt orchestration
Vector store: FAISS / Chroma (configurable)
Embeddings: SBERT / Hugging Face models suited to Bangla (configurable)
LLM: pluggable (OpenAI/Gemini/Local) via LangChain wrappers
Python; FastAPI (API) / Streamlit (optional UI)

How It Works

Ingest & Chunk: Documents are loaded and split using Bengali‑aware rules, with the danda (।) as the primary sentence separator.
Embed & Index: Chunks are embedded and stored in a vector database.
Retrieve: For a user query, the retriever returns the top‑k most semantically similar chunks.
Generate: The LLM receives the query + retrieved context and returns a Bangla answer.
(Optional) Cite: Return source snippets for transparency.
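
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: every function name is hypothetical, the character-trigram "embedding" is a toy stand-in for a real SBERT/Hugging Face model, the in-memory list stands in for FAISS/Chroma, and the prompt wording is invented for the example.

```python
import math
import re
from collections import Counter

DANDA = "\u0964"  # Bengali full stop "।"


def split_sentences(text):
    """Split on the danda (and ? / !), keeping the terminator with its sentence."""
    parts = re.findall(rf"[^{DANDA}?!]+[{DANDA}?!]?", text)
    return [p.strip() for p in parts if p.strip()]


def chunk_sentences(sentences, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars."""
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + 1 + len(s) > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks


def embed(text, n=3):
    """Toy embedding (assumption): counts of character n-grams."""
    t = re.sub(r"\s+", " ", text.strip())
    return Counter(t[i:i + n] for i in range(max(len(t) - n + 1, 1)))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query, contexts):
    """Assemble query + retrieved context; the wording is illustrative only."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer in Bangla using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )


text = "ঢাকা বাংলাদেশের রাজধানী। পদ্মা একটি নদী। তুমি কি জানো?"
chunks = chunk_sentences(split_sentences(text), max_chars=30)
top = retrieve("বাংলাদেশের রাজধানী কী?", chunks, k=1)
prompt = build_prompt("বাংলাদেশের রাজধানী কী?", top)
```

The resulting prompt string is what would be sent to the LLM; returning the retrieved chunks alongside the answer covers the optional citation step.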

Rawhatur Rabbi
