RAGFlow is a self-hostable, Apache 2.0 open-source RAG engine that turns your company's documents into a searchable, citation-backed knowledge base. It does most of what Glean charges $97,500 a year to do, and runs on a $40-a-month server.
RAGFlow, an open-source retrieval-augmented generation (RAG) engine from InfiniFlow, replaces the core function that enterprise search platforms like Glean charge a median of $97,500 per year to perform: turning your company's scattered documents into a system that actually answers questions. RAGFlow is free to self-host, licensed under Apache 2.0, and sits at 61,000 GitHub stars as of this writing.
That gap between what Glean costs and what RAGFlow costs is not an edge case. It is the entire business model of a category.
What the Paid Version Actually Does
Glean is the enterprise search product that shows up in procurement conversations at mid-sized companies because it promises one thing well: you connect your Slack, Confluence, Google Drive, Salesforce, and GitHub, and Glean indexes all of it so employees can search across everything from one place. Ask it who owns a particular project and it surfaces the answer from a Slack thread three months ago. Ask it what the refund policy is and it finds the Confluence page, not just the folder it lives in.
The problem is the price. Glean does not publish its pricing. Based on buyer-reported data from Vendr, the median annual contract is $97,500, with per-user costs ranging from $25 to $50 per month depending on seat count. The entry point commonly requires a 100-seat minimum, which at $50 per user per month puts the floor around $60,000 per year before any implementation or onboarding work. Larger deployments regularly exceed $200,000 annually. There is also a mandatory support fee, reported at around 10 percent of contract value, on top of that.
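The arithmetic behind those figures can be checked with a short sketch. It uses only the buyer-reported numbers quoted above. The function and constant names are made up for illustration, and applying the 10 percent support fee to the median contract is an assumption about how the fee is calculated.

```python
# Back-of-envelope Glean pricing from the buyer-reported figures in this article.
# Illustrative only: these are not official prices.

SEAT_MINIMUM = 100              # commonly reported entry-level seat minimum
PER_USER_MONTHLY = (25, 50)     # reported per-user range, USD per month
SUPPORT_FEE_RATE = 0.10         # reported support fee as a share of contract value
MEDIAN_CONTRACT = 97_500        # buyer-reported median annual contract, USD


def annual_contract(seats: int, per_user_monthly: float) -> float:
    """Annual licence cost for a seat count and a monthly per-user rate."""
    return seats * per_user_monthly * 12


def with_support(contract: float, rate: float = SUPPORT_FEE_RATE) -> float:
    """Contract value plus the support fee charged on top of it (assumed flat rate)."""
    return contract * (1 + rate)


low = annual_contract(SEAT_MINIMUM, PER_USER_MONTHLY[0])
high = annual_contract(SEAT_MINIMUM, PER_USER_MONTHLY[1])
median_total = with_support(MEDIAN_CONTRACT)

print(f"100 seats at $25/user/month: ${low:,.0f} per year")
print(f"100 seats at $50/user/month: ${high:,.0f} per year")
print(f"Median contract plus support fee: ${median_total:,.0f} per year")
```

The $60,000 floor corresponds to the top of the per-user range, which is consistent with smaller seat counts paying the higher rate.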
For a 50-person company, this is not a rounding error. It is often the third or fourth largest software line item.
What RAGFlow Actually Does
RAGFlow approaches the same problem from a different angle. Rather than indexing everything across all your SaaS tools, it focuses on making your documents intelligent. You upload PDFs, Word files, spreadsheets, scanned reports, or point it at a folder, and RAGFlow does something most RAG tools skip: it reads the document the way a human would.
That means it understands table structure, respects heading hierarchy, recognizes figure captions, and handles scanned PDFs with OCR before any of that content reaches a vector store. Most cheaper RAG implementations treat a PDF as a flat string of text and lose all the structure. RAGFlow preserves it, which matters enormously when the answer is buried in a cell of a compliance table or a footnote in a contract.
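To make the difference concrete, here is a toy sketch of flat chunking versus heading-aware chunking. It is not RAGFlow's parser. The sample text, the "# " heading convention, and both function names are assumptions chosen for illustration.

```python
# Toy comparison: fixed-size chunking vs. heading-aware chunking.
# Illustrative only; this does not reproduce RAGFlow's document parsing.

SAMPLE = """\
# Refund Policy
Refunds are issued within 30 days of purchase.

# Shipping Policy
Orders ship within 2 business days.
"""


def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Split text every `size` characters, ignoring any structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def heading_chunks(text: str) -> list[dict]:
    """Group non-empty lines under the most recent '# ' heading."""
    sections: list[dict] = []
    current = None
    for line in text.splitlines():
        if line.startswith("# "):
            current = {"heading": line[2:].strip(), "lines": []}
            sections.append(current)
        elif line.strip() and current is not None:
            current["lines"].append(line.strip())
    return [
        {"heading": s["heading"], "body": " ".join(s["lines"])}
        for s in sections
    ]


flat = fixed_size_chunks(SAMPLE, 40)
structured = heading_chunks(SAMPLE)

for chunk in flat:
    print(repr(chunk))          # chunks can cut across sentences and headings
for section in structured:
    print(section["heading"], "->", section["body"])
```

The flat chunks break wherever the character count runs out, so a sentence can end up separated from the heading that gives it meaning. The heading-aware version keeps each body attached to its heading, which is the property the paragraph above describes.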
Once your documents are ingested, RAGFlow gives you a chat interface where employees can ask questions in plain language. Every answer comes with citations, pointing back to the exact document and section that generated the response. That traceability is not a nice-to-have for legal, compliance, and finance teams. It is the difference between using the tool and not using the tool.
RAGFlow also supports agentic capabilities, meaning it can chain retrieval steps, invoke external APIs, and handle multi-step queries that require reasoning across multiple documents. That makes it more than a search box and moves it in the same direction Glean itself is heading.
The Setup Reality
None of this is one-click. RAGFlow is a real system with real infrastructure requirements. To run it locally or on a server, you need Docker, and behind the scenes it spins up MySQL for metadata, Redis for caching and task queuing, MinIO for document storage, and Elasticsearch for vector and keyword indexing.
For a team comfortable with developer tooling, the Docker Compose setup is documented and generally works. You clone the repo, copy an environment file, run docker compose up, and you have a running instance within twenty to thirty minutes. Connecting it to an LLM API (OpenAI, Anthropic, DeepSeek, or a locally running model via Ollama) takes another ten minutes.
For a non-technical team, this is not realistic without someone to own it. You are not buying a product with a support team, onboarding calls, or an account manager who calls you when something breaks. You are operating infrastructure.
The hosting costs are manageable. A small team running a single-tenant deployment fits comfortably in $20 to $40 per month of cloud compute, plus whatever LLM API spend the query volume generates. At a few hundred queries a day with a mid-tier model, that API cost is typically under $50 a month. So the total all-in cost for a 50-person company is likely under $100 a month, versus roughly $8,000 a month for Glean at the median contract.
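A rough sketch of that comparison, using only the figures in this article. The variable names are illustrative, and the calculation deliberately leaves out the staff time needed to run the infrastructure, which the article does not put a number on.

```python
# Rough monthly cost comparison from the figures quoted in this article.
# Excludes the staff time needed to operate a self-hosted deployment.

GLEAN_MEDIAN_ANNUAL = 97_500    # buyer-reported median contract, USD per year
HOSTING_MONTHLY = (20, 40)      # cloud compute range for a small deployment, USD
LLM_API_MONTHLY_CAP = 50        # "typically under $50" at a few hundred queries a day

glean_monthly = GLEAN_MEDIAN_ANNUAL / 12
ragflow_monthly_max = HOSTING_MONTHLY[1] + LLM_API_MONTHLY_CAP
ratio = glean_monthly / ragflow_monthly_max

print(f"Glean at the median contract: ${glean_monthly:,.0f} per month")
print(f"RAGFlow upper estimate: ${ragflow_monthly_max} per month")
print(f"Glean costs about {ratio:.0f}x as much")
```

Even at the top of both ranges, the self-hosted estimate stays under $100 a month, so the conclusion does not depend on picking the cheapest hosting option.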
What you give up is the SaaS connector library. RAGFlow does not natively index your Slack history or your Salesforce records. If your search problem is fundamentally about querying across live SaaS integrations and not documents, RAGFlow is only a partial answer. You would need to export data to it, or combine it with a connector layer. Glean's native integrations remain a real advantage for organizations that need cross-tool search without any engineering work.
Where the Match Is Strongest
RAGFlow is a particularly strong fit for companies where the knowledge problem is document-heavy rather than SaaS-heavy. Law firms handling client documents. Marketing agencies sitting on years of brand briefs and research reports. Finance teams with regulatory filings, model assumptions, and policy PDFs. Manufacturers with technical manuals, compliance records, and supplier documentation.
These organizations often do not have Glean-sized budgets for search, and their knowledge lives mostly in files rather than spread across cloud tools. RAGFlow gives them a functional, citation-backed answer system for a fraction of the cost, with the trade-off that someone on the team has to own and maintain the infrastructure.
The Apache 2.0 license also means there are no usage restrictions, no per-seat fees that escalate as the company grows, and no feature gates. Everything in the open-source build is available.
RAGFlow Cloud exists if self-hosting is genuinely not an option. It handles the infrastructure burden for a fee, although pricing is not published openly. That path closes part of the setup gap, but it also closes part of the cost gap.
The Larger Pattern
RAGFlow will not replace Glean for every organization. But for the company paying $97,500 a year for a tool employees mostly use to search old meeting notes and policy documents, it represents a genuinely different option, one that did not exist at this level of completeness two years ago.
The more interesting question is not whether RAGFlow matches Glean feature for feature. It is how many of Glean's customers actually need all of Glean's features, versus how many are paying for a category because they did not yet know it had open-source alternatives.