Langfuse is a self-hostable, MIT-licensed LLM observability platform with 28,000 GitHub stars that replaces paid tools like Datadog LLM Monitoring and LangSmith, dropping monthly costs from thousands of dollars to the price of a single cloud instance. It just wrapped its fifth launch week with full-text search, CI/CD-integrated experiments, and an expanded MCP server.

Langfuse, the open-source LLM engineering platform, can replace the paid observability stack most AI teams are currently running - primarily Datadog's LLM monitoring add-on or LangChain's LangSmith product - and drop the monthly cost from somewhere between $500 and several thousand dollars to roughly $50 to $150 in cloud infrastructure. It has crossed 28,000 GitHub stars, records more than 50 million SDK installs per month, and just wrapped its fifth launch week with features that bring it meaningfully closer to production-ready for teams without a deep engineering bench.

What the problem actually looks like

If your team is building anything with AI right now - a support bot, a document summarizer, an internal copilot - you are almost certainly flying partially blind. You know when the thing crashes. You do not reliably know when it gives a wrong answer, costs more than expected on API calls, starts hallucinating on a specific category of questions, or degrades over a model update you did not choose. That visibility gap is what observability tools are supposed to close.

Datadog added LLM monitoring to its platform and charges approximately $8 per 10,000 requests for that layer. A modest production app generating one million traces per month comes to 100 blocks of 10,000 requests, or around $800 in Datadog fees, before you touch any of Datadog's existing infrastructure pricing. LangSmith, the observability and evaluation product from LangChain, runs on a seat-based model that pushes past $2,500 per month once a team's usage scales.

Neither of those numbers includes the engineering hours for custom evaluation logic, the cost of a separate prompt management system, or the fees for any guardrails layer.

What Langfuse actually does

Langfuse is a dashboard, SDK, and API that sits between your application and your AI models. Every time your app calls a language model, Langfuse records the input, the output, the latency, the token count, and the cost. It stores that record as a "trace," and you can browse, filter, search, and annotate those traces in a web interface.
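To make the idea of a trace concrete, here is a minimal sketch of the kind of record described above, written with only the Python standard library. It is not Langfuse's actual schema or SDK: the TraceRecord class, its field names, and the per-token prices are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical per-token prices in USD; real prices vary by model and provider.
PRICE_PER_INPUT_TOKEN = 2.50 / 1_000_000
PRICE_PER_OUTPUT_TOKEN = 10.00 / 1_000_000


@dataclass
class TraceRecord:
    """Illustrative shape of one recorded model call (not Langfuse's schema)."""

    name: str
    input_text: str
    output_text: str
    latency_ms: float
    input_tokens: int
    output_tokens: int
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    @property
    def cost_usd(self) -> float:
        # Cost is derived from token counts and the assumed prices above.
        return (
            self.input_tokens * PRICE_PER_INPUT_TOKEN
            + self.output_tokens * PRICE_PER_OUTPUT_TOKEN
        )


trace = TraceRecord(
    name="support-bot-reply",
    input_text="How do I update my billing address?",
    output_text="Go to Settings, then Billing, then Edit address.",
    latency_ms=840.0,
    input_tokens=1_000,
    output_tokens=200,
)
print(trace.name, trace.total_tokens, round(trace.cost_usd, 6))
```

An observability tool stores many records like this one and lets you filter and aggregate them, for example by cost per day or latency per prompt version.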

That sounds straightforward because the core concept is simple. The depth is in what you do with the traces. Langfuse layers evaluations on top - automated checks that score each response for groundedness, hallucination, or custom criteria you define. It has a prompt management system so your team can update prompts without a code deploy and see which version performs better. It runs experiments you can wire into a CI/CD pipeline so a regression does not reach production without tripping an alert.
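The CI/CD idea can be sketched in a few lines. The example below assumes you already have evaluation scores between 0 and 1 for a baseline prompt and a candidate prompt on the same test set. The regression_gate function, the sample scores, and the 0.05 threshold are hypothetical; this is a generic illustration of gating a deploy on a quality drop, not Langfuse's experiment API.

```python
from statistics import mean


def regression_gate(baseline_scores, candidate_scores, max_drop=0.05):
    """Compare mean scores and report whether the candidate is acceptable.

    Returns (passed, drop), where drop is the baseline mean minus the
    candidate mean. The gate passes when drop does not exceed max_drop.
    """
    drop = mean(baseline_scores) - mean(candidate_scores)
    return drop <= max_drop, drop


# Hypothetical evaluation scores for the same four test questions.
baseline = [0.9, 0.8, 1.0, 0.9]
candidate = [0.7, 0.8, 0.8, 0.7]

passed, drop = regression_gate(baseline, candidate)

# A CI script would hand this value to sys.exit() so the job fails on 1.
exit_code = 0 if passed else 1
print(f"mean score drop: {drop:.2f}, gate passed: {passed}")
```

In a pipeline, a non-zero exit code from a script like this is what stops the regression from shipping.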

The practical business translation: when your support bot starts giving confidently wrong answers to billing questions, you see it in minutes rather than hearing about it from customers. When you switch from GPT-4o to a cheaper model, you can measure whether quality actually dropped before you commit. When a new hire edits a system prompt, you have a record of what changed and what happened afterward.

The self-hosting economics

The MIT license covers the full product. There are no features gated behind a paid tier on the self-hosted version, no seat caps, and no retention limits baked into the license. You run it yourself on PostgreSQL, ClickHouse, and Redis - components that are themselves free and widely supported by cloud providers.

A realistic self-hosted setup on a major cloud provider costs between $50 and $150 per month for the infrastructure, depending on traffic volume and whether you use managed databases or provision your own. At one million traces per month, that compares against roughly $919 on Langfuse's own cloud product, $800 or more on Datadog, and $2,500 or more on LangSmith. The self-hosted path is not free - someone has to maintain it - but the savings at any meaningful scale are real enough to absorb a few hours of engineering time per month.
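The comparison is simple enough to check by hand, but a short script makes the assumptions explicit. The monthly figures below are the ones quoted in this article. The self-hosted cost uses the upper end of the $50 to $150 range, and the $100 hourly engineering rate is a hypothetical number added only to show how a break-even estimate would work.

```python
TRACES_PER_MONTH = 1_000_000
DATADOG_USD_PER_10K_REQUESTS = 8.0

# 1,000,000 / 10,000 = 100 billing blocks at $8 each.
datadog_monthly = TRACES_PER_MONTH / 10_000 * DATADOG_USD_PER_10K_REQUESTS

SELF_HOSTED_MONTHLY = 150.0  # upper end of the article's $50 to $150 range

monthly_usd = {
    "Datadog LLM monitoring": datadog_monthly,
    "Langfuse Cloud": 919.0,
    "LangSmith": 2500.0,
}

# Monthly savings from self-hosting instead of each paid option.
savings = {name: cost - SELF_HOSTED_MONTHLY for name, cost in monthly_usd.items()}

# Hypothetical loaded engineering rate, used only for the break-even estimate.
ENGINEER_USD_PER_HOUR = 100.0
break_even_hours = savings["Datadog LLM monitoring"] / ENGINEER_USD_PER_HOUR

for name, cost in monthly_usd.items():
    print(f"{name:<24} ${cost:>7,.0f}/month, saves ${savings[name]:,.0f}")
print(f"Break-even vs Datadog: {break_even_hours:.1f} maintenance hours/month")
```

Under these assumptions, self-hosting stays cheaper than Datadog as long as maintenance takes fewer than about six and a half hours a month.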

Langfuse also offers a managed cloud tier starting at $29 per month with 100,000 observations included, which is the more sensible choice for teams smaller than about five people who do not want the operational overhead.

What you should know before committing

Self-hosting Langfuse is not a one-command operation for non-technical teams. It requires standing up a Docker Compose stack or a Kubernetes deployment, managing a database, and handling upgrades. The v4 architecture, which moves to an observation-centric ClickHouse data model for significantly faster query performance, is currently live on Langfuse's managed cloud but is still rolling out to the self-hosted path. If you self-host today, you are on the v3 architecture, which is stable and production-proven but not the newest version.

The evaluation features, while genuinely useful, still benefit from an engineer configuring them initially. The no-code evaluation builder works, but the most powerful customizations involve writing scoring logic in Python. A marketing team cannot set this up without at least one technical person involved in the initial configuration.
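For a sense of what that scoring logic can look like, here is a deliberately naive custom check in plain Python. It scores an answer by the fraction of its numbers that also appear in the source context, a crude stand-in for a groundedness evaluation. The numeric_groundedness function and the sample strings are hypothetical, and nothing here depends on the Langfuse SDK; a real setup would register a function like this with whatever evaluation hook the platform provides.

```python
import re

NUMBER_PATTERN = r"\d+(?:\.\d+)?"


def numeric_groundedness(answer: str, context: str) -> float:
    """Fraction of numbers in the answer that also appear in the context.

    Returns 1.0 when the answer contains no numbers, since there is
    nothing numeric to contradict.
    """
    answer_numbers = re.findall(NUMBER_PATTERN, answer)
    if not answer_numbers:
        return 1.0
    context_numbers = set(re.findall(NUMBER_PATTERN, context))
    supported = sum(1 for number in answer_numbers if number in context_numbers)
    return supported / len(answer_numbers)


context = "The Pro plan costs 49 dollars per month and includes 5 seats."
good_answer = "Pro is 49 dollars a month for 5 seats."
bad_answer = "Pro is 39 dollars a month for 5 seats."

good_score = numeric_groundedness(good_answer, context)
bad_score = numeric_groundedness(bad_answer, context)
print(good_score, bad_score)
```

A check this simple would miss most hallucinations, but it shows the shape of the work: a function that takes a response plus some reference material and returns a score the platform can chart and alert on.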

Langfuse was acquired by ClickHouse in January 2026. The open-source commitment and MIT license predate the acquisition and have remained intact, but any future pricing or licensing decisions now involve a larger company. The acquisition has accelerated development rather than slowing it - the v4 performance improvements are a direct result of deeper ClickHouse integration.

Where this sits in a real AI stack

Langfuse does not replace your AI models, your orchestration framework, or your application logic. It instruments whatever you already have. It works with OpenAI, Anthropic, Google Gemini, and essentially every other hosted model provider. It integrates with LangChain, LlamaIndex, CrewAI, and most other popular agent frameworks through its OpenTelemetry-native tracing layer. The recent Launch Week 5 additions include full-text search across traces and an expanded MCP server that lets AI tools like Claude pull production observations, metrics, and dataset runs directly, without the user leaving the AI tool's own interface.

The cost comparison with Datadog is not the whole story. Datadog is a general-purpose infrastructure monitoring platform that added LLM support. Langfuse is purpose-built for the AI observability problem, which means the concepts it surfaces - traces, spans, evaluations, prompts as versioned artifacts - map directly to how AI applications actually fail in production. That specificity is worth something beyond the price difference.

The closing observation

Teams that waited for open-source AI tooling to mature enough to trust in production no longer have to wait. The gap between what Langfuse offers today and what enterprise-priced tools offer has narrowed to the point where the remaining difference is mostly about who does the operations work. For most teams currently writing monthly checks to Datadog or LangSmith for AI monitoring, the math changed quietly in 2025 and the tools to act on it are ready now.