Why IoTDB Is Written in Java: A Decade of Engineering Trade-offs

Since I started working on Apache IoTDB, a time-series database, in 2016, I've been asked the same question again and again:

Why did you choose Java to build a database? Can Java really be used to write a database system?

In the early days, my standard answer was usually this:

When IoTDB was initiated in 2011, almost all influential distributed systems and databases were built in Java or on the JVM—such as Hadoop, HBase, Spark (Scala on JVM), Cassandra, Kafka, and Flink. To integrate deeply with the big data ecosystem, choosing Java was a natural decision.

That explanation is valid—but clearly insufficient.

What people really want to know is:

  • If you learn Java, do you actually have a chance to build a database?

  • Can Java be used to build a good database?

  • What does choosing Java really mean for a system like IoTDB?

  • ...

These questions cannot be answered by theory alone. The relationship between programming languages and databases is not a matter of ideology—it is a practical trade-off among language characteristics, system complexity, engineering investment, and long-term returns.

After nearly ten years of real-world exploration, we believe we can now give a more grounded answer. Below are the eight key considerations behind IoTDB's choice of Java.

A Mature and Comprehensive Java Ecosystem

Queues, maps, heaps, locks, thread scheduling—nearly every common data structure and concurrency primitive has mature, well-tested implementations in the Java ecosystem. This allows database developers to focus their energy on core database logic and performance optimizations, rather than repeatedly reinventing low-level infrastructure.
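As a small illustration of how far the standard library goes, the sketch below builds a thread-safe, time-ordered write buffer from nothing but java.util.concurrent. The class name TinyMemTable and its methods are hypothetical and are not IoTDB's actual memtable; the sketch only assumes that points arrive as (timestamp, value) pairs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; this is not IoTDB's real memtable.
final class TinyMemTable {
    // Sorted, thread-safe map from timestamp to value, provided by the JDK.
    private final ConcurrentSkipListMap<Long, Double> points = new ConcurrentSkipListMap<>();

    void write(long timestamp, double value) {
        points.put(timestamp, value);
    }

    // Values with startInclusive <= timestamp < endExclusive, in time order.
    List<Double> query(long startInclusive, long endExclusive) {
        return new ArrayList<>(
                points.subMap(startInclusive, true, endExclusive, false).values());
    }

    int size() {
        return points.size();
    }

    public static void main(String[] args) throws InterruptedException {
        TinyMemTable table = new TinyMemTable();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Four writers insert interleaved, non-overlapping timestamps concurrently.
        for (int w = 0; w < 4; w++) {
            final int writer = w;
            pool.submit(() -> {
                for (int i = 0; i < 1000; i++) {
                    long ts = i * 4L + writer;
                    table.write(ts, ts * 0.5);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("points stored: " + table.size());
        System.out.println("range [10, 13): " + table.query(10, 13));
    }
}
```

No lock, ordering, or scheduling code had to be written by hand here; the skip-list map and the thread pool are both stock JDK components.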

More importantly, Java is widely used across enterprise platforms and applications. Middleware components in the Java ecosystem integrate smoothly with each other, which makes a Java-based database considerably easier for developers to adopt. As a result, Java developers can more easily understand, operate, and extend a Java-written database system.

Code Readability and Long-Term Maintainability

This factor is often overlooked, but for someone who has spent years working on database internals, it is critical.

Databases are inherently complex systems. That complexity brings enormous optimization potential—but also substantial risk. A single subtle mistake can introduce severe bugs, which is why newer versions of some databases occasionally perform worse or become less stable than older ones.

Java's object-oriented design provides a natural advantage in code readability and conceptual clarity. In practice, we have found that many community contributors are able to ramp up quickly by understanding IoTDB's design principles and abstractions.

Readable code is not just a matter of elegance—it is a system's lifeline. Only readable and understandable codebases can sustain long-term evolution without collapsing under their own complexity.

Operability and Debugging Efficiency

Most Java developers are familiar with exception handling and detailed stack traces in logs—and those stack traces are invaluable.

In our experience, when users report bugs in IoTDB, engineers can usually locate the root cause the same day; debugging rarely takes longer than a day. The stack trace alone usually provides enough context to pinpoint the issue.
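To make the value of stack traces concrete, here is a minimal sketch of exception chaining. The class, the method names, and the series path are invented for illustration and do not come from the IoTDB codebase. It shows how wrapping a low-level failure with context keeps the original error attached as the cause.

```java
import java.io.IOException;

// Hypothetical names; the point is that a wrapped exception keeps its causal chain.
final class StackTraceDemo {
    // Simulates a low-level storage failure.
    static void readChunk(String file) throws IOException {
        throw new IOException("checksum mismatch in " + file);
    }

    // Adds query-level context while preserving the original exception as the cause.
    static void query(String series) {
        try {
            readChunk(series + ".chunk");
        } catch (IOException e) {
            throw new IllegalStateException("query failed for series " + series, e);
        }
    }

    public static void main(String[] args) {
        try {
            query("root.sg.d1.s1");
        } catch (IllegalStateException e) {
            // Prints both exceptions; each frame lists class, method, and line number.
            e.printStackTrace();
        }
    }
}
```

A log entry produced this way names both the failing query and the exact method where the underlying I/O error was raised, which is usually where an investigation should start.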

By contrast, developers of C-based databases have told us that diagnosing production issues such as memory leaks can sometimes take weeks or even months.

No language-level advantage matters more than system stability and recoverability. There is nothing more painful than a production database failure that cannot be quickly diagnosed or fixed.

JVM tooling such as JProfiler and Arthas gives Java developers powerful observability into runtime behavior, enabling fast root-cause analysis and remediation.

Cross-Platform Portability

Today, this topic usually comes up as hardware adaptation, or as localization in the sense of supporting domestically produced platforms.

Java's promise of "write once, run anywhere" has proven extremely valuable as domestic and heterogeneous hardware platforms have become more common. For IoTDB, we have rarely needed special platform-specific adaptations—if Java can run, IoTDB can run.
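One reason this works is that Java fixes the size of primitive types and lets code state byte order explicitly, so the same code writes the same bytes on every platform. The sketch below assumes a made-up 16-byte layout (8-byte timestamp, then 8-byte value, both big-endian); it is an illustration and not TsFile's actual encoding.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical fixed layout: 8-byte timestamp followed by 8-byte double, big-endian.
final class PortablePoint {
    static final int BYTES = Long.BYTES + Double.BYTES;

    static byte[] encode(long timestamp, double value) {
        return ByteBuffer.allocate(BYTES)
                .order(ByteOrder.BIG_ENDIAN) // explicit, independent of the host CPU
                .putLong(timestamp)
                .putDouble(value)
                .array();
    }

    static long decodeTimestamp(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getLong(0);
    }

    static double decodeValue(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getDouble(Long.BYTES);
    }

    public static void main(String[] args) {
        // These values differ from machine to machine; the encoded bytes do not.
        System.out.println("os.name      = " + System.getProperty("os.name"));
        System.out.println("os.arch      = " + System.getProperty("os.arch"));
        System.out.println("native order = " + ByteOrder.nativeOrder());

        byte[] encoded = encode(1L, 2.5);
        System.out.println("encoded      = " + Arrays.toString(encoded));
        System.out.println("decoded      = " + decodeTimestamp(encoded) + ", " + decodeValue(encoded));
    }
}
```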

This allows us to concentrate on core database logic and optimization, instead of spending engineering effort on platform compatibility.

Efficient Project and Dependency Management

For anyone joining the IoTDB project, the first essential skill is understanding Maven.

A large share of Java projects, large or small, use Maven for project structure, dependency management, compilation, packaging, and release workflows. Advanced tasks such as code formatting and static analysis can be standardized through Maven profiles.

This consistency significantly reduces onboarding costs. In fact, my earliest blog posts about IoTDB were introductions to Maven-based release pipelines.

Performance: The Question Everyone Cares About

The most common concern is simple:

Can a database written in Java actually perform well?

Let's start with facts.

In major public time-series database benchmarks, such as TPCx-IoT and benchANT, IoTDB ranks first in both read/write performance and cost efficiency. These benchmarks include databases written in:

  • Go (InfluxDB, VictoriaMetrics)

  • C (TimescaleDB)

  • C++ (ClickHouse)

IoTDB, written in Java, is not merely competitive—it leads.

Why?

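Databases are often described as the crown jewel of foundational software, but their difficulty comes from internal system complexity, not from language syntax or runtime mechanics. Performance is decided mainly by how that complexity is designed and optimized, and far less by the implementation language.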

As database functionality grows, system complexity increases exponentially—much like governing a large city with countless departments, workflows, and dependencies. This complexity creates vast optimization opportunities: columnar storage, batching, pipelining, indexing, and more. Optimizing even a single execution path can yield order-of-magnitude performance gains.
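Two of the techniques named above, columnar storage and batching, can be sketched in a few lines. The ColumnBatch class below is a hypothetical illustration, not IoTDB's storage engine: it keeps timestamps and values in separate primitive arrays, so an aggregation scans one contiguous array, and it is flushed once per batch instead of once per point.

```java
// Hypothetical illustration of a columnar batch; not IoTDB's actual data structure.
final class ColumnBatch {
    private final long[] timestamps; // time column
    private final double[] values;   // value column, stored contiguously
    private int size;

    ColumnBatch(int capacity) {
        this.timestamps = new long[capacity];
        this.values = new double[capacity];
    }

    boolean isFull() {
        return size == timestamps.length;
    }

    int size() {
        return size;
    }

    void append(long timestamp, double value) {
        if (isFull()) {
            throw new IllegalStateException("batch is full");
        }
        timestamps[size] = timestamp;
        values[size] = value;
        size++;
    }

    // Aggregates over the value column only; the time column is never read.
    double sum() {
        double total = 0;
        for (int i = 0; i < size; i++) {
            total += values[i];
        }
        return total;
    }

    void clear() {
        size = 0;
    }

    public static void main(String[] args) {
        ColumnBatch batch = new ColumnBatch(4);
        double total = 0;
        int flushes = 0;
        for (int i = 0; i < 10; i++) {
            batch.append(i, i);
            if (batch.isFull()) { // process a whole batch at once
                total += batch.sum();
                batch.clear();
                flushes++;
            }
        }
        if (batch.size() > 0) { // flush the final partial batch
            total += batch.sum();
            batch.clear();
            flushes++;
        }
        System.out.println("total=" + total + " flushes=" + flushes); // total=45.0 flushes=3
    }
}
```

Nothing in this layout is specific to Java; the same design decision would be available, and would matter just as much, in C++ or Go.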

Java's garbage collection is frequently criticized, but in practice, it is a net positive feature—analogous to memory defragmentation at the OS level. Modern JVM GC algorithms are the result of decades of global engineering effort and perform remarkably well.

For special scenarios, databases can:

  • design smarter caching strategies

  • use off-heap memory

  • isolate memory-sensitive components

and do so transparently at the database layer.
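As one example of the second option, the JDK's direct buffers allocate memory outside the garbage-collected heap. The sketch below is a minimal, hypothetical value column backed by such a buffer; the class name and API are assumptions made for illustration and do not describe how IoTDB manages its memory.

```java
import java.nio.ByteBuffer;

// Hypothetical off-heap column of doubles backed by a direct ByteBuffer.
final class OffHeapColumn {
    private final ByteBuffer buffer;
    private int count;

    OffHeapColumn(int capacity) {
        // Direct buffers live outside the Java heap, so their contents
        // are not copied or compacted by the garbage collector.
        this.buffer = ByteBuffer.allocateDirect(capacity * Double.BYTES);
    }

    void append(double value) {
        // Absolute put; throws IndexOutOfBoundsException once capacity is exceeded.
        buffer.putDouble(count * Double.BYTES, value);
        count++;
    }

    double get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + count);
        }
        return buffer.getDouble(index * Double.BYTES);
    }

    int size() {
        return count;
    }

    boolean isOffHeap() {
        return buffer.isDirect();
    }

    public static void main(String[] args) {
        OffHeapColumn column = new OffHeapColumn(1024);
        for (int i = 0; i < 1024; i++) {
            column.append(i * 0.1);
        }
        System.out.println("off-heap: " + column.isOffHeap() + ", size: " + column.size());
        System.out.println("value[10] = " + column.get(10));
    }
}
```

Callers see an ordinary Java object, which is the sense in which this kind of memory management can stay hidden inside the database layer.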

In our production environments, we have never encountered a case where Java GC itself was the performance bottleneck. When serious GC pauses occur, they usually indicate either misconfiguration or memory leaks—issues typically identified and resolved during testing, often within the same day.
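Checking whether GC is actually a problem does not require guesswork, because the JVM reports collector activity through the standard java.lang.management API. The sketch below is one possible way to read those counters; the GcStats class is hypothetical, and the reported numbers depend on the collector and JVM in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper that sums the GC counters exposed by the running JVM.
final class GcStats {
    // Approximate accumulated collection time in milliseconds, across all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Total number of collections that have occurred, across all collectors.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Create some short-lived garbage so the counters have something to show.
        long checksum = 0;
        for (int i = 0; i < 2000; i++) {
            byte[] junk = new byte[1 << 20];
            junk[i % junk.length] = 1;
            checksum += junk[i % junk.length];
        }
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total GC time (ms): " + totalGcMillis() + ", checksum=" + checksum);
    }
}
```

Sampling these counters periodically and comparing the collection time with wall-clock time gives a rough picture of how much of the process's life is spent in GC.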

A database is a holistic system. No single technical advantage or disadvantage defines its success.

Lightweight Deployment Scenarios

Another frequent concern is whether Java databases can be deployed in edge or constrained environments.

There are two distinct scenarios:

Intelligent terminals

These devices may have limited resources (single-core CPU, 1–2 GB memory, tens of GB storage) but still support a full software stack. In such cases, Java poses no issue.

IoTDB can operate with memory footprints of just a few hundred megabytes, easily meeting edge read/write workloads. It is already running stably in satellite systems, airborne platforms, and power data collection terminals.

Embedded environments

Some embedded systems only support C/C++ runtimes, with tens of megabytes of memory and strict real-time constraints.

In many such cases, a full database is unnecessary; a lightweight file-based approach is often more appropriate. For this reason, we typically deploy the C++ implementation of TsFile, IoTDB's time-series file format, on the device side and upload files upstream.

P.S. The C++ version of TsFile will be open-sourced soon.

Industrial control algorithms rarely require long-term historical data stored in databases. Real-time control logic prioritizes low time complexity and often keeps required historical data fully cached in memory.

As hardware capabilities improve, the focus should shift toward better data processing models, not merely raw resource constraints.

A Strong Java Talent Pool

A database company is not just about code—it depends on a reliable development and operations team.

Although database systems attracted significant attention during recent waves of innovation, the number of engineers working on database kernels remains small compared with application-layer development.

Our experience shows that excellent Java developers can successfully transition into database kernel development. They ramp up quickly, take ownership of modules, and begin contributing meaningful code in a short time.

So why does the perception persist that Java cannot be used to build databases?

Largely due to historical reasons.

Relational databases originated in the 1970s, while Java was introduced in 1995. By the time Java emerged, the major databases were already established in C or C++, in some cases for more than a decade:

  • Oracle (1977)

  • PostgreSQL (1986)

  • MySQL (1995)

Early skepticism also surrounded the commercial viability of databases themselves—until Oracle proved otherwise.

Java followed a similar trajectory. Today, many high-performance middleware systems and databases—including IoTDB, Cassandra, and H2—demonstrate that Java performance is more than sufficient for database development.

Looking back over IoTDB's decade-long journey, our ability to rapidly iterate on user demands while maintaining high stability and performance owes a great deal to Java.

Java is not only capable of building databases—it is well-suited for the task. This is not a theoretical claim, but a conclusion drawn from practice.

If you are a Java developer, you absolutely have the opportunity to build an excellent database.

If you’re curious about IoTDB, feel free to explore the project on GitHub—and join the community discussions and contributions.