How many databases should you create? How should you model your data to fully utilize hardware resources?
When deploying Apache IoTDB in distributed mode, teams often face the same challenge: how to scale throughput without over-fragmenting the system. This article answers the most frequently asked questions about IoTDB distributed deployment and data modeling.
Recently, during a distributed deployment discussion, a user asked:
Most examples on the IoTDB website focus on smart factory scenarios. Is there a more general data modeling approach? Would creating one database per state improve performance? How should hierarchical paths be structured, like
root.<state>.<license_plate>.<device_type>.<device_id>.<measurement>?
These questions touch several critical architectural concepts in IoTDB. Let’s address them step by step.
P.S. Applies to IoTDB 1.x and 2.x.
Do You Need Multiple Databases for Performance?
The short answer is:
No.
IoTDB is a distributed database. It does not require manual database sharding to achieve high throughput. Even a single database can fully utilize machine resources when properly configured.
That said, multiple databases may still be appropriate for semantic or operational reasons:
Different time partition intervals
Different Region counts
Independent permission control
Strong data isolation between business domains
It is important to note that:
Data across databases is isolated.
Cross-database queries are not supported.
Therefore, multiple databases are suitable when strict business isolation is required — not for performance tuning.
The key to distributed performance in IoTDB lies elsewhere — in a core abstraction called Region.
What Is Region? How Should You Tune Region Count?
Fundamentals
Region is one of the most important internal abstractions in IoTDB. Depending on perspective, Region has different roles:
From a distributed systems perspective → a data shard instance
From a storage engine perspective → a serial-write IoT-LSM engine instance
From a replication perspective → the unit of high availability
In practice, Region defines the true concurrency boundary of IoTDB.
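To make "Region as the concurrency boundary" concrete, here is a toy Python sketch, not IoTDB code. Each hypothetical Region gets a single-threaded executor, so writes to one Region are applied serially while different Regions proceed in parallel. The hash-based `route_to_region` function and the `ToyRegion` class are illustrative assumptions; IoTDB's real device-to-Region assignment is more involved and is managed by the cluster.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def route_to_region(device: str, region_count: int) -> int:
    """Hypothetical routing: hash the device path to a Region index."""
    digest = hashlib.md5(device.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % region_count


class ToyRegion:
    """Stand-in for one Region: a single writer thread, so its writes are serial."""

    def __init__(self, region_id: int) -> None:
        self.region_id = region_id
        self.log: list[tuple[str, int]] = []
        self._writer = ThreadPoolExecutor(max_workers=1)

    def submit(self, device: str, value: int):
        # Only one worker thread exists, so appends happen in submission order.
        return self._writer.submit(self.log.append, (device, value))

    def close(self) -> None:
        self._writer.shutdown(wait=True)


def write_all(points: list[tuple[str, int]], region_count: int) -> list[ToyRegion]:
    regions = [ToyRegion(i) for i in range(region_count)]
    for device, value in points:
        regions[route_to_region(device, region_count)].submit(device, value)
    for region in regions:
        region.close()  # wait until each Region has drained its queued writes
    return regions


if __name__ == "__main__":
    sample = [(f"root.db.d{i % 8}", i) for i in range(32)]
    for region in write_all(sample, region_count=4):
        print(region.region_id, len(region.log))
```

Adding Regions in this toy model adds writer threads, which is the intuition behind the tuning advice that follows: more Regions means more parallel write paths, each still serial internally.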
The relationship between Database and Region is one-to-many:
One database owns multiple Regions
One Region belongs to exactly one database
Regions with the same ID are replicated across nodes for high availability
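The ownership and replication rules above can be captured in a small data model. This is a hypothetical Python sketch, not IoTDB's internal schema: the class names, the `replication_factor` argument, and the round-robin replica placement are assumptions chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    region_id: int
    database: str  # each Region belongs to exactly one database
    replicas: list[str] = field(default_factory=list)  # DataNodes holding a copy


@dataclass
class Database:
    name: str
    regions: list[Region] = field(default_factory=list)  # one database, many Regions


def place_replicas(db: Database, region_count: int, nodes: list[str],
                   replication_factor: int, first_id: int = 0) -> None:
    """Hypothetical round-robin placement: all replicas of a Region share one ID
    but live on distinct DataNodes."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the number of DataNodes")
    for offset in range(region_count):
        region_id = first_id + offset
        replicas = [nodes[(region_id + k) % len(nodes)] for k in range(replication_factor)]
        db.regions.append(Region(region_id, db.name, replicas))


if __name__ == "__main__":
    demo = Database("root.db")
    place_replicas(demo, region_count=3, nodes=["dn1", "dn2", "dn3"], replication_factor=2)
    for r in demo.regions:
        print(r.region_id, r.database, r.replicas)
```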
On a single DataNode:
More Regions → higher concurrency → better CPU utilization
But each Region consumes memory and runtime resources
Therefore, each DataNode has a soft upper limit on the number of Regions
As data volume increases, Regions expand dynamically until reaching this soft limit.
Understanding this mechanism is critical: Performance scaling in IoTDB is Region-driven, not database-driven.
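A toy model of "Regions expand dynamically until reaching this soft limit" is shown below. The growth trigger (one new Region per hypothetical `points_per_region` of data) is purely an assumption for illustration; IoTDB decides internally when to create Regions. The point is only the cap: once the soft limit is reached, more data no longer adds Regions on that DataNode.

```python
def region_count_for(data_points: int, points_per_region: int, soft_limit: int) -> int:
    """Toy model: Regions grow with data volume but stop at the per-DataNode soft limit.

    points_per_region is a hypothetical growth trigger, not an IoTDB parameter.
    """
    if points_per_region <= 0 or soft_limit <= 0:
        raise ValueError("points_per_region and soft_limit must be positive")
    needed = max(1, -(-data_points // points_per_region))  # ceiling division, at least 1
    return min(needed, soft_limit)


if __name__ == "__main__":
    limit = 8  # e.g. a DataNode with 16 logical cores, using cores / 2
    for volume in (0, 1_000, 5_000, 50_000):
        print(volume, region_count_for(volume, points_per_region=1_000, soft_limit=limit))
```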
Recommended Region Configuration
Recommended configuration:
Region soft limit per DataNode = CPU logical cores ÷ 2
This configuration achieves:
Strong write concurrency
Controlled memory consumption
Stable garbage collection behavior
Predictable performance under load
The parameter is configured in iotdb-system.properties:
data_region_per_data_node
Cluster-wide consistency is required.
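Because the value must match across the cluster, it can help to compare every node's copy of iotdb-system.properties before a restart. The helper below is a hypothetical sketch, not an IoTDB tool: it uses a minimal `key=value` parser that skips `#` and `!` comments but does not support every feature of the Java properties format, and it assumes you have already collected each node's file contents into a dictionary.

```python
from typing import Dict, Optional


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal key=value parser; skips blank lines and #/! comments.

    Simplification: ignores line continuations and ':' separators,
    which the full Java properties format allows.
    """
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_consistency(files: Dict[str, str],
                      key: str = "data_region_per_data_node") -> Dict[str, Optional[str]]:
    """Return {node: value} when all nodes agree; raise ValueError otherwise.

    A key missing on every node yields None everywhere (all nodes on the default).
    """
    values = {node: parse_properties(text).get(key) for node, text in files.items()}
    if len(set(values.values())) > 1:
        raise ValueError(f"{key} differs across nodes: {values}")
    return values
```

For example, passing `{"dn1": text_from_dn1, "dn2": text_from_dn2}` either returns the shared value per node or raises with the conflicting values listed.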
Version-specific defaults:
IoTDB ≤ 1.3.3: default = 5; recommended: set it manually to CPU logical cores ÷ 2
IoTDB ≥ 1.3.4: default = 0, where 0 means auto-detect (CPU logical cores ÷ 2)
You may still set a fixed positive value if your workload requires it.
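The version rules above can be expressed as a small helper that reports the Region soft limit a DataNode would end up with. This is an illustrative sketch with a hypothetical function name. It assumes integer division for "CPU logical cores ÷ 2" with a floor of 1 (how IoTDB rounds odd core counts is not stated here), and it treats 0 on versions before 1.3.4 as outside the documented rules.

```python
import os
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.split(".")[:3])


def half_cores(cores: int) -> int:
    # Assumption: integer division, never below 1.
    return max(1, cores // 2)


def effective_region_limit(version: str, configured: Optional[int], cores: int) -> int:
    """Data Region soft limit per DataNode implied by the version rules in the text."""
    auto_supported = parse_version(version) >= (1, 3, 4)
    if configured is None:  # parameter left at its default
        return half_cores(cores) if auto_supported else 5
    if configured == 0 and auto_supported:  # 0 means auto-detect from 1.3.4 onward
        return half_cores(cores)
    if configured <= 0:
        raise ValueError("0 as auto-detect is only documented for 1.3.4 and later")
    return configured  # an explicit positive value wins


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    print("1.3.3 default:", effective_region_limit("1.3.3", None, cores))
    print("2.0.1 default:", effective_region_limit("2.0.1", None, cores))
    print("value to set manually on older versions:", half_cores(cores))
```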
When Should You Increase Region Count?
Suppose:
data_region_per_data_node is already set to CPU logical cores ÷ 2
You still want higher read/write throughput
Monitoring shows:
Disk I/O is not saturated
Network bandwidth is sufficient
Memory GC is stable
CPU is not fully utilized
In this case, the bottleneck may be insufficient concurrency rather than hardware limits.
You may:
Increase data_region_per_data_node to approximately the number of CPU logical cores
Restart the cluster
Wait for new time partitions to trigger the creation of new Data Regions
This increases the number of parallel write engines and allows the system to absorb higher write pressure.
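The checklist and steps above can be encoded as a simple decision helper. Every threshold here (what counts as "saturated" disk I/O or network, or "fully utilized" CPU) is a hypothetical placeholder, and the metric names are illustrative; substitute whatever your monitoring stack actually reports.

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    # Hypothetical metrics, expressed as fractions of capacity (0.0 to 1.0).
    disk_io_util: float
    network_util: float
    cpu_util: float
    gc_stable: bool


# Placeholder thresholds: assumptions, not IoTDB recommendations.
DISK_SATURATED = 0.9
NETWORK_SATURATED = 0.8
CPU_FULLY_USED = 0.85


def suggest_region_limit(current: int, cores: int, m: NodeMetrics) -> int:
    """Suggest raising data_region_per_data_node toward `cores` when no hardware
    resource looks saturated, i.e. concurrency is the likely bottleneck."""
    concurrency_bound = (
        m.disk_io_util < DISK_SATURATED
        and m.network_util < NETWORK_SATURATED
        and m.gc_stable
        and m.cpu_util < CPU_FULLY_USED
    )
    if concurrency_bound and current < cores:
        # Per the steps above: apply, restart, then wait for new time partitions.
        return cores
    return current
```

If any resource is already saturated, the helper keeps the current value, reflecting that more Regions would not help a hardware-bound node.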
Important Note About Multi-Database Deployments
The data_region_per_data_node parameter is a soft upper limit per DataNode.
With a single database → it effectively occupies the entire soft limit.
With multiple databases → they share the Region quota according to internal balancing policies.
In large-scale scenarios with many databases, the actual Region count may gradually exceed the soft limit as the system scales.
Again, this reinforces a central idea: IoTDB scaling is fundamentally Region-based.
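To illustrate what sharing the quota means, here is a deliberately simplified split of one DataNode's soft limit across databases. IoTDB's real balancing policy is internal and not described in this article; the even split with a round-robin remainder is an assumption. In this toy version every database also gets at least one Region, which is one simple way the total can end up above the soft limit when there are many databases.

```python
def split_quota(soft_limit: int, databases: list[str]) -> dict[str, int]:
    """Even split of a Region soft limit across databases (illustrative only).

    Every database gets at least one Region, so with more databases than
    the limit, the total exceeds it.
    """
    if not databases:
        return {}
    base, remainder = divmod(soft_limit, len(databases))
    return {
        db: max(1, base + (1 if i < remainder else 0))
        for i, db in enumerate(databases)
    }


if __name__ == "__main__":
    print(split_quota(8, ["root.db"]))
    print(split_quota(8, ["root.a", "root.b", "root.c"]))
```

With a single database, that database receives the whole quota, which matches the single-database case described above.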
Recommended Data Modeling Strategy
Now let’s return to the original modeling question.
Prefer a Single Database
For most deployments, a single database such as root.db is sufficient.
This:
Does not negatively affect performance
Simplifies queries that span states or provinces (and, depending on circumstances, business domains), since cross-database queries are not supported
Avoids unnecessary data isolation
Configure Region Properly
Set data_region_per_data_node = CPU logical cores ÷ 2
This ensures hardware resources are effectively utilized while maintaining stability.
Hierarchical Path Design Principles
A recommended structure is:
root.db.<province>.<device_type>.<license_plate>.<measurement>
Core principle: Place lower-cardinality attributes at higher hierarchy levels.
Why?
IoTDB’s tree structure benefits from hierarchical compression
Fewer distinct nodes at upper levels improve metadata compression efficiency
Balanced tree structures improve memory usage and traversal efficiency
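The effect of ordering can be seen by counting distinct nodes per level for the same devices under two path orders. This sketch uses made-up sample data (the provinces, device types, and plates are invented) and counts only path nodes, not IoTDB's actual metadata memory; it simply shows how many distinct prefixes each level creates.

```python
from itertools import product


def nodes_per_level(records: list[dict[str, str]], order: list[str]) -> list[int]:
    """Count distinct path prefixes at each level for a given attribute order."""
    counts = []
    for depth in range(1, len(order) + 1):
        prefixes = {tuple(rec[attr] for attr in order[:depth]) for rec in records}
        counts.append(len(prefixes))
    return counts


# Made-up fleet: 2 provinces, 50 vehicles each, every vehicle carrying 3 device types.
records = [
    {"province": p, "device_type": t, "license_plate": f"{p}_{n:03d}"}
    for p, t, n in product(["zj", "gd"], ["gps", "obd", "can"], range(50))
]

low_first = ["province", "device_type", "license_plate"]
high_first = ["license_plate", "device_type", "province"]
print(low_first, nodes_per_level(records, low_first))
print(high_first, nodes_per_level(records, high_first))
```

Here the low-cardinality-first order needs 2 + 6 + 300 nodes, while putting the plate first needs 100 + 300 + 300, because the upper levels no longer share prefixes.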
In practice:
Use semantic hierarchy
Place attributes with fewer unique values higher
Avoid excessive fragmentation
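The "fewer unique values higher" rule can also be applied mechanically when drafting a path template: measure each attribute's cardinality on a sample and sort ascending. The helper names and sample values are hypothetical, and the result is only a starting point; semantic hierarchy may justify a different order.

```python
def order_by_cardinality(records: list[dict[str, str]], attrs: list[str]) -> list[str]:
    """Sort attributes by number of distinct values, lowest first (ties keep input order)."""
    return sorted(attrs, key=lambda a: len({rec[a] for rec in records}))


def build_path(record: dict[str, str], order: list[str], measurement: str,
               database: str = "root.db") -> str:
    """Join attribute values into a tree path such as root.db.<a>.<b>.<c>.<measurement>."""
    return ".".join([database, *(record[a] for a in order), measurement])


sample = [
    {"province": "zj", "device_type": "gps", "license_plate": "zj_a0001"},
    {"province": "zj", "device_type": "obd", "license_plate": "zj_a0001"},
    {"province": "zj", "device_type": "can", "license_plate": "zj_a0002"},
    {"province": "gd", "device_type": "gps", "license_plate": "gd_b0001"},
    {"province": "gd", "device_type": "obd", "license_plate": "gd_b0002"},
]
order = order_by_cardinality(sample, ["license_plate", "device_type", "province"])
print(order)
print(build_path(sample[0], order, "speed"))
```

On this sample the derived order matches the recommended structure: province (2 values), then device type (3), then license plate (4).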
Tree Model and Table Model (IoTDB 2.x)
In IoTDB 2.x, both Tree Model and Table Model are supported.
While their access semantics differ, the underlying distributed architecture remains the same.
Region still defines:
Physical storage boundaries
Concurrency units
Replication units
Table Model introduces relational-style access semantics, but the Region-based scaling mechanism and storage engine remain consistent with Tree Model.
Therefore, understanding Region is essential regardless of which model you choose.
Final Takeaway
In distributed IoTDB:
Performance is not improved by manually splitting databases
Concurrency is controlled by Region configuration
Efficiency depends on balanced hierarchical modeling
Once Region is understood as the fundamental concurrency unit, distributed deployment decisions become clear engineering trade-offs rather than trial-and-error experimentation.