Skip to content

Latest commit

 

History

History
165 lines (115 loc) · 6.45 KB

File metadata and controls

165 lines (115 loc) · 6.45 KB

SQLite Storage

MetricsQLite uses SQLite as its storage backend. The main advantage is simplicity - no external database service is required. However, SQLite lacks features found in specialized time series databases, so MetricsQLite implements its own optimizations for storage efficiency.

Database Structure

MetricsQLiteClient.create_tables() creates tables, views, and triggers. To prevent collisions with existing tables, each object is prefixed (default: metricsqlite).

Tables

{prefix}_series - Stores metric metadata (names and labels) to avoid string duplication in the data table.

Column Type Description
series_id INTEGER Primary key
type TEXT Metric type (gauge or counter)
name TEXT Metric name
labels TEXT JSON-encoded labels

{prefix}_data - Stores time series data, referencing series by ID.

Column Type Description
series_id INTEGER Foreign key to series table
start INTEGER Start timestamp (Unix milliseconds)
end INTEGER End timestamp (for compacted/counter data)
sample_count INTEGER Number of samples represented
positive REAL Positive component of value
negative REAL Negative component of value
min REAL Minimum value (compacted data only)
max REAL Maximum value (compacted data only)

Views

{prefix}_gauge and {prefix}_counter - Convenience views that join series metadata with data and compute value as positive + negative.

Triggers

{prefix}_gauge_insert - Handles inserts via the gauge view, automatically creating series entries and splitting values into positive/negative components.

{prefix}_counter_insert - Handles inserts via the counter view. If the new value matches the previous value, extends the existing row instead of inserting a new one.

Using Existing Databases

As long as the prefixed table names don't conflict with existing tables, MetricsQLite can safely share a database with other applications.

Multiple MetricsQLite clients can even work on the same database with different prefixes:

from metricsqlite import MetricsQLiteClient

client1 = MetricsQLiteClient("metrics.db", tables_prefix="solar")
client2 = MetricsQLiteClient("metrics.db", tables_prefix="hvac")

Thread Safety

MetricsQLite is thread-safe. All database operations are protected by a lock, allowing safe concurrent access from multiple threads.

For better concurrent performance, enable WAL (Write-Ahead Logging) mode:

client = MetricsQLiteClient("metrics.db", enable_wal=True)

Note: WAL mode is a database-level setting that affects all connections to the database file. It creates additional files (database.db-wal and database.db-shm) and doesn't work on network filesystems.

Disk Usage

SQLite does not compress data. Each raw sample uses approximately:

  • 1-2 bytes for series reference
  • 8 bytes for timestamp
  • 8 bytes for value
  • 17+ bytes total

At 3600 samples per hour, each metric uses roughly 1.5 MB per day or 500+ MB per year! That is why MetricsQLite implements some clever mechanisms to reduce disk usage of metrics.

Counter Compression

Counter metrics that maintain the same value are not re-inserted. Instead, the end timestamp of the existing row is updated. This significantly reduces storage for slowly-changing counters.

Gauge Compaction

The compact_gauges() method compresses older gauge data into time buckets. Each bucket stores:

  • sample_count - Number of original samples
  • min / max - Value range within the bucket
  • positive / negative - Separate averages for positive and negative values

Why separate positive and negative averages?

A single average loses information when values cross zero. Consider power consumption from the grid where positive values represent consumption and negative values represent solar export. In a 5-minute bucket, you might have both consumed and produced energy. Storing separate averages preserves this information for accurate energy calculations with integrate().

Compression factor

Each bucket stores:

  • 1-2 bytes for series reference
  • 1-2 bytes for sample count
  • 2x8 bytes for start and end timestamps
  • 4x8 bytes for min, max, positive and negative value
  • 50+ bytes total

This is considerately more than a single data row would use. But the total number of rows is much lower. At a compression interval is 5m, there are only 12*24=288 rows per day or about 14.4 KB. This is about 5 MB per metric per year which is a compression factor of about 100x. At larger bucket intervals the disk usage per metric gets even lower.

Compaction Recommendations

Choose bucket sizes that are divisors of your query step intervals. MetricsQLite does not yet interpolate between incompatible step sizes.

For queries like avg_over_time(metric[1h:]), use bucket sizes of 1h, 30m, 15m, 5m, or other divisors of 1 hour.

It's also possible to have different bucket sizes for different historical periods. However, the intervals of older compression buckets MUST be multiples of earlier intervals.

import time

one_hour_ago = int(time.time()) - 3600
client.compact_gauges(older_than=one_hour_ago, interval="1m")
one_week_ago = int(time.time()) - 7 * 24 * 3600
client.compact_gauges(older_than=one_week_ago, interval="5m")
one_month_ago = int(time.time()) - 30 * 24 * 3600
client.compact_gauges(older_than=one_month_ago, interval="1h")

Per-Metric Compaction

You can compact specific metrics using the name and labels filters:

# Compact only temperature metrics
client.compact_gauges(older_than=cutoff, interval="5m", name="temperature")

# Compact only series with a specific label (subset match)
client.compact_gauges(older_than=cutoff, interval="5m", labels={"location": "kitchen"})

# Combine both filters
client.compact_gauges(
    older_than=cutoff,
    interval="5m",
    name="temperature",
    labels={"location": "kitchen"}
)

The labels filter uses subset matching - series must contain the specified key-value pairs but can have additional labels.