feat: monthly global spend litellm database materialized view migration by noahpodgurski · Pull Request #154 · Firefox-AI/MLPA

noahpodgurski · 2026-05-21T15:05:38Z

What's new

Add scripts/migrate-litellm-database.sh, which creates the monthly_global_spend_cache materialized view and recreates MonthlyGlobalSpend as a thin wrapper over the cache.
Add a start.sh script that automates starting LiteLLM, running migrations, and running init scripts.
https://mozilla-hub.atlassian.net/browse/AIPLAT-696
https://github.com/mozilla/dataservices-infra/pull/1968

Results:

$ python scripts/benchmark_spendlogs_table.py --skip-load --benchmark-target monthly-view
[setup] connecting to postgresql://litellm:litellm@localhost:5432/litellm
[summary] current size=1.00 GiB rows=470,000 target=1.00 GiB
[benchmark] server-side EXPLAIN ANALYZE:
  Aggregate  (cost=124348.77..124348.78 rows=1 width=8) (actual time=92.098..94.902 rows=1 loops=1)
    Buffers: shared hit=14642 read=102874 written=54
    ->  Finalize GroupAggregate  (cost=123592.56..124313.54 rows=2818 width=12) (actual time=92.079..94.897 rows=30 loops=1)
          Group Key: (date("LiteLLM_SpendLogs"."startTime"))
          Buffers: shared hit=14642 read=102874 written=54
          ->  Gather Merge  (cost=123592.56..124250.14 rows=5636 width=12) (actual time=92.072..94.884 rows=90 loops=1)
                Workers Planned: 2
                Workers Launched: 2
                Buffers: shared hit=14642 read=102874 written=54
                ->  Sort  (cost=122592.53..122599.58 rows=2818 width=12) (actual time=83.697..83.698 rows=30 loops=3)
                      Sort Key: (date("LiteLLM_SpendLogs"."startTime"))
                      Sort Method: quicksort  Memory: 26kB
                      Buffers: shared hit=14642 read=102874 written=54
                      Worker 0:  Sort Method: quicksort  Memory: 26kB
                      Worker 1:  Sort Method: quicksort  Memory: 26kB
                      ->  Partial HashAggregate  (cost=122395.83..122431.06 rows=2818 width=12) (actual time=83.662..83.672 rows=30 loops=3)
                            Group Key: date("LiteLLM_SpendLogs"."startTime")
                            Batches: 1  Memory Usage: 121kB
                            Buffers: shared hit=14626 read=102874 written=54
                            Worker 0:  Batches: 1  Memory Usage: 121kB
                            Worker 1:  Batches: 1  Memory Usage: 121kB
                            ->  Parallel Seq Scan on "LiteLLM_SpendLogs"  (cost=0.00..121416.67 rows=195833 width=12) (actual time=3.765..71.998 rows=156667 loops=3)
                                  Filter: ("startTime" >= (CURRENT_DATE - '30 days'::interval))
                                  Buffers: shared hit=14626 read=102874 written=54
  Planning:
    Buffers: shared hit=199 read=1
  Planning Time: 0.486 ms
  JIT:
    Functions: 29
    Options: Inlining false, Optimization false, Expressions true, Deforming true
    Timing: Generation 0.947 ms, Inlining 0.000 ms, Optimization 0.570 ms, Emission 10.660 ms, Total 12.177 ms
  Execution Time: 104.687 ms
[benchmark] monthly view query elapsed=0.081291s total_spend=23514392.764500007
$ sh scripts/migrate-litellm-database.sh
...
$ python scripts/benchmark_spendlogs_table.py --skip-load --benchmark-target monthly-view
[setup] connecting to postgresql://litellm:litellm@localhost:5432/litellm
[summary] current size=1.00 GiB rows=470,000 target=1.00 GiB
[benchmark] server-side EXPLAIN ANALYZE:
  Aggregate  (cost=1.01..1.02 rows=1 width=8) (actual time=0.006..0.007 rows=1 loops=1)
    Buffers: shared hit=1
    ->  Seq Scan on monthly_global_spend_cache  (cost=0.00..1.01 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=1)
          Buffers: shared hit=1
  Planning:
    Buffers: shared hit=61 read=1
  Planning Time: 0.261 ms
  Execution Time: 0.035 ms
[benchmark] monthly view query elapsed=0.000405s total_spend=23514392.764500115

Summary (for 1 GiB of data in the table):

Before migration, execution time: 104.687 ms
After migration, execution time: 0.035 ms
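
For a sense of scale, the ratios behind these numbers can be reproduced with a few lines of shell. This is only a sketch: the `speedup` helper is a hypothetical name, not part of the PR, and the inputs are copied from the benchmark output above.

```shell
#!/bin/sh
# speedup BEFORE AFTER: print BEFORE / AFTER rounded to the nearest integer.
# Hypothetical helper for illustration; not part of the PR.
speedup() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.0f\n", before / after }'
}

# Server-side execution time reported by EXPLAIN ANALYZE (milliseconds).
echo "server-side: $(speedup 104.687 0.035)x"

# Client-observed elapsed time reported by the benchmark script (seconds).
echo "client-side: $(speedup 0.081291 0.000405)x"
```

That works out to roughly 3,000x server-side and roughly 200x as observed by the client, where connection and round-trip overhead make up a larger share of the post-migration time.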

ti3x · 2026-05-21T21:05:57Z

[Critical] Don't DROP/CREATE the wrapper view on every refresh

The CronJob in dataservices-infra#1968 invokes this script with refresh every 5 minutes, so the current logic re-runs DROP VIEW IF EXISTS "MonthlyGlobalSpend" ... CREATE VIEW ... each time. Two problems:

Every refresh takes an ACCESS EXCLUSIVE lock on MonthlyGlobalSpend and leaves a brief window where queries against it fail with relation does not exist.

DROP VIEW IF EXISTS (without CASCADE) will error if anything else depends on MonthlyGlobalSpend. Adding CASCADE would silently drop those dependents, which is worse. Worth running this once in each env first:

SELECT DISTINCT dependent.relname, dependent.relkind
FROM pg_depend d
JOIN pg_rewrite r        ON r.oid = d.objid
JOIN pg_class  dependent ON dependent.oid = r.ev_class
WHERE d.refobjid = 'public."MonthlyGlobalSpend"'::regclass
  AND dependent.relname <> 'MonthlyGlobalSpend';

Both go away if the refresh path skips DDL entirely:

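ACTION="${1:-migrate}"

# Refresh path: no DDL, so no ACCESS EXCLUSIVE lock on the wrapper view.
# POSIX test syntax, since the script is invoked via `sh`.
if [ "${ACTION}" = "refresh" ]; then
  # to_regclass returns NULL when the relation does not exist.
  if ! psql -d "${LiteLLM_DB_NAME}" -tAc \
      "SELECT to_regclass('public.monthly_global_spend_cache') IS NOT NULL" \
      | grep -qx t; then
    echo "[mlpa-litellm-migrate] cache missing; run migrate first. Skipping."
    exit 0
  fi
  # CONCURRENTLY requires a unique index on the materialized view.
  psql -v ON_ERROR_STOP=1 -d "${LiteLLM_DB_NAME}" -c \
    'REFRESH MATERIALIZED VIEW CONCURRENTLY public.monthly_global_spend_cache;'
  exit 0
fi

# existing schema-setup block runs only on the migrate path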

The DROP/CREATE then runs once at deploy time instead of 8,640 times a month, and the dependency risk is limited to the controlled migrate window.
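
The 8,640 figure follows from the 5-minute schedule over a 30-day month. A quick sketch of the arithmetic, with `runs_per_month` as a hypothetical helper name:

```shell
#!/bin/sh
# runs_per_month INTERVAL_MINUTES [DAYS]: number of cron invocations in a
# month of DAYS days (default 30) when the job fires every INTERVAL_MINUTES.
# Hypothetical helper for illustration; not part of the PR.
runs_per_month() {
  interval_minutes="$1"
  days="${2:-30}"
  echo $(( 24 * 60 / $interval_minutes * $days ))
}

runs_per_month 5      # 30-day month
runs_per_month 5 31   # 31-day month
```

Each of those invocations would otherwise take the ACCESS EXCLUSIVE lock described above.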

ti3x · 2026-05-21T21:06:00Z

[Medium] IF NOT EXISTS will silently keep the old MV definition

Postgres has no CREATE OR REPLACE MATERIALIZED VIEW. If the SELECT inside CREATE MATERIALIZED VIEW IF NOT EXISTS public.monthly_global_spend_cache AS ... ever changes (new column, different window, etc.), this script will quietly keep the existing definition and the change won't take effect.

Suggest a comment in the script so the next person knows what's needed:

-- NOTE: Postgres has no CREATE OR REPLACE MATERIALIZED VIEW. To change the
-- SELECT below, add a one-shot migration step that runs
--   DROP MATERIALIZED VIEW IF EXISTS public.monthly_global_spend_cache;
-- before this block, otherwise the new definition will be ignored.
CREATE MATERIALIZED VIEW IF NOT EXISTS public.monthly_global_spend_cache AS
  ...

Pure documentation — no behavior change today, just saves a future debugging session.
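
For whoever does eventually need to change the definition, here is a rough sketch of what the one-shot step could look like. The helper name `drop_cache_sql` is hypothetical. It assumes the wrapper view still selects from the cache, in which case the wrapper has to be dropped before the materialized view, because a DROP without CASCADE fails while a dependent object exists.

```shell
#!/bin/sh
# drop_cache_sql: print the statements for a one-shot "reset" step that would
# run before the existing CREATE ... IF NOT EXISTS block.
# Hypothetical helper for illustration; not part of the PR.
drop_cache_sql() {
  cat <<'EOF'
-- The wrapper depends on the cache, so it goes first (no CASCADE needed).
DROP VIEW IF EXISTS public."MonthlyGlobalSpend";
-- With the dependent gone, the cache can be removed and later recreated
-- with the new SELECT by the existing schema-setup block.
DROP MATERIALIZED VIEW IF EXISTS public.monthly_global_spend_cache;
EOF
}

# Print the statements; nothing is sent to the database here.
drop_cache_sql
```

To apply it, the output could be piped to psql, for example `drop_cache_sql | psql -v ON_ERROR_STOP=1 -d "${LiteLLM_DB_NAME}"`, ideally in the same transaction as the schema-setup block so readers never see a missing relation.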

ti3x · 2026-05-22T19:50:34Z

[Worth flagging — likely not an issue] Wrapper view column/row shape change

The new wrapper view exposes a single spend column with one row, while the original (per the Group Key: date("startTime") and ~30 rows in the EXPLAIN) returns (date, spend) × ~30 rows.

SELECT SUM(spend) FROM "MonthlyGlobalSpend" keeps working — that's why the benchmark numbers match. But any caller doing SELECT date, spend FROM "MonthlyGlobalSpend" (time-series UI, daily breakdown chart, etc.) would break.

Probably fine if LiteLLM only uses the SUM path — but worth a quick check before this lands:

# from the LiteLLM source (and any internal callers)
git grep -i 'MonthlyGlobalSpend'

If everything is summing, no action needed. If anything is selecting the daily rows, the MV would need to keep GROUP BY date("startTime") and use date as the unique-index column.
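
If the daily rows do need to survive, a sketch of the per-day shape could look like the following. The table and column names come from the EXPLAIN output above; the helper name `daily_cache_sql`, the index name, and the output column aliases are assumptions for illustration, not what the PR ships.

```shell
#!/bin/sh
# daily_cache_sql: print DDL for a materialized view that keeps one row per
# day, plus the unique index that REFRESH ... CONCURRENTLY needs.
# Hypothetical helper; index name and column aliases are assumptions.
daily_cache_sql() {
  cat <<'EOF'
CREATE MATERIALIZED VIEW IF NOT EXISTS public.monthly_global_spend_cache AS
SELECT date("startTime") AS "date",
       sum(spend)        AS spend
FROM "LiteLLM_SpendLogs"
WHERE "startTime" >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY date("startTime");

-- One row per day, so the date column can carry the unique index.
CREATE UNIQUE INDEX IF NOT EXISTS monthly_global_spend_cache_date_idx
  ON public.monthly_global_spend_cache ("date");
EOF
}

# Print the statements; nothing is sent to the database here.
daily_cache_sql
```

With this shape, both `SELECT SUM(spend)` and `SELECT date, spend` against the wrapper keep working, at the cost of scanning about 30 cached rows instead of one. Because of the IF NOT EXISTS behavior noted earlier in this thread, an existing single-row cache would have to be dropped before this definition takes effect.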

feat: monthly global spend litellm database materialized view migration

c0aeb84

noahpodgurski requested a review from a team as a code owner May 21, 2026 15:05

address comments, add date to view

1826331

noahpodgurski wants to merge 2 commits into main from monthly-global-spend-materialized-view
