Ladybug version
No response
What operating system are you using?
No response
What happened?
When opening a database file, if the on-disk `HashIndexHeaderOnDisk` for a table's primary key index contains corrupted data (e.g., `numEntries = 16544419524162413700`), the `PrimaryKeyIndex` constructor proceeds to allocate memory based on these garbage values, causing the process to consume 17+ GB of RAM before being killed by the OS OOM killer. There is no validation or sanity check on the deserialized header values before allocation.
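To make the failure mode concrete, here is a minimal sketch of the pattern I believe is happening. This is illustrative only, not the actual Ladybug source; the struct fields, entry layout, and load loop are my assumptions:

```cpp
// Illustrative only: not the real Ladybug code. The header fields, entry
// layout, and load loop are assumptions about what an index load might do.
#include <cstdint>
#include <cstdio>
#include <vector>

struct HashIndexHeaderOnDisk {
    std::uint64_t numEntries;  // corrupted value observed: 16544419524162413700
    std::uint64_t numSlots;    // assumed companion field
};

struct Entry {
    std::uint64_t key;
    std::uint64_t offset;
};

std::vector<Entry> loadIndex(std::FILE* f) {
    HashIndexHeaderOnDisk header{};
    std::fread(&header, sizeof(header), 1, f);
    std::vector<Entry> entries;
    // No sanity check on header.numEntries: a corrupted count keeps this loop
    // (and the vector's growth) going until the OS OOM killer terminates the
    // process, instead of failing with a clear "corrupted header" error.
    for (std::uint64_t i = 0; i < header.numEntries; ++i) {
        Entry e{};
        std::fread(&e, sizeof(e), 1, f);
        entries.push_back(e);
    }
    return entries;
}
```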
Discussion: Prevention Strategies
This issue raises a broader question about data integrity guarantees. There seem to be two complementary ways to prevent this class of problem, so would it make sense to add protection on both sides:
- Write path: ensure critical metadata (like index headers) can't be left in a corrupted state after an interrupted write
- Read path: add basic sanity checks on deserialized values before allocating memory, so corruption surfaces as a clear error rather than an OOM crash (a rough sketch of what I mean follows below)
Ideally both should exist — write-path prevention to minimize corruption risk, and read-path validation as a safety net when corruption does happen
(disk errors, incomplete file copies, etc.).
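For the read path, something like the following is what I have in mind. It is only a rough sketch: the field names, the bound, and where such a check would live are all assumptions on my part, since I don't know the real on-disk layout:

```cpp
// Hypothetical read-path check on a freshly deserialized index header.
// Field names and the plausibility bound are assumptions for illustration.
#include <cstdint>
#include <stdexcept>
#include <string>

struct HashIndexHeaderOnDisk {
    std::uint64_t numEntries;
    std::uint64_t numSlots;
};

void validateIndexHeader(const HashIndexHeaderOnDisk& header,
                         std::uint64_t indexFileSizeBytes) {
    // The index can never hold more entries than the file could physically
    // store, so the file size gives a cheap upper bound before any allocation.
    constexpr std::uint64_t kMinBytesPerEntry = 1;
    const std::uint64_t maxPlausible = indexFileSizeBytes / kMinBytesPerEntry;
    if (header.numEntries > maxPlausible || header.numSlots > maxPlausible) {
        throw std::runtime_error(
            "Corrupted primary key index header: numEntries = " +
            std::to_string(header.numEntries) +
            " is implausible for an index file of " +
            std::to_string(indexFileSizeBytes) +
            " bytes. The database file may be damaged (interrupted write or "
            "incomplete copy).");
    }
}
```

Even a coarse bound like this would have turned the OOM kill into an immediate, actionable error at load time.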
I'm not sure which scenario caused the corruption in my case (could be a killed process during write, or a file transfer issue), but either way the
current behavior of silently allocating 17 GB and getting OOM-killed makes diagnosis very difficult. A clear error message at load time would have
saved hours of debugging.
Are there known steps to reproduce?
No response