Expose structural indexes #449

Open
mitghi wants to merge 1 commit into simd-lite:main from mitghi:expose_structural_indexes

mitghi commented Apr 29, 2026

Hello 👋,

Thanks for this great library.

I would like to ask whether it would be suitable to expose the structural indexes computed in Stage 1.
Stage 1 already computes the byte offset of every JSON structural character.
I've been writing a tool that wants to reuse those offsets; without access to them, it has to
repeat work that simd-json has already done in order to build its own structural index.
I find this useful for cases such as a deep scan over a byte buffer: given a byte's position,
the structural index can be used to build a map from byte ranges to the simd-json Tape,
and the same data enables other useful algorithms.

Here is an example of the use case: every value's byte span is derived correctly from
Buffers::structural_indexes() plus a tape walk, and given a byte index, the innermost object
containing that position can be found.

  Input (124 bytes):
    {"name":"alice","age":30,"tags":["x","y","z"],"profile":{"city":"NYC","zip":"10001"},"score":99.5,"active":true,"note":null}
     ^pos=3              ^pos=23              ^pos=60          ^pos=78        ^pos=100              ^pos=120

  Derived byte spans for every value:
    [ 0] Object  bytes [  0..124) = {"name":"alice", ... ,"note":null}
    [ 1] String  key="name"    bytes [  8..15) = "alice"
    [ 2] Scalar  key="age"     bytes [ 22..24) = 30
    [ 3] Array   key="tags"    bytes [ 32..45) = ["x","y","z"]
    [ 4] String                bytes [ 33..36) = "x"
    [ 5] String                bytes [ 37..40) = "y"
    [ 6] String                bytes [ 41..44) = "z"
    [ 7] Object  key="profile" bytes [ 56..84) = {"city":"NYC","zip":"10001"}
    [ 8] String  key="city"    bytes [ 64..69) = "NYC"
    [ 9] String  key="zip"     bytes [ 76..83) = "10001"
    [10] Scalar  key="score"   bytes [ 93..97) = 99.5
    [11] Scalar  key="active"  bytes [107..111) = true
    [12] Scalar  key="note"    bytes [119..123) = null

  === Practical: zero-copy passthrough output ===
    raw value of "tags":    ["x","y","z"]              span [32, 45)  len 13
    raw value of "profile": {"city":"NYC","zip":"10001"} span [56, 84)  len 28
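
As a rough sketch of the passthrough idea above: once a value's half-open byte span is known, its raw JSON can be borrowed straight from the input and written out without re-serializing. The helper names `raw_value` and `write_raw_value` are hypothetical and not part of simd-json; the spans are assumed to come from the structural-index walk described in this PR.

```rust
use std::io::{self, Write};

/// Borrow the raw bytes of a value from the original input, given its
/// half-open byte span `[start, end)`. No allocation or copy takes place.
/// Returns `None` if the span does not fit inside the input.
/// (Hypothetical helper, not a simd-json API.)
fn raw_value(input: &[u8], span: (usize, usize)) -> Option<&[u8]> {
    input.get(span.0..span.1)
}

/// Write a value's raw bytes to `out` unchanged, e.g. to forward a
/// sub-document without parsing and re-serializing it.
fn write_raw_value<W: Write>(
    out: &mut W,
    input: &[u8],
    span: (usize, usize),
) -> io::Result<()> {
    let bytes = raw_value(input, span)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "span out of bounds"))?;
    out.write_all(bytes)
}
```

With the example input, `raw_value(input, (32, 45))` borrows the bytes of the `"tags"` array and `raw_value(input, (56, 84))` those of the `"profile"` object.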


  byte   3 → Object (root) span [0..124) len 124        # inside "name"
  byte  11 → Object (root) span [0..124) len 124        # inside "alice"
  byte  23 → Object (root) span [0..124) len 124        # inside the 30
  byte  35 → Object (root) span [0..124) len 124        # inside ["x","y","z"] — array's parent is root
  byte  60 → Object key="profile" span [56..84) len 28  # inside profile.city
  byte  78 → Object key="profile" span [56..84) len 28  # inside profile.zip
  byte 100 → Object (root) span [0..124) len 124        # inside the "active" key
  byte 120 → Object (root) span [0..124) len 124        # inside note null
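
To illustrate the lookup above without depending on the proposed accessor, here is a minimal, std-only sketch. `structural_offsets` is a scalar stand-in for stage 1 (simd-json computes its index with SIMD, and the real index may contain further entries); `object_spans` and `enclosing_object` are likewise hypothetical names. The sketch assumes well-formed JSON and only tracks objects, not arrays or scalars.

```rust
/// Scalar stand-in for stage 1: offsets of `{ } [ ] : ,` outside strings.
/// (Hypothetical helper; not how simd-json computes its index.)
fn structural_offsets(input: &[u8]) -> Vec<u32> {
    let mut out = Vec::new();
    let mut in_string = false;
    let mut escaped = false;
    for (i, &b) in input.iter().enumerate() {
        if in_string {
            if escaped {
                escaped = false; // byte after a backslash is never a terminator
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
        } else if b == b'"' {
            in_string = true;
        } else if matches!(b, b'{' | b'}' | b'[' | b']' | b':' | b',') {
            out.push(i as u32);
        }
    }
    out
}

/// Half-open byte spans of every object, in the order their `}` appears.
/// Offsets pointing at any other byte are ignored.
fn object_spans(input: &[u8], offsets: &[u32]) -> Vec<(usize, usize)> {
    let mut stack = Vec::new(); // start offsets of currently open objects
    let mut spans = Vec::new();
    for &off in offsets {
        let off = off as usize;
        match input[off] {
            b'{' => stack.push(off),
            b'}' => {
                if let Some(start) = stack.pop() {
                    spans.push((start, off + 1));
                }
            }
            _ => {}
        }
    }
    spans
}

/// Innermost object whose span contains `pos`, if any.
fn enclosing_object(spans: &[(usize, usize)], pos: usize) -> Option<(usize, usize)> {
    spans
        .iter()
        .copied()
        .filter(|&(start, end)| start <= pos && pos < end)
        .min_by_key(|&(start, end)| end - start)
}
```

A linear filter is enough for a sketch; a tool doing many lookups would more likely sort the spans or store them in an interval structure.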

I would appreciate your feedback on whether this change makes sense.

Thanks

Stage-1 already computes byte offsets of every JSON structural char.
Today the result lives in a private field; downstream callers must
either rerun stage-1 or unsafe-transmute the buffer to read it.

Adds a public read-only accessor:

    pub fn structural_indexes(&self) -> &[u32]

Zero alloc, zero copy. Slice valid until the next parse reusing the
same Buffers.
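
The lifetime note above can be sketched with a stand-in type. The `Buffers` below is a simplified mock, not simd-json's real struct, and `fill` is a hypothetical placeholder for whatever stage 1 does when it refills the index; only the accessor signature is taken from this PR.

```rust
/// Simplified stand-in for the parser's reusable buffers.
/// (Mock for illustration only; not simd-json's actual `Buffers`.)
pub struct Buffers {
    structural_indexes: Vec<u32>,
}

impl Buffers {
    pub fn new() -> Self {
        Self { structural_indexes: Vec::new() }
    }

    /// Hypothetical placeholder for stage 1: overwrites the index while
    /// reusing the existing allocation. Requires `&mut self`.
    pub fn fill<I: IntoIterator<Item = u32>>(&mut self, offsets: I) {
        self.structural_indexes.clear();
        self.structural_indexes.extend(offsets);
    }

    /// Read-only view of the offsets: no allocation, no copy.
    /// The returned slice borrows `self`, so the borrow checker rejects
    /// any refill (`&mut self`) while the slice is still in use.
    pub fn structural_indexes(&self) -> &[u32] {
        &self.structural_indexes
    }
}

/// Callers that need the offsets beyond the next parse must copy them out.
pub fn snapshot(buffers: &Buffers) -> Vec<u32> {
    buffers.structural_indexes().to_vec()
}
```

Under these assumptions, "valid until the next parse" needs no runtime check: holding the slice across a call that takes `&mut Buffers` simply does not compile.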