There's another strategy that has good storage efficiency and keeps the ability ...

dist1ll · on Sept 18, 2023

Storing byte offset inline, great idea. Thanks for mentioning that.

Just note one disadvantage: if the byte offset is stored in memory, you have a data dependence on the traversal. So even though it's cache-friendly, it may cause serious memory stalls in the processor's pipeline.

dan-robertson · on Sept 18, 2023

One trick for this is to leverage the branch predictor to guess where the next item will be (I first saw this trick for linked list traversal as in practice linked lists are often laid out linearly). Something like

  // or guess based on the size of the current item
  let guessed_next_offset = current_offset + (current_offset - previous_offset);
  let actual_next_offset = byte_offsets[++i];
  previous_offset = current_offset;
  current_offset = guessed_next_offset;
  if(current_offset != actual_next_offset)
    // need to make sure the compiler doesn’t ‘optimise out’ this if and always run the below line
    current_offset = actual_next_offset;
  // ...

In the ordinary case, where the guessed offset is correct, the cpu predicts the branch is not taken, and no data dependency of ... on actual_next_offset is introduced. If the branch is mispredicted then that speculatively executed work with the wrong current_offset is dropped. This is a bit slow but in the case where this happens all the time, the branch predictor just gives you the slightly bad perf of the traversal with the data dependency (computing the guess will be negligible) except you pay if the guess was actually right.

jadbox · on Sept 18, 2023

I'd love for someone to benchmark these variants to tease out their actual characteristics (particularly for x86).

pornel · on Sept 18, 2023

But if you need to mutate such collection, you’ll end up writing a memory allocator (to handle removals, changes to larger variant, and to deal with fragmentation).

loeg · on Sept 18, 2023

Offsets can add a significant amount of space relative to small T, if their size is not optimized (e.g., 64-bit size_t and uint8_t T). So as long as we are careful about offset size this seems like a reasonable approach.

dan-robertson · on Sept 18, 2023

Indeed part of this is pointed out in the video about Carbon linked in the OP: juxtaposition gives you a ‘free’ pointer going from each node of your data structure and for some tasks you don’t really need more than that.