Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

There's another strategy that has good storage efficiency and keeps the ability to iterate over elements. vec #1 is a list of tags, vec #2 keeps the byte offset of each element, and vec #3 (not really a vec) is the packed variant data pointed to by vec #2. This:

- is half the number of vecs of the author's final solution (6 vs 3)

- wastes no bytes for padding (unless needed for alignment)

- lets you iterate in a cache-friendly way since the data is laid out in order in memory regardless of type

- even lets you access items by index in O(1)

Generally it has the same performance characteristics as a Vec<T>, but for heterogeneous data.



Storing byte offset inline, great idea. Thanks for mentioning that.

Just note one disadvantage: if the byte offset is stored in memory, you have a data dependence on the traversal. So even though it's cache-friendly, it may cause serious memory stalls in the processor's pipeline.


One trick for this is to leverage the branch predictor to guess where the next item will be (I first saw this trick for linked list traversal as in practice linked lists are often laid out linearly). Something like

  // or guess based on the size of the current item
  let guessed_next_offset = current_offset + (current_offset - previous_offset);
  let actual_next_offset = byte_offsets[++i];
  previous_offset = current_offset;
  current_offset = guessed_next_offset;
  if(current_offset != actual_next_offset)
    // need to make sure the compiler doesn’t ‘optimise out’ this if and always run the below line
    current_offset = actual_next_offset;
  // ...
In the ordinary case, where the guessed offset is correct, the cpu predicts the branch is not taken, and no data dependency of ... on actual_next_offset is introduced. If the branch is mispredicted then that speculatively executed work with the wrong current_offset is dropped. This is a bit slow but in the case where this happens all the time, the branch predictor just gives you the slightly bad perf of the traversal with the data dependency (computing the guess will be negligible) except you pay if the guess was actually right.


I'd love for someone to benchmark these variants to tease out their actual characteristics (particularly for x86).


But if you need to mutate such collection, you’ll end up writing a memory allocator (to handle removals, changes to larger variant, and to deal with fragmentation).


Offsets can add a significant amount of space relative to small T, if their size is not optimized (e.g., 64-bit size_t and uint8_t T). So as long as we are careful about offset size this seems like a reasonable approach.


Indeed part of this is pointed out in the video about Carbon linked in the OP: juxtaposition gives you a ‘free’ pointer going from each node of your data structure and for some tasks you don’t really need more than that.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: