Thanks! What kind of performance do you get with some of those more intricate pointer structures? My own research right now is focused on how to build some common abstract data types with less pointer indirection, so as to be more easily vectorizable.
Lately I've been more focused on getting the features running and less on performance, so I haven't completely explored the impact of using these structures.
I'd be interested to hear more about what you're doing. Do you have any links you could share?
I've actually only just started - I'm less than a year into my PhD - so I don't have anything published yet. My current project involves doing some non-trivial calculations on a large matrix of multi-precision integers. The obvious way to lay out the data (an array of dynamically resizable vectors) is dead wrong for the GPU - the memory bandwidth is completely swamped by fetching from a different pointer for each thread, and I got a 100x slowdown relative to the CPU reference implementation - so right now I'm redoing the experiments based on a vector of fixed-length arrays (the 0th elements of all the vectors from the first formulation go in one array, the 1st elements in the next, and so forth).
I guess if you generalize this transformation sufficiently you end up doing whole-program flattening like NESL and DPH. Or is there a less intrusive way to pull that off?
Sounds like in his case the unboxed-vector SoA representation might be enough, though I'd enjoy hearing more about this too. (I'll be doing a bunch of GPU work with Haskell this fall.)
I don't think there's a less intrusive way to pull it off, but with sufficient compiler support you should be able to make it fairly transparent to the programmer.