Are the ref counts themselves included in the "waste" that is shown, or are they counted as "requested memory"?
| As you noted, it'd make it harder to calculate ref-count location, but also slot start for regular allocation requests. It'd also throw off the algorithm that tightly packs into multi-page slot spans. More importantly, it'd also throw off AlignedAlloc which relies on power-of-2-sized slots to be power-of-2-aligned.
What if the ref counts were packed into reserved slots at intervals that depend on the slot size? This would avoid the need to change any slot boundaries. For example, for 16-byte slots, every fifth slot could store ref counts for the preceding four slots. For 32-byte slots, every ninth slot could store ref counts for the preceding eight slots, and so forth. Of course, this would further increase the distance between a slot and its corresponding ref count in some cases, which would further reduce cache locality between data and metadata. However, some ref counts would still fit in the same cachelines as their associated data. Many ref counts would also at least fit on the same pages as their associated data.
There would probably also be some slot size threshold beyond which it would be preferable to store the ref counts in the bitmaps region of the super page to avoid needing to reserve a huge slot for ref counts.
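As a rough illustration of the address arithmetic this scheme would need, here is a minimal sketch. It assumes a 4-byte ref count and that every (slot_size / 4 + 1)-th slot in a span is the reserved one for the preceding slot_size / 4 data slots; the names (RefCountAddressForSlot, etc.) are hypothetical and not existing PA API:

  #include <cstddef>
  #include <cstdint>

  // Assumed ref-count width for this sketch.
  constexpr size_t kRefCountSize = 4;

  // Number of data slots covered by one reserved ref-count slot,
  // e.g. 4 for 16-byte slots, 8 for 32-byte slots.
  constexpr size_t DataSlotsPerGroup(size_t slot_size) {
    return slot_size / kRefCountSize;
  }

  // Ref-count address for the data slot starting at |slot_start|,
  // within a slot span starting at |span_start|.
  uintptr_t RefCountAddressForSlot(uintptr_t span_start, size_t slot_size,
                                   uintptr_t slot_start) {
    const size_t data_per_group = DataSlotsPerGroup(slot_size);
    // Each group = data_per_group data slots followed by 1 reserved slot.
    const size_t group_bytes = (data_per_group + 1) * slot_size;
    const size_t offset = slot_start - span_start;
    const size_t group = offset / group_bytes;
    const size_t pos_in_group = (offset % group_bytes) / slot_size;
    const uintptr_t refcount_slot =
        span_start + group * group_bytes + data_per_group * slot_size;
    return refcount_slot + pos_in_group * kRefCountSize;
  }

For 16-byte slots this puts the ref count for the first four data slots at span_start + 64, one 4-byte entry per slot, so the slot-start calculation stays a simple modulus over a fixed group stride rather than shifting any slot boundaries.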
| If there are multiple ref-counts next to each other, they'll share a cacheline, causing it to ping-pong between processors.
Could the memory savings from clustering ref counts help motivate enhancing PA to group a thread's allocations in a contiguous portion of each bucket where possible?
Thanks,
Michael