FastMove: Optimizing System.Move

This post describes a technique that can be used to optimize some RTL functions at run-time. At the end of the post there’s a unit: add it to your uses list (preferably as the first unit, so the patch is applied before other units’ initialization code runs) and the System.Move routine will get a significant speed boost. At run-time, the unit patches the Move routine with an MMX, SSE, SSE2 or SSE3 version, depending on which SIMD instruction sets your CPU supports.

At run-time (in its initialization section), the unit performs these steps:

  1. Detect the supported SIMD sets using the CPUID instruction. (see the GetSupportedSimdInstructionSets, CPUIIDSupports and CPUID functions)
  2. Detect your CPU’s L2 cache size, which is used to tune the copy operation. (see the GetL2CacheSize and GetExtendedL2CacheSize functions)
  3. Select the best variant for the supported SIMD sets, in order of preference: SSE3, then SSE2, then SSE, with MMX as the last option.
  4. If your CPU supports none of these SIMD sets, the original Move is left untouched.
  5. Once a variant has been selected, the System.Move routine is patched: a “JMP NEW_OPTIMIZED_VERSION” instruction is written over its first instructions, so every call to Move ends up in the optimized version. (see the PatchMethod function)
  6. The Windows VirtualProtect function is used to make the memory holding System.Move writable before patching, and to restore the original protection afterwards.
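Steps 1 to 4 boil down to reading a few CPUID bits and picking a variant. The sketch below (in Python, for readability) shows that selection logic. It assumes the raw register values have already been obtained natively, as the unit’s CPUID function does. The bit positions are the ones Intel and AMD document for CPUID leaf 1 (MMX = EDX bit 23, SSE = EDX bit 25, SSE2 = EDX bit 26, SSE3 = ECX bit 0). The `MoveVariant` names are hypothetical. Decoding the L2 size from extended leaf 0x80000006 (ECX bits 31..16, in KB) is also an assumption about what GetExtendedL2CacheSize reads, not a description of the unit’s code.

```python
from enum import Enum

# CPUID leaf 1 feature bits (as documented by Intel and AMD).
EDX_MMX = 1 << 23
EDX_SSE = 1 << 25
EDX_SSE2 = 1 << 26
ECX_SSE3 = 1 << 0


class MoveVariant(Enum):
    """Hypothetical labels for the Move implementations to choose from."""
    ORIGINAL = "original"  # leave System.Move untouched
    MMX = "mmx"
    SSE = "sse"
    SSE2 = "sse2"
    SSE3 = "sse3"


def select_move_variant(leaf1_ecx: int, leaf1_edx: int) -> MoveVariant:
    """Pick a variant from CPUID leaf 1 ECX/EDX in the post's order of
    preference: SSE3, then SSE2, then SSE, then MMX, else the original."""
    if leaf1_ecx & ECX_SSE3:
        return MoveVariant.SSE3
    if leaf1_edx & EDX_SSE2:
        return MoveVariant.SSE2
    if leaf1_edx & EDX_SSE:
        return MoveVariant.SSE
    if leaf1_edx & EDX_MMX:
        return MoveVariant.MMX
    return MoveVariant.ORIGINAL


def l2_cache_size_kb(ext_leaf6_ecx: int) -> int:
    """Assumed decoding: L2 size in KB from CPUID 0x80000006, ECX bits 31..16."""
    return (ext_leaf6_ecx >> 16) & 0xFFFF
```

For example, a CPU that reports SSE2 but not SSE3 gets the SSE2 variant, and one that reports none of the four bits keeps the original System.Move.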

Of course I could have just exported those function variants and made them available to consumer code, but the interesting side-effect of run-time patching is that all code that calls Move, including the RTL and VCL units, gets the speed boost. Anyway, enough words; get your copy now (while it’s hot):
(Available under the LGPL license, the original license of the YAWE project)
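Every caller benefits because callers keep calling System.Move’s original address. Only the first five bytes there are overwritten with a relative jump: opcode E9 followed by a signed 32-bit displacement, measured from the address right after the jump. The sketch below (Python, with hypothetical names that are not part of the unit) only computes those five bytes. Writing them into the code section is the job of PatchMethod, between the two VirtualProtect calls described above.

```python
import struct

JMP_REL32_OPCODE = 0xE9
JMP_REL32_SIZE = 5  # 1 opcode byte + 4-byte signed displacement


def encode_jmp_rel32(patch_address: int, target_address: int) -> bytes:
    """Hypothetical helper: build the 5 bytes written over the start of the
    original routine so that execution continues at target_address.

    The displacement is relative to the next instruction, i.e. to
    patch_address + 5, and must fit in a signed 32-bit integer.
    """
    displacement = target_address - (patch_address + JMP_REL32_SIZE)
    if not -(2**31) <= displacement < 2**31:
        raise ValueError("target is out of rel32 range")
    # '<' = little-endian, no padding; 'B' = opcode byte, 'i' = signed int32.
    return struct.pack("<Bi", JMP_REL32_OPCODE, displacement)
```

For example, patching a routine at 0x401000 to jump to 0x402000 gives a displacement of 0xFFB, so the bytes are E9 FB 0F 00 00.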

Note: All SIMD versions of the Move routine were written by Seth and were initially included in our YAWE project. You’ll find plenty more interesting code there.