Nutster, on 05 August 2009 - 01:03 PM, said:
Remember that Intel's Pentium is not the only processor for which there is a C compiler.

The details of how other processors do things will be slightly different. For example, check out Sun's UltraSPARC.
Very true. I had assumed the code was being used for Windows programming; the generated code and its speed may very well differ between the two functions on other processors.
Nutster, on 05 August 2009 - 01:03 PM, said:
Even using the method you outline, there are several steps you skipped which will slow things down for my first (array dereferencing) method. The processor does not inherently know how big the array elements are; a relatively expensive multiplication must be done to determine the index location. Sure there can be fewer instructions, but how long does each instruction take? An indexed lookup is considerably slower than a direct lookup; this may be hard to tell using instruction pipelining processors, but with older processors, like a 6510, an indexed lookup took 7 clock cycles, while a direct look-up took only 3. Okay, give me a break; it's the only one I remember off the top of my head.
I was working with what you gave as an example, the char * (which is either ANSI or Unicode), and for that particular case I didn't really miss anything. However, if you do consider arrays of larger (non-string) types, then yes, a multiplication or extra addition would be needed. And you might be right about the older processors clock-cycle-wise, but simple indexing like in the function you posted wouldn't be costly at all (at least on any x86 architecture since, I dunno, the 386?).
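To make the element-size point concrete, here is a rough C++ sketch of where element i actually lives relative to the start of an array. The helper name byte_offset is made up for this post; it only measures the distance in bytes and is not anyone's real code from this thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many bytes past the start of the array
// does element i begin? For any element type T this works out to
// i * sizeof(T), which is the multiply being discussed.
template <typename T>
std::size_t byte_offset(const T* base, std::size_t i) {
    assert(base != nullptr);
    const char* start = reinterpret_cast<const char*>(base);
    const char* elem  = reinterpret_cast<const char*>(&base[i]);
    return static_cast<std::size_t>(elem - start);
}
```

For a char array the offset is just i (sizeof(char) is 1, so there is nothing to multiply), while for an int array it is i * sizeof(int).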
Nutster, on 05 August 2009 - 01:03 PM, said:
Using array index dereferencing, the general steps required are (described in a processor independent manner):
load i -> Reg(1) // index
load sizeof(element) -> Reg(2) // element size. Determined at compile time.
mult Reg(1), Reg(2) -> Reg(1) // Determine byte-count within array for start of element. On some processors this must be done in software (much slower!).
load array -> Reg(2) // get pointer to start of array
add Reg(1), Reg(2) -> Reg(1) // Add the locations of array with location in array
load Reg(1) -> Reg(2) // Get the actual data in the array.
Using pointer dereferencing, the general steps required are:
load s1 -> Reg(1) // load the pointer into a register
load Reg(1) -> Reg(2) // Done.
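For reference, the two access styles being compared might look something like this in C++. This is only a sketch: the names length_indexed and length_pointer are invented here and are not the functions from the earlier posts, and what a given compiler emits for each is not shown.

```cpp
#include <cassert>
#include <cstddef>

// Array index dereferencing: every access is s[i],
// conceptually base + i * sizeof(char).
std::size_t length_indexed(const char* s) {
    assert(s != nullptr);
    std::size_t i = 0;
    while (s[i] != '\0') {
        ++i;
    }
    return i;
}

// Pointer dereferencing: the pointer itself is advanced,
// so each access is a plain *p.
std::size_t length_pointer(const char* s) {
    assert(s != nullptr);
    const char* p = s;
    while (*p != '\0') {
        ++p;
    }
    return static_cast<std::size_t>(p - s);
}
```

Both return 5 for "hello"; any difference is only in the instructions generated, which depends on the compiler and the target processor.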
Okay, a few notes:
1. (array index dereferencing) If you are doing a loop (and you have a halfway decent compiler), the base pointer won't need to be loaded each time through the loop, just once at the start. The indexed reference (in general terms) would indeed get the multiply and add, but an indexed addressing mode like [esi+ebx] (on x86 CPUs) could cut out at least one instruction.
2. (pointer dereferencing) If you are doing one-by-one increments, or a specific hardcoded # of increments (in a loop), then you'll just need to add 'add s1, sizeof(element) * #-to-increment'. If you are using structures/classes, the size of the object would need to be loaded (due to inheritance), and then multiplied*, putting you back to the code in (array index dereferencing). However, if it's a fixed-size datatype, the add calculation should be decided at compile time.
3. (pointer dereferencing) Additionally, if you don't know ahead of time the amount to increment the pointer, or the amount is stored in a variable, you'll need everything in (array index dereferencing).
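A quick C++ sketch of the difference between notes 2 and 3, assuming an int array. The function names are hypothetical, and both functions assume n is a multiple of the step so the pointer never runs past the end of the array.

```cpp
#include <cassert>
#include <cstddef>

// Note 2: the step is hardcoded, so the byte increment
// (2 * sizeof(int)) is a constant the compiler can fold into one add.
int sum_every_second(const int* p, std::size_t n) {
    assert(n % 2 == 0);  // assumption made for this sketch
    int total = 0;
    for (const int* end = p + n; p != end; p += 2) {
        total += *p;
    }
    return total;
}

// Note 3: the step lives in a variable, so step * sizeof(int)
// has to be worked out while the program runs.
int sum_every_nth(const int* p, std::size_t n, std::size_t step) {
    assert(step > 0 && n % step == 0);  // assumption made for this sketch
    int total = 0;
    for (const int* end = p + n; p != end; p += step) {
        total += *p;
    }
    return total;
}
```

With the array {1, 2, 3, 4, 5, 6}, sum_every_second gives 9 and sum_every_nth with a step of 3 gives 5. In both loops the base pointer is only set up once, as in note 1.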
In summary: pointer dereferencing will probably only ever be faster (in the general sense) for fixed-size C++ datatypes (int, char, etc.). Otherwise, with structures/classes, I don't see your code making a difference.
*edit: regarding #2 - actually, for structures/classes I don't know how C++ would get the next item, that's actually got me wondering now...
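On the structures/classes question, as far as the language rules go, C++ steps a pointer by the sizeof of the pointer's declared (static) type, which is fixed at compile time even when inheritance is involved. A small sketch, with made-up types Base and Derived and a hypothetical helper stride_bytes:

```cpp
#include <cassert>
#include <cstddef>

// Made-up types for illustration only.
struct Base { int a; };
struct Derived : Base { int b; int c; };

// Hypothetical helper: how many bytes does p + 1 move?
// The answer is sizeof(T), taken from the pointer's declared type.
template <typename T>
std::size_t stride_bytes(const T* p) {
    assert(p != nullptr);
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(p + 1) -
        reinterpret_cast<const char*>(p));
}

// Both sizes are known to the compiler; nothing is looked up at run time.
static_assert(sizeof(Derived) > sizeof(Base),
              "Derived adds members in this sketch");
```

So a Derived* steps by sizeof(Derived) and a Base* steps by sizeof(Base). That is also why walking an array of Derived objects through a Base* is not valid: the stride would be wrong.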
Edited by Ascend4nt, 05 August 2009 - 02:32 PM.