Given these definitions:
#define SIZE 10000000
int x[SIZE];
int y[SIZE];
int i;
int *p1, *p2;
Which of these four loops is the most efficient? Why?
  1. for (i = 0; i < SIZE; i++) x[i] = y[i];
  2. for (p1 = x, p2 = y; p1 - x < SIZE; ) *p1++ = *p2++;
  3. for (p1 = x, p2 = y; p1 < &x[SIZE]; ) *p1++ = *p2++;
  4. register int *p1, *p2; for (p1 = x, p2 = y; p1 < &x[SIZE]; ) *p1++ = *p2++;
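For experimenting, the four loops can be wrapped in functions and checked to confirm that they all perform the same copy. This is only a sketch: the function names (copy_loop1 through copy_loop4) and the helpers fill_source and copies_match are hypothetical and do not come from the original notes.

```c
#define SIZE 10000000

int x[SIZE];
int y[SIZE];

/* Loop 1: array indexing with an integer counter. */
void copy_loop1(void) {
    int i;
    for (i = 0; i < SIZE; i++)
        x[i] = y[i];
}

/* Loop 2: pointers; the end test computes the offset p1 - x each pass. */
void copy_loop2(void) {
    int *p1, *p2;
    for (p1 = x, p2 = y; p1 - x < SIZE; )
        *p1++ = *p2++;
}

/* Loop 3: pointers; the end test compares against the address
   one past the last element of x. */
void copy_loop3(void) {
    int *p1, *p2;
    for (p1 = x, p2 = y; p1 < &x[SIZE]; )
        *p1++ = *p2++;
}

/* Loop 4: same as loop 3, but the pointers carry the register hint.
   The compiler is free to ignore the hint. */
void copy_loop4(void) {
    register int *p1, *p2;
    for (p1 = x, p2 = y; p1 < &x[SIZE]; )
        *p1++ = *p2++;
}

/* Hypothetical helper: give y known contents (seed must be >= 0)
   and reset x to a value that never appears in y. */
void fill_source(int seed) {
    int i;
    for (i = 0; i < SIZE; i++) {
        y[i] = i + seed;
        x[i] = -1;
    }
}

/* Hypothetical helper: returns 1 if x is an exact copy of y, else 0. */
int copies_match(void) {
    int i;
    for (i = 0; i < SIZE; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}
```

Calling fill_source, then one of the copy functions, then copies_match from main should report a match for each of the four loops, since they differ only in how the copy is expressed.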
Average time, in milliseconds, for each loop to copy 10,000,000 elements on an 800 MHz PIII with 256 MB of RAM. (Your mileage may vary.)
          Debug    Release
----------------------------
Loop1      377       248
Loop2      380       378
Loop3      379       376
Loop4      380       289
All data (test1 through test4 correspond to Loops 1 through 4; each numbered row is one run):
Debug
 1:  test1 = 381  test2 = 380  test3 = 381  test4 = 370
 2:  test1 = 371  test2 = 390  test3 = 371  test4 = 380
 3:  test1 = 370  test2 = 401  test3 = 380  test4 = 381
 4:  test1 = 370  test2 = 371  test3 = 390  test4 = 371
 5:  test1 = 381  test2 = 370  test3 = 381  test4 = 380
 6:  test1 = 401  test2 = 370  test3 = 381  test4 = 380
 7:  test1 = 380  test2 = 381  test3 = 370  test4 = 381
 8:  test1 = 381  test2 = 370  test3 = 381  test4 = 401
 9:  test1 = 371  test2 = 390  test3 = 381  test4 = 380
10:  test1 = 370  test2 = 381  test3 = 380  test4 = 381

Ave: test1 = 377, test2 = 380, test3 = 379, test4 = 380

Release
 1:  test1 = 250  test2 = 371  test3 = 380  test4 = 281
 2:  test1 = 250  test2 = 391  test3 = 371  test4 = 290
 3:  test1 = 251  test2 = 370  test3 = 381  test4 = 290
 4:  test1 = 260  test2 = 371  test3 = 370  test4 = 301
 5:  test1 = 250  test2 = 391  test3 = 380  test4 = 281
 6:  test1 = 251  test2 = 370  test3 = 371  test4 = 300
 7:  test1 = 241  test2 = 380  test3 = 371  test4 = 290
 8:  test1 = 240  test2 = 381  test3 = 390  test4 = 291
 9:  test1 = 250  test2 = 381  test3 = 370  test4 = 291
10:  test1 = 241  test2 = 380  test3 = 381  test4 = 280

Ave: test1 = 248, test2 = 378, test3 = 376, test4 = 289
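
The notes do not say how these times were collected. One plausible way to gather numbers like these is sketched below, assuming the standard clock() function from <time.h>. The names copy_indexed, time_once_ms and average_ms are hypothetical. The average uses integer division, which matches the listed averages: they appear to be truncated rather than rounded (for example, the ten Debug runs of test1 sum to 3776 and are listed as 377).

```c
#include <time.h>

#define SIZE 10000000

int x[SIZE];
int y[SIZE];

/* Loop 1 from the notes, wrapped in a function so it can be timed. */
void copy_indexed(void) {
    int i;
    for (i = 0; i < SIZE; i++)
        x[i] = y[i];
}

/* Processor time used by one call of copy(), in whole milliseconds.
   clock() reports CPU time, which is an assumption about what the
   original measurements captured. */
long time_once_ms(void (*copy)(void)) {
    clock_t start, end;
    start = clock();
    copy();
    end = clock();
    return (long)((double)(end - start) * 1000.0 / CLOCKS_PER_SEC);
}

/* Average over `runs` calls (runs must be > 0).
   Integer division truncates, e.g. 3776 / 10 gives 377. */
long average_ms(void (*copy)(void), int runs) {
    long total = 0;
    int r;
    for (r = 0; r < runs; r++)
        total += time_once_ms(copy);
    return total / runs;
}
```

A call such as average_ms(copy_indexed, 10) would correspond to one entry in the "Ave:" lines. Results on a different compiler, optimization level, or machine will likely differ from the figures listed here.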
Assembly listing (modified)
C code

Moral of the Story: