> If you declare
>   int a[10][10];
> you get 100 contiguous int slots and the compiler generates code to  
> do the address calculations. The rightmost subscript changes fastest.
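
For what it's worth, here is a minimal sketch of that address
calculation. The names ROWS, COLS, flat_index and address_matches are
made up for illustration; the compiler does the equivalent arithmetic
for you whenever you write a[i][j].

```c
#include <stddef.h>

enum { ROWS = 10, COLS = 10 };

/* Offset, in elements, of a[i][j] from the start of the array.
 * Row-major layout: one full row of COLS ints per step of i. */
size_t flat_index(size_t i, size_t j)
{
    return i * COLS + j;
}

/* Returns 1 if a[i][j] really sits flat_index(i, j) ints past the
 * start of a, measured in bytes through char pointers. */
int address_matches(size_t i, size_t j)
{
    static int a[ROWS][COLS];
    size_t byte_offset = (size_t)((char *)&a[i][j] - (char *)a);
    return byte_offset == flat_index(i, j) * sizeof(int);
}
```

So a[1][0] is 10 ints past a[0][0], while a[0][1] is only 1 int past
it.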

Note that to keep the CPU's cache and data pipeline fed, you should
make the rightmost subscript the inner-loop index, so that the inner
loop walks contiguous memory.
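
A rough sketch of the two loop orders, assuming the 10x10 int array
from the quoted text (N and both function names are just
illustrative). Both return the same sum; only the order in which
memory is touched differs.

```c
enum { N = 10 };

/* Inner loop runs over the rightmost subscript, so consecutive
 * iterations touch consecutive ints in memory. */
long sum_row_major(int a[N][N])
{
    long total = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            total += a[i][j];
    return total;
}

/* Inner loop runs over the leftmost subscript, so each iteration
 * jumps N ints ahead. Same result, less cache-friendly order. */
long sum_column_major(int a[N][N])
{
    long total = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            total += a[i][j];
    return total;
}
```

With only 100 ints the whole array fits in cache, so you are unlikely
to measure a difference here; the ordering starts to matter once the
array is much larger than the cache.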

