In the am335x u-boot supplied with the SDK (ti-sdk-am335x-evm-06.00.00.00/board-support/u-boot-2013.01.01-psp06.00.00.00), the instruction cache is enabled by default and the data cache is disabled. According to the u-boot git history, the data cache is kept off because of DMA coherency issues in the drivers. I have not tracked down which drivers are affected.
I found that for my configuration (NAND), I could enable dcache in u-boot (but not in u-boot SPL!), and it did speed things up.
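
As a rough illustration of how such a split between u-boot and SPL could be expressed: in u-boot trees of this era the data cache is compiled out when CONFIG_SYS_DCACHE_OFF is defined, and CONFIG_SPL_BUILD is defined only while SPL is being built. The fragment below assumes the board config header (include/configs/am335x_evm.h is a guess at the relevant file) currently defines CONFIG_SYS_DCACHE_OFF unconditionally. It is an untested sketch, not taken from the SDK sources, and it also assumes the board code calls dcache_enable() from enable_caches() once the macro is gone.

```c
/*
 * Sketch of a board config header excerpt.
 * Assumption: the file is something like include/configs/am335x_evm.h
 * and it previously defined CONFIG_SYS_DCACHE_OFF for every build.
 */
#ifdef CONFIG_SPL_BUILD
/* SPL build: keep the data cache off, matching the SDK default. */
#define CONFIG_SYS_DCACHE_OFF
#else
/* Full u-boot build: make sure the macro is not set, so the dcache code is compiled in. */
#undef CONFIG_SYS_DCACHE_OFF
#endif
```

If the cache commands are built in (CONFIG_CMD_CACHE), running `dcache` at the u-boot prompt should report whether the data cache is currently on, which is a quick way to confirm the change took effect. Drivers that use DMA are the ones to retest afterwards, since they are the reason the cache was disabled in the first place.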