Hi Rahaul and Marc,
Thank you very much for the help.
I separated the bn_mul_mont() implementation (openssl/crypto/bn/asm/armv4-mont.pl) from OpenSSL and wrote my own test program to exercise it; the test runs OK without any issue. So I guess something else is causing the system slowness. Most likely the root cause is still in kernel space (a driver) and makes the whole system hang, but oprofile does not seem able to detect it. Any ideas on how to debug this?
Regards,
Yong