I have a kernel that takes 8 ms to run. The kernel code is large, and I want to know which line or part of it is causing the bottleneck.

What is the best way to identify a bottleneck inside a kernel?

Note: I am using an AMD machine.