Profiling & Debugging Problems on Modern Linux in the Cloud
In the presenter's opinion, performance problems come in layers, much like an onion. Some problems are easy to fix using the information provided by the Oracle Wait Interface. The Wait Interface has been helping DBAs for a long time, and despite some shortcomings in its timings it is extremely useful for working out what a session is doing while it is off-CPU. This is a tricky area to debug if the application (in this case the Oracle database) is not instrumented!
The second layer sits one level below. If the Wait Interface doesn't provide insight into what's happening, and the session seems stuck on the same wait event, performance counters can help. With Oracle 12c Release 2, more than a thousand performance counters are recorded at session level, providing invaluable insight into session activity.
But sometimes even that is not enough: maybe a process is "stuck" at the O/S level, or similar mischief has occurred. Enter the O/S profilers and tracers! There are quite a few of them, and many bring their own frameworks. Whilst the situation is easy to comprehend on Solaris (DTrace!), Linux offers far too many profiling and tracing tools: strace, perf, ftrace, SystemTap, a DTrace port, and most recently eBPF. This talk opens with a quick recap of the Wait Interface and session counters before introducing the audience to kernel profiling. It then presents a few examples of where profiling helped, including stack tracing and heat maps.