Restructure and update documentation (#1029)
Binary files changed:
- .assets/images/AggregatedView.png (new file, 64 KiB)
- .assets/images/AsyncProfiler.png (new file, 1.8 MiB)
- .assets/images/ProfilerSamplings.png (new file, 52 KiB)
- .assets/images/SortedSamplings.png (new file, 57 KiB)
- .assets/images/collapsed_example.png (new file, 362 KiB)
- .assets/images/comptask_feature.png (new file, 132 KiB)
- (file name not shown; 68 KiB before, 68 KiB after)
- .assets/images/flamegraph_example.png (new file, 475 KiB)
- .assets/images/pcaddr_feature.png (new file, 271 KiB)
- .assets/images/treeview_example.png (new file, 166 KiB)
- .assets/images/vtable_feature.png (new file, 19 KiB)
648
README.md
@@ -1,4 +1,6 @@
|
||||
# async-profiler
|
||||

|
||||
|
||||
# About
|
||||
|
||||
This project is a low overhead sampling profiler for Java
|
||||
that does not suffer from the [safepoint bias problem](http://psy-lob-saw.blogspot.ru/2016/02/why-most-sampling-java-profilers-are.html).
|
||||
@@ -7,23 +9,25 @@ and to track memory allocations. The profiler works with
|
||||
OpenJDK and other Java runtimes based on the HotSpot JVM.
|
||||
|
||||
async-profiler can trace the following kinds of events:
|
||||
- CPU cycles
|
||||
- Hardware and Software performance counters like cache misses, branch misses, page faults, context switches etc.
|
||||
- Allocations in Java Heap
|
||||
- Contended lock attempts, including both Java object monitors and ReentrantLocks
|
||||
|
||||
- CPU cycles
|
||||
- Hardware and Software performance counters like cache misses, branch misses, page faults, context switches etc.
|
||||
- Allocations in Java Heap
|
||||
- Contended lock attempts, including both Java object monitors and ReentrantLocks
|
||||
and [more](https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilingModes.md).
|
||||
|
||||
See our [3-hour playlist](https://www.youtube.com/playlist?list=PLNCLTEx3B8h4Yo_WvKWdLvI9mj1XpTKBr)
|
||||
to learn about more features.
|
||||
to learn about more features.
|
||||
|
||||
## Download
|
||||
# Download
|
||||
|
||||
Current release (3.0):
|
||||
|
||||
- Linux x64: [async-profiler-3.0-linux-x64.tar.gz](https://github.com/async-profiler/async-profiler/releases/download/v3.0/async-profiler-3.0-linux-x64.tar.gz)
|
||||
- Linux arm64: [async-profiler-3.0-linux-arm64.tar.gz](https://github.com/async-profiler/async-profiler/releases/download/v3.0/async-profiler-3.0-linux-arm64.tar.gz)
|
||||
- macOS x64/arm64: [async-profiler-3.0-macos.zip](https://github.com/async-profiler/async-profiler/releases/download/v3.0/async-profiler-3.0-macos.zip)
|
||||
- Converters between profile formats: [converter.jar](https://github.com/async-profiler/async-profiler/releases/download/v3.0/converter.jar)
|
||||
(JFR to Flame Graph, JFR to pprof, collapsed stacks to Flame Graph)
|
||||
- Linux x64: [async-profiler-3.0-linux-x64.tar.gz](https://github.com/async-profiler/async-profiler/releases/download/v3.0/async-profiler-3.0-linux-x64.tar.gz)
|
||||
- Linux arm64: [async-profiler-3.0-linux-arm64.tar.gz](https://github.com/async-profiler/async-profiler/releases/download/v3.0/async-profiler-3.0-linux-arm64.tar.gz)
|
||||
- macOS x64/arm64: [async-profiler-3.0-macos.zip](https://github.com/async-profiler/async-profiler/releases/download/v3.0/async-profiler-3.0-macos.zip)
|
||||
- Converters between profile formats: [converter.jar](https://github.com/async-profiler/async-profiler/releases/download/v3.0/converter.jar)
|
||||
(JFR to Flame Graph, JFR to pprof, collapsed stacks to Flame Graph)
|
||||
|
||||
[Previous releases](https://github.com/async-profiler/async-profiler/releases)
|
||||
|
||||
@@ -32,610 +36,62 @@ For more information refer to [IntelliJ IDEA documentation](https://www.jetbrain
|
||||
|
||||
[Nightly releases](https://github.com/async-profiler/async-profiler/releases/tag/nightly) (published on each commit to master)
|
||||
|
||||
For the build corresponding to a previous commit, go to the corresponding `Publish Nightly Builds` GitHub Action and scroll down to the artifacts section. These binaries are kept for 30 days.
|
||||
For the build corresponding to a previous commit, go to
|
||||
[Nightly Builds](https://github.com/async-profiler/async-profiler/actions/workflows/test-and-publish-nightly.yml),
|
||||
click the desired build and scroll down to the artifacts section. These binaries are kept for 30 days.
|
||||
|
||||
## Supported platforms
|
||||
# Supported platforms
|
||||
|
||||
| | Officially maintained builds | Other available ports |
|
||||
|-----------|------------------------------|-------------------------------------------|
|
||||
| **Linux** | x64, arm64 | x86, arm32, ppc64le, riscv64, loongarch64 |
|
||||
| **macOS** | x64, arm64 | |
|
||||
|
||||
## CPU profiling
|
||||
# Quick start
|
||||
|
||||
In this mode profiler collects stack trace samples that include **Java** methods,
|
||||
**native** calls, **JVM** code and **kernel** functions.
|
||||
|
||||
The general approach is receiving call stacks generated by `perf_events`
|
||||
and matching them up with call stacks generated by `AsyncGetCallTrace`,
|
||||
in order to produce an accurate profile of both Java and native code.
|
||||
Additionally, async-profiler provides a workaround to recover stack traces
|
||||
in some [corner cases](https://bugs.openjdk.java.net/browse/JDK-8178287)
|
||||
where `AsyncGetCallTrace` fails.
|
||||
|
||||
This approach has the following advantages compared to using `perf_events`
|
||||
directly with a Java agent that translates addresses to Java method names:
|
||||
|
||||
* Does not require `-XX:+PreserveFramePointer`, which introduces
|
||||
performance overhead that can sometimes be as high as 10%.
|
||||
|
||||
* Does not require generating a map file for translating Java code addresses
|
||||
to method names.
|
||||
|
||||
* Displays interpreter frames.
|
||||
|
||||
* Does not produce large intermediate files (perf.data) for further processing in
|
||||
user space scripts.
|
||||
|
||||
If you wish to resolve frames within `libjvm`, the [debug symbols](#installing-debug-symbols) are required.
|
||||
|
||||
## ALLOCATION profiling
|
||||
|
||||
The profiler can be configured to collect call sites where the largest amount
|
||||
of heap memory is allocated.
|
||||
|
||||
async-profiler does not use intrusive techniques like bytecode instrumentation
|
||||
or expensive DTrace probes which have significant performance impact.
|
||||
It also does not affect Escape Analysis or prevent JIT optimizations
|
||||
like allocation elimination. Only actual heap allocations are measured.
|
||||
|
||||
The profiler features TLAB-driven sampling. It relies on HotSpot-specific
|
||||
callbacks to receive two kinds of notifications:
|
||||
- when an object is allocated in a newly created TLAB;
|
||||
- when an object is allocated on a slow path outside TLAB.
|
||||
|
||||
Sampling interval can be adjusted with `--alloc` option.
|
||||
For example, `--alloc 500k` will take one sample after 500 KB of allocated
|
||||
space on average. Prior to JDK 11, intervals less than TLAB size will not take effect.
|
||||
|
||||
### Installing Debug Symbols
|
||||
|
||||
Prior to JDK 11, the allocation profiler required HotSpot debug symbols.
|
||||
Some OpenJDK distributions (Amazon Corretto, Liberica JDK, Azul Zulu)
|
||||
already have them embedded in `libjvm.so`, other OpenJDK builds typically
|
||||
provide debug symbols in a separate package. For example, to install
|
||||
OpenJDK debug symbols on Debian / Ubuntu, run:
|
||||
In a typical use case, profiling a Java application is just a matter of running `asprof` with the PID of a
|
||||
running Java process.
|
||||
```
|
||||
# apt install openjdk-17-dbg
|
||||
$ asprof -d 30 -f /tmp/flamegraph.html <PID>
|
||||
```
|
||||
(replace `17` with the desired version of JDK).
|
||||
The above command runs the profiler for 30 seconds and then saves the results to `/tmp/flamegraph.html`
|
||||
as an interactive `Flame Graph` that can be viewed in a browser.
|
||||
|
||||
On CentOS, RHEL and some other RPM-based distributions, this could be done with
|
||||
[debuginfo-install](http://man7.org/linux/man-pages/man1/debuginfo-install.1.html) utility:
|
||||
```
|
||||
# debuginfo-install java-1.8.0-openjdk
|
||||
```
|
||||
[](https://htmlpreview.github.io/?https://github.com/async-profiler/async-profiler/blob/master/.assets/html/flamegraph.html)
|
||||
|
||||
On Gentoo the `icedtea` OpenJDK package can be built with the per-package setting
|
||||
`FEATURES="nostrip"` to retain symbols.
|
||||
# Documentation
|
||||
|
||||
The `gdb` tool can be used to verify if debug symbols are properly installed for the `libjvm` library.
|
||||
For example, on Linux:
|
||||
```
|
||||
$ gdb $JAVA_HOME/lib/server/libjvm.so -ex 'info address UseG1GC'
|
||||
```
|
||||
This command's output will either contain `Symbol "UseG1GC" is at 0xxxxx`
|
||||
or `No symbol "UseG1GC" in current context`.
|
||||
## Basic usage
|
||||
|
||||
## Wall-clock profiling
|
||||
* [Getting Started](https://github.com/async-profiler/async-profiler/blob/master/docs/GettingStarted.md)
|
||||
* [Profiler Options](https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilerOptions.md)
|
||||
* [Profiling Modes](https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilingModes.md)
|
||||
* [Integrating async-profiler](https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md)
|
||||
* [Profiling In Container](https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilingInContainer.md)
|
||||
|
||||
`-e wall` option tells async-profiler to sample all threads equally every given
|
||||
period of time regardless of thread status: Running, Sleeping or Blocked.
|
||||
For instance, this can be helpful when profiling application start-up time.
|
||||
## Profiler output
|
||||
|
||||
Wall-clock profiler is most useful in per-thread mode: `-t`.
|
||||
* [Output Formats](https://github.com/async-profiler/async-profiler/blob/master/docs/OutputFormats.md)
|
||||
* [FlameGraph Interpretation](https://github.com/async-profiler/async-profiler/blob/master/docs/FlamegraphInterpretation.md)
|
||||
* [JFR Visualization](https://github.com/async-profiler/async-profiler/blob/master/docs/JfrVisualization.md)
|
||||
* [Converter Usage](https://github.com/async-profiler/async-profiler/blob/master/docs/ConverterUsage.md)
|
||||
|
||||
Example: `asprof -e wall -t -i 5ms -f result.html 8983`
|
||||
## Advanced usage
|
||||
|
||||
## Java method profiling
|
||||
* [CPU Sampling Engines](https://github.com/async-profiler/async-profiler/blob/master/docs/CpuSamplingEngines.md)
|
||||
* [StackWalkingModes](https://github.com/async-profiler/async-profiler/blob/master/docs/StackWalkingModes.md)
|
||||
* [Advanced Stacktrace Features](https://github.com/async-profiler/async-profiler/blob/master/docs/AdvancedStacktraceFeatures.md)
|
||||
|
||||
`-e ClassName.methodName` option instruments the given Java method
|
||||
in order to record all invocations of this method with the stack traces.
|
||||
# Profiling Non-Java applications
|
||||
|
||||
Example: `-e java.util.Properties.getProperty` will profile all places
|
||||
where `getProperty` method is called from.
|
||||
Unlike traditional Java profilers, async-profiler monitors non-Java threads (e.g., GC threads)
|
||||
and also shows native frames in Java stack traces. This enables it to work with C and C++
|
||||
applications.
|
||||
|
||||
Only non-native Java methods are supported. To profile a native method,
|
||||
use a hardware breakpoint event instead, e.g. `-e Java_java_lang_Throwable_fillInStackTrace`.
|
||||
For more details, please refer to
|
||||
[Profiling Non-Java Applications](https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilingNonJavaApplications.md).
|
||||
|
||||
**Be aware** that if you attach async-profiler at runtime, the first instrumentation
|
||||
of a non-native Java method may cause the [deoptimization](https://github.com/openjdk/jdk/blob/bf2e9ee9d321ed289466b2410f12ad10504d01a2/src/hotspot/share/prims/jvmtiRedefineClasses.cpp#L4092-L4096)
|
||||
of all compiled methods. The subsequent instrumentation flushes only the _dependent code_.
|
||||
# Troubleshooting
|
||||
|
||||
The massive CodeCache flush doesn't occur if attaching async-profiler as an agent.
|
||||
|
||||
Here are some useful native methods that you may want to profile:
|
||||
* ```G1CollectedHeap::humongous_obj_allocate``` - trace _humongous allocations_ of the G1 GC,
|
||||
* ```JVM_StartThread``` - trace creation of new Java threads,
|
||||
* ```Java_java_lang_ClassLoader_defineClass1``` - trace class loading.
|
||||
|
||||
## Building
|
||||
|
||||
Build status: [](https://github.com/async-profiler/async-profiler/actions/workflows/test-and-publish-nightly.yml)
|
||||
|
||||
Make sure the `JAVA_HOME` environment variable points to your JDK installation,
|
||||
and then run `make`. GCC or Clang is required. After building, the profiler binaries
|
||||
will be in the `build` subdirectory.
|
||||
|
||||
## Basic Usage
|
||||
|
||||
As of Linux 4.6, capturing kernel call stacks using `perf_events` from a non-root
|
||||
process requires setting two runtime variables. You can set them using
|
||||
sysctl or as follows:
|
||||
|
||||
```
|
||||
# sysctl kernel.perf_event_paranoid=1
|
||||
# sysctl kernel.kptr_restrict=0
|
||||
```
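Before changing these settings, it can be useful to check whether the current values already allow kernel stack capture. The following is a minimal sketch, not part of async-profiler: the function name `perf_settings_ok` is hypothetical, and it assumes that any `perf_event_paranoid` value of 1 or lower, together with `kptr_restrict` equal to 0, is sufficient.

```sh
#!/bin/sh
# Hypothetical helper: succeeds when the two values permit kernel stack capture
# by a non-root process (assumes paranoid <= 1 and kptr_restrict = 0 suffice).
perf_settings_ok() {
    paranoid=$1
    kptr=$2
    [ "$paranoid" -le 1 ] && [ "$kptr" -eq 0 ]
}

# Read the live values where the files exist (Linux only);
# fall back to restrictive values otherwise.
paranoid=$(cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null || echo 2)
kptr=$(cat /proc/sys/kernel/kptr_restrict 2>/dev/null || echo 1)

if perf_settings_ok "$paranoid" "$kptr"; then
    echo "kernel stacks available to non-root users"
else
    echo "kernel stacks restricted (perf_event_paranoid=$paranoid, kptr_restrict=$kptr)"
fi
```

If the check reports a restriction, the two `sysctl` commands above (run as root) set the required values.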
|
||||
|
||||
async-profiler works in the context of the target Java application,
|
||||
i.e. it runs as an agent in the process being profiled.
|
||||
`asprof` is a tool to attach and control the agent.
|
||||
|
||||
A typical workflow would be to launch your Java application, attach
|
||||
the agent and start profiling, exercise your performance scenario, and
|
||||
then stop profiling. The agent's output, including the profiling results, will
|
||||
be displayed on the console where you've started `asprof`.
|
||||
|
||||
Example:
|
||||
|
||||
```
|
||||
$ jps
|
||||
9234 Jps
|
||||
8983 Computey
|
||||
$ asprof start 8983
|
||||
$ asprof stop 8983
|
||||
```
|
||||
|
||||
The following may be used in lieu of the `pid` (8983):
|
||||
|
||||
- The keyword `jps`, which will use the most recently launched Java process.
|
||||
- The application name as it appears in the `jps` output: e.g. `Computey`
|
||||
|
||||
Alternatively, you may specify `-d` (duration) argument to profile
|
||||
the application for a fixed period of time with a single command.
|
||||
|
||||
```
|
||||
$ asprof -d 30 8983
|
||||
```
|
||||
|
||||
By default, the profiling frequency is 100Hz (every 10ms of CPU time).
|
||||
Here is a sample of the output printed to the Java application's terminal:
|
||||
|
||||
```
|
||||
--- Execution profile ---
|
||||
Total samples: 687
|
||||
Unknown (native): 1 (0.15%)
|
||||
|
||||
--- 6790000000 (98.84%) ns, 679 samples
|
||||
[ 0] Primes.isPrime
|
||||
[ 1] Primes.primesThread
|
||||
[ 2] Primes.access$000
|
||||
[ 3] Primes$1.run
|
||||
[ 4] java.lang.Thread.run
|
||||
|
||||
... a lot of output omitted for brevity ...
|
||||
|
||||
ns percent samples top
|
||||
---------- ------- ------- ---
|
||||
6790000000 98.84% 679 Primes.isPrime
|
||||
40000000 0.58% 4 __do_softirq
|
||||
|
||||
... more output omitted ...
|
||||
```
|
||||
|
||||
This indicates that the hottest method was `Primes.isPrime`, and the hottest
|
||||
call stack leading to it comes from `Primes.primesThread`.
|
||||
|
||||
## Launching as an Agent
|
||||
|
||||
If you need to profile some code as soon as the JVM starts up, instead of using the `asprof`,
|
||||
it is possible to attach async-profiler as an agent on the command line. For example:
|
||||
|
||||
```
|
||||
$ java -agentpath:/path/to/libasyncProfiler.so=start,event=cpu,file=profile.html ...
|
||||
```
|
||||
|
||||
Agent library is configured through the JVMTI argument interface.
|
||||
The format of the arguments string is described
|
||||
[in the source code](https://github.com/async-profiler/async-profiler/blob/v3.0/src/arguments.cpp#L44).
|
||||
`asprof` actually converts command line arguments to that format.
|
||||
|
||||
For instance, `-e wall` is converted to `event=wall`, `-f profile.html`
|
||||
is converted to `file=profile.html`, and so on. However, some arguments are processed
|
||||
directly by `asprof`. E.g. `-d 5` results in 3 actions:
|
||||
attaching profiler agent with start command, sleeping for 5 seconds,
|
||||
and then attaching the agent again with stop command.
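The option-to-argument mapping described above can be sketched as a small shell function. This is an illustration only, not the actual `asprof` implementation: the name `to_agent_args` is hypothetical, and it handles just the `-e` and `-f` options mentioned in the text.

```sh
#!/bin/sh
# Hypothetical helper: translate a few asprof-style options into the
# comma-separated agent argument string. Only -e and -f are handled here.
to_agent_args() {
    args="start"
    while [ $# -gt 0 ]; do
        case "$1" in
            -e) args="$args,event=$2"; shift 2 ;;
            -f) args="$args,file=$2"; shift 2 ;;
            *)  echo "unsupported option: $1" >&2; return 1 ;;
        esac
    done
    echo "$args"
}

# Prints: start,event=wall,file=profile.html
to_agent_args -e wall -f profile.html
```

The resulting string has the same shape as the value passed after `libasyncProfiler.so=` in the `-agentpath` example.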
|
||||
|
||||
## Multiple events
|
||||
|
||||
It is possible to profile CPU, allocations, and locks at the same time.
|
||||
Instead of CPU, you may choose any other execution event: wall-clock,
|
||||
perf event, tracepoint, Java method, etc.
|
||||
|
||||
The only output format that supports multiple events together is JFR.
|
||||
The recording will contain the following event types:
|
||||
- `jdk.ExecutionSample`
|
||||
- `jdk.ObjectAllocationInNewTLAB` (alloc)
|
||||
- `jdk.ObjectAllocationOutsideTLAB` (alloc)
|
||||
- `jdk.JavaMonitorEnter` (lock)
|
||||
- `jdk.ThreadPark` (lock)
|
||||
|
||||
To start profiling cpu + allocations + locks together, specify
|
||||
```
|
||||
asprof -e cpu,alloc,lock -f profile.jfr ...
|
||||
```
|
||||
or use `--alloc` and `--lock` parameters with the desired threshold:
|
||||
```
|
||||
asprof -e cpu --alloc 2m --lock 10ms -f profile.jfr ...
|
||||
```
|
||||
The same, when starting profiler as an agent:
|
||||
```
|
||||
-agentpath:/path/to/libasyncProfiler.so=start,event=cpu,alloc=2m,lock=10ms,file=profile.jfr
|
||||
```
|
||||
|
||||
## Flame Graph visualization
|
||||
|
||||
async-profiler provides out-of-the-box [Flame Graph](https://github.com/BrendanGregg/FlameGraph) support.
|
||||
Specify `-o flamegraph` argument to dump profiling results as an interactive HTML Flame Graph.
|
||||
Also, Flame Graph output format will be chosen automatically if the target filename ends with `.html`.
|
||||
|
||||
```
|
||||
$ jps
|
||||
9234 Jps
|
||||
8983 Computey
|
||||
$ asprof -d 30 -f /tmp/flamegraph.html 8983
|
||||
```
|
||||
|
||||
[](https://htmlpreview.github.io/?https://github.com/async-profiler/async-profiler/blob/master/demo/flamegraph.html)
|
||||
|
||||
## Profiler Options
|
||||
|
||||
`asprof` command-line options.
|
||||
|
||||
* `start` - starts profiling in semi-automatic mode, i.e. profiler will run
|
||||
until `stop` command is explicitly called.
|
||||
|
||||
* `resume` - starts or resumes earlier profiling session that has been stopped.
|
||||
All the collected data remains valid. The profiling options are not preserved
|
||||
between sessions, and should be specified again.
|
||||
|
||||
* `stop` - stops profiling and prints the report.
|
||||
|
||||
* `dump` - dumps collected data without stopping the profiling session.
|
||||
|
||||
* `check` - checks if the specified profiling event is available.
|
||||
|
||||
* `status` - prints profiling status: whether profiler is active and
|
||||
for how long.
|
||||
|
||||
* `meminfo` - prints used memory statistics.
|
||||
|
||||
* `list` - shows the list of profiling events available for the target process
|
||||
(if PID is specified) or for the default JVM.
|
||||
|
||||
* `-d N` - the profiling duration, in seconds. If no `start`, `resume`, `stop`
|
||||
or `status` option is given, the profiler will run for the specified period
|
||||
of time and then automatically stop.
|
||||
Example: `asprof -d 30 8983`
|
||||
|
||||
* `-e event` - the profiling event: `cpu`, `alloc`, `lock`, `cache-misses` etc.
|
||||
Use `list` to see the complete list of available events.
|
||||
|
||||
In allocation profiling mode the top frame of every call trace is the class
|
||||
of the allocated object, and the counter is the heap pressure (the total size
|
||||
of allocated TLABs or objects outside TLAB).
|
||||
|
||||
In lock profiling mode the top frame is the class of lock/monitor, and
|
||||
the counter is number of nanoseconds it took to enter this lock/monitor.
|
||||
|
||||
Two special event types are supported on Linux: hardware breakpoints
|
||||
and kernel tracepoints:
|
||||
- `-e mem:<func>[:rwx]` sets read/write/exec breakpoint at function
|
||||
`<func>`. The format of `mem` event is the same as in `perf-record`.
|
||||
Execution breakpoints can be also specified by the function name,
|
||||
e.g. `-e malloc` will trace all calls of native `malloc` function.
|
||||
- `-e trace:<id>` sets a kernel tracepoint. It is possible to specify
|
||||
tracepoint symbolic name, e.g. `-e syscalls:sys_enter_open` will trace
|
||||
all `open` syscalls.
|
||||
|
||||
* `-i N` - sets the profiling interval in nanoseconds or in other units,
|
||||
if N is followed by `ms` (for milliseconds), `us` (for microseconds),
|
||||
or `s` (for seconds). Only CPU active time is counted. No samples
|
||||
are collected while CPU is idle. The default is 10000000 (10ms).
|
||||
Example: `asprof -i 500us 8983`
|
||||
|
||||
* `--alloc N` - allocation profiling interval in bytes or in other units,
|
||||
if N is followed by `k` (kilobytes), `m` (megabytes), or `g` (gigabytes).
|
||||
|
||||
* `--live` - retain allocation samples with live objects only
|
||||
(objects that have not been collected by the end of the profiling session).
|
||||
Useful for finding Java heap memory leaks.
|
||||
|
||||
* `--lock N` - lock profiling threshold in nanoseconds (or other units).
|
||||
In lock profiling mode, sample contended locks when total lock duration
|
||||
overflows the threshold.
|
||||
|
||||
* `-j N` - sets the maximum stack depth. The default is 2048.
|
||||
Example: `asprof -j 30 8983`
|
||||
|
||||
* `-t` - profile threads separately. Each stack trace will end with a frame
|
||||
that denotes a single thread.
|
||||
Example: `asprof -t 8983`
|
||||
|
||||
* `-s` - print simple class names instead of FQN.
|
||||
|
||||
* `-n` - normalize names of hidden classes / lambdas.
|
||||
|
||||
* `-g` - print method signatures.
|
||||
|
||||
* `-a` - annotate JIT compiled methods with `_[j]`, inlined methods with `_[i]`, interpreted methods with `_[0]` and C1 compiled methods with `_[1]`.
|
||||
|
||||
* `-l` - prepend library names to symbols, e.g. ``libjvm.so`JVM_DefineClassWithSource``.
|
||||
|
||||
* `-o fmt` - specifies what information to dump when profiling ends.
|
||||
`fmt` can be one of the following options:
|
||||
- `traces[=N]` - dump call traces (at most N samples);
|
||||
- `flat[=N]` - dump flat profile (top N hot methods);
|
||||
can be combined with `traces`, e.g. `traces=200,flat=200`
|
||||
- `jfr` - dump events in Java Flight Recorder format readable by Java Mission Control.
|
||||
This *does not* require JDK commercial features to be enabled.
|
||||
- `collapsed` - dump collapsed call traces in the format used by
|
||||
[FlameGraph](https://github.com/brendangregg/FlameGraph) script. This is
|
||||
a collection of call stacks, where each line is a semicolon separated list
|
||||
of frames followed by a counter.
|
||||
- `flamegraph` - produce Flame Graph in HTML format.
|
||||
- `tree` - produce Call Tree in HTML format.
|
||||
`--reverse` option will generate backtrace view.
|
||||
|
||||
* `--total` - count the total value of the collected metric instead of the number of samples,
|
||||
e.g. total allocation size.
|
||||
|
||||
* `--chunksize N`, `--chunktime N` - approximate size and time limits for a single JFR chunk.
|
||||
A new chunk will be started whenever either limit is reached.
|
||||
The default `chunksize` is 100MB, and the default `chunktime` is 1 hour.
|
||||
Example: `asprof -f profile.jfr --chunksize 100m --chunktime 1h 8983`
|
||||
|
||||
* `-I include`, `-X exclude` - filter stack traces by the given pattern(s).
|
||||
`-I` defines the name pattern that *must* be present in the stack traces,
|
||||
while `-X` is the pattern that *must not* occur in any of stack traces in the output.
|
||||
`-I` and `-X` options can be specified multiple times. A pattern may begin or end with
|
||||
a star `*` that denotes any (possibly empty) sequence of characters.
|
||||
Example: `asprof -I 'Primes.*' -I 'java/*' -X '*Unsafe.park*' 8983`
|
||||
|
||||
* `-L level` - log level: `debug`, `info`, `warn`, `error` or `none`.
|
||||
|
||||
* `-F features` - comma separated list of HotSpot-specific features
|
||||
to include in stack traces. Supported features are:
|
||||
- `vtable` - display targets of megamorphic virtual calls as an extra frame
|
||||
on top of `vtable stub` or `itable stub`.
|
||||
- `comptask` - display current compilation task (a Java method being compiled)
|
||||
in a JIT compiler stack trace.
|
||||
|
||||
* `--title TITLE`, `--minwidth PERCENT`, `--reverse` - FlameGraph parameters.
|
||||
Example: `asprof -f profile.html --title "Sample CPU profile" --minwidth 0.5 8983`
|
||||
|
||||
* `-f FILENAME` - the file name to dump the profile information to.
|
||||
`%p` in the file name is expanded to the PID of the target JVM;
|
||||
`%t` - to the timestamp;
|
||||
`%n{MAX}` - to the sequence number;
|
||||
`%{ENV}` - to the value of the given environment variable.
|
||||
Example: `asprof -o collapsed -f /tmp/traces-%t.txt 8983`
|
||||
|
||||
* `--loop TIME` - run profiler in a loop (continuous profiling).
|
||||
The argument is either a clock time (`hh:mm:ss`) or
|
||||
a loop duration in `s`econds, `m`inutes, `h`ours, or `d`ays.
|
||||
Make sure the filename includes a timestamp pattern, or the output
|
||||
will be overwritten on each iteration.
|
||||
Example: `asprof --loop 1h -f /var/log/profile-%t.jfr 8983`
|
||||
|
||||
* `--all-user` - include only user-mode events. This option is helpful when kernel profiling
|
||||
is restricted by `perf_event_paranoid` settings.
|
||||
|
||||
* `--sched` - group threads by Linux-specific scheduling policy: BATCH/IDLE/OTHER.
|
||||
|
||||
* `--cstack MODE` - how to walk native frames (C stack). Possible modes are
|
||||
`fp` (Frame Pointer), `dwarf` (DWARF unwind info),
|
||||
`lbr` (Last Branch Record, available on Haswell since Linux 4.1),
|
||||
`vm` (HotSpot VM Structs) and `no` (do not collect C stack).
|
||||
|
||||
By default, C stack is shown in cpu, ctimer, wall-clock and perf-events profiles.
|
||||
Java-level events like `alloc` and `lock` collect only Java stack.
|
||||
|
||||
* `--signal NUM` - use alternative signal for cpu or wall clock profiling.
|
||||
To change both signals, specify two numbers separated by a slash: `--signal SIGCPU/SIGWALL`.
|
||||
|
||||
* `--clock SOURCE` - clock source for JFR timestamps: `tsc` (default)
|
||||
or `monotonic` (equivalent to `CLOCK_MONOTONIC`).
|
||||
|
||||
* `--begin function`, `--end function` - automatically start/stop profiling
|
||||
when the specified native function is executed.
|
||||
|
||||
* `--ttsp` - time-to-safepoint profiling. An alias for
|
||||
`--begin SafepointSynchronize::begin --end RuntimeService::record_safepoint_synchronized`
|
||||
It is not a separate event type, but rather a constraint. Whatever event type
|
||||
you choose (e.g. `cpu` or `wall`), the profiler will work as usual, except that
|
||||
only events between the safepoint request and the start of the VM operation
|
||||
will be recorded.
|
||||
|
||||
* `--jfropts OPTIONS` - comma separated list of JFR recording options.
|
||||
Currently, the only available option is `mem` supported on Linux 3.17+.
|
||||
`mem` enables accumulating events in memory instead of flushing
|
||||
synchronously to a file.
|
||||
|
||||
* `--jfrsync CONFIG` - start Java Flight Recording with the given configuration
|
||||
synchronously with the profiler. The output .jfr file will include all regular
|
||||
JFR events, except that execution samples will be obtained from async-profiler.
|
||||
This option implies `-o jfr`.
|
||||
- `CONFIG` is a predefined JFR profile or a JFR configuration file (.jfc)
|
||||
or a list of JFR events started with `+`
|
||||
|
||||
Example: `asprof -e cpu --jfrsync profile -f combined.jfr 8983`
|
||||
|
||||
* `--fdtransfer` - runs a background process that provides access to perf_events
|
||||
to an unprivileged process. `--fdtransfer` is useful for profiling a process
|
||||
in a container (which lacks access to perf_events) from the host.
|
||||
See [Profiling Java in a container](#profiling-java-in-a-container).
|
||||
|
||||
* `-v`, `--version` - prints the version of profiler library. If PID is specified,
|
||||
gets the version of the library loaded into the given process.
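
The unit suffixes accepted by `-i N`, as described in the option list above, can be illustrated with a small conversion helper. This is a sketch for reasoning about interval values, not the parser used by `asprof`; the function name `interval_to_ns` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: convert an interval such as 500us or 10ms to nanoseconds.
# A bare number is taken as nanoseconds, as the -i description states.
interval_to_ns() {
    case "$1" in
        *ms) echo $(( ${1%ms} * 1000000 )) ;;
        *us) echo $(( ${1%us} * 1000 )) ;;
        *s)  echo $(( ${1%s} * 1000000000 )) ;;
        *)   echo "$1" ;;
    esac
}

# The documented default interval; prints 10000000
interval_to_ns 10ms
```

The `ms` and `us` patterns are tested before the plain `s` pattern so that `10ms` is not mistaken for a value in seconds.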
|
||||
|
||||
## Profiling Java in a container
|
||||
|
||||
It is possible to profile Java processes running in a Docker or LXC container
|
||||
both from within a container and from the host system.
|
||||
|
||||
When profiling from the host, `pid` should be the Java process ID in the host
|
||||
namespace. Use `ps aux | grep java` or `docker top <container>` to find
|
||||
the process ID.
|
||||
|
||||
async-profiler should be run from the host by a privileged user - it will
|
||||
automatically switch to the proper pid/mount namespace and change
|
||||
user credentials to match the target process. Also make sure that
|
||||
the target container can access `libasyncProfiler.so` by the same
|
||||
absolute path as on the host.
|
||||
|
||||
By default, a Docker container restricts access to the `perf_event_open`
|
||||
syscall. There are 3 alternatives to allow profiling in a container:
|
||||
1. You can modify the [seccomp profile](https://docs.docker.com/engine/security/seccomp/)
|
||||
or disable it altogether with `--security-opt seccomp=unconfined` option. In
|
||||
addition, `--cap-add SYS_ADMIN` may be required.
|
||||
2. You can use "fdtransfer": see the help for `--fdtransfer`.
|
||||
3. Last, you may fall back to `-e ctimer` profiling mode, see [Troubleshooting](#troubleshooting).
|
||||
|
||||
## Restrictions/Limitations
|
||||
|
||||
* macOS profiling is limited to user space code only.
|
||||
|
||||
* On most Linux systems, `perf_events` captures call stacks with a maximum depth
|
||||
of 127 frames. On recent Linux kernels, this can be configured using
|
||||
`sysctl kernel.perf_event_max_stack` or by writing to the
|
||||
`/proc/sys/kernel/perf_event_max_stack` file.
|
||||
|
||||
* The profiler allocates an 8 kB perf_event buffer for each thread of the target process.
|
||||
Make sure `/proc/sys/kernel/perf_event_mlock_kb` value is large enough
|
||||
(more than `8 * threads`) when running as an unprivileged user.
|
||||
Otherwise the message _"perf_event mmap failed: Operation not permitted"_
|
||||
will be printed, and no native stack traces will be collected.
|
||||
|
||||
* You will not see the non-Java frames _preceding_ the Java frames on the
|
||||
stack, unless `--cstack vm` is specified.
|
||||
For example, if `start_thread` called `JavaMain` and then your Java
|
||||
code started running, you will not see the first two frames in the resulting
|
||||
stack. On the other hand, you _will_ see non-Java frames (user and kernel)
|
||||
invoked by your Java code.
|
||||
|
||||
* No Java stacks will be collected if `-XX:MaxJavaStackTraceDepth` is zero
|
||||
or negative. The exception is `--cstack vm` mode, which does not take
|
||||
`MaxJavaStackTraceDepth` into account.
|
||||
|
||||
* A profiling interval that is too short may cause continuous interruption of heavy
|
||||
system calls like `clone()`, so that they never complete;
|
||||
see [#97](https://github.com/async-profiler/async-profiler/issues/97).
|
||||
The workaround is simply to increase the interval.
|
||||
|
||||
* When the agent is not loaded at JVM startup (by using the `-agentpath` option), it is
|
||||
highly recommended to use `-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints` JVM flags.
|
||||
Without those flags the profiler will still work correctly but results might be
|
||||
less accurate. For example, without `-XX:+DebugNonSafepoints` there is a high chance
|
||||
that simple inlined methods will not appear in the profile. When the agent is attached at runtime,
|
||||
`CompiledMethodLoad` JVMTI event enables debug info, but only for methods compiled after attaching.
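
The `perf_event_mlock_kb` restriction listed above comes down to simple arithmetic. The following is a minimal sketch of that check, assuming the 8 kB per-thread buffer stated in the list; the class name, the method names and the example values are hypothetical and are not part of async-profiler.

```java
final class PerfMlockCheck {
    // Per-thread perf_event buffer size in kB, as stated in the restriction above.
    static final int BUFFER_KB_PER_THREAD = 8;

    /** Buffer space in kB that the profiler needs for the given number of threads. */
    static long requiredMlockKb(int threads) {
        return (long) BUFFER_KB_PER_THREAD * threads;
    }

    /** The documented rule: the limit must be more than 8 * threads. */
    static boolean isLargeEnough(long mlockKb, int threads) {
        return mlockKb > requiredMlockKb(threads);
    }

    public static void main(String[] args) {
        int threads = 200;       // hypothetical thread count of the target JVM
        long configuredKb = 516; // example value read from /proc/sys/kernel/perf_event_mlock_kb
        System.out.println("required: more than " + requiredMlockKb(threads) + " kB");
        System.out.println("large enough: " + isLargeEnough(configuredKb, threads));
    }
}
```

With these example values the limit is too small, so an unprivileged profiler run would be expected to hit the `perf_event mmap failed` message described above.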
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
```
|
||||
Failed to change credentials to match the target process: Operation not permitted
|
||||
```
|
||||
Due to a limitation of the HotSpot Dynamic Attach mechanism, the profiler must be run
|
||||
by exactly the same user (and group) as the owner of the target JVM process.
|
||||
If profiler is run by a different user, it will try to automatically change
|
||||
current user and group. This will likely succeed for `root`, but not for
|
||||
other users, resulting in the above error.
|
||||
|
||||
```
|
||||
Could not start attach mechanism: No such file or directory
|
||||
```
|
||||
The profiler cannot establish communication with the target JVM through UNIX domain socket.
|
||||
|
||||
Usually this happens in one of the following cases:
|
||||
1. Attach socket `/tmp/.java_pidNNN` has been deleted. It is a common
|
||||
practice to clean `/tmp` automatically with some scheduled script.
|
||||
Configure the cleanup software to exclude `.java_pid*` files from deletion.
|
||||
How to check: run `lsof -p PID | grep java_pid`
|
||||
If it lists a socket file, but the file does not exist, then this is exactly
|
||||
the described problem.
|
||||
2. JVM is started with `-XX:+DisableAttachMechanism` option.
|
||||
3. `/tmp` directory of Java process is not physically the same directory
|
||||
as `/tmp` of your shell, because Java is running in a container or in
|
||||
`chroot` environment. `jattach` attempts to solve this automatically,
|
||||
but it might lack the required permissions to do so.
|
||||
Check `strace build/jattach PID properties`
|
||||
4. JVM is busy and cannot reach a safepoint. For instance,
|
||||
JVM is in the middle of long-running garbage collection.
|
||||
How to check: run `kill -3 PID`. Healthy JVM process should print
|
||||
a thread dump and heap info in its console.
|
||||
|
||||
```
|
||||
Target JVM failed to load libasyncProfiler.so
|
||||
```
|
||||
The connection with the target JVM has been established, but JVM is unable to load profiler shared library.
|
||||
Make sure the user of JVM process has permissions to access `libasyncProfiler.so` by exactly the same absolute path.
|
||||
For more information see [#78](https://github.com/async-profiler/async-profiler/issues/78).
|
||||
|
||||
```
|
||||
No access to perf events. Try --fdtransfer or --all-user option or 'sysctl kernel.perf_event_paranoid=1'
|
||||
```
|
||||
or
|
||||
```
|
||||
Perf events unavailable
|
||||
```
|
||||
`perf_event_open()` syscall has failed.
|
||||
|
||||
Typical reasons include:
|
||||
1. `/proc/sys/kernel/perf_event_paranoid` is set to restricted mode (>=2).
|
||||
2. seccomp disables `perf_event_open` API in a container.
|
||||
3. OS runs under a hypervisor that does not virtualize performance counters.
|
||||
4. perf_event_open API is not supported on this system, e.g. WSL.
|
||||
|
||||
For permissions-related reasons (such as 1 and 2), using `--fdtransfer` while running the profiler
|
||||
as a privileged user may solve the issue.
|
||||
|
||||
If changing the configuration is not possible, you may fall back to
|
||||
`-e ctimer` profiling mode. It is similar to `cpu` mode, but does not
|
||||
require perf_events support. As a drawback, there will be no kernel
|
||||
stack traces.
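
To check the first reason in the list above before starting a profiling session, the `perf_event_paranoid` value can be read directly. The sketch below is only an illustration: the class and method names are hypothetical, and it covers the `>=2` threshold from this section and nothing else (seccomp, hypervisor and WSL restrictions are not detected).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class PerfParanoidCheck {
    static final Path PARANOID = Path.of("/proc/sys/kernel/perf_event_paranoid");

    /** Restricted mode as described above: a value of 2 or higher. */
    static boolean isRestricted(int level) {
        return level >= 2;
    }

    public static void main(String[] args) throws IOException {
        if (!Files.isReadable(PARANOID)) {
            // Not a Linux system, or /proc is not accessible.
            System.out.println(PARANOID + " is not readable on this system");
            return;
        }
        int level = Integer.parseInt(Files.readString(PARANOID).trim());
        if (isRestricted(level)) {
            System.out.println("perf_event_paranoid=" + level
                    + ": consider --fdtransfer, --all-user or -e ctimer");
        } else {
            System.out.println("perf_event_paranoid=" + level + ": not in restricted mode");
        }
    }
}
```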
|
||||
|
||||
```
|
||||
No AllocTracer symbols found. Are JDK debug symbols installed?
|
||||
```
|
||||
The OpenJDK debug symbols are required for allocation profiling.
|
||||
See [Installing Debug Symbols](#installing-debug-symbols) for more details.
|
||||
If the error message persists after a successful installation of the debug symbols,
|
||||
it is possible that the JDK was upgraded when installing the debug symbols.
|
||||
In this case, profiling any Java process which had started prior to the installation
|
||||
will continue to display this message, since the process had loaded
|
||||
the older version of the JDK which lacked debug symbols.
|
||||
Restarting the affected Java processes should resolve the issue.
|
||||
|
||||
```
|
||||
VMStructs unavailable. Unsupported JVM?
|
||||
```
|
||||
JVM shared library does not export `gHotSpotVMStructs*` symbols -
|
||||
apparently this is not a HotSpot JVM. Sometimes the same message
|
||||
can be also caused by an incorrectly built JDK
|
||||
(see [#218](https://github.com/async-profiler/async-profiler/issues/218)).
|
||||
In these cases installing JDK debug symbols may solve the problem.
|
||||
|
||||
```
|
||||
Could not parse symbols from <libname.so>
|
||||
```
|
||||
Async-profiler was unable to parse non-Java function names because of
|
||||
the corrupted contents in `/proc/[pid]/maps`. The problem is known to
|
||||
occur in a container when running Ubuntu with Linux kernel 5.x.
|
||||
This is an OS bug, see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1843018.
|
||||
|
||||
```
|
||||
Could not open output file
|
||||
```
|
||||
Output file is written by the target JVM process, not by the profiler script.
|
||||
Make sure the path specified in `-f` option is correct and is accessible by the JVM.
|
||||
For known issues faced while running async-profiler and their detailed troubleshooting,
|
||||
please refer to the [Troubleshooting guide](https://github.com/async-profiler/async-profiler/blob/master/docs/Troubleshooting.md).
|
||||
|
||||
35
docs/AdvancedStackatraceFeatures.md
Normal file
@@ -0,0 +1,35 @@
|
||||
# Advanced Stacktrace Features
|
||||
|
||||
## Display JIT compilation task
|
||||
|
||||
Async-profiler samples JIT compiler threads as well as Java threads, and hence can show
|
||||
CPU percentage spent on JIT compilation. At the same time, Java methods are different:
|
||||
some take more resources to compile, others take less. Furthermore, there are cases when
|
||||
a bug in the C2 compiler causes a JIT thread to get stuck in an infinite loop consuming 100% CPU.
|
||||
Async-profiler can highlight which particular Java methods take most CPU time to compile.
|
||||
|
||||

|
||||
|
||||
The feature can be enabled with the option `-F comptask` (or its agent equivalent `features=comptask`).
|
||||
|
||||
## Display actual implementation in vtable
|
||||
|
||||
In some applications, a significant amount of CPU time is spent on dispatching megamorphic virtual/interface calls.
|
||||
async-profiler shows a pseudo-frame on top of v/itable stub with the actual type of object the virtual method is
|
||||
called on. This should make clear the proportion of different receivers for the particular call site.
|
||||
|
||||

|
||||
|
||||
The feature can be enabled with the option `-F vtable` (or its agent equivalent `features=vtable`).
|
||||
|
||||
## Display instruction addresses
|
||||
|
||||
Sometimes, for low-level performance analysis, it is important to know where exactly
|
||||
CPU time is spent inside a method. As an intermediate step to the instruction-level
|
||||
profiling, async-profiler provides an option to record PC address of the currently
|
||||
running method for each execution sample. In this case, each stack trace will include
|
||||
a synthetic frame with the address at the top of every stack trace.
|
||||
|
||||

|
||||
|
||||
The feature can be enabled with the option `-F pcaddr` (or its agent equivalent `features=pcaddr`).
|
||||
129
docs/ConverterUsage.md
Normal file
@@ -0,0 +1,129 @@
|
||||
# Converter usage & demo
|
||||
|
||||
async-profiler provides a converter utility to convert the profile output to other popular formats. async-profiler
|
||||
provides `jfrconv` as part of the compressed package which is found in the same location as the `asprof` binary. A
|
||||
standalone converter binary is also available [here](https://github.com/async-profiler/async-profiler/releases/download/v3.0/converter.jar).
|
||||
|
||||
## Supported conversions
|
||||
|
||||
* collapsed -> html, collapsed
|
||||
* html -> html, collapsed
|
||||
* jfr -> html, collapsed, pprof, pb.gz
|
||||
|
||||
## Usage
|
||||
|
||||
`jfrconv [options] <input> [<input>...] <output>`
|
||||
|
||||
Only one output format can be specified per conversion.
|
||||
|
||||
### Available arguments
|
||||
|
||||
```
|
||||
Conversion options:
|
||||
-o --output FORMAT, -o can be omitted if the output file extension unambiguously determines the format, e.g. profile.collapsed
|
||||
|
||||
FORMAT can be any of the following:
|
||||
# collapsed: This is a collection of call stacks, where each line is a semicolon separated
|
||||
list of frames followed by a counter. This is used by the FlameGraph script to
|
||||
generate the FlameGraph visualization of the profile data.
|
||||
|
||||
# flamegraph: Flamegraph is a hierarchical representation of call traces of the profiled
|
||||
software in a color coded format that helps to identify a particular resource
|
||||
usage like CPU and memory for the application.
|
||||
|
||||
# pprof: pprof is a profiling visualization and analysis tool from Google. More details on
|
||||
pprof on the official github page https://github.com/google/pprof.
|
||||
|
||||
# pb.gz: This is a compressed version of pprof output.
|
||||
|
||||
|
||||
JFR options:
|
||||
--cpu Generate only CPU profile during conversion
|
||||
--wall Generate only Wall clock profile during conversion
|
||||
--alloc Generate only Allocation profile during conversion
|
||||
--live Build allocation profile from live objects only during conversion
|
||||
--lock Generate only Lock contention profile during conversion
|
||||
-t --threads Split stack traces by threads
|
||||
-s --state LIST Filter thread states: runnable, sleeping, default. State name is case insensitive
|
||||
and can be abbreviated, e.g. -s r
|
||||
--classify Classify samples into predefined categories
|
||||
--total Accumulate total value (time, bytes, etc.) instead of samples
|
||||
--lines Show line numbers
|
||||
--bci Show bytecode indices
|
||||
--simple Simple class names instead of fully qualified names
|
||||
--norm Normalize names of hidden classes/lambdas, e.g. Original JFR transforms
|
||||
lambda names to something like pkg.ClassName$$Lambda+0x00007f8177090218/543846639
|
||||
which gets normalized to pkg.ClassName$$Lambda
|
||||
--dot Dotted class names, e.g. java.lang.String instead of java/lang/String
|
||||
--from TIME Start time in ms (absolute or relative)
|
||||
--to TIME End time in ms (absolute or relative)
|
||||
TIME can be:
|
||||
# an absolute timestamp specified in millis since epoch;
|
||||
# an absolute time in hh:mm:ss or yyyy-MM-dd'T'hh:mm:ss format;
|
||||
# a relative time from the beginning of recording;
|
||||
# a relative time from the end of recording (a negative number).
|
||||
|
||||
Flame Graph options:
|
||||
--title STRING Convert to Flame Graph with provided title
|
||||
--minwidth X Skip frames smaller than X%
|
||||
--grain X Coarsen Flame Graph to the given grain size
|
||||
--skip N Skip N bottom frames
|
||||
-r --reverse Reverse stack traces (icicle graph)
|
||||
-I --include REGEX Include only stacks with the specified frames, e.g. -I 'jdk\.GC.*' -I 'jdk\.Thread.*'
|
||||
-X --exclude REGEX Exclude stacks with the specified frames, e.g. -X 'jdk\.GC.*'
|
||||
--highlight REGEX Highlight frames matching the given pattern
|
||||
```
|
||||
|
||||
### Example usages with `jfrconv`
|
||||
|
||||
This section shows how to use the `jfrconv` binary, which is located in the same bin folder as the
|
||||
`asprof` binary.
|
||||
|
||||
The below command will generate a foo.html. If no output file is specified, it defaults to a
|
||||
Flame Graph output.
|
||||
|
||||
```
|
||||
jfrconv foo.jfr
|
||||
```
|
||||
|
||||
Profiling in JFR mode allows multi-mode profiling. So the command above will generate a Flame Graph
|
||||
output, however, for a multi-mode profile output with both `cpu` and `wall-clock` events, the
|
||||
Flame Graph will have an aggregation of both in the view. Such a view wouldn't make much sense and
|
||||
hence it is advisable to use JFR conversion filter options like `--cpu` to keep only one event type
|
||||
during a conversion.
|
||||
|
||||
```
|
||||
jfrconv --cpu foo.jfr -o foo.html
|
||||
```
|
||||
or
|
||||
```
|
||||
jfrconv --cpu foo.jfr
|
||||
```
|
||||
for HTML output as HTML is the default format for conversion from JFR.
|
||||
|
||||
In case the conversion output is a Flame Graph, it can be further formatted with the use of flags
|
||||
specified above under `Flame Graph options`. The below command(s) will add a title string named `Title`
|
||||
to the Flame Graph instead of the default `Flame Graph` title and also will reverse the graph view
|
||||
by reversing the stack traces.
|
||||
```
|
||||
jfrconv --cpu foo.jfr foo.html -r --title Title
|
||||
```
|
||||
or
|
||||
```
|
||||
jfrconv --cpu foo.jfr --reverse --title Title
|
||||
```
|
||||
|
||||
These are a few common use cases. Similarly, a JFR output can be converted to `collapsed`, `pprof` and
|
||||
`pb.gz` formats based on specific needs.
|
||||
|
||||
### Example usages with standalone converter
|
||||
|
||||
The usage with standalone converter jar provided in
|
||||
[Download](https://github.com/async-profiler/async-profiler/?tab=readme-ov-file#Download)
|
||||
section is very similar to `jfrconv`.
|
||||
|
||||
Below is an example usage:
|
||||
|
||||
`java -jar /path/to/standalone-converter-jar --cpu foo.jfr --reverse --title "Application CPU profile"`
|
||||
|
||||
The only difference lies in how the binary is used.
|
||||
56
docs/CpuSamplingEngines.md
Normal file
@@ -0,0 +1,56 @@
|
||||
# CPU Sampling Engines
|
||||
|
||||
Async-profiler has three options for CPU profiling: `-e cpu`, `-e itimer` and `-e ctimer`.
|
||||
|
||||
## cpu & itimer mode
|
||||
|
||||
Both cpu and itimer mode measure the CPU time spent by the running threads. For example,
|
||||
if an application uses 2 cpu cores, each with 30% utilization, and the sampling interval is
|
||||
10ms, then the profiler will collect about 2 * 0.3 * 100 = 60 samples per second.
|
||||
|
||||
In other words, 1 profiling sample means that one CPU was actively running for N nanoseconds,
|
||||
where N is the profiling interval.
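
The sample-rate arithmetic above can be written out as a small worked example. This is a sketch under the same assumptions as the text (every busy core has the same utilization); the class and method names are hypothetical.

```java
import java.util.Locale;

final class SampleRate {
    /**
     * Expected samples per second: each sample stands for one sampling interval
     * of CPU time, so a fully busy core yields 1000 / intervalMs samples per second.
     */
    static double expectedSamplesPerSecond(int busyCores, double utilization, double intervalMs) {
        double samplesPerCpuSecond = 1000.0 / intervalMs;
        return busyCores * utilization * samplesPerCpuSecond;
    }

    public static void main(String[] args) {
        // The example from the text: 2 cores at 30% utilization, 10 ms interval.
        double rate = expectedSamplesPerSecond(2, 0.3, 10);
        System.out.printf(Locale.ROOT, "about %.0f samples per second%n", rate);
    }
}
```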
|
||||
|
||||
- `itimer` mode is based on [setitimer(ITIMER_PROF)](https://man7.org/linux/man-pages/man2/setitimer.2.html)
|
||||
syscall, which ideally generates a signal every given interval of the CPU time consumed by the process.
|
||||
- `cpu` mode relies on [perf_events](https://man7.org/linux/man-pages/man2/perf_event_open.2.html).
|
||||
The idea is the same - to generate a signal every `N` nanoseconds of CPU time, which in this case
|
||||
is achieved by configuring PMU to generate an interrupt every `K` CPU cycles. `cpu` mode has a few additional features:
|
||||
- `perf_events` availability is now automatically checked by trying to create a dummy perf_event.
|
||||
- If kernel-space profiling using `perf_events` is not available (including when restricted by `perf_event_paranoid`
|
||||
setting or by `seccomp`), async-profiler transparently falls back to `ctimer` mode.
|
||||
- If `perf_events` are available, but kernel symbols are hidden (e.g., by `kptr_restrict` setting), async-profiler
|
||||
continues to use `perf_events`, emits a warning and does not show kernel stack traces.
|
||||
- To force using `perf_events` for user-space only profiling, specify `-e cpu-clock --all-user` instead of `-e cpu`.
|
||||
- `allkernel` option has been removed.
|
||||
- JFR recording now contains engine setting with the current profiling engine: `perf_events`, `ctimer`, `wall` etc.
|
||||
|
||||
|
||||
Ideally, both `itimer` and `cpu` should collect the same number of samples. Typically, the
|
||||
profiles indeed look very similar. However, in [some cases](https://github.com/golang/go/issues/14434)
|
||||
the `cpu` profile appears a bit more accurate, since the signal is delivered exactly to the thread
|
||||
that overflowed a hardware counter.
|
||||
|
||||
## ctimer mode
|
||||
|
||||
[perf_events](https://man7.org/linux/man-pages/man2/perf_event_open.2.html) are not always available,
|
||||
e.g., because of perf_event_paranoid settings or seccomp restrictions. Perf events are often disabled
|
||||
in containers. Furthermore, async-profiler opens one perf_event descriptor per thread, which can be
|
||||
problematic for an application with many threads running under a low
|
||||
[ulimit](https://ss64.com/bash/ulimit.html) for the number of open file descriptors.
|
||||
|
||||
`itimer` works fine in containers, but may suffer from inaccuracies caused by the following limitations:
|
||||
|
||||
- only one `itimer` signal can be delivered to a process at a time.
|
||||
- signals are not distributed evenly between running threads.
|
||||
- sampling resolution is limited by the size of [jiffies](https://man7.org/linux/man-pages/man7/time.7.html).
|
||||
|
||||
|
||||
`ctimer` aims to address these limitations of `perf_events` and `itimer`. `ctimer` relies on
|
||||
[timer_create](https://man7.org/linux/man-pages/man2/timer_create.2.html). It combines benefits of
|
||||
`-e cpu` and `-e itimer`, except that it does not allow collecting kernel stacks. `timer_create` is used
|
||||
in [Go profiler](https://felixge.de/2022/02/11/profiling-improvements-in-go-1.18/). Below are some of
|
||||
the benefits of `ctimer`:
|
||||
- Works in containers by default.
|
||||
- Does not suffer from `itimer` biases.
|
||||
- Does not consume file descriptors.
|
||||
79
docs/FlamegraphInterpretation.md
Normal file
@@ -0,0 +1,79 @@
|
||||
# FlameGraph interpretation
|
||||
|
||||
To interpret a flame graph, the best way forward is to understand how it is created. Sampling
|
||||
profiling results are a set of stack traces.
|
||||
|
||||
## Example application to profile
|
||||
Let's take the below example:
|
||||
```
|
||||
main() {
|
||||
// some business logic
|
||||
func3() {
|
||||
// some business logic
|
||||
func7();
|
||||
}
|
||||
|
||||
// some business logic
|
||||
func4();
|
||||
|
||||
// some business logic
|
||||
func1() {
|
||||
// some business logic
|
||||
func5();
|
||||
}
|
||||
|
||||
// some business logic
|
||||
func2() {
|
||||
// some business logic
|
||||
func6() {
|
||||
// some business logic
|
||||
func8(); // cpu intensive work here
|
||||
}
|
||||
}
}
|
||||
```
|
||||
|
||||
## Profiler sampling
|
||||
Profiling starts by taking samples x times per second. Whenever a sample is taken, the current call stack for it is saved. The diagram below shows the unsorted sampling view before the sorting and aggregation takes place.
|
||||
|
||||

|
||||
|
||||
Below are the sampling numbers:
|
||||
* `func3()->func7()`: 3 samples
|
||||
* `func4()`: 1 sample
|
||||
* `func1()->func5()`: 2 samples
|
||||
* `func2()->func8()`: 4 samples
|
||||
* `func2()->func6()`: 1 sample
|
||||
|
||||
## Sorting samples
|
||||
Samples are then sorted alphabetically at the base level, just after the root (or main method) of the application.
|
||||
|
||||

|
||||
|
||||
## Aggregated view
|
||||
For the aggregated view, the blocks for the same functions at each
|
||||
level of stack depth are stitched together to get the aggregated
|
||||
view of the flame graph.
|
||||

|
||||
|
||||
In this example, except func4() no other function actually consumes
|
||||
any resource at the base level of stack depth. func5(), func6(),
|
||||
func7() and func8() are the ones consuming resources, with func8()
|
||||
being a likely candidate for performance optimization.
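
The sorting and aggregation steps above can be sketched in a few lines of code. This is only an illustration of the idea, not async-profiler's implementation: the class and method names are hypothetical, and the stacks and counts are copied from the sampling numbers listed earlier, with `main` added as the root frame.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FlameGraphSketch {
    /** Merges identical stacks: "frame;frame;frame" -> number of samples with that stack. */
    static Map<String, Integer> collapse(List<List<String>> samples) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps the stacks sorted alphabetically
        for (List<String> stack : samples) {
            counts.merge(String.join(";", stack), 1, Integer::sum);
        }
        return counts;
    }

    /** Width of a flame graph block: all samples whose stack starts with the given prefix. */
    static int width(Map<String, Integer> collapsed, String prefix) {
        int total = 0;
        for (Map.Entry<String, Integer> entry : collapsed.entrySet()) {
            String stack = entry.getKey();
            if (stack.equals(prefix) || stack.startsWith(prefix + ";")) {
                total += entry.getValue();
            }
        }
        return total;
    }

    /** The samples from the example, in the order they are listed in the text. */
    static List<List<String>> exampleSamples() {
        List<List<String>> samples = new ArrayList<>();
        samples.addAll(Collections.nCopies(3, List.of("main", "func3", "func7")));
        samples.addAll(Collections.nCopies(1, List.of("main", "func4")));
        samples.addAll(Collections.nCopies(2, List.of("main", "func1", "func5")));
        samples.addAll(Collections.nCopies(4, List.of("main", "func2", "func8")));
        samples.addAll(Collections.nCopies(1, List.of("main", "func2", "func6")));
        return samples;
    }

    public static void main(String[] args) {
        Map<String, Integer> collapsed = collapse(exampleSamples());
        collapsed.forEach((stack, count) -> System.out.println(stack + " " + count));
        System.out.println("width of main;func2 = " + width(collapsed, "main;func2"));
    }
}
```

Each printed line has the same shape as the `collapsed` output format: a semicolon-separated list of frames followed by a counter. The width of a block is the sum of all samples that pass through it, which is why `func2` is the widest block at the base level even though it is never the top frame itself.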
|
||||
|
||||
CPU utilization is the most common use case for flame graphs, however
|
||||
there are other modes of profiling like allocation profiling to view
|
||||
heap utilization and wall-clock profiling to view latency.
|
||||
|
||||
[More on various modes of profiling](https://github.com/async-profiler/async-profiler/?tab=readme-ov-file#profiling-modes)
|
||||
|
||||
## Understanding FlameGraph colors
|
||||
The various colours in a FlameGraph output with their relation to
|
||||
underlying code for a Java application:
|
||||
|
||||
* <span style="color:green">green</span> : JIT
|
||||
* <span style="color:aqua">aqua</span> : inlined
|
||||
* <span style="color:yellow">yellow</span> : C++
|
||||
* <span style="color:orange">orange</span> : kernel
|
||||
* <span style="color:red">red</span> : native (user-level)
|
||||
|
||||
Please note the colours in the example diagrams above have no relation to the official FlameGraph colour palette.
|
||||
104
docs/GettingStarted.md
Normal file
@@ -0,0 +1,104 @@
|
||||
# Getting started guide
|
||||
|
||||
## Before we start profiling
|
||||
As of Linux 4.6, capturing kernel call stacks using `perf_events` from a non-root
|
||||
process requires setting two runtime variables. You can set them using
|
||||
`sysctl` as follows:
|
||||
|
||||
```
|
||||
# sysctl kernel.perf_event_paranoid=1
|
||||
# sysctl kernel.kptr_restrict=0
|
||||
```
|
||||
|
||||
## Find process to profile
|
||||
Common ways to find the target process include `jps` and `pgrep`. For example,
`pgrep -l java` lists the name and ID of every running process whose name matches `java`. The next
|
||||
section includes an example using `jps`.
|
||||
|
||||
## Start profiling
|
||||
async-profiler works in the context of the target Java application,
|
||||
i.e. it runs as an agent in the process being profiled.
|
||||
`asprof` is a tool to attach and control the agent.
|
||||
|
||||
A typical workflow would be to launch your Java application, attach
|
||||
the agent and start profiling, exercise your performance scenario, and
|
||||
then stop profiling. The agent's output, including the profiling results, will
|
||||
be displayed on the console where you've started `asprof`.
|
||||
|
||||
Example:
|
||||
|
||||
```
|
||||
$ jps
|
||||
9234 Jps
|
||||
8983 Computey
|
||||
$ asprof start 8983
|
||||
$ asprof stop 8983
|
||||
```
|
||||
|
||||
The following may be used in lieu of the `pid` (8983):
|
||||
|
||||
- The keyword `jps`, which will use the most recently launched Java process.
|
||||
- The application name as it appears in the `jps` output: e.g. `Computey`
|
||||
|
||||
Alternatively, you may specify `-d` (duration) argument to profile
|
||||
the application for a fixed period of time with a single command.
|
||||
|
||||
```
|
||||
$ asprof -d 30 8983
|
||||
```
|
||||
|
||||
By default, the profiling frequency is 100Hz (every 10ms of CPU time).
|
||||
Here is a sample of the output:
|
||||
|
||||
```
|
||||
--- Execution profile ---
|
||||
Total samples: 687
|
||||
Unknown (native): 1 (0.15%)
|
||||
|
||||
--- 6790000000 (98.84%) ns, 679 samples
|
||||
[ 0] Primes.isPrime
|
||||
[ 1] Primes.primesThread
|
||||
[ 2] Primes.access$000
|
||||
[ 3] Primes$1.run
|
||||
[ 4] java.lang.Thread.run
|
||||
|
||||
... a lot of output omitted for brevity ...
|
||||
|
||||
ns percent samples top
|
||||
---------- ------- ------- ---
|
||||
6790000000 98.84% 679 Primes.isPrime
|
||||
40000000 0.58% 4 __do_softirq
|
||||
|
||||
... more output omitted ...
|
||||
```
|
||||
|
||||
This indicates that the hottest method was `Primes.isPrime`, and the hottest
|
||||
call stack leading to it comes from `Primes.primesThread`.
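
The numbers in the sample output follow directly from the sampling interval. As a worked example, assuming the default interval of 10 ms (100 Hz) mentioned above, the sketch below reproduces the `ns` and percent columns from the sample counts; the class and method names are hypothetical.

```java
import java.util.Locale;

final class ProfileMath {
    /** CPU time represented by a number of samples taken at a fixed interval. */
    static long nanos(long samples, long intervalNs) {
        return samples * intervalNs;
    }

    /** Share of the given samples among all collected samples, in percent. */
    static double percent(long samples, long totalSamples) {
        return 100.0 * samples / totalSamples;
    }

    public static void main(String[] args) {
        long intervalNs = 10_000_000L; // 10 ms, i.e. 100 Hz
        long total = 687;              // "Total samples" from the output above
        // Prints: 6790000000 ns (98.84%), 679 samples
        System.out.printf(Locale.ROOT, "%d ns (%.2f%%), %d samples%n",
                nanos(679, intervalNs), percent(679, total), 679);
    }
}
```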
|
||||
|
||||
## Other use cases
|
||||
|
||||
* [Launching as an agent](https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md#launching-as-an-agent)
|
||||
* [Java API](https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md#using-java-api)
|
||||
* [IntelliJ IDEA](https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md#intellij-idea)
|
||||
|
||||
## FlameGraph visualization
|
||||
|
||||
async-profiler provides out-of-the-box [Flame Graph](https://github.com/BrendanGregg/FlameGraph) support.
|
||||
Specify `-o flamegraph` argument to dump profiling results as an interactive HTML Flame Graph.
|
||||
Also, Flame Graph output format will be chosen automatically if the target filename ends with `.html`.
|
||||
|
||||
```
|
||||
$ jps
|
||||
9234 Jps
|
||||
8983 Computey
|
||||
$ asprof -d 30 -f /tmp/flamegraph.html 8983
|
||||
```
|
||||
|
||||
[](https://htmlpreview.github.io/?https://github.com/async-profiler/async-profiler/blob/master/.assets/html/flamegraph.html)
|
||||
|
||||
The flame graph HTML file can be opened in any browser of your choice for further interpretation.
|
||||
|
||||
Please refer to
|
||||
[Interpreting a Flame Graph](https://github.com/async-profiler/async-profiler/blob/master/docs/FlamegraphInterpretation.md)
|
||||
to understand more on how to interpret a Flame Graph.
|
||||
50
docs/IntegratingAsyncProfiler.md
Normal file
@@ -0,0 +1,50 @@
|
||||
# Integrating async-profiler
|
||||
|
||||
## Launching as an agent
|
||||
|
||||
If you need to profile some code as soon as the JVM starts up, instead of using `asprof`,
|
||||
it is possible to attach async-profiler as an agent on the command line. For example:
|
||||
|
||||
```
|
||||
$ java -agentpath:/path/to/libasyncProfiler.so=start,event=cpu,file=profile.html ...
|
||||
```
|
||||
|
||||
Agent library is configured through the JVMTI argument interface.
|
||||
The format of the arguments string is described
|
||||
[in the source code](https://github.com/async-profiler/async-profiler/blob/v3.0/src/arguments.cpp#L44).
|
||||
`asprof` actually converts command line arguments to that format.
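
To make the shape of the arguments string concrete, here is a minimal sketch that assembles the `-agentpath` option from the example above. The helper class and its methods are hypothetical; only the resulting string format (an action followed by comma-separated `key=value` pairs) is taken from this section.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class AgentArgs {
    /** Joins an action and its options into "action,key=value,key=value". */
    static String build(String action, Map<String, String> options) {
        StringJoiner joiner = new StringJoiner(",");
        joiner.add(action);
        options.forEach((key, value) -> joiner.add(key + "=" + value));
        return joiner.toString();
    }

    /** Produces the JVM option that loads the agent library with the given arguments. */
    static String agentPathOption(String libraryPath, String agentArgs) {
        return "-agentpath:" + libraryPath + "=" + agentArgs;
    }

    /** Rebuilds the option shown in the example above. */
    static String example() {
        Map<String, String> options = new LinkedHashMap<>(); // keeps insertion order
        options.put("event", "cpu");
        options.put("file", "profile.html");
        return agentPathOption("/path/to/libasyncProfiler.so", build("start", options));
    }

    public static void main(String[] args) {
        System.out.println("java " + example() + " ...");
    }
}
```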
|
||||
|
||||
Another important use of attaching async-profiler as an agent is for continuous profiling.
|
||||
|
||||
## Using Java API
|
||||
The async-profiler Java API is published to Maven Central. As with any other dependency, include the
[dependency](https://mvnrepository.com/artifact/tools.profiler/async-profiler/latest)
in your build.
|
||||
|
||||
### Example usage with the API
|
||||
|
||||
```
|
||||
AsyncProfiler profiler = AsyncProfiler.getInstance();
|
||||
```
|
||||
|
||||
The above gives us an instance of `AsyncProfiler` object which can be further used to start
|
||||
actual profiling.
|
||||
|
||||
```
|
||||
profiler.execute("start,jfr,event=cpu,file=/path/to/%p.jfr");
|
||||
// do some meaningful work
|
||||
profiler.execute("stop");
|
||||
```
|
||||
|
||||
`%p` equates to the PID of the process. There are other options as well for filename which
|
||||
can be found in [Profiler Options](https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilerOptions.md).
|
||||
`file` should be specified only once, either in
|
||||
`start` command with `jfr` output or in `stop` command with any other format.
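
As an illustration of what the `%p` placeholder means for the resulting file name, the sketch below performs the same substitution in plain Java. It is not the profiler's own implementation; the class and method names are hypothetical, and the profiler does this substitution itself when it receives the `file` argument.

```java
final class FileNamePattern {
    /** Replaces every %p in the pattern with the given process ID. */
    static String expandPid(String pattern, long pid) {
        return pattern.replace("%p", Long.toString(pid));
    }

    public static void main(String[] args) {
        long pid = ProcessHandle.current().pid(); // PID of the current JVM
        System.out.println(expandPid("/path/to/%p.jfr", pid));
    }
}
```

A per-process file name like this avoids several JVMs started with the same arguments overwriting each other's output.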
|
||||
|
||||
## IntelliJ IDEA
|
||||
|
||||
IntelliJ IDEA comes bundled with async-profiler, which can be further configured to your needs
by selecting the `Java Profiler` menu option at `Settings/Preferences > Build, Execution, Deployment`.
Agent options can be modified for specific use cases, and `Collect native calls` can be checked
to monitor non-Java threads and native frames in Java stack traces.
|
||||
34
docs/JfrVisualization.md
Normal file
@@ -0,0 +1,34 @@
|
||||
# JFR Visualization
|
||||
|
||||
JFR output produced by async-profiler can be viewed using multiple options which are explained
|
||||
below:
|
||||
|
||||
## Built-in converter
|
||||
|
||||
async-profiler provides a built-in converter which can be used to convert `jfr` output to
|
||||
readable formats like `FlameGraph` visualization. More details on the built-in converter usage
|
||||
can be found [here](https://github.com/async-profiler/async-profiler/blob/master/docs/ConverterUsage.md).
|
||||
|
||||
## JMC
|
||||
|
||||
Java Mission Control or `jmc` is part of the OpenJDK distribution. It is a GUI tool where a `jfr`
|
||||
output from async-profiler can be fed to visualize the profiled events in a human-readable format.
|
||||
It has various sections which help developers to see various resource usages. The
|
||||
`Analyze a Flight Recording Using JMC` section in the
|
||||
[official user guide](https://docs.oracle.com/en/java/java-components/jdk-mission-control/9/user-guide/using-jdk-flight-recorder.html)
|
||||
provides details on how a `jfr` output can be interpreted using `JMC`.
|
||||
|
||||
## IntelliJ IDEA
|
||||
|
||||
An open-source profiler
|
||||
[plugin](https://plugins.jetbrains.com/plugin/20937-java-jfr-profiler) for JDK 11+ allows you to
|
||||
profile your Java application with JFR and async-profiler and view the results in IntelliJ IDEA,
|
||||
as well as opening JFR files.
|
||||
|
||||
## JFR command line
|
||||
|
||||
`jfr` provides a command line option to filter, summarize and output flight recording files
|
||||
into human-readable format. The
|
||||
[official documentation](https://docs.oracle.com/en/java/javase/21/docs/specs/man/jfr.html)
|
||||
provides complete information on how to manipulate the contents and translate them as per
|
||||
developers' needs to debug performance issues with their Java applications.
|
||||
50
docs/OutputFormats.md
Normal file
@@ -0,0 +1,50 @@
|
||||
# Output Formats
|
||||
|
||||
async-profiler currently supports the below output formats:
|
||||
* `collapsed` - This is a collection of call stacks, where each line is a semicolon separated list of frames followed
|
||||
by a counter. This is used by the FlameGraph script to generate the FlameGraph visualization of the profile data.
|
||||

|
||||
|
||||
|
||||
* `flamegraph` - FlameGraph is a hierarchical representation of call traces of the profiled software in a color coded
|
||||
format that helps to identify a particular resource usage like CPU and memory for the application.
|
||||

|
||||
|
||||
|
||||
* `tree` - Profile output generated in a html format showing a tree view of resource usage beginning with the call stack
|
||||
with the highest resource usage and then showing other call stacks in descending order of resource usage. Expanding a
|
||||
parent frame follows the same hierarchical representation within that frame.
|
||||

|
||||
|
||||
* `text` - If no output format is specified with `-o` and filename has no extension provided, profiled output is
|
||||
generated in text format.
|
||||
```
|
||||
--- Execution profile ---
|
||||
Total samples : 733
|
||||
|
||||
--- 8208 bytes (19.58%), 1 sample
|
||||
[ 0] byte[]
|
||||
[ 1] java.util.jar.Manifest$FastInputStream.<init>
|
||||
[ 2] java.util.jar.Manifest$FastInputStream.<init>
|
||||
[ 3] java.util.jar.Manifest.read
|
||||
[ 4] java.util.jar.Manifest.<init>
|
||||
[ 5] java.util.jar.Manifest.<init>
|
||||
[ 6] java.util.jar.JarFile.getManifestFromReference
|
||||
[ 7] java.util.jar.JarFile.getManifest
|
||||
[ 8] jdk.internal.loader.URLClassPath$JarLoader$2.getManifest
|
||||
[ 9] jdk.internal.loader.BuiltinClassLoader.defineClass
|
||||
[10] jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull
|
||||
[11] jdk.internal.loader.BuiltinClassLoader.loadClassOrNull
|
||||
[12] jdk.internal.loader.BuiltinClassLoader.loadClass
|
||||
[13] jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass
|
||||
[14] java.lang.ClassLoader.loadClass
|
||||
[15] java.lang.Class.forName0
|
||||
[16] java.lang.Class.forName
|
||||
[17] sun.launcher.LauncherHelper.loadMainClass
|
||||
[18] sun.launcher.LauncherHelper.checkAndLoadMain
|
||||
```
|
||||
|
||||
* `jfr` - Java Flight Recording (JFR) is a widely known tool for profiling Java applications. The `jfr` format collects data
|
||||
about the JVM as well as the Java application running on it. async-profiler can generate output in `jfr` format
|
||||
compatible with tools capable of viewing and analyzing `jfr` files. Java Mission Control(JMC) and Intellij IDEA are
|
||||
some of many options to visualize `jfr` files. More details [here](https://github.com/async-profiler/async-profiler/blob/master/JfrVisualization.md).
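As a rough illustration of how the output format can follow from the `-f` file name when `-o` is omitted, the sketch below maps a file name to a format. Only the "no extension means text" case is stated above; the extension mapping (`.html`, `.jfr`, `.collapsed`/`.folded`) and the helper name `guess_format` are assumptions made for this example, so check them against your async-profiler version.

```shell
# Hypothetical helper: guess the output format from the file name passed to -f.
# The extension list is an assumption for illustration, not taken from the text above.
guess_format() {
    case "$1" in
        *.html)               echo flamegraph ;;
        *.jfr)                echo jfr ;;
        *.collapsed|*.folded) echo collapsed ;;
        *)                    echo text ;;
    esac
}

guess_format profile.html
guess_format profile.jfr
guess_format /tmp/traces
```

Passing `-o` explicitly avoids relying on any such mapping.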
|
||||
104
docs/ProfilerOptions.md
Normal file
@@ -0,0 +1,104 @@
|
||||
# Profiler options
|
||||
|
||||
The tables below list the profiler options available with `asprof` and when
[launching as an agent](https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md#launching-as-an-agent).
Some tables are output-specific: their options apply only to certain output formats.
|
||||
|
||||
```
|
||||
Usage: asprof [action] [options] [PID]
|
||||
```
|
||||
|
||||
## Actions
|
||||
|
||||
The following `action`s are common to both the `asprof` binary and launching as an agent.
|
||||
|
||||
| Option | Description |
|
||||
|-----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| `start` | Starts profiling in semi-automatic mode, i.e. profiler will run until `stop` command is explicitly called. |
|
||||
| `resume` | Starts or resumes earlier profiling session that has been stopped. All the collected data remains valid. The profiling options are not preserved between sessions, and should be specified again. |
|
||||
| `stop` | Stops profiling and prints the report. |
|
||||
| `dump` | Dump collected data without stopping profiling session. |
|
||||
| `check` | Check if the specified profiling event is available. |
|
||||
| `status` | Prints profiling status: whether profiler is active and for how long. |
|
||||
| `meminfo` | Prints used memory statistics. |
|
||||
| `list` | Show the list of profiling events available for the target process specified with PID. |
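The actions above are typically combined into a start / status / stop sequence. The sketch below only prints the commands (a dry run), so it can be read and executed without a target JVM. The PID `8983` and the file names are placeholders, and the `run` helper is hypothetical.

```shell
# Dry-run wrapper (hypothetical): prints the command; set ASPROF_RUN=1 to execute it.
run() {
    if [ "${ASPROF_RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

PID=8983    # placeholder PID of the target JVM

run asprof list "$PID"                     # which events are available?
run asprof start -e cpu "$PID"             # profile until stopped explicitly
run asprof status "$PID"                   # is the profiler active, and for how long?
run asprof dump -f snapshot.html "$PID"    # intermediate results, session keeps running
run asprof stop -f profile.html "$PID"     # stop and write the report
```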
|
||||
|
||||
## Options applicable to any output format
|
||||
|
||||
| asprof | Launch as agent | Description |
|
||||
|--------------------|-------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| `-o fmt` | `fmt` | Specifies what information to dump when profiling ends. For various dump option details, please refer to [Dump Option Appendix](#dump-option). |
|
||||
| `-d N`             | N/A               | asprof-only option designed for interactive use. It is a shortcut for running 3 actions: start, sleep for N seconds, stop. If no `start`, `resume`, `stop` or `status` option is given, the profiler will run for the specified period of time and then automatically stop.<br/>Example: `asprof -d 30 <pid>` |
|
||||
| `--timeout N` | `timeout=N` | The profiling duration, in seconds. The profiler will run for the specified period of time and then automatically stop.<br/>Example: `java -agentpath:/path/to/libasyncProfiler.so=start,event=cpu,timeout=30,file=profile.html <application>` |
|
||||
| `-e --event EVENT` | `event=EVENT` | The profiling event: `cpu`, `alloc`, `lock`, `cache-misses` etc. Use `list` to see the complete list of available events.</br>Please refer to [Special Event Types](https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilingModes.md#special-event-types-supported-on-linux) for additional information. |
|
||||
| `-i --interval N` | `interval=N` | Interval has different meaning depending on the event. For CPU profiling, it's CPU time. In wall clock mode, it's wall clock time. For Java method profiling or native function profiling, it's number of calls. For PMU profiling, it's number of events.<br/>Example: `asprof -e cpu -i 500us 8983` |
|
||||
| `--alloc N` | `alloc=N` | Allocation profiling interval in bytes or in other units, if N is followed by `k` (kilobytes), `m` (megabytes), or `g` (gigabytes). |
|
||||
| `--live` | `live` | Retain allocation samples with live objects only (object that have not been collected by the end of profiling session). Useful for finding Java heap memory leaks. |
|
||||
| `--lock DURATION` | `lock=DURATION` | In lock profiling mode, sample contended locks when total lock duration overflows the threshold |
|
||||
| `-j N` | `jstackdepth=N` | Sets the maximum stack depth. The default is 2048.</br>Example: `asprof -j 30 8983` |
|
||||
| `-I PATTERN` | `include=PATTERN` | Filter stack traces by the given pattern(s). `-I` defines the name pattern that *must* be present in the stack traces. `-I` can be specified multiple times. A pattern may begin or end with a star `*` that denotes any (possibly empty) sequence of characters.</br>Example: `asprof -I 'Primes.*' -I 'java/*' 8983` |
|
||||
| `-X PATTERN` | `exclude=PATTERN` | Filter stack traces by the given pattern(s). `-X` defines the name pattern that *must not* occur in any of stack traces in the output. `-X` can be specified multiple times. A pattern may begin or end with a star `*` that denotes any (possibly empty) sequence of characters.</br>Example: `asprof -X '*Unsafe.park*' 8983` |
|
||||
| `-L level` | `loglevel=level` | Log level: `debug`, `info`, `warn`, `error` or `none`. |
|
||||
| `-F features`      | `features=LIST`   | Comma-separated list of HotSpot-specific features to include in stack traces (when launching as an agent, a different delimiter is used, because `,` already separates options). Supported features are:<ul><li>`vtable` - display targets of megamorphic virtual calls as an extra frame on top of `vtable stub` or `itable stub`.</li><li>`comptask` - display current compilation task (a Java method being compiled) in a JIT compiler stack trace.</li><li>`pcaddr` - display instruction addresses.</li></ul>More details [here](https://github.com/async-profiler/async-profiler/blob/master/docs/AdvancedStacktraceFeatures.md). |
|
||||
| `-f FILENAME` | `file` | The file name to dump the profile information to. </br>`%p` in the file name is expanded to the PID of the target JVM;</br>`%t` - to the timestamp;</br>`%n{MAX}` - to the sequence number;</br>`%{ENV}` - to the value of the given environment variable.</br>Example: `asprof -o collapsed -f /tmp/traces-%t.txt 8983` |
|
||||
| `--loop TIME` | `loop=TIME` | Run profiler in a loop (continuous profiling). The argument is either a clock time (`hh:mm:ss`) or a loop duration in `s`econds, `m`inutes, `h`ours, or `d`ays. Make sure the filename includes a timestamp pattern, or the output will be overwritten on each iteration.</br>Example: `asprof --loop 1h -f /var/log/profile-%t.jfr 8983` |
|
||||
| `--all-user` | `alluser` | Include only user-mode events. This option is helpful when kernel profiling is restricted by `perf_event_paranoid` settings. |
|
||||
| `--sched` | `sched` | Group threads by Linux-specific scheduling policy: BATCH/IDLE/OTHER. |
|
||||
| `--cstack MODE` | `cstack=MODE` | How to walk native frames (C stack). Possible modes are `fp` (Frame Pointer), `dwarf` (DWARF unwind info), `lbr` (Last Branch Record, available on Haswell since Linux 4.1), `vm` (HotSpot VM Structs) and `no` (do not collect C stack).</br></br>By default, C stack is shown in cpu, ctimer, wall-clock and perf-events profiles. Java-level events like `alloc` and `lock` collect only Java stack. |
|
||||
| `--signal NUM` | `signal=NUM` | Use alternative signal for cpu or wall clock profiling. To change both signals, specify two numbers separated by a slash: `--signal SIGCPU/SIGWALL`. |
|
||||
| `--clock SOURCE` | `clock=SOURCE` | Clock source for JFR timestamps: `tsc` (default) or `monotonic` (equivalent for `CLOCK_MONOTONIC`). |
|
||||
| `--begin function` | `begin=FUNCTION` | Automatically start profiling when the specified native function is executed. |
|
||||
| `--end function` | `end=FUNCTION` | Automatically stop profiling when the specified native function is executed. |
|
||||
| `--ttsp` | `ttsp` | time-to-safepoint profiling. An alias for `--begin SafepointSynchronize::begin --end RuntimeService::record_safepoint_synchronized`.</br>It is not a separate event type, but rather a constraint. Whatever event type you choose (e.g. `cpu` or `wall`), the profiler will work as usual, except that only events between the safepoint request and the start of the VM operation will be recorded. |
|
||||
| `--libpath PATH` | `libpath=PATH` | Full path to libasyncProfiler.so in the container |
|
||||
| `--filter FILTER` | `filter=FILTER` | Filter threads with thread ids during wall-clock profiling mode |
|
||||
| `--fdtransfer`     | `fdtransfer`      | Runs a background process that provides access to perf_events to an unprivileged process. `--fdtransfer` is useful for profiling a process in a container (which lacks access to perf_events) from the host.<br/>See [Profiling Java in a container](https://github.com/async-profiler/async-profiler/blob/master/docs/ProfilingInContainer.md). |
|
||||
| `-v --version` | `version` | Prints the version of profiler library. If PID is specified, gets the version of the library loaded into the given process. |
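To illustrate how the two columns correspond, the following sketch assembles an agent argument string from the same settings that would be passed to `asprof`. The library path is the placeholder used in the table above, and the variable names are made up for this example.

```shell
# Settings expressed once (values are illustrative)
EVENT=cpu
INTERVAL=500us
FILE=profile.html
LIB=/path/to/libasyncProfiler.so    # placeholder path

# asprof form: options are separate command-line flags
ASPROF_ARGS="-e $EVENT -i $INTERVAL -f $FILE"

# Agent form: the same options as a comma-separated list after '='
AGENT_ARG="-agentpath:$LIB=start,event=$EVENT,interval=$INTERVAL,file=$FILE"

echo "asprof $ASPROF_ARGS <pid>"
echo "java $AGENT_ARG <application>"
```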
|
||||
|
||||
## Options applicable to JFR output only
|
||||
| asprof | Launch as agent | Description |
|
||||
|----------------------|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| `--chunksize N` | `chunksize=N` | Approximate size for a single JFR chunk. A new chunk will be started whenever specified size is reached. The default `chunksize` is 100MB.</br>Example: `asprof -f profile.jfr --chunksize 100m 8983` |
|
||||
| `--chunktime N` | `chunktime=N` | Approximate time limit for a single JFR chunk. A new chunk will be started whenever specified time limit is reached. The default `chunktime` is 1 hour.</br>Example: `asprof -f profile.jfr --chunktime 1h 8983` |
|
||||
| `--jfropts OPTIONS` | `jfropts=OPTIONS` | Comma separated list of JFR recording options. Currently, the only available option is `mem` supported on Linux 3.17+. `mem` enables accumulating events in memory instead of flushing synchronously to a file. |
|
||||
| `--jfrsync CONFIG` | `jfrsync[=CONFIG]` | Start Java Flight Recording with the given configuration synchronously with the profiler. The output .jfr file will include all regular JFR events, except that execution samples will be obtained from async-profiler. This option implies `-o jfr`.</br>`CONFIG` is a predefined JFR profile or a JFR configuration file (.jfc) or a list of JFR events started with `+`.</br></br>Example: `asprof -e cpu --jfrsync profile -f combined.jfr 8983` |
|
||||
|
||||
## Options applicable to FlameGraph and Tree view outputs only
|
||||
| asprof | Launch as agent | Description |
|
||||
|----------------------|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| `--title TITLE`      | `title=TITLE`      | FlameGraph parameter to set a user-specified title.<br/>Example: `asprof -f profile.html --title "Sample CPU profile" 8983` |
|
||||
| `--minwidth PERCENT` | `minwidth=PERCENT` | FlameGraph parameter to specify minimum width of frames.</br>Example: `asprof -f profile.html --minwidth 0.5 8983` |
|
||||
| `--reverse` | `reverse` | FlameGraph parameter to reverse the FlameGraph view.</br>Example: `asprof -f profile.html --reverse 8983` |
|
||||
|
||||
## Options applicable to any output format except JFR
|
||||
| asprof | Launch as agent | Description |
|
||||
|----------------|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| `-t --threads` | `threads` | Profile threads separately. Each stack trace will end with a frame that denotes a single thread.</br>Example: `asprof -t 8983` |
|
||||
| `-s --simple` | `simple` | Print simple class names instead of fully qualified names. |
|
||||
| `-n --norm` | `norm` | Normalize names of hidden classes / lambdas. |
|
||||
| `-g --sig` | `sig` | Print method signatures. |
|
||||
| `-l --lib` | `lib` | Prepend library names to symbols, e.g. ``libjvm.so`JVM_DefineClassWithSource``. |
|
||||
| `--total` | `total` | Count the total value of the collected metric instead of the number of samples, e.g. total allocation size. |
|
||||
| `-a --ann` | `ann` | Annotate JIT compiled methods with `_[j]`, inlined methods with `_[i]`, interpreted methods with `_[0]` and C1 compiled methods with `_[1]`. FlameGraph and Tree view will color frames depending on their type regardless of this option. |
|
||||
|
||||
## Appendix
|
||||
|
||||
### Dump Option
|
||||
|
||||
`-o fmt` - specifies what information to dump when profiling ends.
|
||||
`fmt` can be one of the following options:
|
||||
- `traces[=N]` - dump call traces (at most N samples);
|
||||
- `flat[=N]` - dump flat profile (top N hot methods);
|
||||
can be combined with `traces`, e.g. `traces=200,flat=200`
|
||||
- `jfr` - dump events in Java Flight Recorder format readable by Java Mission Control.
|
||||
This *does not* require JDK commercial features to be enabled.
|
||||
- `collapsed` - dump collapsed call traces in the format used by
|
||||
[FlameGraph](https://github.com/brendangregg/FlameGraph) script. This is
|
||||
a collection of call stacks, where each line is a semicolon separated list
|
||||
of frames followed by a counter.
|
||||
- `flamegraph` - produce Flame Graph in HTML format.
|
||||
- `tree` - produce Call Tree in HTML format.
|
||||
`--reverse` option will generate backtrace view.
|
||||
|
||||
It is possible to specify multiple dump options at the same time.
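Because each line of `collapsed` output is a semicolon-separated stack followed by a counter, it is easy to post-process with standard tools. The sketch below sums the counters per top (last) frame and prints the hottest frames first. The sample data and the `top_frames` helper are invented for this example.

```shell
# Invented sample of collapsed output: "frame1;frame2;...;frameN count"
collapsed='Main.main;Primes.run;Primes.isPrime 70
Main.main;Primes.run;java/util/ArrayList.add 20
Main.main;Logger.log;java/lang/String.format 10'

# Sum the counter per top frame and sort by the total, largest first
top_frames() {
    awk '{
        count = $NF                       # the counter is the last field
        stack = $0
        sub(/ [0-9]+$/, "", stack)        # drop the trailing counter
        n = split(stack, frames, ";")     # frames[n] is the top frame
        total[frames[n]] += count
    }
    END { for (f in total) print total[f], f }' | sort -rn
}

printf '%s\n' "$collapsed" | top_frames
```

Real output can be fed in the same way, for example from a file written with `-o collapsed`.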
|
||||
22
docs/ProfilingInContainer.md
Normal file
@@ -0,0 +1,22 @@
|
||||
# Profiling Java in a container
|
||||
|
||||
async-profiler provides the ability to profile Java processes running in a Docker or LXC
|
||||
container both from within a container and from the host system.
|
||||
|
||||
When profiling from the host, `pid` should be the Java process ID in the host
|
||||
namespace. Use `ps aux | grep java` or `docker top <container>` to find
|
||||
the process ID.
|
||||
|
||||
async-profiler should be run from the host by a privileged user - it will
|
||||
automatically switch to the proper pid/mount namespace and change
|
||||
user credentials to match the target process. Also make sure that
|
||||
the target container can access `libasyncProfiler.so` by the same
|
||||
absolute path as on the host.
|
||||
|
||||
By default, Docker container restricts the access to `perf_event_open`
|
||||
syscall. There are 3 alternatives to allow profiling in a container:
|
||||
1. You can modify the [seccomp profile](https://docs.docker.com/engine/security/seccomp/)
|
||||
or disable it altogether with `--security-opt seccomp=unconfined` option. In
|
||||
addition, `--cap-add SYS_ADMIN` may be required.
|
||||
2. You can use "fdtransfer": see the help for `--fdtransfer`.
|
||||
3. Last, you may fall back to `-e ctimer` profiling mode, see [Troubleshooting](#troubleshooting).
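The three alternatives above roughly correspond to the commands sketched below. The block only prints them, since they depend on your environment: the image name, the PID, the duration and the output file name are all placeholders chosen for this example.

```shell
PID=8983             # placeholder: Java process ID in the host namespace
IMAGE=my-java-app    # hypothetical image name

# 1. Relax the seccomp profile when starting the container
ALT1="docker run --security-opt seccomp=unconfined --cap-add SYS_ADMIN $IMAGE"

# 2. Profile from the host and let fdtransfer provide access to perf_events
ALT2="asprof --fdtransfer -d 30 -f profile.html $PID"

# 3. Fall back to the ctimer mode
ALT3="asprof -e ctimer -d 30 -f profile.html $PID"

printf '%s\n' "$ALT1" "$ALT2" "$ALT3"
```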
|
||||
187
docs/ProfilingModes.md
Normal file
@@ -0,0 +1,187 @@
|
||||
# Profiling modes
|
||||
|
||||
async-profiler provides several profiling modes besides `CPU`, such as `Allocation`, `Wall Clock`, `Java Method`
and even a `Multiple Events` profiling mode.
|
||||
|
||||
## CPU profiling
|
||||
|
||||
In this mode profiler collects stack trace samples that include **Java** methods,
|
||||
**native** calls, **JVM** code and **kernel** functions.
|
||||
|
||||
The general approach is receiving call stacks generated by `perf_events`
|
||||
and matching them up with call stacks generated by `AsyncGetCallTrace`,
|
||||
in order to produce an accurate profile of both Java and native code.
|
||||
Additionally, async-profiler provides a workaround to recover stack traces
|
||||
in some [corner cases](https://bugs.openjdk.java.net/browse/JDK-8178287)
|
||||
where `AsyncGetCallTrace` fails.
|
||||
|
||||
This approach has the following advantages compared to using `perf_events`
|
||||
directly with a Java agent that translates addresses to Java method names:
|
||||
|
||||
* Does not require `-XX:+PreserveFramePointer`, which introduces
|
||||
performance overhead that can be sometimes as high as 10%.
|
||||
|
||||
* Does not require generating a map file for translating Java code addresses
|
||||
to method names.
|
||||
|
||||
* Displays interpreter frames.
|
||||
|
||||
* Does not produce large intermediate files (perf.data) for further processing in
|
||||
user space scripts.
|
||||
|
||||
If you wish to resolve frames within `libjvm`, the [debug symbols](#installing-debug-symbols) are required.
|
||||
|
||||
## Allocation profiling
|
||||
|
||||
The profiler can be configured to collect call sites where the largest amount
|
||||
of heap memory is allocated.
|
||||
|
||||
async-profiler does not use intrusive techniques like bytecode instrumentation
|
||||
or expensive DTrace probes which have significant performance impact.
|
||||
It also does not affect Escape Analysis or prevent JIT optimizations
|
||||
like allocation elimination. Only actual heap allocations are measured.
|
||||
|
||||
The profiler features TLAB-driven sampling. It relies on HotSpot-specific
|
||||
callbacks to receive two kinds of notifications:
|
||||
- when an object is allocated in a newly created TLAB;
|
||||
- when an object is allocated on a slow path outside TLAB.
|
||||
|
||||
Sampling interval can be adjusted with `--alloc` option.
|
||||
For example, `--alloc 500k` will take one sample after 500 KB of allocated
|
||||
space on average. Prior to JDK 11, intervals less than TLAB size will not take effect.
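As a worked example of the interval arithmetic, the sketch below converts an `--alloc` style value to bytes and estimates how many samples a given amount of allocation would produce. It assumes the suffixes are binary multiples (1k = 1024 bytes), which is not stated above, and the 1 GB allocation volume is hypothetical.

```shell
# Convert an --alloc style value (e.g. 500k, 2m, 1g) to bytes.
# Assumption: the suffixes are binary multiples (1k = 1024 bytes).
to_bytes() {
    num=${1%[kKmMgG]}
    case "$1" in
        *[kK]) echo $((num * 1024)) ;;
        *[mM]) echo $((num * 1024 * 1024)) ;;
        *[gG]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

interval=$(to_bytes 500k)     # sampling interval in bytes
allocated=$(to_bytes 1g)      # hypothetical amount allocated during the session

# On average one sample is taken per interval
echo "interval: $interval bytes"
echo "expected samples: $((allocated / interval))"
```

A smaller interval gives more samples and a finer profile at the cost of more overhead.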
|
||||
|
||||
In allocation profiling mode the top frame of every call trace is the class
|
||||
of the allocated object, and the counter is the heap pressure (the total size
|
||||
of allocated TLABs or objects outside TLAB).
|
||||
|
||||
### Installing Debug Symbols
|
||||
|
||||
Prior to JDK 11, the allocation profiler required HotSpot debug symbols.
|
||||
Some OpenJDK distributions (Amazon Corretto, Liberica JDK, Azul Zulu)
|
||||
already have them embedded in `libjvm.so`, other OpenJDK builds typically
|
||||
provide debug symbols in a separate package. For example, to install
|
||||
OpenJDK debug symbols on Debian / Ubuntu, run:
|
||||
```
|
||||
# apt install openjdk-17-dbg
|
||||
```
|
||||
(replace `17` with the desired version of JDK).
|
||||
|
||||
On CentOS, RHEL and some other RPM-based distributions, this could be done with
|
||||
[debuginfo-install](http://man7.org/linux/man-pages/man1/debuginfo-install.1.html) utility:
|
||||
```
|
||||
# debuginfo-install java-1.8.0-openjdk
|
||||
```
|
||||
|
||||
On Gentoo the `icedtea` OpenJDK package can be built with the per-package setting
|
||||
`FEATURES="nostrip"` to retain symbols.
|
||||
|
||||
The `gdb` tool can be used to verify if debug symbols are properly installed for the `libjvm` library.
|
||||
For example, on Linux:
|
||||
```
|
||||
$ gdb $JAVA_HOME/lib/server/libjvm.so -ex 'info address UseG1GC'
|
||||
```
|
||||
This command's output will either contain `Symbol "UseG1GC" is at 0xxxxx`
|
||||
or `No symbol "UseG1GC" in current context`.
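If this check needs to be scripted, the two possible outputs can be told apart with `grep`. The sketch below works on sample strings, so it runs without `gdb` installed. The helper name `has_debug_symbols` and the address in the sample are made up for this example.

```shell
# Classify the output of the gdb command above.
# Returns success (0) when the symbol address was printed.
has_debug_symbols() {
    printf '%s\n' "$1" | grep -q 'Symbol "UseG1GC" is at'
}

# Sample outputs (the address is invented)
found='Symbol "UseG1GC" is at 0x1234'
missing='No symbol "UseG1GC" in current context.'

if has_debug_symbols "$found"; then echo "debug symbols installed"; fi
if ! has_debug_symbols "$missing"; then echo "debug symbols missing"; fi
```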
|
||||
|
||||
## Wall-clock profiling
|
||||
|
||||
`-e wall` option tells async-profiler to sample all threads equally every given
|
||||
period of time regardless of thread status: Running, Sleeping or Blocked.
|
||||
For instance, this can be helpful when profiling application start-up time.
|
||||
|
||||
Wall-clock profiler is most useful in per-thread mode: `-t`.
|
||||
|
||||
Example: `asprof -e wall -t -i 5ms -f result.html 8983`
|
||||
|
||||
|
||||
## Lock profiling
|
||||
|
||||
`-e lock` option tells async-profiler to measure lock contention in the profiled application. Lock profiling can help
|
||||
developers understand lock acquisition patterns, lock contention (when threads have to wait to acquire locks), time
|
||||
spent waiting for locks and which code paths are blocked due to locks.
|
||||
|
||||
In lock profiling mode the top frame is the class of lock/monitor, and the counter is the number of nanoseconds it took to
|
||||
enter this lock/monitor.
|
||||
|
||||
Example: `asprof -e lock -t -i 5ms -f result.html 8983`
|
||||
|
||||
## Java method profiling
|
||||
|
||||
`-e ClassName.methodName` option instruments the given Java method
|
||||
in order to record all invocations of this method with the stack traces.
|
||||
|
||||
Example: `-e java.util.Properties.getProperty` will profile all places
|
||||
where `getProperty` method is called from.
|
||||
|
||||
Only non-native Java methods are supported. To profile a native method,
|
||||
use a hardware breakpoint event instead, e.g. `-e Java_java_lang_Throwable_fillInStackTrace`.
|
||||
|
||||
**Be aware** that if you attach async-profiler at runtime, the first instrumentation
|
||||
of a non-native Java method may cause the [deoptimization](https://github.com/openjdk/jdk/blob/bf2e9ee9d321ed289466b2410f12ad10504d01a2/src/hotspot/share/prims/jvmtiRedefineClasses.cpp#L4092-L4096)
|
||||
of all compiled methods. The subsequent instrumentation flushes only the _dependent code_.
|
||||
|
||||
The massive CodeCache flush doesn't occur if attaching async-profiler as an agent.
|
||||
|
||||
### Java native method profiling
|
||||
Here are some useful native methods to profile:
|
||||
* ```G1CollectedHeap::humongous_obj_allocate``` - trace _humongous allocations_ of the G1 GC,
|
||||
* ```JVM_StartThread``` - trace creation of new Java threads,
|
||||
* ```Java_java_lang_ClassLoader_defineClass1``` - trace class loading.
|
||||
|
||||
## Multiple events
|
||||
|
||||
It is possible to profile CPU, allocations, and locks at the same time.
|
||||
Instead of CPU, you may choose any other execution event: wall-clock,
|
||||
perf event, tracepoint, Java method, etc.
|
||||
|
||||
The only output format that supports multiple events together is JFR.
|
||||
The recording will contain the following event types:
|
||||
- `jdk.ExecutionSample`
|
||||
- `jdk.ObjectAllocationInNewTLAB` (alloc)
|
||||
- `jdk.ObjectAllocationOutsideTLAB` (alloc)
|
||||
- `jdk.JavaMonitorEnter` (lock)
|
||||
- `jdk.ThreadPark` (lock)
|
||||
|
||||
To start profiling cpu + allocations + locks together, specify
|
||||
```
|
||||
asprof -e cpu,alloc,lock -f profile.jfr ...
|
||||
```
|
||||
or use `--alloc` and `--lock` parameters with the desired threshold:
|
||||
```
|
||||
asprof -e cpu --alloc 2m --lock 10ms -f profile.jfr ...
|
||||
```
|
||||
The same, when starting profiler as an agent:
|
||||
```
|
||||
-agentpath:/path/to/libasyncProfiler.so=start,event=cpu,alloc=2m,lock=10ms,file=profile.jfr
|
||||
```
|
||||
|
||||
## Continuous profiling
|
||||
Continuous profiling means profiling an application continuously and dumping
profile output at a specified interval. It is an effective technique for finding
performance degradations proactively. Continuous profiling also helps users
understand performance differences between versions of the same application:
recent output can be compared with the history of earlier output to find
differences and to optimize the changes that introduced a degradation.
async-profiler can continuously profile an application with
the `loop` option. Make sure the filename includes a timestamp pattern, or the
output will be overwritten on each iteration.
|
||||
```
|
||||
asprof --loop 1h -f /var/log/profile-%t.jfr 8983
|
||||
```
|
||||
|
||||
## Special event types supported on Linux
|
||||
|
||||
The following special event types are supported on Linux:
|
||||
- `-e mem:<func>[:rwx]` sets read/write/exec breakpoint at function
|
||||
`<func>`. The format of `mem` event is the same as in `perf-record`.
|
||||
Execution breakpoints can be also specified by the function name,
|
||||
e.g. `-e malloc` will trace all calls of native `malloc` function.
|
||||
- `-e trace:<id>` sets a kernel tracepoint. It is possible to specify
|
||||
tracepoint symbolic name, e.g. `-e syscalls:sys_enter_open` will trace
|
||||
all `open` syscalls.
|
||||
- Raw PMU event, e.g. `-e r4d2` selects `MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM` event, which corresponds to event 0xd2, umask 0x4
|
||||
- PMU event descriptor, e.g. `-e cpu/event=0xd2,umask=4/`. The same syntax can be used for uncore and vendor-specific events, e.g. `amd_l3/event=0x01,umask=0x80/`
|
||||
- Symbolic name of a dynamic PMU event, e.g. `-e cpu/topdown-fetch-bubbles/`
|
||||
- kprobe/kretprobe, e.g. `-e kprobe:do_sys_open`, `-e kretprobe:do_sys_open`
|
||||
- uprobe/uretprobe, e.g. `-e uprobe:/usr/lib64/libc-2.17.so+0x114790`
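The raw PMU event name in the list above packs the event code and the unit mask into one hexadecimal number: the low byte is the event code and the next byte is the umask. A small sketch of that arithmetic follows; the `raw_event` helper is hypothetical.

```shell
# Build a raw PMU event name from an event code ($1) and a unit mask ($2):
# the low byte is the event code, the next byte is the umask.
raw_event() {
    printf 'r%x\n' $(( ($2 << 8) | $1 ))
}

# Event 0xd2 with umask 0x4, as in the MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM example above
raw_event 0xd2 0x4
```

The result can then be passed to `-e` in the same way as the `r4d2` example.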
|
||||
91
docs/ProfilingNonJavaApplications.md
Normal file
@@ -0,0 +1,91 @@
|
||||
# Profiling Non-Java applications
|
||||
|
||||
Profiling of non-Java applications is limited to the cases where the profiler is controlled
programmatically from the process being profiled or loaded with LD_PRELOAD. Note that
[dynamic attach](https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md#launching-as-an-agent),
which is available for Java, is not supported for non-Java profiling.
|
||||
|
||||
|
||||
## C API
|
||||
Similar to the
|
||||
[Java API](https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md#using-java-api),
|
||||
there is a C API for using inside native applications.
|
||||
```
|
||||
typedef const char* asprof_error_t;
|
||||
typedef void (*asprof_writer_t)(const char* buf, size_t size);
|
||||
|
||||
// Should be called once prior to any other API functions
|
||||
DLLEXPORT void asprof_init();
|
||||
typedef void (*asprof_init_t)();
|
||||
|
||||
// Returns an error message for the given error code or NULL if there is no error
|
||||
DLLEXPORT const char* asprof_error_str(asprof_error_t err);
|
||||
typedef const char* (*asprof_error_str_t)(asprof_error_t err);
|
||||
|
||||
// Executes async-profiler command using output_callback as an optional sink
|
||||
// for the profiler output. Returns an error code, or NULL on success.
|
||||
DLLEXPORT asprof_error_t asprof_execute(const char* command, asprof_writer_t output_callback);
|
||||
typedef asprof_error_t (*asprof_execute_t)(const char* command, asprof_writer_t output_callback);
|
||||
```
|
||||
To use it in a C/C++ application, include asprof.h. Below is an example showing how to run async-profiler commands with the API:
|
||||
```
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include "asprof.h"

// Output sink: forwards everything the profiler prints to stderr
void test_output_callback(const char* buffer, size_t size) {
    fwrite(buffer, sizeof(char), size, stderr);
}

int main() {
    dlerror();  // clear any stale error state
    void* lib = dlopen("path/to/libasyncProfiler.so", RTLD_NOW);
    if (lib == NULL) {
        // dlopen failed, so there is no handle to close
        printf("%s\n", dlerror());
        exit(1);
    }

    // Resolve the API entry points from the loaded library
    asprof_init_t asprof_init = dlsym(lib, "asprof_init");
    asprof_execute_t asprof_execute = dlsym(lib, "asprof_execute");
    asprof_error_str_t asprof_error_str = dlsym(lib, "asprof_error_str");
    if (asprof_init == NULL || asprof_execute == NULL || asprof_error_str == NULL) {
        printf("%s\n", dlerror());
        dlclose(lib);
        exit(1);
    }

    asprof_init();

    char cmd[] = "start,event=cpu,loglevel=debug,file=profile.jfr";

    printf("Starting profiler\n");
    asprof_error_t err = asprof_execute(cmd, test_output_callback);
    if (err != NULL) {
        fprintf(stderr, "%s\n", asprof_error_str(err));
        exit(1);
    }

    // some meaningful work

    printf("Stopping profiler\n");
    err = asprof_execute("stop", test_output_callback);
    if (err != NULL) {
        fprintf(stderr, "%s\n", asprof_error_str(err));
        exit(1);
    }

    return 0;
}
```
|
||||
|
||||
In addition, async-profiler can be injected into a native application through the `LD_PRELOAD` mechanism:
|
||||
```
|
||||
LD_PRELOAD=/path/to/libasyncProfiler.so ASPROF_COMMAND=start,event=cpu,file=profile.jfr NativeApp [args]
|
||||
```
|
||||
|
||||
All basic functionality remains the same. Profiler can run in cpu, wall and other perf_events
|
||||
modes. Flame Graph and JFR output formats are supported, although JFR files will obviously lack
|
||||
Java-specific events.
|
||||
61
docs/StackWalkingModes.md
Normal file
@@ -0,0 +1,61 @@
|
||||
# Stack Walking Modes
|
||||
|
||||
## Frame pointer
|
||||
|
||||
The default stack walking mode in async-profiler, `Frame Pointer (FP)` stack walking, collects call
stacks by following frame pointers in memory. Each function call maintains a pointer to its caller's stack frame, creating
a linked chain that can be traversed to reconstruct the program's execution path. It is fast and adds less overhead
than other stack walking methods, but it requires code to be compiled with frame
pointers enabled (`-fno-omit-frame-pointer`).
|
||||
|
||||
## VM
|
||||
|
||||
VM stack walking is the process of traversing the JVM's call stack to determine the sequence of method calls. Each
|
||||
method invocation creates a stack frame containing local variables, parameters, and return addresses.
|
||||
|
||||
This mode of stack walking has been introduced in async-profiler due to issues with `AsyncGetCallTrace`.
|
||||
AsyncGetCallTrace (AGCT) is a non-standard extension of HotSpot JVM to obtain Java stack traces outside safepoints.
|
||||
async-profiler relied heavily on AGCT and even got its name from this function.
|
||||
|
||||
Being a non-API function, `AsyncGetCallTrace` was never supported well enough in OpenJDK: it did not receive enough
|
||||
testing and was broken several times, even in minor JDK updates, e.g. [JDK-8307549](https://bugs.openjdk.org/browse/JDK-8307549).
|
||||
|
||||
AsyncGetCallTrace is notorious for its inability to walk the Java stack in various corner cases. There is a long-standing
|
||||
bug [JDK-8178287](https://bugs.openjdk.org/browse/JDK-8178287) with several examples. But the worst aspect is that
|
||||
AsyncGetCallTrace can crash the JVM, and there is no reliable way to get around this outside the JVM.
|
||||
|
||||
Because of these recurring issues with AGCT, including random crashes and missing stack traces, `vm` stack walking mode
|
||||
was introduced in async-profiler. `vm` stack walking in async-profiler has the following advantages:
|
||||
- Fully enclosed by the crash protection based on `setjmp`/`longjmp`.
|
||||
- Displays all frames: Java, native and JVM stubs throughout the whole stack.
|
||||
- Provides additional information on each frame, like JIT compilation type.
|
||||
|
||||
The feature can be enabled with the option `--cstack vm` (or its agent equivalent `cstack=vm`).
|
||||
|
||||
With this option, async-profiler collects mixed stack traces that have Java and native frames interleaved. The total
|
||||
stack depth is controlled with the `-j jstackdepth` option. Since the stack walker does not modify any VM structures and is
|
||||
under the full control of async-profiler, it is safe to interrupt it anywhere in the middle of execution.
|
||||
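As a rough illustration of the options above, the following shell sketch assembles an `asprof` command line for `vm` stack walking. The helper name `vm_profile_cmd`, the 30-second duration, the fallback depth of 2048 and the output file name are assumptions made for this example; `-d` and `-f` are the duration and output file options of the `asprof` launcher. The function only prints the command, so it can be inspected before it is run.

```shell
#!/bin/sh
# Hypothetical helper: print an asprof command that uses VM stack walking.
#   $1 - target JVM process ID
#   $2 - maximum stack depth passed to -j (assumed fallback: 2048)
vm_profile_cmd() {
    pid=$1
    depth=${2:-2048}
    # --cstack vm selects the VM stack walker; -j caps the total stack depth.
    echo "asprof -d 30 --cstack vm -j $depth -f profile.html $pid"
}

# Example: inspect the command for PID 12345 with a depth limit of 512.
vm_profile_cmd 12345 512
```

Printing the command instead of executing it keeps the sketch safe to try on a machine without a running JVM.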
|
||||
## LBR
|
||||
|
||||
Modern Intel CPUs can profile branch instructions, including `call`s and `ret`s, and store their source and destination
|
||||
addresses (Last Branch Records) in hardware registers. Starting with Haswell, the CPU can match these addresses to form a
|
||||
branch stack. This branch stack will be effectively a call chain automatically collected by the hardware.
|
||||
|
||||
LBR stacks are not always complete or accurate, but they are still much more helpful than FP-based stack
|
||||
walking, when a native library is compiled with omitted frame pointers. It works only with hardware events like
|
||||
`-e cycles` (`instructions`, `cache-misses` etc.) and the maximum call chain depth is 32 (hardware limit).
|
||||
|
||||
The feature can be enabled with the option `--cstack lbr` (or its agent equivalent `cstack=lbr`).
|
||||
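The constraints above (Intel CPU, hardware event) can be checked before choosing `--cstack lbr`. The sketch below is only an illustration: the function name `lbr_supported` is hypothetical, the event list contains just the examples named in this section, and the check does not verify the CPU generation (Haswell or newer).

```shell
#!/bin/sh
# Hypothetical pre-flight check for --cstack lbr.
#   $1 - profiling event passed to -e
#   $2 - CPU vendor string (defaults to vendor_id from /proc/cpuinfo)
lbr_supported() {
    event=$1
    vendor=${2:-$(awk -F': ' '/^vendor_id/ {print $2; exit}' /proc/cpuinfo 2>/dev/null)}
    # LBR is an Intel feature.
    [ "$vendor" = "GenuineIntel" ] || return 1
    # Only hardware events qualify; this list holds just the examples named above.
    case "$event" in
        cycles|instructions|cache-misses) return 0 ;;
        *) return 1 ;;
    esac
}

if lbr_supported cycles; then
    echo "--cstack lbr is worth trying"
else
    echo "use another --cstack mode"
fi
```

Passing the vendor as an optional second argument makes the check easy to exercise on machines other than the profiling target.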
|
||||
## DWARF
|
||||
|
||||
DWARF stack walking is a method to reconstruct call stacks using debug information embedded in executables. Unlike
|
||||
frame-pointer-based unwinding, it works reliably even with optimized code where frame pointers are omitted.
|
||||
|
||||
DWARF unwinding does require extra memory (e.g. the lookup
|
||||
table for `libjvm.so` is about 2MB). It requires debug information to be present in binaries and is slower than the
|
||||
traditional FP-based stack walker, but the implementation in async-profiler is signal safe and still fast enough for
|
||||
on-the-fly unwinding.
|
||||
|
||||
The feature can be enabled with the option `--cstack dwarf` (or its agent equivalent `cstack=dwarf`).
|
||||
143
docs/Troubleshooting.md
Normal file
@@ -0,0 +1,143 @@
|
||||
# Troubleshooting
|
||||
|
||||
* ```
|
||||
perf_event mmap failed: Operation not permitted
|
||||
```
|
||||
The profiler allocates an 8 kB perf_event buffer for each thread of the target process.
|
||||
Make sure `/proc/sys/kernel/perf_event_mlock_kb` value is large enough
|
||||
(more than `8 * threads`) when running as an unprivileged user. Otherwise, the above message
|
||||
will be printed, and no native stack traces will be collected.
|
||||
|
||||
* ```
|
||||
Failed to change credentials to match the target process: Operation not permitted
|
||||
```
|
||||
Due to a limitation of the HotSpot Dynamic Attach mechanism, the profiler must be run
|
||||
by exactly the same user (and group) as the owner of the target JVM process.
|
||||
If the profiler is run by a different user, it will try to automatically change
|
||||
current user and group. This will likely succeed for `root`, but not for
|
||||
other users, resulting in the above error.
|
||||
|
||||
|
||||
* ```
|
||||
Could not start attach mechanism: No such file or directory
|
||||
```
|
||||
|
||||
The profiler cannot establish communication with the target JVM through a UNIX domain socket.
|
||||
Usually this happens in one of the following cases:
|
||||
1. The attach socket `/tmp/.java_pidNNN` has been deleted. It is a common
|
||||
practice to clean `/tmp` automatically with some scheduled script.
|
||||
Configure the cleanup software to exclude `.java_pid*` files from deletion.
|
||||
How to check: run `lsof -p PID | grep java_pid`
|
||||
If it lists a socket file, but the file does not exist, then this is exactly
|
||||
the described problem.
|
||||
2. The JVM was started with the `-XX:+DisableAttachMechanism` option.
|
||||
3. The `/tmp` directory of the Java process is not physically the same directory
|
||||
as `/tmp` of your shell, because Java is running in a container or in a
|
||||
`chroot` environment. `jattach` attempts to solve this automatically,
|
||||
but it might lack the required permissions to do so.
|
||||
How to check: run `strace build/jattach PID properties`
|
||||
4. The JVM is busy and cannot reach a safepoint. For instance,
|
||||
the JVM is in the middle of a long-running garbage collection.
|
||||
How to check: run `kill -3 PID`. A healthy JVM process should print
|
||||
a thread dump and heap info in its console.
|
||||
|
||||
* ```
|
||||
Target JVM failed to load libasyncProfiler.so
|
||||
```
|
||||
The connection with the target JVM has been established, but the JVM is unable to load the profiler shared library.
|
||||
Make sure the user of the JVM process has permission to access `libasyncProfiler.so` by exactly the same absolute path.
|
||||
For more information see [#78](https://github.com/async-profiler/async-profiler/issues/78).
|
||||
|
||||
|
||||
* ```
|
||||
Perf events unavailable. Try --fdtransfer or --all-user option or 'sysctl kernel.perf_event_paranoid=1'
|
||||
```
|
||||
or
|
||||
```
|
||||
Perf events unavailable
|
||||
```
|
||||
|
||||
The `perf_event_open()` syscall has failed. Typical reasons include:
|
||||
1. `/proc/sys/kernel/perf_event_paranoid` is set to restricted mode (>=2).
|
||||
2. seccomp disables the `perf_event_open` API in a container.
|
||||
3. The OS runs under a hypervisor that does not virtualize performance counters.
|
||||
4. The `perf_event_open` API is not supported on this system, e.g. WSL.
|
||||
|
||||
For permissions-related reasons (such as 1 and 2), using `--fdtransfer` while running the profiler
|
||||
as a privileged user may solve the issue.
|
||||
|
||||
If changing the configuration is not possible, you may fall back to
|
||||
`-e ctimer` profiling mode. It is similar to `cpu` mode, but does not
|
||||
require perf_events support. As a drawback, there will be no kernel
|
||||
stack traces.
|
||||
|
||||
* ```
|
||||
No AllocTracer symbols found. Are JDK debug symbols installed?
|
||||
```
|
||||
The OpenJDK debug symbols are required for allocation profiling of applications running
|
||||
with JDK prior to 11. See [Installing Debug Symbols](#installing-debug-symbols) for more
|
||||
details. If the error message persists after a successful installation of the debug symbols,
|
||||
it is possible that the JDK was upgraded when installing the debug symbols.
|
||||
In this case, profiling any Java process which had started prior to the installation
|
||||
will continue to display this message, since the process had loaded
|
||||
the older version of the JDK which lacked debug symbols.
|
||||
Restarting the affected Java processes should resolve the issue.
|
||||
|
||||
* ```
|
||||
VMStructs unavailable. Unsupported JVM?
|
||||
```
|
||||
JVM shared library does not export `gHotSpotVMStructs*` symbols -
|
||||
apparently this is not a HotSpot JVM. Sometimes the same message
|
||||
can be also caused by an incorrectly built JDK
|
||||
(see [#218](https://github.com/async-profiler/async-profiler/issues/218)).
|
||||
In these cases installing JDK debug symbols may solve the problem.
|
||||
|
||||
* ```
|
||||
Could not parse symbols from <libname.so>
|
||||
```
|
||||
async-profiler was unable to parse non-Java function names because of
|
||||
corrupted contents in `/proc/[pid]/maps`. The problem is known to
|
||||
occur in a container when running Ubuntu with Linux kernel 5.x.
|
||||
This is an OS bug, see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1843018.
|
||||
|
||||
* ```
|
||||
Could not open output file
|
||||
```
|
||||
The output file is written by the target JVM process, not by the profiler script.
|
||||
Make sure the path specified in the `-f` option is correct and is accessible by the JVM.
|
||||
|
||||
|
||||
* No Java stacks will be collected if `-XX:MaxJavaStackTraceDepth` is zero
|
||||
or negative. The exception is `--cstack vm` mode, which does not take
|
||||
`MaxJavaStackTraceDepth` into account.
|
||||
|
||||
|
||||
* A profiling interval that is too short may cause continuous interruption of heavy
|
||||
system calls like `clone()`, so that they never complete;
|
||||
see [#97](https://github.com/async-profiler/async-profiler/issues/97).
|
||||
The workaround is simply to increase the interval.
|
||||
|
||||
|
||||
* When the agent is not loaded at JVM startup (by using the `-agentpath` option), it is
|
||||
highly recommended to use `-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints` JVM flags.
|
||||
Without those flags the profiler will still work correctly but results might be
|
||||
less accurate. For example, without `-XX:+DebugNonSafepoints` there is a high chance
|
||||
that simple inlined methods will not appear in the profile. When the agent is attached at runtime,
|
||||
`CompiledMethodLoad` JVMTI event enables debug info, but only for methods compiled after attaching.
|
||||
|
||||
|
||||
* On most Linux systems, `perf_events` captures call stacks with a maximum depth
|
||||
of 127 frames. On recent Linux kernels, this can be configured using
|
||||
`sysctl kernel.perf_event_max_stack` or by writing to the
|
||||
`/proc/sys/kernel/perf_event_max_stack` file.
|
||||
|
||||
|
||||
* You will not see the non-Java frames _preceding_ the Java frames on the
|
||||
stack, unless `--cstack vm` is specified.
|
||||
For example, if `start_thread` called `JavaMain` and then your Java
|
||||
code started running, you will not see the first two frames in the resulting
|
||||
stack. On the other hand, you _will_ see non-Java frames (user and kernel)
|
||||
invoked by your Java code.
|
||||
|
||||
|
||||
* macOS profiling is limited to user space code only.
|
||||
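The `perf_event_mlock_kb` and `perf_event_paranoid` checks from the items above can be scripted roughly as follows. This is a sketch under stated assumptions: the function names `mlock_kb_ok` and `paranoid_ok` are hypothetical, and the thresholds are taken directly from the text (more than 8 kB per thread, and a paranoid level below 2).

```shell
#!/bin/sh
# Hypothetical helpers for the perf_events checks described above.

# Succeeds if the mlock limit (in kB) exceeds 8 kB per thread.
#   $1 - number of threads in the target process
#   $2 - limit in kB (defaults to /proc/sys/kernel/perf_event_mlock_kb)
mlock_kb_ok() {
    threads=$1
    limit=${2:-$(cat /proc/sys/kernel/perf_event_mlock_kb)}
    [ "$limit" -gt $((8 * threads)) ]
}

# Succeeds if perf_event_paranoid is below the restricted level (>= 2).
#   $1 - level (defaults to /proc/sys/kernel/perf_event_paranoid)
paranoid_ok() {
    level=${1:-$(cat /proc/sys/kernel/perf_event_paranoid)}
    [ "$level" -lt 2 ]
}

# Example with explicit values: 100 threads need a limit above 800 kB.
mlock_kb_ok 100 516 || echo "raise kernel.perf_event_mlock_kb above 800"
paranoid_ok 2 || echo "perf_event_paranoid is restrictive; consider --fdtransfer or -e ctimer"
```

On Linux, the thread count of a running process can be obtained with `ls /proc/PID/task | wc -l`; calling either helper without the optional value reads the current setting from `/proc/sys/kernel`.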