Kokkos Application Monitoring


Kokkos is a C++ framework for implementing parallel codes. One feature of the framework is the ability to profile various performance and resource utilization characteristics of the application. This is accomplished by loading a shared library when the application is started. The base classes and templates in the Kokkos Framework have profiling hooks. By default, these hooks do nothing; however, they can be overridden by loading a Kokkos Profiling Library.

Several application profilers are provided with the framework, and developers can implement their own. The application profiling libraries are available at https://github.com/tom95858/kokkos-tools.

LDMS Monitoring of Kokkos Applications

Configuring the Application

The Kokkos Profiling Library called kp-simple-kernel-timer has been modified to use the LDMS Streams facility to upload profiling data to an LDMS Daemon, where it is stored in a SOS database. The application's job submission script must be modified to specify the location of the profiling library and other configuration options in order for the framework to activate the plugin.

# Tell the framework what profiler library to load
export KOKKOS_PROFILE_LIBRARY="/path-to-library/kp_kernel_timer.so"

# Set to 1 to tell the profiler to keep the JSON file

# The component id identifies the node
. /etc/component_id.env

# The instance data describes the application being run

# Tell the profiler where to send the profile data
export LDMSD_STREAM_NAME=kokkos
export LDMSD_STREAM_HOST="localhost"
export LDMSD_STREAM_PORT="10001"
export LDMSD_STREAM_XPRT="sock"
export LDMSD_STREAM_AUTH="munge"


The full path to the dynamically loadable shared library that implements the profiler.


Set to 1 to retain the output file after it has been sent to the ldmsd. This is useful for examining the output of the profiler to ensure that your settings are producing the data you intended.


The component id is an integer that identifies the node on which the data was collected. In the example above, this information is sourced from /etc/component_id.env.
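As a sketch of what the sourced file might look like, assuming it does nothing more than export the LDMS_COMPONENT_ID variable mentioned below (the numeric value is purely illustrative and site-specific):

```shell
# Hypothetical contents of /etc/component_id.env.
# The component id is an integer unique to this node; 64 is only an example.
export LDMS_COMPONENT_ID=64
```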


The instance data describes the application being run and is used for comparing the performance of an application across multiple runs. To be useful, the string should include the information necessary to discriminate between applications that are running with different job sizes, ranks, etc., and thereby permit a meaningful comparison of their performance.
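The environment variable carrying the instance data is not shown in the script above; consult the profiler source for its actual name. As a sketch only, with a hypothetical variable name, the value might encode the application, problem size, and rank count:

```shell
# KOKKOS_INSTANCE_DATA is a hypothetical name -- check the profiler
# source for the real variable. The value distinguishes runs with
# different job sizes and rank counts so comparisons stay meaningful.
export KOKKOS_INSTANCE_DATA="lammps:lj:nodes=4:ranks=128"
```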


The stream name on which the Kokkos data will be delivered. This name must match the stream= configuration parameter for the ldmsd plugin.


The hostname of the ldmsd that will receive the Kokkos stream data.


The port number on which the ldmsd is listening.


The transport on which the LDMS Daemon was configured to listen.

The LDMS Streams facility uses the LDMS transport to send data to the configured LDMS Daemon; it therefore supports transports such as rdma, uGNI, and fabric, and is secured by an authentication strategy. Selecting a secure authentication strategy such as munge ensures that data like the user id/group id cannot be spoofed.


The authentication method to use to authenticate the user.

The profiler uses the LDMS_COMPONENT_ID environment variable to annotate the content with the node id.

Configuring the LDMS Daemon

The LDMS Daemon must be configured to monitor and store the stream being published by the profiler in order for the data to be collected. Add the following to the LDMS Daemon configuration file:

load name=kokkos_store
config name=kokkos_store path=<container-path> stream=kokkos
prdcr_subscribe regex=.* stream=kokkos

This will cause all stream data to be stored in the SOS container specified by the <container-path> argument in the config statement above. Note that this is subtly different from other LDMS Store configurations, in which the path specifies only the directory, not the container name. The kokkos_store does not require an LDMS Storage Policy because data is received directly from an LDMS Stream rather than through the normal metric-set update path; its path therefore includes the complete container path.
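For example, with an illustrative container path (the directory and container name are site-specific, so substitute your own):

```
load name=kokkos_store
config name=kokkos_store path=/DATA/kokkos/kokkos_cont stream=kokkos
prdcr_subscribe regex=.* stream=kokkos
```

Here /DATA/kokkos is the storage directory and kokkos_cont is the container that will be created inside it.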

You must restart the LDMSD Aggregator service to apply the modified configuration.

# systemctl restart ldmsd.aggregator
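Once the daemon has been restarted, the stream plumbing can be smoke-tested without running a Kokkos job, for instance with the ldmsd_stream_publish utility shipped with OVIS/LDMS. Option spellings vary between LDMS versions, so check ldmsd_stream_publish --help on your system; the invocation below is a sketch using the connection settings shown earlier:

```shell
# Publish a trivial JSON object on the "kokkos" stream.
# Host, port, transport, and auth match the application settings above.
echo '{"test": 1}' > /tmp/stream_test.json
ldmsd_stream_publish -x sock -h localhost -p 10001 -a munge \
    -s kokkos -t json -f /tmp/stream_test.json
```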

Examining the Data

After running a few jobs with profiling enabled, there should be data in the SOS container. Note that the data is only stored when the job completes; there will be no data for jobs that have not yet completed.

The Kokkos Profiling data is stored in two schemas: kokkos_app and kokkos_kernel. The kokkos_app schema contains per-rank summary data for the entire application; the kokkos_kernel schema contains per-rank data for each Kokkos kernel.

Attribute Name           Indexed   Description
job_id                             The job ID
job_name                           The job name
app_id                             The application ID
job_tag                            A string describing the application and its configuration data
start_time                         The time the application was started
end_time                           The time the application completed
mpi_rank                           The MPI rank within the job
hostname                           The host on which this MPI rank ran
user_id                            The ID of the user who ran the job
component_id                       The LDMS component_id of the host
total_app_time                     The total time spent in the application
total_kernel_times                 The total time spent in Kokkos kernels
total_non_kernel_times             The total time spent in code outside of Kokkos kernels
percent_in_kernels                 The % of total application time spent in Kokkos kernels
unique_kernel_calls                The number of unique Kokkos kernels in the application
time_job_comp            Y         Join index ordering by start_time, job_id, and component_id
time_comp_job            Y         Join index ordering by start_time, component_id, and job_id
job_comp_time            Y         Join index ordering by job_id, component_id, and start_time
job_time_comp            Y         Join index ordering by job_id, start_time, and component_id
comp_time_job            Y         Join index ordering by component_id, start_time, and job_id
comp_job_time            Y         Join index ordering by component_id, job_id, and start_time

The kokkos_app Schema
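The summary attributes above are related: total_app_time splits into kernel and non-kernel time, and percent_in_kernels is derived from that split. A small sketch with made-up numbers (real records come from the SOS container):

```python
# Illustrative values only; in practice these are per-rank attributes
# read from the kokkos_app schema.
total_kernel_times = 42.5       # seconds inside Kokkos kernels
total_non_kernel_times = 7.5    # seconds outside Kokkos kernels

# The application total is the sum of the two components.
total_app_time = total_kernel_times + total_non_kernel_times

# Fraction of the run spent inside Kokkos kernels, as a percentage.
percent_in_kernels = 100.0 * total_kernel_times / total_app_time

print(percent_in_kernels)  # 85.0
```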
Attribute Name           Indexed   Description
job_id                             The job ID
app_id                             The application ID
inst_data                          The application's instance data
start_time                         The time at which the kernel was first called
end_time                           The time at which the kernel was last called
mpi_rank                           The MPI rank on which this data was collected
kernel_name                        The SHA256 hash of the kernel name
region                             The application region containing this kernel
call_count                         The number of times this kernel was called
total_time                         The total time spent in this kernel
time_per_call                      (end_time - start_time) / call_count
time_job_comp            Y         Join index ordering by start_time, job_id, and component_id
time_comp_job            Y         Join index ordering by start_time, component_id, and job_id
job_comp_time            Y         Join index ordering by job_id, component_id, and start_time
job_time_comp            Y         Join index ordering by job_id, start_time, and component_id
comp_time_job            Y         Join index ordering by component_id, start_time, and job_id
comp_job_time            Y         Join index ordering by component_id, job_id, and start_time

The kokkos_kernel Schema

Querying the SOS Container

The SOS container can be queried to determine if Kokkos data is being successfully stored as follows:

sos_cmd -C <container-path> -qS kokkos_app -X time_job_comp \
   -V start_time -V job_id -V component_id -V percent_in_kernels

Note that at least one Kokkos job must have completed execution before data will be available in the SOS container.