Kokkos Application Monitoring


Kokkos is a C++ framework for implementing parallel codes. One feature of the framework is the ability to profile various performance and resource utilization characteristics of the application. This is accomplished by loading a shared library when the application is started. The base classes and templates in the Kokkos Framework have profiling hooks. By default, these hooks do nothing; however, they can be overridden by loading a Kokkos Profiling Library.

Several application profilers are provided with the framework, and developers can implement their own. The application profiling libraries are available at https://github.com/tom95858/kokkos-tools.

LDMS Monitoring of Kokkos Applications

Configuring the Application

The Kokkos Profiling Library called kp-simple-kernel-timer has been modified to use the LDMS Streams facility to upload profiling data to an LDMS Daemon, where it is stored in a SOS database. The application's job submission script must be modified to specify the location of the profiling library and other configuration options in order for the framework to activate the plugin.

# Tell the framework what profiler library to load
export KOKKOS_PROFILE_LIBRARY="/path-to-library/kp_kernel_timer.so"

# Set to 1 to tell the profiler to keep the JSON file

# The component id identifies the node
. /etc/component_id.env

# The instance data describes the application being run

# Tell the profiler where to send the profile data
export LDMSD_STREAM_NAME=kokkos
export LDMSD_STREAM_HOST="localhost"
export LDMSD_STREAM_PORT="10001"
export LDMSD_STREAM_XPRT="sock"
export LDMSD_STREAM_AUTH="munge"


The full path to the dynamically loadable shared library that implements the profiler.


Set to 1 to retain the output file after it has been sent to the ldmsd. This is useful for examining the output of the profiler to ensure that your settings are producing the data you intended.


The component id is an integer that identifies the node on which the data was collected. In the example above, this information is sourced from /etc/component_id.env.
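As a sketch of what the sourced file might look like, assuming it does nothing more than export the LDMS_COMPONENT_ID variable mentioned below (the numeric value is purely illustrative and site-specific):

```shell
# Hypothetical contents of /etc/component_id.env.
# The component id is an integer unique to this node; 64 is only an example.
export LDMS_COMPONENT_ID=64
```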


The instance data describes the application being run and is used for comparing the performance of an application across multiple runs. To be useful, the string should include the information necessary to discriminate between applications that are running with different job sizes, ranks, etc., and thereby permit a meaningful comparison of their performance.
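The environment variable carrying the instance data is not shown in the script above; consult the profiler source for its actual name. As a sketch only, with a hypothetical variable name, the value might encode the application, problem size, and rank count:

```shell
# KOKKOS_INSTANCE_DATA is a hypothetical name -- check the profiler
# source for the real variable. The value distinguishes runs with
# different job sizes and rank counts so comparisons stay meaningful.
export KOKKOS_INSTANCE_DATA="lammps:lj:nodes=4:ranks=128"
```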


The stream name on which the Kokkos data will be delivered. This name must match the stream= configuration parameter for the ldmsd plugin.


The hostname of the ldmsd that will receive the Kokkos stream data.


The port number on which the ldmsd is listening.


The transport on which the LDMS Daemon was configured to listen.

The LDMS Streams facility uses the LDMS transport to send data to the configured LDMS Daemon; it therefore supports transports such as rdma, uGNI, and fabric, and is secured by an authentication strategy. Selecting a secure authentication strategy such as munge ensures that data like the user id/group id cannot be spoofed.


The authentication method to use to authenticate the user.

The profiler uses the LDMS_COMPONENT_ID environment variable to annotate the content with the node id.

Configuring the LDMS Daemon

The LDMS Daemon must be configured to monitor and store the stream being published by the profiler in order for the data to be collected. Add the following to the LDMS Daemon configuration file:

load name=kokkos_store
config name=kokkos_store path=<container-path> stream=kokkos
prdcr_subscribe regex=.* stream=kokkos

This will cause all stream data to be stored in the SOS container specified by the <container-path> argument in the config statement above. Note that this is subtly different from other LDMS Store configurations, in which the path specifies only the directory, not the container name. The kokkos_store does not require an LDMS Storage Policy because data is received directly from an LDMS Stream rather than through the normal metric-set update path; its path therefore includes the complete container path.
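For example, with an illustrative container path (the directory and container name are site-specific, so substitute your own):

```
load name=kokkos_store
config name=kokkos_store path=/DATA/kokkos/kokkos_cont stream=kokkos
prdcr_subscribe regex=.* stream=kokkos
```

Here /DATA/kokkos is the storage directory and kokkos_cont is the container that will be created inside it.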

You must restart the LDMSD Aggregator service to apply the modified configuration.

# systemctl restart ldmsd.aggregator
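Once the daemon has been restarted, the stream plumbing can be smoke-tested without running a Kokkos job, for instance with the ldmsd_stream_publish utility shipped with OVIS/LDMS. Option spellings vary between LDMS versions, so check ldmsd_stream_publish --help on your system; the invocation below is a sketch using the connection settings shown earlier:

```shell
# Publish a trivial JSON object on the "kokkos" stream.
# Host, port, transport, and auth match the application settings above.
echo '{"test": 1}' > /tmp/stream_test.json
ldmsd_stream_publish -x sock -h localhost -p 10001 -a munge \
    -s kokkos -t json -f /tmp/stream_test.json
```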

Examining the Data

After running a few jobs with profiling enabled, there should be data in the SOS container. Note that the data is only stored when the job completes; there will be no data for jobs that have not yet completed.

The Kokkos Profiling data is stored in two schemas: kokkos_app and kokkos_kernel. The kokkos_app schema contains per-rank summary data for the entire application; the kokkos_kernel schema contains per-rank data for each Kokkos kernel.

Attribute Name           Indexed   Description
job_id                             The job ID
job_name                           The job name
app_id                             The application ID
job_tag                            A string describing the application and its configuration data
start_time                         The time the application was started
end_time                           The time the application completed
mpi_rank                           The MPI rank within the job
hostname                           The host on which this MPI rank ran
user_id                            The ID of the user who ran the job
component_id                       The LDMS component_id of the host
total_app_time                     The total time spent in the application
total_kernel_times                 The total time spent in Kokkos kernels
total_non_kernel_times             The total time spent in code outside of Kokkos kernels
percent_in_kernels                 The % of total application time spent in Kokkos kernels
unique_kernel_calls                The number of unique Kokkos kernels in the application
time_job_comp            Y         Join index ordering by start_time, job_id, and component_id
time_comp_job            Y         Join index ordering by start_time, component_id, and job_id
job_comp_time            Y         Join index ordering by job_id, component_id, and start_time
job_time_comp            Y         Join index ordering by job_id, start_time, and component_id
comp_time_job            Y         Join index ordering by component_id, start_time, and job_id
comp_job_time            Y         Join index ordering by component_id, job_id, and start_time

The kokkos_app Schema
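The summary attributes above are related: total_app_time splits into kernel and non-kernel time, and percent_in_kernels is derived from that split. A small sketch with made-up numbers (real records come from the SOS container):

```python
# Illustrative values only; in practice these are per-rank attributes
# read from the kokkos_app schema.
total_kernel_times = 42.5       # seconds inside Kokkos kernels
total_non_kernel_times = 7.5    # seconds outside Kokkos kernels

# The application total is the sum of the two components.
total_app_time = total_kernel_times + total_non_kernel_times

# Fraction of the run spent inside Kokkos kernels, as a percentage.
percent_in_kernels = 100.0 * total_kernel_times / total_app_time

print(percent_in_kernels)  # 85.0
```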
Attribute Name           Indexed   Description
job_id                             The job ID
app_id                             The application ID
inst_data                          The application's instance data
start_time                         The time at which the kernel was first called
end_time                           The time at which the kernel was last called
mpi_rank                           The MPI rank on which this data was collected
kernel_name                        The SHA256 hash of the kernel name
region                             The application region containing this kernel
call_count                         The number of times this kernel was called
total_time                         The total time spent in this kernel
time_per_call                      (end_time - start_time) / call_count
time_job_comp            Y         Join index ordering by start_time, job_id, and component_id
time_comp_job            Y         Join index ordering by start_time, component_id, and job_id
job_comp_time            Y         Join index ordering by job_id, component_id, and start_time
job_time_comp            Y         Join index ordering by job_id, start_time, and component_id
comp_time_job            Y         Join index ordering by component_id, start_time, and job_id
comp_job_time            Y         Join index ordering by component_id, job_id, and start_time

The kokkos_kernel Schema

Querying the SOS Container

The SOS container can be queried to determine if Kokkos data is being successfully stored as follows:

sos_cmd -C <container-path> -qS kokkos_app -X time_job_comp \
   -V start_time -V job_id -V component_id -V percent_in_kernels

Note that at least one Kokkos job must have completed execution before data will be available in the SOS container.