Numerical Analysis of Kokkos Data in SOS

This tutorial provides an overview of querying and analyzing Kokkos application profiling data stored in an SOS container. The facility for doing this is provided by a Python module called NumSOS, which provides convenience classes that link SciPy (NumPy) and Pandas to various storage services.

This tutorial uses the Scalable Object Store (SOS), as this is the facility available on the platforms where the Kokkos profiling data is collected.

Querying Object Data

The NumSOS classes are designed to work with a number of storage back ends including SOS and InfluxDB. A factory method called datasource() is used to construct an instance of a DataSource that is the interface for accessing your data.

from numsos.DataSource import datasource
src = datasource("sos")

Objects in SOS are described by a schema. The schemas available in a container are displayed with the data source’s show_schemas() method. For example,

>>> from numsos.DataSource import datasource
>>> src = datasource("sos")
>>> src.config(path="/DATA15/orion/ldms-data")
>>> src.show_schemas()
Name          Id           Attr Count
------------- ------------ ------------
kokkos_kernel          130           19
kokkos_app             129           21

In this case, the container has two schemas: kokkos_kernel and kokkos_app. The detail for these schemas can be queried with the data source’s show_schema() method as follows:

>>> src.show_schema('kokkos_app')
Name                             Id       Type         Indexed  Info
-------------------------------- -------- ------------ -------- --------------------------------
job_id                                  0 UINT64       False    
job_name                                1 STRING       False    
app_id                                  2 UINT64       False    
job_tag                                 3 STRUCT       False    
start_time                              4 TIMESTAMP    False    
end_time                                5 TIMESTAMP    False    
mpi_rank                                6 UINT64       False    
hostname                                7 STRING       False    
user_id                                 8 UINT32       False    
component_id                            9 UINT64       False    
total_app_time                         10 DOUBLE       False    
total_kernel_times                     11 DOUBLE       False    
total_non_kernel_times                 12 DOUBLE       False    
percent_in_kernels                     13 DOUBLE       False    
unique_kernel_calls                    14 DOUBLE       False    
time_job_comp                          15 JOIN         True     start_time, job_id, component_id
time_comp_job                          16 JOIN         True     start_time, component_id, job_id
job_comp_time                          17 JOIN         True     job_id, component_id, start_time
job_time_comp                          18 JOIN         True     job_id, start_time, component_id
comp_time_job                          19 JOIN         True     component_id, start_time, job_id
comp_job_time                          20 JOIN         True     component_id, job_id, start_time

The DataSource class’s select method is used to specify the query criteria for your data. For example:

src.select([ '*' ],
           from_    = [ 'kokkos_app' ],
           where    = [ [ 'job_start', Sos.COND_GE, time.time() - 3600 ] ],
           order_by = 'time_job_comp')

The first parameter is the list of attributes from the object that you want in the result. The special character ‘*’ means that all attributes are to be returned. The from_ keyword argument is a list of schemas that will be searched for the attributes in the column list; in this case, we are querying for ‘kokkos_app’ objects. The where keyword argument is a list of conditions for filtering the data; here we are asking for any job that started within the last hour. Finally, the order_by keyword argument asks that the returned data be ordered by start_time, then job_id, and then component_id, as can be seen from the time_job_comp index in the schema detail above.
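The where and order_by semantics can be pictured in plain Python: filter rows by a condition, then order them by the index’s key sequence. This is only a conceptual sketch with made-up rows, not the NumSOS API:

```python
# Hypothetical rows standing in for kokkos_app objects (illustration only).
now = 1587334455
rows = [
    {"job_id": 2, "component_id": 1, "start_time": now - 7200},  # older than an hour
    {"job_id": 1, "component_id": 2, "start_time": now - 600},
    {"job_id": 1, "component_id": 1, "start_time": now - 600},
]

# where: keep rows whose start_time is within the last hour (a COND_GE test)
recent = [r for r in rows if r["start_time"] >= now - 3600]

# order_by 'time_job_comp': sort by start_time, then job_id, then component_id
recent.sort(key=lambda r: (r["start_time"], r["job_id"], r["component_id"]))

print([(r["job_id"], r["component_id"]) for r in recent])  # -> [(1, 1), (1, 2)]
```

The index name encodes the sort order, which is why order_by takes an index name rather than a list of attributes.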

The data source has a convenience method called show that is useful for testing your select conditions. It has a keyword parameter called limit that restricts the display to a subset of the results. For example:

>>> src.show(limit=4)
job_id          job_name        app_id          job_tag         start_time      end_time        mpi_rank        hostname        user_id         component_id    total_app_time  total_kernel_times total_non_kernel_times percent_in_kernels unique_kernel_calls 
--------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- 
        6865328               0 bytearray(b'\xc7\xd4\x81\xc1\xf1\xff>\x96L\xc3\x08\rtj\x98\xa6\xe1\x83\x9e\x83%\xb6\xfe\xb0\xa5ezdjZ\xe7\xe5') (1587330855, 0) (1587330953, 0)               2        orion-01            1002           10001       97.782757       97.376143        0.406614           99.58            31.0 
        6865328               0 bytearray(b'\xc7\xd4\x81\xc1\xf1\xff>\x96L\xc3\x08\rtj\x98\xa6\xe1\x83\x9e\x83%\xb6\xfe\xb0\xa5ezdjZ\xe7\xe5') (1587330855, 0) (1587330953, 0)               0        orion-01            1002           10001       98.109406       97.699838        0.409568           99.58            31.0 
        6865328               0 bytearray(b'\xc7\xd4\x81\xc1\xf1\xff>\x96L\xc3\x08\rtj\x98\xa6\xe1\x83\x9e\x83%\xb6\xfe\xb0\xa5ezdjZ\xe7\xe5') (1587330855, 0) (1587330953, 0)               3        orion-01            1002           10001       98.270293       97.813985        0.456308           99.54            31.0 
--------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- 
4 record(s)

Performing Analysis

Data is available from the DataSource using the get_results() method. This method returns a DataSet object that encapsulates all of the data from the query as a collection of numpy arrays. Each array is named for the schema attribute from which it came. Simple arithmetic can be performed on the data either indirectly using the DataSet class or directly by accessing the underlying numpy arrays. To access the numpy array for the total_kernel_times attribute, do the following:

>>> res = src.get_results()
>>> numpy_array = res.array('total_kernel_times')
>>> numpy_array
array([  97.376143,   97.699838,   97.813985,   98.140389,   97.494842,
         97.73381 ,   97.9573  ,   98.131917,   66.678124,   66.755199,

In this way, you can perform all of your analysis using the numpy arrays directly.
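For instance, standard numpy aggregations apply directly to the returned array. This sketch simply reuses the total_kernel_times values shown in the example output above:

```python
import numpy as np

# total_kernel_times values copied from the example result above
total_kernel_times = np.array([97.376143, 97.699838, 97.813985, 98.140389,
                               97.494842, 97.73381, 97.9573, 98.131917,
                               66.678124, 66.755199])

print(total_kernel_times.min())   # smallest per-rank kernel time
print(total_kernel_times.max())   # largest per-rank kernel time
print(total_kernel_times.mean())  # mean across all rows in the result
```

Note that these aggregate over every row in the query result; per-job aggregation is covered below with the Transform class.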

The DataSet Class

The DataSet class provides a lot of convenient functionality that you may find makes your analysis easier to implement. For example, the following computes the duration of each record across all of the job data in the result:

>>> duration = res['end_time'] - res['start_time'] 
98000000 microseconds 
98000000 microseconds 
98000000 microseconds 
98000000 microseconds 
4 results
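The duration values can be checked by hand against the query output above. Timestamps are shown as (seconds, microseconds) pairs, and the difference is reported in microseconds:

```python
# (seconds, microseconds) pairs taken from the example query output above
start_time = (1587330855, 0)
end_time = (1587330953, 0)

duration = (end_time[0] - start_time[0]) * 1_000_000 + (end_time[1] - start_time[1])
print(duration, "microseconds")  # -> 98000000 microseconds
```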

Note that the name of the result is automatically constructed from the terms in the expression. This can be controlled using the >> operator as follows:

>>> duration = res['end_time'] - res['start_time']  >> 'duration'
98000000 microseconds 
98000000 microseconds 
98000000 microseconds 
98000000 microseconds 
4 results

To add this new series to your dataset, use the << operator. This will add the result duration to the original dataset:

>>> res <<= duration
>>> res.show([ 'job_id', 'job_name', 'duration' ], limit=4)
          job_id         job_name         duration 
---------------- ---------------- ---------------- 
       6865356.0       98000000 microseconds 
       6865356.0       98000000 microseconds 
       6865356.0       98000000 microseconds 
       6865356.0       98000000 microseconds 
---------------- ---------------- ---------------- 
4 results

The Python help() function is available to provide more complete documentation on the capabilities of the DataSet class.

The Transform Class

A common use case is that the data returned by the query cannot be analyzed as a single time series, but rather must be grouped appropriately to the analysis. For example, computing the average duration across all ranks in a job requires that the data be grouped by job_id. Using the DataSet class alone, you can compute the mean, but that would result in the mean across all jobs in the query, not for each job in the query. The Transform class provides this kind of capability. For example, to compute the mean for each job, you would do the following:

>>> from numsos.Transform import Transform
>>> xfrm = Transform(src, None, limit=1024*1024)
>>> res = xfrm.begin()
[TOP] 1034  ['job_id', 'job_name', 'app_id', 'job_tag', 'start_time',
            'end_time', 'mpi_rank', 'hostname', 'user_id', 'component_id',
            'total_app_time', 'total_kernel_times', 'total_non_kernel_times',
            'percent_in_kernels', 'unique_kernel_calls']

The Transform class maintains a stack of results and operates on the stack, consuming the entry at the top and pushing the result back. The show() method displays the contents of this stack. Computing the mean across data in the result by job is a single line of code as follows:

>>> mean = xfrm.mean([ 'total_app_time', 'total_kernel_times', 'total_non_kernel_times' ], group_name = 'job_id')
[TOP]   84  ['job_id', 'total_app_time_mean', 'total_kernel_times_mean',
             'total_non_kernel_times_mean']
                 total_app_time_m total_kernel_tim total_non_kernel 
          job_id              ean          es_mean      _times_mean 
---------------- ---------------- ---------------- ---------------- 
       6865356.0     98.312842625       97.8850485         0.427794 
       6865357.0       67.4997825       67.1489595         0.350823 
       6865358.0      67.59273175        67.243073         0.349659 
. . .
       6865439.0      67.50131925      67.12249025         0.378829 
---------------- ---------------- ---------------- ---------------- 
84 results
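Conceptually, mean with a group_name is a group-by: one output row per distinct job_id, averaging each requested series within the group. A plain-Python sketch of that grouping (with made-up values, not the Transform internals):

```python
from collections import defaultdict

# (job_id, total_app_time) pairs standing in for per-rank records
records = [
    (6865356, 98.0), (6865356, 98.6),
    (6865357, 67.4), (6865357, 67.6),
]

# group the series values by job_id ...
groups = defaultdict(list)
for job_id, t in records:
    groups[job_id].append(t)

# ... then average within each group: one result row per job
means = {job_id: sum(ts) / len(ts) for job_id, ts in groups.items()}
print(means)
```

This is why the result above has 84 rows: the query covered 84 distinct job_id values.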

The methods in the Transform class leave the result on the top of the stack, but also return the result for convenience. In this way, your code can combine the algebraic style supported by the DataSet as well as the stack style used by the Transform class when either is most appropriate.

Another key feature of the Transform class is the for_each method. This method iterates through the query result in a particular order; it is effectively a multi-dimensional group-by. Consider the problem of computing the mean time spent in each of the Kokkos kernels. This requires grouping the data first by job_id, and then by kernel_name.

The for_each signature is as follows:

for_each(self, series_list, xfrm_fn)

The function xfrm_fn will be called once for each unique group of data in the series_list. For example:

xfrm.for_each([ 'job_id', 'kernel_name' ], xfrm.kernel_stats )

This will call the xfrm.kernel_stats function once for each unique combination of job_id, and kernel_name. On the top of the stack at each invocation is a DataSet containing all of the data that has the same job_id and kernel_name.
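The iteration pattern can be sketched in plain Python with itertools.groupby: order the rows by the series list, then invoke the callback once per unique key combination. The rows and the kernel_stats stand-in below are hypothetical, not the Transform internals:

```python
from itertools import groupby

# (job_id, kernel_name, total_time) rows standing in for kokkos_kernel data
rows = [
    (1, "kA", 0.5), (1, "kA", 0.7),
    (1, "kB", 1.0),
    (2, "kA", 0.9),
]

calls = []
def kernel_stats(group):
    # invoked once per unique (job_id, kernel_name); group holds all matching rows
    calls.append((group[0][0], group[0][1], len(group)))

# order by the series list, then call back once per unique key combination
rows.sort(key=lambda r: (r[0], r[1]))
for key, grp in groupby(rows, key=lambda r: (r[0], r[1])):
    kernel_stats(list(grp))

print(calls)  # -> [(1, 'kA', 2), (1, 'kB', 1), (2, 'kA', 1)]
```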

For this example, we’ll create a sub-class of Transform.

class SHA256_Mapper:
    def __init__(self, cont):
        """Implements a SHA256 ---> String mapping service

        kernel_names and job_tag are stored as SHA256 hash
        values because the associated strings can be very large.
        """
        self.src = datasource("sos")
        self.src.config(cont=cont)

    def string(self, sha256):
        self.src.select([ '*' ],
            from_    = [ 'sha256_string' ],
            where    = [
                        [ 'sha256', Sos.COND_EQ, sha256 ]
                       ],
            order_by = 'sha256')
        res = self.src.get_results(limit=1)
        if res:
            return res.array('string')[0]
        return ""

class Xfrm(Transform):
    def __init__(self, src, sink):
        Transform.__init__(self, src, sink, limit=1024 * 1024)
        self.mapper = SHA256_Mapper(src.cont)

    def string(self, sha256):
        return self.mapper.string(sha256)

    def kernel_stats(self, values):
        data = self.pop()
        avg = data.mean([ 'total_time' ]) >> 'avg_time'
        std = data.std([ 'total_time' ]) >> 'std_time'
        maxt = data.maxrow('total_time')
        mint = data.minrow('total_time')
        kernel_name = self.string(mint.array('kernel_name')[0])
        if len(kernel_name) > 40:
            kernel_name = ""
        print("{0:40} {1:12.8f} {2:12.8f} {3:12.8f}/{4:<4} {5:12.8f}/{6:<4}".format(
            kernel_name,
            avg.array(0)[0], std.array(0)[0],
            mint.array('total_time')[0], int(mint.array('mpi_rank')[0]),
            maxt.array('total_time')[0], int(maxt.array('mpi_rank')[0])))

Note that the kernel_stats method pops the data off the top of the stack and then uses the more convenient algebraic operations available from the DataSet class. This is possible because the data has been ordered and grouped as specified in the series list.

The SHA256_Mapper class converts the kernel_name from a SHA256 hash back to its source string value. The kernel_name is stored as a hash in order to save space: these kernel names are generated by the C++ compiler from the template name, and because templates can be nested, the names can grow to upwards of 4K in size.
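The hashing itself is straightforward. A sketch with Python’s hashlib, using a short stand-in for a long template-generated name; the sha256_string objects queried above store exactly such digest-to-string pairs:

```python
import hashlib

# A short stand-in for a long, template-generated kernel name
kernel_name = "Kokkos::parallel_for<Functor<double, 3> >"

digest = hashlib.sha256(kernel_name.encode()).digest()
print(len(digest))  # -> 32: the digest is fixed size, however long the name is

# a lookup table maps the fixed-size digest back to the full string
table = {digest: kernel_name}
print(table[digest] == kernel_name)  # -> True
```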

A full working example program is available here.