Numerical Analysis of Kokkos Data in SOS

This tutorial provides an overview of querying and analyzing Kokkos application profiling data that is stored in a SOS container. The facility for doing this is provided by a Python module called NumSOS, which supplies convenience classes that link SciPy (NumPy) and Pandas to various storage services.

This tutorial will use the Scalable Object Store (SOS) because it is the facility available on the platforms where the Kokkos profiling data is being collected.

Querying Object Data

The NumSOS classes are designed to work with a number of storage back ends including SOS and InfluxDB. A factory method called datasource() is used to construct an instance of a DataSource that is the interface for accessing your data.

from numsos.DataSource import datasource
src = datasource("sos")
src.config(path="path-to-container")

Objects in SOS are described by a schema. The schemas available in a container are displayed with the data source’s show_schemas() method. For example,

>>> from numsos.DataSource import datasource
>>> src = datasource("sos")
>>> src.config(path="/DATA15/orion/ldms-data")
>>> src.show_schemas()
Name          Id           Attr Count
------------- ------------ ------------
kokkos_kernel          130           19
kokkos_app             129           21
>>>

In this case, the container has two schemas: kokkos_kernel and kokkos_app. The detail for a schema can be queried with the data source’s show_schema() method as follows:

>>> src.show_schema('kokkos_app')
Name                             Id       Type         Indexed  Info
-------------------------------- -------- ------------ -------- --------------------------------
job_id                                  0 UINT64       False    
job_name                                1 STRING       False    
app_id                                  2 UINT64       False    
job_tag                                 3 STRUCT       False    
start_time                              4 TIMESTAMP    False    
end_time                                5 TIMESTAMP    False    
mpi_rank                                6 UINT64       False    
hostname                                7 STRING       False    
user_id                                 8 UINT32       False    
component_id                            9 UINT64       False    
total_app_time                         10 DOUBLE       False    
total_kernel_times                     11 DOUBLE       False    
total_non_kernel_times                 12 DOUBLE       False    
percent_in_kernels                     13 DOUBLE       False    
unique_kernel_calls                    14 DOUBLE       False    
time_job_comp                          15 JOIN         True     start_time, job_id, component_id
time_comp_job                          16 JOIN         True     start_time, component_id, job_id
job_comp_time                          17 JOIN         True     job_id, component_id, start_time
job_time_comp                          18 JOIN         True     job_id, start_time, component_id
comp_time_job                          19 JOIN         True     component_id, start_time, job_id
comp_job_time                          20 JOIN         True     component_id, job_id, start_time
>>> 

The DataSource class’s select method is used to specify the query criteria for your data. For example:

import time
from sosdb import Sos     # SOS Python bindings; provide the Sos.COND_* comparison constants

src.select([ '*' ],
           from_    = [ 'kokkos_app' ],
           where    = [ [ 'start_time', Sos.COND_GE, time.time() - 3600 ] ],
           order_by = 'time_job_comp')

The first parameter is the list of attributes from the object that you want in the result. The special character ‘*’ means that all attributes are to be returned. The from_ keyword argument is a list of schemas that will be searched for the attributes in the column list; in this case, we are querying for ‘kokkos_app’ objects. The where keyword argument is a list of conditions for filtering the data; here we are asking for any job that started within the last hour. Finally, the order_by keyword argument asks that the returned data be ordered by start_time, then job_id, and then component_id, as can be seen from the time_job_comp index in the schema detail above.
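For example, to retrieve only the timing attributes for a single job, the column list and where clause can be narrowed. This is only an illustrative sketch (the job id is taken from the sample output below); the remainder of the tutorial uses the select shown above:

src.select([ 'job_id', 'mpi_rank', 'total_app_time', 'total_kernel_times' ],
           from_    = [ 'kokkos_app' ],
           where    = [ [ 'job_id', Sos.COND_EQ, 6865328 ] ],
           order_by = 'job_comp_time')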

The data source has a convenience method called show that is useful for testing your select conditions. It has a keyword parameter called limit that restricts the output to a subset of the results, as follows:

>>> src.show(limit=4)
kokkos_app                                                                                                                                                                                                                                      
job_id          job_name        app_id          job_tag         start_time      end_time        mpi_rank        hostname        user_id         component_id    total_app_time  total_kernel_times total_non_kernel_times percent_in_kernels unique_kernel_calls 
--------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- 
        6865328      run-2x4.sh               0 bytearray(b'\xc7\xd4\x81\xc1\xf1\xff>\x96L\xc3\x08\rtj\x98\xa6\xe1\x83\x9e\x83%\xb6\xfe\xb0\xa5ezdjZ\xe7\xe5') (1587330855, 0) (1587330953, 0)               2        orion-01            1002           10001       97.782757       97.376143        0.406614           99.58            31.0 
        6865328      run-2x4.sh               0 bytearray(b'\xc7\xd4\x81\xc1\xf1\xff>\x96L\xc3\x08\rtj\x98\xa6\xe1\x83\x9e\x83%\xb6\xfe\xb0\xa5ezdjZ\xe7\xe5') (1587330855, 0) (1587330953, 0)               0        orion-01            1002           10001       98.109406       97.699838        0.409568           99.58            31.0 
        6865328      run-2x4.sh               0 bytearray(b'\xc7\xd4\x81\xc1\xf1\xff>\x96L\xc3\x08\rtj\x98\xa6\xe1\x83\x9e\x83%\xb6\xfe\xb0\xa5ezdjZ\xe7\xe5') (1587330855, 0) (1587330953, 0)               3        orion-01            1002           10001       98.270293       97.813985        0.456308           99.54            31.0 
--------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- 
4 record(s)

Performing Analysis

Data is available from the DataSource using the get_results() method. This method returns a DataSet object that encapsulates all of the data from the query as a collection of numpy arrays. Each array is named for the schema attribute from which it came. Simple arithmetic can be performed on the data either indirectly using the DataSet class or directly by accessing the underlying numpy arrays that contain the data. To access the numpy array for the total_kernel_times attribute, do the following:

>>> res = src.get_results()
>>> numpy_array = res.array('total_kernel_times')
>>> numpy_array
array([  97.376143,   97.699838,   97.813985,   98.140389,   97.494842,
         97.73381 ,   97.9573  ,   98.131917,   66.678124,   66.755199,
...

In this way, you can perform all of your analysis using the numpy arrays directly.
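For example, standard NumPy reductions can be applied to the returned arrays directly. A brief sketch using the total_kernel_times array from the query above:

import numpy as np

kernel_times = res.array('total_kernel_times')
print(kernel_times.mean())                # average kernel time across all rows in the result
print(kernel_times.max())                 # largest per-rank kernel time
print(np.percentile(kernel_times, 95))    # 95th-percentile kernel time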

The DataSet Class

The DataSet class provides a lot of convenient functionality that you may find makes your analysis easier to implement. For example, the following computes each job’s duration across all of the job data in the result:

>>> duration = res['end_time'] - res['start_time'] 
>>> duration.show(limit=4)
      (end_time- 
     start_time) 
---------------- 
98000000 microseconds 
98000000 microseconds 
98000000 microseconds 
98000000 microseconds 
---------------- 
4 results

Note that the name of the result is automatically constructed from the terms in the expression. This can be controlled using the >> operator as follows:

>>> duration = res['end_time'] - res['start_time']  >> 'duration'
>>> duration.show(limit=4)
        duration 
---------------- 
98000000 microseconds 
98000000 microseconds 
98000000 microseconds 
98000000 microseconds 
---------------- 
4 results

To add this new series to your dataset, use the << operator. This will add the result duration to the original dataset:

>>> res <<= duration
>>> res.show([ 'job_id', 'job_name', 'duration' ], limit=4)
          job_id         job_name         duration 
---------------- ---------------- ---------------- 
       6865356.0 run-2x4.sh       98000000 microseconds 
       6865356.0 run-2x4.sh       98000000 microseconds 
       6865356.0 run-2x4.sh       98000000 microseconds 
       6865356.0 run-2x4.sh       98000000 microseconds 
---------------- ---------------- ---------------- 
4 results

The Python help() function is available to provide more complete documentation on the capabilities of the DataSet class.
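For example, in the interactive session above:

>>> help(res)        # res is the DataSet returned by get_results(); this lists its methods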

The Transform Class

A common use case is that the data returned by the query cannot be analyzed as a single time series, but rather must be grouped appropriately for the analysis. For example, computing the average duration across all ranks in a job requires that the data be grouped by job_id. Using the DataSet class alone, you can compute a mean, but it would be the mean across all jobs in the query, not the mean for each job. The Transform class provides this kind of capability. For example, to compute the mean for each job, you would do the following:

>>> from numsos.Transform import Transform
>>> xfrm = Transform(src, None, limit=1024*1024)
>>> res = xfrm.begin()
>>> xfrm.show()
[TOP] 1034  ['job_id', 'job_name', 'app_id', 'job_tag', 'start_time',
            'end_time', 'mpi_rank', 'hostname', 'user_id', 'component_id',
            'total_app_time', 'total_kernel_times', 'total_non_kernel_times',
            'percent_in_kernels', 'unique_kernel_calls']

The Transform class maintains a stack of results and operates on the stack, consuming the entry at the top and pushing the result back. The show() method displays the contents of this stack. Computing the mean across data in the result by job is a single line of code as follows:

>>> mean = xfrm.mean([ 'total_app_time', 'total_kernel_times', 'total_non_kernel_times' ], group_name = 'job_id')
>>> xfrm.show()
[TOP]   84  ['job_id', 'total_app_time_mean', 'total_kernel_times_mean',
            'total_non_kernel_times_mean']
>>> xfrm.top().show()
                 total_app_time_m total_kernel_tim total_non_kernel 
          job_id              ean          es_mean      _times_mean 
---------------- ---------------- ---------------- ---------------- 
       6865356.0     98.312842625       97.8850485         0.427794 
       6865357.0       67.4997825       67.1489595         0.350823 
       6865358.0      67.59273175        67.243073         0.349659 
. . .
       6865439.0      67.50131925      67.12249025         0.378829 
---------------- ---------------- ---------------- ---------------- 
84 results

The methods in the Transform class leave the result on the top of the stack, but also return the result for convenience. In this way, your code can combine the algebraic style supported by the DataSet class with the stack style used by the Transform class, whichever is most appropriate.
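For example, the DataSet returned by mean() can be manipulated with the algebraic operators described earlier. A brief sketch (the derived series name non_kernel_mean is illustrative):

# mean() leaves its result on the Transform stack and also returns it as a DataSet
mean = xfrm.mean([ 'total_app_time', 'total_kernel_times' ], group_name='job_id')

# Switch to the algebraic style on the returned DataSet
overhead = mean['total_app_time_mean'] - mean['total_kernel_times_mean'] >> 'non_kernel_mean'
mean <<= overhead
mean.show([ 'job_id', 'non_kernel_mean' ], limit=4)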

Another key feature of the Transform class is the for_each method. This method iterates through the query result in a particular order; it is effectively a multi-dimensional version of the group_name grouping shown above. Consider the problem of computing the mean time spent in each of the Kokkos kernels. This requires grouping the data first by job_id, and then by kernel_name.

The for_each signature is as follows:

for_each(self, series_list, xfrm_fn)

The function xfrm_fn will be called once for each unique group of data in the series_list. For example:

xfrm.for_each([ 'job_id', 'kernel_name' ], xfrm.kernel_stats)

This will call the xfrm.kernel_stats function once for each unique combination of job_id and kernel_name. At each invocation, the top of the stack is a DataSet containing all of the data that has the same job_id and kernel_name.

For this example, we’ll create a sub-class of Transform.

from sosdb import Sos
from numsos.DataSource import datasource
from numsos.Transform import Transform

class SHA256_Mapper:
    def __init__(self, cont):
        """Implements a SHA256 ---> String mapping service

        kernel_names and job_tag are stored as SHA256 hash
        values because the associated strings can be very large
        """
        self.src = datasource("sos")
        self.src.config(cont=cont)

    def string(self, sha256):
        self.src.select([ '*' ],
            from_    = [ 'sha256_string' ],
            where    = [
                        [ 'sha256', Sos.COND_EQ, sha256 ],
                        ],
            order_by = 'sha256',
        )
        res = self.src.get_results(limit=1)
        if res:
            return res.array('string')[0]
        return ""

class Xfrm(Transform):
    def __init__(self, src, sink):
        Transform.__init__(self, src, sink, limit=1024 * 1024)
        self.mapper = SHA256_Mapper(src.cont)

    def string(self, sha256):
        return self.mapper.string(sha256)

    def kernel_stats(self, values):
        data = self.pop()
        avg = data.mean([ 'total_time' ]) >> 'avg_time'
        std = data.std([ 'total_time' ]) >> 'std_time'
        maxt = data.maxrow('total_time')
        mint = data.minrow('total_time')
        kernel_name = self.string(mint.array('kernel_name')[0])
        if len(kernel_name) > 40:
            print(kernel_name)
            kernel_name = ""
        print("{0:40} {1:12.8f} {2:12.8f} {3:12.8f}/{4:<4} {5:12.8f}/{6:<4}".format(
            kernel_name,
            avg.array(0)[0], std.array(0)[0],
            mint.array('total_time')[0], int(mint.array('mpi_rank')[0]),
            maxt.array('total_time')[0], int(maxt.array('mpi_rank')[0])
        ))
        return

Note that the kernel_stats method pops the data off the top of the stack and then uses the more convenient algebraic operations available from the DataSet class. This is possible because the data has been ordered and filtered as specified in the series list.

The SHA256_Mapper class converts the kernel_name from a SHA256 hash back to the source string value. The kernel_name is stored as a hash in order to save space: because these kernel names are generated by the C++ compiler from template names, and templates can be nested, the names can reach upwards of 4K in size.
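Putting the pieces together, a driver for this transform might look like the following sketch. It assumes the imports and class definitions above; the container path, the kokkos_kernel attribute names in the select, and the index used for order_by are assumptions, since the kokkos_kernel schema detail is not shown in this tutorial:

import time

src = datasource("sos")
src.config(path="path-to-container")

# Select the per-kernel records; the attribute and index names here are assumed
src.select([ 'job_id', 'kernel_name', 'mpi_rank', 'total_time' ],
           from_    = [ 'kokkos_kernel' ],
           where    = [ [ 'start_time', Sos.COND_GE, time.time() - 3600 ] ],
           order_by = 'job_comp_time')

xfrm = Xfrm(src, None)
xfrm.begin()

# kernel_stats is invoked once per unique (job_id, kernel_name) combination
xfrm.for_each([ 'job_id', 'kernel_name' ], xfrm.kernel_stats)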

A full working example program is available here.