LDMSD systemd service

Intro

This page contains information about configuring, starting, and stopping the LDMS sampler daemon (ldmsd.sampler) and the LDMS aggregator daemon (ldmsd.aggregator) using systemd.

We also encourage learning LDMS (ldmsd, ldms_ls, etc.) by manually starting, stopping, and configuring the daemons from the command line to see how they work together. For that purpose, please consult the ldmsd(8), ldmsd_controller(8), and ldms_ls(8) man pages (located in /opt/ovis/share/man/man8, e.g. man /opt/ovis/share/man/man8/ldmsd.8).

Configuration

We separate ldmsd.sampler and ldmsd.aggregator to distinguish the roles of the two services, which can be configured independently (for example, to listen on different ports). This separation also makes it easier to run both an aggregator daemon and a sampler daemon on the same node.

To make things clearer, let’s set up an example with 2 sampler daemons and 1 aggregator daemon. We will have 2 systems: vmtest1 and vmtest2. vmtest1 will run one sampler daemon, while vmtest2 will run both a sampler daemon and an aggregator daemon. The aggregator daemon will aggregate data from the two sampler daemons and store the data using store_sos.

ldmsd.sampler configuration

Two files in the /opt/ovis/etc/ldms/ directory configure the sampler daemon: 1) ldmsd.sampler.env and 2) ldmsplugin.sampler.conf.

ldmsd.sampler.env controls transport type, service port number, daemon-configuring port number, shared-secret authentication file, plugin configuration file location, and library paths.

When an ldmsd is started by systemd, its log messages are printed to the system log, and there is no UNIX domain socket file. You can retrieve the ldmsd log messages with the command journalctl | grep ldmsd.

ldmsplugin.sampler.conf controls plugin configuration.

The following is a set of configuration examples for each sampler daemon on vmtest1 and vmtest2.

# file: ldmsd.sampler.env (the same on both vmtest1 and vmtest2)

# LDMS transport option (sock, rdma, or ugni)
LDMSD_XPRT=sock

# LDMS Daemon service port
LDMSD_PORT=10001

# LDMSD configuration port, for ldmsd_controller over the network. The configuration
# transport always uses 'socket' and is separate from LDMSD_XPRT. Comment out the
# following line to keep the control/config port closed. If it is commented out,
# ldmsd_controller cannot be used when ldmsd is started with systemd.
LDMSD_CONFIG_PORT=10101

# Authentication file path, see /opt/ovis/etc/ldms/ldmsauth.conf for an example
LDMSD_AUTH_FILE=/opt/ovis/etc/ldms/ldmsauth.conf

# LDMS plugin configuration file, see /opt/ovis/etc/ldms/ldmsplugin.sampler.conf 
# for an example. The path to a customized configuration file can be given here.
LDMSD_PLUGIN_CONFIG_FILE=/opt/ovis/etc/ldms/ldmsplugin.sampler.conf

# These are set by the configure script; there is no need to change them.
LDMSD_PLUGIN_LIBPATH=/opt/ovis/lib64/ovis-ldms
ZAP_LIBPATH=/opt/ovis/lib64/ovis-lib

# file: ldmsplugin.sampler.conf on vmtest1
# Load and config the popular meminfo sampler 
load name=meminfo 
config name=meminfo producer=vmtest1 instance=vmtest1/meminfo 
start name=meminfo interval=2000000 offset=0 
 
# Load and config array_example sampler 
load name=array_example 
config name=array_example producer=vmtest1 instance=vmtest1/array_example
start name=array_example interval=2000000 offset=0

# file: ldmsplugin.sampler.conf on vmtest2
# Load and config the popular meminfo sampler 
load name=meminfo 
config name=meminfo producer=vmtest2 instance=vmtest2/meminfo 
start name=meminfo interval=2000000 offset=0 
 
# Load and config array_example sampler 
load name=array_example 
config name=array_example producer=vmtest2 instance=vmtest2/array_example
start name=array_example interval=2000000 offset=0

NOTE: The producer and instance names differ between the two sampler plugin configurations.
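
Because the two sampler plugin configurations differ only in the host name, they can be generated instead of edited by hand. The following is a minimal sketch, assuming a hypothetical helper function gen_sampler_conf (it is not part of LDMS) that takes the host name as its argument and prints the same load, config, and start lines used in the examples above.

```shell
#!/bin/sh
# Hypothetical helper (not part of LDMS): print a sampler plugin
# configuration for the host given as $1, mirroring the examples above.
gen_sampler_conf() {
    host=$1
    for plugin in meminfo array_example; do
        echo "load name=$plugin"
        echo "config name=$plugin producer=$host instance=$host/$plugin"
        echo "start name=$plugin interval=2000000 offset=0"
    done
}

# Example: print the configuration for vmtest1.
gen_sampler_conf vmtest1
```

On each node, the output could be redirected into the file named by LDMSD_PLUGIN_CONFIG_FILE, for example /opt/ovis/etc/ldms/ldmsplugin.sampler.conf.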

ldmsd.aggregator configuration

Like the sampler daemon, the aggregator daemon has an environment file and a plugin configuration file, configured as follows:

# file: ldmsd.aggregator.env on vmtest2

# LDMS transport option (sock, rdma, or ugni)
LDMSD_XPRT=sock

# LDMS Daemon service port
LDMSD_PORT=10000

# LDMSD configuration port, for ldmsd_controller over the network. The configuration
# transport always uses 'socket' and is separate from LDMSD_XPRT. Comment out the
# following line to keep the config/control port closed. If it is commented out,
# ldmsd_controller cannot be used when ldmsd is started with systemd.
LDMSD_CONFIG_PORT=10100

# Authentication file path, see /opt/ovis/etc/ldms/ldmsauth.conf for an example
LDMSD_AUTH_FILE=/opt/ovis/etc/ldms/ldmsauth.conf

# LDMS plugin configuration file, see /opt/ovis/etc/ldms/ldmsplugin.aggregator.conf
# for an example
LDMSD_PLUGIN_CONFIG_FILE=/opt/ovis/etc/ldms/ldmsplugin.aggregator.conf

# These are set by the configure script; there is no need to change them.
LDMSD_PLUGIN_LIBPATH=/opt/ovis/lib64/ovis-ldms
ZAP_LIBPATH=/opt/ovis/lib64/ovis-lib

NOTE: We use port numbers different from those used by ldmsd.sampler so that the sampler daemon and the aggregator daemon can run on the same host.
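
A port clash between the two environment files can be checked before starting the daemons. The following is a minimal sketch, assuming the environment files contain only simple KEY=value lines and comments, as in the examples on this page. The functions env_value and ports_differ are hypothetical helpers, not part of LDMS.

```shell
#!/bin/sh
# Hypothetical helper (not part of LDMS): print the value of KEY ($2)
# from an environment file ($1) made of simple KEY=value lines.
# Commented-out lines are ignored because they do not start with KEY=.
env_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Succeed only if the sampler env file ($1) and the aggregator env file
# ($2) share no service or configuration port.
ports_differ() {
    s="$(env_value "$1" LDMSD_PORT) $(env_value "$1" LDMSD_CONFIG_PORT)"
    a="$(env_value "$2" LDMSD_PORT) $(env_value "$2" LDMSD_CONFIG_PORT)"
    for p in $s; do
        for q in $a; do
            if [ "$p" = "$q" ]; then
                return 1
            fi
        done
    done
    return 0
}

# Example with the paths from this page (skipped if the files are absent).
dir=/opt/ovis/etc/ldms
if [ -r "$dir/ldmsd.sampler.env" ] && [ -r "$dir/ldmsd.aggregator.env" ]; then
    if ports_differ "$dir/ldmsd.sampler.env" "$dir/ldmsd.aggregator.env"; then
        echo "ports OK"
    else
        echo "port conflict"
    fi
fi
```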

# file: ldmsplugin.aggregator.conf

# Add 1 producer per ldmsd.sampler. If you have thousands of nodes, feel free
# to use a script to generate the configuration file. Producers handle only
# the LDMS connection; the updater handles the data-update logic.
prdcr_add name=prd1 host=vmtest1 type=active xprt=$LDMSD_XPRT port=10001 \
          interval=1000000
prdcr_start name=prd1

prdcr_add name=prd2 host=vmtest2 type=active xprt=$LDMSD_XPRT port=10001 \
          interval=1000000
prdcr_start name=prd2

# Create an updater for all producers and all sets.
updtr_add name=update_all interval=2000000 offset=1000000
# Add all producers.
updtr_prdcr_add name=update_all regex=.*
# By default, all sets in each producer will be updated.

# Start the updater
updtr_start name=update_all

# Storage plugin
load name=store_sos
config name=store_sos path=/opt/ovis/var/lib/ldms

# Add storage policy, one for each schema
# Storage policy for meminfo schema
strgp_add name=meminfo_sos plugin=store_sos container=msos schema=meminfo
strgp_prdcr_add name=meminfo_sos regex=.*
strgp_start name=meminfo_sos

# Storage policy for array_example schema
strgp_add name=array_example_sos plugin=store_sos container=msos schema=array_example
strgp_prdcr_add name=array_example_sos regex=.*
strgp_start name=array_example_sos
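
For larger clusters, the producer section of ldmsplugin.aggregator.conf can be generated by a script, as the comment in the file suggests. The following is a minimal sketch, assuming every sampler daemon listens on port 10001 as in this example. The function gen_producers and the prdN naming scheme are hypothetical, not part of LDMS.

```shell
#!/bin/sh
# Hypothetical helper (not part of LDMS): print prdcr_add/prdcr_start
# lines for each sampler host given as an argument. The port and interval
# values are copied from the example configuration above.
gen_producers() {
    n=1
    for host in "$@"; do
        # \$LDMSD_XPRT is written literally, as in the example file.
        echo "prdcr_add name=prd$n host=$host type=active xprt=\$LDMSD_XPRT port=10001 interval=1000000"
        echo "prdcr_start name=prd$n"
        n=$((n + 1))
    done
}

# Example: the two sampler hosts used on this page.
gen_producers vmtest1 vmtest2
```

The updater and storage policy lines do not need to be generated per host, because they select producers with regex=.*.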
 

Firewall

Before running the daemons, let’s make sure that firewalld allows the ports we configured the LDMS daemons to use.

# On vmtest1, add only the sampler daemon service port
$ firewall-cmd --add-port 10001/tcp 
$ firewall-cmd --add-port 10001/tcp --permanent

# on vmtest2, we need to add both sampler daemon and aggregator daemon service ports
$ firewall-cmd --add-port 10000/tcp 
$ firewall-cmd --add-port 10000/tcp --permanent
$ firewall-cmd --add-port 10001/tcp 
$ firewall-cmd --add-port 10001/tcp --permanent

# If LDMSD_CONFIG_PORT is set, add the configuration port(s) in the same way.
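
When several ports have to be opened, the command pairs can be produced in a loop. The following is a minimal sketch that only prints the firewall-cmd commands so they can be reviewed before being run as root. The function print_firewall_cmds is a hypothetical helper, not part of LDMS or firewalld.

```shell
#!/bin/sh
# Hypothetical helper: print the firewall-cmd commands needed to open each
# TCP port given as an argument, both for the running firewall and
# permanently. Nothing is executed here.
print_firewall_cmds() {
    for port in "$@"; do
        echo "firewall-cmd --add-port $port/tcp"
        echo "firewall-cmd --add-port $port/tcp --permanent"
    done
}

# Example: vmtest2 runs both daemons, so list both service ports and
# both configuration ports from the environment files above.
print_firewall_cmds 10000 10001 10100 10101
```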

Automatically Start the daemon at system startup

systemctl enable ldmsd.sampler (or ldmsd.aggregator) enables the ldmsd.sampler (or ldmsd.aggregator) service to start automatically at boot. If the command doesn’t work due to a known systemctl bug, the workaround is to manually create the symbolic link in the /etc/systemd/system/multi-user.target.wants/ directory as follows:

$ ln -s /opt/ovis/etc/systemd/system/ldmsd.sampler.service \
      /etc/systemd/system/multi-user.target.wants/
$ systemctl daemon-reload 
# For ldmsd.aggregator.service, simply follow the same procedure.
$ ln -s /opt/ovis/etc/systemd/system/ldmsd.aggregator.service \
      /etc/systemd/system/multi-user.target.wants/
$ systemctl daemon-reload

After enabling the service with the workaround, if you wish to disable it (i.e. not start it automatically at boot), systemctl disable ldmsd.sampler (or ldmsd.aggregator) works as usual.

Start/Stop daemon

Once configuration is done, simply use the systemctl command to start or stop the daemons, for example:

# on vmtest1
$ systemctl start ldmsd.sampler.service
$ systemctl status ldmsd.sampler.service # to check the status

# on vmtest2
$ systemctl start ldmsd.sampler.service
$ systemctl start ldmsd.aggregator.service

# We can test if it is working as follows.
# On any node, we should be able to reach any of the ldmsds
$ ldms_ls -x sock -p 10001 -h vmtest1
vmtest1/meminfo
vmtest1/array_example
$ ldms_ls -x sock -p 10001 -h vmtest2
vmtest2/meminfo
vmtest2/array_example
$ ldms_ls -x sock -p 10000 -h vmtest2
vmtest2/meminfo
vmtest2/array_example
vmtest1/meminfo
vmtest1/array_example

# On vmtest2, we can also use `sos_cmd` to check if the aggregator is storing data
$ cd /opt/ovis/var/lib/ldms
$ sos_cmd -C msos -q -S meminfo -X comp_time | tail -n 3 
1478030356.002677 1478030356 1478030356 0 0  1016860 498808 683976 0 267900 456 143312 206592 13032 87364 130280 119228 0 0 2097148 2096104 0 0 81548  23860 18392 128684 96568 32116 1888  4832 0 0 0 2605576 377756 343 59738367 6028 34359729604 0 6144 0 0 0 0 2048 49088 999424 
-------------------------------- ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ 
Records 4180/4180.

# Feel free to do this a couple of times, 2 seconds apart, to see if new data is coming in 
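
The repeated check can be scripted by comparing the record count between two queries. The following is a minimal sketch, assuming the summary line always has the form "Records N/M." as in the output above and that the first number is the count of interest. The functions record_count and has_grown are hypothetical helpers, not part of LDMS or SOS.

```shell
#!/bin/sh
# Hypothetical helper: read sos_cmd output on stdin and print the first
# number from a summary line of the form "Records 4180/4180.".
record_count() {
    sed -n 's|^Records \([0-9][0-9]*\)/[0-9][0-9]*\.[[:space:]]*$|\1|p' | tail -n 1
}

# Succeed if the second count ($2) is larger than the first ($1).
has_grown() {
    [ "$2" -gt "$1" ]
}

# Intended usage on vmtest2 (left commented out because it needs sos_cmd
# and the container created by the aggregator):
#   cd /opt/ovis/var/lib/ldms
#   first=$(sos_cmd -C msos -q -S meminfo -X comp_time | record_count)
#   sleep 2
#   second=$(sos_cmd -C msos -q -S meminfo -X comp_time | record_count)
#   has_grown "$first" "$second" && echo "new data is arriving"
```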

Investigating the Daemon Log

The daemon logs go to the system logging facility. Use the following commands to see them.

# For ldmsd.sampler.service
$ journalctl _SYSTEMD_UNIT=ldmsd.sampler.service
# For ldmsd.aggregator.service
$ journalctl _SYSTEMD_UNIT=ldmsd.aggregator.service
