Scalable High Fidelity Low Overhead Exascale System Monitoring

Open Grid Computing

Lightweight Distributed Metric Service

The Lightweight Distributed Metric Service (LDMS) was conceived over 10 years ago with the goal of leveraging the capabilities of RDMA transports to deliver high-fidelity system monitoring data at a statistically undetectable cost to job runtime.

Over time, the system's functionality has expanded to include highly resilient configuration management (Maestro), distributed time series data storage (Axis), sophisticated analysis (Focus), visualization (Prism), and automated remediation (Pivot).

Maestro

CONFIGURATION MANAGEMENT AND LOAD BALANCING

Maestro manages groups of LDMS daemons (Nexus) arranged in a distributed hierarchy. Each level of the hierarchy is served by one or more LDMS daemons. Maestro dynamically manages the configuration of every Nexus daemon from a single global configuration.
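The idea of deriving every daemon's configuration from one global description can be sketched as follows. This is an illustrative sketch only: the `GLOBAL_CONFIG` layout, the daemon names, and the round-robin balancing policy are all assumptions made for the example and do not reflect Maestro's actual schema or algorithm.

```python
# Hypothetical global configuration. The keys, level names, and daemon names
# are illustrative assumptions, not Maestro's real configuration format.
GLOBAL_CONFIG = {
    "samplers": ["node-1", "node-2", "node-3", "node-4"],
    "aggregators": {
        "L1": ["agg-1", "agg-2"],
        "L2": ["agg-top"],
    },
}


def assign_round_robin(producers, consumers):
    """Spread producers across consumers as evenly as possible."""
    if not consumers:
        raise ValueError("a hierarchy level needs at least one daemon")
    plan = {consumer: [] for consumer in consumers}
    for index, producer in enumerate(producers):
        plan[consumers[index % len(consumers)]].append(producer)
    return plan


def build_plan(config):
    """Derive each aggregator's list of upstream daemons, level by level."""
    plan = {}
    upstream = list(config["samplers"])
    for level in sorted(config["aggregators"]):
        daemons = config["aggregators"][level]
        plan.update(assign_round_robin(upstream, daemons))
        upstream = list(daemons)  # this level feeds the next one up
    return plan


def rebalance(config, failed):
    """Recompute the plan with a failed aggregator removed from its level."""
    pruned = {
        "samplers": config["samplers"],
        "aggregators": {
            level: [d for d in daemons if d != failed]
            for level, daemons in config["aggregators"].items()
        },
    }
    return build_plan(pruned)


if __name__ == "__main__":
    print(build_plan(GLOBAL_CONFIG))
    print(rebalance(GLOBAL_CONFIG, "agg-2"))
```

Because every per-daemon assignment is a pure function of the global description, handling a failed daemon in this sketch amounts to recomputing the plan, which is one simple way to picture dynamic management from a single source of truth.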

Nexus

DATA COLLECTION

An LDMS daemon (Nexus) is responsible for gathering monitoring data from cluster resources. These resources may be hardware, storage and network assets, or they may be other Nexus daemons.
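The recursive structure described above, where a daemon's data source is either a local resource or another daemon, can be pictured with a small in-process model. The `Sampler` and `Aggregator` classes, their `collect` method, and the metric names are hypothetical and exist only for this illustration; a real deployment moves data between hosts over a network transport rather than through Python method calls.

```python
import time


class Sampler:
    """Hypothetical leaf daemon that reads metrics from a local resource."""

    def __init__(self, name, read_metrics):
        self.name = name
        self.read_metrics = read_metrics  # callable returning {metric: value}

    def collect(self):
        # One record per sample, tagged with its origin and collection time.
        record = {"producer": self.name, "timestamp": time.time()}
        record.update(self.read_metrics())
        return [record]


class Aggregator:
    """Hypothetical daemon whose data sources are other daemons."""

    def __init__(self, name, producers):
        self.name = name
        self.producers = list(producers)  # Samplers or other Aggregators

    def collect(self):
        records = []
        for producer in self.producers:
            records.extend(producer.collect())
        return records


if __name__ == "__main__":
    nodes = [Sampler(f"node-{i}", lambda: {"cpu_user": 0.25}) for i in range(4)]
    level1 = [Aggregator("agg-1", nodes[:2]), Aggregator("agg-2", nodes[2:])]
    top = Aggregator("agg-top", level1)
    for record in top.collect():
        print(record["producer"], record["cpu_user"])
```

Giving samplers and aggregators the same `collect` interface is the design choice that lets the hierarchy nest to any depth in this sketch.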

Axis

FLEXIBLE DATA STORAGE

Axis implements a configurable, distributed storage infrastructure that makes it possible to match storage assets with different performance, capacity and cost characteristics to workloads ranging from high-volume insertion to sophisticated queries.

Focus

Focus leverages SciPy and DataFrames to support the efficient development and implementation of sophisticated data analyses on a high-performance infrastructure. These analyses can be performed on time series data obtained from Axis, InfluxDB, TSDB, SQL and CSV sources.
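As an illustration of the kind of DataFrame-based time series analysis described above, the sketch below flags samples that deviate sharply from their recent history. It assumes the DataFrames are pandas DataFrames; the `flag_anomalies` function, the `mem_used_gb` column, and the threshold are hypothetical choices for the example and are not part of the Focus API.

```python
import pandas as pd


def flag_anomalies(df, column, window=5, threshold=3.0):
    """Flag samples far from the mean of the preceding `window` samples.

    Returns a copy of `df` with two added columns: `zscore` and `anomaly`.
    """
    out = df.copy()
    rolling = out[column].rolling(window=window, min_periods=window)
    # shift(1) compares each sample against the window that precedes it,
    # so an outlier does not inflate its own baseline.
    mean = rolling.mean().shift(1)
    std = rolling.std().shift(1)
    out["zscore"] = (out[column] - mean) / std
    # Rows without a full preceding window have a NaN z-score and are not flagged.
    out["anomaly"] = out["zscore"].abs() > threshold
    return out


if __name__ == "__main__":
    # Hypothetical metric samples; in practice these would be loaded from a
    # time series store or a CSV file.
    samples = pd.DataFrame(
        {"mem_used_gb": [10, 11, 10, 12, 11, 10, 11, 50, 11, 10]},
        index=pd.date_range("2024-01-01", periods=10, freq="min"),
    )
    result = flag_anomalies(samples, "mem_used_gb")
    print(result[result["anomaly"]])
```

The same function works regardless of where the DataFrame came from, which is the practical benefit of normalizing different data sources into one tabular form before analysis.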

Prism

Prism is built on the popular Grafana visualization toolset, with storage and analysis plugins that support a wide range of visualizations drawn from multiple data sources. Prism includes a pluggable infrastructure that allows customers to implement and use custom analysis modules.

Pivot

AUTOMATION

Pivot allows system administrators to configure and execute remedial actions for problems detected by analyses implemented in the Focus infrastructure.
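The link between a detected problem and a configured remedial action can be sketched as a simple dispatch table. Everything here is hypothetical: the `RemediationEngine` class, the event fields, and the `drain_node` action are assumptions for illustration and do not describe Pivot's actual interface. The example action only returns a description of what it would do instead of touching a real system.

```python
class RemediationEngine:
    """Hypothetical dispatcher mapping detected problems to configured actions."""

    def __init__(self):
        self._actions = {}  # problem type -> callable taking the event
        self.log = []       # (problem, node, outcome) audit trail

    def register(self, problem, action):
        """Configure the action to run when `problem` is detected."""
        self._actions[problem] = action

    def handle(self, event):
        """Run the action configured for the event's problem type, if any."""
        action = self._actions.get(event["problem"])
        if action is None:
            self.log.append((event["problem"], event["node"], "no action configured"))
            return None
        outcome = action(event)
        self.log.append((event["problem"], event["node"], outcome))
        return outcome


def drain_node(event):
    """Example action: describe the step rather than executing it."""
    return f"drain {event['node']}"


if __name__ == "__main__":
    engine = RemediationEngine()
    engine.register("memory_leak", drain_node)
    # Events like these would come from an analysis step.
    engine.handle({"problem": "memory_leak", "node": "node-7"})
    engine.handle({"problem": "fan_failure", "node": "node-9"})
    for entry in engine.log:
        print(entry)
```

Recording unmatched problems in the log, rather than silently ignoring them, gives an administrator a list of detections that still need an action configured.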