automatically add specific reports to cluster_view and host_view #191

jh23453 · 2013-07-21T12:29:22Z

We have lots of JSON reports for clusters that we generate with a
script. At the same time we recreate the cluster_<name>.json file and
add these reports to included_reports. This works quite well, but it
means that there can't be local changes to the included/excluded
reports, since the file is regenerated each time.

So we've patched cluster_view.php to look for reports matching
cluster_<name>_.*_report.json (or .php). If such a report exists, it is
automatically added to the included_reports array and gets displayed.
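
The lookup described above can be sketched roughly as follows. This is an illustration in Python, not the PHP code from the patch; the function names, the graph_dir argument and the way existing entries are merged are assumptions made for the example.

```python
import re
from pathlib import Path


def auto_included_reports(graph_dir, cluster_name):
    """Return report names matching cluster_<name>_.*_report.(json|php).

    graph_dir is a hypothetical directory holding the report definitions.
    """
    pattern = re.compile(
        r"^(cluster_" + re.escape(cluster_name) + r"_.*_report)\.(json|php)$"
    )
    names = set()
    for entry in Path(graph_dir).iterdir():
        match = pattern.match(entry.name)
        if match and entry.is_file():
            # Keep the name without its extension, as used in included_reports.
            names.add(match.group(1))
    return sorted(names)


def merge_reports(included_reports, auto_reports):
    """Append auto-discovered reports, keeping order and skipping duplicates."""
    merged = list(included_reports)
    for name in auto_reports:
        if name not in merged:
            merged.append(name)
    return merged
```

With this approach the hand-maintained list stays untouched, and the generated reports are only added at display time.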

Since the change is very easy to port to host_view (and we have
lots of reports for hosts as well), that hunk is included too.

Since this is a very visible change, review is welcome. If the changes
are merged, a description should be added to the documentation and
possibly to the Ganglia book as well.



jh23453 · 2013-07-22T05:19:39Z

As requested by mnikhil-git some examples from our installation (I'll add the exact json files later).

We use Ganglia to monitor our Power/AIX Systems. Each of these hardware machines is
a cluster in our Ganglia installation. The machines are partitioned into LPARs (and sometimes
lots of LPARs). Each LPAR is a host in Ganglia (and has its own OS, hostname and IP address).
This works quite well with Ganglia.

LPARs can be "dedicated", which means that the CPUs are permanently assigned to an LPAR.

For our customers we use a feature of the Power hypervisor to group the LPARs into pools.
All LPARs in the pool have an "entitled capacity" of CPUs that they are guaranteed to get, even if
all LPARs in the pool are busy. But if one LPAR doesn't use its CPUs, other LPARs can
use them. So if the CPU peaks are somewhat distributed, we get away with fewer CPUs assigned.
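
To make the effect of distributed peaks concrete, here is a small worked sketch. The LPAR names and usage samples are invented for illustration, and the calculation ignores entitled capacity and other hypervisor details: it only compares sizing every LPAR for its own peak against sizing a shared pool for the peak of the combined load.

```python
def cpus_needed(usage_by_lpar):
    """Compare dedicated sizing with pooled sizing.

    usage_by_lpar maps an LPAR name to CPU usage samples taken at the
    same points in time (hypothetical data).
    """
    # Dedicated: every LPAR must be able to cover its own peak.
    dedicated = sum(max(samples) for samples in usage_by_lpar.values())
    # Pooled: the pool only has to cover the highest combined usage.
    pooled = max(sum(step) for step in zip(*usage_by_lpar.values()))
    return dedicated, pooled


usage = {
    "lpar_a": [1.0, 4.0, 1.0],
    "lpar_b": [3.0, 1.0, 1.0],
    "lpar_c": [1.0, 1.0, 4.0],
}
dedicated, pooled = cpus_needed(usage)
print(dedicated, pooled)  # 11.0 6.0
```

Because the three peaks fall at different times, the pool in this made-up example needs 6 CPUs where dedicated assignment would need 11.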

Michael Perzl has a graph cpu_used_report that works well for exactly one CPU pool. But
we have more pools (up to five right now). So we generate a graph for each CPU pool and
attach the graph to the cluster. We display:

- the number of CPUs in the pool (line; this is the upper bound for the graph, so we
  can see whether the pool is fully used)
- for each LPAR in the pool, the current CPU usage (stacked)

So we have a good understanding of which pool is busy and which LPARs are busy.

For the host graphs we look into our applications/databases. On each host/LPAR we
have one or more DB2 databases and/or SAP systems running. Let's have a look at the
DB2 graphs.

DB2 stores open transactions in log files in /db2//log_dir. This is a filesystem with
log files. We monitor and graph:

- size of the filesystem (line)
- used space in the filesystem (line); DB2 can create more log files if configured and needed, and
  the unused files get archived, but until that is done they can fill the filesystem
- the currently configured size of the logs (primary = always there, plus secondary = added as needed) (line)
- space used for secondary logs (line), so we see which system might not be configured right; this
  shouldn't happen for our systems
- currently used log space (area)
- high watermark of log usage (line), so we see when there has been a peak in usage

We generate similar graphs for the SAP enqueue system (number of locks defined and used).

Hope that helps to understand what we are doing.

jh23453 · 2013-07-22T07:10:06Z

Here is an example of a pool graph. There can be more than one pool per box and lots of LPARs:

{
  "report_name" : "cluster_box5_pool_0_report",
  "report_type" : "standard",
  "title" : "pool 0 report",
  "vertical_label" : "CPU Uses",
  "series" : [
    { "hostname": "host24", "clustername": "box5", "metric": "cpu_used", "color": "00ff00", "label": "host24", "type": "stack" },
    { "hostname": "host28", "clustername": "box5", "metric": "cpu_used", "color": "0000ff", "label": "host28", "type": "stack" },
    { "hostname": "host36", "clustername": "box5", "metric": "cpu_used", "color": "ffff00", "label": "host42", "type": "stack" },
    { "hostname": "host42", "clustername": "box5", "metric": "cpu_in_pool", "color": "000000", "label": "CPU in Pool", "line_width": "2", "type": "line" }
  ]
}

Right now we generate the JSON files with a cron job. cpu_in_pool is a metric from Michael Perzl's Power Modules; the pool number is also part of each LPAR's metrics.
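
A generator for such a report might look roughly like the sketch below. It is not the script used in our cron job: the function names, the color palette and the pool_size_host parameter (the host from which cpu_in_pool is read) are assumptions for illustration, and discovering which LPARs belong to which pool is left out.

```python
import json
from pathlib import Path

# Hypothetical palette; colors are reused if a pool has more LPARs than entries.
COLORS = ["00ff00", "0000ff", "ffff00", "ff0000", "00ffff"]


def build_pool_report(cluster, pool, lpars, pool_size_host):
    """Build a report dict with one stacked cpu_used series per LPAR
    plus a line for the number of CPUs in the pool."""
    series = [
        {
            "hostname": host,
            "clustername": cluster,
            "metric": "cpu_used",
            "color": COLORS[i % len(COLORS)],
            "label": host,
            "type": "stack",
        }
        for i, host in enumerate(lpars)
    ]
    # Upper bound of the graph: the pool size as reported by one host (assumption).
    series.append(
        {
            "hostname": pool_size_host,
            "clustername": cluster,
            "metric": "cpu_in_pool",
            "color": "000000",
            "label": "CPU in Pool",
            "line_width": "2",
            "type": "line",
        }
    )
    return {
        "report_name": f"cluster_{cluster}_pool_{pool}_report",
        "report_type": "standard",
        "title": f"pool {pool} report",
        "vertical_label": "CPU Uses",
        "series": series,
    }


def write_report(report, graph_dir):
    """Write the report as <report_name>.json into graph_dir and return the path."""
    path = Path(graph_dir) / (report["report_name"] + ".json")
    path.write_text(json.dumps(report, indent=2) + "\n")
    return path
```

Because the file name follows the cluster_<name>_.*_report.json pattern, a report written this way would be picked up by the patched cluster view without touching included_reports.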

…op-down list

In the cluster view we can select the metric that is displayed for each host. Until now, all reports and metrics are displayed in the drop-down list "Metric", and "Edit Optional Graphs" lists all reports for clusters and hosts. In our system we have dozens of reports for our clusters (but only some are relevant for each cluster) and hundreds of different reports for hosts (and again only some are relevant for each host).

The earlier patches add these reports automatically to clusters and hosts respectively, so there is no need to add them in edit_optional_graphs.php or in the metric drop-down list of the cluster view. For both lists we exclude the (cluster|host)_*_report.json reports from view, because they are already included wherever we need them.

We have hundreds of local metrics that are only useful for a single host. Currently all of these metrics are added to the drop-down menu in the cluster view. This patch defines the configuration option 'cluster_hide_metrics_from_menu', a regular expression. All metrics matching this regular expression are hidden from the drop-down menu.
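
The effect of the option can be sketched as a simple filter. This is a Python illustration rather than the PHP from the patch; the metric names and the example pattern are made up, and whether the real code matches anywhere in the name or only at the start is an assumption here (the sketch matches anywhere, so anchors such as ^ must be written explicitly).

```python
import re


def visible_metrics(metrics, hide_pattern):
    """Return the metrics that should stay in the drop-down menu.

    hide_pattern plays the role of 'cluster_hide_metrics_from_menu';
    an empty or missing pattern hides nothing.
    """
    if not hide_pattern:
        return list(metrics)
    hidden = re.compile(hide_pattern)
    return [name for name in metrics if not hidden.search(name)]


# Hypothetical metric names and pattern.
metrics = ["load_one", "cpu_used", "db2_t01_log_used", "sap_t01_enq_locks"]
print(visible_metrics(metrics, r"^(db2|sap)_"))  # ['load_one', 'cpu_used']
```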

jh23453 added 2 commits July 21, 2013 14:17

Fix comment after cut&paste

41c56e6

The comment talks about cluster graphs, but here we are in host context.

jh23453 added 2 commits August 14, 2013 22:03
