automatically add specific reports to cluster_view and host_view #191

Open
wants to merge 4 commits into
base: master

jh23453
Contributor

@jh23453 jh23453 commented Jul 21, 2013

We have lots of JSON reports for clusters that we generate with a
script. At the same time we recreate the cluster_<name>.json file and
add these reports to included_reports. This works quite well, but
it means that there can't be local changes to the included/excluded
reports.

So we've patched cluster_view.php to look for reports matching
cluster_<name>_.*_report.json (or .php). If such a report exists, it is
automatically added to the included_reports array and gets displayed.

Since the change is very easy to port to the host_view (and we
have lots of reports for hosts as well), that hunk is included too.

Since this is a very visible change, review is welcome. If the changes
are merged, a description should be added to some documentation and
possibly the Ganglia book as well.

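
To make the matching rule concrete, here is a minimal sketch of the discovery step. The actual patch is in PHP (cluster_view.php); this Python version only illustrates the idea, and the function names `auto_reports` and `merge_included` as well as the directory argument are assumptions, not names taken from the patch.

```python
import re
from pathlib import Path


def auto_reports(graph_dir, prefix, name):
    """Return report names matching <prefix>_<name>_.*_report.(json|php).

    prefix is "cluster" or "host"; name is the cluster or host name.
    """
    pattern = re.compile(
        r"^(%s_%s_.*_report)\.(json|php)$" % (re.escape(prefix), re.escape(name))
    )
    found = set()
    for entry in Path(graph_dir).iterdir():
        match = pattern.match(entry.name)
        if entry.is_file() and match:
            # Keep the name without extension, as used in included_reports.
            found.add(match.group(1))
    return sorted(found)


def merge_included(included_reports, discovered):
    """Append discovered reports not already listed, preserving order."""
    merged = list(included_reports)
    for report in discovered:
        if report not in merged:
            merged.append(report)
    return merged
```

Note that with this pattern a cluster named box5 would also pick up reports written for a cluster named box5_a, so names that are prefixes of one another need some care.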
The comment talks about cluster graphs, but here we are in host
context.
@jh23453
Contributor Author

jh23453 commented Jul 22, 2013

As requested by mnikhil-git, here are some examples from our installation (I'll add the exact JSON files later).

We use Ganglia to monitor our Power/AIX Systems. Each of these hardware machines is
a cluster in our Ganglia installation. The machines are partitioned into LPARs (and sometimes
lots of LPARs). Each LPAR is a host in Ganglia (and has its own OS, hostname and IP address).
This works quite well with Ganglia.

LPARs can be "dedicated", which means that the CPUs are permanently assigned to an LPAR.

For our customers we use a feature of the Power hypervisor to group the LPARs into pools.
All LPARs in the pool have an "entitled capacity" of CPUs they are guaranteed to get, even if
all LPARs in the pool are busy. But if one LPAR doesn't use its CPUs, other LPARs can
use them. So if the CPU peaks are somewhat distributed, we get away with fewer CPUs assigned.

Michael Perzl has a graph cpu_used_report that works well for exactly one CPU pool. But
we have more pools (up to five right now). So we generate a graph for each CPU pool and
attach the graph to the cluster. We display:

  • the number of CPUs in the pool (line; this is the upper bound for the graph, so we
    can see whether the pool is fully used)
  • for each LPAR in the pool, the current CPU usage (stacked).

So we have a good understanding of which pool is busy and which LPARs are busy.

For the host graphs we look into our applications/databases. On each host/LPAR we
have one or more DB2 databases and/or SAP systems running. Let's have a look at the
DB2 graphs.

DB2 stores open transactions in log files in /db2//log_dir. This is a filesystem with
log files. We monitor and graph:

  • size of the filesystem (line)
  • used space in the filesystem (DB2 can create more log files if configured and needed;
    unused files get archived, but until that is done they can fill the filesystem) (line)
  • the currently configured size of the logs (primary = always there + secondary = added as needed) (line)
  • space used for secondary logs (so we see which system might not be configured right; this
    shouldn't happen for our systems) (line)
  • currently used log space (area)
  • high watermark of log usage (so we see when there has been a usage peak) (line)

We generate similar graphs for the SAP enqueue system (number of locks defined and used).

Hope that helps to understand what we are doing.

@jh23453
Contributor Author

jh23453 commented Jul 22, 2013

Here is an example of a pool graph. There can be more than one pool per box, and lots of LPARs:

{
"report_name" : "cluster_box5_pool_0_report",
"report_type" : "standard",
"title" : "pool 0 report",
"vertical_label" : "CPU Uses",
"series" : [
{ "hostname": "host24", "clustername": "box5", "metric": "cpu_used", "color": "00ff00", "label": "host24", "type": "stack" },
{ "hostname": "host28", "clustername": "box5", "metric": "cpu_used", "color": "0000ff", "label": "host28", "type": "stack" },
{ "hostname": "host36", "clustername": "box5", "metric": "cpu_used", "color": "ffff00", "label": "host42", "type": "stack" },
{ "hostname": "host42", "clustername": "box5", "metric": "cpu_in_pool", "color": "000000", "label": "CPU in Pool", "line_width": "2", "type": "line" }
]
}

Right now we generate the JSON files with a cron job. cpu_in_pool is a metric from Michael Perzl's Power modules; the pool number is also part of the LPARs' metrics.
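
For readers who want to reproduce this, a generator for such a file could look roughly like the sketch below. It is not the cron job used here: the function names, the colour palette, and the way pool membership is passed in as a plain list are assumptions for illustration. It also assumes that cpu_in_pool reports the same value on every LPAR of a pool, so it is read from one member only.

```python
import json
import os

# Hypothetical palette; any six-digit hex colours work.
COLORS = ["00ff00", "0000ff", "ffff00", "ff0000", "00ffff", "ff00ff"]


def build_pool_report(cluster, pool, lpars):
    """Build the report structure for one CPU pool of one cluster."""
    if not lpars:
        raise ValueError("a pool needs at least one LPAR")
    # One stacked cpu_used series per LPAR in the pool.
    series = [
        {
            "hostname": host,
            "clustername": cluster,
            "metric": "cpu_used",
            "color": COLORS[i % len(COLORS)],
            "label": host,
            "type": "stack",
        }
        for i, host in enumerate(lpars)
    ]
    # Upper bound line: pool size, read from one member (assumption, see above).
    series.append(
        {
            "hostname": lpars[-1],
            "clustername": cluster,
            "metric": "cpu_in_pool",
            "color": "000000",
            "label": "CPU in Pool",
            "line_width": "2",
            "type": "line",
        }
    )
    return {
        "report_name": "cluster_%s_pool_%s_report" % (cluster, pool),
        "report_type": "standard",
        "title": "pool %s report" % pool,
        "vertical_label": "CPU Uses",
        "series": series,
    }


def write_report(report, graph_dir):
    """Write the report as <report_name>.json into graph_dir; return the path."""
    path = os.path.join(graph_dir, report["report_name"] + ".json")
    with open(path, "w") as handle:
        json.dump(report, handle, indent=2)
    return path
```

Calling `write_report(build_pool_report("box5", 0, ["host24", "host28", "host42"]), graph_dir)` would produce a file named cluster_box5_pool_0_report.json, which matches the naming pattern the patch looks for.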

…op-down list

In the cluster view we can select the metric that is displayed
for each host.  Until now, all reports/metrics have been displayed in
the drop-down list "Metric".

In "Edit Optional Graphs" we had all reports for clusters and hosts.

In our system we have dozens of reports for our clusters (but only
some relevant for each cluster) and hundreds of different reports for
hosts (and again only some relevant for each host).

The earlier patches added the reports automagically to clusters and
hosts respectively, so there is no need to add these reports in the
edit_optional_graphs.php and the metric drop-down list in the
cluster_view.

For both lists we exclude the (cluster|host)_*_report.json files from
view, because they are always included where they are needed.

We have hundreds of local metrics that are only useful for a single
host. Currently all of these metrics are added to the drop down
menu in the cluster view.

This patch defines the configuration option
'cluster_hide_metrics_from_menu', a regular expression. All metrics
matching this regular expression are hidden from the drop down menu.
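
The effect of the option can be sketched as follows. This is an illustration in Python, not the PHP code of the patch; the function name `visible_metrics`, the sample metric names, and the sample expression are made up. The sketch uses an unanchored search, which is an assumption about how the patch applies the expression.

```python
import re


def visible_metrics(metrics, hide_regex=None):
    """Return the metrics that should still appear in the drop-down menu."""
    if not hide_regex:
        # Option unset: nothing is hidden.
        return list(metrics)
    pattern = re.compile(hide_regex)
    # Unanchored search; anchor the expression with ^ or $ if needed.
    return [name for name in metrics if not pattern.search(name)]


# Hypothetical metric names and expression, for illustration only.
menu = visible_metrics(
    ["load_one", "db2_log_used", "sap_enq_locks", "cpu_used"],
    r"^(db2|sap)_",
)
```

With the sample values above, only load_one and cpu_used remain in the menu; the hidden metrics are still collected and can be graphed in the host view.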