-
Notifications
You must be signed in to change notification settings - Fork 51
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Implement sampler discovery using regular expressions
This commit introduces a new feature that simplifies aggregator configuration. * Previously, admins needed to manually specify hostnames for all samplers in the aggregator configuration using the `prdcr_add` command. * This change enables automatic sampler discovery. Admins specifies the aggregator hostname and listening port in sampler configuration via the `advertise_add` command and start the advertisement with the `advertise_start` command. The samplers now advertise their hostname to the aggregator. * On the aggregator, admins specifies a regular expression to be matched with sampler hostname via the `prdcr_listen_add` command. The `prdcr_listen_start` command is used to tell the aggregator to automatically add producers corresponding to a sampler of which the hostname matches the regular expression. This feature eliminates the need for manual configuration of sampler hostnames in the aggregator configuration file. The aggregator can now automatically discover samplers and add them as metric set producers.
- Loading branch information
Showing
14 changed files
with
2,220 additions
and
186 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,245 @@ | ||
\" Manpage for ldmsd_sampler_discovery | ||
.TH man 7 "27 March 2024" "v5" "LDMSD Sampler Discovery man page" | ||
|
||
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""/. | ||
.SH NAME | ||
ldmsd_sampler_disconvery - Manual for LDMSD Sampler Discovery | ||
|
||
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""/. | ||
.SH SYNOPSIS | ||
|
||
**Sampler side Commands** | ||
|
||
.IP \fBadvertise_add | ||
.RI "name=" NAME " xprt=" XPRT " host=" HOST " port=" PORT | ||
.RI "[auth=" AUTH_DOMAIN "]" | ||
|
||
.IP \fBadvertise_start | ||
.RI "name=" NAME | ||
|
||
.IP \fBadvertise_stop | ||
.RI "name=" NAME | ||
|
||
.IP \fBadvertise_del | ||
.RI "name=" NAME | ||
|
||
.IP \fBadvertise_status | ||
.RI "[name=" NAME "]" | ||
|
||
.PP | ||
**Aggregator Side Commands** | ||
|
||
.IP \fBprdcr_listen_add | ||
.RI "name=" NAME " RECONNECT=" INTVL | ||
.RI "[regex=" REGEX "] [rail=" SIZE "] [credits=" BYTES "] [rx_rate=" RATE_LIMIT "]" | ||
|
||
.IP \fBprdcr_listen_start | ||
.RI "name=" NAME | ||
|
||
.IP \fBprdcr_listen_stop | ||
.RI "name=" NAME | ||
|
||
.IP \fBprdcr_listen_del | ||
.RI "name=" NAME | ||
|
||
.IP \fBprdcr_listen_status | ||
|
||
.SH DESCRIPTION | ||
|
||
LDMSD Sampler Discovery is a capability that enables LDMSD automatically add | ||
producers that its hostname matches a given regular expression. The feature | ||
eliminates the need for manual configuration of sampler hostname in the | ||
aggregator configuration file. | ||
|
||
Admins specify the aggregator hostname and the listening port in sampler | ||
configuration via the \fBadvertise_add\fR command and start the advertisement | ||
with the \fBadvertise_start\fR command. The samplers now advertise their | ||
hostname to the aggregator. On the aggregator, admins specify a regular | ||
expression to be matched with sampler hostname via the \fBprdcr_listen_add\fR | ||
command. The \fBprdcr_listen_start\fR command is used to tell the aggregator to | ||
automatically add producers corresponding to a sampler of which the hostname | ||
matches the regular expression. | ||
|
||
The auto-generated producers is of the ‘generated’ type. The producer name is | ||
the same as the name given at the \fBadvertise_add\fR line in the sampler | ||
configuration file. LDMSD automatically starts them; however, admins need to | ||
stop them manually by using the command \fBprdcr_stop\fR or | ||
\fBprdcr_stop_regex\fR. They can be restarted by using the command | ||
\fBprdcr_start\fR or \fBprdcr_start_regex\fR. | ||
|
||
The description for each command and its parameters are as follows. | ||
|
||
**Sampler Side Commands** | ||
|
||
\fBadvertise_add\fR adds a new advertisement. The parameters are: | ||
.RS | ||
.IP \fBname\fR=\fINAME | ||
String of the advertisement name. The aggregator uses the string as the producer name as well. | ||
.IP \fBhost\fR=\fIHOST | ||
Aggregator hostname | ||
.IP \fBxprt\fR=\fIXPRT | ||
Transport to connect to the aggregator | ||
.IP \fBport\fR=\fIPORT | ||
Listen port of the aggregator | ||
.IP \fBreconnect\fR=\fIINTERVAL | ||
Reconnect interval | ||
.IP \fB[auth\fR=\fIAUTH_DOMAIN\fB] | ||
The authentication domain to be used to connect to the aggregator | ||
.RE | ||
|
||
\fBadvertise_start\fR starts an advertisement. The parameters are: | ||
.RS | ||
.IP \fBname\fR=\fINAME | ||
The advertisement name to be started | ||
.RE | ||
|
||
|
||
\fBadvertise_stop\fR stops an advertisement. The parameters are: | ||
.RS | ||
.IP \fBname\fR=\fINAME | ||
The advertisement name to be stopped | ||
.RE | ||
|
||
\fBadvertise_del\fR deletes an advertisement. The parameters are: | ||
.RS | ||
.IP \fBname\fR=\fINAME | ||
The advertisement name to be deleted | ||
.RE | ||
|
||
\fBadvertise_status reports the status of each advertisement. An optional parameter is: | ||
.RS | ||
.IP \fB[name\fR=\fINAME\fB] | ||
Advertisement name | ||
.RE | ||
|
||
.PP | ||
**Aggregator Side commands** | ||
|
||
\fBprdcr_listen_add\fR adds a regular expression to match sampler advertisements. The parameters are: | ||
.RS | ||
.IP \fBname\fR=\fINAME | ||
String of the prdcr_listen name. | ||
.IP \fBreconnect\fR=\fIINTERVAL | ||
Reconnect interval of the auto-generated producers that the hostname matches the regular expression | ||
.IP \fB[regex\fR=\fIREGEX\fB] | ||
Regular expression to match with hostnames in sampler advertisements | ||
.IP \fB[rail\fR=\fIRAIL\fB] | ||
Number of rails | ||
.IP \fB[credit\fR=\fICREDIT\fB] | ||
Receive credits each producer connection accepts in bytes | ||
.IP \fB[rx_rate\fR=\fIRATE\fB] | ||
Receive rate limit each producer connection acceipts | ||
.RE | ||
|
||
\fBprdcr_listen_start\fR starts accepting sampler advertisement with matches hostnames. The parameters are: | ||
.RS | ||
.IP \fBname\fR=\fINAME | ||
Name of prdcr_listen to be started | ||
.RE | ||
|
||
\fBprdcr_listen_stop\fR stops accepting sampler advertisement with matches hostnames. The parameters are: | ||
.RS | ||
.IP \fBname\fR=\fINAME | ||
Name of prdcr_listen to be stopped | ||
.RE | ||
|
||
\fBprdcr_listen_del\fR deletes a regular expression to match hostnames in sampler advertisements. The parameters are: | ||
.RS | ||
.IP \fBname\fR=\fINAME | ||
Name of prdcr_listen to be deleted | ||
.RE | ||
|
||
\fBprdcr_listen_status\fR report the status of each prdcr_listen object. There is no parameter. | ||
|
||
.SH EXAMPLE | ||
|
||
In this example, there are three LDMS daemons running on \fBnode-1\fR, | ||
\fBnode-2\fR, and \fBnode03\fR. LDMSD running on \fBnode-1\fR and \fBnode-2\fR | ||
are sampler daemons, namely \fBsamplerd-1\fR and \fBsamplerd-2\fR. The | ||
aggregator (\fBagg\fR) runs on \fBnode-3\fR. All LDMSD listen on port 411. | ||
|
||
The sampler daemons collect the \fBmeminfo\fR set, and they are configured to | ||
advertise themselves and connect to the aggregator using sock on host | ||
\fBnode-3\fR at port 411. The following are the configuration files of the | ||
\fBsamplerd-1\fR and \fBsamplerd-2\fR. | ||
|
||
.EX | ||
.B | ||
> cat samplerd-1.conf | ||
.RS 4 | ||
# Add and start an advertisement | ||
advertise_add name=samplerd-1 xprt=sock host=node-3 port=411 reconnect=10s | ||
advertise_start name=samplerd-1 | ||
# Load, configure, and start the meminfo plugin | ||
load name=meminfo | ||
config name=meminfo producer=samplerd-1 instance=samplerd-1/meminfo | ||
start name=meminfo interval=1s | ||
.RE | ||
.B | ||
> cat samplerd-2.conf | ||
.RS 4 | ||
# Add and start an advertisement | ||
advertise_add name=samplerd-2 host=node-3 port=411 reconnect=10s | ||
advertise_start name=samplerd-2 | ||
# Load, configure, and start the meminfo plugin | ||
load name=meminfo | ||
config name=meminfo producer=samplerd-2 instance=samplerd-2/meminfo | ||
start name=meminfo interval=1s | ||
.RE | ||
.EE | ||
|
||
The aggregator is configured to accept advertisements from the sampler daemons | ||
that the hostnames match the regular expressions \fBnode0[1-2]\fR. The | ||
auto-added producers will check for an establish connection with the samplers | ||
every 10 seconds if the connection becomes disconnected. An updater is added to | ||
update the sets of all producers on the aggregators every 10 seconds at the 100 | ||
milliseconds offset. | ||
|
||
.EX | ||
.B | ||
> cat agg.conf | ||
.RS 4 | ||
# Accept advertisements sent from LDMSD running on hostnames matched node-[1-2] | ||
prdcr_listen_add name=computes regex=node-[1-2] reconnect=10s | ||
prdcr_listen_start name=computes | ||
# Add and start an updater | ||
updtr_add name=all_sets interval=1s offset=100ms | ||
updtr_prdcr_add name=all_sets regex=.* | ||
updtr_start name=all | ||
.RE | ||
.EE | ||
|
||
LDMSD provides the command \fBadvertise_status\fR to report the status of | ||
advertisement of a sampler daemon. | ||
|
||
.EX | ||
.B | ||
> ldmsd_controller -x sock -p 10001 -h node-1 | ||
Welcome to the LDMSD control processor | ||
sock:node-1:10001> advertise_status | ||
Name Aggregator Host Aggregator Port Transport Reconnect (us) State | ||
---------------- ---------------- --------------- ------------ --------------- ------------ | ||
samplerd-1 node-3 10001 sock 10000000 CONNECTED | ||
sock:node-1:10001> | ||
.EE | ||
|
||
Similarly, LDMSD provides the command \fBprdcr_listen_status\fR to report the | ||
status of all prdcr_listen objects on an aggregator. The command also reports | ||
the list of auto-added producers corresponding to each prdcr_listen object. | ||
|
||
.EX | ||
.B | ||
> ldmsd_controller -x sock -p 10001 -h node-3 | ||
Welcome to the LDMSD control processor | ||
sock:node-3:10001> prdcr_listen_status | ||
Name Regex Reconnect(us) State | ||
-------------------- --------------- --------------- ---------- | ||
compute node-[1-2] 10000000 running | ||
Producers: samplerd-1, samplerd-2 | ||
sock:node-3:10001> | ||
.EE | ||
|
||
.SH SEE ALSO | ||
.BR ldmsd (8) | ||
.BR ldmsd_controller (8) |
Oops, something went wrong.