Skip to content

Commit

Permalink
Implement sampler discovery using regular expressions
Browse files Browse the repository at this point in the history
This commit introduces a new feature that simplifies aggregator
configuration.

* Previously, admins needed to manually specify hostnames for all
  samplers in the aggregator configuration using the `prdcr_add`
  command.
* This change enables automatic sampler discovery. Admins specifies the
  aggregator hostname and listening port in sampler configuration via
  the `advertise_add` command and start the advertisement with the
  `advertise_start` command. The samplers now advertise their hostname
  to the aggregator.
* On the aggregator, admins specifies a regular expression to be matched
  with sampler hostname via the `prdcr_listen_add` command. The
  `prdcr_listen_start` command is used to tell the aggregator to
  automatically add producers corresponding to a sampler of which the
  hostname matches the regular expression.

This feature eliminates the need for manual configuration of sampler
hostnames in the aggregator configuration file. The aggregator can now
automatically discover samplers and add them as metric set producers.
  • Loading branch information
nichamon committed May 14, 2024
1 parent bfa827c commit 5f75652
Show file tree
Hide file tree
Showing 14 changed files with 2,220 additions and 186 deletions.
245 changes: 245 additions & 0 deletions ldms/man/ldmsd_sampler_discovery.man
Original file line number Diff line number Diff line change
@@ -0,0 +1,245 @@
\" Manpage for ldmsd_sampler_discovery
.TH man 7 "27 March 2024" "v5" "LDMSD Sampler Discovery man page"

.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""/.
.SH NAME
ldmsd_sampler_disconvery - Manual for LDMSD Sampler Discovery

.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""/.
.SH SYNOPSIS

**Sampler side Commands**

.IP \fBadvertise_add
.RI "name=" NAME " xprt=" XPRT " host=" HOST " port=" PORT
.RI "[auth=" AUTH_DOMAIN "]"

.IP \fBadvertise_start
.RI "name=" NAME

.IP \fBadvertise_stop
.RI "name=" NAME

.IP \fBadvertise_del
.RI "name=" NAME

.IP \fBadvertise_status
.RI "[name=" NAME "]"

.PP
**Aggregator Side Commands**

.IP \fBprdcr_listen_add
.RI "name=" NAME " RECONNECT=" INTVL
.RI "[regex=" REGEX "] [rail=" SIZE "] [credits=" BYTES "] [rx_rate=" RATE_LIMIT "]"

.IP \fBprdcr_listen_start
.RI "name=" NAME

.IP \fBprdcr_listen_stop
.RI "name=" NAME

.IP \fBprdcr_listen_del
.RI "name=" NAME

.IP \fBprdcr_listen_status

.SH DESCRIPTION

LDMSD Sampler Discovery is a capability that enables LDMSD automatically add
producers that its hostname matches a given regular expression. The feature
eliminates the need for manual configuration of sampler hostname in the
aggregator configuration file.

Admins specify the aggregator hostname and the listening port in sampler
configuration via the \fBadvertise_add\fR command and start the advertisement
with the \fBadvertise_start\fR command. The samplers now advertise their
hostname to the aggregator. On the aggregator, admins specify a regular
expression to be matched with sampler hostname via the \fBprdcr_listen_add\fR
command. The \fBprdcr_listen_start\fR command is used to tell the aggregator to
automatically add producers corresponding to a sampler of which the hostname
matches the regular expression.

The auto-generated producers is of the ‘generated’ type. The producer name is
the same as the name given at the \fBadvertise_add\fR line in the sampler
configuration file. LDMSD automatically starts them; however, admins need to
stop them manually by using the command \fBprdcr_stop\fR or
\fBprdcr_stop_regex\fR. They can be restarted by using the command
\fBprdcr_start\fR or \fBprdcr_start_regex\fR.

The description for each command and its parameters are as follows.

**Sampler Side Commands**

\fBadvertise_add\fR adds a new advertisement. The parameters are:
.RS
.IP \fBname\fR=\fINAME
String of the advertisement name. The aggregator uses the string as the producer name as well.
.IP \fBhost\fR=\fIHOST
Aggregator hostname
.IP \fBxprt\fR=\fIXPRT
Transport to connect to the aggregator
.IP \fBport\fR=\fIPORT
Listen port of the aggregator
.IP \fBreconnect\fR=\fIINTERVAL
Reconnect interval
.IP \fB[auth\fR=\fIAUTH_DOMAIN\fB]
The authentication domain to be used to connect to the aggregator
.RE

\fBadvertise_start\fR starts an advertisement. The parameters are:
.RS
.IP \fBname\fR=\fINAME
The advertisement name to be started
.RE


\fBadvertise_stop\fR stops an advertisement. The parameters are:
.RS
.IP \fBname\fR=\fINAME
The advertisement name to be stopped
.RE

\fBadvertise_del\fR deletes an advertisement. The parameters are:
.RS
.IP \fBname\fR=\fINAME
The advertisement name to be deleted
.RE

\fBadvertise_status reports the status of each advertisement. An optional parameter is:
.RS
.IP \fB[name\fR=\fINAME\fB]
Advertisement name
.RE

.PP
**Aggregator Side commands**

\fBprdcr_listen_add\fR adds a regular expression to match sampler advertisements. The parameters are:
.RS
.IP \fBname\fR=\fINAME
String of the prdcr_listen name.
.IP \fBreconnect\fR=\fIINTERVAL
Reconnect interval of the auto-generated producers that the hostname matches the regular expression
.IP \fB[regex\fR=\fIREGEX\fB]
Regular expression to match with hostnames in sampler advertisements
.IP \fB[rail\fR=\fIRAIL\fB]
Number of rails
.IP \fB[credit\fR=\fICREDIT\fB]
Receive credits each producer connection accepts in bytes
.IP \fB[rx_rate\fR=\fIRATE\fB]
Receive rate limit each producer connection acceipts
.RE

\fBprdcr_listen_start\fR starts accepting sampler advertisement with matches hostnames. The parameters are:
.RS
.IP \fBname\fR=\fINAME
Name of prdcr_listen to be started
.RE

\fBprdcr_listen_stop\fR stops accepting sampler advertisement with matches hostnames. The parameters are:
.RS
.IP \fBname\fR=\fINAME
Name of prdcr_listen to be stopped
.RE

\fBprdcr_listen_del\fR deletes a regular expression to match hostnames in sampler advertisements. The parameters are:
.RS
.IP \fBname\fR=\fINAME
Name of prdcr_listen to be deleted
.RE

\fBprdcr_listen_status\fR report the status of each prdcr_listen object. There is no parameter.

.SH EXAMPLE

In this example, there are three LDMS daemons running on \fBnode-1\fR,
\fBnode-2\fR, and \fBnode03\fR. LDMSD running on \fBnode-1\fR and \fBnode-2\fR
are sampler daemons, namely \fBsamplerd-1\fR and \fBsamplerd-2\fR. The
aggregator (\fBagg\fR) runs on \fBnode-3\fR. All LDMSD listen on port 411.

The sampler daemons collect the \fBmeminfo\fR set, and they are configured to
advertise themselves and connect to the aggregator using sock on host
\fBnode-3\fR at port 411. The following are the configuration files of the
\fBsamplerd-1\fR and \fBsamplerd-2\fR.

.EX
.B
> cat samplerd-1.conf
.RS 4
# Add and start an advertisement
advertise_add name=samplerd-1 xprt=sock host=node-3 port=411 reconnect=10s
advertise_start name=samplerd-1
# Load, configure, and start the meminfo plugin
load name=meminfo
config name=meminfo producer=samplerd-1 instance=samplerd-1/meminfo
start name=meminfo interval=1s
.RE
.B
> cat samplerd-2.conf
.RS 4
# Add and start an advertisement
advertise_add name=samplerd-2 host=node-3 port=411 reconnect=10s
advertise_start name=samplerd-2
# Load, configure, and start the meminfo plugin
load name=meminfo
config name=meminfo producer=samplerd-2 instance=samplerd-2/meminfo
start name=meminfo interval=1s
.RE
.EE

The aggregator is configured to accept advertisements from the sampler daemons
that the hostnames match the regular expressions \fBnode0[1-2]\fR. The
auto-added producers will check for an establish connection with the samplers
every 10 seconds if the connection becomes disconnected. An updater is added to
update the sets of all producers on the aggregators every 10 seconds at the 100
milliseconds offset.

.EX
.B
> cat agg.conf
.RS 4
# Accept advertisements sent from LDMSD running on hostnames matched node-[1-2]
prdcr_listen_add name=computes regex=node-[1-2] reconnect=10s
prdcr_listen_start name=computes
# Add and start an updater
updtr_add name=all_sets interval=1s offset=100ms
updtr_prdcr_add name=all_sets regex=.*
updtr_start name=all
.RE
.EE

LDMSD provides the command \fBadvertise_status\fR to report the status of
advertisement of a sampler daemon.

.EX
.B
> ldmsd_controller -x sock -p 10001 -h node-1
Welcome to the LDMSD control processor
sock:node-1:10001> advertise_status
Name Aggregator Host Aggregator Port Transport Reconnect (us) State
---------------- ---------------- --------------- ------------ --------------- ------------
samplerd-1 node-3 10001 sock 10000000 CONNECTED
sock:node-1:10001>
.EE

Similarly, LDMSD provides the command \fBprdcr_listen_status\fR to report the
status of all prdcr_listen objects on an aggregator. The command also reports
the list of auto-added producers corresponding to each prdcr_listen object.

.EX
.B
> ldmsd_controller -x sock -p 10001 -h node-3
Welcome to the LDMSD control processor
sock:node-3:10001> prdcr_listen_status
Name Regex Reconnect(us) State
-------------------- --------------- --------------- ----------
compute node-[1-2] 10000000 running
Producers: samplerd-1, samplerd-2
sock:node-3:10001>
.EE

.SH SEE ALSO
.BR ldmsd (8)
.BR ldmsd_controller (8)
Loading

0 comments on commit 5f75652

Please sign in to comment.