Correct rails, credits, rx_rate values in advertised producers #1422

Merged
merged 1 commit into from
Jul 16, 2024
148 changes: 114 additions & 34 deletions ldms/man/ldmsd_sampler_advertisement.man
@@ -32,7 +32,7 @@ ldmsd_sampler_advertisement - Manual for LDMSD Sampler Advertisement

.IP \fBprdcr_listen_add
.RI "name=" NAME "
.RI "[disabled_start=" TURE|FALSE "] [regex=" REGEX "] [ip=" CIDR "] [rail=" SIZE "] [credits=" BYTES "] [rx_rate=" RATE_LIMIT "]"
.RI "[disable_start=" TURE|FALSE "] [regex=" REGEX "] [ip=" CIDR "]"
Review comment (Collaborator):
TURE --> TRUE
Do we want the spaces around the values? A user who copied that would get an error, and likely a very confusing one.


.IP \fBprdcr_listen_start
.RI "name=" NAME
@@ -47,26 +47,32 @@ ldmsd_sampler_advertisement - Manual for LDMSD Sampler Advertisement

.SH DESCRIPTION

LDMSD Sampler Discovery is a capability that enables LDMSD automatically add
producers that its hostname matches a given regular expression. The feature
eliminates the need for manual configuration of sampler hostname in the
aggregator configuration file.
LDMSD Sampler Discovery is a capability that enables LDMSD to automatically add
producers whose hostnames match a given regular expression or whose IP
addresses fall in a given IP range. If neither a regular expression nor an IP
range is given, LDMSD adds a producer whenever it receives an advertisement
message. The feature eliminates the need for manual configuration of sampler
hostnames in the aggregator configuration file.

Admins specify the aggregator hostname and the listening port in sampler
configuration via the \fBadvertiser_add\fR command and start the advertisement
with the \fBadvertiser_start\fR command. The samplers now advertise their
hostname to the aggregator. On the aggregator, admins specify a regular
expression to be matched with sampler hostname via the \fBprdcr_listen_add\fR
command. The \fBprdcr_listen_start\fR command is used to tell the aggregator to
automatically add producers corresponding to a sampler of which the hostname
matches the regular expression.

The auto-generated producers is of the ‘advertised’ type. The producer name is
the same as the name given at the \fBadvertiser_add\fR line in the sampler
configuration file. LDMSD automatically starts them; however, admins need to
stop them manually by using the command \fBprdcr_stop\fR or
\fBprdcr_stop_regex\fR. They can be restarted by using the command
\fBprdcr_start\fR or \fBprdcr_start_regex\fR.
with the \fBadvertiser_start\fR command. The sampler now advertises its
hostname to the aggregator. On the aggregator, admins may specify a regular
expression to be matched against the sampler hostname, or an IP range, via the
\fBprdcr_listen_add\fR command. The \fBprdcr_listen_start\fR command tells the
aggregator to automatically add a producer for each sampler whose hostname
matches the regular expression or whose IP address falls in the given IP range.

The automatically added producers are of the 'advertised' type. The producer's
name is the same as the value of the 'name' attribute given on the
\fBadvertiser_add\fR line in the sampler configuration file. LDMSD
automatically starts the advertised producers. To prevent this, admins can set
the \fBdisable_start\fR attribute of \fBprdcr_listen_add\fR to 'true'. Admins
can stop an advertised producer using the \fBprdcr_stop\fR or
\fBprdcr_stop_regex\fR commands. They can be restarted by using the
\fBprdcr_start\fR or \fBprdcr_start_regex\fR commands.

The description for each command and its parameters are as follows.

@@ -89,7 +95,9 @@ d
The authentication domain to be used to connect to the aggregator
.RE

\fBadvertiser_start\fR starts an advertisement. If the advertiser does not exist, LDMSD will create the advertiser. In this case, the mandatory attributes for \fBadvertiser_add\fB must be given. The parameters are:
\fBadvertiser_start\fR starts an advertisement. If the advertiser does not
exist, LDMSD will create the advertiser. In this case, the mandatory attributes
for \fBadvertiser_add\fR must be given. The parameters are:
.RS
.IP \fBname\fR=\fINAME
The advertisement name to be started
@@ -130,18 +138,12 @@ Advertisement name
.RS
.IP \fBname\fR=\fINAME
String of the prdcr_listen name.
.IP \fB[disabled_start\fR=\fITRUE|FALSE\fB]
.IP \fB[disable_start\fR=\fITRUE|FALSE\fB]
True to tell LDMSD not to start producers automatically
.IP \fB[regex\fR=\fIREGEX\fB]
Regular expression to match with hostnames in sampler advertisements
.IP \fB[ip\fR=\fICIDR\fB]
IP range in CIDR format, either IPv4 or IPv6
.IP \fB[rail\fR=\fIRAIL\fB]
Number of rails
.IP \fB[credit\fR=\fICREDIT\fB]
Receive credits each producer connection accepts in bytes
.IP \fB[rx_rate\fR=\fIRATE\fB]
Receive rate limit each producer connection acceipts
.RE

\fBprdcr_listen_start\fR starts accepting sampler advertisements with matching hostnames. The parameters are:
@@ -164,6 +166,18 @@ Name of prdcr_listen to be deleted

\fBprdcr_listen_status\fR reports the status of each prdcr_listen object. It takes no parameters.

.SH Managing Receive Credits and Rate Limits for Auto-Added Producers

The receive credits and rate limit control mechanisms govern the amount of data
a producer receives from the data source connected through ldms_xprt. This helps
prevent data bursts that could overwhelm the LDMS daemon host and network
resources. To configure receive credits and rate limits, users can create a
listening endpoint on the aggregator using the \fBlisten\fR command, specifying
the desired values of the \fBcredits\fR and \fBrx_rate\fR attributes. Users then
configure the sampler daemons to advertise to that listening endpoint, so that
the auto-added producers adopt its receive credits and rate limit values.

.SH EXAMPLE

In this example, there are three LDMS daemons running on \fBnode-1\fR,
@@ -173,13 +187,16 @@ aggregator (\fBagg\fR) runs on \fBnode-3\fR. All LDMSD listen on port 411.

The sampler daemons collect the \fBmeminfo\fR set, and they are configured to
advertise themselves and connect to the aggregator using sock on host
\fBnode-3\fR at port 411. The following are the configuration files of the
\fBsamplerd-1\fR and \fBsamplerd-2\fR.
\fBnode-3\fR at port 411. They will try to reconnect to the aggregator every 10
seconds until the connection is established. The following are the configuration
files of the \fBsamplerd-1\fR and \fBsamplerd-2\fR.

.EX
.B
> cat samplerd-1.conf
.RS 4
# Create a listening endpoint
listen xprt=sock port=411
Review comment (Collaborator):
This isn't strictly needed for the advertiser function, correct? What is the reasoning for including it in the example?

# Add and start an advertisement
advertiser_add name=samplerd-1 xprt=sock host=node-3 port=411 reconnect=10s
advertiser_start name=samplerd-1
@@ -192,6 +209,8 @@ start name=meminfo interval=1s
.B
> cat samplerd-2.conf
.RS 4
# Create a listening endpoint
listen xprt=sock port=411
# Add and start an advertisement using only the advertiser_start command
advertiser_start name=samplerd-2 host=node-3 port=411 reconnect=10s
# Load, configure, and start the meminfo plugin
@@ -202,16 +221,15 @@ start name=meminfo interval=1s
.EE

The aggregator is configured to accept advertisements from the sampler daemons
that the hostnames match the regular expressions \fBnode0[1-2]\fR. The
auto-added producers will check for an establish connection with the samplers
every 10 seconds if the connection becomes disconnected. An updater is added to
update the sets of all producers on the aggregators every 10 seconds at the 100
milliseconds offset.
whose hostnames match the regular expression \fBnode-[1-2]\fR. The names of
the auto-added producers are the names of the advertisers on the sampler daemons.

.EX
.B
> cat agg.conf
.RS 4
# Create a listening endpoint
listen xprt=sock port=411
# Accept advertisements sent from LDMSD running on hostnames matching node-[1-2]
prdcr_listen_add name=computes regex=node-[1-2]
prdcr_listen_start name=computes
@@ -232,7 +250,7 @@ Welcome to the LDMSD control processor
sock:node-1:10001> advertiser_status
Name Aggregator Host Aggregator Port Transport Reconnect (us) State
---------------- ---------------- --------------- ------------ --------------- ------------
samplerd-1 node-3 10001 sock 10000000 CONNECTED
samplerd-1 node-3 411 sock 10000000 CONNECTED
sock:node-1:10001>
.EE

@@ -252,6 +270,68 @@ Producers: samplerd-1, samplerd-2
sock:node-3:10001>
.EE

Next is an example that controls the receive credits and rate limits of the
auto-added producers on agg11. Similar to the first example, the aggregator,
agg11, listens on port 411 and waits for advertisements. Moreover, a listening
endpoint on port 412 is added with a receive credits value. The aggregator also
creates producers when it receives an advertisement from a host whose IP
address falls into the subnet 192.168.0.0/16.

.EX
.B
> cat agg11.conf
.RS 4
# Create a listening endpoint
listen xprt=sock port=411
# Create the listening endpoint for receiving advertisements
listen xprt=sock port=412 credits=4000
# Accept advertisements sent from LDMSD running on hosts whose IP addresses
# fall in the range 192.168.0.0/16.
prdcr_listen_add name=compute ip=192.168.0.0/16
prdcr_listen_start name=compute
# Add and start an updater
updtr_add name=all_sets interval=1s offset=100ms
updtr_prdcr_add name=all_sets regex=.*
updtr_start name=all_sets
.RE
.EE

There are two sampler daemons, which are configured to advertise to port 412 so
that the auto-added producers adopt the receive credits of the listening
endpoint on port 412.

.EX
.B
> cat samplerd-3.conf
.RS 4
# Create a listening endpoint
listen xprt=sock port=411
# Start an advertiser that sends the advertisement to port 412 on the aggregator
# host
advertiser_start name=samplerd-3 host=agg11 xprt=sock port=412 reconnect=10s
# Load, configure, and start the meminfo plugin
load name=meminfo
config name=meminfo producer=samplerd-3 instance=samplerd-3/meminfo
start name=meminfo interval=1s
.RE
.EE

.EX
.B
> cat samplerd-4.conf
.RS 4
# Create a listening endpoint
listen xprt=sock port=411
# Start an advertiser that sends the advertisement to port 412 on the aggregator
# host
advertiser_start name=samplerd-4 host=agg11 xprt=sock port=412 reconnect=10s
# Load, configure, and start the meminfo plugin
load name=meminfo
config name=meminfo producer=samplerd-4 instance=samplerd-4/meminfo
start name=meminfo interval=1s
.RE
.EE

.SH SEE ALSO
.BR ldmsd (8)
.BR ldmsd_controller (8)
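
The matching rule described in the DESCRIPTION section of the man page above can be sketched in a few lines of Python. This is only an illustration, not the LDMSD implementation: the helper name `accepts_advertisement` and its arguments are hypothetical, and treating a regex match and a CIDR match as alternatives (either one is enough) is an assumption drawn from the wording of the man page.

```python
import ipaddress
import re


def accepts_advertisement(hostname, ip, regex=None, cidr=None):
    """Hypothetical helper: decide whether an advertisement would be accepted.

    Assumption: with no filter every advertisement is accepted; otherwise a
    hostname regex match OR an IP address inside the CIDR range is enough.
    """
    if regex is None and cidr is None:
        return True
    # Unanchored search, similar in spirit to POSIX regexec().
    if regex is not None and re.search(regex, hostname):
        return True
    if cidr is not None:
        network = ipaddress.ip_network(cidr, strict=False)
        if ipaddress.ip_address(ip) in network:
            return True
    return False
```

For instance, `accepts_advertisement("node-1", "10.0.0.5", regex="node-[1-2]")` accepts the sampler by hostname, while `accepts_advertisement("login", "192.168.3.4", cidr="192.168.0.0/16")` accepts it by address.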
18 changes: 5 additions & 13 deletions ldms/python/ldmsd/ldmsd_communicator.py
@@ -221,7 +221,7 @@
'credits', 'rx_rate' ] },
'advertiser_stop': {'req_attr': ['name'], 'opt_attr': []},
'prdcr_listen_add': {'req_attr': ['name'],
'opt_attr': ['rail', 'ip', 'credits', 'rx_rate', 'regex', 'disabled_start']},
'opt_attr': ['ip', 'regex', 'disable_start']},
'prdcr_listen_del': {'req_attr': ['name'], 'opt_attr': []},
'prdcr_listen_start': {'req_attr': ['name'], 'opt_attr': []},
'prdcr_listen_stop': {'req_attr': ['name'], 'opt_attr': []},
@@ -2568,7 +2568,7 @@ def advertiser_del(self, name):
self.close()
return errno.ENOTCONN, str(e)

def prdcr_listen_add(self, name, disabled_start=None, regex=None, ip=None, rail=None, credits=None, rx_rate=None):
def prdcr_listen_add(self, name, disable_start=None, regex=None, ip=None):
"""
Tell an aggregator to wait for advertisements from samplers

@@ -2578,27 +2578,19 @@ def prdcr_listen_add(self, name, disabled_start=None, regex=None, ip=None, rail=
Parameters:
- Name of the producer listen
- Regular expression to match sampler hostnames
- The number of rail
- The credits in bytes
- The receive rate limit
- IP range in the CIDR format

Return:
- status is an errno from the errno module
- data is an error message if status !=0 or None
"""
attr_list = [ LDMSD_Req_Attr(attr_id=LDMSD_Req_Attr.NAME, value=name) ]
if disabled_start is not None:
attr_list.append(LDMSD_Req_Attr(attr_id=LDMSD_Req_Attr.AUTO_INTERVAL, value=disabled_start))
if disable_start is not None:
attr_list.append(LDMSD_Req_Attr(attr_id=LDMSD_Req_Attr.AUTO_INTERVAL, value=disable_start))
if regex is not None:
attr_list.append(LDMSD_Req_Attr(attr_id=LDMSD_Req_Attr.REGEX, value=regex))
if ip is not None:
attr_list.append(LDMSD_Req_Attr(attr_id=LDMSD_Req_Attr.IP, value=ip))
if rail is not None:
attr_list.append(LDMSD_Req_Attr(attr_id=LDMSD_Req_Attr.RAIL, value=rail))
if credits is not None:
attr_list.append(LDMSD_Req_Attr(attr_id=LDMSD_Req_Attr.CREDITS, value=credits))
if rx_rate is not None:
attr_list.append(LDMSD_Req_Attr(attr_id=LDMSD_Req_Attr.RX_RATE, value=rx_rate))

req = LDMSD_Request(command_id=LDMSD_Request.PRDCR_LISTEN_ADD,
attrs=attr_list)
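
The method above only appends an attribute when the caller supplied a value. A minimal standalone sketch of the same pattern follows. It formats a plain configuration line instead of building `LDMSD_Req_Attr` objects, and the helper name `build_prdcr_listen_add` is hypothetical, not part of `ldmsd_communicator.py`.

```python
def build_prdcr_listen_add(name, disable_start=None, regex=None, ip=None):
    """Hypothetical helper: format a prdcr_listen_add configuration line.

    Optional attributes left as None are omitted, mirroring the
    `if value is not None` checks in the communicator method.
    """
    optional = (("disable_start", disable_start), ("regex", regex), ("ip", ip))
    parts = ["prdcr_listen_add", f"name={name}"]
    parts.extend(f"{key}={value}" for key, value in optional if value is not None)
    return " ".join(parts)


print(build_prdcr_listen_add("computes", regex="node-[1-2]"))
# prints: prdcr_listen_add name=computes regex=node-[1-2]
```

Keeping the optional attributes in one tuple means that dropping `rail`, `credits`, and `rx_rate`, as this change does, would be a one-line edit in such a helper.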
10 changes: 1 addition & 9 deletions ldms/python/ldmsd/ldmsd_controller
@@ -2945,16 +2945,8 @@ class LdmsdCmdParser(cmd.Cmd):
Parameters:
name= A unique name of the producer listen
reconnect= The retry interval to check for connection establishment of producers matched the regular expression.
[disabled_start=] Tell LDMSD not to start the producers
[disable_start=] Tell LDMSD not to start the producers
[regex=] A regular expression to match sampler hostnames
[rail=] The number of rail endpoints for the prdcr (default: 1).
[credits=] The send credits our ldmsd (the one we are controlling)
advertises to the prdcr (default: value from ldmsd --credits
option). This limits how much outstanding data our ldmsd
holds for the prdcr. The prdcr drops messages when it does
not have enough send credits.
[rx_rate=] The recv rate (bytes/sec) limit for this connection. The
default is -1 (unlimited).
"""
arg = self.handle_args('prdcr_listen_add', arg)
if arg is None:
20 changes: 20 additions & 0 deletions ldms/src/core/ldms.h
@@ -1029,6 +1029,26 @@ int ldms_xprt_is_remote_rail(ldms_t x);
*/
int ldms_xprt_rail_eps(ldms_t x);

/**
* \brief Get the receive limit of an endpoint
*
* \param x The transport handle
*
* \retval Receive limit is returned.
* \retval -EINVAL if \c x is NULL or not a rail
*/
int64_t ldms_xprt_recv_limit(ldms_t x);

/**
* \brief Get the receive rate limit of an endpoint
*
* \param x The transport handle
*
* \retval Receive rate limit is returned.
* \retval -EINVAL if \c x is NULL or not a rail
*/
int64_t ldms_xprt_recv_rate_limit(ldms_t x);

/**
* \brief Get the send credit
*
20 changes: 20 additions & 0 deletions ldms/src/core/ldms_rail.c
@@ -1283,6 +1283,26 @@ int ldms_xprt_rail_eps(ldms_t _r)
return r->n_eps;
}

int64_t ldms_xprt_recv_limit(ldms_t _r)
{
ldms_rail_t r = (void*)_r;
if (!_r)
return -EINVAL;
if (!XTYPE_IS_RAIL(_r->xtype))
return -EINVAL;
return r->recv_limit;
}

int64_t ldms_xprt_recv_rate_limit(ldms_t _r)
{
ldms_rail_t r = (void*)_r;
if (!_r)
return -EINVAL;
if (!XTYPE_IS_RAIL(_r->xtype))
return -EINVAL;
return r->recv_rate_limit;
}

void __rail_ep_limit(ldms_t x, void *msg, int msg_len)
{
/* x is the legacy ldms xprt in the rail, its context is the associated
3 changes: 0 additions & 3 deletions ldms/src/ldmsd/ldmsd.h
@@ -392,9 +392,6 @@ typedef struct ldmsd_prdcr_listen {
} state;
const char *hostname_regex_s;
regex_t regex;
int rails; /* Rail size */
int recv_credits; /* bytes */
int rate_limits; /* bytes/sec */
int auto_start; /* default is 1, i.e., auto start producers */

/* Network Address & prefix_len from a given CIDR IP address string */