# QuickStart (v4)
The v4 page is under construction.
Prerequisites:
- openssl package
- GNU compiler
- autotools
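Package names vary by distribution; on a RHEL-family system the prerequisites can typically be installed with something like the following (this package list is an assumption, not from the original page; adjust for your distro):

```
# Hypothetical RHEL-family package set for building OVIS.
sudo yum install -y openssl-devel gcc autoconf automake libtool make
```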
- This example shows cloning into ~/Source/ovis-4.2 and putting the build in ~/Build/OVIS-4.x.
```
> mkdir ~/Source
> mkdir ~/Build
> cd ~/Source
> git clone https://github.com/ovis-hpc/ovis.git ovis-4.2
```
- Go to your source directory:
```
> cd ~/Source/ovis-4.2
> git checkout -b OVIS-4.2.1-rc1 origin/OVIS-4.2.1-rc1
```
- Run autogen.sh
```
> ./autogen.sh
```
- Configure and build (this builds the default Linux samplers; the installation directory is given by --prefix):
```
> mkdir build
> cd build
> ../configure --prefix=$HOME/Build/OVIS-4.x
> make
> make install
```
- Set up the environment:
```
TOP=<absolute_path>/Build/OVIS-4.x
export LD_LIBRARY_PATH=$TOP/lib:$LD_LIBRARY_PATH
export LDMSD_PLUGIN_LIBPATH=$TOP/lib/ovis-ldms/
export ZAP_LIBPATH=$TOP/lib/ovis-lib/
export PATH=$TOP/sbin:$TOP/bin:$PATH
export PYTHONPATH=$TOP/lib/python2.7/site-packages/
```
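A quick sanity check (an assumed step, not from the original page) is to confirm that the LDMS binaries now resolve from $TOP:

```
# Both should print paths under $TOP/sbin or $TOP/bin.
which ldmsd
which ldms_ls
```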
- Make a configuration file (called sampler.conf) to load the meminfo and vmstat samplers with the following contents:
```
load name=meminfo
config name=meminfo producer=${HOSTNAME} instance=${HOSTNAME}/meminfo component_id=${COMPONENT_ID} schema=meminfo job_set=${HOSTNAME}/jobinfo uid=12345 gid=12345 perm=0700
start name=meminfo interval=${SAMPLE_INTERVAL} offset=${SAMPLE_OFFSET}
#
load name=vmstat
config name=vmstat producer=${HOSTNAME} instance=${HOSTNAME}/vmstat component_id=${COMPONENT_ID} schema=vmstat job_set=${HOSTNAME}/jobinfo uid=0 gid=0 perm=0755
start name=vmstat interval=${SAMPLE_INTERVAL} offset=${SAMPLE_OFFSET}
```
- Note the uid/gid/perm settings here; with munge authentication these control which users can see each set. Munge itself, however, is optional.
- Set up additional environment variables used by the configuration file:
```
export COMPONENT_ID=2
export SAMPLE_INTERVAL=1000000
export SAMPLE_OFFSET=50000
```
- This sets the samplers to collect at 1-second intervals (1000000 microseconds), offset by 50 ms.
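Intervals and offsets are in microseconds, so other rates are easy to derive; for example (illustrative values, not from the original page), to sample every 10 seconds instead:

```
# 10 s = 10000000 us; keep the 50 ms offset.
export SAMPLE_INTERVAL=10000000
export SAMPLE_OFFSET=50000
```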
- Run a daemon using munge authentication:
```
ldmsd -x sock:10444 -c sampler.conf -l /tmp/demo_ldmsd_log -v DEBUG -a munge
```
- This will also write out DEBUG-level information to the specified (-l) log.
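To confirm the daemon came up cleanly, you can follow that log (an illustrative step, not from the original page):

```
# Watch DEBUG output as the samplers load and begin collecting.
tail -f /tmp/demo_ldmsd_log
```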
- Run ldms_ls on that node to see the set names, meta-data, and contents:
```
ldms_ls -h localhost -x sock -p 10444 -a munge
ldms_ls -h localhost -x sock -p 10444 -v -a munge
ldms_ls -h localhost -x sock -p 10444 -l -a munge
```
- Note the use of munge. A daemon launched with munge can only be queried with munge authentication, and ldms_ls will show only the sets the querying user is permitted to see.
```
> ldms_ls -h localhost -x sock -p 10444 -l -v -a munge
host1/vmstat: consistent, last update: Mon Oct 22 16:58:15 2018 -0600 [1385us]
APPLICATION SET INFORMATION ------
  updt_hint_us : 5000000:0
METADATA --------
  Producer Name : host1
  Instance Name : host1/vmstat
  Schema Name   : vmstat
  Size          : 5008
  Metric Count  : 110
  GN            : 2
  User          : root(0)
  Group         : root(0)
  Permissions   : -rwxr-xr-x
DATA ------------
  Timestamp     : Mon Oct 22 16:58:15 2018 -0600 [1385us]
  Duration      : [0.000106s]
  Consistent    : TRUE
  Size          : 928
  GN            : 110
-----------------
M u64        component_id    2
D u64        job_id          0
D u64        app_id          0
D u64        nr_free_pages   32522123
...
D u64        pglazyfree      1082699829

host1/meminfo: consistent, last update: Mon Oct 22 16:58:15 2018 -0600 [1278us]
APPLICATION SET INFORMATION ------
  updt_hint_us : 5000000:0
METADATA --------
  Producer Name : host1
  Instance Name : host1/meminfo
  Schema Name   : meminfo
  Size          : 1952
  Metric Count  : 46
  GN            : 2
  User          : myuser(12345)
  Group         : myuser(12345)
  Permissions   : -rwx------
DATA ------------
  Timestamp     : Mon Oct 22 16:58:15 2018 -0600 [1278us]
  Duration      : [0.000032s]
  Consistent    : TRUE
  Size          : 416
  GN            : 46
-----------------
M u64        component_id    0
D u64        job_id          0
D u64        app_id          0
D u64        MemTotal        131899616
D u64        MemFree         130088492
D u64        MemAvailable    129556912
...
D u64        DirectMap1G     134217728
```
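To see the authentication requirement in action (an assumed illustration, not output from the original page), try the same query without munge:

```
# With the daemon running under -a munge, an unauthenticated query is
# rejected; the exact error message varies by version.
ldms_ls -h localhost -x sock -p 10444
```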
- On host2, start another sampler daemon with a similar configuration, as above.
- Make a configuration file (called agg11.conf) to aggregate from the two samplers at different intervals with the following contents:
```
prdcr_add name=host1 host=host1 type=active xprt=sock port=10444 interval=20000000
prdcr_start name=host1
updtr_add name=policy_h1 interval=1000000 offset=100000
updtr_prdcr_add name=policy_h1 regex=host1
updtr_start name=policy_h1
prdcr_add name=host2 host=host2 type=active xprt=sock port=10444 interval=20000000
prdcr_start name=host2
updtr_add name=policy_h2 interval=2000000 offset=100000
updtr_prdcr_add name=policy_h2 regex=host2
updtr_start name=policy_h2
```
- On host3, set up the environment as above and run a daemon:
```
ldmsd -x sock:10445 -c agg11.conf -l /tmp/demo_ldmsd_log -v DEBUG -a munge
```
- Run ldms_ls on the aggregator node to see the set listing:
```
> ldms_ls -h localhost -x sock -p 10445 -a munge
host1/meminfo
host1/vmstat
host2/meminfo
host2/vmstat
```
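Producer state can also be inspected interactively. A minimal sketch, assuming the ldmsd_controller utility shipped with LDMS v4 and its prdcr_status command (option names may differ by version; consult the man page):

```
# Connect to the aggregator with matching auth; then, at the controller
# prompt, `prdcr_status` lists each producer and its connection state.
ldmsd_controller --host localhost --xprt sock --port 10445 --auth munge
```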
- You can also run ldms_ls to query the ldms daemon on the remote node:
```
> ldms_ls -h host1 -x sock -p 10444 -a munge
host1/meminfo
host1/vmstat
```
- ldms_ls -l shows the detailed output, including timestamps. This can be used to verify that the aggregator is updating the two hosts' sets at different intervals.
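For example, this filter (an illustrative one-liner using the "last update" field shown in the output above, not from the original page) makes the per-host update cadence easy to compare across repeated runs:

```
# Print each set's name and last-update timestamp from the aggregator.
ldms_ls -h localhost -x sock -p 10445 -l -a munge | grep "last update"
```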
- Use the same sampler configurations as above.
- Make a configuration file (called agg11_push.conf) to cause the two samplers to push their data to the aggregator as they update.
- Note that the prdcr configs remain the same as above, but updtr_add now includes the additional options push=onchange and auto_interval=false.
- Note that the updtr_add interval has no effect in this case, but it is currently required by syntax checking.
```
prdcr_add name=host1 host=host1 type=active xprt=sock port=10444 interval=20000000
prdcr_start name=host1
prdcr_add name=host2 host=host2 type=active xprt=sock port=10444 interval=20000000
prdcr_start name=host2
updtr_add name=policy_all interval=5000000 push=onchange auto_interval=false
updtr_prdcr_add name=policy_all regex=.*
updtr_start name=policy_all
```
- On host3, set up the environment as above and run a daemon:
```
ldmsd -x sock:10445 -c agg11_push.conf -l /tmp/demo_ldmsd_log -v DEBUG -a munge
```
- Run ldms_ls on the aggregator node to see the set listing:
```
> ldms_ls -h localhost -x sock -p 10445 -a munge
host1/meminfo
host1/vmstat
host2/meminfo
host2/vmstat
```
- Use the same sampler configurations as above.
- Make a configuration file (called agg11.conf) to aggregate from one sampler with the following contents:
```
prdcr_add name=host1 host=host1 type=active xprt=sock port=10444 interval=20000000
prdcr_start name=host1
updtr_add name=policy_all interval=1000000 offset=100000
updtr_prdcr_add name=policy_all regex=.*
updtr_start name=policy_all
failover_config host=host3 port=10446 xprt=sock type=active interval=1000000 peer_name=agg12 timeout_factor=2
failover_start
```
- Make a configuration file (called agg12.conf) to aggregate from the other sampler with the following contents:
```
prdcr_add name=host2 host=host2 type=active xprt=sock port=10444 interval=20000000
prdcr_start name=host2
updtr_add name=policy_all interval=1000000 offset=100000
updtr_prdcr_add name=policy_all regex=.*
updtr_start name=policy_all
failover_config host=host3 port=10445 xprt=sock type=active interval=1000000 peer_name=agg11 timeout_factor=2
failover_start
```
- On host3, set up the environment as above and run two daemons as follows:
```
ldmsd -x sock:10445 -c agg11.conf -l /tmp/demo_ldmsd_log -v DEBUG -n agg11 -a munge
ldmsd -x sock:10446 -c agg12.conf -l /tmp/demo_ldmsd_log -v DEBUG -n agg12 -a munge
```
- Run ldms_ls on each aggregator to see its set listing:
```
> ldms_ls -h localhost -x sock -p 10445 -a munge
host1/meminfo
host1/vmstat
> ldms_ls -h localhost -x sock -p 10446 -a munge
host2/meminfo
host2/vmstat
```
- Kill one daemon:
```
> kill <pid of daemon listening on 10445>
```
- Make sure it died.
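One way to find that PID and verify the daemon is gone (an illustrative sketch; the pgrep pattern is an assumption, not from the original page):

```
# List any ldmsd whose command line mentions the 10445 listener.
# After killing it, the same command should print nothing.
pgrep -af "ldmsd.*sock:10445"
```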
- Run ldms_ls on the remaining aggregator to see the set listing; after failover it now serves both hosts' sets:
```
> ldms_ls -h localhost -x sock -p 10446 -a munge
host1/meminfo
host1/vmstat
host2/meminfo
host2/vmstat
```