Mellanox_Switch_Support
Table of Contents
- Overview
- Mellanox Switch SSH, syslog and SNMP Configuration
- UFM xdsh, syslog and SNMP Configuration
- Docs
This design covers xCAT configuration and support of Mellanox switches, UFM, and Mellanox adapters. The planned functions are:
- Mellanox switches:
- set up ssh
- use the ssh capability to query and set settings via rspconfig or directly through xdsh
- trap snmp alerts on the MN/SN
- channel snmp alerts into TEAL for analysis (TEAL team will work on this)
- consolidate syslog to MN/SN
- channel syslog entries into TEAL for analysis (TEAL team will work on this) ??
- a TEAL analyzer that will do the following for the snmp alerts and syslog entries: (NM team will work on this??)
- associate the IB info with the correct node
- associate link errors with benign events like powering off nodes
- apply thresholds to link errors and create a TEAL alert when a threshold has been exceeded
- UFM
- set up xdsh to UFM and backup
- list settings for UFM and backup to help admin make sure they are in sync
- give UFM the xcat node names so the UFM error events will have that info in it (dependent on Mellanox providing a way to give them the node names)
- get SNMP alerts - documentation only, no need to automate
- channel snmp alerts into TEAL for analysis (TEAL team will work on this)
- a TEAL analyzer that will do the same things mentioned for the switch events above ??
- consolidate syslog to MN/SN
- investigate consolidating UFM-specific log files to MN/SN ??
- Mellanox IB Adapters:
- install libraries in the node OS image
- configure adapters at node boot time
Use the following chdef command to define the Mellanox switch (for example, mswitch):
chdef -t node -o mswitch groups=all nodetype=switch mgt=switch
Add the ssh user name and password to the switches table:
tabch switch=mswitch switches.sshusername=admin switches.sshpassword=admin switches.switchtype=MellanoxIB
The switches table will look like this:
#switch,snmpversion,username,password,privacy,auth,linkports,sshusername,sshpassword,switchtype,comments,disable
"mswitch",,,,,,,"admin","admin","MellanoxIB",,
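When several switches share the same credentials, the two definition commands above can be generated per switch and reviewed before anything is run. The following is a minimal sketch; the function name `print_switch_defs` is hypothetical and not part of xCAT, and it only prints the chdef and tabch lines shown above without executing them.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of xCAT): print, but do not run, the chdef
# and tabch commands from this section for each switch name given.
# Usage: print_switch_defs <sshuser> <sshpassword> <switch>...
print_switch_defs() {
    local user=$1 pass=$2 sw
    shift 2
    for sw in "$@"; do
        # Node definition, exactly as in the chdef example above.
        printf 'chdef -t node -o %s groups=all nodetype=switch mgt=switch\n' "$sw"
        # Matching switches table entry, as in the tabch example above.
        printf 'tabch switch=%s switches.sshusername=%s switches.sshpassword=%s switches.switchtype=MellanoxIB\n' \
            "$sw" "$user" "$pass"
    done
}
```

For example, `print_switch_defs admin admin mswitch` prints the same two commands used above, which can then be pasted into a shell on the Management Node.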
If all the switches share one admin id and password, put an entry in the xCAT passwd table with the id and password to use for login. This is needed to set up the ssh keys so that the Mellanox commands can be run from the Management Node using xdsh.
#key,username,password,cryptmethod,comments,disable
"switch","admin","admin",,,
Three new attributes will be added to the switches table:
sshusername -- ssh user name.
sshpassword -- ssh password.
switchtype -- the type of the switch. The valid value is: MellanoxIB.
Attribute mgt would be set to "switch".
Attribute nodetype would be set to "switch".
Use "switch" as the key for the default username and password for all the switches.
rspconfig will be used to set up the ssh keys on the switch for passwordless ssh access.
rspconfig mswitch sshcfg=enable/disable
xdsh must create a special ssh command for the switch.
The syntax of a working command to the switch is the following:
ssh admin@<switch> 'cli "enable" "configure terminal" "show ssh server host-keys"'
The input to xdsh will be the following:
xdsh mswitch -l admin --devicetype IBSwitch::Mellanox 'enable;configure terminal;show ssh server host-keys'
xdsh should then be able to construct the correct command syntax. Note: cli is required on all commands, so xdsh should add it. For example, xdsh will send:
ssh admin@mswitch cli "enable" "configure terminal" "show ssh server host-keys"
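The translation from the semicolon-separated xdsh input to the quoted cli argument list can be illustrated with a short sketch. This is not the xdsh implementation; `build_switch_cmd` is a hypothetical name, and the sketch assumes the individual switch commands contain no semicolons or double quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not xdsh itself): convert "cmd1;cmd2;cmd3" into
#   cli "cmd1" "cmd2" "cmd3"
# The prefix defaults to "cli" and can be overridden by a second argument.
build_switch_cmd() {
    local input=$1 pre=${2:-cli}
    local out=$pre part
    local IFS=';'
    local -a parts
    # Splitting on ';' yields one array element per switch command.
    read -r -a parts <<< "$input"
    for part in "${parts[@]}"; do
        # Trim leading and trailing whitespace around each command.
        part="${part#"${part%%[![:space:]]*}"}"
        part="${part%"${part##*[![:space:]]}"}"
        # Skip empty pieces such as those produced by a trailing ';'.
        [ -n "$part" ] && out+=" \"$part\""
    done
    printf '%s\n' "$out"
}
```

Calling `build_switch_cmd 'enable;configure terminal;show ssh server host-keys'` prints the cli string shown above, which would then be passed to ssh for the target switch.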
xdsh will have a config file for the Mellanox switch. The file name will be: /var/opt/xcat/IBSwitch/Mellanox/config. The contents are:
[main]
[xdsh]
pre-command=cli
post-command=NULL
A sample is shipped in /opt/xcat/share/xcat/ib/scripts/Mellanox/config.
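As an illustration of how the pre-command and post-command values could be read from this file, here is a minimal sketch. It assumes the simple `key=value` layout shown above with no spaces around `=`; the function name `get_xdsh_opt` is hypothetical and this is not how xdsh itself parses the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a key from the [xdsh] section
# of a config file laid out like the sample above.
# Usage: get_xdsh_opt <config-file> <key>
get_xdsh_opt() {
    local file=$1 key=$2
    awk -v key="$key" '
        # A section header switches the "inside [xdsh]" flag on or off.
        /^\[/ { in_xdsh = ($0 == "[xdsh]"); next }
        in_xdsh {
            eq = index($0, "=")
            if (eq > 0 && substr($0, 1, eq - 1) == key) {
                print substr($0, eq + 1)   # everything after the first "="
                exit
            }
        }
    ' "$file"
}
```

For example, `get_xdsh_opt /var/opt/xcat/IBSwitch/Mellanox/config pre-command` would print `cli` for the contents listed above. A missing key prints nothing.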
We can add the return code command to the post-command if available.
Right now every command, whether it succeeds or fails, returns only the successful return code from ssh. We need to work with Mellanox to get a command like the "showLastRetcode" command we have for QLogic.
Use the following command to consolidate the syslog to the MN or the SN:
rspconfig mswitch logdest=<ip>
SNMP monitoring will be done through the monitoring plugin called snmpmon. New code will be added to support the Mellanox IB switch. The code will use rspconfig under the covers. The supported rspconfig commands are described in the next section.
First, get http://www.mellanox.com/related-docs/prod_ib_switch_systems/MELLANOX-MIB.zip and unzip it. Copy the mib file MELLANOX-MIB.txt to the /usr/share/snmp/mibs directory on the mn, and on the sn if the sn is the snmp trap destination.
Then, to configure, run:
monadd snmpmon <mswitch>
moncfg snmpmon <mswitch>
To start monitoring, run:
monstart snmpmon <mswitch>
To stop monitoring, run:
monstop snmpmon <mswitch>
To deconfigure, run:
mondecfg snmpmon <mswitch>
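The order of the snmpmon commands above can be captured in a small dry-run helper. This is only a sketch: `snmpmon_plan` is a hypothetical name, the "setup" and "teardown" groupings are an assumption made for illustration, and the function prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the snmpmon commands from this section in the
# order they are listed. Nothing is executed.
# Usage: snmpmon_plan setup|teardown <noderange>
snmpmon_plan() {
    local action=$1 target=$2 step
    local -a steps
    case "$action" in
        setup)    steps=(monadd moncfg monstart) ;;   # configure, then start
        teardown) steps=(monstop mondecfg) ;;         # stop, then deconfigure
        *) echo "usage: snmpmon_plan setup|teardown <noderange>" >&2
           return 1 ;;
    esac
    for step in "${steps[@]}"; do
        printf '%s snmpmon %s\n' "$step" "$target"
    done
}
```

For example, `snmpmon_plan setup mswitch` prints the monadd, moncfg and monstart lines for mswitch, one per line.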
Set up the snmp alert destination:
rspconfig <switch> snmpdest=<ip> [remove]
where "remove" means to remove this ip from the snmp destination list.
Enable/disable sending of snmp traps:
rspconfig <switch> alert=enable/disable
Define the read only community for snmp version 1 and 2.
rspconfig <switch> community=<string>
Enable/disable the snmp function on the switch.
rspconfig <switch> snmpcfg=enable/disable
Enable/disable ssh-ing to the switch without password.
rspconfig <switch> sshcfg=enable/disable
Set up the remote syslog receiver for this switch, and also define the minimum severity level of the logs that are sent. The valid levels are: emerg, alert, crit, err, warning, notice, info, debug, none, remove. "remove" means to remove the given ip from the receiver list.
rspconfig <switch> logdest=<ip> [<level>]
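Because the level argument accepts only a fixed set of values, a wrapper script could validate it before calling rspconfig. The sketch below is an illustration under that assumption; `is_valid_loglevel` and `build_logdest_cmd` are hypothetical names, and the second function only prints the rspconfig command line instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: validate a syslog level against the list above and
# print the corresponding rspconfig logdest command line.
is_valid_loglevel() {
    case "$1" in
        emerg|alert|crit|err|warning|notice|info|debug|none|remove) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: build_logdest_cmd <switch> <ip> [<level>]
build_logdest_cmd() {
    local switch=$1 ip=$2 level=${3:-}
    if [ -n "$level" ] && ! is_valid_loglevel "$level"; then
        echo "invalid level: $level" >&2
        return 1
    fi
    # The level is appended only when one was given.
    printf 'rspconfig %s logdest=%s%s\n' "$switch" "$ip" "${level:+ $level}"
}
```

For example, `build_logdest_cmd mswitch 10.0.0.1 warning` prints `rspconfig mswitch logdest=10.0.0.1 warning`, while an unknown level such as `verbose` is rejected with a non-zero return code.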
For doing other tasks on the switch, use xdsh. For example:
xdsh mswitch -l admin --devicetype IBSwitch::Mellanox 'show logging'
UFM servers are regular Linux boxes with UFM installed. xCAT can help install and configure the UFM servers. The xCAT mn can send remote commands to UFM through xdsh. It can also collect SNMP traps and syslogs from the UFM servers.
Assume we have two hosts with UFM installed, called host1 and host2. First, define the two hosts in the xCAT cluster. The UFM hosts are usually on a different network than the compute nodes, so make sure to assign the correct servicenode and xcatmaster values in the noderes table. Also make sure to assign the correct os and arch values in the nodetype table for the UFM hosts. For example:
mkdef -t node -o host1,host2 groups=ufm,all os=sles11.1 arch=x86_64 servicenode=10.0.0.1 xcatmaster=10.0.0.1
Then exchange the SSH keys so that xdsh can run on the hosts.
xdsh host1,host2 -K
Now we can run xdsh on the UFM hosts.
xdsh ufm date
Run the following command to make the UFM hosts send their syslogs to the xCAT mn:
updatenode ufm -P syslog
To test, run the following command on the UFM hosts and check that the xCAT mn receives the new message in /var/log/messages:
logger xCAT "This is a test"
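On the mn, the check can be scripted with grep. The following is a minimal sketch; `log_has_marker` is a hypothetical name, and it assumes the forwarded message lands in a plain-text log file such as /var/log/messages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (return 0) if the log file contains the
# given marker text as a fixed string.
# Usage: log_has_marker <marker> [<logfile>]
log_has_marker() {
    local marker=$1 logfile=${2:-/var/log/messages}
    # -q: no output, -F: match the marker literally, not as a regex.
    grep -qF -- "$marker" "$logfile"
}
```

For example, after running the logger command above on a UFM host, `log_has_marker "This is a test" && echo received` on the mn reports whether the message arrived.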
You need to have the Advanced License for UFM in order to send SNMP traps.
1. Copy the mib file to /usr/share/snmp/mibs directory on the mn.
scp ufmhost:/opt/ufm/files/conf/vol_ufm3_0.mib /usr/share/snmp/mibs
where ufmhost is the host where UFM is installed.
2. On the UFM host, open the /opt/ufm/conf/gv.cfg configuration file. Under the [Notifications] line, set
snmp_listeners = <IP Address 1>[:<port 1>][,<IP Address 2>[:<port 2>]…]
The default port is 162. For example:
ssh ufmhost
vi /opt/ufm/conf/gv.cfg
....
[Notifications]
snmp_listeners = 10.0.0.1
where 10.0.0.1 is the IP address of the management node.
3. On the UFM host, restart the ufmd.
service ufmd restart
4. From the UFM GUI, click on the "Config" tab and bring up the "Event Management" Policy Table. Then select the SNMP check boxes for the events you are interested in, so that the system sends SNMP traps for these events. Click "OK".
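The snmp_listeners value in step 2 is a comma-separated list in which the port is optional and defaults to 162. The sketch below shows one way to expand such a value into explicit address and port pairs, for example to double-check a setting before restarting ufmd. The function name `parse_snmp_listeners` is hypothetical, and the sketch assumes IPv4 addresses or host names without embedded colons.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand an snmp_listeners value into "address port"
# lines, filling in the default port 162 where none is given.
# Usage: parse_snmp_listeners '<ip>[:<port>][,<ip>[:<port>]...]'
parse_snmp_listeners() {
    local value=$1 entry
    local IFS=','
    local -a entries
    read -r -a entries <<< "$value"
    for entry in "${entries[@]}"; do
        entry="${entry// /}"            # tolerate spaces after commas
        [ -z "$entry" ] && continue     # skip empty list items
        case "$entry" in
            *:*) printf '%s %s\n' "${entry%%:*}" "${entry##*:}" ;;
            *)   printf '%s %s\n' "$entry" 162 ;;
        esac
    done
}
```

For example, `parse_snmp_listeners '10.0.0.1'` prints `10.0.0.1 162`, matching the single-listener configuration shown in step 2.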
There are other logs on a UFM host besides syslog. It would be better to consolidate them to the xCAT mn as well. This item has low priority for now and will be implemented later.
UFM will use the REST API (v2) for xCAT functions. It will get the node information and incorporate it into the events. The REST APIs can be found here:
[REST_API_v2]
- Managing Mellanox switch. (new)
- xCAT Monitoring (update)