
minidesign_xcatprobe

Weihua Hu edited this page May 27, 2016 · 9 revisions

Overview

We want to create a new command that probes for possible issues in xCAT. It can probe the xCAT MN and the xCAT node definitions statically. It can also probe the node discovery and node deployment processes statically. The goal is a command that helps xCAT users predict and debug xCAT problems easily.

Interface

The syntax of the xcatprobe command:

xcatprobe <probe_type> [parameters]
  • xcatprobe # same as xcatprobe help
  • xcatprobe help
  • xcatprobe nodedef <noderange>
  • xcatprobe osdef <osimage>
  • xcatprobe xcatmn
  • xcatprobe node <noderange>
  • xcatprobe switch
  • xcatprobe nodeready <noderange> [console] [deployment]
  • xcatprobe nodediscover [noderange]
  • xcatprobe nodedeploy
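
The syntax above could be parsed roughly as follows. This is only a sketch in Python using the standard argparse module; the function names build_parser and parse, and the choice to treat console and deployment as optional positional keywords, are assumptions for illustration and not part of the design.

```python
import argparse

VALID_READY_CHECKS = {"console", "deployment"}


def build_parser():
    """Build a parser for the probe types listed in the syntax above."""
    parser = argparse.ArgumentParser(prog="xcatprobe")
    sub = parser.add_subparsers(dest="probe_type")
    sub.add_parser("help")
    for name in ("nodedef", "node"):
        sub.add_parser(name).add_argument("noderange")
    sub.add_parser("osdef").add_argument("osimage")
    sub.add_parser("xcatmn")
    sub.add_parser("switch")
    ready = sub.add_parser("nodeready")
    ready.add_argument("noderange")
    ready.add_argument("checks", nargs="*")  # [console] [deployment]
    sub.add_parser("nodediscover").add_argument("noderange", nargs="?")
    sub.add_parser("nodedeploy")
    return parser


def parse(argv):
    """Parse argv; a bare 'xcatprobe' behaves like 'xcatprobe help'."""
    args = build_parser().parse_args(argv)
    if args.probe_type is None:
        args.probe_type = "help"
    if args.probe_type == "nodeready":
        unknown = set(args.checks) - VALID_READY_CHECKS
        if unknown:
            raise ValueError("unknown nodeready check: %s" % sorted(unknown))
    return args
```

For example, parse(["nodeready", "node01", "console"]) yields probe_type "nodeready", noderange "node01" and checks ["console"].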

xcatprobe help

Display the usage of xcatprobe.

  • Display the basic usage
  • Display all the probe types

xcatprobe nodedef

Probe node definition.

  • Check the validity of the node name
  • Check ip<=>node entry in /etc/hosts
  • Check DNS resolution
  • Check HWcontrol: check the definition and try rpower status to make sure hardware control is ready for use.
  • Check attributes: mgt, netboot, mac
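
As an illustration of the ip<=>node check, the following Python sketch looks up a node name in /etc/hosts-style text. The function name hosts_entries is hypothetical, and the sample addresses in the usage note are made up; the real probe may work differently.

```python
def hosts_entries(hosts_text, node):
    """Return the IPs that /etc/hosts-style text maps to the given node name."""
    ips = []
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if node in names:
            ips.append(ip)
    return ips
```

A node with no entry returns an empty list, and a node mapped to more than one IP returns several entries; both cases are worth reporting to the user.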

xcatprobe osdef

Probe the definition of OSimage

  • check the basic attributes: imagetype, osarch, osdistroname, osname, osvers
  • check the existence of packages in pkgdir
  • check the packages in the otherpkgdir
  • check the entries in the pkglist and otherpkglist
  • check the rootimage in rootimgdir for netboot image
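
The basic attribute check could look like the sketch below. It assumes the osimage definition has already been read into a plain dictionary; how the definition is fetched is left out, and the name missing_attrs is hypothetical.

```python
# Attribute names taken from the list above.
REQUIRED_ATTRS = ("imagetype", "osarch", "osdistroname", "osname", "osvers")


def missing_attrs(osimage):
    """Return the required attributes that are unset or empty in the osimage dict."""
    return [attr for attr in REQUIRED_ATTRS if not osimage.get(attr)]
```

An empty result means all basic attributes are set; otherwise the probe can list exactly which ones are missing.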

xcatprobe xcatmn

Probe the readiness of xCAT MN

  • Check the hostname, long name
  • Check that xcatd has been started successfully (six processes are running)
  • Check that xcatd is listening on its 2 important ports
  • Check the basic configuration of xcat: site table, passwd table, network table
  • Check mnip is configured on current server and is a static ip
  • Check that SELinux has been disabled
  • Check that the firewall has been turned off
  • Check the free disk space of /tmp /var /install
  • Check that the size of the dhcpd.leases file is less than 100MB
  • Check that the network services are running and configured properly: dhcpd, named, tftpd, httpd
  • Verify all of the above items for all the service nodes
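
Two of the checks above, free disk space and the size of dhcpd.leases, can be sketched with the Python standard library. The 100MB limit comes from the list above; the minimum free space threshold is not specified in this design, so it is a parameter here, and the function names are hypothetical.

```python
import os
import shutil

LEASES_LIMIT_BYTES = 100 * 1024 * 1024  # 100MB, from the check list above


def free_space_report(paths, min_free_bytes):
    """Map each path to True if its filesystem has at least min_free_bytes free."""
    report = {}
    for path in paths:
        free = shutil.disk_usage(path).free
        report[path] = free >= min_free_bytes
    return report


def file_smaller_than(path, limit_bytes=LEASES_LIMIT_BYTES):
    """Return True if the file at path is smaller than limit_bytes."""
    return os.path.getsize(path) < limit_bytes
```

A caller would pass the directories named above (/tmp, /var, /install) to free_space_report and the path of the dhcpd.leases file to file_smaller_than.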

xcatprobe node

Probe whether the node is ready for use

  • ssh without password
  • syslog has been configured
  • verify the parallel commands such as xdsh and xdcp

xcatprobe switch

Probe the configuration of switches

  • Check whether the IP, user, password, auth have been configured
  • Check whether SNMP v1/v3 is enabled
  • Check the system description/name from snmp
  • Display the mac table for the switch

xcatprobe nodeready <noderange> [console] [deployment]

Probe the readiness of node.

  • Check the console configuration

    • check the node attributes: cons, serial*
    • check the cfg in /etc/conserver.cf
  • Check the readiness for OS deployment:

    • provmethod is set, readiness of osimage
    • the dhcp entry is set in dhcpd.leases
    • readiness of bootloader and bootloader cfg file
    • readiness of installer kernel + initrd
    • readiness of installer cfg file
  • It can handle all the nodes in the

xcatprobe nodediscover [noderange]

Probe the node discovery process

Start a process to check the following stages for a node discovery process

  • check the dhcp dynamic range for BMC and host
    • if possible, display the free ips in the range
  • check the readiness of genesis packages first
    • check that genesis has been installed
    • check that mknb has been run and the genesis kernel + initrd have been created
    • check that the cfg files have been created and that their names match the ones configured in dhcpd.conf
  • if [noderange] is specified
    • check that the nextbootorder is set to network
  • node sends a dhcp request and gets an ip (syslog)
    • the ip should be one from the host dynamic ip range
  • node downloads bootloader
    • for x86_64: xnba (syslog/httplog)
    • for ppc64le: none
    • else: error
  • node downloads cfg file for bootloader
    • for x86_64: xnba cfg (net_cfg for discovery)
    • for ppc64le: petitboot cfg
  • node downloads genesis (kernel + initrd)
  • node runs doxcat
  • node runs discovery
  • node finishes the info collection
  • node sends a findme request to the xCAT MN
  • xcatd handles the findme request
  • xCAT finds, or fails to find, a matching node for the discovered node:
  • [the following applies to the findme code rather than to xcatprobe]
    • if matched: display the matched node. (Add prefix with the discovery method like: [MTMS], [Switch], [SEQ])
    • if not matched: display the
      • [MTMS]: my MTMS is xxxx, cannot find any pre-defined node;
      • [Switch]: my mac is xxxx, my switch port is yyyy+zzzz, cannot find any pre-defined node; Display the mac address table for the switch;
      • [SEQ]: cannot find free host or bmc
    • log the findme info if xcatdebugmode is enabled
  • update the matched node
  • the node discovery is finished
  • do the next task: bmcsetup ...
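
One way to follow the stages above is to scan log lines for one marker per stage, in order, and report how far the node got. The sketch below is in Python and only shows the ordering logic. The marker strings that identify each stage in syslog or the http log are not defined in this design, so the caller supplies them; the function name stages_reached is hypothetical.

```python
def stages_reached(stages, log_lines):
    """Return the names of the stages seen in order.

    stages is an ordered list of (name, marker) pairs. A stage counts as
    reached only after all earlier stages have been reached, and each log
    line advances the process by at most one stage.
    """
    reached = []
    index = 0
    for line in log_lines:
        if index >= len(stages):
            break
        name, marker = stages[index]
        if marker in line:
            reached.append(name)
            index += 1
    return reached
```

The first stage missing from the result is where the discovery process stalled, which is the point the probe should report to the user.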

xcatprobe nodedeploy

Probe the process of node deployment

  • node sends dhcp request (syslog)
  • node downloads xnba (syslog/httplog)
  • node downloads xnba cfg (node specific cfg file)
  • node downloads installer (kernel + initrd)
  • node downloads cfg file for installer (kickstart, autoyast)
  • node starts the package installation
  • node runs postscripts (A, B, C)
  • node reboots
  • node runs postbootscripts
  • sshd is running on the node
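
The final stage can be approximated by testing whether the node accepts TCP connections on the ssh port. This Python sketch assumes sshd listens on the default port 22; the function name port_open is hypothetical. It only shows that something is listening, not that a login would succeed.

```python
import socket


def port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```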
