
Managing Multiple Sites and Clusters




Requirements

We are starting to get requirements to manage, from a single point, sets of nodes that are more geographically or logically dispersed than can easily be handled by service nodes. Some of the perceived requirements are:

  • network connectivity between sites may be slow
  • different sites may be controlled by different organizations, making a single consolidated db undesirable

Implementation

We want to take a relatively simple approach to satisfying these requirements, based on our remote client support. Here are some ideas:
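
For context, the existing remote client support already lets one machine drive a single MN's xcatd by pointing XCATHOST at it and presenting that MN's client SSL credentials. A minimal sketch of what that could look like from the GC today (the hostname is illustrative, and the exact credential file names under ~/.xcat can vary by release):

    # Drive one specific MN's xcatd from the GC via xCAT's remote client support.
    # Assumes that MN's client SSL credentials have already been copied into ~/.xcat
    # and that port 3001 on the MN is reachable; the hostname is illustrative.
    export XCATHOST=mn1.site-a.example.com:3001
    nodels                              # lists the nodes defined on mn1

    # The p cmds (psh, prsync, pscp, ...) go over ssh rather than xcatd, so the
    # GC's ssh key also has to be authorized for root on each MN it will manage.
    ssh-copy-id root@mn1.site-a.example.com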

  • The global client (GC, the central point of control) should have an SSL certificate (for xcatd) and an SSH key for each xCAT MN it communicates with.
    • This would allow any xCAT cmd to be run against a single MN.
    • With a small modification to the p cmds (so they do not use xcatd to resolve the node range), all of them (psh, prsync, etc.) could work against the MNs.
  • The global client should have a list of the clusters that are being managed, i.e. a list of the MNs
    • This could either just be the list of ssl certificates, or a simpler list of hostnames in a config file
    • This would allow the p cmds above to support some simple groups like "all" in this context
    • We should also have a file on this machine like /etc/xCATGC that indicates this is a global client (similar to the /etc/xCATMN and /etc/xCATSN files). Then code like the p cmds can use this to know it should get node ranges from a different place.
  • We should support running an xcat cmd to multiple MNs in one invocation.
    • This could be implemented as a new front end cmd like: xcatsh <nr> <xcatcmd>
    • Or the existing xCAT cmd client scripts (xcatclient and xcatclientnnr) could be modified to do this automatically when they detect a special node range. But there are more client front ends than these, so they would all have to be modified.
    • In either case, the node range syntax supported should be something like: mn1%grp1,mn2%n1-n5
    • Then the output should be prefixed by the MN it came from so that xcoll can separate it (see the dispatch sketch after this list)
  • Packaging:
    • A new meta pkg called xCATgc that requires xCAT-client
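
To make the xcatsh idea concrete, here is a minimal sketch of how such a front end could split the mn%range syntax, expand "all" against the MN list, and prefix output for xcoll. Everything in it is an assumption rather than an existing xCAT interface: the config file /etc/xcat/mns (one MN hostname per line), the use of ssh for dispatch, and the xcatsh name itself.

    #!/bin/sh
    # xcatsh <mn-noderange> <xcatcmd> [args...]  -- illustrative sketch only.
    # MN node range syntax: mn1%grp1,mn2%n1-n5 ("all" means every MN listed in
    # the hypothetical config file /etc/xcat/mns).

    [ -f /etc/xCATGC ] || { echo "xcatsh: this machine is not a global client" >&2; exit 1; }

    mnrange=$1; shift
    cmd=$1; shift

    for part in $(echo "$mnrange" | tr ',' ' '); do
        case $part in
            *%*) mn=${part%%\%*}; nr=${part#*%} ;;   # split at the first %
            *)   mn=$part;        nr= ;;             # no %: run the cmd on the MN itself
        esac
        mns=$mn
        [ "$mn" = all ] && mns=$(cat /etc/xcat/mns)
        for m in $mns; do
            # Dispatch over ssh, inserting the per-cluster node range after the
            # command name and prefixing every output line with the MN it came from.
            ssh "$m" "$cmd" $nr "$@" 2>&1 | sed "s/^/$m: /"
        done
    done

With something like this, xcatsh all%compute rpower stat | xcoll would produce lines such as mn1: node001: on, with the MN name as the prefix that xcoll keys on when separating clusters.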

As an alternative implementation, we could install xcatd on the GC and have it dispatch cmds to the other MNs. In some ways this would be a more elegant solution, but I'm concerned it would make xcatd even more complicated than it already is, which is a problem.

Usage Scenarios

  • rpower stat of all nodes in all clusters:
    • xcatsh 'all%all' rpower stat | xcoll
  • Show the nodelist.status attribute for all nodes in mn1 and mn2:
    • xcatsh mn1,mn2%all nodels nodelist.status | xcoll
  • Push content for the policy table to all clusters:
    • pscp /tmp/policy.csv all:/tmp/policy.csv
    • xcatsh all tabrestore /tmp/policy.csv
  • Roll out a new stateless image to all clusters:
    • prsync /install/netboot/rhels6/x86_64/compute all:/install/netboot/rhels6/x86_64/
    • xcatsh all%compute nodeset netboot
    • xcatsh all%compute rpower boot
