post.xcat_restructure

penguhyang edited this page Dec 1, 2015 · 38 revisions

The mini-design of post.xcat restructure

Background

The code logic in the original post.xcat file has some problems. To make debugging easier, we should distinguish critical errors from plain errors. When an error happens, detailed information should be recorded on both the MN and the node.

critical error

Solution: write the error information on the node and the MN, then halt the system.

1. openssl is not installed on the system
2. Failure to download the postscripts
    The postscripts are downloaded with the wget command from http://$i$INSTALLDIR/postscripts/ on the MN. The download may fail for several reasons:
    1) The wget command is not available
    2) The network is unreachable
3. getpostscript.awk does not exist
    First we try to download the mypostscript.$NODE file from the MN and, if the MN has it, rename it to mypostscript. If the MN does not have this file, we try to create the mypostscript file using getpostscript.awk. If getpostscript.awk is not in the /xcatpost folder, the error occurs.
4. Failure to create the mypostscript file
    The mypostscript file is used to generate mypostscript.post and other files. If it cannot be produced by either of the two methods above, the error occurs.
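
The critical-error handling described above could be sketched roughly as follows. This is a minimal sketch, not the actual post.xcat code: the function names `critical_error`, `require_file` and `hang_forever`, the `XCAT_ERR_LOG` path and the `HALT_CMD` hook are all hypothetical, and forwarding to the MN is assumed to happen through syslog via `logger`.

```shell
#!/bin/sh
# Hypothetical names throughout; nothing here is taken from post.xcat itself.
XCAT_ERR_LOG="${XCAT_ERR_LOG:-/tmp/post.xcat.err}"   # assumed node-side log file
HALT_CMD="${HALT_CMD:-hang_forever}"                 # what "halt the system" means here

# Block forever so the failed installation stays visible on the console.
hang_forever() {
    while true; do sleep 3600; done
}

# Record a critical error on the node, try to forward it to the MN, then halt.
critical_error() {
    msg="$1, hang ..."
    echo "$msg"
    echo "$msg" >> "$XCAT_ERR_LOG"
    # Assumption: syslog on the node is forwarded to the MN.
    if command -v logger >/dev/null 2>&1; then
        logger -t xcat -p local4.err "$msg" 2>/dev/null || true
    fi
    $HALT_CMD
}

# Raise a critical error when a required file is missing.
require_file() {
    [ -e "$1" ] || critical_error "$1 does not exist"
}
```

With these helpers, the checks for critical errors 1 and 3 would reduce to `require_file /usr/bin/openssl` and `require_file /xcatpost/getpostscript.awk`. Keeping the halt action behind `HALT_CMD` is only a convenience for trying the sketch without hanging the shell.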

plain error

Solution: write the error information on the node and the MN, but do not halt the system.

1. Failure to download the precreated mypostscript file
2. Failure to create the mypostscript.post file
3. Failure to create the xcatpostinit1 file
4. Failure to create the xcatinstallpost file
5. Failure to create the xcatdsklspost file
6. Failure to create the final mypostscript file
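
A plain error differs from a critical one only in that execution continues. One way to sketch that, assuming a hypothetical `plain_error` helper, a hypothetical `generate_file` wrapper and an assumed log path `XCAT_ERR_LOG` (none of which come from post.xcat):

```shell
#!/bin/sh
# Hypothetical names throughout; this only illustrates "log but do not halt".
XCAT_ERR_LOG="${XCAT_ERR_LOG:-/tmp/post.xcat.err}"   # assumed node-side log file

# Record a plain error and return non-zero WITHOUT halting the system.
plain_error() {
    echo "$1"
    echo "$1" >> "$XCAT_ERR_LOG"
    return 1
}

# Run a command whose stdout becomes the target file.
# Usage: generate_file <target> <command> [args...]
generate_file() {
    target="$1"
    shift
    if "$@" > "$target" 2>/dev/null && [ -s "$target" ]; then
        echo "$target generated"
    else
        plain_error "failed to generate $target"
    fi
}
```

A caller can ignore the non-zero return value and move on to the next file, which is the behaviour this section asks for.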

Code Logic and Process

  1. Export environment variables such as MASTER_IP, NODESTATUS, TFTPDIR, etc.
  2. Include the xCAT library so that its functions can be used.
  3. Set the variables INSTALLDIR and TFTPDIR if they have not been set.
  4. Sleep for a while, then download the postscripts from the management node.
  5. Before downloading the postscripts from the management node, check whether openssl is installed. If it is not, the system should halt.
  6. Download the postscripts from the MN with the wget command, using a variable GOTALL as a flag that shows whether the download was successful. If it was not, the system should halt.
  7. Once the postscripts have been downloaded successfully, create the mypostscript file.
  8. First try to download the mypostscript.$NODE file. This file is created when the precreatemypostscripts attribute is set to 1. If the file exists, rename it to mypostscript.
  9. If there is no mypostscript.$NODE file, generate the mypostscript file with getpostscript.awk. If the getpostscript.awk file does not exist, the system should halt.
  10. Use a while loop to generate mypostscript with getpostscript.awk, in case an attempt fails.
  11. Use the sed command to add run_ps before the commands in the mypostscript file. Then output the run_ps subroutine and append the mypostscript file content to recreate the mypostscript file. If this file cannot be created, the system halts.
  12. Now that the mypostscript file exists, use it to create the mypostscript.post file, using the sed command to delete the items between postscripts-start-here and postscripts-end-here.
  13. Create the post init file (xcatpostinit1).
  14. Create the xcatinstallpost file.
  15. Create the dskls post file (xcatdsklspost).
  16. Finally, create the mypostscript file, using the sed command to delete the items between postbootscripts-start-here and postbootscripts-end-here.
  17. Update the node status using updateflag.awk.
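
Steps 12 and 16 both delete a marked section with sed. A minimal sketch of that technique is shown here; the helper name `strip_section` is hypothetical, and it assumes the marker lines themselves are removed together with the items between them, which this page does not state.

```shell
#!/bin/sh
# Hypothetical helper; the real sed expressions in post.xcat may differ.

# Print file $3 without the lines from the first line matching $1
# through the next line matching $2 (both marker lines included).
strip_section() {
    sed "/$1/,/$2/d" "$3"
}
```

Under these assumptions, step 12 would be `strip_section postscripts-start-here postscripts-end-here /xcatpost/mypostscript > /xcatpost/mypostscript.post`. Step 16 would run the same helper with the postbootscripts markers, writing to a temporary file and then moving it over /xcatpost/mypostscript, because the input and output are the same file.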

Planning Outputs

When xcatdebugmode is on, the log information will be saved.

  1. The system sleeps for a while to get ready. The output looks like:

    sleep 16

  2. Before downloading the postscripts from the management node, check whether openssl is installed. If it is not, the output looks like:

    /usr/bin/openssl does not exist, hang ...

  3. When downloading the postscripts from the management node:

    1. Show a reminder that the postscripts are about to be downloaded. Output: trying to download postscripts from http:///install/postscripts/
    2. If the system has no wget command, the download is impossible. Output: /usr/bin/wget does not exist, hang ...
    3. Download the postscripts from the management node.
      1. If the postscripts are downloaded successfully, the output looks like: postscripts downloaded successfully
        1. After the postscripts are downloaded, generate the xcatinfo file. Output: /opt/xcat/xcatinfo generated
  4. If the postscripts cannot be downloaded, the output looks like: failed to download postscripts from http://$i$INSTALLDIR/postscripts/, hang ...

  5. Now generate the mypostscript file

    1. From the precreated mypostscript file

      1. Show a reminder that the precreated mypostscript file is about to be downloaded. Output: trying to download precreated mypostscript file http://$i$TFTPDIR/mypostscripts/mypostscript.$NODE

      2. If the precreated mypostscript file is downloaded successfully, the output looks like: precreated mypostscript downloaded successfully

    2. From getpostscript.awk

      1. If the precreated mypostscript file cannot be downloaded, try to generate the mypostscript file using getpostscript.awk. Show a reminder that it is about to be generated. Output: failed to download precreated mypostscript, trying to generate with getpostscript.awk
      2. If the getpostscript.awk file does not exist, the output looks like: /xcatpost/getpostscript.awk does not exist, hang ...
    3. If the file cannot be generated by either of these two methods, the output looks like: generate mypostscript file failure, hang ...

    4. If the file is generated successfully, output: generate mypostscript file successfully

    5. Now we output the run_ps subroutine and append the mypostscript file content to recreate mypostscript file.

      1. If successfully generated, output: mypostscript generated successfully
      2. If failed to generate, output: failed to generate mypostscript file, hang ...
    6. Time to generate mypostscript.post

      1. If successfully generated, output: /xcatpost/mypostscript.post generated
      2. If failed to generate, output: failed to generate /xcatpost/mypostscript.post
    7. Time to generate xcatpostinit1

      1. If successfully generated, output: /etc/init.d/xcatpostinit1 generated
      2. If failed to generate, output: failed to generate /etc/init.d/xcatpostinit1
    8. Time to generate xcatinstallpost

      1. If successfully generated, output: /opt/xcat/xcatinstallpost generated
      2. If failed to generate, output: failed to generate /opt/xcat/xcatinstallpost
    9. Time to generate xcatdsklspost

      1. If successfully generated, output: /opt/xcat/xcatdsklspost generated
      2. If failed to generate, output: failed to generate /opt/xcat/xcatdsklspost
    10. Time to generate mypostscript

      1. If successfully generated, output: /xcatpost/mypostscript generated
      2. If failed to generate, output: failed to generate /xcatpost/mypostscript
      3. Show a reminder that mypostscript is about to run. Output: running mypostscript
      4. Show a reminder that mypostscript has returned. Output: mypostscript returned
    11. Show a reminder that grub has been updated. Output: /boot/grub/grub.conf updated

    12. Report the installation status. Output: finished node installation, reporting status...
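
The rule that log information is saved only when xcatdebugmode is on could be wrapped in a single helper. The following is a minimal sketch under stated assumptions: the variable name `XCATDEBUGMODE`, the log path `XCAT_DEBUG_LOG` and the function `log_msg` are hypothetical, and any value other than empty or 0 is treated as "on".

```shell
#!/bin/sh
# Hypothetical names; how post.xcat receives xcatdebugmode is not described here.
XCATDEBUGMODE="${XCATDEBUGMODE:-0}"
XCAT_DEBUG_LOG="${XCAT_DEBUG_LOG:-/tmp/post.xcat.log}"   # assumed log location

# Always print the message; additionally save it when debug mode is on.
log_msg() {
    echo "$1"
    case "$XCATDEBUGMODE" in
        ""|0) ;;                                  # debug mode off: console only
        *) echo "$1" >> "$XCAT_DEBUG_LOG" ;;      # debug mode on: keep a copy
    esac
}
```

Each planned output above would then be a single call, for example `log_msg "postscripts downloaded successfully"`.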
