
numSOS Examples Directory

oceandlr edited this page Mar 25, 2019 · 26 revisions

Here we discuss some of the numSOS examples under bin, using the example SOS database created in the QuickStart. Two things to note:

  1. That database has no job_info data (e.g., user name, job name). The examples in this directory assume such data exists, so we will use some workarounds for this case.
  2. That database has two job ids: 5078835 and 0.

This can be determined as follows:

In [8]: src.select(['job_id'],from_ = ['meminfo_E5-2698'],order_by = 'job_id', unique = True)
In [9]: dst = src.get_results()
In [10]: dst.show()
          job_id 
---------------- 
             0.0 
       5078835.0 
---------------- 
2 results
 
Alternatively, Python types can be used:
In [10]: myset = set(dst.array('job_id'))
 
In [11]: print(myset)
set([0.0, 5078835.0])
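
The job ids come back as floats. A minimal sketch for turning them into a sorted list of integers, with a plain list standing in for `dst.array('job_id')` (an assumption made so the snippet runs without a SOS container):

```python
# Stand-in for dst.array('job_id'); the real call returns the column values.
job_id_column = [0.0, 5078835.0, 0.0, 5078835.0]

# Deduplicate, convert to int, and sort for stable display.
job_ids = sorted({int(j) for j in job_id_column})
print(job_ids)
```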

First, we discuss some related/support functions under numsos:

Support functions

ArgParse

Supports getting command line arguments, including parsing time arguments. Some predefined arguments are in the code as follows:

class ArgParse(object):
    def __init__(self, description):
        self.parser = argparse.ArgumentParser(description=description)
        self.parser.add_argument(
            "--path", required=True,
            help="The path to the database.")
        self.parser.add_argument(
            "--create", action="store_true",
            help="Create a new SOS database. " \
            "The --path parameter specifies the path to the new " \
            "database.")
        self.parser.add_argument(
            "--mode", metavar="FILE-CREATION-MASK", type=int,
            help="The permissions to assign to SOS database files.")
        self.parser.add_argument(
            "--verbose", action="store_true",
            help="Request verbose query output")
        self.parser.add_argument(
            "--monthly", action="store_true",
            help="Show results in the last 30 days")
        self.parser.add_argument(
            "--weekly", action="store_true",
            help="Show results in the last 7 days")
        self.parser.add_argument(
            "--daily", action="store_true",
            help="Show results in the last 24 hours")
        self.parser.add_argument(
            "--today", action="store_true",
            help="Show today's results (since midnight)")
        self.parser.add_argument(
            "--hourly", action="store_true",
            help="Show results in the last hour")
        self.parser.add_argument(
            "--begin",
            type=valid_date,
            help="Specify the start time/date for similar jobs. " \
            "Format is [CC]YY/MM/DD HH:MM or [CC]YY-MM-DD HH:MM")
        self.parser.add_argument(
            "--end",
            type=valid_date,
            help="Specify the end time/date for similar jobs. ")
        self.parser.add_argument(
            "--period",
            type=period_spec,
            help="Specify a period for the analysis." \
            "The format is [count][units] where," \
            "  count : A number\n" \
            "  units :\n" \
            "        s - seconds\n" \
            "        m - minutes\n" \
            "        h - hours\n" \
            "        d - days\n")

 ...
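
The excerpt passes `valid_date` and `period_spec` as argparse `type=` callables but does not show them. The sketch below illustrates how such callables could work, based only on the formats given in the help text. The names `valid_date_sketch` and `period_spec_sketch` are hypothetical, and the numsos implementations may differ (for example, this sketch only accepts integer counts and returns the period in seconds).

```python
import argparse
from datetime import datetime

# Seconds per unit for the [count][units] period format shown in the help text.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def period_spec_sketch(text):
    """Hypothetical stand-in for period_spec: '30s' -> 30, '2h' -> 7200."""
    count, unit = text[:-1], text[-1:]
    if unit not in _UNIT_SECONDS or not count.isdigit():
        raise argparse.ArgumentTypeError(f"invalid period: {text!r}")
    return int(count) * _UNIT_SECONDS[unit]

def valid_date_sketch(text):
    """Hypothetical stand-in for valid_date; returns a datetime."""
    # Four-digit years are tried first, then the two-digit [CC]-less forms.
    for fmt in ("%Y/%m/%d %H:%M", "%Y-%m-%d %H:%M",
                "%y/%m/%d %H:%M", "%y-%m-%d %H:%M"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise argparse.ArgumentTypeError(f"invalid date: {text!r}")

# argparse calls the type= callable on the raw string of each argument.
parser = argparse.ArgumentParser(description="type= callable demo")
parser.add_argument("--begin", type=valid_date_sketch)
parser.add_argument("--period", type=period_spec_sketch)
args = parser.parse_args(["--begin", "2018/02/16 10:00", "--period", "2h"])
print(args.begin, args.period)
```

Raising `argparse.ArgumentTypeError` lets argparse report a bad value as a normal usage error instead of a traceback.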

Examples

mem_used

The main block uses ArgParse to get the required arguments: the path, the schema (if it differs from the default), and optionally a time range.

if __name__ == "__main__":
  parser = ArgParse(description="Compute memory summary statistics for a job")
  parser.add_argument(
      "--schema", required=False, default='meminfo',
      help="The meminfo schema name.")
  args = parser.parse_args()
  (start, end) = get_times_from_args(args)
  where = []
  if start > 0:
      where.append([ 'timestamp', Sos.COND_GE, start ])
  if end > 0:
      where.append([ 'timestamp', Sos.COND_LE, end ])
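
The body of `get_times_from_args` is not shown here. As a rough illustration of the idea, the sketch below maps the relative-time flags to a `(start, end)` pair of epoch seconds and builds the same kind of `where` list. The function names `times_from_flags` and `build_where` are hypothetical, the strings `"COND_GE"` and `"COND_LE"` stand in for the `Sos` constants, and the real helper may handle `--today`, `--begin`, `--end`, and `--period` in ways this sketch does not.

```python
import time
from argparse import Namespace

def times_from_flags(args, now=None):
    """Hypothetical: map the relative-time flags to (start, end) epoch seconds.

    A value of 0 means "no bound", matching the `start > 0` / `end > 0`
    checks in the excerpt above.
    """
    now = int(time.time()) if now is None else int(now)
    windows = [("hourly", 3600), ("daily", 24 * 3600),
               ("weekly", 7 * 24 * 3600), ("monthly", 30 * 24 * 3600)]
    for flag, seconds in windows:
        if getattr(args, flag, False):
            return now - seconds, now
    return 0, 0

def build_where(start, end):
    # Plain strings stand in for Sos.COND_GE / Sos.COND_LE.
    where = []
    if start > 0:
        where.append(["timestamp", "COND_GE", start])
    if end > 0:
        where.append(["timestamp", "COND_LE", end])
    return where

args = Namespace(daily=True)
start, end = times_from_flags(args, now=1518803953)
print(build_where(start, end))
```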

MemUsed uses the path to open a SOS container, which is then used to configure a SosDataSource:

  cont = Sos.Container(args.path)
  src = SosDataSource()
  src.config(cont=cont)
  src.select([ 'timestamp', 'job_id', 'component_id', 'MemTotal', 'MemFree' ],
             from_    = [ args.schema ],
             where    = where,
             order_by = 'timestamp')

You can test these commands in ipython. NOTE: the trailing '/' in the path value is necessary:

In [20]: cont = Sos.Container('/XXX/')

In [21]: src = SosDataSource()

In [22]: src.config(cont=cont)

In [23]: src.select(['timestamp','component_id','Active','MemFree'],from_ = ['meminfo_E5-2698'], order_by = 'comp_time')
 
In [24]: src.show()
meminfo_E5-2698                                                 
timestamp       component_id    Active          MemFree         
--------------- --------------- --------------- --------------- 
(1518803953, 3055)              12           82672       129869024 
(1518803954, 2905)              12           82672       129864932 
...

MemUsed then instantiates Xfrm, a class that inherits from Transform, and reads in the values (NOTE: more description is needed here on how this iterator works):

# Xfrm is declared as: class Xfrm(Transform)
xfrm = Xfrm(src, None, limit=1024 * 1024)
res = xfrm.begin()
while res:
    res = xfrm.next()
    if res is not None:
        # concatenate TOP and TOP~1
        xfrm.concat()
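
To make the loop above easier to follow, here is a toy stand-in for the begin/next/concat pattern. It is not numsos code. It is inferred only from the names and the comment in the excerpt, and it assumes that `begin()` and `next()` each push a batch of at most `limit` records onto a stack and that `concat()` merges the top two entries. The class name `BatchStack` is hypothetical, and the real Transform may behave differently.

```python
class BatchStack:
    """Toy stand-in (not numsos code) for the begin/next/concat pattern."""

    def __init__(self, records, limit):
        self.records = records
        self.limit = limit
        self.pos = 0
        self.stack = []  # stack[-1] is TOP, stack[-2] is TOP~1

    def _read(self):
        # Read up to `limit` records; push them unless the input is exhausted.
        batch = self.records[self.pos:self.pos + self.limit]
        self.pos += len(batch)
        if not batch:
            return None
        self.stack.append(batch)
        return batch

    def begin(self):
        self.pos = 0
        self.stack = []
        return self._read()

    def next(self):
        return self._read()

    def concat(self):
        # Merge TOP into TOP~1 so one combined batch remains on the stack.
        top = self.stack.pop()
        self.stack[-1] = self.stack[-1] + top

xfrm = BatchStack(list(range(10)), limit=4)
res = xfrm.begin()
while res:
    res = xfrm.next()
    if res is not None:
        xfrm.concat()
print(xfrm.stack)
```

Under these assumptions, the loop ends with a single stack entry holding every record, which is what a later per-job pass would operate on.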

Then, the derived class's mem_stats function is called for each job_id.

 xfrm.for_each([ 'job_id' ], xfrm.mem_stats)  

The mem_stats function calculates various memory based statistics, using the defined functions (e.g., min, mean), some computations (e.g., ratio), and operations using the Stack.
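
As an illustration of this kind of per-job computation, the sketch below groups rows by job_id and computes the minimum, maximum, mean, and standard deviation of the used-memory percentage. It uses plain Python instead of the numsos Stack. The formula `100 * (MemTotal - MemFree) / MemTotal`, the use of the population standard deviation, the function name `mem_stats_sketch`, and the sample rows are all assumptions for illustration, not the example's actual implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev

# (job_id, component_id, MemTotal, MemFree) rows; the values are made up.
rows = [
    (1, 10, 1000, 900),
    (1, 11, 1000, 800),
    (1, 10, 1000, 700),
    (2, 12, 2000, 1000),
]

def mem_stats_sketch(rows):
    """Per-job min/max/mean/std of used-memory percent (assumed formula)."""
    by_job = defaultdict(list)
    for job_id, comp_id, total, free in rows:
        used_pct = 100.0 * (total - free) / total
        by_job[job_id].append((used_pct, comp_id))
    stats = {}
    for job_id, samples in by_job.items():
        pcts = [p for p, _ in samples]
        # Tuples compare by percentage first, so min/max also yield the node id.
        lo, hi = min(samples), max(samples)
        stats[job_id] = {
            "min_pct": lo[0], "min_nid": lo[1],
            "max_pct": hi[0], "max_nid": hi[1],
            "mean_pct": mean(pcts), "std": pstdev(pcts),
        }
    return stats

for job_id, s in sorted(mem_stats_sketch(rows).items()):
    print(job_id, s)
```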

MemUsed also assumes the SOS database has a schema called 'jobinfo' that holds the job info (job name and user):

 job_id = int(values[0])
 job_name, job_user = self.get_job_info(job_id)
 
 # with:
 def get_job_info(self, job_id):
      src = SosDataSource()
      src.config(cont=self.source.cont)
      src.select([ 'job_id', 'job_name', 'job_user' ],
                 from_    = [ 'jobinfo' ],
                 where    = [ [ 'job_id', Sos.COND_EQ, job_id ] ],
                 order_by = 'job_id')
      res = src.get_results(limit=1)
      return res.array('job_name')[0], res.array('job_user')[0]

In our test case, the database does not have this schema. For the sake of the example, you can hard-code some temporary values for the job info:

 # job_name, job_user = self.get_job_info(job_id) # COMMENT THIS OUT
 job_name = 'foo'
 job_user = 'bar'
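
If you would rather not comment out the lookup, one alternative is a small wrapper that falls back to placeholder values when the lookup fails. This is only a sketch: `job_info_or_default` and `missing_jobinfo` are hypothetical names, `lookup` stands in for `self.get_job_info`, and catching `Exception` is a deliberately broad assumption because the error raised for a missing schema is not documented here.

```python
def job_info_or_default(lookup, job_id, default=("unknown", "unknown")):
    """Call lookup(job_id); fall back to placeholders if it fails.

    `lookup` stands in for self.get_job_info. Catching Exception is a
    deliberately broad assumption, not documented numsos behavior.
    """
    try:
        return lookup(job_id)
    except Exception:
        return default

def missing_jobinfo(job_id):
    # Simulates a database without the 'jobinfo' schema.
    raise LookupError("no jobinfo schema")

job_name, job_user = job_info_or_default(missing_jobinfo, 5078835, ("foo", "bar"))
print(job_name, job_user)
```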

You can then run the example as follows:

python mem_used --path='/dir/my-container/' --schema='meminfo_E5-2698'
Job ID       Job Name     User         Job Size     Min Used NID Min Used %   Max Used NID Max Used %   Mean %       Std         
------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
           0 foo          bar                    98           36         1.55          160   3.39796090   1.81018352   0.58637189 
     5078835 foo          bar                     2           13         1.50           12   1.54239437   1.52214499   0.01864555 
------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
