Add schedstats instrument #123

bjackman · 2017-05-11T15:51:36Z

Not sure how useful this will actually be hence RFC marker and not too much care or documentation. This basically jimmies /proc/schedstat into the Instrument API. What do you think?

Low priority at the moment, don't spend ages reviewing if you're busy :)

setrofim · 2017-05-11T15:57:58Z

devlib/instrument/__init__.py

+                        msg = 'Unexpected channel "{}"; must be in {}'
+                        raise ValueError(msg.format(chan_name, self.channels.keys()))
+            else:
+                for chan in self.channels.values():


in the else clause, channels is None so this will raise a NoAttributeError

channels is None but not self.channels

Ah, my bad.

setrofim · 2017-05-11T15:59:14Z

doc/instrumentation.rst

+   ``active_channels`` attribute of the ``Instrument``.
+
+   If ``channels`` is provided, it is a list of names of channels to enable and
+   ``sites`` and ``kinds`` are ignored.


I don't like the fact that user-provided parameters may be silently ignored. Either, they should be combined with the channels or an error should be raised.

Good point. I'll make it raise an error if you provide both channels and sites or kinds

setrofim · 2017-05-11T16:04:34Z

devlib/instrument/schedstats.py

+                # params. Does it matter?
+                measurement_type = MeasurementType(
+                    measurement_name, '', measurement_category)
+                self.add_channel(site=site,


All of this should be done int the __init__ -- setup() is not guaranteed to be invoked on every execution. (Set up is intended for persistent changes, such as deploying assets to the target.

setrofim · 2017-05-11T16:05:42Z

devlib/instrument/schedstats.py

+                                 measure=measurement_type)
+
+    def teardown(self):
+        self.target.write_value(self.sysctl_path, self.old_sysctl_value)


This should probably happen in reset() instead.

Actually, ignore the above. Doing this in teardown is correct.

setrofim · 2017-05-11T16:07:00Z

devlib/instrument/schedstats.py

+    based on identifiers from the scheduler code.
+    """
+    sysctl_path = '/proc/sys/kernel/sched_schedstats'
+    schedstat_path = '/proc/schedstat'


Should set mode to INSTANTANEOUS

derkling · 2017-05-12T10:10:56Z

devlib/instrument/schedstats.py

+                measures = DOMAIN_MEASURES
+            else:
+                raise TargetError(
+                    'Unrecognised schedstats line: "{}"'.format(line))


This will fails in all our Android systems, since we have additional stats reporting EAS codepaths.
In Android kernel we will have stats like:

cpu0 9703 0 227370 45501 209763 87404 75510237990 76927923766 168389 eas 50510 0 0 29148 0 21362 51469 11340 3022 43 14180 169 22712 40129 3 0 3023 28011 15055 11640 domain0 0f 5237 5089 86 95279 66 0 0 5089 2021 1945 62 74554 19 1 3 1942 43224 36754 5499 4393208 971 0 7128 29626 1 0 1 0 0 0 0 0 0 39102 1466 0 eas 0 0 0 9389 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13480 0 domain1 ff 3981 3733 220 673383 33 2 5 1149 1076 1069 6 36930 2 0 3 176 42131 36952 4441 18425660 738 3 261 36691 1 0 1 0 0 0 0 0 0 83215 794 0 eas 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15055 0

Where basically, after each cpu and domain we do have an eas entry with EAS specific counters.

I was wondering if we can end up with a more "generic" parser which allows to accept custom defined lines.

Maybe we can use a "map" to define what are the columns for each specific "channel", the map can be defined by default using the hardcoded values you have now... but we can allow the user to specify a custom map.
In that case, the custom map should allow to parse additional fields (e.g. eas stats) as well as open the door to parse stats versions not supported by default.

What do you think?

I'd rather not design & document an API for that given that these EAS stats are probably the only ever user (also TBH we have broken schedstats with those additions, we should really change the version string to "15.eas0" or something IMO). How about if:

Unrecognised lines are ignored with a warning

I rewrite this so we can inherit from it and write a subclass that knows about the EAS stats

Then, if we decide that the EAS stats format is "stable", or come up with a way to fixup the version string so we can detect changes, we just add support for the EAS stats directly here

I rewrite this so we can inherit from it and write a subclass that knows about the EAS stats
+1 for this - if we handle each line in a class member which can access curr_cpu and add curr_domain, then 'eas' lines can be parsed easily in a subclassed instrument

I started doing part 2. and realised that it's also pretty unpalatable; the parsing state (curr_cpu and curr_domain) makes the code quite arcane and it still isn't generic. If we want to keep fiddling with schedstats we're going to have to fiddle with this code again anyway so I propose just skipping part 2. and only doing parts 1. and 3.

bjackman · 2017-05-23T16:23:37Z

Latest version

Doesn't fail with the "eas" schedstats data but does spam WARNINGs. Could always just silently ignore "eas" data without a warning, until we add the ability to parse it.
Doesn't use the sysctl unless it's necessary, as the sysctl is v4.6+

setrofim · 2017-09-11T12:37:15Z

@derkling @bjackman sorry for the delay in following up on this. Is this still needed? Is it ready for merge? If so, please could you rebase onto tip of master.

bjackman · 2017-09-11T13:16:45Z

I think this could be really useful for use with WA3 but I would say it's low priority. It's also just occurred to me that I should think about how to get this to report diffs, i.e. you'd like to take a sample at the beginning and a sample at the end, and have the metric report the diff in schedstats during workload execution (maybe WA already does that?). Also would like some input from @derkling on the EAS stats thing. Are you OK with having this PR sit around a little bit longer?

setrofim · 2017-09-11T13:23:25Z

maybe WA already does that?

WA has a generic "sysfs_extractor", which takes a list of sysfs locations, which it then pulls before and after each iteration, and then performs a generic diff; but it has no awareness of what it's diff'ing, so it doesn't generate any metrics (you just get a directory structure with the diffs).

Are you OK with having this PR sit around a little bit longer?

Yeah, no problem. There is no rush. It's just that there hasn't been any activity on this for a while, so I just wanted to make sure this is still relevant.

bjackman · 2017-10-11T13:59:42Z

I've just refreshed this and plugged it into WA3 as an experiment.. definitely interesting but the only problem is it produces literally hundreds of metrics for each workload!

setrofim · 2017-10-11T14:42:09Z

I've just refreshed this and plugged it into WA3 as an experiment.. definitely interesting but the only problem is it produces literally hundreds of metrics for each workload!

That's not necessarily an issue, if those metrics are actually individually useful. An alternative though, if they're more of a "state dump" for debug/deep-dive analysis, is to write them out to a file that is than added as an Artifact, rather than individually adding them as metrics in WA (or adding only a few key/summary ones as metrics).

bjackman · 2017-10-11T14:47:57Z

Hmm yeah.. my dream for this Instrument is that you can immediately see that your kernel change e.g. "increased the average number of failed load balances" for a certain workload. All the added stats are potentially individually useful, but the particular one that's individually useful is probably going to change in every situation.

On the other hand I've just realised that's only going to happen if you explicitly ask for schedstats data, so maybe it's fine.

There are also a few potential aggregates I can think of (aggregate across CPU/sched_domain/idle type/any combination of those 3) but none of them makes any more sense than the others, so probably best considered a future feature. Subject to a bit more testing this is probably ready for merge.

setrofim · 2017-10-11T15:00:45Z

That sounds good to me. There is nothing in WA which demands or even encourages reducing the number of metrics. We specifically use __slots__ for metrics to minimize the impact of having vast numbers of these objects. And, as you've say, this only matters if the user enables the instrument.

Ultimately, it comes down to what's easier to work with. As you'll likely be, at least initially, the only/the primary user of this instrument, this is for you to decide.

I should also point out that there is also the option of tagging the metrics with some kind of classifier to make it easier to filter/split them from the rest of the results.

bjackman · 2017-10-12T09:28:40Z

OK sweet. Last time I tested this I saw some suspicious values, once I've investigated that, let's go ahead (this is still low priority though so might be a while).

bjackman · 2017-12-01T16:04:38Z

OK I am now using this instrument in anger and getting sane results.

derkling · 2017-12-01T16:24:55Z

devlib/instrument/schedstats.py

+
+     - If :method:`reset` is called with ``kinds=['alb_pushed']`` then the count
+       of migrations successfully triggered by active_load_balance will be
+       colelcted for each sched domain, with a channel for each domain.


s/colelcted/collected/

derkling · 2017-12-01T16:28:25Z

devlib/instrument/schedstats.py

+        match = re.search(r'version ([0-9]+)', lines[0])
+        if not match or match.group(1) != '15':
+            raise TargetError(
+                'Unsupported schedstat version string: "{}"'.format(lines[0]))


Maybe worth to report which versions we support?

derkling · 2017-12-01T16:33:09Z

devlib/instrument/schedstats.py

+
+        # On 4.6+ kernels, schedstats needs to be enabled via kernel cmdline or
+        # sysctl. If not, we'll just get a load of zeroes.
+        self.old_sysctl_value = None


Just for code readability, maybe by using some "returns" on success/failure, can we move the following checks (lines 128-145) into a dedicated function, something like _check_availability() ?

Yeah, seems like a good idea, will try it.

OK, so I split out this particular chunk (that enables the runtime config) into a separate function but kept the bit that checks for the presence and version of schedstat in the kernel, since the lines variable is re-used later to add the channels (to avoid more round-trips to the target slowing things down). If you like I can split it up further and just drop the "optimisation".

derkling · 2017-12-01T16:38:30Z

devlib/instrument/schedstats.py

+        # create.
+        # We'll create a site for each CPU and a site for each sched_domain.
+        for site, measures in self._get_sample().iteritems():
+            if site.startswith('cpu'):


Is it not more generic to parse the first column (up to excluding the ID) and create a category with that name?
Something like that should do the job:

site_re = regexp.compile("^(?P<category>[a-z]+)(?P<id>[0-9]+)")

IIRC I thought about that, but the thing is that domains are a sub-category of cpus, and there's no way to detect that: the parser has to be aware of this relationship. I can't imagine any other types of category (tasks? They have a separate schedstats file though. Task groups & sched entities would too, if they had schedstats), so I can't predict whether it makes sense to assume that the other category is a sub-entity of cpus, a sub-category of domains, or a new category entirely. Does that make sense?

Edit: Wait, that makes no sense.. I thought this comment was on different code (I assumed it was attached to a line in _get_sample) and interpreted it totally wrong. Will think about this!

Ah OK now I understand you!

To be honest while it would be more generic I don't think this is worth it unless there are more categories.. even with the EAS stats there are still only 2 category types, and even with the addition of tasks there are only 3. I don't think it's worth adding a regex (which nobody likes reading) just to deal with that..

derkling · 2017-12-01T16:43:50Z

devlib/instrument/schedstats.py

+                # it might not be stable.
+                continue
+            else:
+                self.logger.warning(


Dunno if it really can make sense... but... can we think about generating at run-time a category which has the name of this line's first column and a "generic list of measures", e.g. measure1, measure2, measure3... etc..

This will allows to parse Android specific fileds and then map the measures to proper names in the client code.

Yeah that seems like a sensible idea (although in fact it might be easier to just add the EAS-specific schedstats?). Do you mind if we leave that for a separate feature though?

derkling · 2017-12-01T16:44:35Z

devlib/instrument/schedstats.py

+
+        return ret
+
+


Remove empty lines?

bjackman

Thanks for the review! I think I will have to wait until monday to fix it up.

bjackman · 2017-12-01T16:56:22Z

devlib/instrument/schedstats.py

+
+        # On 4.6+ kernels, schedstats needs to be enabled via kernel cmdline or
+        # sysctl. If not, we'll just get a load of zeroes.
+        self.old_sysctl_value = None


Yeah, seems like a good idea, will try it.

bjackman · 2017-12-01T17:03:25Z

devlib/instrument/schedstats.py

+        # create.
+        # We'll create a site for each CPU and a site for each sched_domain.
+        for site, measures in self._get_sample().iteritems():
+            if site.startswith('cpu'):


IIRC I thought about that, but the thing is that domains are a sub-category of cpus, and there's no way to detect that: the parser has to be aware of this relationship. I can't imagine any other types of category (tasks? They have a separate schedstats file though. Task groups & sched entities would too, if they had schedstats), so I can't predict whether it makes sense to assume that the other category is a sub-entity of cpus, a sub-category of domains, or a new category entirely. Does that make sense?

Edit: Wait, that makes no sense.. I thought this comment was on different code (I assumed it was attached to a line in _get_sample) and interpreted it totally wrong. Will think about this!

bjackman · 2017-12-01T17:06:07Z

devlib/instrument/schedstats.py

+                # it might not be stable.
+                continue
+            else:
+                self.logger.warning(


Yeah that seems like a sensible idea (although in fact it might be easier to just add the EAS-specific schedstats?). Do you mind if we leave that for a separate feature though?

This instrument collects snapshots of the CPU/sched_domain scheduler statistics in /proc/schedstat, and reports a Measure for each of them. It currently doesn't support the per-task stats reported in /proc/<pid>/schedstat, and it ignores the additional EAS schedstats reported by Android kernels. The measures are named according to the corresponding identifiers in the kernel source code. Although This might appear to be an obscure way to refer to them, the information in /proc/schedstats is not going to be any use to anyone who doesn't have the kernel code handy. There is an English description in Documentation/scheduler/sched-stats.txt but it can only really be considered a reference for those who already understand the scheduler. Some of the measures are representing time intervals, but their units are jiffies, the meaning of which depends on the kernel configuration. Therefore no units are reported for the measurements.

setrofim reviewed May 11, 2017

View reviewed changes

setrofim requested changes May 11, 2017

View reviewed changes

setrofim reviewed May 11, 2017

View reviewed changes

derkling reviewed May 12, 2017

View reviewed changes

bjackman force-pushed the schedstats branch 5 times, most recently from f7c2a09 to 2e4943f Compare May 23, 2017 16:20

bjackman force-pushed the schedstats branch from 2e4943f to bb15ab8 Compare May 31, 2017 09:17

bjackman force-pushed the schedstats branch 2 times, most recently from 0821c1f to 3675dee Compare October 11, 2017 13:56

bjackman force-pushed the schedstats branch 2 times, most recently from 5db1f83 to 0f45fbc Compare December 1, 2017 16:03

bjackman changed the title ~~[RFC] Add schedstats instrument~~ Add schedstats instrument Dec 1, 2017

bjackman force-pushed the schedstats branch from 0f45fbc to 435ebf6 Compare December 1, 2017 16:13

derkling reviewed Dec 1, 2017

View reviewed changes

devlib/instrument/schedstats.py Outdated

return ret

Copy link

Contributor

derkling Dec 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove empty lines?

bjackman reacted with thumbs up emoji

bjackman commented Dec 1, 2017

View reviewed changes

bjackman force-pushed the schedstats branch from 435ebf6 to 9e16a6b Compare December 1, 2017 17:39

bjackman mentioned this pull request Dec 4, 2017

schedstats: Add schedstats instrument ARM-software/workload-automation#555

Open

Add schedstats instrument #123

Are you sure you want to change the base?

Add schedstats instrument #123

Conversation

bjackman commented May 11, 2017 • edited Loading

setrofim May 11, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

setrofim May 11, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bjackman May 18, 2017 • edited Loading

Choose a reason for hiding this comment

bjackman commented May 23, 2017

setrofim commented Sep 11, 2017

bjackman commented Sep 11, 2017

setrofim commented Sep 11, 2017

bjackman commented Oct 11, 2017

setrofim commented Oct 11, 2017

bjackman commented Oct 11, 2017

setrofim commented Oct 11, 2017

bjackman commented Oct 12, 2017

bjackman commented Dec 1, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bjackman Dec 1, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bjackman left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bjackman Dec 1, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bjackman commented May 11, 2017 •

edited

Loading

setrofim May 11, 2017 •

edited

Loading

setrofim May 11, 2017 •

edited

Loading

bjackman May 18, 2017 •

edited

Loading

bjackman Dec 1, 2017 •

edited

Loading

bjackman Dec 1, 2017 •

edited

Loading