Multi Instance Plugins #1324

narategithub · 2024-01-15T16:52:05Z

This series of commits include:

multi-instance plugin support in ldmsd
apply ovis_ref to cfgobj
Making the following plugins multi-instances:
- store_sos (test: https://github.com/ovis-hpc/ldms-test/blob/stage/multi_store_sos_test)
- store_csv (test: https://github.com/ovis-hpc/ldms-test/blob/stage/multi_store_csv_test)
- store_avro_kafka (test: https://github.com/ovis-hpc/ldms-test/blob/stage/multi_store_avro_kafka_test)
- json_stream_sampler (test: https://github.com/ovis-hpc/ldms-test/blob/stage/multi_json_stream_sampler_test)
- test_sampler (test: https://github.com/ovis-hpc/ldms-test/blob/stage/multi_test_sampler_test)
- procnetdev2 (test: https://github.com/ovis-hpc/ldms-test/blob/stage/multi_json_stream_sampler_test)

This was a copy-paste typo. It should be LDMS_STREAM_EVENT_UNSUBSCRIBE_STATUS, not LDMS_STREAM_EVENT_SUBSCRIBE_STATUS.

This patch add multi-instance plugin infrastructure in ldmsd.

- Add plugin instance. - Update format for consistency.

NOTE: This passed `ldms-test/multi_store_sos_test`, which tests both multi-instance loading and traditional loading.

NOTE: This passed `ldms-test/multi_json_stream_sampler_test` which tests both multi-instance loading and traditional loading. It also tests terminating the plugin instance.

When ldmsd daemonized (fork + exec), `zap_init()` function is called again from the pthread atfork call chain. Because the logger has already been registered before daemonizing, `ovis_log_register()` failed to register the same logger after ldmsd was daemonized. This patch adds a checking condition to not register the logger if we alread have the log handle.

This passed `ldms-test/multi_store_csv_test`.

This passed `ldms-test/multi_store_avro_kafka_test`.

- Make procnetdev2 multiple instances - Remove the 32 device list limitation - Add `excludes` option to exclude the listed interfaces - For example, excludes=lo,eth0 NOTE: passed 'ldms-test/multi_procnetdev2_test'

morrone · 2024-01-17T23:52:56Z

I would really, really like to see some changes to the approach taken here.

First of all, I would love to see plugins only have to have implement known API functions, rather than bundling function pointers into an ldmsd_plugin struct. The ldmsd_plugin struct, and how it tracks the functions pointers from a plugin, are ldmsd implementation details and should not be exposed to the plugin author.

Next, I would really, really, really want to see us do away with this practice of forcing plugin authors to play tricks of wrapping the ldmsd_plugin struct with hidden state data after the end. The plugin instance's state data should be passed back explicitly when the intance is initialized (for instance, it could be the return value from config()). The state data for an instance should be explicitly passed in to every subsequent call. This is pretty standard in most C code that I see. Moving things from hidden-and-implicit to being explicit in the API will make things much easier to understand for new plugin authors.

We need a new plugin API call to allow the plugin to express whether or not it is multi-instance capable, and ldmsd should guarantee that if the answer is "no", that it will not allow attempts to configure more than one instance.

I think it may be in scope, but maybe this is more debateable: we need the API calls fully documented in the header file, including their semantics. As things stand, the semantics are muddy at best. For insance, it should absolutely be enforced by ldmsd that it will never call deconstructor functions on an instance that has not had its constructor called. The sample() functon should never becalled on something that has not had its constructor called, or after the destructor has run. We shouldn't have to have every single plugin author reinvent the wheel when we can handle these semantics once in ldmsd, and document the semantics.

morrone

I would like to see API changes.

morrone · 2024-01-18T18:37:17Z

I don't know how much further review I should continue to do. If my high level requests are addressed, I think a number of things will naturally fall out of that and get better.

For instance, the lack of symmetry and clarity in just allocating and freeing the memory for plugin instances needs to be addressed. The plugin author (at least for a store that I am looking at right now) needs to allocate the instance state in get_plugin_instance(). But there is no matching put_plugin_instance(). Instead the author must free the memory in ptr->store.base.term().

We need to call "ldmsd_store_alloc()" in get_plugin_instance(). We would then expect to call ldmsd_store_free() in term(), right? But instead we need to call ldmsd_store_cleanup() and then free().

morrone · 2024-01-18T18:41:27Z

ldms/src/ldmsd/ldmsd.h

-extern int ldmsd_config_init(char *name);
-struct ldmsd_plugin_cfg *ldmsd_get_plugin(char *name);
+struct ldmsd_plugin *ldmsd_get_plugin(const char *name);
+void ldmsd_put_plugin(struct ldmsd_plugin *pi);


I am not a fan of having both ldmsd_put_plugin and ldmsd_plugin_put. Too close in naming, and there is no documentation here about what they do either.

morrone · 2024-01-18T18:48:26Z

What are the symantics of sample() in the new paradigm? Does sample() need to be reentrant, or will ldmsd ensure that it is not required?

narategithub · 2024-01-19T15:10:38Z

@morrone Thank you for a detailed review :)

I think the structure extension is quite standard practice. And I'm not sure if we should force the names of the functions. I think @tom95858 can elaborate on this matter better than me.

After plugin creation, all calls from ldmsd to the plugin always supply the plugin structure that was created by the plugin. It is the same as self in Python. The plugin can track its state in the extended part of the structure.

I agree with you that the plugin creation is quite cumbersome. I'll change the ldmsd_sampler_alloc() and ldmsd_store_alloc() to also receive the necessary options and all common initialization can be done in there.

The existing plugins that does not have get_plugin_instance() is guaranteed to have only 1 instance. If the plugin implements get_plugin_instance(), then it must support multi instances.

sample() is thread safe by plugin instance. What I mean is that sample() function of a plugin (e.g. procnetdev2) can be called simultaneously but for different instances of that function.

There is also a lack of developer documentation that shall be addressed. The asymmetric that you mention should be either addressed or explained in the next update.

Again, thank you for the review :)

morrone · 2024-01-19T18:27:39Z

I think the structure extension is quite standard practice.

I can't speak to whether it is common in some circles, but it certainly isn't standard. I think you would be hard pressed to find that called out as a best practice in text books. In fact in most uses, it is not the best approach. There is no good reason to make the plugin's state data a hidden data at the end of ldmsd's internal pointer structure, when we can easily make them separate and explicit.

I think there are various reasons why this is wrong from a code readability and maintainability standpoint, but those would take a long discussion of softrware development pracitices to make a good case for. Here is a clear technical reason why this is a bad approach:

This approach virtually guarantees that plugin ABI compatibility is impossible. Any time we add a function to the API, or tweak the structure in any way, all plugins must be recompiled and tracked against a specific build of LDMS.

So one big reason that I believe that my proposed approach to plugin API design is that it entirely allows for making backwards-compatible API improvements in a clean and straight-forward way. New optional API functions do not break the ABI.

morrone · 2024-01-19T19:41:01Z

Temporarily, to move the conversation forward, I'll set aside the idea of exporting all the of the API functions in a plugin module (rather than exporting one function that returns the struct of API function pointers). I think we can address my other major concern pretty easily.

To try to expand on why I think it is bad practice, in this case, to hide an instance's data at the end of the API function pointer structure, I think basically we are failing to have good separation of concerns. We are creating a mishmash of different concepts that don't really go together. In this case, I think there should be a clear division between:

Loading and discovering a plugin's exported functions (and by "plugin", here I mean the shared-library module file)
Launching and interacting with instances (of a sampler, store, etc) using the plugin's exported functions

In the proposed change, these are combined. But we should only load the module/plugin and learn its' exported functions once. On the other hand, instances and their state need to be capable of multiple interactions.

This PR proposes that we replace get_plugin() with get_plugin_instance(). Instead of that, how about we keep get_plugin(), and make its job to just return the struct of API function pointers. And we do not hide any plugin state hanging off the end of that structure.

Next we introduce a new pair of functions to the API struct:

  /* Construct a new instance
   * RETURN: Opaque pointer to the new instance's internal state, or NULL is unable to do so
   */
  void *instance_init([whatever parameters]);

  /* Destroy an instance 
    * IN self: Opaque pointer to the instance's internal state
    */
  void instance_fini(void *self);

By doing this, we eliminate any need to combine the API pointer struct (which is something that is only used internally in ldmsd), with the instance state (which is only something used internally in the plugin).

Since we are changing the API, I'd like to take the opportunity to improve naming a bit. Instead of calling it "get_plugin", maybe the name could be:

ldmsdplug_import

And perhaps we could then at least export a second function named something like:

ldmsdplug_export

I think these suggestions untangle the the two concepts (a module and an instance), improves encapsulation, and helps to introduce a stronger degree of degree of symmetry to the API.

tom95858 · 2024-01-22T23:05:06Z

Hi @narategithub, we have mixed the ldmsd_noun_verb_ function naming and ldmsd_verb_noun_ in some places. We should make these consistent in the API. I think most of the API is ldmsd_noun_verb().

tom95858 · 2024-01-22T23:29:47Z

@morrone, @narategithub let's get a bit more specific:

Requirements

Support a single plugin with multiple configurations
Support existing plugins that do not support multiple configurations
Make it easier, not harder to write plugins
Better documentation

Currently Non Requirements (but may be desirable)

Support running OVIS-4.3.x plugins in a OVIS-4.4.2 ldmsd
- This creates an impossibly complicated test matrix that I don't believe that we want to undertake
Completely hide a plugin's context data inside a structure understood only by the plugin
- It is currently not the case that a plugin's data is entirely private to the plugin, the most obvious example is the sample interval and offset. These are known to the updater in order to schedule calls to sample()
- We could however, have plugin functions that export the data that needs to be shared, e.g. interval/offset.

Other Thoughts

Having a base structure that is overloaded to contain additional data for those plugins/libraries/clients that understand the extensions to the structure is quite common in my experience; especially in the Linux kernel. I don't have any numbers on how often and/or if this paradigm is frowned upon in academic circles. That said we could trivially avoid this with a context pointer in the self pointer that is handed around.

Let's maybe plan a white-board meeting on Zoom to toss around ideas?

morrone · 2024-01-23T04:31:54Z

I don't think anyone is suggesting the "Currently Non Requirements", so I think we're all good there.

Having a base structure that is overloaded to contain additional data for those plugins/libraries/clients that understand the extensions to the structure is quite common in my experience; especially in the Linux kernel.

First, the kernel is not always the best example for user-space programming decisions. The kernel needs to deal with lower level details that often explain what they are doing. And often the kernel has very good reasons for the niche techniques that it employs that would not also apply here. And of course historically some of the kernel's C usage has been gcc-specific rather than generally C-compliant. And I think if we look closer at those use cases, we'll find that they don't apply here.

I don't think it is only academics that would frown. :)

This isn't really "subclassing". We're in C, not C++, and there are probably quite a few more things that would need to be done to make this qualify as subclassing. And more importantly, we are talking about designing an API, and we are not setting up proper abstraction across that API.

But let's consider for a moment that this is subclassing. If that were the case, then its an example of bad subclassing. It is a mistake in code abstraction. The plugin never calls a single method in the class. It doesn't even employ any of the data in the base classes. The only data it uses is its own internal data, which is hidden at the end of the class. Combining unrelated classes in a memory management trick would be less than desirable coding practice. So there is a clear division in use of data and methods, but for some reason, this code smashes them together.

This is going to be strange and unexpected for any developer new to this code base. I can tell you it already was when I started on LDMS, and this moves in the directions of even more unusual. It violates the Principle of Least Surprise. And like you said, one or our goals is "Make it easier, not harder to write plugins".

And like I said, and I do think this is important for any API design, we are designing an API in which backwards compatible evolution of the API (retaining ABI compatibility) is just about impossible.

I agree that putting a context pointer in "self" would be better, but far better still would be for "self" to be the context pointer.

I'm up for a Zoom meeting!

baallan · 2024-01-30T21:57:15Z

Setting aside my thoughts about the instance model mechanics, where is the discussion of the content (methods and their lifecycle semantics for multinstance plugins) of the APIs for samplers, stores (and if we're ambitious, transforms?) I didn't see it in the headers and at least one thing I did see in a header (get_set) rather disturbs me. Maybe the cart is before the horse?

for example, it would strongly simplify sampler and stream plugins (eliminate the coder needing to enforce data safety with explicit use of pthreads) if the "what will be called when and what will not" was fully documented (as has come up in among other discussions, the dcgm plugin).

for example, taking a sampler which does not work by subscribing to a stream, we should have the framework guarantee an order of operations something like:

load
config[,config*]
start
sample
stop
[config*; start; stop]
term

meaning that stuff in [] is possible/optional and that the ordering above is strict (config and term are not called while between start and stop; config is called at least once; stop is called before term; nothing else is called while a sample is in progress). The central problem we currently have is that config/stop/term can all happen due to the command line while the sample loop is on a timer.

morrone · 2024-01-31T02:07:21Z

@baallan I do generally agree with all of that. I was alluding to the same thing when I questioned the semantics earlier.

It looks like get_set() is something that is being brought forward into the updated API, when it really should just be eliminated. I'm not even sure of its existing semantics, let alone what they might be in the new API. Plenty of the new samplers just return NULL to get_set(), so it can't be very important?

morrone · 2024-01-31T02:20:04Z

ldms/src/sampler/procnetdev2/procnetdev2.c


-struct ldmsd_plugin *get_plugin()
+static void __once()


The existence of this __once() function tells us that there is a problem with the API design. No plugin author should need to do this. We are failing to give the author a place to do global initialization (and also missing the matching place to tear down global state at the end) that will be used, regardless of how many instances there may be.

We could fix this by going with my suggestion (or something similar) of having ldmsd_import()/ldmsd_export() calls (that are independant of the instance_init()/instance_fini() methods).

baallan · 2024-01-31T16:08:28Z

Re get_set:
If it exists, something like it should be a utility function provided by the framework (which is managing underlying lists anyway) and not put on the plugin author to implement.

I can picture the utility of 3 functions (pseudocode, not derived from any current code) managing lists of objects:

// let people find the plugins in which they may be interested
ldmsd_plugin_class_iterator ldmsd_plugin_class_iterator_get()

// let people find the instances of a specific factory plugin in which they may be interested
ldmsd_plugin_class_instance_iterator ldmsd_plugin_class_instance_iterator_get(ldmsd_plugin_class c)

// let people find the entire list of sets registered by a single plugin instance (whether factory or singleton)
ldmsd_plugin_instance_sets_iterator_get(ldmsd_plugin_instance i)

yes of course this is c and not c++, so maybe we should introduce the word factory (singleton plugins and factory plugins) to clarify the behavior patterns for new developers. For some samplers and stores, singleton really is the obvious implementation and it should be easy, while for others the factory pattern may be the most appropriate.
And of course in the // comments by 'people' I really mean generalized plugins which do things like transform data as it flows through the pipeline.

I don't see the value in trying to make a single set of function names which encapsulates both singleton and factory plugins. That just introduces ambiguities and conditional behavior points in the documentation of api functions, which makes the documentation hard to write and maintain.

narategithub added 13 commits December 11, 2023 17:06

Fix typo in ldms_stream_cb (LDMS_STREAM_EVENT_UNSUBSCRIBE_STATUS)

14e77fb

This was a copy-paste typo. It should be LDMS_STREAM_EVENT_UNSUBSCRIBE_STATUS, not LDMS_STREAM_EVENT_SUBSCRIBE_STATUS.

Add LDMS_STREAM_EVENT_CLOSE

da8a54c

Add multi-instance plugin support in ldmsd

cfefbfc

This patch add multi-instance plugin infrastructure in ldmsd.

Update ldmsd_controller help string & manpage

6ad5173

- Add plugin instance. - Update format for consistency.

Apply ovis_ref to ldmsd cfgobj

12dad77

Add multi-instance support in store_sos

c81a9f4

NOTE: This passed `ldms-test/multi_store_sos_test`, which tests both multi-instance loading and traditional loading.

Add multi-instance support in json_stream_sampler

8e6c7d7

NOTE: This passed `ldms-test/multi_json_stream_sampler_test` which tests both multi-instance loading and traditional loading. It also tests terminating the plugin instance.

Add cfgobj debug ref logging

a891b87

store_csv multi-instance

6f72860

This passed `ldms-test/multi_store_csv_test`.

store_avro_kafka to support multi-instances

5d818c3

This passed `ldms-test/multi_store_avro_kafka_test`.

Muitl-instance test_sampler

914c283

multi-instance procnetdev2

0b323ab

- Make procnetdev2 multiple instances - Remove the 32 device list limitation - Add `excludes` option to exclude the listed interfaces - For example, excludes=lo,eth0 NOTE: passed 'ldms-test/multi_procnetdev2_test'

narategithub requested a review from tom95858 January 15, 2024 16:52

morrone self-requested a review January 18, 2024 00:04

morrone suggested changes Jan 18, 2024

View reviewed changes

morrone reviewed Jan 18, 2024

View reviewed changes

morrone reviewed Jan 31, 2024

View reviewed changes

tom95858 closed this Nov 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multi Instance Plugins #1324

Multi Instance Plugins #1324

narategithub commented Jan 15, 2024

morrone commented Jan 17, 2024

morrone left a comment

morrone commented Jan 18, 2024

morrone Jan 18, 2024 •

edited

Loading

morrone commented Jan 18, 2024

narategithub commented Jan 19, 2024

morrone commented Jan 19, 2024

morrone commented Jan 19, 2024 •

edited

Loading

tom95858 commented Jan 22, 2024

tom95858 commented Jan 22, 2024

morrone commented Jan 23, 2024

baallan commented Jan 30, 2024

morrone commented Jan 31, 2024

morrone Jan 31, 2024

baallan commented Jan 31, 2024

Multi Instance Plugins #1324

Multi Instance Plugins #1324

Conversation

narategithub commented Jan 15, 2024

morrone commented Jan 17, 2024

morrone left a comment

Choose a reason for hiding this comment

morrone commented Jan 18, 2024

morrone Jan 18, 2024 • edited Loading

Choose a reason for hiding this comment

morrone commented Jan 18, 2024

narategithub commented Jan 19, 2024

morrone commented Jan 19, 2024

morrone commented Jan 19, 2024 • edited Loading

tom95858 commented Jan 22, 2024

tom95858 commented Jan 22, 2024

Requirements

Currently Non Requirements (but may be desirable)

Other Thoughts

morrone commented Jan 23, 2024

baallan commented Jan 30, 2024

morrone commented Jan 31, 2024

morrone Jan 31, 2024

Choose a reason for hiding this comment

baallan commented Jan 31, 2024

morrone Jan 18, 2024 •

edited

Loading

morrone commented Jan 19, 2024 •

edited

Loading