-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi Instance Plugins #1324
Multi Instance Plugins #1324
Conversation
This was a copy-paste typo. It should be LDMS_STREAM_EVENT_UNSUBSCRIBE_STATUS, not LDMS_STREAM_EVENT_SUBSCRIBE_STATUS.
This patch add multi-instance plugin infrastructure in ldmsd.
- Add plugin instance. - Update format for consistency.
NOTE: This passed `ldms-test/multi_store_sos_test`, which tests both multi-instance loading and traditional loading.
NOTE: This passed `ldms-test/multi_json_stream_sampler_test` which tests both multi-instance loading and traditional loading. It also tests terminating the plugin instance.
When ldmsd daemonized (fork + exec), `zap_init()` function is called again from the pthread atfork call chain. Because the logger has already been registered before daemonizing, `ovis_log_register()` failed to register the same logger after ldmsd was daemonized. This patch adds a checking condition to not register the logger if we alread have the log handle.
This passed `ldms-test/multi_store_csv_test`.
This passed `ldms-test/multi_store_avro_kafka_test`.
- Make procnetdev2 multiple instances - Remove the 32 device list limitation - Add `excludes` option to exclude the listed interfaces - For example, excludes=lo,eth0 NOTE: passed 'ldms-test/multi_procnetdev2_test'
I would really, really like to see some changes to the approach taken here. First of all, I would love to see plugins only have to have implement known API functions, rather than bundling function pointers into an ldmsd_plugin struct. The ldmsd_plugin struct, and how it tracks the functions pointers from a plugin, are ldmsd implementation details and should not be exposed to the plugin author. Next, I would really, really, really want to see us do away with this practice of forcing plugin authors to play tricks of wrapping the ldmsd_plugin struct with hidden state data after the end. The plugin instance's state data should be passed back explicitly when the intance is initialized (for instance, it could be the return value from config()). The state data for an instance should be explicitly passed in to every subsequent call. This is pretty standard in most C code that I see. Moving things from hidden-and-implicit to being explicit in the API will make things much easier to understand for new plugin authors. We need a new plugin API call to allow the plugin to express whether or not it is multi-instance capable, and ldmsd should guarantee that if the answer is "no", that it will not allow attempts to configure more than one instance. I think it may be in scope, but maybe this is more debateable: we need the API calls fully documented in the header file, including their semantics. As things stand, the semantics are muddy at best. For insance, it should absolutely be enforced by ldmsd that it will never call deconstructor functions on an instance that has not had its constructor called. The sample() functon should never becalled on something that has not had its constructor called, or after the destructor has run. We shouldn't have to have every single plugin author reinvent the wheel when we can handle these semantics once in ldmsd, and document the semantics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see API changes.
I don't know how much further review I should continue to do. If my high level requests are addressed, I think a number of things will naturally fall out of that and get better. For instance, the lack of symmetry and clarity in just allocating and freeing the memory for plugin instances needs to be addressed. The plugin author (at least for a store that I am looking at right now) needs to allocate the instance state in get_plugin_instance(). But there is no matching put_plugin_instance(). Instead the author must free the memory in ptr->store.base.term(). We need to call "ldmsd_store_alloc()" in get_plugin_instance(). We would then expect to call ldmsd_store_free() in term(), right? But instead we need to call ldmsd_store_cleanup() and then free(). |
extern int ldmsd_config_init(char *name); | ||
struct ldmsd_plugin_cfg *ldmsd_get_plugin(char *name); | ||
struct ldmsd_plugin *ldmsd_get_plugin(const char *name); | ||
void ldmsd_put_plugin(struct ldmsd_plugin *pi); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a fan of having both ldmsd_put_plugin and ldmsd_plugin_put. Too close in naming, and there is no documentation here about what they do either.
What are the symantics of sample() in the new paradigm? Does sample() need to be reentrant, or will ldmsd ensure that it is not required? |
@morrone Thank you for a detailed review :) I think the structure extension is quite standard practice. And I'm not sure if we should force the names of the functions. I think @tom95858 can elaborate on this matter better than me. After plugin creation, all calls from I agree with you that the plugin creation is quite cumbersome. I'll change the The existing plugins that does not have
There is also a lack of developer documentation that shall be addressed. The asymmetric that you mention should be either addressed or explained in the next update. Again, thank you for the review :) |
I can't speak to whether it is common in some circles, but it certainly isn't standard. I think you would be hard pressed to find that called out as a best practice in text books. In fact in most uses, it is not the best approach. There is no good reason to make the plugin's state data a hidden data at the end of ldmsd's internal pointer structure, when we can easily make them separate and explicit. I think there are various reasons why this is wrong from a code readability and maintainability standpoint, but those would take a long discussion of softrware development pracitices to make a good case for. Here is a clear technical reason why this is a bad approach: This approach virtually guarantees that plugin ABI compatibility is impossible. Any time we add a function to the API, or tweak the structure in any way, all plugins must be recompiled and tracked against a specific build of LDMS. So one big reason that I believe that my proposed approach to plugin API design is that it entirely allows for making backwards-compatible API improvements in a clean and straight-forward way. New optional API functions do not break the ABI. |
Temporarily, to move the conversation forward, I'll set aside the idea of exporting all the of the API functions in a plugin module (rather than exporting one function that returns the struct of API function pointers). I think we can address my other major concern pretty easily. To try to expand on why I think it is bad practice, in this case, to hide an instance's data at the end of the API function pointer structure, I think basically we are failing to have good separation of concerns. We are creating a mishmash of different concepts that don't really go together. In this case, I think there should be a clear division between:
In the proposed change, these are combined. But we should only load the module/plugin and learn its' exported functions once. On the other hand, instances and their state need to be capable of multiple interactions. This PR proposes that we replace get_plugin() with get_plugin_instance(). Instead of that, how about we keep get_plugin(), and make its job to just return the struct of API function pointers. And we do not hide any plugin state hanging off the end of that structure. Next we introduce a new pair of functions to the API struct:
By doing this, we eliminate any need to combine the API pointer struct (which is something that is only used internally in ldmsd), with the instance state (which is only something used internally in the plugin). Since we are changing the API, I'd like to take the opportunity to improve naming a bit. Instead of calling it "get_plugin", maybe the name could be:
And perhaps we could then at least export a second function named something like:
I think these suggestions untangle the the two concepts (a module and an instance), improves encapsulation, and helps to introduce a stronger degree of degree of symmetry to the API. |
Hi @narategithub, we have mixed the ldmsd_noun_verb_ function naming and ldmsd_verb_noun_ in some places. We should make these consistent in the API. I think most of the API is ldmsd_noun_verb(). |
@morrone, @narategithub let's get a bit more specific: Requirements
Currently Non Requirements (but may be desirable)
Other ThoughtsHaving a base structure that is overloaded to contain additional data for those plugins/libraries/clients that understand the extensions to the structure is quite common in my experience; especially in the Linux kernel. I don't have any numbers on how often and/or if this paradigm is frowned upon in academic circles. That said we could trivially avoid this with a context pointer in the self pointer that is handed around. Let's maybe plan a white-board meeting on Zoom to toss around ideas? |
I don't think anyone is suggesting the "Currently Non Requirements", so I think we're all good there.
First, the kernel is not always the best example for user-space programming decisions. The kernel needs to deal with lower level details that often explain what they are doing. And often the kernel has very good reasons for the niche techniques that it employs that would not also apply here. And of course historically some of the kernel's C usage has been gcc-specific rather than generally C-compliant. And I think if we look closer at those use cases, we'll find that they don't apply here. I don't think it is only academics that would frown. :) This isn't really "subclassing". We're in C, not C++, and there are probably quite a few more things that would need to be done to make this qualify as subclassing. And more importantly, we are talking about designing an API, and we are not setting up proper abstraction across that API. But let's consider for a moment that this is subclassing. If that were the case, then its an example of bad subclassing. It is a mistake in code abstraction. The plugin never calls a single method in the class. It doesn't even employ any of the data in the base classes. The only data it uses is its own internal data, which is hidden at the end of the class. Combining unrelated classes in a memory management trick would be less than desirable coding practice. So there is a clear division in use of data and methods, but for some reason, this code smashes them together. This is going to be strange and unexpected for any developer new to this code base. I can tell you it already was when I started on LDMS, and this moves in the directions of even more unusual. It violates the Principle of Least Surprise. And like you said, one or our goals is "Make it easier, not harder to write plugins". And like I said, and I do think this is important for any API design, we are designing an API in which backwards compatible evolution of the API (retaining ABI compatibility) is just about impossible. I agree that putting a context pointer in "self" would be better, but far better still would be for "self" to be the context pointer. I'm up for a Zoom meeting! |
Setting aside my thoughts about the instance model mechanics, where is the discussion of the content (methods and their lifecycle semantics for multinstance plugins) of the APIs for samplers, stores (and if we're ambitious, transforms?) I didn't see it in the headers and at least one thing I did see in a header (get_set) rather disturbs me. Maybe the cart is before the horse? for example, it would strongly simplify sampler and stream plugins (eliminate the coder needing to enforce data safety with explicit use of pthreads) if the "what will be called when and what will not" was fully documented (as has come up in among other discussions, the dcgm plugin). for example, taking a sampler which does not work by subscribing to a stream, we should have the framework guarantee an order of operations something like:
meaning that stuff in [] is possible/optional and that the ordering above is strict (config and term are not called while between start and stop; config is called at least once; stop is called before term; nothing else is called while a sample is in progress). The central problem we currently have is that config/stop/term can all happen due to the command line while the sample loop is on a timer. |
@baallan I do generally agree with all of that. I was alluding to the same thing when I questioned the semantics earlier. It looks like get_set() is something that is being brought forward into the updated API, when it really should just be eliminated. I'm not even sure of its existing semantics, let alone what they might be in the new API. Plenty of the new samplers just return NULL to get_set(), so it can't be very important? |
|
||
struct ldmsd_plugin *get_plugin() | ||
static void __once() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existence of this __once() function tells us that there is a problem with the API design. No plugin author should need to do this. We are failing to give the author a place to do global initialization (and also missing the matching place to tear down global state at the end) that will be used, regardless of how many instances there may be.
We could fix this by going with my suggestion (or something similar) of having ldmsd_import()/ldmsd_export() calls (that are independant of the instance_init()/instance_fini() methods).
Re get_set: I can picture the utility of 3 functions (pseudocode, not derived from any current code) managing lists of objects:
yes of course this is c and not c++, so maybe we should introduce the word factory (singleton plugins and factory plugins) to clarify the behavior patterns for new developers. For some samplers and stores, singleton really is the obvious implementation and it should be easy, while for others the factory pattern may be the most appropriate. I don't see the value in trying to make a single set of function names which encapsulates both singleton and factory plugins. That just introduces ambiguities and conditional behavior points in the documentation of api functions, which makes the documentation hard to write and maintain. |
This series of commits include:
ovis_ref
to cfgobjstore_sos
(test: https://github.com/ovis-hpc/ldms-test/blob/stage/multi_store_sos_test)store_csv
(test: https://github.com/ovis-hpc/ldms-test/blob/stage/multi_store_csv_test)store_avro_kafka
(test: https://github.com/ovis-hpc/ldms-test/blob/stage/multi_store_avro_kafka_test)json_stream_sampler
(test: https://github.com/ovis-hpc/ldms-test/blob/stage/multi_json_stream_sampler_test)test_sampler
(test: https://github.com/ovis-hpc/ldms-test/blob/stage/multi_test_sampler_test)procnetdev2
(test: https://github.com/ovis-hpc/ldms-test/blob/stage/multi_json_stream_sampler_test)