
Kafka topic patterns should be per-schema #1205

Open

morrone opened this issue May 11, 2023 · 10 comments

@morrone
Collaborator

morrone commented May 11, 2023

Right now we can only set the Kafka topic name pattern once, when store_avro_kafka is configured. It would be much better to allow us to configure different topic patterns per-schema, per-decomposition, or maybe per-strgp.

A particular use case we have: for some schemas we want to store data in topic names that are distinct for each cluster, while for other schemas we want all of the clusters to feed into topics of the same name.

@tom95858
Collaborator

This would be resolved by the multi-configuration plugin support. As a reminder, what is proposed is as follows:

load name=plugin
config name=plugin x=a y=b z=c
load name=plugin as=plugin-1
config name=plugin-1 x=d y=e z=f
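
Applied to this issue, that could look roughly like the following sketch. The kafka-shared instance name, the topic option, and the %S-style schema substitution are illustrative assumptions, not existing syntax:

load name=store_avro_kafka
config name=store_avro_kafka topic=cluster1-%S
load name=store_avro_kafka as=kafka-shared
config name=kafka-shared topic=%S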

We need this for streams subscription too. We are developing a generic "stream sampler" that consumes JSON and produces LDMS metric sets, which can then be decomposed for storage. Consider that this sampler may want to consume from different streams and may need a different configuration for each stream, as you suggest.

@baallan
Collaborator

baallan commented Jun 20, 2023

If creating a stream-to-set transform plugin, please see also #1217 for the functionality needed to deal with JSON ambiguities.

Also, why are we creating a new keyword/usage "as=foo" when, for plugins and objects in general, we already have "name=foo" for object pointer resolution and "plugin=bar" for the object type/instantiator (both shorthanded in the v2/v3/v4 load implementation)?
Multiple instances of plugins should have different names, just like all our other object types.
Repeated 'load' commands for the same library just return quietly for already-loaded plugins.
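
For concreteness, a sketch of that existing convention applied to load (illustrative only, not current load syntax):

load name=kafka-local plugin=store_avro_kafka
config name=kafka-local x=a y=b z=c
load name=kafka-shared plugin=store_avro_kafka
config name=kafka-shared x=d y=e z=f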

Finally, it's not at all clear to me why I would want full-blown sets (and the implication that down-stream aggregators will see them and move them, with the timing and sizing issues it entails for user-derived streams) made out of streams. Are you proposing making pseudo-set objects for the purpose of the store plugin->store command and storage policy logic? Or are you proposing full-on set instance with all the inter-daemon work this entails?

@tom95858
Collaborator

@baallan, are you quibbling with the syntax? If so, the reason is that using as="" allows us to stay backward compatible with the existing load command syntax.

I suppose we could create a whole new command:

new-command name=<config-name> plugin=<plugin-name>

Personally, I think that would be more confusing than adding as= to the existing command. We could certainly change "as=" to another keyword, maybe "config=" or something.

I think you are confused about load; it does not create a configuration object. That is what the proposed change would do.

I couldn't parse the remainder of your comment(s).

@tom95858
Collaborator

@morrone I'm wrong. The multi-config thing doesn't fix this at all. I think you need additional syntax to bind schemas to topic formats. Something like:

topic="":"","":""

You would need this kind of binding with or without multiple configurations per plugin.

@tom95858
Collaborator

@morrone Markdown ate my syntax

topic="schema-regex":"topic-fmt-str","schema-regex":"topic-fmt-str"

@baallan
Collaborator

baallan commented Jun 20, 2023

@tom95858 re "I couldn't parse the remainder of your comments."

The remainder of my comments (syntax quibbling aside-- you win there) could be:

If creating a stream to set transform plugin, please see also:
https://github.com/ovis-hpc/ovis/discussions/1217
for needed functionalities to deal with json ambiguities.
...

Finally, it's not at all clear to me why I would want full-blown sets (and the implication that down-stream aggregators will see them and move them, with the timing and sizing issues it entails for user-derived streams) made out of streams. Are you proposing making pseudo-set objects for the purpose of the store plugin->store command and storage policy logic? Or are you proposing full-on set instance with all the inter-daemon work this entails?

Are you saying the comments in #1217 are unparseable, or just the 'Finally ' paragraph?

If just the 'Finally' paragraph didn't parse, then more directly about a "stream to set" plugin:
(a) The 'set' argument to the store plugin commit function appears to be unnecessary in any of the current store plugins that use the strgp and commit() interface instead of store() (store_csv excepted, which uses it to get at udata values). It appears that incoming stream messages could be decomposed via the strgp and committed without any intermediate set instance being required. So why convert streams to sets? Streams have no udata.
(b) Currently we have a rule/implementation that sets are not visible to store plugins on their originating node; this would mean that I cannot convert a stream to a set on an aggregator and also have it stored there. This implies millions of transient-existence sets being handled across aggregation levels. So why convert streams to sets? Just create the arguments for commit(ldmsd_strgp_t strgp, NULL, ldmsd_row_list_t row_list, int row_count), sketched below, and commit using plugins which support commit() at the storage destination.
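
A minimal sketch of the call in (b), assuming the commit() signature quoted above; the surrounding function and the decompose_stream() and store_commit() helpers are hypothetical stand-ins, not actual LDMS API:

    /* Hypothetical stream-to-store path: rows decomposed from a JSON
     * stream message are committed directly, with no set instance. */
    static int stream_commit(ldmsd_strgp_t strgp, json_entity_t msg)
    {
        ldmsd_row_list_t row_list;
        int row_count;

        /* decompose_stream() stands in for whatever produces rows from
         * the JSON message via the strgp's decomposition config */
        int rc = decompose_stream(strgp, msg, &row_list, &row_count);
        if (rc)
            return rc;

        /* NULL where the set argument would go: the store plugin never
         * sees, and never needs, a set instance */
        return store_commit(strgp, NULL, row_list, row_count);
    }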

@tom95858
Collaborator

@baallan Are you talking about the generic JSON storage plugin? What does that have to do with the avro-kafka store?

@baallan
Collaborator

baallan commented Jun 20, 2023

@tom95858 yes, I am referring to:

We need this for streams subscription too. We are developing a generic "stream sampler" that consumes JSON and produces LDMS metric sets, which can then be decomposed for storage. Consider that this sampler may want to consume from different streams and may need a different configuration for each stream, as you suggest.

@morrone
Collaborator Author

morrone commented Jun 22, 2023

@baallan, please have that discussion in another ticket.

@morrone
Collaborator Author

morrone commented Jun 22, 2023

Multi-config could almost solve it, because we can use different strgps with different regexes to map to the different store configs. But strgp regexes only match LDMS schemas, not decomposed schemas. We might want a different topic pattern for each decomposed schema.
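
For illustration, the almost-workable arrangement would be something like this sketch (the instance names, topic option, and strgp regex details are assumptions):

load name=store_avro_kafka as=kafka-local
config name=kafka-local topic=cluster1-%S
strgp_add name=strgp-local plugin=kafka-local regex=^meminfo$ container=kafka
load name=store_avro_kafka as=kafka-shared
config name=kafka-shared topic=%S
strgp_add name=strgp-shared plugin=kafka-shared regex=^slurm container=kafka

The catch is that the regex here is applied to the LDMS schema, so rows decomposed from the same LDMS schema cannot be routed to different topic patterns this way.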

In my ideal world, there would be a way in the decomposition configuration file to pass store-specific configuration options along to the store. For instance, under a row's mapping we might have an additional mapping named "store_options".

    {
        "schema": "foo",
        "cols": [],
        "store_options": {
            "topic_pattern": ""
        }
    }

If we did it that way, we would have to come up with some rules for how we add store-specific options that don't conflict. Stores would just have to ignore options that they don't recognize, which means we basically can't catch typos, since we can't throw errors on unrecognized options. So we could also get more verbose:

    {
        "schema": "foo",
        "cols": [],
        "store_options": {
            "store_avro_kafka": {
                "topic_pattern": ""
            }
        }
    }

We could also move "indices" under the new "store_options".
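
For example, with "indices" relocated under the per-store options (assuming store_sos is the store that consumes "indices", and a typical index definition):

    {
        "schema": "foo",
        "cols": [],
        "store_options": {
            "store_sos": {
                "indices": [
                    { "name": "time_comp", "cols": ["timestamp", "component_id"] }
                ]
            }
        }
    }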

tom95858 added this to the v4.5.1 milestone on Oct 22, 2024