What is the Maximum Ingestion rate that is acceptable by IPFIXCol2? #95
Hi Team,
We are planning to use IPFIXCol2 as the collector for our NetFlow collection, as a replacement for our current vendor tool. The maximum ingestion rate that the vendor tool can accept is 8 million flows/min. I don't see any note about the maximum ingestion rate that IPFIXCol2 can process per minute. Can you please help with figures on the load it can accept?
Kind Regards,
Sanky

Comments
Hello, there's no easy answer to this question. A lot depends on the actual structure of the data sent by your probes and what you plan to do with the flow records (e.g. save them to a binary file or forward them in JSON format to other systems). Typical use-cases differ considerably in achievable throughput, and the numbers also depend on the performance of your hardware and a number of other factors.
Lukas
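For orientation, a minimal startup configuration covering the two use-cases mentioned (UDP input, JSON output to files) might look roughly as follows. This is only a sketch based on the usual IPFIXCol2 configuration layout; the port, path and file-output parameters are illustrative assumptions, not values from this thread.

<ipfixcol2>
    <inputPlugins>
        <input>
            <name>UDP collector</name>
            <plugin>udp</plugin>
            <params>
                <!-- port where the routers export NetFlow v9/IPFIX (placeholder) -->
                <localPort>4739</localPort>
            </params>
        </input>
    </inputPlugins>
    <outputPlugins>
        <output>
            <name>JSON output</name>
            <plugin>json</plugin>
            <params>
                <outputs>
                    <!-- store converted records as JSON files (path is a placeholder) -->
                    <file>
                        <name>Store to files</name>
                        <path>/tmp/ipfixcol2/%Y/%m/%d/</path>
                        <prefix>json.</prefix>
                        <timeWindow>300</timeWindow>
                    </file>
                </outputs>
            </params>
        </output>
    </outputPlugins>
</ipfixcol2>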
Just note that my values are in units per second (not per minute). In other words, I would say that our collector is considerably faster.
Lukas
Thanks Lukas for your response. Most of the flow records sent by our sources are based on Template ID 258, and we want to collect, enrich and push them to a datastore in JSON format, so to answer your question, it will be a JSON forward. Even if I consider 400K records per second, that is still 2.4 million per minute, which will not reach the level we have, which is 8 million per minute.
I think you've miscalculated a bit: 400k x 60 s = 24 million flows/min. As for Template ID 258, unfortunately the template specification is dynamic and always depends on your specific probes. In other words, it is unfortunately not possible to deduce anything from the number 258 alone.
Lukas
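Spelled out, with the figures quoted in this thread:

    400,000 flows/s x 60 s/min = 24,000,000 flows/min

which is three times the stated requirement of 8,000,000 flows/min.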
Ah, I'm bad at maths :) Yep, 24 million. Thanks for correcting. I don't think the template spec is dynamic; this page gives a view of what Template 258 should generally look like: https://docs.fortinet.com/document/fortigate/7.2.8/administration-guide/448589/netflow-templates#258
The structure of the referenced Template is relatively simple, and the required performance targets should be achievable. By the way, the definition of templates always depends on the specific implementation of the probe. For example, I usually work with probes that can send completely different flow fields under the same Template ID after a restart. In fact, the NetFlow/IPFIX protocol does not say that the fields must always be the same under one Template ID. That's why I mentioned that templates are dynamic. However, if I understand correctly, in your case all flow data sources are devices from the same manufacturer, which guarantees that the same Template ID always carries the same fields. OK, why not.
Lukas
Thanks Lukas for the explanation. We don't use probes to export flows. The routers themselves export the flows, with a standard configuration across the board. As you said, NetFlow v9/IPFIX doesn't require the same fields under one template, and I understand it is dynamic, but there is a standard set of attributes per the RFC, which I believe is supported by this collector: https://www.ietf.org/rfc/rfc3954.txt We are predominantly Cisco-based, but there are other vendors too. The cases you mentioned also occur here: I have seen flows that don't correspond to the template and are not getting parsed.
Sanky
All fields mentioned in the Template should be supported. The exception is the FLOW_FLAGS field (65), which is not clearly defined in the NetFlow/IPFIX standard. However, the JSON plugin will also convert it if the option to skip unknown fields is not active. The name associated with it will probably be something like en0:id65. Let me know if you need help with anything.
Lukas
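For reference, a sketch of where this option lives in the JSON output configuration. The element name ignoreUnknown is an assumption for the "skip unknown fields" option mentioned above; other parameters are omitted:

<output>
    <name>JSON output</name>
    <plugin>json</plugin>
    <params>
        <!-- false: unknown fields such as FLOW_FLAGS (65) are kept in the
             output under a generic enterprise/field-ID style name -->
        <ignoreUnknown>false</ignoreUnknown>
        ...
    </params>
</output>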
Many thanks, Lukas. Let me try it out.
Sanky
I guess you have run into a typical NetFlow/IPFIX protocol problem. Namely, if you use the UDP transport protocol for the transfer, the collector may not be able to parse the incoming records for some time after it starts. The probes (in your case, routers) periodically resend the template definitions needed to interpret the flow records (look for something like "template refresh/timeout" in the router configuration). Since the collector is usually started later than the probe/router, it missed the definitions and has to wait until they are resent. Depending on the configuration of the probe/router, this can take a few seconds or even a few minutes, during which the corresponding uninterpretable records are ignored.
Lukas
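On the collector side, the UDP input plugin keeps received templates for a configurable lifetime. A sketch of the relevant input section, assuming the plugin's templateLifeTime and optionsTemplateLifeTime parameters; the values are illustrative and should comfortably exceed the routers' template refresh interval:

<input>
    <name>UDP collector</name>
    <plugin>udp</plugin>
    <params>
        <localPort>4739</localPort>
        <!-- seconds after which a template that has not been refreshed is dropped -->
        <templateLifeTime>1800</templateLifeTime>
        <optionsTemplateLifeTime>1800</optionsTemplateLifeTime>
    </params>
</input>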
Thanks Lukas. I read about a similar pattern somewhere. Thanks for the input. I will try it out and get back to you if I face any issues.
One more point I missed asking you, Lukas: with the JSON output plugin, is there a way for us to push the data to standard document stores like an Elasticsearch/OpenSearch cluster, similar to using additional properties for the Kafka push?
Hi, I don't have first-hand experience with your targeted storage. However, I believe it should be possible to use Kafka as a message broker to transfer (and possibly enrich) records.
Lukas
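A sketch of what the Kafka variant could look like in the JSON output plugin, assuming its kafka output section and pass-through properties for the underlying Kafka client; the broker address, topic and property values are placeholders:

<output>
    <name>JSON output</name>
    <plugin>json</plugin>
    <params>
        <outputs>
            <kafka>
                <name>Send to Kafka</name>
                <brokers>127.0.0.1:9092</brokers>
                <topic>ipfix</topic>
                <!-- additional client properties, as mentioned earlier in the thread -->
                <property>
                    <key>compression.codec</key>
                    <value>lz4</value>
                </property>
            </kafka>
        </outputs>
    </params>
</output>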
I'm not very familiar with these data stores either, but based on my quick search it should be possible to use the JSON output in "send" mode together with Logstash, using its TCP input plugin and an OpenSearch/Elasticsearch output plugin (see the sketch below).
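And a sketch of the "send" mode suggested here, pointing the JSON output at a Logstash TCP input; the IP address and port are placeholders:

<output>
    <name>JSON output</name>
    <plugin>json</plugin>
    <params>
        <outputs>
            <send>
                <name>Send to Logstash</name>
                <ip>127.0.0.1</ip>
                <port>5000</port>
                <protocol>TCP</protocol>
                <!-- block instead of dropping records when the receiver is slow -->
                <blocking>yes</blocking>
            </send>
        </outputs>
    </params>
</output>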
Thanks Lukas, Sedmicha. The options you suggested are the backup.
The reason I raised the question is: is there additional-properties support for "send" mode, or are there plans for output plugins that push data to document or time-series datastores, given that flow data corresponds to a specific time snapshot?
Hi, we are seeing the following warning:

    WARNING: UDP collector (parser): [10.37.208.64:50452, ODID: 256] Unexpected Sequence number (expected: 278756670, got: 278756667)
Hi, yes, this is also a typical "feature" of UDP transport. The only solution is to switch to TCP, which I understand is probably not possible due to lack of support on routers. In the example you provided, it looks like a simple reorder of UDP packets (the "expected" sequence number is greater than the sequence number of the "received/got" packet). However, this message (if "expected" < "got") may also indicate that packets were lost somewhere during transmission.
Lukas
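Applying that reading to the warning above:

    expected - got = 278756670 - 278756667 = 3 > 0

so an older packet arrived late (a reorder). If instead "got" were greater than "expected", some records would have been skipped, which would point to packet loss.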
Thanks Lukas. I thought the same, but wanted to reverify it with the experts.
At the moment, you'll probably have to use one of the options you listed. It seems that, to push the data directly to the OpenSearch store, the JSON output plugin would have to support sending data to a specific HTTP endpoint. This is not something that is currently supported, but I could imagine the ability to do so being added in the future.
Thanks Sedmicha. Plugins to push data to specific datastores would be really helpful, mainly document DBs like OpenSearch and Elasticsearch, and time-series DBs like Prometheus, InfluxDB and VictoriaMetrics. If I had to add items to the backlog, the above storage integrations would be my highest-priority items.
Hi, is there a way to get information about the flow source (e.g. the exporter's IP address) included in the JSON records?
Hi, information about the source (e.g. IP address, ODID, ...) is added to the record if the detailedInfo option of the JSON plugin is enabled. Try adding it to your startup configuration file:

<output>
    <name>JSON output</name>
    <plugin>json</plugin>
    <params>
        ...
        <detailedInfo>true</detailedInfo> <!-- set this option to true -->
        ...
    </params>
</output>

Lukas
Works, Lukas. Thanks for your quick help.