Personal News Analysis
Automatically find relevant news from the Web.

The system systematically retrieves online news articles, enriches them, scans them for keywords, and sends hits to raindrop.io / GetPocket.com. All analysis components are loosely coupled through NATS.io work queues, which also makes it easy to scale CPU-intensive, single-core components.


The system has three NATS queues:

  1. feed-urls - URLs of RSS feeds.
  2. article-urls - URLs of individual articles found in those RSS feeds.
  3. match-urls - URLs of articles that matched a keyword.
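The flow through the three queues can be sketched in a few lines of Python. This is only an in-process illustration: `queue.Queue` stands in for the NATS work queues, and the feed contents, article texts, keyword list, and function names are all hypothetical.

```python
from queue import Queue

# In-process stand-ins for the three NATS queues (assumption: the real
# system uses NATS.io work queues, not queue.Queue).
feed_urls: Queue = Queue()
article_urls: Queue = Queue()
match_urls: Queue = Queue()

# Hypothetical data replacing real feed downloads and article fetches.
FEEDS = {
    "https://example.org/feed.xml": [
        "https://example.org/a",
        "https://example.org/b",
    ],
}
ARTICLES = {
    "https://example.org/a": "New NATS release announced",
    "https://example.org/b": "Weather report",
}
KEYWORDS = ["nats"]


def feed_stage() -> None:
    """Drain feed-urls and publish the article URLs of each feed."""
    while not feed_urls.empty():
        for url in FEEDS.get(feed_urls.get(), []):
            article_urls.put(url)


def match_stage() -> None:
    """Drain article-urls and publish URLs whose text contains a keyword."""
    while not article_urls.empty():
        url = article_urls.get()
        text = ARTICLES.get(url, "").lower()
        if any(keyword in text for keyword in KEYWORDS):
            match_urls.put(url)


def run_pipeline(feeds: list[str]) -> list[str]:
    """Push feeds through both stages and collect the matching URLs."""
    for feed in feeds:
        feed_urls.put(feed)
    feed_stage()
    match_stage()
    hits = []
    while not match_urls.empty():
        hits.append(match_urls.get())
    return hits
```

Each stage only reads from one queue and writes to the next, so a stage can be replaced or run in several copies without the other stages noticing.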

Involved services

All services are orchestrated and scaled with compose.yml.

Custom services

Third party services

Message queue for scaling

Keyword matching is a single-core operation. Instead of letting it block the application, or building a complex multi-threaded matcher, we use the scale option of docker compose to run multiple single-core keyword matching components in parallel, wired together by the message queue. This keeps the individual components straightforward and easy to maintain.
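Because parallelism comes from running more containers, each matcher can remain a plain single-threaded routine. Below is a minimal sketch of such a routine, assuming whole-word, case-insensitive matching. The actual matching rules of keyword-matcher are not described here, and `compile_keywords` and `find_hits` are hypothetical names.

```python
import re


def compile_keywords(keywords: list[str]) -> re.Pattern:
    """Build one case-insensitive, whole-word pattern for all keywords."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    escaped = (re.escape(keyword) for keyword in keywords)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def find_hits(pattern: re.Pattern, text: str) -> list[str]:
    """Return the distinct keywords found in text, lower-cased and sorted."""
    return sorted({match.group(0).lower() for match in pattern.finditer(text)})
```

The pattern is compiled once per process and reused for every article taken from the queue, so the per-message work is a single regular expression scan.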

Keyword matching containers, scaled up

One core per keyword matching

Observability

A typical Prometheus-Loki-Grafana stack is used to monitor application metrics and statistics.

NATS server stats are made available to Prometheus via the Prometheus NATS Exporter.

The keyword-matcher containers log with zerolog and ship their logs to Loki through the Docker Loki logging driver.

A Grafana dashboard ships with the source of the repository.

Comparing Python with Golang

As one of the core components responsible for the main analysis task, keyword-matcher has been ported from Python to Golang, for fun and research purposes. Both implementations of keyword-matcher can run side by side or even compete with each other:

```
NAME                                                 CPU %     MEM USAGE / LIMIT
loki                                                 1.33%     74.55MiB / 7.667GiB
nats-news-analysis_fullfeedrss_1                     0.00%     76.68MiB / 7.667GiB
nats-news-analysis_fullfeedrss_2                     0.01%     70.62MiB / 7.667GiB
nats-news-analysis_grafana_1                         0.17%     35.95MiB / 7.667GiB
nats-news-analysis_keyword-matcher-go_1              0.00%     8.051MiB / 7.667GiB
nats-news-analysis_keyword-matcher-go_2              0.00%     8.422MiB / 7.667GiB
nats-news-analysis_keyword-matcher-go_3              0.00%     8.781MiB / 7.667GiB
nats-news-analysis_keyword-matcher-go_4              0.00%     8.059MiB / 7.667GiB
nats-news-analysis_keyword-matcher-python_1          0.00%     22.64MiB / 7.667GiB
nats-news-analysis_keyword-matcher-python_2          0.00%     23.21MiB / 7.667GiB
nats-news-analysis_keyword-matcher-python_3          0.00%     24.23MiB / 7.667GiB
nats-news-analysis_keyword-matcher-python_4          0.00%     23.8MiB / 7.667GiB
nats-news-analysis_loadbalancer_1                    0.00%     2.316MiB / 7.667GiB
nats-news-analysis_nats-server_1                     1.34%     92.97MiB / 7.667GiB
nats-news-analysis_natsexporter_1                    0.03%     7.41MiB / 7.667GiB
nats-news-analysis_pocket-integration_1              0.00%     18.41MiB / 7.667GiB
nats-news-analysis_prometheus_1                      0.00%     37.22MiB / 7.667GiB
nats-news-analysis_rss-article-url-feeder-go-1st_1   0.05%     15.32MiB / 7.667GiB
nats-news-analysis_rss-article-url-feeder-go-2nd_1   11.46%    12.95MiB / 7.667GiB
```

Here are some interesting stats from Docker and Loki, collected during regular operation:

| Metric | Python | Golang | Comparison |
| --- | --- | --- | --- |
| Docker image size | 424 MB | 6.09 MB | Go impl. is ~70x smaller |
| Memory consumption | 23.8 MiB | 8.33 MiB | Go impl. needs ~3x less memory |
| LoC | 447 | 485 | Python impl. has ~8% fewer lines |
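
The comparison column follows from simple ratios of the measured values. Here is a short check in Python, using only the figures above (the variable names are illustrative):

```python
# Measured values taken from the comparison above.
python_image_mb, go_image_mb = 424, 6.09
python_mem_mib, go_mem_mib = 23.8, 8.33
python_loc, go_loc = 447, 485

# How many times smaller the Go image is (about 69.6, i.e. ~70x).
image_ratio = python_image_mb / go_image_mb

# How many times less memory the Go implementation uses (about 2.86, i.e. ~3x).
memory_ratio = python_mem_mib / go_mem_mib

# Share of lines the Python implementation saves relative to Go (about 7.8%).
loc_saving_percent = (go_loc - python_loc) / go_loc * 100
```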