
feat: profilecli query-blocks merge #3618

Open: wants to merge 1 commit into main


@alsoba13 (Contributor) commented Oct 9, 2024

In this PR we extend the new profilecli command query-blocks with a merge subcommand, analogous to profilecli query merge. With this command, you can run queries directly against a single block stored on your local machine or in a remote bucket.

Partially solves #3559

Main trade-off

Note that, as opposed to profilecli query-blocks series, this command can only query a single block.

Merging data from different blocks is a complex task, and its implementation is spread across the codebase. It involves defining query plans, using streams, and following (or duplicating) the read path. A Pyroscope server handles this easily, but doing it in profilecli would mean duplicating code and introducing a good amount of boilerplate for stream handling. For these reasons, I decided to simplify the capabilities here while still delivering some value, limiting the number of blocks to query to just one.
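A minimal sketch of how the single-block restriction could be enforced before any storage is opened. The helper name `validateBlockIDs` and its error messages are hypothetical and not taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// validateBlockIDs is a hypothetical helper: it accepts exactly one block ID
// and rejects empty or multi-block requests, matching the single-block limit.
func validateBlockIDs(ids []string) (string, error) {
	switch len(ids) {
	case 0:
		return "", errors.New("no block specified: pass exactly one --block-ids value")
	case 1:
		return ids[0], nil
	default:
		return "", fmt.Errorf("merge supports a single block, got %d", len(ids))
	}
}

func main() {
	id, err := validateBlockIDs([]string{"01J9RQ8QENNY6ZEA84K30GZM1C"})
	fmt.Println(id, err)
}
```

Failing early keeps the limitation explicit instead of silently querying only the first ID.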

Funny enough, we should maybe rename merge to another command name here.

Capabilities

This feature offers capabilities similar to profilecli query merge, but for a specified local or remote block:

  • You can choose the profile type with --profile-type or specify a query with --query.
  • You can choose the output format with --output (console, raw, pprof).
  • You can restrict the stack traces with --stacktrace-selector.
  • You can use it locally (--local-path) or remotely (--bucket-name, --tenant-id and --object-store-type; only gcs is supported right now).
  • You specify the block to query with the --block-ids flag.
  • Time ranges (from and to) are not needed: the whole block is queried instead.

Help output:

profilecli query-blocks merge --help
usage: profilecli query-blocks merge [<flags>]

Request merged profile.

Flags:
  -h, --help                     Show context-sensitive help (also try --help-long and --help-man).
      --version                  Show application version.
  -v, --verbose                  Enable verbose logging.
      --local-path="./data/anonymous/local"
                                 Path to blocks directory.
      --bucket-name=BUCKET-NAME  The name of the object storage bucket.
      --object-store-type="gcs"  The type of the object storage (e.g., gcs).
      --block-ids=BLOCK-IDS ...  List of blocks ids to query on
      --tenant-id=TENANT-ID      Tenant id of the queried block for remote bucket
      --query="{}"               Label selector to query.
      --output="console"         How to output the result, examples: console, raw, pprof=./my.pprof
      --profile-type="process_cpu:cpu:nanoseconds:cpu:nanoseconds"
                                 Profile type to query.
      --stacktrace-selector=STACKTRACE-SELECTOR ...
                                 Only query locations with those symbols. Provide multiple times starting with the root
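The --output help text lists console, raw and pprof=./my.pprof. A minimal sketch of how such a value could be split into a kind and an optional file path; the `outputSpec` type, `parseOutput` and the validation rules are hypothetical, not profilecli's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// outputSpec is a hypothetical parsed form of the --output flag.
type outputSpec struct {
	Kind string // "console", "raw" or "pprof"
	Path string // only set for pprof, e.g. "./my.pprof"
}

// parseOutput splits "kind=path" and checks that only pprof carries a path.
func parseOutput(v string) (outputSpec, error) {
	kind, path, hasPath := strings.Cut(v, "=")
	switch kind {
	case "console", "raw":
		if hasPath {
			return outputSpec{}, fmt.Errorf("output %q does not take a path", kind)
		}
		return outputSpec{Kind: kind}, nil
	case "pprof":
		if path == "" {
			return outputSpec{}, errors.New("pprof output needs a file path, e.g. pprof=./my.pprof")
		}
		return outputSpec{Kind: kind, Path: path}, nil
	default:
		return outputSpec{}, fmt.Errorf("unknown output %q", v)
	}
}

func main() {
	spec, err := parseOutput("pprof=./my.pprof")
	fmt.Println(spec.Kind, spec.Path, err) // pprof ./my.pprof <nil>
}
```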

Usage examples:

Querying profiles on a local block, filtering on service_name:

profilecli query-blocks merge --block-ids=01J9RQ8QENNY6ZEA84K30GZM1C --query='{service_name="ride-sharing-app"}' | head
level=info msg="query-block merge" blockIds=[01J9RQ8QENNY6ZEA84K30GZM1C] localPath=./data/anonymous/local bucketName= tenantId= query="{service_name=\"ride-sharing-app\"}" type=process_cpu:cpu:nanoseconds:cpu:nanoseconds
PeriodType:
Period: 0
Samples:
/[dflt]
  580000000: 17 3 4 5 6 19 20 9 10 11 9 12 13 14
  140000000: 2 21 41
 1010000000: 17 104 105 24 4 5 6 19 20 9 10 11 9 12 13 14
   10000000: 138 130 95 86 87
  350000000: 17 3 4 5 6 7 8 9 10 11 9 12 13 14
 28330000000: 1 2 21 22 23 24 4 5 6 19 20 9 10 11 9 12 13 14
...

Querying profiles on a remote block, with raw output:

profilecli query-blocks merge --bucket-name=dev-us-central-0-profiles-dev-001-data --tenant-id=1218 --block-ids=01J9RWPHE83FGAQCE0Z9GAXV4K --query='{service_name="profiles-dev-002/ingester", pod="pyroscope-ingester-1", span_name="HTTP POST - grpc_health_v1_health", __type__="cpu"}' --output raw | head
level=info msg="query-block merge" blockIds=[01J9RWPHE83FGAQCE0Z9GAXV4K] localPath=./data/anonymous/local bucketName=dev-us-central-0-profiles-dev-001-data tenantId=1218 query="{service_name=\"profiles-dev-002/ingester\", pod=\"pyroscope-ingester-1\", span_name=\"HTTP POST - grpc_health_v1_health\", __type__=\"cpu\"}" type=process_cpu:cpu:nanoseconds:cpu:nanoseconds
&googlev1.Profile{
  SampleType: []*googlev1.ValueType{
    &googlev1.ValueType{
      Type: 0,
      Unit: 0,
    },
  },
  Sample: []*googlev1.Sample{
    &googlev1.Sample{
      LocationId: []uint64{
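Both outputs show merged samples as a value plus a stack of location IDs. As a simplified illustration of what "merge" means here (not Pyroscope's actual read path), samples from several profiles that share an identical stack can be combined by summing their values. The `sample` type and helpers are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is a simplified, hypothetical sample: location IDs (leaf first,
// as in pprof) and a single value.
type sample struct {
	LocationIDs []uint64
	Value       int64
}

// stackKey builds a map key from the location ID sequence.
func stackKey(ids []uint64) string {
	var b strings.Builder
	for i, id := range ids {
		if i > 0 {
			b.WriteByte(' ')
		}
		fmt.Fprint(&b, id)
	}
	return b.String()
}

// mergeSamples sums the values of samples with identical stacks across
// any number of input profiles.
func mergeSamples(profiles ...[]sample) []sample {
	index := map[string]int{}
	var out []sample
	for _, p := range profiles {
		for _, s := range p {
			k := stackKey(s.LocationIDs)
			if i, ok := index[k]; ok {
				out[i].Value += s.Value
				continue
			}
			index[k] = len(out)
			out = append(out, sample{LocationIDs: s.LocationIDs, Value: s.Value})
		}
	}
	// Sort by descending value so the heaviest stacks come first.
	sort.SliceStable(out, func(i, j int) bool { return out[i].Value > out[j].Value })
	return out
}

func main() {
	a := []sample{{[]uint64{2, 21, 41}, 10}, {[]uint64{138, 130}, 5}}
	b := []sample{{[]uint64{2, 21, 41}, 30}}
	for _, s := range mergeSamples(a, b) {
		fmt.Println(s.Value, stackKey(s.LocationIDs))
	}
	// 40 2 21 41
	// 5 138 130
}
```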

@alsoba13 (Contributor, Author) commented on the diff:

Common output code from query.go.

@alsoba13 alsoba13 marked this pull request as ready for review October 9, 2024 16:25
@alsoba13 alsoba13 requested a review from a team as a code owner October 9, 2024 16:25
@aleks-p (Contributor) commented Oct 9, 2024

> Funny enough, we should maybe rename merge to another command name here.

"merge" in this context refers to merging multiple profiles and their samples to produce a single result (e.g., flamegraph, a pprof file, etc.). The name is still valid, even if we are operating on one block :)

@kolesnikovae (Collaborator) left a comment

Apparently, some files were added mistakenly (lel.txt and so on).

Also, I propose revisiting the way the CLI interface is extended.

Start: meta.MinTime.Time().UnixMilli(),
End: meta.MaxTime.Time().UnixMilli(),
},
100,
@kolesnikovae (Collaborator) commented Oct 10, 2024

This is the max_nodes parameter. I'd say it should be configurable. In the pprof case (SelectMergePprof), it should default to 0.
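A minimal sketch of the suggested defaulting: an explicit value wins, pprof output defaults to 0, and other outputs keep the 100 currently hard-coded in the diff. The `--max-nodes` flag name and `resolveMaxNodes` are assumptions for illustration only.

```go
package main

import "fmt"

// defaultMaxNodes is the value currently hard-coded in the diff.
const defaultMaxNodes int64 = 100

// resolveMaxNodes is hypothetical: an explicitly set --max-nodes (assumed
// flag name) always wins; otherwise pprof defaults to 0 as suggested in
// review, and other outputs keep the current default.
func resolveMaxNodes(flagValue int64, flagSet bool, outputKind string) int64 {
	if flagSet {
		return flagValue
	}
	if outputKind == "pprof" {
		return 0
	}
	return defaultMaxNodes
}

func main() {
	fmt.Println(resolveMaxNodes(0, false, "pprof"))   // 0
	fmt.Println(resolveMaxNodes(0, false, "console")) // 100
	fmt.Println(resolveMaxNodes(2048, true, "pprof")) // 2048
}
```

Tracking whether the flag was set separately from its value lets a user explicitly request 0 for console output too.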

Comment on lines 84 to +86
queryBlocksSeriesParams := addQueryBlocksSeriesParams(queryBlocksSeriesCmd)
queryBlocksMergeCmd := queryBlocksCmd.Command("merge", "Request merged profile.")
queryBlocksMergeParams := addQueryBlocksMergeParams(queryBlocksMergeCmd)
@kolesnikovae (Collaborator) commented Oct 10, 2024

Let's design the CLI interface first. I believe that merge might be confusing.

I propose the following interface:

profilecli query merge   // Already exists. Should be hidden and replaced in docs with "profile".
profilecli query profile // Alias for "merge".
profilecli query series  // Existing subcommand.
profilecli query go-pgo  // Queries pprof for Go PGO.
etc.
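The proposal keeps merge working but hides it in favour of profile. A dependency-free sketch of that alias-plus-hidden idea; the tables and function names are hypothetical, and profilecli's real CLI library may provide its own mechanism for this.

```go
package main

import "fmt"

// queryAliases maps alternative subcommand names to a canonical one
// (hypothetical: "merge" keeps working as an alias of "profile").
var queryAliases = map[string]string{
	"merge": "profile",
}

// hiddenCommands lists names that still work but are omitted from help.
var hiddenCommands = map[string]bool{"merge": true}

// canonical resolves an alias to the subcommand that handles it.
func canonical(name string) string {
	if c, ok := queryAliases[name]; ok {
		return c
	}
	return name
}

// visible returns the subcommands that help and docs would list.
func visible(all []string) []string {
	var out []string
	for _, n := range all {
		if !hiddenCommands[n] {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	all := []string{"merge", "profile", "series", "go-pgo"}
	fmt.Println(canonical("merge"), visible(all)) // profile [profile series go-pgo]
}
```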

Now, in the command handler, we check whether the --block flag is specified. It is common practice to use the singular form for flags that accept multiple values, with the flag specified once per value:

profilecli query series --block=A

profilecli query series \
  --block=A \
  --block=B
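A minimal sketch of a repeatable --block flag and the "were blocks given?" check in the handler. It uses Go's standard flag package only to stay self-contained (the help output above suggests profilecli uses a different CLI library); `blockList` and `parseQueryFlags` are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// blockList collects every occurrence of a repeatable flag.
type blockList []string

func (b *blockList) String() string { return strings.Join(*b, ",") }

// Set appends each --block value instead of overwriting the previous one.
func (b *blockList) Set(v string) error {
	*b = append(*b, v)
	return nil
}

// parseQueryFlags parses the repeatable --block flag and reports whether any
// blocks were given, i.e. whether to query blocks directly or a server.
func parseQueryFlags(args []string) (blockList, bool, error) {
	fs := flag.NewFlagSet("query series", flag.ContinueOnError)
	var blocks blockList
	fs.Var(&blocks, "block", "block ID to query (repeatable)")
	if err := fs.Parse(args); err != nil {
		return nil, false, err
	}
	return blocks, len(blocks) > 0, nil
}

func main() {
	blocks, direct, err := parseQueryFlags([]string{"--block=A", "--block=B"})
	fmt.Println(blocks, direct, err) // [A B] true <nil>
}
```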

Next, let's make the query subcommand support storage backend configuration (this is very easy).

Finally, let's remove the query-blocks subcommand.
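A minimal sketch of how the query subcommand might pick a storage backend from the flags shown in the help output (--local-path, --bucket-name, --object-store-type, --tenant-id), keeping the gcs-only restriction. The struct, the resolution rules and the returned description strings are hypothetical and purely illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

// storageConfig is a hypothetical grouping of the storage-related flags.
type storageConfig struct {
	LocalPath       string
	BucketName      string
	ObjectStoreType string
	TenantID        string
}

// backend describes where blocks would be read from: a remote bucket when a
// bucket name is set, otherwise the local directory.
func (c storageConfig) backend() (string, error) {
	if c.BucketName == "" {
		if c.LocalPath == "" {
			return "", errors.New("either a bucket name or a local path is required")
		}
		return "filesystem:" + c.LocalPath, nil
	}
	if c.ObjectStoreType != "gcs" {
		return "", fmt.Errorf("object store type %q is not supported (only gcs)", c.ObjectStoreType)
	}
	if c.TenantID == "" {
		return "", errors.New("a tenant ID is required for remote buckets")
	}
	return "gcs://" + c.BucketName + "/" + c.TenantID, nil
}

func main() {
	local := storageConfig{LocalPath: "./data/anonymous/local", ObjectStoreType: "gcs"}
	fmt.Println(local.backend()) // filesystem:./data/anonymous/local <nil>
}
```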


Alternatively, we could extend the existing admin blocks subcommand:

profilecli admin block query profile
profilecli admin block query series
profilecli admin block query go-pgo
etc.

However, I believe query X --block=A is more intuitive. On the other hand, profilecli admin block query is more correct semantically.

@alsoba13 (Contributor, Author) commented on commit "feat: profilecli query-blocks merge":

I think this file was introduced unintentionally some time ago.
