feature: CodeCommit Discovery #13

Open
stijnbrouwers opened this issue Mar 18, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@stijnbrouwers

🔖 Feature description

Discover entities on CodeCommit repositories similar to the AwsS3DiscoveryProcessor.

🎤 Context

As discussed first in #6

✌️ Possible Implementation

No response

@stijnbrouwers stijnbrouwers added the enhancement New feature or request label Mar 18, 2024
@stijnbrouwers stijnbrouwers changed the title feature: <title> feature: CodeCommit Discovery Mar 18, 2024
@niallthomson
Contributor

@stijnbrouwers do you feel this is addressed by the contribution you made upstream?

@stijnbrouwers
Author

@niallthomson
No, discovery is not yet implemented in the contribution.
This is still a TODO at this point.

@niallthomson
Contributor

After looking in a bit more detail I see the gap here. I think we could do with outlining some more concrete requirements.

This is an example of configuring the GitHub equivalent:

catalog:
  providers:
    github:
      # the provider ID can be any camelCase string
      providerId:
        organization: 'backstage' # string
        catalogPath: '/catalog-info.yaml' # string
        filters:
          branch: 'main' # string
          repository: '.*' # Regex
        schedule: # same options as in TaskScheduleDefinition
          # supports cron, ISO duration, "human duration" as used in code
          frequency: { minutes: 30 }
          # supports ISO duration, "human duration" as used in code
          timeout: { minutes: 3 }

I assume we'd need to take something like that format and add AWS considerations like how accounts and regions are handled.

The simple approach would be that, instead of the organization field, each provider accepts optional accountId and region fields. However, I see obvious scaling issues with this when CodeCommit repositories are spread across a large number of accounts in an AWS organization, so some discovery of the accounts themselves may also be necessary.

@stijnbrouwers
Author

@niallthomson
Sorry for the late response, but I agree.
I think we can reuse the whole config here except for the organization part.
accountId and region would be required, and I think we should also add an optional "roleName" so that a different IAM role can be assumed to perform the actions.

Then for multiple accounts, I was first thinking of having an array with this info (accountId, region, roleName) in a single provider, but it makes more sense to just create a provider per account. This way each account can have its own filters and schedule.
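
To make the proposal concrete, here is a minimal sketch of what one provider per account could look like once parsed. Everything here is hypothetical: the type names, the `readProviderConfigs` helper, and the defaults for `catalogPath` and `filters.branch` are assumptions for illustration, not an existing Backstage API. The schedule block is left out for brevity.

```typescript
// Hypothetical parsed shape for one CodeCommit discovery provider (one per AWS account).
interface CodeCommitProviderConfig {
  id: string;
  accountId: string; // required
  region: string; // required
  roleName?: string; // optional IAM role to assume, as proposed above
  catalogPath: string;
  filters: { branch: string; repository: RegExp };
}

// Raw values as they might appear under a providers key in app-config (assumed layout).
type RawProvider = {
  accountId?: string;
  region?: string;
  roleName?: string;
  catalogPath?: string;
  filters?: { branch?: string; repository?: string };
};

function readProviderConfigs(
  raw: Record<string, RawProvider>,
): CodeCommitProviderConfig[] {
  return Object.entries(raw).map(([id, p]) => {
    if (!p.accountId || !p.region) {
      throw new Error(`Provider '${id}' must set accountId and region`);
    }
    return {
      id,
      accountId: p.accountId,
      region: p.region,
      roleName: p.roleName,
      // Defaults below are assumptions, mirroring the GitHub example.
      catalogPath: p.catalogPath ?? '/catalog-info.yaml',
      filters: {
        branch: p.filters?.branch ?? 'main',
        repository: new RegExp(p.filters?.repository ?? '.*'),
      },
    };
  });
}

// Two accounts, each with its own filters; the account IDs are placeholders.
const providers = readProviderConfigs({
  teamA: {
    accountId: '111111111111',
    region: 'eu-west-1',
    filters: { repository: '^service-.*' },
  },
  teamB: {
    accountId: '222222222222',
    region: 'us-east-1',
    roleName: 'codecommit-discovery',
  },
});
```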

I think the scaling will be OK, no?
A pseudo-algorithm would look something like this I think:
For each account =>

  1. We request a list of all repositories (ListRepositories)
  2. We apply the filter from the config
  3. For all remaining repos, we perform a GetFile for the config 'catalogPath' and the branch from the config 'filters.branch'. Either the GetFile command will return the content and we register it, or it will fail and we continue to the next
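
The three steps above can be sketched roughly as follows. This is only an illustration: `CodeCommitLike` is a hypothetical minimal interface standing in for the real client (in the AWS SDK for JavaScript v3 these calls would be `ListRepositoriesCommand` and `GetFileCommand`, with pagination handled separately), and `discoverCatalogFiles` is a made-up name.

```typescript
// Hypothetical minimal client surface; a real implementation would wrap the AWS SDK.
interface CodeCommitLike {
  // Resolves to the names of all repositories in the account/region.
  listRepositories(): Promise<string[]>;
  // Resolves to the file content, or rejects when the file (or branch) does not exist.
  getFile(repositoryName: string, branch: string, filePath: string): Promise<string>;
}

interface DiscoveryOptions {
  catalogPath: string; // config 'catalogPath'
  branch: string; // config 'filters.branch'
  repository: RegExp; // config 'filters.repository'
}

interface DiscoveredFile {
  repositoryName: string;
  content: string;
}

async function discoverCatalogFiles(
  client: CodeCommitLike,
  options: DiscoveryOptions,
): Promise<DiscoveredFile[]> {
  const found: DiscoveredFile[] = [];

  // 1. Request the list of all repositories.
  const names = await client.listRepositories();

  // 2. Apply the repository filter from the config.
  const candidates = names.filter(name => options.repository.test(name));

  // 3. Try to fetch the catalog file from each remaining repository.
  for (const repositoryName of candidates) {
    try {
      const content = await client.getFile(
        repositoryName,
        options.branch,
        options.catalogPath,
      );
      found.push({ repositoryName, content });
    } catch {
      // No catalog file on this branch: skip and continue with the next repo.
    }
  }
  return found;
}
```

Note that this sketch swallows every error from `getFile`, so a permissions problem would look the same as a missing file; a real implementation would probably want to tell those apart.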

I think the load from these checks is manageable. It is not much different from doing the same for, say, GitLab.

@niallthomson
Contributor

I believe the role/account mapping should be handled by the DefaultAwsCredentialsManager. There's been a similar conversation upstream regarding the AwsOrganizationCloudAccountProcessor, and the plan is to remove the existing roleArn parameter for the processor and just use the account ID instead. So in this case adding accountId and region should be sufficient, I think?

The implementation of the logic itself might even be simpler than what you outlined. It looks like the GitHub discovery processor just optimistically emits locations and lets "other stuff" figure out if the locations are valid and contain data:

https://github.com/backstage/backstage/blob/master/plugins/catalog-backend-module-github/src/processors/GithubDiscoveryProcessor.ts#L151

Because you already did the hard work of processing locations for CodeCommit I believe we could do the same thing?
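
A rough sketch of that optimistic approach, for illustration only: filter the repository names and emit one optional location per repository without fetching anything. `LocationSpec` here is a local stand-in type and `buildCatalogLocations` is a hypothetical helper. The console URL layout is an assumption about what the CodeCommit URL reader accepts and would need to be checked against the integration.

```typescript
// Local stand-in for a catalog location; not imported from Backstage.
interface LocationSpec {
  type: 'url';
  target: string;
  // 'optional' means a missing file is not treated as an error downstream.
  presence: 'optional' | 'required';
}

interface LocationOptions {
  catalogPath: string;
  branch: string;
  repository: RegExp;
}

function buildCatalogLocations(
  region: string,
  repositoryNames: string[],
  options: LocationOptions,
): LocationSpec[] {
  // Drop leading slashes so the path joins cleanly after the '--' separator.
  const path = options.catalogPath.replace(/^\/+/, '');
  return repositoryNames
    .filter(name => options.repository.test(name))
    .map((name): LocationSpec => ({
      type: 'url',
      // ASSUMPTION: this URL shape is illustrative, not confirmed in this thread.
      target:
        `https://${region}.console.aws.amazon.com/codesuite/codecommit` +
        `/repositories/${name}/browse/refs/heads/${options.branch}/--/${path}`,
      presence: 'optional',
    }));
}
```

With this shape there is no GetFile call during discovery at all; whether each location actually resolves to a catalog file is left to whatever reads the emitted URLs.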

@stijnbrouwers
Author

OK, if this is already foreseen in the DefaultAwsCredentialsManager, we can just reuse it. The only reason to still allow declaring it (optionally) would be to override the default behaviour for the account (e.g. where you create a separate role that only allows listing CodeCommit repos). But that might be overkill and could always be added later should the need arise.

Regarding emitting the locations, that seems quite nice. I haven't looked at how others implemented it yet, so I am not sure what picks up these emitted locations. I suppose the logic will check which integration matches the URL and just use the readUrl of the relevant urlReader, nice!

I think it shouldn't be that much work to implement then, since a lot of the work is already there; it just needs to be glued together :-)

Thank you for your insights!

Projects
None yet
Development

No branches or pull requests

2 participants