
[Bug]: URGENT | ADF v4.0.0 update failed in Production #750

Open
jdhakar1995 opened this issue Aug 8, 2024 · 5 comments
Labels: bug (Something isn't working)


jdhakar1995 commented Aug 8, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

The aws-deployment-framework-bootstrap-pipeline pipeline failed at the EnableBootstrappingViaJumpRole stage with the error shown below. This is our production environment, with 400+ active accounts. The serverlessrepo-aws-deployment-framework stack itself was updated successfully.

[Two screenshots of the pipeline error were attached here.]

Expected Behavior

The pipeline should execute successfully in an AWS Organization with 400+ active accounts.

Current Behavior

The pipeline fails with 400+ accounts. It worked in our dev/test environment, where the number of accounts was below 100.

Steps To Reproduce

Update ADF to v4.0.0 with 400+ active accounts in an AWS Org.

Possible Solution

No response

Additional Information/Context

No response

ADF Version

4.0.0

Contributing a fix?

  • Yes, I am working on a fix to resolve this issue
jdhakar1995 added the bug label Aug 8, 2024

jdhakar1995 (Author)

Dear ADF Team, please prioritize this, as it affects our production environment, and let me know if there is a fix or workaround.
@sbkok @bundyfx @javydekoning

sbkok self-assigned this Aug 8, 2024
sbkok added this to the v4.0.1 milestone Aug 8, 2024
sbkok (Collaborator) commented Aug 8, 2024

Hi @jdhakar1995, thank you for reporting this.
I am looking into the root cause now and will get back to you as soon as possible.

sbkok added a commit to sbkok/aws-deployment-framework that referenced this issue Aug 8, 2024
Issue: awslabs#750

## Why?

The calculation for the maximum number of accounts that can be supported with
the jump role manager in one go was incorrect.

Among other things, the calculation did not take into account the maximum
length of a role name.

## What?

* Added tests to validate that future changes to the policy generation process
  still generate policies of a supported length.
* Fixed the calculation to include the maximum role name length of 64
  characters.

sbkok (Collaborator) commented Aug 8, 2024

@jdhakar1995 I opened a pull request that addresses the root cause.
Since you might not want to wait until that is merged and released, you could try this workaround:

In the file src/lambda_codebase/jump_role_manager/main.py, comment out these lines:

```python
MAX_NUMBER_OF_ACCOUNTS = math.floor(
    (
        MAX_MANAGED_POLICY_LENGTH
        - ZERO_ACCOUNTS_POLICY_LENGTH
    )
    / CHARS_PER_ACCOUNT_ID,
)
```

Add the following after the commented-out lines:

```python
MAX_NUMBER_OF_ACCOUNTS = 361
```

That decreases the number of accounts it tries to include in the policy from 391 to 361.

Please note that this does not mean you cannot have more than 361 accounts.
The number only caps how many accounts that were not yet bootstrapped by ADF v4 can be bootstrapped in one go: 361.

If, when you install ADF v4.0, it needs to bootstrap more than 361 accounts in your environment, please set the GrantOrgWidePrivilegedBootstrapAccessUntil parameter to a time a few hours in the future.
This will allow ADF to use the privileged bootstrap access for all accounts in your AWS Organization until the configured time.
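As a rough illustration of the rule above, the sketch below uses two hypothetical helpers (neither is part of ADF): one checks whether the number of accounts still awaiting ADF v4 bootstrapping exceeds the one-go limit, and the other produces a UTC timestamp a few hours ahead. The timestamp format (ISO 8601 with a trailing `Z`) is an assumption; check the ADF installation documentation for the exact format GrantOrgWidePrivilegedBootstrapAccessUntil expects.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical constant: the workaround limit from the comment above.
ONE_GO_LIMIT = 361


def needs_org_wide_access(pending_accounts, limit=ONE_GO_LIMIT):
    """Hypothetical helper: True when more accounts need bootstrapping
    than the jump role policy can hold in one go."""
    return pending_accounts > limit


def access_until(hours_ahead=3, now=None):
    """Hypothetical helper: UTC timestamp `hours_ahead` hours from `now`.

    The "%Y-%m-%dT%H:%M:%SZ" format is an assumption, not taken from ADF.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    until = (now + timedelta(hours=hours_ahead)).astimezone(timezone.utc)
    return until.strftime("%Y-%m-%dT%H:%M:%SZ")


# Example: 400 accounts awaiting bootstrap exceed the 361-account limit.
if needs_org_wide_access(400):
    print("GrantOrgWidePrivilegedBootstrapAccessUntil =", access_until(3))
```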

Apologies for the experience, I hope this helps to resolve the issue quickly.

Best regards, Simon

jdhakar1995 (Author)

Thanks @sbkok for looking into the issue and providing a fix so quickly. Yesterday I edited the same main.py and changed the value of ZERO_ACCOUNTS_POLICY_LENGTH from 265 to 400, which reduced MAX_NUMBER_OF_ACCOUNTS from 391 to 382. That worked: the bootstrap pipeline executed successfully.
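The effect of that change can be checked with the formula from main.py. The two constants below are assumptions, not values quoted in this thread: IAM's 6,144-character limit for a managed policy, and 15 characters per account ID (12 digits, two quotes and a separating comma). With these values the formula reproduces both the 391 and 382 figures.

```python
import math

# Assumed values (not quoted in this thread); they reproduce 391 and 382.
MAX_MANAGED_POLICY_LENGTH = 6144  # IAM managed policy size limit
CHARS_PER_ACCOUNT_ID = 15         # 12 digits + 2 quotes + 1 comma


def max_number_of_accounts(zero_accounts_policy_length):
    """Same formula as MAX_NUMBER_OF_ACCOUNTS in main.py."""
    return math.floor(
        (MAX_MANAGED_POLICY_LENGTH - zero_accounts_policy_length)
        / CHARS_PER_ACCOUNT_ID,
    )


print(max_number_of_accounts(265))  # 391 (stock v4.0.0 value)
print(max_number_of_accounts(400))  # 382 (after the workaround)
```

Raising ZERO_ACCOUNTS_POLICY_LENGTH reserves more room for the fixed part of the policy, so fewer account IDs fit in the remaining space.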

How should I proceed now? Should I apply the change from the PR you opened, or the workaround you suggested in your comment above?

Thanks
Jitendra

sbkok (Collaborator) commented Aug 9, 2024

Great to hear that you got that working! You should be good in that case and can proceed with the change you made.

There is no harm in setting the number of accounts lower. It only limits the number of accounts that can be bootstrapped in one go when you are not installing or updating ADF, for example when you move hundreds of accounts from a protected organizational unit (OU) that was not bootstrapped by ADF to an ADF-enabled OU.

In the PR I created, the number is lower because it also counts spaces. If it worked with 382 accounts, AWS probably does not count spaces toward the policy size limit. There is no harm in leaving it at the 382 you have now or in adopting the PR; either way works.
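IAM does not count whitespace toward a managed policy's size limit, which would explain why 382 accounts still fit. The sketch below uses a hypothetical policy shape (not ADF's actual jump role policy) to show the difference: each additional account ID costs 15 characters in compact JSON, but 16 when the space after the separating comma is counted, so a calculation that counts spaces yields a lower, more conservative maximum.

```python
import json


def build_policy(account_ids):
    """Hypothetical policy listing account IDs in a condition.
    Illustrative only; this is not ADF's generated policy."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {"aws:ResourceAccount": account_ids},
                },
            }
        ],
    }


def compact_length(policy):
    # No whitespace between JSON tokens, closer to how IAM measures size.
    return len(json.dumps(policy, separators=(",", ":")))


def spaced_length(policy):
    # json.dumps defaults to ", " and ": " separators, so spaces are counted.
    return len(json.dumps(policy))


one = build_policy(["111111111111"])
two = build_policy(["111111111111", "222222222222"])
print(compact_length(two) - compact_length(one))  # 15 per extra account
print(spaced_length(two) - spaced_length(one))    # 16 per extra account
```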

Do bear in mind that if you deploy ADF with the fix, the version number will change slightly: it will include the commit ID and the number of commits since v4.0.0. ADF will warn about this, but you can ignore the warning.
Please reference this issue if you run into another issue later with ADF v4.0.0, so we know what change was applied on top of the base v4.0.0 release tag.

Once you install or update to v4.0.1 or later, you no longer need to reference it, as the change you introduced will be overwritten and ADF will run the stock release.

Best regards, Simon
