upgrade-manager-v2: Fix unit tests (#275)
* Delete README.md

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* delete all

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* scaffolding

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* add API

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* initial code

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* add more scaffolding

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Add kubernetes API calls

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* aws API calls

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* AWS API calls & Drift detection

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* initial rotation logic

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Implemented RollingUpgrade object validation. (#176)

* Validation step to check Nodes and ASG launch configs

Signed-off-by: shreyas-badiger <[email protected]>

* Validating launch definition after a rolling upgrade

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Fix all the "make vet" errors in Controller V2 branch. (#177)

* Validation step to check Nodes and ASG launch configs

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Validating launch definition after a rolling upgrade

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Resolve error log message and return statement

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Adding Functional Test (#113)

* Adding BDD, workflow and badge

* Changing CI workflow job name

* Adding make manifests

* Clarifying cron time zone comment

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* release 0.13 (#115)

* release 0.13

* Update CHANGELOG.md

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* bump version (#116)

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Repo selection for CI and BDD workflows & CI step for releases (#117)

* CI-BDD not on forks & Step for releases (#2)

* Testing CI-BDD not on forks & Step for releases

* Adding step for image with tag git-tag

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Terminate unjoined nodes

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Resolving PR comments

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Set version and update CHANGELOG for version 0.14. (#121)

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump version to 0.15-dev.

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Fix typo in README.md. (#125)

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Ignore the terminated instance during upgrade

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Added WARNING prefix in the logging

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Apply suggestions from code review

Co-authored-by: Kevin Downey <[email protected]>
Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Capitalize sprintf to Sprintf

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Upgrade to Go 1.15 (#128)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Fix few typos and simplify error returns, remove redundant types (#131)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Readiness gates implementation for eager mode (#130)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Adding Functional Test (#113)

* Adding BDD, workflow and badge

* Changing CI workflow job name

* Adding make manifests

* Clarifying cron time zone comment

Signed-off-by: sbadiger <[email protected]>

* Validation step to check Nodes and ASG launch configs (#112)

* Validation step to check Nodes and ASG launch configs

* Validating launch definition after a rolling upgrade

* Resolve error log message and return statement

Co-authored-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* release 0.13 (#115)

* release 0.13

* Update CHANGELOG.md

Signed-off-by: sbadiger <[email protected]>

* bump version (#116)

Signed-off-by: sbadiger <[email protected]>

* Repo selection for CI and BDD workflows & CI step for releases (#117)

* CI-BDD not on forks & Step for releases (#2)

* Testing CI-BDD not on forks & Step for releases

* Adding step for image with tag git-tag

Signed-off-by: sbadiger <[email protected]>

* Terminate unjoined nodes (#120)

* Validation step to check Nodes and ASG launch configs

* Validating launch definition after a rolling upgrade

* Resolve error log message and return statement

* Terminate unjoined nodes

* Resolving PR comments

Co-authored-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Set version and update CHANGELOG for version 0.14. (#121)

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump version to 0.15-dev.

Signed-off-by: sbadiger <[email protected]>

* Fix bug when switching to launch templates (#136)

* Update rollingupgrade_controller.go

* Update rollingupgrade_controller.go

Signed-off-by: Eytan Avisror <[email protected]>

* spacing fixes

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Extract script runner to a separate type; fix work with env. variables (#132)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Set version and update CHANGELOG for version v0.15 (#137)

Signed-off-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump version to v0.16-dev.

Signed-off-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Propagate parent env variables to allow to talk with API Server (#144)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump Golang CI action to fix failed CI run (#146)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Simplify (#145)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Add Expiration to cache and do not refresh ASG if cache is not expired (#143)

Signed-off-by: Oleg Atamanenko <[email protected]>

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Fix documentation for uniform across AZ Update strategy and fix typos (#147)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Move cluster state from package level to a cluster state impl (#148)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Simplify work with intstr type. (#149)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* If instance is in standby mode already, just return (#138)

Signed-off-by: Oleg Atamanenko <[email protected]>

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Handle terminated instances gracefully. (#150)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Template version comparison fix (#155)

* get template version

Signed-off-by: Eytan Avisror <[email protected]>

* fix tests

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* release 0.16 (#157)

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* bump version to 0.17-dev (#158)

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Don't uncordon node on failure to run postDrain script when IgnoreDrainFailures set (#151)

* Don't uncordon node on failure to run postDrain script when IgnoreDrainFailures set

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>

* Test node uncordon when postDrain / postDrainWait script fails

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Abort on strategy failure instead of continuing (#152)

* Abort on strategy failure instead of continuing

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>

* Remove unformatted error message placeholder

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>

* Explicitly specify strategy for tests

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* use NamespacedName (#160)

Signed-off-by: Eytan Avisror <[email protected]>

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Set version and update CHANGELOG for version v0.17 (#161)

Signed-off-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump version to v0.18-dev (#162)

Signed-off-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Move constants to types so that they can be reused (#167)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Remove separate module for pkg/log (#168)

Signed-off-by: Oleg Atamanenko <[email protected]>

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump dependencies. (#169)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* use standard fmt.Errorf to format error message; unify error format (#171)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Fix namespaced name order (#170)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Add instance id to the logs (#173)

Signed-off-by: Oleg Atamanenko <[email protected]>

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump golang and busybox (#172)

Signed-off-by: Oleg Atamanenko <[email protected]>

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Expose template list and other execution errors to logs (#166)

* Log and return wrapped launchtemplate error

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>

* Expose execution error in logs

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* output can contain other messages from API Server, so be more relaxed (#174)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Delete README.md

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* delete all

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* scaffolding

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* add API

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* initial code

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* add more scaffolding

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Add kubernetes API calls

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* aws API calls

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* AWS API calls & Drift detection

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* validate() function

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* modified validate()

Signed-off-by: sbadiger <[email protected]>

* modified validate()

Signed-off-by: sbadiger <[email protected]>

* initial rotation logic

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* basic script_runner without any modifications

Signed-off-by: sbadiger <[email protected]>

* Fix all the vet related errors

Signed-off-by: sbadiger <[email protected]>

Co-authored-by: Alfredo Garo <[email protected]>
Co-authored-by: Eytan Avisror <[email protected]>
Co-authored-by: Shri Javadekar <[email protected]>
Co-authored-by: Shri Javadekar <[email protected]>
Co-authored-by: Shri Javadekar <[email protected]>
Co-authored-by: Craig Robson <[email protected]>
Co-authored-by: Kevin Downey <[email protected]>
Co-authored-by: Oleg Atamanenko <[email protected]>
Co-authored-by: Shreyas Badiger <[email protected]>
Co-authored-by: Adam Malcontenti-Wilson <[email protected]>
Co-authored-by: Adam Malcontenti-Wilson <[email protected]>
Co-authored-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Controller v2: Implementation of Instance termination (#178)

* fix make vet errors.

Signed-off-by: sbadiger <[email protected]>

* Terminate instances and run v2 for first time.

Signed-off-by: sbadiger <[email protected]>

* Addressing review comments

Signed-off-by: sbadiger <[email protected]>

* addressing more review comments

Signed-off-by: sbadiger <[email protected]>

* Log error message

Signed-off-by: sbadiger <[email protected]>

* error handling for instance tagging

Signed-off-by: sbadiger <[email protected]>

* Migrate Script Runner (#179)

* Basic script runner

Signed-off-by: Eytan Avisror <[email protected]>

* Update upgrade.go

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Implemented node drain. (#181)

Signed-off-by: sbadiger <[email protected]>

* Eager mode implementation (#183)

* Eager mode implementation

Signed-off-by: sbadiger <[email protected]>

* Metrics features (#189)

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Process the batch rotation in parallel (#192)

* Process the batch rotation in parallel

Signed-off-by: sbadiger <[email protected]>

* addressing review comments

Signed-off-by: sbadiger <[email protected]>

* Move the DrainManager within ReplaceBatch(), to access one per RollingUpgrade CR (#195)

Signed-off-by: sbadiger <[email protected]>

* Refine metrics implementation to support goroutines (#196)

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Fix test case error

Signed-off-by: xshao <[email protected]>

* Use group instead of ASG

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Ignore generated code  (#201)

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Fix test case error

Signed-off-by: xshao <[email protected]>

* Use group instead of ASG

Signed-off-by: xshao <[email protected]>

* Ignore generated code

Signed-off-by: xshao <[email protected]>

* Ignore generated code

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Fix bug in deleting the entry in syncMap (#203)

Signed-off-by: sbadiger <[email protected]>

* Unit tests for controller-v2 (#215)

* Unit tests

Signed-off-by: sbadiger <[email protected]>

* minor change in accessing the namespace name

Signed-off-by: sbadiger <[email protected]>

* move helper functions to a different file

Signed-off-by: sbadiger <[email protected]>

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* #2285: renamed some methods related to metrics (#224)

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* #2286: removed version from metric namespace (#227)

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Create RollingUpgradeContext (#234)

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* log cloud discovery failure

Signed-off-by: sbadiger <[email protected]>

* Create RollingUpgrade Context

Signed-off-by: sbadiger <[email protected]>

* rollingupgrade context

Signed-off-by: sbadiger <[email protected]>

Co-authored-by: Sahil Badla <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Resolve compile errors caused by merge conflict. (#235)

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* log cloud discovery failure

Signed-off-by: sbadiger <[email protected]>

* Create RollingUpgrade Context

Signed-off-by: sbadiger <[email protected]>

* rollingupgrade context

Signed-off-by: sbadiger <[email protected]>

* resolve compile errors due to merge conflict

Signed-off-by: sbadiger <[email protected]>

Co-authored-by: Sahil Badla <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* upgrade-manager-v2: Move DrainManager back to Reconciler (#236)

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* log cloud discovery failure

Signed-off-by: sbadiger <[email protected]>

* Create RollingUpgrade Context

Signed-off-by: sbadiger <[email protected]>

* rollingupgrade context

Signed-off-by: sbadiger <[email protected]>

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* #2285: renamed some methods related to metrics (#224)

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* #2286: removed version from metric namespace (#227)

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* resolve compile errors due to merge conflict

Signed-off-by: sbadiger <[email protected]>

* move drain-manager to reconciler

Signed-off-by: sbadiger <[email protected]>

* initialize RollingUpgrade object

Signed-off-by: sbadiger <[email protected]>

* use bool instead of count for standby function

Signed-off-by: sbadiger <[email protected]>

* refactor in-progress and standby code

Signed-off-by: sbadiger <[email protected]>

* rename instance standby function

Signed-off-by: sbadiger <[email protected]>

* DrainManager changes in unit test files

Signed-off-by: sbadiger <[email protected]>

Co-authored-by: Sahil Badla <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* V2 controller metrics concurrency fix (#231)

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Fix test case error

Signed-off-by: xshao <[email protected]>

* Use group instead of ASG

Signed-off-by: xshao <[email protected]>

* Ignore generated code

Signed-off-by: xshao <[email protected]>

* Ignore generated code

Signed-off-by: xshao <[email protected]>

* Fix the concurrent issue

Signed-off-by: xshao <[email protected]>

* Fix the concurrent issue

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into RollingUpgradeContext

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into RollingUpgradeContext

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into upgrade_metrics.go

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into metrics.go

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* add missing parenthesis (#239)

Signed-off-by: sbadiger <[email protected]>

* metricsMutex should be initialized (#240)

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* upgrade-manager-v2: Load test fixes (#245)

* upgrade-manager-v2: Move DrainManager back to Reconciler (#236)

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* log cloud discovery failure

Signed-off-by: sbadiger <[email protected]>

* Create RollingUpgrade Context

Signed-off-by: sbadiger <[email protected]>

* rollingupgrade context

Signed-off-by: sbadiger <[email protected]>

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* #2285: renamed some methods related to metrics (#224)

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* #2286: removed version from metric namespace (#227)

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* resolve compile errors due to merge conflict

Signed-off-by: sbadiger <[email protected]>

* move drain-manager to reconciler

Signed-off-by: sbadiger <[email protected]>

* initialize RollingUpgrade object

Signed-off-by: sbadiger <[email protected]>

* use bool instead of count for standby function

Signed-off-by: sbadiger <[email protected]>

* refactor in-progress and standby code

Signed-off-by: sbadiger <[email protected]>

* rename instance standby function

Signed-off-by: sbadiger <[email protected]>

* DrainManager changes in unit test files

Signed-off-by: sbadiger <[email protected]>

Co-authored-by: Sahil Badla <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* V2 controller metrics concurrency fix (#231)

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Fix test case error

Signed-off-by: xshao <[email protected]>

* Use group instead of ASG

Signed-off-by: xshao <[email protected]>

* Ignore generated code

Signed-off-by: xshao <[email protected]>

* Ignore generated code

Signed-off-by: xshao <[email protected]>

* Fix the concurrent issue

Signed-off-by: xshao <[email protected]>

* Fix the concurrent issue

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into RollingUpgradeContext

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into RollingUpgradeContext

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into upgrade_metrics.go

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into metrics.go

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* add missing parenthesis

Signed-off-by: sbadiger <[email protected]>

* load test fixes

Signed-off-by: sbadiger <[email protected]>

* handle scaling group not found

Signed-off-by: sbadiger <[email protected]>

* Update upgrade.go

Signed-off-by: sbadiger <[email protected]>

* log one level up

* remove double logging

Signed-off-by: sbadiger <[email protected]>

* final push before RC release. (#254)

* support IgnoreDrainFailures flag

Signed-off-by: sbadiger <[email protected]>

* add else condition

Signed-off-by: sbadiger <[email protected]>

* set min for maxUnavailable

Signed-off-by: sbadiger <[email protected]>

* calculateMaxUnavailable function

Signed-off-by: sbadiger <[email protected]>

* add a new column (completePercentage)

Signed-off-by: sbadiger <[email protected]>

* disable debug logs by default

Signed-off-by: sbadiger <[email protected]>

* Fix metrics collecting issue (#249)

* metricsMutex should be initialized

Signed-off-by: xshao <[email protected]>

* Use InProcessingNode instead of Stringp[] so that it can have the status of steps

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Revert "Fix metrics collecting issue (#249)" (#256)

This reverts commit f5dd1cb.

Signed-off-by: sbadiger <[email protected]>

* Fix metrics calculation issue (#258)

* metricsMutex should be initialized

Signed-off-by: xshao <[email protected]>

* Use InProcessingNode instead of Stringp[] so that it can have the status of steps

Signed-off-by: xshao <[email protected]>

* Make the change backward compatible

Signed-off-by: xshao <[email protected]>

* Make the change backward compatible

Signed-off-by: xshao <[email protected]>

* Add mutex for InProcessingNode deleting

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Add a mock for test and update version in Makefile (#262)

Signed-off-by: sbadiger <[email protected]>

* and CR end time (#264)

Signed-off-by: sbadiger <[email protected]>

* upgrade-manager-v2: expose totalProcessing time and other metrics (#265)

* and CR end time

Signed-off-by: sbadiger <[email protected]>

* expose totalProcessing time and other metrics

Signed-off-by: sbadiger <[email protected]>

* addressing review comments

Signed-off-by: sbadiger <[email protected]>

* upgrade-manager-v2: remove function duplicate declaration. (#266)

* and CR end time

Signed-off-by: sbadiger <[email protected]>

* expose totalProcessing time and other metrics

Signed-off-by: sbadiger <[email protected]>

* addressing review comments

Signed-off-by: sbadiger <[email protected]>

* remove function duplication

Signed-off-by: sbadiger <[email protected]>

* Carry the metrics status in RollingUpgrade CR (#267)

* Update metrics status at same time

Signed-off-by: xshao <[email protected]>

* Update metrics status when terminating instance

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* move cloud discovery after nodeInterval / drainInterval wait (#270)

Signed-off-by: sbadiger <[email protected]>

* upgrade-manager-v2: Add nodeEvents handler instead of a watch handler (#272)

* upgrade-manager-v2: remove function duplicate declaration. (#266)

* and CR end time

Signed-off-by: sbadiger <[email protected]>

* expose totalProcessing time and other metrics

Signed-off-by: sbadiger <[email protected]>

* addressing review comments

Signed-off-by: sbadiger <[email protected]>

* remove function duplication

Signed-off-by: sbadiger <[email protected]>

* Carry the metrics status in RollingUpgrade CR (#267)

* Update metrics status at same time

Signed-off-by: xshao <[email protected]>

* Update metrics status when terminating instance

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* move cloud discovery after nodeInterval / drainInterval wait

Signed-off-by: sbadiger <[email protected]>

* Add watch event for cluster nodes instead of API calls

Signed-off-by: sbadiger <[email protected]>

* upon node deletion, remove it from syncMap as well

Signed-off-by: sbadiger <[email protected]>

* Add nodeEvents handler instead of watch handler

Signed-off-by: sbadiger <[email protected]>

* Ignore Reconciles on nodeEvents

Signed-off-by: sbadiger <[email protected]>

* Add comments

Signed-off-by: sbadiger <[email protected]>

Co-authored-by: Sheldon Shao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* upgrade-manager-v2: Process next batch while waiting on nodeInterval period. (#273)

* upgrade-manager-v2: remove function duplicate declaration. (#266)

* and CR end time

Signed-off-by: sbadiger <[email protected]>

* expose totalProcessing time and other metrics

Signed-off-by: sbadiger <[email protected]>

* addressing review comments

Signed-off-by: sbadiger <[email protected]>

* remove function duplication

Signed-off-by: sbadiger <[email protected]>

* Carry the metrics status in RollingUpgrade CR (#267)

* Update metrics status at same time

Signed-off-by: xshao <[email protected]>

* Update metrics status when terminating instance

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* move cloud discovery after nodeInterval / drainInterval wait

Signed-off-by: sbadiger <[email protected]>

* Add watch event for cluster nodes instead of API calls

Signed-off-by: sbadiger <[email protected]>

* upon node deletion, remove it from syncMap as well

Signed-off-by: sbadiger <[email protected]>

* Add nodeEvents handler instead of watch handler

Signed-off-by: sbadiger <[email protected]>

* Ignore Reconciles on nodeEvents

Signed-off-by: sbadiger <[email protected]>

* Add comments

Signed-off-by: sbadiger <[email protected]>

* Set nextbatch to standBy while waiting for terminate

* Avoid parallel reconcile operation per ASG

* add default requeue time

Co-authored-by: Sheldon Shao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* fix unit tests

Signed-off-by: sbadiger <[email protected]>

Co-authored-by: Eytan Avisror <[email protected]>
Co-authored-by: Alfredo Garo <[email protected]>
Co-authored-by: Eytan Avisror <[email protected]>
Co-authored-by: Shri Javadekar <[email protected]>
Co-authored-by: Shri Javadekar <[email protected]>
Co-authored-by: Shri Javadekar <[email protected]>
Co-authored-by: Craig Robson <[email protected]>
Co-authored-by: Kevin Downey <[email protected]>
Co-authored-by: Oleg Atamanenko <[email protected]>
Co-authored-by: Shreyas Badiger <[email protected]>
Co-authored-by: Adam Malcontenti-Wilson <[email protected]>
Co-authored-by: Adam Malcontenti-Wilson <[email protected]>
Co-authored-by: Sheldon Shao <[email protected]>
Co-authored-by: Sahil Badla <[email protected]>
Co-authored-by: Sheldon Shao <[email protected]>
16 people authored Jul 21, 2021
1 parent 0e64929 commit 00f7e89
Showing 3 changed files with 57 additions and 14 deletions.
35 changes: 33 additions & 2 deletions controllers/helpers_test.go
@@ -179,6 +179,20 @@ func createASGInstance(instanceID string, launchConfigName string) *autoscaling.
 	}
 }

+func createEc2Instances() []*ec2.Instance {
+	return []*ec2.Instance{
+		&ec2.Instance{
+			InstanceId: aws.String("mock-instance-1"),
+		},
+		&ec2.Instance{
+			InstanceId: aws.String("mock-instance-2"),
+		},
+		&ec2.Instance{
+			InstanceId: aws.String("mock-instance-3"),
+		},
+	}
+}
+
 func createASG(asgName string, launchConfigName string) *autoscaling.Group {
 	return &autoscaling.Group{
 		AutoScalingGroupName:    &asgName,
@@ -192,10 +206,23 @@ func createASG(asgName string, launchConfigName string) *autoscaling.Group {
 	}
 }

+func createDriftedASG(asgName string, launchConfigName string) *autoscaling.Group {
+	return &autoscaling.Group{
+		AutoScalingGroupName:    &asgName,
+		LaunchConfigurationName: &launchConfigName,
+		Instances: []*autoscaling.Instance{
+			createASGInstance("mock-instance-1", "different-launch-config"),
+			createASGInstance("mock-instance-2", "different-launch-config"),
+			createASGInstance("mock-instance-3", "different-launch-config"),
+		},
+		DesiredCapacity: func(x int) *int64 { i := int64(x); return &i }(3),
+	}
+}
+
 func createASGs() []*autoscaling.Group {
 	return []*autoscaling.Group{
 		createASG("mock-asg-1", "mock-launch-config-1"),
-		createASG("mock-asg-2", "mock-launch-config-2"),
+		createDriftedASG("mock-asg-2", "mock-launch-config-2"),
 		createASG("mock-asg-3", "mock-launch-config-3"),
 	}
 }
@@ -319,5 +346,9 @@ func (m *MockEC2) DescribeInstancesPages(request *ec2.DescribeInstancesInput, ca
 }

 func (m *MockEC2) DescribeInstances(*ec2.DescribeInstancesInput) (*ec2.DescribeInstancesOutput, error) {
-	return &ec2.DescribeInstancesOutput{}, nil
+	return &ec2.DescribeInstancesOutput{
+		Reservations: []*ec2.Reservation{
+			&ec2.Reservation{Instances: createEc2Instances()},
+		},
+	}, nil
 }
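
The `createDriftedASG` fixture above makes every instance in `mock-asg-2` report a launch configuration different from the group's. As a rough illustration of what such a fixture exercises, here is a minimal, self-contained sketch of a name-based drift check. The `instance` and `group` types and the `driftedInstances` function are hypothetical stand-ins written for this note; they are not the project's types and not its drift-detection code, which may compare more than launch configuration names.

```go
package main

import "fmt"

// instance and group are hypothetical, simplified stand-ins for the
// AWS SDK autoscaling types used by the test fixtures.
type instance struct {
	ID               string
	LaunchConfigName string
}

type group struct {
	Name             string
	LaunchConfigName string
	Instances        []instance
}

// driftedInstances returns the IDs of instances whose launch
// configuration name differs from the one set on the group.
func driftedInstances(g group) []string {
	var ids []string
	for _, in := range g.Instances {
		if in.LaunchConfigName != g.LaunchConfigName {
			ids = append(ids, in.ID)
		}
	}
	return ids
}

func main() {
	// Mirrors the shape of the drifted fixture: three instances, all on
	// a launch configuration other than the group's.
	drifted := group{
		Name:             "mock-asg-2",
		LaunchConfigName: "mock-launch-config-2",
		Instances: []instance{
			{ID: "mock-instance-1", LaunchConfigName: "different-launch-config"},
			{ID: "mock-instance-2", LaunchConfigName: "different-launch-config"},
			{ID: "mock-instance-3", LaunchConfigName: "different-launch-config"},
		},
	}
	fmt.Println(len(driftedInstances(drifted)), "instances need rotation")
}
```

Under this assumption, a group built the way `createASG` is used for `mock-asg-1` and `mock-asg-3` would presumably yield an empty list, which is the case the second `TestRotateNodes` entry expects to complete.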
9 changes: 9 additions & 0 deletions controllers/upgrade.go
@@ -60,6 +60,15 @@ type RollingUpgradeContext struct {
 }

 func (r *RollingUpgradeContext) RotateNodes() error {
+	// set status to running
+	r.RollingUpgrade.SetCurrentStatus(v1alpha1.StatusRunning)
+	common.SetMetricRollupInitOrRunning(r.RollingUpgrade.Name)
+
+	// set start time
+	if r.RollingUpgrade.StartTime() == "" {
+		r.RollingUpgrade.SetStartTime(time.Now().Format(time.RFC3339))
+	}
+
 	// discover the state of AWS and K8s cluster.
 	if err := r.Cloud.Discover(); err != nil {
 		r.Info("failed to discover the cloud", "scalingGroup", r.RollingUpgrade.ScalingGroupName(), "name", r.RollingUpgrade.NamespacedName())
27 changes: 15 additions & 12 deletions controllers/upgrade_test.go
@@ -246,33 +246,36 @@ func TestIsScalingGroupDrifted(t *testing.T) {

 func TestRotateNodes(t *testing.T) {
 	var tests = []struct {
-		TestDescription     string
-		Reconciler          *RollingUpgradeReconciler
-		AsgClient           *MockAutoscalingGroup
-		ExpectedValue       bool
-		ExpectedStatusValue string
+		TestDescription       string
+		Reconciler            *RollingUpgradeReconciler
+		AsgClient             *MockAutoscalingGroup
+		RollingUpgradeContext *RollingUpgradeContext
+		ExpectedValue         bool
+		ExpectedStatusValue   string
 	}{
 		{
-			"All instances have different launch config as the ASG, RotateNodes() will not mark CR complete",
+			"All instances have different launch config as the ASG, RotateNodes() should not mark CR complete",
 			createRollingUpgradeReconciler(t),
-			func() *MockAutoscalingGroup {
-				newAsgClient := createASGClient()
-				newAsgClient.autoScalingGroups[0].LaunchConfigurationName = aws.String("different-launch-config")
-				return newAsgClient
+			createASGClient(),
+			func() *RollingUpgradeContext {
+				newRollingUpgradeContext := createRollingUpgradeContext(createRollingUpgradeReconciler(t))
+				newRollingUpgradeContext.RollingUpgrade.Spec.AsgName = "mock-asg-2" // The instances in mock-asg are drifted
+				return newRollingUpgradeContext
 			}(),
 			true,
 			v1alpha1.StatusRunning,
 		},
 		{
-			"All instances have same launch config as the ASG, RotateNodes() will mark CR complete",
+			"All instances have same launch config as the ASG, RotateNodes() should mark CR complete",
 			createRollingUpgradeReconciler(t),
 			createASGClient(),
+			createRollingUpgradeContext(createRollingUpgradeReconciler(t)),
 			false,
 			v1alpha1.StatusComplete,
 		},
 	}
 	for _, test := range tests {
-		rollupCtx := createRollingUpgradeContext(test.Reconciler)
+		rollupCtx := test.RollingUpgradeContext
 		rollupCtx.Cloud.ScalingGroups = test.AsgClient.autoScalingGroups
 		rollupCtx.Auth.AmazonClientSet.AsgClient = test.AsgClient

