Changelog for Cass Operator. New PRs should update the main / unreleased
section with entries in the following order:
* [CHANGE]
* [FEATURE]
* [ENHANCEMENT]
* [BUGFIX]
- [CHANGE] #553 dockerImageRunsAsCassandra is no longer used for anything as that's the default for current images. Use SecurityContext to override default SecurityContext (999:999)
- [ENHANCEMENT] #554 Add new empty directory as mount to server-system-logger (/var/lib/vector) so it works with multiple securityContexts
- [ENHANCEMENT] #512 Configs are now built using the newer k8ssandra-client config build command instead of the older cass-config-builder. This adds support for config properties from Cassandra 4.1.x and newer.
- [CHANGE] #542 Support 7.x.x version numbers for DSE and 5.x.x for Cassandra
- [CHANGE] #531 Update to Kubebuilder go/v4-alpha layout structure
- [ENHANCEMENT] #523 Spec.ServiceAccountName is introduced as a replacement for Spec.ServiceAccount (to account for naming changes in Kubernetes itself); PodTemplateSpec.Spec.ServiceAccountName is also supported. Precedence order is: Spec.ServiceAccountName > Spec.ServiceAccount > PodTemplateSpec.
- [ENHANCEMENT] #541 When deployed through OLM, add serviceAccount to Cassandra pods that use nonroot privilege
- [CHANGE] #516 Modify sidecar default CPU and memory limits.
- [CHANGE] #495 Remove all the VMware PSP specific code from the codebase. This has been inoperative since 1.8.0
- [CHANGE] #494 Remove deprecated generated clientsets.
- [CHANGE] #496 ScalingUp is no longer tied to the lifecycle of the cleanup job. The cleanup job is created after the ScaleUp has finished, but to track its progress one should check the status of the CassandraTask and not the CassandraDatacenter's status. Also added a new Datacenter annotation, "cassandra.datastax.com/no-cleanup", which, if set, prevents the creation of the CassandraTask.
- [ENHANCEMENT] #500 Allow the /start command to run for a longer period (up to 10 minutes) before killing the pod if no response is received. This is an intermediate solution until we can correctly detect from the pod that the start is not proceeding correctly.
- [BUGFIX] #444 Update cass-config-builder to 1.0.5.
- [BUGFIX] #415 Fix version override + imageRegistry issue where output would be invalid
- [BUGFIX] #437 Ignore cluster health check on Datacenter decommission. The rest of the #437 fix is not applied since this version does not have that bug.
- [BUGFIX] #404 Filter unallowed values from the rackname when used in Kubernetes resources
- [BUGFIX] #455 After task had completed, the running state would still say true
- [CHANGE] #501 Replaced server-system-logger with a Vector based implementation. Also, examples are added showing how the Cassandra system.log can be parsed to a more structured format.
- [CHANGE] #496 ScalingUp is no longer tied to the lifecycle of the cleanup job. The cleanup job is created after the ScaleUp has finished, but to track its progress one should check the status of the CassandraTask and not the CassandraDatacenter's status. Also added a new Datacenter annotation, "cassandra.datastax.com/no-cleanup", which, if set, prevents the creation of the CassandraTask.
- [ENHANCEMENT] #500 Allow the /start command to run for a longer period (up to 10 minutes) before killing the pod if no response is received. This is an intermediate solution until we can correctly detect from the pod that the start is not proceeding correctly.
- [BUGFIX] #481 If CDC was enabled, disabling MCAC (old metric collector) was not possible
- [CHANGE] #457 Extract task execution attributes into CassandraTaskTemplate
- [CHANGE] #447 Update Github actions to remove all deprecated features (set-outputs, node v12 actions)
- [CHANGE] #448 Update to operator-sdk 1.25.1, update to go 1.19, update to Kubernetes 1.25, remove amd64 restriction on local builds (cass-operator and system-logger will be built for aarch64 also)
- [CHANGE] #442 Deprecate old internode-encryption storage mounts and cert generation. If the old path /etc/encryption/node.jks is not present, the storage mount is no longer created. For certificates with internode-encryption, we recommend using cert-manager.
- [CHANGE] #329 Thrift port is no longer open for Cassandra 4.x installations
- [CHANGE] #487 The AdditionalVolumes.PVCSpec is now a pointer. Also, the webhook now allows modifying AdditionalVolumes.
- [FEATURE] #441 Implement a CassandraTask for moving single-token nodes
- [ENHANCEMENT] #486 AdditionalVolumes also accepts a VolumeSource as its data, allowing ConfigMap, Secret, etc. to be mounted to the cassandra container.
- [ENHANCEMENT] #467 Add new metrics endpoint port (9000) to Cassandra container. This is used by the new mgmt-api /metrics endpoint.
- [ENHANCEMENT] #457 Allow overriding the datacenter name
- [ENHANCEMENT] #476 Enable CDC for DSE deployments.
- [ENHANCEMENT] #472 Add POD_NAME and NODE_NAME env variables that match metadata.name and spec.nodeName information
- [ENHANCEMENT] #315 PodTemplateSpec allows setting Affinities, which are merged with the current rules. PodAntiAffinity behavior has changed: if allowMultipleWorkers is set to true, the PodTemplateSpec antiAffinity rules are copied as is; otherwise they are merged with the current restrictions. Prevent usage of deprecated rack.Zone (use topology.kubernetes.io/zone label instead), but allow removal of Zone.
- [BUGFIX] #410 Fix installation in IPv6 only environment
- [BUGFIX] #455 After task had completed, the running state would still say true
- [BUGFIX] #488 Expose the new metrics port in services
- [CHANGE] #291 Update Ginkgo to v2 (maintain current features, nothing additional from v2)
- [BUGFIX] #431 Fix a bug where the restartTask would not provide success counts for restarted pods.
- [BUGFIX] #444 Update cass-config-builder to 1.0.5. Update the target tag of cass-config-builder to :1.0 to allow future updates in 1.0.x without rolling restarts.
- [BUGFIX] #437 Fix startOneNodeRack to not loop forever in case of StS with size 0 (such as decommission of DC)
- [CHANGE] #442 Do not mount encryption-cred-storage or create internode CAs if not needed. Recommended approach is to use cert-manager instead of this old legacy method.
- [CHANGE] #395 Make CassandraTask job arguments strongly typed
- [CHANGE] #354 Remove oldDefunctLabel support since we recreate StS. Fix #335 created-by value to match expected value.
- [CHANGE] #385 Deprecate CassandraDatacenter's RollingRestartRequested. Use CassandraTask instead.
- [CHANGE] #397 Remove direct dependency to k8s.io/kubernetes
- [FEATURE] #384 Add a new CassandraTask operation "replacenode" that removes the existing PVCs from the pod, deletes the pod and starts a replacement process.
- [FEATURE] #387 Add a new CassandraTask operation "upgradesstables" that allows SSTable upgrades after a Cassandra version upgrade.
- [ENHANCEMENT] #385 Add rolling restart as a CassandraTask action.
- [ENHANCEMENT] #398 Update to go1.18 builds, update to use Kubernetes 1.24 envtest + dependencies, operator-sdk 1.23, controller-gen 0.9.2, Kustomize 4.5.7, controller-runtime 0.12.2
- [ENHANCEMENT] #383 Add UpgradeSSTables, Compaction and Scrub to management-api client. Improve CassandraTasks to have the ability to validate input parameters, filter target pods and do processing outside of pods.
- [ENHANCEMENT] #381 Make bootstrap operations deterministic.
- [ENHANCEMENT] #417 Allow loading imageConfig from byte array
- [BUGFIX] #327 Replace node done through CassandraTask can replace a node that's stuck in the Starting state.
- [BUGFIX] #404 Filter unallowed values from the rackname when used in Kubernetes resources
- [BUGFIX] #415 Fix version override + imageRegistry issue where output would be invalid
- [CHANGE] #370 If Cassandra start call fails, delete the pod
- [ENHANCEMENT] #366 If no finalizer is present, do not process the deletion. To prevent cass-operator from re-adding the finalizer, add an annotation no-finalizer that prevents the re-adding.
- [ENHANCEMENT] #360 If Datacenter quorum reports unhealthy state, change Status Condition DatacenterHealthy to False (DBPE-2283)
- [ENHANCEMENT] #317 Add ability for users to define CDC settings which will cause an agent to start within the Cassandra JVM and pass mutation events from Cassandra back to a Pulsar broker. (Tested on OSS Cassandra 4.x only.)
- [ENHANCEMENT] #369 Add configurable timeout for liveness / readiness and drain when mutual auth is used and a default timeout for all wget execs (required with mutual auth)
- [BUGFIX] #335 Cleanse label values derived from cluster name, which can contain illegal chars. Include app.kubernetes.io/created-by label.
- [BUGFIX] #330 Apply correct updates to Service labels and annotations through additionalServiceConfig (they are now validated and don't allow reserved prefixes).
- [BUGFIX] #368 Do not fetch endpointStatus from pods that have not started
- [BUGFIX] #364 Do not log any errors if we fail to get endpoint states from nodes.
- [CHANGE] #370 If Cassandra start call fails, delete the pod
- [ENHANCEMENT] #360 If Datacenter quorum reports unhealthy state, change Status Condition DatacenterHealthy to False (DBPE-2283)
- [BUGFIX] #377 Add timeout to all calls made with wget (mutual-auth)
- [BUGFIX] #355 Cleanse label values derived from cluster name, which can contain illegal chars.
- [BUGFIX] #330 Apply correct updates to Service labels and annotations through additionalServiceConfig (they are now validated and don't allow reserved prefixes).
- [BUGFIX] #368 Do not fetch endpointStatus from pods that have not started
- [BUGFIX] #364 Do not log any errors if we fail to get endpoint states from nodes.
- [CHANGE] #183 Move from PodDisruptionBudget v1beta1 to v1 (changes min. required Kubernetes version to 1.21)
- [ENHANCEMENT] #325 Enable static checking (golangci-lint) in the repository and fix all the found issues.
- [ENHANCEMENT] #292 Update to Go 1.17 with updates to dependencies: Kube 1.23.4 and controller-runtime 0.11.1
- [CHANGE] #264 Generate PodTemplateSpec in CassandraDatacenter with metadata
- [BUGFIX] #313 Remove Cassandra 4.0.x regexp restriction, allow 4.x.x
- [BUGFIX] #322 Add missing requeue if decommissioned pods haven't been removed yet
- [BUGFIX] #315 Validate podnames in the ReplaceNodes before moving them to NodeReplacements
- [FEATURE] #309 If StatefulSets are modified in a way that they can't be updated directly, recreate them with new specs
- [ENHANCEMENT] #312 Integration tests now output CassandraDatacenter and CassandraTask CRD outputs to build directory
- [BUGFIX] #298 EndpointState has incorrect json key
- [BUGFIX] #304 Hostname lookups on Cassandra pods fail
- [BUGFIX] #311 Fix cleanup retry reconcile bug
Bundle tag only, no changes.
- [BUGFIX] #278 ImageRegistry json key was incorrect in the definition type. Fixed from "imageRegistryOverride" to "imageRegistry"
- [CHANGE] #271 Admission webhook's FailPolicy is set to Fail instead of Ignore
- [FEATURE] #243 Task scheduler support to allow creating tasks that run for each pod in the cluster. The tasks have their own reconciliation process and lifecycle distinct from CassandraDatacenter as well as their own API package.
- [ENHANCEMENT] #235 Add AdditionalLabels, which are applied to all resources managed by the operator
- [ENHANCEMENT] #244 Add ability to skip Cassandra user creation
- [ENHANCEMENT] #257 Add management-api client method to list schema versions
- [ENHANCEMENT] #125 On delete, allow decommission of the datacenter if running in multi-datacenter cluster and cassandra.datastax.com/decommission-on-delete annotation is set on the CassandraDatacenter
- [BUGFIX] #272 Strip password (if it has one) from CreateRole
- [BUGFIX] #254 Safely set annotation on datacenter in config secret
- [BUGFIX] #261 State of decommission can't be detected correctly if the RPC address is removed from endpoint state during decommission, but before it's finalized.
- [CHANGE] #202 Support fetching FeatureSet from management-api if available. Return RequestError with StatusCode when endpoint has bad status.
- [CHANGE] #213 Integration tests in Github Actions are now reusable across different workflows
- [CHANGE] Prevent instant requeues, every requeue must wait at least 500ms.
- [FEATURE] #193 Add new Management API endpoints to HTTP Helper: GetKeyspaceReplication, ListTables, CreateTable
- [FEATURE] #175 Add FQL reconciliation via parseFQLFromConfig and SetFullQueryLogging called from ReconcileAllRacks. CallIsFullQueryLogEnabledEndpoint and CallSetFullQueryLog functions to httphelper.
- [FEATURE] #233 Allow overriding default Cassandra and DSE repositories and give versions a default suffix
- [ENHANCEMENT] #185 Add more app.kubernetes.io labels to all the managed resources
- [ENHANCEMENT] #233 Simplify rebuilding the UBI images if vulnerability is found in the base images
- [ENHANCEMENT] #221 Improve bundle creation to be compatible with Red Hat's certified bundle rules
- [BUGFIX] #185 introduced a regression which caused labels to be updated in StatefulSet when updating a version. Keep the original as these are not allowed to be modified.
- [BUGFIX] #222 There were still occasions where clusterName was not an allowed value, cleaning them before creating StatefulSet
- [BUGFIX] #186 Run cleanups in the background once per pod and poll its state instead of looping endlessly
- [CHANGE] #178 If clusterName includes characters not allowed in the serviceName, strip those chars from service name.
- [CHANGE] #108 Integrate Fossa component/license scanning
- [CHANGE] #120 Removed Helm charts, use k8ssandra helm charts instead
- [CHANGE] #120 Deployment model is now Kustomize in cass-operator
- [CHANGE] #120 and #148 Package placements have been modified
- [CHANGE] #148 Project uses kubebuilder multigroup structure
- [CHANGE] #120 Mage has been removed and replaced with Makefile
- [CHANGE] #120 Webhook TLS certs require new deployment model (such as with cert-manager)
- [CHANGE] #145 SKIP_VALIDATING_WEBHOOK env variable is replaced with configuration in OperatorConfig
- [CHANGE] #161 Additional seeds service is always created, even if no additional seeds are defined in the spec
- [CHANGE] #163 If additional-seeds-service includes an IP address with targetRef, do not remove it
- [ENHANCEMENT] #173 Add ListKeyspace function to httphelper
- [ENHANCEMENT] #120 Update operator-sdk and modify the project to use newer kubebuilder v3 structure instead of v1
- [ENHANCEMENT] #145 Operator can be configured with configuration objects from ConfigMap
- [ENHANCEMENT] #146 system-logger and cass-operator base is now UBI8-micro
- [ENHANCEMENT] #180 Add Kustomize components to improve installation experience and customization options
- [BUGFIX] #162 Affinity labels defined at rack-level should have precedence over DC-level ones
- [BUGFIX] #120 Fix ResourceVersion conflict in CheckRackPodTemplate
- [BUGFIX] #141 #110 Force update of HostId after pod replace process to avoid stale HostId
- [BUGFIX] #139 Bundle creation had some bugs after #120
- [BUGFIX] #134 Fix cluster-wide-installation to happen with Kustomize also, from config/cluster
- [BUGFIX] #103 Fix upgrade of StatefulSet, do not change service name
- [CHANGE] #1 Repository move
- [CHANGE] #19 Remove internode_encryption_test
- [CHANGE] #12 Remove Reaper sidecar integration
- [CHANGE] #8 Reduce dependencies of apis/CassandraDatacenter
- [FEATURE] #14 Override PodSecurityContext
- [FEATURE] #18 Allow DNS lookup by pod name
- [FEATURE] #27 Upgrade to Cassandra 4.0-RC1
- [FEATURE] #293 Add custom labels and annotations for services (see datastax/cass-operator#293)
- [FEATURE] #232 Use hostnames and DNS lookups for AdditionalSeeds (see datastax/cass-operator#293)
- [FEATURE] #28 Make tolerations configurable
- [FEATURE] #13 Provide server configuration with a secret
- [ENHANCEMENT] #7 Add CreateKeyspace and AlterKeyspace functions to httphelper
- [ENHANCEMENT] #20 Include max_direct_memory in examples
- [ENHANCEMENT] #16 Only build images once during integration tests
- [ENHANCEMENT] #1 Replace system-logger with a custom built image, enabling faster shutdown
Features:
- Upgrade to Go 1.14 #347
- Add support for specifying additional PersistentVolumeClaims #327
- Add support for specifying rack labels #292
Bug fixes:
- Retry decommission to prevent cluster from getting stuck in decommission state #356
- Set explicit tag for busybox image #339
- Incorrect volume mounts are created when adding an init container with volume mounts #309
Docs/tests:
- Introduce integration with Go Report Card #346
- Add more examples for running integration tests #338
Bug fixes:
- Fixed reconciling logic in VMware k8s environments #361 #342
- Retry decommission if worker nodes are too slow to start it #356
Features:
- Allow configuration of the system.log tail-er #311
- Update Kubernetes support to cover 1.15 to 1.19 #296 #304
- Specify more named ports for advanced workloads, and additional ports for ClusterIP service #289 #291
- Support WATCH_NAMESPACE=* to watch all namespaces #286
- Support arbitrary Cassandra and DSE versions, using a regex for validation #271
- Support running cassandra as a non-root cassandra user #275
- Always gracefully drain Cassandra nodes before pod termination, and make terminationGracePeriodSeconds configurable #269
- Add canaryUpgradeCount to support canary upgrade of a single node #258
- Support merging all user customizations into the Cassandra podTemplateSpec #263
- DSE 6.8.4 support #257
- Add arm64 support #238
- Support safely scaling down a datacenter, decommissioning Cassandra nodes #242 #265
- Add support to override the default registry for all container images #228
- Better helm chart support for branch / master builds on private Docker registries #236
- Support for specific reconciliation logic in VMware k8s environments #204 #206 #203 #224 #259 #288
Bug fixes:
- Increase default init container CPU for better startup performance #261
- Re-enable the most common quiet period #253
- Added label to all-pods service to narrow metrics scrape selection #277
Docs/tests:
- Add test for infinite reconcile #220
- Added keys from datastax/charts and updated README #221
- Support integration tests with k3d v3 #248
- Default to KIND for integration tests #252
- Run oss/dse integration smoke tests in github workflow #267
Features:
Bug fixes:
- Fix for enabling DSE advanced workloads #230
Features:
- Cassandra 3.11.7 support #209
- DSE 6.8.2 support #207
- Configurable resource requests and limits for init and system-logger containers. #184
- Add quietPeriod and observedGeneration to the status #190
- Update config builder init container to 1.0.2 #193
- Host network support #186
- Helm chart option for cluster-scoped install #182
- Create JKS for internode encryption #156
- Headless ClusterIP service for additional seeds #175
- Operator managed NodePort service #177
- Experimental ability to run DSE advanced workloads #158
- More validation logic in the webhook #165
Bug fixes:
- Fix watching CassDC to not trigger on status update #212
- Enumerate more container ports #200
- Resuming a stopped CassDC should not use the ScalingUp condition #198
- Idiomatic usage of the term "internode" #197
- First-seed-in-the-DC logic should respect additionalSeeds #180
- Use the additional seeds service in the config #189
- Fix operator so it can watch multiple or all namespaces #173
Docs/tests:
- Encryption documentation #196
- Fix link to sample-cluster-sample-dc.yaml #191
- Kong Ingress Documentation #160
- Adding AKS storage example #164
- Added ingress documentation and sample client application to docs #140
- Add DSE 6.8.1 support, and update to config-builder 1.0.1 #139
- Experimental support for Cassandra Reaper running in sidecar mode #116
- Support using RedHat universal base image containers #95
- Provide an easy way to specify additional seeds in the CRD #136
- Unblocking Kubernetes 1.18 support #132
- Bump version of Management API sidecar to 0.1.5 #129
- No need to always set LastRollingRestart status #124
- Set controller reference after updating StatefulSets, makes sure StatefulSets are cleaned up on delete #121
- Use the PodIP for Management API calls #112
- Watch secrets to trigger reconciling user and password updates #109
- Remove NodeIP from status #96
- Add ability to specify additional Cassandra users in CassandraDatacenter #94
- Improve validation for webhook configuration #103
- Support for several k8s versions in the helm chart #97
- Ability to roll back a broken upgrade / configuration change #85
- Mount root as read-only and temp dir as memory emptyvol #86
- Fix managed-by label #84
- Add sequence diagrams #90
- Add PodTemplateSpec in CassDC CRD spec, which allows defining a base pod template spec #67
- Support testing with k3d #79
- Add logging of all events for more reliable retrieval #76
- Update to Operator SDK v0.17.0 #78
- Update Cassandra images to include metric-collector-for-apache-cassandra (MCAC) #81
- Run data cleanup after scaling up a datacenter #80
- Requeue after the last node has its node-state label set to Started during cluster creation #77
- Remove delete verb from validating webhook #75
- Add conditions to CassandraDatacenter status #50
- Better support and safeguards for adding racks to a datacenter #59
- #27 Added a helm chart to ease installing.
- #23 #37 #46 Added a validating webhook for CassandraDatacenter.
- #43 Emit more events when reconciling a CassandraDatacenter.
- #47 Support `nodeSelector` to pin database pods to labelled k8s worker nodes.
- #22 Refactor towards less code listing pods.
- Several integration tests added.
- Project renamed to `cass-operator`.
- KO-281 Node replace added.
- KO-310 The operator will work to revive nodes that fail readiness for over 10 minutes by deleting pods.
- KO-317 Rolling restart added.
- KO-83 Stop the cluster more gracefully.
- KO-329 API version bump to v1beta1.
- KO-146 Create a secret for superuser creation if one is not provided.
- KO-288 The operator can provision Cassandra clusters using images from https://github.com/datastax/management-api-for-apache-cassandra and the primary CRD the operator works on is a `v1alpha2` `cassandra.datastax.com/CassandraDatacenter`
- KO-210 Certain `CassandraDatacenter` inputs were not rolled out to pods during rolling upgrades of the cluster. The new process considers everything in the statefulset pod template.
- KO-276 Greatly improved integration tests on real KIND / GKE Kubernetes clusters using Ginkgo.
- KO-223 Watch fewer Kubernetes resources.
- KO-232 Following best practices for assigning seed nodes during cluster start.
- KO-92 Added a container that tails the system log.
- KO-190 Fix bug introduced in v0.4.0 that prevented scaling up or deleting datacenters.
- KO-177 Create a headless service that includes pods that are not ready. While this is not useful for routing CQL traffic, it can be helpful for monitoring infrastructure like Prometheus that would like to attempt to collect metrics from pods even if they are unhealthy, and which can tolerate connection failure.
- KO-97 Faster cluster deployments
- KO-123 Custom CQL super user. Clusters can now be provisioned without the publicly known super user `cassandra` and publicly known default password `cassandra`.
- KO-42 Preliminary support for DSE upgrades
- KO-87 Preliminary support for two-way SSL authentication to the DSE management API. At this time, the operator does not automatically create certificates.
- KO-116 Fix pod disruption budget calculation. It was incorrectly calculated per-rack instead of per-datacenter.
- KO-129 Provide `allowMultipleNodesPerWorker` parameter to enable testing on small k8s clusters.
- KO-136 Rework how DSE images and versions are specified.
- Initial labs release.