
upgrade issue #42

Open
hooray4me opened this issue Jun 6, 2023 · 13 comments · May be fixed by #102
Labels: bug (Something isn't working) · documentation (Improvements or additions to documentation) · help wanted (Extra attention is needed)

Comments

@hooray4me

Describe the bug
Unable to upgrade from 6.0.13 to 6.4. I can't find a way to comment out HANodeName so that the server starts in standalone mode.

Version of Helm and Kubernetes: 1.27

Any suggestions?

@aeciopires aeciopires self-assigned this Aug 6, 2023
@aeciopires aeciopires added the help wanted Extra attention is needed label Aug 6, 2023
@aeciopires
Member

Hi @hooray4me!

Sorry for the late reply.

Today I published versions 4.0.0 and 4.0.1 of the chart, which contain some important changes. I recommend reading them and testing.

The HA mode of the Zabbix Server can be disabled with the following values:

zabbixServer:
  enabled: true
  replicaCount: 1

HA mode only works with two or more Zabbix Server replicas.

@IlyaPupkovs

IlyaPupkovs commented Oct 26, 2023

Today I tried to upgrade Zabbix from 6.0.9 to 6.4.7, and even with

zabbixServer:
  enabled: true
  replicaCount: 1

it still starts in HA mode:

8:20231026:124626.896 current database version (mandatory/optional): 06000000/06000043
8:20231026:124626.896 required mandatory version: 06040000
8:20231026:124626.896 mandatory patches were found
8:20231026:124626.906 cannot perform database upgrade in HA mode: all nodes need to be stopped and Zabbix server started in standalone mode for the time of upgrade.

Zabbix upgrade documentation says:
" [...] change its configuration to standalone mode by commenting out HANodeName [parameter]"
So I tried to add

    - name: "ZBX_HANODENAME"
      value:

to zabbixServer.extraEnv, but the deployment ignores it:

** Updating '/etc/zabbix/zabbix_server.conf' parameter "TLSCipherAll13": ''...removed
** Updating '/etc/zabbix/zabbix_server.conf' parameter "TLSCipherCert": ''...removed
** Updating '/etc/zabbix/zabbix_server.conf' parameter "TLSCipherCert13": ''...removed
** Updating '/etc/zabbix/zabbix_server.conf' parameter "TLSCipherPSK": ''...removed
** Updating '/etc/zabbix/zabbix_server.conf' parameter "TLSCipherPSK13": ''...removed
** Updating '/etc/zabbix/zabbix_server.conf' parameter "TLSKeyFile": 'privatekey'...added
** Updating '/etc/zabbix/zabbix_server.conf' parameter "TLSPSKIdentity": ''...removed
** Updating '/etc/zabbix/zabbix_server.conf' parameter "TLSPSKFile": ''...removed
** Updating '/etc/zabbix/zabbix_server.conf' parameter "ServiceManagerSyncFrequency": ''...removed
** Updating '/etc/zabbix/zabbix_server.conf' parameter "HANodeName": 'zabbix-services-zabbix-server-ddff74775-rhl4z'...added
** Updating '/etc/zabbix/zabbix_server.conf' parameter "NodeAddress": '10.66.34.204'...added
** Updating '/etc/zabbix/zabbix_server.conf' parameter "User": 'zabbix'...added

Changing Docker images made no difference, so I assume Helm somehow sets ZBX_HANODENAME=hostname.

P.S. Removing ZBX_HANODENAME or setting it to null had no effect.

@IlyaPupkovs

IlyaPupkovs commented Nov 3, 2023

In the end, I was able to upgrade from 6.0.9 to 6.4.8.
I used Helm chart version 4.0.2 and the image alpine-6.4-latest.

The problem was the undocumented parameter ZBX_AUTOHANODENAME, which is hardcoded into the chart, is always present on the pod, and is responsible for starting the server in HA mode.

Interestingly, I could set

- name: "ZBX_AUTOHANODENAME"
  value: ""

only when no other parameters were present in zabbixServer.extraEnv.
If any other parameter (in this case ZBX_HANODENAME) was present, the upgrade failed with this error:

client.go:428: [debug] error updating the resource "zabbix-zabbix-server":
         cannot patch "zabbix-zabbix-server" with kind Deployment: The order in patch list:
[map[name:ZBX_AUTOHANODENAME value:hostname] map[name:ZBX_AUTOHANODENAME value:] map[name:ZBX_HANODENAME value:]]
 doesn't match $setElementOrder list:
[map[name:DB_SERVER_HOST] map[name:DB_SERVER_PORT] map[name:POSTGRES_USER] map[name:POSTGRES_PASSWORD] map[name:POSTGRES_DB] map[name:ZBX_AUTOHANODENAME] map[name:ZBX_HANODENAME] map[name:ZBX_AUTOHANODENAME] map[name:ZBX_NODEADDRESS] map[name:ZBX_WEBSERVICEURL] map[name:ZBX_STARTREPORTWRITERS]]

So I ran two deployment cycles: the first with no extraEnv parameters except ZBX_AUTOHANODENAME, and, after the DB was upgraded, a second with all the usual parameters and without ZBX_AUTOHANODENAME.
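The two deployment cycles can be sketched as a values override that is used only for the first cycle. This assumes the chart's `zabbixServer.extraEnv` and `zabbixServer.replicaCount` values; the file name `upgrade-cycle1.yaml` is hypothetical, and the single replica follows the scale-down advice given elsewhere in this thread rather than anything stated in this comment.

```yaml
# upgrade-cycle1.yaml (hypothetical name): first cycle, used only until the DB upgrade is done.
zabbixServer:
  replicaCount: 1
  # Only this entry: with other extraEnv entries present, the patch-order error above occurred.
  extraEnv:
    # Empty value overrides the chart's hardcoded ZBX_AUTOHANODENAME, so no HANodeName is set
    # and the server starts in standalone mode, which Zabbix requires for the DB upgrade.
    - name: "ZBX_AUTOHANODENAME"
      value: ""
```

Because Helm replaces lists instead of merging them, passing this file after the regular one (for example `helm upgrade <release> zabbix-community/zabbix -f values.yaml -f upgrade-cycle1.yaml`) leaves only this entry in `extraEnv`. The second cycle is the same command without `-f upgrade-cycle1.yaml`, which restores the usual parameters.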

@fibbs

fibbs commented Nov 3, 2023

By design, the Zabbix server ALWAYS starts in HA mode, even with replicas set to 1. This ensures that scaling up just works, and, at least when I developed that part, it had no negative effect other than being an "HA cluster with just one node".
The issue with major version upgrades is not entirely solved yet. If I understand your post correctly, the best workaround would be to scale down to one replica, do the upgrade, then scale up again.
Or am I getting something completely wrong?

@fibbs

fibbs commented Nov 3, 2023

Sorry, I did not read carefully. So the problem is that, apparently in recent versions, the Zabbix server refuses to upgrade the database when running in HA mode. This is new to me.
Let me think about how to solve this in the most elegant way...
A first idea:
We have a job that runs in single mode and prepares the database before the "real" Zabbix server pods start; it is designed to create the database structure on a fresh installation. I am thinking of a similar solution for upgrades:

  • in the sidecars of the Zabbix Server pods, which keep them from starting while no database exists yet, add a check that detects when a major release upgrade is necessary and holds the Zabbix servers back
  • start a job that performs the upgrade, using the Zabbix Server image but running it one-shot, only to upgrade the database
  • then let the Zabbix Server(s) start
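The version check from the first bullet could look roughly like the init container below. It is a sketch under several assumptions: it reads the `mandatory` column of Zabbix's `dbversion` table, whose value is what the server log earlier in this thread prints (06000000 before the upgrade, 06040000 required by 6.4); the container name, the `postgres:16-alpine` image and the `REQUIRED_MANDATORY` variable are hypothetical; and the DB connection variables are the ones the chart already passes to the server (DB_SERVER_HOST, DB_SERVER_PORT, POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB, as listed in the patch error earlier in this thread).

```yaml
# Hypothetical init container for the Zabbix server pod (PostgreSQL backend assumed).
initContainers:
  - name: wait-for-db-schema          # hypothetical name
    image: postgres:16-alpine         # assumption: any image that ships psql works
    env:
      - name: REQUIRED_MANDATORY      # hypothetical; 06040000 is the mandatory version 6.4 requires
        value: "6040000"
      # DB_SERVER_HOST, DB_SERVER_PORT, POSTGRES_USER, POSTGRES_PASSWORD and POSTGRES_DB
      # must be passed in the same way as for the zabbix-server container.
    command:
      - /bin/sh
      - -c
      - |
        export PGPASSWORD="$POSTGRES_PASSWORD"
        # Keep looping while the query fails, returns nothing, or reports an older schema.
        until v=$(psql -h "$DB_SERVER_HOST" -p "$DB_SERVER_PORT" -U "$POSTGRES_USER" \
                    -d "$POSTGRES_DB" -tAc "SELECT mandatory FROM dbversion") \
              && [ -n "$v" ] && [ "$v" -ge "$REQUIRED_MANDATORY" ]; do
          echo "waiting for DB schema (mandatory=${v:-unknown}, need $REQUIRED_MANDATORY)"
          sleep 10
        done
```

A hardcoded required version would have to follow the server image version on every release, so in a chart it would more likely be derived from the image tag or a value.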

@IlyaPupkovs

IlyaPupkovs commented Nov 3, 2023

Yep, exactly: the Zabbix server refuses to upgrade the database when running in HA mode.
As for the solution, it sounds great if it can be implemented that way.

@fibbs

fibbs commented Nov 19, 2023

I am still in the incubation phase of finding a solution :)

@aeciopires aeciopires added the bug Something isn't working label Nov 26, 2023
@szelga

szelga commented Feb 3, 2024

For me, setting ZBX_AUTOHANODENAME to "" during the upgrade (without specifying ZBX_HANODENAME in values.yaml in any way whatsoever) did the trick. I didn't touch the other extra env variables (I use TimescaleDB, so I can't do without them).

UPD: and, of course, setting replicaCount to 1 during the upgrade.
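This single-run variant differs from the two-cycle approach above in that the other extraEnv entries stay in place. A sketch of the values used during the upgrade, assuming `ENABLE_TIMESCALEDB` stands in for whatever TimescaleDB-related variables you already set (it is an example, not taken from this comment):

```yaml
# Values used only while the DB upgrade runs (sketch).
zabbixServer:
  replicaCount: 1
  extraEnv:
    # Empty value overrides the chart's hardcoded ZBX_AUTOHANODENAME (standalone mode).
    - name: "ZBX_AUTOHANODENAME"
      value: ""
    # Example of an existing entry that stays untouched (assumption: replace with your own).
    - name: "ENABLE_TIMESCALEDB"
      value: "true"
    # ZBX_HANODENAME is deliberately not listed at all.
```

Since a list in a values override replaces the whole list, every existing extraEnv entry has to be repeated here. Once the upgrade is done, drop the ZBX_AUTOHANODENAME entry and restore the original replicaCount.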

@aeciopires aeciopires added the documentation Improvements or additions to documentation label Feb 13, 2024
@fibbs

fibbs commented Jun 19, 2024

Unfortunately, an upgrade from 6 to 7 fails (well, it doesn't actually fail, but it doesn't complete entirely) when using TimescaleDB, because timescaledb.sql must be executed again to create the newly needed hypertable:

229:20240619:083705.201 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR:  table "auditlog" is not a hypertable

I am wondering whether the best way to solve this once and for all is to create a post-install and post-upgrade hook Job that handles all database-schema-related tasks. Up to now we have had one Job, simply deployed with the chart and only taking care of initializing the database. The good thing was that it didn't need a custom image, just a bit of sed magic. I think this has to be redesigned entirely, also for future use cases, with ONE custom image taking care of:

  • creating empty database schema if none existing
  • upgrading database schema in case a major release upgrade happened
  • initializing / upgrading TimescaleDB stuff

It should be built as a custom image, or at least use an entrypoint script mounted from a ConfigMap or similar, but it should be based on the Zabbix Server image (needed for the actual upgrade of the DB schema).
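A post-install/post-upgrade hook Job along these lines might be laid out as follows. It is a minimal sketch: the resource names, the script path `/scripts/db-schema.sh` and the ConfigMap that provides it are hypothetical, and the image tag is only an example of a Zabbix Server image, which this comment asks the Job to be based on.

```yaml
# Hypothetical Helm hook Job for all DB schema tasks (create, upgrade, TimescaleDB).
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-db-schema                # hypothetical name
  annotations:
    "helm.sh/hook": post-install,post-upgrade
    # Remove the previous run before creating a new one, and clean up on success.
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 3
  template:                                          # pod metadata/spec live under template
    spec:
      restartPolicy: Never
      containers:
        - name: db-schema
          image: zabbix/zabbix-server-pgsql:alpine-6.4-latest   # example tag
          command: ["/bin/sh", "/scripts/db-schema.sh"]          # hypothetical script
          volumeMounts:
            - name: scripts
              mountPath: /scripts
      volumes:
        - name: scripts
          configMap:
            name: {{ .Release.Name }}-db-schema-scripts          # hypothetical ConfigMap
```

Pod-level fields such as `metadata` and `containers` belong under `spec.template`; putting `metadata` directly under the Job's `spec` is what produces a validation error like `unknown field "metadata" in io.k8s.api.batch.v1.JobSpec`. Also note that with `helm upgrade --wait`, Helm waits for the release's resources to become ready before running post-install/post-upgrade hooks, so server pods that are held back until this Job has run could deadlock the release.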

From my point of view, this should also fix the problem found above when the Zabbix Server runs in HA mode.

Any more comments on this? I will investigate further over the next few days.

@crowleym

crowleym commented Jul 2, 2024

@fibbs I was able to upgrade from Zabbix 6.5 to 7 with the following steps.

  1. Edit the values to scale the Zabbix Server down to replicaCount: 0 and deploy using zabbix-community/zabbix.
  2. Clone the Helm chart source and comment out the ZBX_AUTOHANODENAME config (name and value) in https://github.com/zabbix-community/helm-zabbix/blob/master/charts/zabbix/templates/deployment-zabbix-server.yaml#L142
  3. Deploy from this local clone with replicaCount: 1.
  4. Follow the container logs until the DB upgrade is complete.
  5. Log in and test.
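Step 2 of this list can be pictured as follows. This is only a sketch: the surrounding lines of `templates/deployment-zabbix-server.yaml` in the chart are not reproduced and may differ, and the name/value pair is inferred from the patch error earlier in this thread (`name:ZBX_AUTOHANODENAME value:hostname`). Steps 1 and 3 only change `zabbixServer.replicaCount` in the values (0, then 1).

```yaml
# templates/deployment-zabbix-server.yaml in the local clone (excerpt, sketch).
# With these two lines commented out, the chart no longer sets ZBX_AUTOHANODENAME,
# so the server container starts without HANodeName (standalone mode) and upgrades the DB.
env:
  # - name: ZBX_AUTOHANODENAME
  #   value: "hostname"
  # ...the other env entries rendered by the chart stay unchanged
```

Since the commented-out lines disable HA for every replica, they need to be restored before scaling back up to multiple replicas.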

Now, when scaling the server back to the original replicaCount of 3, I get the following error:

Error: UPGRADE FAILED: error validating "": error validating data: ValidationError(Job.spec): unknown field "metadata" in io.k8s.api.batch.v1.JobSpec

The same error occurs when deploying from zabbix-community/zabbix or from the local clone.

Looking into it, I see an if statement that changes what is deployed depending on replicaCount, so I will try to understand this better:
https://github.com/zabbix-community/helm-zabbix/blob/master/charts/zabbix/templates/job-init-db-schema.yaml#L1

Once I understand it, I guess a PR with the same if statement could disable HA automatically for replicaCount: 1, so that ZBX_AUTOHANODENAME is not applied.

@crowleym crowleym linked a pull request Jul 3, 2024 that will close this issue
@crowleym

crowleym commented Jul 3, 2024

When the server is started in single mode, it upgrades the DB itself automatically, so I am questioning whether the job is needed at all, if Zabbix has changed its behaviour as mentioned in a comment above.

By adding a false condition to the top of the job template, and applying ZBX_AUTOHANODENAME only when replicaCount is greater than 1, I was able to use the chart to upgrade from 6.5 to 7.
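The replicaCount-dependent condition could look roughly like this in the server Deployment's env list. It is a sketch, not the code from PR #102: the `hostname` value is taken from the patch error earlier in this thread, and `int` is the Sprig conversion function available in Helm templates.

```yaml
{{- /* Sketch: only let the server name itself as an HA node when more than one replica runs. */}}
{{- if gt (int .Values.zabbixServer.replicaCount) 1 }}
- name: ZBX_AUTOHANODENAME
  value: "hostname"
{{- end }}
```

With replicaCount: 1 the variable is omitted and the server runs standalone; raising replicaCount changes the pod spec, so the pods are rolled out again with HA enabled. The job template can be disabled in the same spirit by wrapping it in `{{- if false }}` ... `{{- end }}`. The trade-off is that the chart no longer runs in HA mode at all times, which is the behaviour it was designed around.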

I have opened PR #102 in case it helps someone else, but I cannot comment on the validity of removing the Job entirely beyond "it worked for me".

@fibbs

fibbs commented Jul 4, 2024

Thanks @crowleym, that's exactly how I have done upgrades, but it is a bit "hacky" and shouldn't have to be this way, which is why I am working on a proper solution.
I don't want this Helm chart to run the Zabbix server in "single mode", even with only one replica. We decided this back when DB upgrades also worked in HA mode, because we wanted to be able to scale up and down at any time.

I have an almost-working solution in my lab, with one or two challenges left to solve. One of them is starting a zabbix_server process that only upgrades the database schema and then stops, which I will try to achieve with a hacky "start the process in the background and loop reading its STDOUT" kind of construct. Briefly, the solution will work as follows:

  • the Zabbix server runs in HA mode, even with only one replica. I don't want to change this
  • on any Helm installation or upgrade, all Zabbix server pods will start and be held back by an init container that waits for the database not only to be available but also to have the correct version
  • an additional job is started, based on the "zabbix_server" container image, which does the magic of preparing the database and also upgrades the schema when a major release upgrade has happened

This is almost exactly how it is designed to work right now, with the following changes:

  • this "after install/upgrade job" (I will probably turn it into a post-upgrade / post-install hook in the Helm chart) gets one additional task: upgrading the DB schema, which is performed by zabbix_server
  • the init container of every Zabbix server pod will wait not only for the DB to be available but also for the right schema version

That should then work fine, without manual intervention.
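The "start the process in the background and loop reading its output" construct could be sketched as a script shipped in a ConfigMap. Everything here is an assumption: the ConfigMap name and the `REQUIRED_MANDATORY` variable are hypothetical; the script assumes `psql` is available in the server image and that `/etc/zabbix/zabbix_server.conf` has already been rendered without HANodeName (standalone mode) and with console logging; and the log line it waits for, `database upgrade fully completed`, should be checked against the actual output of your Zabbix version before relying on it.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: zabbix-db-upgrade-script        # hypothetical name
data:
  upgrade-db.sh: |
    #!/bin/sh
    set -u
    export PGPASSWORD="$POSTGRES_PASSWORD"
    # Assumes the schema exists already (fresh installs are handled separately).
    current=$(psql -h "$DB_SERVER_HOST" -p "$DB_SERVER_PORT" -U "$POSTGRES_USER" \
                -d "$POSTGRES_DB" -tAc "SELECT mandatory FROM dbversion") || exit 1
    if [ -n "$current" ] && [ "$current" -ge "$REQUIRED_MANDATORY" ]; then
      echo "schema already at $current, nothing to upgrade"
      exit 0
    fi
    LOG=/tmp/zabbix_server.log
    # Standalone server in the background; its console log goes to $LOG.
    /usr/sbin/zabbix_server --foreground -c /etc/zabbix/zabbix_server.conf >"$LOG" 2>&1 &
    pid=$!
    while kill -0 "$pid" 2>/dev/null; do
      if grep -q "database upgrade fully completed" "$LOG"; then   # assumed log line
        kill "$pid"; wait "$pid"
        cat "$LOG"
        exit 0
      fi
      sleep 5
    done
    # The server exited on its own before finishing: show why and fail the Job.
    cat "$LOG"
    exit 1
```

Checking `dbversion` first keeps the Job from hanging on releases that need no schema upgrade, since in that case the server would never print the completion line.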

Of course, it would be awesome if Zabbix themselves implemented a zabbix_server --only-upgrade-db option or something similar, so that this Job container could be less hacky. I will probably try to start a discussion with the "right people" and convince them to make our lives easier.

Stay tuned, an upgrade will come.

@spectroman

Hi @fibbs, I wonder whether you managed to raise the issue with Zabbix SIA. Is there a support ticket we could upvote?

I am facing the same problem. I don't use this Helm project (I have my own methodology with different specs), but I got stuck on the same problem.

I looked around to see whether someone had found a solution and found this ticket and something related on the Zabbix forums, but to no avail.

I came up with some ideas, but by far the best solution would be an --only-upgrade-db kind of switch provided by them.

Since I compile my own binaries and build my own images, I was thinking I could dig into the source code, extract the latest DBPATCH_VERSION (an integer), and pass it to the entrypoint, which would check with the database whether it requires an update, make the necessary changes, stop zabbix_server when that is finished, add back the HA configuration, and restart the pod...

But this is so ugly that I am not really happy pursuing it, so I might patch the Zabbix source code myself when building the image if it looks like Zabbix SIA will take a long time to release a solution.

I also think it would be beneficial to add a new status to the HA node to inform other nodes that the database is being upgraded; they would back off until the node performing the upgrade marks it finished and/or assumes the active role. That would solve the problem without an "only-upgrade-db" switch, but it would require a larger patch.

If possible, I would be glad to know the status of the conversation with Zabbix SIA and of the ticket... and then I will decide whether to go ahead with writing / using a patched zabbix_server binary.

8 participants