[Bug] grafana operator 5.6.0 -> 5.6.1 upgrade issues openshift #1399

Closed
ginokok1996 opened this issue Feb 5, 2024 · 28 comments · Fixed by #1403
Labels
bug (Something isn't working), triage/accepted (Indicates an issue or PR is ready to be actively worked on)

Comments

@ginokok1996

Describe the bug
We are unable to upgrade to grafana-operator.v.5.6.1 from grafana-operator.v.5.6.0

We have a development cluster where we have automatic upgrades for the operators enabled.
This morning our cluster tried to upgrade grafana to 5.6.1 but encountered the following error:

install strategy failed: Deployment.apps "grafana-operator-controller-manager" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/managed-by":"olm", "app.kubernetes.io/name":"grafana-operator"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
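
Background on this error: in the apps/v1 API, a Deployment's spec.selector cannot be changed after creation, so an update that alters the selector labels is rejected instead of applied. Below is a minimal Python sketch of that rule, not the API server's real validation code. The new selector is copied from the error message; the old selector and the function name are hypothetical, since the 5.6.0 selector is not shown in this issue.

```python
from typing import Optional


def selector_update_error(old_selector: dict, new_selector: dict) -> Optional[str]:
    """Return an error string if an update would change the selector, else None."""
    if old_selector != new_selector:
        return "spec.selector: field is immutable"
    return None


# Selector taken from the 5.6.1 error message above.
new_selector = {
    "matchLabels": {
        "app.kubernetes.io/managed-by": "olm",
        "app.kubernetes.io/name": "grafana-operator",
    }
}

# HYPOTHETICAL 5.6.0 selector, used only to illustrate a mismatch.
old_selector = {"matchLabels": {"control-plane": "controller-manager"}}

print(selector_update_error(old_selector, new_selector))
print(selector_update_error(new_selector, new_selector))
```

Because the existing object can never accept the new selector, the only way through is to remove the old Deployment so that a fresh one can be created.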

Version
v.5.6.0

To Reproduce

  1. Install grafana operator v.5.6.0.
  2. Approve install-plan for v.5.6.1.
  3. Check the events of the v.5.6.1 operator csv.

Expected behavior
Successful installation of v.5.6.1

Runtime (please complete the following information):

OS: Linux
Grafana Operator Version v5.6.0
Environment: Openshift 4.12.26
Deployment type: Deployed via operatorhub

@ginokok1996 ginokok1996 added the bug and needs triage labels Feb 5, 2024
@ginokok1996 ginokok1996 changed the title from "[Bug]" to "[Bug] grafana operator 4.6.1 upgrade issues openshift" Feb 5, 2024
@tkolo

tkolo commented Feb 5, 2024

The bug title is wrong; this is about the upgrade from 5.6.0 -> 5.6.1. Other than that, I can confirm I'm having the same issue.

@ginokok1996 ginokok1996 changed the title from "[Bug] grafana operator 4.6.1 upgrade issues openshift" to "[Bug] grafana operator 5.6.0 -> 5.6.1 upgrade issues openshift" Feb 6, 2024
@ginokok1996
Author

Apologies, corrected it.

@NissesSenap
Collaborator

NissesSenap commented Feb 6, 2024

Thanks for the report @ginokok1996, and sorry about the issue.

This issue was introduced in #1373
We could potentially do as suggested here: operator-framework/operator-lifecycle-manager#952 to solve this issue.

We are working on a fix.

@NissesSenap NissesSenap added the triage/accepted label and removed the needs triage label Feb 6, 2024
@ginokok1996
Author

Thanks for the fast reply @NissesSenap, happy to know a fix is in the works.

@NissesSenap
Collaborator

The fix has been applied and I have created the upstream PR to release it to OLM. Hopefully it will be merged tonight.
redhat-openshift-ecosystem/community-operators-prod#4000

@ginokok1996
Author

Thanks for the fix @NissesSenap. The only unfortunate thing is that 5.6.1 still exists on operatorhub,
meaning we can't follow the upgrade path with automatic updates on, since we can't update to 5.6.1.

We would then need to uninstall the operator and install 5.6.2 directly.

@NissesSenap
Collaborator

Can't you go from 5.6.0 manually over to 5.6.2?

@ginokok1996
Author

In OpenShift you can't; you either follow the successive patch versions or uninstall the operator and install a specific version.

@NissesSenap
Collaborator

I haven't used OCP in years, but back in the day you could change an object called something like operatorgroup: https://docs.openshift.com/container-platform/4.10/operators/understanding/olm/olm-understanding-olm.html

You should be able to set the version manually. If you can try that, it would be great, so we at least know it works.

I will try to create a PR upstream to OLM, but it won't be able to merge automatically. So I will have to get in contact with someone at redhat and normally that isn't the quickest.

@NissesSenap
Collaborator

I have created the upstream change here: redhat-openshift-ecosystem/community-operators-prod#4002
Let's see how it goes

@ginokok1996
Author

Thanks a bunch @NissesSenap

@tkolo

tkolo commented Feb 8, 2024

If someone wants a Quick And Dirty™ solution to resolve this, all you need to do is remove deployment grafana-operator-controller-manager from openshift-operators namespace. The operator will re-create it automatically and proceed with update to 5.6.1 normally. I've noticed no other side-effects in any grafana instance, then again, the cluster I tested it on is not a critical/production one so your mileage may vary.
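
The workaround above could be scripted roughly as follows. This is a sketch, assuming the `oc` CLI is on the PATH and logged in to the cluster, and that the operator lives in openshift-operators as described; the helper names are made up for illustration. It only prints the command unless dry_run is set to False.

```python
import shlex
import subprocess
from typing import List


def delete_operator_deployment_cmd(
    name: str = "grafana-operator-controller-manager",
    namespace: str = "openshift-operators",
    cli: str = "oc",  # assumption: "kubectl" should work the same way
) -> List[str]:
    """Build the command that deletes the operator's Deployment."""
    return [cli, "delete", "deployment", name, "--namespace", namespace]


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Dry run by default, so nothing is deleted until the command has been reviewed.
run(delete_operator_deployment_cmd())
```

If the operator was installed into a different namespace, pass it via the namespace argument.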

@NissesSenap
Collaborator

That is a very good point @tkolo. The only thing that will happen is that when the operator restarts, it will reconcile any changes that were made while it was gone, just as it does on any other operator restart.
So it should be fine to do in a production env.

@NissesSenap
Collaborator

What feels like my 10th try to fix this: redhat-openshift-ecosystem/community-operators-prod#4017
Hopefully the skip part will solve it.
Please give it a try @ginokok1996 when you have the time.

@ginokok1996
Author

Screenshot_20240212_075257

Not sure yet if it's something on my end, will check further.
But it now says that it's up to date even though there are new versions.

@NissesSenap
Collaborator

Well, that is worrying....
I don't know whether OLM has some cronjob that pushes updates and, if it does, when it runs. But I think you should have seen the other versions anyway.

But I have made changes to the patch flow. It was 5.6.0 -> 5.6.1 -> 5.6.2 -> 5.6.3,
and 5.6.3 now says that 5.6.1 should be skipped.
So I guess it is now 5.6.0 -> 5.6.2 -> 5.6.3; at least that was my intention.
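
To reason about the flow described above, here is a toy Python model of an upgrade graph built from `replaces` and `skips` entries. It is one simplified reading of how the two fields could interact and not OLM's actual resolver: the rule that a skipped version is never offered as a target is an assumption of this sketch, and the `alternative` metadata is hypothetical, not what was published.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Release:
    version: str
    replaces: Optional[str] = None
    skips: List[str] = field(default_factory=list)


def version_key(version: str):
    """Turn '5.6.3' into (5, 6, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def next_version(installed: str, releases: List[Release]) -> Optional[str]:
    """Pick the highest release that replaces or skips the installed one.

    ASSUMPTION of this toy model: versions listed in any `skips` entry
    are never offered as upgrade targets.
    """
    skipped = {v for r in releases for v in r.skips}
    candidates = [
        r.version
        for r in releases
        if (r.replaces == installed or installed in r.skips)
        and r.version not in skipped
    ]
    return max(candidates, key=version_key) if candidates else None


# Metadata as described in the comment above: a linear chain, with 5.6.3 skipping 5.6.1.
as_described = [
    Release("5.6.1", replaces="5.6.0"),
    Release("5.6.2", replaces="5.6.1"),
    Release("5.6.3", replaces="5.6.2", skips=["5.6.1"]),
]

# HYPOTHETICAL alternative: 5.6.2 replaces 5.6.0 directly and skips 5.6.1.
alternative = [
    Release("5.6.1", replaces="5.6.0"),
    Release("5.6.2", replaces="5.6.0", skips=["5.6.1"]),
    Release("5.6.3", replaces="5.6.2"),
]

for name, graph in (("as_described", as_described), ("alternative", alternative)):
    for installed in ("5.6.0", "5.6.1", "5.6.2"):
        print(name, installed, "->", next_version(installed, graph))
```

Under these assumptions, the described metadata leaves 5.6.0 with no upgrade target, while the alternative gives it a direct edge to 5.6.2. Whether the real catalog behaves this way has not been confirmed in this thread.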

And if this makes OLM break it will drive me nuts since that was the suggested workaround :D

@ginokok1996
Author

OLM isn't making this easy haha.
Let's give it some time, maybe it still needs to process some stuff.

@hubeadmin
Collaborator

@ginokok1996 Can you please navigate to the search tab and search for ClusterServiceVersion in the namespace in which you've installed the grafana-operator? That will give us some clue as to what's going on.

@ginokok1996
Author

Screenshot_20240212_093732
Screenshot_20240212_093725

I also reinstalled grafana 5.6.0 and restarted all the OLM pods in the lifecycle namespace.
It still says there are no updates.

@avi-biton

We are also facing the same issue in our clusters.
Automatic update from 5.6.0 fails with the same message

        install strategy failed: Deployment.apps
        "grafana-operator-controller-manager" is invalid: spec.selector: Invalid
        value:
        v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/managed-by":"olm",
        "app.kubernetes.io/name":"grafana-operator"},
        MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable

@hubeadmin
Collaborator

Hey folks, for now, can you try to delete the grafana-operator-controller-manager deployment? This should unblock the update and allow it to proceed to the latest release.
Apologies for the inconvenience.

I've tested this on one of my clusters, and it unblocked the update path. Unfortunately, those that tried to update to 5.6.2 from 5.6.1 will have to do this. 5.6.3 shouldn't have this issue (I hope)

@hubeadmin
Collaborator

OLM should be able to re-deploy the operator with a new deployment, thus avoiding the "immutability issue". It shouldn't affect your running Grafana deployments

@avi-biton

avi-biton commented Feb 12, 2024

I verified it in my development environment.
I deleted the grafana-operator-controller-manager and the operator upgraded successfully to 5.6.1 and then to 5.6.3

@ginokok1996
Author

I'm still facing the issue that when you're on version 5.6.0, OpenShift thinks it's up to date and there are no new versions.

@NissesSenap
Collaborator

There is no solution for this other than what is written here: #1399 (comment), @ginokok1996.

OLM has its limitations, and we can't stop your cluster from doing what it was doing.
The easiest way forward is to delete the old operator deployment, which will resolve it.
We have created an issue upstream about this, but a fix is unlikely to come out soon.

@ginokok1996
Author

Deleting the deployment will work for upgrading from 5.6.1 to a higher version, since the labels are then allowed to change.

However, we can't even go from 5.6.0 to 5.6.1 now.
We would need to uninstall grafana and install the latest version.

It isn't the biggest issue, but it seems like a different problem from what's stated above.

@tkolo

tkolo commented Feb 19, 2024

If you have the operator from the community-operators catalog, you can try restarting (deleting) the pods of the openshift-marketplace/community-operators deployment. That usually does the trick for me.
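
A sketch of the catalog pod restart described above. It assumes the catalog pods carry the label olm.catalogSource=community-operators, which is worth verifying with `oc get pods -n openshift-marketplace --show-labels` before running anything; the helper name is hypothetical.

```python
import shlex
from typing import List


def restart_catalog_cmd(
    catalog: str = "community-operators",
    namespace: str = "openshift-marketplace",
    cli: str = "oc",
) -> List[str]:
    """Build the command that deletes the catalog pods so they get recreated.

    ASSUMPTION: the pods are labelled olm.catalogSource=<catalog name>.
    """
    return [
        cli, "delete", "pod",
        "--namespace", namespace,
        "--selector", f"olm.catalogSource={catalog}",
    ]


# Print the command for review instead of executing it.
print(shlex.join(restart_catalog_cmd()))
```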

@NissesSenap
Collaborator

What @tkolo wrote.

Either way, there is nothing the grafana-operator maintainers can do to stop your cluster from being in a bad state. All we can do is apologize for the inconvenience and point to the workarounds.

5 participants