Bug 1896841

Summary: openshift-monitoring operator degraded on Azure
Product: OpenShift Container Platform
Component: Monitoring
Version: 4.5
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Target Milestone: ---
Target Release: 4.5.z
Hardware: Unspecified
OS: Unspecified
Reporter: Mohit Sheth <msheth>
Assignee: Sergiusz Urbaniak <surbania>
QA Contact: Junqi Zhao <juzhao>
Docs Contact:
CC: alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, surbania
Last Closed: 2020-11-11 16:51:40 UTC
Type: Bug

Description Mohit Sheth 2020-11-11 16:38:13 UTC
Description of problem:
Installed an OCP 4.5 cluster 3 times in the last day or so, and the monitoring operator was degraded all 3 times.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Azure IPI install
2. Check out the degraded operators
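
Step 2 can also be scripted. The following is a minimal sketch, assuming the cluster state has been dumped with `oc get clusteroperators -o json`. The helper name `degraded_operators` and the sample data are illustrative assumptions and are not taken from the affected cluster.

```python
from typing import List, Tuple


def degraded_operators(cluster_operators: dict) -> List[Tuple[str, str]]:
    """Return (name, message) for each ClusterOperator whose Degraded condition is True."""
    found = []
    for item in cluster_operators.get("items", []):
        name = item.get("metadata", {}).get("name", "")
        for cond in item.get("status", {}).get("conditions", []):
            # Condition statuses are the strings "True", "False" or "Unknown".
            if cond.get("type") == "Degraded" and cond.get("status") == "True":
                found.append((name, cond.get("message", "")))
    return found


# Hypothetical, heavily trimmed sample in the shape of a ClusterOperator list.
# The names and the message are made up for illustration.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "monitoring"},
            "status": {
                "conditions": [
                    {"type": "Available", "status": "False"},
                    {"type": "Degraded", "status": "True", "message": "example message"},
                ]
            },
        },
        {
            "metadata": {"name": "network"},
            "status": {
                "conditions": [
                    {"type": "Available", "status": "True"},
                    {"type": "Degraded", "status": "False"},
                ]
            },
        },
    ]
}

for name, message in degraded_operators(SAMPLE):
    print(f"{name}: {message}")
```

Against a live cluster, the same function could be fed the parsed JSON output of `oc get clusteroperators -o json` instead of `SAMPLE`.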

Actual results:
openshift-monitoring operator is degraded


Expected results:
openshift-monitoring is not degraded

Additional info:
Link to must-gather: http://dell-r510-01.perf.lab.eng.rdu2.redhat.com/msheth/azure-monitoring-operator-degraded/must-gather-20201106-165137.tar.xz

Comment 1 Sergiusz Urbaniak 2020-11-11 16:51:40 UTC
Thank you for the must-gather; it helped a lot in finding the root cause. Looking at the prometheus-operator logs, I see the following:

$ less ./pods/prometheus-operator-58658d8d88-n2fkp/prometheus-operator/prometheus-operator/logs/current.log
...
2020-11-06T16:35:12.93027497Z level=info ts=2020-11-06T16:35:12.93017817Z caller=operator.go:498 component=alertmanageroperator msg="resolving illegal update of Alertmanager StatefulSet" details="&StatusDetails{Name:alertmanager-main,Group:apps,Kind:StatefulSet,Causes:[]StatusCause{StatusCause{Type:FieldValueForbidden,Message:Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden,Field:spec,},},RetryAfterSeconds:0,UID:,}"
...
2020-11-06T16:35:25.246000242Z level=info ts=2020-11-06T16:35:25.245916842Z caller=operator.go:1189 component=prometheusoperator msg="resolving illegal update of Prometheus StatefulSet" details="&StatusDetails{Name:prometheus-k8s,Group:apps,Kind:StatefulSet,Causes:[]StatusCause{StatusCause{Type:FieldValueForbidden,Message:Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden,Field:spec,},},RetryAfterSeconds:0,UID:,}"
...

This was already reported in https://bugzilla.redhat.com/show_bug.cgi?id=1880646, which is itself a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1887354, and was fixed in https://github.com/openshift/cluster-monitoring-operator/pull/956 for 4.5.z.
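
To spot these entries in a must-gather without paging through the log by hand, something like the following could be used. It is a minimal sketch: the regular expression is inferred only from the two log lines quoted above, not from the prometheus-operator source, and `illegal_updates` is a hypothetical helper name.

```python
import re
from typing import Iterable, List, Tuple

# Pattern inferred from the log excerpt in this comment (an assumption,
# not a documented log format).
ILLEGAL_UPDATE = re.compile(
    r'component=(?P<component>\S+)\s+'
    r'msg="resolving illegal update of \w+ StatefulSet"\s+'
    r'details="&StatusDetails\{Name:(?P<name>[^,]+),'
)


def illegal_updates(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (operator component, StatefulSet name) for each matching log line."""
    hits = []
    for line in lines:
        match = ILLEGAL_UPDATE.search(line)
        if match:
            hits.append((match.group("component"), match.group("name")))
    return hits


# The two log lines from the excerpt, with the tail of `details` shortened.
SAMPLE_LINES = [
    '2020-11-06T16:35:12.93027497Z level=info ts=2020-11-06T16:35:12.93017817Z '
    'caller=operator.go:498 component=alertmanageroperator '
    'msg="resolving illegal update of Alertmanager StatefulSet" '
    'details="&StatusDetails{Name:alertmanager-main,Group:apps,Kind:StatefulSet,...}"',
    '2020-11-06T16:35:25.246000242Z level=info ts=2020-11-06T16:35:25.245916842Z '
    'caller=operator.go:1189 component=prometheusoperator '
    'msg="resolving illegal update of Prometheus StatefulSet" '
    'details="&StatusDetails{Name:prometheus-k8s,Group:apps,Kind:StatefulSet,...}"',
]

for component, name in illegal_updates(SAMPLE_LINES):
    print(f"{component}: {name}")
```

For a real log, the lines could be read from the `current.log` file shown in the `less` command above and passed to `illegal_updates`.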

*** This bug has been marked as a duplicate of bug 1880646 ***