Bug 1641814

Summary: [free-int] prometheus operator repeatedly logging: updating statefulset failed
Product: OpenShift Container Platform
Reporter: Justin Pierce <jupierce>
Component: Monitoring
Assignee: Frederic Branczyk <fbranczy>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: high
Priority: high
Version: 3.11.0
CC: palonsor
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-06-04 10:40:48 UTC
Type: Bug
Attachments:
  statefulset content (flags: none)
  prometheus-operator deployment file (flags: none)

Description Justin Pierce 2018-10-22 20:48:00 UTC
Description of problem:

$ oc logs prometheus-operator-579779cd5c-n8jqt
E1022 20:36:04.786726       1 operator.go:278] Sync "openshift-monitoring/main" failed: updating statefulset failed: StatefulSet.apps "alertmanager-main" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden.
level=info ts=2018-10-22T20:36:21.018246178Z caller=operator.go:402 component=alertmanageroperator msg="sync alertmanager" key=openshift-monitoring/main
E1022 20:36:21.041111       1 operator.go:278] Sync "openshift-monitoring/main" failed: updating statefulset failed: StatefulSet.apps "alertmanager-main" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden.
level=info ts=2018-10-22T20:36:47.988566504Z caller=operator.go:402 component=alertmanageroperator msg="sync alertmanager" key=openshift-monitoring/main
E1022 20:36:48.008604       1 operator.go:278] Sync "openshift-monitoring/main" failed: updating statefulset failed: StatefulSet.apps "alertmanager-main" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden.
level=info ts=2018-10-22T20:37:16.90989681Z caller=operator.go:402 component=alertmanageroperator msg="sync alertmanager" key=openshift-monitoring/main
E1022 20:37:16.924442       1 operator.go:278] Sync "openshift-monitoring/main" failed: updating statefulset failed: StatefulSet.apps "alertmanager-main" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden.
level=info ts=2018-10-22T20:38:29.030985292Z caller=operator.go:402 component=alertmanageroperator msg="sync alertmanager" key=openshift-monitoring/main
E1022 20:38:29.049614       1 operator.go:278] Sync "openshift-monitoring/main" failed: updating statefulset failed: StatefulSet.apps "alertmanager-main" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden.


Version-Release number of selected component (if applicable):
v3.11.16
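Every failing sync in the log above names the same object. A minimal sketch for extracting the affected StatefulSet names from saved operator logs; the function name `failing_statefulsets` is hypothetical, and the pattern assumes the exact `StatefulSet.apps "<name>" is invalid` wording shown in these lines:

```shell
# Hypothetical helper: read prometheus-operator log lines on stdin and print
# each StatefulSet named in an "updating statefulset failed" error, once.
failing_statefulsets() {
  grep 'updating statefulset failed' |
    sed -n 's/.*StatefulSet\.apps "\([^"]*\)" is invalid.*/\1/p' |
    sort -u
}
```

For the log excerpt above this would print only `alertmanager-main`, since every error refers to that StatefulSet.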

Comment 1 Justin Pierce 2018-10-22 20:51:21 UTC
Created attachment 1496505 [details]
statefulset content

Comment 2 Junqi Zhao 2018-10-23 01:56:22 UTC
There is another error: "unable to retrieve auth token: invalid username/password"
# oc -n openshift-monitoring describe pod node-exporter-vdj4z
Events:
  Type     Reason   Age              From                                    Message
  ----     ------   ----             ----                                    -------
  Warning  Failed   1m (x2 over 1m)  kubelet, ip-172-31-49-167.ec2.internal  Error: ErrImagePull
  Normal   BackOff  1m (x2 over 1m)  kubelet, ip-172-31-49-167.ec2.internal  Back-off pulling image "registry.reg-aws.openshift.com:443/openshift3/prometheus-node-exporter:v3.11.28"
  Warning  Failed   1m (x2 over 1m)  kubelet, ip-172-31-49-167.ec2.internal  Error: ImagePullBackOff
  Normal   BackOff  1m (x2 over 1m)  kubelet, ip-172-31-49-167.ec2.internal  Back-off pulling image "registry.reg-aws.openshift.com:443/openshift3/ose-kube-rbac-proxy:v3.11.28"
  Warning  Failed   1m (x2 over 1m)  kubelet, ip-172-31-49-167.ec2.internal  Error: ImagePullBackOff
  Warning  Failed   1m (x3 over 1m)  kubelet, ip-172-31-49-167.ec2.internal  Failed to pull image "registry.reg-aws.openshift.com:443/openshift3/ose-kube-rbac-proxy:v3.11.28": rpc error: code = Unknown desc = unable to retrieve auth token: invalid username/password
  Normal   Pulling  1m (x3 over 1m)  kubelet, ip-172-31-49-167.ec2.internal  pulling image "registry.reg-aws.openshift.com:443/openshift3/prometheus-node-exporter:v3.11.28"
  Warning  Failed   1m (x3 over 1m)  kubelet, ip-172-31-49-167.ec2.internal  Failed to pull image "registry.reg-aws.openshift.com:443/openshift3/prometheus-node-exporter:v3.11.28": rpc error: code = Unknown desc = unable to retrieve auth token: invalid username/password
  Warning  Failed   1m (x3 over 1m)  kubelet, ip-172-31-49-167.ec2.internal  Error: ErrImagePull
  Normal   Pulling  1m (x3 over 1m)  kubelet, ip-172-31-49-167.ec2.internal  pulling image "registry.reg-aws.openshift.com:443/openshift3/ose-kube-rbac-proxy:v3.11.28"
**********************************************************************
# oc -n openshift-monitoring get pod -o wide | grep node-exporter-vdj4z
node-exporter-vdj4z                           0/2       ImagePullBackOff   0          4m        172.31.49.167   ip-172-31-49-167.ec2.internal   <none>
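Both images in the events above fail with the same `invalid username/password` message, which suggests a credential problem for registry.reg-aws.openshift.com:443 rather than a problem with the images themselves. A small sketch for listing the affected image references from saved `oc describe pod` output; the function name `failed_pull_images` is hypothetical and relies on the `Failed to pull image "<ref>"` wording shown here:

```shell
# Hypothetical helper: read `oc describe pod` output on stdin and print each
# image that appears in a "Failed to pull image" event, once per image.
failed_pull_images() {
  sed -n 's/.*Failed to pull image "\([^"]*\)".*/\1/p' | sort -u
}
```

The "Back-off pulling image" events are deliberately ignored; only the events that carry the registry error are matched.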

Comment 3 Junqi Zhao 2018-10-23 02:10:13 UTC
In the attached "statefulset content", I see:
registry.reg-aws.openshift.com:443/openshift3/ose-configmap-reloader:v3.11.7
registry.reg-aws.openshift.com:443/openshift3/ose-prometheus-config-reloader:v3.11.7

The other images use version v3.11.0-0.21.0.

Comment 4 Junqi Zhao 2018-10-23 02:40:53 UTC
From
# oc -n openshift-monitoring get deployment.apps/prometheus-operator -oyaml

        - --config-reloader-image=registry.reg-aws.openshift.com:443/openshift3/ose-configmap-reloader:v3.11.28
        - --prometheus-config-reloader=registry.reg-aws.openshift.com:443/openshift3/ose-prometheus-config-reloader:v3.11.28

Perhaps the version mismatch caused the problem; we should make sure the operator and the StatefulSet use the same version.
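Comments 3 and 4 compare the reloader image tags in the StatefulSet (v3.11.7) with the ones the operator is configured to use (v3.11.28). A minimal sketch of that comparison, assuming plain `name:tag` references; the helpers `image_tag` and `check_same_tag` are hypothetical, and digest references (`@sha256:...`) are not handled. Because the registry host carries a port (`:443`), only a colon in the last path component can start a tag:

```shell
# Hypothetical helper: print the tag of an image reference. The registry here
# carries a port (":443"), so only a colon in the last path component is a tag.
# Digest references (name@sha256:...) are not handled by this sketch.
image_tag() {
  last=${1##*/}
  case $last in
    *:*) printf '%s\n' "${last##*:}" ;;
    *) printf 'latest\n' ;;   # no explicit tag: the default tag is "latest"
  esac
}

# Hypothetical helper: compare the tag the operator is configured with
# (first argument, e.g. the --config-reloader-image value) against the tag
# found in the StatefulSet (second argument). Returns 1 on mismatch.
check_same_tag() {
  want=$(image_tag "$1")
  have=$(image_tag "$2")
  if [ "$want" = "$have" ]; then
    echo "ok: $want"
  else
    echo "mismatch: operator wants $want, statefulset has $have"
    return 1
  fi
}
```

For example, passing the ose-configmap-reloader references from comments 4 and 3 (operator flag first, StatefulSet image second) prints `mismatch: operator wants v3.11.28, statefulset has v3.11.7` and returns 1.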

Comment 5 Junqi Zhao 2018-10-23 02:41:31 UTC
Created attachment 1496593 [details]
prometheus-operator deployment file

Comment 6 Junqi Zhao 2018-10-24 05:55:03 UTC
see https://github.com/kubernetes/kubernetes/issues/66137

Comment 7 Frederic Branczyk 2018-10-24 20:04:33 UTC
This has been fixed in a newer version of the Prometheus Operator, so we should probably bump the version in the 3.11 release. For now, the workaround is to delete the underlying StatefulSet.
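The workaround can be sketched as a single `oc delete`, assuming the operator recreates the StatefulSet from its current spec on its next sync (which is what the workaround relies on). The function name `delete_statefulset` and the `DRY_RUN` switch are hypothetical conveniences for printing the command before running it:

```shell
# Hypothetical helper for the workaround above: delete the StatefulSet so the
# Prometheus Operator can recreate it on its next sync.
# Set DRY_RUN to any non-empty value to print the command instead of running it.
delete_statefulset() {
  ns=$1
  name=$2
  if [ -n "${DRY_RUN:-}" ]; then
    echo "oc -n $ns delete statefulset $name"
  else
    oc -n "$ns" delete statefulset "$name"
  fi
}
```

For this bug the arguments would be `openshift-monitoring` and `alertmanager-main`, the StatefulSet named in the operator errors.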

Comment 12 Junqi Zhao 2019-01-27 10:32:09 UTC
Changed back to MODIFIED, since the free-int environment is still on v3.11.69.

Comment 15 Junqi Zhao 2019-04-12 09:23:21 UTC
$ oc -n openshift-monitoring logs $(oc -n openshift-monitoring get pod | grep prometheus-operator | awk '{print $1}') | grep Forbidden
Nothing is returned.

The issue no longer occurs. Payload: 4.0.0-0.nightly-2019-04-10-182914
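The verification above relies on `grep Forbidden` printing nothing. A small variant that prints an explicit count instead (the function name `count_forbidden` is hypothetical), so a clean run shows `0` rather than empty output:

```shell
# Hypothetical helper: count lines containing "Forbidden" in operator logs
# read on stdin. grep -c prints 0 but exits 1 when nothing matches, so
# "|| true" keeps a clean log from looking like a failure.
count_forbidden() {
  grep -c 'Forbidden' || true
}
```

It would be used by piping the same `oc -n openshift-monitoring logs ...` command from the comment above into `count_forbidden`.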

Comment 17 errata-xmlrpc 2019-06-04 10:40:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758