Created attachment 1824629 [details]
KS - Green interval indicates an operator going Degraded (red, yellow and blue intervals correspond to master nodes getting upgraded)

Description of problem:

Checking the last 115 jobs from https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-aws-upgrade/ ("1433654493412593664", "1433695382893760512", "1433703446451589120", "1433737883625197568", "1433745275154862080", "1433780536173662208", "1433788182586986496", "1433819059568250880", "1433839260707852288", "1433856995122745344", "1433877723872235520", "1433903880265011200", "1433926818208944128", "1433941032432570368", "1433964704052547584", "1433985471888756736", "1434007529066598400", "1434029702175002624", "1434051521623887872", "1434073950928769024", "1434095087838564352", "1434114952909557760", "1434143394703085568", "1434167027945181184", "1434184382335160320", "1434205390085558272", "1434229565730852864", "1434247354847858688", "1434269713793290240", "1434294199980658688", "1434343280841068544", "1434351140564111360", "1434399506228580352", "1434409340411842560", "1434447076791422976", "1434453531875610624", "1434487353568661504", "1434529592227401728", "1434558739649662976", "1434588325590601728", "1434602912591384576", "1434637821036990464", "1434649790867574784", "1434681235531108352", "1434705644606197760", "1434721329231171584", "1434769769571028992", "1434779545566711808", "1434810854880055296", "1434932999106859008", "1435287204233482240", "1435293628325957632", "1435335775729225728", "1435369395000971264", "1435562576384626688", "1435569519127957504", "1435604448561860608", "1435622695654920192", "1435648148486754304", "1435925603495710720", "1435932899873394688", "1435966907135037440", "1436003421353152512", "1436330398035480576", "1436363496152371200", "1436404814526287872", "1436481246967369728", "1436843648141496320", "1437206040121708544", "1437347542516895744", "1437369972601917440", "1437438673279782912", "1437781648878866432", "1437843290115280896", "1437860876945199104", "1438097470620962816", "1438107499562536960", "1438147825039839232", "1438188055109308416", "1438206454417854464", "1438245936542257152", "1438308798660874240", "1438344115946262528", "1438432227674296320", "1438439719040978944", "1438482509896617984", "1438508208925708288", "1438535336526352384", "1438558204857421824", "1438585684343394304", "1438630961670524928", "1438688875617718272", "1438696431564099584", "1438753378527088640", "1438776960233771008", "1438846270826352640", "1438892506228985856", "1438914022194810880", "1438955520907022336", "1438980934777966592", "1439017459603476480", "1439045437314043904", "1439078926621085696", "1439129330985734144", "1439192224179949568", "1439257819504185344", "1439323099492257792", "1439386032209399808", "1439461541865852928", "1439526935569895424", "1439588831945822208", "1439652767672045568", "1439718235556548608", "1439781112212623360", "1439867158736670720"), KS (kube-scheduler), KCM (kube-controller-manager) and KA (kube-apiserver) go Degraded at the end of each master node upgrade.

From https://github.com/openshift/cluster-authentication-operator/blob/9efb3c1e5ac657aaa87f237d2c6aea586b7aad49/vendor/github.com/openshift/api/config/v1/types_cluster_operator.go#L161-L177:

// Degraded indicates that the operator's current state does not match its
// desired state over a period of time resulting in a lower quality of service.
// The period of time may vary by component, but a Degraded state represents
// persistent observation of a condition. ...
// ... A service should not
// report Degraded during the course of a normal upgrade

Given that the operator is going through an upgrade, reporting condition/Degraded=True is incorrect. The important piece of information here is that a "Degraded state represents persistent observation of a condition". The reported issue is not persistent, only temporary.
Created attachment 1824630 [details] KCM - Green interval indicates an operator going Degraded (red, yellow and blue intervals correspond to master nodes getting upgraded)
Created attachment 1824632 [details] KA - Green interval indicates an operator going Degraded (red, yellow and blue intervals correspond to master nodes getting upgraded)
Created attachment 1824633 [details] KS - Green interval indicates an operator going Degraded (red, yellow and blue intervals correspond to master nodes getting upgraded)
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.
More time is needed to implement the changes that guard the static pods with a PodDisruptionBudget (PDB).
Need more time to analyze why the CI tests in https://github.com/openshift/cluster-kube-apiserver-operator/pull/1275 are failing.
Verified the bug in the build below. KA, KS and KCM did not go to the Degraded state during the master node upgrade. Below is the procedure I followed to verify this.

Procedure followed:
==================
1) Install a 4.9 cluster.
2) Pause the worker node upgrade using the command:
   oc patch --type=merge --patch='{"spec":{"paused":true}}' machineconfigpool/worker
3) Upgrade the master nodes to 4.10.0-rc.0 by running the 'oc adm upgrade --to-image=<version>' command.
4) None of the KA, KCM and KS operators went to the Degraded state, and the upgrade went fine.

NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-rc.0   True        False         117m    Cluster version is 4.10.0-rc.0

Based on the above, moving the bug to the verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056