> The CSO status controller is repeatedly getting "context canceled" errors, though:

"context canceled" is a very generic error that refers to a client timeout; it has nothing to do with etcd directly.

> Nearly 1/3 of the messages are this error from within the etcd operator are this context canceled.

The only peculiar thing to me is why only the vsphere problem detector is reporting as degraded. While I agree this chatter is distracting, it is not the root cause; etcd, the operand, is running fine.

> dns 4.6.18 True False False 6d
> machine-config 4.6.18 True False False 5h3m
> network 4.6.18 True False False 8h57m

I think you want to understand this issue first. Why are MCO, network, and dns failing to upgrade? Perhaps the machine-config-daemon logs can tell us. I am moving this to MCO because I am curious why, given that MCO is still on the old version in the middle of an upgrade, the operator is not reporting Degraded, or at a minimum Progressing.
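To make the question above concrete, here is a minimal sketch that flags operators which are still on an old version yet report neither Progressing nor Degraded. It assumes the quoted rows use the column order name, version, Available, Progressing, Degraded, since. The helper names (`parse_rows`, `silently_stale`) and the target version string are hypothetical, chosen only for illustration; the thread does not name an exact target release.

```python
from dataclasses import dataclass


@dataclass
class OperatorStatus:
    name: str
    version: str
    available: bool
    progressing: bool
    degraded: bool
    since: str


def parse_rows(text):
    """Parse whitespace-separated status rows (assumed six columns each)."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip a header row or anything that does not have six columns.
        if len(fields) != 6 or fields[0] == "NAME":
            continue
        name, version, avail, prog, degr, since = fields
        rows.append(
            OperatorStatus(
                name, version, avail == "True", prog == "True", degr == "True", since
            )
        )
    return rows


def silently_stale(rows, target_version):
    """Names of operators not at target_version that report neither
    Progressing nor Degraded."""
    return [
        r.name
        for r in rows
        if r.version != target_version and not r.progressing and not r.degraded
    ]


# The three rows quoted in the comment above.
SAMPLE = """
dns              4.6.18   True   False   False   6d
machine-config   4.6.18   True   False   False   5h3m
network          4.6.18   True   False   False   8h57m
"""

if __name__ == "__main__":
    # "4.7.0" is a placeholder target version, not taken from the thread.
    print(silently_stale(parse_rows(SAMPLE), "4.7.0"))
```

With the quoted rows, all three operators are flagged, which is the situation being asked about: they lag the upgrade target without signalling it through their conditions.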
Created attachment 1762672 [details] mcodump
I don't think this is related to MCO. CVO is just following the upgrade order, and those operators (dns, MCO, network) are simply the next ones in the list once the storage operator issue is solved. The storage operator seemed to be failing because of the etcd storage spike, and it is now blocking the rest of the upgrade. I think we're starting to run in circles because the root cause is not clear, but in my humble opinion a workaround from storage/CVO should be the aim instead. Let me know if you need anything.
I cleared blocker+ because the issue affects all existing 4.7.z [1]. While updating within 4.7.z does introduce some of the triggering problem-detector interruptions, the workaround of setting storage Unmanaged should help folks who need to resolve the Degraded condition before they can update to a 4.7.z with the fix.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1939555#c3
Verified with: 4.7.0-0.nightly-2021-03-25-013802
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.5 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:1005