Bug 1725220
| Summary: | CNO stuck in Progressing=True as it doesn't refresh Multus DS state on startup | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Vadim Rutkovsky <vrutkovs> |
| Component: | Networking | Assignee: | Alexander Constantinescu <aconstan> |
| Status: | CLOSED ERRATA | QA Contact: | zhaozhanqi <zzhao> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.1.z | CC: | aos-bugs, bbennett, danw, weliang |
| Target Milestone: | --- | | |
| Target Release: | 4.2.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-10-16 06:32:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
|
Description
Vadim Rutkovsky
2019-06-28 17:51:33 UTC
It's not just multus. The problem is that the CNO doesn't ensure that the operator status is correct when it starts up; it only updates the status when something changes while the CNO is running. So the order of events is:

1. A multus pod is killed.
2. The multus daemonset updates to reflect that a multus pod is missing.
3. The CNO sees the daemonset change and updates the operator state to Progressing.
4. The CNO is killed.
5. The multus pod is restarted, and the multus daemonset updates to say it's OK.
6. The CNO is restarted and does nothing.
7. (5 minutes later) The CNO does a full resync, sees that nothing has changed, and does nothing.
8. (Eventually) another multus pod is killed and comes back, and the CNO finally fixes the operator status.

I think Alexander fixed this. Over to him to verify and close.

Yes, this has been fixed with the PR: https://github.com/openshift/cluster-network-operator/pull/232. I am setting the bug to MODIFIED for QA testing.

Tested and verified in v4.2.0-0.ci-2019-07-30-115127: the CNO no longer gets stuck in Progressing=True.

```
[root@dhcp-41-193 ~]# oc get co network
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
network             False       True          False      43s
[root@dhcp-41-193 ~]# oc get co network
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
network             False       True          False      63s
[root@dhcp-41-193 ~]# oc get co network
NAME      VERSION                        AVAILABLE   PROGRESSING   DEGRADED   SINCE
network   4.2.0-0.ci-2019-07-30-115127   True        False         False      2s
[root@dhcp-41-193 ~]# oc get co network
NAME      VERSION                        AVAILABLE   PROGRESSING   DEGRADED   SINCE
network   4.2.0-0.ci-2019-07-30-115127   True        False         False      14s
[root@dhcp-41-193 ~]# oc get co network
NAME      VERSION                        AVAILABLE   PROGRESSING   DEGRADED   SINCE
network   4.2.0-0.ci-2019-07-30-115127   True        False         False      95s
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922
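The failure mode described above can be illustrated with a minimal sketch (the type and function names here are illustrative, not taken from the real cluster-network-operator code): an operator that recomputes its Progressing condition only on watch events keeps a stale condition across a restart, while one that re-reads the DaemonSet state at startup immediately converges.

```go
package main

import "fmt"

// DaemonSetState is a toy model of the observed multus DaemonSet.
type DaemonSetState struct {
	Desired, Ready int
}

// Operator caches a Progressing condition. With syncOnStartup=false it
// models the pre-fix behavior: the condition is only recomputed when a
// watch event arrives while the operator is running.
type Operator struct {
	Progressing   bool
	syncOnStartup bool
}

// handleEvent recomputes the condition from the observed DaemonSet.
func (o *Operator) handleEvent(ds DaemonSetState) {
	o.Progressing = ds.Ready < ds.Desired
}

// restart models the operator pod being killed and coming back. Without a
// startup sync, the stale cached condition survives the restart.
func (o *Operator) restart(current DaemonSetState) {
	if o.syncOnStartup {
		o.handleEvent(current)
	}
}

func main() {
	for _, fixed := range []bool{false, true} {
		op := &Operator{syncOnStartup: fixed}
		// Steps 1-3: a multus pod is killed, the DS reports 2/3 ready,
		// the operator sets Progressing=true.
		op.handleEvent(DaemonSetState{Desired: 3, Ready: 2})
		// Steps 4-6: the operator is killed; multus recovers (3/3)
		// while the operator is down; the operator restarts.
		op.restart(DaemonSetState{Desired: 3, Ready: 3})
		fmt.Printf("syncOnStartup=%v Progressing=%v\n", fixed, op.Progressing)
	}
}
```

With `syncOnStartup=false` the program ends with Progressing=true even though the DaemonSet is healthy, which is exactly the stuck state QA observed; with `syncOnStartup=true` the condition is corrected on restart.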