Bug 1805774

Summary: [4.4] Multus should not cause machine to go not ready when a default SDN is updated
Product: OpenShift Container Platform
Reporter: Douglas Smith <dosmith>
Component: Networking
Assignee: Douglas Smith <dosmith>
Networking sub component: multus
QA Contact: Weibin Liang <weliang>
Status: CLOSED ERRATA
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: aconstan, afield, danw, lmohanty, william.caban, wking
Version: 4.4
Target Milestone: ---
Target Release: 4.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1805987
Environment:
Last Closed: 2020-05-04 11:38:36 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1805987
Bug Blocks: 1805444

Description Douglas Smith 2020-02-21 14:26:35 UTC
This bug was initially created as a copy of Bug #1805444

Description of problem: This was discovered in the investigation of https://bugzilla.redhat.com/show_bug.cgi?id=1793635 (for upgrades)

The fix includes using the Multus "readinessindicatorfile".

How reproducible: During upgrades.

Comment 1 Douglas Smith 2020-02-21 16:21:57 UTC
PR @ https://github.com/openshift/multus-cni/pull/47

I think the only validation that can be done is to spin up a cluster that includes that PR, enter a pod of the Multus daemonset, and check whether the file /entrypoint.sh contains the string MULTUS_READINESS_INDICATOR_FILE. If it does, I believe this should be (mostly) verified. That code won't be exercised until the associated CNO changes merge (and they can't pass CI until that code is there, e.g. the string I mentioned).

Comment 5 Weibin Liang 2020-02-25 14:24:22 UTC
Tested and verified in 4.4.0-0.nightly-2020-02-25-085441

[root@dhcp-41-193 FILE]# oc exec -it multus-2jh6t -- cat /entrypoint.sh | grep MULTUS_READINESS_INDICATOR_FILE
MULTUS_READINESS_INDICATOR_FILE=""
    echo -e "\t--readiness-indicator-file=$MULTUS_READINESS_INDICATOR_FILE (used only with --multus-conf-file=auto)"
            MULTUS_READINESS_INDICATOR_FILE=$VALUE
      if [ ! -z "${MULTUS_READINESS_INDICATOR_FILE// }" ]; then
        READINESS_INDICATOR_FILE_STRING="\"readinessindicatorfile\": \"$MULTUS_READINESS_INDICATOR_FILE\","
[root@dhcp-41-193 FILE]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-02-25-085441   True        False         26m     Cluster version is 4.4.0-0.nightly-2020-02-25-085441
[root@dhcp-41-193 FILE]#
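
For readers unfamiliar with the conditional in the grep output above, here is a minimal sketch of what it appears to do: emit the "readinessindicatorfile" JSON fragment only when the variable is non-empty once spaces are removed. The function name readiness_fragment is hypothetical, and the sketch uses tr -d ' ' in place of the bash-only ${VAR// } expansion so that it also runs under plain sh. It is not the actual entrypoint code.

```shell
# Hypothetical helper mirroring the check seen in /entrypoint.sh.
# Prints the JSON fragment only if the value is non-empty after
# all spaces have been stripped; otherwise prints nothing.
readiness_fragment() {
  value=$1
  # Strip spaces (POSIX stand-in for bash's ${value// }).
  stripped=$(printf '%s' "$value" | tr -d ' ')
  if [ -n "$stripped" ]; then
    printf '"readinessindicatorfile": "%s",\n' "$value"
  fi
}

# Example value taken from the daemonset arguments later in this bug.
readiness_fragment "/var/run/multus/cni/net.d/10-ovn-kubernetes.conf"
# An empty (or all-space) value produces no fragment.
readiness_fragment ""
```

With the default MULTUS_READINESS_INDICATOR_FILE="" shown above, no fragment would be produced, so the option has no effect unless a value is passed in.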

Comment 6 Dan Winship 2020-02-28 13:24:01 UTC
(In reply to Weibin Liang from comment #5)
> Tested and verified in 4.4.0-0.nightly-2020-02-25-085441
> 
> [root@dhcp-41-193 FILE]# oc exec -it multus-2jh6t -- cat /entrypoint.sh |
> grep MULTUS_READINESS_INDICATOR_FILE
> MULTUS_READINESS_INDICATOR_FILE=""
>     echo -e "\t--readiness-indicator-file=$MULTUS_READINESS_INDICATOR_FILE
> (used only with --multus-conf-file=auto)"

This confirms that the option *exists* in multus, but if you look at the multus Pod spec, we weren't actually passing that option to it yet, so it had no effect.
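
A rough sketch of the stronger check implied here: rather than grepping the script, look at the arguments in the Pod or daemonset spec and see whether --readiness-indicator-file is set. The helper name readiness_indicator_path is hypothetical, and the sample input is an assumption about what the describe output looks like, with one argument per line.

```shell
# Hypothetical helper: read describe-style text on stdin and print the
# value of the first --readiness-indicator-file argument, if any.
readiness_indicator_path() {
  sed -n 's/^[[:space:]]*--readiness-indicator-file=\(.*\)$/\1/p' | head -n 1
}

# Against a live cluster (assumes a daemonset named "multus" in the
# current project):
#   oc describe daemonset.apps/multus | readiness_indicator_path

# Offline demonstration with sample text:
printf '%s\n' 'Args:' '      --multus-conf-file=auto' \
  '      --readiness-indicator-file=/var/run/multus/cni/net.d/10-ovn-kubernetes.conf' \
  | readiness_indicator_path
```

Empty output would mean the option is not being passed to Multus, which is the situation described in this comment.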

Comment 8 Stephen Cuppett 2020-03-02 12:21:44 UTC
*** Bug 1794142 has been marked as a duplicate of this bug. ***

Comment 12 Weibin Liang 2020-03-02 15:22:04 UTC
Tested and verified in 4.4.0-0.nightly-2020-03-02-081928


[root@dhcp-41-193 FILE]# oc describe daemonset.apps/multus | grep readiness-indicator-file
      --readiness-indicator-file=/var/run/multus/cni/net.d/10-ovn-kubernetes.conf

[root@dhcp-41-193 FILE]# oc describe pod multus-hfv8r | grep readiness-indicator-file
      --readiness-indicator-file=/var/run/multus/cni/net.d/10-ovn-kubernetes.conf

Comment 14 errata-xmlrpc 2020-05-04 11:38:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

Comment 15 W. Trevor King 2021-04-05 17:46:42 UTC
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1].  If you feel like this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475