Bug 1867992
| Summary: | [OVN] shared gateway does not work with RHEL worker nodes | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | zhaozhanqi <zzhao> |
| Component: | Networking | Assignee: | Jacob Tanenbaum <jtanenba> |
| Networking sub component: | ovn-kubernetes | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | anbhat, anusaxen, danw, huirwang, jtanenba, ricarril, rteague, trozet, vrutkovs, yanyang |
| Version: | 4.6 | Keywords: | TestBlocker |
| Target Milestone: | --- | | |
| Target Release: | 4.6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-27 16:27:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1871935 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Description
zhaozhanqi
2020-08-11 10:56:33 UTC
Created attachment 1711078
ovn-node-logs
Looking at your setup, your new nodes failed during the ovs-configuration service:

    Aug 11 12:31:54 zzhaoovn46-bpqgh-rhel-0 systemd[1]: Starting Configures OVS with proper host networking configuration...
    Aug 11 12:31:54 zzhaoovn46-bpqgh-rhel-0 configure-ovs.sh[1038]: + iface=
    Aug 11 12:31:54 zzhaoovn46-bpqgh-rhel-0 configure-ovs.sh[1038]: + counter=0
    Aug 11 12:31:54 zzhaoovn46-bpqgh-rhel-0 configure-ovs.sh[1038]: + '[' 0 -lt 12 ']'
    Aug 11 12:31:54 zzhaoovn46-bpqgh-rhel-0 configure-ovs.sh[1038]: ++ ip -j route show default
    Aug 11 12:31:54 zzhaoovn46-bpqgh-rhel-0 systemd[1]: ovs-configuration.service: main process exited, code=exited, status=127/n/a
    Aug 11 12:31:54 zzhaoovn46-bpqgh-rhel-0 configure-ovs.sh[1038]: ++ jq -r '.[0].dev'
    Aug 11 12:31:54 zzhaoovn46-bpqgh-rhel-0 configure-ovs.sh[1038]: /usr/local/bin/configure-ovs.sh: line 14: jq: command not found
    Aug 11 12:31:54 zzhaoovn46-bpqgh-rhel-0 systemd[1]: Failed to start Configures OVS with proper host networking configuration.
    Aug 11 12:31:54 zzhaoovn46-bpqgh-rhel-0 configure-ovs.sh[1038]: Option "-j" is unknown, try "ip -help".
    Aug 11 12:31:54 zzhaoovn46-bpqgh-rhel-0 systemd[1]: Unit ovs-configuration.service entered failed state.
    Aug 11 12:31:54 zzhaoovn46-bpqgh-rhel-0 configure-ovs.sh[1038]: + iface=
    Aug 11 12:31:54 zzhaoovn46-bpqgh-rhel-0 systemd[1]: ovs-configuration.service failed.

    sh-4.2# which jq
    which: no jq in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)

You are somehow missing jq on this node. Any idea how that is possible? Are you sure the new nodes have the right OS image?
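The trace shows two separate failures on the RHEL 7.8 node: `jq` is not installed, and the node's `ip` rejects the `-j` (JSON output) option. As a rough sketch of the "remove jq" alternative raised later in this bug, the default-route interface can be read from the plain-text output of `ip route show default`. The helper name `get_default_iface` is hypothetical (it is not from `configure-ovs.sh`), it assumes the usual `default via <gateway> dev <iface> ...` line format, and it is not necessarily the change that actually shipped.

```shell
#!/bin/bash
# Hypothetical jq-free stand-in for:
#   ip -j route show default | jq -r '.[0].dev'
# Reads `ip route show default` text on stdin and prints the word that
# follows "dev" on the first line containing it; prints nothing otherwise.
get_default_iface() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "dev") { print $(i + 1); exit }
    }
  }'
}

# Example use on a live node; stderr is silenced in case `ip` is unavailable.
iface=$(ip route show default 2>/dev/null | get_default_iface)
echo "default interface: ${iface:-<none>}"
```

Text parsing is more fragile than JSON if the route output format ever changes, but it depends on neither `jq` nor an `ip` build that supports `-j`.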
Looks like your new nodes have the wrong OS image:

    zzhaoovn46-bpqgh-master-2  Ready     master  28h  v1.19.0-rc.2+5241b27-dirty  10.0.0.7  <none>  Red Hat Enterprise Linux CoreOS 46.82.202008102140-0 (Ootpa)  4.18.0-211.el8.x86_64        cri-o://1.19.0-71.rhaos4.6.git19455e9.el8-dev
    zzhaoovn46-bpqgh-rhel-0    NotReady  worker  27h  v1.19.0-rc.2+9932f63-dirty  10.0.1.6  <none>  Red Hat Enterprise Linux Server 7.8 (Maipo)                   3.10.0-1127.18.2.el7.x86_64  cri-o://1.19.0-71.rhaos4.6.git19455e9.el7-dev

I think we should be using RHEL 8.2, not 7.8. Can someone confirm? If so, for RHEL 8.2 we need to answer the following questions:

1. Does RHEL 8.2 have jq by default? If not, that's a problem.
2. There were NetworkManager-specific fixes that went into a hotfix build for RHCOS 4.6 and are supposed to land in a later RHEL 8.2 z-stream; without them this also won't work:
   https://bugzilla.redhat.com/show_bug.cgi?id=1857775
   https://bugzilla.redhat.com/show_bug.cgi?id=1820052

RHEL 7.8 has always been supported in 4.x versions (4.3/4.4/4.5), with no issues before.

I'm not sure exactly how non-RHCOS RHEL nodes work, but it sounds like we just need to make sure jq gets installed on them. I assume there must already be infrastructure somewhere (MCO?) for ensuring that the RPMs we need are available on all nodes...

(In reply to Dan Winship from comment #7)
> I'm not sure exactly how non-RHCOS RHEL nodes work, but it sounds like we
> need to just make sure jq gets installed on them. I assume there must
> already be infrastructure somewhere (MCO?) for ensuring that the RPMs we
> need are available on all nodes...

It's BYO RHEL, so I think the user would have to include the package. I'm not sure whether the MCO can install the package. An alternative is to just remove the use of jq from the script.
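Because BYO RHEL workers may lack tools that the script expects, one way to check a node before running `configure-ovs.sh` is to probe for those tools directly. The sketch below is hypothetical: `missing_commands` and `ip_supports_json` are illustrative names, not part of openshift-ansible or the MCO, and the list of commands checked (`jq`, `ip`) is taken only from the failure trace in this bug. It reports gaps and installs nothing.

```shell
#!/bin/bash
# Hypothetical preflight helpers for a RHEL worker; names are illustrative only.

# Print each named command that is not found on PATH, one per line.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Succeed only if this node's `ip` accepts the -j (JSON) option.
# Older iproute2 builds, like the one on the RHEL 7.8 node above, reject it.
ip_supports_json() {
  ip -j route show >/dev/null 2>&1
}

missing=$(missing_commands jq ip)
if [ -n "$missing" ]; then
  echo "missing commands:"
  echo "$missing"
fi
if ! ip_supports_json; then
  echo "ip does not support -j"
fi
```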
Additionally, we need the NM OVS fixes from 8.2 backported into 7.9.z:
https://bugzilla.redhat.com/show_bug.cgi?id=1852106
https://bugzilla.redhat.com/show_bug.cgi?id=1820052

cc'ing Russel, as `jq` would need to be installed on hosts deployed with openshift-ansible.

Support packages are installed on RHEL workers based on this list:
https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node/defaults/main.yml#L20
If jq is required, it would need to be added to that list. `jq` has not previously been a requirement for any component.

We can remove the use of jq; I was going to hold off until we can verify whether we can get backports for the NM OVS bugs.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196