Bug 2052393 - Failed to scaleup RHEL machine against OVN cluster due to jq tool is required by configure-ovs.sh
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.11.0
Assignee: Jaime Caamaño Ruiz
QA Contact: Ross Brattain
URL:
Whiteboard:
Depends On:
Blocks: 2052600
Reported: 2022-02-09 07:46 UTC by Yunfei Jiang
Modified: 2022-08-10 10:49 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The jq package is missing.
Consequence: Scale-up of a cluster with RHEL nodes fails on node deployment.
Fix: Install the jq package on deployment.
Result: Scale-up of a cluster with RHEL nodes succeeds.
Clone Of:
Environment:
Last Closed: 2022-08-10 10:48:33 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift openshift-ansible pull 12370 | 0 | None | open | Bug 2052393: Install jq package required by configure-ovs | 2022-02-09 12:17:08 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 10:49:09 UTC

Description Yunfei Jiang 2022-02-09 07:46:17 UTC
Description of problem:

configure-ovs.sh is failing because jq is not installed on the RHEL machine.

From must-gather logs:
Feb 08 11:10:06.074973 ip-10-0-53-215 configure-ovs.sh[1267]: ++ ip -j a show dev ens5
Feb 08 11:10:06.075928 ip-10-0-53-215 configure-ovs.sh[1267]: ++ jq '.[0].addr_info | map(. | select(.family == "inet")) | length'
Feb 08 11:10:06.079449 ip-10-0-53-215 configure-ovs.sh[1267]: + num_ipv4_addrs=1
Feb 08 11:10:06.080303 ip-10-0-53-215 configure-ovs.sh[1267]: + '[' 1 -gt 0 ']'
Feb 08 11:10:06.080303 ip-10-0-53-215 configure-ovs.sh[1267]: + extra_if_brex_args+='ipv4.may-fail no '
Feb 08 11:10:06.082385 ip-10-0-53-215 configure-ovs.sh[1267]: ++ ip -j a show dev ens5
Feb 08 11:10:06.083353 ip-10-0-53-215 configure-ovs.sh[1267]: ++ jq '.[0].addr_info | map(. | select(.family == "inet6" and .scope != "link")) | length'
Feb 08 11:10:06.085296 ip-10-0-53-215 configure-ovs.sh[1267]: + num_ip6_addrs=0
Feb 08 11:10:06.086110 ip-10-0-53-215 configure-ovs.sh[1267]: + '[' 0 -gt 0 ']'
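
The two jq filters in the trace above count the addresses of a given family on the interface (the IPv6 filter also skips link-scope addresses). As a rough illustration of what that step computes, here is a minimal Python sketch. It assumes the JSON shape that `ip -j a show` emits (a list of interface objects, each with an `addr_info` list); the function name `count_addrs` and the sample data are hypothetical and are not taken from configure-ovs.sh.

```python
import json


def count_addrs(ip_json: str, family: str, skip_link_scope: bool = False) -> int:
    """Count addresses of `family` on the first interface of `ip -j a show` output."""
    interfaces = json.loads(ip_json)
    addr_info = interfaces[0].get("addr_info", [])
    return sum(
        1
        for addr in addr_info
        if addr.get("family") == family
        and not (skip_link_scope and addr.get("scope") == "link")
    )


# Hypothetical, trimmed sample resembling `ip -j a show dev ens5` output.
SAMPLE = json.dumps([{
    "ifname": "ens5",
    "addr_info": [
        {"family": "inet", "local": "10.0.53.215", "prefixlen": 20, "scope": "global"},
        {"family": "inet6", "local": "fe80::1", "prefixlen": 64, "scope": "link"},
    ],
}])

# Mirrors the two variables seen in the trace.
num_ipv4_addrs = count_addrs(SAMPLE, "inet")
num_ip6_addrs = count_addrs(SAMPLE, "inet6", skip_link_scope=True)
print(num_ipv4_addrs, num_ip6_addrs)
```

With this sample the counts match the values in the trace (one IPv4 address, no non-link-local IPv6 address).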


> oc get node -owide
NAME                                        STATUS     ROLES    AGE   VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-50-115.us-east-2.compute.internal   Ready      master   19h   v1.23.3+d99c04f   10.0.50.115   <none>        Red Hat Enterprise Linux CoreOS 410.84.202202070040-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.23.0-112.rhaos4.10.gitb527000.el8
ip-10-0-50-155.us-east-2.compute.internal   Ready      worker   18h   v1.23.3+d99c04f   10.0.50.155   <none>        Red Hat Enterprise Linux CoreOS 410.84.202202070040-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.23.0-112.rhaos4.10.gitb527000.el8
ip-10-0-54-39.us-east-2.compute.internal    NotReady   worker   12h   v1.23.3+d99c04f   10.0.54.39    <none>        Red Hat Enterprise Linux 8.5 (Ootpa)                            4.18.0-348.12.2.el8_5.x86_64   cri-o://1.23.0-112.rhaos4.10.gitb527000.el8
ip-10-0-55-224.us-east-2.compute.internal   Ready      worker   18h   v1.23.3+d99c04f   10.0.55.224   <none>        Red Hat Enterprise Linux CoreOS 410.84.202202070040-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.23.0-112.rhaos4.10.gitb527000.el8
ip-10-0-57-184.us-east-2.compute.internal   NotReady   worker   12h   v1.23.3+d99c04f   10.0.57.184   <none>        Red Hat Enterprise Linux 8.5 (Ootpa)                            4.18.0-348.12.2.el8_5.x86_64   cri-o://1.23.0-112.rhaos4.10.gitb527000.el8
ip-10-0-59-38.us-east-2.compute.internal    Ready      master   19h   v1.23.3+d99c04f   10.0.59.38    <none>        Red Hat Enterprise Linux CoreOS 410.84.202202070040-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.23.0-112.rhaos4.10.gitb527000.el8
ip-10-0-66-173.us-east-2.compute.internal   Ready      worker   18h   v1.23.3+d99c04f   10.0.66.173   <none>        Red Hat Enterprise Linux CoreOS 410.84.202202070040-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.23.0-112.rhaos4.10.gitb527000.el8
ip-10-0-67-199.us-east-2.compute.internal   Ready      master   19h   v1.23.3+d99c04f   10.0.67.199   <none>        Red Hat Enterprise Linux CoreOS 410.84.202202070040-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.23.0-112.rhaos4.10.gitb527000.el8

> oc get co network -oyaml
<--SNIP-->
status:
  conditions:
  - lastTransitionTime: "2022-02-07T09:01:00Z"
    status: "False"
    type: ManagementStateDegraded
  - lastTransitionTime: "2022-02-07T15:50:20Z"
    message: |-
      DaemonSet "openshift-ovn-kubernetes/ovn-ipsec" rollout is not making progress - last change 2022-02-07T15:49:32Z
      DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-dft7c is in CrashLoopBackOff State
      DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-n4kp4 is in CrashLoopBackOff State
      DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2022-02-07T15:49:32Z
    reason: RolloutHung
    status: "True"
    type: Degraded
<--SNIP-->

> oc get pod -n openshift-ovn-kubernetes  | grep -v "Running" | grep -v "Completed"
NAME                   READY   STATUS             RESTARTS          AGE
ovn-ipsec-44b6s        0/1     Init:Error         107 (7m15s ago)   12h
ovn-ipsec-x926k        0/1     Init:0/1           107 (6m44s ago)   12h
ovnkube-node-dft7c     4/5     CrashLoopBackOff   148 (4m37s ago)   12h
ovnkube-node-n4kp4     4/5     CrashLoopBackOff   148 (3m32s ago)   12h

> oc logs -n openshift-ovn-kubernetes ovnkube-node-n4kp4 -c ovnkube-node
I0208 03:55:10.638403  346842 ovs.go:207] exec(1): stdout: ""
I0208 03:55:10.638434  346842 ovs.go:208] exec(1): stderr: ""
I0208 03:55:10.638454  346842 ovs.go:204] exec(2): /usr/bin/ovs-vsctl --timeout=15 -- clear bridge br-int netflow -- clear bridge br-int sflow -- clear bridge br-int ipfix
I0208 03:55:10.663033  346842 ovs.go:207] exec(2): stdout: ""
I0208 03:55:10.663060  346842 ovs.go:208] exec(2): stderr: ""
I0208 03:55:10.670858  346842 node.go:386] Node ip-10-0-57-184.us-east-2.compute.internal ready for ovn initialization with subnet 10.130.2.0/23
I0208 03:55:10.670890  346842 ovs.go:204] exec(3): /usr/bin/ovn-sbctl --private-key=/ovn-cert/tls.key --certificate=/ovn-cert/tls.crt --bootstrap-ca-cert=/ovn-ca/ca-bundle.crt --db=ssl:10.0.50.115:9642,ssl:10.0.59.38:9642,ssl:10.0.67.199:9642 --timeout=15 --columns=up list Port_Binding
I0208 03:55:10.706348  346842 ovs.go:207] exec(3): stdout: "up                  : false\n\nup                  : false\n\nup                  : true\n\nup                  : true\n\nup                  : false\n\nup                  : false\n\nup                  

<--SNIP-->

I0208 03:55:10.706525  346842 ovs.go:208] exec(3): stderr: ""
I0208 03:55:10.706542  346842 node.go:315] Detected support for port binding with external IDs
I0208 03:55:10.706654  346842 ovs.go:204] exec(4): /usr/bin/ovs-vsctl --timeout=15 -- --if-exists del-port br-int k8s-ip-10-0-57- -- --may-exist add-port br-int ovn-k8s-mp0 -- set interface ovn-k8s-mp0 type=internal mtu_request=8855 external-ids:iface-id=k8s-ip-10-0-57-184.us-east-2.compute.internal
I0208 03:55:10.731323  346842 ovs.go:207] exec(4): stdout: ""
I0208 03:55:10.731348  346842 ovs.go:208] exec(4): stderr: ""
I0208 03:55:10.731365  346842 ovs.go:204] exec(5): /usr/bin/ovs-vsctl --timeout=15 --if-exists get interface ovn-k8s-mp0 mac_in_use
I0208 03:55:10.756306  346842 ovs.go:207] exec(5): stdout: "\"ba:e9:29:b9:49:ae\"\n"
I0208 03:55:10.756329  346842 ovs.go:208] exec(5): stderr: ""
I0208 03:55:10.756351  346842 ovs.go:204] exec(6): /usr/bin/ovs-vsctl --timeout=15 set interface ovn-k8s-mp0 mac=ba\:e9\:29\:b9\:49\:ae
I0208 03:55:10.780505  346842 ovs.go:207] exec(6): stdout: ""
I0208 03:55:10.780529  346842 ovs.go:208] exec(6): stderr: ""
I0208 03:55:10.818558  346842 gateway_init.go:261] Initializing Gateway Functionality
I0208 03:55:10.818801  346842 gateway_localnet.go:131] Node local addresses initialized to: map[10.0.57.184:{10.0.48.0 fffff000} 10.130.2.2:{10.130.2.0 fffffe00} 127.0.0.1:{127.0.0.0 ff000000} ::1:{::1 ffffffffffffffffffffffffffffffff} fe80::8e:3dff:fe6a:8ce4:{fe80:: ffffffffffffffff0000000000000000} fe80::acc3:8dff:fe0a:26db:{fe80:: ffffffffffffffff0000000000000000} fe80::b8e9:29ff:feb9:49ae:{fe80:: ffffffffffffffff0000000000000000}]
I0208 03:55:10.818994  346842 helper_linux.go:74] Found default gateway interface eth0 10.0.48.1
F0208 03:55:10.819075  346842 ovnkube.go:133] could not find IP addresses: failed to lookup link br-ex: Link not found


OCP Version:
4.10.0-0.nightly-2022-02-07-032308

How reproducible:
Always

Steps to Reproduce:
1. Create an OVN cluster
2. Scale up a RHEL machine

Actual results:
The new RHEL machines are not ready.

Expected results:
Scale-up finishes successfully.

Suggestion:
Avoid using the jq tool in the configure-ovs.sh script,
or update the documentation to mention that the jq tool is required for scaling up RHEL machines.

Additional info:
After installing the jq tool on the RHEL machine before the scale-up, the scale-up process finished successfully.
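
The workaround above amounts to making sure jq is present on the host before scale-up starts. A minimal pre-flight check along those lines could look like the sketch below. It assumes it is run on the RHEL machine itself; the function name `missing_tools` is hypothetical, and the tool list contains only the two commands visible in the trace (`ip` and `jq`), so it may not be complete.

```python
import shutil


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Assumed tool list: only the commands that appear in the configure-ovs.sh trace.
missing = missing_tools(["ip", "jq"])
if missing:
    print("missing required tools:", ", ".join(missing))
else:
    print("all required tools present")
```

Running such a check before scale-up would surface the missing package early instead of leaving the node NotReady.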

Comment 2 Jaime Caamaño Ruiz 2022-02-09 13:57:00 UTC
Setting back to blocker+ to respect the initial assessment made by @yunjiang

Comment 8 errata-xmlrpc 2022-08-10 10:48:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

