Bug 2070343
Summary: | Failed to read database with dns hostname address | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | OvS team <ovs-bugzilla> |
Component: | openvswitch2.15 | Assignee: | Ilya Maximets <i.maximets> |
Status: | CLOSED ERRATA | QA Contact: | Rick Alongi <ralongi> |
Severity: | unspecified | Priority: | high |
Version: | FDP 22.A | CC: | ctrautma, hewang, i.maximets, jhsiao, pparasur, ralongi, tredaelli |
Target Release: | FDP 22.D | Hardware: | Unspecified |
OS: | Unspecified | Fixed In Version: | openvswitch2.15-2.15.0-88.el8fdp |
Last Closed: | 2022-05-27 18:14:51 UTC | | |
Description
OvS team
2022-03-30 21:15:50 UTC
* Wed Mar 30 2022 Open vSwitch CI <ovs-ci> - 2.15.0-88
- Merging upstream branch-2.15 [RH git: a03b5c62e4]
  Commit list:
  0a3867a9a9 ovsdb: raft: Fix inability to read the database with DNS host names. (#2055097)

Ilya,

Is this something that can be reproduced using openvswitch alone or does it require a layered product?

Thanks,
Rick

(In reply to Rick Alongi from comment #4)
> Is this something that can be reproduced using openvswitch alone or does it
> require a layered product?

This requires a DNS server. The sequence should be something like this:

1. Create 3 DNS names for 3 servers.
2. Start the ovsdb cluster using these names instead of IP addresses.
3. Stop one of the servers and remove its DNS record.
4. Restart the 2 remaining servers; they should continue to work.

With the issue, the 2 remaining servers will fail to start because they will not be able to resolve the name of the third server.

Reproducer/Verification steps can be found at https://bugzilla.redhat.com/show_bug.cgi?id=2055097#c16

This will be tested when FDP 22.D is ready for testing.
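Step 3 of the sequence (removing the stopped server's DNS record) can be approximated without a real DNS server when names are served from a hosts-format file. The following is only a minimal sketch under that assumption: `remove_host_entry` is a hypothetical helper, not part of Open vSwitch or OVN, and the `example.test` names in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop every line of a hosts-format file that maps
# the given name, leaving comments and all other entries untouched.
remove_host_entry() {
    local hosts_file=$1 name=$2 tmp
    tmp=$(mktemp) || return 1
    awk -v name="$name" '
        /^[[:space:]]*#/ { print; next }      # keep comment lines as-is
        {
            for (i = 2; i <= NF; i++)         # field 1 is the address
                if ($i == name) next          # skip lines naming the host
            print
        }
    ' "$hosts_file" > "$tmp" && cat "$tmp" > "$hosts_file"
    rm -f "$tmp"
}

# Example (placeholder file and name):
#   remove_host_entry ./hosts.copy db1.example.test
```

Matching on the host name rather than on the IP address avoids deleting unrelated lines that merely contain the same digits, at the cost of requiring an exact name match.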
Reproducer/Verification steps below:

# Provision three systems with RHEL-8.6

# Install ovs and ovn packages without the fix included:
yum -y install \
    http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch-selinux-extra-policy/1.0/29.el8fdp/noarch/openvswitch-selinux-extra-policy-1.0-29.el8fdp.noarch.rpm \
    http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch2.15/2.15.0/87.el8fdp/x86_64/openvswitch2.15-2.15.0-87.el8fdp.x86_64.rpm \
    http://download-node-02.eng.bos.redhat.com/brewroot/packages/ovn-2021/21.12.0/30.el8fdp/x86_64/ovn-2021-21.12.0-30.el8fdp.x86_64.rpm \
    http://download-node-02.eng.bos.redhat.com/brewroot/packages/ovn-2021/21.12.0/30.el8fdp/x86_64/ovn-2021-central-21.12.0-30.el8fdp.x86_64.rpm \
    http://download-node-02.eng.bos.redhat.com/brewroot/packages/ovn-2021/21.12.0/30.el8fdp/x86_64/ovn-2021-host-21.12.0-30.el8fdp.x86_64.rpm

# Start ovs and ovn processes:
systemctl start openvswitch
systemctl enable openvswitch
systemctl enable ovn-controller
systemctl enable ovn-northd
systemctl start ovn-controller
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642

# Additional config for ovs:
yum -y install net-tools
host_ip=$(ifconfig -a | grep inet | head -n 1 | awk '{print $2}' | tr -d "addr:")
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:${host_ip}:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=${host_ip}
systemctl restart ovn-controller

# Configure three systems:
host1=netqe9.knqe.lab.eng.bos.redhat.com
host2=netqe20.knqe.lab.eng.bos.redhat.com
host3=netqe21.knqe.lab.eng.bos.redhat.com
host1_ip=$(nslookup $host1 | grep Address | grep -v '#53' | awk '{print $NF}')
host2_ip=$(nslookup $host2 | grep Address | grep -v '#53' | awk '{print $NF}')
host3_ip=$(nslookup $host3 | grep Address | grep -v '#53' | awk '{print $NF}')

# disable DNS client on each system:
mv -f /etc/resolv.conf /etc/resolv.conf_saved

# Create empty resolv.conf:
touch /etc/resolv.conf

# Test with no DNS or /etc/hosts configured:
rm -f ./hosts.txt
echo $host1 >> ./hosts.txt
echo $host2 >> ./hosts.txt
echo $host3 >> ./hosts.txt
ping_list=$(grep -v $(hostname) ./hosts.txt)
for i in $(echo $ping_list); do
    ping -c1 $i
    if [[ $? -ne 0 ]]; then
        echo "Ping of $i failed as expected: PASS"
    else
        echo "Ping of $i should have failed: FAIL"
    fi
done

# add host info to /etc/hosts file:
echo -e "$host1_ip\t$host1" >> /etc/hosts
echo -e "$host2_ip\t$host2" >> /etc/hosts
echo -e "$host3_ip\t$host3" >> /etc/hosts

# Test with /etc/hosts file configured:
for i in $(echo $ping_list); do
    ping -c1 $i
    if [[ $? -eq 0 ]]; then
        echo "Ping of $i was successful: PASS"
    else
        echo "Ping of $i was unsuccessful: FAIL"
    fi
done

# execute on $host1:
/usr/share/ovn/scripts/ovn-ctl \
    --db-nb-addr=$host1 --db-nb-create-insecure-remote=yes \
    --db-sb-addr=$host1 --db-sb-create-insecure-remote=yes \
    --db-nb-cluster-local-addr=$host1 --db-sb-cluster-local-addr=$host1 \
    --ovn-northd-nb-db=tcp:$host1:6641,tcp:$host2:6641,tcp:$host3:6641 \
    --ovn-northd-sb-db=tcp:$host1:6642,tcp:$host2:6642,tcp:$host3:6642 \
    start_northd

# execute on $host2:
/usr/share/ovn/scripts/ovn-ctl \
    --db-nb-addr=$host2 --db-nb-create-insecure-remote=yes \
    --db-sb-addr=$host2 --db-sb-create-insecure-remote=yes \
    --db-nb-cluster-local-addr=$host2 --db-sb-cluster-local-addr=$host2 \
    --db-nb-cluster-remote-addr=$host1 --db-sb-cluster-remote-addr=$host1 \
    --ovn-northd-nb-db=tcp:$host1:6641,tcp:$host2:6641,tcp:$host3:6641 \
    --ovn-northd-sb-db=tcp:$host1:6642,tcp:$host2:6642,tcp:$host3:6642 \
    start_northd

# execute on $host3:
/usr/share/ovn/scripts/ovn-ctl \
    --db-nb-addr=$host3 --db-nb-create-insecure-remote=yes \
    --db-sb-addr=$host3 --db-sb-create-insecure-remote=yes \
    --db-nb-cluster-local-addr=$host3 --db-sb-cluster-local-addr=$host3 \
    --db-nb-cluster-remote-addr=$host1 --db-sb-cluster-remote-addr=$host1 \
    --ovn-northd-nb-db=tcp:$host1:6641,tcp:$host2:6641,tcp:$host3:6641 \
    --ovn-northd-sb-db=tcp:$host1:6642,tcp:$host2:6642,tcp:$host3:6642 \
    start_northd

# stop nb_ovsdb on $host1:
/usr/share/ovn/scripts/ovn-ctl stop_nb_ovsdb

# delete $host1 entry from /etc/hosts file on $host2 and $host3:
sed -i "/$host1_ip/d" /etc/hosts

# Test on $host2 and $host3:
ping -c1 $host1
if [[ $? -ne 0 ]]; then
    echo "Ping of $host1 failed as expected: PASS"
else
    echo "Ping of $host1 should have failed: FAIL"
fi

# restart nb_ovsdb on $host2 and $host3:
# host2:
/usr/share/ovn/scripts/ovn-ctl \
    --db-nb-addr=$host2 --db-nb-create-insecure-remote=yes \
    --db-sb-addr=$host2 --db-sb-create-insecure-remote=yes \
    --db-nb-cluster-local-addr=$host2 --db-sb-cluster-local-addr=$host2 \
    --db-nb-cluster-remote-addr=$host1 --db-sb-cluster-remote-addr=$host1 \
    --ovn-northd-nb-db=tcp:$host1:6641,tcp:$host2:6641,tcp:$host3:6641 \
    --ovn-northd-sb-db=tcp:$host1:6642,tcp:$host2:6642,tcp:$host3:6642 \
    restart_nb_ovsdb | tee output.log

# host3:
/usr/share/ovn/scripts/ovn-ctl \
    --db-nb-addr=$host3 --db-nb-create-insecure-remote=yes \
    --db-sb-addr=$host3 --db-sb-create-insecure-remote=yes \
    --db-nb-cluster-local-addr=$host3 --db-sb-cluster-local-addr=$host3 \
    --db-nb-cluster-remote-addr=$host1 --db-sb-cluster-remote-addr=$host1 \
    --ovn-northd-nb-db=tcp:$host1:6641,tcp:$host2:6641,tcp:$host3:6641 \
    --ovn-northd-sb-db=tcp:$host1:6642,tcp:$host2:6642,tcp:$host3:6642 \
    restart_nb_ovsdb | tee output.log

sleep 150s

# Check for problem
grep '/run/ovn/ovnnb_db.sock: connection attempt failed (No such file or directory)' /var/log/ovn/ovn-northd.log
grep 'Joining /etc/ovn/ovnnb_db.db to cluster' output.log | grep FAILED
grep 'Starting ovsdb-nb' output.log | grep FAILED
grep 'Waiting for OVN_Northbound to come up' output.log | grep FAILED

#######################

# Repro of problem:

openvswitch2.15-2.15.0-87.el8fdp
ovn-2021-21.12.0-30.el8fdp
ovn-2021-central-21.12.0-30.el8fdp
ovn-2021-host-21.12.0-30.el8fdp
RHEL-8.6.0-updates-20220510.0
kernel version: 4.18.0-372.9.1.el8.x86_64

[root@netqe20 ~]# /usr/share/ovn/scripts/ovn-ctl --db-nb-addr=$host2 --db-nb-create-insecure-remote=yes --db-sb-addr=$host2 --db-sb-create-insecure-remote=yes --db-nb-cluster-local-addr=$host2 --db-sb-cluster-local-addr=$host2 --db-nb-cluster-remote-addr=$host1 --db-sb-cluster-remote-addr=$host1 --ovn-northd-nb-db=tcp:$host1:6641,tcp:$host2:6641,tcp:$host3:6641 --ovn-northd-sb-db=tcp:$host1:6642,tcp:$host2:6642,tcp:$host3:6642 restart_nb_ovsdb | tee output.log
Exiting ovnnb_db (39208) [ OK ]
Joining /etc/ovn/ovnnb_db.db to cluster ovsdb-tool: ovsdb error: tcp:netqe9.knqe.lab.eng.bos.redhat.com:6643: syntax error in address [FAILED]
Starting ovsdb-nb ovsdb-server: I/O error: /etc/ovn/ovnnb_db.db: open failed (No such file or directory) [FAILED]
Waiting for OVN_Northbound to come up
2022-05-10T13:05:02Z|00001|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2022-05-10T13:05:02Z|00002|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connection attempt failed (No such file or directory)
2022-05-10T13:05:03Z|00003|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2022-05-10T13:05:03Z|00004|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connection attempt failed (No such file or directory)
2022-05-10T13:05:03Z|00005|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: waiting 2 seconds before reconnect
2022-05-10T13:05:05Z|00006|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2022-05-10T13:05:05Z|00007|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connection attempt failed (No such file or directory)
2022-05-10T13:05:05Z|00008|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: waiting 4 seconds before reconnect
2022-05-10T13:05:09Z|00009|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2022-05-10T13:05:09Z|00010|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connection attempt failed (No such file or directory)
2022-05-10T13:05:09Z|00011|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: continuing to reconnect in the background but suppressing further logging
2022-05-10T13:05:32Z|00012|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
/usr/share/openvswitch/scripts/ovs-lib: line 602: 39579 Alarm clock "$@"
[FAILED]

[root@netqe20 ~]# grep '/run/ovn/ovnnb_db.sock: connection attempt failed (No such file or directory)' /var/log/ovn/ovn-northd.log
2022-05-10T13:03:26.276Z|00015|reconnect|INFO|unix:/run/ovn/ovnnb_db.sock: connection attempt failed (No such file or directory)
2022-05-10T13:03:28.279Z|00018|reconnect|INFO|unix:/run/ovn/ovnnb_db.sock: connection attempt failed (No such file or directory)
2022-05-10T13:03:32.283Z|00021|reconnect|INFO|unix:/run/ovn/ovnnb_db.sock: connection attempt failed (No such file or directory)
[root@netqe20 ~]# grep 'Joining /etc/ovn/ovnnb_db.db to cluster' output.log | grep FAILED
Joining /etc/ovn/ovnnb_db.db to cluster [FAILED]
[root@netqe20 ~]# grep 'Starting ovsdb-nb' output.log | grep FAILED
Starting ovsdb-nb [FAILED]
[root@netqe20 ~]# grep 'Waiting for OVN_Northbound to come up' output.log | grep FAILED
Waiting for OVN_Northbound to come up [FAILED]
[root@netqe20 ~]#

Verification of fix:

# Install ovs package with the fix included (FDP 22.D):
yum -y update http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch2.15/2.15.0/99.el8fdp/x86_64/openvswitch2.15-2.15.0-99.el8fdp.x86_64.rpm

[root@netqe20 ~]# rpm -qa | grep openvswitch
openvswitch-selinux-extra-policy-1.0-29.el8fdp.noarch
openvswitch2.15-2.15.0-99.el8fdp.x86_64
[root@netqe20 ~]# rpm -qa | grep ovn
ovn-2021-central-21.12.0-30.el8fdp.x86_64
ovn-2021-21.12.0-30.el8fdp.x86_64
ovn-2021-host-21.12.0-30.el8fdp.x86_64

# Follow steps outlined above to set up config for test.
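Before rerunning the test it can help to confirm that the installed build is at or past the "Fixed In Version" recorded for this bug (openvswitch2.15-2.15.0-88.el8fdp). The sketch below is an approximation only: `has_fix` is a hypothetical helper, and it relies on GNU `sort -V`, whose ordering is close to but not identical to RPM's own version comparison.

```shell
#!/usr/bin/env bash
# Version-release in which this bug is recorded as fixed.
fixed_in="2.15.0-88.el8fdp"

# Hypothetical helper: succeed when the given version-release string is
# the same as or newer than $fixed_in according to GNU sort -V.
has_fix() {
    local installed=$1 oldest
    oldest=$(printf '%s\n%s\n' "$fixed_in" "$installed" | sort -V | head -n 1)
    [ "$oldest" = "$fixed_in" ]
}

# Example on an RPM-based host:
#   has_fix "$(rpm -q --qf '%{VERSION}-%{RELEASE}' openvswitch2.15)" \
#       && echo "fix present" || echo "fix missing"
```

With the builds used in this report, `has_fix 2.15.0-87.el8fdp` fails and `has_fix 2.15.0-99.el8fdp` succeeds.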
# Database now restarts/reconnects without any problem:

[root@netqe21 ~]# /usr/share/ovn/scripts/ovn-ctl --db-nb-addr=$host3 --db-nb-create-insecure-remote=yes --db-sb-addr=$host3 --db-sb-create-insecure-remote=yes --db-nb-cluster-local-addr=$host3 --db-sb-cluster-local-addr=$host3 --db-nb-cluster-remote-addr=$host1 --db-sb-cluster-remote-addr=$host1 --ovn-northd-nb-db=tcp:$host1:6641,tcp:$host2:6641,tcp:$host3:6641 --ovn-northd-sb-db=tcp:$host1:6642,tcp:$host2:6642,tcp:$host3:6642 restart_nb_ovsdb
Exiting ovnnb_db (40605) [ OK ]
Starting ovsdb-nb [ OK ]
Waiting for OVN_Northbound to come up
2022-05-10T13:24:55Z|00001|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2022-05-10T13:24:55Z|00002|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connected
2022-05-10T13:25:25Z|00003|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
/usr/share/openvswitch/scripts/ovs-lib: line 602: 40829 Alarm clock "$@"
[FAILED]

Note: Dev has said the intermittent alarm clock failure above is unrelated to this issue.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: openvswitch2.15 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4787
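The three `grep ... output.log | grep FAILED` checks used in the reproducer can be collected into one pass over the log. This is a minimal sketch, not part of ovn-ctl: `check_restart_log` is a hypothetical helper, and it assumes the log was captured with `tee output.log` as in the steps above, so each step label and its status share a line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which ovn-ctl restart steps ended in FAILED.
# Returns 0 when none of the three steps failed, 1 otherwise.
check_restart_log() {
    local log=$1 step status=0
    for step in 'Joining /etc/ovn/ovnnb_db.db to cluster' \
                'Starting ovsdb-nb' \
                'Waiting for OVN_Northbound to come up'; do
        # Same test as the manual check: step label and FAILED on one line.
        if grep -F -- "$step" "$log" | grep -q 'FAILED'; then
            echo "FAIL: $step"
            status=1
        fi
    done
    return "$status"
}

# Example:
#   check_restart_log output.log && echo "restart looks clean"
```

A failure reported only for the "Waiting for OVN_Northbound to come up" step may be the intermittent alarm clock timeout noted above rather than the DNS problem, so that case still needs a manual look at the log.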