Bug 2070363

Summary: Failed to read database with dns hostname address
Product: Red Hat Enterprise Linux Fast Datapath Reporter: OvS team <ovs-bugzilla>
Component: openvswitch2.13 Assignee: Ilya Maximets <i.maximets>
Status: MODIFIED --- QA Contact: Zhiqiang Fang <zfang>
Severity: unspecified Docs Contact:
Priority: high    
Version: FDP 22.A CC: ctrautma, jhsiao, ralongi
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openvswitch2.13-2.13.0-137.el7fdp Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description OvS team 2022-03-30 22:45:28 UTC
+++ This bug was initially created as a clone of Bug #2055097 +++

Description of problem:

Version-Release number of selected component (if applicable):
ovn21.12-central-21.12.0-25.el8fdp.x86_64
ovn21.12-vtep-21.12.0-25.el8fdp.x86_64
ovn21.12-21.12.0-25.el8fdp.x86_64
ovn21.12-host-21.12.0-25.el8fdp.x86_64

Context: HyperShift OVN, with the OVN NB and SB databases run as a StatefulSet.

Assume the ovndb StatefulSet pods ovnkube-master-guest-0/1/2 have formed a quorum and guest-1 is the NB leader. Delete both the guest-0 and guest-1 pods; guest-2 becomes the leader.

Because a StatefulSet is used, guest-0 is re-created first (guest-1 has to wait until guest-0 is ready, and a guest pod's DNS hostname is resolvable only while the pod is running). guest-0 finds the new leader, guest-2, and then starts the NB database with the following command (local=guest-0, remote=guest-2):


###
+ echo 'Cluster already exists for DB: nb'
+ initial_raft_create=false
+ wait 71
+ exec /usr/share/ovn/scripts/ovn-ctl --db-nb-cluster-local-port=9643 --db-nb-cluster-local-addr=ovnkube-master-guest-0.ovnkube-master-guest.hypershift-ovn.svc.cluster.local --no-monitor --db-nb-cluster-local-proto=ssl --ovn-nb-db-ssl-key=/ovn-cert/tls.key --ovn-nb-db-ssl-cert=/ovn-cert/tls.crt --ovn-nb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt --db-nb-cluster-remote-port=9643 --db-nb-cluster-remote-addr=ovnkube-master-guest-2.ovnkube-master-guest.hypershift-ovn.svc.cluster.local --db-nb-cluster-remote-proto=ssl '--ovn-nb-log=-vconsole:dbg -vfile:off -vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m' --db-nb-election-timer=10000 run_nb_ovsdb
2022-02-16T03:05:25.330Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
ovsdb-server: ovsdb error: error reading record 12 from OVN_Northbound log: ssl:ovnkube-master-guest-1.ovnkube-master-guest.hypershift-ovn.svc.cluster.local:9643: syntax error in address
[1]+  Exit 1                  exec /usr/share/ovn/scripts/ovn-ctl ${OVN_ARGS} --db-nb-cluster-remote-port=9643 --db-nb-cluster-remote-addr=${init_ip} --db-nb-cluster-remote-proto=ssl --ovn-nb-log="-vconsole:${OVN_LOG_LEVEL} -vfile:off -vPATTERN:console:${OVN_LOG_PATTERN_CONSOLE}" ${election_timer} run_nb_ovsdb
###

Guest-0 fails because the guest-1 hostname is not resolvable, which ovsdb-server reports as "syntax error in address" while reading the database log.
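
To illustrate the distinction at play here, the following is a minimal Python sketch, not the ovsdb-server code, that validates only the syntax of a proto:host:port remote without doing any DNS lookup. The helper name parse_remote and the regular expression are assumptions made for illustration; bracketed IPv6 addresses are deliberately not handled. With a check like this, the guest-1 address from the log above is syntactically valid even while its hostname cannot be resolved.

```python
import re

# Hypothetical helper for illustration only; not part of OVS/OVN.
# It accepts "tcp:" or "ssl:" remotes with a hostname or IPv4 address
# and checks syntax only. No DNS resolution is attempted.
_REMOTE_RE = re.compile(r"^(tcp|ssl):([A-Za-z0-9.-]+):(\d+)$")


def parse_remote(addr):
    """Return (proto, host, port), or raise ValueError on bad syntax."""
    m = _REMOTE_RE.match(addr)
    if m is None:
        raise ValueError("syntax error in address: %s" % addr)
    proto, host, port = m.group(1), m.group(2), int(m.group(3))
    if not 0 < port < 65536:
        raise ValueError("port out of range: %s" % addr)
    return proto, host, port


# The address that triggered the error in the log above.
addr = ("ssl:ovnkube-master-guest-1.ovnkube-master-guest."
        "hypershift-ovn.svc.cluster.local:9643")
print(parse_remote(addr))
```

Keeping the syntax check separate from name resolution means a stored address can be read back even when the peer is temporarily absent from DNS.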


Expected results:

Guest-0 starts successfully and keeps trying to reconnect until guest-1 is running.
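
The expected behaviour can be sketched as a retry loop that treats a resolution failure as temporary rather than fatal. This is only a Python illustration under stated assumptions: wait_until_resolvable, its parameters, and the injectable resolve/sleep callables are hypothetical names, and it does not describe how ovsdb-server implements reconnection.

```python
import socket
import time


def wait_until_resolvable(host, resolve=socket.gethostbyname,
                          attempts=5, delay=1.0, sleep=time.sleep):
    """Retry name resolution for `host`.

    Returns the resolved address, or None if every attempt fails.
    `resolve` and `sleep` are injectable so the loop can be exercised
    without real DNS or real waiting (hypothetical design choice).
    """
    for attempt in range(attempts):
        try:
            return resolve(host)
        except OSError:
            # socket.gaierror is a subclass of OSError; treat the failure
            # as temporary and retry instead of exiting.
            if attempt + 1 < attempts:
                sleep(delay)
    return None
```

A caller would keep the rest of the cluster connection alive and invoke something like this in the background for the peer that is not yet running.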

Comment 1 OvS team 2022-03-30 22:45:31 UTC
* Wed Mar 30 2022 Open vSwitch CI <ovs-ci> - 2.13.0-137
- Merging upstream branch-2.13 [RH git: b8990f68eb]
    Commit list:
    3ceb5dbe92 ovsdb: raft: Fix inability to read the database with DNS host names. (#2055097)