Bug 2006744

Summary: OSP13->OSP16.1 stopped working for ovn deployments
Product: Red Hat Enterprise Linux Fast Datapath Reporter: Lukas Bezdicka <lbezdick>
Component: ovn2.13 Assignee: Mohammad Heib <mheib>
Status: CLOSED CURRENTRELEASE QA Contact: Jianlin Shi <jishi>
Severity: urgent Docs Contact:
Priority: urgent    
Version: FDP 21.H CC: ccamposr, ctrautma, dcbw, jiji, jishi, jlibosva, jpretori, kfida, mburns, mheib, mmichels, ovnteam, ralongi, sgolovat, spower
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1992705 Environment:
Last Closed: 2023-03-13 07:06:50 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1992705    
Bug Blocks: 2019451    

Comment 2 Karrar Fida 2021-10-29 18:55:27 UTC
@mheib do you know when you will be able to do the backport?

Comment 3 Karrar Fida 2021-10-29 18:56:32 UTC
We need the backport ASAP because the FDP team will need to test the updated OVN 2.13 build, and we only have next week to do so.

Comment 4 Mohammad Heib 2021-10-31 16:53:55 UTC
@kfida Sorry for the late response. I have completed the backport of the commit above and submitted my change to the ovn2.13 Gerrit repo for review.

Unfortunately, I don't know how to test it with OSP16.1, so I created a build with my change. I would appreciate it if you or @lbezdick could install the build from the link below and test it.

Build link:
A yum repository for the build of ovn2.13-20.12.0-187.el8fdp (task 40715836) is available at:

http://brew-task-repos.usersys.redhat.com/repos/official/ovn2.13/20.12.0/187.el8fdp/

You can install the rpms locally by putting this .repo file in your /etc/yum.repos.d/ directory:

http://brew-task-repos.usersys.redhat.com/repos/official/ovn2.13/20.12.0/187.el8fdp/ovn2.13-20.12.0-187.el8fdp.repo

RPMs and build logs can be found in the following locations:
http://brew-task-repos.usersys.redhat.com/repos/official/ovn2.13/20.12.0/187.el8fdp/aarch64/
http://brew-task-repos.usersys.redhat.com/repos/official/ovn2.13/20.12.0/187.el8fdp/x86_64/
http://brew-task-repos.usersys.redhat.com/repos/official/ovn2.13/20.12.0/187.el8fdp/s390x/
http://brew-task-repos.usersys.redhat.com/repos/official/ovn2.13/20.12.0/187.el8fdp/ppc64le/
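
For anyone installing the test build, the step of putting the .repo file into /etc/yum.repos.d/ can be sketched roughly as follows. This is only an illustration: the helper names repo_dest and install_repo are hypothetical, curl is assumed to be installed, and the brew-task-repos URL is internal and may no longer be reachable.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of any Red Hat tooling.

# Print the path a .repo URL would be saved to (string handling only, no network).
repo_dest() {
    local url=$1 dir=${2:-/etc/yum.repos.d}
    echo "${dir}/${url##*/}"
}

# Download the .repo file into the yum configuration directory.
# Needs write access to the target directory (normally root).
install_repo() {
    local url=$1 dir=${2:-/etc/yum.repos.d}
    curl --fail --silent --show-error --location \
         --output "$(repo_dest "$url" "$dir")" "$url"
}

# Example (run as root; package names assumed from the build name above):
#   install_repo http://brew-task-repos.usersys.redhat.com/repos/official/ovn2.13/20.12.0/187.el8fdp/ovn2.13-20.12.0-187.el8fdp.repo
#   dnf upgrade ovn2.13 ovn2.13-host
```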

thanks,

Comment 7 Karrar Fida 2021-11-02 13:42:04 UTC
@lbezdick Any update on the testing of ovn2.13-20.12.0-187.el8fdp?

Comment 8 Lukas Bezdicka 2021-11-02 16:57:16 UTC
I concluded that the jobs failed for this reason:
ovn_metadata_agent starts while ovn_db is not yet reachable on the controller, because the controller is being redeployed.
Instead of trying to reconnect, the agent crashes with an error, which triggers Docker to restart it.
Docker keeps restarting the agent, doubling the delay between attempts.
By the time the service is available on the controller node, Docker on the compute node restarts the agent only after 30 min.
This causes the CI failure, because the metadata agent is unavailable for 30 min while we check for it.


The issue described above has nothing to do with the OVN issue, which I now consider solved.
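
To illustrate the difference between crashing and reconnecting, here is a minimal sketch of a retry loop with a doubling, capped delay that waits for the database port before starting a service. It is not how ovn_metadata_agent or Docker are implemented; the function names next_delay, retry_with_backoff and tcp_reachable are hypothetical, and the reachability check assumes bash (for /dev/tcp) and coreutils timeout.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; an illustration only.

# Print the next delay: double the current one, capped at a maximum.
next_delay() {
    local current=$1 max=$2
    local doubled=$(( current * 2 ))
    if (( doubled > max )); then
        echo "$max"
    else
        echo "$doubled"
    fi
}

# Retry a command until it succeeds or the attempt budget is spent.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY MAX_DELAY COMMAND [ARGS...]
retry_with_backoff() {
    local attempts=$1 delay=$2 max_delay=$3
    shift 3
    local i
    for (( i = 1; i <= attempts; i++ )); do
        if "$@"; then
            return 0
        fi
        # Sleep only between attempts, not after the last one.
        if (( i < attempts )); then
            sleep "$delay"
            delay=$(next_delay "$delay" "$max_delay")
        fi
    done
    return 1
}

# Succeed when a TCP connection to HOST:PORT can be opened within 3 seconds.
tcp_reachable() {
    local host=$1 port=$2
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

A wrapper could then call, for example, `retry_with_backoff 10 1 60 tcp_reachable 20.0.175.25 6642` before launching the agent (address and port taken from the ovn-remote setting in the reproduction trace). Capping the delay keeps the worst-case wait after the database comes back bounded by the cap instead of growing with every failed attempt.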

Comment 9 Jianlin Shi 2021-11-03 04:12:32 UTC
Tested with the script in https://bugzilla.redhat.com/show_bug.cgi?id=1992705#c8:

Reproduced on ovn2.13-20.12.0-178.el8:

[root@wsfd-advnetlab20 bz2006744]# bash -x rep.sh                                                   
+ systemctl start openvswitch                                                                            
+ ovs-vsctl set open . external_ids:system-id=hv0 external_ids:ovn-remote=tcp:20.0.175.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.175.26
+ systemctl restart ovn-controller                                                                  
+ ovs-vsctl set bridge br-int protocols=OpenFlow13,OpenFlow15                                                                                                        
ovs-vsctl: no row "br-int" in table Bridge                                                                                                                                                                 
+ ovs-vsctl add-port br-int ls1p2 -- set interface ls1p2 type=internal external_ids:iface-id=ls1p2    
ovs-vsctl: no bridge named br-int                                                                      
+ ip netns add ls1p2                                                                                  
+ ip link set ls1p2 netns ls1p2                                                                     
Cannot find device "ls1p2"                                                                            
+ ip netns exec ls1p2 ip link set ls1p2 address 00:00:00:01:01:02                                     
Cannot find device "ls1p2"                                                                            
+ ip netns exec ls1p2 ip link set ls1p2 up                                                             
Cannot find device "ls1p2"                                                                           
+ ip netns exec ls1p2 ip addr add 192.168.1.2/24 dev ls1p2                                            
Cannot find device "ls1p2"

[root@wsfd-advnetlab20 bz2006744]# grep WARN /var/log/ovn/ovn-controller.log                         
2021-11-03T03:58:50.582Z|00004|ovsdb_idl|WARN|Open_vSwitch database lacks Datapath table (database needs upgrade?)
2021-11-03T03:58:50.582Z|00005|ovsdb_idl|WARN|Open_vSwitch table in Open_vSwitch database lacks datapaths column (database needs upgrade?)
2021-11-03T03:58:50.583Z|00006|ovsdb_idl|WARN|Open_vSwitch database lacks Datapath table (database needs upgrade?)
2021-11-03T03:58:50.583Z|00007|ovsdb_idl|WARN|Open_vSwitch table in Open_vSwitch database lacks datapaths column (database needs upgrade?)
2021-11-03T03:58:50.586Z|00012|ovsdb_idl|WARN|transaction error: {"details":"datapaths is not a valid column name","error":"syntax error","syntax":"[\"bridges\",\"datapaths\"]"}
2021-11-03T03:58:50.586Z|00013|ovsdb_idl|WARN|transaction error: {"details":"datapaths is not a valid column name","error":"syntax error","syntax":"[\"bridges\",\"datapaths\"]"}
2021-11-03T03:58:50.586Z|00014|ovsdb_idl|WARN|transaction error: {"details":"datapaths is not a valid column name","error":"syntax error","syntax":"[\"bridges\",\"datapaths\"]"}
2021-11-03T03:58:50.587Z|00015|ovsdb_idl|WARN|transaction error: {"details":"datapaths is not a valid column name","error":"syntax error","syntax":"[\"bridges\",\"datapaths\"]"}
2021-11-03T03:58:50.587Z|00016|ovsdb_idl|WARN|transaction error: {"details":"datapaths is not a valid column name","error":"syntax error","syntax":"[\"bridges\",\"datapaths\"]"}
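
The warnings above are the signature of the failure: ovn-controller expects a Datapath table and a datapaths column that this Open_vSwitch database does not have. A rough sketch for spotting this condition in a log file follows. The function names are hypothetical, and the match relies only on the "(database needs upgrade?)" text visible in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for scanning an ovn-controller log; an illustration only.

# Count warnings hinting that the Open_vSwitch schema is older than expected.
count_schema_warnings() {
    local logfile=$1
    # grep -c exits non-zero when the count is 0, so do not treat that as an error.
    grep -c -F '(database needs upgrade?)' "$logfile" || true
}

# Print each distinct schema complaint once, without timestamp and module prefix.
missing_schema_items() {
    local logfile=$1
    grep -F '(database needs upgrade?)' "$logfile" \
        | sed -E 's/^.*\|WARN\|//' \
        | sort -u
}

# Example:
#   count_schema_warnings /var/log/ovn/ovn-controller.log
#   missing_schema_items /var/log/ovn/ovn-controller.log
```

A non-zero count suggests the same schema mismatch as in the reproduction above, so the br-int bridge is probably missing as well.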

Verified on ovn2.13-20.12.0-187.el8:

[root@wsfd-advnetlab20 bz2006744]# rpm -qa | grep -E "openvswitch|ovn"
openvswitch2.11-2.11.3-86.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch                                                 
ovn2.13-20.12.0-187.el8fdp.x86_64                                                                     
ovn2.13-host-20.12.0-187.el8fdp.x86_64
ovn2.13-central-20.12.0-187.el8fdp.x86_64

[root@wsfd-advnetlab20 bz2006744]# bash -x rep.sh                                                     
+ systemctl start openvswitch                                                                         
+ ovs-vsctl set open . external_ids:system-id=hv0 external_ids:ovn-remote=tcp:1.1.182.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=1.1.182.26                                      
+ systemctl restart ovn-controller                                                                    
+ ovs-vsctl set bridge br-int protocols=OpenFlow13,OpenFlow15                                         
+ ovs-vsctl add-port br-int ls1p2 -- set interface ls1p2 type=internal external_ids:iface-id=ls1p2
+ ip netns add ls1p2                                                                                  
+ ip link set ls1p2 netns ls1p2                                                                       
+ ip netns exec ls1p2 ip link set ls1p2 address 00:00:00:01:01:02                                     
+ ip netns exec ls1p2 ip link set ls1p2 up                                                            
+ ip netns exec ls1p2 ip addr add 192.168.1.2/24 dev ls1p2                                            
[root@wsfd-advnetlab20 bz2006744]# ip netns exec ls1p2 ping 192.168.1.1 -c 1                          
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.                                                  
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.259 ms

--- 192.168.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.259/0.259/0.259/0.000 ms