Bug 1580217
| Summary: | [ovn]ipv6 load balancer for layer4 on logical router doesn't work | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | haidong li <haili> |
| Component: | openvswitch | Assignee: | Mark Michelson <mmichels> |
| Status: | CLOSED ERRATA | QA Contact: | haidong li <haili> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.5 | CC: | atelang, atragler, kfida, lmanasko, mmichels, pvauter, tredaelli |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | openvswitch-2.9.0-69.el7fdn | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-11-05 14:59:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
I tried to reproduce this and was unable to. When I set up an IPv6 load balancer with a port, it worked as expected. However, I noticed something suspicious in the `ovn-nbctl ls-lb-list` output from your report:
[root@dell-per730-19 ~]# ovn-nbctl ls-lb-list s2
UUID LB PROTO VIP IPs
a7b0f293-8897-43bd-ada5-61b67382ce45 tcp/udp 300::1 2001:db8:103::11,2001:db8:103::12
(null) [300::1]:8000 [2001:db8:103::11]:80,[2001:db8:103::12]:80
Notice how the PROTO is "(null)" for the VIP with a port number. When I run the equivalent command (`ovn-nbctl lr-lb-list`) on my machine, it looks like this:
[vagrant@central ~]$ sudo ovn-nbctl lr-lb-list ro0
UUID LB PROTO VIP IPs
ad707ab6-3f78-4547-9e50-c8a0e1d8bb2d lb0 tcp [fd0f:f07:71c6:b050::100]:8000 [fd0f:0f07:71c6:af56::194]:8000,[fd0f:0f07:71c6:af56::195]:8000
tcp/udp fd0f:f07:71c6:b050::100 fd0f:0f07:71c6:af56::194
Notice that the PROTO is "tcp" for the VIP with a port number. The way I created my load balancer was to issue the following two commands:
ovn-nbctl lb-add lb0 fd0f:0f07:71c6:b050::100 fd0f:0f07:71c6:af56::194
ovn-nbctl lb-add lb0 [fd0f:0f07:71c6:b050::100]:8000 [fd0f:0f07:71c6:af56::194]:8000,[fd0f:0f07:71c6:af56::195]:8000
ovn-nbctl lr-lb-add ro0 lb0
Notice that I did not specify a protocol, but it defaulted to "tcp". Did you create your load balancers this way? Or did you add them directly to the database? If you add them directly to the database and you specify "tcp" as the protocol, does this issue still occur?
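For reference, here is a minimal sketch of setting the protocol directly on an existing northbound record with the generic database commands; <lb-uuid> below is a placeholder for whatever ovn-nbctl lb-list reports for your load balancer, not a value from this report:

# placeholder UUID: take it from the first column of "ovn-nbctl lb-list"
ovn-nbctl set load_balancer <lb-uuid> protocol=tcp
# confirm the protocol column is now populated
ovn-nbctl list load_balancer <lb-uuid>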
Yes, I added them directly to the database, but the issue still exists in my environment after I set the TCP protocol or use the commands you mentioned:
[root@hp-dl380g9-04 ovn]# ovn-nbctl lb-add lb0 300::1 2001:db8:103::11,2001:db8:103::12
[root@hp-dl380g9-04 ovn]# ovn-nbctl lb-add lb0 [300::1]:8000 [2001:db8:103::11]:80,[2001:db8:103::12]:80
[root@hp-dl380g9-04 ovn]# ovn-nbctl lr-lb-add r1 lb0
[root@hp-dl380g9-04 ovn]# ovn-nbctl lr-lb-list r1
UUID LB PROTO VIP IPs
22bdef9d-dc3d-45e0-8055-c68fd2f0cd73 lb0 tcp/udp 300::1 2001:db8:103::11,2001:db8:103::12
tcp [300::1]:8000 [2001:db8:103::11]:80,[2001:db8:103::12]:80
[root@hp-dl380g9-04 ovn]# ovn-nbctl lb-list
UUID LB PROTO VIP IPs
34b7145f-0d91-45e8-b3ad-42922d1a8b38 tcp/udp 30.0.0.2 172.16.103.11,172.16.103.12
(null) 30.0.0.2:8000 172.16.103.11:80,172.16.103.12:80
6529a06b-6e5b-4c12-8aca-9ea7798a906d tcp/udp 300::1 2001:db8:103::11,2001:db8:103::12
tcp [300::1]:8000 [2001:db8:103::11]:80,[2001:db8:103::12]:80
2e6e7e49-b0af-444e-95ad-8cad040b6483 tcp/udp 30.0.0.1 172.16.103.11,172.16.103.12
(null) 30.0.0.1:8000 172.16.103.11:80,172.16.103.12:80
22bdef9d-dc3d-45e0-8055-c68fd2f0cd73 lb0 tcp/udp 300::1 2001:db8:103::11,2001:db8:103::12
tcp [300::1]:8000 [2001:db8:103::11]:80,[2001:db8:103::12]:80
[root@hp-dl380g9-04 ovn]# virsh console hv1_vm00
Connected to domain hv1_vm00
Escape character is ^]
[root@localhost ~]# ping6 300::1
PING 300::1(300::1) 56 data bytes
64 bytes from 2001:db8:103::11: icmp_seq=1 ttl=63 time=1.51 ms
64 bytes from 2001:db8:103::11: icmp_seq=2 ttl=63 time=0.590 ms
--- 300::1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.590/1.050/1.510/0.460 ms
[root@localhost ~]# curl -g [300::1]:8000 >> log3.txt
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:04 --:--:-- 0
[root@localhost ~]# curl -g [2001:db8:103::11]:80
i am vm1
[root@localhost ~]# curl -g [2001:db8:103::12]:80
i am vm2
By the way, if convenient, can you please log in to the machines I used and check the configuration? The password is redhat:
hp-dl380g9-04.rhts.eng.pek2.redhat.com
hp-dl388g8-09.rhts.eng.pek2.redhat.com
I figured out how to reproduce this locally. In my setup, on my logical router, I set options:chassis="central". In your setup, you set options:redirect-chassis="hv1" on the r1_s2 logical router port. I changed my configuration to use redirect-chassis on the logical router port, and now I have the same problem. I will look into why this is happening and report back when I have a fix.

I figured out the problem and have created a fix locally. The issue is that there is a rule for un-DNATting return traffic from the load balancer destination that does not get installed when using IPv6. The fix is to install this rule for IPv6. I have submitted this patch for review upstream: https://patchwork.ozlabs.org/patch/935066/

This has been committed upstream in OVS master and OVS 2.9.

On second inspection, it turns out this is committed to master but not to 2.9. I am putting this back into POST until it is committed to the upstream 2.9 branch. I have sent an e-mail to Ben Pfaff requesting the backport.

This is now backported to the 2.9 branch as well.

This issue is verified on the latest version:
[root@localhost ~]# curl -g [300::1]:8000 >> log3.txt
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 9 100 9 0 0 461 0 --:--:-- --:--:-- --:--:-- 500
[root@localhost ~]# echo $?
0
[root@localhost ~]# logout
Red Hat Enterprise Linux Server 7.5 (Maipo)
Kernel 3.10.0-862.el7.x86_64 on an x86_64
localhost login:
spawn virsh console hv1_vm00
Connected to domain hv1_vm00
Escape character is ^]
Red Hat Enterprise Linux Server 7.5 (Maipo)
Kernel 3.10.0-862.el7.x86_64 on an x86_64
localhost login: root
Password:
Last login: Tue Oct 9 10:30:13 on ttyS0
[root@localhost ~]# curl -g [300::1]:8000 >> log3.txt
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 9 100 9 0 0 1428 0 --:--:-- --:--:-- --:--:-- 1800
job link:
https://beaker.engineering.redhat.com/jobs/2911725
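For anyone trying to reproduce the original failure or inspect the fix, the following is a minimal sketch (not taken verbatim from this report) of the redirect-chassis reproduction and flow check described in the comments above; it assumes the generic ovn-nbctl/ovn-sbctl database commands and reuses the r1_s2 port and hv1 chassis names from this setup:

# pin r1_s2 to chassis hv1, as in the reporter's configuration
ovn-nbctl set logical_router_port r1_s2 options:redirect-chassis="hv1"
# before the fix, the router egress pipeline lacked the IPv6 un-DNAT flow
# for return traffic; with the fix it should appear in the lr_out_undnat stage
ovn-sbctl lflow-list r1 | grep -i lr_out_undnat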
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3500
Description of problem:
ipv6 load balancer for layer4 on logical router doesn't work

Version-Release number of selected component (if applicable):
openvswitch-2.9.0-36.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-3.el7fdp.noarch
openvswitch-ovn-common-2.9.0-36.el7fdp.x86_64
openvswitch-ovn-host-2.9.0-36.el7fdp.x86_64
openvswitch-ovn-central-2.9.0-36.el7fdp.x86_64

How reproducible:
every time

Steps to Reproduce:
In my environment for IPv6, the layer-3 load balancer works well, but layer 4 doesn't work.

TOPO: every switch has two guests connected

hv1_vm00----s2---------r1----------s3----hv0_vm01
             |                      |
             |                      |
          hv1_vm01               hv0_vm00

[root@dell-per730-19 ovn]# ovn-nbctl list load_balancer
_uuid : 685ad133-ff9a-4f6a-a7e4-f63d7ad07792
external_ids : {}
name : ""
protocol : []
vips : {"30.0.0.2"="172.16.103.11,172.16.103.12", "30.0.0.2:8000"="172.16.103.11:80,172.16.103.12:80"}

_uuid : a7b0f293-8897-43bd-ada5-61b67382ce45
external_ids : {}
name : ""
protocol : []
vips : {"300::1"="2001:db8:103::11,2001:db8:103::12", "[300::1]:8000"="[2001:db8:103::11]:80,[2001:db8:103::12]:80"}

_uuid : f0e8d873-50ca-4715-ac15-b0cf1eb2f9a1
external_ids : {}
name : ""
protocol : []
vips : {"30.0.0.1"="172.16.103.11,172.16.103.12", "30.0.0.1:8000"="172.16.103.11:80,172.16.103.12:80"}

[root@dell-per730-19 ovn]# ovn-nbctl show
switch 184b6840-32ad-4a05-aedf-f6e2f25d7ff8 (s3)
    port s3_r1
        type: router
        addresses: ["00:de:ad:ff:01:03 172.16.103.1 2001:db8:103::1"]
        router-port: r1_s3
    port hv0_vm01_vnet1
        addresses: ["00:de:ad:00:01:01 172.16.103.12 2001:db8:103::12"]
    port hv0_vm00_vnet1
        addresses: ["00:de:ad:00:00:01 172.16.103.11 2001:db8:103::11"]
switch ea195969-cfc3-4d67-97ce-e4e853b5e3a4 (s2)
    port hv1_vm01_vnet1
        addresses: ["00:de:ad:01:01:01 172.16.102.12 2001:db8:102::12"]
    port hv1_vm00_vnet1
        addresses: ["00:de:ad:01:00:01 172.16.102.11 2001:db8:102::11"]
    port s2_r1
        type: router
        addresses: ["00:de:ad:ff:01:02 172.16.102.1 2001:db8:102::1"]
        router-port: r1_s2
router 51b6a0d4-8388-493a-9751-929179780b1b (r1)
    port r1_s3
        mac: "00:de:ad:ff:01:03"
        networks: ["172.16.103.1/24", "2001:db8:103::1/64"]
    port r1_s2
        mac: "00:de:ad:ff:01:02"
        networks: ["172.16.102.1/24", "2001:db8:102::1/64"]
[root@dell-per730-19 ovn]# ovn-nbctl get logical_router r1 load_balancer
[a7b0f293-8897-43bd-ada5-61b67382ce45]
[root@dell-per730-19 ovn]#
[root@dell-per730-19 ovn]# ovn-sbctl show
Chassis "hv0"
    hostname: "dell-per730-49.rhts.eng.pek2.redhat.com"
    Encap geneve
        ip: "20.0.0.26"
        options: {csum="true"}
    Port_Binding "hv0_vm00_vnet1"
    Port_Binding "hv0_vm01_vnet1"
Chassis "hv1"
    hostname: "dell-per730-19.rhts.eng.pek2.redhat.com"
    Encap geneve
        ip: "20.0.0.25"
        options: {csum="true"}
    Port_Binding "hv1_vm01_vnet1"
    Port_Binding "hv1_vm00_vnet1"
    Port_Binding "cr-r1_s2"
[root@dell-per730-19 ovn]#
[root@dell-per730-19 ovn]# virsh list
 Id    Name                           State
----------------------------------------------------
 9     hv1_vm00                       running
 10    hv1_vm01                       running

[root@dell-per730-19 ovn]# virsh console hv1_vm00
Connected to domain hv1_vm00
Escape character is ^]
[root@localhost ~]# ping6 300::1
PING 300::1(300::1) 56 data bytes
64 bytes from 2001:db8:103::11: icmp_seq=1 ttl=63 time=2.46 ms
64 bytes from 2001:db8:103::11: icmp_seq=2 ttl=63 time=0.584 ms
--- 300::1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.584/1.526/2.469/0.943 ms
[root@localhost ~]# curl -g [2001:db8:103::11]:80      <------------- curl succeeds through the IP of hv0_vm00
i am vm1
[root@localhost ~]# curl -g [300::1]:8000              <------------- hangs there

Additional info:
1. no such issue on ipv4 load balancer.
2. no such issue if I changed to use the load balancer on the logical switch:

[root@dell-per730-19 ~]# ovn-nbctl lr-lb-list r1
UUID LB PROTO VIP IPs
a7b0f293-8897-43bd-ada5-61b67382ce45 tcp/udp 300::1 2001:db8:103::11,2001:db8:103::12
(null) [300::1]:8000 [2001:db8:103::11]:80,[2001:db8:103::12]:80
[root@dell-per730-19 ~]# ovn-nbctl lr-lb-del r1
[root@dell-per730-19 ~]# ovn-nbctl lr-lb-list r1
[root@dell-per730-19 ~]# ovn-nbctl ls-lb-add s2 a7b0f293-8897-43bd-ada5-61b67382ce45
[root@dell-per730-19 ~]# ovn-nbctl ls-lb-list s2
UUID LB PROTO VIP IPs
a7b0f293-8897-43bd-ada5-61b67382ce45 tcp/udp 300::1 2001:db8:103::11,2001:db8:103::12
(null) [300::1]:8000 [2001:db8:103::11]:80,[2001:db8:103::12]:80
[root@dell-per730-19 ~]# virsh console hv1_vm00
Connected to domain hv1_vm00
Escape character is ^]
[root@localhost ~]# ping6 300::1
PING 300::1(300::1) 56 data bytes
64 bytes from 300::1: icmp_seq=1 ttl=63 time=1.97 ms
--- 300::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.974/1.974/1.974/0.000 ms
[root@localhost ~]# curl -g [300::1]:8000
i am vm2
[root@localhost ~]# curl -g [300::1]:8000
i am vm1
[root@localhost ~]#
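For completeness, a minimal sketch of creating the same layer-4 IPv6 VIP with the protocol passed explicitly to lb-add instead of relying on the tcp default; lb1 is a placeholder name, and the addresses reuse the ones from this report:

# lb-add accepts an optional trailing protocol argument (tcp or udp)
ovn-nbctl lb-add lb1 '[300::1]:8000' '[2001:db8:103::11]:80,[2001:db8:103::12]:80' tcp
ovn-nbctl lr-lb-add r1 lb1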