Bug 1823661
Summary: | Octavia UDP health monitor test fails | | |
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Gregory Thiemonge <gthiemon> |
Component: | openstack-octavia | Assignee: | Assaf Muller <amuller> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Bruna Bonguardo <bbonguar> |
Severity: | high | Docs Contact: | |
Priority: | high | | |
Version: | 16.0 (Train) | CC: | atragler, bhaley, ihrachys, lpeer, majopela, mjozefcz, scohen |
Target Milestone: | --- | Keywords: | TestOnly, Tracking |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2020-07-08 17:52:12 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 1823755 | | |
Bug Blocks: | 1801721 | | |
Description
Gregory Thiemonge
2020-04-14 07:41:48 UTC
After investigation, ICMP control packets are dropped by ML2/OVN.

Octavia relies on ICMP destination unreachable messages to detect that a server is down: the LB's health monitor sends a UDP datagram to the server and, if the port is closed (i.e. the application is not running), the Linux kernel should reply with an ICMP message.

Using tcpdump, I checked that the ICMP destination unreachable message was sent by the server to the LB, but the LB never received it.

I managed to reproduce the issue without using Octavia, with a simple heat stack:

    heat_template_version: 2016-10-14

    parameters:
      image:
        type: string
        default: cirros-0.4.0-x86_64-disk.img
      network:
        type: string
        default: private
      public_network:
        type: string
        default: nova

    resources:
      flavor_tiny:
        type: OS::Nova::Flavor
        properties:
          disk: 1
          ram: 512
          vcpus: 1

      security_group1:
        type: OS::Neutron::SecurityGroup
        properties:
          rules:
            - remote_ip_prefix: 0.0.0.0/0
              protocol: tcp
              port_range_min: 22
              port_range_max: 22

      server1:
        type: OS::Nova::Server
        properties:
          image: { get_param: image }
          flavor: { get_resource: flavor_tiny }
          security_groups:
            - { get_resource: security_group1 }
          networks:
            - network: { get_param: network }

      fip1:
        type: OS::Neutron::FloatingIP
        properties:
          floating_network_id: { get_param: public_network }

      fipa1:
        type: OS::Neutron::FloatingIPAssociation
        properties:
          port_id: { get_attr: [ server1, addresses, { get_param: network }, 0, port ] }
          floatingip_id: { get_resource: fip1 }

      security_group2:
        type: OS::Neutron::SecurityGroup
        properties:
          rules:
            - remote_ip_prefix: 0.0.0.0/0
              protocol: tcp
              port_range_min: 22
              port_range_max: 22
            - remote_ip_prefix: 0.0.0.0/0
              protocol: udp
              port_range_min: 2222
              port_range_max: 2222
            - remote_ip_prefix: 0.0.0.0/0
              protocol: tcp
              port_range_min: 2222
              port_range_max: 2222

      server2:
        type: OS::Nova::Server
        properties:
          image: { get_param: image }
          flavor: { get_resource: flavor_tiny }
          security_groups:
            - { get_resource: security_group2 }
          networks:
            - network: { get_param: network }

      fip2:
        type: OS::Neutron::FloatingIP
        properties:
          floating_network_id: { get_param: public_network }

      fipa2:
        type: OS::Neutron::FloatingIPAssociation
        properties:
          port_id: { get_attr: [ server2, addresses, { get_param: network }, 0, port ] }
          floatingip_id: { get_resource: fip2 }

    $ openstack stack create -t servers.yml --wait servers

The stack launches 2 servers:
- server1 is a "client", used to connect to server2 (netcat on TCP/UDP port 2222)
- server2 acts as the "server"; its SGs allow connections on port 2222 (TCP and UDP), but no applications are running on those ports.

Steps:

1. ssh to server1

2. observe network traffic using tcpdump on server1's and server2's ports (ssh to the computes, tcpdump -nn -i tap<port_id>)

3. send a UDP datagram to server2 from server1:

        $ date | nc -u 10.0.0.235 2222

4. the UDP packet is seen on server1's port:

        07:56:36.493472 IP 10.0.1.92.33002 > 10.0.0.235.2222: UDP, length 29

5. the UDP packet arrives on server2's port; as the UDP port is not reachable, the server replies with an ICMP message:

        07:56:36.495365 IP 10.0.0.236.33002 > 10.0.1.87.2222: UDP, length 29
        07:56:36.497758 IP 10.0.1.87 > 10.0.0.236: ICMP 10.0.1.87 udp port 2222 unreachable, length 65

6. the ICMP message is never received by server1

I've found out that we have a similar issue with TCP:

1-2. same as above

3. connect to server2 from server1 using TCP:

        $ nc 10.0.0.235 2222

4. the TCP SYN packet is seen on server1's port:

        08:01:38.505955 IP 10.0.1.92.37403 > 10.0.0.235.2222: Flags [S], seq 4252833579, win 28040, options [mss 1402,sackOK,TS val 1098874 ecr 0,nop,wscale 6], length 0

5. the TCP SYN packet arrives on server2's port; the TCP port is closed, so the Linux kernel replies with a TCP RST packet:

        08:01:38.507662 IP 10.0.0.236.37403 > 10.0.1.87.2222: Flags [S], seq 4252833579, win 28040, options [mss 1402,sackOK,TS val 1098874 ecr 0,nop,wscale 6], length 0
        08:01:38.510137 IP 10.0.1.87.2222 > 10.0.0.236.37403: Flags [R.], seq 0, ack 4252833580, win 0, length 0

6. the RST packet is never received by server1, which keeps trying to connect by re-sending the TCP SYN packet:

        08:01:39.504242 IP 10.0.1.92.37403 > 10.0.0.235.2222: Flags [S], seq 4252833579, win 28040, options [mss 1402,sackOK,TS val 1099124 ecr 0,nop,wscale 6], length 0
        08:01:41.508209 IP 10.0.1.92.37403 > 10.0.0.235.2222: Flags [S], seq 4252833579, win 28040, options [mss 1402,sackOK,TS val 1099625 ecr 0,nop,wscale 6], length 0

7. server2 replies with further RST packets:

        08:01:39.504563 IP 10.0.0.236.37403 > 10.0.1.87.2222: Flags [S], seq 4252833579, win 28040, options [mss 1402,sackOK,TS val 1099124 ecr 0,nop,wscale 6], length 0
        08:01:39.504773 IP 10.0.1.87.2222 > 10.0.0.236.37403: Flags [R.], seq 0, ack 1, win 0, length 0
        08:01:41.508651 IP 10.0.0.236.37403 > 10.0.1.87.2222: Flags [S], seq 4252833579, win 28040, options [mss 1402,sackOK,TS val 1099625 ecr 0,nop,wscale 6], length 0
        08:01:41.508890 IP 10.0.1.87.2222 > 10.0.0.236.37403: Flags [R.], seq 0, ack 1, win 0, length 0

The TCP issue also affects Octavia: TCP health monitors are still able to detect that a server is down, but only after a long timeout (they should be able to detect it immediately).

I also reproduced the issue in devstack using ML2/OVN.

Note that this behavior is not reproducible with ML2/OVS: ICMP destination unreachable and TCP RST packets are correctly forwarded to the client.

Updated the BZ because I used floating IPs to connect to the server, which makes the tcpdump logs difficult to read.
So using only IP addresses from the private subnet:

UDP issue:

send a UDP packet from server1:

    $ date | nc -u 10.0.1.87 2222

tcpdump on server1's port:

    08:33:36.123229 IP 10.0.1.92.40133 > 10.0.1.87.2222: UDP, length 29

tcpdump on server2's port:

    08:33:36.123689 IP 10.0.1.92.40133 > 10.0.1.87.2222: UDP, length 29
    08:33:36.124319 IP 10.0.1.87 > 10.0.1.92: ICMP 10.0.1.87 udp port 2222 unreachable, length 65

TCP issue:

connect to server2 from server1 using TCP:

    $ nc 10.0.1.87 2222

tcpdump on server1's port:

    08:36:02.882839 IP 10.0.1.92.33261 > 10.0.1.87.2222: Flags [S], seq 3424505559, win 28040, options [mss 1402,sackOK,TS val 1614968 ecr 0,nop,wscale 6], length 0
    08:36:03.880069 IP 10.0.1.92.33261 > 10.0.1.87.2222: Flags [S], seq 3424505559, win 28040, options [mss 1402,sackOK,TS val 1615218 ecr 0,nop,wscale 6], length 0

tcpdump on server2's port:

    08:36:02.883883 IP 10.0.1.92.33261 > 10.0.1.87.2222: Flags [S], seq 3424505559, win 28040, options [mss 1402,sackOK,TS val 1614968 ecr 0,nop,wscale 6], length 0
    08:36:02.885862 IP 10.0.1.87.2222 > 10.0.1.92.33261: Flags [R.], seq 0, ack 3424505560, win 0, length 0
    08:36:03.880374 IP 10.0.1.92.33261 > 10.0.1.87.2222: Flags [S], seq 3424505559, win 28040, options [mss 1402,sackOK,TS val 1615218 ecr 0,nop,wscale 6], length 0
    08:36:03.880602 IP 10.0.1.87.2222 > 10.0.1.92.33261: Flags [R.], seq 0, ack 1, win 0, length 0

The CI job run from May 25 passed successfully for the TrafficOperationsScenarioTest:
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/QE/view/OSP16.1/job/DFG-network-octavia-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve/lastCompletedBuild/testReport/octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops/TrafficOperationsScenarioTest/

Moving the bug to VERIFIED
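
For reference, the detection mechanism this report depends on can be illustrated with a small probe. This is only a sketch, not Octavia's actual health-monitor code: the function names `udp_port_state` and `tcp_port_state`, the payload, and the timeouts are hypothetical. It relies on standard socket behavior: on a connected UDP socket, a received ICMP port unreachable surfaces as a "connection refused" error, and a TCP RST makes `connect()` fail immediately with the same error. When those replies are dropped in transit, as described above, both probes can only time out.

```python
import socket


def udp_port_state(host, port, timeout=1.0, payload=b"probe"):
    """Probe a UDP port (hypothetical helper, not Octavia code).

    Returns "closed" if an ICMP port unreachable came back, "open" if the
    peer answered, and "no_reply" if nothing was received before the timeout.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # connect() on a UDP socket lets the kernel report ICMP errors
        # for this peer back to us as ConnectionRefusedError.
        sock.connect((host, port))
        try:
            sock.send(payload)
            sock.recv(1024)
            return "open"
        except ConnectionRefusedError:
            return "closed"
        except socket.timeout:
            # Either the application is silent, or the ICMP reply was
            # dropped on the way back (the situation in this report).
            return "no_reply"


def tcp_port_state(host, port, timeout=1.0):
    """Probe a TCP port (hypothetical helper, not Octavia code).

    Returns "closed" if a RST was received, "open" if the handshake
    completed, and "no_reply" if the connection attempt timed out.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "closed"
    except socket.timeout:
        # The SYN retransmissions seen in the tcpdump logs end up here.
        return "no_reply"
```

With the addresses from this report, `udp_port_state("10.0.1.87", 2222)` run from server1 would be expected to return "closed" when the ICMP reply is delivered, and "no_reply" when it is dropped; the latter is indistinguishable from a running application that simply does not answer, which is why the health monitor cannot detect the failure.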