Bug 1823661
| Summary: | Octavia UDP health monitor test fails | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Gregory Thiemonge <gthiemon> |
| Component: | openstack-octavia | Assignee: | Assaf Muller <amuller> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Bruna Bonguardo <bbonguar> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 16.0 (Train) | CC: | atragler, bhaley, ihrachys, lpeer, majopela, mjozefcz, scohen |
| Target Milestone: | --- | Keywords: | TestOnly, Tracking |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-07-08 17:52:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1823755 | ||
| Bug Blocks: | 1801721 | ||
|
Description
Gregory Thiemonge
2020-04-14 07:41:48 UTC
After investigations, IP control (ICMP) packets are dropped by ml2/ovn. Octavia relies on ICMP destination unreachable messages to detect if a server is down (LB's health monitor sends a UDP datagram to the server and if the port is closed - i.e the application is not running - the Linux kernel should reply with an ICMP message). Using tcpdump I checked that the ICMP destination unreachable was sent by the server to the LB, but the LB never received it. I managed to reproduce the issue without using Octavia, with a simple heat stack:
heat_template_version: 2016-10-14
parameters:
image:
type: string
default: cirros-0.4.0-x86_64-disk.img
network:
type: string
default: private
public_network:
type: string
default: nova
resources:
flavor_tiny:
type: OS::Nova::Flavor
properties:
disk: 1
ram: 512
vcpus: 1
security_group1:
type: OS::Neutron::SecurityGroup
properties:
rules:
- remote_ip_prefix: 0.0.0.0/0
protocol: tcp
port_range_min: 22
port_range_max: 22
server1:
type: OS::Nova::Server
properties:
image: { get_param: image }
flavor: { get_resource: flavor_tiny }
security_groups:
- { get_resource: security_group1 }
networks:
- network: { get_param: network }
fip1:
type: OS::Neutron::FloatingIP
properties:
floating_network_id: { get_param: public_network }
fipa1:
type: OS::Neutron::FloatingIPAssociation
properties:
port_id: { get_attr: [ server1, addresses, { get_param: network }, 0, port ] }
floatingip_id: { get_resource: fip1 }
security_group2:
type: OS::Neutron::SecurityGroup
properties:
rules:
- remote_ip_prefix: 0.0.0.0/0
protocol: tcp
port_range_min: 22
port_range_max: 22
- remote_ip_prefix: 0.0.0.0/0
protocol: udp
port_range_min: 2222
port_range_max: 2222
- remote_ip_prefix: 0.0.0.0/0
protocol: tcp
port_range_min: 2222
port_range_max: 2222
server2:
type: OS::Nova::Server
properties:
image: { get_param: image }
flavor: { get_resource: flavor_tiny }
security_groups:
- { get_resource: security_group2 }
networks:
- network: { get_param: network }
fip2:
type: OS::Neutron::FloatingIP
properties:
floating_network_id: { get_param: public_network }
fipa2:
type: OS::Neutron::FloatingIPAssociation
properties:
port_id: { get_attr: [ server2, addresses, { get_param: network }, 0, port ] }
floatingip_id: { get_resource: fip2 }
$ openstack create stack -t servers.yml --wait servers
The stack launches 2 servers:
- server1 is a "client", used to connect to server2 (netcat on TCP/UDP port 2222)
- server2 acts as the "server", SGs allow connections on port 2222 (TCP and UDP), but no applications are running on those ports.
Steps:
1 ssh server1
2 observe network traffic using tcpdump on server1 and server2's ports (ssh on computes, tcpdump -nn -i tap<port_id>)
3 send a UDP datagram to server2 from server1:
$ date | nc -u 10.0.0.235 2222
4 UDP packet is seen on server1 port:
07:56:36.493472 IP 10.0.1.92.33002 > 10.0.0.235.2222: UDP, length 29
5 UDP packet arrives on server2 port, as the UDP port is not reachable, the server replies with an ICMP msg:
07:56:36.495365 IP 10.0.0.236.33002 > 10.0.1.87.2222: UDP, length 29
07:56:36.497758 IP 10.0.1.87 > 10.0.0.236: ICMP 10.0.1.87 udp port 2222 unreachable, length 65
6 ICMP message is never received by server1
I've found out that we have a similar issue with TCP:
1...2 same as above
3 connect to server2 from server1 using TCP
$ nc 10.0.0.235 2222
4 TCP SYN packet is seen on server1 port:
08:01:38.505955 IP 10.0.1.92.37403 > 10.0.0.235.2222: Flags [S], seq 4252833579, win 28040, options [mss 1402,sackOK,TS val 1098874 ecr 0,nop,wscale 6], length 0
5 TCP SYN packet arrives on server2 port, TCP port is closed, Linux kernel replies with a TCP RST packet:
08:01:38.507662 IP 10.0.0.236.37403 > 10.0.1.87.2222: Flags [S], seq 4252833579, win 28040, options [mss 1402,sackOK,TS val 1098874 ecr 0,nop,wscale 6], length 0
08:01:38.510137 IP 10.0.1.87.2222 > 10.0.0.236.37403: Flags [R.], seq 0, ack 4252833580, win 0, length 0
6 RST packet is never received by server1, and it still tries to connect by re-sending TCP SYN packet:
08:01:39.504242 IP 10.0.1.92.37403 > 10.0.0.235.2222: Flags [S], seq 4252833579, win 28040, options [mss 1402,sackOK,TS val 1099124 ecr 0,nop,wscale 6], length 0
08:01:41.508209 IP 10.0.1.92.37403 > 10.0.0.235.2222: Flags [S], seq 4252833579, win 28040, options [mss 1402,sackOK,TS val 1099625 ecr 0,nop,wscale 6], length 0
7 server2 replies with some other RST packets:
08:01:39.504563 IP 10.0.0.236.37403 > 10.0.1.87.2222: Flags [S], seq 4252833579, win 28040, options [mss 1402,sackOK,TS val 1099124 ecr 0,nop,wscale 6], length 0
08:01:39.504773 IP 10.0.1.87.2222 > 10.0.0.236.37403: Flags [R.], seq 0, ack 1, win 0, length 0
08:01:41.508651 IP 10.0.0.236.37403 > 10.0.1.87.2222: Flags [S], seq 4252833579, win 28040, options [mss 1402,sackOK,TS val 1099625 ecr 0,nop,wscale 6], length 0
08:01:41.508890 IP 10.0.1.87.2222 > 10.0.0.236.37403: Flags [R.], seq 0, ack 1, win 0, length 0
That issue with TCP also affects Octavia: TCP health monitors are still able to detect if a server is down but after a long timeout (they should be able to detect it immediately).
I also reproduced that issue in devstack using ML2/OVN.
Note that this behavior is not reproducible with ML2/OVS: ICMP destination unreachable and TCP RST packets are correctly forwarded to the client.
Updated BZ because I used floating ip to connect to the server, and it makes tcpdump logs difficult to read. So using only IP addresses from the private subnet: UDP issue: send UDP packet from server1: $ date | nc -u 10.0.1.87 2222 tcpdump on server1 port: 08:33:36.123229 IP 10.0.1.92.40133 > 10.0.1.87.2222: UDP, length 29 tcpdump on server2 port: 08:33:36.123689 IP 10.0.1.92.40133 > 10.0.1.87.2222: UDP, length 29 08:33:36.124319 IP 10.0.1.87 > 10.0.1.92: ICMP 10.0.1.87 udp port 2222 unreachable, length 65 TCP issue: connect to server2 from server1 using TCP: $ nc 10.0.1.87 2222 tcpdump on server1 port: 08:36:02.882839 IP 10.0.1.92.33261 > 10.0.1.87.2222: Flags [S], seq 3424505559, win 28040, options [mss 1402,sackOK,TS val 1614968 ecr 0,nop,wscale 6], length 0 08:36:03.880069 IP 10.0.1.92.33261 > 10.0.1.87.2222: Flags [S], seq 3424505559, win 28040, options [mss 1402,sackOK,TS val 1615218 ecr 0,nop,wscale 6], length 0 tcpdump on server2 port: 08:36:02.883883 IP 10.0.1.92.33261 > 10.0.1.87.2222: Flags [S], seq 3424505559, win 28040, options [mss 1402,sackOK,TS val 1614968 ecr 0,nop,wscale 6], length 0 08:36:02.885862 IP 10.0.1.87.2222 > 10.0.1.92.33261: Flags [R.], seq 0, ack 3424505560, win 0, length 0 08:36:03.880374 IP 10.0.1.92.33261 > 10.0.1.87.2222: Flags [S], seq 3424505559, win 28040, options [mss 1402,sackOK,TS val 1615218 ecr 0,nop,wscale 6], length 0 08:36:03.880602 IP 10.0.1.87.2222 > 10.0.1.92.33261: Flags [R.], seq 0, ack 1, win 0, length 0 CI job run from May 25 passed successfully for the TrafficOperationsScenarioTest: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/QE/view/OSP16.1/job/DFG-network-octavia-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve/lastCompletedBuild/testReport/octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops/TrafficOperationsScenarioTest/ Moving the bug to VERIFIED |