Bug 1699991 - Instance is failing to spawn if its IP from the tenant network is also assigned to the Compute node
Summary: Instance is failing to spawn if its IP from the tenant network is also assigned to the Compute node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-os-vif
Version: 13.0 (Queens)
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: z7
Target Release: 13.0 (Queens)
Assignee: Rodolfo Alonso
QA Contact: Candido Campos
URL:
Whiteboard:
Depends On:
Blocks: 1709366
 
Reported: 2019-04-15 14:29 UTC by Alex Stupnikov
Modified: 2022-08-09 15:08 UTC (History)
CC List: 10 users

Fixed In Version: python2-os-vif-1.9.1-3.el7ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-03 16:58:10 UTC
Target Upstream Version:
Embargoed:


Attachments


Links:
  Launchpad bug 1825888 (last updated 2019-04-22 20:14:23 UTC)
  OpenStack gerrit change 655694, MERGED: Prevent "qbr" Linux Bridge from replying to ARP messages (last updated 2021-02-02 05:04:50 UTC)
  Red Hat Issue Tracker OSP-7945 (last updated 2022-08-09 15:08:53 UTC)
  Red Hat Knowledge Base (Solution) 4133771, Troubleshoot: iptables_hybrid OVS firewall driver doesn't work properly (last updated 2019-05-13 13:49:39 UTC)
  Red Hat Product Errata RHBA-2019:2623 (last updated 2019-09-03 16:58:32 UTC)

Description Alex Stupnikov 2019-04-15 14:29:11 UTC
Description of problem:

Consider a situation where an IP address that is assigned to the Compute node itself also belongs to the allocation pool of a tenant subnet. If a user launches an instance and that instance is allocated an IP address that its Compute node also owns, the VM may fail the DHCP allocation process: some guest operating systems (RHEL, CentOS, etc.) send an ARP request to confirm that the address obtained via DHCP is not in use by another network entity. The Compute node receives this ARP request on the tap device that emulates the VM's NIC and replies that the address is in use, so the guest rejects the lease.
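A quick way to confirm this behaviour on the Compute node (a hedged sketch; the bridge name is taken from the tcpdump below, and the arp_ignore interpretation is our reading of the default Linux behaviour rather than something stated in this report):

    # arp_ignore=0 (the kernel default) makes the host answer ARP requests for
    # any locally-owned IP arriving on any interface, including a qbr Linux bridge
    sysctl net.ipv4.conf.all.arp_ignore net.ipv4.conf.default.arp_ignore

    # Watch the bridge while the guest boots; an ARP reply sourced from the
    # bridge's own MAC for the instance's address confirms the failure mode
    tcpdump -nei qbrbd736ed8-6a arp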


Steps to reproduce:

- Check the Compute node's IP addresses and networks and pick a network/IP pair to test with, for example 172.17.3.0/24 -> 172.17.3.29.
- Create a Neutron network with a matching subnet and set the DHCP allocation pool so that it includes IPs for the DHCP agents and for the VM itself (or assign the fixed IP explicitly).
- Schedule a CentOS/RHEL VM on that Compute node (see the CLI sketch below).
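A sketch of the reproduction with the openstack CLI, assuming the Compute node owns 172.17.3.29 on the 172.17.3.0/24 network; the network, image, flavor and host names below are illustrative:

    # Tenant network whose allocation pool overlaps the Compute node's own range
    openstack network create arp-test
    openstack subnet create arp-test-subnet --network arp-test \
        --subnet-range 172.17.3.0/24 \
        --allocation-pool start=172.17.3.20,end=172.17.3.40

    # Boot a RHEL/CentOS guest on the affected Compute node with the node's own
    # IP as the fixed address (availability-zone host pinning requires admin)
    openstack server create rhel-arp-test \
        --image rhel7 --flavor m1.small \
        --nic net-id=arp-test,v4-fixed-ip=172.17.3.29 \
        --availability-zone nova:compute-0.localdomain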


Expected result: the VM connects to the network, since the tenant network has nothing to do with the Compute node's own addressing.

Actual result: the VM fails to get a DHCP lease and cannot connect to the network.
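
One way to observe the actual result from the API side, assuming a guest image whose DHCP client logs to the serial console (server name as in the illustrative sketch above):

    # Repeated DHCPDISCOVER retries or DHCPDECLINE messages in the console log
    # indicate the lease was offered but rejected by the guest
    openstack console log show rhel-arp-test | grep -iE 'dhcp|declin'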


Tcpdump:


[root@compute-0 ~]# tcpdump -i qbrbd736ed8-6a
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on qbrbd736ed8-6a, link-type EN10MB (Ethernet), capture size 262144 bytes
13:15:48.474271 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
13:15:48.662992 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:1e:a0:6e (oui Unknown), length 300
13:15:48.665387 IP 172.17.3.27.bootps > compute-0.storage.localdomain.bootpc: BOOTP/DHCP, Reply, length 338
13:15:48.666152 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:1e:a0:6e (oui Unknown), length 300
13:15:48.668467 IP 172.17.3.26.bootps > compute-0.storage.localdomain.bootpc: BOOTP/DHCP, Reply, length 338
13:15:48.669480 IP 172.17.3.27.bootps > compute-0.storage.localdomain.bootpc: BOOTP/DHCP, Reply, length 338
13:15:48.742143 ARP, Request who-has compute-0.storage.localdomain (Broadcast) tell 0.0.0.0, length 28
13:15:48.742171 ARP, Reply compute-0.storage.localdomain is-at 66:5a:44:9e:90:1f (oui Unknown), length 28
13:15:48.755438 ARP, Request who-has compute-0.storage.localdomain (Broadcast) tell 0.0.0.0, length 28
13:15:48.755463 ARP, Reply compute-0.storage.localdomain is-at 66:5a:44:9e:90:1f (oui Unknown), length 28

[root@compute-0 ~]# ip link show qbrbd736ed8-6a
21: qbrbd736ed8-6a: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 66:5a:44:9e:90:1f brd ff:ff:ff:ff:ff:ff

Note that 66:5a:44:9e:90:1f, the source of the ARP replies in the capture above, is the MAC address of the qbr Linux bridge itself: the Compute node's kernel is answering the guest's ARP probe for the duplicate address.



Workaround (ebtables rules on the Compute node; the full ebtables-save output is in comment 1):

    -A INPUT -p ARP -i tapbd736ed8-6a -j DROP
    -A OUTPUT -p ARP -o tapbd736ed8-6a -j DROP


Additional info: I am not sure whether this issue should be investigated by the Nova or the Neutron squad. Please re-assign if needed.

Comment 1 Alex Stupnikov 2019-04-15 14:29:50 UTC
Workaround:

[root@compute-0 ~]# ebtables-save 
# Generated by ebtables-save v1.0 on Mon Apr 15 13:53:30 UTC 2019
*filter
:INPUT ACCEPT
:FORWARD ACCEPT
:OUTPUT ACCEPT
-A INPUT -p ARP -i tapbd736ed8-6a -j DROP
-A OUTPUT -p ARP -o tapbd736ed8-6a -j DROP
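
For reference, the same rules can be added to a running Compute node without editing the saved table (tap device name taken from this report; rules added this way are not persistent across reboots or port re-plugs):

    ebtables -A INPUT -p ARP -i tapbd736ed8-6a -j DROP
    ebtables -A OUTPUT -p ARP -o tapbd736ed8-6a -j DROP

Because only the INPUT and OUTPUT chains are touched, ARP traffic forwarded between the instance and the rest of the tenant network should still pass through the FORWARD chain unchanged.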

Comment 2 Alex Stupnikov 2019-04-15 15:41:41 UTC
Docker images:

  DockerNeutronApiImage: 192.168.24.1:8787/rhosp13/openstack-neutron-server:2019-02-24.1
  DockerNeutronConfigImage: 192.168.24.1:8787/rhosp13/openstack-neutron-server:2019-02-24.1
  DockerNeutronDHCPImage: 192.168.24.1:8787/rhosp13/openstack-neutron-dhcp-agent:2019-02-24.1
  DockerNeutronL3AgentImage: 192.168.24.1:8787/rhosp13/openstack-neutron-l3-agent:2019-02-24.1
  DockerNeutronMetadataImage: 192.168.24.1:8787/rhosp13/openstack-neutron-metadata-agent:2019-02-24.1
  DockerOpenvswitchImage: 192.168.24.1:8787/rhosp13/openstack-neutron-openvswitch-agent:2019-02-24.1

  DockerNovaApiImage: 192.168.24.1:8787/rhosp13/openstack-nova-api:2019-02-24.1
  DockerNovaComputeImage: 192.168.24.1:8787/rhosp13/openstack-nova-compute:2019-02-24.1
  DockerNovaConductorImage: 192.168.24.1:8787/rhosp13/openstack-nova-conductor:2019-02-24.1
  DockerNovaConfigImage: 192.168.24.1:8787/rhosp13/openstack-nova-api:2019-02-24.1
  DockerNovaConsoleauthImage: 192.168.24.1:8787/rhosp13/openstack-nova-consoleauth:2019-02-24.1
  DockerNovaLibvirtConfigImage: 192.168.24.1:8787/rhosp13/openstack-nova-compute:2019-02-24.1
  DockerNovaLibvirtImage: 192.168.24.1:8787/rhosp13/openstack-nova-libvirt:2019-02-24.1
  DockerNovaMetadataImage: 192.168.24.1:8787/rhosp13/openstack-nova-api:2019-02-24.1
  DockerNovaPlacementConfigImage: 192.168.24.1:8787/rhosp13/openstack-nova-placement-api:2019-02-24.1
  DockerNovaPlacementImage: 192.168.24.1:8787/rhosp13/openstack-nova-placement-api:2019-02-24.1
  DockerNovaSchedulerImage: 192.168.24.1:8787/rhosp13/openstack-nova-scheduler:2019-02-24.1
  DockerNovaVncProxyImage: 192.168.24.1:8787/rhosp13/openstack-nova-novncproxy:2019-02-24.1

Comment 27 errata-xmlrpc 2019-09-03 16:58:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2623

