Bug 1699991

Summary: Instance fails to spawn if its IP from the tenant network is also assigned to the Compute node
Product: Red Hat OpenStack
Reporter: Alex Stupnikov <astupnik>
Component: python-os-vif
Assignee: Rodolfo Alonso <ralonsoh>
Status: CLOSED ERRATA
QA Contact: Candido Campos <ccamposr>
Severity: medium
Priority: medium
Version: 13.0 (Queens)
CC: amuller, chrisw, jjoyce, jschluet, njohnston, ralonsoh, scohen, slinaber, tvignaud, twilson
Target Milestone: z7
Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: All
OS: All
Fixed In Version: python2-os-vif-1.9.1-3.el7ost
Doc Type: No Doc Update
Last Closed: 2019-09-03 16:58:10 UTC
Type: Bug
Bug Blocks: 1709366    

Description Alex Stupnikov 2019-04-15 14:29:11 UTC
Description of problem:

Consider a situation where an IP address that is assigned to the Compute node itself also belongs to the allocation pool of a tenant subnet. If a user launches an instance and that instance gets an IP address that is also assigned to the instance's Compute node, the VM may fail the DHCP allocation process: certain guest operating systems (such as RHEL, CentOS, etc.) send an ARP request to confirm that the IP address obtained via DHCP is not in use by another network entity; the Compute node receives this ARP request on the tap device that emulates the VM's NIC and replies that the address is in use, so the guest rejects the lease.
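
The guest's duplicate-address check can be reproduced by hand from inside the VM with arping in duplicate address detection mode (the interface name and address below are illustrative; use the interface and IP handed out by DHCP). On an affected Compute node the probe is answered and the address is reported as already in use:

    arping -D -c 2 -I eth0 172.17.3.29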


Steps to reproduce:

- check the Compute node's IPs and networks, and choose a network and IP address to test this issue. For example: 172.17.3.0/24 -> 172.17.3.29
- create a Neutron network with a matching subnet and set the DHCP allocation pool so that it includes IPs for the DHCP agents and for the VM itself (or assign a fixed IP)
- schedule a CentOS/RHEL VM on the corresponding Compute node (see the example commands below)
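
One possible command sequence, assuming the Compute node is compute-0 and owns an address such as 172.17.3.29 on the 172.17.3.0/24 network; the resource, image and flavor names below are illustrative, and pinning the host via the availability zone requires admin credentials:

    openstack network create overlap-net
    openstack subnet create overlap-subnet --network overlap-net \
        --subnet-range 172.17.3.0/24 \
        --allocation-pool start=172.17.3.20,end=172.17.3.40
    openstack port create overlap-port --network overlap-net \
        --fixed-ip subnet=overlap-subnet,ip-address=172.17.3.29
    openstack server create --image rhel7 --flavor m1.small --port overlap-port \
        --availability-zone nova:compute-0.localdomain overlap-vm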


Expected result: the VM is able to connect to the network, since the tenant network has nothing to do with the Compute node.

Actual result: the VM fails to get a DHCP lease and cannot connect to the network.


Tcpdump:


[root@compute-0 ~]# tcpdump -i qbrbd736ed8-6a
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on qbrbd736ed8-6a, link-type EN10MB (Ethernet), capture size 262144 bytes
13:15:48.474271 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
13:15:48.662992 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:1e:a0:6e (oui Unknown), length 300
13:15:48.665387 IP 172.17.3.27.bootps > compute-0.storage.localdomain.bootpc: BOOTP/DHCP, Reply, length 338
13:15:48.666152 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:1e:a0:6e (oui Unknown), length 300
13:15:48.668467 IP 172.17.3.26.bootps > compute-0.storage.localdomain.bootpc: BOOTP/DHCP, Reply, length 338
13:15:48.669480 IP 172.17.3.27.bootps > compute-0.storage.localdomain.bootpc: BOOTP/DHCP, Reply, length 338
13:15:48.742143 ARP, Request who-has compute-0.storage.localdomain (Broadcast) tell 0.0.0.0, length 28
13:15:48.742171 ARP, Reply compute-0.storage.localdomain is-at 66:5a:44:9e:90:1f (oui Unknown), length 28
13:15:48.755438 ARP, Request who-has compute-0.storage.localdomain (Broadcast) tell 0.0.0.0, length 28
13:15:48.755463 ARP, Reply compute-0.storage.localdomain is-at 66:5a:44:9e:90:1f (oui Unknown), length 28

[root@compute-0 ~]# ip link show qbrbd736ed8-6a
21: qbrbd736ed8-6a: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 66:5a:44:9e:90:1f brd ff:ff:ff:ff:ff:ff
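
That the conflicting address is owned by the Compute node itself can be confirmed on the host (the storage-network interface and exact address will vary per deployment), e.g.:

[root@compute-0 ~]# ip -4 addr show | grep 172.17.3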



Workaround:

    -A INPUT -p ARP -i tapbd736ed8-6a -j DROP
    -A OUTPUT -p ARP -o tapbd736ed8-6a -j DROP
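
The same rules can be applied at runtime with the ebtables command; the tapbd736ed8-6a device name comes from the affected instance's Neutron port and will differ per instance:

[root@compute-0 ~]# ebtables -A INPUT -p ARP -i tapbd736ed8-6a -j DROP
[root@compute-0 ~]# ebtables -A OUTPUT -p ARP -o tapbd736ed8-6a -j DROP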


Additional info: I am not sure whether this issue should be investigated by the Nova or the Neutron squad. Please re-assign if needed.

Comment 1 Alex Stupnikov 2019-04-15 14:29:50 UTC
Workaround:

[root@compute-0 ~]# ebtables-save 
# Generated by ebtables-save v1.0 on Mon Apr 15 13:53:30 UTC 2019
*filter
:INPUT ACCEPT
:FORWARD ACCEPT
:OUTPUT ACCEPT
-A INPUT -p ARP -i tapbd736ed8-6a -j DROP
-A OUTPUT -p ARP -o tapbd736ed8-6a -j DROP
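
These runtime rules are not persistent across reboots; one way to re-apply them is to save and restore the dump (the file path below is just an example, and the tap device name changes if the instance is rebuilt, so the rules would need to be regenerated in that case):

[root@compute-0 ~]# ebtables-save > /root/ebtables.dump
[root@compute-0 ~]# ebtables-restore < /root/ebtables.dump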

Comment 2 Alex Stupnikov 2019-04-15 15:41:41 UTC
Docker images:

  DockerNeutronApiImage: 192.168.24.1:8787/rhosp13/openstack-neutron-server:2019-02-24.1
  DockerNeutronConfigImage: 192.168.24.1:8787/rhosp13/openstack-neutron-server:2019-02-24.1
  DockerNeutronDHCPImage: 192.168.24.1:8787/rhosp13/openstack-neutron-dhcp-agent:2019-02-24.1
  DockerNeutronL3AgentImage: 192.168.24.1:8787/rhosp13/openstack-neutron-l3-agent:2019-02-24.1
  DockerNeutronMetadataImage: 192.168.24.1:8787/rhosp13/openstack-neutron-metadata-agent:2019-02-24.1
  DockerOpenvswitchImage: 192.168.24.1:8787/rhosp13/openstack-neutron-openvswitch-agent:2019-02-24.1

  DockerNovaApiImage: 192.168.24.1:8787/rhosp13/openstack-nova-api:2019-02-24.1
  DockerNovaComputeImage: 192.168.24.1:8787/rhosp13/openstack-nova-compute:2019-02-24.1
  DockerNovaConductorImage: 192.168.24.1:8787/rhosp13/openstack-nova-conductor:2019-02-24.1
  DockerNovaConfigImage: 192.168.24.1:8787/rhosp13/openstack-nova-api:2019-02-24.1
  DockerNovaConsoleauthImage: 192.168.24.1:8787/rhosp13/openstack-nova-consoleauth:2019-02-24.1
  DockerNovaLibvirtConfigImage: 192.168.24.1:8787/rhosp13/openstack-nova-compute:2019-02-24.1
  DockerNovaLibvirtImage: 192.168.24.1:8787/rhosp13/openstack-nova-libvirt:2019-02-24.1
  DockerNovaMetadataImage: 192.168.24.1:8787/rhosp13/openstack-nova-api:2019-02-24.1
  DockerNovaPlacementConfigImage: 192.168.24.1:8787/rhosp13/openstack-nova-placement-api:2019-02-24.1
  DockerNovaPlacementImage: 192.168.24.1:8787/rhosp13/openstack-nova-placement-api:2019-02-24.1
  DockerNovaSchedulerImage: 192.168.24.1:8787/rhosp13/openstack-nova-scheduler:2019-02-24.1
  DockerNovaVncProxyImage: 192.168.24.1:8787/rhosp13/openstack-nova-novncproxy:2019-02-24.1

Comment 27 errata-xmlrpc 2019-09-03 16:58:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2623