Bug 1732070

Summary: OC deployed with spine&leaf. After rebooting all nodes failing to launch an instance: "Failed to allocate the network(s), not rescheduling."
Product: Red Hat OpenStack Reporter: Alexander Chuzhoy <sasha>
Component: python-networking-ovn Assignee: Assaf Muller <amuller>
Status: CLOSED DUPLICATE QA Contact: Eran Kuris <ekuris>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 15.0 (Stein) CC: amuller, apevec, bfournie, chrisw, dsneddon, hjensas, jlibosva, lhh, majopela, michele, scohen, skaplons, twilson
Target Milestone: rc Keywords: AutomationBlocker, Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2019-07-25 11:31:40 UTC Type: Bug

Description Alexander Chuzhoy 2019-07-22 15:15:31 UTC
The overcloud (OC) was deployed with a spine-and-leaf network topology. After rebooting all nodes, launching an instance fails with: "Failed to allocate the network(s), not rescheduling."

Environment:
python3-tripleoclient-11.4.1-0.20190705110410.14ae053.el8ost.noarch
openstack-tripleo-heat-templates-10.6.1-0.20190713150434.2871ce0.el8ost.noarch


Steps to reproduce:

1. Successfully deploy OC with:
openstack overcloud deploy --templates \
--libvirt-type kvm \
-e /home/stack/templates/nodes_data.yaml \
-r /home/stack/templates/roles_data.yaml \
-n /home/stack/templates/network_data.yaml \
-e /home/stack/templates/extraconfig.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /home/stack/virt/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/network/dvr-override.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/containers-prepare-parameter.yaml

[stack@undercloud-0 ~]$ cat virt/network/network-environment.yaml
parameter_defaults:
    ControlPlaneDefaultRoute: 192.168.24.1
    ControlPlaneSubnetCidr: '24'
    DnsServers:
    - 10.0.0.1
    EC2MetadataIp: 192.168.24.1
    ExternalAllocationPools:
    -   end: 10.0.0.149
        start: 10.0.0.101
    ExternalInterfaceDefaultRoute: 10.0.0.1
    ExternalNetCidr: 10.0.0.0/24
    ExternalNetworkVlanID: 10
    InternalApiInterfaceDefaultRoute: 172.120.1.1
    InternalApi1InterfaceDefaultRoute: 172.117.1.1
    InternalApi2InterfaceDefaultRoute: 172.118.1.1
    InternalApi3InterfaceDefaultRoute: 172.119.1.1
    InternalApiAllocationPools:
    -   end: 172.120.1.200
        start: 172.120.1.10
    InternalApi1AllocationPools:
    -   end: 172.117.1.200
        start: 172.117.1.10
    InternalApi2AllocationPools:
    -   end: 172.118.1.200
        start: 172.118.1.10
    InternalApi3AllocationPools:
    -   end: 172.119.1.200
        start: 172.119.1.10
    InternalApiNetCidr: 172.120.1.0/24
    InternalApi1NetCidr: 172.117.1.0/24
    InternalApi2NetCidr: 172.118.1.0/24
    InternalApi3NetCidr: 172.119.1.0/24
    InternalApiNetworkVlanID: 23
    InternalApi1NetworkVlanID: 20
    InternalApi2NetworkVlanID: 21
    InternalApi3NetworkVlanID: 22
    NeutronBridgeMappings: datacentre:br-ex,tenant:br-isolated
    NeutronExternalNetworkBridge: br-ex
    NeutronNetworkType: geneve
    NeutronTunnelTypes: geneve
    NeutronNetworkVLANRanges: tenant:1000:2000
    StorageInterfaceDefaultRoute: 172.120.3.1
    Storage1InterfaceDefaultRoute: 172.117.3.1
    Storage2InterfaceDefaultRoute: 172.118.3.1
    Storage3InterfaceDefaultRoute: 172.119.3.1
    StorageAllocationPools:
    -   end: 172.120.3.200
        start: 172.120.3.10
    Storage1AllocationPools:
    -   end: 172.117.3.200
        start: 172.117.3.10
    Storage2AllocationPools:
    -   end: 172.118.3.200
        start: 172.118.3.10
    Storage3AllocationPools:
    -   end: 172.119.3.200
        start: 172.119.3.10
    StorageMgmtAllocationPools:
    -   end: 172.120.4.200
        start: 172.120.4.10
    StorageMgmt1AllocationPools:
    -   end: 172.117.4.200
        start: 172.117.4.10
    StorageMgmt2AllocationPools:
    -   end: 172.118.4.200
        start: 172.118.4.10
    StorageMgmt3AllocationPools:
    -   end: 172.119.4.200
        start: 172.119.4.10
    StorageMgmtInterfaceDefaultRoute: 172.120.4.1
    StorageMgmt1InterfaceDefaultRoute: 172.117.4.1
    StorageMgmt2InterfaceDefaultRoute: 172.118.4.1
    StorageMgmt3InterfaceDefaultRoute: 172.119.4.1
    StorageMgmtNetCidr: 172.120.4.0/24
    StorageMgmt1NetCidr: 172.117.4.0/24
    StorageMgmtNetworkVlanID: 43
    StorageMgmt1NetworkVlanID: 40
    StorageMgmt2NetCidr: 172.118.4.0/24
    StorageMgmt2NetworkVlanID: 41
    StorageMgmt3NetCidr: 172.119.4.0/24
    StorageMgmt3NetworkVlanID: 42
    StorageNetCidr: 172.120.3.0/24
    Storage1NetCidr: 172.117.3.0/24
    Storage1NetworkVlanID: 30
    StorageNetworkVlanID: 33
    Storage2NetCidr: 172.118.3.0/24
    Storage2NetworkVlanID: 31
    Storage3NetCidr: 172.119.3.0/24
    Storage3NetworkVlanID: 32
    TenantInterfaceDefaultRoute: 172.120.2.1
    Tenant1InterfaceDefaultRoute: 172.117.2.1
    Tenant2InterfaceDefaultRoute: 172.118.2.1
    Tenant3InterfaceDefaultRoute: 172.119.2.1
    TenantAllocationPools:
    -   end: 172.120.2.200
        start: 172.120.2.10
    Tenant1AllocationPools:
    -   end: 172.117.2.200
        start: 172.117.2.10
    Tenant2AllocationPools:
    -   end: 172.118.2.200
        start: 172.118.2.10
    Tenant3AllocationPools:
    -   end: 172.119.2.200
        start: 172.119.2.10
    TenantNetCidr: 172.120.2.0/24
    Tenant1NetCidr: 172.117.2.0/24
    TenantNetworkVlanID: 53
    Tenant1NetworkVlanID: 50
    Tenant2NetCidr: 172.118.2.0/24
    Tenant2NetworkVlanID: 51
    Tenant3NetCidr: 172.119.2.0/24
    Tenant3NetworkVlanID: 52
    Composable1NetCidr: 172.150.100.0/24
    Composable1NetworkVlanID: 54
    Composable1InterfaceDefaultRoute: 172.150.100.1
    Composable1AllocationPools:
    -   end: 172.150.100.200
        start: 172.150.100.10
    Composable2NetCidr: 'fd00:fd00:fd00:8000::/64'
    Composable2AllocationPools: [{'start': 'fd00:fd00:fd00:8000::10', 'end': 'fd00:fd00:fd00:8000:ffff:ffff:ffff:fffe'}]
    Composable2NetworkVlanID: 55
resource_registry:
    OS::TripleO::CephStorage1::Net::SoftwareConfig: three-nics-vlans/ceph-storage1.yaml
    OS::TripleO::Compute1::Net::SoftwareConfig: three-nics-vlans/compute1.yaml
    OS::TripleO::Controller::Net::SoftwareConfig: three-nics-vlans/controller.yaml
    OS::TripleO::CephStorage2::Net::SoftwareConfig: three-nics-vlans/ceph-storage2.yaml
    OS::TripleO::Compute2::Net::SoftwareConfig: three-nics-vlans/compute2.yaml
    OS::TripleO::CephStorage3::Net::SoftwareConfig: three-nics-vlans/ceph-storage3.yaml
    OS::TripleO::Compute3::Net::SoftwareConfig: three-nics-vlans/compute3.yaml
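
When reviewing an environment file like the one above, one quick sanity check is that each allocation pool lies inside its network's CIDR. The following is a minimal sketch of such a check, not part of the original report: the helper name pool_in_cidr is hypothetical, and the three sample entries are copied by hand from the parameter_defaults above.

```python
import ipaddress


def pool_in_cidr(cidr, start, end):
    """Return True if the pool start..end lies inside cidr and is ordered."""
    net = ipaddress.ip_network(cidr)
    lo = ipaddress.ip_address(start)
    hi = ipaddress.ip_address(end)
    # Membership is False when the address family differs from the network's,
    # so the ordering comparison only runs for same-family addresses.
    return lo in net and hi in net and lo <= hi


# Values copied from network-environment.yaml above (illustrative subset).
samples = {
    "InternalApi": ("172.120.1.0/24", "172.120.1.10", "172.120.1.200"),
    "Tenant1": ("172.117.2.0/24", "172.117.2.10", "172.117.2.200"),
    "Composable2": (
        "fd00:fd00:fd00:8000::/64",
        "fd00:fd00:fd00:8000::10",
        "fd00:fd00:fd00:8000:ffff:ffff:ffff:fffe",
    ),
}

results = {name: pool_in_cidr(*values) for name, values in samples.items()}
for name, ok in results.items():
    print(name, "OK" if ok else "pool outside CIDR")
```

For the sampled entries the pools are consistent with their CIDRs, so this check alone does not point to a configuration problem.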

[stack@undercloud-0 ~]$ cat /home/stack/templates/network_data.yaml
# List of networks, used for j2 templating of enabled networks
#
# Supported values:
#
# name: Name of the network (mandatory)
# name_lower: lowercase version of name used for filenames
#             (optional, defaults to name.lower())
# enabled: Is the network enabled (optional, defaults to true)
# vlan: vlan for the network (optional)
# vip: Enable creation of a virtual IP on this network
# ip_subnet: IP/CIDR, e.g. '192.168.24.0/24' or '2001:db8:fd00:1000::/64'
#            (optional, may use parameter defaults instead)
# allocation_pools: IP range list e.g. [{'start':'10.0.0.4', 'end':'10.0.0.250'}]
# gateway_ip: gateway for the network (optional, may use parameter defaults)
# ipv6_subnet: Optional, sets default IPv6 subnet if IPv4 is already defined.
# ipv6_allocation_pools: Set default IPv6 allocation pools if IPv4 allocation pools
#                        are already defined.
# ipv6_gateway: Set an IPv6 gateway if IPv4 gateway already defined.
# ipv6: If ip_subnet not defined, this specifies that the network is IPv6-only.
# NOTE: IP-related values set parameter defaults in templates, may be overridden,
# either by operators, or e.g in environments/network-isolation-v6.yaml where we
# set some default IPv6 addresses.
# compat_name: for existing stack you may need to override the default
#              transformation for the resource's name.
#
# Example:
# - name Example
#   vip: false
#   ip_subnet: '10.0.2.0/24'
#   allocation_pools: [{'start': '10.0.2.4', 'end': '10.0.2.250'}]
#   gateway_ip: '10.0.2.254'
#
# To support backward compatility, two versions of the network definitions will
# be created, network/<network>.yaml and network/<network>_v6.yaml. Only
# one of these files may be used in the deployment at a time, since the
# parameters used for configuration are the same in both files. In the
# future, this behavior may be changed to create only one file for custom
# networks. You may specify IPv6 addresses for ip_subnet, allocation_pools,
# and gateway_ip if no IPv4 addresses are used for a custom network, or set
# ipv6: true, and the network/<network>.yaml file will be configured as IPv6.
#
# For configuring both IPv4 and IPv6 on the same interface, use two separate
# networks, and then assign both IPs in the custom NIC configuration templates.

- name: External
  vip: true
  name_lower: external
  ip_subnet: '10.0.0.0/24'
  allocation_pools: [{'start': '10.0.0.4', 'end': '10.0.0.250'}]
  gateway_ip: '10.0.0.1'
  ipv6_subnet: '2001:db8:fd00:1000::/64'
  ipv6_allocation_pools: [{'start': '2001:db8:fd00:1000::10', 'end': '2001:db8:fd00:1000:ffff:ffff:ffff:fffe'}]
  gateway_ipv6: '2001:db8:fd00:1000::1'
- name: InternalApi
  name_lower: internal_api
  vip: true
  ip_subnet: '172.16.2.0/24'
  allocation_pools: [{'start': '172.16.2.4', 'end': '172.16.2.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:2000::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:2000::10', 'end': 'fd00:fd00:fd00:2000:ffff:ffff:ffff:fffe'}]
- name: Storage
  vip: true
  name_lower: storage
  ip_subnet: '172.16.1.0/24'
  allocation_pools: [{'start': '172.16.1.4', 'end': '172.16.1.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:3000::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:3000::10', 'end': 'fd00:fd00:fd00:3000:ffff:ffff:ffff:fffe'}]
- name: StorageMgmt
  name_lower: storage_mgmt
  vip: true
  ip_subnet: '172.16.3.0/24'
  allocation_pools: [{'start': '172.16.3.4', 'end': '172.16.3.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:4000::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:4000::10', 'end': 'fd00:fd00:fd00:4000:ffff:ffff:ffff:fffe'}]
- name: Tenant
  vip: false  # Tenant network does not use VIPs
  name_lower: tenant
  ip_subnet: '172.16.0.0/24'
  allocation_pools: [{'start': '172.16.0.4', 'end': '172.16.0.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:5000::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:5000::10', 'end': 'fd00:fd00:fd00:5000:ffff:ffff:ffff:fffe'}]
- name: Management
  # Management network is enabled by default for backwards-compatibility, but
  # is not included in any roles by default. Add to role definitions to use.
  enabled: true
  vip: false  # Management network does not use VIPs
  name_lower: management
  ip_subnet: '10.0.1.0/24'
  allocation_pools: [{'start': '10.0.1.4', 'end': '10.0.1.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:6000::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:6000::10', 'end': 'fd00:fd00:fd00:6000:ffff:ffff:ffff:fffe'}]
- name: Tenant1
  vip: false  # Tenant network does not use VIPs
  name_lower: tenant1
  ip_subnet: '172.16.11.0/24'
  allocation_pools: [{'start': '172.16.11.4', 'end': '172.16.11.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:5001::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:5001::10', 'end': 'fd00:fd00:fd00:5001:ffff:ffff:ffff:fffe'}]
- name: Tenant2
  vip: false  # Tenant network does not use VIPs
  name_lower: tenant2
  ip_subnet: '172.16.12.0/24'
  allocation_pools: [{'start': '172.16.12.4', 'end': '172.16.12.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:5002::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:5002::10', 'end': 'fd00:fd00:fd00:5002:ffff:ffff:ffff:fffe'}]
- name: Tenant3
  vip: false  # Tenant network does not use VIPs
  name_lower: tenant3
  ip_subnet: '172.16.13.0/24'
  allocation_pools: [{'start': '172.16.13.4', 'end': '172.16.13.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:5003::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:5003::10', 'end': 'fd00:fd00:fd00:5003:ffff:ffff:ffff:fffe'}]
- name: StorageMgmt1
  name_lower: storage_mgmt1
  vip: false
  ip_subnet: '172.16.21.0/24'
  allocation_pools: [{'start': '172.16.21.4', 'end': '172.16.21.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:4001::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:4001::10', 'end': 'fd00:fd00:fd00:4001:ffff:ffff:ffff:fffe'}]
- name: StorageMgmt2
  name_lower: storage_mgmt2
  vip: false
  ip_subnet: '172.16.22.0/24'
  allocation_pools: [{'start': '172.16.22.4', 'end': '172.16.22.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:4002::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:4002::10', 'end': 'fd00:fd00:fd00:4002:ffff:ffff:ffff:fffe'}]
- name: StorageMgmt3
  name_lower: storage_mgmt3
  vip: false
  ip_subnet: '172.16.23.0/24'
  allocation_pools: [{'start': '172.16.23.4', 'end': '172.16.23.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:4003::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:4003::10', 'end': 'fd00:fd00:fd00:4003:ffff:ffff:ffff:fffe'}]
- name: Storage1
  vip: false
  name_lower: storage1
  ip_subnet: '172.16.31.0/24'
  allocation_pools: [{'start': '172.16.31.4', 'end': '172.16.31.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:3001::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:3001::10', 'end': 'fd00:fd00:fd00:3001:ffff:ffff:ffff:fffe'}]
- name: Storage2
  vip: false
  name_lower: storage2
  ip_subnet: '172.16.32.0/24'
  allocation_pools: [{'start': '172.16.32.4', 'end': '172.16.32.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:3002::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:3002::10', 'end': 'fd00:fd00:fd00:3002:ffff:ffff:ffff:fffe'}]
- name: Storage3
  vip: false
  name_lower: storage3
  ip_subnet: '172.16.33.0/24'
  allocation_pools: [{'start': '172.16.33.4', 'end': '172.16.33.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:3003::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:3003::10', 'end': 'fd00:fd00:fd00:3003:ffff:ffff:ffff:fffe'}]
- name: InternalApi1
  name_lower: internal_api1
  vip: false
  ip_subnet: '172.16.41.0/24'
  allocation_pools: [{'start': '172.16.41.4', 'end': '172.16.41.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:2001::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:2001::10', 'end': 'fd00:fd00:fd00:2001:ffff:ffff:ffff:fffe'}]
- name: InternalApi2
  name_lower: internal_api2
  vip: false
  ip_subnet: '172.16.42.0/24'
  allocation_pools: [{'start': '172.16.42.4', 'end': '172.16.42.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:2002::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:2002::10', 'end': 'fd00:fd00:fd00:2002:ffff:ffff:ffff:fffe'}]
- name: InternalApi3
  name_lower: internal_api3
  vip: false
  ip_subnet: '172.16.43.0/24'
  allocation_pools: [{'start': '172.16.43.4', 'end': '172.16.43.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:2003::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:2003::10', 'end': 'fd00:fd00:fd00:2003:ffff:ffff:ffff:fffe'}]
- name: Composable1
  name_lower: composable1
  vip: false
  ip_subnet: '172.16.44.0/24'
  allocation_pools: [{'start': '172.16.44.4', 'end': '172.16.44.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:2004::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:2004::10', 'end': 'fd00:fd00:fd00:2004:ffff:ffff:ffff:fffe'}]
- name: Composable2
  name_lower: composable2
  vip: false
  ipv6: true
  ip_subnet: '172.16.45.0/24'
  allocation_pools: [{'start': '172.16.45.4', 'end': '172.16.45.250'}]
  ipv6_subnet: 'fd00:fd00:fd00:2005::/64'
  ipv6_allocation_pools: [{'start': 'fd00:fd00:fd00:2005::10', 'end': 'fd00:fd00:fd00:2005:ffff:ffff:ffff:fffe'}]


2. Successfully launch an instance.
3. Reboot all nodes in the setup using Ironic (simulating a complete power outage).
4. After the nodes boot, try to launch another instance.


Result:

The instance goes into the ERROR state:
(overcloud) [stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+-------------------+--------+----------------------------------------+--------+---------+
| ID                                   | Name              | Status | Networks                               | Image  | Flavor  |
+--------------------------------------+-------------------+--------+----------------------------------------+--------+---------+
| 67dd8156-4644-40a6-8852-702dab3b0c55 | after_reboot      | ERROR  |                                        | cirros | m1.tiny |
| 77c20f4c-db60-4afb-bd14-4181cf81d525 | after_deploy      | ACTIVE | tenantgeneve=192.168.32.36, 10.0.0.223 | cirros | m1.tiny |
+--------------------------------------+-------------------+--------+----------------------------------------+--------+---------+



(overcloud) [stack@undercloud-0 ~]$ openstack server show after_reboot -f value -c fault
{'code': 500, 'created': '2019-07-20T03:48:37Z', 'message': 'Build of instance 67dd8156-4644-40a6-8852-702dab3b0c55 aborted: Failed to allocate the network(s), not rescheduling.', 'details': '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 1984, in _do_build_and_run_instance\n    filter_properties, request_spec)\n  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2322, in _build_and_run_instance\n    reason=msg)\n'}
(overcloud) [stack@undercloud-0 ~]$ 



On a compute node, the following error appears in /var/log/containers/neutron/ovn-metadata-agent.log:
2019-07-22 15:11:00.764 139555 ERROR neutron.agent.linux.utils [-] Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-e8e8e4a0-7043-4ee9-9309-1827f9e2a162
; Stderr: error creating container storage: the container name "neutron-haproxy-ovnmeta-e8e8e4a0-7043-4ee9-9309-1827f9e2a162" is already in use by "56b93822c94f9d3de83f431e677cbbe4693864e54b60736066dfbf6ffebbc777". You have to remove that container to be able to reuse that name.: that name is already in use

2019-07-22 15:11:00.765 139555 CRITICAL neutron [-] Unhandled error: neutron_lib.exceptions.ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-e8e8e4a0-7043-4ee9-9309-1827f9e2a162
; Stderr: error creating container storage: the container name "neutron-haproxy-ovnmeta-e8e8e4a0-7043-4ee9-9309-1827f9e2a162" is already in use by "56b93822c94f9d3de83f431e677cbbe4693864e54b60736066dfbf6ffebbc777". You have to remove that container to be able to reuse that name.: that name is already in use
2019-07-22 15:11:00.765 139555 ERROR neutron Traceback (most recent call last):
2019-07-22 15:11:00.765 139555 ERROR neutron   File "/usr/bin/networking-ovn-metadata-agent", line 10, in <module>
2019-07-22 15:11:00.765 139555 ERROR neutron     sys.exit(main())
2019-07-22 15:11:00.765 139555 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/cmd/eventlet/agents/metadata.py", line 17, in main
2019-07-22 15:11:00.765 139555 ERROR neutron     metadata_agent.main()
2019-07-22 15:11:00.765 139555 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata_agent.py", line 38, in main
2019-07-22 15:11:00.765 139555 ERROR neutron     agt.start()
2019-07-22 15:11:00.765 139555 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 186, in start
2019-07-22 15:11:00.765 139555 ERROR neutron     self.sync()
2019-07-22 15:11:00.765 139555 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 58, in wrapped
2019-07-22 15:11:00.765 139555 ERROR neutron     return f(*args, **kwargs)
2019-07-22 15:11:00.765 139555 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 235, in sync
2019-07-22 15:11:00.765 139555 ERROR neutron     metadata_namespaces = self.ensure_all_networks_provisioned()
2019-07-22 15:11:00.765 139555 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 432, in ensure_all_networks_provisioned
2019-07-22 15:11:00.765 139555 ERROR neutron     netns = self.provision_datapath(datapath)
2019-07-22 15:11:00.765 139555 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/agent.py", line 411, in provision_datapath
2019-07-22 15:11:00.765 139555 ERROR neutron     self.conf, bind_address=METADATA_DEFAULT_IP, network_id=datapath)
2019-07-22 15:11:00.765 139555 ERROR neutron   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/metadata/driver.py", line 200, in spawn_monitored_metadata_proxy
2019-07-22 15:11:00.765 139555 ERROR neutron     pm.enable()
2019-07-22 15:11:00.765 139555 ERROR neutron   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/external_process.py", line 89, in enable
2019-07-22 15:11:00.765 139555 ERROR neutron     run_as_root=self.run_as_root)
2019-07-22 15:11:00.765 139555 ERROR neutron   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/ip_lib.py", line 794, in execute
2019-07-22 15:11:00.765 139555 ERROR neutron     run_as_root=run_as_root)
2019-07-22 15:11:00.765 139555 ERROR neutron   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py", line 147, in execute
2019-07-22 15:11:00.765 139555 ERROR neutron     returncode=returncode)
2019-07-22 15:11:00.765 139555 ERROR neutron neutron_lib.exceptions.ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: Starting a new child container neutron-haproxy-ovnmeta-e8e8e4a0-7043-4ee9-9309-1827f9e2a162
2019-07-22 15:11:00.765 139555 ERROR neutron ; Stderr: error creating container storage: the container name "neutron-haproxy-ovnmeta-e8e8e4a0-7043-4ee9-9309-1827f9e2a162" is already in use by "56b93822c94f9d3de83f431e677cbbe4693864e54b60736066dfbf6ffebbc777". You have to remove that container to be able to reuse that name.: that name is already in use
2019-07-22 15:11:00.765 139555 ERROR neutron 
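
To see which container names are affected across a larger log, the "already in use" messages can be extracted with a regular expression. This is a rough sketch that assumes the message wording shown in the log above; the function name find_name_conflicts is hypothetical, and the sample string is an abbreviated copy of the Stderr text from the log.

```python
import re

# Matches the name-conflict message as it appears in ovn-metadata-agent.log above.
NAME_IN_USE = re.compile(
    r'the container name "(?P<name>[^"]+)" is already in use by "(?P<cid>[0-9a-f]+)"'
)


def find_name_conflicts(log_text):
    """Return the unique (container name, existing container ID) pairs, sorted."""
    pairs = {(m.group("name"), m.group("cid")) for m in NAME_IN_USE.finditer(log_text)}
    return sorted(pairs)


sample = (
    '; Stderr: error creating container storage: the container name '
    '"neutron-haproxy-ovnmeta-e8e8e4a0-7043-4ee9-9309-1827f9e2a162" is already in use by '
    '"56b93822c94f9d3de83f431e677cbbe4693864e54b60736066dfbf6ffebbc777". '
    'You have to remove that container to be able to reuse that name.'
)

conflicts = find_name_conflicts(sample)
for name, cid in conflicts:
    print(name, "held by", cid[:12])
```

Because the same message is repeated in the ERROR, CRITICAL, and traceback lines, the set deduplicates it to one entry per container name.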



On a Nova compute node, in /var/log/containers/nova/nova-compute.log:


2019-07-22 15:09:08.067 6 INFO nova.compute.manager [req-d1c5c009-f788-4180-bd63-d0a71e607b67 5a3ca8b0ba1f4677b56c7817c73ea97d d0752468e1c548c3826ac44c6b68ba23 - default default] [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1] Took 0.27 seconds to destroy the instance on the hypervisor.
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [req-d1c5c009-f788-4180-bd63-d0a71e607b67 5a3ca8b0ba1f4677b56c7817c73ea97d d0752468e1c548c3826ac44c6b68ba23 - default default] [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1] Failed to allocate network(s): nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1] Traceback (most recent call last):
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 5694, in _create_domain_and_network
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]     network_info)
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]     next(self.gen)
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 481, in wait_for_instance_event
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]     actual_event = event.wait()
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]   File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]     result = hub.switch()
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]   File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 297, in switch
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]     return self.greenlet.switch()
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1] eventlet.timeout.Timeout: 300 seconds
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1] 
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1] During handling of the above exception, another exception occurred:
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1] 
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1] Traceback (most recent call last):
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2256, in _build_and_run_instance
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]     block_device_info=block_device_info)
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3195, in spawn
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]     destroy_disks_on_failure=True)
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 5715, in _create_domain_and_network
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]     raise exception.VirtualInterfaceCreateException()
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1] nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2019-07-22 15:09:08.379 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1] 
2019-07-22 15:09:08.391 6 ERROR nova.compute.manager [req-d1c5c009-f788-4180-bd63-d0a71e607b67 5a3ca8b0ba1f4677b56c7817c73ea97d d0752468e1c548c3826ac44c6b68ba23 - default default] [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1] Build of instance 093cf259-49f1-4eff-af87-fe0ab3ccd9a1 aborted: Failed to allocate the network(s), not rescheduling.: nova.exception.BuildAbortException: Build of instance 093cf259-49f1-4eff-af87-fe0ab3ccd9a1 aborted: Failed to allocate the network(s), not rescheduling.
2019-07-22 15:09:10.855 6 INFO nova.compute.manager [req-d1c5c009-f788-4180-bd63-d0a71e607b67 5a3ca8b0ba1f4677b56c7817c73ea97d d0752468e1c548c3826ac44c6b68ba23 - default default] [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1] Took 2.46 seconds to deallocate network for instance.

Comment 2 Bob Fournier 2019-07-23 13:57:36 UTC
Including Networking DFG to look at errors above - specifically "error creating container storage" and "Failed to allocate network(s)".

Comment 4 Bob Fournier 2019-07-24 16:12:36 UTC
I'm not sure whether the error "ERROR neutron ; Stderr: error creating container storage: the container name "neutron-haproxy-ovnmeta-e8e8e4a0-7043-4ee9-9309-1827f9e2a162" is already in use by xxx" is significant; I do see this error on other deployments.

The problem can be seen in sosreport-overcloud-novacompute3-0-2019-07-22-edypxad/var/log/containers/nova/nova-compute.log

We see this network-vif-plugged timeout:
2019-07-22 15:09:06.581 6 WARNING nova.virt.libvirt.driver [req-d1c5c009-f788-4180-bd63-d0a71e607b67 5a3ca8b0ba1f4677b56c7817c73ea97d d0752468e1c548c3826ac44c6b68ba23 - default default] [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1] Timeout waiting for [('network-vif-plugged', '60382863-8b95-4868-aadd-228fa58239ed')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds

which results in:
2019-07-22 15:09:07.787 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1] Traceback (most recent call last):
2019-07-22 15:09:07.787 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 5694, in _create_domain_and_network
2019-07-22 15:09:07.787 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]     network_info)
2019-07-22 15:09:07.787 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2019-07-22 15:09:07.787 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]     next(self.gen)
2019-07-22 15:09:07.787 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 481, in wait_for_instance_event
2019-07-22 15:09:07.787 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]     actual_event = event.wait()
2019-07-22 15:09:07.787 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]   File "/usr/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait
2019-07-22 15:09:07.787 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]     result = hub.switch()
2019-07-22 15:09:07.787 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]   File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 297, in switch
2019-07-22 15:09:07.787 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]     return self.greenlet.switch()
2019-07-22 15:09:07.787 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1] eventlet.timeout.Timeout: 300 seconds
2019-07-22 15:09:07.787 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1]
2019-07-22 15:09:07.787 6 ERROR nova.compute.manager [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1] During handling of the above exception, another exception occurred:


It looks like vif 60382863-8b95-4868-aadd-228fa58239ed was successfully plugged here, before the timeout (note, plugin is 'ovs'):
sosreport-overcloud-novacompute3-0-2019-07-22-edypxad/var/log/containers/nova/nova-compute.log:2019-07-22 15:04:05.945 6 INFO os_vif [req-d1c5c009-f788-4180-bd63-d0a71e607b67 5a3ca8b0ba1f4677b56c7817c73ea97d d0752468e1c548c3826ac44c6b68ba23 - default default] Successfully plugged vif VIFOpenVSwitch(active=False,address=fa:16:3e:ba:c0:38,bridge_name='br-int',has_traffic_filtering=True,id=60382863-8b95-4868-aadd-228fa58239ed,network=Network(1353f46b-5e8b-44da-902f-857382635da0),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tap60382863-8b')

The timeout is 5 minutes later:
2019-07-22 15:09:06.581 6 WARNING nova.virt.libvirt.driver [req-d1c5c009-f788-4180-bd63-d0a71e607b67 5a3ca8b0ba1f4677b56c7817c73ea97d d0752468e1c548c3826ac44c6b68ba23 - default default] [instance: 093cf259-49f1-4eff-af87-fe0ab3ccd9a1] Timeout waiting for [('network-vif-plugged', '60382863-8b95-4868-aadd-228fa58239ed')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds
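
The gap between the two log entries can be checked directly from their timestamps. This small sketch only does the arithmetic on the two timestamps quoted above; the variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the two nova-compute.log lines above.
plugged = datetime.strptime("2019-07-22 15:04:05.945", FMT)    # "Successfully plugged vif"
timed_out = datetime.strptime("2019-07-22 15:09:06.581", FMT)  # "Timeout waiting for ..."

elapsed = (timed_out - plugged).total_seconds()
print(f"{elapsed:.3f} seconds between plug and timeout")
```

The difference is just over 300 seconds, which matches the "eventlet.timeout.Timeout: 300 seconds" in the warning: the wait for network-vif-plugged appears to have run its full length after the vif was plugged.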

On controller-0, it looks like this vif corresponds to 192.168.32.9:
var/log/containers/neutron/server.log:2019-07-22 15:04:04.463 45 INFO neutron.wsgi [req-7bb0f007-eaa6-4e3c-88a1-58c2428d396d 27b948ed60dd405c9d447ca127a58b5f c0fa260e0235467889813e4aa222f7a5 - default default] 172.120.1.137 "GET /v2.0/floatingips?fixed_ip_address=192.168.32.9&port_id=60382863-8b95-4868-aadd-228fa58239ed HTTP/1.1" status: 200  len: 193 time: 0.0688732
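
The port-to-address mapping is read off the query string of that request. As an illustration only, the same values can be pulled out with the Python standard library; the request path below is copied from the server.log line above, and the variable names are assumptions.

```python
from urllib.parse import parse_qs, urlsplit

# Request path copied from the neutron server.log entry above.
request = (
    "/v2.0/floatingips?fixed_ip_address=192.168.32.9"
    "&port_id=60382863-8b95-4868-aadd-228fa58239ed"
)

params = parse_qs(urlsplit(request).query)
fixed_ip = params["fixed_ip_address"][0]
port_id = params["port_id"][0]

print(f"port {port_id} -> fixed IP {fixed_ip}")
```

The port_id here is the same UUID that nova-compute was waiting on in the network-vif-plugged event.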