Bug 1653807 - Build of instance aborted - neutron.agent.dhcp.agent unable to reload allocations
Summary: Build of instance aborted - neutron.agent.dhcp.agent unable to reload allocat...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: opendaylight
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: ---
Assignee: Brent Eagles
QA Contact: Noam Manos
URL:
Whiteboard:
Depends On: 1630069
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-11-27 16:37 UTC by Noam Manos
Modified: 2019-03-10 07:16 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-10 07:16:45 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
neutron.agent.dhcp.agent unable to reload_allocations dhcp (150.65 KB, text/plain)
2018-11-27 16:40 UTC, Noam Manos
no flags Details
errors on ODL Karaf logs (83.07 KB, text/plain)
2018-11-29 17:51 UTC, Noam Manos
no flags Details


Links
System ID Private Priority Status Summary Last Updated
OpenDaylight Bug SRVUTILS-3 0 None None None 2018-11-30 01:53:27 UTC

Description Noam Manos 2018-11-27 16:37:41 UTC
Description of problem:
On OSP14 with ODL - Build of instance aborted: Failed to allocate the network - neutron.agent.dhcp.agent unable to reload_allocations dhcp

Version-Release number of selected component (if applicable):
OSP14   -p 2018-11-21.2

How reproducible:
Sometimes

Steps to Reproduce:
1. Install OSP14 + ODL
2. Spawn a VM. 

You can use this script:
https://code.engineering.redhat.com/gerrit/gitweb?p=Neutron-QE.git;a=blob;f=Scripts/create_multi_topology.sh

For example:
./create_multi_topology.sh -q -i rhel75 -t mvi -c YES -e skip -n 2 -v 1

Actual results:
openstack server list --all
+--------------------------------------+-----------------+--------+----------+--
| ID                                   | Name            | Status | Networks | Image  | Flavor      |
+--------------------------------------+-----------------+--------+----------+--
| 429dff3a-236d-4072-8b7a-5530c9b3a492 | rhel75_vm1_net1 | ERROR  |          | rhel75 | rhel_flavor |
+--------------------------------------+-----------------+--------+----------+--

Build of instance 429dff3a-236d-4072-8b7a-5530c9b3a492 aborted: Failed to allocate the network(s), not rescheduling.', u'code': 500, u'details': u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1929, in _do_build_and_run_instance\n    filter_properties, request_spec)\n  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2186, in _build_and_run_instance\n    reason=msg)\n', u'created': u'2018-11-27T09:57:43Z'


On controller-1 the the VM networks cannot start (status=DOWN):


*** /var/log/containers/neutron/dhcp-agent.log ***


ERROR neutron.agent.dhcp.agent  Unable to reload_allocations dhcp for 13c1f9bc-73e9-4441-a2c4-c00262c94de8.: 
ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: neutron-dnsmasq-qdhcp-13c1f9bc-73e9-4441-a2c4-c00262c94de8
; Stderr: Error response from daemon: removal of container neutron-dnsmasq-qdhcp-13c1f9bc-73e9-4441-a2c4-c00262c94de8 is already in progress
/usr/bin/docker-current: Error response from daemon: Conflict. 
The container name "/neutron-dnsmasq-qdhcp-13c1f9bc-73e9-4441-a2c4-c00262c94de8" is already in use by container a35399a0114e712e912105e12639522412aecfb9eaaf5a21b1586e1c0cf88b2c. 
You have to remove (or rename) that container to be able to reuse that name..

Acquired semaphore "dhcp-agent-network-lock-49b1c270-a2c1-4caf-b328-20e1951ed746" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
Releasing semaphore "dhcp-agent-network-lock-49b1c270-a2c1-4caf-b328-20e1951ed746" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:228
Lock "_check_child_processes" acquired by "neutron.agent.linux.external_process._check_child_processes" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
Lock "_check_child_processes" released by "neutron.agent.linux.external_process._check_child_processes" :: held 0.002s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285

Acquired semaphore "dhcp-agent-network-lock-bb294f07-63a6-4586-a4fc-ae541843f690" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
Trigger reload_allocations for port 
admin_state_up=True, allowed_address_pairs=[], binding:host_id=, binding:profile=, binding:vif_details=, binding:vif_type=unbound, binding:vnic_type=normal, created_at=2018-11-27T09:51:35Z, description=, device_id=429dff3a-236d-4072-8b7a-5530c9b3a492, device_owner=, extra_dhcp_opts=[], fixed_ips=[{u'subnet_id': u'd743ec94-34fb-4739-b043-fa1619a14761', u'ip_address': u'10.0.1.9'}, {u'subnet_id': u'8f34659f-d947-498f-b0de-51f34b37e859', u'ip_address': u'2001::f816:3eff:fe57:2b8f'}], id=6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766, mac_address=fa:16:3e:57:2b:8f, name=, network_id=bb294f07-63a6-4586-a4fc-ae541843f690, port_security_enabled=True, project_id=10b35bdd64ed47e2ad76f4f865e7d914, revision_number=1, security_groups=[u'7c7e236b-7ee3-45b4-9218-4afad2cd2a14'], status=DOWN, tags=[], tenant_id=10b35bdd64ed47e2ad76f4f865e7d914, updated_at=2018-11-27T09:51:35Z

Calling driver for network: bb294f07-63a6-4586-a4fc-ae541843f690 action: reload_allocations call_driver /usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py:135
Building host file: /var/lib/neutron/dhcp/bb294f07-63a6-4586-a4fc-ae541843f690/host _output_hosts_file /usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py:695
Done building host file /var/lib/neutron/dhcp/bb294f07-63a6-4586-a4fc-ae541843f690/host _output_hosts_file /usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py:734
Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-bb294f07-63a6-4586-a4fc-ae541843f690', 'ip', 'addr', 'show', 'tapb83b453f-f6'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103
Running command (rootwrap daemon): ['kill', '-HUP', '540366'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103
Reloading allocations for network: bb294f07-63a6-4586-a4fc-ae541843f690 reload_allocations /usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py:524
Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-bb294f07-63a6-4586-a4fc-ae541843f690', 'ip', '-4', 'route', 'list', 'dev', 'tapb83b453f-f6'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103
Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-bb294f07-63a6-4586-a4fc-ae541843f690', 'ip', '-6', 'route', 'list', 'dev', 'tapb83b453f-f6'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103
Releasing semaphore "dhcp-agent-network-lock-bb294f07-63a6-4586-a4fc-ae541843f690" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:228
...
Trigger reload_allocations for port
admin_state_up=True, allowed_address_pairs=[], binding:host_id=compute-1.localdomain, binding:profile=, binding:vif_details=, binding:vif_type=unbound, binding:vnic_type=normal, created_at=2018-11-27T09:51:35Z, description=, device_id=429dff3a-236d-4072-8b7a-5530c9b3a492, device_owner=compute:nova, extra_dhcp_opts=[], fixed_ips=[{u'subnet_id': u'd743ec94-34fb-4739-b043-fa1619a14761', u'ip_address': u'10.0.1.9'}, {u'subnet_id': u'8f34659f-d947-498f-b0de-51f34b37e859', u'ip_address': u'2001::f816:3eff:fe57:2b8f'}], id=6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766, mac_address=fa:16:3e:57:2b:8f, name=, network_id=bb294f07-63a6-4586-a4fc-ae541843f690, port_security_enabled=True, project_id=10b35bdd64ed47e2ad76f4f865e7d914, revision_number=2, security_groups=[u'7c7e236b-7ee3-45b4-9218-4afad2cd2a14'], status=DOWN, tags=[], tenant_id=10b35bdd64ed47e2ad76f4f865e7d914, updated_at=2018-11-27T09:51:37Z
...
Trigger reload_allocations for port 
admin_state_up=True, allowed_address_pairs=[], binding:host_id=compute-1.localdomain, binding:profile=, binding:vif_details=, binding:vif_type=ovs, binding:vnic_type=normal, created_at=2018-11-27T09:51:35Z, description=, device_id=429dff3a-236d-4072-8b7a-5530c9b3a492, device_owner=compute:nova, extra_dhcp_opts=[], fixed_ips=[{u'subnet_id': u'd743ec94-34fb-4739-b043-fa1619a14761', u'ip_address': u'10.0.1.9'}, {u'subnet_id': u'8f34659f-d947-498f-b0de-51f34b37e859', u'ip_address': u'2001::f816:3eff:fe57:2b8f'}], id=6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766, mac_address=fa:16:3e:57:2b:8f, name=, network_id=bb294f07-63a6-4586-a4fc-ae541843f690, port_security_enabled=True, project_id=10b35bdd64ed47e2ad76f4f865e7d914, revision_number=3, security_groups=[u'7c7e236b-7ee3-45b4-9218-4afad2cd2a14'], status=DOWN, tags=[], tenant_id=10b35bdd64ed47e2ad76f4f865e7d914, updated_at=2018-11-27T09:51:38Z



Expected results:
VM should be up and connected to the network.

Additional info:
Might be related to u/s bug: https://bugs.launchpad.net/cloud-archive/+bug/1763442

Comment 1 Noam Manos 2018-11-27 16:40:03 UTC
Created attachment 1508962 [details]
neutron.agent.dhcp.agent unable to reload_allocations dhcp

Comment 2 Mike Kolesnik 2018-11-29 08:35:49 UTC
This seems like a general issue with the DHCP container, reassigning to Networking DFG to triage this.

Comment 3 Noam Manos 2018-11-29 09:19:48 UTC
Here's sosreports:
http://rhos-release.virt.bos.redhat.com/log/bz1653807/

Please notice on controller-2: 

delete port 6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766
REQ: "network-vif-deleted"
RESP: 404
RESP BODY: {"itemNotFound": {"message": "No instances found for any event", "code": 404}} _http_log_response 
Nova returned NotFound for event: network-vif-deleted


Found on /var/log/containers/neutron/server.log.6.gz :

 Attributes excluded by policy engine: [u'binding:profile', u'binding:vif_details', u'binding:vif_type', u'binding:host_id'\] _exclude_attributes_by_policy 
 Deleting port 6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766 _pre_delete_port 
 Notify callbacks ['neutron.services.trunk.rules.enforce_port_deletion_rules--9223363300400916243', 'neutron.db.l3_db._prevent_l3_port_delete_callback--9223363300400289645'\] 
 for port, before_delete _notify_loop 
 Notify callbacks ['neutron.plugins.ml2.plugin.Ml2Plugin._retry_binding_revived_agents--9223372036853922824', 'neutron.services.segments.db._update_segment_host_mapping_for_agent--9223363300400471345'\] 
 for agent, after_update _notify_loop 
 For port 6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766, host compute-1.localdomain, got binding levels [<neutron.plugins.ml2.models.PortBindingLevel[object at 7f21d2e218d0\]
 {port_id=u'6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766', host=u'compute-1.localdomain', level=0, driver=u'opendaylight_v2', segment_id=u'fee01708-8409-4a64-ba71-24333da1b749'}>\]
 get_binding_levels 
 Notify callbacks ['networking_odl.l3.l3_odl_v2.OpenDaylightL3RouterPlugin._precommit_delete_port_callback--9223372036847383169'\]
 for port, precommit_delete _notify_loop 
 Agent healthcheck: found 11 active agents agent_health_check 
 networking_odl.ml2.mech_driver_v2.OpenDaylightMechanismDriver method delete_port_precommit called with arguments (<neutron.plugins.ml2.driver_context.PortContext object at 0x7f21d24dc850>,) {} wrapper 
 Recorded (Entry ID: 10196) - delete port 6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766 (Time stamp: 63678909462.4)
 Entry with ID 10196 depends on these entries: [\] record 
--
 static method f called with arguments (<networking_odl.ml2.mech_driver_v2.OpenDaylightMechanismDriver object at 0x7f21d4039910>, <neutron.plugins.ml2.driver_context.PortContext object at 0x7f21d24dc850>) {} wrapper 
 Lock "event-dispatch" acquired by "neutron.plugins.ml2.ovo_rpc.dispatch_events" :: waited 0.000s inner 
 Start processing journal entries sync_pending_entries 
 Notify callbacks ['neutron.notifiers.nova.Notifier._send_nova_notification--9223372036854767931'\]
 for port, before_response _notify_loop 
 Lock "notifier-24e1cced-1471-40c7-9c4c-360a1842c3f7" acquired by "neutron.notifiers.batch_notifier.synced_send" :: waited 0.000s inner 
 Sending events: [{'tag': u'6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766', 'name': 'network-vif-deleted', 'server_uuid': u'429dff3a-236d-4072-8b7a-5530c9b3a492'}\] send_events 
 Pushing event deleted for resources: {'Port': ['ID=6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766,revision_number=None'\]} push 
 Processing (Entry ID: 10196) - delete port 6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766 (Time stamp: 63678909463.3)
 Sending METHOD (delete) URL (http://172.17.1.11:8081/controller/nb/v2/neutron/ports/6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766) JSON (None) request 
 172.17.1.17 "DELETE /v2.0/ports/6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766 HTTP/1.1" status: 204  len: 149 time: 2.1816690
 Lock "event-dispatch" released by "neutron.plugins.ml2.ovo_rpc.dispatch_events" :: held 0.104s inner 
 Got response: () _check_response 
 Notify callbacks [\] for port, before_complete _notify_loop 
 Websocket notification for port status update _process_websocket_recv 
 PortStatus: Ignoring delete operation _process_websocket_recv 
 PortStatus: Ignoring delete operation _process_websocket_recv 
 Completed (Entry ID: 10196) - delete port 6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766 (Time stamp: 63678909463.4)
 Finished processing journal entries sync_pending_entries 
 REQ: curl -g -i -X POST http://172.17.1.11:8774/v2.1/os-server-external-events -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: python-novaclient" -H "X-Auth-Token: {SHA1}9f68775ddfeaa4beac4eab7e7aa25b3bbe076127" -H "X-OpenStack-Nova-API-Version: 2.1" -d '{"events": [{"tag": "6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766", "name": "network-vif-deleted", "server_uuid": "429dff3a-236d-4072-8b7a-5530c9b3a492"}\]}' _http_log_request 
 RESP: [404\]
 Content-Length: 78 Content-Type: application/json; charset=UTF-8 Date: Tue, 27 Nov 2018 09:57:44 GMT OpenStack-API-Version: compute 2.1 Server: Apache Vary: OpenStack-API-Version,X-OpenStack-Nova-API-Version X-OpenStack-Nova-API-Version: 2.1 x-compute-request-id: req-45816299-cc3f-44cd-a76f-17d243b001bd x-openstack-request-id: req-45816299-cc3f-44cd-a76f-17d243b001bd _http_log_response 
 RESP BODY: {"itemNotFound": {"message": "No instances found for any event", "code": 404}} _http_log_response 
 POST call to compute for http://172.17.1.11:8774/v2.1/os-server-external-events used request id req-45816299-cc3f-44cd-a76f-17d243b001bd request 
 Nova returned NotFound for event: [{'tag': u'6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766', 'name': 'network-vif-deleted', 'server_uuid': u'429dff3a-236d-4072-8b7a-5530c9b3a492'}\] send_events 
 Starting hostconfig periodic task.
 Starting _get_and_update_hostconfigs phase of periodic task hostconfig.
 REST/GET hostconfigs from ODL
 Sending METHOD (get) URL (http://172.17.1.11:8081/restconf/operational/neutron:neutron/hostconfigs/) JSON (None) request 
 ODLPORTBINDING hostconfigs

Comment 4 Brian Haley 2018-11-29 16:48:42 UTC
Assigning to Brent for further triage.

Comment 5 Noam Manos 2018-11-29 17:51:20 UTC
Created attachment 1509893 [details]
errors on ODL Karaf logs

Also seeing new errors on ODL Karaf logs:

ERROR | Blueprint Extender: 3 | UpgradeStateListener             | 406 - org.opendaylight.serviceutils.upgrade - 0.1.4.redhat-7 | Failed to write mdsal config
: Node was replaced by other transaction.

ERROR | nioEventLoopGroup-7-1 | ExceptionHandler                 | 392 - org.opendaylight.ovsdb.library - 1.6.4.redhat-7 | Exception occurred while processing connection pipeline
: Connection reset by peer

ERROR | pool-44-thread-1 | FatalExceptionHandler            | 41 - com.lmax.disruptor - 3.3.10 | Exception processing: 9262 org.opendaylight.controller.md.sal.dom.broker.impl.DOMNotificationRouterEvent@2033c1dd
: Shard member-2-shard-default-operational currently has no leader. Try again later.

ERROR | org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.ext.routers.Routers_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | VpnFootprintService 
: Node children was modified by other transaction

Comment 7 Noam Manos 2018-11-30 01:53:28 UTC
Seems to be caused (blocked) by u/s bug:
https://jira.opendaylight.org/browse/SRVUTILS-3


2018-11-22T10:50:20,967 | ERROR | Blueprint Extender: 3 | UpgradeStateListener             | 406 - org.opendaylight.serviceutils.upgrade - 0.1.4.redhat-7 | Failed to write mdsal config
java.util.concurrent.ExecutionException: OptimisticLockFailedException{message=Optimistic lock failed for path /(urn:opendaylight:serviceutils:upgrade?revision=2018-07-02)upgrade-config, errorList=[RpcError [message=Optimistic lock failed for path /(urn:opendaylight:serviceutils:upgrade?revision=2018-07-02)upgrade-config, severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was replaced by other transaction.]]}

Comment 8 Noam Manos 2018-11-30 01:57:12 UTC
Might be also related to bug 1630069


Note You need to log in before you can comment on or make changes to this bug.