Description of problem:
On OSP14 with ODL - Build of instance aborted: Failed to allocate the network - neutron.agent.dhcp.agent unable to reload_allocations dhcp

Version-Release number of selected component (if applicable):
OSP14 -p 2018-11-21.2

How reproducible:
Sometimes

Steps to Reproduce:
1. Install OSP14 + ODL
2. Spawn a VM. You can use this script:
   https://code.engineering.redhat.com/gerrit/gitweb?p=Neutron-QE.git;a=blob;f=Scripts/create_multi_topology.sh
   For example:
   ./create_multi_topology.sh -q -i rhel75 -t mvi -c YES -e skip -n 2 -v 1

Actual results:
openstack server list --all
+--------------------------------------+-----------------+--------+----------+--------+-------------+
| ID                                   | Name            | Status | Networks | Image  | Flavor      |
+--------------------------------------+-----------------+--------+----------+--------+-------------+
| 429dff3a-236d-4072-8b7a-5530c9b3a492 | rhel75_vm1_net1 | ERROR  |          | rhel75 | rhel_flavor |
+--------------------------------------+-----------------+--------+----------+--------+-------------+

Build of instance 429dff3a-236d-4072-8b7a-5530c9b3a492 aborted: Failed to allocate the network(s), not rescheduling.', u'code': 500, u'details': u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1929, in _do_build_and_run_instance\n    filter_properties, request_spec)\n  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2186, in _build_and_run_instance\n    reason=msg)\n', u'created': u'2018-11-27T09:57:43Z'

On controller-1 the VM networks cannot start (status=DOWN):

*** /var/log/containers/neutron/dhcp-agent.log ***

ERROR neutron.agent.dhcp.agent Unable to reload_allocations dhcp for 13c1f9bc-73e9-4441-a2c4-c00262c94de8.: ProcessExecutionError: Exit code: 125; Stdin: ; Stdout: neutron-dnsmasq-qdhcp-13c1f9bc-73e9-4441-a2c4-c00262c94de8
; Stderr: Error response from daemon: removal of container neutron-dnsmasq-qdhcp-13c1f9bc-73e9-4441-a2c4-c00262c94de8 is already in progress
/usr/bin/docker-current: Error response from daemon: Conflict.
The container name "/neutron-dnsmasq-qdhcp-13c1f9bc-73e9-4441-a2c4-c00262c94de8" is already in use by container a35399a0114e712e912105e12639522412aecfb9eaaf5a21b1586e1c0cf88b2c. You have to remove (or rename) that container to be able to reuse that name.

Acquired semaphore "dhcp-agent-network-lock-49b1c270-a2c1-4caf-b328-20e1951ed746" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
Releasing semaphore "dhcp-agent-network-lock-49b1c270-a2c1-4caf-b328-20e1951ed746" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:228
Lock "_check_child_processes" acquired by "neutron.agent.linux.external_process._check_child_processes" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
Lock "_check_child_processes" released by "neutron.agent.linux.external_process._check_child_processes" :: held 0.002s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
Acquired semaphore "dhcp-agent-network-lock-bb294f07-63a6-4586-a4fc-ae541843f690" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
Trigger reload_allocations for port admin_state_up=True, allowed_address_pairs=[], binding:host_id=, binding:profile=, binding:vif_details=, binding:vif_type=unbound, binding:vnic_type=normal, created_at=2018-11-27T09:51:35Z, description=, device_id=429dff3a-236d-4072-8b7a-5530c9b3a492, device_owner=, extra_dhcp_opts=[], fixed_ips=[{u'subnet_id': u'd743ec94-34fb-4739-b043-fa1619a14761', u'ip_address': u'10.0.1.9'}, {u'subnet_id': u'8f34659f-d947-498f-b0de-51f34b37e859', u'ip_address': u'2001::f816:3eff:fe57:2b8f'}], id=6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766, mac_address=fa:16:3e:57:2b:8f, name=, network_id=bb294f07-63a6-4586-a4fc-ae541843f690, port_security_enabled=True, project_id=10b35bdd64ed47e2ad76f4f865e7d914, revision_number=1, security_groups=[u'7c7e236b-7ee3-45b4-9218-4afad2cd2a14'], status=DOWN, tags=[], tenant_id=10b35bdd64ed47e2ad76f4f865e7d914,
updated_at=2018-11-27T09:51:35Z
Calling driver for network: bb294f07-63a6-4586-a4fc-ae541843f690 action: reload_allocations call_driver /usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py:135
Building host file: /var/lib/neutron/dhcp/bb294f07-63a6-4586-a4fc-ae541843f690/host _output_hosts_file /usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py:695
Done building host file /var/lib/neutron/dhcp/bb294f07-63a6-4586-a4fc-ae541843f690/host _output_hosts_file /usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py:734
Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-bb294f07-63a6-4586-a4fc-ae541843f690', 'ip', 'addr', 'show', 'tapb83b453f-f6'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103
Running command (rootwrap daemon): ['kill', '-HUP', '540366'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103
Reloading allocations for network: bb294f07-63a6-4586-a4fc-ae541843f690 reload_allocations /usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py:524
Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-bb294f07-63a6-4586-a4fc-ae541843f690', 'ip', '-4', 'route', 'list', 'dev', 'tapb83b453f-f6'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103
Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-bb294f07-63a6-4586-a4fc-ae541843f690', 'ip', '-6', 'route', 'list', 'dev', 'tapb83b453f-f6'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103
Releasing semaphore "dhcp-agent-network-lock-bb294f07-63a6-4586-a4fc-ae541843f690" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:228
...
Trigger reload_allocations for port admin_state_up=True, allowed_address_pairs=[], binding:host_id=compute-1.localdomain, binding:profile=, binding:vif_details=, binding:vif_type=unbound, binding:vnic_type=normal, created_at=2018-11-27T09:51:35Z, description=, device_id=429dff3a-236d-4072-8b7a-5530c9b3a492, device_owner=compute:nova, extra_dhcp_opts=[], fixed_ips=[{u'subnet_id': u'd743ec94-34fb-4739-b043-fa1619a14761', u'ip_address': u'10.0.1.9'}, {u'subnet_id': u'8f34659f-d947-498f-b0de-51f34b37e859', u'ip_address': u'2001::f816:3eff:fe57:2b8f'}], id=6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766, mac_address=fa:16:3e:57:2b:8f, name=, network_id=bb294f07-63a6-4586-a4fc-ae541843f690, port_security_enabled=True, project_id=10b35bdd64ed47e2ad76f4f865e7d914, revision_number=2, security_groups=[u'7c7e236b-7ee3-45b4-9218-4afad2cd2a14'], status=DOWN, tags=[], tenant_id=10b35bdd64ed47e2ad76f4f865e7d914, updated_at=2018-11-27T09:51:37Z
...
Trigger reload_allocations for port admin_state_up=True, allowed_address_pairs=[], binding:host_id=compute-1.localdomain, binding:profile=, binding:vif_details=, binding:vif_type=ovs, binding:vnic_type=normal, created_at=2018-11-27T09:51:35Z, description=, device_id=429dff3a-236d-4072-8b7a-5530c9b3a492, device_owner=compute:nova, extra_dhcp_opts=[], fixed_ips=[{u'subnet_id': u'd743ec94-34fb-4739-b043-fa1619a14761', u'ip_address': u'10.0.1.9'}, {u'subnet_id': u'8f34659f-d947-498f-b0de-51f34b37e859', u'ip_address': u'2001::f816:3eff:fe57:2b8f'}], id=6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766, mac_address=fa:16:3e:57:2b:8f, name=, network_id=bb294f07-63a6-4586-a4fc-ae541843f690, port_security_enabled=True, project_id=10b35bdd64ed47e2ad76f4f865e7d914, revision_number=3, security_groups=[u'7c7e236b-7ee3-45b4-9218-4afad2cd2a14'], status=DOWN, tags=[], tenant_id=10b35bdd64ed47e2ad76f4f865e7d914, updated_at=2018-11-27T09:51:38Z

Expected results:
VM should be up and connected to the network.
Additional info:
Might be related to u/s bug: https://bugs.launchpad.net/cloud-archive/+bug/1763442
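For reference, the ProcessExecutionError above (exit code 125, "container name is already in use") points at a stale dnsmasq container left behind on the controller. A hedged sketch of how one could inspect and clear it by hand; the network UUID is taken from the dhcp-agent.log above, the container name is derived from it by the usual neutron-dnsmasq-qdhcp-<net-id> convention, and docker availability is checked before touching anything:

```shell
# Network UUID from the dhcp-agent.log error above.
NET_ID="13c1f9bc-73e9-4441-a2c4-c00262c94de8"
# DHCP container name, derived by convention from the network UUID.
CONTAINER="neutron-dnsmasq-qdhcp-${NET_ID}"
echo "Looking for stale container: ${CONTAINER}"

# Guard: only run docker commands if the CLI is present on this host.
if command -v docker >/dev/null 2>&1; then
    # Show the conflicting container in any state, then force-remove it
    # so the dhcp-agent can recreate it on the next reload_allocations.
    docker ps -a --filter "name=${CONTAINER}"
    docker rm -f "${CONTAINER}"
fi
```

This is only a manual workaround sketch, not a fix for whatever leaves the container behind in the first place.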
Created attachment 1508962 [details]
neutron.agent.dhcp.agent unable to reload_allocations dhcp
This seems like a general issue with the DHCP container, reassigning to Networking DFG to triage this.
Here are the sosreports: http://rhos-release.virt.bos.redhat.com/log/bz1653807/

Please notice on controller-2: delete port 6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766
REQ: "network-vif-deleted"
RESP: 404
RESP BODY: {"itemNotFound": {"message": "No instances found for any event", "code": 404}} _http_log_response
Nova returned NotFound for event: network-vif-deleted

Found on /var/log/containers/neutron/server.log.6.gz:

Attributes excluded by policy engine: [u'binding:profile', u'binding:vif_details', u'binding:vif_type', u'binding:host_id'] _exclude_attributes_by_policy
Deleting port 6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766 _pre_delete_port
Notify callbacks ['neutron.services.trunk.rules.enforce_port_deletion_rules--9223363300400916243', 'neutron.db.l3_db._prevent_l3_port_delete_callback--9223363300400289645'] for port, before_delete _notify_loop
Notify callbacks ['neutron.plugins.ml2.plugin.Ml2Plugin._retry_binding_revived_agents--9223372036853922824', 'neutron.services.segments.db._update_segment_host_mapping_for_agent--9223363300400471345'] for agent, after_update _notify_loop
For port 6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766, host compute-1.localdomain, got binding levels [<neutron.plugins.ml2.models.PortBindingLevel[object at 7f21d2e218d0] {port_id=u'6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766', host=u'compute-1.localdomain', level=0, driver=u'opendaylight_v2', segment_id=u'fee01708-8409-4a64-ba71-24333da1b749'}>] get_binding_levels
Notify callbacks ['networking_odl.l3.l3_odl_v2.OpenDaylightL3RouterPlugin._precommit_delete_port_callback--9223372036847383169'] for port, precommit_delete _notify_loop
Agent healthcheck: found 11 active agents agent_health_check
networking_odl.ml2.mech_driver_v2.OpenDaylightMechanismDriver method delete_port_precommit called with arguments (<neutron.plugins.ml2.driver_context.PortContext object at 0x7f21d24dc850>,) {} wrapper
Recorded (Entry ID: 10196) - delete port 6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766 (Time stamp: 63678909462.4)
Entry with ID 10196
depends on these entries: [] record
-- static method f called with arguments (<networking_odl.ml2.mech_driver_v2.OpenDaylightMechanismDriver object at 0x7f21d4039910>, <neutron.plugins.ml2.driver_context.PortContext object at 0x7f21d24dc850>) {} wrapper
Lock "event-dispatch" acquired by "neutron.plugins.ml2.ovo_rpc.dispatch_events" :: waited 0.000s inner
Start processing journal entries sync_pending_entries
Notify callbacks ['neutron.notifiers.nova.Notifier._send_nova_notification--9223372036854767931'] for port, before_response _notify_loop
Lock "notifier-24e1cced-1471-40c7-9c4c-360a1842c3f7" acquired by "neutron.notifiers.batch_notifier.synced_send" :: waited 0.000s inner
Sending events: [{'tag': u'6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766', 'name': 'network-vif-deleted', 'server_uuid': u'429dff3a-236d-4072-8b7a-5530c9b3a492'}] send_events
Pushing event deleted for resources: {'Port': ['ID=6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766,revision_number=None']} push
Processing (Entry ID: 10196) - delete port 6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766 (Time stamp: 63678909463.3)
Sending METHOD (delete) URL (http://172.17.1.11:8081/controller/nb/v2/neutron/ports/6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766) JSON (None) request
172.17.1.17 "DELETE /v2.0/ports/6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766 HTTP/1.1" status: 204 len: 149 time: 2.1816690
Lock "event-dispatch" released by "neutron.plugins.ml2.ovo_rpc.dispatch_events" :: held 0.104s inner
Got response: () _check_response
Notify callbacks [] for port, before_complete _notify_loop
Websocket notification for port status update _process_websocket_recv
PortStatus: Ignoring delete operation _process_websocket_recv
PortStatus: Ignoring delete operation _process_websocket_recv
Completed (Entry ID: 10196) - delete port 6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766 (Time stamp: 63678909463.4)
Finished processing journal entries sync_pending_entries
REQ: curl -g -i -X POST http://172.17.1.11:8774/v2.1/os-server-external-events -H "Accept: application/json" -H
"Content-Type: application/json" -H "User-Agent: python-novaclient" -H "X-Auth-Token: {SHA1}9f68775ddfeaa4beac4eab7e7aa25b3bbe076127" -H "X-OpenStack-Nova-API-Version: 2.1" -d '{"events": [{"tag": "6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766", "name": "network-vif-deleted", "server_uuid": "429dff3a-236d-4072-8b7a-5530c9b3a492"}]}' _http_log_request
RESP: [404] Content-Length: 78 Content-Type: application/json; charset=UTF-8 Date: Tue, 27 Nov 2018 09:57:44 GMT OpenStack-API-Version: compute 2.1 Server: Apache Vary: OpenStack-API-Version,X-OpenStack-Nova-API-Version X-OpenStack-Nova-API-Version: 2.1 x-compute-request-id: req-45816299-cc3f-44cd-a76f-17d243b001bd x-openstack-request-id: req-45816299-cc3f-44cd-a76f-17d243b001bd _http_log_response
RESP BODY: {"itemNotFound": {"message": "No instances found for any event", "code": 404}} _http_log_response
POST call to compute for http://172.17.1.11:8774/v2.1/os-server-external-events used request id req-45816299-cc3f-44cd-a76f-17d243b001bd request
Nova returned NotFound for event: [{'tag': u'6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766', 'name': 'network-vif-deleted', 'server_uuid': u'429dff3a-236d-4072-8b7a-5530c9b3a492'}] send_events
Starting hostconfig periodic task.
Starting _get_and_update_hostconfigs phase of periodic task hostconfig.
REST/GET hostconfigs from ODL
Sending METHOD (get) URL (http://172.17.1.11:8081/restconf/operational/neutron:neutron/hostconfigs/) JSON (None) request
ODLPORTBINDING hostconfigs
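The 404 ("No instances found for any event") from os-server-external-events usually means nova no longer knows the instance by the time the network-vif-deleted event arrives. A hedged sketch for cross-checking this; the UUIDs are taken from the logs above, and it assumes an admin openstack CLI environment, so the cloud queries are guarded:

```shell
# Instance and port UUIDs taken from the neutron server.log excerpt above.
SERVER_ID="429dff3a-236d-4072-8b7a-5530c9b3a492"
PORT_ID="6a9e9751-9ed1-4ba7-9f4d-dca8b26a2766"

# Guard: only query the cloud if the openstack CLI is available here.
if command -v openstack >/dev/null 2>&1; then
    # Does nova still know the instance the event was sent for?
    openstack server show "${SERVER_ID}" -f value -c status || echo "instance already gone"
    # Was the port really deleted before the event was processed?
    openstack port show "${PORT_ID}" -f value -c status || echo "port already gone"
fi
```

If both lookups fail, the 404 is consistent with the delete racing ahead of the event delivery rather than with a nova-side fault.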
Assigning to Brent for further triage.
Created attachment 1509893 [details]
errors on ODL Karaf logs

Also seeing new errors on ODL Karaf logs:

ERROR | Blueprint Extender: 3 | UpgradeStateListener | 406 - org.opendaylight.serviceutils.upgrade - 0.1.4.redhat-7 | Failed to write mdsal config : Node was replaced by other transaction.
ERROR | nioEventLoopGroup-7-1 | ExceptionHandler | 392 - org.opendaylight.ovsdb.library - 1.6.4.redhat-7 | Exception occurred while processing connection pipeline : Connection reset by peer
ERROR | pool-44-thread-1 | FatalExceptionHandler | 41 - com.lmax.disruptor - 3.3.10 | Exception processing: 9262 org.opendaylight.controller.md.sal.dom.broker.impl.DOMNotificationRouterEvent@2033c1dd : Shard member-2-shard-default-operational currently has no leader. Try again later.
ERROR | org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.ext.routers.Routers_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | VpnFootprintService : Node children was modified by other transaction
Seems to be caused (blocked) by u/s bug: https://jira.opendaylight.org/browse/SRVUTILS-3

2018-11-22T10:50:20,967 | ERROR | Blueprint Extender: 3 | UpgradeStateListener | 406 - org.opendaylight.serviceutils.upgrade - 0.1.4.redhat-7 | Failed to write mdsal config
java.util.concurrent.ExecutionException: OptimisticLockFailedException{message=Optimistic lock failed for path /(urn:opendaylight:serviceutils:upgrade?revision=2018-07-02)upgrade-config, errorList=[RpcError [message=Optimistic lock failed for path /(urn:opendaylight:serviceutils:upgrade?revision=2018-07-02)upgrade-config, severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was replaced by other transaction.]]}
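The "Shard member-2-shard-default-operational currently has no leader" error suggests the clustered datastore lost (or never completed) leader election for that shard. One hedged way to check shard state is the Jolokia shard MBeans that ODL exposes; this is only a sketch - the controller address and port are taken from the logs above, admin:admin is the assumed default ODL credential, and the Jolokia endpoint path may differ per deployment:

```shell
# Controller address/port taken from the REST calls in the logs above.
ODL_URL="http://172.17.1.11:8081"
SHARD="member-2-shard-default-operational"
# Shard MBean name for the operational datastore (assumed standard layout).
MBEAN="org.opendaylight.controller:Category=Shards,name=${SHARD},type=DistributedOperationalDatastore"
echo "Would query: ${ODL_URL}/jolokia/read/${MBEAN}"

# Guard: only attempt the query if curl is available on this host.
if command -v curl >/dev/null 2>&1; then
    # The Leader and RaftState fields in the reply indicate whether the
    # shard currently has an elected leader.
    curl -fsu admin:admin "${ODL_URL}/jolokia/read/${MBEAN}" || echo "controller not reachable from here"
fi
```

If RaftState is Candidate or Leader is empty across the members, the karaf errors above would follow directly from the missing leader rather than from the serviceutils writer itself.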
Might also be related to bug 1630069