Bug 1809634 - [osp16] ERROR neutron.agent.dhcp.agent neutron_lib.exceptions.ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr:
Summary: [osp16] ERROR neutron.agent.dhcp.agent neutron_lib.exceptions.ProcessExecutio...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
high
urgent
Target Milestone: ---
: ---
Assignee: Rodolfo Alonso
QA Contact: Candido Campos
URL:
Whiteboard:
: 1810475 (view as bug list)
Depends On:
Blocks: 1810475
TreeView+ depends on / blocked
 
Reported: 2020-03-03 15:19 UTC by Chris Janiszewski
Modified: 2021-09-20 23:44 UTC (History)
13 users (show)

Fixed In Version: tripleo-ansible-0.5.1-0.20200528013427.9da6a00.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-01 10:33:01 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1868607 0 None None None 2020-03-25 18:34:04 UTC
OpenStack gerrit 714517 0 None MERGED Force container deletion if namespace does not exist in service_kill 2021-01-13 10:49:09 UTC
OpenStack gerrit 715019 0 None MERGED Force container deletion if namespace does not exist in service_kill 2021-01-13 10:49:51 UTC
OpenStack gerrit 730831 0 None MERGED Force container deletion if namespace does not exist in service_kill 2021-01-13 10:49:12 UTC

Description Chris Janiszewski 2020-03-03 15:19:10 UTC
Description of problem:
Few hours after controller reboot, the deployment of instances started failing with:
Error: Failed to perform requested operation on instance "ocpra-8q78n-master-0", the instance has an error status: Please try again later [Error: Build of instance 2e77e7f8-69bc-477d-99e4-10ff94cf59ac aborted: Failed to allocate the network(s), not rescheduling.].


dhcp-agent.log shows a stream of:

2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py", line 218, in sync_state
2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent     self.disable_dhcp_helper(deleted_id)
2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper
2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent     result = f(*args, **kwargs)
2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py", line 357, in disable_dhcp_helper
2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent     self.disable_isolated_metadata_proxy(network)
2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper
2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent     result = f(*args, **kwargs)
2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py", line 700, in disable_isolated_metadata_proxy
2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent     self._process_monitor, uuid, self.conf, network.namespace)
2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/metadata/driver.py", line 250, in destroy_monitored_metadata_proxy
2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent     pm.disable()
2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/external_process.py", line 113, in disable
2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent     utils.execute(cmd, run_as_root=self.run_as_root)
2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py", line 147, in execute
2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent     returncode=returncode)
2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent neutron_lib.exceptions.ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr: 
2020-03-03 14:00:47.892 6397 ERROR neutron.agent.dhcp.agent 



Version-Release number of selected component (if applicable):
OSP16

How reproducible:
Intermittent 

Steps to Reproduce:
1.
2.
3.

Actual results:
Error

Expected results:
Success

Additional info:
sosreport -> http://chrisj.cloud/sosreport-chrisj-osp16-controller-0-2020-03-03-hzreywy.tar.xz

Comment 1 Chris Janiszewski 2020-03-05 09:35:19 UTC
One thing that have happened in this environment is the dhcp neutron port have disappeared and maybe it's related to this bug

(chrisj-osp16) [stack@undercloud-osp16 ~]$ openstack subnet show baremetal-sub
+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field             | Value                                                                                                                                                            |
+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| allocation_pools  | 172.31.10.70-172.31.10.199                                                                                                                                       |
| cidr              | 172.31.10.0/24                                                                                                                                                   |
| created_at        | 2020-02-27T21:14:55Z                                                                                                                                             |
| description       |                                                                                                                                                                  |
| dns_nameservers   | 172.31.8.1                                                                                                                                                       |
| enable_dhcp       | True                                                                                                                                                             |
| gateway_ip        | 172.31.10.254                                                                                                                                                    |
| host_routes       |                                                                                                                                                                  |
| id                | e69757d3-e3c7-4634-8657-726c19883017                                                                                                                             |
| ip_version        | 4                                                                                                                                                                |
| ipv6_address_mode | None                                                                                                                                                             |
| ipv6_ra_mode      | None                                                                                                                                                             |
| location          | cloud='', project.domain_id=, project.domain_name='Default', project.id='f349b269b5bd470f83ffb18fdc1cc049', project.name='admin', region_name='regionOne', zone= |
| name              | baremetal-sub                                                                                                                                                    |
| network_id        | 45566c04-9a73-4736-acb5-abd040e63bed                                                                                                                             |
| prefix_length     | None                                                                                                                                                             |
| project_id        | f349b269b5bd470f83ffb18fdc1cc049                                                                                                                                 |
| revision_number   | 0                                                                                                                                                                |
| segment_id        | None                                                                                                                                                             |
| service_types     |                                                                                                                                                                  |
| subnetpool_id     | None                                                                                                                                                             |
| tags              |                                                                                                                                                                  |
| updated_at        | 2020-02-27T21:14:55Z                                                                                                                                             |
+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
(chrisj-osp16) [stack@undercloud-osp16 ~]$ for i in `openstack port list | awk '/172.31.10/ {print $2}'`; do openstack port show $i | grep device_owner; done
| device_owner            | baremetal:none                                                                                                                                                   |
| device_owner            | compute:nova                                                                                                                                     |
| device_owner            |

Comment 2 Rodolfo Alonso 2020-03-23 11:49:06 UTC
Hello Chris:

I debugged this issue with Slawek and we found that, in case the namespace is not found, the haproxy kill script will fail and trigger again the resync process. This resync process will loop again and again returning always the same error:
"""
2020-03-03 13:59:42.147 6397 INFO neutron.agent.dhcp.agent [req-7555cc29-6a4a-4010-9211-25d45da0ea5c - - - - -] Synchronizing state complete
2020-03-03 13:59:42.147 6397 INFO neutron.agent.dhcp.agent [req-7555cc29-6a4a-4010-9211-25d45da0ea5c - - - - -] Synchronizing state
2020-03-03 13:59:42.438 6397 INFO neutron.agent.dhcp.agent [req-51828289-eee2-404c-9243-06be02eaca70 - - - - -] All active networks have been fetched through RPC.
2020-03-03 13:59:42.450 6397 ERROR neutron.agent.linux.utils [req-51828289-eee2-404c-9243-06be02eaca70 - - - - -] Exit code: 1; Stdin: ; Stdout: ; Stderr:
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent [req-51828289-eee2-404c-9243-06be02eaca70 - - - - -] Unable to sync network state on deleted network 1b2621fe-38c7-4572-9b5f-45b2f0687f94: neutron_lib.exceptions.ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr:
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py", line 218, in sync_state
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent     self.disable_dhcp_helper(deleted_id)
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent     result = f(*args, **kwargs)
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py", line 357, in disable_dhcp_helper
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent     self.disable_isolated_metadata_proxy(network)
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent     result = f(*args, **kwargs)
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/dhcp/agent.py", line 700, in disable_isolated_metadata_proxy
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent     self._process_monitor, uuid, self.conf, network.namespace)
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/metadata/driver.py", line 250, in destroy_monitored_metadata_proxy
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent     pm.disable()
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/external_process.py", line 113, in disable
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent     utils.execute(cmd, run_as_root=self.run_as_root)
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/linux/utils.py", line 147, in execute
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent     returncode=returncode)
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent neutron_lib.exceptions.ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr:
2020-03-03 13:59:42.451 6397 ERROR neutron.agent.dhcp.agent
"""


At the same time, this is the log of the kill scripts:
"""
Tue Mar  3 13:59:29 UTC 2020 Deleting container neutron-dnsmasq-qdhcp-3d1b1411-f8d2-4348-a500-e7af340f2d1c (6a2ac5c799fb7175b20ffa768b2b297f2c9de8e3f5271d99cf41ccff3dc94b64)
6a2ac5c799fb7175b20ffa768b2b297f2c9de8e3f5271d99cf41ccff3dc94b64
Tue Mar  3 13:59:30 UTC 2020 No network namespace detected, exiting
Tue Mar  3 13:59:30 UTC 2020 No network namespace detected, exiting
Tue Mar  3 13:59:31 UTC 2020 No network namespace detected, exiting
"""


I'm going to propose a patch for tripleo-ansible (tripleo-heat-templates in OSP16) to solve this issue in the haproxy kill script.

Do you still have this deployment? Is it possible to debug this issue in your environment?

Regards and thank you in advance.

Comment 6 Chris Janiszewski 2020-03-23 15:22:48 UTC
Hi Rodolfo,

Thanks for looking into this and finding underlying issue. Unfortunately I had to use this environment for other workload and it's currently not available  to put a patch. 
I apologize for the inconvenience.

Comment 8 Rodolfo Alonso 2020-04-09 14:49:16 UTC
*** Bug 1810475 has been marked as a duplicate of this bug. ***

Comment 10 Jon Schlueter 2020-05-20 15:07:40 UTC
patch is included in latest released build in OSP 16.0

Comment 11 Lon Hohberger 2020-05-22 10:43:46 UTC
According to our records, this should be resolved by tripleo-ansible-0.4.2-0.20200404124615.67005aa.el8ost.  This build is available now.

Comment 17 stchen 2020-11-04 20:11:28 UTC
Closing EOL, OSP 16.0 has been retired as of Oct 27, 2020

Comment 18 Brian Haley 2020-11-09 22:41:17 UTC
Re-opening targeted at 16.1


Note You need to log in before you can comment on or make changes to this bug.