1797534 – oci runtime error with exit 17 during minor update on compute node

Summary: oci runtime error with exit 17 during minor update on compute node

Keywords:
Status:	CLOSED DUPLICATE of bug 1793455
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	python-docker
Sub Component:
Version:	13.0 (Queens)
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	RHOS Maint
QA Contact:	RHOS Maint
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-02-03 11:24 UTC by Andre
Modified:	2023-03-24 16:56 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-02-04 12:41:04 UTC
Target Upstream Version:
Embargoed:

Description Andre 2020-02-03 11:24:07 UTC

Description of problem:
The customer hits the following error when running a minor update on the compute nodes:
~~~ Some information is hidden from the logs because they contain private customer information; the full error report will be added as a private comment [1]
[...]
"stderr: /usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:245: running exec setns process for init caused \\\"exit status 17\\\"\".",
[...]
~~~

Earlier in the same update process, the customer hit an issue on the controller nodes:
~~~ Some information is hidden from the logs because they contain private customer information; the full error report will be added as a private comment [2]
[...]
        "stderr: /usr/bin/docker-current: Error response from daemon: service endpoint with name neutron_ovs_agent already exists."
~~~
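
To tell the two failures apart when scanning the update output, here is a minimal Python sketch. It assumes only the two stderr signatures quoted above; the function name `classify` and the returned labels are hypothetical, and the patterns are not an exhaustive list of docker errors.

```python
import re

# Signatures taken from the two stderr excerpts quoted in this report.
# Any other docker/runc failure will simply not match.
OCI_EXIT_RE = re.compile(r"oci runtime error: .*?exit status (\d+)")
ENDPOINT_RE = re.compile(r"service endpoint with name (\S+) already exists")


def classify(line):
    """Return a (kind, detail) tuple for a known signature, else None.

    The labels "oci_runtime_error" and "endpoint_exists" are made up
    for this sketch; they are not names used by docker or TripleO.
    """
    match = OCI_EXIT_RE.search(line)
    if match:
        # detail is the exit status reported by the setns process
        return ("oci_runtime_error", int(match.group(1)))
    match = ENDPOINT_RE.search(line)
    if match:
        # detail is the name of the endpoint that already exists
        return ("endpoint_exists", match.group(1))
    return None
```

Feeding each line of the update log through `classify` separates the compute-node failure (exit status 17) from the earlier controller failure (existing `neutron_ovs_agent` endpoint), which may help when checking whether both show up on the same node.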

The customer had to re-run the update twice, and on each run a different controller showed the issue. The controllers were eventually updated, but the update now fails on the compute node with a different error.


I'm not sure whether the two problems are related. Is there a workaround or solution for the current issue? It is blocking the customer's environment: the customer cannot log in to the compute node or create new instances.

Logs are available on the supportshell under /cases/02570698.

Version-Release number of selected component (if applicable):
OSP 13
ansible-tripleo-ipsec-8.1.1-0.20190513184007.7eb892c.el7ost
openstack-tripleo-common-8.7.1-2.el7ost
openstack-tripleo-common-containers-8.7.1-2.el7ost
openstack-tripleo-heat-templates-8.4.1-16.el7ost
openstack-tripleo-image-elements-8.0.3-1.el7ost
openstack-tripleo-puppet-elements-8.1.1-1.el7ost
openstack-tripleo-ui-8.3.2-3.el7ost
openstack-tripleo-validations-8.5.0-2.el7ost
puppet-tripleo-8.5.1-3.el7ost
python-tripleoclient-9.3.1-4.el7ost


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Andre 2020-02-03 15:48:20 UTC

Hi,

The customer can access the compute node over SSH again after a reboot.
An sosreport from the compute node is available on the supportshell.

Comment 4 Mike Burns 2020-02-04 12:41:04 UTC


*** This bug has been marked as a duplicate of bug 1793455 ***
