Bug 1745004

Summary:	[RHHI.next] baremetal: bootstrap ironic services sometimes fail on startup
Product:	OpenShift Container Platform
Reporter:	Steven Hardy <shardy>
Component:	Installer
Assignee:	Steven Hardy <shardy>
Installer sub component:	openshift-installer
QA Contact:	Arik Chernetsky <achernet>
Status:	CLOSED ERRATA
Severity:	unspecified
Priority:	unspecified
CC:	dhellmann, mcornea, pehunt
Version:	4.2.0
Target Milestone:	---
Target Release:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Doc Type:	If docs needed, set a value
Story Points:	---
Last Closed:	2019-10-16 06:37:21 UTC
Type:	Bug
Regression:	---
Mount Type:	---
Documentation:	---
Category:	---
oVirt Team:	---
Cloudforms Team:	---

Description Steven Hardy 2019-08-23 13:02:42 UTC

Description of problem:

We have observed that the ironic.service systemd unit (which starts several provisioning-related containers via podman) sometimes appears active even though some of its containers are not responsive.

The root cause appears to be that crio deletes some containers when it restarts, even when they were started via podman. Discussion is in progress to figure out the best long-term fix for that.

As a workaround, we can improve the script that the systemd unit executes so that it detects when the podman-managed containers become broken and triggers a systemd restart. This approach will continue to work if and when the crio issues are resolved.
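A minimal sketch of that kind of detection, assuming the unit is configured with systemd's `Restart=on-failure` so that a non-zero exit from the executed script causes a restart. The container names, the polling interval, and the `--watch` flag are hypothetical; this is not the code from the installer PR. The script polls `podman ps --format {{.Names}}` and exits 1 as soon as an expected container is no longer listed.

```python
"""Hypothetical watchdog sketch: exit non-zero when an expected podman
container disappears, so that systemd (Restart=on-failure) restarts the unit."""
import subprocess
import sys
import time

# Hypothetical container names; the real ironic.service set may differ.
EXPECTED_CONTAINERS = ("mariadb", "ironic-conductor", "ironic-api", "ironic-inspector")


def missing_containers(expected, running):
    """Return the expected container names that are not currently running."""
    running_set = set(running)
    return [name for name in expected if name not in running_set]


def running_containers():
    """List the names of running containers as reported by `podman ps`."""
    result = subprocess.run(
        ["podman", "ps", "--format", "{{.Names}}"],
        check=True,  # a failing podman call raises, which also exits non-zero
        capture_output=True,
        text=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def watch(interval_seconds=10):
    """Poll until a container is missing, then exit 1 so systemd restarts the unit."""
    while True:
        gone = missing_containers(EXPECTED_CONTAINERS, running_containers())
        if gone:
            print(f"containers no longer running: {', '.join(gone)}", file=sys.stderr)
            sys.exit(1)
        time.sleep(interval_seconds)


# Hypothetical entry point: only start polling when invoked with --watch.
if __name__ == "__main__" and sys.argv[1:] == ["--watch"]:
    watch()
```

Exiting rather than trying to restart individual containers keeps the recovery logic in systemd, which matches the idea of triggering a unit restart and does not depend on how crio behaves.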

Related upstream issues:

https://github.com/openshift/installer/pull/2249 (the workaround and immediate fix to unblock testing)

These have some additional analysis and details of the crio issues:

https://github.com/openshift-metal3/dev-scripts/issues/753
https://github.com/openshift/installer/issues/2251

Comment 2 Steven Hardy 2019-08-28 08:19:15 UTC

Note: this is ready to test, but there's a mistake in a comment that I'd like to fix via https://github.com/openshift/installer/pull/2281

Since the upstream PR needs a valid bug, I'll move this back to POST until that merges.

Comment 4 errata-xmlrpc 2019-10-16 06:37:21 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922