Bug 1745004

Summary: [RHHI.next] baremetal: bootstrap ironic services sometimes fail on startup
Product: OpenShift Container Platform Reporter: Steven Hardy <shardy>
Component: Installer Assignee: Steven Hardy <shardy>
Installer sub component: openshift-installer QA Contact: Arik Chernetsky <achernet>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: unspecified CC: dhellmann, mcornea, pehunt
Version: 4.2.0   
Target Milestone: ---   
Target Release: 4.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-16 06:37:21 UTC Type: Bug

Description Steven Hardy 2019-08-23 13:02:42 UTC
Description of problem:

We have observed that the ironic.service systemd unit (which starts some provisioning-related containers via podman) sometimes appears active even though some of its containers are not responsive.

The root cause appears to be that crio deletes some containers on restart, even when they were started via podman. Discussion is in progress to figure out the best long-term fix for that.

As a workaround, we can improve the systemd exec script so that it detects when the containers started via podman become broken and triggers a systemd restart. This approach will continue to work if and when the crio issues are resolved.
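
For illustration only, the detect-and-restart idea can be sketched as below. This is not the script from the installer PR: it is a minimal Python sketch that assumes the containers have known names, that "broken" can be approximated by podman no longer reporting a container as running, and that the unit is configured so systemd restarts it when the exec script exits non-zero. The function names are hypothetical.

```python
import subprocess
import time
from typing import Callable, Iterable, List


def container_running(name: str) -> bool:
    """Return True if podman reports the named container as running."""
    try:
        result = subprocess.run(
            ["podman", "inspect", "--format", "{{.State.Running}}", name],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            text=True,
        )
    except OSError:
        # podman itself could not be executed; treat the container as broken.
        return False
    return result.returncode == 0 and result.stdout.strip() == "true"


def missing_containers(
    names: Iterable[str],
    is_running: Callable[[str], bool] = container_running,
) -> List[str]:
    """Return the containers that are no longer running (assumed broken)."""
    return [name for name in names if not is_running(name)]


def watch(
    names: Iterable[str],
    interval: float = 10.0,
    is_running: Callable[[str], bool] = container_running,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until at least one container is broken, then return those names.

    The caller is expected to exit non-zero afterwards so that systemd
    restarts the unit (assumption: the unit has a suitable Restart= setting).
    """
    names = list(names)
    while True:
        broken = missing_containers(names, is_running)
        if broken:
            return broken
        sleep(interval)
```

An exec script built on this sketch would start the containers, call watch() with their names, and exit with a non-zero status once it returns. Because the check only looks at container state, it does not depend on why a container disappeared, so it would keep working whether or not the crio behaviour changes.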

Related upstream issues:

https://github.com/openshift/installer/pull/2249 (the workaround and immediate fix to unblock testing)

These have some additional analysis and details of the crio issues:

https://github.com/openshift-metal3/dev-scripts/issues/753
https://github.com/openshift/installer/issues/2251

Comment 2 Steven Hardy 2019-08-28 08:19:15 UTC
Note that this is ready to test, but there's a mistake in a comment that I'd like to fix via https://github.com/openshift/installer/pull/2281

Since we need a valid bug for the upstream PR, I'll move this back to POST until that merges.

Comment 4 errata-xmlrpc 2019-10-16 06:37:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922