Bug 1745004 - [RHHI.next] baremetal: bootstrap ironic services sometimes fail on startup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.2.0
Assignee: Steven Hardy
QA Contact: Arik Chernetsky
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-23 13:02 UTC by Steven Hardy
Modified: 2019-10-16 06:37 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:37:21 UTC
Target Upstream Version:




Links
GitHub openshift installer pull 2249 (last updated 2019-08-23 13:05:05 UTC)
GitHub openshift installer pull 2281 (last updated 2019-08-28 08:19:52 UTC)
Red Hat Product Errata RHBA-2019:2922 (last updated 2019-10-16 06:37:45 UTC)

Description Steven Hardy 2019-08-23 13:02:42 UTC
Description of problem:

We have observed that the ironic.service systemd unit, which starts several provisioning-related containers via podman, sometimes appears active even though some of its containers are not responsive.

The root cause appears to be that crio deletes some containers when it restarts, even when those containers were started via podman. Discussion is in progress to determine the best long-term fix.

As a workaround, we can improve the systemd exec script so that it detects when the podman-managed containers become broken and triggers a systemd restart. This approach will continue to work if and when the crio issues are resolved.
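The detect-and-restart idea can be sketched as a small watchdog. This is a minimal illustration under stated assumptions and does not reproduce the script from the PR. The container names are hypothetical. "Broken" is approximated as "no longer listed by `podman ps`", which matches the crio-deletes-containers failure mode rather than deeper responsiveness. The sketch also assumes the unit is configured with something like `Restart=on-failure`, so a non-zero exit makes systemd restart it.

```python
import subprocess
import sys
import time
from typing import Callable, Iterable, List, Optional, Set

# Hypothetical container names; the real ironic.service may start different ones.
EXPECTED_CONTAINERS = ["ironic-api", "ironic-conductor", "ironic-inspector"]


def missing_containers(expected: Iterable[str], running: Set[str]) -> List[str]:
    """Return the expected containers that are not currently running."""
    return [name for name in expected if name not in running]


def podman_running_names() -> Set[str]:
    """Return the names of running containers as reported by `podman ps`."""
    out = subprocess.run(
        ["podman", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}


def watch(expected: List[str],
          list_running: Callable[[], Set[str]],
          interval: float = 10.0,
          max_checks: Optional[int] = None) -> int:
    """Poll the running containers; return 1 as soon as any expected one is gone.

    Returns 0 only if max_checks polls all succeed (max_checks=None polls forever).
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        gone = missing_containers(expected, list_running())
        if gone:
            print(f"containers not running: {', '.join(gone)}", file=sys.stderr)
            return 1  # non-zero exit lets systemd restart the whole unit
        checks += 1
        time.sleep(interval)
    return 0


if __name__ == "__main__":
    sys.exit(watch(EXPECTED_CONTAINERS, podman_running_names))
```

Exiting and letting systemd restart the unit means all containers are recreated together. This is the property that keeps the workaround valid whether or not crio stops deleting podman containers.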

Related upstream issues:

https://github.com/openshift/installer/pull/2249 (the workaround and immediate fix to unblock testing)

These issues contain additional analysis and details of the crio problems:

https://github.com/openshift-metal3/dev-scripts/issues/753
https://github.com/openshift/installer/issues/2251

Comment 2 Steven Hardy 2019-08-28 08:19:15 UTC
Note that this is ready to test, but there is a mistake in a comment that I'd like to fix via https://github.com/openshift/installer/pull/2281

Since we need a valid bug for the PR upstream I'll move this back to POST until that merges.

Comment 4 errata-xmlrpc 2019-10-16 06:37:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

