Bug 1775728
Summary: Recent builds booting to Emergency mode during install on AWS: e.g. 4.3.0-0.nightly-2019-11-21-122827 and 4.3.0-0.nightly-2019-11-22-050018
Product: OpenShift Container Platform
Component: RHCOS
Version: 4.3.0
Target Release: 4.4.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
Reporter: Mike Fiedler <mifiedle>
Assignee: Colin Walters <walters>
QA Contact: Michael Nguyen <mnguyen>
CC: adahiya, bbreard, behoward, dcain, dustymabe, eslutsky, imcleod, jligon, nstielau
Bug Blocks: 1777036 (view as bug list)
Type: Bug
Last Closed: 2020-05-04 11:16:20 UTC
Created attachment 1638804 [details]
System log
[ 58.569493] systemd-udevd[1074]: Process '/usr/bin/systemctl --no-block start coreos-luks-open@789cdd0a-07a5-485c-8373-4a2316680b6a' failed with exit code 4.
[ 58.569783] multipathd[1091]: Nov 22 16:27:50 | /etc/multipath.conf does not exist, blacklisting all devices.
[ 58.569809] multipathd[1091]: Nov 22 16:27:50 | You can run "/sbin/mpathconf --enable" to create
[ 58.569823] multipathd[1091]: Nov 22 16:27:50 | /etc/multipath.conf. See man mpathconf(8) for more details

4.3.0-0.nightly-2019-11-19-122017 is known to be OK.

Hmm. I just sent up https://github.com/openshift/installer/pull/2714 which should help with this.

I went through a normal install on AWS of the 4.4 nightly below with no emergency shell. I checked the rhcos version to make sure it had the bump from the PR. If anyone is still seeing issues, please reply. Otherwise I will close as verified.

$ oc get node
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-128-119.ec2.internal   Ready    master   30m   v1.16.2
ip-10-0-132-51.ec2.internal    Ready    worker   19m   v1.16.2
ip-10-0-145-29.ec2.internal    Ready    worker   19m   v1.16.2
ip-10-0-145-74.ec2.internal    Ready    master   30m   v1.16.2
ip-10-0-164-54.ec2.internal    Ready    master   30m   v1.16.2
ip-10-0-170-213.ec2.internal   Ready    worker   18m   v1.16.2

$ oc debug node/ip-10-0-128-119.ec2.internal
Starting pod/ip-10-0-128-119ec2internal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0a8502a25bd2c039a8a3dcfb0396688f0801b96a77daec774172f4622fa792b7
              CustomOrigin: Managed by machine-config-operator
                   Version: 43.81.201912040328.0 (2019-12-04T03:33:31Z)

  ostree://e884477421640d1285c07a6dd9aaf01c9e125038ebbe6290a5e341eb3695a4d1
                   Version: 43.81.201911221453.0 (2019-11-22T14:58:44Z)
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2019-12-04-104500   True        False         5m39s   Cluster version is 4.4.0-0.nightly-2019-12-04-104500

*** Bug 1780120 has been marked as a duplicate of this bug. ***

I have not seen this on recent builds. Marking verified on registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-05-213858

Seeing this on baremetal with the latest metal version rhcos-43.81.201912030353.0-metal.x86_64.raw.gz available on the public mirror here: https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/4.3/latest/

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581
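As a hedged aside on the verification approach in the comments above ("I checked the rhcos version to make sure it had the bump from the PR"): one way to see which RHCOS build a nightly carries, without logging into a node, is to look up the machine-os-content image referenced by the release payload. The release pullspec below is taken from the comments; treating the image labels of machine-os-content as carrying the RHCOS version is an assumption and may differ between builds.

$ oc adm release info registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-05-213858 --image-for=machine-os-content
# prints the machine-os-content pullspec; then inspect that image
# (assumption: its labels include the RHCOS version string)
$ oc image info <machine-os-content-pullspec-from-previous-command>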
Created attachment 1638801 [details]
Console of hung instance

Description of problem:
Running a default (OOTB) openshift-install on AWS with recent builds such as 4.3.0-0.nightly-2019-11-21-122827 and 4.3.0-0.nightly-2019-11-22-050018. The install fails and the bootstrap node refuses ssh connections. The instance console screenshot from AWS shows it is in emergency mode. (Screenshot attached; the system log will be attached separately.)

Version-Release number of selected component (if applicable): 4.3.0-0.nightly-2019-11-22-050018

How reproducible:
Always - not sure how these builds passed CI, but it happens every time.

Steps to Reproduce:
1. On AWS, run openshift-install create cluster and walk through a normal install (a hedged command sketch follows below).
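For anyone reproducing step 1, a minimal sketch is below, including a way to pull the bootstrap console without the AWS web UI. The cluster directory name and instance ID are placeholders, and the AWS CLI step assumes the bootstrap instance ID has been looked up separately (for example from the EC2 console or aws ec2 describe-instances).

$ openshift-install create cluster --dir ./aws-cluster --log-level debug
# once the bootstrap node stops accepting ssh, fetch its serial console to check for emergency mode
# (i-0123456789abcdef0 is a placeholder instance ID)
$ aws ec2 get-console-output --instance-id i-0123456789abcdef0 --output text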