Bug 1980679
Summary: | On a Azure IPI installation MCO fails to create new nodes | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Victor Medina <vmedina>
Component: | RHCOS | Assignee: | Benjamin Gilbert <bgilbert>
Status: | CLOSED ERRATA | QA Contact: | HuijingHei <hhei>
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 4.7 | CC: | bgilbert, dornelas, jligon, miabbott, mrussell, nstielau, smilner
Target Milestone: | --- | |
Target Release: | 4.9.0 | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1982002 | Environment: |
Last Closed: | 2021-10-18 17:39:01 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1981999 | |
Bug Blocks: | 1982002, 1982003, 1982004 | |
Attachments: | | |
Description
Victor Medina
2021-07-09 08:54:30 UTC
Created attachment 1799914
Serial console - try 2
I believe Gen 2 Azure VMs are not supported today. Passing over to the CoreOS team to take a look at the ignition failures as well.

This is indeed a problem on gen 1. The CD device doesn't exist at all on gen 2.

> ignition[1003]: failed to open config device: open /dev/disk/by-id/ata-Virtual_CD: no medium found

Good find. It appears that the RHCOS Azure checkin (which causes the virtual CD to be removed) is racing with Ignition fetch.

> [ 12.269632] ignition[1070]: GET error: Get "https://api-int.ocpazrd08.cloud.internal:22623/config/worker": dial tcp: lookup api-int.ocpazrd08.cloud.internal on [::1]:53: read udp [::1]:57936->[::1]:53: read: connection refused

This part is probably normal. By design, Ignition may start accessing the network before networking is fully available. It'll retry until it succeeds.

This has landed in Git; waiting for bootimage bump.

Boot image bump is merged, moving to MODIFIED

Thanks Benjamin for your confirmation, change status to verified

The fix for this bug will not be delivered to customers until it lands in an updated bootimage. That process is tracked in bug 1981999, which is in state ASSIGNED. Moving this bug back to POST.

The fix for this bug has landed in a bootimage bump, as tracked in bug 1981999 (now in status MODIFIED). Moving this bug to MODIFIED.

launch 4.9.0-0.nightly-2021-09-25-094414 on azure

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-09-25-094414   True        False         58m     Cluster version is 4.9.0-0.nightly-2021-09-25-094414

$ oc get nodes
$ oc debug node/worker-node
sh-4.4# chroot /host
sh-4.4# grep ^After /usr/lib/dracut/modules.d/30rhcos-afterburn-checkin/rhcos-afterburn-checkin.service
After=ignition-fetch.service
After=coreos-kargs-reboot.service
sh-4.4# journalctl -u ignition-fetch | grep -i start; journalctl | grep coreos-kargs-reboot; journalctl -u rhcos-afterburn-checkin | grep -i start
Sep 27 08:00:08 localhost systemd[1]: Starting Ignition (fetch)...
...
Sep 27 08:00:15 localhost systemd[1]: Started Ignition (fetch).
Sep 27 08:00:16 localhost systemd[1]: Starting Afterburn (Check In - from the initramfs)...
Sep 27 08:00:45 localhost systemd[1]: Started Afterburn (Check In - from the initramfs).
sh-4.4# cat /etc/os-release
NAME="Red Hat Enterprise Linux CoreOS"
VERSION="49.84.202109241334-0"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION_ID="4.9"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 49.84.202109241334-0 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::coreos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.9/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.9"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.9"
OPENSHIFT_VERSION="4.9"
RHEL_VERSION="8.4"
OSTREE_VERSION='49.84.202109241334-0'

Thanks @mnguyen, change status to verified according to Comment 17

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
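
For anyone repeating the verification on another node: the manual check above (grepping the checkin unit for its After= lines) could be scripted roughly as follows. This is only a sketch, not something shipped in RHCOS. The function name unit_ordered_after is made up for illustration, and the sample input is copied from the grep output in the verification comment. Note that After= only expresses ordering in systemd, so the check confirms the checkin unit is sequenced after ignition-fetch.service and nothing more.

```shell
#!/bin/sh
# Hypothetical helper (not part of RHCOS): reads systemd unit file text on
# stdin and succeeds if any After= line lists the unit named in $1.
unit_ordered_after() {
    grep '^After=' \
        | sed 's/^After=//' \
        | tr -s ' ' '\n' \
        | grep -Fxq "$1"
}

# Sample input copied from the grep output in the verification comment.
sample='After=ignition-fetch.service
After=coreos-kargs-reboot.service'

if printf '%s\n' "$sample" | unit_ordered_after ignition-fetch.service; then
    echo "checkin is ordered after ignition-fetch"
else
    echo "ordering missing: checkin may race with ignition-fetch"
fi
```

On a node, the sample text would be replaced by redirecting the unit file from the transcript (/usr/lib/dracut/modules.d/30rhcos-afterburn-checkin/rhcos-afterburn-checkin.service) into the function.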