Bug 1935419
Summary: Failed to scale worker using virtualmedia on Dell R640

Product: OpenShift Container Platform
Component: Bare Metal Hardware Provisioning
Sub component: ironic
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: 4.7
Target Milestone: ---
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Reporter: Alexander Chuzhoy <sasha>
Assignee: Bob Fournier <bfournie>
QA Contact: Amit Ugol <augol>
CC: bfournie, derekh, lshilin, yprokule
Keywords: Triaged
Doc Type: Bug Fix
Doc Text: |
Cause: For some drives, the partition (e.g. `/dev/sda1`) does not have the `ro` file in sysfs, but the base device (`/dev/sda`) does. As a result, Ironic can't determine that the partition is read-only.
Consequence: Metadata cleaning can fail for such a drive, since it cannot be determined that the partition is read-only.
Fix: If Ironic can't detect that the partition is read-only, an additional check is made against the base device.
Result: Metadata cleaning is not performed on the read-only partition and no failure results.
|
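The fallback described in the Doc Text can be sketched as follows. This is a minimal illustration, not the actual Ironic code: the function name, the `sysfs_root` parameter (added for testability), and the digit-stripping heuristic for deriving the base device name (which suits `sdX`-style names, not NVMe partitions like `nvme0n1p1`) are all assumptions.

```python
import os
import re


def device_is_read_only(device, sysfs_root="/sys/block"):
    """Best-effort check whether a block device is marked read-only.

    Some partitions (e.g. /dev/sdc1) have no ``ro`` entry under
    /sys/block, but the base device (/dev/sdc) does. When the
    partition's entry is missing, fall back to the base device.
    """
    name = os.path.basename(device)                 # "sdc1"
    ro_path = os.path.join(sysfs_root, name, "ro")
    if not os.path.exists(ro_path):
        # Hypothetical heuristic: strip trailing digits, "sdc1" -> "sdc"
        base = re.sub(r"\d+$", "", name)
        ro_path = os.path.join(sysfs_root, base, "ro")
    try:
        with open(ro_path) as f:
            # sysfs reports "1" for read-only, "0" otherwise
            return f.read().strip() == "1"
    except OSError:
        # Neither entry readable: assume writable rather than fail
        return False
```

With this fallback in place, a write-protected drive such as the OEMDRV virtual disk below is detected as read-only and skipped, instead of `wipefs` failing the whole cleaning step.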
Story Points: ---
Clone Of: ---
Cloned To: 1937809 (view as bug list)
Last Closed: 2021-07-27 22:51:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Depends On: 1941636
Bug Blocks: 1937809
Description
Alexander Chuzhoy
2021-03-04 21:12:46 UTC
Created attachment 1760748 [details]
error in the console
Created attachment 1760749 [details]
ironic-conductor log
Created attachment 1760781 [details]
ironic-deploy-ramdisk log
Bob Fournier:
Sasha, thanks for the ramdisk logs. For the disk that fails cleaning, we see (from 5b3d383d-350f-4c71-9ee2-e37ae5af905f_cleaning_2021-03-04-20-02-44.tar.gz):

```
Mar 04 15:03:09 localhost.localdomain kernel: sd 15:0:0:0: [sdc] Write Protect is on
Mar 04 15:03:09 localhost.localdomain kernel: sd 15:0:0:0: [sdc] Mode Sense: 23 00 80 00
2021-03-04 15:03:40.440 1940 WARNING root [-] Could not determine if /dev/sdc1 is a read-only device. Error: [Errno 2] No such file or directory: '/sys/block/sdc1/ro': FileNotFoundError: [Errno 2] No such file or directory: '/sys/block/sdc1/ro'
2021-03-04 15:03:40.533 1940 DEBUG oslo_concurrency.processutils [-] CMD "wipefs --force --all /dev/sdc1" returned: 1 in 0.033s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:416
2021-03-04 15:03:40.685 1940 ERROR root [-] Failed to erase the metadata on device "/dev/sdc1". Error: Unexpected error while running command.
Command: wipefs --all /dev/sdc1
Exit code: 1
Stdout: ''
Stderr: 'wipefs: error: /dev/sdc1: probing initialization failed: Read-only file system\n': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
```

This isn't one of the NVMe drives (see below); it shows as "OEMDRV". Do you know what drive it is and why Write Protect is on?
The device list from d6a6c8af-4d49-4561-ae87-cafabf752de7_cleaning_2021-03-04-21-01-51.tar.gz:

```
KNAME="sda1" MODEL="" SIZE="402653184" ROTA="1" TYPE="part"
KNAME="sda2" MODEL="" SIZE="133169152" ROTA="1" TYPE="part"
KNAME="sda3" MODEL="" SIZE="1048576" ROTA="1" TYPE="part"
KNAME="sda4" MODEL="" SIZE="3015687680" ROTA="1" TYPE="part"
KNAME="sdb" MODEL="Virtual Floppy " SIZE="" ROTA="1" TYPE="disk"
KNAME="sdc" MODEL="OEMDRV " SIZE="322961408" ROTA="1" TYPE="disk"
KNAME="sdc1" MODEL="" SIZE="322960896" ROTA="1" TYPE="part"
KNAME="sr0" MODEL="Virtual CD " SIZE="507875328" ROTA="1" TYPE="rom"
KNAME="nvme0n1" MODEL="Dell Express Flash NVMe P4610 1.6TB SFF " SIZE="1600000000000" ROTA="0" TYPE="disk"
KNAME="nvme1n1" MODEL="Dell Express Flash NVMe P4610 1.6TB SFF " SIZE="1600000000000" ROTA="0" TYPE="disk"
KNAME="nvme2n1" MODEL="Dell Express Flash NVMe P4610 1.6TB SFF " SIZE="1600000000000" ROTA="0" TYPE="disk"
KNAME="nvme3n1" MODEL="Dell Express Flash NVMe P4610 1.6TB SFF " SIZE="1600000000000" ROTA="0" TYPE="disk"
```

(In reply to Bob Fournier from comment #4)
> This isn't one of the NVME drives (see below), it shows as "OEMDRV". Do you
> know what drive it is and why Write Protect is on?
>
> KNAME="sdc" MODEL="OEMDRV " SIZE="322961408" ROTA="1" TYPE="disk"
> KNAME="sdc1" MODEL="" SIZE="322960896" ROTA="1" TYPE="part"

I haven't seen this before, but from information online it appears to be a drive that the host attaches in order to re-install the OEM OS. Apparently it gets removed after 18 hours, or you can "restart the server and press F10 to enter the Lifecycle Controller configuration. Then exit the Lifecycle Controller and reboot again" [1].

[1] http://byronwright.blogspot.com/2014/08/remove-oemdrv-drive-from-dell-server.html

Yeah, as Derek found, this drive is unnecessary. We should also be able to unmount it via the iDRAC 9 GUI according to the Dell documentation: https://www.dell.com/support/kbdoc/en-us/000160908/how-to-mount-and-unmount-the-driver-packs-via-idrac9

Sasha, could you please verify it on your setup? Our Dells are different (Dell R740 (Core)).

After several attempts, we seem to be blocked on https://bugzilla.redhat.com/show_bug.cgi?id=1941636

Verified. Version: registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-03-26-054333
The reported issue doesn't reproduce. The node got provisioned.
```
[kni@r640-u09 ~]$ oc get bmh -A
NAMESPACE               NAME                 STATE         CONSUMER                   ONLINE   ERROR
openshift-machine-api   openshift-master-0   unmanaged     qe3-xphng-master-0         true
openshift-machine-api   openshift-master-1   unmanaged     qe3-xphng-master-1         true
openshift-machine-api   openshift-master-2   unmanaged     qe3-xphng-master-2         true
openshift-machine-api   openshift-worker-0   unmanaged     qe3-xphng-worker-0-bb88q   true
openshift-machine-api   openshift-worker-1   unmanaged     qe3-xphng-worker-0-gkwdl   true
openshift-machine-api   openshift-worker-3   provisioned   qe3-xphng-worker-0-zkk9r   true
[kni@r640-u09 ~]$
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438