Bug 1935419 - Failed to scale worker using virtualmedia on Dell R640
Summary: Failed to scale worker using virtualmedia on Dell R640
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Bob Fournier
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On: 1941636
Blocks: 1937809
 
Reported: 2021-03-04 21:12 UTC by Alexander Chuzhoy
Modified: 2021-07-27 22:51 UTC (History)
4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: For some drives, the partition (e.g. `/dev/sda1`) does not have the 'ro' file in sysfs, but the base device (`/dev/sda`) does. As a result, Ironic can't determine that the partition is read-only. Consequence: Metadata cleaning can fail for this drive because it can't be determined that the partition is read-only. Fix: If Ironic can't detect whether the partition is read-only, add an additional check against the base device. Result: Metadata cleaning is not performed on the read-only partition and no failure results.
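The fallback described in the Doc Text can be sketched in Python. This is not the actual Ironic/ironic-lib code (see the linked gerrit change for that); the helper names and the simple trailing-digit rule for deriving the base device name are assumptions for illustration only.

```python
import re
from pathlib import Path


def ro_flag_path(device: str, sys_block: str = "/sys/block") -> Path:
    """Return the sysfs 'ro' flag path for a block device.

    Partitions such as /dev/sda1 may not expose /sys/block/sda1/ro;
    in that case fall back to the base device's flag (/sys/block/sda/ro).
    """
    name = Path(device).name                 # e.g. "sda1"
    direct = Path(sys_block) / name / "ro"
    if direct.exists():
        return direct
    # Partition-level flag missing: strip trailing digits to get the
    # base device name ("sda1" -> "sda"). Note this naive rule does
    # not cover NVMe names like "nvme0n1p1".
    base = re.sub(r"\d+$", "", name)
    return Path(sys_block) / base / "ro"


def is_read_only(device: str) -> bool:
    """True if the device (or its base device) reports read-only."""
    try:
        return ro_flag_path(device).read_text().strip() == "1"
    except OSError:
        # Neither flag readable: assume writable and let wipefs
        # surface any real error.
        return False
```

With this fallback, a write-protected OEMDRV partition like `/dev/sdc1` would be detected as read-only via `/sys/block/sdc/ro` and skipped during metadata cleaning instead of failing `wipefs`.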
Clone Of:
: 1937809 (view as bug list)
Environment:
Last Closed: 2021-07-27 22:51:10 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
error in the console (75.07 KB, image/png) - 2021-03-04 21:13 UTC, Alexander Chuzhoy
ironic-conductor log (450.79 KB, application/gzip) - 2021-03-04 21:16 UTC, Alexander Chuzhoy
ironic-deploy-ramdisk log (270.47 KB, application/gzip) - 2021-03-04 23:52 UTC, Alexander Chuzhoy


Links
Github openshift ironic-ipa-downloader pull 62 (open): Bug 1935419: Update ipa ramdisk version for OCP 4.8 - 2021-03-16 13:40:43 UTC
OpenStack Storyboard 2008696 - 2021-03-05 14:28:34 UTC
OpenStack gerrit 779111 (NEW): Check the base device if the read-only file cannot be read - 2021-03-07 20:30:13 UTC
Red Hat Product Errata RHSA-2021:2438 - 2021-07-27 22:51:41 UTC

Description Alexander Chuzhoy 2021-03-04 21:12:46 UTC
Failed to scale worker using virtualmedia on Dell PowerEdge R640

Version:
[kni@r640-u09 ~]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0     True        False         15h     Cluster version is 4.7.0



iDRAC Firmware Version:	4.22.00.00


Steps to reproduce:

Try to scale workers using virtualmedia.



Result:
The BM node passes inspection, but during provisioning it shows an error like in the attached image.
wipefs: error: /dev/sdc1: probing initialization failed: Read-only file system

Comment 1 Alexander Chuzhoy 2021-03-04 21:13:43 UTC
Created attachment 1760748 [details]
error in the console

Comment 2 Alexander Chuzhoy 2021-03-04 21:16:59 UTC
Created attachment 1760749 [details]
ironic-conductor log

Comment 3 Alexander Chuzhoy 2021-03-04 23:52:13 UTC
Created attachment 1760781 [details]
ironic-deploy-ramdisk log

Comment 4 Bob Fournier 2021-03-05 01:27:05 UTC
Sasha - thanks for the ramdisk logs.  So for the disk that fails cleaning we see:

5b3d383d-350f-4c71-9ee2-e37ae5af905f_cleaning_2021-03-04-20-02-44.tar.gz: Mar 04 15:03:09 localhost.localdomain kernel: sd 15:0:0:0: [sdc] Write Protect is on
5b3d383d-350f-4c71-9ee2-e37ae5af905f_cleaning_2021-03-04-20-02-44.tar.gz: Mar 04 15:03:09 localhost.localdomain kernel: sd 15:0:0:0: [sdc] Mode Sense: 23 00 80 00

2021-03-04 15:03:40.440 1940 WARNING root [-] Could not determine if /dev/sdc1 is a read-only device. Error: [Errno 2] No such file or directory: '/sys/block/sdc1/ro': FileNotFoundError: [Errno 2] No such file or directory: '/sys/block/sdc1/ro'

2021-03-04 15:03:40.533 1940 DEBUG oslo_concurrency.processutils [-] CMD "wipefs --force --all /dev/sdc1" returned: 1 in 0.033s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:416

2021-03-04 15:03:40.685 1940 ERROR root [-] Failed to erase the metadata on device "/dev/sdc1". Error: Unexpected error while running command.
5b3d383d-350f-4c71-9ee2-e37ae5af905f_cleaning_2021-03-04-20-02-44.tar.gz:                                                                                           Command: wipefs --all /dev/sdc1
5b3d383d-350f-4c71-9ee2-e37ae5af905f_cleaning_2021-03-04-20-02-44.tar.gz:                                                                                           Exit code: 1
5b3d383d-350f-4c71-9ee2-e37ae5af905f_cleaning_2021-03-04-20-02-44.tar.gz:                                                                                           Stdout: ''
5b3d383d-350f-4c71-9ee2-e37ae5af905f_cleaning_2021-03-04-20-02-44.tar.gz:                                                                                           Stderr: 'wipefs: error: /dev/sdc1: probing initialization failed: Read-only file system\n': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.

======================

This isn't one of the NVME drives (see below), it shows as "OEMDRV".  Do you know what drive it is and why Write Protect is on?
d6a6c8af-4d49-4561-ae87-cafabf752de7_cleaning_2021-03-04-21-01-51.tar.gz:                                                                                           KNAME="sda1" MODEL="" SIZE="402653184" ROTA="1" TYPE="part"
d6a6c8af-4d49-4561-ae87-cafabf752de7_cleaning_2021-03-04-21-01-51.tar.gz:                                                                                           KNAME="sda2" MODEL="" SIZE="133169152" ROTA="1" TYPE="part"
d6a6c8af-4d49-4561-ae87-cafabf752de7_cleaning_2021-03-04-21-01-51.tar.gz:                                                                                           KNAME="sda3" MODEL="" SIZE="1048576" ROTA="1" TYPE="part"
d6a6c8af-4d49-4561-ae87-cafabf752de7_cleaning_2021-03-04-21-01-51.tar.gz:                                                                                           KNAME="sda4" MODEL="" SIZE="3015687680" ROTA="1" TYPE="part"
d6a6c8af-4d49-4561-ae87-cafabf752de7_cleaning_2021-03-04-21-01-51.tar.gz:                                                                                           KNAME="sdb" MODEL="Virtual Floppy  " SIZE="" ROTA="1" TYPE="disk"
d6a6c8af-4d49-4561-ae87-cafabf752de7_cleaning_2021-03-04-21-01-51.tar.gz:                                                                                           KNAME="sdc" MODEL="OEMDRV          " SIZE="322961408" ROTA="1" TYPE="disk"
d6a6c8af-4d49-4561-ae87-cafabf752de7_cleaning_2021-03-04-21-01-51.tar.gz:                                                                                           KNAME="sdc1" MODEL="" SIZE="322960896" ROTA="1" TYPE="part"
d6a6c8af-4d49-4561-ae87-cafabf752de7_cleaning_2021-03-04-21-01-51.tar.gz:                                                                                           KNAME="sr0" MODEL="Virtual CD      " SIZE="507875328" ROTA="1" TYPE="rom"
d6a6c8af-4d49-4561-ae87-cafabf752de7_cleaning_2021-03-04-21-01-51.tar.gz:                                                                                           KNAME="nvme0n1" MODEL="Dell Express Flash NVMe P4610 1.6TB SFF " SIZE="1600000000000" ROTA="0" TYPE="disk"
d6a6c8af-4d49-4561-ae87-cafabf752de7_cleaning_2021-03-04-21-01-51.tar.gz:                                                                                           KNAME="nvme1n1" MODEL="Dell Express Flash NVMe P4610 1.6TB SFF " SIZE="1600000000000" ROTA="0" TYPE="disk"
d6a6c8af-4d49-4561-ae87-cafabf752de7_cleaning_2021-03-04-21-01-51.tar.gz:                                                                                           KNAME="nvme2n1" MODEL="Dell Express Flash NVMe P4610 1.6TB SFF " SIZE="1600000000000" ROTA="0" TYPE="disk"
d6a6c8af-4d49-4561-ae87-cafabf752de7_cleaning_2021-03-04-21-01-51.tar.gz:                                                                                           KNAME="nvme3n1" MODEL="Dell Express Flash NVMe P4610 1.6TB SFF " SIZE="1600000000000" ROTA="0" TYPE="disk"
d6a6c8af-4d49-4561-ae87-cafabf752de7_cleaning_2021-03-04-21-01-51.tar.gz:                                                                                           " execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:103

Comment 5 Derek Higgins 2021-03-05 09:38:23 UTC
(In reply to Bob Fournier from comment #4)
> This isn't one of the NVME drives (see below), it shows as "OEMDRV".  Do you
> know what drive it is and why Write Protect is on?


> d6a6c8af-4d49-4561-ae87-cafabf752de7_cleaning_2021-03-04-21-01-51.tar.gz:   
> KNAME="sdc" MODEL="OEMDRV          " SIZE="322961408" ROTA="1" TYPE="disk"
> d6a6c8af-4d49-4561-ae87-cafabf752de7_cleaning_2021-03-04-21-01-51.tar.gz:   
> KNAME="sdc1" MODEL="" SIZE="322960896" ROTA="1" TYPE="part"

I haven't seen this before, but looking at info online it appears to be a
drive that the host attaches in order to re-install the OEM OS.

Apparently it gets removed after 18 hours, or you can
"restart the server and press F10 to enter the Lifecycle Controller
configuration. Then exit the Lifecycle Controller and reboot again"[1]


1 - http://byronwright.blogspot.com/2014/08/remove-oemdrv-drive-from-dell-server.html

Comment 6 Bob Fournier 2021-03-05 12:11:41 UTC
Yeah as Derek found, this drive is unnecessary. We should also be able to unmount it via the iDRAC 9 GUI according to the Dell documentation - https://www.dell.com/support/kbdoc/en-us/000160908/how-to-mount-and-unmount-the-driver-packs-via-idrac9

Comment 8 Lubov 2021-03-18 07:05:47 UTC
Sasha, could you please verify it on your setup? Our Dells are different (Dell R740 (Core)).

Comment 9 Alexander Chuzhoy 2021-03-23 23:02:33 UTC
After several attempts, we seem to be blocked on https://bugzilla.redhat.com/show_bug.cgi?id=1941636

Comment 10 Alexander Chuzhoy 2021-03-28 00:32:09 UTC
Verified.
Version:  registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-03-26-054333

The reported issue doesn't reproduce.
The node got provisioned.


[kni@r640-u09 ~]$ oc get bmh -A
NAMESPACE               NAME                 STATE         CONSUMER                   ONLINE   ERROR
openshift-machine-api   openshift-master-0   unmanaged     qe3-xphng-master-0         true     
openshift-machine-api   openshift-master-1   unmanaged     qe3-xphng-master-1         true     
openshift-machine-api   openshift-master-2   unmanaged     qe3-xphng-master-2         true     
openshift-machine-api   openshift-worker-0   unmanaged     qe3-xphng-worker-0-bb88q   true     
openshift-machine-api   openshift-worker-1   unmanaged     qe3-xphng-worker-0-gkwdl   true     
openshift-machine-api   openshift-worker-3   provisioned   qe3-xphng-worker-0-zkk9r   true     
[kni@r640-u09 ~]$

Comment 13 errata-xmlrpc 2021-07-27 22:51:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

