Bug 1900699 - Impossible to add new Node on OCP 4.6 using large ECKD disks - fdasd issue
Summary: Impossible to add new Node on OCP 4.6 using large ECKD disks - fdasd issue
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.6.z
Hardware: s390x
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: Nikita Dubrovskii (IBM)
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1915617
 
Reported: 2020-11-23 14:51 UTC by yannkindelberger
Modified: 2021-02-24 15:35 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When coreos-installer invokes fdasd to check for a valid DASD label on s390x, udev re-probes the DASD device. Consequence: Formatting the DASD fails because udev is still accessing the device. Fix: After checking for a DASD label, wait for udev to finish processing the DASD. Result: Formatting the DASD is successful.
Clone Of:
Environment:
Last Closed: 2021-02-24 15:35:25 UTC
Target Upstream Version:
Embargoed:


Attachments
Installation log (165.88 KB, text/plain)
2020-11-23 14:51 UTC, yannkindelberger


Links
GitHub coreos/coreos-installer pull 425 (closed): s390x: trigger 'udevadm settle' after checking DASD's validity (last updated 2021-02-15 23:11:06 UTC)
Red Hat Product Errata RHSA-2020:5633 (last updated 2021-02-24 15:35:55 UTC)

Description yannkindelberger 2020-11-23 14:51:22 UTC
Created attachment 1732592 [details]
Installation log


Version: 4.6.4 on Linux s390x

Platform: Linux on IBM Z under z/VM

UPI (semi-manual installation on customized infrastructure)

What happened?

We tried to add a new worker node to our existing OCP 4.6.4 cluster on s390x.
During the CoreOS installation, the install fails because of the fdasd command.

The DASD is still in use when the fdasd command runs. See the extract of the log below.
The complete log is attached to this case.

[ 1164.605587] coreos-installer-service[1432]: fdasd error:  Disk in use
[ 1164.605637] coreos-installer-service[1432]: DASD '/dev/dasda' is in use. Unmount it first!
[ 1164.605777] coreos-installer-service[1432]: Error: auto-formatting /dev/dasda failed
[ 1164.605804] coreos-installer-service[1432]: Caused by: "fdasd" "-a" "-s" "/dev/dasda" failed with exit code: 1

It seems that prior to partitioning a DASD device with 'fdasd -a', 'fdasd -p -s'
is called to check whether the device already has partitions.
If there are no partitions, 'fdasd -a' is issued.
It looks like udev keeps the device in use for a few moments
after 'fdasd -p -s' has finished its work.

In order to fix this: run 'udevadm settle' after 'fdasd -p -s' to wait for outstanding udev events to finish. A sketch of that approach follows.
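
The fix that later landed upstream (the coreos-installer pull request linked in this bug) follows this idea: after the label check, wait for udev before formatting. The following is only a minimal Rust sketch of that approach, assuming hypothetical helper names (dasd_has_valid_label, udev_settle, maybe_format_dasd) and simplified error handling; it is not the actual coreos-installer code.

use std::process::Command;

// Check whether `fdasd -p -s` reports a valid partition table on `device`.
// Illustrative helper; name and signature are assumptions for this sketch.
fn dasd_has_valid_label(device: &str) -> Result<bool, String> {
    let status = Command::new("fdasd")
        .args(["-p", "-s", device])
        .status()
        .map_err(|e| format!("running fdasd: {e}"))?;
    Ok(status.success())
}

// Wait for outstanding udev events (triggered by the label check) to finish.
fn udev_settle() -> Result<(), String> {
    let status = Command::new("udevadm")
        .arg("settle")
        .status()
        .map_err(|e| format!("running udevadm: {e}"))?;
    if status.success() {
        Ok(())
    } else {
        Err("udevadm settle failed".to_string())
    }
}

// Auto-format the DASD with `fdasd -a -s` only if it has no valid label yet.
fn maybe_format_dasd(device: &str) -> Result<(), String> {
    if !dasd_has_valid_label(device)? {
        // Key point: let udev finish re-probing the device before fdasd
        // opens it exclusively, otherwise fdasd fails with "Disk in use".
        udev_settle()?;
        let status = Command::new("fdasd")
            .args(["-a", "-s", device])
            .status()
            .map_err(|e| format!("running fdasd: {e}"))?;
        if !status.success() {
            return Err(format!("auto-formatting {device} failed"));
        }
    }
    Ok(())
}

fn main() -> Result<(), String> {
    maybe_format_dasd("/dev/dasda")
}

Running something like this against real hardware requires root and destroys data on the device, so treat it purely as an illustration of where the 'udevadm settle' call sits relative to the two fdasd invocations.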

What did you expect to happen?

Successful installation of CoreOS and the worker node.

How to reproduce it (as minimally and precisely as possible)?
Add a new worker node on OCP 4.6.x on Linux on Z with a large ECKD disk (EAV).


Comment 1 Dan Li 2020-11-23 15:13:19 UTC
Removing "ibm-zseries" tag for broader visibility

Comment 2 Brenton Leanhardt 2020-11-30 18:09:56 UTC
This looks like a coreos-installer issue. Moving.

Comment 3 Micah Abbott 2020-12-01 21:37:59 UTC
Targeting for 4.7 with high priority; if a fix is required in 4.6.z we will create the necessary clone BZs

Comment 4 Nikita Dubrovskii (IBM) 2020-12-02 09:10:53 UTC
Hi, the installer doesn't use `fdasd -p` to read the partition table.
Your DASD is in some wrong state, so please run `fdasd -a /dev/dasda` from the emergency console and then try installing again.
I saw a similar DASD issue once last year; it may happen due to an unexpected poweroff of z/VM.
I'll try reproducing this issue and fixing it.

Comment 5 Hendrik Brueckner 2020-12-03 17:14:45 UTC
The message:

[ 1164.605587] coreos-installer-service[1432]: fdasd error:  Disk in use
[ 1164.605637] coreos-installer-service[1432]: DASD '/dev/dasda' is in use. Unmount it first!

sounds like some process still has the DASD open (it could be a mount or some utility that operates on the DASD device).

Comment 6 yannkindelberger 2020-12-10 16:54:57 UTC
Hello,
I worked on the case with Nikita Dubrovskii (IBM).
The fix provided by Nikita fixed the RHCOS image.
The z/VM Linux system was installed successfully.

Comment 8 Michael Nguyen 2021-01-13 13:49:51 UTC
Closing as verified based on https://bugzilla.redhat.com/show_bug.cgi?id=1900699#c6

Comment 11 errata-xmlrpc 2021-02-24 15:35:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

