Bug 1900699 - Impossible to add new Node on OCP 4.6 using large ECKD disks - fdasd issue
Summary: Impossible to add new Node on OCP 4.6 using large ECKD disks - fdasd issue
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.6.z
Hardware: s390x
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: Nikita Dubrovskii (IBM)
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1915617
 
Reported: 2020-11-23 14:51 UTC by yannkindelberger
Modified: 2021-02-24 15:35 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When coreos-installer invokes fdasd to check for a valid DASD label on s390x, udev re-probes the DASD device. Consequence: Formatting the DASD fails because udev is still accessing the device. Fix: After checking for a DASD label, wait for udev to finish processing the DASD. Result: Formatting the DASD is successful.
Clone Of:
Environment:
Last Closed: 2021-02-24 15:35:25 UTC
Target Upstream Version:
Embargoed:


Attachments
Installation log (165.88 KB, text/plain)
2020-11-23 14:51 UTC, yannkindelberger


Links
GitHub coreos/coreos-installer pull 425 (closed): s390x: trigger 'udevadm settle' after checking DASD's validity (last updated 2021-02-15 23:11:06 UTC)
Red Hat Product Errata RHSA-2020:5633 (last updated 2021-02-24 15:35:55 UTC)

Description yannkindelberger 2020-11-23 14:51:22 UTC
Created attachment 1732592 [details]
Installation log


Version: 4.6.4 on Linux s390x

Platform: Linux on IBM Z under z/VM

UPI (semi-manual installation on customized infrastructure)

What happened?

We tried to add a new worker node to our existing OCP 4.6.4 cluster on s390x.
During the CoreOS installation, the install fails because of the fdasd command.

The DASD is still in use when the fdasd command runs. See the extract of the log below.
The complete log is attached to this case.

[ 1164.605587] coreos-installer-service[1432]: fdasd error:  Disk in use
[ 1164.605637] coreos-installer-service[1432]: DASD '/dev/dasda' is in use. Unmount it first!
[ 1164.605777] coreos-installer-service[1432]: Error: auto-formatting /dev/dasda failed
[ 1164.605804] coreos-installer-service[1432]: Caused by: "fdasd" "-a" "-s" "/dev/dasda" failed with exit code: 1

It seems that prior to partitioning a DASD device with 'fdasd -a', 'fdasd -p -s'
is called to check whether the device already has partitions.
If there are no partitions, 'fdasd -a' is issued.
It looks like udev keeps the device in use for a few moments
after 'fdasd -p -s' has finished its work.

In order to fix this: run 'udevadm settle' after 'fdasd -p -s' to wait for outstanding udev events to finish. A sketch of that approach follows.
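
The fix that later landed upstream (the coreos-installer pull request linked in this bug) follows this idea: after the label check, wait for udev before formatting. The following is only a minimal Rust sketch of that approach, assuming hypothetical helper names (dasd_has_valid_label, udev_settle, maybe_format_dasd) and simplified error handling; it is not the actual coreos-installer code.

use std::process::Command;

// Check whether `fdasd -p -s` reports a valid partition table on `device`.
// Illustrative helper; name and signature are assumptions for this sketch.
fn dasd_has_valid_label(device: &str) -> Result<bool, String> {
    let status = Command::new("fdasd")
        .args(["-p", "-s", device])
        .status()
        .map_err(|e| format!("running fdasd: {e}"))?;
    Ok(status.success())
}

// Wait for outstanding udev events (triggered by the label check) to finish.
fn udev_settle() -> Result<(), String> {
    let status = Command::new("udevadm")
        .arg("settle")
        .status()
        .map_err(|e| format!("running udevadm: {e}"))?;
    if status.success() {
        Ok(())
    } else {
        Err("udevadm settle failed".to_string())
    }
}

// Auto-format the DASD with `fdasd -a -s` only if it has no valid label yet.
fn maybe_format_dasd(device: &str) -> Result<(), String> {
    if !dasd_has_valid_label(device)? {
        // Key point: let udev finish re-probing the device before fdasd
        // opens it exclusively, otherwise fdasd fails with "Disk in use".
        udev_settle()?;
        let status = Command::new("fdasd")
            .args(["-a", "-s", device])
            .status()
            .map_err(|e| format!("running fdasd: {e}"))?;
        if !status.success() {
            return Err(format!("auto-formatting {device} failed"));
        }
    }
    Ok(())
}

fn main() -> Result<(), String> {
    maybe_format_dasd("/dev/dasda")
}

Running something like this against real hardware requires root and destroys data on the device, so treat it purely as an illustration of where the 'udevadm settle' call sits relative to the two fdasd invocations.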

What did you expect to happen?

Successful installation of CoreOS and the worker node.

How to reproduce it (as minimally and precisely as possible)?
Add a new worker node on OCP 4.6.x on Linux on Z with a large ECKD disk (EAV).


Comment 1 Dan Li 2020-11-23 15:13:19 UTC
Removing "ibm-zseries" tag for broader visibility

Comment 2 Brenton Leanhardt 2020-11-30 18:09:56 UTC
This looks like a coreos-installer issue. Moving.

Comment 3 Micah Abbott 2020-12-01 21:37:59 UTC
Targeting for 4.7 with high priority; if a fix is required in 4.6.z we will create the necessary clone BZs

Comment 4 Nikita Dubrovskii (IBM) 2020-12-02 09:10:53 UTC
Hi, the installer doesn't use `fdasd -p` to read the partition table.
Your DASD is in some wrong state, so please run `fdasd -a /dev/dasda` from the emergency console and then try installing again.
I saw a similar DASD issue once last year; it may happen due to an unexpected poweroff of z/VM.
I'll try reproducing this issue and fixing it.

Comment 5 Hendrik Brueckner 2020-12-03 17:14:45 UTC
The message:

[ 1164.605587] coreos-installer-service[1432]: fdasd error:  Disk in use
[ 1164.605637] coreos-installer-service[1432]: DASD '/dev/dasda' is in use. Unmount it first!

sounds like some process still has the DASD open (it could be a mount or some utility that operates on the DASD device).

Comment 6 yannkindelberger 2020-12-10 16:54:57 UTC
Hello,
I worked on the case with Nikita Dubrovskii (IBM).
The fix provided by Nikita fixed the RHCOS image.
The z/VM Linux system was installed successfully.

Comment 8 Michael Nguyen 2021-01-13 13:49:51 UTC
Closing as verified based on https://bugzilla.redhat.com/show_bug.cgi?id=1900699#c6

Comment 11 errata-xmlrpc 2021-02-24 15:35:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

