1905159 – Installation on previous unused dasd fails after formatting

Summary: Installation on previous unused dasd fails after formatting

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	RHCOS
Sub Component:
Version:	4.7
Hardware:	s390x
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.8.0
Assignee:	Nikita Dubrovskii (IBM)
QA Contact:	Michael Nguyen
Docs Contact:
URL:
Whiteboard:
Depends On:	1933766
Blocks:	ocp-47-z-tracker

Reported:	2020-12-07 16:36 UTC by Stefan Orth
Modified:	2021-07-27 22:35 UTC
CC List:	13 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: When coreos-installer queries the kernel for the sector size of an unformatted DASD, it is told that the DASD has 512 byte sectors. Consequence: A 512-byte-sector OS image is incorrectly used to install the OS, even though coreos-installer formats the DASD with 4096 byte sectors. Fix: When installing to an unformatted DASD, assume that the sector size will be 4096 bytes. Result: coreos-installer installs the correct OS image to the DASD.
Clone Of:
Clones:	1933766
Environment:
Last Closed:	2021-07-27 22:34:40 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	coreos coreos-installer pull 430	0	None	closed	s390x: use recommended blocksize of 4096 bytes for unformatted ECKD DASD disks	2021-02-19 07:36:07 UTC
Red Hat Product Errata	RHSA-2021:2438	0	None	None	None	2021-07-27 22:35:11 UTC

Description Stefan Orth 2020-12-07 16:36:41 UTC

Description of problem:

Installation on a previously unused DASD fails after the installer formats it. When the installation is started again, it succeeds. It also succeeds if I format the DASD manually on another node and then start the installation.

Version-Release number of selected component (if applicable):

Client Version: 4.7.0-0.nightly-s390x-2020-12-03-121304
Server Version: 4.7.0-0.nightly-s390x-2020-12-03-121304
Kubernetes Version: v1.19.2+ad738ba

How reproducible:

Use a DASD that is not formatted (for example c7b2) and start the installation.
 
Bus-ID    Status    Name      Device  Type         BlkSz  Size      Blocks
================================================================================
0.0.5060  active    dasda     94:0    ECKD         4096   61441MB   15729120
0.0.c7b2  n/f       dasdb     94:4    ECKD     

Steps to Reproduce:
1.
2.
3.

Actual results:

Installation stops (emergency console) with the following error:

-------

[   24.467764] coreos-installer-service[1163]: Installing Red Hat Enterprise Linux CoreOS 47.83.202012030410-0 (Ootpa) s390x (512-byte sectors)
[   24.483811] coreos-installer-service[1163]: Performing low-level format for /dev/dasda
[  113.366567] coreos-installer-service[1163]: cyl       1 of   60102 |---------
------------------------|  0% [--] cyl      11 of   60102 | cyl      21 of   601


 60102 | cyl   56941 of   60102 | cyl   56951 of   60102 | cyl   56961 of   6010
2 | cyl   56971 of   60102 | cyl   56981 of   60102 | cyl   56991 of   60102 | c
yl   57001 of   60102 | cyl   57011 of   60102 | cyl   57021 of   60102 | cyl   
57031 of   60102
[  309.899787] dasd-eckd 0.0.c7b1: DASD with 4 KB/block, 43273440 KB total size, 48 KB/track, compatible disk layout
[  309.899812] dasda: detected capacity change from 0 to 44312002560
[  309.900438]  dasda:
[  309.906175]  dasda:VOL1/  0XC7B1:
[  309.906498] coreos-installer-service[1163]:  | cyl   57041 of   60102 | cyl  
 57051 of   60102 | cyl   57061 of   60102 | cyl   57071 of   60102 | cyl   5708
1 of   60102 | cyl   57091 of   60102 | cyl   57101 of   60102 |################
###############--| 95% [14s]     cyl   57111 of   60102 | cyl   57121 of   60102
 | cyl   57131 of   60102

59961 of   60102 | cyl   59971 of   60102 | cyl   59981 of   60102 | cyl   59991
 of   60102 | cyl   60001 of   60102 | cyl   60011 of   60102 | cyl   60021 of  
 60102 | cyl   60031 of   60102 | cyl   60041 of   60102 | cyl   60051 of   6010
2 | cyl   60061 of   60102 | cyl   60071 of   60102 | cyl   60081 of   60102 | c
yl   60091 of   60102 | cyl   60101 of   60102 | cyl   60102 of   60102 |#######
##########################|100% [4m 45s]
[  309.906651] coreos-installer-service[1163]: Finished formatting the device.
[  309.906676] coreos-installer-service[1163]: Rereading the partition table... ok
[  310.159474]  dasda:VOL1/  0XC7B1:
[  310.160437]  dasda:VOL1/  0XC7B1:
[  310.163560]  dasda:VOL1/  0XC7B1:
[  310.163873] coreos-installer-service[1163]: Error: source has sector size 512 but destination has sector size 4096
[  310.164017] coreos-installer-service[1163]: Resetting partition table
[  310.399325] coreos-installer-service[1163]: Error: install failed
[  310.171875]  dasda:
[FAILED] Failed to start CoreOS Installer

------

Expected results:

The installation process should be able to handle unformatted DASDs.

Additional info:

Comment 1 Benjamin Gilbert 2020-12-08 21:22:36 UTC

Earlier in the logs, is there a message like "Found non-standard sector size {} for {}, assuming 512b-compatible"?

It looks as though we detect 512-byte sectors, select the 512b image, and then the DASD turns into a 4Kn disk once formatted.  Then when we go to use the selected install image, we find a sector size mismatch.

If there's a clean way to detect the sector size the DASD will have after formatting (maybe a DASD-specific ioctl?) that'd probably be best.  Otherwise we have to either delay or retry image selection, which happens early, in the command-line parser.

4.6 presumably has the same problem.
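
The failure sequence described in this comment can be illustrated with a minimal Python sketch. This is not coreos-installer's actual code (which is written in Rust); the function names `select_image` and `check_sector_sizes` and the image labels are hypothetical, and the fallback to 512 for any non-4096 value is an assumption based on the "assuming 512b-compatible" message quoted above. It only shows why choosing the image before formatting leads to the mismatch.

```python
def select_image(detected_sector_size: int) -> dict:
    """Pick an install image based on the sector size detected up front."""
    if detected_sector_size == 4096:
        return {"name": "4k-image", "sector_size": 4096}
    # Assumption: any other value is treated as 512-byte compatible.
    return {"name": "512b-image", "sector_size": 512}


def check_sector_sizes(image: dict, destination_sector_size: int) -> None:
    """Raise if the image and the destination disagree on sector size."""
    if image["sector_size"] != destination_sector_size:
        raise ValueError(
            f"source has sector size {image['sector_size']} "
            f"but destination has sector size {destination_sector_size}"
        )


# Values mirroring the report: 512 detected before formatting, 4096 afterwards.
image = select_image(512)
try:
    check_sector_sizes(image, 4096)
    outcome = "ok"
except ValueError as err:
    outcome = str(err)
print(outcome)
```

Because the image is chosen once, before the low-level format changes the disk geometry, the later check compares a stale choice against the new sector size.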

Comment 2 Stefan Orth 2020-12-09 09:17:13 UTC

Unfortunately I do not have the full log, because it was copied from the 3270 console. What I found is the following message:

[    4.433100] dasd-eckd 0.0.c7b1: New DASD 3390/0C (CU 3990/01) with 60102 cylinders, 15 heads, 224 sectors
[    4.434379] dasd-eckd 0.0.c7b1: The DASD is not formatted

Comment 3 Nikita Dubrovskii (IBM) 2020-12-09 12:56:18 UTC

Hi. I reproduced this on my zVM:

```
[   10.883466] coreos-installer-service[1106]: Installing Red Hat Enterprise Linux CoreOS 46.82.202012090742-0 (Ootpa) s390x (512-byte sectors)
[   11.095542] coreos-installer-service[1106]: Performing low-level format for /dev/disk/by-path/ccw-0.0.6609 
```

So the ioctl reports a 512-byte sector size for an uninitialized DASD.

Working on a fix.

Comment 4 Nikita Dubrovskii (IBM) 2020-12-09 13:38:04 UTC

Update:
When we run `dasdfmt` on an uninitialized DASD, its block size changes from 512 to 4096 (the default for `dasdfmt`).

```
$ fdasd -p /dev/dasdb
fdasd error:  Unsupported disk format

$ ioctl_sector_size
Device /dev/dasdb has 512 sector size
```

@Stefan: as a workaround you can add the `coreos.inst.image_url=xxxxxx` option to your `boot.parm`.
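
For reference, the sector size query mentioned in this comment can be reproduced with a short Python sketch. It assumes the query in question is the Linux `BLKSSZGET` block-device ioctl (the thread only says "IOCTL", and `ioctl_sector_size` above appears to be a local helper). The name `logical_sector_size` is hypothetical, and the call needs read access to the device node.

```python
import fcntl
import os
import struct

# Linux BLKSSZGET is _IO(0x12, 104); it returns the logical sector size as an int.
BLKSSZGET = 0x1268


def logical_sector_size(device: str) -> int:
    """Return the logical sector size the kernel reports for a block device."""
    fd = os.open(device, os.O_RDONLY)
    try:
        # Pass an int-sized buffer; ioctl returns the filled buffer as bytes.
        buf = fcntl.ioctl(fd, BLKSSZGET, struct.pack("i", 0))
        return struct.unpack("i", buf)[0]
    finally:
        os.close(fd)


# Example call (requires a real block device, so it is left commented out):
# print(logical_sector_size("/dev/dasdb"))
```

On an unformatted DASD such as the `/dev/dasdb` shown above, this would be expected to report 512, consistent with the output in this comment.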

Comment 5 Stefan Orth 2020-12-10 07:26:49 UTC

@ndubrovs What should be the content of the parameter? I have 'coreos.live.rootfs_url=http://xxxx-live-rootfs.img' as parameter in my boot.parm due to live installation (I guess since 4.6).

Comment 6 Nikita Dubrovskii (IBM) 2020-12-10 09:48:46 UTC

@Stefan

`rootfs_url` specifies the `live-rootfs` image (previously it was part of `live-initramfs`).

Usually you have several disk images, for example:
- rhcos-46.82.202012091721-0-metal.s390x.raw
- rhcos-46.82.202012091721-0-metal4k.s390x.raw
- rhcos-47.83.202012080826-0-dasd.s390x.raw

So for DASD you can set:  `coreos.inst.image_url=http://x.x.x.x/rhcos-xxxxxxxxx-dasd.s390x.raw`
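
The workaround can be sketched as a small Python helper that picks the DASD image from a list like the one above and builds the corresponding kernel argument. The helper name `dasd_image_arg` and the `http://x.x.x.x` base URL are placeholders; only the `coreos.inst.image_url` parameter name and the file names come from this thread.

```python
from typing import Sequence


def dasd_image_arg(base_url: str, images: Sequence[str]) -> str:
    """Return a coreos.inst.image_url argument pointing at the DASD image."""
    dasd_images = [name for name in images if "-dasd." in name]
    if len(dasd_images) != 1:
        raise ValueError(f"expected exactly one DASD image, found {dasd_images}")
    return f"coreos.inst.image_url={base_url.rstrip('/')}/{dasd_images[0]}"


images = [
    "rhcos-46.82.202012091721-0-metal.s390x.raw",
    "rhcos-46.82.202012091721-0-metal4k.s390x.raw",
    "rhcos-47.83.202012080826-0-dasd.s390x.raw",
]
arg = dasd_image_arg("http://x.x.x.x", images)
print(arg)
```

Setting the image URL explicitly sidesteps the installer's automatic choice, which is what goes wrong on an unformatted DASD.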

Comment 7 Micah Abbott 2021-01-15 20:54:12 UTC

Higher priority work has prevented this issue from being solved; adding UpcomingSprint keyword

Comment 9 Nikita Dubrovskii (IBM) 2021-01-18 10:51:18 UTC

@Micah Abbott

The fix for this one was merged on Dec 16 - https://github.com/coreos/coreos-installer/pull/430
Sorry, forgot to mention it here
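
The idea behind the fix, as summarized in this bug's Doc Text, can be sketched in Python as follows. This is an illustration only and not the code from the pull request; `effective_sector_size` is a hypothetical name, and how the installer recognizes an unformatted DASD is not covered here, so it is passed in as a flag.

```python
# Block size the DASD has after formatting, per this report.
DASD_FORMATTED_SECTOR_SIZE = 4096


def effective_sector_size(reported: int, unformatted_dasd: bool) -> int:
    """Sector size to use when selecting the install image."""
    if unformatted_dasd:
        # Ignore the kernel-reported value (512 in this report): the disk
        # will have 4096-byte sectors once it has been formatted.
        return DASD_FORMATTED_SECTOR_SIZE
    return reported


print(effective_sector_size(512, unformatted_dasd=True))
```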

Comment 10 krmoser 2021-01-26 18:23:04 UTC

Micah and Nikita,

1. Has this fix been incorporated into an available/existing RHCOS 47.83 build?

2. If not, is there a target date/build when this fix may be included in an upcoming RHCOS 47.83 build?

Thank you,
Kyle

Comment 11 Micah Abbott 2021-01-26 19:12:26 UTC

(In reply to krmoser from comment #10)
> Micah and Nikita,
> 
> 1. Has this fix been incorporated into an available/existing RHCOS 47.83
> build?

This fix was included as part of `coreos-installer-0.8.0-1.rhaos4.7.el8`, which first found its way into RHCOS 47.83.202101120742-0.

You can get OCP 4.7.0-fc.4 from the mirror https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/pre-release/4.7.0-fc.4/

...which uses RHCOS 47.83.202101171239-0 and includes a newer version of `coreos-installer`

Comment 12 Nikita Dubrovskii (IBM) 2021-01-28 10:40:31 UTC

There is a 2nd part of the fix for this issue - https://github.com/ibm-s390-tools/s390-tools/commit/bf9482709fa63797d7bacb2ab93a86efa3962528
But I'm not sure whether the RPM contains it.

Comment 13 Micah Abbott 2021-01-28 14:18:39 UTC

(In reply to Nikita Dubrovskii (IBM) from comment #12)
> There is a 2nd part of the fix for this issue -
> https://github.com/ibm-s390-tools/s390-tools/commit/bf9482709fa63797d7bacb2ab93a86efa3962528
> But I'm not sure whether the RPM contains it.

As best as I can tell, we are including `s390utils-base-2.6.0-33.el8` in the most recent RHCOS 4.7 builds.  This version of `s390utils-base` is from Jul 2020, so it does not have the patch referenced in that commit.

I don't see any new releases of that package planned for RHEL 8.3.z, so if we wanted to include it in RHCOS, we would need a new BZ to get the package created and shipped as part of the RHEL 8.3 z-stream.

Comment 14 Stefan Orth 2021-01-28 14:57:49 UTC

Is the 2nd part needed to fix the problem in rhcos as mentioned in comment11 ?

Comment 15 Nikita Dubrovskii (IBM) 2021-01-29 08:42:42 UTC

(In reply to Stefan Orth from comment #14)
> Is the 2nd part needed to fix the problem in rhcos as mentioned in comment11
> ?

I guess you were talking about comment 12. Then yes, it is also needed.

Comment 17 Prashanth Sundararaman 2021-02-22 15:51:04 UTC

Hi Dan,

Can we get this fix (https://github.com/ibm-s390-tools/s390-tools/commit/bf9482709fa63797d7bacb2ab93a86efa3962528) included in s390utils for RHEL8.4 ?

Thanks

Comment 18 Dan Horák 2021-02-22 17:07:36 UTC

(In reply to Prashanth Sundararaman from comment #17)
> Hi Dan,
> 
> Can we get this fix
> (https://github.com/ibm-s390-tools/s390-tools/commit/bf9482709fa63797d7bacb2ab93a86efa3962528)
> included in s390utils for RHEL8.4 ?

We are entering the exception phase for 8.4, so it would need further approval, but I believe there is still a chance. If not, then 8.5 + 8.4 z-stream will be the way. And we will need a separate bug for RHEL, as usual.

Comment 21 Micah Abbott 2021-03-01 14:59:24 UTC

Retargeting for 4.8.0; we will need an updated `s390utils` RPM with the fix from https://github.com/ibm-s390-tools/s390-tools/commit/bf9482709fa63797d7bacb2ab93a86efa3962528 included to properly call this problem fixed.

Will drop this from the 4.7.z errata.

Comment 23 Micah Abbott 2021-03-17 17:23:35 UTC

Moving back to POST since we are waiting on a new `s390utils` RPM from BZ#1933766.  It looks like it won't be delivered until RHEL 8.4, so this is going to be sticking around for a while.

Comment 24 Micah Abbott 2021-05-12 19:35:16 UTC

Once RHCOS rebases to RHEL 8.4 GA, we can move this to MODIFIED and ask that this scenario is retested.

Comment 25 Micah Abbott 2021-05-20 16:49:01 UTC

RHCOS 4.8 moved to using RHEL 8.4 GA content with build 48.84.202105182219-0 which included `s390utils-2.15.1-5.el8`

This build and newer are available in the OCP 4.8 nightly payloads.

Moving to MODIFIED.

Comment 27 Stefan Orth 2021-05-26 15:08:37 UTC

Works with:

Client Version: 4.8.0-0.nightly-s390x-2021-05-26-071457
Server Version: 4.8.0-0.nightly-s390x-2021-05-26-071457
Kubernetes Version: v1.21.0-rc.0+936c9e2

VERSION="48.84.202105260123-0"

Before installation:
-------------------
[root@m3558001 AUTOMATION]# lsdasd
Bus-ID     Status      Name      Device  Type  BlkSz  Size      Blocks
==============================================================================
0.0.a987   active      dasda     94:0    ECKD  4096   21129MB   5409180
0.0.a989   active      dasdb     94:4    ECKD  4096   21129MB   5409180
0.0.ee0e   n/f         dasdd     94:12   ECKD


After installation:
-------------------
[core@worker-0 ~]$ lsdasd
Bus-ID    Status    Name      Device  Type         BlkSz  Size      Blocks
================================================================================
0.0.ee0e  active    dasda     94:0    ECKD         4096   42259MB   10818360

Comment 28 Micah Abbott 2021-05-26 16:52:06 UTC

Marking VERIFIED based on comment #27.  Thanks Stefan!

Comment 31 errata-xmlrpc 2021-07-27 22:34:40 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
