Bug 1933235 - RHEL8.4 - installer fails to write boot record on 4k scsi lun on s390x
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: s390utils
Version: 8.4
Hardware: s390x
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 8.4
Assignee: Dan Horák
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1796871 1918723
Reported: 2021-02-26 08:21 UTC by Thomas Staudt
Modified: 2022-10-03 14:30 UTC

Fixed In Version: s390utils-2.15.1-5.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1918723
Clones: 1937894
Environment:
Last Closed: 2021-05-18 14:55:12 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System                       ID              Private  Priority  Status  Summary  Last Updated
IBM Linux Technology Center  191653          0        None      None    None     2021-02-26 09:37:12 UTC
Red Hat Product Errata       RHBA-2021:1617  0        None      None    None     2021-05-18 14:55:27 UTC

Description Thomas Staudt 2021-02-26 08:21:32 UTC
+++ This bug was initially created as a clone of Bug #1918723 +++

Version:

$ openshift-install version
install_version: "4.7.0-0.nightly-s390x-2021-01-19-095800"
rhcos_version: "4.7.0-fc.2-s390x"

Platform:

s390x

Please specify:
UPI

What happened?

When installing OCP on a 4k SCSI LUN, the installer detects the sector size correctly and writes the correct image to the disk. It then tries to restart the node, but the reboot fails: the s390x SCSI bootloader cannot find the boot record on that disk.

The first guess is that zipl fails to write it, although there are no obvious errors.

---- after image write to disk ----

[   91.859350] GPT:Primary header thinks Alt. header is not at the end of the disk.
[   91.859358] GPT:853247 != 31457279
[   91.859359] GPT:Alternate GPT header not at the end of the disk.
[   91.859360] GPT:853247 != 31457279
[   91.859362] GPT: Use GNU Parted to correct GPT errors.
[   91.859370]  sda: sda3 sda4
[   92.128978] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: (null)
[   92.131577] coreos-installer-service[1157]: Writing Ignition config
[   92.131715] coreos-installer-service[1157]: Writing first-boot kernel arguments
[   92.131782] coreos-installer-service[1157]: Modifying kernel arguments
[   92.133024] coreos-installer-service[1157]: Installing bootloader
[   92.247270] coreos-installer-service[1157]: Updating re-IPL device
[   92.301487] coreos-installer-service[1157]: Install complete.
[  OK  ] Started CoreOS Installer.
[  OK  ] Reached target CoreOS Installer Target.
[  OK  ] Started Reboot after CoreOS Installer.
[  OK  ] Reached target Finalize CoreOS Installer Target.
[   92.306811] GPT:Primary header thinks Alt. header is not at the end of the disk.
[   92.306816] GPT:853247 != 31457279
[   92.306818] GPT:Alternate GPT header not at the end of the disk.
[   92.306820] GPT:853247 != 31457279
[   92.306823] GPT: Use GNU Parted to correct GPT errors.
[   92.306831]  sda: sda3 sda4
         Stopping Login Service...
[  OK  ] Stopped target Network is Online.
[  OK  ] Stopped target Timers.
[  OK  ] Stopped Daily Cleanup of Temporary Directories.
[  OK  ] Stopped target Finalize CoreOS Installer Target.
         Unmounting /sysroot...
         Stopping Reboot after CoreOS Installer...
[  OK  ] Stopped Network Manager Wait Online.
[  OK  ] Stopped target Network.
         Stopping Restore /run/initramfs on shutdown...
         Stopping Network Manager...
[  OK  ] Closed LVM2 poll daemon socket.
[  OK  ] Stopped daily update of the root trust anchor for DNSSEC.
[  OK  ] Stopped Run update-ca-trust.
[  OK  ] Stopped Reboot after CoreOS Installer.
[  OK  ] Stopped Login Service.
[  OK  ] Stopped Restore /run/initramfs on shutdown.
         Unmounting /boot...
[  OK  ] Stopped target CoreOS Installer Target.
[  OK  ] Stopped target Prepare for CoreOS Installer Target.
[  OK  ] Unmounted /sysroot.
[  OK  ] Unmounted /boot.
[  OK  ] Stopped Network Manager.
         Stopping D-Bus System Message Bus...
[  OK  ] Stopped D-Bus System Message Bus.
[  OK  ] Stopped target Basic System.
[  OK  ] Stopped target Slices.
[  OK  ] Removed slice User and Session Slice.
[  OK  ] Stopped target Paths.
[  OK  ] Stopped target Sockets.
[  OK  ] Closed D-Bus System Message Bus Socket.
[  OK  ] Stopped target System Initialization.
[  OK  ] Stopped target Local Encrypted Volumes.
[  OK  ] Stopped Dispatch Password Requests to Console Directory Watch.
         Stopping Load/Save Random Seed...
         Stopping Update UTMP about System Boot/Shutdown...
[  OK  ] Stopped Apply Kernel Variables.
[  OK  ] Stopped Load Kernel Modules.
[  OK  ] Stopped Update is Completed.
[  OK  ] Stopped Rebuild Hardware Database.
[  OK  ] Stopped Rebuild Dynamic Linker Cache.
[  OK  ] Stopped Load/Save Random Seed.
[  OK  ] Stopped Update UTMP about System Boot/Shutdown.
[  OK  ] Stopped Create Volatile Files and Directories.
[  OK  ] Stopped target Local File Systems.
         Unmounting /run/ephemeral...
         Unmounting Temporary Directory (/tmp)...
         Unmounting /etc...
         Unmounting /var...
[FAILED] Failed unmounting /etc.
[FAILED] Failed unmounting /var.
[  OK  ] Unmounted /run/ephemeral.
[  OK  ] Stopped target Local File Systems (Pre).
[  OK  ] Stopped Create Static Device Nodes in /dev.
[  OK  ] Stopped Create System Users.
         Stopping Monitoring of LVM2 mirrors...ng dmeventd or progress polling...
[  OK  ] Unmounted Temporary Directory (/tmp).
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Stopped target Swap.
[  OK  ] Stopped Monitoring of LVM2 mirrors,...sing dmeventd or progress polling.
[  OK  ] Reached target Shutdown.
[  OK  ] Reached target Final Step.
         Starting Reboot...
[   93.131336] printk: systemd-shutdow: 1 output lines suppressed due to ratelimiting
[   93.137693] systemd-shutdown[1]: Syncing filesystems and block devices.
[   93.140114] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[   93.142954] systemd-journald[1020]: Received SIGTERM from PID 1 (systemd-shutdow).
[   93.209056] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[   93.211032] systemd-shutdown[1]: Unmounting file systems.
[   93.211723] [1307]: Remounting '/var' read-only in with options 'seclabel,attr2,discard,inode64,logbufs=8,logbsize=32k,noquota'.
[   93.219215] [1308]: Unmounting '/var'.
[   93.299096] [1309]: Remounting '/etc' read-only in with options 'seclabel,attr2,discard,inode64,logbufs=8,logbsize=32k,noquota'.
[   93.299626] [1310]: Unmounting '/etc'.
[   93.349533] XFS (loop0): Unmounting Filesystem
[   93.401728] systemd-shutdown[1]: All filesystems unmounted.
[   93.401732] systemd-shutdown[1]: Deactivating swaps.
[   93.401747] systemd-shutdown[1]: All swaps deactivated.
[   93.401749] systemd-shutdown[1]: Detaching loop devices.
[   93.402192] systemd-shutdown[1]: Not all loop devices detached, 1 left.
[   93.402194] systemd-shutdown[1]: Detaching DM devices.
01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
02: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
03: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
00: Storage cleared - system reset.
00: HCPLDI2816I Acquiring the machine loader from the processor controller.
00: HCPLDI2817I Load completed from the processor controller.
00: HCPLDI2817I Now starting the machine loader.
01: HCPGSP2630I The virtual machine is placed in CP mode due to a SIGP stop and store status from CPU 00.
02: HCPGSP2630I The virtual machine is placed in CP mode due to a SIGP stop and store status from CPU 00.
03: HCPGSP2630I The virtual machine is placed in CP mode due to a SIGP stop and store status from CPU 00.

---- reboot ----

00: MLOEVL012I: Machine loader up and running (version v2.4.7).
00: MLOLOA013E: Invalid Program Table, wrong or missing magic number.
00: MLOLOA041E: Unable to find the component table pointer.
00: MLOEVL010E: IPL failed.
00: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000 0002887A
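
Note on the GPT warnings in the log above: "GPT:853247 != 31457279" compares the block where the primary header expects the backup header with the last block of the LUN. A warning of this kind is commonly seen after an image smaller than the target has been written to a disk, and nothing in this report ties it to the IPL failure. The Python sketch below only converts the two numbers into sizes. The block size is an assumption (the message does not state it), so both 512 and 4096 are shown, and the function name is illustrative.

```python
# Values copied from the kernel message "GPT:853247 != 31457279".
ALT_HEADER_LBA = 853247    # where the primary header says the backup header is
LAST_DISK_LBA = 31457279   # last addressable block of the target LUN


def lba_count_to_gib(last_lba: int, block_size: int) -> float:
    """Size in GiB of a device whose last block has index last_lba."""
    return (last_lba + 1) * block_size / 2**30


# ASSUMPTION: the block size is not printed next to the GPT message,
# so both common values are evaluated.
for block_size in (512, 4096):
    image_gib = lba_count_to_gib(ALT_HEADER_LBA, block_size)
    disk_gib = lba_count_to_gib(LAST_DISK_LBA, block_size)
    print(f"{block_size}-byte blocks: image {image_gib:.2f} GiB, "
          f"disk {disk_gib:.2f} GiB")
```

Under either assumption the written image is far smaller than the LUN, which is consistent with the backup header not sitting at the end of the disk.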

--- Additional comment from Scott Dodson on 2021-01-21 13:29:25 UTC ---

This concerns the CoreOS installer, not the OpenShift installer.

--- Additional comment from Micah Abbott on 2021-01-21 15:56:13 UTC ---

A number of changes around multiarch support landed in recent devel builds.

If possible, please retry with the latest builds from https://releases-rhcos-art.cloud.privileged.psi.redhat.com/?stream=releases/rhcos-4.7-s390x

--- Additional comment from Alexander Klein on 2021-01-22 08:41:54 UTC ---

Retested with
install_version: "4.7.0-0.nightly-s390x-2021-01-22-032543"
rhcos_version: "4.7.0-fc.3-s390x"

Exactly the same behaviour.

--- Additional comment from Prashanth Sundararaman on 2021-01-22 14:34:04 UTC ---

@Alexander Klein - one thing you can try is:

Boot without any coreos.inst.* arguments, and with "ignition.firstboot ignition.platform.id=metal ignition.config.url=http://..."

Log in as core, then run coreos-installer manually to write the image to disk. That way you will see what the installer is doing:

sudo coreos-installer install <disk-device> --ignition-url <url> --insecure-ignition

--- Additional comment from Nikita Dubrovskii (IBM) on 2021-01-25 14:02:02 UTC ---

Reproduced this:

[   57.327334] coreos-installer-service[1127]: Installing bootloader 
[   57.433335] coreos-installer-service[1127]: Target device information 
[   57.433418] coreos-installer-service[1127]:   Device..........................: 08:00 
[   57.433451] coreos-installer-service[1127]:   Partition.......................: 08:03 
[   57.433482] coreos-installer-service[1127]:   Device name.....................: sda 
[   57.433515] coreos-installer-service[1127]:   Device driver name..............: sd 
[   57.433547] coreos-installer-service[1127]:   Type............................: disk partition 
[   57.433580] coreos-installer-service[1127]:   Disk layout.....................: SCSI disk layout 
[   57.433613] coreos-installer-service[1127]:   Geometry - start................: 2048 
[   57.433644] coreos-installer-service[1127]:   File system block size..........: 4096 
[   57.433676] coreos-installer-service[1127]:   Physical block size.............: 4096 
[   57.433711] coreos-installer-service[1127]:   Device size in physical blocks..: 98304 
[   57.433745] coreos-installer-service[1127]: Building bootmap in '/tmp/coreos-installer-xFJooy' 
[   57.433779] coreos-installer-service[1127]: Adding IPL section 
[   57.433813] coreos-installer-service[1127]:   initial ramdisk...: /tmp/coreos-installer-xFJooy/ostree/rhcos-475697892c8c54e5ee323999298fea919e71e0c2e8824af7c6fab9e16b6bae14/initramfs-4.18.0-240.10.1.el8_3.s390x.img
[   57.433843] coreos-installer-service[1127]:   kernel image......: /tmp/coreos-installer-xFJooy/ostree/rhcos-475697892c8c54e5ee323999298fea919e71e0c2e8824af7c6fab9e16b6bae14/vmlinuz-4.18.0-240.10.1.el8_3.s390x
[   57.433875] coreos-installer-service[1127]:   kernel parmline...: 'random.trust_cpu=on ignition.platform.id=metal ostree=/ostree/boot.1/rhcos/475697892c8c54e5ee323999298fea919e71e0c2e8824af7c6fab9e16b6bae14/0 rd.znet=qeth,0.0.bdf0,0.0.bdf1,0.0.bdf2,layer2=1,portno=0 zfcp.allow_lun_scan=0 cio_ignore=all,!condev rd.zfcp=0.0.1985,0x500507605e819cc2,0x0001000000000000 ignition.firstboot rd.neednet=1 ip=172.18.142.4::172.18.0.1:255.254.0.0:coreos:encbdf0:off nameserver=172.18.0.1 '
[   57.433912] coreos-installer-service[1127]:   component address: 
[   57.433944] coreos-installer-service[1127]:     heap area.......: 0x00002000-0x00005fff 
[   57.433975] coreos-installer-service[1127]:     stack area......: 0x0000f000-0x0000ffff 
[   57.434012] coreos-installer-service[1127]:     internal loader.: 0x0000a000-0x0000dfff 
[   57.434043] coreos-installer-service[1127]:     parameters......: 0x00009000-0x00009fff 
[   57.434073] coreos-installer-service[1127]:     kernel image....: 0x00010000-0x00687fff 
[   57.434104] coreos-installer-service[1127]:     parmline........: 0x00689000-0x00689fff 
[   57.434136] coreos-installer-service[1127]:     initial ramdisk.: 0x00690000-0x04549fff 
[   57.434166] coreos-installer-service[1127]: Preparing boot device: sda. 
[   57.434196] coreos-installer-service[1127]: Detected SCSI PCBIOS disk layout. 
[   57.434229] coreos-installer-service[1127]: Writing SCSI master boot record. 
[   57.434261] coreos-installer-service[1127]: Syncing disks... 
[   57.434294] coreos-installer-service[1127]: Done. 
[   57.434398] coreos-installer-service[1127]: Updating re-IPL device 
[   57.435428] coreos-installer-service[1127]: Re-IPL type: fcp 
[   57.435460] coreos-installer-service[1127]: WWPN:        0x500507605e819cc2 
[   57.435494] coreos-installer-service[1127]: LUN:         0x0001000000000000 
[   57.435525] coreos-installer-service[1127]: Device:      0.0.1985 
[   57.435558] coreos-installer-service[1127]: bootprog:    0 
[   57.435589] coreos-installer-service[1127]: br_lba:      0 
[   57.435620] coreos-installer-service[1127]: Loadparm:    "" 
[   57.435652] coreos-installer-service[1127]: Bootparms:   "" 
[   57.480009] coreos-installer-service[1127]: Install complete. 
00: Storage cleared - system reset.
00: HCPLDI2816I Acquiring the machine loader from the processor controller.
00: HCPLDI2817I Load completed from the processor controller. 
00: HCPLDI2817I Now starting the machine loader.
01: HCPGSP2630I The virtual machine is placed in CP mode due to a SIGP stop and store status from CPU 00.
00: MLOEVL012I: Machine loader up and running (version v2.4.7).
00: MLOLOA013E: Invalid Program Table, wrong or missing magic number.
00: MLOLOA041E: Unable to find the component table pointer.
00: MLOEVL010E: IPL failed.
00: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000 0002887A


The same image works on DASD (4k); continuing to debug.
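
Note on the zipl output above: it reports a geometry start of 2048 together with a physical block size of 4096. This thread does not spell out the root cause beyond the title of the upstream commit quoted further down ("zipl: fix reading 4k disk's geometry"), so the Python sketch below is only an illustration of why the unit of such an offset matters on a 4k disk. It does not describe the actual fix. The two readings of the start value and the helper name are assumptions made for the example.

```python
SECTOR_SIZE = 512  # bytes; the traditional Linux "sector" unit

# Values copied from the "Target device information" block in the log above.
GEOMETRY_START = 2048
PHYSICAL_BLOCK_SIZE = 4096
DEVICE_SIZE_BLOCKS = 98304


def sectors_to_blocks(sectors: int, block_size: int) -> int:
    """Convert a count of 512-byte sectors into block_size-byte blocks."""
    ratio = block_size // SECTOR_SIZE
    if sectors % ratio:
        raise ValueError("offset is not aligned to the block size")
    return sectors // ratio


# Reading 1: the start offset is already counted in 4096-byte blocks.
offset_if_blocks = GEOMETRY_START * PHYSICAL_BLOCK_SIZE
# Reading 2 (ASSUMPTION, for illustration): it is counted in 512-byte sectors.
offset_if_sectors = GEOMETRY_START * SECTOR_SIZE

print(offset_if_blocks // 2**20, "MiB vs", offset_if_sectors // 2**20, "MiB")
print("start in 4k blocks under reading 2:",
      sectors_to_blocks(GEOMETRY_START, PHYSICAL_BLOCK_SIZE))
print("reported device size:",
      DEVICE_SIZE_BLOCKS * PHYSICAL_BLOCK_SIZE // 2**20, "MiB")
```

The same number points at byte offsets that differ by a factor of eight depending on the unit, so a boot record written with one interpretation would not be found by a loader using the other.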

--- Additional comment from Nikita Dubrovskii (IBM) on 2021-01-26 15:33:16 UTC ---

Short update:
Wrote a small patcher tool (~200 lines of C code) and was able to boot the system. Now looking for the root cause of the data corruption.

--- Additional comment from Nikita Dubrovskii (IBM) on 2021-01-27 15:46:48 UTC ---

Fix is ready - https://github.com/ibm-s390-linux/s390-tools/pull/107

--- Additional comment from Holger Wolf on 2021-01-27 18:37:20 UTC ---

@hannsj_uhl.com please mirror this to IBM - this is an s390 patch that needs to be integrated into RHEL 8

--- Additional comment from Micah Abbott on 2021-01-29 14:25:23 UTC ---

Discussed this with the multi-arch team and decided that this was not a blocker and would be targeted for 4.8

In summary, this problem has been present for some time, and even RHEL is affected by it. Customers won't be able to do similar operations in RHEL or RHCOS until this fix is packaged and shipped as part of RHEL. The fix is targeting 8.4, with a possibility to backport it to 8.3.

--- Additional comment from Micah Abbott on 2021-02-07 20:31:56 UTC ---

Waiting on upstream patch to be merged and new RPM to be released

--- Additional comment from IBM Bug Proxy on 2021-02-15 08:00:40 UTC ---

------- Comment From stefan.haberland.com 2021-02-12 04:16 EDT-------
Fix is reviewed and tested and will be pulled.

--- Additional comment from Hendrik Brueckner on 2021-02-18 15:32:44 UTC ---

@Dan: This is about the zipl issue that needs to go into RHEL as well (https://github.com/ibm-s390-linux/s390-tools/pull/107). I am not sure whether you would like to track this for RHEL separately. However, I think that we also need a z-stream for the s390-tools in rhel-8.4.z to be picked up by RHCOS. Let me know how best to approach this from your side. Thanks.

--- Additional comment from Dan Horák on 2021-02-18 15:43:07 UTC ---

I believe we will need a RHEL clone for this item. There is still a chance to squeeze it into 8.4 GA as an exception. If not approved, then 8.5 + 8.4.0.z.

--- Additional comment from IBM Bug Proxy on 2021-02-19 13:00:28 UTC ---

------- Comment From Jan.Hoeppner.com 2021-02-19 07:54 EDT-------
The fix is now available upstream:
https://github.com/ibm-s390-linux/s390-tools/commit/4a3957fab5696cc410c5b495956859a424e3552a ("zipl: fix reading 4k disk's geometry")

Comment 1 Thomas Staudt 2021-02-26 08:24:41 UTC
Please also include this for RHEL 8.4 - see
https://bugzilla.redhat.com/show_bug.cgi?id=1918723#c13
Thanks.

Comment 4 IBM Bug Proxy 2021-03-04 08:50:31 UTC
------- Comment From tstaudt.com 2021-03-04 03:38 EDT-------
(In reply to comment #7)
> scratch build:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=35237094
> yum repo:
> http://brew-task-repos.usersys.redhat.com/repos/scratch/dhorak/s390utils/2.15.1/4.el8.rhbz1933235/

If feedback from IBM is required, please provide an external link.

Comment 5 Dan Horák 2021-03-04 12:17:03 UTC
test rpms http://people.redhat.com/~dhorak/s390/.rhbz1933235/ - they contain just this single fix on top of the existing 8.4 package

Comment 6 Alexander Klein 2021-03-08 08:39:58 UTC
Tested it with those rpms and it's working.

Comment 7 Vilém Maršík 2021-03-09 11:23:16 UTC
Michael, can you take this copy as well?

Comment 15 IBM Bug Proxy 2021-03-24 08:51:41 UTC
------- Comment From hannsj_uhl.com 2021-03-24 04:43 EDT-------
fyi ... this bugzilla is verified with the RHEL8.4 Nightly build from 03/22/2021 ... thanks ...

Comment 16 Micah Abbott 2021-03-24 14:32:41 UTC
Moving to VERIFIED based on testing by IBM in comment #15

Comment 18 errata-xmlrpc 2021-05-18 14:55:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (s390utils bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1617

