Bug 1977236
Summary: | XFS with bigtime=1 inobtcount=1 enabled prevents booting of some ppc64le p9 with older firmware | |
---|---|---|---
Product: | Red Hat Enterprise Linux 9 | Reporter: | Petr Janda <pjanda>
Component: | xfsprogs | Assignee: | Eric Sandeen <esandeen>
Status: | CLOSED NOTABUG | QA Contact: | Zorro Lang <zlang>
Severity: | unspecified | Docs Contact: | Michal Stubna <mstubna>
Priority: | unspecified | |
Version: | 9.0 | CC: | abitaraf, bgoncalv, borgan, bugproxy, dhorak, dwysocha, efuller, esandeen, fweimer, gduarte, igkioka, jaredz, jbastian, jomiller, jpazdziora, jwboyer, mhofmann, mstubna, msuchane, pzatko, sbarcomb, swhiteho, vslavik, xzhou, zlang
Target Milestone: | beta | |
Target Release: | 9.0 Beta | |
Hardware: | ppc64le | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | No Doc Update
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-11-03 15:20:50 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1898842, 1971841 | |
Attachments: | | |
Description
Petr Janda
2021-06-29 09:32:01 UTC
Copying my reply from a related email thread:

Petitboot is just an application running in a Linux-based mini-distro; for a new kernel you need a whole new firmware build. The kernel in the mini-distro (skiroot) must support the filesystem where the /boot content is stored, at least read-only. That might be doable for some OpenPOWER systems (if the firmware has an upstream on GitHub), but difficult for others.

My first workaround would be to limit the RTT testing to virtualized systems (KVM or LPAR/PowerVM), as they boot via grub2. A second workaround is to let anaconda format /boot as e.g. ext4 (or an older xfs or ...) that would be understood by skiroot's kernel. The proper solution is to update the firmware ...

So, this isn't really an xfsprogs bug per se, it's just that Petitboot doesn't understand the latest xfs features. Kernel v5.10 understands these new features, and it was released in Dec 2020. Two paths forward: upgrade the kernel in Petitboot to 5.10 or newer, or teach Anaconda to make the /boot filesystem on this architecture with these features disabled, for now.

Unfortunately I don't think there is any way for mkfs.xfs to detect that it's formatting boot for these systems to auto-tune itself...

-Eric

*** Bug 1977193 has been marked as a duplicate of this bug. ***

------- Comment From chavez.com 2021-06-29 10:56 EDT-------
Hi Steve,

Can you clarify what you mean by Petitboot hangs? Are you saying it neither attempts to boot the newly installed RHEL 9 OS nor lets you select any of the options from the menu, or something else?

RHBZ bug 1977236 mentions the issue with that hang is that Petitboot can't mount a /boot filesystem with this XFS feature (XFS_SB_FEAT_INCOMPAT_BIGTIME) enabled. It gives a couple of choices. One is to use something like ext4 for /boot or update the OPAL firmware.
I'll add someone from the OPAL team to comment on whether newer versions of Petitboot have an xfsprogs able to mount an XFS filesystem with the newer features. Lastly, you may want to follow the instructions in 1977236 and see if attempting to manually mount the XFS filesystem does fail for you the same way with:

------- Comment From chavez.com 2021-06-29 11:00 EDT-------
BTW, can you provide the level of firmware that is installed on that P9 box? Thanks.

------- Comment From cdeadmin.com 2021-06-29 11:03 EDT-------
cde00 (cdeadmin.com) added native attachment /tmp/AIXOS13394811/console on 2021-06-29 10:02:57

(In reply to IBM Bug Proxy from comment #7)
> ------- Comment From chavez.com 2021-06-29 10:56 EDT-------
> Hi Steve,
>
> Can you clarify what you mean by Petitboot hangs? Are you saying it neither
> attempts to boot the newly installed RHEL 9 OS or any of the options from
> the menu are selectable or something else?
>
> RHBZ bug 1977236 mentions the issue with that hang is that Petitboot can't
> mount a /boot filesystem with this XFS feature
> (XFS_SB_FEAT_INCOMPAT_BIGTIME) enabled. It gives a couple of choices. One is
> to use something like ext4 for /boot or update the OPAL firmware.

Please don't switch to ext4 for this. If we have to special-case the /boot mkfs anyway, it would be preferable to just disable these features in XFS at mkfs time.

> I'll add someone from the OPAL team to comment on whether newer versions of
> Petitboot have an xfsprogs able to mount an XFS filesystem with the newer
> features.

To be clear - the issue is not xfsprogs, which is not involved in mounting the filesystem. A kernel version 5.10 or newer is needed to mount an XFS filesystem with these features.
Thanks,
-Eric

If I see right, then op-build is "stuck" at kernel 5.4.x :-(
op-build is the build system for the OpenPOWER firmware: https://github.com/open-power/op-build

(In reply to Eric Sandeen from comment #5)
> So, this isn't really an xfsprogs bug per se, it's just that Petitboot
> doesn't understand the latest xfs features. Kernel v5.10 understands these
> new features, and it was released in Dec 2020. Two paths forward: upgrade
> the kernel in Petitboot to 5.10 or newer, or teach Anaconda to make the
> /boot filesystem on this architecture with these features disabled, for now.
>
> Unfortunately I don't think there is any way for mkfs.xfs to detect that
> it's formatting boot for these systems to auto-tune itself...
>
> -Eric

IBM, can you share your plans for Petitboot in this area? I assume that Petitboot has plans to get updated to support xfs for this?

Thanks,
-Steve

Created attachment 1800084 [details]
console log
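The suggestion above, to special-case the /boot mkfs and disable the two features rather than switch to ext4, could be sketched roughly as follows. This is only an illustration, not what Anaconda does: the function name `boot_mkfs_args` and the device path are hypothetical, and the sketch assumes an xfsprogs version whose `mkfs.xfs` accepts the `-m bigtime=` and `-m inobtcount=` suboptions.

```python
from typing import List


def boot_mkfs_args(device: str, firmware_kernel_ok: bool) -> List[str]:
    """Build an mkfs.xfs command line for a /boot partition.

    firmware_kernel_ok should be True when the firmware (Petitboot) kernel
    is 5.10 or newer and can therefore mount bigtime/inobtcount filesystems.
    The function name and calling convention are hypothetical.
    """
    args = ["mkfs.xfs"]
    if not firmware_kernel_ok:
        # Turn off the two features that kernels older than 5.10 refuse to mount.
        args += ["-m", "bigtime=0,inobtcount=0"]
    args.append(device)
    return args


if __name__ == "__main__":
    # Print the command instead of running it; formatting a device is destructive.
    print(" ".join(boot_mkfs_args("/dev/sdX2", firmware_kernel_ok=False)))
```

Returning the argument list instead of running the command keeps the sketch safe to try and lets the caller decide how to execute it.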
Steve, is there any way we can tie minimum petitboot/firmware requirements to a RHEL9 install? That's really the only sane path forward, IMHO.

(In reply to Eric Sandeen from comment #13)
> Steve, is there any way we can tie minimum petitboot/firmware requirements
> to a RHEL9 install? That's really the only sane path forward, IMHO.

Eric, I'm still waiting on IBM to answer my question of when and if Petitboot will be updated for this. Without an answer I'm not sure we know what our path forward could be.

-Steve

I just wondered if there is any way for the OS (i.e. Anaconda) to detect the petitboot version prior to install...

There is "lsmcode", which reports the firmware details. Below is the output from my Talos:

    [dan@talos libica]$ sudo lsmcode
    Version of System Firmware :
    Product Name : OpenPOWER Firmware
    Product Version : talos-v1.20-161-g76f78f4
    Product Extra : skiboot-bc106a0
    Product Extra : bmc-firmware-version-2.00
    Product Extra : occ-a8d0767
    Product Extra : hostboot-884b60b
    Product Extra : buildroot-2017.11.2-8-g4b6188e0f2
    Product Extra : machine-xml-221192a
    Product Extra : sbe-a389a5d
    Product Extra : petitboot-v1.7.1-p836d356
    Product Extra : linux-v4.15.9-openpower1-p9e03417

and from our team's Boston:

    [root@ibm-p9b-generic-01 ~]# lsmcode
    Version of System Firmware :
    Product Name : OpenPOWER Firmware
    Product Version : SUPERMICRO-P9DSU-V2.14-20190807-prod
    Product Extra : skiboot-v6.0.20
    Product Extra : bmc-firmware-version-2.13
    Product Extra : occ-8fa3854
    Product Extra : hostboot-8591ded-p4f715ce
    Product Extra : buildroot-2018.11.3-12-g222837a
    Product Extra : capp-ucode-p9-dd2-v4
    Product Extra : machine-xml-734a35e
    Product Extra : hostboot-binaries-hw072719a.op920
    Product Extra : sbe-b6ee17b
    Product Extra : hcode-hw072719a.op920
    Product Extra : petitboot-v1.7.5-p11ed908
    Product Extra : linux-4.19.57-openpower1-p48ee860

------- Comment From chavez.com 2021-07-19 14:54 EDT-------
*** Bug 193695 has been marked as a duplicate of this bug. ***

------- Comment From chavez.com 2021-07-19 15:00 EDT-------
Hi Ryan,

We are seeing additional reports of this problem now. Are there any plans to update OPAL's kernel version this year? If not, Red Hat may have to include special checks in RHEL 9 to determine OPAL's kernel version and, if it is older than 5.10, avoid passing mkfs.xfs the bigtime=1 option that prevents OPAL from mounting the xfs filesystem.

Created attachment 1803503 [details]
console logs with call traces
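As an illustration of the "special checks" idea, the `lsmcode` listings pasted earlier in this bug could be scanned for the firmware kernel entry. The sketch below assumes the output has one `key : value` field per line, as in those listings; the function names are hypothetical and this is not an existing Anaconda or `lsmcode` API.

```python
from typing import List, Optional


def product_extras(lsmcode_output: str) -> List[str]:
    """Return the values of all 'Product Extra' fields, in order."""
    extras = []
    for line in lsmcode_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Product Extra":
            extras.append(value.strip())
    return extras


def firmware_kernel_entry(lsmcode_output: str) -> Optional[str]:
    """Return the 'linux-...' component string, or None if it is absent."""
    for extra in product_extras(lsmcode_output):
        if extra.startswith("linux-"):
            return extra
    return None


# Shortened copy of the Talos listing from this bug.
SAMPLE = """\
Version of System Firmware :
 Product Name     : OpenPOWER Firmware
 Product Version  : talos-v1.20-161-g76f78f4
 Product Extra    : skiboot-bc106a0
 Product Extra    : petitboot-v1.7.1-p836d356
 Product Extra    : linux-v4.15.9-openpower1-p9e03417
"""

if __name__ == "__main__":
    print(firmware_kernel_entry(SAMPLE))
```

Note that the two machines above format the entry differently (`linux-v4.15.9...` versus `linux-4.19.57...`), so any consumer of this string has to tolerate the optional `v`.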
------- Comment on attachment From preeti.thakur.com 2021-07-20 01:50 EDT-------
Hi,
here is an update from my side
I was able to boot the system when it was installed with ext4, but call traces are seen during boot.
attaching console logs
------- Comment From cdeadmin.com 2021-07-20 01:52 EDT-------
cde00 (cdeadmin.com) added native attachment /tmp/AIXOS13394811/wcwsp3.txt on 2021-07-20 00:52:45

------- Comment From hegdevasant.com 2021-07-21 08:41 EDT-------
(In reply to comment #12)
> If I see right, then op-build is "stuck" at kernel 5.4.x :-(

We are rebasing the `op-build` kernel via https://github.com/open-power/op-build/pull/4214. It will move to 5.10 soon.

Regarding official firmware, I need to check with our release management team.

-Vasant

(In reply to IBM Bug Proxy from comment #21)
> ------- Comment From hegdevasant.com 2021-07-21 08:41 EDT-------
> (In reply to comment #12)
> > If I see right, then op-build is "stuck" at kernel 5.4.x :-(
>
> We are rebase `op-build` kernel via
> https://github.com/open-power/op-build/pull/4214. IT will move to 5.10 soon.
>
> Regarding official firmware, I need to check with our release management
> team.
>
> -Vasant

When will the firmware be released? Without a fix, customers won't be able to install RHEL 9.0. We need a plan from IBM to fix this issue.

-Steve

------- Comment From chavez.com 2021-07-28 14:14 EDT-------
(In reply to comment #25)
> when will the firmware be released? without a fix .. customers won't be able
> to install RHEL 9.0. we need a plan from IBM to fix this issue.
> -Steve

Hi Steve,

Vasant has been actively discussing this issue almost daily with architects and hardware product owners. Hopefully, he'll have an update for y'all soon.

*** Bug 1985565 has been marked as a duplicate of this bug. ***

------- Comment From preeti.thakur.com 2021-08-11 07:03 EDT-------
Updated with new fw:

    [root@ltc-wcwsp3 ~]# lsmcode
    Product Name : OpenPOWER Firmware
    Product Version : witherspoon-OP9-v2.6-9.93
    Product Extra : skiboot-v6.8-45-g8246de863
    Product Extra : bmc-firmware-version-0.00
    Product Extra : occ-16131c3
    Product Extra : hostboot-9e73780
    Product Extra : buildroot-2021.02.3-2-g2c7a998
    Product Extra : capp-ucode-p9-dd2-v4
    Product Extra : machine-xml-0f9b366
    Product Extra : hostboot-binaries-hw080421a.opmst10
    Product Extra : sbe-8b47418
    Product Extra : hcode-hw080421a.opmst
    Product Extra : petitboot-v1.12
    Product Extra : linux-5.10.50-openpower1-p59fd803

With this firmware we are able to detect /boot with an xfs file system, though the call traces are still seen.

The above has been verified with the 0626 build, as we were not able to install the system with the 0725 or beta build, for which we will be raising a new defect.

Created attachment 1813129 [details]
call traces
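The `linux-...` entries in the firmware listings in this bug (`linux-v4.15.9-openpower1-p9e03417`, `linux-4.19.57-openpower1-p48ee860`, `linux-5.10.50-openpower1-p59fd803`) can be compared against the 5.10 minimum stated earlier in the thread. A minimal sketch, assuming the entry always starts with `linux-` followed by an optional `v` and a dotted version; the helper names are hypothetical:

```python
import re
from typing import Optional, Tuple

# Minimum kernel that can mount XFS with bigtime/inobtcount, per this bug.
MIN_KERNEL = (5, 10)


def parse_kernel_version(entry: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a string such as 'linux-v4.15.9-openpower1'."""
    match = re.match(r"linux-v?(\d+)\.(\d+)", entry)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supports_new_xfs_features(entry: str) -> Optional[bool]:
    """True/False when the version is known, None when it cannot be parsed."""
    version = parse_kernel_version(entry)
    if version is None:
        return None
    # Tuple comparison orders by major first, then minor.
    return version >= MIN_KERNEL


if __name__ == "__main__":
    for entry in ("linux-v4.15.9-openpower1-p9e03417",
                  "linux-4.19.57-openpower1-p48ee860",
                  "linux-5.10.50-openpower1-p59fd803"):
        print(entry, supports_new_xfs_features(entry))
```

Comparing integer tuples avoids the string-comparison trap where "5.4" would sort after "5.10".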
------- Comment From cdeadmin.com 2021-08-11 09:23 EDT-------
cde00 (cdeadmin.com) added native attachment /tmp/AIXOS13394811/wcwsp3_Calltraces.txt on 2021-08-11 08:22:57

------- Comment From preeti.thakur.com 2021-08-11 10:08 EDT-------
Raised a defect for the issue mentioned in comment 30:
https://bugzilla.linux.ibm.com/show_bug.cgi?id=193955

Your call traces are likely the same problem as reported in:

https://bugzilla.kernel.org/show_bug.cgi?id=210749

As for the original XFS issue, we can close this now, yes? The original issue was not a RHEL bug per se, but a firmware compatibility question... this issue should, however, be documented (or even programmatically tested at install time), I think.

------- Comment From kalshett.com 2021-08-12 09:46 EDT-------
(In reply to comment #28)
> Hi Preeti, Kalpana,
> Can you please flash below FW on witherspoon and then try to install RHEL 9
> and see if it works fine or not?
> https://github.com/open-power/op-build/releases/download/v2.7/witherspoon.pnor

@Preeti: The link above is for the v2.7 pnor, but from your testing I see the version shown as v2.6 (witherspoon-OP9-v2.6-9.93).

Can you please confirm whether you tested the pnor posted by Vasanth?
https://github.com/open-power/op-build/releases/download/v2.7/witherspoon.pnor

Also, IMO, it is always better to first recreate the original issue that is reported on this defect, then apply the pnor suggested by Vasanth and check that the originally reported issue (this defect) is no longer seen.

------- Comment From kalshett.com 2021-08-12 10:25 EDT-------
(In reply to comment #28)
> Hi Preeti, Kalpana,
> Can you please flash below FW on witherspoon and then try to install RHEL 9
> and see if it works fine or not?
> https://github.com/open-power/op-build/releases/download/v2.7/witherspoon.pnor
> -Vasant

Vasanth: From IBM internal builds we can get v2.6 from the link below:
https://rchweb.rchland.ibm.com/afs/rchland.ibm.com/projects/esw/op999/Builds/999.2132.20210810n/images/lab/witherspoon/

So how do we apply the witherspoon.pnor alone from the git link that you have posted? I.e., https://github.com/open-power/op-build/releases/tag/v2.7

------- Comment From preeti.thakur.com 2021-08-12 10:36 EDT-------
In reply to comment 35: I was in sync with Vasant and he confirmed to go ahead. Will retest in case of any discrepancy.

------- Comment From preeti.thakur.com 2021-08-12 10:43 EDT-------
The tar below was provided for the update and was used:
https://rchweb.rchland.ibm.com/afs/rchland.ibm.com/projects/esw/op999/Builds/999.2132.20210810n/images/lab/witherspoon/witherspoon.pnor.squashfs.tar

------- Comment From hegdevasant.com 2021-08-16 01:41 EDT-------
Ok.. the test was done using upstream op-build v2.7, where we have rebased the petitboot kernel to 5.10.x. This was to double check whether the 5.10 kernel works fine or not.

This is *not* officially released firmware. We are working with IBM program management to fix the official firmware, which will take some time.

In the meantime, if you want to test, you can use the above-mentioned upstream firmware (or just rebase the petitboot kernel to 5.10.x).

-Vasant

------- Comment From preeti.thakur.com 2021-08-16 02:08 EDT-------
Thanks a lot, Vasant, for the update.
Since the fix is already applied on the said machine (i.e. wcwsp3) and it's working, we can continue to test here.
For other systems we can apply the fix provided and continue with our testing.
We can keep this defect open until the firmware is officially released.

Thanks

------- Comment From hegdevasant.com 2021-08-17 04:04 EDT-------
(In reply to comment #34)
> Your call traces are likely the same problem as reported in:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=210749

I guess this is different from the call traces/kernel PANIC hit during installation.
In the above defect we hit call traces because of a duplicate sysfs node name, but the system continues to boot. In the case of LTC bz 193955 (which is not yet mirrored to Red Hat Bugzilla), we hit a PANIC during installation.

-Vasant

------- Comment From gusld.com 2021-08-17 14:40 EDT-------
(In reply to comment #39)
> Ok.. test was done using upstream op-build v2.7 ... where we have rebased
> petitboot kernel to 5.10.x .. This was to double check whether 5.10 kernel
> works fine or not.
>
> This is *not* official released firmware. We are working with IBM program
> management to fix official firmware .. which will take some time.

Thanks Vasant!

Red Hat,

Even though we are working to get a firmware fix released, we should consider that customers may have P9 systems with old firmware... maybe we should still have a check done during the RHEL9.0 installation to detect old firmware and, in this case, either enable the workaround or abort the installation with an instructive message so that customers know what to do.

(In reply to IBM Bug Proxy from comment #39)
> Red Hat,
>
> Even though we are working to get a firmware fix released, we should
> consider that customers may have P9 systems with old firmware... maybe we
> should still have a check done during the RHEL9.0 installation to detect old
> firmware and, in this case, either enable the workaround or abort the
> installation with an instructive message so that customers know what to do.

I agree. Can you file a bug against Anaconda for this issue, and include detailed steps for how the installer can detect the installed firmware version, and which version it should look for?

Thanks,
-Eric

------- Comment From hegdevasant.com 2021-08-19 05:59 EDT-------
Hello Red Hat,

By any chance, do you know which xfs patches caused this regression? Any idea how easy or difficult it would be to backport them to a 5.3/5.4 kernel?

If it's something fairly easy, then maybe it's worth backporting those patches instead of rebasing.
That's the other option I'm thinking of.

-Vasant

(In reply to IBM Bug Proxy from comment #41)
> ------- Comment From hegdevasant.com 2021-08-19 05:59 EDT-------
> Hello RedHat,
>
> By any chance do you know the xfs patches that caused this regression? Any
> idea how easy/difficult to backport them to 5.3/5.4 kernel?
>
> If its something fairly easy then may be its worth to backport those patches
> instead of rebasing. That's the other option I'm thinking of.
>
> -Vasant

The timestamp series was 25-30 patches, and the inode btree counter was about 10 patches. There may also be some dependencies; I'd have to look more closely. It's not something we would probably consider backporting out of sequence ourselves, just for what it's worth.

Does the firmware need to mount the host filesystem in read-write, or in read-only mode? The inode btree counters are at least RO compatible.

Does the firmware care about timestamps on this filesystem at all? If all it needs to do is find boot image blocks, it may not care, and we could do something more targeted to just ignore the feature if timestamps don't matter. If you need to write to it, or check timestamps, that won't be an option.

We still need to know how to check for the firmware version, so that the installer can at least warn the user if their current firmware is known to be incompatible. Can you please provide that info as well?

Thanks,
-Eric

The firmware is a Linux (mini-)distro based on the buildroot project, giving you access to petitboot (a kexec bootloader as a userspace app), a shell and other tools. I'm not sure about the need for RO or RW access.

The kernel version is stored in the device tree in /proc/device-tree/ibm,firmware-versions/linux

------- Comment From hegdevasant.com 2021-08-20 03:30 EDT-------
(In reply to comment #46)
> The firmware is a Linux (mini-)distro based on the buildroot project giving
> you access to petitboot (kexec bootloader as a userspace app), a shell and
> other tools.
> I'm not sure about the need of RO or RW access.
> The kernel version is stored in the device-tree in
> /proc/device-tree/ibm,firmware-versions/linux

AFAIK we mount the boot partition RO only and parse grub.cfg to detect all installed kernels.

-Vasant

------- Comment From gusld.com 2021-08-24 10:56 EDT-------
(In reply to comment #46)
> The firmware is a Linux (mini-)distro based on the buildroot project giving
> you access to petitboot (kexec bootloader as a userspace app), a shell and
> other tools. I'm not sure about the need of RO or RW access.
> The kernel version is stored in the device-tree in
> /proc/device-tree/ibm,firmware-versions/linux

Vasant,

Is this the standard/recommended way to query the firmware version? This doesn't seem to be present, at least on my ZZ bare metal.

Another way could be running "lsmcode", but it's reading the device tree if I see right ...

IIRC the "ibm,firmware-versions" entry was only added to skiboot at some point; it wasn't always there. Looking at the git history, it came with skiboot 5.9 (https://github.com/open-power/skiboot/blob/master/doc/release-notes/skiboot-5.9.rst)

It's documented in https://github.com/open-power/skiboot/blob/master/doc/device-tree/ibm%2Cfirmware-versions.rst

I've filed bug #1997832 against Anaconda - we will need to detect the petitboot version in any case. Whether we recommend an upgrade or implement some sort of workaround is yet to be determined.

As this is not really an XFS bug per se, we may eventually close this bug in favor of the Anaconda bug, which is the only place a remedy can really occur.

Thanks,
-Eric

We now have a firmware build for P9 Witherspoon systems with petitboot kernel 5.10... it is available for Red Hat at:

http://rhel8gduarteibm.usersys.redhat.com/fw/OP940_2138A-prod/

We should update a Witherspoon system and verify that this firmware update fixes the issue.
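For reference, reading the device-tree node mentioned above (`/proc/device-tree/ibm,firmware-versions/linux`) could look roughly like this. As noted in the thread, the node only exists on firmware with skiboot 5.9 or later and was reported missing on at least one bare-metal system, so the sketch treats absence as "unknown". The function name is hypothetical, and the exact format of the stored string is not assumed here; the value is returned as-is.

```python
from pathlib import Path
from typing import Optional

# Path quoted in this bug; present only when skiboot exports ibm,firmware-versions.
FIRMWARE_KERNEL_NODE = Path("/proc/device-tree/ibm,firmware-versions/linux")


def read_firmware_kernel(node: Path = FIRMWARE_KERNEL_NODE) -> Optional[str]:
    """Return the firmware kernel version string, or None if unavailable."""
    try:
        raw = node.read_bytes()
    except OSError:
        # Node missing (older firmware, non-OpenPOWER system) or unreadable.
        return None
    # Device-tree string properties are NUL-terminated; keep the part before it.
    text = raw.split(b"\0", 1)[0].decode("ascii", errors="replace").strip()
    return text or None


if __name__ == "__main__":
    print(read_firmware_kernel())
```

Returning `None` instead of raising lets an installer distinguish "firmware too old" from "cannot tell" and choose a warning accordingly.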
(In reply to Gustavo Luiz Duarte (IBM) from comment #54)
> We now have a firmware build for P9 Witherspoon systems with petitboot
> kernel 5.10... it is available for Red Hat at:
>
> http://rhel8gduarteibm.usersys.redhat.com/fw/OP940_2138A-prod/
>
> We should update a Witherspoon system and verify that this firmware update
> fixes the issue.

I should have mentioned that this is a production signed firmware image and should be installed on a system which currently has a production signed firmware... there are instructions on the above URL on how to check the currently installed firmware image using the CheckSecurityLevel.sh script (available at https://docs.engineering.redhat.com/display/KE/PowerPC#PowerPC-Signed/Unsignedfirmware) and how to proceed with the firmware upgrade.

If a development signed (aka. unsigned, aka. imprint) image is required, please let me know.

------- Comment From chavez.com 2021-09-29 15:23 EDT-------
*** Bug 193955 has been marked as a duplicate of this bug. ***

Thank you, Gustavo. I am updating ibm-p9wr-03.ibm2 with this new firmware and will report back shortly.

I can confirm that with the firmware Gustavo provided we are able to install RHEL9 on Witherspoon systems that have production firmware.

I'm going to close this as NOTABUG, for xfs/xfsprogs in any case. The issue is on the petitboot firmware side. The Anaconda bug should have implemented changes to handle this gracefully and notify the user:

https://bugzilla.redhat.com/show_bug.cgi?id=1997832

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days
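To summarize the install-time behavior discussed in this bug (virtualized systems boot via grub2 and are unaffected; bare-metal Petitboot systems need a firmware kernel of 5.10 or newer), the decision could be sketched as below. This is a hypothetical policy for illustration only, not the change made under the Anaconda bug; all names are invented here.

```python
from enum import Enum
from typing import Optional, Tuple

# Minimum firmware kernel able to mount XFS with bigtime/inobtcount, per this bug.
MIN_FIRMWARE_KERNEL = (5, 10)


class BootFsAction(Enum):
    PROCEED = "proceed"
    WARN_UNKNOWN_FIRMWARE = "warn-unknown-firmware"
    WARN_OLD_FIRMWARE = "warn-old-firmware"


def boot_fs_action(boots_via_petitboot: bool,
                   firmware_kernel: Optional[Tuple[int, int]]) -> BootFsAction:
    """Decide what an installer might do before formatting /boot as XFS."""
    if not boots_via_petitboot:
        # KVM guests and PowerVM LPARs boot via grub2, so the firmware
        # kernel never has to mount /boot.
        return BootFsAction.PROCEED
    if firmware_kernel is None:
        # Version could not be detected; tell the user rather than guess.
        return BootFsAction.WARN_UNKNOWN_FIRMWARE
    if firmware_kernel >= MIN_FIRMWARE_KERNEL:
        return BootFsAction.PROCEED
    return BootFsAction.WARN_OLD_FIRMWARE


if __name__ == "__main__":
    print(boot_fs_action(True, (4, 19)))
```

Whether an old-firmware result should lead to a warning, an aborted install, or disabled XFS features is the policy question the thread leaves to the Anaconda bug.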