Bug 2161152 - Systems do not boot after kernel upgrade to 4.18.0-425.10.1.el8_7.x86_64
Summary: Systems do not boot after kernel upgrade to 4.18.0-425.10.1.el8_7.x86_64
Keywords:
Status: CLOSED DUPLICATE of bug 2160842
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel
Version: 8.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: core-kernel-bot
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-01-16 06:48 UTC by Need Real Name
Modified: 2023-08-08 03:37 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-19 22:23:39 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
Red Hat Issue Tracker: RHELPLAN-145134 (last updated 2023-01-16 06:48:49 UTC)

Description Need Real Name 2023-01-16 06:48:13 UTC
Description of problem:
The systems do not boot after the kernel upgrade from 4.18.0-425.3.1.el8.x86_64 to 4.18.0-425.10.1.el8_7.x86_64.
The systems show:
watchdog: BUG: soft lockup - CPU#7 stuck for 23s!
Timed out waiting for device dev-mapper
Dependency failed for Resume from hibernation using device dev-mapper
watchdog: BUG: soft lockup - CPU#7 stuck for 22s!
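The two watchdog lines above come from the kernel's soft-lockup detector, reporting that CPU 7 was stuck. If the console output, or the kernel log of the failed boot (for example from `journalctl -k -b -1`, assuming the journal for that boot was persisted), has been saved to a file, a minimal Python sketch like the following can pull out each report. The regular expression and function name are assumptions written for this example; the pattern accepts the kernel's usual `CPU#7 stuck for 23s!` form as well as looser hand-transcribed variants.

```python
import re
from typing import List, Tuple

# Matches soft-lockup watchdog reports such as
#   "watchdog: BUG: soft lockup - CPU#7 stuck for 23s!"
# Case-insensitive and tolerant of missing punctuation, so loosely
# transcribed lines like "watchdog bug soft lockup cpu7 stuck for 22s"
# also match. (Pattern written for this sketch, not an official format spec.)
SOFT_LOCKUP_RE = re.compile(
    r"soft\s+lockup\W*cpu#?(\d+)\s+stuck\s+for\s+(\d+)s",
    re.IGNORECASE,
)


def find_soft_lockups(log_text: str) -> List[Tuple[int, int]]:
    """Return a (cpu, seconds_stuck) pair for each soft-lockup line in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = SOFT_LOCKUP_RE.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits
```

Reading a saved log with `open(path).read()` and passing the text to `find_soft_lockups` returns one tuple per report, in the order they appear.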

When the old kernel 4.18.0-425.3.1.el8.x86_64 is selected in the GRUB2 menu, the system boots again.

Version-Release number of selected component (if applicable):
4.18.0-425.10.1.el8_7.x86_64



How reproducible:
We have two Dell 3600 series workstations (with NVMe drives) that are affected by this problem.


Steps to Reproduce:
1. Update the kernel from 4.18.0-425.3.1.el8.x86_64 to 4.18.0-425.10.1.el8_7.x86_64.
2. Reboot.
3. The system shows the errors above.

Actual results:


Expected results:


Additional info:

Comment 1 Rafael Aquini 2023-01-16 20:09:21 UTC
This seems to be the same class of issue as reported at
https://bugzilla.redhat.com/show_bug.cgi?id=2160842

Would you mind setting the system up to collect a vmcore
when such occurrences are observed?

Comment 2 Need Real Name 2023-01-16 21:05:48 UTC
Hi
Could you provide a link to a document on how to collect a vmcore?
The affected systems run at two customer locations, and we will need a step-by-step guide to collect the information.

Regards

Hansjörg

Comment 3 Rafael Aquini 2023-01-16 22:12:02 UTC
(In reply to Need Real Name from comment #2)
> Hi
> Could you provide a link to a document on how to collect a vmcore?
> The affected systems run at two customer locations, and we will need a
> step-by-step guide to collect the information.
> 

You can start here: https://access.redhat.com/solutions/6038

If you don't know how to set up the system to capture a vmcore when it
hangs, it's also advisable to open a case with Red Hat Support and get
help with it. (Bugzilla is not a support channel.)
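The linked solution describes the actual kdump setup. As a rough sanity check once a system is up, the sketch below reports two things: whether memory was reserved for a capture kernel (a `crashkernel=` parameter on the kernel command line) and whether a capture kernel is actually loaded (`/sys/kernel/kexec_crash_loaded` reads `1`). It only describes the boot it runs in. A hang that occurs before the kdump service has loaded the capture kernel cannot be dumped this way, and a soft lockup only produces a dump if the kernel is configured to panic on it (the `kernel.softlockup_panic` sysctl). The function names are illustrative and not part of any Red Hat tool.

```python
from pathlib import Path
from typing import Optional


def crashkernel_param(cmdline: str) -> Optional[str]:
    """Return the value of the crashkernel= boot parameter, or None if absent."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token.split("=", 1)[1]
    return None


def crash_kernel_loaded(flag_text: str) -> bool:
    """Interpret /sys/kernel/kexec_crash_loaded, which reads "1" once a
    crash (capture) kernel has been loaded, e.g. by the kdump service."""
    return flag_text.strip() == "1"


def _read(path: str) -> Optional[str]:
    """Read a small procfs/sysfs file; return None if it is unavailable."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


def kdump_status() -> dict:
    """Summarize kdump readiness of the currently running boot.

    A value of None means the corresponding file could not be read.
    """
    cmdline = _read("/proc/cmdline")
    flag = _read("/sys/kernel/kexec_crash_loaded")
    return {
        "crashkernel": crashkernel_param(cmdline) if cmdline is not None else None,
        "crash_kernel_loaded": crash_kernel_loaded(flag) if flag is not None else None,
    }
```

Calling `kdump_status()` on a prepared system should show a `crashkernel` value and `crash_kernel_loaded` set to `True`; if either is missing, the linked setup steps have not taken effect yet.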

Comment 4 Need Real Name 2023-01-17 05:46:40 UTC
Hi
thanks.
The problem occurs very early in boot (while waiting for device-mapper), when the system tries to resume from hibernation, as you can see from the log.
Therefore, I doubt that kdump would be available at this stage.
Regards

Hansjörg

Comment 5 Need Real Name 2023-01-19 06:40:03 UTC
Hi

I found this

https://elrepo.org/bugs/view.php?id=1316

"RHEL8.7 system with the kernel version 4.18.0-425.3.1.el8.x86_64 fails to boot with soft lockup message"

Any kmod packages that use the affected 'pv_lock_ops' symbol need rebuilding against (bug-free) kernel-4.18.0-425.10.1.el8_7

The affected systems have nvidia.ko installed from ELRepo.
I will test the new nvidia.ko today and let you know if it helps.

Regards

Hansjörg
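The ELRepo note quoted in comment 5 comes down to two questions for each third-party module: does it reference `pv_lock_ops`, and which kernel was it built against? A minimal Python sketch, assuming you have captured the output of `modprobe --dump-modversions <path/to/module.ko>` (one `0x<crc>` and symbol name per line) and of `modinfo -F vermagic <module>`. The helper names are hypothetical. Only symbols recorded via modversions appear in that dump, so treat a negative result as a hint rather than proof.

```python
from typing import Dict, Optional


def parse_modversions(dump: str) -> Dict[str, int]:
    """Parse the text printed by `modprobe --dump-modversions <module.ko>`.

    Each relevant line is expected to be "0x<crc>\t<symbol>"; other lines
    are ignored. Returns a {symbol: crc} mapping.
    """
    symbols: Dict[str, int] = {}
    for line in dump.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("0x"):
            symbols[parts[1]] = int(parts[0], 16)
    return symbols


def uses_symbol(dump: str, symbol: str = "pv_lock_ops") -> bool:
    """True if the module's versioned-symbol list references `symbol`."""
    return symbol in parse_modversions(dump)


def build_kernel(vermagic: str) -> Optional[str]:
    """Return the kernel release a module was built for.

    `modinfo -F vermagic` prints the release first, followed by flags
    such as "SMP mod_unload modversions".
    """
    fields = vermagic.split()
    return fields[0] if fields else None
```

Finding `pv_lock_ops` in the dump for the installed nvidia.ko, together with a `build_kernel` result of 4.18.0-425.3.1.el8.x86_64, would match the situation described in this report.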

Comment 6 Rafael Aquini 2023-01-19 22:23:39 UTC
(In reply to Need Real Name from comment #5)
> Hi
> 
> I found this
> 
> https://elrepo.org/bugs/view.php?id=1316
> 
> "RHEL8.7 system with the kernel version 4.18.0-425.3.1.el8.x86_64 fails to
> boot with soft lockup message"
> 
> Any kmod packages that use the affected 'pv_lock_ops' symbol need rebuilding
> against (bug-free) kernel-4.18.0-425.10.1.el8_7
> 

Yes, that's correct. An unintentional and silent kABI break was introduced in
kernel-4.18.0-425.el8, which made modules built for older releases stop loading
due to the paravirt lock patching. The fix for that kABI break, introduced in
kernel-4.18.0-425.10.1.el8_7, ended up causing the same problem for modules
compiled against all earlier RHEL 8.7 builds. So, in your case, a module that
was compiled for kernel-4.18.0-425.3.1.el8.x86_64 will stop loading after the
kernel is updated to 4.18.0-425.10.1.el8_7.

-- Rafael

*** This bug has been marked as a duplicate of bug 2160842 ***
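Rafael's explanation describes a version window: a module built against an RHEL 8.7 kernel older than 4.18.0-425.10.1 (for example 4.18.0-425.3.1.el8) stops loading once 4.18.0-425.10.1.el8_7 is running. The sketch below encodes that window so it can be checked against a module's `vermagic` release and the output of `uname -r`. It uses a plain numeric comparison of the release prefix as a stand-in for RPM's own version comparison, which is only adequate for release strings shaped like these. Treating 425.x updates later than 425.10.1 as equally affected is an assumption, not something stated in this bug.

```python
import re
from typing import Optional, Tuple

# First release carrying the kABI fix, per comment 6: kernel-4.18.0-425.10.1.el8_7
FIX_RELEASE = (425, 10, 1)


def release_tuple(kernel_release: str) -> Optional[Tuple[int, ...]]:
    """'4.18.0-425.3.1.el8.x86_64' -> (425, 3, 1); None if not a 4.18.0 kernel."""
    match = re.match(r"4\.18\.0-((?:\d+\.)*\d+)", kernel_release)
    if not match:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def affected(build_release: str, running_release: str) -> bool:
    """True if a module built for build_release falls in the window from
    comment 6: built against a 4.18.0-425.* kernel older than 425.10.1,
    while a 425.* kernel at or after 425.10.1 is running (assumption)."""
    built = release_tuple(build_release)
    running = release_tuple(running_release)
    if built is None or running is None:
        return False
    return (
        built[0] == 425 and built < FIX_RELEASE
        and running[0] == 425 and running >= FIX_RELEASE
    )
```

For this report, `affected("4.18.0-425.3.1.el8.x86_64", "4.18.0-425.10.1.el8_7.x86_64")` returns `True`, while a module rebuilt against 4.18.0-425.10.1.el8_7 returns `False`.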

Comment 7 Need Real Name 2023-01-22 14:55:05 UTC
Hi

With the new
nvidia-x11-drv-525.85.05-1.el8_7.elrepo.x86_64 package,
the system boots with 4.18.0-425.10.1.el8_7 again.

Regards

Hansjörg

