Bug 1856283 - hard lockup while kvm tests
Summary: hard lockup while kvm tests
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 32
Hardware: ppc64le
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: PPCTracker
Reported: 2020-07-13 09:57 UTC by Michel Normand
Modified: 2021-05-03 07:49 UTC (History)
CC List: 22 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-03 07:49:43 UTC
Type: Bug
Embargoed:


Attachments
journalctl_abanb_today_20200713.log (789.13 KB, text/plain)
2020-07-13 09:57 UTC, Michel Normand

Description Michel Normand 2020-07-13 09:57:29 UTC
Created attachment 1700805
journalctl_abanb_today_20200713.log

1. Please describe the problem:

My local openQA server runs Fedora 32. It started to fail after the last dnf distro-sync, done on 20200709 (which brought in kernel 5.7.8-200): the first failure was on 20200711, when starting to execute a set of openQA tests (using kvm guests).

The open ssh sessions get closed, and I have to power the host off/on via ipmi to recover.

There were two occurrences with a similar backtrace;
the attached log is associated with the 2nd occurrence.
===
Jul 11 06:27:20 abanb.tlslab.ibm.com kernel: watchdog: CPU 0 detected hard LOCKUP on other CPUs 16
... 
Jul 11 06:59:59 abanb.tlslab.ibm.com worker[4273]: [info] Test schedule has changed, reloading test_order.json
-- Reboot -- <= power off/on via ipmi 
Jul 13 07:15:14 localhost.localdomain kernel: dt-cpu-ftrs: setup for ISA 2070
... 
Jul 13 08:30:20 abanb.tlslab.ibm.com kernel: watchdog: CPU 8 detected hard LOCKUP on other CPUs 32
...
===
watchdog: CPU 8 detected hard LOCKUP on other CPUs 32
watchdog: CPU 8 TB:2372958361977, last SMP heartbeat TB:2364766439129 (15999ms ago)
rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
rcu:         32-...0: (4 ticks this GP) idle=c56/1/0x4000000000000000 softirq=227906/227908 fqs=2203 
        (detected by 48, t=6002 jiffies, g=422993, q=4057)
Sending NMI from CPU 48 to CPUs 32:
CPU 32 didn't respond to backtrace IPI, inspecting paca.
irq_soft_mask: 0x03 in_mce: 0 in_nmi: 0 current: 13963 (qemu-system-ppc)
Back trace of paca->saved_r1 (0xc000000bf97ab570) (possibly stale):
Call Trace:
[c000000bf97ab570] [c00000000013cc5c] guest_bypass+0x38/0x2c0 (unreliable)
[c000000bf97ab640] [c00000000013bda8] kvmppc_call_hv_entry+0x28/0x9c
[c000000bf97ab6b0] [c008000002eb0a00] __kvmppc_vcore_entry+0xa0/0x104 [kvm_hv]
[c000000bf97ab890] [c008000002eaa444] kvmppc_run_core+0xedc/0x2820 [kvm_hv]
[c000000bf97aba50] [c008000002eaf490] kvmppc_vcpu_run_hv+0x5d8/0xec0 [kvm_hv]
[c000000bf97abb60] [c008000003e4e77c] kvmppc_vcpu_run+0x34/0x48 [kvm]
[c000000bf97abb80] [c008000003e4a20c] kvm_arch_vcpu_ioctl_run+0x334/0x450 [kvm]
[c000000bf97abc10] [c008000003e39114] kvm_vcpu_ioctl+0x27c/0x760 [kvm]
[c000000bf97abd70] [c000000000568784] sys_ioctl+0xf4/0x150
[c000000bf97abdc0] [c000000000032630] system_call_exception+0xf0/0x180
[c000000bf97abe20] [c00000000000ca70] system_call_common+0xf0/0x278
===
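
As a sanity check on the watchdog line above, the two TB (timebase) values can be converted back into the reported "15999ms ago". This is only a sketch: it assumes the POWER timebase ticks at 512 MHz, which is not stated anywhere in this report, and the helper name tb_delta_ms is made up for illustration.

```python
# Timebase values copied from the watchdog line in the report.
TB_NOW = 2372958361977
TB_LAST_HEARTBEAT = 2364766439129

# Assumption (not stated in the report): the timebase ticks at 512 MHz.
TB_TICKS_PER_SEC = 512_000_000


def tb_delta_ms(tb_now: int, tb_then: int,
                ticks_per_sec: int = TB_TICKS_PER_SEC) -> int:
    """Return the time between two timebase readings in whole milliseconds."""
    return (tb_now - tb_then) * 1000 // ticks_per_sec


if __name__ == "__main__":
    # Compare the result with the "(15999ms ago)" figure in the log line.
    print(tb_delta_ms(TB_NOW, TB_LAST_HEARTBEAT))
```

Under that assumption, CPU 32 had not produced an SMP heartbeat for about 16 seconds when CPU 8 flagged it.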

2. What is the Version-Release number of the kernel: 5.7.8-200


3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

worked before with kernel 5.6.17-300

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

IBM local openQA server on a P8 host


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

not tested

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

no


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

the attached log is the output of ``journalctl --since today``
to
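
To find every occurrence in a journal dump such as the attached one, the hard-lockup lines can be filtered out with a short script. This is a rough sketch, assuming the default journalctl short output format seen in the excerpts above (timestamp, hostname, "kernel:"); the function name find_hard_lockups and the SAMPLE list are illustrative only.

```python
import re

# Matches journal lines such as:
#   "Jul 13 08:30:20 <host> kernel: watchdog: CPU 8 detected hard LOCKUP on other CPUs 32"
LOCKUP_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel: "
    r"watchdog: CPU (\d+) detected hard LOCKUP on other CPUs (\S+)"
)

# Three lines copied from the excerpt in this report, used as sample input.
SAMPLE = [
    "Jul 11 06:27:20 abanb.tlslab.ibm.com kernel: watchdog: CPU 0 detected hard LOCKUP on other CPUs 16",
    "Jul 13 07:15:14 localhost.localdomain kernel: dt-cpu-ftrs: setup for ISA 2070",
    "Jul 13 08:30:20 abanb.tlslab.ibm.com kernel: watchdog: CPU 8 detected hard LOCKUP on other CPUs 32",
]


def find_hard_lockups(lines):
    """Return (timestamp, detecting_cpu, locked_cpus) for each hard-lockup line."""
    hits = []
    for line in lines:
        match = LOCKUP_RE.match(line)
        if match:
            # The locked-up CPU list is kept as text; it may be a list or range.
            hits.append((match.group(1), int(match.group(2)), match.group(3)))
    return hits


if __name__ == "__main__":
    for timestamp, detector, locked in find_hard_lockups(SAMPLE):
        print(f"{timestamp}: CPU {detector} reported hard lockup on CPUs {locked}")
```

For a real log, pass an open file instead of SAMPLE, e.g. the result of open("journalctl_abanb_today_20200713.log").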

Comment 1 Fedora Program Management 2021-04-29 16:55:39 UTC
This message is a reminder that Fedora 32 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 32 on 2021-05-25.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '32'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 32 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 2 Michel Normand 2021-05-03 07:49:43 UTC
no failure anymore

