Bug 2084442
Summary: | watchdog: BUG: soft lockup - CPU# stuck for ... - no watchdog action
---|---
Product: | Red Hat Enterprise Linux 9
Component: | qemu-kvm
qemu-kvm sub component: | PCI
Version: | CentOS Stream
Hardware: | x86_64
OS: | Linux
Status: | CLOSED NOTABUG
Severity: | high
Priority: | high
Keywords: | Triaged
Target Milestone: | rc
Target Release: | 9.3
Reporter: | lejeczek <peljasz>
Assignee: | Michael S. Tsirkin <mst>
QA Contact: | Yiqian Wei <yiwei>
CC: | ailan, berrange, bstinson, chayang, coli, jinzhao, juzhang, jwboyer, mkletzan, mrezanin, mst, nilal, virt-maint, xiaohli, yiwei, ymankad
Bug Depends On: | 2180898
Type: | Bug
Last Closed: | 2023-09-05 13:29:50 UTC
Description
lejeczek
2022-05-12 07:21:06 UTC
I did not reproduce this bug on a RHEL 9.1.0 host.

src and dst host version:
qemu-kvm-7.0.0-1.el9.x86_64
kernel-5.14.0-92.el9.x86_64
edk2-ovmf-20220221gitb24306f15d-1.el9.noarch

guest: rhel9.1.0

Test steps:
1. On the src end, boot a rhel9.1.0 guest with "-device i6300esb", qemu cli [1]
2. On the dst end, boot a rhel9.1.0 guest with qemu cli [1] and append '-incoming defer'
3. Migrate the vm from src to dst
   dst qmp: {"execute": "migrate-incoming", "arguments": {"uri": "tcp:[::]:4000"}}
   src qmp: {"execute": "migrate", "arguments": {"uri": "tcp:$dst_host_ip:4000"}}

Additional info:
1. Did not reproduce this bug with q35 + seabios
2. qemu cli [1]:
/usr/libexec/qemu-kvm \
 -name 'avocado-vt-vm1' \
 -sandbox on \
 -blockdev node-name=file_ovmf_code,driver=file,filename=/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd,auto-read-only=on,discard=unmap \
 -blockdev node-name=drive_ovmf_code,driver=raw,read-only=on,file=file_ovmf_code \
 -blockdev node-name=file_ovmf_vars,driver=file,filename=/mnt/yiwei/OVMF_VARS.fd,auto-read-only=on,discard=unmap \
 -blockdev node-name=drive_ovmf_vars,driver=raw,read-only=off,file=file_ovmf_vars \
 -machine q35,memory-backend=mem-machine_mem,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars \
 -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
 -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
 -device i6300esb,id=wdt0,bus=pcie-pci-bridge-0 \
 -watchdog-action reset \
 -nodefaults \
 -device VGA,bus=pcie.0,addr=0x2 \
 -m 12G \
 -object memory-backend-ram,size=12G,id=mem-machine_mem \
 -smp 10,maxcpus=10,cores=5,threads=1,sockets=2 \
 -cpu IvyBridge,enforce \
 -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
 -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
 -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
 -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/mnt/yiwei/rhel9.1-ovmf.qcow2,cache.direct=on,cache.no-flush=off \
 -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
 -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
 -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
 -device virtio-net-pci,mac=9a:df:ca:53:c2:69,id=idz43iXV,netdev=idPOEPyA,bus=pcie-root-port-3,addr=0x0 \
 -netdev tap,id=idPOEPyA,vhost=on \
 -vnc :0 \
 -rtc base=utc,clock=host,driftfix=slew \
 -boot menu=off,order=cdn,once=c,strict=off \
 -enable-kvm \
 -monitor stdio \
 -qmp tcp:0:4444,server=on,wait=off \

I realize I might have made it a bit vague, I'll try again:
...live migration is not the problem I'm reporting here.
I think the problem here is a faulty/malfunctioning "watchdog": if somehow, in whatever manner, you can cause "watchdog: BUG: soft lockup..." so that the VM becomes non-responsive (I've also noticed that the host reports a high CPU load for such a "broken" VM at that time), you will see no action from the watchdog - and I believe action should happen.

Hi lejeczek,

I didn't reproduce this bug when running stress in the guest. How did you reproduce it? Could you provide the steps and command line to reproduce this bug?

Thanks,
yiqian
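A quick way to sanity-check that the emulated i6300esb watchdog fires at all, independent of any soft lockup, is to arm it inside the guest and deliberately stop petting it. This is a minimal sketch, assuming the guest ships the i6300esb driver and util-linux wdctl, that the device shows up as /dev/watchdog0, and that no watchdog daemon is already petting it:

# inside the guest; module name and /dev/watchdog0 path are assumptions
modprobe i6300esb
wdctl /dev/watchdog0        # show the emulated device and its timeout
exec 3> /dev/watchdog0      # opening the device arms the timer
sleep 90                    # nothing pets it, so with "-watchdog-action reset" the VM should reset

If the guest resets in this test but not during a soft lockup, that points at the lockup situation rather than the watchdog emulation itself.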
To shed a bit more light on my systems: the VMs' qcow2s are stored on GlusterFS vols (say, a 3 node/peer cluster should do) - those gluster vols are autofs-mounted (since RHEL, inexplicably to me, removed libgfapi from qemu), so qemu/libvirt access those qcow2s via FUSE.

To stress such a setup out, I'd imagine live-migration of a single VM will not do; instead, a "mass" live-migration is when the issue happens, say:
- node1 already has a few VMs up & running and you migrate a few more VMs in fast succession to node1 from a nodeX
- if you were to add HA/pacemaker to the equation and let such an ha-cluster manage your VMs, then that will also allow GlusterFS to be involved (though it can be done without, manually)
- reboot one such ha-cluster node, so that then:
  a) live-migration will take place
  b) the gluster vol will also be healing

It might not happen every time, but when it (something) does happen, you should get quite a few! (migrated) VMs being "soft lockup-ed" and then... the watchdog "issue".

Hardware resources will most likely matter very much, so if you have big CPUs and lots of resources then that will probably not be good, not helpful. The smaller the systems the better - easier to stress out. I test all this with mid-shelf Ryzens.

thanks, L

With my limited understanding I looked at this issue and it seems the message you are getting is not from a watchdog device, but from the Linux kernel lockup detector. If a CPU does not get enough execution time, for example when overcommitting the host, then a soft lockup is detected and, based on the configuration and settings, it can trigger a kernel panic. I think the only two options here are to either make sure that the host is not overcommitted cpu-wise, or to disable the lockup detection via cmdline or sysctl.

Oh sorry, I misread the description. You are expecting the watchdog device to reset the VM once such a lockup happens. How long have you tried waiting after the soft lockup message? Is it possible that QEMU does not get enough cpu time so that it can emulate the watchdog? If yes, then I suspect QEMU might actually be at fault here.

Since there has been no resolution planned for this issue and we are at the point in the release where we need to limit risk and change, I'm removing this from the current release in order to have the work properly planned for some future release.

(In reply to Martin Kletzander from comment #7)
> Oh sorry, I misread the description. You are expecting the watchdog device
> to reset the VM once such lockup happens. How long have you tried waiting
> after the soft lockup message? Is it possible that QEMU does not get enough
> cpu time so that it can emulate the watchdog? If yes, then I suspect QEMU
> might actually be at fault here.

This is a 10 CPU guest. Any one of those CPUs could potentially pet the watchdog and keep it from firing, so I don't think it is indicative of a watchdog bug unless we can demonstrate that all 10 CPUs are fully non-responsive.

A hardware watchdog alone is not sufficient to detect all potential problems which lead to non-responsive application services. You need, in addition, external application-level liveness probes, and to be willing to fence the VM if they fail to respond.

In the Machine & PCI team meeting today, Michael commented that the fix for this bug is already upstream, and it should come as part of the QEMU rebase for 9.3. However, it is a high-risk change, since it could expose qemu or guest bugs leading to unexpected resets.
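For reference, the "cmdline or sysctl" route mentioned above refers to the kernel's soft-lockup detector knobs; a minimal sketch (the values are examples only):

# inside the affected guest
sysctl -w kernel.watchdog_thresh=30   # raise the detection threshold (soft lockup reports at 2x this value)
sysctl -w kernel.soft_watchdog=0      # or disable the soft-lockup detector entirely
# equivalent kernel command line options: "watchdog_thresh=30" or "nosoftlockup"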
Marking this as TestOnly and adding a depends on the rebase BZ.

PS: Michael is having some Bugzilla access issues, so once that is sorted, he should be able to answer any questions.

QE bot (pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Hi lejeczek,

Could you provide the following information:
1. What is the number of host cpus for this bug?
2. How many VMs are booted and how many cpus are used by each vm?

Reproduce steps:

host version:
kernel-5.14.0-312.el9.x86_64
qemu-kvm-7.2.0-14.el9_2.x86_64
edk2-ovmf-20230301gitf80f052277c8-3.el9.noarch

guest: rhel9.3.0

reproduce steps:
1. Boot 8 guests with "-m 1G" and "-smp 4,sockets=1,cores=4,threads=1" on the host
# sh ovmf.sh ovmf 8
2. Check dmesg in each guest

test results:
please see the attachment: guest_dmesg.txt

host information:
1) memory
# free -h
              total        used        free      shared  buff/cache   available
Mem:          7.5Gi       3.1Gi       4.4Gi       6.0Mi       214Mi       4.3Gi
Swap:         7.8Gi        44Mi       7.8Gi
2) cpu
CPU(s):                4
On-line CPU(s) list:   0-3
Vendor ID:             GenuineIntel
BIOS Vendor ID:        Intel
Model name:            Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
BIOS Model name:       Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
CPU family:            6
Model:                 60
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1

Hi Michael,

Please help check whether the above reproduction steps are right. Thanks a lot.

Created attachment 1965004 [details]
guest dmesg information
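Step 2 above ("check dmesg in each guest") amounts to grepping the guest kernel log for lockup and watchdog messages; a minimal sketch, run inside a guest:

dmesg -T | grep -Ei 'soft lockup|i6300esb|watchdog'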
Hi guys - the original reporter of the BZ here.

Since I reported the bug, the whole lot in my test-lab has changed - not to mention the obvious: the software stack got updated. Hardware-wise, perhaps the number of cores and RAM capacity are not different, but I replaced their "families" with younger ones, as well as the network's key parts.

Now... a year later, with all those changes and with the following versions:
libvirt-daemon-driver-qemu-9.0.0-7.el9.x86_64
ipxe-roms-qemu-20200823-9.git4bd064de.el9.noarch
qemu-img-8.0.0-1.el9.x86_64
qemu-kvm-common-8.0.0-1.el9.x86_64
qemu-kvm-core-8.0.0-1.el9.x86_64
glusterfs-server-11.0-1.el9s.x86_64

Hardware currently participating in the test-lab:
3 x Ryzen 3800 + 32GB ram + 10Gbps lan

Also with a bit of, back then, tweaking & testing of HA/pacemaker - in order to lower/control simultaneous resource (VirtualDomain) migration when a node is rebooted/shut down/stood by - now... I do not see the original: soft lockup - CPU#

And to try to clarify my original message - I might have done it better - yes, I reckoned that the issue was in the bare-metal <=> Qemu interaction: the watchdog in the VM did act as expected when the VM was "healthy"; it failed to act only (!) when the VM was tainted by a "soft lockup".

So, I'm afraid I'll not be able to provide you guys with any more concrete - concrete as in debug/trace - info, unless I get to see this very issue again.

many thanks, L.

Hi lejeczek,

Could you help to check Comment 20? I hit the "watchdog: BUG: soft lockup - CPU#2 stuck for 70s! [kexec:1895]" information in the guest.

Thanks,
Yiqian

Hi. Those dmesgs look familiar, similar. One certain thing I can say is that I too - like you do, I see - was (and still am) over-committing resources: the guests together were set to ask for more than the hosts themselves had the physical capacity for. (Which, I'd reckon, is commonplace.)

Can reproduce this bug with the fixed "qemu-kvm-8.0.0-3.el9.x86_64" version.

host version:
kernel-5.14.0-312.el9.x86_64
qemu-kvm-8.0.0-3.el9.x86_64
edk2-ovmf-20230301gitf80f052277c8-3.el9.noarch

guest: rhel9.3.0

The same test steps and results as Comment 20; it seems that the watchdog at least works as expected.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
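As an aside on the "lower/control simultaneous resource (VirtualDomain) migration" tuning mentioned earlier: with Pacemaker this can be done through the migration-limit cluster property; a minimal sketch using pcs (the value and node name are placeholders):

# allow at most one live migration to run in parallel per node
pcs property set migration-limit=1
# draining a node then migrates its VirtualDomain resources one at a time
pcs node standby node1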