Created attachment 872334 [details]
log of bisecting upstream kernel

** Description of problem:

I have a long-term Fedora 19 qemu/KVM virtual machine that uses OVMF boot firmware (UEFI for VMs). After upgrading the guest kernel from kernel-3.12.11-201.fc19.x86_64 to kernel-3.13.3-100.fc19.x86_64 suspending the machine (S3) broke -- after the first VCPU is brought down, the VM hangs mid-suspend. Further VCPUs are not reached/stopped.

(Again, this is not a *resume* bug. The hang happens during suspend.)

** Version-Release number of selected component (if applicable):

kernel-3.13.3-100.fc19.x86_64

I spent the last seven hours bisecting the problem, using the upstream stable tree, between 3.12.11 and 3.13.3. The culprit commit is

  commit 1cf7e9c68fe84248174e998922b39e508375e7c1
  Author: Jens Axboe <axboe>
  Date:   Fri Nov 1 10:52:52 2013 -0600

      virtio_blk: blk-mq support

      Switch virtio-blk from the dual support for old-style requests and bios
      to use the block-multiqueue.

      Acked-by: Asias He <asias>
      Signed-off-by: Jens Axboe <axboe>
      Signed-off-by: Christoph Hellwig <hch>

** How reproducible:

100%

** Steps to Reproduce:

1. Boot Fedora 19 to the multi-user target.
2. Log in as root at the console.
3. Enter "pm-suspend".

** Actual results:

(a) The following is logged to the serial console (note: ignore_loglevel no_console_suspend):

  PM: Syncing filesystems ... done.
  PM: Preparing system for mem sleep
  Freezing user space processes ... (elapsed 0.005 seconds) done.
  Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
  PM: Entering mem sleep
  PM: suspend of devices complete after 206.472 msecs
  PM: late suspend of devices complete after 0.204 msecs
  PM: noirq suspend of devices complete after 2.426 msecs
  ACPI: Preparing to enter system sleep state S3
  PM: Saving platform NVS memory
  Disabling non-boot CPUs ...
  Unregister pv shared memory for cpu 1
  smpboot: CPU 1 is now offline
  <hangs>

(b) virt-manager displays the status of the VM as Running.

** Expected results:

(a) The following *additional* lines printed to the console during suspend:

  +Unregister pv shared memory for cpu 2
  +smpboot: CPU 2 is now offline
  +Unregister pv shared memory for cpu 3
  +Broke affinity for irq 10
  +Broke affinity for irq 11
  +smpboot: CPU 3 is now offline
  <suspends>

(b) virt-manager displays the status of the VM as Suspended.

** Additional info:

Attaching bisection log.
(I used "/boot/config-3.12.11-201.fc19.x86_64" + make oldconfig for each round of the bisection.)
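For readers unfamiliar with the procedure, each bisection round halves the range of candidate commits. The following is a minimal sketch of that search, assuming a linear list of commits in which every commit after the first bad one is also bad; `first_bad` and `is_bad` are hypothetical names, and `is_bad` stands in for "build, boot, run pm-suspend, check for the hang". It is not how git implements bisection on a real, branching history.

```python
def first_bad(commits, is_bad):
    """Binary-search a linear commit list for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is bad as well.
    Returns (first bad commit, number of test rounds used).
    """
    good, bad = 0, len(commits) - 1
    rounds = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        rounds += 1
        if is_bad(commits[mid]):
            bad = mid    # the first bad commit is at mid or earlier
        else:
            good = mid   # the first bad commit is after mid
    return commits[bad], rounds


if __name__ == "__main__":
    # Hypothetical history of 1000 commits where commit 617 introduced the bug.
    culprit, rounds = first_bad(list(range(1000)), lambda c: c >= 617)
    print(culprit, rounds)
```

The number of rounds grows with the logarithm of the range, which is why a range of thousands of commits still needs only on the order of a dozen build-and-test cycles.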
Possibly fixed by the following upstream commit, first in v3.14-rc3:

  commit 5124c285797aa33d5fdb909fbe808a61bcc5eb9d
  Author: Christoph Hellwig <hch>
  Date:   Mon Feb 10 03:24:39 2014 -0800

      virtio_blk: use blk_mq_complete_request

      Make sure to complete requests on the submitting CPU. Previously this
      was done in blk_mq_end_io, but the responsibility shifted to the
      drivers.

      Signed-off-by: Christoph Hellwig <hch>
      Signed-off-by: Jens Axboe <axboe>

(I didn't test it, just found it as a candidate with "git log".)
Part of the following series:

  1 30a91cb blk-mq: rework I/O completions
  2 5124c28 virtio_blk: use blk_mq_complete_request
  3 ce2c350 null_blk: use blk_complete_request and blk_mq_complete_request
  4 1874198 blk-mq: rework flush sequencing logic
Can you test a rawhide kernel on that VM? It already has the latest fixes you highlight and should be easy enough to try.
Oh I didn't know it was safe (or "officially condoned") to install a rawhide kernel in Fedora 19. I tested "kernel-3.14.0-0.rc5.git2.1.fc21.x86_64.rpm", from <http://koji.fedoraproject.org/koji/buildinfo?buildID=502859>. Unfortunately, suspend fails with the same symptoms.
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.13.5-100.fc19. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.
(In reply to Justin M. Forbes from comment #7)
> Fedora 19 has now been rebased to 3.13.5-100.fc19. Please test this kernel
> update and let us know if your issue has been resolved

My issue has been caused, not resolved, by the update.
The problem can be worked around by replacing all virtio-blk disks with virtio-scsi disks. Tested with 3.13.6-100.fc19.x86_64.
Still broken in 3.15.0-0.rc0.git8.1.fc21.x86_64.
I found the issue here, it's a bug in the lib/percpu_counter.c when offlining a CPU. Attached a fix in the kernel.org bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=73501
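To illustrate the class of problem, here is a toy model of a per-CPU counter. It is only a sketch under stated assumptions, not the code in lib/percpu_counter.c: it assumes the failure mode is that an offlined CPU still holds a private delta that the sum no longer visits, and that the block layer waits for an in-flight count kept in such a counter to reach zero. `ToyPercpuCounter` and `in_flight_after_offline` are hypothetical names.

```python
class ToyPercpuCounter:
    """Toy per-CPU counter: one shared count plus a private delta per CPU."""

    def __init__(self, ncpus):
        self.count = 0
        self.deltas = [0] * ncpus
        self.online = set(range(ncpus))

    def add(self, cpu, amount):
        # Updates only touch the private delta of the CPU doing the update.
        self.deltas[cpu] += amount

    def total(self):
        # The sum visits online CPUs only.
        return self.count + sum(self.deltas[c] for c in self.online)

    def offline(self, cpu, fold):
        # Remove the CPU from the online set; optionally fold its private
        # delta into the shared count so that nothing is lost.
        self.online.discard(cpu)
        if fold:
            self.count += self.deltas[cpu]
            self.deltas[cpu] = 0


def in_flight_after_offline(fold):
    counter = ToyPercpuCounter(ncpus=4)
    counter.add(1, +3)   # three requests allocated on CPU 1
    counter.add(0, -3)   # the same three requests freed on CPU 0
    counter.offline(1, fold)
    return counter.total()


if __name__ == "__main__":
    print(in_flight_after_offline(fold=True))    # delta folded before summing
    print(in_flight_after_offline(fold=False))   # delta left behind
```

With the delta folded, the total is zero and a drain loop can finish. With the delta left behind, the total never returns to zero, which in this model corresponds to a suspend that waits forever.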
Created attachment 882916 [details]
Reset percpu counters for CPU_DEAD_FROZEN (patch by Jens Axboe, in email format)

(In reply to Jens Axboe from comment #11)
> I found the issue here, it's a bug in the lib/percpu_counter.c when
> offlining a CPU. Attached a fix in the kernel.org bugzilla:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=73501

The fix works (see my comment in the kernel bugzilla); thank you very much.

Attaching Jens' patch here too. I gitified it a bit more for purely practical reasons, so that it applies with git-am -- I took care to keep the authorship and the S-o-b intact.

Beyond the bugzilla references, I didn't write up a commit message (although I could think of something, based on <https://bugzilla.kernel.org/show_bug.cgi?id=73501#c4>) -- the optimal solution would be to wait for Jens' upstream commit, and cherry-pick it into Fedora.

In any case, <https://fedoraproject.org/wiki/BugZappers/BugStatusWorkFlow> says,

  POST
  - This state is primarily used by developers working on virtualization and the kernel.
  - A bug is moved to the POST state (from ASSIGNED) when a patch has been attached to the bugzilla entry and the gate keeper is waiting for the patch to receive three ACKS.

Therefore, POST means "a patch is ready, but not yet applied". Hence moving this BZ to POST.
Upstream posting: http://thread.gmane.org/gmane.linux.kernel/1678373
Fedora 20 scratch build, x86_64 only:

https://koji.fedoraproject.org/koji/taskinfo?taskID=6711141

This build includes the following patches on top of kernel-3.13.9-200.fc20:

- drm/bochs: new driver (Gerd Hoffmann)
  0a6659bdc5e8221da99eebb176fd9591435e38de (v3.14)
- drm: fix bochs kconfig dependencies (Gerd Hoffmann)
  77ac9a05d4a0be6b2ab22b61d7fb36d29c212d72 (v3.14)
- drm: bochs: add power management support (Gerd Hoffmann)
  http://article.gmane.org/gmane.comp.video.dri.devel/100860
- Fix bad percpu counter state during suspend (Jens Axboe)
  http://article.gmane.org/gmane.linux.kernel/1678373

The first three patches add a paravirt video driver for qemu's stdvga (bochs) that supports S3. These patches are unrelated to this BZ, but they are a useful addition because the only other such option is QXL, which has sluggish performance on the character console (slowing down boot, for example).

The fourth patch is the fix for this bug.

Testing a 4 VCPU, 3 virtio-blk, stdvga guest, running OVMF SVN r15433, plus the test kernel:

  PM: Syncing filesystems ... done.
  PM: Preparing system for mem sleep
  Freezing user space processes ... (elapsed 0.003 seconds) done.
  Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
  PM: Entering mem sleep
  PM: suspend of devices complete after 177.577 msecs
  PM: late suspend of devices complete after 0.160 msecs
  PM: noirq suspend of devices complete after 1.302 msecs
  ACPI: Preparing to enter system sleep state S3
  PM: Saving platform NVS memory
  Disabling non-boot CPUs ...
  Unregister pv shared memory for cpu 1
  smpboot: CPU 1 is now offline
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 2 -> Queue 0
    CPU 3 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 2 -> Queue 0
    CPU 3 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 2 -> Queue 0
    CPU 3 -> Queue 0
  Unregister pv shared memory for cpu 2
  smpboot: CPU 2 is now offline
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 3 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 3 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 3 -> Queue 0
  Unregister pv shared memory for cpu 3
  Broke affinity for irq 1
  smpboot: CPU 3 is now offline
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
  <suspends>

At resume:

  kvm-clock: cpu 0, msr 0:9d522001, primary cpu clock, resume
  ACPI: Low-level resume complete
  ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S0_] (20131115/hwxface-580)
  PM: Restoring platform NVS memory
  Enabling non-boot CPUs ...
  x86: Booting SMP configuration:
  smpboot: Booting Node 0 Processor 1 APIC 0x1
  kvm-clock: cpu 1, msr 0:9d522041, secondary cpu clock
  KVM setup async PF for cpu 1
  kvm-stealtime: cpu 1, msr 9d08e000
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 1 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 1 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 1 -> Queue 0
  CPU1 is up
  smpboot: Booting Node 0 Processor 2 APIC 0x2
  kvm-clock: cpu 2, msr 0:9d522081, secondary cpu clock
  KVM setup async PF for cpu 2
  kvm-stealtime: cpu 2, msr 9d10e000
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 1 -> Queue 0
    CPU 2 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 1 -> Queue 0
    CPU 2 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 1 -> Queue 0
    CPU 2 -> Queue 0
  CPU2 is up
  smpboot: Booting Node 0 Processor 3 APIC 0x3
  kvm-clock: cpu 3, msr 0:9d5220c1, secondary cpu clock
  KVM setup async PF for cpu 3
  kvm-stealtime: cpu 3, msr 9d18e000
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 1 -> Queue 0
    CPU 2 -> Queue 0
    CPU 3 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 1 -> Queue 0
    CPU 2 -> Queue 0
    CPU 3 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 1 -> Queue 0
    CPU 2 -> Queue 0
    CPU 3 -> Queue 0
  CPU3 is up
  ACPI: Waking up from system sleep state S3
  PM: noirq resume of devices complete after 20.106 msecs
  PM: early resume of devices complete after 0.084 msecs
  pci 0000:00:01.0: PIIX3: Enabling Passive Release
  usb usb1: root hub lost power or was reset
  virtio-pci 0000:00:04.0: irq 40 for MSI/MSI-X
  virtio-pci 0000:00:03.0: irq 41 for MSI/MSI-X
  virtio-pci 0000:00:03.0: irq 42 for MSI/MSI-X
  virtio-pci 0000:00:03.0: irq 43 for MSI/MSI-X
  virtio-pci 0000:00:04.0: irq 44 for MSI/MSI-X
  virtio-pci 0000:00:06.0: irq 45 for MSI/MSI-X
  ata1: port disabled--ignoring
  virtio-pci 0000:00:08.0: irq 46 for MSI/MSI-X
  virtio-pci 0000:00:08.0: irq 47 for MSI/MSI-X
  virtio-pci 0000:00:07.0: irq 48 for MSI/MSI-X
  virtio-pci 0000:00:07.0: irq 49 for MSI/MSI-X
  virtio-pci 0000:00:06.0: irq 50 for MSI/MSI-X
  ata2.01: NODEV after polling detection
  ata2.00: configured for MWDMA2
  usb 1-1: reset full-speed USB device number 2 using uhci_hcd
  PM: resume of devices complete after 565.690 msecs
  PM: Finishing wakeup.
  Restarting tasks ... done.

Very well behaved; thanks.
Upstream commit:

  commit e39435ce68bb4685288f78b1a7e24311f7ef939f
  Author: Jens Axboe <axboe>
  Date:   Tue Apr 8 16:04:12 2014 -0700

      lib/percpu_counter.c: fix bad percpu counter state during suspend
Thanks Jens and Laszlo.
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.14.4-100.fc19. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20.

If you experience different issues, please open a new bug report for those.
*********** MASS BUG UPDATE ************** This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 4 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.
?! This was diagnosed and fixed months ago, and the fixes are in the RH kernel. Please close this properly.
Right.

The original upstream fix (commit e39435ce68bb4685288f78b1a7e24311f7ef939f, identified in comment 15) has been released in v3.15. However, it has also been cherry-picked for stable/linux-3.14.y, as commit 08362f5864a8d13b62f2ff0e9a1d7a60f2a58d96. This cherry-pick (i.e., backport) is contained in v3.14.4.

According to comment 17, Fedora 19 was (then) based on 3.14.4-100.fc19. The most recent .fc19 kernel build is "kernel-3.14.8-100.fc19" in Koji at the moment. Similarly for Fedora 20: kernel-3.14.8-200.fc20.

Hence the fix is present in Fedora, by virtue of the upstream stable process and the downstream rebase.

According to <https://fedoraproject.org/wiki/BugZappers/BugStatusWorkFlow#CLOSED>,

> Once a bug has been fixed and included in a new package in rawhide or the
> updates repo it should be closed. For a stable or Branched release, the
> resolution ERRATA should be used. For Rawhide, the resolution RAWHIDE
> should be used.
>
> Note that the resolution must match the release against which the bug is
> filed. Therefore, a bug reported for a stable release cannot be closed as
> 'fixed' if a fix is shipped only in Rawhide. A bug in one stable release
> cannot be closed as 'fixed' - ERRATA - if a fix is shipped only in a
> different stable release. The correct resolution for these situations is
> NEXTRELEASE, with an explanation as a comment.

Given that the issue has been addressed for *both* Fedora 19 and Fedora 20 (of which Fedora 19 is the release the bug was filed against), the correct resolution here is ERRATA. Updating.
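The version reasoning above can be expressed as a small check. This is a rough sketch only: `upstream_version` and `has_fix` are hypothetical helpers, the package names are the ones quoted in this bug, and the comparison deliberately ignores pre-release builds (a 3.15.0-0.rc0 snapshot predates the fix even though its base version sorts higher than 3.14.4).

```python
# First stable release containing the backport, per the comment above.
FIRST_FIXED = (3, 14, 4)


def upstream_version(nvr):
    """Extract the upstream version from a name like 'kernel-3.14.8-100.fc19'."""
    version = nvr.split("-")[1]
    return tuple(int(part) for part in version.split("."))


def has_fix(nvr):
    # Assumption: any final release at or after 3.14.4 carries the fix,
    # either through the 3.14.y backport or through mainline v3.15.
    # Pre-release (rc/git snapshot) builds are NOT handled correctly here.
    return upstream_version(nvr) >= FIRST_FIXED


if __name__ == "__main__":
    for nvr in ("kernel-3.13.3-100.fc19",
                "kernel-3.14.8-100.fc19",
                "kernel-3.14.8-200.fc20"):
        print(nvr, has_fix(nvr))
```

Both current stable builds compare at or above 3.14.4, which is the basis for closing the bug as ERRATA rather than NEXTRELEASE.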