Bug 1074235 - guest upgrade to 3.13 breaks S3 (suspend) in qemu virtual machine (bisected)
Summary: guest upgrade to 3.13 breaks S3 (suspend) in qemu virtual machine (bisected)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 19
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: rebase-regression-3.13
Depends On:
Blocks:
 
Reported: 2014-03-09 05:06 UTC by Laszlo Ersek
Modified: 2014-06-23 15:35 UTC
CC: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-23 14:41:21 UTC
Type: Bug
Embargoed:


Attachments
log of bisecting upstream kernel (2.60 KB, text/x-log)
2014-03-09 05:06 UTC, Laszlo Ersek
Reset percpu counters for CPU_DEAD_FROZEN (patch by Jens Axboe, in email format) (992 bytes, patch)
2014-04-04 22:39 UTC, Laszlo Ersek


Links
System: Linux Kernel   ID: 73501   Private: 0   Priority: None   Status: None   Summary: None   Last Updated: Never

Description Laszlo Ersek 2014-03-09 05:06:55 UTC
Created attachment 872334 [details]
log of bisecting upstream kernel

** Description of problem:
I have a long-term Fedora 19 qemu/KVM virtual machine that uses OVMF boot firmware (UEFI for VMs). After upgrading the guest kernel from

  kernel-3.12.11-201.fc19.x86_64

to

  kernel-3.13.3-100.fc19.x86_64

suspending the machine (S3) broke -- after the first VCPU is brought down, the VM hangs mid-suspend. Further VCPUs are not reached/stopped.

(Again, this is not a *resume* bug. The hang happens during suspend.)

** Version-Release number of selected component (if applicable):
kernel-3.13.3-100.fc19.x86_64

I spent the last seven hours bisecting the problem, using the upstream stable tree, between 3.12.11 and 3.13.3. The culprit commit is

commit 1cf7e9c68fe84248174e998922b39e508375e7c1
Author: Jens Axboe <axboe>
Date:   Fri Nov 1 10:52:52 2013 -0600

    virtio_blk: blk-mq support
    
    Switch virtio-blk from the dual support for old-style requests and bios
    to use the block-multiqueue.
    
    Acked-by: Asias He <asias>
    Signed-off-by: Jens Axboe <axboe>
    Signed-off-by: Christoph Hellwig <hch>

** How reproducible:
100%

** Steps to Reproduce:
1. Boot Fedora 19 to the multi-user target.
2. Log in as root at the console.
3. Enter "pm-suspend".

** Actual results:
(a) The following is logged to the serial console (note: the kernel command line includes ignore_loglevel and no_console_suspend):

PM: Syncing filesystems ... done.
PM: Preparing system for mem sleep
Freezing user space processes ... (elapsed 0.005 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
PM: Entering mem sleep
PM: suspend of devices complete after 206.472 msecs
PM: late suspend of devices complete after 0.204 msecs
PM: noirq suspend of devices complete after 2.426 msecs
ACPI: Preparing to enter system sleep state S3
PM: Saving platform NVS memory
Disabling non-boot CPUs ...
Unregister pv shared memory for cpu 1
smpboot: CPU 1 is now offline
<hangs>

(b) virt-manager displays the status of the VM as Running.

** Expected results:
(a) The following *additional* lines are printed to the console during suspend:

+Unregister pv shared memory for cpu 2
+smpboot: CPU 2 is now offline
+Unregister pv shared memory for cpu 3
+Broke affinity for irq 10
+Broke affinity for irq 11
+smpboot: CPU 3 is now offline
<suspends>

(b) virt-manager displays the status of the VM as Suspended.

** Additional info:
Attaching bisection log.

Comment 1 Laszlo Ersek 2014-03-09 05:17:43 UTC
(I used "/boot/config-3.12.11-201.fc19.x86_64" + make oldconfig for each round of the bisection.)

Comment 2 Laszlo Ersek 2014-03-09 05:34:11 UTC
Possibly fixed by the following upstream commit, first in v3.14-rc3:

commit 5124c285797aa33d5fdb909fbe808a61bcc5eb9d
Author: Christoph Hellwig <hch>
Date:   Mon Feb 10 03:24:39 2014 -0800

    virtio_blk: use blk_mq_complete_request
    
    Make sure to complete requests on the submitting CPU.  Previously this
    was done in blk_mq_end_io, but the responsibility shifted to the drivers.
    
    Signed-off-by: Christoph Hellwig <hch>
    Signed-off-by: Jens Axboe <axboe>

(I didn't test it, just found it as a candidate with "git log".)

Comment 3 Laszlo Ersek 2014-03-09 05:54:11 UTC
Part of the following series:

   1  30a91cb blk-mq: rework I/O completions
   2  5124c28 virtio_blk: use blk_mq_complete_request
   3  ce2c350 null_blk: use blk_complete_request and blk_mq_complete_request
   4  1874198 blk-mq: rework flush sequencing logic

Comment 4 Josh Boyer 2014-03-10 12:50:32 UTC
Can you test a rawhide kernel on that VM?  It already has the latest fixes you highlight and should be easy enough to try.

Comment 5 Laszlo Ersek 2014-03-10 14:03:15 UTC
Oh I didn't know it was safe (or "officially condoned") to install a rawhide kernel in Fedora 19.

I tested "kernel-3.14.0-0.rc5.git2.1.fc21.x86_64.rpm", from <http://koji.fedoraproject.org/koji/buildinfo?buildID=502859>. Unfortunately, suspend fails with the same symptoms.

Comment 7 Justin M. Forbes 2014-03-10 14:48:29 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.13.5-100.fc19.  Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 8 Laszlo Ersek 2014-03-10 15:12:05 UTC
(In reply to Justin M. Forbes from comment #7)

> Fedora 19 has now been rebased to 3.13.5-100.fc19.  Please test this kernel
> update and let us know if your issue has been resolved

My issue has been caused, not resolved, by the update.

Comment 9 Laszlo Ersek 2014-03-24 21:44:51 UTC
The problem can be worked around by replacing all virtio-blk disks with virtio-scsi disks. Tested with 3.13.6-100.fc19.x86_64.

Comment 10 Laszlo Ersek 2014-04-03 21:52:30 UTC
Still broken in 3.15.0-0.rc0.git8.1.fc21.x86_64.

Comment 11 Jens Axboe 2014-04-04 20:46:07 UTC
I found the issue here: it's a bug in lib/percpu_counter.c when offlining a CPU. I've attached a fix in the kernel.org bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=73501
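
For context, here is a minimal sketch of the shape of the problem and the
fix, assuming the 3.13-era CPU-hotplug callback in lib/percpu_counter.c;
it is an illustration only, not the verbatim patch (the real thing is in
the kernel.org BZ above, and the upstream commit is identified in comment
15). When a CPU is offlined on the suspend path, the notifier action is
CPU_DEAD_FROZEN (CPU_DEAD | CPU_TASKS_FROZEN) rather than plain CPU_DEAD,
so a callback that checks only for CPU_DEAD never folds the dying CPU's
local delta back into the global count -- and blk-mq, which tracks queue
usage in a percpu counter, then never sees the queue as drained:

  /*
   * Illustrative sketch (not the verbatim upstream diff) of
   * percpu_counter_hotcpu_callback().  On CPU offline it folds the dead
   * CPU's local delta back into the global fbc->count and resets the
   * per-cpu slot.
   */
  static int percpu_counter_hotcpu_callback(struct notifier_block *nb,
                                            unsigned long action, void *hcpu)
  {
          unsigned int cpu = (unsigned long)hcpu;
          struct percpu_counter *fbc;

          compute_batch_value();

          /* Buggy check: never matches CPU_DEAD_FROZEN during suspend.
           *
           *     if (action != CPU_DEAD)
           *             return NOTIFY_OK;
           *
           * One way to fix it: mask off the FROZEN bit so both variants
           * match (equivalently, also accept CPU_DEAD_FROZEN explicitly).
           */
          if ((action & ~CPU_TASKS_FROZEN) != CPU_DEAD)
                  return NOTIFY_OK;

          spin_lock_irq(&percpu_counters_lock);
          list_for_each_entry(fbc, &percpu_counters, list) {
                  s32 *pcount;
                  unsigned long flags;

                  raw_spin_lock_irqsave(&fbc->lock, flags);
                  pcount = per_cpu_ptr(fbc->counters, cpu);
                  fbc->count += *pcount; /* fold the dead CPU's delta back */
                  *pcount = 0;           /* reset the per-cpu slot */
                  raw_spin_unlock_irqrestore(&fbc->lock, flags);
          }
          spin_unlock_irq(&percpu_counters_lock);

          return NOTIFY_OK;
  }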

Comment 12 Laszlo Ersek 2014-04-04 22:39:47 UTC
Created attachment 882916 [details]
Reset percpu counters for CPU_DEAD_FROZEN (patch by Jens Axboe, in email format)

(In reply to Jens Axboe from comment #11)
> I found the issue here: it's a bug in lib/percpu_counter.c when
> offlining a CPU. I've attached a fix in the kernel.org bugzilla:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=73501

The fix works (see my comment in the kernel bugzilla); thank you very much.

Attaching Jens' patch here too. I gitified it a bit more, for purely
practical reasons, so that it applies with git-am -- I took care to keep the
authorship and the S-o-b intact. Beyond the bugzilla references, I didn't
write up a commit message (although I could think of one, based on
<https://bugzilla.kernel.org/show_bug.cgi?id=73501#c4>). The optimal
solution would be to wait for Jens' upstream commit and cherry-pick it into
Fedora.

In any case, <https://fedoraproject.org/wiki/BugZappers/BugStatusWorkFlow>
says,

  POST

  - This state is primarily used by developers working on virtualization and
    the kernel.
  - A bug is moved to the POST state (from ASSIGNED) when a patch has been
    attached to the bugzilla entry and the gate keeper is waiting for the
    patch to receive three ACKS. Therefore, POST means "a patch is ready,
    but not yet applied".

Hence moving this BZ to POST.

Comment 13 Laszlo Ersek 2014-04-05 20:22:18 UTC
Upstream posting:
http://thread.gmane.org/gmane.linux.kernel/1678373

Comment 14 Laszlo Ersek 2014-04-06 02:14:50 UTC
Fedora 20 scratch build, x86_64 only:
https://koji.fedoraproject.org/koji/taskinfo?taskID=6711141

This build includes the following patches on top of
kernel-3.13.9-200.fc20:

- drm/bochs: new driver (Gerd Hoffmann)
  0a6659bdc5e8221da99eebb176fd9591435e38de (v3.14)

- drm: fix bochs kconfig dependencies (Gerd Hoffmann)
  77ac9a05d4a0be6b2ab22b61d7fb36d29c212d72 (v3.14)

- drm: bochs: add power management support (Gerd Hoffmann)
  http://article.gmane.org/gmane.comp.video.dri.devel/100860

- Fix bad percpu counter state during suspend (Jens Axboe)
  http://article.gmane.org/gmane.linux.kernel/1678373

The first three patches add a paravirt video driver for qemu's stdvga
(bochs) that supports S3. These patches are unrelated to this BZ, but they
are a useful addition because the only other such option is QXL, which has
sluggish performance on the character console (slowing down boot, for
example).

The fourth patch is the fix for this bug.

Testing a 4 VCPU, 3 virtio-blk, stdvga guest, running OVMF SVN r15433, plus
the test kernel:

  PM: Syncing filesystems ... done.
  PM: Preparing system for mem sleep
  Freezing user space processes ... (elapsed 0.003 seconds) done.
  Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
  PM: Entering mem sleep
  PM: suspend of devices complete after 177.577 msecs
  PM: late suspend of devices complete after 0.160 msecs
  PM: noirq suspend of devices complete after 1.302 msecs
  ACPI: Preparing to enter system sleep state S3
  PM: Saving platform NVS memory
  Disabling non-boot CPUs ...
  Unregister pv shared memory for cpu 1
  smpboot: CPU 1 is now offline
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 2 -> Queue 0
    CPU 3 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 2 -> Queue 0
    CPU 3 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 2 -> Queue 0
    CPU 3 -> Queue 0
  Unregister pv shared memory for cpu 2
  smpboot: CPU 2 is now offline
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 3 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 3 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 3 -> Queue 0
  Unregister pv shared memory for cpu 3
  Broke affinity for irq 1
  smpboot: CPU 3 is now offline
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
  <suspends>

At resume:

  kvm-clock: cpu 0, msr 0:9d522001, primary cpu clock, resume
  ACPI: Low-level resume complete
  ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S0_] (20131115/hwxface-580)
  PM: Restoring platform NVS memory
  Enabling non-boot CPUs ...
  x86: Booting SMP configuration:
  smpboot: Booting Node 0 Processor 1 APIC 0x1
  kvm-clock: cpu 1, msr 0:9d522041, secondary cpu clock
  KVM setup async PF for cpu 1
  kvm-stealtime: cpu 1, msr 9d08e000
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 1 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 1 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 1 -> Queue 0
  CPU1 is up
  smpboot: Booting Node 0 Processor 2 APIC 0x2
  kvm-clock: cpu 2, msr 0:9d522081, secondary cpu clock
  KVM setup async PF for cpu 2
  kvm-stealtime: cpu 2, msr 9d10e000
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 1 -> Queue 0
    CPU 2 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 1 -> Queue 0
    CPU 2 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 1 -> Queue 0
    CPU 2 -> Queue 0
  CPU2 is up
  smpboot: Booting Node 0 Processor 3 APIC 0x3
  kvm-clock: cpu 3, msr 0:9d5220c1, secondary cpu clock
  KVM setup async PF for cpu 3
  kvm-stealtime: cpu 3, msr 9d18e000
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 1 -> Queue 0
    CPU 2 -> Queue 0
    CPU 3 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 1 -> Queue 0
    CPU 2 -> Queue 0
    CPU 3 -> Queue 0
  blk-mq: CPU -> queue map
    CPU 0 -> Queue 0
    CPU 1 -> Queue 0
    CPU 2 -> Queue 0
    CPU 3 -> Queue 0
  CPU3 is up
  ACPI: Waking up from system sleep state S3
  PM: noirq resume of devices complete after 20.106 msecs
  PM: early resume of devices complete after 0.084 msecs
  pci 0000:00:01.0: PIIX3: Enabling Passive Release
  usb usb1: root hub lost power or was reset
  virtio-pci 0000:00:04.0: irq 40 for MSI/MSI-X
  virtio-pci 0000:00:03.0: irq 41 for MSI/MSI-X
  virtio-pci 0000:00:03.0: irq 42 for MSI/MSI-X
  virtio-pci 0000:00:03.0: irq 43 for MSI/MSI-X
  virtio-pci 0000:00:04.0: irq 44 for MSI/MSI-X
  virtio-pci 0000:00:06.0: irq 45 for MSI/MSI-X
  ata1: port disabled--ignoring
  virtio-pci 0000:00:08.0: irq 46 for MSI/MSI-X
  virtio-pci 0000:00:08.0: irq 47 for MSI/MSI-X
  virtio-pci 0000:00:07.0: irq 48 for MSI/MSI-X
  virtio-pci 0000:00:07.0: irq 49 for MSI/MSI-X
  virtio-pci 0000:00:06.0: irq 50 for MSI/MSI-X
  ata2.01: NODEV after polling detection
  ata2.00: configured for MWDMA2
  usb 1-1: reset full-speed USB device number 2 using uhci_hcd
  PM: resume of devices complete after 565.690 msecs
  PM: Finishing wakeup.
  Restarting tasks ... done.

Very well behaved; thanks.

Comment 15 Laszlo Ersek 2014-04-09 10:00:23 UTC
Upstream commit:

commit e39435ce68bb4685288f78b1a7e24311f7ef939f
Author: Jens Axboe <axboe>
Date:   Tue Apr 8 16:04:12 2014 -0700

    lib/percpu_counter.c: fix bad percpu counter state during suspend

Comment 16 Josh Boyer 2014-04-09 13:50:13 UTC
Thanks Jens and Laszlo.

Comment 17 Justin M. Forbes 2014-05-21 19:31:06 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.14.4-100.fc19.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20.

If you experience different issues, please open a new bug report for those.

Comment 18 Justin M. Forbes 2014-06-23 14:41:21 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 4 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 19 Jens Axboe 2014-06-23 14:45:19 UTC
?!

This was diagnosed and fixed months ago, and the fixes are in the RH kernel. Please close this properly.

Comment 20 Laszlo Ersek 2014-06-23 15:35:46 UTC
Right.

The original upstream fix (commit e39435ce68bb4685288f78b1a7e24311f7ef939f, identified in comment 15) has been released in v3.15.

However, it has also been cherry-picked into stable/linux-3.14.y, as commit 08362f5864a8d13b62f2ff0e9a1d7a60f2a58d96.

This cherry-pick (i.e., backport) is contained in v3.14.4.

According to comment 17, Fedora 19 was (then) based on 3.14.4-100.fc19.

The most recent .fc19 kernel build is "kernel-3.14.8-100.fc19" in Koji at the moment. Similarly for Fedora 20: kernel-3.14.8-200.fc20. Hence the fix is present in Fedora, by virtue of the upstream stable process and the downstream rebase.

According to <https://fedoraproject.org/wiki/BugZappers/BugStatusWorkFlow#CLOSED>,

> Once a bug has been fixed and included in a new package in rawhide or the
> updates repo it should be closed. For a stable or Branched release, the
> resolution ERRATA should be used. For Rawhide, the resolution RAWHIDE
> should be used.
>
> Note that the resolution must match the release against which the bug is
> filed. Therefore, a bug reported for a stable release cannot be closed as
> 'fixed' if a fix is shipped only in Rawhide. A bug in one stable release
> cannot be closed as 'fixed' - ERRATA - if a fix is shipped only in a
> different stable release. The correct resolution for these situations is
> NEXTRELEASE, with an explanation as a comment.

Given that the issue has been addressed for *both* Fedora 19 and Fedora 20 (of which Fedora 19 is the release this bug was filed against), the correct resolution here is ERRATA. Updating.

