Bug 992282 - PANIC: "Oops: 0000 [#1] SMP " when resuming from S4 after hot plugging vcpu into guest
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Virtualization Maintenance
QA Contact: Virtualization Bugs
Depends On:
Blocks:
Reported: 2013-08-04 23:43 EDT by Chao Yang
Modified: 2013-08-12 22:56 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-08-12 22:56:08 EDT
Type: Bug


Description Chao Yang 2013-08-04 23:43:36 EDT
Description of problem:
Launched a RHEL 6.5 guest, then hot-plugged vCPUs into it. The guest kernel panicked while resuming from S4 (suspend to disk).

<6>kvm-clock: cpu 0, msr 0:2c0167c1, primary cpu clock, resume
<6>PM: Restoring platform NVS memory
<4>ACPI Error: Found unknown opcode D0 at AML address ffffc900006123b6 offset 3, ignoring (20090903/psloop-140)
<4>ACPI Error: Found unknown opcode 7 at AML address ffffc900006123b7 offset 4, ignoring (20090903/psloop-140)
<4>ACPI Error: Found unknown opcode D0 at AML address ffffc900006123b8 offset 5, ignoring (20090903/psloop-140)
<4>ACPI Error: Found unknown opcode 7 at AML address ffffc900006123b9 offset 6, ignoring (20090903/psloop-140)
<4>ACPI Error (psargs-0359): [A^Aÿÿ] Namespace lookup failure, AE_NOT_FOUND
<4>ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.LNKA._SRS] (Node ffff88011e6da2b8), AE_NOT_FOUND
<4>ACPI Exception: AE_NOT_FOUND, Evaluating _SRS (20090903/pci_link-367)
<4>ACPI Error (psargs-0359): [CPU ] Namespace lookup failure, AE_NOT_FOUND
<4>ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.LNKB._SRS] (Node ffff88011e6dafd8), AE_NOT_FOUND
<4>ACPI Exception: AE_NOT_FOUND, Evaluating _SRS (20090903/pci_link-367)
<4>ACPI Error: Found unknown opcode 4 at AML address ffffc90000612537 offset 4, ignoring (20090903/psloop-140)
<4>ACPI Error: Found unknown opcode 20 at AML address ffffc90000612538 offset 5, ignoring (20090903/psloop-140)
<4>ACPI Error: Needed type [Reference], found [Integer] ffff88011d8a31e0 (20090903/exresop-104)
<4>ACPI Exception: AE_AML_OPERAND_TYPE, While resolving operands for [OpcodeName unavailable] (20090903/dswexec-445)
<4>ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.LNKC._SRS] (Node ffff88011e6dae98), AE_AML_OPERAND_TYPE
<4>ACPI Exception: AE_AML_OPERAND_TYPE, Evaluating _SRS (20090903/pci_link-367)
<4>ACPI Error: Found unknown opcode C2 at AML address ffffc900006125f3 offset 0, ignoring (20090903/psloop-140)
<4>ACPI Exception: AE_CTRL_PENDING, While creating Arg 1 (20090903/dsutils-763)
<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [<ffffffff8130227f>] acpi_ps_peek_opcode+0xd/0x1f
<4>PGD 37991067 PUD 37990067 PMD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/power/state
<4>CPU 0
<4>Modules linked in: fuse autofs4 sunrpc 8021q garp stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput ppdev parport_pc parport sg microcode virtio_balloon snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk sr_mod cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
<4>
<4>Pid: 2717, comm: bash Not tainted 2.6.32-403.el6.x86_64 #1 Red Hat KVM
<4>RIP: 0010:[<ffffffff8130227f>]  [<ffffffff8130227f>] acpi_ps_peek_opcode+0xd/0x1f
<4>RSP: 0018:ffff8800379cbb78  EFLAGS: 00010093
<4>RAX: 00000000ff9eda0d RBX: ffff880037bf5800 RCX: ffffc90000612604
<4>RDX: 0000000000000000 RSI: ffff8801194824b0 RDI: ffff880037bf5830
<4>RBP: ffff8800379cbb78 R08: 0000000000000001 R09: 00000000fffffffc
<4>R10: 0000000000000002 R11: 0000000000000002 R12: 0000000000000000
<4>R13: 0000000000000000 R14: ffff8801194824f8 R15: ffff880037bf5830
<4>FS:  00007fc089a77700(0000) GS:ffff88002c000000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: 0000000000000000 CR3: 000000003798e000 CR4: 00000000000007f0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process bash (pid: 2717, threadinfo ffff8800379ca000, task ffff8801189b2040)
<4>Stack:
<4> ffff8800379cbbf8 ffffffff81302be3 ffff880037bf5800 ffffffff8116a862
<4><d> ffff8800379cbbb8 ffff880037bf5800 ffff8801149c7438 0000000000000000
<4><d> ffff8801194824f8 0000000000000000 ffff8800379cbbd8 ffff880037bf5800
<4>Call Trace:
<4> [<ffffffff81302be3>] acpi_ps_parse_loop+0x189/0x946
<4> [<ffffffff8116a862>] ? kmem_cache_alloc+0x182/0x190
<4> [<ffffffff81302330>] acpi_ps_parse_aml+0x9f/0x2de
<4> [<ffffffff81303aa8>] acpi_ps_execute_method+0x1e9/0x2b9
<4> [<ffffffff812ff24e>] acpi_ns_evaluate+0xe6/0x1ad
<4> [<ffffffff81304dc5>] acpi_rs_set_srs_method_data+0xe8/0x110
<4> [<ffffffff813076b8>] ? acpi_exception+0x75/0x80
<4> [<ffffffff813048c7>] acpi_set_current_resources+0x3c/0x4a
<4> [<ffffffff812ec8b4>] acpi_pci_link_set+0x133/0x1f0
<4> [<ffffffff812ec9a9>] irqrouter_resume+0x38/0x4b
<4> [<ffffffff813632e5>] __sysdev_resume+0x25/0xe0
<4> [<ffffffff81363421>] sysdev_resume+0x81/0x160
<4> [<ffffffff810bbdfe>] hibernation_snapshot+0x20e/0x250
<4> [<ffffffff810bbf1c>] hibernate+0xdc/0x230
<4> [<ffffffff810ba61c>] state_store+0xec/0x100
<4> [<ffffffff811624fa>] ? alloc_pages_current+0xaa/0x110
<4> [<ffffffff8127df27>] kobj_attr_store+0x17/0x20
<4> [<ffffffff811fe225>] sysfs_write_file+0xe5/0x170
<4> [<ffffffff811838b8>] vfs_write+0xb8/0x1a0
<4> [<ffffffff811841b1>] sys_write+0x51/0x90
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 4d 89 fc e9 a3 fe ff ff 55 48 89 e5 0f 1f 44 00 00 c9 81 ff 00 01 00 00 19 c0 83 c0 02 c3 55 48 89 e5 0f 1f 44 00 00 48 8b 57 08 <0f> b6 02 66 83 f8 5b 75 07 0f b6 42 01 80 cc 5b c9 c3 55 48 89
<1>RIP  [<ffffffff8130227f>] acpi_ps_peek_opcode+0xd/0x1f
<4> RSP <ffff8800379cbb78>
<4>CR2: 0000000000000000
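
For triage, the call trace above can be pulled out of the raw oops text mechanically. The following is a minimal Python sketch and not part of the original report; the function name parse_call_trace and the regular expression are illustrative assumptions based on the frame format shown in this log. Entries prefixed with "?" are flagged, since the kernel prints that marker for addresses found on the stack that may not belong to the real call chain.

```python
import re

# Frame format assumed from the log above, e.g.
#   <4> [<ffffffff81302be3>] acpi_ps_parse_loop+0x189/0x946
#   <4> [<ffffffff8116a862>] ? kmem_cache_alloc+0x182/0x190
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_call_trace(log_text):
    """Return the frames listed after 'Call Trace:' as a list of dicts."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break  # the first non-frame line (e.g. "Code:") ends the trace
        frames.append({
            "address": int(match.group("addr"), 16),
            "symbol": match.group("symbol"),
            "offset": int(match.group("offset"), 16),
            "reliable": match.group("unreliable") is None,
        })
    return frames


if __name__ == "__main__":
    sample = """\
<4>Call Trace:
<4> [<ffffffff81302be3>] acpi_ps_parse_loop+0x189/0x946
<4> [<ffffffff8116a862>] ? kmem_cache_alloc+0x182/0x190
<4> [<ffffffff81302330>] acpi_ps_parse_aml+0x9f/0x2de
<4>Code: 4d 89 fc
"""
    for frame in parse_call_trace(sample):
        marker = "" if frame["reliable"] else " (unreliable)"
        print(f'{frame["symbol"]}+{frame["offset"]:#x}{marker}')
```

Run against the full log, this lists the path from sys_write through hibernate and irqrouter_resume down to acpi_ps_parse_loop, which is the frame that called the faulting acpi_ps_peek_opcode.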


Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.382.el6.x86_64
2.6.32-403.el6.x86_64 kernel (both host and guest)

How reproducible:
100%

Steps to Reproduce:
1. Launch a RHEL 6.5 guest.
CLI:
/usr/libexec/qemu-kvm -name test -M rhel6.5.0 -cpu host -enable-kvm -m 4096 -smp 8,sockets=2,cores=2,threads=2,maxcpus=160 -rtc base=utc,clock=host,driftfix=slew -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/home/test-x86_64.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:42:48:ab,bus=pci.0,addr=0x3,bootindex=3 -spice port=5900,disable-ticketing,seamless-migration=on -k en-us -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=33554432 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -balloon virtio -monitor stdio -serial unix:/tmp/serial,server,nowait -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0

2. Hot-plug vCPUs into the guest.
3. Suspend the guest to disk (S4).
4. Adjust the vCPU count on the command line to match the number of vCPUs now present, then resume the guest.
CLI:
 /usr/libexec/qemu-kvm -name test -M rhel6.5.0 -cpu host -enable-kvm -m 4096 -smp 20,sockets=2,cores=2,threads=2,maxcpus=160 -rtc base=utc,clock=host,driftfix=slew -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/home/test-x86_64.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:42:48:ab,bus=pci.0,addr=0x3,bootindex=3 -spice port=5900,disable-ticketing,seamless-migration=on -k en-us -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=33554432 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -balloon virtio -monitor stdio -serial unix:/tmp/serial,server,nowait -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0
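
Step 4 amounts to rewriting the leading count of the -smp option while leaving the rest of the command line alone. A minimal Python sketch of that edit follows; set_smp_count is a hypothetical helper, not a qemu-kvm tool, and it assumes the count is given either positionally (as in the commands above) or as cpus=N.

```python
import shlex


def set_smp_count(cmdline, new_count):
    """Return cmdline with the vCPU count of its -smp option replaced.

    Only the leading count is changed; sockets, cores, threads and
    maxcpus are left exactly as they were.
    """
    args = shlex.split(cmdline)
    idx = args.index("-smp")  # raises ValueError if -smp is absent
    fields = args[idx + 1].split(",")
    if fields[0].startswith("cpus="):
        fields[0] = f"cpus={new_count}"
    elif fields[0].isdigit():
        fields[0] = str(new_count)
    else:
        raise ValueError(f"unrecognised -smp value: {args[idx + 1]!r}")
    args[idx + 1] = ",".join(fields)
    return shlex.join(args)


if __name__ == "__main__":
    boot = ("/usr/libexec/qemu-kvm -name test -m 4096 "
            "-smp 8,sockets=2,cores=2,threads=2,maxcpus=160")
    print(set_smp_count(boot, 20))
```

Because the helper touches only the count, the sockets, cores, and threads values need to be reviewed separately whenever the count changes.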


Actual results:
The guest kernel panicked.

Expected results:
The guest resumes from S4 without a kernel panic.


Additional info:
*NOTE*: I added 'acpi_sleep=s4_nohwsig' to the guest kernel command line to work around another bug:
Bug 967107 - Brickland: panic on resume from hibernation

 
Detailed vmcore:
      KERNEL: /usr/lib/debug/lib/modules/2.6.32-403.el6.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2013-08-05-11:26:54/vmcore  [PARTIAL DUMP]
        CPUS: 19
        DATE: Mon Aug  5 11:28:34 2013
      UPTIME: 00:01:52
LOAD AVERAGE: 1.23, 0.40, 0.14
       TASKS: 273
    NODENAME: unused
     RELEASE: 2.6.32-403.el6.x86_64
     VERSION: #1 SMP Mon Jul 29 09:46:32 EDT 2013
     MACHINE: x86_64  (2666 Mhz)
      MEMORY: 4 GB
       PANIC: "Oops: 0000 [#1] SMP " (check log for details)
         PID: 2717
     COMMAND: "bash"
        TASK: ffff8801189b2040  [THREAD_INFO: ffff8800379ca000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 2717   TASK: ffff8801189b2040  CPU: 0   COMMAND: "bash"
 #0 [ffff8800379cb740] machine_kexec at ffffffff8103668b
 #1 [ffff8800379cb7a0] crash_kexec at ffffffff810c1cc2
 #2 [ffff8800379cb870] oops_end at ffffffff8151f700
 #3 [ffff8800379cb8a0] no_context at ffffffff81046dfb
 #4 [ffff8800379cb8f0] __bad_area_nosemaphore at ffffffff81047085
 #5 [ffff8800379cb940] bad_area at ffffffff810471ae
 #6 [ffff8800379cb970] __do_page_fault at ffffffff8104795f
 #7 [ffff8800379cba90] do_page_fault at ffffffff8152164e
 #8 [ffff8800379cbac0] page_fault at ffffffff8151ea05
    [exception RIP: acpi_ps_peek_opcode+13]
    RIP: ffffffff8130227f  RSP: ffff8800379cbb78  RFLAGS: 00010093
    RAX: 00000000ff9eda0d  RBX: ffff880037bf5800  RCX: ffffc90000612604
    RDX: 0000000000000000  RSI: ffff8801194824b0  RDI: ffff880037bf5830
    RBP: ffff8800379cbb78   R8: 0000000000000001   R9: 00000000fffffffc
    R10: 0000000000000002  R11: 0000000000000002  R12: 0000000000000000
    R13: 0000000000000000  R14: ffff8801194824f8  R15: ffff880037bf5830
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff8800379cbb80] acpi_ps_parse_loop at ffffffff81302be3
#10 [ffff8800379cbc00] acpi_ps_parse_aml at ffffffff81302330
#11 [ffff8800379cbc40] acpi_ps_execute_method at ffffffff81303aa8
#12 [ffff8800379cbc80] acpi_ns_evaluate at ffffffff812ff24e
#13 [ffff8800379cbcb0] acpi_rs_set_srs_method_data at ffffffff81304dc5
#14 [ffff8800379cbd10] acpi_set_current_resources at ffffffff813048c7
#15 [ffff8800379cbd40] acpi_pci_link_set at ffffffff812ec8b4
#16 [ffff8800379cbd80] irqrouter_resume at ffffffff812ec9a9
#17 [ffff8800379cbda0] __sysdev_resume at ffffffff813632e5
#18 [ffff8800379cbdd0] sysdev_resume at ffffffff81363421
#19 [ffff8800379cbe00] hibernation_snapshot at ffffffff810bbdfe
#20 [ffff8800379cbe20] hibernate at ffffffff810bbf1c
#21 [ffff8800379cbe40] state_store at ffffffff810ba61c
#22 [ffff8800379cbe90] kobj_attr_store at ffffffff8127df27
#23 [ffff8800379cbea0] sysfs_write_file at ffffffff811fe225
#24 [ffff8800379cbef0] vfs_write at ffffffff811838b8
#25 [ffff8800379cbf30] sys_write at ffffffff811841b1
#26 [ffff8800379cbf80] system_call_fastpath at ffffffff8100b072
    RIP: 0000003c7bcdb660  RSP: 00007fff6a1c2f28  RFLAGS: 00010206
    RAX: 0000000000000001  RBX: ffffffff8100b072  RCX: 0000000000000000
    RDX: 0000000000000005  RSI: 00007fc089a7f000  RDI: 0000000000000001
    RBP: 00007fc089a7f000   R8: 000000000000000a   R9: 00007fc089a77700
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000005
    R13: 0000003c7bf8e780  R14: 0000000000000005  R15: 0000000000000000
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
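
The Code: line of the oops in the description marks the first byte of the faulting instruction with angle brackets. The following minimal Python sketch locates that byte; parse_code_line is a hypothetical helper name, and the sample is copied from the log in this report.

```python
def parse_code_line(line):
    """Split an oops 'Code:' line into byte values and the fault index.

    Returns (data, fault_index), where fault_index is the position of the
    byte printed as <xx>, or None if no byte is marked.
    """
    _, _, hex_part = line.partition("Code:")
    data = []
    fault_index = None
    for token in hex_part.split():
        if token.startswith("<") and token.endswith(">"):
            fault_index = len(data)
            token = token[1:-1]
        data.append(int(token, 16))
    return data, fault_index


if __name__ == "__main__":
    code_line = (
        "<4>Code: 4d 89 fc e9 a3 fe ff ff 55 48 89 e5 0f 1f 44 00 00 c9 81 ff "
        "00 01 00 00 19 c0 83 c0 02 c3 55 48 89 e5 0f 1f 44 00 00 48 8b 57 08 "
        "<0f> b6 02 66 83 f8 5b 75 07 0f b6 42 01 80 cc 5b c9 c3 55 48 89"
    )
    data, idx = parse_code_line(code_line)
    print(idx, " ".join(f"{b:02x}" for b in data[idx:idx + 3]))
```

In this report the marked bytes are 0f b6 02, which on x86-64 encode movzbl (%rdx),%eax. With RDX = 0 in the register dump, that load would fault at address 0, which is consistent with CR2 = 0 and the NULL pointer dereference message.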
Comment 2 Qunfang Zhang 2013-08-05 01:27:51 EDT
Hi, chayang
The command line used to resume the guest shows "-smp 20,sockets=2,cores=2,threads=2,maxcpus=160", but sockets * cores * threads (2 * 2 * 2 = 8) does not equal 20. Could you also adjust the other "-smp" options and retry? Thanks.
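
The mismatch pointed out here can be checked mechanically. Below is a minimal Python sketch; the helper names are hypothetical, and the rule it applies (the sockets * cores * threads product must be at least the requested vCPU count, with missing fields defaulting to 1) is an assumption drawn from this comment, not a statement of how qemu-kvm validates -smp.

```python
def parse_smp(value):
    """Parse an -smp value such as '20,sockets=2,cores=2,threads=2,maxcpus=160'."""
    opts = {}
    for position, field in enumerate(value.split(",")):
        if "=" in field:
            key, _, number = field.partition("=")
            opts[key] = int(number)
        elif position == 0:
            opts["cpus"] = int(field)  # leading positional count
        else:
            raise ValueError(f"unexpected -smp field: {field!r}")
    return opts


def topology_capacity(opts):
    """sockets * cores * threads; fields that are absent count as 1 (assumption)."""
    return opts.get("sockets", 1) * opts.get("cores", 1) * opts.get("threads", 1)


def topology_covers_cpus(value):
    """True if the declared topology has room for the requested vCPU count."""
    opts = parse_smp(value)
    return topology_capacity(opts) >= opts["cpus"]


if __name__ == "__main__":
    for smp in ("8,sockets=2,cores=2,threads=2,maxcpus=160",
                "20,sockets=2,cores=2,threads=2,maxcpus=160"):
        opts = parse_smp(smp)
        print(smp, "->", topology_capacity(opts), topology_covers_cpus(smp))
```

With the values from the description, the boot command passes this check (8 slots for 8 vCPUs) while the resume command does not (8 slots for 20 vCPUs).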
Comment 3 Chao Yang 2013-08-12 22:32:03 EDT
(In reply to Qunfang Zhang from comment #2)
> Hi, chayang
> The command line used to resume the guest shows "-smp
> 20,sockets=2,cores=2,threads=2,maxcpus=160", but sockets * cores * threads
> (2 * 2 * 2 = 8) does not equal 20. Could you also adjust the other "-smp"
> options and retry? Thanks.

Thanks for noticing this.
I checked how virt-manager starts a qemu-kvm instance, then retried.

Steps:
1. Boot a RHEL guest with:
/usr/libexec/qemu-kvm -name test -M rhel6.5.0 -cpu host -enable-kvm -m 4096 -smp 8,sockets=20,cores=4,threads=2,maxcpus=160

2. Hot-plug vCPUs into the guest, up to a total of 20.

3. Suspend the guest to disk.

4. Launch the same command line, but change -smp to match the current vCPU count:
/usr/libexec/qemu-kvm -name test -M rhel6.5.0 -cpu host -enable-kvm -m 4096 -smp 20,sockets=20,cores=4,threads=2,maxcpus=160

Actual result:
The guest resumed correctly, but resuming from S4 took much longer. A similar bug already exists: Bug 996038.
Comment 4 Qunfang Zhang 2013-08-12 22:45:32 EDT
Hi, chayang
So the problem reported in this bug no longer occurs, right?
Comment 5 Chao Yang 2013-08-12 22:55:39 EDT
(In reply to Qunfang Zhang from comment #4)
> Hi, chayang
> So the problem reported in this bug no longer occurs, right?

Actually, I believe this is not a bug.
