992282 – PANIC: "Oops: 0000 [#1] SMP " when resuming from S4 after hot plugging vcpu into guest


Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	qemu-kvm
Sub Component:
Version:	6.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Virtualization Maintenance
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-08-05 03:43 UTC by Chao Yang
Modified:	2013-08-13 02:56 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-08-13 02:56:08 UTC
Target Upstream Version:
Embargoed:

Description Chao Yang 2013-08-05 03:43:36 UTC

Description of problem:
Launched a RHEL 6.5 guest, hot-plugged vCPUs into it, and suspended it to disk (S4). The guest kernel panicked when trying to resume.

<6>kvm-clock: cpu 0, msr 0:2c0167c1, primary cpu clock, resume
<6>PM: Restoring platform NVS memory
<4>ACPI Error: Found unknown opcode D0 at AML address ffffc900006123b6 offset 3, ignoring (20090903/psloop-140)
<4>ACPI Error: Found unknown opcode 7 at AML address ffffc900006123b7 offset 4, ignoring (20090903/psloop-140)
<4>ACPI Error: Found unknown opcode D0 at AML address ffffc900006123b8 offset 5, ignoring (20090903/psloop-140)
<4>ACPI Error: Found unknown opcode 7 at AML address ffffc900006123b9 offset 6, ignoring (20090903/psloop-140)
<4>ACPI Error (psargs-0359): [A^Aÿÿ] Namespace lookup failure, AE_NOT_FOUND
<4>ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.LNKA._SRS] (Node ffff88011e6da2b8), AE_NOT_FOUND
<4>ACPI Exception: AE_NOT_FOUND, Evaluating _SRS (20090903/pci_link-367)
<4>ACPI Error (psargs-0359): [CPU ] Namespace lookup failure, AE_NOT_FOUND
<4>ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.LNKB._SRS] (Node ffff88011e6dafd8), AE_NOT_FOUND
<4>ACPI Exception: AE_NOT_FOUND, Evaluating _SRS (20090903/pci_link-367)
<4>ACPI Error: Found unknown opcode 4 at AML address ffffc90000612537 offset 4, ignoring (20090903/psloop-140)
<4>ACPI Error: Found unknown opcode 20 at AML address ffffc90000612538 offset 5, ignoring (20090903/psloop-140)
<4>ACPI Error: Needed type [Reference], found [Integer] ffff88011d8a31e0 (20090903/exresop-104)
<4>ACPI Exception: AE_AML_OPERAND_TYPE, While resolving operands for [OpcodeName unavailable] (20090903/dswexec-445)
<4>ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.LNKC._SRS] (Node ffff88011e6dae98), AE_AML_OPERAND_TYPE
<4>ACPI Exception: AE_AML_OPERAND_TYPE, Evaluating _SRS (20090903/pci_link-367)
<4>ACPI Error: Found unknown opcode C2 at AML address ffffc900006125f3 offset 0, ignoring (20090903/psloop-140)
<4>ACPI Exception: AE_CTRL_PENDING, While creating Arg 1 (20090903/dsutils-763)
<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [<ffffffff8130227f>] acpi_ps_peek_opcode+0xd/0x1f
<4>PGD 37991067 PUD 37990067 PMD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/power/state
<4>CPU 0
<4>Modules linked in: fuse autofs4 sunrpc 8021q garp stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput ppdev parport_pc parport sg microcode virtio_balloon snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk sr_mod cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
<4>
<4>Pid: 2717, comm: bash Not tainted 2.6.32-403.el6.x86_64 #1 Red Hat KVM
<4>RIP: 0010:[<ffffffff8130227f>]  [<ffffffff8130227f>] acpi_ps_peek_opcode+0xd/0x1f
<4>RSP: 0018:ffff8800379cbb78  EFLAGS: 00010093
<4>RAX: 00000000ff9eda0d RBX: ffff880037bf5800 RCX: ffffc90000612604
<4>RDX: 0000000000000000 RSI: ffff8801194824b0 RDI: ffff880037bf5830
<4>RBP: ffff8800379cbb78 R08: 0000000000000001 R09: 00000000fffffffc
<4>R10: 0000000000000002 R11: 0000000000000002 R12: 0000000000000000
<4>R13: 0000000000000000 R14: ffff8801194824f8 R15: ffff880037bf5830
<4>FS:  00007fc089a77700(0000) GS:ffff88002c000000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: 0000000000000000 CR3: 000000003798e000 CR4: 00000000000007f0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process bash (pid: 2717, threadinfo ffff8800379ca000, task ffff8801189b2040)
<4>Stack:
<4> ffff8800379cbbf8 ffffffff81302be3 ffff880037bf5800 ffffffff8116a862
<4><d> ffff8800379cbbb8 ffff880037bf5800 ffff8801149c7438 0000000000000000
<4><d> ffff8801194824f8 0000000000000000 ffff8800379cbbd8 ffff880037bf5800
<4>Call Trace:
<4> [<ffffffff81302be3>] acpi_ps_parse_loop+0x189/0x946
<4> [<ffffffff8116a862>] ? kmem_cache_alloc+0x182/0x190
<4> [<ffffffff81302330>] acpi_ps_parse_aml+0x9f/0x2de
<4> [<ffffffff81303aa8>] acpi_ps_execute_method+0x1e9/0x2b9
<4> [<ffffffff812ff24e>] acpi_ns_evaluate+0xe6/0x1ad
<4> [<ffffffff81304dc5>] acpi_rs_set_srs_method_data+0xe8/0x110
<4> [<ffffffff813076b8>] ? acpi_exception+0x75/0x80
<4> [<ffffffff813048c7>] acpi_set_current_resources+0x3c/0x4a
<4> [<ffffffff812ec8b4>] acpi_pci_link_set+0x133/0x1f0
<4> [<ffffffff812ec9a9>] irqrouter_resume+0x38/0x4b
<4> [<ffffffff813632e5>] __sysdev_resume+0x25/0xe0
<4> [<ffffffff81363421>] sysdev_resume+0x81/0x160
<4> [<ffffffff810bbdfe>] hibernation_snapshot+0x20e/0x250
<4> [<ffffffff810bbf1c>] hibernate+0xdc/0x230
<4> [<ffffffff810ba61c>] state_store+0xec/0x100
<4> [<ffffffff811624fa>] ? alloc_pages_current+0xaa/0x110
<4> [<ffffffff8127df27>] kobj_attr_store+0x17/0x20
<4> [<ffffffff811fe225>] sysfs_write_file+0xe5/0x170
<4> [<ffffffff811838b8>] vfs_write+0xb8/0x1a0
<4> [<ffffffff811841b1>] sys_write+0x51/0x90
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 4d 89 fc e9 a3 fe ff ff 55 48 89 e5 0f 1f 44 00 00 c9 81 ff 00 01 00 00 19 c0 83 c0 02 c3 55 48 89 e5 0f 1f 44 00 00 48 8b 57 08 <0f> b6 02 66 83 f8 5b 75 07 0f b6 42 01 80 cc 5b c9 c3 55 48 89
<1>RIP  [<ffffffff8130227f>] acpi_ps_peek_opcode+0xd/0x1f
<4> RSP <ffff8800379cbb78>
<4>CR2: 0000000000000000
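
The "Code:" line above marks the faulting instruction with angle brackets. As a rough aid for reading it, here is a minimal Python sketch that splits the byte dump at that marker. The function name `split_oops_code` is hypothetical, and the instruction decoding in the comments is a hand reading of the bytes, not something stated in this report.

```python
# Sketch: split an oops "Code:" line at the <xx> marker that flags the
# first byte of the faulting instruction.
CODE_LINE = (
    "<4>Code: 4d 89 fc e9 a3 fe ff ff 55 48 89 e5 0f 1f 44 00 00 c9 81 ff "
    "00 01 00 00 19 c0 83 c0 02 c3 55 48 89 e5 0f 1f 44 00 00 48 8b 57 08 "
    "<0f> b6 02 66 83 f8 5b 75 07 0f b6 42 01 80 cc 5b c9 c3 55 48 89"
)


def split_oops_code(line):
    """Return (bytes_before_fault, bytes_from_fault) for a 'Code:' line."""
    tokens = line.split("Code:", 1)[1].split()
    before, after = [], []
    seen_marker = False
    for tok in tokens:
        if tok.startswith("<") and tok.endswith(">"):
            seen_marker = True
            tok = tok[1:-1]  # strip the angle brackets
        (after if seen_marker else before).append(int(tok, 16))
    if not seen_marker:
        raise ValueError("no <xx> fault marker found in Code: line")
    return bytes(before), bytes(after)


before, fault = split_oops_code(CODE_LINE)

# Hand decoding (an assumption, not tool output):
#   48 8b 57 08   mov 0x8(%rdi),%rdx   <- last instruction before the fault
#   0f b6 02      movzbl (%rdx),%eax   <- faulting load
print("preceding:", before[-4:].hex(" "))
print("faulting: ", fault[:3].hex(" "))
```

If that reading is right, the faulting load dereferences RDX, which the register dump shows as 0, matching CR2 = 0 and the "NULL pointer dereference" message. The 13 bytes between the function prologue (55 48 89 e5) and the marker also agree with the reported offset acpi_ps_peek_opcode+0xd.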


Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.382.el6.x86_64
kernel 2.6.32-403.el6.x86_64 (both host and guest)

How reproducible:
100%

Steps to Reproduce:
1. Launch a RHEL 6.5 guest.
CLI:
/usr/libexec/qemu-kvm -name test -M rhel6.5.0 -cpu host -enable-kvm -m 4096 -smp 8,sockets=2,cores=2,threads=2,maxcpus=160 -rtc base=utc,clock=host,driftfix=slew -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/home/test-x86_64.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:42:48:ab,bus=pci.0,addr=0x3,bootindex=3 -spice port=5900,disable-ticketing,seamless-migration=on -k en-us -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=33554432 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -balloon virtio -monitor stdio -serial unix:/tmp/serial,server,nowait -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0

2. Hot-plug vCPUs into the guest.
3. Suspend the guest to disk.
4. Adjust the vCPU count in the command line to match the number now present in the guest, then resume the guest.
CLI:
 /usr/libexec/qemu-kvm -name test -M rhel6.5.0 -cpu host -enable-kvm -m 4096 -smp 20,sockets=2,cores=2,threads=2,maxcpus=160 -rtc base=utc,clock=host,driftfix=slew -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/home/test-x86_64.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:42:48:ab,bus=pci.0,addr=0x3,bootindex=3 -spice port=5900,disable-ticketing,seamless-migration=on -k en-us -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=33554432 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -balloon virtio -monitor stdio -serial unix:/tmp/serial,server,nowait -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0


Actual results:
The guest kernel panicked.

Expected results:


Additional info:
*NOTE*: I added 'acpi_sleep=s4_nohwsig' to the guest kernel command line to work around another bug:
Bug 967107 - Brickland: panic on resume from hibernation

 
Detailed vmcore:
      KERNEL: /usr/lib/debug/lib/modules/2.6.32-403.el6.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2013-08-05-11:26:54/vmcore  [PARTIAL DUMP]
        CPUS: 19
        DATE: Mon Aug  5 11:28:34 2013
      UPTIME: 00:01:52
LOAD AVERAGE: 1.23, 0.40, 0.14
       TASKS: 273
    NODENAME: unused
     RELEASE: 2.6.32-403.el6.x86_64
     VERSION: #1 SMP Mon Jul 29 09:46:32 EDT 2013
     MACHINE: x86_64  (2666 Mhz)
      MEMORY: 4 GB
       PANIC: "Oops: 0000 [#1] SMP " (check log for details)
         PID: 2717
     COMMAND: "bash"
        TASK: ffff8801189b2040  [THREAD_INFO: ffff8800379ca000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 2717   TASK: ffff8801189b2040  CPU: 0   COMMAND: "bash"
 #0 [ffff8800379cb740] machine_kexec at ffffffff8103668b
 #1 [ffff8800379cb7a0] crash_kexec at ffffffff810c1cc2
 #2 [ffff8800379cb870] oops_end at ffffffff8151f700
 #3 [ffff8800379cb8a0] no_context at ffffffff81046dfb
 #4 [ffff8800379cb8f0] __bad_area_nosemaphore at ffffffff81047085
 #5 [ffff8800379cb940] bad_area at ffffffff810471ae
 #6 [ffff8800379cb970] __do_page_fault at ffffffff8104795f
 #7 [ffff8800379cba90] do_page_fault at ffffffff8152164e
 #8 [ffff8800379cbac0] page_fault at ffffffff8151ea05
    [exception RIP: acpi_ps_peek_opcode+13]
    RIP: ffffffff8130227f  RSP: ffff8800379cbb78  RFLAGS: 00010093
    RAX: 00000000ff9eda0d  RBX: ffff880037bf5800  RCX: ffffc90000612604
    RDX: 0000000000000000  RSI: ffff8801194824b0  RDI: ffff880037bf5830
    RBP: ffff8800379cbb78   R8: 0000000000000001   R9: 00000000fffffffc
    R10: 0000000000000002  R11: 0000000000000002  R12: 0000000000000000
    R13: 0000000000000000  R14: ffff8801194824f8  R15: ffff880037bf5830
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff8800379cbb80] acpi_ps_parse_loop at ffffffff81302be3
#10 [ffff8800379cbc00] acpi_ps_parse_aml at ffffffff81302330
#11 [ffff8800379cbc40] acpi_ps_execute_method at ffffffff81303aa8
#12 [ffff8800379cbc80] acpi_ns_evaluate at ffffffff812ff24e
#13 [ffff8800379cbcb0] acpi_rs_set_srs_method_data at ffffffff81304dc5
#14 [ffff8800379cbd10] acpi_set_current_resources at ffffffff813048c7
#15 [ffff8800379cbd40] acpi_pci_link_set at ffffffff812ec8b4
#16 [ffff8800379cbd80] irqrouter_resume at ffffffff812ec9a9
#17 [ffff8800379cbda0] __sysdev_resume at ffffffff813632e5
#18 [ffff8800379cbdd0] sysdev_resume at ffffffff81363421
#19 [ffff8800379cbe00] hibernation_snapshot at ffffffff810bbdfe
#20 [ffff8800379cbe20] hibernate at ffffffff810bbf1c
#21 [ffff8800379cbe40] state_store at ffffffff810ba61c
#22 [ffff8800379cbe90] kobj_attr_store at ffffffff8127df27
#23 [ffff8800379cbea0] sysfs_write_file at ffffffff811fe225
#24 [ffff8800379cbef0] vfs_write at ffffffff811838b8
#25 [ffff8800379cbf30] sys_write at ffffffff811841b1
#26 [ffff8800379cbf80] system_call_fastpath at ffffffff8100b072
    RIP: 0000003c7bcdb660  RSP: 00007fff6a1c2f28  RFLAGS: 00010206
    RAX: 0000000000000001  RBX: ffffffff8100b072  RCX: 0000000000000000
    RDX: 0000000000000005  RSI: 00007fc089a7f000  RDI: 0000000000000001
    RBP: 00007fc089a7f000   R8: 000000000000000a   R9: 00007fc089a77700
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000005
    R13: 0000003c7bf8e780  R14: 0000000000000005  R15: 0000000000000000
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

Comment 2 Qunfang Zhang 2013-08-05 05:27:51 UTC

Hi, chayang
The command line used to resume the guest has "-smp 20,sockets=2,cores=2,threads=2,maxcpus=160", but sockets * cores * threads does not equal 20. Could you also adjust the other "-smp" options and retry? Thanks.

Comment 3 Chao Yang 2013-08-13 02:32:03 UTC

(In reply to Qunfang Zhang from comment #2)
> Hi, chayang
> From the cli when resuming the guest, it shows: "-smp
> 20,sockets=2,cores=2,threads=2,maxcpus=160". And socket * cores * threads
> does not equals to 20. Could you also adjust other options for "-smp" and
> re-try? Thanks.

Thanks for noticing this.
I checked how virt-manager starts a qemu-kvm instance, then retried.

Steps:
1. Boot a RHEL guest with:
/usr/libexec/qemu-kvm -name test -M rhel6.5.0 -cpu host -enable-kvm -m 4096 -smp 8,sockets=20,cores=4,threads=2,maxcpus=160

2. Hot-plug CPUs into the guest, up to 20.

3. Suspend to disk.

4. Launch the same command line, with -smp modified to reflect the new CPU count:
/usr/libexec/qemu-kvm -name test -M rhel6.5.0 -cpu host -enable-kvm -m 4096 -smp 20,sockets=20,cores=4,threads=2,maxcpus=160

Actual Result:
The guest resumed correctly, but resuming from S4 took much longer. A similar bug already exists: Bug 996038
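
The -smp mismatch raised in comment 2 can be checked mechanically. Below is a minimal Python sketch that parses a "-smp" value and compares the CPU count against sockets * cores * threads. The helper names (`parse_smp`, `check_smp`) are hypothetical, and the rule "cpus must not exceed the topology product" is an assumption inferred from this thread, not a statement of how qemu-kvm itself validates the option.

```python
def parse_smp(value):
    """Parse a '-smp' value such as '20,sockets=2,cores=2,threads=2,maxcpus=160'."""
    fields = {}
    for index, part in enumerate(value.split(",")):
        if "=" in part:
            key, number = part.split("=", 1)
            fields[key] = int(number)
        elif index == 0:
            # The leading bare number is the initial CPU count.
            fields["cpus"] = int(part)
    return fields


def check_smp(value):
    """Report whether the CPU count fits in sockets * cores * threads."""
    smp = parse_smp(value)
    capacity = smp["sockets"] * smp["cores"] * smp["threads"]
    return {
        "cpus": smp["cpus"],
        "capacity": capacity,
        "fits": smp["cpus"] <= capacity,  # assumed sanity rule, see lead-in
    }


# The resume command line from the original description: 20 CPUs, topology of 8.
print(check_smp("20,sockets=2,cores=2,threads=2,maxcpus=160"))
# The resume command line from comment 3: 20 CPUs, topology of 160.
print(check_smp("20,sockets=20,cores=4,threads=2,maxcpus=160"))
```

Under that assumed rule, the first command line describes a topology with room for only 8 CPUs while asking for 20, whereas the second has room for 160, which also matches maxcpus.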

Comment 4 Qunfang Zhang 2013-08-13 02:45:32 UTC

Hi, chayang
So this bug itself does not exist any more, right?

Comment 5 Chao Yang 2013-08-13 02:55:39 UTC

(In reply to Qunfang Zhang from comment #4)
> Hi, chayang
> So this bug itself does not exist any more, right?

Actually, I believe this is not a bug.
