Bug 981437

Summary: kernel panic when using VMs [via kvm] kernel BUG at kernel/timer.c:729!
Product: [Fedora] Fedora
Reporter: Satish Balay <balay>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 19
CC: amwang, error, gansalmon, hancockrwd, itamar, jonathan, kernel-maint, madhu.chinakonda, mrsam, peter.devalck, rc556677
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-05 12:45:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
kernel panic stack trace (flags: none)

Description Satish Balay 2013-07-04 16:45:16 UTC
Created attachment 768916
kernel panic stack trace

Description of problem:

The kernel panics when shutting down a Win7 VM.

Version-Release number of selected component (if applicable):

kernel-3.9.9-301.fc19.x86_64
qemu-kvm-1.4.2-4.fc19.x86_64

How reproducible:

Always [so far]

Steps to Reproduce:
1. F19 upgraded from F18 around beta time [F18 was itself upgraded from F17]
2. Have all updates from testing installed
3. Run a Win7 VM via virt-manager
4. Run chkdsk /f c: and reboot [the guest reboots and then runs chkdsk]
5. After chkdsk is done, attempt to shut down the Win7 VM

Actual results:

get a kernel panic

Expected results:

smooth sailing?

Additional info:

kernel-3.9.6-301.fc19.x86_64 does not give this error.

But I've seen this issue with kernel-3.9.8-300.fc19.x86_64 and kernel-3.10.0-2.fc20.x86_64.

I'm having other issues with using VMs on F19, e.g. bug 956306. I was trying out 3.10.0-2.fc20.x86_64 with an external monitor and a couple of VMs [Win7, rawhide], and the machine would crash with a blank screen.

My subsequent attempt to check whether the VM's filesystem was corrupted [with chkdsk] would also crash. After a few retries and some elimination, I have this reproducible error [when not using an external monitor], and a prior kernel which doesn't show this issue.

Attaching a picture of the stack trace

Comment 1 pdvalck 2013-07-04 17:19:19 UTC
I experience the same bug on 3.9.8-300.fc19.x86_64.

The kernel panic happens when shutting down a Win7 guest but has not happened so far with a WinXP guest. The Win7 VM has 2 CPUs instead of 1 and uses the virtio disk and network drivers.

Comment 2 Satish Balay 2013-07-04 17:39:07 UTC
I have 4 CPUs on the Win7 VM and also use the virtio disk and network drivers.

Comment 3 Sam Varshavchik 2013-07-04 18:24:28 UTC
Win7 QEMU VM, one CPU, panics the kernel when the VM shuts down.

I captured a dump of this panic.

      KERNEL: /usr/lib/debug/lib/modules/3.9.8-300.fc19.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2013.07.04-14:01:30/vmcore  [PARTIAL DUMP]
        CPUS: 8
        DATE: Thu Jul  4 14:01:25 2013
      UPTIME: 00:02:25
LOAD AVERAGE: 1.15, 0.61, 0.24
       TASKS: 449
    NODENAME: thinkpad
     RELEASE: 3.9.8-300.fc19.x86_64
     VERSION: #1 SMP Thu Jun 27 19:24:23 UTC 2013
     MACHINE: x86_64  (2392 Mhz)
      MEMORY: 8 GB
       PANIC: "kernel BUG at kernel/timer.c:729!"
         PID: 1902
     COMMAND: "qemu-system-x86"
        TASK: ffff880209545dc0  [THREAD_INFO: ffff880214d24000]
         CPU: 3
       STATE: TASK_RUNNING (PANIC)
crash> bt -s
PID: 1902   TASK: ffff880209545dc0  CPU: 3   COMMAND: "qemu-system-x86"
 #0 [ffff880214d258f0] machine_kexec+386 at ffffffff8103dc52
 #1 [ffff880214d25940] crash_kexec+99 at ffffffff810c69b3
 #2 [ffff880214d25a08] oops_end+176 at ffffffff81647cf0
 #3 [ffff880214d25a30] die+75 at ffffffff810168bb
 #4 [ffff880214d25a60] do_trap+96 at ffffffff816475b0
 #5 [ffff880214d25ab0] do_invalid_op+149 at ffffffff81013f55
 #6 [ffff880214d25b50] invalid_op+30 at ffffffff8165011e
    [exception RIP: __mod_timer.part.39+4]
    RIP: ffffffff8163ca8b  RSP: ffff880214d25c08  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff880206888880  RCX: ffffffff81ce2b70
    RDX: 0000000000000000  RSI: 00000000fffda5c3  RDI: ffff880206888880
    RBP: ffff880214d25c08   R8: 00000000d5bec7ac   R9: 00000000f69ac714
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 00000000fffda5c3  R14: ffff880228884818  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff880214d25c10] mod_timer+501 at ffffffff8106d905
 #8 [ffff880214d25c50] br_multicast_del_pg.isra.20+261 at ffffffffa0731d25 [bridge]
 #9 [ffff880214d25c80] br_multicast_disable_port+88 at ffffffffa0732948 [bridge]
#10 [ffff880214d25cb0] br_stp_disable_port+154 at ffffffffa072bcca [bridge]
#11 [ffff880214d25ce8] br_device_event+520 at ffffffffa072a4e8 [bridge]
#12 [ffff880214d25d18] notifier_call_chain+76 at ffffffff8164aafc
#13 [ffff880214d25d50] raw_notifier_call_chain+22 at ffffffff810858f6
#14 [ffff880214d25d60] call_netdevice_notifiers+45 at ffffffff81536aad
#15 [ffff880214d25d80] dev_close_many+183 at ffffffff81536d17
#16 [ffff880214d25dc0] rollback_registered_many+168 at ffffffff81537f68
#17 [ffff880214d25de8] rollback_registered+49 at ffffffff81538101
#18 [ffff880214d25e10] unregister_netdevice_queue+72 at ffffffff815390d8
#19 [ffff880214d25e30] __tun_detach+272 at ffffffffa074c2f0 [tun]
#20 [ffff880214d25e88] tun_chr_close+45 at ffffffffa074c4bd [tun]
#21 [ffff880214d25ea8] __fput+225 at ffffffff8119b1f1
#22 [ffff880214d25ef0] ____fput+14 at ffffffff8119b3fe
#23 [ffff880214d25f00] task_work_run+159 at ffffffff8107cf7f
#24 [ffff880214d25f30] do_notify_resume+97 at ffffffff810139e1
#25 [ffff880214d25f50] int_signal+18 at ffffffff8164f292
    RIP: 00007f90e262412d  RSP: 00007fff7007ef40  RFLAGS: 00000293
    RAX: 0000000000000000  RBX: 00007f90e6b90fb0  RCX: ffffffffffffffff
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000018
    RBP: 00007fff7007ef88   R8: 00007f90e6b90fb0   R9: 0000000000000000
    R10: 000000007fffffff  R11: 0000000000000293  R12: 0000000000000001
    R13: 00007f90e6b71560  R14: 0000000000000000  R
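
[Editorial sketch, not part of the original comment.] The backtrace above is easier to read when grouped by module: closing the tun device unregisters the netdevice, the bridge notifier then disables the port, and the multicast cleanup ends in mod_timer. A minimal Python sketch that pulls the frame number, symbol, and module out of `bt` output might look like this. The regular expression, the `parse_frames` name, and the "vmlinux" label for frames without a module tag are assumptions made for illustration; only a few frames from the trace are reproduced as sample input.

```python
import re

# One 'bt' frame line: "#N [stack-addr] symbol+offset at address [module]".
# The trailing "[module]" is absent for core kernel symbols.
FRAME_RE = re.compile(
    r"^\s*#(?P<idx>\d+)\s+\[(?P<sp>[0-9a-f]+)\]\s+"
    r"(?P<sym>\S+)\+(?P<off>\d+)\s+at\s+(?P<addr>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?\s*$"
)


def parse_frames(text):
    """Return (frame index, symbol, module) tuples for each frame line."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            # "vmlinux" is only a label chosen here for core kernel frames.
            frames.append((int(m["idx"]), m["sym"], m["mod"] or "vmlinux"))
    return frames


# A few frames copied from the backtrace in this comment.
SAMPLE = """\
 #7 [ffff880214d25c10] mod_timer+501 at ffffffff8106d905
 #8 [ffff880214d25c50] br_multicast_del_pg.isra.20+261 at ffffffffa0731d25 [bridge]
 #9 [ffff880214d25c80] br_multicast_disable_port+88 at ffffffffa0732948 [bridge]
#19 [ffff880214d25e30] __tun_detach+272 at ffffffffa074c2f0 [tun]
#20 [ffff880214d25e88] tun_chr_close+45 at ffffffffa074c4bd [tun]
"""

frames = parse_frames(SAMPLE)
for idx, sym, mod in frames:
    print(f"#{idx:<2} {mod:<8} {sym}")
```

Register dump lines and the exception RIP line do not match the frame pattern, so they are skipped rather than misparsed.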

Comment 4 Robert Hancock 2013-07-05 04:11:48 UTC
I have experienced the same problem with the 3.9.8-300.fc19.x86_64 kernel, also when shutting down a Windows 7 VM. I captured this partial oops output:

Jul  4 21:59:33 newcastle kernel: [  282.261107] kernel BUG at kernel/timer.c:729!
Jul  4 21:59:33 newcastle kernel: [  282.262870] invalid opcode: 0000 [#1] SMP
Jul  4 21:59:33 newcastle kernel: [  282.264554] Modules linked in: vhost_net macvtap macvlan fuse ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_CHECKSUM iptable_mangle tun bridge stp llc iTCO_wdt iTCO_vendor_support acpi_cpufreq mperf coretemp kvm_intel kvm crc32c_intel microcode serio_raw i7core_edac i2c_i801 edac_core snd_hda_codec_via joydev gspca_spca561 gspca_main videodev media snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm lpc_ich mfd_core r8169 mii snd_page_alloc snd_timer snd soundcore asus_atk0110 uinput binfmt_misc ata_generic pata_acpi nouveau firewire_ohci firewire_core mxm_wmi crc_itu_t video i2c_algo_bit drm_kms_helper pata_jmicron ttm drm usb_storage i2c_core wmi
Jul  4 21:59:33 newcastle kernel: [  282.272634] CPU 2
Jul  4 21:59:33 newcastle kernel: [  282.272651] Pid: 2041, comm: qemu-system-x86 Not tainted 3.9.8-300.fc19.x86_64 #1 System manufacturer System Product Name/P7P55D PRO
Jul  4 21:59:33 newcastle kernel: [  282.276218] RIP: 0010:[<ffffffff8163ca8b>] [<ffffffff8163ca8b>] __mod_timer.part.39+0x4/0x6
Jul  4 21:59:33 newcastle kernel: [  282.278495] RSP: 0018:ffff8804254d9c08  EFLAGS: 00010246
Jul  4 21:59:33 newcastle kernel: [  282.280650] RAX: 0000000000000000 RBX: ffff8803fa9e4940 RCX: ffffffff81ce2b70
Jul  4 21:59:33 newcastle kernel: [  282.282826] RDX: 0000000000000000 RSI: 00000000ffffa75d RDI: ffff8803fa9e4940
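
[Editorial sketch, not part of the original comment.] This oops reports the same BUG location and the same RIP address (ffffffff8163ca8b) as the crash dump in comment 3, which is what ties the reports together. A small Python sketch for extracting those two facts from syslog lines, so that reports can be compared mechanically, could look like the following. The function name `summarize_oops`, the regular expressions, and the returned dictionary keys are hypothetical; the sample lines are shortened copies of the log above.

```python
import re

# "kernel BUG at <file>:<line>!" gives the source location of the BUG_ON.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
# "RIP: 0010:[<addr>] [<addr>] symbol+0xoff/0xsize" gives the faulting symbol.
RIP_RE = re.compile(
    r"RIP: \S+\s+\[<[0-9a-f]+>\]\s+"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def summarize_oops(log):
    """Collect the BUG source location and faulting symbol from oops text."""
    info = {}
    for line in log.splitlines():
        bug = BUG_RE.search(line)
        if bug:
            info["file"] = bug["file"]
            info["line"] = int(bug["line"])
        rip = RIP_RE.search(line)
        if rip:
            info["symbol"] = rip["sym"]
            info["offset"] = int(rip["off"], 16)
            info["size"] = int(rip["size"], 16)
    return info


# Lines copied from the partial oops output in this comment.
SAMPLE = """\
Jul  4 21:59:33 newcastle kernel: [  282.261107] kernel BUG at kernel/timer.c:729!
Jul  4 21:59:33 newcastle kernel: [  282.262870] invalid opcode: 0000 [#1] SMP
Jul  4 21:59:33 newcastle kernel: [  282.276218] RIP: 0010:[<ffffffff8163ca8b>] [<ffffffff8163ca8b>] __mod_timer.part.39+0x4/0x6
"""

info = summarize_oops(SAMPLE)
print(info)
```

Two reports with the same file, line, and symbol are strong candidates for being duplicates, although matching fields alone do not prove a shared root cause.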

Comment 5 Richard Chan 2013-07-05 05:02:53 UTC
*** Bug 981495 has been marked as a duplicate of this bug. ***

Comment 6 Cong Wang 2013-07-05 08:36:59 UTC
See bug 980254.

There is a fix here:
https://bugzilla.redhat.com/show_bug.cgi?id=880035#c53

Could anyone try it? I can't reproduce it locally...

Comment 7 Josh Boyer 2013-07-05 12:45:03 UTC

*** This bug has been marked as a duplicate of bug 980254 ***

Comment 8 Satish Balay 2013-07-05 15:42:05 UTC
kernel-3.9.8-300.7.fc19.x86_64 does fix this problem for me. Thanks!