Bug 1019584

Summary: apic_timer_interrupt kernel panic in kvm
Product: [Fedora] Fedora
Reporter: Attila Fazekas <afazekas>
Component: kernel
Assignee: Vivek Goyal <vgoyal>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 19
CC: fedora-kernel-block, fedora-kernel-scsi, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, marcelo.barbosa, mtosatti
Target Milestone: ---
Flags: jforbes: needinfo?
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2014-03-10 14:40:05 UTC
Type: Bug
Attachments:
Description          Flags
serial_console.log   none
libvirt.xml          none

Description Attila Fazekas 2013-10-16 06:19:27 UTC
Created attachment 812768
serial_console.log

Description of problem:

Kernel panic on IRQ handling.

The instance was running an OpenStack test suite (Tempest) and performed various operations,
including iSCSI operations and running soft qemu processes.

I do not have a reproducer.

The full serial console log is attached.

Version-Release number of selected component (if applicable):
kernel 3.11.3-201.fc19.x86_64.debug (guest kernel, not tainted)

Additional info:
Host system:
kernel: 3.10.14-100.fc18.x86_64 
qemu-kvm-1.2.2-14.fc18.x86_64

Comment 1 Attila Fazekas 2013-10-16 06:20:19 UTC
Created attachment 812770
libvirt.xml

Comment 2 Attila Fazekas 2013-10-16 10:19:59 UTC
I created a virsh dump.

It contains the full memory contents of the virtual machine.

The original 3.8 GiB file was compressed with xz to 557,864,352 bytes.
https://docs.google.com/file/d/0B7DSkY_fWI88RzNjR1JET2pRbnc

Note:
The xfs_buf_iodone_work trace in the serial console log is probably unrelated. I tried to mount an unformatted loopback file, which obviously did not work.

Comment 3 Josh Boyer 2013-10-16 12:32:16 UTC
[ 2252.329039] general protection fault: 0000 [#1] SMP 
[ 2252.330008] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ebt_arp ebt_ip xt_nat xfs libcrc32c iptable_mangle kvm nbd iptable_nat nf_nat_ipv4 tun ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE nf_nat xt_CHECKSUM bridge stp llc joydev microcode nf_conntrack_ipv4 nf_defrag_ipv4 virtio_balloon xt_conntrack serio_raw virtio_net nf_conntrack cirrus ttm drm_kms_helper drm mperf i2c_piix4 i2c_core nfsd auth_rpcgss nfs_acl lockd binfmt_misc sunrpc crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_blk ata_generic pata_acpi [last unloaded: iptable_mangle]
[ 2252.330008] CPU: 0 PID: 20896 Comm: nova-conductor Not tainted 3.11.3-201.fc19.x86_64.debug #1
[ 2252.340295] Hardware name: Fedora Project OpenStack Nova, BIOS Bochs 01/01/2011
[ 2252.340295] task: ffff88005266a4b0 ti: ffff88003ad72000 task.ti: ffff88003ad72000
[ 2252.340295] RIP: 0010:[<ffffffff810e9754>]  [<ffffffff810e9754>] __lock_acquire+0x54/0x1b20
[ 2252.340295] RSP: 0000:ffff88011b203d20  EFLAGS: 00010046
[ 2252.340295] RAX: 0000000000000046 RBX: 0000000000000002 RCX: 0000000000000000
[ 2252.340295] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 6b6b6b6b6b6b6b83
[ 2252.340295] RBP: ffff88011b203dd0 R08: 0000000000000002 R09: 0000000000000001
[ 2252.340295] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88005266a4b0
[ 2252.340295] R13: 0000000000000000 R14: 6b6b6b6b6b6b6b83 R15: 0000000000000000
[ 2252.340295] FS:  00007fbf1d0b3740(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
[ 2252.340295] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2252.340295] CR2: 00007ffdad4d2000 CR3: 000000003cb63000 CR4: 00000000000407f0
[ 2252.359358] Stack:
[ 2252.359358]  ffff88005266a4b0 0000000000000269 ffffffff81c47d40 0000000000000000
[ 2252.359358]  ffff88011b203df8 ffffffff810e99f5 ffffffff810b77ff ffff88005266a4b0
[ 2252.359358]  000000035266abb0 ffff88005266a4b0 ffffffff81731876 ffff88011b203d88
[ 2252.359358] Call Trace:
[ 2252.359358]  <IRQ> 
[ 2252.359358]  [<ffffffff810e99f5>] ? __lock_acquire+0x2f5/0x1b20
[ 2252.359358]  [<ffffffff810b77ff>] ? local_clock+0x5f/0x70
[ 2252.359358]  [<ffffffff81731876>] ? _raw_spin_unlock_irqrestore+0x36/0x70
[ 2252.359358]  [<ffffffff810565bf>] ? kvm_clock_read+0x2f/0x50
[ 2252.359358]  [<ffffffff81021859>] ? sched_clock+0x9/0x10
[ 2252.359358]  [<ffffffff810b752d>] ? sched_clock_local+0x1d/0x80
[ 2252.359358]  [<ffffffff810eba12>] lock_acquire+0xa2/0x1f0
[ 2252.359358]  [<ffffffff8135d549>] ? __blkg_release_rcu+0x79/0x280
[ 2252.359358]  [<ffffffff81731772>] _raw_spin_lock_irq+0x52/0x90
[ 2252.359358]  [<ffffffff8135d549>] ? __blkg_release_rcu+0x79/0x280
[ 2252.359358]  [<ffffffff8135d549>] __blkg_release_rcu+0x79/0x280
[ 2252.359358]  [<ffffffff8135d5c0>] ? __blkg_release_rcu+0xf0/0x280
[ 2252.359358]  [<ffffffff81132e52>] rcu_process_callbacks+0x202/0x7d0
[ 2252.359358]  [<ffffffff8107b3c7>] __do_softirq+0x107/0x410
[ 2252.359358]  [<ffffffff8107b8a5>] irq_exit+0xc5/0xd0
[ 2252.359358]  [<ffffffff8173dd85>] smp_apic_timer_interrupt+0x45/0x60
[ 2252.359358]  [<ffffffff8173c6f2>] apic_timer_interrupt+0x72/0x80
[ 2252.359358]  <EOI> 
[ 2252.359358]  [<ffffffff81732598>] ? retint_swapgs+0x13/0x1b
[ 2252.359358] Code: 85 c0 8b 05 ef b6 bb 00 41 0f 45 d8 85 c0 0f 84 0b 01 00 00 8b 05 55 12 ff 00 49 89 fe 41 89 f7 41 89 d3 85 c0 0f 84 0c 01 00 00 <49> 8b 06 ba 01 00 00 00 48 3d 60 bf 13 82 0f 44 da 41 83 ff 01 
[ 2252.410538] RIP  [<ffffffff810e9754>] __lock_acquire+0x54/0x1b20
[ 2252.410538]  RSP <ffff88011b203d20>
[ 2252.410538] ---[ end trace 125adbb6be183141 ]---
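
A note on reading the trace above: entries marked with "?" are addresses the unwinder found on the stack but could not confirm as part of the active call chain, so only the unmarked entries form the reliable chain. A minimal Python sketch that separates the two follows. The names FRAME_RE, parse_call_trace, reliable_symbols and the shortened SAMPLE excerpt are illustrative and not part of any kernel tool; the regular expression assumes the "[<address>] symbol+0xoff/0xsize" frame format seen in this log.

```python
import re

# Matches frames such as "[<ffffffff810eba12>] lock_acquire+0xa2/0x1f0",
# with an optional "? " marker in front of the symbol name.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

# Shortened excerpt of the call trace from this report (illustration only).
SAMPLE = """\
[ 2252.359358] Call Trace:
[ 2252.359358]  <IRQ>
[ 2252.359358]  [<ffffffff810e99f5>] ? __lock_acquire+0x2f5/0x1b20
[ 2252.359358]  [<ffffffff810eba12>] lock_acquire+0xa2/0x1f0
[ 2252.359358]  [<ffffffff8135d549>] ? __blkg_release_rcu+0x79/0x280
[ 2252.359358]  [<ffffffff81731772>] _raw_spin_lock_irq+0x52/0x90
[ 2252.359358]  [<ffffffff8135d549>] __blkg_release_rcu+0x79/0x280
[ 2252.359358]  [<ffffffff81132e52>] rcu_process_callbacks+0x202/0x7d0
[ 2252.359358]  <EOI>
"""


def parse_call_trace(oops_text):
    """Return (reliable, symbol, offset) tuples for frames after 'Call Trace:'."""
    frames = []
    in_trace = False
    for line in oops_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        if "Code:" in line:
            # The instruction dump follows the trace, so stop here.
            break
        match = FRAME_RE.search(line)
        if match:
            reliable = match.group("guess") is None
            frames.append((reliable, match.group("sym"), int(match.group("off"), 16)))
    return frames


def reliable_symbols(oops_text):
    """Return only the symbols of frames that carry no '?' marker."""
    return [sym for reliable, sym, _ in parse_call_trace(oops_text) if reliable]


if __name__ == "__main__":
    for name in reliable_symbols(SAMPLE):
        print(name)
```

In the full trace, the unmarked entries read lock_acquire, _raw_spin_lock_irq, __blkg_release_rcu, rcu_process_callbacks, __do_softirq, irq_exit, smp_apic_timer_interrupt, apic_timer_interrupt. That suggests the fault happened in an RCU callback run from softirq context on exit from the timer interrupt, rather than in the timer interrupt handler itself.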

Comment 4 Attila Fazekas 2013-11-25 08:04:37 UTC
The host CPU is an Ivy Bridge, "Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz".
(family 6, model 58, stepping 9, microcode 0x19)

qemu does not support PEBS emulation, so the guest's kernel log shows:

'perf_event_intel: PEBS disabled due to CPU errata, please upgrade microcode'

On the host system the microcode is up to date.

I was only running the 'debug' kernel by accident last time; I have not yet seen the issue with a non-debug kernel.

Comment 5 Marcelo Tosatti 2013-12-19 19:41:52 UTC


        /*
         * Lockdep should run with IRQs disabled, otherwise we could
         * get an interrupt which would want to take locks, which would
         * end up in lockdep and have you got a head-ache already?
         */
        if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
                return 0;

        if (lock->key == &__lockdep_no_validate__)
                check = 1;

   0xffffffff810e9754 <+84>:    mov    (%r14),%rax      <--- OOPS
   0xffffffff810e9757 <+87>:    mov    $0x1,%edx
   0xffffffff810e975c <+92>:    cmp    $0xffffffff8213bf60,%rax

ffffffff8213bf60 B __lockdep_no_validate__

R14: 6b6b6b6b6b6b6b83 
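
The R14 value deserves a closer look. 0x6b is POISON_FREE, the byte the kernel's slab debugging writes over freed objects (include/linux/poison.h), so 6b6b6b6b6b6b6b83 looks like a pointer read from freed memory plus a small offset. It is also a non-canonical x86-64 address, which is why the access raised a general protection fault instead of a page fault. The Python sketch below checks both properties. poison_offset and is_canonical are hypothetical helper names, the 0x1000 offset limit is an arbitrary cut-off, and 48-bit virtual addressing is assumed. The use-after-free reading is an inference from the register value, not something confirmed in this report.

```python
POISON_FREE = 0x6B  # value of POISON_FREE in include/linux/poison.h


def poison_offset(value, poison_byte=POISON_FREE, max_offset=0x1000):
    """Return value minus the all-poison 64-bit pattern if it is a small
    non-negative offset, otherwise None. max_offset is an arbitrary limit
    chosen for this sketch."""
    pattern = int.from_bytes(bytes([poison_byte]) * 8, "little")
    offset = value - pattern
    if 0 <= offset < max_offset:
        return offset
    return None


def is_canonical(addr, va_bits=48):
    """True if bits 63..(va_bits - 1) of a 64-bit address are all equal
    (assumes va_bits-bit virtual addressing)."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


if __name__ == "__main__":
    r14 = 0x6B6B6B6B6B6B6B83  # R14 / RDI from the oops in this report
    print(hex(poison_offset(r14)))
    print(is_canonical(r14))
```

For this oops, poison_offset(0x6b6b6b6b6b6b6b83) gives 0x18, which would fit a field at offset 0x18 inside a structure whose address was read from freed memory.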

Kernel packages at 

http://kojipkgs.fedoraproject.org//packages/kernel/3.11.3/201.fc19/x86_64/kernel-debug-debuginfo-3.11.3-201.fc19.x86_64.rpm

Comment 6 Justin M. Forbes 2014-01-03 22:08:32 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.12.6-200.fc19.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20.

If you experience different issues, please open a new bug report for those.

Comment 7 Justin M. Forbes 2014-03-10 14:40:05 UTC
*********** MASS BUG UPDATE **************

This bug has been in a needinfo state for more than 1 month and is being closed with insufficient data due to inactivity. If this is still an issue with Fedora 19, please feel free to reopen the bug and provide the additional information requested.