Bug 902012 - [abrt]: BUG: soft lockup - CPU#0 stuck for 23s! [sh:1268]
Summary: [abrt]: BUG: soft lockup - CPU#0 stuck for 23s! [sh:1268]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 18
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: abrt_hash:3975092385cd4b136670d864942...
Duplicates: 923840
Depends On:
Blocks:
 
Reported: 2013-01-20 13:07 UTC by Gary Titchmarsh
Modified: 2013-10-03 01:11 UTC
CC: 31 users

Fixed In Version: kernel-3.10.13-101.fc18
Clone Of:
Cloned to: 1030615
Environment:
Last Closed: 2013-10-01 01:56:31 UTC
Type: ---
Embargoed:



Description Gary Titchmarsh 2013-01-20 13:07:23 UTC
Description of problem:
Using openvswitch 1.7.1 & Ovirt 3.2 nightly:

create bridge ovirtmgmt
add port p2p1 to bridge ovirtmgmt

restart system

soft lockup
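
For reference, the bridge setup above corresponds roughly to the following ovs-vsctl commands (a sketch; oVirt creates the bridge itself, so the exact invocation may differ):

# ovs-vsctl add-br ovirtmgmt
# ovs-vsctl add-port ovirtmgmt p2p1
# reboot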

Additional info:
BUG: soft lockup - CPU#0 stuck for 23s! [sh:1268]
Modules linked in: dm_service_time ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables rfcomm bnep xfs vfat fat vhost_net tun macvtap macvlan snd_hda_codec_hdmi snd_hda_codec_realtek arc4 snd_hda_intel snd_hda_codec snd_hwdep uvcvideo ath9k snd_seq videobuf2_vmalloc videobuf2_memops snd_seq_device videobuf2_core ath9k_common ath9k_hw snd_pcm videodev btusb iTCO_wdt coretemp ath dell_wmi media dell_laptop sparse_keymap kvm_intel iTCO_vendor_support dcdbas bluetooth mac80211 kvm snd_page_alloc snd_timer cfg80211 snd mei lpc_ich rfkill soundcore microcode i2c_i801 mfd_core uinput nouveau mxm_wmi i915 i2c_algo_bit ttm drm_kms_helper crc32c_intel sdhci_pci ghash_clmulni_intel drm sdhci r8169 mmc_core mii i2c_core wmi video dm_multipath sunrpc
CPU 0 
Pid: 1268, comm: sh Not tainted 3.7.2-204.fc18.x86_64 #1 Dell Inc. XPS L521X/029M77
RIP: 0010:[<ffffffff8106e397>]  [<ffffffff8106e397>] lock_timer_base.isra.37+0x27/0x70
RSP: 0018:ffff88024cfc5cd8  EFLAGS: 00000246
RAX: ffff88024cfc5fd8 RBX: 0000000000000000 RCX: 0000000000000080
RDX: 0000000000000000 RSI: ffff88024cfc5d10 RDI: ffff88024c5f94e8
RBP: ffff88024cfc5cf8 R08: ffff88025f216a40 R09: 00000000ffffffff
R10: 0000000000016a40 R11: ffffffff812dd0e2 R12: ffff88025f5edb08
R13: 0000000000000000 R14: ffffffff00000000 R15: 0000000000000000
FS:  00007f7e580b2740(0000) GS:ffff88025f200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000038c8a74100 CR3: 000000024b86e000 CR4: 00000000001407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sh (pid: 1268, threadinfo ffff88024cfc4000, task ffff88024cbd4560)
Stack:
 ffff88024c5f94d0 ffff88024c5f9400 ffff88024ce807c8 ffff88024c5f9520
 ffff88024cfc5d28 ffffffff8106eca0 ffff88024cfc5e18 ffffffff8113901d
 ffff88024c5f94d0 ffff88024c5f9400 ffff88024cfc5d48 ffffffff8106ed4a
Call Trace:
 [<ffffffff8106eca0>] try_to_del_timer_sync+0x20/0x70
 [<ffffffff8113901d>] ? __alloc_pages_nodemask+0x18d/0x990
 [<ffffffff8106ed4a>] del_timer_sync+0x5a/0x70
 [<ffffffff812c145e>] ? elevator_alloc.isra.7+0x8e/0x100
 [<ffffffff812e23b6>] cfq_exit_queue+0x36/0xf0
 [<ffffffff812c11b8>] elevator_exit+0x38/0x60
 [<ffffffff812c1a78>] elevator_change+0x148/0x230
 [<ffffffff812c265b>] elv_iosched_store+0x2b/0x60
 [<ffffffff812ca7f4>] queue_attr_store+0x64/0xc0
 [<ffffffff81207558>] sysfs_write_file+0xd8/0x150
 [<ffffffff811950fc>] vfs_write+0xac/0x180
 [<ffffffff81195442>] sys_write+0x52/0xa0
 [<ffffffff8163989e>] ? do_page_fault+0xe/0x10
 [<ffffffff8163ded9>] system_call_fastpath+0x16/0x1b
Code: 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 4c 89 6d f0 4c 89 75 f8 49 89 fd 48 89 5d e0 4c 89 65 e8 49 89 f6 49 8b 5d 00 <49> 89 dc 49 83 e4 fc 74 31 4c 89 e7 e8 f8 74 5c 00 49 89 06 49

Comment 1 cristi falcas 2013-02-04 23:26:04 UTC
Happens when booting various kernels: 3.7.5-201.fc18.x86_64 and older ones as well.

The boot process can stop during plymouth or after X starts.


Package: kernel
OS Release: Fedora release 18 (Spherical Cow)

Comment 2 Julien Recurt 2013-02-18 16:40:00 UTC
Seems to be fixed with kernel 3.7.8-202.fc18.x86_64.

Comment 3 Michal Noga 2013-03-12 18:01:59 UTC
I put a USB key into the USB 3.0 port.


Package: kernel
OS Release: Fedora release 18 (Spherical Cow)

Comment 4 Cole Robinson 2013-04-02 13:39:41 UTC
*** Bug 923840 has been marked as a duplicate of this bug. ***

Comment 5 Ohad Basan 2013-04-02 13:42:02 UTC
I experience it in kernel-3.8.3-203.fc18.x86_64

Comment 6 Bernard Fouché 2013-04-04 10:48:43 UTC
I hit a similar problem today while copying 500 GB of data with scp. The bug appeared on the VM receiving the data. Strangely (for me!), the process didn't crash and the copy still continues. (I'll check the files once the copy ends.)

I run kernel 3.8.5-201.fc18.x86_64 on both the host and the VM. I run a single VM with 20 CPUs; the host has 4.

Here is the trace:

[56008.198033] BUG: soft lockup - CPU#1 stuck for 29s! [scp:6597]
[56008.198033] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables virtio_net i2c_piix4 i2c_core virtio_balloon microcode crc32c_intel ghash_clmulni_intel virtio_blk
[56008.198033] CPU 1 
[56008.198033] Pid: 6597, comm: scp Not tainted 3.8.5-201.fc18.x86_64 #1 Bochs Bochs
[56008.198033] RIP: 0010:[<ffffffff813019e0>]  [<ffffffff813019e0>] copy_user_generic_string+0x30/0x40
[56008.198033] RSP: 0018:ffff8804ff641b58  EFLAGS: 00010246
[56008.198033] RAX: 00007f68c3c51680 RBX: ffff8804fe34b500 RCX: 0000000000000200
[56008.198033] RDX: 0000000000000000 RSI: 00007f68c3c51680 RDI: ffff880062600000
[56008.198033] RBP: ffff8804ff641b68 R08: ffff880062600000 R09: 00000000fffff000
[56008.198033] R10: ffffea0001898000 R11: 0000000000000000 R12: 000000001a29c000
[56008.198033] R13: ffff8804d93c6aa0 R14: 0000000000000004 R15: ffff8804ff641ad8
[56008.198033] FS:  00007f68c1a30800(0000) GS:ffff88050a640000(0000) knlGS:0000000000000000
[56008.198033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[56008.198033] CR2: 00007f1d0d2e8000 CR3: 00000004fe287000 CR4: 00000000000407e0
[56008.198033] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[56008.198033] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[56008.198033] Process scp (pid: 6597, threadinfo ffff8804ff640000, task ffff8804fdc1dd80)
[56008.198033] Stack:
[56008.198033]  ffffffff81131b9a 000000001a29c000 ffff8804ff641c38 ffffffff81131d98
[56008.198033]  ffff8804ff641be0 ffffffff811c627f ffff8804ff641e58 000000001a29c000
[56008.198033]  0000000000001000 ffff8804fdc1dd80 ffff8804ff641fd8 00000000811b6591
[56008.198033] Call Trace:
[56008.198033]  [<ffffffff81131b9a>] ? iov_iter_copy_from_user_atomic+0x6a/0xa0
[56008.198033]  [<ffffffff81131d98>] generic_file_buffered_write+0x148/0x280
[56008.198033]  [<ffffffff811c627f>] ? __mark_inode_dirty+0x3f/0x200
[56008.198033]  [<ffffffff81133da1>] __generic_file_aio_write+0x1d1/0x3d0
[56008.198033]  [<ffffffff8113401f>] generic_file_aio_write+0x7f/0x100
[56008.198033]  [<ffffffff8121c509>] ext4_file_write+0xa9/0x430
[56008.198033]  [<ffffffff8119d347>] do_sync_write+0xa7/0xe0
[56008.198033]  [<ffffffff8119d9fc>] vfs_write+0xac/0x180
[56008.198033]  [<ffffffff8119dd42>] sys_write+0x52/0xa0
[56008.198033]  [<ffffffff810df1ac>] ? __audit_syscall_exit+0x20c/0x2c0
[56008.198033]  [<ffffffff81658c19>] system_call_fastpath+0x16/0x1b
[56008.198033] Code: 74 30 83 fa 08 72 27 89 f9 83 e1 07 74 15 83 e9 08 f7 d9 29 ca 8a 06 88 07 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 31 c0 66 66 90 c3 0f 1f 00 66 66 90 21 d2 
[56496.033889] [sched_delayed] sched: RT throttling activated

Comment 7 Vivek Goyal 2013-05-01 20:31:40 UTC
Copying/pasting some discussion which is happening in a mail thread.

There's not much to go on, but my first guess would be someone who plays
in the timer code.  Not sure why a _try_ function would hang!

-Jeff

Comment 8 Vivek Goyal 2013-05-01 20:32:22 UTC
Yep. Looks like lock_timer_base() is hung for a long time. Maybe spinning
on spin_lock_irqsave(&base->lock, *flags);

So Tom, looks like it might not even be a CFQ issue but a timer-related
thing. Somebody who knows the timer code will be a better person to look
at it.

Thanks
Vivek

Comment 9 Vivek Goyal 2013-05-01 20:33:05 UTC
Nope.  But this isn't likely a timer_base issue IMO.

The code path to get here is (AFAICT)

cfq_exit_queue()

which attempts to shutdown the timer by calling

cfq_shutdown_timer_wq(cfqd)

which does

del_timer_sync(&cfqd->idle_slice_timer);

del_timer_sync() waits for the timer to become idle, that is, until the timer is
no longer executing its function and is in a "sleep" state, so in theory it is
safe to kill.

cfqd->idle_slice_timer's function is cfq_idle_slice_timer() which appears to
call spin_lock_irqsave(cfqd->queue->queue_lock, flags).  That's the lock I would
look at.  If the process has acquired this lock, then the next function I would
look at is  cfq_clear_cfqq_deep() which is called later on in
cfq_idle_slice_timer().

IMO.
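
For reference, the shutdown helper in block/cfq-iosched.c of this era is roughly:

static void cfq_shutdown_timer_wq(struct cfq_data *cfqd)
{
        /* wait out the idle-slice timer, then flush the unplug work */
        del_timer_sync(&cfqd->idle_slice_timer);
        cancel_work_sync(&cfqd->unplug_work);
}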

... I can't remember if RHEL7/upstream has /proc/sys/kernel/softlockup_panic or
not.  It would be useful to flip that on and get a vmcore.

Prarit

Comment 10 Vivek Goyal 2013-05-01 20:39:42 UTC
(In reply to comment #9)
> Nope.  But this isn't likely a timer_base issue IMO.
> 
> The code path to get here is (AFAICT)
> 
> cfq_exit_queue()
> 
> which attempts to shutdown the timer by calling
> 
> cfq_shutdown_timer_wq(cfqd)
> 
> which does
> 
> del_timer_sync(&cfqd->idle_slice_timer);
> 
> del_timer_sync() waits for the timer to become idle, that is that the timer
> is
> no longer executing it's function and is an a "sleep" state so in theory it
> is
> safe to kill.

True. But the backtrace also shows lock_timer_base() as the top-level function.

[<ffffffff8106e397>] lock_timer_base.isra.37+0x27/0x70

And the wait for the handler to finish happens outside of that.

Maybe it is just by chance that we are waiting for the idle_slice_timer handler
to finish in a loop, calling lock_timer_base() in a loop, and the soft lockup
detector caught it at the wrong time.

So I am not sure. But it opens two possibilities.

- Idle slice handler is taking too long to finish
- Something wrong with lock_timer_base()

> 
> cfqd->idle_slice_timer's function is cfq_idle_slice_timer() which appears to
> call spin_lock_irqsave(cfqd->queue->queue_lock, flags).  That's the lock I
> would
> look at.  If the process has acquired this lock, then the next function I
> would
> look at is  cfq_clear_cfqq_deep() which is called later on in
> cfq_idle_slice_timer().
> 

In the cfq_exit_queue() path we deliberately don't take the queue lock, so that
if the handler is running somewhere, it can take the queue lock and finish.

And by now we have flushed the queue, so there should not be any requests
left for CFQ to process. It is strange why the handler would take
such a long time to finish.


> IMO.
> 
> ... I can't remember if RHEL7/upstream has /proc/sys/kernel/softlockup_panic
> or
> not.  It would be useful to flip that on and get a vmcore.

I checked that softlockup_panic is present in F18. Reporter of the bz, can you
please set that and capture the vmcore?

Also, are there easy steps to reproduce this problem? I have regular F18
installed in a KVM virtual machine, with no oVirt or openvswitch stuff. I am
able to change the scheduler.

Vivek

Comment 11 Ohad Basan 2013-05-19 11:57:49 UTC
Just upgraded kernel to version 3.9.2-200.fc18.x86_64
problem persists...

Comment 12 Dax Kelson 2013-05-23 18:06:58 UTC
(In reply to Ohad Basan from comment #11)
> Just upgraded kernel to version 3.9.2-200.fc18.x86_64
> problem persists...

Confirmed. Problem persists. Is there an older kernel where this bug doesn't occur?

Comment 13 Dax Kelson 2013-05-23 18:19:51 UTC
This bug triggers for me on Fedora 18 ovirt nodes when I tried to add an iSCSI domain.

Comment 14 Vivek Goyal 2013-05-23 19:02:17 UTC
(In reply to Dax Kelson from comment #13)
> This bug triggers for me on Fedora 18 ovirt nodes when I tried to add an
> iSCSI domain.

Any chance you can capture the kernel core dump? You need to set softlockup_panic to make sure the soft lockup triggers a panic. Of course, you also need to start the kdump service.
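
Concretely, that setup is roughly the following (assuming the kexec-tools/kdump service is installed):

# echo 1 > /proc/sys/kernel/softlockup_panic
# systemctl start kdump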

Comment 15 Dax Kelson 2013-05-24 00:11:09 UTC
(In reply to Vivek Goyal from comment #14)
> (In reply to Dax Kelson from comment #13)
> > This bug triggers for me on Fedora 18 ovirt nodes when I tried to add an
> > iSCSI domain.
> 
> Any change you can capture the kernel core dump. You need to set
> softlockup_panic to make sure softlockup triggers panic. Of course also need
> to start kdump service.

      KERNEL: /usr/lib/debug/lib/modules/3.9.2-200.fc18.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2013.05.23-23:53:44/vmcore  [PARTIAL DUMP]
        CPUS: 4
        DATE: Thu May 23 17:53:43 2013
      UPTIME: 00:07:24
LOAD AVERAGE: 2.50, 0.70, 0.30
       TASKS: 198
    NODENAME: station6.example.com
     RELEASE: 3.9.2-200.fc18.x86_64
     VERSION: #1 SMP Mon May 13 13:59:47 UTC 2013
     MACHINE: x86_64  (3092 Mhz)
      MEMORY: 9.9 GB
       PANIC: "Kernel panic - not syncing: softlockup: hung tasks"
         PID: 1730
     COMMAND: "sh"
        TASK: ffff88029fffc650  [THREAD_INFO: ffff88029fc1c000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 1730   TASK: ffff88029fffc650  CPU: 0   COMMAND: "sh"
 #0 [ffff8802afa03ce0] machine_kexec at ffffffff8103f79b
 #1 [ffff8802afa03d50] crash_kexec at ffffffff810c95c8
 #2 [ffff8802afa03e20] panic at ffffffff81656f57
 #3 [ffff8802afa03ea0] watchdog_timer_fn at ffffffff810eecd5
 #4 [ffff8802afa03ed0] __run_hrtimer at ffffffff81086263
 #5 [ffff8802afa03f20] hrtimer_interrupt at ffffffff81086b87
 #6 [ffff8802afa03f90] smp_apic_timer_interrupt at ffffffff8166b789
 #7 [ffff8802afa03fb0] apic_timer_interrupt at ffffffff8166a65d
--- <IRQ stack> ---
 #8 [ffff88029fc1dc28] apic_timer_interrupt at ffffffff8166a65d
    [exception RIP: lock_timer_base.isra.37+35]
    RIP: ffffffff8106ecf3  RSP: ffff88029fc1dcd8  RFLAGS: 00000246
    RAX: ffff88029fc1dfd8  RBX: 0000000000000000  RCX: 0000000000000080
    RDX: 0000000000000000  RSI: ffff88029fc1dd10  RDI: ffff88029f842ce8
    RBP: ffff88029fc1dcf8   R8: ffff8802afa16e00   R9: 00000000ffffffff
    R10: 0000000000016e00  R11: ffffffff812f6332  R12: 0000000000000000
    R13: ffff880200000041  R14: ffff8802afde3b08  R15: 0000000000000002
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
 #9 [ffff88029fc1dd00] try_to_del_timer_sync at ffffffff8106f600
#10 [ffff88029fc1dd30] del_timer_sync at ffffffff8106f6aa
#11 [ffff88029fc1dd50] cfq_exit_queue at ffffffff812fb6a6
#12 [ffff88029fc1dd90] elevator_exit at ffffffff812d6f18
#13 [ffff88029fc1ddb0] elevator_change at ffffffff812d7788
#14 [ffff88029fc1de20] elv_iosched_store at ffffffff812d838b
#15 [ffff88029fc1de50] queue_attr_store at ffffffff812e0d24
#16 [ffff88029fc1dea0] sysfs_write_file at ffffffff81214768
#17 [ffff88029fc1df00] vfs_write at ffffffff8119fd5c
#18 [ffff88029fc1df30] sys_write at ffffffff811a01f2
#19 [ffff88029fc1df80] system_call_fastpath at ffffffff81669a59
    RIP: 00007fba5b287740  RSP: 00007fffe8217e78  RFLAGS: 00010206
    RAX: 0000000000000001  RBX: ffffffff81669a59  RCX: 0000000000000008
    RDX: 0000000000000009  RSI: 00007fba5bba5000  RDI: 0000000000000001
    RBP: 00007fba5bba5000   R8: 000000000000000a   R9: 00007fba5bb9b740
    R10: 00007fffe8217c10  R11: 0000000000000246  R12: 0000000000000001
    R13: 0000000000000009  R14: 00007fba5b554280  R15: 0000000000000009
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
crash>

Comment 16 Dax Kelson 2013-05-24 00:24:26 UTC
(In reply to Vivek Goyal from comment #14)
> (In reply to Dax Kelson from comment #13)
> > This bug triggers for me on Fedora 18 ovirt nodes when I tried to add an
> > iSCSI domain.
> 
> Any change you can capture the kernel core dump. You need to set
> softlockup_panic to make sure softlockup triggers panic. Of course also need
> to start kdump service.

http://go.gurulabs.com/vmcore

Comment 17 Vivek Goyal 2013-05-24 18:22:05 UTC
Thanks for the vmcore. I did "bt -a" and the other 3 CPUs seem to be idling, so it does not look like a case of contention on a spinlock.

crash> bt -a
PID: 1730   TASK: ffff88029fffc650  CPU: 0   COMMAND: "sh"
 #0 [ffff8802afa03ce0] machine_kexec at ffffffff8103f79b
 #1 [ffff8802afa03d50] crash_kexec at ffffffff810c95c8
 #2 [ffff8802afa03e20] panic at ffffffff81656f57
 #3 [ffff8802afa03ea0] watchdog_timer_fn at ffffffff810eecd5
 #4 [ffff8802afa03ed0] __run_hrtimer at ffffffff81086263
 #5 [ffff8802afa03f20] hrtimer_interrupt at ffffffff81086b87
 #6 [ffff8802afa03f90] smp_apic_timer_interrupt at ffffffff8166b789
 #7 [ffff8802afa03fb0] apic_timer_interrupt at ffffffff8166a65d
--- <IRQ stack> ---
 #8 [ffff88029fc1dc28] apic_timer_interrupt at ffffffff8166a65d
    [exception RIP: lock_timer_base.isra.37+35]
    RIP: ffffffff8106ecf3  RSP: ffff88029fc1dcd8  RFLAGS: 00000246
    RAX: ffff88029fc1dfd8  RBX: 0000000000000000  RCX: 0000000000000080
    RDX: 0000000000000000  RSI: ffff88029fc1dd10  RDI: ffff88029f842ce8
    RBP: ffff88029fc1dcf8   R8: ffff8802afa16e00   R9: 00000000ffffffff
    R10: 0000000000016e00  R11: ffffffff812f6332  R12: 0000000000000000
    R13: ffff880200000041  R14: ffff8802afde3b08  R15: 0000000000000002
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
 #9 [ffff88029fc1dd00] try_to_del_timer_sync at ffffffff8106f600
#10 [ffff88029fc1dd30] del_timer_sync at ffffffff8106f6aa
#11 [ffff88029fc1dd50] cfq_exit_queue at ffffffff812fb6a6
#12 [ffff88029fc1dd90] elevator_exit at ffffffff812d6f18
#13 [ffff88029fc1ddb0] elevator_change at ffffffff812d7788
#14 [ffff88029fc1de20] elv_iosched_store at ffffffff812d838b
#15 [ffff88029fc1de50] queue_attr_store at ffffffff812e0d24
#16 [ffff88029fc1dea0] sysfs_write_file at ffffffff81214768
#17 [ffff88029fc1df00] vfs_write at ffffffff8119fd5c
#18 [ffff88029fc1df30] sys_write at ffffffff811a01f2
#19 [ffff88029fc1df80] system_call_fastpath at ffffffff81669a59
    RIP: 00007fba5b287740  RSP: 00007fffe8217e78  RFLAGS: 00010206
    RAX: 0000000000000001  RBX: ffffffff81669a59  RCX: 0000000000000008
    RDX: 0000000000000009  RSI: 00007fba5bba5000  RDI: 0000000000000001
    RBP: 00007fba5bba5000   R8: 000000000000000a   R9: 00007fba5bb9b740
    R10: 00007fffe8217c10  R11: 0000000000000246  R12: 0000000000000001
    R13: 0000000000000009  R14: 00007fba5b554280  R15: 0000000000000009
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

PID: 0      TASK: ffff8802a0cd1770  CPU: 1   COMMAND: "swapper/1"
 #0 [ffff8802afa87e60] crash_nmi_callback at ffffffff81037352
 #1 [ffff8802afa87e70] nmi_handle.isra.0 at ffffffff816626f9
 #2 [ffff8802afa87ec0] do_nmi at ffffffff81662810
 #3 [ffff8802afa87ef0] end_repeat_nmi at ffffffff81661d01
    [exception RIP: intel_idle+168]
    RIP: ffffffff81365aa8  RSP: ffff8802a0ccde08  RFLAGS: 00000046
    RAX: 0000000000000010  RBX: 0000000000000010  RCX: 0000000000000046
    RDX: ffff8802a0ccde08  RSI: 0000000000000018  RDI: 0000000000000001
    RBP: ffffffff81365aa8   R8: ffffffff81365aa8   R9: 0000000000000018
    R10: ffff8802a0ccde08  R11: 0000000000000046  R12: ffffffffffffffff
    R13: 0000000000000001  R14: ffff8802a0ccdfd8  R15: 0000000000000000
    ORIG_RAX: 0000000000000000  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff8802a0ccde08] intel_idle at ffffffff81365aa8
 #5 [ffff8802a0ccde40] cpuidle_enter at ffffffff815076d9
 #6 [ffff8802a0ccde50] cpuidle_wrap_enter at ffffffff81508011
 #7 [ffff8802a0ccdeb0] cpuidle_enter_tk at ffffffff81508080
 #8 [ffff8802a0ccdec0] cpuidle_idle_call at ffffffff81507d45
 #9 [ffff8802a0ccdf00] cpu_idle at ffffffff8101d12f

PID: 0      TASK: ffff8802a0cd2ee0  CPU: 2   COMMAND: "swapper/2"
 #0 [ffff8802afb07e60] crash_nmi_callback at ffffffff81037352
 #1 [ffff8802afb07e70] nmi_handle.isra.0 at ffffffff816626f9
 #2 [ffff8802afb07ec0] do_nmi at ffffffff81662810
 #3 [ffff8802afb07ef0] end_repeat_nmi at ffffffff81661d01
    [exception RIP: intel_idle+168]
    RIP: ffffffff81365aa8  RSP: ffff8802a0ccfe08  RFLAGS: 00000046
    RAX: 0000000000000010  RBX: 0000000000000010  RCX: 0000000000000046
    RDX: ffff8802a0ccfe08  RSI: 0000000000000018  RDI: 0000000000000001
    RBP: ffffffff81365aa8   R8: ffffffff81365aa8   R9: 0000000000000018
    R10: ffff8802a0ccfe08  R11: 0000000000000046  R12: ffffffffffffffff
    R13: 0000000000000002  R14: ffff8802a0ccffd8  R15: 0000000000000000
    ORIG_RAX: 0000000000000000  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff8802a0ccfe08] intel_idle at ffffffff81365aa8
 #5 [ffff8802a0ccfe40] cpuidle_enter at ffffffff815076d9
 #6 [ffff8802a0ccfe50] cpuidle_wrap_enter at ffffffff81508011
 #7 [ffff8802a0ccfeb0] cpuidle_enter_tk at ffffffff81508080
 #8 [ffff8802a0ccfec0] cpuidle_idle_call at ffffffff81507d45
 #9 [ffff8802a0ccff00] cpu_idle at ffffffff8101d12f

PID: 0      TASK: ffff8802a0cd4650  CPU: 3   COMMAND: "swapper/3"
 #0 [ffff8802afb87e60] crash_nmi_callback at ffffffff81037352
 #1 [ffff8802afb87e70] nmi_handle.isra.0 at ffffffff816626f9
 #2 [ffff8802afb87ec0] do_nmi at ffffffff81662810
 #3 [ffff8802afb87ef0] end_repeat_nmi at ffffffff81661d01
    [exception RIP: intel_idle+168]
    RIP: ffffffff81365aa8  RSP: ffff8802a0cd9e08  RFLAGS: 00000046
    RAX: 0000000000000010  RBX: 0000000000000010  RCX: 0000000000000046
    RDX: ffff8802a0cd9e08  RSI: 0000000000000018  RDI: 0000000000000001
    RBP: ffffffff81365aa8   R8: ffffffff81365aa8   R9: 0000000000000018
    R10: ffff8802a0cd9e08  R11: 0000000000000046  R12: ffffffffffffffff
    R13: 0000000000000003  R14: ffff8802a0cd9fd8  R15: 0000000000000000
    ORIG_RAX: 0000000000000000  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff8802a0cd9e08] intel_idle at ffffffff81365aa8
 #5 [ffff8802a0cd9e40] cpuidle_enter at ffffffff815076d9
 #6 [ffff8802a0cd9e50] cpuidle_wrap_enter at ffffffff81508011
 #7 [ffff8802a0cd9eb0] cpuidle_enter_tk at ffffffff81508080
 #8 [ffff8802a0cd9ec0] cpuidle_idle_call at ffffffff81507d45
 #9 [ffff8802a0cd9f00] cpu_idle at ffffffff8101d12f

Comment 18 Vivek Goyal 2013-05-24 19:14:29 UTC
I suspect that the timer->base field is null and that's why we are looping in lock_timer_base().


static struct tvec_base *lock_timer_base(struct timer_list *timer,
                                        unsigned long *flags)
        __acquires(timer->base->lock)
{
        struct tvec_base *base;

        for (;;) {
                struct tvec_base *prelock_base = timer->base;
                base = tbase_get_base(prelock_base);
                if (likely(base != NULL)) {
                        spin_lock_irqsave(&base->lock, *flags);
                        if (likely(prelock_base == timer->base))
                                return base;
                        /* The timer has migrated to another CPU */
                        spin_unlock_irqrestore(&base->lock, *flags);
                }
                cpu_relax();
        }
}
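
If that suspicion is right, any timer that reaches del_timer_sync() before init_timer() has been called on it would spin here forever: a zeroed timer has timer->base == NULL, so the if branch is never taken. A hypothetical illustration:

        /* hypothetical: kzalloc() returns zeroed memory, so t->base == NULL */
        struct timer_list *t = kzalloc(sizeof(*t), GFP_KERNEL);

        del_timer_sync(t);      /* lock_timer_base() loops forever */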

Comment 19 Vivek Goyal 2013-05-24 19:22:23 UTC
Following is the disassembly of the code around the RIP (ffffffff8106ecf3) at the time of the exception.


crash> disassemble /m 0xffffffff8106ecf3
Dump of assembler code for function lock_timer_base:
700     static struct tvec_base *lock_timer_base(struct timer_list *timer,
   0xffffffff8106ecd0 <+0>:     data32 data32 data32 xchg %ax,%ax
   0xffffffff8106ecd5 <+5>:     push   %rbp
   0xffffffff8106ecd6 <+6>:     mov    %rsp,%rbp
   0xffffffff8106ecd9 <+9>:     sub    $0x20,%rsp
   0xffffffff8106ecdd <+13>:    mov    %r13,-0x10(%rbp)
   0xffffffff8106ece1 <+17>:    mov    %r14,-0x8(%rbp)
   0xffffffff8106ece5 <+21>:    mov    %rdi,%r13
   0xffffffff8106ece8 <+24>:    mov    %rbx,-0x20(%rbp)
   0xffffffff8106ecec <+28>:    mov    %r12,-0x18(%rbp)
   0xffffffff8106ecf0 <+32>:    mov    %rsi,%r14

701                                             unsigned long *flags)
702             __acquires(timer->base->lock)
703     {
704             struct tvec_base *base;
705     
706             for (;;) {
707                     struct tvec_base *prelock_base = timer->base;
   0xffffffff8106ecf3 <+35>:    mov    0x0(%r13),%rbx

708                     base = tbase_get_base(prelock_base);
709                     if (likely(base != NULL)) {
   0xffffffff8106ecf7 <+39>:    mov    %rbx,%r12
   0xffffffff8106ecfa <+42>:    and    $0xfffffffffffffffc,%r12
   0xffffffff8106ecfe <+46>:    je     0xffffffff8106ed31 <lock_timer_base+97>

710                             spin_lock_irqsave(&base->lock, *flags);
   0xffffffff8106ed00 <+48>:    mov    %r12,%rdi
   0xffffffff8106ed03 <+51>:    callq  0xffffffff81661340 <_raw_spin_lock_irqsave>
   0xffffffff8106ed08 <+56>:    mov    %rax,(%r14)

711                             if (likely(prelock_base == timer->base))
   0xffffffff8106ed0b <+59>:    cmp    0x0(%r13),%rbx
   0xffffffff8106ed0f <+63>:    jne    0xffffffff8106ed26 <lock_timer_base+86>

712                                     return base;
713                             /* The timer has migrated to another CPU */
714                             spin_unlock_irqrestore(&base->lock, *flags);
715                     }
716                     cpu_relax();
717             }
718     }
   0xffffffff8106ed11 <+65>:    mov    %r12,%rax

I am not an assembly code expert, but I suspect that R13 contains the pointer to the timer. Look at line 707 (the RIP), where we do the following.

707                     struct tvec_base *prelock_base = timer->base;
   0xffffffff8106ecf3 <+35>:    mov    0x0(%r13),%rbx

The backtrace says RBX is 0000000000000000, and RBX should contain the value of timer->base. So I suspect that timer->base is null and that's why we are looping.

One thing I don't understand: if R13 is a pointer to the timer, then why is the
offset of the base field 0? It is the 3rd field in the timer_list structure.
Maybe R13 contains a pointer to timer->base instead of the timer?

R13 is ffff880200000041, and I tried to access the contents at this memory location, but it is not part of the vmcore. Strange; even filtering should not have removed kernel memory buffers.
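
For reference, the "and $0xfffffffffffffffc,%r12" in the disassembly above is tbase_get_base() masking off the flag bits that the timer code stores in the low bits of timer->base; in kernel/timer.c of this era it is roughly:

static inline struct tvec_base *tbase_get_base(struct tvec_base *base)
{
        /* TIMER_FLAG_MASK is 0x3LU: the deferrable and irqsafe flag bits */
        return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK));
}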

Comment 20 Ohad Basan 2013-05-26 06:52:41 UTC
The bug does appear to me in 3.6.10-4;
since I upgraded, the problem showed up.

Comment 21 Ohad Basan 2013-05-26 06:52:58 UTC
The bug does not appear to me in 3.6.10-4;
since I upgraded, the problem showed up.

Comment 22 Vivek Goyal 2013-05-28 16:04:21 UTC
Can we try a simple patch like the following? This is just to verify that timer->base is null for whatever reason and that's why we are spinning in the for loop.

 kernel/timer.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c       2013-05-03 22:05:19.000000000 -0400
+++ linux-2.6/kernel/timer.c    2013-05-28 23:43:38.142900050 -0400
@@ -713,6 +713,9 @@ static struct tvec_base *lock_timer_base
                                return base;
                        /* The timer has migrated to another CPU */
                        spin_unlock_irqrestore(&base->lock, *flags);
+               } else {
+                       printk("timer->base is null. timer=%p timer->base=%p."
+                               "Will retry.\n", timer, &timer->base);
                }
                cpu_relax();
        }

Comment 23 Alessandro 2013-06-02 17:50:59 UTC
(In reply to Ohad Basan from comment #21)
> The bug does not appear to me in 3.6.10-4
> since I upgraded that the problem showed up

I confirm: we have several oVirt F18 nodes and have downgraded all the kernels to 3.6.10-4 to have them boot with no hangs.

With a newer kernel it sometimes hangs on boot (I mean most of the time, like 10 times out of 11 boots) and sometimes it doesn't.

It detects a soft lockup on one random CPU and then soft locks ALL the CPUs (8-core and 4-core servers), so the boot never ends and the server never becomes accessible.

To get past this with a kernel higher than 3.6.10, we need to manually reset the server several times (no fixed number of resets, unfortunately) till it gets through.

After booting this way, the node works fine without any anomaly till it is rebooted.

With 3.6.10 it always boots without any problem.

Hope this bug gets fixed, since it prevents unattended boots of the servers.

Thank you and best regards

Comment 24 Alessandro 2013-06-02 17:56:13 UTC
Sorry for reposting, but I forgot to suggest raising the severity level of this bug to something like CRITICAL, since it doesn't allow nodes to become operational.

Best regards

Comment 25 Ohad Basan 2013-06-02 18:10:01 UTC
(In reply to Alessandro from comment #24)
> Sorry for reposting, but I forgot to suggest to raise the severity level of
> this bug to something like CRITICAL since it doesn't allow nodes to become
> operational
> 
> Best regards

That would be a great idea IMHO;
it's really problematic.

Comment 26 Dax Kelson 2013-06-06 21:26:09 UTC
(In reply to Vivek Goyal from comment #22)
> can we try a simple patch like following. This is just to verify that
> timer->base is null for whatever reason and that's why we are spinning in
> for loop.

I applied the patch to the latest F18 kernel as of today 3.9.4-200

Here is some dmesg output:

...
[  735.571449] scsi6 : iSCSI Initiator over TCP/IP
[  736.578398] scsi 6:0:0:0: Direct-Access     LIO-ORG  ovirt            4.0  PQ: 0 ANSI: 5
[  736.578809] sd 6:0:0:0: Attached scsi generic sg2 type 0
[  736.579566] sd 6:0:0:0: [sdb] 629137408 512-byte logical blocks: (322 GB/299 GiB)
[  736.581363] sd 6:0:0:0: [sdb] Write Protect is off
[  736.581369] sd 6:0:0:0: [sdb] Mode Sense: 43 00 10 08
[  736.583205] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
[  736.598929]  sdb: unknown partition table
[  736.601265] sd 6:0:0:0: [sdb] Attached SCSI disk
[  736.700529] bio: create slab <bio-2> at 2
[  736.704973] timer->base is null. timer=ffff88029bdc0cd0 timer->base=ffff88029bdc0ce8.Will retry.
[  736.704977] timer->base is null. timer=ffff88029bdc0cd0 timer->base=ffff88029bdc0ce8.Will retry.
[  736.704979] timer->base is null. timer=ffff88029bdc0cd0 timer->base=ffff88029bdc0ce8.Will retry.
[  736.704980] timer->base is null. timer=ffff88029bdc0cd0 timer->base=ffff88029bdc0ce8.Will retry.

[snip many many lines]

[  736.705912] timer->base is null. timer=ffff88029bdc0cd0 timer->base=ffff88029bdc0ce8.Will retry.
[  736.705913] timer->base is null. timer=ffff88029bdc0cd0 timer->base=ffff88029bdc0ce8.Will retry.
[  736.705914] timer->base is null. timer=ffff88029bdc0cd0 timer->base=ffff88029bdc0ce8.Will retry.
[  736.705916] timer->base is null. timer=ffff88029bdc0cd0 timer->base=ffff88029bdc0ce8.Will retry.
[  736.705917] timer->base is null. timer=ffff88029bdc0cd0 timer->base=ffff88029bdc0ce8.Will retry.
[  736.705918] timer->base is null. timer=ffff88029bdc0cd0 timer->base=ffff88029bdc0ce8.Will retry.
[  736.786968] FS-Cache: Loaded
[  736.800225] Key type dns_resolver registered
[  736.820192] FS-Cache: Netfs 'nfs' registered for caching
[  736.837758] SELinux: initialized (dev 0:34, type nfs), uses genfs_contexts
...

Comment 27 Vivek Goyal 2013-06-07 20:27:47 UTC
(In reply to Dax Kelson from comment #26)
> (In reply to Vivek Goyal from comment #22)
> > can we try a simple patch like following. This is just to verify that
> > timer->base is null for whatever reason and that's why we are spinning in
> > for loop.
> 
> I applied the patch to the latest F18 kernel as of today 3.9.4-200
> 
> Here is some dmesg output:

So this time the soft lockup does not happen? I was expecting that if timer->base
continues to be NULL, we would see the timer-base-is-null message for a long time
and then the softlockup warning. That would confirm that timer->base remains
null for a long time and that we are spinning in a loop, which is why the
softlockup warning shows up.

After that, the question will arise of why timer->base is null for so long.

Comment 28 Vivek Goyal 2013-06-07 20:47:18 UTC
(In reply to Alessandro from comment #23)
> (In reply to Ohad Basan from comment #21)
> > The bug does not appear to me in 3.6.10-4
> > since I upgraded that the problem showed up
> 
> I confirm that we have several ovirt F 18 nodes and have downgraded all the
> kernels to 3.6.10-4 to have them boot with no hangs.
> 
> With newer kernel sometimes (I mean most of the times like 10 times on 11
> boots) it hangs on boot and sometimes it doesn't.
> 
> It detects soft lock on one random CPU and then it soft locks ALL the CPU's
> (8 core and 4 core servers), so the boot never ends and the server never
> gets accessible.
> 
> To get passed with kernel higher than 3.6.10, we need to manually reset the
> server several times (no fixed number of resets unfortunately) till it gets
> passed.

Try elevator=deadline on the kernel command line. That might help at least as a workaround for the problem till a solution is found.
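
For example, to add it persistently to all installed kernels on Fedora (assuming grubby):

# grubby --update-kernel=ALL --args="elevator=deadline"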

Comment 29 Vivek Goyal 2013-06-07 20:49:08 UTC
(In reply to Vivek Goyal from comment #27)
> (In reply to Dax Kelson from comment #26)
> > (In reply to Vivek Goyal from comment #22)
> > > can we try a simple patch like following. This is just to verify that
> > > timer->base is null for whatever reason and that's why we are spinning in
> > > for loop.
> > 
> > I applied the patch to the latest F18 kernel as of today 3.9.4-200
> > 
> > Here is some dmesg output:
> 
> So this time softlock does not happen? I was expecting that if timer->base
> continues to be NULL, then we will see timer base null message for long time
> and then we will see softlock warning. That will confirm that timer->base
> remains null for a long time and we are spinning in a loop and that's why
> softlockup warning shows up.
> 
> After that question will arise why timer->base is null for so long.

Or is it possible that printk() forces scheduling and hence avoids the soft lockup?
Try trace_printk() instead, and then we can read the traces from the debugfs
tracing files.
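
That is, in the debug patch from comment 22, roughly this instead (a sketch):

                } else {
                        trace_printk("timer->base is null. timer=%p. Will retry.\n",
                                     timer);
                }

and then read the buffer after reproducing:

# cat /sys/kernel/debug/tracing/trace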

Comment 30 Vivek Goyal 2013-06-07 20:54:27 UTC
(In reply to Alessandro from comment #23)
> 
> It detects soft lock on one random CPU and then it soft locks ALL the CPU's
> (8 core and 4 core servers), so the boot never ends and the server never
> gets accessible.

So when all cores detect the soft lockup, where are the rest of the cores spinning?
Are they spinning at the same place? Do you have backtraces of the softlockup
warnings on the other CPUs?

Comment 31 Alessandro 2013-06-07 21:04:54 UTC
(In reply to Vivek Goyal from comment #30)
> (In reply to Alessandro from comment #23)
> > 
> > It detects soft lock on one random CPU and then it soft locks ALL the CPU's
> > (8 core and 4 core servers), so the boot never ends and the server never
> > gets accessible.
> 
> So when all core detect softlockup, where are rest of the cores spinning?
> Are they spinning at same place? Do you have backtraces of softlockup
> warnings 
> on other cpus.

I don't have a backtrace since the system never becomes usable, even from a
different console: the only resort is to reboot over and over again till it boots.

It loops saying CPU#1 was stuck, then #2, #3, #4, and #1 again.

I'll try the proposed kernel parameter on the next reboot.

For now I have all the nodes "stuck" on kernel 3.6.

Thank you for answering.

Best regards

Comment 32 Dax Kelson 2013-06-07 21:31:14 UTC
(In reply to Vivek Goyal from comment #27)
>
> So this time softlock does not happen?

Yes. These messages occurred during bootup.

Here is a soft lockup that occurred with my patched kernel:

      KERNEL: /usr/lib/debug/lib/modules/3.9.4-200.timerdebug.fc18.x86_64/vmlinux
    DUMPFILE: /var/tmp/abrt/vmcore-127.0.0.1-2013.06.06-21:20:29/vmcore  [PARTIAL DUMP]
        CPUS: 4
        DATE: Thu Jun  6 15:20:27 2013
      UPTIME: 00:01:48
LOAD AVERAGE: 5.05, 1.36, 0.47
       TASKS: 214
    NODENAME: station2.example.com
     RELEASE: 3.9.4-200.timerdebug.fc18.x86_64
     VERSION: #1 SMP Thu Jun 6 12:48:54 MDT 2013
     MACHINE: x86_64  (3093 Mhz)
      MEMORY: 9.9 GB
       PANIC: "Kernel panic - not syncing: softlockup: hung tasks"
         PID: 1602
     COMMAND: "sh"
        TASK: ffff88029f38c650  [THREAD_INFO: ffff88029f34c000]
         CPU: 1
       STATE: TASK_RUNNING (PANIC)

crash> bt -a
PID: 36     TASK: ffff8802a0e8ddc0  CPU: 0   COMMAND: "kworker/0:1"
 #0 [ffff8802afa07e60] crash_nmi_callback at ffffffff81037342
 #1 [ffff8802afa07e70] nmi_handle.isra.0 at ffffffff81662bf9
 #2 [ffff8802afa07ec0] do_nmi at ffffffff81662d10
 #3 [ffff8802afa07ef0] end_repeat_nmi at ffffffff81662201
    [exception RIP: cfb_imageblit+1197]
    RIP: ffffffff813625ad  RSP: ffff8802a08b7a98  RFLAGS: 00000006
    RAX: 0000000000000010  RBX: 0000000000000010  RCX: 0000000000000006
    RDX: ffff8802a08b7a98  RSI: 0000000000000000  RDI: 0000000000000001
    RBP: ffffffff813625ad   R8: ffffffff813625ad   R9: 0000000000000000
    R10: ffff8802a08b7a98  R11: 0000000000000006  R12: ffffffffffffffff
    R13: 0000000000000001  R14: 0000000000000000  R15: 000000000000020a
    ORIG_RAX: 000000000000020a  CS: 0010  SS: 0000
--- <NMI exception stack> ---
 #4 [ffff8802a08b7a98] cfb_imageblit at ffffffff813625ad
 #5 [ffff8802a08b7aa0] cpumask_next_and at ffffffff813011f3
 #6 [ffff8802a08b7ac0] update_sd_lb_stats at ffffffff8109d473
 #7 [ffff8802a08b7bf0] vsnprintf at ffffffff8130c001
 #8 [ffff8802a08b7ce0] vt_console_print at ffffffff813cc483
 #9 [ffff8802a08b7d40] call_console_drivers.constprop.18 at ffffffff8105fec9
#10 [ffff8802a08b7d70] console_unlock at ffffffff81061373
#11 [ffff8802a08b7db0] fb_flashcursor at ffffffff81355ca8
#12 [ffff8802a08b7e00] process_one_work at ffffffff8107b7c3
#13 [ffff8802a08b7e50] worker_thread at ffffffff8107d0ef
#14 [ffff8802a08b7ec0] kthread at ffffffff81082c90
#15 [ffff8802a08b7f50] ret_from_fork at ffffffff81669eac

PID: 1602   TASK: ffff88029f38c650  CPU: 1   COMMAND: "sh"
 #0 [ffff8802afa83ce0] machine_kexec at ffffffff8103f78b
 #1 [ffff8802afa83d50] crash_kexec at ffffffff810c9708
 #2 [ffff8802afa83e20] panic at ffffffff81657417
 #3 [ffff8802afa83ea0] watchdog_timer_fn at ffffffff810eefb5
 #4 [ffff8802afa83ed0] __run_hrtimer at ffffffff810862b3
 #5 [ffff8802afa83f20] hrtimer_interrupt at ffffffff81086bd7
 #6 [ffff8802afa83f90] smp_apic_timer_interrupt at ffffffff8166bc89
 #7 [ffff8802afa83fb0] apic_timer_interrupt at ffffffff8166ab5d
--- <IRQ stack> ---
 #8 [ffff88029f34db28] apic_timer_interrupt at ffffffff8166ab5d
    [exception RIP: vprintk_emit+468]
    RIP: ffffffff810617e4  RSP: ffff88029f34dbd8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 00000000000000e3
    RDX: 0000000000000000  RSI: 0000000000000082  RDI: 0000000000000246
    RBP: ffff88029f34dc58   R8: ffffffff81e97160   R9: ffffffff81eac344
    R10: 000000000002ae80  R11: 0000000000040000  R12: 0000000000000096
    R13: ffff88029f34db58  R14: 0000000000000000  R15: 0000000000000054
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
 #9 [ffff88029f34dc60] printk at ffffffff81657580
#10 [ffff88029f34dcc0] lock_timer_base at ffffffff8106e6a8
#11 [ffff88029f34dd00] try_to_del_timer_sync at ffffffff8106f641
#12 [ffff88029f34dd30] del_timer_sync at ffffffff8106f6ea
#13 [ffff88029f34dd50] cfq_exit_queue at ffffffff812fb9d6
#14 [ffff88029f34dd90] elevator_exit at ffffffff812d7248
#15 [ffff88029f34ddb0] elevator_change at ffffffff812d7ab8
#16 [ffff88029f34de20] elv_iosched_store at ffffffff812d86bb
#17 [ffff88029f34de50] queue_attr_store at ffffffff812e1054
#18 [ffff88029f34dea0] sysfs_write_file at ffffffff81214a78
#19 [ffff88029f34df00] vfs_write at ffffffff811a005c
#20 [ffff88029f34df30] sys_write at ffffffff811a04f2
#21 [ffff88029f34df80] system_call_fastpath at ffffffff81669f59
    RIP: 00007f90c59c6740  RSP: 00007fff7eb5fea8  RFLAGS: 00010202
    RAX: 0000000000000001  RBX: ffffffff81669f59  RCX: 0000000000000008
    RDX: 0000000000000009  RSI: 00007f90c62e4000  RDI: 0000000000000001
    RBP: 00007f90c62e4000   R8: 000000000000000a   R9: 00007f90c62d8740
    R10: 00007fff7eb5fc40  R11: 0000000000000246  R12: 0000000000000001
    R13: 0000000000000009  R14: 00007f90c5c93280  R15: 0000000000000009
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

PID: 402    TASK: ffff88029b259770  CPU: 2   COMMAND: "systemd-journal"
 #0 [ffff8802afb07e60] crash_nmi_callback at ffffffff81037342
 #1 [ffff8802afb07e70] nmi_handle.isra.0 at ffffffff81662bf9
 #2 [ffff8802afb07ec0] do_nmi at ffffffff81662d10
 #3 [ffff8802afb07ef0] end_repeat_nmi at ffffffff81662201
    [exception RIP: finish_open+57]
    RIP: ffffffff8119e2e9  RSP: ffff88029a54bcb8  RFLAGS: 00000246
    RAX: 0000000000000010  RBX: 0000000000000010  RCX: 0000000000000246
    RDX: ffff88029a54bcb8  RSI: 0000000000000018  RDI: 0000000000000001
    RBP: ffffffff8119e2e9   R8: ffffffff8119e2e9   R9: 0000000000000018
    R10: ffff88029a54bcb8  R11: 0000000000000246  R12: ffffffffffffffff
    R13: ffff88029eb1ca78  R14: ffff88029f6408b8  R15: ffff88029b335400
    ORIG_RAX: ffff88029b335400  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff88029a54bcb8] finish_open at ffffffff8119e2e9
 #5 [ffff88029a54bcd0] do_last at ffffffff811ae909
 #6 [ffff88029a54bdb0] path_openat at ffffffff811af117
 #7 [ffff88029a54be50] do_filp_open at ffffffff811af7d1
 #8 [ffff88029a54bf10] do_sys_open at ffffffff8119f634
 #9 [ffff88029a54bf70] sys_open at ffffffff8119f741
#10 [ffff88029a54bf80] system_call_fastpath at ffffffff81669f59
    RIP: 00007f36d7da04f0  RSP: 00007fff67037568  RFLAGS: 00000246
    RAX: 0000000000000002  RBX: ffffffff81669f59  RCX: ffffffffffffffff
    RDX: 00000000000001b6  RSI: 0000000000080000  RDI: 00007f36d9e530b0
    RBP: 00007fff67037570   R8: 0000000000000000   R9: 00007f36d95bb01d
    R10: 0000000000000238  R11: 0000000000000246  R12: ffffffff8119f741
    R13: ffff88029a54bf78  R14: 00007fff67038200  R15: 00007fff67037e28
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

PID: 638    TASK: ffff88029d02aee0  CPU: 3   COMMAND: "rs:main Q:Reg"
 #0 [ffff8802afb87e60] crash_nmi_callback at ffffffff81037342
 #1 [ffff8802afb87e70] nmi_handle.isra.0 at ffffffff81662bf9
 #2 [ffff8802afb87ec0] do_nmi at ffffffff81662d10
 #3 [ffff8802afb87ef0] end_repeat_nmi at ffffffff81662201
    [exception RIP: path_get+51]
    RIP: ffffffff811aa9c3  RSP: ffff88029d01fd18  RFLAGS: 00000257
    RAX: 0000000000000010  RBX: 0000000000000010  RCX: 0000000000000257
    RDX: ffff88029d01fd18  RSI: 0000000000000018  RDI: 0000000000000001
    RBP: ffffffff811aa9c3   R8: ffffffff811aa9c3   R9: 0000000000000018
    R10: ffff88029d01fd18  R11: 0000000000000257  R12: ffffffffffffffff
    R13: ffff8802a4f99e9c  R14: 0000000000000000  R15: 00000000000000ff
    ORIG_RAX: 00000000000000ff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff88029d01fd18] path_get at ffffffff811aa9c3
 #5 [ffff88029d01fd30] fsnotify_create_event at ffffffff811e0718
 #6 [ffff88029d01fd90] send_to_group at ffffffff811df812
 #7 [ffff88029d01fe30] fsnotify at ffffffff811dfbb5
 #8 [ffff88029d01ff00] vfs_write at ffffffff811a00b5
 #9 [ffff88029d01ff30] sys_write at ffffffff811a04f2
#10 [ffff88029d01ff80] system_call_fastpath at ffffffff81669f59
    RIP: 00007f6d751310cd  RSP: 00007f6d723b4c78  RFLAGS: 00000246
    RAX: 0000000000000001  RBX: ffffffff81669f59  RCX: 00007f6d759f4430
    RDX: 0000000000000085  RSI: 00007f6d6c000c40  RDI: 0000000000000005
    RBP: 00007f6d6c000c40   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000293  R12: 00007f6d6c000c40
    R13: 00007f6d723b48e0  R14: 0000000000000000  R15: 0000000000000085
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
crash>

Here are the files you can use to do your own analysis:

http://go.gurulabs.com/vmlinux.xz
http://go.gurulabs.com/vmcore.xz

Comment 33 Ohad Basan 2013-07-30 07:29:44 UTC
I am hitting something very similar:

Message from syslogd@tigris04 at Jul 30 10:19:24 ...
 kernel:[75827.673018] BUG: soft lockup - CPU#27 stuck for 23s! [sh:6189]

It happens when connecting iSCSI storage to an oVirt node. This is an AMD
processor, and the kernel is the latest in Fedora 19 (3.10).

Comment 34 Tomoki Sekiyama 2013-08-29 15:17:51 UTC
I submitted a patch to resolve a similar issue:
http://marc.info/?l=linux-kernel&m=137752512914423&w=2

The patch fixes a race between I/O scheduler initialization and switching via the sysfs knob.
I would appreciate it if anybody hitting this bug could try whether the patch solves it.

Thanks,

Comment 35 Vivek Goyal 2013-08-29 20:30:54 UTC
I think the above patch should fix the issue: without it, an elevator switch can
call del_timer_sync() on a timer on which init_timer() has not been done yet,
which would explain why timer->base is NULL and why lock_timer_base() spins
infinitely.
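
In other words, the suspected interleaving is roughly (a sketch, not verbatim code):

CPU A: device being added               CPU B: sh writing to queue/scheduler
-------------------------               ------------------------------------
elevator_init()
  cfq allocates cfqd; its
  idle_slice_timer is not yet
  init_timer()'d
                                        elevator_change()
                                          elevator_exit(old elevator)
                                            cfq_exit_queue()
                                              del_timer_sync(&cfqd->idle_slice_timer)
                                              -> timer->base == NULL, spins forever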

Comment 36 Tomoki Sekiyama 2013-09-03 15:14:57 UTC
I submitted patch v2 that may fix this issue.
https://lkml.org/lkml/2013/8/30/551
Any tests/reviews/comments are appreciated.

Comment 37 Ohad Basan 2013-09-03 15:18:13 UTC
What's the difference between v1 and v2?
I compiled my kernel with v1 and so far it's working.

Comment 38 Tomoki Sekiyama 2013-09-03 15:27:29 UTC
(In reply to Ohad Basan from comment #37)

Thanks for testing!
There are no big changes in the logic; I split it into two patches for ease of review, added some comments and an assertion for locking, and fixed a minor bug on the memory allocation failure path.

Comment 39 Nir Soffer 2013-09-10 21:46:53 UTC
I have the same issue with 3.10.10-200.fc19.x86_64 on an Intel CPU. The trigger seems to be adding iSCSI targets.

Comment 40 Ohad Basan 2013-09-17 10:36:28 UTC
I confirm that this patch solves the problem

Comment 41 Ohad Basan 2013-09-22 11:40:00 UTC
Does anyone know when this will be merged into the Fedora kernel?

Comment 42 Alessandro 2013-09-23 07:01:39 UTC
Good question (#41).

The problem is still here in Fedora 19 with 3.11.1-200, and we cannot force anyone to rebuild the kernel from source to apply the patch or to insert elevator=deadline on the kernel line.

Since the patch seems to work, it would be really nice to have it merged into the kernel tree.

Hope someone gets this annoying thing done.

Best regards

Comment 43 Josh Boyer 2013-09-23 14:27:05 UTC
(In reply to Alessandro from comment #42)
> Good question #41. 
> 
> The problem is still here in Fedora 19 3.11.1-200 and we cannot force anyone
> to rebuild kernel from sorce to apply the patch or insert elevator=deadline
> in the kernel line.
> 
> Since the patch seems to work, it wold be really nice to have it merged in
> the kernel tree.

As soon as it seems settled upstream, we'll backport it.  Right now, there's a standing question from yesterday.  It should also be CC'd to stable, I would think.

Comment 44 Alessandro 2013-09-24 07:38:05 UTC
Josh, this is wonderful news!

This bug is long-standing and really problematic for unattended boots.

Thank you very much

Comment 45 Josh Boyer 2013-09-27 13:26:31 UTC
The patches got acks upstream.  I've added them to all releases.

Comment 46 Alessandro 2013-09-27 13:29:05 UTC
Thank you!

A long-awaited fix is finally coming!

Comment 47 Fedora Update System 2013-09-27 21:42:03 UTC
kernel-3.11.2-201.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kernel-3.11.2-201.fc19

Comment 48 Fedora Update System 2013-09-28 20:44:44 UTC
kernel-3.10.13-101.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/kernel-3.10.13-101.fc18

Comment 49 Fedora Update System 2013-09-29 01:21:25 UTC
Package kernel-3.11.2-201.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.11.2-201.fc19'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-17865/kernel-3.11.2-201.fc19
then log in and leave karma (feedback).

Comment 50 Fedora Update System 2013-09-30 16:27:31 UTC
kernel-3.11.2-301.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/kernel-3.11.2-301.fc20

Comment 51 Fedora Update System 2013-10-01 01:56:31 UTC
kernel-3.11.2-201.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 52 Fedora Update System 2013-10-02 06:35:28 UTC
kernel-3.11.2-301.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 53 Fedora Update System 2013-10-03 01:11:10 UTC
kernel-3.10.13-101.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

