Bug 820459

Summary: BUG: soft lockup - CPU#0 stuck for 22s! since update to kernel 3.x
Product: [Fedora] Fedora Reporter: Dave Steele <dave.steele>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CANTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 16CC: dave.steele, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-05-10 12:15:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Dave Steele 2012-05-10 02:21:06 UTC
Description of problem: Machine will lockup and become unresponsive usually within 24 hours after boot. If connected via ssh console shows "BUG: soft lockup - CPU#0 stuck for 22s!" followed by a process name and PID. Issue began after updating to version 3.x.x kernels, problem never seen in several months of running 2.6.x kernels. Attempts to kill the offending process fail, and machine needs to be power cycled to recover. If machine is left running, reported load quickly rises into the hundreds. Following is in /var/log/messages from last occurrence:

May  9 19:03:26 mythtv kernel: [206272.023004] BUG: soft lockup - CPU#0 stuck for 22s! [mythfrontend:3545]
May  9 19:03:26 mythtv kernel: [206272.023004] Modules linked in: joydev hidp lockd be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rfcomm rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bnep coretemp nvidia(PO) microcode snd_hda_codec_hdmi snd_hda_codec_cirrus snd_usb_audio arc4 snd_hda_intel snd_hda_codec brcmsmac snd_seq mac80211 dvb_usb_dib0700 brcmutil cfg80211 dib0090 crc8 cordic dib7000p dib7000m dib0070 dvb_usb snd_hwdep dib8000 dib9000 dvb_core snd_pcm btusb dib3000mc snd_timer uvcvideo videobuf2_core videodev tg3 bluetooth snd_usbmidi_lib snd_rawmidi snd_seq_device media videobuf2_vmalloc videobuf2_memops bcma rc_core dibx000_common i2c_core rfkill snd soundcore snd_page_alloc applesmc input_polldev uinput sunrpc sdhci_pci sdhci firewire_ohci firewire_core mmc_core crc_itu_t ata_generic pata_acpi video usb_storage [last unloaded: scsi_wait_scan]
May  9 19:03:26 mythtv kernel: [206272.023004] Modules linked in: joydev hidp lockd be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rfcomm rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bnep coretemp nvidia(PO) microcode snd_hda_codec_hdmi snd_hda_codec_cirrus snd_usb_audio arc4 snd_hda_intel snd_hda_codec brcmsmac snd_seq mac80211 dvb_usb_dib0700 brcmutil cfg80211 dib0090 crc8 cordic dib7000p dib7000m dib0070 dvb_usb snd_hwdep dib8000 dib9000 dvb_core snd_pcm btusb dib3000mc snd_timer uvcvideo videobuf2_core videodev tg3 bluetooth snd_usbmidi_lib snd_rawmidi snd_seq_device media videobuf2_vmalloc videobuf2_memops bcma rc_core dibx000_common i2c_core rfkill snd soundcore snd_page_alloc applesmc input_polldev uinput sunrpc sdhci_pci sdhci firewire_ohci firewire_core mmc_core crc_itu_t ata_generic pata_acpi video usb_storage [last unloaded: scsi_wait_scan]
May  9 19:03:26 mythtv kernel: [206272.023004]
May  9 19:03:26 mythtv kernel: [206272.023004] Pid: 3545, comm: mythfrontend Tainted: P           O 3.3.4-3.fc16.i686.PAE #1 Apple Inc. Macmini4,1/Mac-F2208EC8
May  9 19:03:26 mythtv kernel: [206272.023004] EIP: 0060:[<fb5bef1e>] EFLAGS: 00000246 CPU: 0
May  9 19:03:26 mythtv kernel: [206272.023004] EIP is at nv_vm_malloc_pages+0xfe/0x3b0 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004] EAX: 00200000 EBX: ef0d37c0 ECX: f7dfe000 EDX: 00000000
May  9 19:03:26 mythtv kernel: [206272.023004] ESI: 00000101 EDI: 00000000 EBP: e9f95c18 ESP: e9f95bec
May  9 19:03:26 mythtv kernel: [206272.023004]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
May  9 19:03:26 mythtv kernel: [206272.023004] Process mythfrontend (pid: 3545, ti=e9f94000 task=e2aa0000 task.ti=e9f94000)
May  9 19:03:26 mythtv kernel: [206272.023004] Stack:
May  9 19:03:26 mythtv kernel: [206272.023004]  c0200000 00000000 00200000 00000000 fb5b5ab0 000082d0 00000000 c0200000
May  9 19:03:26 mythtv kernel: [206272.023004]  00000000 00000001 00000000 e9f95c70 fb5b7d7e 00000000 fb73e3e8 00000000
May  9 19:03:26 mythtv kernel: [206272.023004]  00000001 00000001 00000140 ef040004 ef040304 00000140 fb348b84 e9f50004
May  9 19:03:26 mythtv kernel: [206272.023004] Call Trace:
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb5b5ab0>] ? nvos_create_alloc+0x130/0x230 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb5b7d7e>] nv_alloc_pages+0x14e/0x380 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb348b84>] ? _nv004005rm+0x51ce/0xb1de [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb58ab5d>] _nv014485rm+0x141/0x14a [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb3f9e27>] ? _nv013180rm+0x1bf/0x36d [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb3fa2d2>] ? _nv013174rm+0x2fd/0x388 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb1fbe74>] ? _nv005983rm+0x109b/0x2678 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb1ff210>] ? _nv005907rm+0x1dd/0x689 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb1fefef>] ? _nv005753rm+0xdd9/0xe1d [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb1f8d09>] ? _nv005750rm+0x73c/0xec8 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<faf732d4>] ? _nv001104rm+0x98c/0xf5a [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<faf8170d>] ? _nv000956rm+0x8c/0xdf [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb5888f9>] ? _nv001100rm+0x4c9/0x67c [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb5952c2>] ? rm_ioctl+0x6d/0x16b [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c0944471>] ? __slab_free+0xe9/0x1b5
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c066d752>] ? _copy_from_user+0x42/0x60
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb5b6a21>] ? nv_kern_ioctl+0x151/0x450 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c05170a1>] ? tlb_finish_mmu+0x11/0x40
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb5b6d20>] ? nv_kern_ioctl+0x450/0x450 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb5b6d40>] ? nv_kern_unlocked_ioctl+0x20/0x30 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c05565b2>] ? do_vfs_ioctl+0x82/0x5a0
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c0560403>] ? mntput_no_expire+0x23/0x100
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c051d364>] ? remove_vma+0x44/0x60
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c051d364>] ? remove_vma+0x44/0x60
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c051e3ca>] ? do_munmap+0x16a/0x290
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c0556b3f>] ? sys_ioctl+0x6f/0x80
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c0951fdf>] ? sysenter_do_call+0x12/0x28
May  9 19:03:26 mythtv kernel: [206272.023004] Code: 04 75 15 8b 45 e8 31 d2 e8 80 14 f4 c4 85 c0 89 45 f0 0f 84 8d 02 00 00 8b 45 f0 c7 44 24 04 00 00 00 00 89 04 24 e8 a2 04 00 00 <89> d1 09 c1 0f 84 78 01 00 00 83 fa 00 76 8b 8b 4b 18 8b 7d ec
May  9 19:03:26 mythtv kernel: [206272.023004] Call Trace:
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb5b5ab0>] ? nvos_create_alloc+0x130/0x230 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb5b7d7e>] nv_alloc_pages+0x14e/0x380 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb348b84>] ? _nv004005rm+0x51ce/0xb1de [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb58ab5d>] _nv014485rm+0x141/0x14a [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb3f9e27>] ? _nv013180rm+0x1bf/0x36d [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb3fa2d2>] ? _nv013174rm+0x2fd/0x388 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb1fbe74>] ? _nv005983rm+0x109b/0x2678 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb1ff210>] ? _nv005907rm+0x1dd/0x689 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb1fefef>] ? _nv005753rm+0xdd9/0xe1d [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb1f8d09>] ? _nv005750rm+0x73c/0xec8 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<faf732d4>] ? _nv001104rm+0x98c/0xf5a [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<faf8170d>] ? _nv000956rm+0x8c/0xdf [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb5888f9>] ? _nv001100rm+0x4c9/0x67c [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb5952c2>] ? rm_ioctl+0x6d/0x16b [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c0944471>] ? __slab_free+0xe9/0x1b5
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c066d752>] ? _copy_from_user+0x42/0x60
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb5b6a21>] ? nv_kern_ioctl+0x151/0x450 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c05170a1>] ? tlb_finish_mmu+0x11/0x40
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb5b6d20>] ? nv_kern_ioctl+0x450/0x450 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<fb5b6d40>] ? nv_kern_unlocked_ioctl+0x20/0x30 [nvidia]
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c05565b2>] ? do_vfs_ioctl+0x82/0x5a0
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c0560403>] ? mntput_no_expire+0x23/0x100
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c051d364>] ? remove_vma+0x44/0x60
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c051d364>] ? remove_vma+0x44/0x60
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c051e3ca>] ? do_munmap+0x16a/0x290
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c0556b3f>] ? sys_ioctl+0x6f/0x80
May  9 19:03:26 mythtv kernel: [206272.023004]  [<c0951fdf>] ? sysenter_do_call+0x12/0x28 


Version-Release number of selected component (if applicable):
Currently running 3.3.4-3.fc16.i686.PAE, but has occurred on all 3.x.x kernels

How reproducible:
Usually happens within 24hours of booting, no pattern that I can observe yet. As machine is a MythTV frontend, mythfrontend usually triggers it when beginning to watch a recording, but has also been observed being triggered launching a web browser or the bluetooth daemon.

Steps to Reproduce:
1.Boot machine and use with uptime of >24h
2.occurs seemingly randomly on the launch of a process, has happened with several different processes 
3.
  
Actual results:
Machine locks up and does not respond to direct input, existing ssh sessions stay connected but are very unresponsive and repeats

Message from syslogd@mythtv at May  9 19:04:50 ...
kernel:[206356.023004] BUG: soft lockup - CPU#0 stuck for 23s [mythfrontend:3545]

Message from syslogd@mythtv at May  9 19:04:50 ...
kernel:[206356.023004] Process mythfrontend (pid: 3545, ti=e9f94000 task=e2aa0000 task.ti=e9f94000)

Message from syslogd@mythtv at May  9 19:04:50 ...
kernel:[206356.023004] Stack:

Message from syslogd@mythtv at May  9 19:04:50 ...
kernel:[206356.023004] Call Trace:

Message from syslogd@mythtv at May  9 19:04:50 ...
kernel:[206356.023004] Code: 0c 3e 8d 74 26 00 c7 44 24 08 18 5b 72 fb c7 44 24 04 c4 ff 73 fb c7 04 24 04 00 00 00 e8 c9 10 00 00 31 c0 31 d2 c9 c3 8d 76 00 <55> 89 e5 83 ec 0c 3e 8d 74 26 00 8b 45 0c 8b 55 08 83 f8 00 77

Expected results:
Process to launch and machine to remain responsive

Additional info:

Comment 1 Josh Boyer 2012-05-10 12:15:52 UTC
Seems the proprietary nvidia driver has issues.  You should contact nVidia about that.

If you can recreate the lockups without using that module, let us know.