Bug 221314

Summary: kernel oops: spinlock crash in smbd
Product: [Fedora] Fedora
Reporter: Wolfgang Breyha <wbreyha>
Component: kernel
Assignee: David Howells <dhowells>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: medium
Docs Contact:
Priority: medium
Version: 5
CC: triage, wtogami
Target Milestone: ---
Target Release: ---
Hardware: i386
OS: Linux
Whiteboard: bzcl34nup
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-05-06 17:17:42 UTC
Type: ---

Description Wolfgang Breyha 2007-01-03 19:22:30 UTC
Description of problem:

We have had a Samba host running on FC5 for a while now. Since late December,
however, it has been crashing, ever since a new (fast, Gbit-connected) W2k3
server started running its backups against it.

The host has crashed about 5 times so far, each time after about 2 hours of
backup. I only connected and configured the serial console today, so I have
just one crashdump, from today.

First the setup. It's a P4 2.66GHz on an ASUS P5LD2-VM DH with 1GB RAM. Samba
provides shares on a large RAID-5 array built with the Intel onboard SATA
controller (AHCI mode) and an additional Promise TX4 SATA controller.

Here is the lspci output:
00:00.0 Host bridge: Intel Corporation 945G/P Memory Controller Hub (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 945G Integrated Graphics Controller (rev 02)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GH (ICH7DH) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA Storage Controllers cc=AHCI (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:04.0 Mass storage controller: Integrated Technology Express, Inc. ITE 8211F Single Channel UDMA 133 (ASUS 8211 (ITE IT8212 ATA RAID Controller)) (rev 11)
01:0a.0 Mass storage controller: Promise Technology, Inc. PDC20718 (SATA 300 TX4) (rev 02)
02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller


kernels used while crashing:
2.6.18-1.2200.fc5
2.6.18-1.2257.fc5
additional kernel params:
selinux=0

FC5 is fully updated via yum as of today.

samba used while crashing:
3.0.23c-1.fc5 (fedora update rpms)
3.0.23d-1 (samba original rpms)

The crashdump:
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
*pde = 3448e067
Oops: 0000 [#1]
last sysfs file: /devices/platform/i2c-9191/9191-0290/temp3_max_hyst
Modules linked in: ipv6 autofs4 w83627ehf hwmon eeprom i2c_isa hidp l2cap
bluetooth ip_conntrack_ftp ip_conntrack_netbios_ns ipt_REJECT xt_state
ip_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables raid456 xor
video sbs i2c_ec container button battery asus_acpi ac lp parport_pc parport
ehci_hcd uhci_hcd floppy sg snd_hda_intel snd_hda_codec snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device e1000 snd_pcm_oss
snd_mixer_oss ide_cd i2c_i801 snd_pcm i2c_core serio_raw pcspkr cdrom snd_timer
snd soundcore snd_page_alloc dm_snapshot dm_zero dm_mirror dm_mod raid1 ext3
jbd ahci sata_promise libata sd_mod scsi_mod
CPU:    0
EIP:    0060:[<c04d278d>]    Not tainted VLI
EFLAGS: 00010246   (2.6.18-1.2257.fc5 #1)
EIP is at rb_erase+0xf6/0x22f
eax: 00000001   ebx: 00000000   ecx: 00000000   edx: f7be86c8
esi: f7be86c8   edi: f7be8448   ebp: c07946a0   esp: f7feff44
ds: 007b   es: 007b   ss: 0068
Process events/0 (pid: 4, ti=f7fef000 task=c18e05a0 task.ti=f7fef000)
Stack: 00000001 f7be8440 f7be8448 f7e22740 00000282 c04a87cf c0669840 c0669844
       c0428a48 c18e06c4 00000246 f7e22740 c04a8788 00000000 f7e22760 f7e22740
       f7e22758 00000000 c0428f46 00000001 00000000 c18e15f0 00010000 00000000
Call Trace:
 [<c04a87cf>] key_cleanup+0x47/0xce
 [<c0428a48>] run_workqueue+0x85/0xc5
 [<c0428f46>] worker_thread+0xe8/0x11a
 [<c042b1a5>] kthread+0xad/0xd8
 [<c0403adf>] kernel_thread_helper+0x7/0x10
DWARF2 unwinder stuck at kernel_thread_helper+0x7/0x10
Leftover inexact backtrace:
EIP:    0060:[<c04d278d>]    Not tainted VLI
EFLAGS: 00010246   (2.6.18-1.2257.fc5 #1)
EIP is at rb_erase+0xf6/0x22f
eax: 00000001   ebx: 00000000   ecx: 00000000   edx: f7be86c8
esi: f7be86c8   edi: f7be8448   ebp: c07946a0   esp: f7feff44
ds: 007b   es: 007b   ss: 0068
Process events/0 (pid: 4, ti=f7fef000 task=c18e05a0 task.ti=f7fef000)
Stack: 00000001 f7be8440 f7be8448 f7e22740 00000282 c04a87cf c0669840 c0669844
       c0428a48 c18e06c4 00000246 f7e22740 c04a8788 00000000 f7e22760 f7e22740
       f7e22758 00000000 c0428f46 00000001 00000000 c18e15f0 00010000 00000000
Call Trace:
 [<c04a87cf>] key_cleanup+0x47/0xce
 [<c0428a48>] run_workqueue+0x85/0xc5
 [<c0428f46>] worker_thread+0xe8/0x11a
 [<c042b1a5>] kthread+0xad/0xd8
 [<c0403adf>] kernel_thread_helper+0x7/0x10
DWARF2 unwinder stuck at kernel_thread_helper+0x7/0x10
Leftover inexact backtrace:
 =======================
Code: 05 89 5a 08 eb 08 89 5a 04 eb 03 89 5d 00 83 3c 24 01 0f 85 46 01 00 00
e9 12 01 00 00 8b 4e 08 39 d9 0f 85 85 00 00 00 8b 4e 04 <8b> 01 a8 01 75 14 83
c8 01 89 ea 89 01 89 f0 83 26 fe e8 3c fd
EIP: [<c04d278d>] rb_erase+0xf6/0x22f SS:ESP 0068:f7feff44
 <0>BUG: spinlock lockup on CPU#0, smbd/3687, c0669780 (Not tainted)
 [<c0403f28>] dump_trace+0x69/0x1af
 [<c0404086>] show_trace_log_lvl+0x18/0x2c
 [<c0404601>] show_trace+0xf/0x11
 [<c040468b>] dump_stack+0x15/0x17
 [<c04d53fe>] _raw_spin_lock+0xbf/0xdc
 [<c04a84fb>] key_alloc+0x1d2/0x32f
 [<c04a9301>] keyring_alloc+0x30/0x6a
 [<c04aa995>] alloc_uid_keyring+0x4c/0xb2
 [<c0423596>] alloc_uid+0x95/0x13b
 [<c04265d6>] set_user+0xb/0x8e
 [<c0427e63>] sys_setresuid+0x111/0x1dd
 [<c0402da7>] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb
Leftover inexact backtrace:
 =======================
BUG: soft lockup detected on CPU#0!
 [<c0403f28>] dump_trace+0x69/0x1af
 [<c0404086>] show_trace_log_lvl+0x18/0x2c
 [<c0404601>] show_trace+0xf/0x11
 [<c040468b>] dump_stack+0x15/0x17
 [<c043f51c>] softlockup_tick+0x90/0xa1
 [<c042319f>] update_process_times+0x35/0x57
 [<c040631c>] timer_interrupt+0x58/0x90
 [<c043f79e>] handle_IRQ_event+0x23/0x49
 [<c043f846>] __do_IRQ+0x82/0xde
 [<c0405385>] do_IRQ+0x9a/0xb8

... the "soft lockup" continues to dump every few seconds....
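
For anyone comparing these traces against other reports, the frame lines can be split into address, symbol, offset, and function size. The following is only a rough sketch, assuming frames always look like the `[<addr>] symbol+0xoff/0xsize` lines above; the names `FRAME_RE` and `parse_frames` are made up for illustration and are not part of any kernel tooling.

```python
import re

# Matches frames such as " [<c04a87cf>] key_cleanup+0x47/0xce".
# Assumption: every frame follows this exact layout.
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(text):
    """Return (address, symbol, offset, size) tuples for each frame found."""
    frames = []
    for match in FRAME_RE.finditer(text):
        addr, symbol, offset, size = match.groups()
        frames.append((int(addr, 16), symbol, int(offset, 16), int(size, 16)))
    return frames


# Three frames copied from the spinlock lockup trace in this report.
SAMPLE = """\
 [<c04d53fe>] _raw_spin_lock+0xbf/0xdc
 [<c04a84fb>] key_alloc+0x1d2/0x32f
 [<c04a9301>] keyring_alloc+0x30/0x6a
"""

for addr, symbol, offset, size in parse_frames(SAMPLE):
    # addr - offset is the start address of the function containing the frame.
    print(f"{symbol:<16} start=0x{addr - offset:08x} offset=0x{offset:x} size=0x{size:x}")
```

Subtracting the offset from the frame address gives the start of the function, which can help when matching frames against a System.map for the same kernel build.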

The share accessed by the W2k3 server is configured as...
[backup]
        comment = backups go here
        path = /data/backup
        valid users = user1, user2
        admin users = user1, user2
        read list = user1, user2
        write list = user1, user2
        force user = backup
        force group = backup
        read only = No
        directory mask = 06775
        force directory mode = 06770

Version-Release number of selected component (if applicable):
see above

How reproducible:
Running the backup in our setup reliably crashes the server.

Steps to Reproduce:
?
  
Actual results:
kernel oops

Expected results:
stable server

Additional info:
If you need further details like the complete smb.conf, etc. ... please let me
know. I'll provide it ASAP.

Comment 1 Wolfgang Breyha 2007-04-26 16:08:24 UTC
Seems to be fixed in 2.6.19+: the crash has not occurred since 2.6.19 and
2.6.20 were installed on this machine.
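
To check whether a given machine is still on an affected kernel, the release string can be compared against 2.6.19. This is a minimal sketch, assuming release strings shaped like the ones in this report (`2.6.18-1.2257.fc5`); the helper names `kernel_base_version` and `at_least` are hypothetical, and the 2.6.19 threshold reflects only the observation above, not a confirmed fix.

```python
def kernel_base_version(release):
    """Turn a release string such as '2.6.18-1.2257.fc5' into (2, 6, 18)."""
    base = release.split("-", 1)[0]  # drop the distribution suffix, if any
    return tuple(int(part) for part in base.split("."))


def at_least(release, minimum=(2, 6, 19)):
    """True if the base kernel version is at or above the given minimum."""
    return kernel_base_version(release) >= minimum


# Kernel versions mentioned in this report.
for release in ("2.6.18-1.2200.fc5", "2.6.18-1.2257.fc5", "2.6.19", "2.6.20"):
    status = "not seen crashing" if at_least(release) else "crashed here"
    print(f"{release}: {status}")
```

On a running system the string to test could be taken from `platform.release()` in the Python standard library.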

Comment 2 Bug Zapper 2008-04-04 05:28:28 UTC
Fedora apologizes that these issues have not been resolved yet. We're
sorry it's taken so long for your bug to be properly triaged and acted
on. We appreciate the time you took to report this issue and want to
make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6,
please note that Fedora no longer maintains these releases. We strongly
encourage you to upgrade to a current Fedora release. In order to
refocus our efforts as a project we are flagging all of the open bugs
for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days
from now, it will be closed 'WONTFIX'. If you can reproduce this bug in
the latest Fedora version, please change to the respective version. If
you are unable to do this, please add a comment to this bug requesting
the change.

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

And if you'd like to join the bug triage team to help make things
better, check out http://fedoraproject.org/wiki/BugZappers

Comment 3 Bug Zapper 2008-05-06 17:17:40 UTC
This bug is open for a Fedora version that is no longer maintained and
will not be fixed by Fedora. Therefore we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora, please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.