Description of problem:
I have a RHEL 4.6 server (2.6.9-67.ELsmp) running as a NAS head. After some period of uptime, I get an Oops and panic. There is significant I/O load on this box, but it seems unlikely that load is the only factor, as the crashes are usually days apart. In the log extract I have, the smbd process is implicated, so I upgraded to Samba 3.0.28 prior to the most recent crash.

Version-Release number of selected component (if applicable):
RHEL 4U6, kernel 2.6.9-67.ELsmp, Samba 3.0.28

How reproducible:
No clear trigger event is obvious, although heavy I/O through Samba could be implicated. This server has been operational for quite some time without issue, but the load has recently increased as we've consolidated our network storage.

Steps to Reproduce:
1.
2.
3.

Actual results:
Oops and panic.

Expected results:
Normal file services.

Additional info:
Nov 7 15:38:24 nas smbd[16997]: [2008/11/07 15:38:24, 0] smbd/nttrans.c:call_nt_transact_ioctl(2481)
Nov 7 15:38:24 nas smbd[16997]: call_nt_transact_ioctl(0x9005c): Currently not implemented.
Nov 7 15:45:01 nas nss_wins[8082]: authenticated mount request from 192.168.129.100:862 for /shares/dev/DISTRIB (/shares/dev)
Nov 7 15:46:49 nas kernel: Unable to handle kernel paging request at virtual address 00100104
Nov 7 15:46:49 nas kernel: printing eip:
Nov 7 15:46:49 nas kernel: c012adb0
Nov 7 15:46:49 nas kernel: *pde = 31639001
Nov 7 15:46:49 nas kernel: Oops: 0002 [#1]
Nov 7 15:46:49 nas kernel: SMP
Nov 7 15:46:49 nas kernel: Modules linked in: vfat fat iptable_filter ip_tables joydev nfsd exportfs lockd nfs_acl lp autofs4 i2c_dev i2c_core vmnet(U) parport_pc parport vmmon(U) sunrpc xfs_quota(U) xfs(U) dm_mirror dm_mod button battery ac md5 ipv6 uhci_hcd ehci_hcd hw_random s2io(U) bnx2 ext3 jbd mppVhba(U) qla2400(U) qla2xxx(U) qla2xxx_conf(U) ata_piix libata aacraid(U) mppUpper(U) sg sd_mod scsi_mod
Nov 7 15:46:49 nas kernel: CPU: 3
Nov 7 15:46:49 nas kernel: EIP: 0060:[<c012adb0>] Tainted: P VLI
Nov 7 15:46:49 nas kernel: EFLAGS: 00010002 (2.6.9-67.ELsmp)
Nov 7 15:46:49 nas kernel: EIP is at free_uid+0x22/0x60
Nov 7 15:46:49 nas kernel: eax: 00100100 ebx: dc55f640 ecx: dc55f658 edx: 00200200
Nov 7 15:46:49 nas kernel: esi: 00000082 edi: d3f42f28 ebp: 00000000 esp: d3f42ea4
Nov 7 15:46:49 nas kernel: ds: 007b es: 007b ss: 0068
Nov 7 15:46:49 nas kernel: Process smbd (pid: 13508, threadinfo=d3f42000 task=df046770)
Nov 7 15:46:49 nas kernel: Stack: e3f6e4f0 db56eb88 c012b4d0 0000000a 00000000 d3f42f28 df046770 df046c64
Nov 7 15:46:49 nas kernel: c012b557 d3f42000 df046c64 d3f42000 d3f42000 c012cbc2 c03518c0 df046c64
Nov 7 15:46:49 nas kernel: d3f42fc4 d3f42f08 d3f42f28 d3f42fc4 df046c64 d3f42000 d3f42000 c0105bd4
Nov 7 15:46:49 nas kernel: Call Trace:
Nov 7 15:46:49 nas kernel: [<c012b4d0>] __dequeue_signal+0xfb/0x155
Nov 7 15:46:49 nas kernel: [<c012b557>] dequeue_signal+0x2d/0x54
Nov 7 15:46:49 nas kernel: [<c012cbc2>] get_signal_to_deliver+0xcf/0x346
Nov 7 15:46:49 nas kernel: [<c0105bd4>] do_signal+0x55/0xd9
Nov 7 15:46:49 nas kernel: [<c011e7fb>] __wake_up_common+0x36/0x51
Nov 7 15:46:49 nas kernel: [<c02d644d>] schedule+0x855/0x8f3
Nov 7 15:46:49 nas kernel: [<c02d644d>] schedule+0x855/0x8f3
Nov 7 15:46:49 nas kernel: [<c02d647d>] schedule+0x885/0x8f3
Nov 7 15:46:49 nas kernel: [<c0105c80>] do_notify_resume+0x28/0x38
Nov 7 15:46:49 nas kernel: [<c02d866a>] work_notifysig+0x13/0x15
Nov 7 15:46:50 nas kernel: Code: e8 01 c6 1a 00 89 d8 5b c3 56 85 c0 53 89 c3 74 55 9c 5e fa ba 40 cf 32 c0 e8 51 a4 09 00 85 c0 74 42 8d 4b 18 8b 43 18 8b 51 04 <89> 50 04 89 02 c7 41 04 00 02 20 00 8b 43 24 c7 43 18 00 01 10
Nov 7 15:46:50 nas kernel: <0>Fatal exception: panic in 5 seconds
Nov 7 15:46:56 nas kernel: Kernel panic - not syncing: Fatal exception
Nov 7 17:05:49 nas syslogd 1.4.1: restart.
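One possible reading of the register dump above: eax and edx hold 00100100 and 00200200, which match the LIST_POISON1 and LIST_POISON2 values that list_del() writes into a removed entry in include/linux/list.h, and the faulting address 00100104 is LIST_POISON1 + 4. That pattern is consistent with free_uid() unlinking a list entry that had already been removed, although the log alone does not prove it. The sketch below scans an oops extract for that pattern; the function name find_list_poison and the regular expressions are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Poison values written by list_del() in include/linux/list.h (32-bit kernels).
LIST_POISON1 = 0x00100100  # stored in entry->next after deletion
LIST_POISON2 = 0x00200200  # stored in entry->prev after deletion

# Hypothetical patterns for the i386 oops layout shown in this report.
REG_RE = re.compile(r"\b(e[a-d]x|e[sd]i|e[bs]p):\s*([0-9a-fA-F]{8})\b")
FAULT_RE = re.compile(r"virtual address ([0-9a-fA-F]{8})")


def find_list_poison(oops_text):
    """Return hints that a list entry may have been deleted twice."""
    findings = []
    for name, value in REG_RE.findall(oops_text):
        v = int(value, 16)
        if v == LIST_POISON1:
            findings.append(f"{name} holds LIST_POISON1")
        elif v == LIST_POISON2:
            findings.append(f"{name} holds LIST_POISON2")
    m = FAULT_RE.search(oops_text)
    if m:
        addr = int(m.group(1), 16)
        # On 32-bit, 'prev' sits 4 bytes after 'next' in struct list_head.
        if addr == LIST_POISON1 + 4:
            findings.append("fault address is LIST_POISON1 + 4")
    return findings


sample = """Unable to handle kernel paging request at virtual address 00100104
eax: 00100100 ebx: dc55f640 ecx: dc55f658 edx: 00200200
esi: 00000082 edi: d3f42f28 ebp: 00000000 esp: d3f42ea4"""

for hint in find_list_poison(sample):
    print(hint)
```

A hit from this check only narrows the search to list handling around free_uid(); it says nothing about which caller dropped the extra reference.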
A patch addressing this issue was committed to kernel-2.6.9-74 for RHEL 4.7. Shawn, please try this kernel: http://people.redhat.com/aarapov/kernel/2.6.9-74/ and let us know if it works for you. Thanks!
The current QLogic 24xx driver (8.02.12) does not support this kernel, so this isn't an option for me. I've replaced Samba 3.0.28 with 3.0.32, which is supposed to correct a race condition that could cause this, but I'm still getting lockups, now without the "Process smbd" entry. I've also upgraded to 2.6.9-67.0.22smp, which is the last maintenance kernel in 4.6, and I still get lockups, but without a log entry for the panic.
Okay, I will prepare a patched 2.6.9-67 kernel.
Shawn, please test this kernel: http://people.redhat.com/aarapov/kernel/2.6.9-67.test Thanks!
Shawn, have you had a chance to test the kernel?
The next maintenance window is Friday, December 12th. Also, we've had three configuration changes since the last crash, so it makes sense to stand pat and see if the lockup occurs, since the "Oops" message hasn't recurred in the last two failures.
Shawn, any luck with the patched kernel?
We removed the XFS filesystem driver, and haven't had a crash in 20 days, running the 2.6.9-67.0.22 kernel. Since the "oops" did not recur after the first crash, I'm not inclined to implement a fix for that issue at this time. When Qlogic releases upgraded drivers, we will consider upgrading to a current production kernel. Of course, if it ain't broke ...
Good. ... I'm closing this bug as WONTFIX. Please reopen it if you hit it again. Thanks, Shawn.