From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Description of problem:
When running a dump or tar operation with a 64k block size from an external SCSI-attached RAID device to a SCSI-attached tape drive, I get a system crash. When running the exact same process on the UP variant of the kernel (2.4.21-27.0.2.EL), the operation runs flawlessly. To rule out bugs in specific device drivers, I have substituted a number of components so that different driver modules are exercised in the test.

SCSI cards used: LSI 22320R dual channel U320 SCSI (mptbase/mptscsih), LSI 20320R single channel U320 SCSI (mptbase/mptscsih), Adaptec 39320A-R (aic79xx)
Tape devices used: Quantum SuperDLT 600 (st), HP LTO (st)
RAID device used: Infortrend A16U (sd_mod)
SCSI configuration: RAID and tape sharing a bus; RAID and tape on separate buses (dual channel SCSI card)
Boot options: I have tried the defaults as well as "noapic"
Commands used: `dump -0 -b 64 -f /dev/st0 /raid`, `tar -b 128 -cvf /dev/st0 /raid`, `restore -C -b 64 -f /dev/st0`

In all cases, the 2.4.21-27.0.2.ELsmp kernel crashes. With the 2.4.21-27.0.2.EL (UP) kernel, the same tests that cause the crashes work without error. On the UP kernel I am able to perform restores/compares without any errors.

Version-Release number of selected component (if applicable):
kernel-2.4.21-27.0.2.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL3-UD4 with 2.4.21-27.0.2.ELsmp and 2.4.21-27.0.2.EL
2. Attach SCSI tape and SCSI RAID (ext3fs)
3. Make an ext3 fs on the RAID and mount it at /raid
4. Boot the SMP kernel
5. `mt -f /dev/st0 status`
6. `dump -0 -b 64 -f /dev/st0 /raid`
7. Observe crash

Additional info:
Attachments of log files, rpm listings, sysinfo and the crash screen are soon to come.
Please attach console output with oops/panic info.
dump -0 -b 64 -f /dev/st0 /raid DUMP: Date of this level 0 dump: Mon May 2 13:13:02 2005 DUMP: Dumping /dev/sda1 (/raid) to /dev/st0 DUMP: Added inode 8 to exclude list (journal inode) DUMP: Added inode 7 to exclude list (resize inode) DUMP: Label: none DUMP: mapping (Pass I) [regular files] DUMP: Kernel BUG at pci_dma:43 invalid operand: 0000 CPU 0 Pid: 2202, comm: dump Not tainted RIP: 0010:[<ffffffff80116b2a>]{pci_map_sg+106} RSP: 0018:0000010033901c68 EFLAGS: 00010086 RAX: 00000000801fadc0 RBX: 000001003f5ba080 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000010031361000 RDI: 0000010004b30000 RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000009 R13: 0000000000000001 R14: 000001003f5ba040 R15: 0000010004b30000 FS: 0000002a969644c0(0000) GS:ffffffff805e1540(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000005a43e0 CR3: 0000000000101000 CR4: 00000000000006e0 Call Trace: [<ffffffff80116ba3>]{pci_map_sg+227} [<ffffffffa002b41c>]{:aic79xx:ahd_linux_run_device_queue+796} [<ffffffffa0026ffb>]{:aic79xx:ahd_linux_queue+731} [<ffffffffa00008a0>]{:scsi_mod:scsi_dispatch_cmd+640} [<ffffffffa0009bbf>]{:scsi_mod:scsi_request_fn+1039} [<ffffffffa0008cff>]{:scsi_mod:__scsi_insert_special+127} [<ffffffffa0008d6f>]{:scsi_mod:scsi_insert_special_req+31} [<ffffffffa0000bae>]{:scsi_mod:scsi_do_req_Rsmp_6b1beddb+350} [<ffffffffa0096310>]{:st:st_sleep_done+0} [<ffffffffa0096516>]{:st:st_do_scsi+310} [<ffffffffa0098089>]{:st:st_write+2121} [<ffffffff8015f452>]{sys_write+178} [<ffffffff801102a7>]{system_call+119} Process dump (pid: 2202, stackpage=10033901000) Stack: 0000010033901c68 0000000000000018 ffffffff80116ba3 0000000000000246 000001003f0bf800 0000010004a61480 000001003f5ba040 000001003ffe0000 000001003f75ae80 0000010004a5a85a ffffffffa002b41c 0000000000000000 000001003f0bf800 000001003ffe0000 0000000000000000 0000010004a8c800 0000000000000000 
000001003f0bf930 ffffffffa0026ffb 000001003f82ba60 000001003f0bf800 000001003f0bf800 ffffffffa00008a0 0000000000000282 0000010033901d78 0000000000000293 000001003f82ba60 000001003f82ba60 000001003f0bf800 000001003f790400 0000010004a8c800 000001003f790430 ffffffffa0009bbf 000001000000e588 0000000000000000 000001003f82ba00 000001003f790430 000001003f82ba00 0000010033901f08 0000010033901f08 Call Trace: [<ffffffff80116ba3>]{pci_map_sg+227} [<ffffffffa002b41c>]{:aic79xx:ahd_linux_run_device_queue+796} [<ffffffffa0026ffb>]{:aic79xx:ahd_linux_queue+731} [<ffffffffa00008a0>]{:scsi_mod:scsi_dispatch_cmd+640} [<ffffffffa0009bbf>]{:scsi_mod:scsi_request_fn+1039} [<ffffffffa0008cff>]{:scsi_mod:__scsi_insert_special+127} [<ffffffffa0008d6f>]{:scsi_mod:scsi_insert_special_req+31} [<ffffffffa0000bae>]{:scsi_mod:scsi_do_req_Rsmp_6b1beddb+350} [<ffffffffa0096310>]{:st:st_sleep_done+0} [<ffffffffa0096516>]{:st:st_do_scsi+310} [<ffffffffa0098089>]{:st:st_write+2121} [<ffffffff8015f452>]{sys_write+178} [<ffffffff801102a7>]{system_call+119} Code: 0f 0b 20 34 2d 80 ff ff ff ff 2b 00 eb 5d 48 8b 4b 08 48 85 Kernel panic: Fatal exception mapping (Pass IINMI Watchdog detected LOCKUP on CPU0, eip ffffffffa003d3f5, registers: CPU 0 Pid: 2202, comm: dump Not tainted RIP: 0010:[<ffffffffa003d3f5>]{:aic79xx:.text.lock.aic79xx_core+55} RSP: 0018:ffffffff805e63a8 EFLAGS: 00000086 RAX: 0000010004a7d000 RBX: 000001003ffe0000 RCX: ffffffffa003a4a0 RDX: 000001003ffe0000 RSI: 000001003ffe3180 RDI: 000001003ffe0000 RBP: 000001003ffe0000 R08: 0000000000000003 R09: 0000000000000000 R10: 0000000000000008 R11: 0000000000000010 R12: ffffffff805e63b0 R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff804a1d00 FS: 0000002a969644c0(0000) GS:ffffffff805e1540(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000005a43e0 CR3: 0000000000101000 CR4: 00000000000006e0 Call Trace: <EOE> [<ffffffff801302ec>]{timer_bh+684} [<ffffffff80110807>]{common_interrupt+95} 
[<ffffffff8012afbf>]{bh_action+79} [<ffffffff8012ae6b>]{tasklet_hi_action+139} [<ffffffff8012ab2e>]{do_softirq+174} [<ffffffff80113463>]{do_IRQ+339} [<ffffffff80110807>]{common_interrupt+95} <EOI> [<ffffffff801f93bd>]{__make_request+1277} [<ffffffff801f9347>]{__make_request+1159} [<ffffffff801f951b>]{generic_make_request+331} [<ffffffff801f9591>]{submit_bh_rsector+97} [<ffffffff8016089e>]{write_locked_buffers+62} [<ffffffff80160a24>]{write_some_buffers+372} [<ffffffff8012499c>]{call_console_drivers+268} [<ffffffff80124cd5>]{printk+485} [<ffffffff80160a57>]{write_unlocked_buffers+23} [<ffffffff80160b6e>]{sync_buffers+30} [<ffffffff80160cda>]{fsync_dev+10} [<ffffffff80160e1b>]{sys_sync+11} [<ffffffff80124148>]{panic+296} [<ffffffff801113c0>]{show_trace+640} [<ffffffff801114fd>]{show_stack+205} [<ffffffff80111640>]{show_registers+304} [<ffffffff801117ce>]{die+238} [<ffffffff8011192d>]{do_trap+301} [<ffffffff80111c76>]{do_invalid_op+166} [<ffffffff80116b2a>]{pci_map_sg+106} [<ffffffff8013e53f>]{do_no_page+95} [<ffffffff80110b06>]{error_exit+0} [<ffffffffa002b41c>]{:aic79xx:ahd_linux_run_device_queue+796} [<ffffffffa0026ffb>]{:aic79xx:ahd_linux_queue+731} [<ffffffffa00008a0>]{:scsi_mod:scsi_dispatch_cmd+640} [<ffffffffa0009bbf>]{:scsi_mod:scsi_request_fn+1039} [<ffffffffa0008cff>]{:scsi_mod:__scsi_insert_special+127} [<ffffffffa0008d6f>]{:scsi_mod:scsi_insert_special_req+31} [<ffffffffa0000bae>]{:scsi_mod:scsi_do_req_Rsmp_6b1beddb+350} [<ffffffffa0096310>]{:st:st_sleep_done+0} [<ffffffffa0096516>]{:st:st_do_scsi+310} [<ffffffffa0098089>]{:st:st_write+2121} [<ffffffff8015f452>]{sys_write+178} [<ffffffff801102a7>]{system_call+119} Process dump (pid: 2202, stackpage=10033901000) Stack: ffffffff805e63a8 0000000000000018 0000000000000000 0000000000100000 0000000000000000 0000000000000000 ffffffff803ed6e0 0000000000000001 0000000000000000 0000000000000000 0000000000000000 0000010004b7a400 0000000000000042 0000010004b83280 ffffff0000000000 000000fffffff000 0000000000000000 
0000010004b7ba80 0000000000000000 0000000000000000 0000000100000001 0000000000000001 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000bbeb80 0000000000000001 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Call Trace: <EOE> [<ffffffff801302ec>]{timer_bh+684} [<ffffffff80110807>]{common_interrupt+95} [<ffffffff8012afbf>]{bh_action+79} [<ffffffff8012ae6b>]{tasklet_hi_action+139} [<ffffffff8012ab2e>]{do_softirq+174} [<ffffffff80113463>]{do_IRQ+339} [<ffffffff80110807>]{common_interrupt+95} <EOI> [<ffffffff801f93bd>]{__make_request+1277} [<ffffffff801f9347>]{__make_request+1159} [<ffffffff801f951b>]{generic_make_request+331} [<ffffffff801f9591>]{submit_bh_rsector+97} [<ffffffff8016089e>]{write_locked_buffers+62} [<ffffffff80160a24>]{write_some_buffers+372} [<ffffffff8012499c>]{call_console_drivers+268} [<ffffffff80124cd5>]{printk+485} [<ffffffff80160a57>]{write_unlocked_buffers+23} [<ffffffff80160b6e>]{sync_buffers+30} [<ffffffff80160cda>]{fsync_dev+10} [<ffffffff80160e1b>]{sys_sync+11} [<ffffffff80124148>]{panic+296} [<ffffffff801113c0>]{show_trace+640} [<ffffffff801114fd>]{show_stack+205} [<ffffffff80111640>]{show_registers+304} [<ffffffff801117ce>]{die+238} [<ffffffff8011192d>]{do_trap+301} [<ffffffff80111c76>]{do_invalid_op+166} [<ffffffff80116b2a>]{pci_map_sg+106} [<ffffffff8013e53f>]{do_no_page+95} [<ffffffff80110b06>]{error_exit+0} [<ffffffff80116b2a>]{pci_map_sg+106} [<ffffffff80116ba3>]{pci_map_sg+227} [<ffffffffa002b41c>]{:aic79xx:ahd_linux_run_device_queue+796} [<ffffffffa0026ffb>]{:aic79xx:ahd_linux_queue+731} [<ffffffffa00008a0>]{:scsi_mod:scsi_dispatch_cmd+640} [<ffffffffa0009bbf>]{:scsi_mod:scsi_request_fn+1039} [<ffffffffa0008cff>]{:scsi_mod:__scsi_insert_special+127} [<ffffffffa0008d6f>]{:scsi_mod:scsi_insert_special_req+31} 
[<ffffffffa0000bae>]{:scsi_mod:scsi_do_req_Rsmp_6b1beddb+350} [<ffffffffa0096310>]{:st:st_sleep_done+0} [<ffffffffa0096516>]{:st:st_do_scsi+310} [<ffffffffa0098089>]{:st:st_write+2121} [<ffffffff8015f452>]{sys_write+178} [<ffffffff801102a7>]{system_call+119} Code: f3 90 7e f5 e9 f4 d0 ff ff 90 90 41 56 31 f6 41 55 41 54 49 console shuts up ...
Created attachment 113931 [details] gzipped text file of system boot, version information, system config.
* New information *

With the exact same hardware and software configuration, I added the RHEL4/x86_64 kernel (2.6.9-5.ELsmp) and I am able to run successful dumps, restores and post-restore file compares. No errors at all. It would appear that the cause of the crashes during dump operations is something in 2.4.21-27.0.2.ELsmp that has been fixed in 2.6.9-5.ELsmp.

Information from the kernel performing flawless backups and restores:

uname -a:
Linux localhost.localdomain 2.6.9-5.ELsmp #1 SMP Wed Jan 5 19:29:47 EST 2005 x86_64 x86_64 x86_64 GNU/Linux

/proc/cmdline:
ro root=LABEL=/ console=tty0 console=ttyS0,9600n8

lsmod:
Module                  Size  Used by
md5                     5697  1
ipv6                  280993  8
usbserial              33329  0
lp                     15089  0
button                  9057  0
autofs4                23241  0
e1000                  92749  0
floppy                 65809  0
sg                     42489  0
parport_pc             29313  0
parport                43981  2 lp,parport_pc
ohci_hcd               24273  0
st                     41957  0
ext3                  137297  4
jbd                    68849  1 ext3
aic79xx               178237  2
sd_mod                 19393  4
scsi_mod              140177  4 sg,st,aic79xx,sd_mod
Is running a Redhat released 2.6 variant kernel under RHEL3 a supported action?
Jeff, answer is "no".
Ernie, okay. Given that the bug exists in 2.4.21-27.0.2.ELsmp, how do I obtain a fix for it that is also "supported"?
We are going to need to fix the 2.4 kernel in RHEL 3. It is important for you to file a problem report through the Red Hat support organization, to ensure that this issue is prioritized correctly and gets attention from the correct maintainers. Bugzilla is an informal, less reliable way to access support. The root of the problem is "Kernel BUG at pci_dma:43". I will look into this.
After a very long phone call with RH support they informed me that since my RHEL license is "Educational" I am not entitled to submit a bug through RH Support. They redirected me to this site and asked that I make it clear that this is the avenue I have to take to submit this problem to Redhat for a fix. Please escalate this within the Redhat organization in any way that you are able. Thank You
Okay, sorry to send you off on an unproductive route. The crash is in the following code:

./arch/x86_64/kernel/pci-dma.c

    int pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
                   int nents, int direction)
    {
            int i;

            BUG_ON(direction == PCI_DMA_NONE);

            /*
             * temporary 2.4 hack
             */
            for (i = 0; i < nents; i++ ) {
                    struct scatterlist *s = &sg[i];
                    int flush = (i == nents-1);
                    void *addr = s->address;
                    if (addr)
    ===> line 43            BUG_ON(s->page || s->offset);
                    else if (s->page)
                            addr = page_address(s->page) + s->offset;
                    else
                            BUG();
                    s->dma_address = __pci_map_single(hwdev, addr, s->length, direction, flush);
                    if (unlikely(s->dma_address == bad_dma_address))
                            goto error;
            }
------------------

Looks as though a driver is passing an illegal scatter-gather list. We will investigate.
Hi, I'm having panic problems here too, on an x86_64 machine with an Adaptec aic79xx SCSI adapter, when trying to back up to tape. Maybe related. I'm attaching my panic information.
Created attachment 115075 [details] Panic message from kernel
In fact, the bug first appeared for me with kernel 2.4.21-27.0.4. It happens almost every time I try to make a tape backup with kernels 2.4.21-27.0.4 through 2.4.21-32.0.1. It has happened only once with kernel 2.4.21-27.0.2. I'm not sure why. If this can help...
Created attachment 119845 [details]
patch to free write buffer after write completes

This bug appears to be a duplicate of BZ 162212. Please try the attached patch. Let us know the result.
I have already verified a fix through the ET system.
Josef.. Can you elaborate on the fix you verified? Was it a patch or a compiled kernel rpm? Who/where was the source of the fix? Thanks
I worked with Chris Vanhoof who supplied a test rpm and I am currently waiting for an officially supported rpm that we can supply to Bank of America. You can get more info in the ET #75570. Thanks
Jeff, please try the patch in comment #17 to verify whether we've fixed the problem. Reverting to NEEDINFO.
I will test to find out if it fixes the problem. Management is refusing anything except a fix embedded in an officially supported RHEL kernel RPM update.
I will post a link with an officially supported RHEL kernel RPM update once I receive one from Chris Vanhoof. We are expecting one soon.
We have been seeing a very similar kernel oops using Veritas NetBackup 5.1 over a QLA2312 to a fiber attached SDLT220 library. I would be very interested in seeing a test kernel released in a beta channel perhaps.
Matthew, please try the test kernel located here: http://people.redhat.com/dledford/st_tape_test/x86_64/ Please note that this is strictly a test kernel and is not supported. If the testing goes well, the fix will be in U7. If it is committed and you need it sooner, contact the Red Hat support organization.
Hi, Jeff. Have you had a chance to try the patch in comment #17 yet?
A fix for this problem was committed to the RHEL3 U7 patch pool on 19-Oct-2005 (in kernel version 2.4.21-37.6.EL). Propagating acks from bug 162212. *** This bug has been marked as a duplicate of 162212 ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0144.html