Bug 468624 - 5000x chipset, cutting data DVD with K3b/growisofs causes kernel panic and trashes file system
Summary: 5000x chipset, cutting data DVD with K3b/growisofs causes kernel panic and trashes file system
Keywords:
Status: CLOSED DUPLICATE of bug 446086
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Assignee: David Milburn
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-10-26 21:12 UTC by Todd
Modified: 2009-02-22 13:22 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-02-19 15:09:39 UTC
Target Upstream Version:
Embargoed:


Attachments
lspci -v > lspci.txt (8.56 KB, text/plain)
2008-10-28 17:08 UTC, Todd
dmesg > dmesg.txt (25.30 KB, text/plain)
2008-10-28 17:14 UTC, Todd


Links
CentOS bug 3002 (Private: 0; Priority: None; Status: None; Summary: None; Last Updated: Never)

Description Todd 2008-10-26 21:12:59 UTC
Hi All,

Severity = high, as I could have lost my business; not "urgent", as
I have a workaround.

Please note, I can cut "anything" I want with (Windows) Deep Burner under
Wine. I can also cut CDs with (Linux) cdrecord without a problem.

My mobo: http://www.supermicro.com/products/motherboard/Xeon1333/5000X/X7DAL-E+.cfm
My chipset: http://www.intel.com/products/server/chipsets/5000x/5000x-overview.htm
Tested DVD burners: SATA Plextor PX-755SAn, SATA Sony/NEC/Optiarc AD-7200S-0B

This problem first occurred in CentOS 4.x. I had several servers out
there with SATA DVD drives in them. If you wrote "anything"
to them with "any program", you got a kernel panic. I had
to replace them all with PATA drives at my own expense.

When CentOS 5.0 came along, the problem disappeared. I
have one Plextor SATA drive left out there: mine. With
whatever kernel changes were made between CentOS 5.1
and 5.2, the problem came back in spades.

The panic always occurs at the point where K3b starts to burn.
If you get to the point where you actually start burning, you are okay.

Since upgrading to CentOS 5.2, six times now while cutting a data
DVD with K3b/growisofs, my machine has gone into a kernel panic with
flashing lights on my keyboard. And, twice my file system got corrupted.
Badly corrupted. Had I not had excellent backups, I would have lost my business.

Since then, I have stayed away from burning DVDs with K3b/growisofs,
except once by accident. I ran k9copy and it automatically started K3b.
I did not realize it would do that, or I wouldn't have run it. I got
a kernel panic at the K3b splash screen.

And I have never had a problem cutting anything with Windows programs under Wine.

To reproduce this:

1) BACK UP YOUR SYSTEM. Do not get arrogant and think this can only
happen to me. Back up your stuff. I recommend "dump".

2) Create a compressed single file approximately 1 to 2 GB in size.
I used "dump" (dump -0a -z -f $DumpFile $OurStuff'/.')

3) Write this file to a SATA DVD burner with K3b/growisofs.

Repeat this enough times and you will get a kernel panic. It should only
take about 4 to 8 attempts.

This is from my /var/log/messages.1 on the second file system crash:

Jul 18 23:00:53 rn1 kernel: usb 1-7.1: USB disconnect, address 6
Jul 18 23:00:54 rn1 kernel: usb 1-7.3: USB disconnect, address 5
Jul 18 23:02:37 rn1 kernel: cdrom: This disc doesn't have any tracks I recognize!
Jul 18 23:04:33 rn1 last message repeated 2 times
Jul 18 23:10:18 rn1 last message repeated 3 times
Jul 18 23:10:22 rn1 smbd[6343]: [2008/07/18 23:10:22, 0] lib/util_sock.c:read_data(534)
Jul 18 23:10:22 rn1 smbd[6343]: read_data: read failure for 4 bytes to client 192.168.255.197. Error = No route to host
Jul 18 23:10:46 rn1 kernel: cdrom: This disc doesn't have any tracks I recognize!
Jul 18 23:45:40 rn1 syslogd 1.4.1: restart.


Many thanks,
-T

Comment 1 David Milburn 2008-10-27 23:55:30 UTC
Would you please attach dmesg output and "lspci -v"?

Also, when you see the kernel panic do you see any specific libata routines
on the stack trace? Thanks.

Comment 2 Todd 2008-10-28 17:08:48 UTC
Created attachment 321709
lspci -v > lspci.txt

Comment 3 Todd 2008-10-28 17:14:22 UTC
Created attachment 321711
dmesg > dmesg.txt

> Also, when you see the kernel panic do you see any specific 
> libata routines on the stack trace?

I apologize, but I do not know what a libata routine or a stack
trace is.

-T

Comment 4 David Milburn 2008-10-28 20:57:42 UTC
Looking at dmesg, you are using the ata_piix driver, a low-level driver
that is part of the Linux SATA subsystem (libata).

Normally, a kernel panic will generate a stack trace showing where the
kernel is actually crashing. In this case I would expect something like the
following (just an example). In this example we know that ata_sff_check_status
was called right before the crash.

Process insmod (pid: 776, threadinfo ffff81027f5ca000, task ffff810108fc17e0)
Stack:  ffffffff880c3330 0000000000000001 ffffffff880c3475 ffff810108cc4000
 ffffffff880bec82 0000000000000282 ffffffff880bee80 0000000000000016
 ffff810108cc4000 ffff81027fb5a9d8 ffffffff880b5eb9 ffff8104bf8f2400
Call Trace:
 [<ffffffff880c3330>] :libata:ata_sff_check_status+0x10/0x16
 [<ffffffff880c3475>] :libata:ata_sff_freeze+0x3a/0x4c
 [<ffffffff880bec82>] :libata:__ata_port_freeze+0x52/0x58
 [<ffffffff880bee80>] :libata:ata_eh_freeze_port+0x2b/0x40
 [<ffffffff880b5eb9>] :libata:ata_host_start+0x10f/0x169
 [<ffffffff880c28a6>] :libata:ata_pci_sff_activate_host+0x32/0x1c3
 [<ffffffff880c5335>] :libata:ata_sff_interrupt+0x0/0x1c7
 [<ffffffff802177ce>] pcibios_set_master+0x80/0x85
 [<ffffffff880e9cd4>] :ata_piix:piix_init_one+0x66a/0x695
 [<ffffffff80155b50>] pci_device_probe+0x104/0x184
 [<ffffffff801b758c>] driver_probe_device+0x52/0xaa
 [<ffffffff801b76bb>] __driver_attach+0x65/0xb6
 [<ffffffff801b7656>] __driver_attach+0x0/0xb6
 [<ffffffff801b6ed5>] bus_for_each_dev+0x43/0x6e
 [<ffffffff801b6b1b>] bus_add_driver+0x7e/0x130
 [<ffffffff80155d28>] __pci_register_driver+0x4b/0x6c
 [<ffffffff880f8017>] :ata_piix:piix_init+0x17/0x28
 [<ffffffff800a4d15>] sys_init_module+0xaf/0x1e8
 [<ffffffff8005d116>] system_call+0x7e/0x83
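To make the point about reading a trace concrete, here is a minimal Python sketch (not a Red Hat tool) that pulls the frames out of a call trace in the format shown above, reports the innermost frame, and keeps only the SATA-stack frames. The regular expression assumes the `[<address>] :module:function+offset/size` layout of this 2.6.18-era example; other kernel versions format traces differently, and the list of "SATA" module names is an assumption.

```python
import re

# Matches frames like: [<ffffffff880c3330>] :libata:ata_sff_check_status+0x10/0x16
# The ":module:" prefix appears only for symbols that live in loadable modules.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<func>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

# Assumed set of modules belonging to the SATA stack (libata plus two low-level drivers).
SATA_MODULES = ("libata", "ata_piix", "ahci")


def parse_call_trace(text):
    """Return (module, function) tuples in trace order, i.e. innermost frame first.

    Symbols without a module prefix are built into the kernel image and are
    reported with the module name "kernel".
    """
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((m.group("module") or "kernel", m.group("func")))
    return frames


def libata_frames(frames):
    """Keep only frames from the SATA stack modules listed above."""
    return [f for f in frames if f[0] in SATA_MODULES]


SAMPLE = """Call Trace:
 [<ffffffff880c3330>] :libata:ata_sff_check_status+0x10/0x16
 [<ffffffff880c3475>] :libata:ata_sff_freeze+0x3a/0x4c
 [<ffffffff802177ce>] pcibios_set_master+0x80/0x85
 [<ffffffff880e9cd4>] :ata_piix:piix_init_one+0x66a/0x695
"""

if __name__ == "__main__":
    frames = parse_call_trace(SAMPLE)
    print("innermost frame:", frames[0])
    print("SATA frames:", libata_frames(frames))
```

Run against an excerpt of the example trace, it reports `ata_sff_check_status` in `libata` as the innermost frame, which is the same reading given in the comment.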

Do you have a system available to test on?

A few things come to mind. From the link above, the chipset supports AHCI mode;
have you tried changing the SATA controller settings in the BIOS? It would be good
to know whether this is reproducible using the ahci driver as well as the
ata_piix driver.

If possible, setting up a serial console may help catch a stack trace
that shows where the kernel was at the time of the crash. I would expect
to see libata and ata_piix routines in the call trace.
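As a sketch only: RHEL/CentOS 5 boots with GRUB legacy, and a serial console is usually enabled by adding `console=` parameters to the kernel line in /boot/grub/grub.conf. The entry below assumes the first serial port (ttyS0) at 115200 baud and the RHEL 5.2 kernel version string; the `root=` device and `(hd0,0)` are hypothetical placeholders that must be adjusted for the actual machine. With two `console=` parameters, kernel messages go to both the screen and the serial port, so a second machine on a null-modem cable (running, for example, minicom) can capture the panic output even if the local disk is damaged.

```
# Hypothetical /boot/grub/grub.conf entry (GRUB legacy); root= and (hd0,0) are placeholders
title CentOS (2.6.18-92.el5) with serial console
        root (hd0,0)
        kernel /vmlinuz-2.6.18-92.el5 ro root=/dev/VolGroup00/LogVol00 console=tty0 console=ttyS0,115200n8
        initrd /initrd-2.6.18-92.el5.img
```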

Also, we have a RHEL 5.3 beta test kernel which includes a significant update to
the SATA subsystem. If possible, it would be good to know whether it fails under
the kernel-2.6.18-120.el5 test kernel.

http://people.redhat.com/dzickus/el5/120.el5/

Thanks.

Comment 5 Todd 2008-10-31 00:02:51 UTC
> Do you have a system available to test on?

Yes and no.  

There is a 2 in 3 chance I will be spending an hour with fsck.

There is a 1 in 3 chance my hard drive will be trashed and I lose
an entire day's billings (a lot of $).

There is an indeterminate chance something will go wrong with
my backups and I will lose my entire livelihood. My life would be
very nearly ruined.

So, "yes" to non-destructive testing and "no" to destructive testing.

Is the fact that I can burn data DVDs just fine with the Windows Deep Burner
under Wine any kind of clue? What does Wine do differently from K3b/
growisofs?

-T

Comment 6 Todd 2008-10-31 00:35:16 UTC
I rebooted and checked my BIOS:

         Enhanced (NON-AHCI) mode: Sata and Pata drivers are autodetected
         and placed in native IDE mode.

I was too chicken to change it.

-T

Comment 7 David Milburn 2008-10-31 21:59:38 UTC
Ok, thanks for the info. I don't want to potentially trash your system; those
suggestions were only for the case where you had another test system with the
same configuration. I will see if I can find a system to reproduce the problem.
It is possible that it is a timing-related bug, so you may not be hitting it
when running the app under Wine.

Comment 8 David Milburn 2009-02-19 15:09:39 UTC
Todd,

I am going to close this as a duplicate of BZ 446086. I was able to crash
a system and verify that the problem was no longer reproducible with the patch
attached to BZ 446086; this was also confirmed by the reporter of that bug.

*** This bug has been marked as a duplicate of bug 446086 ***

