Bug 242229

Summary: [pata_it821x] Hang on module load - IRQ routing ?
Product: [Fedora] Fedora
Reporter: Paul Smith <phhs80>
Component: kernel
Assignee: Alan Cox <alan>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: high
Version: 7
CC: cebbert, cvizitiu, davej, gotenks, jeff, lance.raymond, martin.vgagern, rje, stsp
Hardware: All
OS: Linux
Fixed In Version: 2.6.22.9-91.fc7
Doc Type: Bug Fix
Last Closed: 2007-10-03 20:13:14 EDT
Attachments:
lspci output
dmesg output
lspci output for supermicro pdsba+
portion of dmesg with vanilla 2.6.22.1 kernel
messages after modprobe_it831x before crash
Debugging messages from modprobe and reading the device
My copy of the module source, with debugging lines added.
Module source with lots of debugging added.
Console log from 2.6.23-rc6 including debug output
Patch adding debugging code to the module

Description Paul Smith 2007-06-02 08:01:17 EDT
The process of upgrading from FC6 to F7 deadlocks at

"Loading pata_it821x driver".

Thus, upgrading to F7 is not possible here.
Comment 1 Chris Lumens 2007-06-02 10:40:55 EDT
What do you see on tty3 when this happens?
Comment 2 Paul Smith 2007-06-02 16:04:46 EDT
> What do you see on tty3 when this happens?

It happens after the tty3 stage. I get a blue screen with 4 installation options
and I choose the first one. Then, I get another blue screen (text mode) and the
reported problem when loading SCSI drivers. 

I do not know whether it matters, but I have a Pentium Dual Core and 3 IDE hard
disks.

It seems that a similar problem is happening to other people:

http://marc.info/?l=fedora-list&m=118080895617949&w=2

Paul
Comment 3 john d 2007-06-03 15:20:38 EDT
the same happens to me when trying to do a fresh installation of fedora 7 on an
empty harddisk (mb: asus p5gd2 - harddisk over SATA).

john
Comment 5 Alan Cox 2007-06-04 11:00:59 EDT
Can you tell me which firmware the IT8212 in question has loaded (it'll say in
the BIOS messages it displays)
Comment 6 Paul Smith 2007-06-04 11:30:57 EDT
> Can you tell me which firmware the IT8212 in question has loaded (it'll say in
> the BIOS messages it displays)

Could you please give me some details about how to obtain the information that
you are asking me for?

Paul
Comment 7 Rob Emanuele 2007-06-04 15:53:59 EDT
On my system with the same issue: GigaRAID BIOS v 1.41  
Comment 8 Paul Smith 2007-06-04 16:31:55 EDT
Here:

GigaRAID ATAPI BIOS v 1.71
Comment 9 Beat 2007-06-05 12:03:26 EDT
Hello, 
I have the same problem on Asus P5GD2-Basic motherboard. 
Main bios V1007 Beta 2
ITE Bios V1.7.1.591
I have a plextor PX-716A on master 1 and a harddisk on master 2.
Everything works if the ITE8212 controller is disabled in bios. But for me
this is not the solution, because I need the plextor drive.

Regards
Beat
Comment 10 john d 2007-06-05 13:26:59 EDT
hi, 

i have the same board as Beat - disabling ITE8212 is not an option for me either as
my only HD is connected using the controller...

regards
john 
Comment 11 Paul Smith 2007-06-07 12:31:39 EDT
For people who want to install F7, there is the following workaround that I have
found:

Install F7 with yum, as described in:

http://fedoraproject.org/wiki/YumUpgradeFaq

In order to have F7 booting, I had to switch (in grub) to the previous kernel.

Paul
Comment 12 Alan Cox 2007-06-07 12:45:43 EDT
Paul, can you attach an lspci -vvxxx and a dmesg of the F7 boot with the kernel
that works (ie the old one) so I can collect more data on what is still a bit of
a mystery
Comment 13 Paul Smith 2007-06-07 13:13:03 EDT
Created attachment 156488
lspci output
Comment 14 Paul Smith 2007-06-07 13:13:31 EDT
Created attachment 156489
dmesg output
Comment 15 Paul Smith 2007-06-07 13:16:08 EDT
They are attached.

The problem with the new kernel seems to occur at "starting udev" stage.

Paul
Comment 16 john d 2007-06-11 07:03:05 EDT
one question: why is this bug classified as severity and priority low? i'm not
even able to use fedora due to this bug - at least for me this is therefore a
little bit more than just a minor bug...

john
Comment 17 Alan Cox 2007-06-12 09:48:31 EDT
Because that is how the original reporter rated it
Comment 18 Paul Smith 2007-06-12 11:49:08 EDT
I cannot change priority; I can only change severity.

I have changed the severity to "urgent".
Comment 19 Jim Dishaw 2007-06-27 08:28:35 EDT
Same problem here.  Unable to do a new install of F7.  Motherboard is ASUS
P5LD2, CPU is Intel Core 2 Duo 3.4 GHz.  Other OSes installed and running well
are FC6 and StartCom Linux.
Comment 20 Chuck Ebbert 2007-06-28 16:10:20 EDT
*** Bug 245979 has been marked as a duplicate of this bug. ***
Comment 21 Alan Cox 2007-06-28 16:12:08 EDT
Been doing some testing on this one. On my test box it is now all working
correctly in non-raid firmware mode with 2.6.22rc6. RAID mode needs some further
poking to deal with bugs in the emulation but Tejun's latest patches should have
that licked too
Comment 22 Ciprian 2007-07-02 10:33:06 EDT
For anyone googling for a workaround on how to make a fresh installation from DVD
(using the iso available on 02 July 07), here are some tips; at least with Gigabyte
GA81945P BIOS 1.75 do the following:

1. From BIOS disable the "On-chip primary/secondary PCI IDE" and also RAID
2. Physically disconnect any PATA device (e.g. DVD-ROM) from the motherboard
(doesn't matter if it's the ITE821x controller or the chipset)
3. Make sure you have only one hdd connected to the motherboard on SATA0 

... problem "fixed"; you can now use an external USB DVD-ROM to install normally. 

As of 07 Jul '07 it still won't boot with any PATA device connected in parallel
with SATAs (it freezes on starting Udev) but you can use the external DVD-ROM
unit until hopefully this bug gets fixed. 
Comment 23 David Marsh 2007-07-02 11:19:24 EDT
Same problem here using an Abit GD8 Motherboard with Intel 915P Chipset and ITE
8211F IDE. In fact, the issue with this chipset became a problem sometime during
fc5's life at some update. The original fc5 always installed and booted fine
(ignoring some later updates which break it). Also core 6 would hang at boot
after clean install. Passing all-generic-ide to the kernel is a workaround in
fc6 to get the system booted after install, but this workaround fails on the fc7 dvd
install and fc7 live cd, where the system hangs at the loading pata_it821x screen
or hangs at loading kernel in the live edition.
Comment 24 Alan Cox 2007-07-02 13:47:31 EDT
Ok with 2.6.22-rc7 all appears well with all the boards I can try and with both
raid and non-raid mode.

Comment 25 john d 2007-07-12 12:33:50 EDT
(In reply to comment #24)
> Ok with 2.6.22-rc7 all appears well with all the boards I can try and with both
> raid and non-raid mode.
> 
> 

is it possible to upgrade from fc6 in some way using this kernel? if so, is
there a documented way to do so?

thanks a lot!

john 
Comment 26 Paul Smith 2007-08-05 20:29:51 EDT
(In reply to comment #24)
> Ok with 2.6.22-rc7 all appears well with all the boards I can try and with both
> raid and non-raid mode.

The bug persists here. Hardware:

GigaRAID ATAPI BIOS v 1.71.

Paul


Comment 27 Paul Smith 2007-08-05 20:34:21 EDT
(In reply to comment #26)
> (In reply to comment #24)
> > Ok with 2.6.22-rc7 all appears well with all the boards I can try and with both
> > raid and non-raid mode.
> 
> The bug persists here. Hardware:
> 
> GigaRAID ATAPI BIOS v 1.71.

With kernel 2.6.22.1-41.fc7.

Paul

Comment 28 Jeff Norden 2007-08-08 16:46:50 EDT
I'll confirm this same problem with a Supermicro PDSBA+ motherboard.
The system reports an ITE bios version of 1.7.1.64
The only pata device in the system is the cdrom.

It does *not* seem to be fixed by the 2.6.22 kernel.  I've confirmed this a
couple of ways:

1) I used pungi to spin a minimal distro with the 2.6.22.1-41.fc7.x86_64
kernel.  Booting the resulting cd still hangs at the loading pata_it821x
step.  The cd boots fine on other hardware, where I can confirm that the
installer is using the new kernel.

2) The supermicro system is running FC6 right now.  Under FC6, I built a
vanilla 2.6.22.1 kernel with pata_it821x included (i.e., just unpack the
kernel, no patches, the only change made to default settings is to add the
pata_it821x module).  With this kernel, the module does get loaded, but
doesn't seem to do anything since my cdrom still shows up as /dev/hda.
However, the cdrom seems to work just fine.

3) I then re-built the kernel using the .config file from
kernel-2.6.22.1-41.fc7.src.rpm.  With this kernel, the system freezes at the
"starting udev" step, which is when the module will be loaded.

4) If I remove pata_it821x.ko from the appropriate subdirectory of
/lib/modules/, then the kernel from (3) above will boot fine, but with no
way to access my pata cdrom.  I then copied the module back into its
original location and did "modprobe pata_it821x".  The module loaded, and
it recognized the CD, and the system seemed to be ok for about 5 or 10
seconds, when it totally froze up, requiring a hard boot.  The one possibly
useful thing I noticed here is that the messages report that the cdrom is
being set to use UDMA/33, while under the vanilla 2.6.22.1 kernel or the
standard FC6 kernel, /proc/ide/hda/settings reports a speed of 66.

I'm attaching 3 files below.  The first is an lspci output, the second is
the relevant dmesg output from the vanilla kernel, and the third is the
output that results from "modprobe pata_it821x" before the system freezes.

I'll try to figure this out some more if I can find some time.  Any pointers
on what to try would be great.

Thanks,
-Jeff
Comment 29 Jeff Norden 2007-08-08 16:49:59 EDT
Created attachment 160935
lspci output for supermicro pdsba+
Comment 30 Jeff Norden 2007-08-08 16:53:59 EDT
Created attachment 160936
portion of dmesg with vanilla 2.6.22.1 kernel
Comment 31 Jeff Norden 2007-08-08 16:56:33 EDT
Created attachment 160937
messages after modprobe_it831x before crash
Comment 32 Chuck Ebbert 2007-08-08 17:04:02 EDT
(In reply to comment #28)
> 2) The supermicro system is running FC6 right now.  Under FC6, I built a
> vanilla 2.6.22.1 kernel with pata_it821x included (i.e., just unpack the
> kernel, no patches, the only change made to default settings is to add the
> pata_it821x module).  With this kernel, the module does get loaded, but
> doesn't seem to do anything since my cdrom still shows up as /dev/hda.
> However, the cdrom seems to work just fine.

Add the kernel parameter "combined_mode=libata" to try to make the IDE
driver get out of the way of the new driver.
Comment 33 Jeff Norden 2007-08-17 17:03:18 EDT
Ok, here is some more info, based on my supermicro MB setup.

1) The combined_mode option seems to have been removed in the 2.6.22 kernel.

2) I patched my kernel to add Alan Cox's pata_dma option, which I found here:
  http://www.redhat.com/archives/fedora-extras-commits/2007-July/msg00934.html
Booting with libata.pata_dma=0 works fine, although with dma disabled for
the cdrom.  Hopefully, this patch will make it to fedora updates fairly
quickly.

3) With dma enabled: If I use the trick of booting without the module in
place, kill udevd, and then insert the module, then this phase goes fine.
The cdrom is recognized and reported correctly, and no crash occurs.  If I
then do MAKEDEV scd0, any attempt to read from scd0 (even just reading one
block using dd) will cause the system to freeze within a few seconds.

I added lines to pata_it821x.c in order to trace the subroutines that are
called.  The last one called before the system freezes is
it821x_passthru_bmdma_start(), so I guess this confirms that a lost
interrupt is the problem (the bmdma_start() subroutine does exit, so the
system isn't freezing up in there).  Interestingly, though, a successful
pair of bmdma_start() and bmdma_stop() calls also occurs earlier, when the
module is first inserted, right before the cdrom drive is identified.

I guess the real question is why this hardware seems to work fine with the
older ide code, but fails under libata.

Hope this helps,
-Jeff
Comment 34 Alan Cox 2007-08-19 18:34:21 EDT
Interesting that turning off DMA is helpful, and possibly very important
information from your testing.

bmdma_start will get used first time for the command and then for data. If you
stick a printk in at 
it821x_passthru_qc_issue_prot and print qc->tf.command it will show which
command is being issued each time. Also if your box has >= 1GB RAM try booting
with mem=900M. I don't think that will have any effect but just check it.

I'll try and find a similar DVD drive here and test that to see if it shows anything.
Comment 35 Alan Cox 2007-08-21 10:04:32 EDT
Progress - testing with a CD drive I can get a stuck IRQ off my IT821x which I
don't get off a disk. Doing some more investigating.
Comment 36 Jeff Norden 2007-08-22 18:56:24 EDT
Some more info:

My system has 4GB of memory, but mem=900M doesn't have any effect on this
problem (as you thought).

I added the extra printout of qc->tf.command - the command being issued is 0xA0,
which I guess is ATA_CMD_PACKET.
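
For anyone following along, the opcode lookup can be sketched in plain userspace C. The numeric values below are the ones defined in the kernel's include/linux/ata.h; the two helper functions are hypothetical names made up for this illustration and are not part of pata_it821x.c.

```c
#include <stdbool.h>

/* Opcode values as defined in the Linux kernel's include/linux/ata.h. */
enum {
	ATA_CMD_PACKET   = 0xA0,	/* PACKET: carries an ATAPI command block */
	ATA_CMD_ID_ATAPI = 0xA1,	/* IDENTIFY PACKET DEVICE */
	ATA_CMD_READ     = 0xC8,	/* READ DMA */
	ATA_CMD_WRITE    = 0xCA,	/* WRITE DMA */
	ATA_CMD_ID_ATA   = 0xEC		/* IDENTIFY DEVICE */
};

/* Hypothetical helper: map a qc->tf.command byte to its symbolic name. */
static const char *tf_command_name(unsigned char command)
{
	switch (command) {
	case ATA_CMD_PACKET:
		return "ATA_CMD_PACKET";
	case ATA_CMD_ID_ATAPI:
		return "ATA_CMD_ID_ATAPI";
	case ATA_CMD_READ:
		return "ATA_CMD_READ";
	case ATA_CMD_WRITE:
		return "ATA_CMD_WRITE";
	case ATA_CMD_ID_ATA:
		return "ATA_CMD_ID_ATA";
	default:
		return "unknown";
	}
}

/* Hypothetical helper: true when the taskfile wraps an ATAPI packet. */
static bool is_atapi_packet(unsigned char command)
{
	return command == ATA_CMD_PACKET;
}
```

So a trace showing 0xA0 means the drive is being driven through the ATAPI packet path, and the interesting detail is the SCSI-style command inside the packet rather than the taskfile opcode itself.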

One additional piece of info that might help, is that it seems to take exactly
30 seconds for the system to freeze up after the read from the device is tried.
(This is longer than I thought it was.)

I'll attach a log of the debugging output, which shows a trace of all the it821x
subroutines that get called, as well as a copy of the modified pata_it821x.c
which produced the output.

Thanks,
-Jeff
Comment 37 Jeff Norden 2007-08-22 19:03:27 EDT
Created attachment 164589
Debugging messages from modprobe and reading the device
Comment 38 Jeff Norden 2007-08-22 19:04:35 EDT
Created attachment 164590
My copy of the module source, with debugging lines added.
Comment 39 Jeff Norden 2007-08-29 15:07:21 EDT
Here is a fix that works on my hardware.  Add the following lines at the
start of the it821x_check_atapi_dma() subroutine:

	/* Only use dma for transfers to/from the media. */
	if (qc->nbytes < 2048)
		return -EOPNOTSUPP;

After poking around quite a bit, I discovered that libata is a lot more
aggressive about using dma than the older ide code is.  The old code only
uses dma for reads or writes to or from the media in the drive, but libata
always uses it, e.g. to read the name of the drive.  In fact, libata calls
bmdma_start() even when qc->nbytes is zero, which seems unnecessary.  I
first tried to cancel the dma when qc->nbytes==0, but this isn't sufficient
to fix the problem.

An alternate fix would be to check qc->scsicmd[0], but there are several
different possible read and write commands, so checking the number of bytes
seems simpler.  Anything destined for the media will be at least 2048 bytes,
and I haven't come across any smaller transfers that cause a problem.
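
The effect of the size check can be tried out in isolation with a small userspace model. This is only a sketch: struct qc_model and check_atapi_dma_model() are stand-ins invented for this example (the real hook takes libata's struct ata_queued_cmd), and the assumption is that a nonzero return makes libata fall back to PIO for that command.

```c
#include <errno.h>

/* Smallest transfer that can carry media data: one 2048-byte CD-ROM sector. */
#define MEDIA_MIN_BYTES 2048u

/* Stand-in for the single field of struct ata_queued_cmd used by the check. */
struct qc_model {
	unsigned int nbytes;	/* bytes the command will transfer */
};

/*
 * Userspace model of the proposed check.
 * Returns 0 when DMA may be used, -EOPNOTSUPP when it should be refused.
 */
static int check_atapi_dma_model(const struct qc_model *qc)
{
	/* Only use dma for transfers to/from the media. */
	if (qc->nbytes < MEDIA_MIN_BYTES)
		return -EOPNOTSUPP;
	return 0;
}
```

With this model, a zero-length command and any short transfer are refused, while a single 2048-byte sector read is the smallest transfer that still gets DMA.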

---

In the original code, the first dma call is from a GPCMD_INQUIRY command.
This one succeeds, the sequence of calls is:
  it821x_passthru_bmdma_start()
  ata_interrupt()
  ata_bmdma_status()
  it821x_passthru_bmdma_stop()

The second dma call is from a GPCMD_TEST_UNIT_READY command, and
it821x_passthru_bmdma_start() is never followed by the corresponding
ata_interrupt().  Instead, after 30 seconds, a call to ata_bmdma_freeze()
occurs, which then executes the line:
   iowrite8(ap->ctl, ioaddr->ctl_addr);
and the system immediately freezes up. (I don't think that is the intended
effect of ata_bmdma_freeze(), but at least the subroutine is aptly named :-)
I don't know why libata handles the missed interrupt so badly, but it might
be worth trying to figure that out.  When I first added the check for
qc->nbytes==0, the system just froze in the same way at the first non-zero
length transfer.

---

I'll attach my current debugging version of pata_it821x.c, which does more
than the previous one.  It prints both qc->tf.command and qc->scsicmd, so
you can tell more about what is happening.  You can control several aspects
of the behavior with parameter arguments, which makes testing things out
easier (including some other fixes that I tried but don't seem to work at
all for me).

-Jeff
Comment 40 Jeff Norden 2007-08-29 15:13:45 EDT
Created attachment 179641
Module source with lots of debugging added.
Comment 41 Alan Cox 2007-08-29 15:25:55 EDT
Interesting - there must be more revs of the chip/firmware than I ever realised.

The qc->nbytes = 0 case is a libata bug as I read the spec. I doubt any
hardware cares about it, which I guess is why it's never been noticed, but it's
most definitely a bug.

With the fix you've got, does it hang if you rip or play an audio CD (that ends
up with a strange DMA size > 2048)? I'm wondering if the needed check is
something like length % 512, length % 2048 or >= as you have now.

After that I can submit a change - or better yet you could mail me a diff with
the OSDL Signed-off-by: line on it, so you get the full credit you deserve as
the author of the fixes
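
To make the three candidate checks concrete, here is a userspace sketch that evaluates each of them. The function names are made up for this comparison, and the 2352-byte figure is the size of one raw audio CD frame, which is the kind of "strange" length an audio rip can produce; none of this is taken from the driver source.

```c
#include <stdbool.h>

#define ATA_SECTOR_BYTES   512u		/* hard disk sector */
#define CDROM_SECTOR_BYTES 2048u	/* data CD sector */
#define CDDA_FRAME_BYTES   2352u	/* raw audio CD frame */

/* Candidate 1: allow DMA only for whole 512-byte sectors. */
static bool dma_ok_multiple_of_512(unsigned int len)
{
	return len != 0 && len % ATA_SECTOR_BYTES == 0;
}

/* Candidate 2: allow DMA only for whole 2048-byte sectors. */
static bool dma_ok_multiple_of_2048(unsigned int len)
{
	return len != 0 && len % CDROM_SECTOR_BYTES == 0;
}

/* Candidate 3: allow DMA for anything of at least 2048 bytes. */
static bool dma_ok_at_least_2048(unsigned int len)
{
	return len >= CDROM_SECTOR_BYTES;
}
```

All three agree on a 2048-byte data sector and on a zero-length command. They differ on a single 2352-byte audio frame: both modulo checks would push it to PIO, while the >= check keeps it on DMA, so an audio CD test is what distinguishes them.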
Comment 42 Jeff Norden 2007-08-31 12:34:53 EDT
I've tried out an audio CD with no problems.  I think the problem has more to do
with the underlying packet command than the size of the transfer.

I wonder if the qc->nbytes=0 problem occurs with hard disks too, or just atapi
devices.  At some point I'm going to add a pata disk to the system, at which
time I can try to see.

I'll email you a patch shortly (I've located the Documentation/SubmittingPatches
file, but haven't had time to read it through yet).

Thanks
-Jeff
Comment 43 Alan Cox 2007-09-10 11:29:49 EDT
*** Bug 242325 has been marked as a duplicate of this bug. ***
Comment 44 lance raymond 2007-09-10 21:57:24 EDT
Hey all, just looking for an update or help on this issue.  Reading from post
1, there are some hardcore guys here, so looking for the end all solution or at
least a workaround.  I saw a kernel patch link which lost me, so how is this
thing looking?

I will gladly provide specs, logs, etc. even try some things to test (just need
to hold my hand a bit) as some of the above is, well, deep  :)

Thanks all.

Lance

Comment 45 Chris Stofberg 2007-09-12 09:53:54 EDT
Just discovered this thread.
I tried to install F7 from DVD as soon as it became available.  Tried everything
I  could think of; sometimes the installation would complete normally, but
failed to boot.  Eventually I copied the F7 DVD iso to an external USB drive,
burned the Rescue iso to CD, booted that,  and installed from HDD (USB).  This
method has worked for me over several installs on different computers, including
an install on an old Dell Latitude laptop (266MHZ, 128MB). Each install was a
piece of cake, like Fedora installs up to F7 have always been.

Sorry I am unable to suggest another solution, but from now on I plan to use the
above method.
Comment 46 Martin von Gagern 2007-09-13 04:59:15 EDT
I've had problems with pata_it821x for a while now, documented in
http://bugzilla.kernel.org/show_bug.cgi?id=7507. The fix from comment 39 which
seems to be included in 2.6.23-rc6 does not solve the issue for me.
libata.pata_dma=0 from comment 33 does "solve" the issue, although hard disks
without DMA are unacceptable in production use.

I've also had problems with the old it821x driver, and wrote about it in
http://bugzilla.kernel.org/show_bug.cgi?id=7506. There seem to be some
similarities. Both times it's DMA, although the old driver solves it by
disabling DMA after waiting 30 seconds. This sounds a lot like the 30 seconds
mentioned in comment 36.

My hardware is an ASUS P5GDC-V Deluxe motherboard with an ITE8212.
My system is not Red Hat, but as the discussion here seems more useful than
anything I got on the kernel bugzilla so far, I'll cc here.
Comment 47 Martin von Gagern 2007-09-13 06:35:24 EDT
Created attachment 194461
Console log from 2.6.23-rc6 including debug output

This is a verbose log from my system, including module loading and the 30
second pause, all with debug messages. It was captured by using a serial
console, a null modem cable, and a second machine with screen used to log the
session.
The log explains why the 2048-bytes-fix can't help me---looks like
it821x_check_atapi_dma isn't called here at all.
My revision is 0x13. The last command sent seems to be 0xc8.
Comment 48 Martin von Gagern 2007-09-13 06:40:06 EDT
Created attachment 194471
Patch adding debugging code to the module

This is an adaptation of Jeff's debugging code from comment 40, now as a patch
against 2.6.23-rc6, which means module version 0.3.8.
This is what I used to generate the debugging messages in comment 47.
Comment 49 Chuck Ebbert 2007-09-13 17:06:43 EDT
(In reply to comment #46)
> I've had problems with pata_it821x for a while now, documented in
> http://bugzilla.kernel.org/show_bug.cgi?id=7507. The fix from comment 39 which
> seems to be included in 2.6.23-rc6 does not solve the issue for me.
> libata.pata_dma=0 from comment 33 does "solve" the issue, although hard disks
> without DMA are unacceptable in production use.
> 

Alan has a new patch that allows selectively disabling DMA for different device
types (disks, ATAPI and CF.) It is already in rawhide and will go into F7 and
FC6 next.
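
As a rough illustration of per-device-type DMA control: mainline kernels document a libata.dma= bitmask parameter where 1 enables DMA for disks, 2 for ATAPI and 4 for CompactFlash. Whether the patch mentioned here uses exactly that interface is an assumption, and the enum and function below are hypothetical names for a userspace model, not kernel code.

```c
#include <stdbool.h>

/* Device classes the mask distinguishes (names are for this sketch only). */
enum dev_class {
	DEV_DISK,
	DEV_ATAPI,
	DEV_CFA
};

/* Bit assignments following the documented libata.dma= values. */
#define DMA_MASK_ATA	(1u << 0)	/* disks */
#define DMA_MASK_ATAPI	(1u << 1)	/* CD/DVD and other packet devices */
#define DMA_MASK_CFA	(1u << 2)	/* CompactFlash */

/* Model: is DMA enabled for this device class under the given mask? */
static bool dma_enabled_for(unsigned int mask, enum dev_class cls)
{
	switch (cls) {
	case DEV_DISK:
		return (mask & DMA_MASK_ATA) != 0;
	case DEV_ATAPI:
		return (mask & DMA_MASK_ATAPI) != 0;
	case DEV_CFA:
		return (mask & DMA_MASK_CFA) != 0;
	}
	return false;
}
```

Under this model a mask of 1 keeps DMA on for hard disks while forcing an ATAPI drive to PIO, which is the combination that helps a setup where only the optical drive misbehaves.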
Comment 50 Martin von Gagern 2007-09-14 03:16:56 EDT
(In reply to comment #49)
> selectively disabling DMA for different device types (disks, ATAPI and CF.)

That won't help me much, as it's hard disks I want to use the IT8212 for, my
optical drives are on another controller. And I would prefer to keep it that way
if possible, to have the two drives connected to two different channels of the
IT8212, while my optical drives share the single other channel.
Comment 51 Chuck Ebbert 2007-09-14 15:07:51 EDT
Fix is in kernel-2.6.22.6-81.fc7, appearing soon in updates-testing.
Comment 52 Paul Smith 2007-09-27 16:28:14 EDT
I would like to report that today's F7 kernel update fixes the problem.

Maybe this new kernel should be included in the F7 installation dvd, as
otherwise some people will not be able to install F7.

Paul
Comment 53 Alan Cox 2007-09-27 17:13:41 EDT
That's great news - although the it821x driver hasn't changed so it must be
something else involved. Is this true for the other people with similar boards
on this bug ?
Comment 54 Paul Smith 2007-09-27 17:22:46 EDT
Let me add that the successful kernel is the 2.6.22.7-85.fc7.

Paul
Comment 55 Jeff Norden 2007-09-28 13:03:32 EDT
Alan: I checked the src rpm for 2.6.22.7-85.fc7, and it seems that the it821x
driver *has* been fixed.  There is a file named:
 linux-2.6-libata-pata_it821x-dma.patch
which contains the patch.

I did a Fedora7 re-spin last night which includes the new kernel.  It boots fine
on my problem system, although I haven't used it to do a full install yet.  For
anyone who needs it now, I've put it on our ftp server in:

ftp://math.tntech.edu/fedora7-updated/

The bandwidth on our campus seems to vary from minute-to-minute so your luck in
downloading the DVD iso may vary.  The re-spin actually has updates of *all* the
F7 packages through Sept 26 2007.  It is pretty easy to do this now, but I
couldn't find specific directions anywhere, so I wrote a short README file
explaining how to use pungi to just create an updated install disk, and put it
on our server also.

I also "fixed" one other thing when I did the respin: I changed the setting for
the emacs package from type="optional" to type="default".  Just my two-cents :-)
Comment 56 Ian Gotenks 2007-09-29 07:38:24 EDT
Great! Works perfectly! At last ...
I was waiting for this fix for quite some time :(

2.6.22.7-85.fc7 works good, but has problems
when one has changed the boot disk order in 
bios. Fortunately 2.6.22.9-91.fc7 fixes it
and now everything works great ...

Many thanks.