Bug 158865 - Can't get dhcp address on DELL INSPIRON/9300
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: kernel
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: high
Assigned To: John W. Linville
QA Contact: Brian Brock
Depends On:
Blocks:
Reported: 2005-05-26 09:17 EDT by Daniel Walsh
Modified: 2007-11-30 17:11 EST
CC List: 4 users

Doc Type: Bug Fix
Last Closed: 2005-06-23 16:07:00 EDT


Attachments
sysreport (359.11 KB, application/x-bzip)
2005-05-27 13:59 EDT, Daniel Walsh
/proc/acpi/dsdt contents from Dell Inspiron 9300 which mis-routes "b44" IRQ (9.47 KB, application/octet-stream)
2005-06-21 20:37 EDT, Bela Lubkin
DSDT.aml (8.90 KB, application/octet-stream)
2005-06-22 11:59 EDT, John W. Linville
mkinitrd (21.08 KB, text/plain)
2005-06-22 14:01 EDT, John W. Linville

Description Daniel Walsh 2005-05-26 09:17:35 EDT
Description of problem:
DHCP does not work during installation or on the installed system.

It also does not work with a static address.  The log shows DHCP discovery attempts but no response.

Version-Release number of selected component (if applicable):
2.6.11-1.1353_FC4

How reproducible:
Every time
Comment 1 Dave Jones 2005-05-26 21:10:06 EDT
What NIC is in this machine?
Comment 2 Daniel Walsh 2005-05-27 09:53:48 EDT
Broadcom 4400 10/100BaseT Ethernet

It is loading the b44 module.

Dan
Comment 3 John W. Linville 2005-05-27 09:59:44 EDT
Strange...do you actually see the DHCP request going out on the wire (using 
tcpdump, ethereal or similar on another box)? 
 
Also, please attach the output of running 'sysreport' on the failing box.  
Thanks! 
Comment 4 Daniel Walsh 2005-05-27 13:59:31 EDT
Created attachment 114922 [details]
sysreport

I don't see any.  I have it up on a PCMCIA wireless card.  The internal
wireless card also does not work.  Contact me by phone if you want to
access the machine.
Comment 5 John W. Linville 2005-05-27 15:16:55 EDT
I'm confused...in comment 2 you indicated use of b44?  Is the b44 the one that 
is failing?  Or the wireless?  If the latter, what wireless card is in use? 
Comment 6 Daniel Walsh 2005-05-27 15:31:16 EDT
Sorry, both the built-in and the wireless are failing, for different reasons.  I have
it working with a PCMCIA wireless card now.

The built-in Ethernet is a Broadcom 4400 10/100BaseT.

The internal wireless is an ipw2200: Intel(R) Pro/Wireless 2915ABG Network Driver
Comment 7 Daniel Walsh 2005-06-08 10:20:25 EDT
2.6.11-1.1370_FC4 did not fix this problem.  Packets are definitely not leaving
the machine.
Comment 8 John W. Linville 2005-06-08 13:25:57 EDT
Do mii-tool and ethtool report the proper link state for the b44 device?   
Please issue "ip link set dev eth0 up" before issuing "mii-tool" and  
"ethtool", and then post the results.  
  
Also, please confirm whether or not "ifup eth0" works after the box has been  
booted (assuming that the b44 is connected to a DHCP-able network).  Or does 
it only fail at boot-up?  
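The link checks requested above can also be read programmatically from captured ethtool output. A minimal Python sketch, assuming ethtool's human-readable layout of one "Key: value" pair per line, with indented lines that contain no colon continuing the previous value (as in the "Supported link modes" rows of comment 9). The helper names are hypothetical, and the code parses text you pass in rather than running ethtool itself.

```python
def parse_ethtool(output: str) -> dict[str, str]:
    """Turn `ethtool <dev>` text into a {field: value} dict."""
    fields: dict[str, str] = {}
    last_key = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line.startswith("Settings for "):
            continue  # skip blank lines and the "Settings for eth0:" title
        key, sep, value = line.partition(":")
        if sep:
            last_key = key.strip()
            fields[last_key] = value.strip()
        elif last_key is not None:
            # Continuation row, e.g. the second line of "Supported link modes".
            fields[last_key] += " " + line
    return fields


def link_up(fields: dict[str, str]) -> bool:
    """True if ethtool reported a carrier ("Link detected: yes")."""
    return fields.get("Link detected") == "yes"
```

A "Link detected: yes" result alongside a failing dhclient, which is what comment 9 shows, suggests that cabling and autonegotiation are not the problem.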
Comment 9 Daniel Walsh 2005-06-08 15:49:07 EDT
ip link set dev eth0 up
[root@dhcppc2 nsalibsepol]# ethtool  eth0
Settings for eth0:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Current message level: 0x000000ff (255)
        Link detected: yes
[root@dhcppc2 nsalibsepol]# mii-tool eth0
eth0: negotiated 100baseTx-FD flow-control, link ok

[root@dhcppc2 nsalibsepol]# ifup eth0

Determining IP information for eth0... failed.

 tail -f /var/log/messages
Jun  8 15:45:16 dhcppc2 kernel: [<c02d2f1e>] (usb_hcd_irq+0x0/0x52)
Jun  8 15:45:16 dhcppc2 kernel: [<f8a88351>] (ohci_irq_handler+0x0/0xce2 [ohci1394])
Jun  8 15:45:16 dhcppc2 kernel: Disabling IRQ #9
Jun  8 15:46:16 dhcppc2 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 4
Jun  8 15:46:20 dhcppc2 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 11
Jun  8 15:46:31 dhcppc2 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 15
Jun  8 15:46:46 dhcppc2 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 15
Jun  8 15:47:01 dhcppc2 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 13
Jun  8 15:47:14 dhcppc2 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3
Jun  8 15:47:17 dhcppc2 dhclient: No DHCPOFFERS received.
Comment 10 Dave Jones 2005-06-08 23:38:05 EDT
If -1370 fails, but FC3 works, that pretty much absolves the driver from blame
afaics, as that patch backed out all the FC3->FC4 changes to b44.[c|h].

I'm totally confused as to what could be at fault though.
Hmm, does booting with acpi=off fix it ?
Comment 12 Daniel Walsh 2005-06-13 11:38:15 EDT
Turning off acpi generated some more info.

 cat ~/error
Determining IP information for eth0...irq 9: nobody cared!
 [<c01516f4>] __report_bad_irq+0x24/0x7f
 [<c01517c6>] note_interrupt+0x59/0x83
 [<c0150a8a>] __do_IRQ+0x201/0x367
 [<c0105b1d>] do_IRQ+0x4a/0x82
 =======================
 [<f8d1ea89>] mld_ifc_timer_expire+0x2b/0x2d [ipv6]
 [<c0103c0e>] common_interrupt+0x1a/0x20
 [<c01282cc>] __do_softirq+0x2c/0x8a
 [<c0105c29>] do_softirq+0x3e/0x42
 =======================
 [<c0105b24>] do_IRQ+0x51/0x82
 [<c0103c0e>] common_interrupt+0x1a/0x20
handlers:
[<c022f8df>] (acpi_irq+0x0/0x14)
[<c02d2f1e>] (usb_hcd_irq+0x0/0x52)
[<f8a87351>] (ohci_irq_handler+0x0/0xce2 [ohci1394])
Disabling IRQ #9

Message from syslogd@localhost at Mon Jun 13 09:23:32 2005 ...
localhost kernel: Disabling IRQ #9

 failed.
Comment 13 John W. Linville 2005-06-15 11:35:02 EDT
Is your b44 sharing an interrupt line w/ your USB (or firewire) controller?  
Any chance that changing that situation would help?  Just a stab in the dark, 
really... 
Comment 14 Ketil Wendelbo Aanensen 2005-06-17 19:38:04 EDT
I'm using the same computer, with the same hardware, afaics. I think I have the
same problem: the network hangs whenever my optical USB mouse is not moving, both
at boot time and when "surfing" the web, downloading e-mail, etc. It seems that
more people are having this problem (see e.g. fedoraforum.org and search
for "inspiron 9300").

I had the same problem with FC3, and I'm really unhappy to see it is still
present in FC4. Ubuntu Hoary works fine on this computer. 

I really can't figure out how to solve this, but I'm guessing it has something
to do with this from dmesg:
irq 9: nobody cared!
 [<c01516f4>] __report_bad_irq+0x24/0x7f
 [<c01517c6>] note_interrupt+0x59/0x83
 [<c0150a8a>] __do_IRQ+0x201/0x367
 [<c0105b1d>] do_IRQ+0x4a/0x82
 =======================
 [<c0103c0e>] common_interrupt+0x1a/0x20
 [<c01282cc>] __do_softirq+0x2c/0x8a
 [<c0105c29>] do_softirq+0x3e/0x42
 =======================
 [<c0105b24>] do_IRQ+0x51/0x82
 [<c0103c0e>] common_interrupt+0x1a/0x20
handlers:
[<c022f8df>] (acpi_irq+0x0/0x14)
[<c02d2f1e>] (usb_hcd_irq+0x0/0x52)
[<f8a61351>] (ohci_irq_handler+0x0/0xce2 [ohci1394])
Disabling IRQ #9

If this is not the same problem as the one you're having, I'm sorry to have
bothered you all.

- Ketil
Comment 15 Bela Lubkin 2005-06-17 21:26:43 EDT
I am also having the same problem(*) on an Inspiron 9300, using FC4.  The same 
messages appear in the kernel log.

Booting with "acpi=noirq" allows the NIC to operate successfully.  I've been 
trying to track down the changes this causes.  This is a quick summary: four 
devices' IRQs move from 10 to 7 (leaving two others still on 10).  The PCI-E 
Root Node device is assigned IRQ 11 in the "normal" case, and no IRQ in the 
"acpi=noirq" case.  The NIC moves from IRQ 11 to 9.

HOWEVER, and this seems like it might be a big clue, /proc/interrupts shows _no_ 
IRQ assigned to "eth0".  /proc/pci, /proc/bus/pci/devices, and `lspci -v` show 
IRQ 9.

It feels as if most of the system knows that the "b44" driver servicing 
interface "eth0" is talking to a device on IRQ 9; but "b44" itself doesn't know 
it.
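The comparison between /proc/interrupts and lspci can be partly automated. A minimal Python sketch, assuming the 2.6-era /proc/interrupts layout: a header row of CPU columns, then one row per IRQ with the per-CPU counts, a single-token controller type (such as XT-PIC or IO-APIC-level) and a comma-separated list of handler names. Handler names are whatever drivers passed to request_irq(); b44, like many network drivers, registers its handler only when the interface is brought up. The function names are hypothetical.

```python
def handlers_by_irq(proc_interrupts: str) -> dict[int, list[str]]:
    """Map each numbered IRQ row in /proc/interrupts to its handler names."""
    lines = proc_interrupts.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table: dict[int, list[str]] = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # skip NMI, LOC, ERR and similar summary rows
        # Per-CPU counts, then the controller type, then the handler list.
        fields = rest.split(None, ncpus + 1)
        if len(fields) < ncpus + 2:
            continue  # IRQ row without any registered handler names
        names = fields[ncpus + 1].split(",")
        table[int(label)] = [n.strip() for n in names if n.strip()]
    return table


def irqs_for(proc_interrupts: str, name: str) -> list[int]:
    """Return every IRQ on which a handler called `name` is registered."""
    return sorted(irq for irq, names in handlers_by_irq(proc_interrupts).items()
                  if name in names)
```

An empty result for eth0 while `lspci -v` lists an IRQ for the NIC is the kind of mismatch described in this comment.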

IRQ 9 is shared with an OHCI 1394 controller and a UHCI USB controller (one of 
four).  I haven't checked, but I assume this means that 1394 and some subset of 
my USB ports aren't working (at least not for low-speed devices).

(*) By "same problem" I actually mean the same as Daniel Walsh: DHCP discovery 
fails and the interface is completely unusable.  Ketil Wendelbo Aanensen reports 
the interface works if the USB mouse is moved.  This may be another big hint: 
"b44" must have glommed onto the wrong IRQ chain.  IRQ 9 (the one the hardware 
is raising) gets disabled by the kernel.  Meanwhile, he moves the mouse, whose 
controller is presumably on (1) an IRQ other than 9 and (2) the IRQ that b44's 
ISR is listening to.  b44 services its hardware and packets flow.

So it seems that "b44" is attaching itself, or being attached, to the wrong IRQ 
chain.
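The mouse workaround can be illustrated with a toy model of shared-IRQ dispatch: when a line fires, every handler registered on it is called, and each handler checks its own device for pending work. In this Python sketch the wiring (NIC actually raising IRQ 9, b44's handler registered on IRQ 11, mouse controller also on IRQ 11) is the hypothesis stated above, not a measured fact, and all class and function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[], bool]  # returns True if it found work on its device


class Device:
    """A device that may have interrupt-worthy events pending."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pending = 0   # events waiting to be serviced
        self.serviced = 0  # events handled so far

    def handler(self) -> bool:
        # Like a real shared-IRQ handler: report "not mine" if idle.
        if self.pending == 0:
            return False
        self.serviced += self.pending
        self.pending = 0
        return True


class Pic:
    """Toy interrupt controller: IRQ number -> registered handlers."""

    def __init__(self) -> None:
        self.handlers: dict[int, list[Handler]] = defaultdict(list)
        self.unhandled: dict[int, int] = defaultdict(int)

    def request_irq(self, irq: int, handler: Handler) -> None:
        self.handlers[irq].append(handler)

    def raise_irq(self, irq: int) -> None:
        # Every handler on the line runs, even after one has claimed it.
        results = [h() for h in self.handlers[irq]]
        if not any(results):
            self.unhandled[irq] += 1


def demo() -> tuple[int, int, int]:
    nic, mouse, pic = Device("eth0"), Device("uhci_hcd"), Pic()
    pic.request_irq(11, nic.handler)    # where the kernel believes the NIC is
    pic.request_irq(11, mouse.handler)  # hypothesised: mouse controller on 11
    nic.pending += 1                    # a packet arrives...
    pic.raise_irq(9)                    # ...but the NIC actually raises IRQ 9
    before = nic.serviced               # nothing on IRQ 9 serviced the NIC
    mouse.pending += 1                  # the user moves the mouse
    pic.raise_irq(11)                   # b44's handler runs as a side effect
    return before, nic.serviced, pic.unhandled[9]
```

`demo()` returns `(0, 1, 1)`: the packet sits unserviced until the mouse interrupt happens to run b44's handler, while IRQ 9 accumulates unclaimed interrupts.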

>Bela<
Comment 16 Ketil Wendelbo Aanensen 2005-06-18 04:58:22 EDT
Thanks for the insight!

Being not very good at Linux, what does this mean on my part? Should *I* boot
with "acpi=noirq"? Will that help me?
And how do I do that?
Any chance that this might be fixed in new FC versions on installation? Maybe
someone could make an RPM that could take care of this?

- Ketil
Comment 17 Ketil Wendelbo Aanensen 2005-06-19 05:57:26 EDT
(In reply to comment #16)
> Thanks for the insight!
> 
> Being not very good at Linux, what does this mean on my part? Should *I* boot
> with "acpi=noirq"? Will that help me?
> And how do I do that?

Never mind, I figured it out myself by fiddling around. And it worked too! Now
my network doesn't just hang there anymore!

Having this in GRUB solved it for me:

title Fedora Core with network (2.6.11-1.1369_FC4) (acpi=noirq)
        root (hd0,8)
        kernel /boot/vmlinuz-2.6.11-1.1369_FC4 ro root=LABEL=/ rhgb quiet acpi=noirq
        initrd /boot/initrd-2.6.11-1.1369_FC4.img
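With several kernels installed, adding the option to each entry by hand is tedious. A minimal Python sketch that appends a boot option to every `kernel` line of a GRUB legacy configuration (on Fedora Core, /boot/grub/grub.conf) and skips lines that already carry it. The function name is hypothetical, and it only transforms text; reading and writing the file (after taking a backup) is left to the caller.

```python
def add_kernel_arg(menu_lst: str, arg: str) -> str:
    """Append `arg` to every GRUB legacy `kernel` line that lacks it."""
    out = []
    for line in menu_lst.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line ending
        words = body.split()
        if words and words[0] == "kernel" and arg not in words[1:]:
            body = body.rstrip() + " " + arg
        out.append(body + ending)
    return "".join(out)
```

Because an entry that already has the option is left alone, running it twice is harmless.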


Comment 18 Bela Lubkin 2005-06-19 15:53:20 EDT
> kernel /boot/vmlinuz-2.6.11-1.1369_FC4 ro root=LABEL=/ rhgb quiet acpi=noirq

Right.  This papers over the symptoms.  There is still an underlying bug 
somewhere, either in "b44", the ACPI subsystem, or elsewhere in interrupt 
routing.  "acpi=noirq" might also have some negative effects elsewhere in the 
system (e.g. power management may not work as well); I'm not sure.

But it certainly does paper over the big symptom.

I'm new to Linux kernel config debugging.  If there's more useful information to 
be unearthed, please tell me what to run...

>Bela<
Comment 19 Bela Lubkin 2005-06-20 21:15:59 EDT
All of the kernel's data structures show that the "b44" NIC has been programmed 
to IRQ 11.  But the hardware is still generating IRQ 9.  How can this be?
Comment 21 John W. Linville 2005-06-21 14:46:43 EDT
The kernel and the BIOS are simply miscommunicating regarding IRQ routing.  
PCI devices don't really know what IRQ line they are using -- they always use 
the same one(s).  The PCI Northbridge routes the IRQ lines from the PCI 
devices to the interrupt controller according to how the BIOS has configured 
the bridge. 
 
If the kernel has incorrect information regarding IRQ routing, the BIOS is 
almost certainly at fault.  This is most likely caused by sloppy ACPI code, 
which is shockingly common.  It seems that Linux is more particular about 
strictly interpreting the ACPI spec than Windows is, so many boxes 
that work fine w/ Windows have trouble using ACPI under Linux.  I wish things 
were different, but it is hard to argue for sloppiness... :-) 
 
Sometimes it is possible to clean up part of the ACPI code and then use the 
cleaned-up ACPI code when booting Linux.  There is no guarantee that will 
work, but I will take a look at doing that if you will do this: 
 
   cat /proc/acpi/dsdt > dsdt.dat  
  
Then, attach dsdt.dat to this bug...no promises... :-) 
Comment 22 Bela Lubkin 2005-06-21 20:37:21 EDT
Created attachment 115788 [details]
/proc/acpi/dsdt contents from Dell Inspiron 9300 which mis-routes "b44" IRQ

"b44" (Broadcom BCM4401-B0 on laptop motherboard) is PCI device 03:00.0.  When
machine is booted with "acpi=noirq", all parts of the kernel recognize the
device on IRQ 9, and the network works.  When "acpi=noirq" is omitted, all
parts of the kernel report the device on IRQ 11 -- but the device's interrupts
still come in on IRQ 9.  These go unserviced, so the kernel disables IRQ 9; the
network does not work.
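The "nobody cared" and "Disabling IRQ #9" messages come from the kernel's spurious-interrupt check, note_interrupt(), which appears in the stack traces above. A simplified Python model of that bookkeeping, assuming the 2.6-era thresholds (the line is checked every 100,000 interrupts and disabled if more than 99,900 of them went unclaimed by any handler); the class and field names are illustrative, not kernel identifiers.

```python
from dataclasses import dataclass, field

# Assumed thresholds, after 2.6-era kernel/irq/spurious.c.
CHECK_EVERY = 100_000
MAX_UNHANDLED = 99_900


@dataclass
class IrqLine:
    number: int
    seen: int = 0        # interrupts since the last check
    unclaimed: int = 0   # of those, how many no handler claimed
    disabled: bool = False
    log: list[str] = field(default_factory=list)

    def note_interrupt(self, handled: bool) -> None:
        """Record one interrupt; disable the line if it looks stuck."""
        if self.disabled:
            return
        if not handled:
            self.unclaimed += 1
        self.seen += 1
        if self.seen < CHECK_EVERY:
            return
        self.seen = 0
        if self.unclaimed > MAX_UNHANDLED:
            self.log.append(f"irq {self.number}: nobody cared!")
            self.log.append(f"Disabling IRQ #{self.number}")
            self.disabled = True
        self.unclaimed = 0
```

A shared line whose handlers claim their own devices' interrupts stays well below the limit; a line flooded by a device whose handler is registered on a different IRQ crosses it, and from then on that device's interrupts are lost entirely.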

>Bela<
Comment 23 Bela Lubkin 2005-06-21 20:39:56 EDT
> The kernel and the BIOS are simply miscommunicating regarding IRQ routing.  

No doubt.

I confirmed that the machine is running the latest release version of the BIOS 
from Dell.  I also verified that booting with or without "acpi=noirq" does not 
change the reported DSDT data.

>Bela<
Comment 24 John W. Linville 2005-06-22 11:59:30 EDT
Created attachment 115817 [details]
DSDT.aml

Corrected(?) DSDT file...there was only one warning, but it was the _WAK method
missing a return value.
Comment 25 John W. Linville 2005-06-22 14:01:15 EDT
Created attachment 115825 [details]
mkinitrd

FC4 mkinitrd w/ support for overriding DSDT
Comment 26 John W. Linville 2005-06-22 15:38:31 EDT
Ok, Bela, are you ready to try this? :-)    
    
Start by downloading the attachment from comment 24.  I'll presume that you    
save it as /etc/DSDT.aml.   
   
Next download the attachment from comment 25 and save it as /sbin/mkinitrd.    
You probably want to save a backup of your original /sbin/mkinitrd just in  
case...  
  
Now, edit /etc/sysconfig/kernel and add a line like this:  
  
   ACPI_DSDT=/etc/DSDT.aml  
  
Customize it to match where you saved the attachment from comment 24, of  
course.  
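If you would rather script this edit, here is a minimal Python sketch that sets a single KEY=value line in shell-style sysconfig text, replacing an existing assignment instead of adding a duplicate. The helper name is hypothetical. ACPI_DSDT is read by the patched mkinitrd from comment 25; the stock FC4 mkinitrd is assumed not to use it.

```python
def set_sysconfig_var(text: str, key: str, value: str) -> str:
    """Set KEY=value in sysconfig-style text, replacing any existing line."""
    lines = text.splitlines()
    prefix = key + "="
    for i, line in enumerate(lines):
        if line.strip().startswith(prefix):
            lines[i] = prefix + value  # overwrite the first assignment
            break
    else:
        lines.append(prefix + value)  # not present yet: add it at the end
    return "\n".join(lines) + "\n"
```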
  
It is very important that you complete _all_ of the steps above before  
proceeding to the next step.  Please make sure you have done that, or the test  
will be pointless.  This means you... :-)  
  
Finally, download and install an appropriate kernel from here:  
  
   http://people.redhat.com/linville/kernels/fc4/  
  
The installation process will only create the initrd correctly if you have  
already completed the previous steps.  If you didn't believe me before, start  
again now... :-)  
  
Once that kernel is installed, please reboot and test your NIC.  Please post  
the results here.  Thanks!  
  
P.S.  If it doesn't work, please attach your initrd image so that I can make  
sure it got created properly...  
Comment 27 Bela Lubkin 2005-06-23 15:49:05 EDT
I've installed all that, to no benefit.  I believe the initrd is as you 
intended:

  # zcat /boot/initrd*jwltest* | cpio -itv | grep DSDT
  -rw-r--r--   1 root     root         9117 Jun 23 11:42 DSDT.aml
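The same check can be done without cpio. A minimal Python sketch, assuming the initrd is a gzip-compressed cpio archive in the SVR4 "newc" format (the format `cpio -H newc` writes); the function names are hypothetical, and only member names are listed, not the `-v` details.

```python
import gzip

NEWC_MAGICS = (b"070701", b"070702")  # newc, and newc with checksums
HEADER_LEN = 110                       # 6-byte magic + 13 eight-digit hex fields


def _align4(n: int) -> int:
    return (n + 3) & ~3


def cpio_newc_names(archive: bytes) -> list[str]:
    """List member names of an uncompressed newc cpio archive."""
    names: list[str] = []
    off = 0
    while True:
        header = archive[off:off + HEADER_LEN]
        if len(header) < HEADER_LEN or header[:6] not in NEWC_MAGICS:
            raise ValueError(f"no newc header at offset {off}")
        # Fields after the magic: ino, mode, uid, gid, nlink, mtime,
        # filesize, devmajor, devminor, rdevmajor, rdevminor, namesize, check.
        fields = [int(header[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        filesize, namesize = fields[6], fields[11]
        name_start = off + HEADER_LEN
        # namesize counts the terminating NUL byte.
        name = archive[name_start:name_start + namesize - 1].decode()
        if name == "TRAILER!!!":
            return names
        names.append(name)
        # Header+name and the file data are each padded to 4 bytes.
        data_start = _align4(name_start + namesize)
        off = _align4(data_start + filesize)


def initrd_members(path: str) -> list[str]:
    """Member names inside a gzip-compressed newc initrd."""
    with gzip.open(path, "rb") as f:
        return cpio_newc_names(f.read())
```

For example, `"DSDT.aml" in initrd_members(path)` answers the same question as the grep above.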

Do you want me to upload the whole thing for closer examination?

With the updated DSDT, the kernel is still convinced "b44" is on IRQ 11.  There 
is no behavioral difference between this kernel + DSDT and the old ones.  Adding 
"acpi=noirq" to the kernel command line again masks the problem (by making the 
kernel decide to "put" the NIC on IRQ 9, where it wants to be).

=============================================================================

I used `iasl -d` to disassemble your updated DSDT and compared it to the 
original.  There were a large number of seemingly trivial differences along the 
lines of:

  s/0x00/Zero/
  s/0x01/One/
  s/\\_/_/
  s/Or (0x01, Or (0x02, 0x10)/0x13/
  s/0xFFFFFFFF/Ones/

Then, a set of changes that _look_ trivial but I'm less certain about:

  s/_SB.PCI0.\(LNK[ABCD]|ISAB\)/\1/
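The comparison above can be mechanized: apply the listed substitutions to both disassembled listings, then diff what remains. A minimal Python sketch using difflib; the rule list mirrors the substitutions above, with the nested `Or` assumed to close with two parentheses. It is only a heuristic, and a real change that happens to match one of the rules would be hidden.

```python
import difflib
import re

# Cosmetic rewrites seen between the two iasl listings. The Or rule must
# run before the 0x01 rule, or it would no longer match.
_RULES = [
    (re.compile(r"\\_"), "_"),                                  # \_SB -> _SB
    (re.compile(r"Or \(0x01, Or \(0x02, 0x10\)\)"), "0x13"),    # constant fold
    (re.compile(r"\b0xFFFFFFFF\b"), "Ones"),
    (re.compile(r"\b0x00\b"), "Zero"),
    (re.compile(r"\b0x01\b"), "One"),
    (re.compile(r"_SB\.PCI0\.(LNK[ABCD]|ISAB)\b"), r"\1"),      # shorter paths
]


def normalize(line: str) -> str:
    """Apply every rule, then ignore indentation differences."""
    for pattern, repl in _RULES:
        line = pattern.sub(repl, line)
    return line.strip()


def substantive_diff(original: str, updated: str) -> list[str]:
    """Return only the +/- lines that survive normalization."""
    a = [normalize(line) for line in original.splitlines()]
    b = [normalize(line) for line in updated.splitlines()]
    diff = difflib.unified_diff(a, b, lineterm="", n=0)
    return [d for d in diff
            if d.startswith(("+", "-")) and not d.startswith(("+++", "---"))]
```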

The only clearly substantive change was in _WAK(), where you added a return 
value.  Did you expect that repairing _WAK() would fix the problem, or have I 
missed the point of one of the other changes that seem trivial to me?

>Bela<
Comment 28 John W. Linville 2005-06-23 16:07:00 EDT
Neither...I didn't really expect it to be fixed, since _WAK was the only  
change. :-(  I thought it was worth trying, since some have reported good  
results from correcting their DSDTs.  But, I'm not really surprised that it  
isn't fixed.  
  
At this point, I'm afraid that your only option is to report the issue to Dell  
and ask them to fix their ACPI code so that it works w/ Linux's ACPI 
interpreter.  I'm sorry, but it just can't be Red Hat's (or my) job to fix it.  
Until that happens, your best option is to continue using "acpi=noirq". 
Comment 29 Bela Lubkin 2005-06-23 18:50:01 EDT
Ok, thanks for trying.

I've learned a couple of extra things.

The differences I noted between the DSDT you returned and the one I sent are due 
to `iasl` performing some optimizations.  Decompiling and recompiling that DSDT 
results in the same differences.  Recompiling with `iasl -oa` disables those 
optimizations, resulting in far more similar code, much better for comparison & 
debugging.

Next, I have been able to patch the DSDT myself to make things "work" -- but I 
doubt whether this is a satisfactory patch.  The patch is:

               Device (LNKA)
               {
                   Name (_HID, EisaId ("PNP0C0F"))
                   Name (_UID, 0x01)
                   Name (_PRS, ResourceTemplate ()
                   {
  -                    IRQ (Level, ActiveLow, Shared) {9,10,11}
  +                    IRQ (Level, ActiveLow, Shared) {9}
                   })

This forces a whole bunch of devices (including the 'b44' NIC) onto IRQ 9.  The 
system works, but there must be a performance penalty...
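Restricting LNKA's _PRS list matters because Linux picks an IRQ for each ACPI PCI interrupt link from that link's possible list, preferring the least "penalized" IRQ. A simplified Python sketch, loosely modeled on the balancing branch of the 2.6 kernel's acpi_pci_link_allocate(); the penalty numbers in the example are made up, and the real kernel also raises penalties as each link is assigned.

```python
from typing import Optional


def pick_irq(possible: list[int], penalty: dict[int, int],
             active: Optional[int] = None) -> int:
    """Pick an IRQ for an ACPI PCI interrupt link (simplified model).

    Start from the link's current IRQ if it is in the possible list,
    otherwise from the last possible entry; then scan the list from the
    end and switch to any IRQ with a strictly lower penalty.
    """
    irq = active if active in possible else possible[-1]
    for candidate in reversed(possible):
        if penalty.get(irq, 0) > penalty.get(candidate, 0):
            irq = candidate
    return irq
```

With hypothetical penalties `{9: 100, 10: 60, 11: 20}`, `pick_irq([9, 10, 11], ...)` gives 11 even when the link is currently on 9, while `pick_irq([9], ...)` can only give 9. That is why the patch keeps b44 where the hardware actually raises its interrupt; the model does not explain why reprogramming the link to IRQ 11 never took effect in the hardware.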

I'm not at all convinced this is a BIOS/DSDT bug.  The Linux ACPI interrupt 
walking code may be at fault.  But my gut feel is that the Broadcom hardware 
itself needs some sort of extra "kick" to tell it that its PCI config space has 
been changed and it should be raising a different IRQ: thus, a hardware "quirk" 
which must be handled by b44.c; a bug by omission in b44.c.

I can't pursue this further on my own, so I guess I will fall back to 
"acpi=noirq" on the basis that this perturbs the IRQ setup the least.

(All I _really_ need is a way to tell the kernel: "Look, for whatever reason you 
_cannot_ reassign the IRQ for 'b44', just let it be on 9 even if you think IRQ 
service would be more balanced if it was somewhere else".  Then ACPI could have 
its IRQ 9 without inadvertently pushing 'b44' off to a different IRQ it can't 
raise.)

>Bela<
Comment 30 Ketil Wendelbo Aanensen 2005-11-23 15:47:49 EST
Since having this problem (see comment #14) I have moved away from Fedora, but
I think I found the solution just before that: Dell released an updated BIOS for
the I9300 (A05, I think).
I *believe* this has fixed the issue, at least the network worked, if I'm not
mistaken. Maybe someone with this computer and FC4 can check it out?

- Ketil
Comment 31 Rowan Shaeffer 2005-12-06 04:41:44 EST
Confirmed with a fresh install of Fedora Core 4. With the A05 BIOS, the 9300's
ethernet adaptor works without the need for the 'acpi=noirq' fix, which was
necessary with the A04 BIOS.
