Bug 217146

Summary: Hard lock-up with 2GB RAM and b44
Product: Red Hat Enterprise Linux 4
Reporter: Andrew Gormanly <a.gormanly>
Component: kernel
Assignee: Neil Horman <nhorman>
Status: CLOSED CANTFIX
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 4.4
CC: andriusb, jbaron, linville
Hardware: ia32e
OS: Linux
Doc Type: Bug Fix
Last Closed: 2007-02-05 18:33:11 UTC
Attachments:
jwltest-b44-dma_mapping_error.patch (flags: none)

Description Andrew Gormanly 2006-11-24 11:31:58 UTC
Description of problem:
With 2GB RAM installed, b44 module locks the machine up hard.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-42.0.3.EL

How reproducible:
Every time

Steps to Reproduce:
1. Bring up NIC
Actual results:
Machine locks hard, must be power-switched.

Expected results:
Network connection established.

Additional info:
Dell Inspiron XPS M1210.  Older kernels affected too.  Latest driver version  
(1.00g, Jun 7 2006) available from Broadcom, who are the maintainers, exhibits
same behavior in kernel-smp-2.6.9-42.0.2.EL.  A workaround is to specify
mem=1000M on the boot line, but this sucks.

There's some traffic on lkml from a couple of years ago (Aug 2004, 2.6.8
timeframe) related to a b44 1GB DMA workaround, but this patch seems to be in
the RHEL 2.6.9-42 kernels already so is probably un- or semi-related.

[atg@pineapples ~]$ /sbin/lspci | grep Broadcom
03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02)

Comment 1 John W. Linville 2006-12-05 20:03:25 UTC
That hardware is incapable of DMA above 1GB.  As you observed, some hacks have 
been added to the driver to get around that problem.  A hang on >1GB systems 
might indicate a problem with those hacks.

I see one upstream b44 patch that might be helpful.  I'll spin test kernel w/ 
that patch and post here for you to test when it is available.

Comment 2 John W. Linville 2006-12-06 15:10:13 UTC
Created attachment 142959
jwltest-b44-dma_mapping_error.patch

Comment 3 John W. Linville 2006-12-06 15:13:15 UTC
Test kernels w/ the above patch are available here:

   http://people.redhat.com/linville/kernels/rhel4/

Please give them a try and post the results here...thanks!

Comment 4 Andrew Gormanly 2006-12-07 18:22:07 UTC
Thanks for getting on to this so quickly.

The patch you have is the same as the changes in the (latest available to the
public, Jun 7 2006) Broadcom 1.00g driver, which I already tested in
2.6.9-42.0.2.EL, and your kernel has the same results as both that and the
shipping RHEL4 one - i.e. panic when loaded, if booting with higher than
mem=1040M (and sometimes below, only mem=1000M is reliable).

There's a driver version 1.01 in 2.6.18.3 - might it be worth checking that?

#define DRV_MODULE_VERSION      "1.01"
#define DRV_MODULE_RELDATE      "Jun 16, 2006"

I'll try and check it out myself over the weekend, and maybe the FC6 kernel
source too if I can.


Comment 5 Neil Horman 2006-12-11 20:16:09 UTC
Is the system at all responsive?  Specifically, can you get sysrq-t output
when the system is hung?  It would be helpful to have that info so we
could confirm that it was actually b44 killing the system, and if so, what the
system state looked like when it went down.  Alternatively, if you can configure
nmi_watchdog and capture a core dump via netdump, that would be great.

Comment 6 Andrew Gormanly 2006-12-12 13:06:14 UTC
No, it's completely locked up; sysrq's don't work, and nothing appears via
netdump.  The only sign of life is flashing Caps- and Num-lock lights (and the
Bluetooth light is on).

I'm 99% sure it's b44, as the machine's fine without that module loaded but dies
on bringing up the interface.


Comment 7 Neil Horman 2006-12-12 13:13:38 UTC
Any possibility of using nmi_watchdog?

Comment 8 Andrew Gormanly 2006-12-14 09:37:39 UTC
nmi_watchdog should be on by default - it's a Core 2 Duo (so SMP x86_64).  I
did, however, already try adding it to the boot line just in case, which made no
difference.

Comment 9 Neil Horman 2006-12-15 20:37:45 UTC
OK, so I'm running a bcm4401 card on a system with 2GB of RAM on board using
kernel 2.6.9-42.0.3, using the card to transfer data bi-directionally with this
command running locally and on the remote host:
cat /dev/zero | ssh <peer> "cat > /dev/null"
It has been running for an hour now with no faults.  I'm going to let this run
over the weekend to make sure, but unless something goes wrong over the weekend,
I'm inclined to think that your test may have been flawed, and that this
was the b44 >1GB DMA problem after all.


Comment 10 Neil Horman 2006-12-18 15:42:46 UTC
Well, it appears that a lockup occurred over the weekend.  I didn't consider it
before, but John's patch isn't included in the -42.0.3.EL kernel.  I've
applied the patch and rebuilt the kernel, and am currently retesting.

Comment 11 Neil Horman 2006-12-18 15:55:10 UTC
Well, good news (in a manner of speaking).  After applying John's patch, I seem
to have locked up the box again, so I think I've reproduced your problem.  I'll
start debugging right away.


Comment 12 Neil Horman 2006-12-21 21:12:58 UTC
Note to self, I've been testing for a few days and I've been able to recreate
the hang several times, but only using TCP.  If I send UDP in one direction (to
the b44 NIC or from the b44 NIC) then no hang.  I'm currently testing
bi-directional UDP to see if that causes the hang.  If it does, it suggests that
this is a problem resulting from some sort of tx/rx race.  If not, perhaps a
specific problem sending TCP frames (although I'm hesitant to believe that).


Comment 13 Neil Horman 2006-12-22 15:14:57 UTC
I've been doing some reading about alternate theories to this hang.  I'm setting
up a test here and was wondering if you could please do the same.  Could you
boot your kernel with the following kernel parameter included:
pci=noacpi
And see if the hang recurs?  I'd appreciate it.  Thanks!

Comment 14 Andrew Gormanly 2006-12-27 00:12:17 UTC
Booting with pci=noacpi still gives a hang on bringing up the network card, with
the same error message, "Kernel panic - not syncing: PCI-DMA: high address but
no IOMMU."


Comment 15 Neil Horman 2007-02-02 14:35:36 UTC
Could you please add iommu=soft to the kernel command line and try again?  That
should enable the soft iommu support in the kernel for you.

Comment 16 Andrew Gormanly 2007-02-05 15:07:57 UTC
Well, by forcing the kernel to use the swiotlb it does actually stay alive after
initializing the b44, and it seems stable so far.

It's a tricky situation if this is the fix - should the kernel's default for all
Intel x86_64 machines be changed to iommu=soft (rather than the present
behaviour of iommu=off for machines with <3GB memory but iommu=soft for those
with >3GB memory) ?

In some ways it's cleaner to have the kernel's behaviour be independent of the
amount of RAM in the machine, especially when doing so removes the potential for
broken hardware killing the kernel by failing to DMA above a device-dependent
random number of bits that its designers decided to use as their DMA limitation
(31 in this case).

On the other hand, using up a chunk (64MB in 2.6.9) of low-end RAM on all Intel
x86_64 systems is a waste when most won't need it... but not much of one given
the normal memory for such machines is 512MB-4GB at present, and will grow.

Thanks for your time in solving this issue.  The really silly thing is that if
I'd had <1GB or >3GB of memory in this machine I'd never have seen this bug...


Comment 17 Neil Horman 2007-02-05 15:56:40 UTC
Unfortunately, if this is the case with your system, we're a little out of luck.
Are you sure that this system has an iommu at all?  It's possible that it
doesn't.  The presence of an iommu is detected in pci_iommu_init that gets
universally called on boot up.  It could also be that your system is
misreporting the size or availability of your iommu.  It might be worth your
time to instrument pci_iommu_init on your system and print out the values
returned from check_iommu_size.  If there is a bad value that gets reported
back, we could perhaps explore adding a check for it to back off to an swiotlb
if needed.

Comment 18 Andrew Gormanly 2007-02-05 17:53:02 UTC
I don't think there's any point - I thought it did not have an IOMMU, as no
Intel EM64T systems have one (neither does IA64), and that this is one of the
differences with AMD64, where the AGP aperture is used as an IOMMU.

The panic message in comment #14 appears to confirm this, as does the statement
in the RHEL3U2 release notes (
http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/release-notes/as-amd64/RELEASE-NOTES-U2-x86_64-en.html
)

  "Intel® EM64T does not support an IOMMU in hardware while AMD64 processors do."

Regardless, if I understand things correctly, the point is that the Linux kernel
for the x86_64 arch (absent any boot switches) does not use any IOMMU if there's
less than 3GB RAM in the system, and prints out "PCI-DMA: Disabling IOMMU" on boot.

[On AMD64 and >3GB RAM, it uses (by default 64MB of) the GART and prints e.g.
"PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture".  On EM64T and >3GB
RAM, it uses SWIOTLB and prints "PCI-DMA: Using software bounce buffering for IO
(SWIOTLB)".]

This is normally fine, and saves us 64MB of RAM.  In the case of broken hardware
which can't DMA properly when the host system has "too much" physical memory,
however, the lack of any (hardware or software) IOMMU kills the kernel.  This is
why forcing it to use swiotlb works.  On an AMD system with this chip, setting
it to iommu=force should do the trick.

So for machines with the Broadcom 4401 installed, the x86_64 kernel is fine with
under 1GB of RAM, and fine with over 3GB of RAM.  Anything in between and it panics.
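
The reasoning above can be written down as a small model.  This is only a sketch of the behaviour described in this comment, not kernel code: the function names are made up for illustration, the 3GB threshold and the treatment of exactly 3GB are taken from (or assumed on top of) the description above, and real physical memory layouts have holes that this ignores.

```python
GIB = 1 << 30

def default_dma_mode(ram_bytes, has_gart):
    """Default x86_64 choice as described in this thread (no boot switches).

    Assumption: "less than 3GB" means strictly below 3 GiB.
    """
    if ram_bytes < 3 * GIB:
        return "off"  # boot log: "PCI-DMA: Disabling IOMMU"
    # AMD64 uses the GART in the AGP aperture, EM64T falls back to SWIOTLB.
    return "gart" if has_gart else "swiotlb"

def b44_can_panic(ram_bytes, mode, dma_limit=GIB):
    """True when a buffer may sit above the device's DMA limit
    and nothing (hardware or software) is there to remap it."""
    return mode == "off" and ram_bytes > dma_limit

if __name__ == "__main__":
    for ram_gib in (0.5, 2, 4):
        ram = int(ram_gib * GIB)
        mode = default_dma_mode(ram, has_gart=False)
        print(ram_gib, "GiB:", mode, "panic possible:", b44_can_panic(ram, mode))
```

In this model only the middle case (more than 1GB, less than 3GB, no IOMMU) is at risk, which is the window this bug falls into.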

Could this be fixed in the b44 driver?  The following, from
Documentation/DMA-mapping.txt seems like it might be helpful, but I'm not a
kernel hacker...

"Does your device have any DMA addressing limitations?  For example, is
your device only capable of driving the low order 24-bits of address
on the PCI bus for SAC DMA transfers?  If so, you need to inform the
PCI layer of this fact.

By default, the kernel assumes that your device can address the full
32-bits in a SAC cycle.  For a 64-bit DAC capable device, this needs
to be increased.  And for a device with limitations, as discussed in
the previous paragraph, it needs to be decreased."
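
The addressing limitation the quoted text talks about comes down to a range check on the bus address of each buffer.  The following is a hedged illustration in plain Python, not the kernel's DMA API: the helper names are invented, and the 30-bit mask is an assumption derived from the 1GB limit stated in comment 1 (1GB = 2^30 bytes).

```python
def dma_limit(mask_bits):
    """Highest bus address a device with an N-bit DMA mask can drive."""
    return (1 << mask_bits) - 1

def buffer_reachable(bus_addr, length, mask_bits):
    """True if every byte of the buffer lies at or below the device's limit."""
    return length > 0 and bus_addr + length - 1 <= dma_limit(mask_bits)

if __name__ == "__main__":
    # A buffer just below the 1GB boundary is fine for a 30-bit device.
    print(buffer_reachable(0x3FFFF000, 0x1000, 30))
    # The same device cannot reach a buffer that starts at 1GB.
    print(buffer_reachable(0x40000000, 0x1000, 30))
    # A device with the default 32-bit mask can.
    print(buffer_reachable(0x40000000, 0x1000, 32))
```

A buffer that straddles the boundary fails the check as well, which is why the end address rather than the start address is compared.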

Ultimately, though, I feel that the kernel behaviour is not right - consistently
using the (hard- or software) IOMMU on x86_64 would avoid any problems with
hardware DMA addressing limitations.  I'm not sure how this affects i386 though.


Comment 19 Neil Horman 2007-02-05 18:33:11 UTC
Yeah, I was just tossing this around, and you're right.  The iommu is getting
explicitly disabled because you have less than 4GB of RAM installed, so the
kernel decides that you don't need any iommu support.  And there is really not
a lot we could do about that.  There are a lot of things one could do, such as
adding flags that only enable swiotlb in the event that no real iommu is present,
but that wouldn't change the fact that this is all because of the b44 hardware's
need to DMA under 1GB of RAM, so it won't get much support.

We have fixed this (somewhat) in b44 already.  There is that patch that is
supposed to restrict memory allocations on the b44 driver to under 30 bits of
address, and use GFP_DMA if it can't be obtained.  It seems we are missing a
case though (perhaps one we can't control, possibly in the rx path from the
hardware).  Either way, I think about the only foolproof solution in the b44
driver is to allocate memory for the card only from ZONE_DMA, which is going to
have considerable performance impact, perhaps more so than just enabling swiotlb.

Perhaps this could be managed on install.  If you install these systems with a
kickstart file you could add a %post section to the install process to test for
the presence of 1GB < x < 4GB of ram and a b44 card.  If both are true, you
could append iommu=soft to the boot command line arguments.  That way you
could at least be a little more selective on which systems used soft iommu.
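
A rough sketch of that install-time check follows.  Everything in it is an assumption made for illustration: the helper names are hypothetical, the detection simply looks for "BCM4401" in lspci output and MemTotal in /proc/meminfo, and the boot option appended is iommu=soft from comment 15.  MemTotal is slightly lower than the installed RAM, so the 1GB and 4GB bounds are approximate.

```python
import re

GIB_KB = 1024 * 1024  # number of kB in one GiB

def mem_total_kb(meminfo_text):
    """Extract MemTotal (in kB) from the text of /proc/meminfo."""
    m = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if m is None:
        raise ValueError("MemTotal not found")
    return int(m.group(1))

def has_b44(lspci_text):
    """True if lspci output lists a Broadcom 4401 (the b44 device)."""
    return "BCM4401" in lspci_text

def needs_soft_iommu(meminfo_text, lspci_text):
    """b44 present and RAM strictly between 1GB and 4GB."""
    kb = mem_total_kb(meminfo_text)
    return has_b44(lspci_text) and GIB_KB < kb < 4 * GIB_KB

def add_kernel_arg(grub_text, arg="iommu=soft"):
    """Append arg to every 'kernel' line of a grub.conf that lacks it."""
    out = []
    for line in grub_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "kernel" and arg not in fields:
            line = line.rstrip() + " " + arg
        out.append(line)
    return "\n".join(out) + "\n"
```

In a %post section the inputs would come from reading /proc/meminfo and running /sbin/lspci, and the result of add_kernel_arg would be written back to the boot loader configuration (for example /boot/grub/grub.conf on a GRUB-based install) only when needs_soft_iommu returns True.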

But I think in the end this is going to have to be a CANTFIX.  Sorry about that.