Bug 145109

Summary: b44 network interfaces can't be started after system boot
Product: [Fedora] Fedora
Reporter: Miloslav Trmač <mitr>
Component: kernel
Assignee: John W. Linville <linville>
Status: CLOSED UPSTREAM
Severity: medium
Priority: medium
Version: 3
CC: hugh, jch, pp, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-04-25 14:00:58 UTC
Attachments:
  Description                                                           Flags
  b44hack modified to allocate the bounce buffer at device probe time   none
  b44-bounce-bufs.patch                                                 none

Description Miloslav Trmač 2005-01-14 14:43:49 UTC
Description of problem:
Since 2.6.10 (2004/08/08 in BK), the b44 driver allocates bounce
buffers to make sure DMA areas have addresses < 1 GB.
It unfortunately does that by allocating roughly 770 kB at once;
after booting and logging in GNOME on my 256M box the DMA
zone does not have 700 kB of physically contiguous free memory,
so (ip link set eth0 up) fails.
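
To see why a single allocation of roughly 770 kB is hard to satisfy, the arithmetic can be sketched as below. This is an illustration only: it assumes 4 KiB pages and a page allocator that hands out physically contiguous memory in power-of-two page blocks (as the Linux buddy allocator does). `alloc_order` is a hypothetical helper, not driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest order n such that (page_size << n) >= bytes.
 * Hypothetical helper for illustration; not taken from the b44 driver. */
static unsigned alloc_order(size_t bytes, size_t page_size)
{
    assert(bytes > 0 && page_size > 0);
    size_t pages = (bytes + page_size - 1) / page_size;  /* round up to whole pages */
    unsigned order = 0;
    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

Under these assumptions, alloc_order(770 * 1024, 4096) is 8: the request rounds up to 193 pages, so the allocator has to find one free, physically contiguous 256-page (1 MiB) block in the DMA zone.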

If this would be better handled upstream, please tell me
and I'll bounce it to bugzilla.kernel.org

Version-Release number of selected component (if applicable):
kernel-2.6.10-1.737_FC3

Steps to Reproduce:
1.Set eth0 (a b44 interface) to ONBOOT=no
2.Boot computer, log in GNOME
3./sbin/ifup eth0

Comment 1 Pekka Pietikäinen 2005-01-17 16:14:07 UTC
Possible workaround posted to netdev, someone needs to test it and
figure out whether it helps or not. http://www.ee.oulu.fi/~pp/b44hack 
has it too


Comment 2 Need Real Name 2005-01-17 23:55:10 UTC
I'm running the "b44hack" patch right now against 2.6.10_741_FC3 as we
speak and I'll report findings.

Stupid question.  What is netdev?  I'm completely oblivious to kernel
development groups.  I'm just a user, man! :-)

Comment 3 Pekka Pietikäinen 2005-01-18 13:39:55 UTC
Thanks for testing. netdev is the mailing list where most
network code discussion happens. Would be nice to get the 1GB
workaround path tested as well, that should happen by either having >
1GB of memory, running a kernel with the 4:4 split or just changing 

        if(mapping+len > B44_DMA_MASK) {
                /* Chip can't handle DMA to/from >1GB, use bounce buffer */
 
to something like 
        if(1 || mapping+len > B44_DMA_MASK) {

If that works as well and the driver loads and unloads happily I'll
prod upstream to merge the patch.
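
The two variants of the check differ only in whether the bounce path is forced. A minimal userspace sketch of that decision follows, assuming the 0x3fffffff mask value mentioned later in this thread. `needs_bounce` and its `force` flag are hypothetical names for illustration, not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define B44_DMA_MASK 0x3fffffffULL  /* 1 GB limit (value assumed from this thread) */

/* Returns true when the buffer must go through a bounce buffer.
 * force == true models the "if(1 || ...)" testing tweak. */
static bool needs_bounce(uint64_t mapping, uint64_t len, bool force)
{
    assert(len > 0);
    return force || mapping + len > B44_DMA_MASK;
}
```

With force set, every packet takes the bounce path, which exercises that code even on a machine whose memory all sits below 1 GB.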


Comment 4 Need Real Name 2005-01-18 15:49:17 UTC
well, the patch in 'b44hack' doesn't seem to work either.

In my case, I left my wireless interface (eth1) up for the duration of
the night.  This morning I wake up and:

# ifdown eth1
# ifup eth0  <-- SIOCSIFLAGS: Cannot allocate memory

There appears to be something related to duration of when the module
is unloaded and reloaded again.  Like those memory windows shrink the
longer the time before the module is reloaded.

BTW, the laptop i'm testing this on is a Dell Inspiron 8600 with 1GB RAM.

Comment 5 Need Real Name 2005-01-19 15:17:14 UTC
is that 1GB path workaround something I should try too? (i.e. the "1
||"  force?) 

Comment 6 Need Real Name 2005-01-19 16:57:51 UTC
Just to try something, I went ahead and made the tweak you described
above:

if(1 || mapping+len > B44_DMA_MASK) {

And I'll report the findings.

Comment 7 Pekka Pietikäinen 2005-01-20 14:26:57 UTC
Might also be that B44_BOUNCEBUF_SHIFT needs to be upped a bit. 
The problem is finding 700k of memory that is physically located under
16MB (< 1GB would be enough but the generic x86 pci code only can do <
16M and < 4GB). The first version tried to find one contiguous chunk,
the b44hack patch tries to allocate multiple smaller chunks (8 of them
with SHIFT==3), which should be easier to find (this does waste some
memory though).
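
As a rough illustration of the chunked approach, the sketch below splits a total buffer size into 2^shift equal pieces. It is not the b44hack code: `split_bounce` and `struct chunk_plan` are hypothetical, and the 770 kB total is only the approximate figure quoted in this report.

```c
#include <assert.h>
#include <stddef.h>

struct chunk_plan {
    size_t count;  /* number of separate allocations */
    size_t bytes;  /* size of each allocation */
};

/* Hypothetical helper: divide `total` bytes into (1 << shift) chunks. */
static struct chunk_plan split_bounce(size_t total, unsigned shift)
{
    assert(shift < 8 * sizeof(size_t));
    struct chunk_plan plan;
    plan.count = (size_t)1 << shift;
    /* Round up so the chunks together cover at least `total` bytes. */
    plan.bytes = (total + plan.count - 1) / plan.count;
    return plan;
}
```

With total = 770 * 1024 and shift = 3 this gives 8 chunks of 98560 bytes each. Each chunk still needs a physically contiguous run of pages, just a much shorter one than the single-buffer version.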


Comment 8 Need Real Name 2005-01-20 16:25:54 UTC
well the above tweak "if (1 || mapping+len > B44_DMA_MASK)" didn't
work either after a full night of being unloaded.

Same error as before "SIOCSIFLAGS: Cannot allocate memory".

Regarding your latest post, I experimented with values of
B44_BOUNCEBUF_SHIFT <= 12 and nothing worked so far.  I even put some
printk's in there to see at what point in the bouncebuf allocation
loop things were failing and at the last test where the shift == 12 it
failed on the 230'th iteration trying to allocate 190 (bytes? i
presume) so i figure around ~43K which is significantly less than the
needed 770k :-)

let me back up a few steps, if I exit out of X or anything like that
will it free up some mem?  is there ever going to be hope of fixing
this?  are there some "stupid user tricks" i can do to keep the mem
the module allocated at bootup (which is successful) around?

thanks!



Comment 9 Miloslav Trmač 2005-01-20 23:36:05 UTC
Created attachment 110033 [details]
b44hack modified to allocate the bounce buffer at device probe time

The attached patch moves the allocation to the time the device is probed.
This works Well Enough(tm) for me because the module is loaded at boot time,
but it is not a general solution.

My other attempts (to use single-page or two-page allocations, even
with 5 buffers / 2 pages) always worked fine right after the kernel
compile, but not after a fresh boot.

I'd still prefer a solution that didn't waste 761 kB of DMA memory
on my 256MB laptop; would it be possible to only allocate a single
bounce buffer for each packet that is >1GB and deallocate/reuse it
after transmit?
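
The idea in the last paragraph can be sketched in userspace as follows. This is only a model under stated assumptions, not a proposed patch: the bus address is passed in as a plain number, `malloc` stands in for a low-memory (GFP_DMA-style) allocation, and `struct tx_slot`, `tx_prepare` and `tx_complete` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DMA_LIMIT 0x3fffffffULL  /* 1 GB limit (assumed) */

struct tx_slot {
    const void *data;  /* buffer the NIC would read from */
    void *bounce;      /* non-NULL only when a copy was made */
};

/* Copy the packet into a bounce buffer only if it lies above the limit. */
static int tx_prepare(struct tx_slot *slot, const void *pkt, size_t len,
                      uint64_t bus_addr)
{
    assert(slot != NULL && pkt != NULL && len > 0);
    slot->data = pkt;
    slot->bounce = NULL;
    if (bus_addr + len > DMA_LIMIT) {
        slot->bounce = malloc(len);  /* stand-in for a low-memory allocation */
        if (slot->bounce == NULL)
            return -1;
        memcpy(slot->bounce, pkt, len);
        slot->data = slot->bounce;
    }
    return 0;
}

/* Called once the transmit has finished: release the copy, if any. */
static void tx_complete(struct tx_slot *slot)
{
    free(slot->bounce);  /* free(NULL) is a no-op */
    slot->bounce = NULL;
    slot->data = NULL;
}
```

In this model, packets that already sit below the limit cost nothing extra, and low memory is held only for the lifetime of a packet that actually needs it.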

Comment 10 Pekka Pietikäinen 2005-01-21 00:40:54 UTC
Blah...

There is always the quick fix of a B44_DMA_MASK of 0xffffffff (see
#118165 for the long discussion that led to this bug)
 
But of course if you remove the bcm4401 from your mobo with a
soldering iron and jury-rig it into a sparc64 it will totally break, 
so this obviously cannot go upstream! ;) ;) ;)

Option b is some kind of awful hack to only set the consistent dma
mask to the real value after the kernel has given something that goes
> 1GB. Totally misuses the kernel PCI DMA api, but...

Comment 11 Need Real Name 2005-01-21 19:29:30 UTC
oddly enough, yesterday when I finished trying various combos of the
SHIFT value, i reverted to the original b44.ko (from my 2.6.10_741_FC3
tree) and it loaded just AFTER I had killed off a few apps (eclipse,
thunderbird, IM client, etc) ... do those apps consume memory that
triggers this issue?

I'll give Miloslav's b44hack2 a try and since it fits my usage pattern
i'm guessing it'll work Well Enough(tm) for me also :-)

What does setting the B44_DMA_MASK to 0xffffffff accomplish?


Comment 12 Pekka Pietikäinen 2005-01-23 01:27:49 UTC
Makes the kernel think that any memory <=4GB is good enough for the
hardware/driver. 0x3fffffff means anything <=1GB is ok, which is
actually the truth, but the kernel interprets that as "only use memory
located in the  first 16MB", which is a legacy broken ISA-board thing,
 but things have changed "slightly" since. Ergo lots'o'problems: I
don't think there's even any attempt to reserve that area for broken
hardware (vs. drivers asking for any kind of memory getting it), so
the kernel quickly runs out of it no matter what you do.

the 0xffffffff will work because the kernel will never use anything >=
1GB in the default configuration, but 3rd party patches (like the 4:4
split used previously in Fedora, and some other vendors have done
similar things) do make this assumption false.
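
The behaviour described here can be modelled in a few lines. This is a simplified sketch of what this thread reports about the generic x86 PCI code (only a "<16 MB" pool and a "<4 GB" pool), not the kernel's actual implementation; the enum and `zone_for_mask` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

enum zone { ZONE_LOW_16MB, ZONE_BELOW_4GB };

/* Hypothetical model: which pool an allocation is served from for a mask. */
static enum zone zone_for_mask(uint64_t dma_mask)
{
    assert(dma_mask != 0);
    /* Any mask narrower than 32 bits falls back to the 16 MB legacy pool,
     * even if the device could reach far more (for example 1 GB). */
    return dma_mask < 0xffffffffULL ? ZONE_LOW_16MB : ZONE_BELOW_4GB;
}
```

In this model a mask of 0x3fffffff lands in the 16 MB pool, while 0xffffffff does not, which is why the wider mask makes the allocation failures go away.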

Comment 13 Need Real Name 2005-01-28 20:09:01 UTC
So far so good with Miloslav's 'b44hack2' for the last two days with
my same usage pattern.  While it may be inelegant, pre-allocating the
buffers at probe time seems to do the trick.

is there a probable long-term solution for this?  or is it an issue of
the PCI DMA API improving?

Comment 14 John W. Linville 2005-02-23 20:40:42 UTC
Pekka and/or Miloslav,

Was the b44hack2 patch proposed upstream?  If so, how did it fare?

Comment 15 Miloslav Trmač 2005-02-28 16:29:37 UTC
I have not proposed it upstream... I personally don't consider it
upstream-worthy.

Comment 16 Need Real Name 2005-03-01 19:59:24 UTC
Not "upstream-worthy"?   Yikes! that's scary.  without that patch (or
something of the sort), my laptop is not "linux-worthy".  :-)

what's the most likely long-term solution here?  (i've asked this
before, but can anyone respond to this question)

Comment 17 Miloslav Trmač 2005-03-01 20:44:28 UTC
I'd like the "preferred solution" described in comment #9,
but I don't know the network API enough to say even
whether it's possible or not.

Comment 18 John W. Linville 2005-03-01 21:30:26 UTC
I discussed this briefly w/ Jeff Garzik, and he leans toward the
"preferred solution" from comment 9 as well, fwiw...

I'll probably take a peek at that, but if someone beats me to it that
would be OK too... :-)

Comment 20 John W. Linville 2005-03-10 01:34:52 UTC
Created attachment 111837 [details]
b44-bounce-bufs.patch

My interpretation of the preferred solution from comment 9...

Comment 21 John W. Linville 2005-03-10 01:41:15 UTC
I have pre-built test kernels available here:

   http://people.redhat.com/linville/kernels/fc3/

These, of course, include the patch from comment 20.  Unfortunately, I
don't have a box w/ >1GB of memory.  But, I did test by setting
B44_DMA_MASK to just 16MB...that seems to be working fine -- ttcp has
been pounding on it for hours.

Please give this a test and let me know the results!

Comment 22 Pekka Pietikäinen 2005-03-10 11:31:34 UTC
Looks sane to me (but -ENOHARDWARE so not tested). Patch should be
sent to netdev and akpm I think. It's certainly better than whatever code
currently is in any tree.



Comment 23 John W. Linville 2005-03-10 19:22:49 UTC
Definitely will push it upstream provided I get no negative feedback.
 I would prefer to hear some positive reports first, so I'll wait a
day or two to hear... :-)

Comment 24 Miloslav Trmač 2005-03-11 22:54:10 UTC
The test kernel works fine here, but I don't have >1 GB of RAM either.

Comment 25 John W. Linville 2005-03-14 20:29:45 UTC
As of today, I have submitted the patch upstream (which is the best
path into Fedora)...

Comment 26 Need Real Name 2005-03-15 13:42:32 UTC
cool John L. I was just going to drop a note here saying that it's
been working fine for me for the last two days.  Thank you and nice job!

Comment 28 D. Hugh Redelmeier 2005-04-04 11:26:14 UTC
These seem to be the relevant netdev messages:
http://oss.sgi.com/projects/netdev/archive/2005-01/msg00053.html
http://oss.sgi.com/projects/netdev/archive/2005-01/msg01096.html
http://oss.sgi.com/projects/netdev/archive/2005-03/msg00855.html
http://oss.sgi.com/projects/netdev/archive/2005-03/msg00950.html

I'm visiting this bug entry because the Broadcom BCM94306 802.11g WiFi chip
seems to have a similar problem with 1G+ physical addresses.  Things are made
more interesting because the driver is a 64-bit MS Windows driver + ndiswrapper64.

From the netdev messages, I understand that a 1G DMA mask forces allocation below
16M on x86 and that this is way overconstrained.  Is this true on x86_64 (AMD
variant)?  In other words, is the adopted fix best on all architectures?

Comment 29 Pekka Pietikäinen 2005-04-04 12:00:00 UTC
For b44 x86_64 is irrelevant, they can only be found embedded on x86's (and
embedded broadcom MIPS platforms, which usually have 32MB of memory max). And
PCI-based cards, but those are reference designs that you can only get directly
from broadcom (if I understood correctly). And with the latest "allocate only if
necessary" patch the amount of GFP_DMA used is quite small.

Probably best forum to discuss the wifi stuff would be the ndiswrapper mailing
list (with possible Cc:s to netdev and/or linux-kernel). Not that they
necessarily care on those lists ;)

Comment 30 John W. Linville 2005-04-04 15:43:36 UTC
For arches that take the pci_set_dma_mask value literally, then pci_alloc_* 
could be safely used.  However, I think this solution should still work.  And, 
it prevents us from having arch-specific versions of b44. 
 
Do you have another alternative?  If so, please suggest it (and include a 
patch if possible)... 

Comment 31 Miloslav Trmač 2005-04-11 22:47:27 UTC
b44 works again out of the box with kernel-2.6.11-1.14_FC3 (and in FC4t2).
Thanks again!


Comment 32 John W. Linville 2005-04-25 14:05:15 UTC
*** Bug 134790 has been marked as a duplicate of this bug. ***