Bug 629158

Summary: Network adapter "disappears" after resuming from acpi suspend
Product: [Fedora] Fedora
Reporter: A. Folger <afolger>
Component: kernel
Assignee: Stanislaw Gruszka <sgruszka>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
CC: anton, dougsland, gansalmon, gustavo, itamar, james, johnlumby, jonathan, kernel-maint, madhu.chinakonda, mirsev, ndbecker2, pmatiello, sgruszka
Keywords: Reopened
Hardware: All
OS: Linux
Doc Type: Bug Fix

Last Closed: 2011-02-05 21:03:29 UTC

Attachments:

Description	Flags
f13-r8169-alloc-fix.patch	none
excerpt of /var/log/messages, using the testing/debugging kernel	none
f12-r8169-alloc-fix.patch	none
f14-r8169-alloc-fix.patch	none

Description A. Folger 2010-09-01 05:49:55 UTC

Description of problem:
Network adapter "disappears" after resuming from acpi suspend. NetworkManager doesn't see the device, and the device doesn't show up in kinfocenter either.

Version-Release number of selected component (if applicable):
kernel-2.6.33.6-147.2.4.fc13.x86_64

How reproducible:
Every time.

Steps to Reproduce:
1. Switch on computer
2. Ascertain that networking works
3. Suspend
4. Resume (switch it on again)
5. See how there is no networking
  
Actual results:
No networking

Expected results:
Everything works normally

Additional info:

From /var/log/messages:

Sep  1 06:54:09 localhost NetworkManager[1234]: <info> wake requested (sleeping: yes  enabled: yes)
Sep  1 06:54:09 localhost NetworkManager[1234]: <info> waking up and re-enabling...
Sep  1 06:54:09 localhost NetworkManager[1234]: <info> (eth2): now managed
Sep  1 06:54:09 localhost NetworkManager[1234]: <info> (eth2): device state change: 1 -> 2 (reason 2)
Sep  1 06:54:09 localhost NetworkManager[1234]: <info> (eth2): bringing up device.
Sep  1 06:54:09 localhost kernel: NetworkManager: page allocation failure. order:3, mode:0x4020
Sep  1 06:54:09 localhost kernel: Pid: 1234, comm: NetworkManager Not tainted 2.6.33.6-147.2.4.fc13.x86_64 #1


From dmesg:

NetworkManager: page allocation failure. order:3, mode:0x4020
Pid: 1234, comm: NetworkManager Not tainted 2.6.33.6-147.2.4.fc13.x86_64 #1
Call Trace:
 [<ffffffff810c6d88>] __alloc_pages_nodemask+0x5ad/0x630
 [<ffffffff810f4058>] kmalloc_large_node+0x5a/0x97
 [<ffffffff810f5803>] __kmalloc_node_track_caller+0x2c/0x119
 [<ffffffff81381651>] ? __netdev_alloc_skb+0x2f/0x4c
 [<ffffffff81381224>] __alloc_skb+0x7b/0x16b
 [<ffffffff81381651>] __netdev_alloc_skb+0x2f/0x4c
 [<ffffffffa015d51b>] rtl8169_rx_fill+0xa3/0x14f [r8169]
 [<ffffffffa015f4f7>] rtl8169_init_ring+0x6c/0x99 [r8169]
 [<ffffffffa015fcf3>] rtl8169_open+0x7a/0x194 [r8169]
 [<ffffffff8138adfd>] dev_open+0x98/0xd3
 [<ffffffff8138a35c>] dev_change_flags+0xb9/0x179
 [<ffffffff81393925>] do_setlink+0x26c/0x33d
 [<ffffffff811c2f5b>] ? avc_has_perm+0x57/0x69
 [<ffffffff81393af3>] rtnl_setlink+0xfd/0x110
 [<ffffffff81393372>] rtnetlink_rcv_msg+0x1c1/0x1de
 [<ffffffff813931b1>] ? rtnetlink_rcv_msg+0x0/0x1de
 [<ffffffff813a433c>] netlink_rcv_skb+0x3e/0x8f
 [<ffffffff813931aa>] rtnetlink_rcv+0x21/0x28
 [<ffffffff813a411b>] netlink_unicast+0xe6/0x14f
 [<ffffffff813a4e22>] netlink_sendmsg+0x254/0x263
 [<ffffffff813793b1>] __sock_sendmsg+0x59/0x64
 [<ffffffff813796ae>] sock_sendmsg+0xa3/0xbc
 [<ffffffff813796ae>] ? sock_sendmsg+0xa3/0xbc
 [<ffffffff8137833b>] ? might_fault+0x1c/0x1e
 [<ffffffff81382ce0>] ? copy_from_user+0x2a/0x2c
 [<ffffffff813830b2>] ? verify_iovec+0x4f/0x8d
 [<ffffffff8137997e>] sys_sendmsg+0x217/0x29b
 [<ffffffff8137972f>] ? sockfd_lookup_light+0x1b/0x53
 [<ffffffff81379712>] ? fput_light+0xd/0xf
 [<ffffffff8137b274>] ? sys_sendto+0x120/0x14d
 [<ffffffff81109057>] ? path_put+0x1d/0x22
 [<ffffffff81095cff>] ? audit_syscall_entry+0x119/0x145
 [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b

Comment 1 A. Folger 2010-09-01 05:51:53 UTC

I forgot. Here is some data about my hardware:
# dmidecode
<SNIP>
Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
        Manufacturer: ASUSTeK Computer INC.
        Product Name: M4A88T-M
<SNIP>
Handle 0x0004, DMI type 4, 40 bytes
Processor Information
        Socket Designation: AM3
<SNIP>
Handle 0x002D, DMI type 10, 6 bytes
On Board Device Information
        Type: Ethernet
        Status: Enabled
        Description: To Be Filled By O.E.M.
<SNIP>
#

Comment 2 Stanislaw Gruszka 2010-09-09 14:24:20 UTC

After resume, the rtl8169 network driver is not able to allocate memory and fails to initialize, hence the "disappears" effect. I will prepare a patch that changes the allocation strategy, which should fix the problem.
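
For readers decoding the failure line in comment 0: `order:3` means the allocator was asked for 2^3 physically contiguous pages. A minimal sketch of the arithmetic, assuming the usual 4 KiB x86 page size (the function name `alloc_bytes` is illustrative only, not a kernel API):

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages, as is usual on x86/x86_64


def alloc_bytes(order, page_size=PAGE_SIZE):
    """Size in bytes of a buddy-allocator request of the given order."""
    return (1 << order) * page_size


# order:3 from "page allocation failure. order:3, mode:0x4020"
print(alloc_bytes(3))  # 32768 bytes, i.e. 8 contiguous pages
```

So each failing request needed one unbroken 32 KiB block, which is harder to satisfy than eight scattered single pages once memory is fragmented.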

Comment 3 Stanislaw Gruszka 2010-09-09 16:15:13 UTC

Created attachment 446284
f13-r8169-alloc-fix.patch

Could you please test this patch? Please build a debug kernel, since it catches bugs when an improper allocation method is used. If you cannot build the kernel yourself, I will prepare packages tomorrow. Thanks.

Comment 4 Stanislaw Gruszka 2010-09-10 14:49:00 UTC

http://koji.fedoraproject.org/koji/taskinfo?taskID=2459629
Please test kernel-debug when it finishes building.

Comment 5 Stanislaw Gruszka 2010-09-13 11:55:07 UTC

These koji builds are automatically removed after about a week, so please test soon.

Comment 6 A. Folger 2010-09-13 15:08:08 UTC

Created attachment 446963
excerpt of /var/log/messages, using the testing/debugging kernel

Sorry for any lateness in reporting. I installed it yesterday, and while the computer suspends properly, it hangs upon resume. I wanted to test a few times, and can confirm that it hangs every time. Therefore, I cannot say whether or not the patch works for the network; I never get that far.

I am attaching an excerpt of /var/log/messages.

File starts with successful boot at 15:06:14. Resume at about the middle of the file (about line 750), at 16:17.

Comment 7 Stanislaw Gruszka 2010-09-13 15:29:42 UTC

OK. I will build a 2.6.33 kernel with the patch, which should not have resume problems on your system. In the meantime, could you provide the info described in https://wiki.ubuntu.com/DebuggingKernelSuspend on the 2.6.34 kernel?

Comment 8 Stanislaw Gruszka 2010-09-13 15:34:16 UTC

(In reply to comment #7)
> problems on your system. In the meantime, could you provide the info described
> in https://wiki.ubuntu.com/DebuggingKernelSuspend on the 2.6.34 kernel?

Oops, sorry, that info is not needed; I just looked at the logs. There is some problem with the graphics driver. Maybe updating Xorg will help?

Comment 9 A. Folger 2010-09-13 18:31:09 UTC

All the updates that were released have been applied, so I am not sure how I am supposed to figure out if Xorg should be updated. Maybe you should CC some Xorg person on this thread?

Do note that the graphics problem had been somewhat diagnosed earlier in the following bug report, which I have now updated with the latest info from our bug here. Sadly, no one has responded yet to that other bug report: https://bugzilla.redhat.com/show_bug.cgi?id=622737 .

Comment 10 Stanislaw Gruszka 2010-09-14 15:37:46 UTC

(In reply to comment #9)
> All the updates that were released have been applied, so I am not sure how I am
> supposed to figure out if Xorg should be updated.

If you ran
# yum --enablerepo=updates-testing update
those are the updates I was talking about.

> Maybe you should CC some
> Xorg person on this thread?
I will reassign your other bug to Xorg, but first I have to think more about it. Fedora graphics is a mixed kernel/user-space monster, and quite frequently it is not clear where the problem is.

Comment 11 Stanislaw Gruszka 2010-09-14 15:39:08 UTC

Here is 2.6.33 kernel build with proposed fix http://koji.fedoraproject.org/koji/taskinfo?taskID=2466611

Comment 12 Stanislaw Gruszka 2010-09-17 12:55:38 UTC

Resume problems on 2.6.34 seem to be caused by the radeon driver. I just changed the topic; maybe someone will pick up this bug.

What about that problem?

Note: to test on 2.6.34, you can log in as root on a virtual terminal (use Ctrl+Alt+F2 from X; to go back to X use Alt+F1, or Alt+F{Number} to log in on a different VT). Then run "init 3" ("init 5" turns X back on), then "pm-suspend". If suspend/resume still does not work, you can boot the kernel with the radeon.modeset=0 parameter (add it in /boot/grub/grub.conf).

Comment 13 A. Folger 2010-09-22 10:34:55 UTC

OK, I tried your tip, and when resuming, I could not get my screen back, at all. However, one little thing did run better: the keyboard wasn't locked, and I managed to switch terminals, log in as root and type reboot, all without a screen.

I also wonder: I generally boot with nomodeset (because there is another problem with edid, so I don't get my full screen resolution unless doing so - the subject of a separate, older bug report of mine). Does adding radeon.modeset=0 add anything? I tried with and without, and had the same result every time, so I don't know whether the radeon parameter added anything.

Anyway, given that even without X I have this problem, I believe we must conclude this is a kernel issue.

Comment 14 Stanislaw Gruszka 2010-09-22 10:59:05 UTC

(In reply to comment #13)
> I also wonder: I generally boot with nomodeset (because there is another
> problem with edid, so I don't get my full screen resolution unless doing so -
> the subject of a separate, older bug report of mine). Does adding
> radeon.modeset=0 add anything? 

In new kernels, radeon.modeset=0 is the replacement for nomodeset=1 (for ATI devices only); it should have the same effect. For example, the resolution of the virtual terminal (switched to with Ctrl+Alt+Fn) should be different. Also, "radeon kernel modesetting enabled" is printed in dmesg when radeon.modeset=1 (the default).

What about the 2.6.33 kernel from comment 11 and the problem with r8169 memory allocations? IIRC, on 2.6.33 the radeon driver works fine on your system.

Comment 15 A. Folger 2010-09-22 12:08:30 UTC

What do you mean? It was under 2.6.33 that I first reported the problem. It wasn't good then, either.

Comment 16 Stanislaw Gruszka 2010-09-22 12:18:45 UTC

In comment 0 we have r8169 allocation failures on resume on 2.6.33.6-147.2.4.fc13.x86_64 kernel.

Does kernel-2.6.33.6-147.bz629158.fc13.x86_64 from
http://koji.fedoraproject.org/koji/taskinfo?taskID=2466611
help with that issue without causing any other problems?

Comment 17 A. Folger 2010-09-22 12:45:02 UTC

Is that kernel different from the plain vanilla kernel-2.6.33.6-147 testing or debug kernels, both of which I tried?

Comment 18 Stanislaw Gruszka 2010-09-22 13:00:00 UTC

Yes, it has the patch from comment 3 applied.

Comment 19 Stanislaw Gruszka 2010-09-23 12:16:47 UTC

I just posted patches which should fix problem from comment 0
http://marc.info/?l=linux-netdev&m=128524323702376&w=2
http://marc.info/?l=linux-netdev&m=128524323702378&w=2

Since you have other, worse problems with suspend/resume in bug 622737 (and I don't have much time :-(), I will not backport those patches to current Fedora; however, the bug will be fixed in future releases.

Comment 20 Stanislaw Gruszka 2010-09-24 08:24:35 UTC

The upstream r8169 driver maintainer would like to know if the patch really fixes the problem. Did you test kernel-2.6.33.6-147.bz629158? If not, will you test it if I build the kernel with the patch again (the previous koji build was removed)?

Comment 21 Stanislaw Gruszka 2010-09-24 22:14:38 UTC

*** Bug 566389 has been marked as a duplicate of this bug. ***

Comment 22 Stanislaw Gruszka 2010-09-24 22:17:06 UTC

Let's reopen, since more people are interested in fixing this problem.

Can someone test the patch from comment 3 or the upstream patches from comment 19?

Comment 23 Stanislaw Gruszka 2010-09-24 22:32:17 UTC

*** Bug 567256 has been marked as a duplicate of this bug. ***

Comment 24 Serguei Miridonov 2010-09-25 03:32:17 UTC

(In reply to comment #21)
> *** Bug 566389 has been marked as a duplicate of this bug. ***

Please, read my reply here: https://bugzilla.redhat.com/show_bug.cgi?id=566389#c20

Comment 25 A. Folger 2010-09-26 15:47:18 UTC

I will gladly test the patch, but am right now going through a drive repair and have to travel for a few days. Will try next week.

Comment 26 Neal Becker 2010-09-26 15:59:37 UTC

Is there a kernel build with the proposed patch, or do I need to get kernel srpm and rebuild myself?

Comment 27 Stanislaw Gruszka 2010-09-29 09:36:14 UTC

F-13 koji builds are here (for 2.6.33 and 2.6.34 kernels respectively):
http://koji.fedoraproject.org/koji/taskinfo?taskID=2496152
http://koji.fedoraproject.org/koji/taskinfo?taskID=2496181

Comment 28 Stanislaw Gruszka 2010-09-29 15:07:10 UTC

Created attachment 450501
f12-r8169-alloc-fix.patch

The same fix for F-12.

http://koji.fedoraproject.org/koji/taskinfo?taskID=2496825

Comment 29 James 2010-10-01 18:17:28 UTC

(In reply to comment #3)
> Created attachment 446284
> f13-r8169-alloc-fix.patch

I'm currently using the F14 (2.6.35) series kernel on F13. Two of the hunks in this patch fail to apply cleanly to this version, but it builds OK. Not seen any r8169 PAFs yet, but "the night is young"...

Comment 30 Stanislaw Gruszka 2010-10-02 18:15:07 UTC

(In reply to comment #29)
> I'm currently using the F14 (2.6.35) series kernel on F13. Two of the hunks in
> this patch fail to apply cleanly to this version, but it builds OK. Not seen
> any r8169 PAFs yet, but "the night is young"...

You should look at drivers/net/r8169.c.rej and integrate the remaining hunks by hand. Anyway, these two hunks are not directly related to the bug (I just checked), so I guess everything should be fine.

Comment 31 Stanislaw Gruszka 2010-10-02 18:20:51 UTC

Created attachment 451212
f14-r8169-alloc-fix.patch

The same fix for Fedora 14

Comment 32 A. Folger 2010-10-04 15:18:00 UTC

(In reply to comment #19)
> I just posted patches which should fix problem from comment 0
> http://marc.info/?l=linux-netdev&m=128524323702376&w=2
> http://marc.info/?l=linux-netdev&m=128524323702378&w=2
> 
> Since you have other, worse problems with suspend/resume in bug 622737 (and I
> don't have much time :-(), I will not backport those patches to current Fedora;
> however, the bug will be fixed in future releases.

I was offline for a few days, as I had a really dicey issue that required a reinstall (grub was totally corrupted and I couldn't figure out what was wrong, but I couldn't even get to the boot menu, even though that part was all right and grub theoretically in charge), so I didn't do anything until yesterday. I now have the 2.6.34.7-56.fc13.x86_64 kernel installed, and it works well. This problem seems solved, apparently thanks to you! If any part of the problem reappears, I will report.

Oh, I should also mention that so far, I am still using nomodeset, because otherwise the max screen resolution isn't available, so I do not know whether removing nomodeset will negatively influence your patch. Please let me know whether you want me to test this, too.

Cheers,

Comment 33 Stanislaw Gruszka 2010-10-04 15:49:54 UTC

(In reply to comment #32)
> now have the 2.6.34.7-56.fc13.x86_64 kernel installed, and it works well.
Hmm, this kernel does not include the fix; try 2.6.34.7-58.bz629158.fc13 from
http://koji.fedoraproject.org/koji/taskinfo?taskID=2496181

Comment 34 Stanislaw Gruszka 2010-10-04 15:51:01 UTC

Can anyone confirm that the problem is fixed in the test kernels/patches I prepared?

Comment 35 Stanislaw Gruszka 2010-10-06 18:04:54 UTC

Guys, please let me know whether the test kernels fix the problem or not; otherwise this bug will not be fixed.

Comment 36 Neal Becker 2010-10-06 19:20:31 UTC

The last page allocation failure message I see is from Sept 21.

Sep 29 08:44:31 Installed: kernel-devel-2.6.34.7-58.bz629158.fc13.x86_64

So, maybe it's fixed?

Before the fix (installed Sept 29), it was 8 days since the last occurrence.

Now it's been about another 8 days, and no occurrence.

I use it every day.

Comment 37 Serguei Miridonov 2010-10-06 19:27:41 UTC

(In reply to comment #35)
> Guys, please let me know whether the test kernels fix the problem or not;
> otherwise this bug will not be fixed.

Have you sent an announcement to this thread?

https://bugzilla.redhat.com/show_bug.cgi?id=566389

Maybe people there just don't receive this news...

I have just downloaded 

kernel-2.6.32.23-170.bz629158.fc12.i686.rpm
kernel-headers-2.6.32.23-170.bz629158.fc12.i686.rpm
kernel-devel-2.6.32.23-170.bz629158.fc12.i686.rpm

So, I will install this stuff and try for a week. Please wait...

Comment 38 A. Folger 2010-10-07 21:31:18 UTC

OK, I am back on after going through a move and a (not so smooth) ISP change. The stock kernels *seem* to work, but that's deceptive. Basically, I can't figure it out. Sometimes (much of the time) the network works upon resume, but at other times it doesn't come back up. I tried downloading your test kernels to test them over the weekend, but they are no longer up. Can you put 'em back up? I will test.

Comment 39 Stanislaw Gruszka 2010-10-08 14:20:08 UTC

I should have copied these test kernels from koji to another site, ah ... I will not build another scratch kernel; I would rather try to get the patches upstream and into Fedora. Since we have Neal's confirmation, I can now proceed.

Comment 40 Serguei Miridonov 2010-10-12 11:34:17 UTC

Stanislaw, sorry, but I could not test the kernel for F-12 because the kernel-firmware package is absent, so yum refuses to install the new kernel. Could you push your fixes to the updates-testing repository?

When it is ready, could you please make an announcement in the previous thread:

https://bugzilla.redhat.com/show_bug.cgi?id=566389

This is to make sure that people who started to report this bug could also test your fix.

Comment 41 Fedora Update System 2010-10-19 01:16:22 UTC

kernel-2.6.35.6-45.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/kernel-2.6.35.6-45.fc14

Comment 42 Fedora Update System 2010-10-19 06:31:39 UTC

kernel-2.6.34.7-61.fc13 has been submitted as an update for Fedora 13.
https://admin.fedoraproject.org/updates/kernel-2.6.34.7-61.fc13

Comment 43 Fedora Update System 2010-10-19 06:36:01 UTC

kernel-2.6.32.23-170.fc12 has been submitted as an update for Fedora 12.
https://admin.fedoraproject.org/updates/kernel-2.6.32.23-170.fc12

Comment 44 Fedora Update System 2010-10-19 09:11:41 UTC

kernel-2.6.35.6-45.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 45 Fedora Update System 2010-10-22 18:04:55 UTC

kernel-2.6.34.7-61.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 46 Fedora Update System 2010-10-30 23:42:16 UTC

kernel-2.6.32.23-170.fc12 has been pushed to the Fedora 12 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 47 A. Folger 2010-10-31 11:34:07 UTC

(In reply to comment #42)
> kernel-2.6.34.7-61.fc13 has been submitted as an update for Fedora 13.
> https://admin.fedoraproject.org/updates/kernel-2.6.34.7-61.fc13

I am using this kernel, and while most of the time the problem is gone, occasionally, the network still fails to come back up after resume. This was diagnosed both on an i686 and an x86_64 system.

Comment 48 Serguei Miridonov 2010-11-07 20:12:31 UTC

Suggested fix is not working:

Linux xxxxxxxxxxx 2.6.32.23-170.fc12.i686 #1 SMP Mon Sep 27 17:58:16 UTC 2010 i686 i686 i386 GNU/Linux

NetworkManager: page allocation failure. order:3, mode:0x4020
Pid: 15744, comm: NetworkManager Tainted: P           2.6.32.23-170.fc12.i686 #1
Call Trace:
 [<c07946c6>] ? printk+0x14/0x16
 [<c04aac75>] __alloc_pages_nodemask+0x44c/0x4ac
 [<c04aace9>] __get_free_pages+0x14/0x26
 [<c04d01f2>] __kmalloc_track_caller+0x37/0x127
 [<c0706ea6>] ? __netdev_alloc_skb+0x1b/0x36
 [<c0706800>] __alloc_skb+0x4e/0x10d
 [<c0706ea6>] __netdev_alloc_skb+0x1b/0x36
 [<f7f45435>] rtl8169_rx_fill+0x93/0x12d [r8169]
 [<f7f459c0>] rtl8169_init_ring+0x58/0x84 [r8169]
 [<f7f47f68>] rtl8169_open+0x6e/0x15e [r8169]
 [<c070ec58>] dev_open+0x8b/0xc5
 [<c070e4b2>] dev_change_flags+0xa9/0x158
 [<c07166d8>] do_setlink+0x242/0x2e8
 [<c071677e>] ? rtnl_setlink+0x0/0xee
 [<c071685b>] rtnl_setlink+0xdd/0xee
 [<c0702f00>] ? sk_wait_data+0x6a/0x9a
 [<c071677e>] ? rtnl_setlink+0x0/0xee
 [<c07161e2>] rtnetlink_rcv_msg+0x190/0x1a6
 [<c05bff23>] ? might_fault+0x1e/0x20
 [<c0724470>] ? netlink_sendmsg+0x152/0x228
 [<c0716052>] ? rtnetlink_rcv_msg+0x0/0x1a6
 [<c0723b7f>] netlink_rcv_skb+0x35/0x7b
 [<c071604b>] rtnetlink_rcv+0x20/0x27
 [<c07239a3>] netlink_unicast+0xc3/0x11e
 [<c0724539>] netlink_sendmsg+0x21b/0x228
 [<c06fffff>] __sock_sendmsg+0x4a/0x53
 [<c0700678>] sock_sendmsg+0xbb/0xd1
 [<c04547a1>] ? autoremove_wake_function+0x0/0x34
 [<c04547a1>] ? autoremove_wake_function+0x0/0x34
 [<c05bff23>] ? might_fault+0x1e/0x20
 [<c05c0096>] ? copy_from_user+0x32/0x11a
 [<c070817a>] ? verify_iovec+0x43/0x71
 [<c070081a>] sys_sendmsg+0x18c/0x1f0
 [<c07014c5>] ? sys_recvmsg+0x1c2/0x1e1
 [<c04a5dab>] ? find_get_page+0x22/0x7c
 [<c04bc20b>] ? handle_mm_fault+0x47a/0x93e
 [<c079505b>] ? schedule+0x817/0x864
 [<c0701afa>] sys_socketcall+0x163/0x195
 [<c040ac82>] ? syscall_trace_leave+0xaa/0xbd
 [<c040367c>] syscall_call+0x7/0xb
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd: 155
CPU    1: hi:  186, btch:  31 usd:  59
HighMem per-cpu:
CPU    0: hi:  186, btch:  31 usd:  63
CPU    1: hi:  186, btch:  31 usd:  82
active_anon:316984 inactive_anon:119250 isolated_anon:0
 active_file:120833 inactive_file:111248 isolated_file:0
 unevictable:0 dirty:17 writeback:0 unstable:0
 free:48183 slab_reclaimable:30602 slab_unreclaimable:7930
 mapped:34850 shmem:1391 pagetables:3286 bounce:0
DMA free:3488kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:480kB inactive_file:96kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15864kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:3812kB slab_unreclaimable:556kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 861 3029 3029
Normal free:158048kB min:3720kB low:4648kB high:5580kB active_anon:76520kB inactive_anon:152480kB active_file:138912kB inactive_file:127416kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:881880kB mlocked:0kB dirty:4kB writeback:0kB mapped:14316kB shmem:8kB slab_reclaimable:118596kB slab_unreclaimable:31164kB kernel_stack:3424kB pagetables:924kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 17348 17348
HighMem free:31196kB min:512kB low:2852kB high:5196kB active_anon:1191416kB inactive_anon:324520kB active_file:343940kB inactive_file:317480kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2220560kB mlocked:0kB dirty:64kB writeback:0kB mapped:125084kB shmem:5556kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:12220kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 38*4kB 63*8kB 11*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 3488kB
Normal: 20832*4kB 7564*8kB 877*16kB 0*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 158064kB
HighMem: 3721*4kB 1427*8kB 212*16kB 37*32kB 1*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 31196kB
238182 total pagecache pages
4711 pages in swap cache
Swap cache stats: add 70171, delete 65460, find 781095/782859
Free swap  = 8190568kB
Total swap = 8385888kB
785920 pages RAM
559618 pages HighMem
12180 pages reserved
239330 pages shared
612522 pages non-shared

After running "sysctl vm.min_free_kbytes=65536", Ethernet works again without a reboot or module reload.
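
The per-order free lists in the dump above can be checked mechanically. A small sketch (the helper name `parse_buddy_line` is hypothetical) that parses the `Normal:` line from this report and counts the free blocks large enough for an order-3 (32 kB) request:

```python
import re


def parse_buddy_line(line):
    """Parse 'Zone: N*SkB ... = TkB' into (zone, {block_kb: count}, total_kb)."""
    zone, rest = line.split(":", 1)
    blocks = {int(size): int(count)
              for count, size in re.findall(r"(\d+)\*(\d+)kB", rest)}
    total_kb = int(re.search(r"=\s*(\d+)kB", rest).group(1))
    return zone.strip(), blocks, total_kb


normal = ("Normal: 20832*4kB 7564*8kB 877*16kB 0*32kB 1*64kB 1*128kB "
          "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 158064kB")
zone, blocks, total_kb = parse_buddy_line(normal)

# Sanity check: the per-order counts add up to the reported total.
assert sum(size * count for size, count in blocks.items()) == total_kb

# Free blocks of at least 32 kB (order 3, assuming 4 kB pages).
big_enough = sum(count for size, count in blocks.items() if size >= 32)
print(zone, total_kb, big_enough)  # Normal 158064 2
```

Roughly 154 MB is free in the zone, but almost all of it sits in 4 kB to 16 kB pieces; only two blocks are 32 kB or larger. Whether an atomic request may actually take one of them also depends on the zone watermarks, which this sketch does not model.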

Comment 49 Stanislaw Gruszka 2010-11-08 08:57:32 UTC

(In reply to comment #48)
> Suggested fix is not working:
> 
> Linux xxxxxxxxxxx 2.6.32.23-170.fc12.i686 #1 SMP Mon Sep 27 17:58:16 UTC 2010
> i686 i686 i386 GNU/Linux
> 
> NetworkManager: page allocation failure. order:3, mode:0x4020

mode:0x4020 means an atomic allocation, so the fix was not there.

I checked the 2.6.32.23-170 sources and indeed the patch was not there. It was dropped because the patch was merged upstream into the -stable kernels, but the fix was removed too early. Anyway, the current 2.6.32.25 kernel has this fix.
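
The `mode:` value is a bitmask of GFP flags, which is how the atomic allocation can be read off the log line. A minimal decoding sketch follows; the bit values are an assumption recalled from the 2.6.3x-era `include/linux/gfp.h` and should be verified against the exact kernel source in use (they changed in later kernels).

```python
# Assumed bit values (2.6.3x-era include/linux/gfp.h); verify against your tree.
GFP_BITS = {
    0x10: "__GFP_WAIT",    # caller may sleep
    0x20: "__GFP_HIGH",    # may use emergency reserves
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x4000: "__GFP_COMP",  # compound-page metadata
}


def decode_gfp(mode):
    """Return names of the known flags set in mode, plus unknown bits as hex."""
    names = []
    remaining = mode
    for bit, name in sorted(GFP_BITS.items()):
        if mode & bit:
            names.append(name)
            remaining &= ~bit
    if remaining:
        names.append(hex(remaining))
    return names


def may_sleep(mode):
    """True if the request is allowed to block while memory is reclaimed."""
    return bool(mode & 0x10)


print(decode_gfp(0x4020))  # ['__GFP_HIGH', '__GFP_COMP']
print(may_sleep(0x4020))   # False
print(may_sleep(0xd0))     # True
```

Under these assumed values, 0x4020 lacks the may-sleep bit, so the allocator cannot wait for reclaim to assemble a contiguous block, whereas 0xd0 (the combination of the wait, I/O and FS bits) can.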

Comment 50 Stanislaw Gruszka 2010-11-08 09:13:11 UTC

(In reply to comment #47)
> (In reply to comment #42)
> > kernel-2.6.34.7-61.fc13 has been submitted as an update for Fedora 13.
> > https://admin.fedoraproject.org/updates/kernel-2.6.34.7-61.fc13
> 
> I am using this kernel, and while most of the time the problem is gone,
> occasionally, the network still fails to come back up after resume. This was
> diagnosed both on an i686 and an x86_64 system.

kernel-2.6.34.7-61 has the fix, hmm. This may be a different problem, or indeed the patch does not fix the allocation issues, as Serguei suggested.

Please attach dmesg output from when the problem happens.

Comment 51 Serguei Miridonov 2010-11-08 21:19:27 UTC

(In reply to comment #49)
> (In reply to comment #48)
> > Suggested fix is not working:
> > 
> > Linux xxxxxxxxxxx 2.6.32.23-170.fc12.i686 #1 SMP Mon Sep 27 17:58:16 UTC 2010
> > i686 i686 i386 GNU/Linux
> > 
> > NetworkManager: page allocation failure. order:3, mode:0x4020
> 
> mode:0x4020 means an atomic allocation, so the fix was not there.
> 
> I checked the 2.6.32.23-170 sources and indeed the patch was not there. It was
> dropped because the patch was merged upstream into the -stable kernels, but the
> fix was removed too early. Anyway, the current 2.6.32.25 kernel has this fix.

# yum update
Loaded plugins: aliases, auto-update-debuginfo, changelog, dellsysidplugin2, downloadonly, fastestmirror, filter-data,
              : keys, kmdl, list-data, merge-conf, post-transaction-actions, priorities, protectbase, refresh-packagekit,
              : remove-with-leaves, rpm-warm-cache, security, show-leaves, tsflags, upgrade-helper, verify, versionlock
Loading mirror speeds from cached hostfile
.....

Skipping filters plugin, no data
0 packages excluded due to repository protections
Skipping security plugin, no data
Setting up Update Process
No Packages marked for Update

So, where is the current kernel with this fix?

Comment 52 Stanislaw Gruszka 2010-12-01 12:59:45 UTC

(In reply to comment #51)
> # yum update
[snip]
> No Packages marked for Update
> 
> So, where is the current kernel with this fix?

I don't know why the repositories are (still!) not updated. You can download the latest kernels directly from koji: http://koji.fedoraproject.org/koji/packageinfo?packageID=8

Comment 53 Serguei Miridonov 2010-12-03 17:15:31 UTC

Since today I'm testing kernel 2.6.32.26-175.fc12.i686 with all workarounds removed. Please stand by.

Comment 54 John Lumby 2010-12-14 14:30:44 UTC

I also suffered from this bug for a while and have a question. Why does the driver throw an ENOMEM if it cannot allocate a full complement of 256 packet/data buffers? I mean, suppose that in init_ring / rx_fill, the loop has allocated, say, 255 buffers successfully, and then fails with nomem on the 256th. Why does it not simply continue on and use the 255 it allocated? Why fail the entire device open?

After all the various fixes, this is still the case today (2.6.37-rc5).

The chip certainly does not insist that there must be 256 rx descriptors in the chain passed to it. I have verified that in the grub netboot context. And I've been running with 128 on my Linux kernel for a while now.

Maybe this is moot if the bug really has been fixed - I don't know. Has it definitively been fixed, or is it still being assessed?

Comment 55 Stanislaw Gruszka 2010-12-15 08:03:07 UTC

(In reply to comment #54)
> I also suffered from this bug for a while and have a question. Why does the
> driver throw an ENOMEM if it cannot allocate a full complement of 256
> packet/data buffers?
The driver could use a smaller ring buffer, but it needs to be rewritten to use a variable instead of the hard-coded NUM_RX_DESC.

> After all the various fixes, this is still the case today (2.6.37-rc5)
Hmm, can you add a comment and attach dmesg to https://bugzilla.kernel.org/show_bug.cgi?id=19752 ? If we still fail to allocate in non-atomic mode, this seems to be an issue with the allocator, not the driver. Anyway, dmesg should show some interesting information.
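
The two policies under discussion (fail the open unless all 256 buffers are allocated, versus accept a partially filled ring) can be contrasted with a toy model. This is only an illustration of the control flow, not driver code: `rx_fill`, `make_allocator` and the `min_buffers` parameter are hypothetical names, and the real driver works on DMA descriptors rather than Python lists.

```python
NUM_RX_DESC = 256  # ring size named in the thread


def rx_fill(alloc, ring_size=NUM_RX_DESC, min_buffers=None):
    """Fill the RX ring using alloc(), which returns a buffer or None.

    min_buffers=None models the all-or-nothing policy; a smaller value
    models a driver that tolerates a partially filled ring.
    """
    required = ring_size if min_buffers is None else min_buffers
    ring = []
    for _ in range(ring_size):
        buf = alloc()
        if buf is None:
            break  # allocator ran dry; stop trying
        ring.append(buf)
    if len(ring) < required:
        raise MemoryError(f"ENOMEM: got {len(ring)} of {ring_size} RX buffers")
    return ring


def make_allocator(succeed_count):
    """Toy allocator: hands out succeed_count buffers, then returns None."""
    remaining = [succeed_count]

    def alloc():
        if remaining[0] <= 0:
            return None
        remaining[0] -= 1
        return bytearray(16)  # placeholder payload; the size is arbitrary

    return alloc


try:
    rx_fill(make_allocator(255))  # one buffer short: the whole open fails
except MemoryError as exc:
    print("strict policy:", exc)

ring = rx_fill(make_allocator(255), min_buffers=128)
print("tolerant policy:", len(ring), "buffers")
```

The tolerant variant trades some receive capacity for a device that still comes up; what minimum is safe for the hardware is a separate question this model does not answer.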

Comment 56 John Lumby 2010-12-15 15:02:31 UTC

> The driver could use a smaller ring buffer, but it needs to be rewritten to use a
> variable instead of the hard-coded NUM_RX_DESC.

Yes, that's what I did, and with a couple of other minor changes including
new module param to specify num_rx_buffs to (try to) alloc at open, this
has been working fine for some time.  It seems to me to be an improvement
even after all other fixes but don't know if actually needed.
I can send you my patch if you are interested.

> > After all the various fixes, this is still the case today (2.6.37-rc5)
> Hmm, can you add a comment and attach dmesg to
> https://bugzilla.kernel.org/show_bug.cgi?id=19752 ? If we still fail

Sorry, I did not make clear - when I said "this is still the case today" the
"this" I am referring to is the driver logic (insist on 256), not occurrence
of problem.  I do not know whether the problem itself exists in latest level
of driver, and was too lazy to try my failure scenario on latest kernel build
because this thread says it is still under assessment and to stand by.  I am
still on older level (2.6.33) but can do that some time if no-one else has.

I did not see there is a kernel bugzilla on this until you mentioned it  -
(search didn't find it)  -  maybe move discussion there.

Comment 57 Serguei Miridonov 2010-12-15 16:43:19 UTC

From yum.log:

Dec 03 08:08:08 Installed: kernel-2.6.32.26-175.fc12.i686

No more issues since then. All quirks removed. Will continue to test.

Comment 58 Stanislaw Gruszka 2010-12-16 08:35:57 UTC

(In reply to comment #56)
> Yes, that's what I did, and with a couple of other minor changes including
> new module param to specify num_rx_buffs to (try to) alloc at open, this
> has been working fine for some time.  It seems to me to be an improvement

If you think the patch is needed, rebase it on the current upstream code and post it to the netdev mailing list and the maintainer.

Comment 59 Neal Becker 2010-12-16 14:36:29 UTC

I haven't seen this error for quite a long time now with any kernel, but maybe that's because I now have 4G of RAM.

Comment 60 John Lumby 2010-12-16 15:09:05 UTC

Thanks for the updates - sounds as though it is really fixed now. In which case my patch is obsoleted, I think. If anyone finds some reason to want it, feel free to request it from me.

Comment 61 Serguei Miridonov 2011-01-09 05:33:25 UTC

(In reply to comment #57)
> From yum.log:
> 
> Dec 03 08:08:08 Installed: kernel-2.6.32.26-175.fc12.i686
> 
> No more issues since then. All quirks removed. Will continue to test.

Fix confirmed: 

$ uptime
 21:32:26 up 29 days, 21:11,  6 users,  load average: 0.57, 0.68, 0.49
$ uname -a
Linux quantumpoint 2.6.32.26-175.fc12.i686 #1 SMP Wed Dec 1 21:52:04 UTC 2010 i686 i686 i386 GNU/Linux

No more problems.