566389 – r8169-related page allocation failure, had to restart NetworkManager

Summary: r8169-related page allocation failure, had to restart NetworkManager

Keywords:
Status:	CLOSED DUPLICATE of bug 629158
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	12
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	568992
Depends On:
Blocks:

Reported:	2010-02-18 10:08 UTC by James
Modified:	2010-09-29 15:10 UTC
CC List:	16 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2010-09-24 22:14:38 UTC
Type:	---
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Linux Kernel	16441	0	None	None	None	Never

Description James 2010-02-18 10:08:49 UTC

Description of problem:
r8169-related page allocation failure in NetworkManager, shortly after resume and change of network. Had to restart NetworkManager to get networking back.


NetworkManager: page allocation failure. order:3, mode:0x4020
Pid: 1427, comm: NetworkManager Not tainted 2.6.31.12-rhapsody.fc12-121 #1
Call Trace:
 [<ffffffff810c876f>] __alloc_pages_nodemask+0x57a/0x5bb
 [<ffffffff810f415d>] alloc_pages_node+0x48/0x4a
 [<ffffffff810f4189>] kmalloc_large_node+0x2a/0x67
 [<ffffffff810f5f1c>] __kmalloc_node_track_caller+0x31/0x11b
 [<ffffffff8136f4fe>] ? __netdev_alloc_skb+0x34/0x50
 [<ffffffff8136e8b8>] __alloc_skb+0x80/0x170
 [<ffffffff8136f4fe>] __netdev_alloc_skb+0x34/0x50
 [<ffffffffa011c5e0>] rtl8169_rx_fill+0xa8/0x154 [r8169]
 [<ffffffffa011e5c5>] rtl8169_init_ring+0x71/0x9f [r8169]
 [<ffffffffa011edbe>] rtl8169_open+0x7f/0x199 [r8169]
 [<ffffffff813779fe>] dev_open+0x9d/0xd8
 [<ffffffff81377160>] dev_change_flags+0xad/0x16e
 [<ffffffff813809b0>] do_setlink+0x2c2/0x393
 [<ffffffff81417d95>] ? _read_lock+0x1b/0x2e
 [<ffffffff81380b94>] rtnl_setlink+0x113/0x126
 [<ffffffff8138036e>] rtnetlink_rcv_msg+0x1c6/0x1e3
 [<ffffffff813801a8>] ? rtnetlink_rcv_msg+0x0/0x1e3
 [<ffffffff81391e81>] netlink_rcv_skb+0x43/0x96
 [<ffffffff813801a1>] rtnetlink_rcv+0x26/0x2d
 [<ffffffff813919ca>] netlink_unicast+0x125/0x18e
 [<ffffffff81391cb2>] netlink_sendmsg+0x27f/0x28e
 [<ffffffff81365f49>] __sock_sendmsg+0x61/0x6c
 [<ffffffff813666c1>] sock_sendmsg+0xcc/0xe5
 [<ffffffff8136658c>] ? sock_recvmsg+0xcf/0xe8
 [<ffffffff810661b7>] ? autoremove_wake_function+0x0/0x39
 [<ffffffff810661b7>] ? autoremove_wake_function+0x0/0x39
 [<ffffffff81367241>] ? move_addr_to_kernel+0x48/0x4d
 [<ffffffff8136fc87>] ? verify_iovec+0x51/0x8e
 [<ffffffff813668fb>] sys_sendmsg+0x221/0x2a5
 [<ffffffff81366004>] ? sockfd_lookup_light+0x20/0x58
 [<ffffffff81365fe2>] ? fput_light+0x12/0x14
 [<ffffffff8136736b>] ? sys_sendto+0x125/0x152
 [<ffffffff8110704e>] ? path_put+0x22/0x26
 [<ffffffff810959d5>] ? audit_syscall_entry+0x11e/0x14a
 [<ffffffff81011e32>] system_call_fastpath+0x16/0x1b
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd:  64
CPU    1: hi:  186, btch:  31 usd: 173
Active_anon:115757 active_file:131723 inactive_anon:38704
 inactive_file:132184 unevictable:4 dirty:58 writeback:1 unstable:0
 free:3970 slab:55352 mapped:31683 pagetables:11260 bounce:0
Node 0 DMA free:7996kB min:40kB low:48kB high:60kB active_anon:12kB inactive_anon:112kB active_file:2804kB inactive_file:4584kB unevictable:0kB present:15304kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 1995 1995 1995
Node 0 DMA32 free:7884kB min:5692kB low:7112kB high:8536kB active_anon:463016kB inactive_anon:154704kB active_file:524088kB inactive_file:524152kB unevictable:16kB present:2043040kB pages_scanned:292 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 5*4kB 3*8kB 3*16kB 3*32kB 2*64kB 2*128kB 3*256kB 3*512kB 1*1024kB 2*2048kB 0*4096kB = 7996kB
Node 0 DMA32: 1364*4kB 189*8kB 45*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 8008kB
293487 total pagecache pages
1395 pages in swap cache
Swap cache stats: add 6648, delete 5253, find 18309/18559
Free swap  = 3981624kB
Total swap = 3997688kB
521920 pages RAM
9514 pages reserved
326938 pages shared
300668 pages non-shared


Version-Release number of selected component (if applicable):
kernel-2.6.31.12-174.2.19.fc12.src.rpm

How reproducible:
Unknown, presumably sporadic.

Comment 1 Serguei Miridonov 2010-02-23 14:42:54 UTC

Hi, I can confirm this on HP Pavilion dv5 with

Linux 2.6.31.12-174.2.19.fc12.i686 #1 SMP Thu Feb 11 07:39:11 UTC 2010 i686 i686 i386 GNU/Linux

It often happens after waking up from suspend to RAM. Sometimes neither restarting NetworkManager and/or network services nor reloading the r8169 module helps, and the computer has to be rebooted to get access to the network again.

Any idea?

Comment 2 James 2010-02-23 20:17:24 UTC

This is a rare one for me. Normally I see PAFs left, right and centre if I pummel the I/O and VM subsystems (with default sysctl parameters), but I've seen this under light load... could be a legit bug in r8169 rather than general memory management horkage?

Comment 3 Serguei Miridonov 2010-02-26 05:55:16 UTC

After updating the kernel to 

2.6.31.12-174.2.22.fc12.i686 #1 SMP Fri Feb 19 19:26:06 UTC 2010 i686 i686 i386

on February 23 it survived 2.5 days with 2-3 suspend-to-RAM/resume cycles every day. Today it happened again. Reloading module still works.

However, the crash log is slightly different; it starts with printk:

Feb 26 06:33:52 localhost NetworkManager: <info>  (eth0): now managed
Feb 26 06:33:52 localhost NetworkManager: <info>  (eth0): device state change: 1 -> 2 (reason 2)
Feb 26 06:33:52 localhost NetworkManager: <info>  (eth0): bringing up device.
Feb 26 06:33:52 localhost kernel: Pid: 1838, comm: NetworkManager Tainted: P           2.6.31.12-174.2.22.fc12.i686 #1
Feb 26 06:33:52 localhost kernel: Call Trace:
Feb 26 06:33:52 localhost kernel: [<c076688a>] ? printk+0x14/0x1a
Feb 26 06:33:52 localhost kernel: [<c04990bd>] __alloc_pages_nodemask+0x421/0x463
Feb 26 06:33:52 localhost kernel: [<c049913e>] __get_free_pages+0x14/0x26
Feb 26 06:33:52 localhost kernel: [<c04ba24d>] __kmalloc_track_caller+0x37/0x12d
Feb 26 06:33:52 localhost kernel: [<c06db7a3>] ? __netdev_alloc_skb+0x1b/0x36
Feb 26 06:33:52 localhost kernel: [<c06dadd5>] __alloc_skb+0x4e/0x10d
Feb 26 06:33:52 localhost kernel: [<c06db7a3>] __netdev_alloc_skb+0x1b/0x36
Feb 26 06:33:52 localhost kernel: [<f7da242d>] rtl8169_rx_fill+0x93/0x12d [r8169]
Feb 26 06:33:52 localhost kernel: [<f7da2999>] rtl8169_init_ring+0x58/0x84 [r8169]
Feb 26 06:33:52 localhost kernel: [<f7da47ba>] rtl8169_open+0x6e/0x15e [r8169]
Feb 26 06:33:52 localhost kernel: [<c06e2bd4>] dev_open+0x8b/0xc5
Feb 26 06:33:52 localhost kernel: [<c06e2435>] dev_change_flags+0x9b/0x14a
Feb 26 06:33:52 localhost kernel: [<c06ea4fe>] do_setlink+0x25d/0x303
Feb 26 06:33:52 localhost kernel: [<c06ea5a4>] ? rtnl_setlink+0x0/0xee
Feb 26 06:33:52 localhost kernel: [<c06ea681>] rtnl_setlink+0xdd/0xee
Feb 26 06:33:52 localhost kernel: [<c06ea5a4>] ? rtnl_setlink+0x0/0xee
Feb 26 06:33:52 localhost kernel: [<c06e9fed>] rtnetlink_rcv_msg+0x190/0x1a6
Feb 26 06:33:52 localhost kernel: [<c059ccbd>] ? might_fault+0x1c/0x1e
Feb 26 06:33:52 localhost kernel: [<c06f7ac5>] ? netlink_sendmsg+0x160/0x242
Feb 26 06:33:52 localhost kernel: [<c06e9e5d>] ? rtnetlink_rcv_msg+0x0/0x1a6
Feb 26 06:33:52 localhost kernel: [<c06f7d29>] netlink_rcv_skb+0x35/0x7c
Feb 26 06:33:52 localhost kernel: [<c06e9e56>] rtnetlink_rcv+0x20/0x27
Feb 26 06:33:52 localhost kernel: [<c06f7908>] netlink_unicast+0xec/0x149
Feb 26 06:33:52 localhost kernel: [<c06f7b9a>] netlink_sendmsg+0x235/0x242
Feb 26 06:33:52 localhost kernel: [<c06d40e3>] __sock_sendmsg+0x4a/0x53
Feb 26 06:33:52 localhost kernel: [<c06d4761>] sock_sendmsg+0xbb/0xd1
Feb 26 06:33:52 localhost kernel: [<c0449c21>] ? autoremove_wake_function+0x0/0x34
Feb 26 06:33:52 localhost kernel: [<c0449c21>] ? autoremove_wake_function+0x0/0x34
Feb 26 06:33:52 localhost kernel: [<c06d40e3>] ? __sock_sendmsg+0x4a/0x53
Feb 26 06:33:52 localhost kernel: [<c0449c21>] ? autoremove_wake_function+0x0/0x34
Feb 26 06:33:52 localhost kernel: [<c059ccbd>] ? might_fault+0x1c/0x1e
Feb 26 06:33:52 localhost kernel: [<c059ccf1>] ? copy_from_user+0x32/0x119
Feb 26 06:33:52 localhost kernel: [<c06dc2b0>] ? verify_iovec+0x43/0x6f
Feb 26 06:33:52 localhost kernel: [<c06d4903>] sys_sendmsg+0x18c/0x1f0
Feb 26 06:33:52 localhost kernel: [<c06d55f3>] ? sys_recvmsg+0x1c2/0x1e1
Feb 26 06:33:52 localhost kernel: [<c0497416>] ? list_add+0xf/0x11
Feb 26 06:33:52 localhost kernel: [<c0497d53>] ? __free_one_page+0x102/0x153
Feb 26 06:33:52 localhost kernel: [<c049b018>] ? put_compound_page+0x23/0x25
Feb 26 06:33:52 localhost kernel: [<c049b6ad>] ? put_page+0x1c/0x76
Feb 26 06:33:52 localhost kernel: [<c04b9a86>] ? kmem_cache_free+0x72/0xa9
Feb 26 06:33:52 localhost kernel: [<c06da0d8>] ? __kfree_skb+0x6f/0x72
Feb 26 06:33:52 localhost kernel: [<c06da0d8>] ? __kfree_skb+0x6f/0x72
Feb 26 06:33:52 localhost kernel: [<c06df5bb>] ? net_tx_action+0x5b/0xc6
Feb 26 06:33:52 localhost kernel: [<c043c0f9>] ? __do_softirq+0x148/0x157
Feb 26 06:33:52 localhost kernel: [<c06d5c39>] sys_socketcall+0x15f/0x18a
Feb 26 06:33:52 localhost kernel: [<c040365c>] syscall_call+0x7/0xb
Feb 26 06:33:52 localhost kernel: Mem-Info:
Feb 26 06:33:52 localhost kernel: DMA per-cpu:
Feb 26 06:33:52 localhost kernel: CPU    0: hi:    0, btch:   1 usd:   0
Feb 26 06:33:52 localhost kernel: CPU    1: hi:    0, btch:   1 usd:   0
Feb 26 06:33:52 localhost kernel: Normal per-cpu:
Feb 26 06:33:52 localhost kernel: CPU    0: hi:  186, btch:  31 usd: 156
Feb 26 06:33:52 localhost kernel: CPU    1: hi:  186, btch:  31 usd: 131
Feb 26 06:33:52 localhost kernel: HighMem per-cpu:
Feb 26 06:33:52 localhost kernel: CPU    0: hi:  186, btch:  31 usd: 102
Feb 26 06:33:52 localhost kernel: CPU    1: hi:  186, btch:  31 usd: 135
Feb 26 06:33:52 localhost kernel: Active_anon:161689 active_file:223208 inactive_anon:53083
Feb 26 06:33:52 localhost kernel: inactive_file:264241 unevictable:0 dirty:38 writeback:0 unstable:0
Feb 26 06:33:52 localhost kernel: free:19551 slab:32325 mapped:40330 pagetables:2320 bounce:0
Feb 26 06:33:52 localhost kernel: DMA free:3492kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB present:15864kB pages_scanned:0 all_unreclaimable? yes
Feb 26 06:33:52 localhost kernel: lowmem_reserve[]: 0 861 3029 3029
Feb 26 06:33:52 localhost kernel: Normal free:69728kB min:3720kB low:4648kB high:5580kB active_anon:16828kB inactive_anon:8008kB active_file:206216kB inactive_file:370512kB unevictable:0kB present:881880kB pages_scanned:0 all_unreclaimable? no
Feb 26 06:33:52 localhost kernel: lowmem_reserve[]: 0 0 17348 17348
Feb 26 06:33:52 localhost kernel: HighMem: 278*4kB 214*8kB 3*16kB 2*32kB 0*64kB 0*128kB 0*256kB 2*512kB 1*1024kB 0*2048kB 0*4096kB = 4984kB
Feb 26 06:33:52 localhost kernel: 488928 total pagecache pages
Feb 26 06:33:52 localhost kernel: 394 pages in swap cache
Feb 26 06:33:52 localhost kernel: Swap cache stats: add 8679, delete 8285, find 5158/5199
Feb 26 06:33:52 localhost kernel: Free swap  = 8352508kB
Feb 26 06:33:52 localhost kernel: Total swap = 8385888kB
Feb 26 06:33:52 localhost kernel: 785920 pages RAM
Feb 26 06:33:52 localhost kernel: 559618 pages HighMem
Feb 26 06:33:52 localhost kernel: 12098 pages reserved
Feb 26 06:33:52 localhost kernel: 293341 pages shared
Feb 26 06:33:52 localhost kernel: 636361 pages non-shared
Feb 26 06:33:52 localhost NetworkManager: <info>  (eth0): deactivating device (reason: 2).

Comment 4 Serguei Miridonov 2010-03-02 11:23:53 UTC

Today, after an uptime of 1 day, 21:55, it happened again. Reloading the module did not work, but after quitting two memory-intensive (Java-based) applications I could load the module without a crash, and the network is working now.

Could it be that I need to use some kernel parameter (boot string, /proc or /sys file?) to keep some amount of memory free in order to make sure that there will be enough for the r8169 module?

Any idea?

Comment 5 r6144 2010-03-03 03:42:41 UTC

This happened to me as well on kernel-2.6.32.8-58.fc12.x86_64 after resuming from suspend-to-RAM.  rmmod and modprobe'ing the r8169 module causes the error to repeat.  I then used hugeadm to allocate a huge page (which caused a bit of thrashing but finally succeeded after one retry), deallocated it, and reloaded the r8169 module afterwards.  This time it succeeded.

Comment 6 Serguei Miridonov 2010-03-22 05:49:13 UTC

As a workaround I wrote a script: /etc/pm/sleep.d/85-network-r8169-module 

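#!/bin/bash
# pm-utils hook: unload r8169 before sleep and reload it after wakeup.
case "$1" in
    hibernate | suspend)
        echo "Removing r8169."
        modprobe -r r8169
        ;;
    thaw | resume)
        # Drop clean page cache, dentries and inodes to free memory
        # before the driver allocates its receive buffers.
        echo 3 > /proc/sys/vm/drop_caches
        modprobe r8169
        ;;
    *)
        ;;
esac
exit 0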

It clears some memory on wakeup, so the r8169 module has enough space for page allocation when NetworkManager opens the eth0 device.

However, it would be better if someone checked the driver code for possible problems with memory allocation.

Comment 7 r6144 2010-04-20 13:01:42 UTC

This is happening to me after nearly every suspend on kernel-2.6.32.11-99.fc12.x86_64.

This seems to be caused by the patch linux-2.6-net-r8169-improved-rx-length-check-errors.patch that fixes CVE-2009-4537.  Apparently, the r8169 hardware does not handle the RxMaxSize setting correctly, allowing attackers to do remote memory corruption with a specially crafted packet.  The solution is to disable the RxMaxSize setting and always provide buffers of the maximum possible size, 16383.  However, it is difficult for the kernel to allocate these four pages of physically contiguous memory, especially since here the allocation seems to be GFP_ATOMIC.
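For reference, the traces in this report show order:3, which is 2^3 = 8 physically contiguous pages (32 kB with 4 kB pages). A 16383-byte buffer on its own would fit in order 2 (four pages); presumably the padding and bookkeeping that the skb allocator adds push the request just past 16 kB, where it gets rounded up to the next power of two. A minimal sketch of that arithmetic, assuming 4 kB pages; the helper name and the 512-byte overhead are hypothetical illustration values, not figures taken from the kernel source:

```shell
# Smallest buddy-allocator order whose block (2^order pages) holds BYTES.
# The page size defaults to 4096 bytes (assumption: x86).
order_for_bytes() {
    local bytes=$1 page_size=${2:-4096}
    local pages=$(( (bytes + page_size - 1) / page_size ))
    local order=0
    while [ $(( 1 << order )) -lt "$pages" ]; do
        order=$(( order + 1 ))
    done
    echo "$order"
}

order_for_bytes 16383               # bare buffer: prints 2
order_for_bytes $(( 16383 + 512 ))  # with a hypothetical 512 bytes of overhead: prints 3
```

Going from order 2 to order 3 doubles the size of the contiguous block the allocator has to find, which makes the request noticeably harder to satisfy on a fragmented system.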

Well, I don't think rtl8169_open() really needs GFP_ATOMIC, so maybe the code can be reorganized to work around this problem.
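The mode values printed in the traces (0x4020 and 0x20) can be decoded to check this. A small sketch, assuming the flag bit values as I recall them from include/linux/gfp.h in 2.6.3x kernels; only a subset is listed, the helper name is hypothetical, and the values should be verified against the source of the kernel in question:

```shell
# Print the names of the known GFP flag bits set in MODE (e.g. 0x4020).
# Bit values are an assumption based on 2.6.3x include/linux/gfp.h.
decode_gfp_mode() {
    local mode=$(( $1 ))
    local names="" entry bit name
    for entry in 0x10:__GFP_WAIT 0x20:__GFP_HIGH 0x40:__GFP_IO \
                 0x80:__GFP_FS 0x4000:__GFP_COMP; do
        bit=${entry%%:*}
        name=${entry#*:}
        if [ $(( mode & bit )) -ne 0 ]; then
            names="$names $name"
        fi
    done
    echo "${names# }"
}

decode_gfp_mode 0x4020   # prints: __GFP_HIGH __GFP_COMP
decode_gfp_mode 0x20     # prints: __GFP_HIGH
```

Under these assumptions neither mode has __GFP_WAIT set, so the allocator may not sleep or reclaim memory to satisfy the request, which matches the GFP_ATOMIC reading above.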

Comment 8 Serguei Miridonov 2010-04-20 13:27:42 UTC

Thank you for the good news! At least we can now hope that the cause of this bug is known. Any idea when it's going to be fixed?

With the workaround from two posts above (comment 6), the only minor problem I have is some disk activity on wakeup and a slightly slower resume from RAM, because, as I understand it, things must be reloaded from swap and from files after the memory clean-up. If this bug is fixed, the system will resume much faster.

Comment 9 James 2010-05-18 10:25:14 UTC

Caught another one, after suspend/resume. Restarting NM recovered service.

Comment 10 Dan Williams 2010-06-26 00:41:49 UTC

*** Bug 568992 has been marked as a duplicate of this bug. ***

Comment 11 James 2010-07-30 11:17:24 UTC

Still present in kernel-2.6.34.1-29.fc13.x86_64.

NetworkManager: page allocation failure. order:3, mode:0x20
Pid: 1169, comm: NetworkManager Not tainted 2.6.34.1-rhapsody.fc13.i915pp.noecd-209 #1
Call Trace:
 [<ffffffff810c8199>] __alloc_pages_nodemask+0x5a8/0x64f
 [<ffffffff810f4fc0>] kmem_getpages+0x5d/0x128
 [<ffffffff810f5be8>] fallback_alloc+0x13b/0x1bb
 [<ffffffff810f5a9e>] ____cache_alloc_node+0x10d/0x11c
 [<ffffffff810f5cee>] kmem_cache_alloc_node_notrace+0x86/0xb8
 [<ffffffff813854fa>] ? __alloc_skb+0x70/0x160
 [<ffffffff810f5e1f>] __kmalloc_node+0x68/0x98
 [<ffffffff813854fa>] __alloc_skb+0x70/0x160
 [<ffffffff81386114>] __netdev_alloc_skb+0x2f/0x4b
 [<ffffffffa013c546>] rtl8169_rx_fill+0xa3/0x14f [r8169]
 [<ffffffffa013e79f>] rtl8169_init_ring+0x6c/0x9a [r8169]
 [<ffffffffa013efae>] rtl8169_open+0x7a/0x194 [r8169]
 [<ffffffff8138e1db>] __dev_open+0x89/0xb7
 [<ffffffff8138bf2b>] __dev_change_flags+0xb9/0x13d
 [<ffffffff8138e11c>] dev_change_flags+0x1c/0x52
 [<ffffffff8139843b>] do_setlink+0x27e/0x4b9
 [<ffffffff813854fa>] ? __alloc_skb+0x70/0x160
 [<ffffffff81398773>] rtnl_setlink+0xfd/0x110
 [<ffffffff81397e48>] rtnetlink_rcv_msg+0x1c1/0x1de
 [<ffffffff81397c87>] ? rtnetlink_rcv_msg+0x0/0x1de
 [<ffffffff813a955d>] netlink_rcv_skb+0x3e/0x8f
 [<ffffffff81397c80>] rtnetlink_rcv+0x21/0x28
 [<ffffffff813a933b>] netlink_unicast+0xe6/0x14f
 [<ffffffff813a9af8>] netlink_sendmsg+0x254/0x263
 [<ffffffff8137cad6>] __sock_sendmsg+0x59/0x64
 [<ffffffff8137cdd3>] sock_sendmsg+0xa3/0xbc
 [<ffffffff8137cdd3>] ? sock_sendmsg+0xa3/0xbc
 [<ffffffff8137bab1>] ? might_fault+0x17/0x19
 [<ffffffff81386527>] ? copy_from_user+0x37/0x3f
 [<ffffffff81386893>] ? verify_iovec+0x4f/0x8c
 [<ffffffff8137d0a3>] sys_sendmsg+0x217/0x29b
 [<ffffffff8137ce54>] ? sockfd_lookup_light+0x1b/0x53
 [<ffffffff8137ce37>] ? fput_light+0xd/0xf
 [<ffffffff8137e9d4>] ? sys_sendto+0x120/0x14d
 [<ffffffff8110ad29>] ? path_put+0x1d/0x22
 [<ffffffff810967ec>] ? audit_syscall_entry+0x119/0x145
 [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd:  54
CPU    1: hi:  186, btch:  31 usd:  51
active_anon:118480 inactive_anon:51135 isolated_anon:0
 active_file:108920 inactive_file:116327 isolated_file:0
 unevictable:16 dirty:40 writeback:0 unstable:0
 free:31765 slab_reclaimable:35327 slab_unreclaimable:27823
 mapped:24736 shmem:24520 pagetables:9519 bounce:0
Node 0 DMA free:8104kB min:248kB low:308kB high:372kB active_anon:0kB inactive_anon:468kB active_file:1372kB inactive_file:4488kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15708kB mlocked:0kB dirty:0kB writeback:0kB mapped:192kB shmem:192kB slab_reclaimable:748kB slab_unreclaimable:716kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 1995 1995 1995
Node 0 DMA32 free:118956kB min:32516kB low:40644kB high:48772kB active_anon:473920kB inactive_anon:204072kB active_file:434308kB inactive_file:460820kB unevictable:64kB isolated(anon):0kB isolated(file):0kB present:2043040kB mlocked:64kB dirty:160kB writeback:0kB mapped:98752kB shmem:97888kB slab_reclaimable:140560kB slab_unreclaimable:110576kB kernel_stack:2472kB pagetables:38076kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 38*4kB 16*8kB 11*16kB 5*32kB 3*64kB 3*128kB 1*256kB 3*512kB 3*1024kB 1*2048kB 0*4096kB = 8104kB
Node 0 DMA32: 5041*4kB 9603*8kB 1277*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 118956kB
249869 total pagecache pages
102 pages in swap cache
Swap cache stats: add 1572, delete 1470, find 20/24
Free swap  = 4122604kB
Total swap = 4128764kB
521920 pages RAM
9803 pages reserved
261454 pages shared
296523 pages non-shared

Comment 12 James 2010-07-30 11:23:29 UTC

https://bugzilla.kernel.org/show_bug.cgi?id=16441 looks like a very similar thing.

Comment 13 Serguei Miridonov 2010-07-31 02:42:07 UTC

Please vote for this bug. It seems to me that without votes nobody takes care of any bug report.

Comment 14 Serguei Miridonov 2010-09-02 19:26:29 UTC

I'm just interested: is there anybody who can submit a patch to the kernel fixing this issue? Because of this bug I have to flush all caches to the disk on resume to free memory and avoid this crash. This makes sleep/resume longer, which is annoying. Is there a driver maintainer?

Comment 15 Chuck Ebbert 2010-09-03 11:26:29 UTC

The real solution is to stop using r8169. :/

But you can work around the problem to some extent by setting the sysctl vm.min_free_kbytes to a large value, like at least 65536
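A minimal sketch of applying that setting at runtime, assuming root privileges and the standard /proc/sys/vm/min_free_kbytes file. The helper name and its optional second argument (a path override, useful for a dry run against an ordinary file) are hypothetical. It only ever raises the value, never lowers it:

```shell
# Raise vm.min_free_kbytes to at least WANT kB; leave a larger value alone.
# KNOB defaults to the real sysctl file; pass another path for a dry run.
raise_min_free_kbytes() {
    local want=$1 knob=${2:-/proc/sys/vm/min_free_kbytes}
    local have
    have=$(cat "$knob") || return 1
    if [ "$have" -lt "$want" ]; then
        echo "$want" > "$knob" || return 1
    fi
    # Report the value now in effect.
    cat "$knob"
}
```

Running `raise_min_free_kbytes 65536` as root applies the change until the next reboot. To make it persistent, the usual approach is a `vm.min_free_kbytes = 65536` line in /etc/sysctl.conf.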

Comment 16 James 2010-09-03 12:41:50 UTC

(In reply to comment #15)
> The real solution is to stop using r8169. :/

Is there another driver for this hardware we can use?

Comment 17 Serguei Miridonov 2010-09-03 15:11:45 UTC

(In reply to comment #15)
> The real solution is to stop using r8169. :/
> 
> But you can work around the problem to some extent by setting the sysctl
> vm.min_free_kbytes to a large value, like at least 65536

I'll try that, but... As I understand it, this will keep 64M of memory free, but it does not guarantee that the free memory will be physically contiguous. So we can only hope that such an amount contains 16383 bytes of contiguous memory for this driver. Therefore, if these 64M are highly fragmented, the driver might still crash sometimes. Is that correct?
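The fragmentation concern can be checked against the per-zone free lists that the kernel prints in the Mem-Info section of these traces. A rough sketch, assuming the "count*sizekB" format seen in the logs in this report; the helper name is hypothetical. It counts how many free blocks of at least a given size a zone has:

```shell
# Read one Mem-Info free-list line on stdin and print the number of free
# blocks whose size is at least MIN_KB kilobytes.
free_blocks_at_or_above() {
    local min_kb=$1
    tr ' ' '\n' | awk -F'[*]' -v min="$min_kb" '
        /^[0-9]+\*[0-9]+kB$/ {
            size = $2
            sub(/kB$/, "", size)
            if (size + 0 >= min) total += $1
        }
        END { print total + 0 }'
}

# DMA32 line copied from the 2.6.34.1 trace in comment 11.
echo "Node 0 DMA32: 5041*4kB 9603*8kB 1277*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 118956kB" \
    | free_blocks_at_or_above 32   # prints 2
```

In that trace the zone has about 116 MB free, yet only two free blocks are 32 kB or larger, so a large amount of free memory clearly does not imply that contiguous blocks are available.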

Why does rtl8169_open() use GFP_ATOMIC? It is not an interrupt handler; it is just a call from user space, which can be delayed while something is swapped out...

Comment 18 James 2010-09-18 10:03:07 UTC

Still present in 2.6.35.4 series kernels.

Comment 19 Stanislaw Gruszka 2010-09-24 22:14:38 UTC


*** This bug has been marked as a duplicate of bug 629158 ***

Comment 20 Serguei Miridonov 2010-09-25 03:30:51 UTC

(In reply to comment #19)
> 
> *** This bug has been marked as a duplicate of bug 629158 ***

You have just switched to another thread with only one bug reporter... This thread was started earlier, so it would be more logical to make this thread the main one and declare bug 629158 a duplicate of this bug.

Please note that this bug does not cause a kernel oops every time after resume, because it depends on the current memory state. Some people may be using the workarounds mentioned here, in which case the bug does not manifest itself. For example, the workaround with the line "vm.min_free_kbytes = 65536" in /etc/sysctl.conf still works for me after more than 17 days of uptime with 2-3 sleep/resume cycles every day. The other workaround, flushing the disk caches, also worked, but it adds time to every sleep/resume cycle. Both workarounds work even with quite heavy memory usage (about 1-2GB of used swap space).

So, don't expect that you can declare the bug fixed if you don't receive new reports within a day or two.

Instead, it would be useful if you 

- build test kernels for all current Fedora distributions, 
- publish here the download links for both 32- and 64-bit test kernels, 
- and provide some instructions on how to trigger this bug
  (change VM settings?).

Comment 21 Stanislaw Gruszka 2010-09-25 09:38:30 UTC

(In reply to comment #20)
> You have just switched to another thread with only one bug reporter... This
> thread was started earlier, so it would be more logical to make this thread the
> main one and declare bug 629158 a duplicate of this bug.

Yes, you are right. But bug 629158 was assigned to me, whereas this bug report has not been triaged, hence the direction of the duplicate.

> So, don't expect that you can declare the bug fixed if you don't receive new
> reports within a day or two.

Again true.

> Instead, it would be useful if you 
> 
> - build test kernels for all current Fedora distributions, 
> - publish here the download links for both 32- and 64-bit test kernels, 

Will do on Wednesday, when I come back to the office and get access to koji, unless someone else makes a build earlier.

Comment 22 Stanislaw Gruszka 2010-09-29 09:39:08 UTC

(In reply to comment #20)
> - build test kernels for all current Fedora distributions, 
> - publish here the download links for both 32- and 64-bit test kernels, 
> - and provide some instructions on how to trigger this bug
>   (change VM settings?).

F-13 koji builds are here (for 2.6.33 and 2.6.34 kernels respectively):
http://koji.fedoraproject.org/koji/taskinfo?taskID=2496152
http://koji.fedoraproject.org/koji/taskinfo?taskID=2496181

Regarding steps to reproduce, I don't know. Do the same as you did before :-) Remove all quirks, use memory, suspend/resume frequently.

Comment 23 Serguei Miridonov 2010-09-29 12:25:05 UTC

Fedora 12 is also still a currently supported distribution. Please backport the fixes to the F12 kernel-2.6.32.

Thank you.

Comment 24 Stanislaw Gruszka 2010-09-29 15:10:20 UTC

Here it is (kernel compilation pending)
https://bugzilla.redhat.com/show_bug.cgi?id=629158#c28
