Bug 569412 - iwl - page allocation failure
Summary: iwl - page allocation failure
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 12
Hardware: All
OS: Linux
low
medium
Target Milestone: ---
Assignee: Stanislaw Gruszka
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-03-01 14:17 UTC by gene c
Modified: 2010-07-22 10:21 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-07-22 10:21:54 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
/var/log/messages (93.83 KB, text/plain)
2010-03-01 14:17 UTC, gene c
no flags Details
Messages with "page allocation failure. order:2" (382.27 KB, text/plain)
2010-03-23 21:24 UTC, Manny
no flags Details
traceback from /var/log/messages (5.99 KB, text/plain)
2010-05-15 21:22 UTC, gene c
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 551937 0 low CLOSED Frequent "page allocation failure. order:2, mode:0x4020" with 2.6.31.5-122.fc12.x86_64 on X61 BTRFS Intel 4965 AGN GM... 2021-02-22 00:41:40 UTC

Description gene c 2010-03-01 14:17:16 UTC
Created attachment 397076 [details]
/var/log/messages

Description of problem:
Under heavy load (doing backups) there was a page allocation failure from iwl_rx_allocate().

Once this occurred the network became unuseable and several other problems ensured  (nfs timeouts etc...)


Version-Release number of selected component (if applicable):
2.6.32.9-64.fc12.x86_64 (taken from koji build)


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:
I will attach /var/log/messages 




Expected results:


Additional info:

Comment 1 gene c 2010-03-01 17:18:00 UTC
Looks similar to this bug:


https://bugzilla.redhat.com/show_bug.cgi?id=551937

Comment 2 Stanislaw Gruszka 2010-03-02 12:08:16 UTC
Hi gene c

First part is the same as in Bug 551937, but there system after printing message become stable. From your logs I see some processes hangs - not good.

The question is: are you able to reproduce that bug ?

Comment 3 gene c 2010-03-02 14:11:48 UTC
The hangs are because the network was not usable - nfs mounts hung etc ...  I believe I can reproduce this by putting the iwlagn under heavy load (think needs N not just G to get enough load but I'm not sure)

I'd also be happy to try 2.6.33 where I understand some allocation bugs for iwlagn are fixed.

I cant find a koji build of 2.6.33 for F12 however .. Im willing to try f13/alpha/beta/rawhide kernel if there are no core dependencies dragged in which may be problematic for stock f12 (such as dracut) - I need this f12 system to keep running.

Do you know if there will be f12 builds of 2.6.33 ?

Comment 4 Stanislaw Gruszka 2010-03-02 15:52:54 UTC
To update to 2.6.33 just do, this will update gubby and should not break usage of older kernel.

yum --enablerepo="rawhide" update kernel

Patch solving issue in 2.6.33 is problematic, I would like to have better solution for 2.6.32. I will propose it upstream and then create 2.6.32 kernel for test. I'm not sure if my proposition would be accepted however.

Please make sure you are able to reproduce, since now nobody really was able (including me, I get this ones, but not any more), so we could not tell if patch fix the problem or not. 

Thanks.

Comment 5 gene c 2010-03-02 16:50:02 UTC
Ok - I will do my best to confirm it is both reproducible in 2.6.32 and also fixed in 2.6.33 .. - will take some time as this machine is being used for business and it took a long while of sustained beating on the network (via backups) to trigger it previously.

Comment 6 gene c 2010-03-03 13:45:49 UTC
Did as above - and got 2.6.34 which sadly craps out on boot.

Mar  3 08:35:13 lap1 kernel: =============================================
Mar  3 08:35:13 lap1 kernel: [ INFO: possible recursive locking detected ]
Mar  3 08:35:13 lap1 kernel: 2.6.34-0.4.rc0.git2.fc14.x86_64 #1
Mar  3 08:35:13 lap1 kernel: ---------------------------------------------
Mar  3 08:35:13 lap1 kernel: K99cpuspeed/2809 is trying to acquire lock:
Mar  3 08:35:13 lap1 kernel: (s_active){++++.+}, at: [<ffffffff81176f02>] sysfs_addrm_finish+0x36/0x55
Mar  3 08:35:13 lap1 kernel:
Mar  3 08:35:13 lap1 kernel: but task is already holding lock:
Mar  3 08:35:13 lap1 kernel: (s_active){++++.+}, at: [<ffffffff811770bd>] sysfs_get_active_two+0x24/0x48
Mar  3 08:35:13 lap1 kernel:
Mar  3 08:35:13 lap1 kernel: other info that might help us debug this:
Mar  3 08:35:13 lap1 kernel: 4 locks held by K99cpuspeed/2809:
Mar  3 08:35:13 lap1 kernel: #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff81175b9f>] sysfs_write_file+0x3c/0x144
Mar  3 08:35:13 lap1 kernel: #1:  (s_active){++++.+}, at: [<ffffffff811770bd>] sysfs_get_active_two+0x24/0x48
Mar  3 08:35:13 lap1 kernel: #2:  (s_active){++++.+}, at: [<ffffffff811770ca>] sysfs_get_active_two+0x31/0x48
Mar  3 08:35:13 lap1 kernel: #3:  (dbs_mutex){+.+.+.}, at: [<ffffffffa0408da2>] cpufreq_governor_dbs+0x2a0/0x352 [cpufreq_ondemand]
Mar  3 08:35:13 lap1 kernel:
Mar  3 08:35:13 lap1 kernel: stack backtrace:
Mar  3 08:35:13 lap1 kernel: Pid: 2809, comm: K99cpuspeed Not tainted 2.6.34-0.4.rc0.git2.fc14.x86_64 #1
Mar  3 08:35:13 lap1 kernel: Call Trace:
Mar  3 08:35:13 lap1 kernel: [<ffffffff8107e94f>] __lock_acquire+0xcb5/0xd2c
Mar  3 08:35:13 lap1 kernel: [<ffffffff8107cf48>] ? mark_held_locks+0x52/0x70
Mar  3 08:35:13 lap1 kernel: [<ffffffff8107d329>] ? debug_check_no_locks_freed+0x12e/0x145
Mar  3 08:35:13 lap1 kernel: [<ffffffff8107d1c8>] ? trace_hardirqs_on_caller+0x111/0x135
Mar  3 08:35:13 lap1 kernel: [<ffffffff8107eaa2>] lock_acquire+0xdc/0x102
Mar  3 08:35:13 lap1 kernel: [<ffffffff81176f02>] ? sysfs_addrm_finish+0x36/0x55
Mar  3 08:35:13 lap1 kernel: [<ffffffff8107c300>] ? lockdep_init_map+0x9e/0x113
Mar  3 08:35:13 lap1 kernel: [<ffffffff8117690f>] sysfs_deactivate+0x9a/0x103
Mar  3 08:35:13 lap1 kernel: [<ffffffff81176f02>] ? sysfs_addrm_finish+0x36/0x55
Mar  3 08:35:13 lap1 kernel: [<ffffffff8107120a>] ? sched_clock_cpu+0xc3/0xce
Mar  3 08:35:13 lap1 kernel: [<ffffffff8147889c>] ? __mutex_unlock_slowpath+0x120/0x132
Mar  3 08:35:13 lap1 kernel: [<ffffffff81176f02>] sysfs_addrm_finish+0x36/0x55
Mar  3 08:35:13 lap1 kernel: [<ffffffff81175254>] sysfs_hash_and_remove+0x53/0x6a
Mar  3 08:35:13 lap1 kernel: [<ffffffff811782f9>] sysfs_remove_group+0x91/0xca
Mar  3 08:35:13 lap1 kernel: [<ffffffffa0408db6>] cpufreq_governor_dbs+0x2b4/0x352 [cpufreq_ondemand]
Mar  3 08:35:13 lap1 kernel: [<ffffffff8107d1f9>] ? trace_hardirqs_on+0xd/0xf
Mar  3 08:35:13 lap1 kernel: [<ffffffff813a2623>] __cpufreq_governor+0x9b/0xde
Mar  3 08:35:13 lap1 kernel: [<ffffffff813a346f>] __cpufreq_set_policy+0x1ce/0x275
Mar  3 08:35:13 lap1 kernel: [<ffffffff813a3993>] store_scaling_governor+0x1a7/0x1fb
Mar  3 08:35:13 lap1 kernel: [<ffffffff813a3c9c>] ? handle_update+0x0/0x39
Mar  3 08:35:13 lap1 kernel: [<ffffffff814791b7>] ? down_write+0x7a/0x81
Mar  3 08:35:13 lap1 kernel: [<ffffffff813a327c>] store+0x61/0x86
Mar  3 08:35:13 lap1 kernel: [<ffffffff81175c6b>] sysfs_write_file+0x108/0x144
Mar  3 08:35:13 lap1 kernel: [<ffffffff8111ed9d>] vfs_write+0xae/0x10b
Mar  3 08:35:13 lap1 kernel: [<ffffffff8107d1c8>] ? trace_hardirqs_on_caller+0x111/0x135
Mar  3 08:35:13 lap1 kernel: [<ffffffff8111eeba>] sys_write+0x4a/0x6e
Mar  3 08:35:13 lap1 kernel: [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
Mar  3 08:35:13 lap1 kernel: Kernel logging (proc) stopped.

Comment 7 gene c 2010-03-03 14:15:29 UTC
Ill see if i can boot 2.6.33 from f13 .. or fc13 or whatever the label is .. if I can then Ill at least have 2 kernels to compare.

I prefer install via rpm as yum from rawhide did not keep all the kernels I asked it to (via /etc/yum.conf) it stuck to the default of 3 which was kinda annoying.

Assuming 2.6.33 runs ok Ill work on doing some tests in the coming days ...

Comment 8 Stanislaw Gruszka 2010-03-03 15:13:45 UTC
Sorry, yesterday rawhide yum update kernel install 2.6.33 :-/ I'm not recommending using 2.6.34, since it is at very begging of development cycle.

I'm working on 2.6.32 memory allocation fix, but now I have some other work at higher priority, so this will take some time.

Comment 9 gene c 2010-03-04 14:26:37 UTC
I tried 2.6.33 from fc13 - sadly it craps the same way - as soon as the modeset occurs for graphics  it dies - I looked at updateing nouveau but it needs a new X server as well .. is there a way I can run 2.6.33 on f12 with nouveau driver ? 

Trace is as above:
Mar  4 08:31:54 lap1 kernel: =============================================
Mar  4 08:31:54 lap1 kernel: [ INFO: possible recursive locking detected ]
Mar  4 08:31:54 lap1 kernel: 2.6.33-4.fc13.x86_64 #1
Mar  4 08:31:54 lap1 kernel: ---------------------------------------------

... etc.

I need help knowing exactly what to install or what kernel args (nomodeset or whatever) to boot 2.6.33 on my hardware ?

thanks

gene

Comment 10 gene c 2010-03-10 14:54:29 UTC
To be even more clear - a way to run it that allows me to revert back to earlier kernels (and presumably nouveau driver) ... I saw a long thread on LKML on nouveau .. not sure where we stand on this.

I believe I can trigger this with my regular backup.

Oddly I had plugged in wire to do backup, but iwl still blew up - so I had to rmmod iwlagn to do a wired backup ... which I thought was odd.

Please let me know if you know a clean way to boot up a later kernel with nouveau that allows me to go back to 2.6.32 etc ..

Comment 11 Stanislaw Gruszka 2010-03-10 15:09:30 UTC
(In reply to comment #10)
> Please let me know if you know a clean way to boot up a later kernel with
> nouveau that allows me to go back to 2.6.32 etc ..    

Could you elaborate more, I don't understand problem here.

Comment 12 gene c 2010-03-10 15:18:05 UTC
Using nouveau - 2.6.33 kernel wont boot on fedora 12 - in order to do at present, we need to update nouveau driver which in turn requires libdrm as well as new Xorg server. The binary API for nouveau changed in a non-backward  compatible way.

If I do this - then I can no longer go back and run 2.6.32 any longer.

I believe linus had a similar complaint on LKML.

Comment 13 Stanislaw Gruszka 2010-03-10 15:34:38 UTC
No. You can run vanilla 2.6.33 on F12 with nouveau (I'm doing that :), but you can't run vanilla 2.6.32. Fedora have extra patches for 2.6.32 kernel to run with F12 Xorg, and that was about Linus complains.

You have different problem with 2.6.33, it is this "INFO: possible recursive locking detected". In this week, I'll look at this bug more closely and provide you working 2.6.33 or 2.6.32 kernel to test. Stay tuned :)

Comment 14 Stanislaw Gruszka 2010-03-11 15:20:34 UTC
Ok, here is the kernel:

http://koji.fedoraproject.org/koji/taskinfo?taskID=2045788

Would be nice if you can assure that bug is reproducible fedora released kernel and not reproducible on kernel from above link.

Comment 15 gene c 2010-03-16 21:12:40 UTC
I have not had time yet to test this ... I will get to it soon as I can. I have also found an NFS crash (triggered by load over nfs) in same kernel

Mar 13 22:13:09 lap1 kernel: [<ffffffff8107bb23>] ? timekeeping_get_ns+0x1b/0x3d
Mar 13 22:13:09 lap1 kernel: [<ffffffffa04b2517>] ? nfs_wait_bit_uninterruptible+0x0/0x12 [nfs]


I need to file separate bug unless this is known issue

By the way now that 2.6.33.1 is out in stable, will this kernel be made avail to f12 do you know ?

gene/

Comment 16 Stanislaw Gruszka 2010-03-17 11:13:55 UTC
(In reply to comment #15)
> I have not had time yet to test this ... I will get to it soon as I can. I have
> also found an NFS crash (triggered by load over nfs) in same kernel
[snip]
> I need to file separate bug unless this is known issue

Please do, if this is know issue it will be closed by duplicate.
 
> By the way now that 2.6.33.1 is out in stable, will this kernel be made avail
> to f12 do you know ?

No, as far as I know. And F13 will use 2.6.34.

Comment 17 Manny 2010-03-18 06:23:56 UTC
My system seems to be having the same problem that is discussed in this bug. Below is a representative stack trace:

Mar 14 14:54:55 localhost kernel: java: page allocation failure. order:2, mode:0x4020
Mar 14 14:54:57 localhost kernel: Pid: 4143, comm: java Tainted: G        W  2.6.32.9-70.fc12.x86_64 #1
Mar 14 14:54:57 localhost kernel: Call Trace:
Mar 14 14:54:57 localhost kernel: <IRQ>  [<ffffffff810dc7fc>] __alloc_pages_nodemask+0x5ad/0x631
Mar 14 14:54:57 localhost kernel: [<ffffffff8110c288>] alloc_pages_node+0x48/0x4a
Mar 14 14:54:57 localhost kernel: [<ffffffff8110c2b4>] kmalloc_large_node+0x2a/0x67
Mar 14 14:54:57 localhost kernel: [<ffffffff8110d5c7>] __kmalloc_node_track_caller+0x31/0x125
Mar 14 14:54:57 localhost kernel: [<ffffffffa01f607c>] ? iwl_rx_allocate+0x90/0x2f4 [iwlcore]
Mar 14 14:54:57 localhost kernel: [<ffffffff813abf16>] __alloc_skb+0x80/0x170
Mar 14 14:54:57 localhost kernel: [<ffffffffa01f607c>] iwl_rx_allocate+0x90/0x2f4 [iwlcore]
Mar 14 14:54:57 localhost kernel: [<ffffffff81234652>] ? is_swiotlb_buffer+0x2e/0x3b
Mar 14 14:54:57 localhost kernel: [<ffffffffa01f76c8>] iwl_rx_replenish_now+0x1b/0x27 [iwlcore]
Mar 14 14:54:57 localhost kernel: [<ffffffffa0221565>] iwl_rx_handle+0x3aa/0x3c3 [iwlagn]
Mar 14 14:54:57 localhost kernel: [<ffffffff81017d45>] ? sched_clock+0x9/0xd
Mar 14 14:54:57 localhost kernel: [<ffffffff81079325>] ? sched_clock_local+0x1c/0x82
Mar 14 14:54:57 localhost kernel: [<ffffffffa0224433>] iwl_irq_tasklet_legacy+0x50c/0x76e [iwlagn]
Mar 14 14:54:57 localhost kernel: [<ffffffffa01f0e17>] ? __iwl_read32.clone.2+0xaa/0xb9 [iwlcore]
Mar 14 14:54:57 localhost kernel: [<ffffffffa01eeef8>] ? tasklet_schedule+0x1a/0x1c [iwlcore]
Mar 14 14:54:57 localhost kernel: [<ffffffff8105bf7b>] tasklet_action+0x85/0xe4
Mar 14 14:54:57 localhost kernel: [<ffffffff8105d964>] __do_softirq+0xe5/0x1a9
Mar 14 14:54:57 localhost kernel: [<ffffffff810acd61>] ? handle_IRQ_event+0x60/0x121
Mar 14 14:54:57 localhost kernel: [<ffffffff81012e6c>] call_softirq+0x1c/0x30
Mar 14 14:54:57 localhost kernel: [<ffffffff810143ea>] do_softirq+0x46/0x86
Mar 14 14:54:57 localhost kernel: [<ffffffff8105d7a2>] irq_exit+0x3b/0x7d
Mar 14 14:54:57 localhost kernel: [<ffffffff81459c9d>] do_IRQ+0xa5/0xbc
Mar 14 14:54:57 localhost kernel: [<ffffffff81012693>] ret_from_intr+0x0/0x11
Mar 14 14:54:57 localhost kernel: <EOI>  [<ffffffff8145442f>] ? thread_return+0x70/0xdb
Mar 14 14:54:57 localhost kernel: [<ffffffff8145418b>] ? schedule+0x601/0x835
Mar 14 14:54:57 localhost kernel: [<ffffffff8101268e>] ? common_interrupt+0xe/0x13
Mar 14 14:54:57 localhost kernel: [<ffffffff81017e73>] ? native_sched_clock+0x2d/0x5f
Mar 14 14:54:57 localhost kernel: [<ffffffff810a8fe5>] ? audit_syscall_entry+0x11e/0x14a
Mar 14 14:54:57 localhost kernel: [<ffffffff81051b71>] ? sys_sched_yield+0x51/0x57
Mar 14 14:54:57 localhost kernel: [<ffffffff81011d32>] ? system_call_fastpath+0x16/0x1b

I am able to easily reproduce the hang when the system is under memory pressure and starts to swap to disk. I create this condition by running the command "memhog 4g". Once swapping starts, the system will be left unresponsive within seconds.

I have been able to reproduce it on the following kernels:
kernel-2.6.32.9-70.fc12.x86_64
kernel-2.6.33-10.fc13.x86_64

If start the system in single user mode and run memhog, the system will not hang. If I "modprobe iwlagn", and then run memhog, the system will not hang. If I then start NetworkManager in single user mode and run memhog, the system hangs.

I have installed the suggested kernel-2.6.32.8-48.rc2.sg_slab.fc12.x86_64.rpm and I am no longer able to reproduce the problem. Once the system starts to swap it does become unresponsive as the swapping is occurring, but the swapping does finish and the system becomes responsive again.

Can others reproduce the hang with the memhog command?

Comment 18 Stanislaw Gruszka 2010-03-18 07:45:57 UTC
I think, system hangs are due to iwlwifi driver or firmware, which is not robust enough to survive allocation failures in some cases. Since SLAB make allocation failures gone system does not hung.

Comment 19 Manny 2010-03-23 21:24:20 UTC
Created attachment 402144 [details]
Messages with "page allocation failure. order:2"

I think that I spoke to soon with regard to my last post. I was able to get page allocation failures with 2.6.32.8-48.rc2.sg_slab.fc12.x86_64 as well. The system will hang, and take about 15 minutes to become responsive again.

Comment 20 Robert de Rooy 2010-03-26 11:43:12 UTC
FYI, there is a separate bug report for the recursive locking error bz571969

Comment 21 Stanislaw Gruszka 2010-03-29 13:02:35 UTC
(In reply to comment #18)
> If start the system in single user mode and run memhog, the system will not
> hang. If I "modprobe iwlagn", and then run memhog, the system will not hang. If
> I then start NetworkManager in single user mode and run memhog, the system
> hangs.
[snip]
> Can others reproduce the hang with the memhog command?    

I'm trying to reproduce. Steps to reproduce are like below?

Add S to kernel parameters in grub.conf, after boot do:
# memhog 4g
# modprobe iwlagn
# memhog 4g
# /etc/init.d/NetworkManager start

For me: this give NetworkManager calltrace. 

And additional question, what shows your "free -m"?

Comment 22 Stanislaw Gruszka 2010-03-31 08:11:30 UTC
Hi Manny

Could you please answer the questions from Comment 21 ? Thanks.

Comment 23 Manny 2010-04-02 04:23:35 UTC
I apologize for the delay. This has been a busy week...

I have tried to reproduce what I did before and now too I get a networkmanager call trace as well. In any case, I can still reproduce the issue by starting a standard graphical session as a regular user. My network will automatically connect to a wireless network. I then run memhog 4g as a normal user. This results in the system getting stuck.

It does not matter if I am connect to a wireless N or wireless G network at either 2.5 or 5 ghz. I have found that the system is not permanently frozen, just stuck for an extended period. I believe that the higher the amount of memory pressure (memhog 4g vs memhog 5g), the longer the system remains unusable. 

Below is my free -m output:
             total       used       free     shared    buffers     cached
Mem:          3891       3788        102          0         35       1158
-/+ buffers/cache:       2594       1296
Swap:         5887        228       5659

I should be able to answer quicker next time. Sorry for the delay...

Comment 24 Stanislaw Gruszka 2010-04-02 11:41:25 UTC
(In reply to comment #23)
> I have tried to reproduce what I did before and now too I get a networkmanager
> call trace as well. 

Hmm, what changed - NetworkManager was updated? Or maybe you did something different than before, and now just suggested by my steps to reproduce get this NetworkManager calltrace :) ?
 
> It does not matter if I am connect to a wireless N or wireless G network at
> either 2.5 or 5 ghz. I have found that the system is not permanently frozen,
> just stuck for an extended period. I believe that the higher the amount of
> memory pressure (memhog 4g vs memhog 5g), the longer the system remains
> unusable. 

I have this too, but there are no allocations errors. This behaviour (stuck for a while) must be some other bug in virtual memory management :(

> I should be able to answer quicker next time. Sorry for the delay...    

No problem, we (fedora developers) do not respond in timely manner as well.

Comment 25 Stanislaw Gruszka 2010-04-02 11:51:46 UTC
(In reply to comment #16)
> > By the way now that 2.6.33.1 is out in stable, will this kernel be made avail
> > to f12 do you know ?
> 
> No, as far as I know. And F13 will use 2.6.34.    

Just to clarify. F-12 will stay with 2.6.32 and F13 is using 2.6.33 as far (I thought it will be 2.6.34 but seems plans changed).

Comment 26 Manny 2010-04-06 18:25:56 UTC
I took the plunge and wiped my system for F13. Under memory pressure the system remains unresponsive for an extended period of time. I wouldn't say that the system is frozen, because it does come back to life after an extended period of time. This behavior is identical to F12. Once the system recovers, the following appears in the logs:

INFO: task kdmflush:460 blocked for more than 120 seconds.
INFO: task jbd2/dm-1-8:470 blocked for more than 120 seconds.
INFO: task rpcbind:1531 blocked for more than 120 seconds.
INFO: task NetworkManager:1556 blocked for more than 120 seconds.
INFO: task hald:1674 blocked for more than 120 seconds.
INFO: task hald-addon-acpi:1723 blocked for more than 120 seconds.
INFO: task ntpd:1800 blocked for more than 120 seconds.
INFO: task abrtd:1837 blocked for more than 120 seconds.
INFO: task crond:1845 blocked for more than 120 seconds.
INFO: task Xorg:1908 blocked for more than 120 seconds.

along with associated stack traces.

I did not get any iwlagn page allocation failures this time, but it seems to be related somehow.

I was using kernel-2.6.33.1-19.fc13.x86_64 for this. The problem seems to be that the virtual memory management is getting bogged down somehow.

Could the iwlagn page allocation failures be due to the pages taking excessive time to be allocated? Is there a means in the page request mechanism for iwlagn that results in allocation failures if the pages take to long? Perhaps a side effect of some sort of realtime process scheduling? I am not too familiar with the inner workings of the kernel.

Comment 27 gene c 2010-05-15 21:21:43 UTC
Update - I have this with 2.6.32.12-115 (and earlier of course).

rdiff-backup never on /home never fails to induce this:

trace attached.

(As always once this occurs machine is essentially unusable)

Comment 28 gene c 2010-05-15 21:22:49 UTC
Created attachment 414289 [details]
traceback from /var/log/messages

Comment 29 gene c 2010-05-16 13:14:08 UTC
Sorry took me a while - is there a 2.6.34.x I could test somewhere (on f12) ?

I will look into installing f13 as a dual boot when I can if there isnt a convenient later kernel to test .. this machine is important so I need to make sure I keep f12 running as well ... even if all backups are done on wire .. bright side is they are a lot faster.

Comment 30 Stanislaw Gruszka 2010-05-26 15:30:39 UTC
(In reply to comment #29)
> Sorry took me a while - is there a 2.6.34.x I could test somewhere (on f12) ?

No, F-12 will stay with 2.6.32 . On F-13 we have currently 2.6.33 but switch to 2.6.34 is possible.

> I will look into installing f13 as a dual boot when I can if there isnt a
> convenient later kernel to test .. this machine is important so I need to make
> sure I keep f12 running as well ... even if all backups are done on wire ..
> bright side is they are a lot faster.    

Yeh, if wireless not work and there is possibility to to have wired connection,that's good idea. 

Does your last allocation failure from comment 28 make something wrong with system or network connection, or driver recover from that situation by itself?

Comment 31 gene c 2010-05-26 17:10:21 UTC
The driver does not recover - a reboot is required and the machine is not in a good state .. fiddling with system makes it eventually hang and a power reset is required ... 

I will try f13 this weekend if possible.

Comment 32 gene c 2010-06-21 03:07:09 UTC
Ok I have not managed to trigger this on F13 with this kernel:

2.6.33.5-124.fc13.x86_64

So far so good ...

Comment 33 Stanislaw Gruszka 2010-07-22 10:21:54 UTC
Based on above comment I'm closing bug.


Note You need to log in before you can comment on or make changes to this bug.