Bug 746538

Summary:	block devices not flushed predictably
Product:	[Fedora] Fedora	Reporter:	Pádraig Brady <p>
Component:	kernel	Assignee:	Kernel Maintainer List <kernel-maint>
Status:	CLOSED WONTFIX	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	15	CC:	alevy, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2012-07-11 17:49:08 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Pádraig Brady 2011-10-17 00:39:20 UTC

I used dd as follows to write a 700MB image to a USB stick

dd bs=1M if=file.img of=/dev/sdc

Sometimes this would complete at 4.5MB/s

Sometimes dd would return much earlier (saying it wrote at 15MB/s),
while the led on the USB stick remained flashing.
A subsequent `eject -s /dev/sdc` did block until the data was written.

I also noticed that in the "4.5MB/s" case above, there
were stalls of around 20s where no data was being written
to the stick at all.

Kernel version is 2.6.40.4-5.fc15.x86_64

Comment 1 Pádraig Brady 2011-10-17 02:07:30 UTC

Gah, I just tried another usb stick there, and no my whole desktop is locked up

I noticed this in dmesg while the dd process was stuck in D state:

"iwlagn 0000:09:00.0: Failed to alloc_pages with GFP_ATOMIC. Only 0 free buffers remaining"

So looks like some mem entity is exhausted in the kernel?
I could switch to a different virtual terminal.
I could not kill -9 the dd process.
Upon removing the USB stick, everything "unstuck" again.

Comment 2 Chuck Ebbert 2011-10-18 02:31:51 UTC

You should really use oflag=direct when doing dd to a block device like that.

Comment 3 Pádraig Brady 2011-11-02 12:30:46 UTC

I added oflag=direct which "fixed" things.
  I could ^C the dd process with immediate effect
  No stalls (mem exhaustion)
I'm guessing it's just bypassing some broken kernel logic,
though I'm struggling to see what logic could be invoked by the kernel
for a process that doesn't use O_DIRECT but is writing directly to a block device.
Perhaps the kernel is trying to schedule the writes to the device?
Though that seems a bit daft given the device isn't used by anything else.
Perhaps the kernel is trying to cache the writes, but that seems daft too.

Anyway here is a stack trace that happens during the stalls.

[711520.363875] swapper: page allocation failure: order:1, mode:0x4020
[711520.363894] Pid: 0, comm: swapper Not tainted 2.6.40.4-5.fc15.x86_64 #1
[711520.363897] Call Trace:
[711520.363898]  <IRQ>  [<ffffffff810dfe56>] warn_alloc_failed+0x111/0x13d
[711520.363911]  [<ffffffff810e1c20>] ? __free_pages_ok+0xc9/0xd8
[711520.363916]  [<ffffffff810e2e09>] __alloc_pages_nodemask+0x6c5/0x736
[711520.363921]  [<ffffffff8110dc72>] alloc_pages_current+0xbe/0xd8
[711520.363926]  [<ffffffff810834ff>] ? arch_local_irq_save+0x15/0x1b
[711520.363947]  [<ffffffffa02a764f>] iwlagn_rx_allocate+0x91/0x268 [iwlagn]
[711520.363956]  [<ffffffffa02a79de>] iwlagn_rx_replenish_now+0x18/0x24 [iwlagn]
[711520.363963]  [<ffffffffa029bf85>] iwl_irq_tasklet+0x911/0xad8 [iwlagn]
[711520.363968]  [<ffffffff8105a0b3>] tasklet_action+0x7f/0xd2
[711520.363972]  [<ffffffff8105a954>] __do_softirq+0xc9/0x1b5
[711520.363977]  [<ffffffff8148804e>] ? _raw_spin_lock+0xe/0x10
[711520.363981]  [<ffffffff8148fa5c>] call_softirq+0x1c/0x30
[711520.363984]  [<ffffffff8100abb9>] do_softirq+0x46/0x81
[711520.363987]  [<ffffffff8105ac36>] irq_exit+0x57/0xb1
[711520.363990]  [<ffffffff814902de>] do_IRQ+0x8e/0xa5
[711520.363994]  [<ffffffff81488493>] common_interrupt+0x13/0x13
[711520.363996]  <EOI>  [<ffffffff81072e8c>] ? enqueue_hrtimer+0x3f/0x7a
[711520.364004]  [<ffffffff812841b0>] ? intel_idle+0xd8/0x100
[711520.364007]  [<ffffffff81284192>] ? intel_idle+0xba/0x100
[711520.364012]  [<ffffffff813af949>] cpuidle_idle_call+0xd7/0x168
[711520.364016]  [<ffffffff81008307>] cpu_idle+0xa5/0xdf
[711520.364020]  [<ffffffff8146741e>] rest_init+0x72/0x74
[711520.364023]  [<ffffffff81b66b8b>] start_kernel+0x3ca/0x3d5
[711520.364027]  [<ffffffff81b662c4>] x86_64_start_reservations+0xaf/0xb3
[711520.364030]  [<ffffffff81b66140>] ? early_idt_handlers+0x140/0x140
[711520.364033]  [<ffffffff81b663ca>] x86_64_start_kernel+0x102/0x111
[711520.364037] iwlagn 0000:09:00.0: Failed to alloc_pages with GFP_ATOMIC. Only 0 free buffers remaining.

Comment 4 Josh Boyer 2012-06-07 14:59:13 UTC

Are you still seeing this stack trace with 2.6.43/3.3?

Comment 5 Alon Levy 2012-07-11 14:07:56 UTC

I don't have the bug reported in the summary, but I do have almost the same backtrace, reporting it here instead of creating a new bug since I don't actually have a problem other then this trace. If a new bug would still be helpful please let me know and I'll do it.

I'm getting the same error in my logs, no locking involved, everything seems to work just fine. F17, 3.3.4-5.fc17.x86_64, Thinkpad T510:

[507785.954572] Pid: 0, comm: swapper/0 Tainted: G        W    3.3.4-5.fc17.x86_64 #1
[507785.954575] Call Trace:
[507785.954577]  <IRQ>  [<ffffffff811236cb>] warn_alloc_failed+0xeb/0x150
[507785.954595]  [<ffffffff81130729>] ? wakeup_kswapd+0x69/0x140
[507785.954599]  [<ffffffff81127173>] __alloc_pages_nodemask+0x663/0x920
[507785.954649]  [<ffffffffa049aa60>] ? ieee80211_rx+0x310/0xa20 [mac80211]
[507785.954655]  [<ffffffff81086df6>] ? ttwu_do_activate.constprop.80+0x66/0x70
[507785.954660]  [<ffffffff8115eed4>] alloc_pages_current+0xa4/0x110
[507785.954677]  [<ffffffffa053eb70>] iwlagn_rx_allocate+0xf0/0x270 [iwlwifi]
[507785.954689]  [<ffffffffa05401ad>] iwl_irq_tasklet+0xc2d/0xd30 [iwlwifi]
[507785.954694]  [<ffffffff8105d6e3>] tasklet_action+0x73/0x120
[507785.954697]  [<ffffffff8105dc90>] __do_softirq+0xc0/0x1e0
[507785.954702]  [<ffffffff815f4d9c>] call_softirq+0x1c/0x30
[507785.954706]  [<ffffffff81015465>] do_softirq+0x75/0xb0
[507785.954709]  [<ffffffff8105e055>] irq_exit+0xb5/0xc0
[507785.954712]  [<ffffffff815f5603>] do_IRQ+0x63/0xe0
[507785.954716]  [<ffffffff815ebc2e>] common_interrupt+0x6e/0x6e
[507785.954718]  <EOI>  [<ffffffff813181d6>] ? intel_idle+0xe6/0x150
[507785.954725]  [<ffffffff813181b7>] ? intel_idle+0xc7/0x150
[507785.954732]  [<ffffffff8149ddd7>] cpuidle_idle_call+0xb7/0x230
[507785.954735]  [<ffffffff8101220f>] cpu_idle+0xaf/0x120
[507785.954740]  [<ffffffff815c9eae>] rest_init+0x72/0x74
[507785.954746]  [<ffffffff81aedc00>] start_kernel+0x3ba/0x3c5
[507785.954750]  [<ffffffff81aed346>] x86_64_start_reservations+0x131/0x135
[507785.954753]  [<ffffffff81aed44a>] x86_64_start_kernel+0x100/0x10f
[507785.954758] iwlwifi 0000:03:00.0: Failed to alloc_pages with GFP_ATOMIC.Only 0 free buffers remaining.

Comment 6 Josh Boyer 2012-07-11 14:12:41 UTC

(In reply to comment #5)
> I don't have the bug reported in the summary, but I do have almost the same
> backtrace, reporting it here instead of creating a new bug since I don't
> actually have a problem other then this trace. If a new bug would still be
> helpful please let me know and I'll do it.
> 
> I'm getting the same error in my logs, no locking involved, everything seems
> to work just fine. F17, 3.3.4-5.fc17.x86_64, Thinkpad T510:

Basically the kernel ran out of atomic memory and the wireless driver needed to do an allocation.  Your kernel is already tainted with some other warning though, so something else happened before this.

If you can recreate this using the 3.4.4-5.fc17 update that went out a couple of days ago please open a new bug with the full dmesg and what was happening on the machine at the time.

Comment 7 Alon Levy 2012-07-11 15:03:01 UTC

OK, I'll give it a go, thanks for the fast response. Unrelated: Just randomly stumbled on bug 822877, hope it isn't still the case of high idle load.

Comment 8 Josh Boyer 2012-07-11 17:49:08 UTC

Fedora 15 has reached it's end of life as of June 26, 2012.  As a result, we will not be fixing any remaining bugs found in Fedora 15.

In the event that you have upgraded to a newer release and the bug you reported is still present, please reopen the bug and set the version field to the newest release you have encountered the issue with.  Before doing so, please ensure you are testing the latest kernel update in that release and attach any new and relevant information you may have gathered.

Thank you for taking the time to file a report.  We hope newer versions of Fedora suit your needs.