143650 – order 0 page allocation failure during lvcreate

Summary: order 0 page allocation failure during lvcreate

Keywords:
Status:	CLOSED DUPLICATE of bug 132057
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	2
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Alasdair Kergon
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2004-12-23 10:13 UTC by Paul P Komkoff Jr
Modified:	2007-11-30 22:10 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2006-02-21 19:07:47 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
output of dmsetup info -c (3.92 KB, application/octet-stream) 2005-04-21 10:01 UTC, Graham King

Description Paul P Komkoff Jr 2004-12-23 10:13:50 UTC

Description of problem:
To make consistent backups of user maildirs, I use the following
sequence: lvcreate -L 1G -s blabla && mount -o ro && backup && umount
&& lvremove -f
I run this procedure 4 times a day (every 6 hours). After the last
kernel update (kernel 2.6.9-1.6_FC2.i686) I started to get the
following traces:
Dec 23 00:00:02 ns kernel: lvcreate: page allocation failure. order:0, mode:0xd0
Dec 23 00:00:02 ns kernel: [<02147890>] __alloc_pages+0x2bd/0x2db
Dec 23 00:00:02 ns kernel: [<42106533>] alloc_pl+0x27/0x3d [dm_mod]
Dec 23 00:00:02 ns kernel: [<421067c6>] client_alloc_pages+0x15/0x47 [dm_mod]
Dec 23 00:00:02 ns kernel: [<421077db>] kcopyd_client_create+0x74/0xb4 [dm_mod]
Dec 23 00:00:02 ns kernel: [<4214bd7c>] dm_create_persistent+0x94/0xf4 [dm_snapshot]
Dec 23 00:00:02 ns kernel: [<4214a6f0>] snapshot_ctr+0x246/0x2cf [dm_snapshot]
Dec 23 00:00:02 ns kernel: [<421030cd>] dm_table_add_target+0x11c/0x189 [dm_mod]
Dec 23 00:00:02 ns kernel: [<42105143>] populate_table+0x8a/0xaf [dm_mod]
Dec 23 00:00:02 ns kernel: [<4210519f>] table_load+0x37/0xf6 [dm_mod]
Dec 23 00:00:02 ns kernel: [<4210583d>] ctl_ioctl+0xcd/0x10f [dm_mod]
Dec 23 00:00:02 ns kernel: [<42105168>] table_load+0x0/0xf6 [dm_mod]
Dec 23 00:00:02 ns kernel: [<0217b0e2>] sys_ioctl+0x29a/0x33c
Dec 23 00:00:02 ns kernel: [<02108a98>] do_IRQ+0x286/0x290
Dec 23 00:00:02 ns kernel: device-mapper: : Could not create kcopyd client

After this the system starts misbehaving badly: all processes
working with the snapshot source get stuck forever in D state, or
all hard disk access halts entirely.

This usually happens after 4-5 days of normal work.
Any suggestions?

Version-Release number of selected component (if applicable):
kernel 2.6.9-1.6_FC2.i686

How reproducible:
Well, reproducible

Steps to Reproduce:
1. Create and remove a snapshot enough times
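
The cycle from the description can be sketched as a small dry-run helper.
This is only an illustration: the volume group, origin, snapshot name,
mount point, and the tar backup step are hypothetical stand-ins (the report
only says "blabla" and "backup"), and the helper builds the command list
without running anything.

```python
import shlex


def snapshot_backup_commands(vg, origin, snap, mountpoint, archive, size="1G"):
    """Build the snapshot backup cycle as a list of argv lists (nothing is run)."""
    snap_dev = f"/dev/{vg}/{snap}"
    return [
        # Create a snapshot of the origin volume.
        ["lvcreate", "-L", size, "-s", "-n", snap, f"/dev/{vg}/{origin}"],
        # Mount the snapshot read-only so the backup sees a frozen view.
        ["mount", "-o", "ro", snap_dev, mountpoint],
        # Hypothetical backup step; the report does not say which tool is used.
        ["tar", "-czf", archive, "-C", mountpoint, "."],
        ["umount", mountpoint],
        # Remove the snapshot without prompting.
        ["lvremove", "-f", snap_dev],
    ]


if __name__ == "__main__":
    # All names below are made up for the example.
    for argv in snapshot_backup_commands(
        "vg0", "mail", "mail_snap", "/mnt/mail_snap", "/backup/mail.tar.gz"
    ):
        print(shlex.join(argv))
```

According to the report, repeating this cycle every 6 hours led to the
failure after 4-5 days of normal work.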

Comment 1 Dave Jones 2005-01-11 04:18:16 UTC

Is this still a problem with today's 2.6.10 updates?

Comment 2 Paul P Komkoff Jr 2005-01-11 09:58:21 UTC

I don't know yet. It's a production server; I will try to update this
evening and check.

Comment 3 Alasdair Kergon 2005-01-18 16:04:47 UTC

This is a known shortcoming of device-mapper snapshots.
Work is underway to address it.

Comment 4 Alasdair Kergon 2005-01-18 16:15:51 UTC

Each snapshot requires a certain amount of physical kernel memory to
be available.  If there isn't enough free kernel memory at the instant
you create or activate a snapshot, you get an error like that.
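
As a rough aid for reading the failure line in the original report, the
sketch below decodes its "order" and "mode" fields. The flag values are
taken from 2.6-era include/linux/gfp.h as I recall them and should be
treated as an assumption (they differ in later kernels); the function and
table names are made up for the example. Order n means a request for 2**n
contiguous pages, so order 0 is a single page.

```python
import re

# Assumed 2.6-era GFP flag bits; later kernels renumbered these.
GFP_FLAGS_2_6 = {
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}


def decode_alloc_failure(line):
    """Extract page count and flag names from a 'page allocation failure' line."""
    match = re.search(r"order:(\d+), mode:(0x[0-9a-fA-F]+)", line)
    if match is None:
        return None
    order = int(match.group(1))
    mode = int(match.group(2), 16)
    known_mask = sum(GFP_FLAGS_2_6)  # OR of all bits in the table
    return {
        "pages": 1 << order,  # order n = 2**n contiguous pages
        "mode": mode,
        "flags": [name for bit, name in sorted(GFP_FLAGS_2_6.items()) if mode & bit],
        "unknown_bits": mode & ~known_mask,  # bits this table cannot name
    }


if __name__ == "__main__":
    line = "lvcreate: page allocation failure. order:0, mode:0xd0"
    print(decode_alloc_failure(line))
```

Under these assumed values, mode 0xd0 is the combination of __GFP_WAIT,
__GFP_IO and __GFP_FS, which as far as I know is what GFP_KERNEL expanded
to in those kernels: an ordinary single-page kernel allocation failed.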

Recent upstream kernel changes have actually made the problem worse.

To recover without rebooting, you need to use 'dmsetup' to reset the
states of the devices in the right sequence, removing the snapshot in
the process.

I hope to have the code fixed in about a month's time. [It involves
rewriting some complicated code, unfortunately.]

Comment 5 Alasdair Kergon 2005-01-18 16:17:58 UTC


*** This bug has been marked as a duplicate of 132057 ***

Comment 6 Graham King 2005-04-13 15:21:04 UTC

(In reply to comment #4)
> To recover without rebooting, you need to use 'dmsetup' to reset the
> states of the devices in the right sequence, removing the snapshot in
> the process.

Please could you post a simple example (for an LVM wedged with one snapshot in
use)?  This would really help me while we're waiting for the real fix!  Thanks.
(The man page for dmsetup assumes far more low-level knowledge than I have).

Comment 7 Alasdair Kergon 2005-04-13 15:30:30 UTC

It depends on how it failed.

Post the output of 'dmsetup info -c' here.

Comment 8 Graham King 2005-04-13 16:11:09 UTC

Too late to do that, I'm afraid.  I blundered around for several hours and
eventually resuscitated the machine, without understanding how or why.
But when the problem first arose, I noted that vgob01-lv_home (the source of the
snapshot) was INACTIVE and suspended, whereas the corresponding snapshot,
vgob01-hdup_snapshot, was INACTIVE and available.

I managed to regain use of the machine with dmsetup resume vgob01-lv_home and
lvremove /dev/vgob01/hdup_snapshot.  But a reboot then failed with:
"device-mapper: : unknown target type
...
Couldn't load device 'vgob01-hdup_snapshot' "

If that's not enough info, I'll post the requested output next time the problem
happens (seems to be once every few weeks at present).  Thanks for your help.

Comment 9 Graham King 2005-04-21 10:01:27 UTC

Created attachment 113461
output of dmsetup info -c

as requested in comment 7.  This was run from a virtual console in multi-user
mode before attempting any recovery.
If this leads to a dmsetup "recipe" I'd be grateful.  I can post a log of my
present (embarrassingly ignorant, long-winded, and probably dangerous) recovery
procedure, if that would help.

Comment 10 Red Hat Bugzilla 2006-02-21 19:07:47 UTC

Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.