Description of problem:
To make consistent backups of user maildirs I am using the following
scenario: lvcreate -L 1G -s blabla && mount -o ro && backup && umount
&& lvremove -f
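The backup cycle above can be sketched in Python. This is a sketch, not the reporter's actual script. The volume group, LV, snapshot name, mount point and backup command are hypothetical placeholders (the report only says "blabla"). Like the && chain, it stops if lvcreate fails. Unlike the chain, a failed mount or backup still unmounts (if mounted) and removes the snapshot, so stale snapshots do not accumulate. It does nothing about the kernel page allocation failure itself.

```python
import subprocess
from typing import Callable, Sequence

# A runner executes one command; the default raises on a non-zero exit status.
Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    subprocess.run(list(cmd), check=True)


def snapshot_backup(vg: str, lv: str, snap: str, mountpoint: str,
                    backup_cmd: Sequence[str], size: str = "1G",
                    run: Runner = run_checked) -> None:
    """Create a snapshot, back it up read-only, then always tear it down.

    If lvcreate fails, nothing else runs (same as the && chain).
    If mount or the backup fails, cleanup still runs in the finally blocks.
    """
    origin = f"/dev/{vg}/{lv}"      # hypothetical origin LV path
    snap_dev = f"/dev/{vg}/{snap}"  # snapshot LV created below
    run(["lvcreate", "-L", size, "-s", "-n", snap, origin])
    try:
        run(["mount", "-o", "ro", snap_dev, mountpoint])
        try:
            run(backup_cmd)
        finally:
            run(["umount", mountpoint])
    finally:
        run(["lvremove", "-f", snap_dev])
```

Example call with hypothetical names: snapshot_backup("vg0", "lv_home", "backup_snap", "/mnt/snap", ["tar", "-czf", "/backup/home.tar.gz", "-C", "/mnt/snap", "."]). Passing a recording function as run gives a dry run that only collects the commands.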
I am running this procedure 4 times a day (every 6 hours). After the last
kernel update (kernel 2.6.9-1.6_FC2.i686) I started to get the following:
Dec 23 00:00:02 ns kernel: lvcreate: page allocation failure.
Dec 23 00:00:02 ns kernel: [<02147890>] __alloc_pages+0x2bd/0x2db
Dec 23 00:00:02 ns kernel: [<42106533>] alloc_pl+0x27/0x3d [dm_mod]
Dec 23 00:00:02 ns kernel: [<421067c6>] client_alloc_pages+0x15/0x47
Dec 23 00:00:02 ns kernel: [<421077db>]
Dec 23 00:00:02 ns kernel: [<4214bd7c>]
Dec 23 00:00:02 ns kernel: [<4214a6f0>] snapshot_ctr+0x246/0x2cf
Dec 23 00:00:02 ns kernel: [<421030cd>]
Dec 23 00:00:02 ns kernel: [<42105143>] populate_table+0x8a/0xaf
Dec 23 00:00:02 ns kernel: [<4210519f>] table_load+0x37/0xf6 [dm_mod]
Dec 23 00:00:02 ns kernel: [<4210583d>] ctl_ioctl+0xcd/0x10f [dm_mod]
Dec 23 00:00:02 ns kernel: [<42105168>] table_load+0x0/0xf6 [dm_mod]
Dec 23 00:00:02 ns kernel: [<0217b0e2>] sys_ioctl+0x29a/0x33c
Dec 23 00:00:02 ns kernel: [<02108a98>] do_IRQ+0x286/0x290
Dec 23 00:00:02 ns kernel: device-mapper: : Could not create kcopyd
After that, the system starts misbehaving badly: all processes working with
the snapshot source get stuck forever in D state, or even all hard disk
access halts.
This usually happens after 4-5 days of normal work.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create and remove a snapshot enough times.
Is this still a problem with today's 2.6.10 updates?
I don't know yet. It's a production server; I will try to update this
evening and check.
This is a known shortcoming of device-mapper snapshots.
Work is underway to address it.
Each snapshot requires a certain amount of physical kernel memory to
be available. If there isn't enough free kernel memory at the instant
you create or activate a snapshot, you get an error like that.
Recent upstream kernel changes have actually made the problem worse.
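Since the failure depends on free kernel memory at the moment the snapshot is created, a backup script could at least skip a run when memory looks low. This is a heuristic sketch only, not a fix: memory can change between the check and lvcreate, and free memory may be too fragmented even when the total looks adequate. It assumes /proc/meminfo reports LowFree on highmem i686 kernels (most kernel allocations come from low memory there) and falls back to MemFree otherwise. The function names and any threshold are hypothetical; no safe value is known from this report.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: numeric value}.

    Values are kept as printed (kB for the memory fields).
    """
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def enough_free_kernel_memory(meminfo_text: str, min_free_kb: int) -> bool:
    """Heuristic pre-check before 'lvcreate -s'; prefers LowFree over MemFree."""
    fields = parse_meminfo(meminfo_text)
    free_kb = fields.get("LowFree", fields.get("MemFree", 0))
    return free_kb >= min_free_kb
```

Possible use, with a hypothetical 64 MB threshold: read /proc/meminfo with open("/proc/meminfo").read() and skip the snapshot cycle if enough_free_kernel_memory(text, 65536) returns False.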
To recover without rebooting, you need to use 'dmsetup' to reset the
states of the devices in the right sequence, removing the snapshot in
the process.
I hope to have the code fixed in about a month's time. [It involves
rewriting some complicated code, unfortunately.]
*** This bug has been marked as a duplicate of 132057 ***
(In reply to comment #4)
> To recover without rebooting, you need to use 'dmsetup' to reset the
> states of the devices in the right sequence, removing the snapshot in
> the process.
Please could you post a simple example (for an LVM wedged with one snapshot in
use)? This would really help me while we're waiting for the real fix! Thanks.
(The man page for dmsetup assumes far more low-level knowledge than I have.)
It depends on how it failed.
Post the output of 'dmsetup info -c' here.
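To make that output easier to scan, a small parser can list devices that are suspended or have an inactive table loaded. It assumes the column layout printed by current dmsetup (Name, Maj, Min, Stat, Open, Targ, Event, UUID), where Stat is four flag characters: L or - (live table), I or - (inactive table), s or - (suspended), r or w. The dmsetup shipped in 2005 may format this differently, so check it against real output. The helper names are hypothetical. It only reports state; which dmsetup commands to run next still depends on how it failed.

```python
from typing import Dict, List


def parse_dmsetup_info_c(text: str) -> List[Dict[str, str]]:
    """Split 'dmsetup info -c' output into one dict per device, keyed by header."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # zip() stops at the shorter list, so a device without a UUID just lacks that key.
    return [dict(zip(header, line.split())) for line in lines[1:]]


def devices_with_flag(rows: List[Dict[str, str]], position: int, flag: str) -> List[str]:
    """Names of devices whose Stat field has `flag` at index `position`."""
    return [row.get("Name", "") for row in rows
            if len(row.get("Stat", "")) > position and row["Stat"][position] == flag]


def suspended(rows: List[Dict[str, str]]) -> List[str]:
    return devices_with_flag(rows, 2, "s")  # third Stat character: 's' = suspended


def inactive_table_loaded(rows: List[Dict[str, str]]) -> List[str]:
    return devices_with_flag(rows, 1, "I")  # second Stat character: 'I' = inactive table
```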
Too late to do that, I'm afraid. I blundered around for several hours and
eventually resuscitated the machine, without understanding how or why.
But when the problem first arose, I noted that vgob01-lv_home (the source of the
snapshot) was INACTIVE and suspended, whereas the corresponding snapshot,
vgob01-hdup_snapshot, was INACTIVE and available.
I managed to regain use of the machine with dmsetup resume vgob01-lv_home and
lvremove /dev/vgob01/hdup_snapshot. But a reboot then failed with:
"device-mapper: : unknown target type
Couldn't load device 'vgob01-hdup_snapshot' "
If that's not enough info, I'll post the requested output next time the problem
happens (seems to be once every few weeks at present). Thanks for your help.
Created attachment 113461
output of dmsetup info -c
As requested in comment 7. This was run from a virtual console in multi-user
mode before attempting any recovery.
If this leads to a dmsetup "recipe" I'd be grateful. I can post a log of my
present (embarrassingly ignorant, long-winded, and probably dangerous) recovery
procedure, if that would help.
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.