Description of problem:
To make consistent backups of user maildirs I am using the following scenario:

lvcreate -L 1G -s blabla && mount -o ro && backup && umount && lvremove -f

I am running this procedure 4 times a day (every 6 hours). After the last kernel update (kernel 2.6.9-1.6_FC2.i686) I started to get the following traces:

Dec 23 00:00:02 ns kernel: lvcreate: page allocation failure. order:0, mode:0xd0
Dec 23 00:00:02 ns kernel: [<02147890>] __alloc_pages+0x2bd/0x2db
Dec 23 00:00:02 ns kernel: [<42106533>] alloc_pl+0x27/0x3d [dm_mod]
Dec 23 00:00:02 ns kernel: [<421067c6>] client_alloc_pages+0x15/0x47 [dm_mod]
Dec 23 00:00:02 ns kernel: [<421077db>] kcopyd_client_create+0x74/0xb4 [dm_mod]
Dec 23 00:00:02 ns kernel: [<4214bd7c>] dm_create_persistent+0x94/0xf4 [dm_snapshot]
Dec 23 00:00:02 ns kernel: [<4214a6f0>] snapshot_ctr+0x246/0x2cf [dm_snapshot]
Dec 23 00:00:02 ns kernel: [<421030cd>] dm_table_add_target+0x11c/0x189 [dm_mod]
Dec 23 00:00:02 ns kernel: [<42105143>] populate_table+0x8a/0xaf [dm_mod]
Dec 23 00:00:02 ns kernel: [<4210519f>] table_load+0x37/0xf6 [dm_mod]
Dec 23 00:00:02 ns kernel: [<4210583d>] ctl_ioctl+0xcd/0x10f [dm_mod]
Dec 23 00:00:02 ns kernel: [<42105168>] table_load+0x0/0xf6 [dm_mod]
Dec 23 00:00:02 ns kernel: [<0217b0e2>] sys_ioctl+0x29a/0x33c
Dec 23 00:00:02 ns kernel: [<02108a98>] do_IRQ+0x286/0x290
Dec 23 00:00:02 ns kernel: device-mapper: : Could not create kcopyd client

After this the system starts misbehaving badly, i.e. all processes working with the snapshot source are stuck forever in D state, or even all hard disk access is halted. This usually happens after 4-5 days of normal work. Any suggestions?

Version-Release number of selected component (if applicable):
kernel 2.6.9-1.6_FC2.i686

How reproducible:
Well, reproducible

Steps to Reproduce:
1. Create and remove snapshot enough times
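For reference, the backup cycle described above could be sketched as a small script. This is only an illustration under stated assumptions: the volume group, volume, snapshot, mount point and archive names are hypothetical placeholders (the report elides them as "blabla"), tar stands in for the unnamed backup step, and the run helper with its DRY_RUN switch is invented here so the sequence can be printed without touching any device. Unlike the && chain in the report, this sketch still attempts umount and lvremove when the backup step fails.

```shell
#!/bin/sh
# Sketch of the snapshot backup cycle from the report.
# All names below are hypothetical placeholders, not taken from the report.
VG=${VG:-vg0}
LV=${LV:-maildirs}
SNAP=${SNAP:-maildirs_snap}
MNT=${MNT:-/mnt/maildirs_snap}
DEST=${DEST:-/backup/maildirs.tar}
# DRY_RUN=1 (default) only prints each command; DRY_RUN=0 executes it (needs root).
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_cycle() {
    # Create a 1G copy-on-write snapshot of the source volume.
    run lvcreate -L 1G -s -n "$SNAP" "/dev/$VG/$LV" || return 1
    status=0
    # Mount the snapshot read-only and archive its contents.
    if run mount -o ro "/dev/$VG/$SNAP" "$MNT"; then
        run tar -cf "$DEST" -C "$MNT" . || status=1
        run umount "$MNT" || status=1
    else
        status=1
    fi
    # Remove the snapshot even if an earlier step failed.
    run lvremove -f "/dev/$VG/$SNAP" || status=1
    return $status
}

backup_cycle
```

With the default DRY_RUN=1 the script prints the five commands in order, which makes it easy to check the sequence before letting cron run it for real.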
Is this still a problem with today's 2.6.10 updates?
I don't know yet. It's a production server; I will try to update this evening and check.
This is a known shortcoming of device-mapper snapshots. Work is underway to address it.
Each snapshot requires a certain amount of physical kernel memory to be available. If there isn't enough free kernel memory at the instant you create or activate a snapshot, you get an error like that. Recent upstream kernel changes have actually made the problem worse. To recover without rebooting, you need to use 'dmsetup' to reset the states of the devices in the right sequence, removing the snapshot in the process. I hope to have the code fixed in about a month's time. [It involves rewriting some complicated code, unfortunately.]
*** This bug has been marked as a duplicate of 132057 ***
(In reply to comment #4)
> To recover without rebooting, you need to use 'dmsetup' to reset the
> states of the devices in the right sequence, removing the snapshot in
> the process.

Please could you post a simple example (for an LVM wedged with one snapshot in use)? This would really help me while we're waiting for the real fix! Thanks.

(The man page for dmsetup assumes far more low-level knowledge than I have).
It depends on how it failed. Post the output of 'dmsetup info -c' here.
Too late to do that, I'm afraid. I blundered around for several hours and eventually resuscitated the machine, without understanding how or why.

But when the problem first arose, I noted that vgob01-lv_home (the source of the snapshot) was INACTIVE and suspended, whereas the corresponding snapshot, vgob01-hdup_snapshot, was INACTIVE and available. I managed to regain use of the machine with dmsetup resume vgob01-lv_home and lvremove /dev/vgob01/hdup_snapshot. But a reboot then failed with:

"device-mapper: : unknown target type ... Couldn't load device 'vgob01-hdup_snapshot'"

If that's not enough info, I'll post the requested output next time the problem happens (seems to be once every few weeks at present). Thanks for your help.
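The two commands reported above, preceded by the state dump requested earlier in the thread, can be written down as the following sketch. It is not a verified recovery recipe: it only replays what this comment describes, the right sequence depends on how the failure occurred, and the same comment reports a failed reboot afterwards. The device names come from the comment; the run helper and the DRY_RUN switch are assumptions added so the sequence can be printed without changing any device state.

```shell
#!/bin/sh
# Sketch of the recovery steps reported in the comment above.
# Device names are the ones quoted in that comment.
ORIGIN=${ORIGIN:-vgob01-lv_home}
SNAP_LV=${SNAP_LV:-/dev/vgob01/hdup_snapshot}
# DRY_RUN=1 (default) only prints each command; DRY_RUN=0 executes it (needs root).
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover() {
    # Record the device-mapper state first, as requested in the thread.
    run dmsetup info -c
    # Resume the suspended snapshot origin, then remove the snapshot.
    run dmsetup resume "$ORIGIN" && run lvremove "$SNAP_LV"
}

recover
```

Saving the dmsetup info -c output before running the other two commands keeps the evidence needed to work out a proper sequence later.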
Created attachment 113461
output of dmsetup info -c as requested in comment 7.

This was run from a virtual console in multi-user mode before attempting any recovery. If this leads to a dmsetup "recipe" I'd be grateful. I can post a log of my present (embarrassingly ignorant, long-winded, and probably dangerous) recovery procedure, if that would help.
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.