Bug 132057
Summary: | page allocation failure on snapshot lvcreate | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Alexandre Oliva <oliva> |
Component: | device-mapper-obsolete | Assignee: | Alasdair Kergon <agk> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | rawhide | CC: | bugzilla.redhat, del, djuran, doogie, dwa, dwysocha, i, jch, joshkel, kbsingh, ken_sheldon, k.georgiou, mbrandsma, mbroz, menscher, modus-bugzilla, rhbugzilla, rocky366, ron.arts, xim, zing |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | 1.02.02 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2006-11-09 12:45:56 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Alexandre Oliva
2004-09-08 13:02:22 UTC
A significantly different stack trace for an identical scenario. lvcreate: page allocation failure. order:0, mode:0xd2 [<02142fb8>] __alloc_pages+0x28b/0x298 [<02159df9>] __vmalloc+0xb7/0xf0 [<22a4469d>] dm_vcalloc+0x1d/0x42 [dm_mod] [<25c9c278>] init_exception_table+0x18/0x4f [dm_snapshot] [<25c9c467>] init_hash_tables+0xb4/0xd8 [dm_snapshot] [<25c9c683>] snapshot_ctr+0x1f8/0x2d8 [dm_snapshot] [<22a45073>] dm_table_add_target+0xfc/0x169 [dm_mod] [<22a47048>] populate_table+0x8a/0xaf [dm_mod] [<22a470a4>] table_load+0x37/0xf6 [dm_mod] [<22a4774b>] ctl_ioctl+0xcb/0x10d [dm_mod] [<22a4706d>] table_load+0x0/0xf6 [dm_mod] [<021760f3>] sys_ioctl+0x296/0x337 [<0210808b>] do_IRQ+0x2f7/0x303 device-mapper: : Unable to allocate hash table space device-mapper: error adding target to table FWIW, in this case /sbin/devmap_name didn't fail. After the error, I got: udev[13335]: creating device node '/dev/dm-4' udev[13337]: creating device node '/dev/dm-5' udev[13340]: creating device node '/dev/dm-6' Same problem here. Sometimes, lvcreate will make the snapshot, sometimes it will freeze but, as I have all my filesystems on one volume group, I can't see debug info on the logs. I still can see it on the console, though. So insufficient memory for the snapshot? Try different parameters (fewer concurrent snapshots, smaller device, larger chunksize etc.). [And snapshots of volumes used by the lvm2 tools themselves are not supported yet; in other cases, lvremove-ing the failed snapshot ought to tidy things up after a failure.] The system had lots of free swap space, so it shouldn't be it. FWIW, I once got the same error while lvremoving the snapshot LV. No other snapshots were present, and this was a single-LV VG in external disks only, totally unrelated with the root filesystem. Nice, tre system had frozen, but the oops made it to /var/log/messages. I figured a snapshot was hurting performance too much (rsyncing big files around), so I tried to remove it while there was a lot of stuff yet to be flushed to disk. And then it died: slab error in kmem_cache_destroy(): cache `dm-io-bio': Can't free all objects [<021471a8>] kmem_cache_destroy+0x97/0x11c [<229efa00>] bio_set_exit+0x1c/0x45 [dm_mod] [<229efd2a>] resize_pool+0x2c/0x9b [dm_mod] [<229f1729>] kcopyd_client_destroy+0xb/0x26 [dm_mod] [<229fe7b2>] snapshot_dtr+0x4f/0x58 [dm_snapshot] [<229ec886>] table_destroy+0x39/0x83 [dm_mod] [<229ee26a>] __hash_remove+0x3e/0x52 [dm_mod] [<229ee941>] dev_remove+0x7a/0x96 [dm_mod] [<229ef74b>] ctl_ioctl+0xcb/0x10d [dm_mod] [<229ee8c7>] dev_remove+0x0/0x96 [dm_mod] [<021760f3>] sys_ioctl+0x296/0x337 slab error in kmem_cache_destroy(): cache `dm-io-1': Can't free all objects [<021471a8>] kmem_cache_destroy+0x97/0x11c [<229efa20>] bio_set_exit+0x3c/0x45 [dm_mod] [<229efd2a>] resize_pool+0x2c/0x9b [dm_mod] [<229f1729>] kcopyd_client_destroy+0xb/0x26 [dm_mod] [<229fe7b2>] snapshot_dtr+0x4f/0x58 [dm_snapshot] [<229ec886>] table_destroy+0x39/0x83 [dm_mod] [<229ee26a>] __hash_remove+0x3e/0x52 [dm_mod] [<229ee941>] dev_remove+0x7a/0x96 [dm_mod] [<229ef74b>] ctl_ioctl+0xcb/0x10d [dm_mod] [<229ee8c7>] dev_remove+0x0/0x96 [dm_mod] [<021760f3>] sys_ioctl+0x296/0x337 ------------[ cut here ]------------ kernel BUG at drivers/md/kcopyd.c:145! invalid operand: 0000 [#1] Modules linked in: tun nfsd exportfs lockd md5 ipv6 parport_pc lp parport autofs4 sunrpc ds ipt_REJECT ipt_LOG ipt_state iptable_filter iptable_nat ip_conntrack iptable_mangle ip_tables button battery asus_acpi ac yenta_socket pcmcia_core uhci_hcd i8xx_tco snd_maestro3 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd soundcore e100 mii floppy dm_snapshot dm_mirror ext3 jbd raid1 dm_mod usb_storage sbp2 ohci1394 ieee1394 sd_mod scsi_mod CPU: 0 EIP: 0060:[<229f0714>] Not tainted VLI EFLAGS: 00010287 (2.6.8-1.541) EIP is at client_free_pages+0xb/0x32 [dm_mod] eax: 00000100 ebx: 219efd20 ecx: 03397018 edx: 00000015 esi: 22a23080 edi: 00000000 ebp: 00000000 esp: 1512cf38 ds: 007b es: 007b ss: 0068 Process lvremove (pid: 27924, threadinfo=1512c000 task=1a88d950) Stack: 219efd20 229f1730 2010a080 229fe7b2 22a23080 21e03480 229ec886 1e97e5e0 22e87000 00000004 08c66300 229ee26a 229f5c20 229ee941 00000000 1512c000 229ef74b 229ee8c7 22e87000 229f5ce0 c134fd04 063ccec0 ffffffe7 021760f3 Call Trace: [<229f1730>] kcopyd_client_destroy+0x12/0x26 [dm_mod] [<229fe7b2>] snapshot_dtr+0x4f/0x58 [dm_snapshot] [<229ec886>] table_destroy+0x39/0x83 [dm_mod] [<229ee26a>] __hash_remove+0x3e/0x52 [dm_mod] [<229ee941>] dev_remove+0x7a/0x96 [dm_mod] [<229ef74b>] ctl_ioctl+0xcb/0x10d [dm_mod] [<229ee8c7>] dev_remove+0x0/0x96 [dm_mod] [<021760f3>] sys_ioctl+0x296/0x337 Code: <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43 in_atomic():0[expected: 0], irqs_disabled():1 [<0211c50c>] __might_sleep+0x7d/0x89 [<0215e0fe>] rw_vm+0x216/0x482 [<229f06e9>] client_alloc_pages+0x27/0x47 [dm_mod] [<229f06e9>] client_alloc_pages+0x27/0x47 [dm_mod] [<0215e850>] get_user_size+0x30/0x57 [<229f06e9>] client_alloc_pages+0x27/0x47 [dm_mod] [<0210664d>] show_registers+0x109/0x15e [<02106822>] die+0x11b/0x219 [<02106b0b>] do_invalid_op+0x0/0x95 [<02106b9a>] do_invalid_op+0x8f/0x95 [<229f0714>] client_free_pages+0xb/0x32 [dm_mod] [<021425e3>] __free_pages_ok+0xb2/0xba [<021467c0>] kmem_freepages+0x6d/0x87 [<229f0714>] client_free_pages+0xb/0x32 [dm_mod] [<229f1730>] kcopyd_client_destroy+0x12/0x26 [dm_mod] [<229fe7b2>] snapshot_dtr+0x4f/0x58 [dm_snapshot] [<229ec886>] table_destroy+0x39/0x83 [dm_mod] [<229ee26a>] __hash_remove+0x3e/0x52 [dm_mod] [<229ee941>] dev_remove+0x7a/0x96 [dm_mod] [<229ef74b>] ctl_ioctl+0xcb/0x10d [dm_mod] [<229ee8c7>] dev_remove+0x0/0x96 [dm_mod] [<021760f3>] sys_ioctl+0x296/0x337 Bad EIP value. <1>Unable to handle kernel paging request at virtual address 22d19000 printing eip: 229ff94f *pde = 1e0a7067 Oops: 0002 [#2] Modules linked in: tun nfsd exportfs lockd md5 ipv6 parport_pc lp parport autofs4 sunrpc ds ipt_REJECT ipt_LOG ipt_state iptable_filter iptable_nat ip_conntrack iptable_mangle ip_tables button battery asus_acpi ac yenta_socket pcmcia_core uhci_hcd i8xx_tco snd_maestro3 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd soundcore e100 mii floppy dm_snapshot dm_mirror ext3 jbd raid1 dm_mod usb_storage sbp2 ohci1394 ieee1394 sd_mod scsi_mod CPU: 0 EIP: 0060:[<229ff94f>] Not tainted VLI EFLAGS: 00010206 (2.6.8-1.541) EIP is at write_exception+0x18/0x2d [dm_snapshot] eax: 00f33909 ebx: 1e89bf0c ecx: 22d19000 edx: 00000000 esi: 229fea4b edi: 219efda0 ebp: 229fea50 esp: 1e89bf04 ds: 007b es: 007b ss: 0068 Process kcopyd (pid: 3958, threadinfo=1e89b000 task=201da030) Stack: 00000000 229ffc4b 00f33909 00000000 0008ba26 00000000 17786b94 2010a080 17786b94 229fea50 229fea80 17786b94 0e2f2b74 00000000 229f083d 00000000 0e2f2b74 229f5dd8 00000246 229f07fe 229f0c31 00000006 229f65c0 00000206 Call Trace: [<229ffc4b>] persistent_commit+0x39/0xd4 [dm_snapshot] [<229fea50>] copy_callback+0x0/0x34 [dm_snapshot] [<229fea80>] copy_callback+0x30/0x34 [dm_snapshot] [<229f083d>] run_complete_job+0x3f/0x47 [dm_mod] [<229f07fe>] run_complete_job+0x0/0x47 [dm_mod] [<229f0c31>] process_jobs+0x19/0x3bc [dm_mod] [<229f0fe3>] do_work+0xf/0x2d [dm_mod] [<02130705>] worker_thread+0x21f/0x33d [<229f0fd4>] do_work+0x0/0x2d [dm_mod] [<0211b5bd>] default_wake_function+0x0/0xc [<0211b5bd>] default_wake_function+0x0/0xc [<021304e6>] worker_thread+0x0/0x33d [<02135401>] kthread+0x69/0x91 [<02135398>] kthread+0x0/0x91 [<021041d9>] kernel_thread_helper+0x5/0xb Code: 8b 41 08 89 53 04 8b 51 0c 89 43 08 31 c0 89 53 0c 5b c3 53 89 cb e8 b9 ff ff ff 89 c1 85 c9 b8 ea ff ff ff 74 18 8b 03 8b 53 04 <89> 01 8b 43 08 89 51 04 8b 53 0c 89 41 08 31 c0 89 51 0c 5b c3 Only physical memory counts: snapshots can't use swap. Need to locate the source for the dm components (dm* and kcopyd* in drivers/md/) you were running and check whether it matches current upstream [e.g. to confirm whether or not an upstream fix affecting kcopyd_client_destroy got applied yet to the version you were running]. If the code you're running *does* match upstream, then I'll spend time investigating. Also similar report on linux-kernel last week. I think the problems will go away once the lvm2 activation code has been changed to make use of the device-mapper create/load ioctl splitting. But this is a significant amount of work and it is not yet scheduled. *** Bug 144874 has been marked as a duplicate of this bug. *** Update: this problem is now fully understood and work on the solution is underway. *** Bug 143650 has been marked as a duplicate of this bug. *** How is the solution comming along? I think I stumbled upon this problem on a machine running the kernel-smp-2.6.9-5.0.3.EL.x86_64 on RHEL4. I got the following messages in my syslog: Apr 22 16:32:06 livingston kernel: kjournald starting. Commit interval 5 seconds Apr 22 16:32:06 livingston kernel: EXT3-fs: dm-21: orphan cleanup on readonly fs Apr 22 16:32:06 livingston kernel: EXT3-fs warning (device dm-21): ext3_orphan_get: bad orphan inode 18874! e2fsck was run? Apr 22 16:32:06 livingston kernel: Apr 22 16:32:06 livingston kernel: ext3_test_bit(bit=2489, block=33794) = 0 Apr 22 16:32:06 livingston kernel: inode=0000000000000000 Apr 22 16:32:06 livingston kernel: EXT3-fs: recovery complete. Apr 22 16:32:06 livingston kernel: EXT3-fs: mounted filesystem with ordered data mode. Apr 22 16:32:06 livingston kernel: SELinux: initialized (dev dm-21, type ext3), uses xattr Apr 22 16:32:07 livingston kernel: kjournald starting. Commit interval 5 seconds Apr 22 16:32:07 livingston kernel: EXT3-fs: mounted filesystem with ordered data mode. Apr 22 16:32:07 livingston kernel: SELinux: initialized (dev dm-24, type ext3), uses xattr Apr 22 16:32:08 livingston kernel: kjournald starting. Commit interval 5 seconds Apr 22 16:32:08 livingston kernel: EXT3-fs: mounted filesystem with ordered data mode. Apr 22 16:32:08 livingston kernel: SELinux: initialized (dev dm-27, type ext3), uses xattr Apr 22 16:32:11 livingston kernel: lvcreate: page allocation failure. order:4, mode:0xd0 Apr 22 16:32:11 livingston kernel: Apr 22 16:32:11 livingston kernel: Call Trace:<ffffffff801ea63a>{vgacon_cursor+0} <ffffffff80157ba6>{__alloc_pages+768} Apr 22 16:32:11 livingston kernel: <ffffffff80157c3a>{__get_free_pages+11} <ffffffff8015aab4>{kmem_getpages+36} Apr 22 16:32:11 livingston kernel: <ffffffff8015b24f>{cache_alloc_refill+615} <ffffffff8015af17>{__kmalloc+123} Apr 22 16:32:11 livingston kernel: <ffffffff80156522>{mempool_resize+136} <ffffffffa003d6d3>{:dm_mod:resize_pool+79} Apr 22 16:32:11 livingston kernel: <ffffffffa003eade>{:dm_mod:kcopyd_client_create+598} Apr 22 16:32:11 livingston kernel: <ffffffffa0088488>{:dm_snapshot:snapshot_ctr+740} <ffffffffa003ac5e>{:dm_mod:dm_table_add_target+554} Apr 22 16:32:11 livingston kernel: <ffffffffa003cde6>{:dm_mod:table_load+225} <ffffffffa003cd05>{:dm_mod:table_load+0} Apr 22 16:32:11 livingston kernel: <ffffffffa003d49b>{:dm_mod:ctl_ioctl+569} <ffffffff801833ca>{sys_ioctl+858} Apr 22 16:32:11 livingston kernel: <ffffffff8011000e>{system_call+126} Apr 22 16:32:11 livingston kernel: device-mapper: : Could not create kcopyd client Apr 22 16:32:11 livingston kernel: Apr 22 16:32:11 livingston kernel: device-mapper: error adding target to table Apr 22 16:32:11 livingston SnapVault[18629]: OSSV: /snapshot/master sudbury:/vol/vol3/livingston_lfs_master /snapshot/master is not an existing directory Apr 22 16:32:11 livingston kernel: FAT: Unrecognized mount option "acl" or missing value Apr 22 16:32:12 livingston SnapVault[18629]: OSSV: /snapshot/patch sudbury:/vol/vol3/livingston_lfs_patch /snapshot/patch is not an existing directory Apr 22 16:32:13 livingston kernel: lvcreate: page allocation failure. order:4, mode:0xd0 Apr 22 16:32:13 livingston kernel: Apr 22 16:32:13 livingston kernel: Call Trace:<ffffffff801ea63a>{vgacon_cursor+0} <ffffffff80157ba6>{__alloc_pages+768} Apr 22 16:32:13 livingston kernel: <ffffffff80157c3a>{__get_free_pages+11} <ffffffff8015aab4>{kmem_getpages+36} Apr 22 16:32:13 livingston kernel: <ffffffff8015b24f>{cache_alloc_refill+615} <ffffffff8015af17>{__kmalloc+123} Apr 22 16:32:13 livingston kernel: <ffffffff80156522>{mempool_resize+136} <ffffffffa003d6d3>{:dm_mod:resize_pool+79} Apr 22 16:32:13 livingston kernel: <ffffffffa003eade>{:dm_mod:kcopyd_client_create+598} Apr 22 16:32:13 livingston kernel: <ffffffffa0088488>{:dm_snapshot:snapshot_ctr+740} <ffffffffa003ac5e>{:dm_mod:dm_table_add_target+554} Apr 22 16:32:13 livingston kernel: <ffffffffa003cde6>{:dm_mod:table_load+225} <ffffffffa003cd05>{:dm_mod:table_load+0} Apr 22 16:32:13 livingston kernel: <ffffffffa003d49b>{:dm_mod:ctl_ioctl+569} <ffffffff801833ca>{sys_ioctl+858} Apr 22 16:32:13 livingston kernel: <ffffffff8011000e>{system_call+126} Apr 22 16:32:13 livingston kernel: device-mapper: : Could not create kcopyd client Apr 22 16:32:13 livingston kernel: Apr 22 16:32:13 livingston kernel: device-mapper: error adding target to table Apr 22 16:32:47 livingston kernel: lvremove: page allocation failure. order:4, mode:0xd0 Apr 22 16:32:47 livingston kernel: Apr 22 16:32:48 livingston kernel: Call Trace:<ffffffff801ea63a>{vgacon_cursor+0} <ffffffff80157ba6>{__alloc_pages+768} Apr 22 16:32:48 livingston kernel: <ffffffff80157c3a>{__get_free_pages+11} <ffffffff8015aab4>{kmem_getpages+36} Apr 22 16:32:48 livingston kernel: <ffffffff8015b24f>{cache_alloc_refill+615} <ffffffff8015af17>{__kmalloc+123} Apr 22 16:32:48 livingston kernel: <ffffffff80156522>{mempool_resize+136} <ffffffffa003d6d3>{:dm_mod:resize_pool+79} Apr 22 16:32:48 livingston kernel: <ffffffffa003eade>{:dm_mod:kcopyd_client_create+598} Apr 22 16:32:48 livingston kernel: <ffffffffa0088488>{:dm_snapshot:snapshot_ctr+740} <ffffffffa003ac5e>{:dm_mod:dm_table_add_target+554} Apr 22 16:32:48 livingston kernel: <ffffffffa003cde6>{:dm_mod:table_load+225} <ffffffffa003cd05>{:dm_mod:table_load+0} Apr 22 16:32:48 livingston kernel: <ffffffffa003d49b>{:dm_mod:ctl_ioctl+569} <ffffffff801833ca>{sys_ioctl+858} Apr 22 16:32:48 livingston kernel: <ffffffff8011000e>{system_call+126} Apr 22 16:32:48 livingston kernel: device-mapper: : Could not create kcopyd client Apr 22 16:32:48 livingston kernel: Apr 22 16:32:48 livingston kernel: device-mapper: error adding target to table Apr 22 16:32:49 livingston rpc.mountd: authenticated unmount request from murray.carmen.se:627 for /lfs/local_share (/lfs/local_share) Apr 22 16:32:49 livingston rpc.mountd: authenticated unmount request from murray.carmen.se:627 for /lfs/local_share (/lfs/local_share) Apr 22 16:32:49 livingston kernel: lvremove: page allocation failure. order:4, mode:0xd0 Apr 22 16:32:49 livingston kernel: Apr 22 16:32:50 livingston kernel: Call Trace:<ffffffff801ea63a>{vgacon_cursor+0} <ffffffff80157ba6>{__alloc_pages+768} Apr 22 16:32:50 livingston kernel: <ffffffff80157c3a>{__get_free_pages+11} <ffffffff8015aab4>{kmem_getpages+36} Apr 22 16:32:50 livingston kernel: <ffffffff8015b24f>{cache_alloc_refill+615} <ffffffff8015af17>{__kmalloc+123} Apr 22 16:32:50 livingston kernel: <ffffffff80156522>{mempool_resize+136} <ffffffffa003d6d3>{:dm_mod:resize_pool+79} Apr 22 16:32:50 livingston kernel: <ffffffffa003eade>{:dm_mod:kcopyd_client_create+598} Apr 22 16:32:50 livingston kernel: <ffffffffa0088488>{:dm_snapshot:snapshot_ctr+740} <ffffffffa003ac5e>{:dm_mod:dm_table_add_target+554} Apr 22 16:32:50 livingston kernel: <ffffffffa003cde6>{:dm_mod:table_load+225} <ffffffffa003cd05>{:dm_mod:table_load+0} Apr 22 16:32:50 livingston kernel: <ffffffffa003d49b>{:dm_mod:ctl_ioctl+569} <ffffffff801833ca>{sys_ioctl+858} Apr 22 16:32:50 livingston kernel: <ffffffff8011000e>{system_call+126} Apr 22 16:32:50 livingston kernel: device-mapper: : Could not create kcopyd client Apr 22 16:32:50 livingston kernel: Apr 22 16:32:50 livingston kernel: device-mapper: error adding target to table The RHEL4 version of this bug is tracked on bug 164959 Seeing this on RHEL 4.1; will add my logs to that report. This problem has been "fully understood" since January... why hasn't a fix been released? *** Bug 169597 has been marked as a duplicate of this bug. *** The number of problem reports was small and other work was assigned higher priority. The fix involves a substantial amount of work, but this is now underway and has a high priority. The userspace packages have been updated and kernel patches are starting to trickle upstream. Have all the patches been applied for this bug yet? If so, what kernel versions? 2.6.17-rc1 has all the patches that are ready so far. What's the first kernel version that had these patches? We are running 2.6.12.6 in xen. Are they easy to backport? Upgrading is difficult for us. I'm willing to do them myself. I've looked at the device-mapper and lvm2 cvs repositories, but can't really find the exact series of fixes that deal with this. Does it require both kernel and userspace changes? 2.6.17-rc1, yes, yes. Diff the dm files in the two kernels to work out what to change. For userspace just keep up with the most up-to-date releases. Sorry for pestering again; I see the redhat advisory that mentions these being fixed. Is that based on an unreleased version from cvs, or the last versions released on Feb 7, 2006? If the latter, then upgrading is a simple matter of apt-get(we run debian). 19 files changed, 937 insertions(+), 603 deletions(-) I haven't tried just copying the files verbatim; I doubt that would work, anyways. I'll plug away at this. Btw, I saw mails about this, with one suggestion that the user should get more physical ram. Is that really the case? Also, this fix talks about preallocating the needed memory. What if the preallocation fails? How big of a contiguous block is needed? Details about my hardware: 256m machine(xen dom0). 23 logical volumes, on a 545G VG with 99G free. 6 snapshots, which get created/removed for syncing. This should be fixed in both FC6 and FC devel and to my best knowledge also FC5. |