Red Hat Bugzilla – Bug 984236
out of memory failure on lvm2 thinp volume when creating filesystems
Last modified: 2015-02-18 08:59:54 EST
Created attachment 773171 [details]
Description of problem: mkfs.btrfs fails dramatically on a 16TB LVM thinp virtual volume.
Version-Release number of selected component (if applicable):
Up to date Fedora 19, plus kernel-3.10.0-1.fc20.x86_64.debug
Steps to Reproduce:
1. Create a 16TB virtual logical volume backed by a ~400GB thin pool: a 500GB HDD is partitioned, one partition (sda8) is made into a PV and added to a VG, a thin pool is created from all VG extents, and then a 16TB virtual LV named /dev/vg1/brick1 is created from the pool.
2. mkfs.btrfs /dev/vg1/brick1
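The steps above can be sketched as follows. The device, VG, and LV names come from the description; the LVM commands need root and a real disk, so they are shown as comments for reference only:

```shell
# Hypothetical commands matching the reproduction steps (need root):
#   pvcreate /dev/sda8
#   vgcreate vg1 /dev/sda8
#   lvcreate -l 100%FREE -T vg1/thinp          # thin pool from all VG extents
#   lvcreate -V 16T -T vg1/thinp -n brick1     # 16TB virtual (thin) LV
#   mkfs.btrfs /dev/vg1/brick1
#
# The virtual size overcommits the ~400GB pool roughly 40x:
echo "$(( (16 * 1024) / 400 ))x overcommit"
```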
The log shows a flood of out-of-memory kills:
[ 335.751711] Out of memory: Kill process 164 (systemd-journal) score 0 or sacrifice child
[ 335.754041] Killed process 164 (systemd-journal) total-vm:348984kB, anon-rss:0kB, file-rss:964kB
[ 335.999673] Out of memory: Kill process 167 (lvmetad) score 0 or sacrifice child
[ 336.001970] Killed process 167 (lvmetad) total-vm:100096kB, anon-rss:0kB, file-rss:1788kB
[ 336.509314] Out of memory: Kill process 295 (bash) score 0 or sacrifice child
[ 336.511635] Killed process 387 (mkfs.btrfs) total-vm:14044kB, anon-rss:0kB, file-rss:508kB
Subsequently, a "possible circular locking dependency detected" warning appears.
This does not happen with mkfs.xfs, which works.
Intel(R) Core(TM)2 Duo CPU T9300 @ 2.50GHz
Created attachment 773172 [details]
[root@f19s ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          3936       1416       2519          0         33        334
-/+ buffers/cache:       1048       2888
Swap:         7999          0       7999
Within a minute after mkfs.btrfs:
[root@f19s ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          3936       3848         87          0          0          8
-/+ buffers/cache:       3840         95
Swap:         7999         53       7946
Shell hangs, and then ssh connection is closed by remote host shortly thereafter.
Created attachment 773194 [details]
journalctl -xb multi-user.target
Attachments 773171 and 773172 are from a single-user command-line boot. This one is multi-user.target, which takes quite a bit longer to recover from.
Created attachment 773207 [details]
journalctl -b (mkfs.xfs)
With multi-user.target I'm able to get it to happen with mkfs.xfs also, so I don't think this is btrfs-specific. I think it could be an LVM thinp issue.
This doesn't reproduce in qemu-kvm pointed at a 16TB (thin-provisioned) qcow2 file on the same bare-metal machine as the prior tests. Both xfs and btrfs work, and the host's free memory doesn't go below 1500MB, including what the VM consumes.
So I think this is an LVM thinp bug. Either it should work as well as a qcow2 file for a VM, or I should get some kind of "not possible" message when creating the 16TB virtual LV.
Hmm, this looks like a dmeventd memory leak (some leaks were fixed upstream).
Could you please try to reproduce without monitoring (lvm.conf: monitoring = 0),
and make sure dmeventd is not running?
(Also, what were the parameters for the thin pool? Please post 'lvs -a -o all'.)
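The requested change can be sketched as follows; the section name is from the stock Fedora lvm.conf, and the check afterwards is a generic process lookup, not something from this report:

```shell
# Disable dmeventd monitoring by editing /etc/lvm/lvm.conf:
#
#   activation {
#       monitoring = 0
#   }
#
# Then verify no dmeventd instance is left running (e.g. after a reboot):
pgrep dmeventd >/dev/null 2>&1 || echo "dmeventd not running"
```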
[root@f19s ~]# lvcreate -l 102900 -T vg1/thinp
device-mapper: remove ioctl on failed: Device or resource busy
Logical volume "thinp" created
[ 267.473860] bio: create slab <bio-1> at 1
[ 267.660102] device-mapper: ioctl: unable to remove open device vg1-thinp
[ 268.025393] bio: create slab <bio-1> at 1
[ 268.071759] device-mapper: thin: Data device (dm-1) discard unsupported: Disabling discard passdown.
[root@f19s ~]# lvs
  LV     VG   Attr      LSize   Pool  Origin Data%  Move Log Copy%  Convert
  brick1 vg1  Vwi-a-tz-  16.00t thinp         0.00
  thinp  vg1  twi-a-tz- 401.95g               0.00
/etc/lvm/lvm.conf monitoring set to 0 and confirmed dmeventd isn't running after a reboot.
mkfs.btrfs still takes a very long time, and I can't ssh into the computer. However, systemd doesn't kill mkfs.btrfs this time, and the formatting completes successfully. Attaching a new dmesg showing the errors.
Created attachment 773396 [details]
This is with the regular kernel, not the debug kernel. Is there any difference with the debug kernel for any of this?
So, a few issues:
The ioctl errors are a known issue - a workaround for non-clustered deployments will land upstream soon - they are 100% harmless and can be ignored for now.
dmeventd seems to be consuming quite a lot of memory - hopefully this will be addressed in a new lvm2 release.
As for the kernel question - are you using the distro debug kernel, or your own build?
There are dm thinp driver debug kernel options which are ONLY meant to be used by developers - they consume a huge amount of memory for various validation data structures. So unless you are a dm thinp developer, avoid those options - make sure they are not enabled in your kernel.
Otherwise you would need either much more physical memory, or significantly smaller device sizes.
(So yes, there can be a huge difference between debug and non-debug kernels.)
(In reply to Zdenek Kabelac from comment #9)
> As for kernel question - are you using distro debug kernel - or your own
> build ?
Distro debug kernel.
> There are dm thinp driver debug kernel options which are ONLY meant to be
> used by developers - they consume a huge amount of memory for various
> validation data structures. So unless you are a dm thinp developer, avoid
> those options - make sure they are not enabled in your kernel.
> Otherwise you would need either much more physical memory, or significantly
> smaller device sizes.
> (So yes, there can be a huge difference between debug and non-debug kernels.)
Yeah - I mean in terms of the quality of the dmesg output for the purposes of this bug. If it's no different, then I'll use the non-debug kernel.
If there is no big need for snapshots - i.e. the primary purpose is provisioning space - increasing --chunksize at thin-pool creation might greatly reduce memory requirements. The default chunk size is 64K, so using 512K (or even bigger) could mean a big speedup in your case.
(For many snapshots a smaller chunk is an advantage, since small pieces can be shared - but for mostly space provisioning, larger chunks reduce the memory footprint significantly.)
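Rough arithmetic behind this advice (my numbers, not from the comment): the pool metadata must track one mapping per provisioned chunk, so for a fully provisioned 16TiB thin LV:

```shell
# Chunk counts for a 16TiB thin LV at the chunk sizes discussed above:
echo "64K  chunks: $(( 16*1024*1024*1024*1024 / (64*1024) ))"
echo "512K chunks: $(( 16*1024*1024*1024*1024 / (512*1024) ))"
```

The 512K chunk size cuts the number of mappings from 268,435,456 to 33,554,432, an 8x reduction in metadata to track.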
There's no present need for snapshots. I'm just looking for an alternative to qcow2 files for presenting large virtual storage to two VMs for glusterfs familiarization. It seems the host LVM presenting a virtual device to the VM should be a bit more efficient than the VM writing btrfs to a qcow2 file on an XFS file system managed by the host.
Then consider even bigger chunk sizes - it depends how much provisioning you need, since a write of a single byte obviously provisions a whole chunk. The max supported chunk size is 1GiB, but this may have a negative impact when zeroing is enabled at the same time. If you do not need that feature (the default is enabled), large chunk sizes with zeroing disabled will give you maximum speed.
Upstream will support 'profiles' with sane defaults for some typical use-cases.
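A hypothetical pool creation following this advice; the VG/pool names are carried over from earlier comments, and the lvcreate line needs root so it is shown as a comment only:

```shell
# Large chunks, zeroing disabled (-Zn), tuned for provisioning over snapshots:
#   lvcreate -l 100%FREE -T vg1/thinp --chunksize 1g -Zn
#
# At the 1GiB maximum chunk size, a 16TiB LV needs only:
echo "$(( 16*1024*1024*1024*1024 / (1024*1024*1024) )) chunks"
```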
Is there still anything we can do better here ?
Otherwise I think we should close this BZ.
Uncertain - I haven't recently done significant testing on very large virtual-size LVs backed by such small amounts of physical storage.
This message is a notice that Fedora 19 is now at end of life. Fedora
has stopped maintaining and issuing updates for Fedora 19. It is
Fedora's policy to close all bug reports from releases that are no
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.
Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 19 reached end of life. If you would still
like to see this bug fixed and are able to reproduce it against a later
version of Fedora, you are encouraged to change the 'version' to a later
Fedora version before this bug is closed, as described in the policy above.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this bug.
Thank you for reporting this bug and we are sorry it could not be fixed.