Bug 984236

Summary: out of memory failure on lvm2 thinp volume when creating filesystems
Product: Fedora
Version: 19
Component: lvm2
Reporter: Chris Murphy <bugzilla>
Assignee: Zdenek Kabelac <zkabelac>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Status: CLOSED EOL
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Last Closed: 2015-02-18 13:59:54 UTC
CC: agk, bmarzins, bmr, bugzilla, dwysocha, gansalmon, heinzm, itamar, jonathan, kernel-maint, lvm-team, madhu.chinakonda, msnitzer, prajnoha, prockai, zkabelac
Attachments:
  dmesg
  journalctl -xb
  journalctl -xb multi-user.target
  journalctl -b (mkfs.xfs)
  dmesg monitoring=0

Description Chris Murphy 2013-07-13 20:47:42 UTC
Created attachment 773171
dmesg

Description of problem: mkfs.btrfs fails dramatically on a 16TB LVM thinp virtual volume.


Version-Release number of selected component (if applicable):
Up to date Fedora 19, plus kernel-3.10.0-1.fc20.x86_64.debug
lvm2-2.02.98-9.fc19.x86_64
lvm2-libs-2.02.98-9.fc19.x86_64
btrfs-progs-0.20.rc1.20130308git704a08c-1.fc19.x86_64


How reproducible:
Always


Steps to Reproduce:
1. Create a 16TB virtual logical volume backed by a 400GB VG thin pool. That is: a 500GB HDD is partitioned, one partition (sda8) is made into a PV and added to a VG, a thin pool is created from all VG extents, and a 16TB virtual LV named /dev/vg1/brick1 is created from the pool (see the command sketch after these steps).

2. mkfs.btrfs /dev/vg1/brick1
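A minimal command sketch of the setup above, assuming the /dev/sda8 partition and the vg1/thinp names that appear in comment 7 (sizes and extent counts are illustrative, not exact):

  # PV and VG on the ~400GB partition
  pvcreate /dev/sda8
  vgcreate vg1 /dev/sda8
  # thin pool from all free extents in the VG
  lvcreate -l 100%FREE -T vg1/thinp
  # 16TB virtual (thin) LV backed by the pool
  lvcreate -V 16T -T vg1/thinp -n brick1
  mkfs.btrfs /dev/vg1/brick1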


Actual results:
Looks like a lot of out of memory errors:

[  335.751711] Out of memory: Kill process 164 (systemd-journal) score 0 or sacrifice child
[  335.754041] Killed process 164 (systemd-journal) total-vm:348984kB, anon-rss:0kB, file-rss:964kB

[  335.999673] Out of memory: Kill process 167 (lvmetad) score 0 or sacrifice child
[  336.001970] Killed process 167 (lvmetad) total-vm:100096kB, anon-rss:0kB, file-rss:1788kB

[  336.509314] Out of memory: Kill process 295 (bash) score 0 or sacrifice child
[  336.511635] Killed process 387 (mkfs.btrfs) total-vm:14044kB, anon-rss:0kB, file-rss:508kB

Subsequently, a "possible circular locking dependency detected" warning is logged.

Expected results:
Not this. mkfs.xfs works (but see comment 4).

Additional info:
Intel(R) Core(TM)2 Duo CPU     T9300  @ 2.50GHz
4GB memory
8GB swap

Comment 1 Chris Murphy 2013-07-13 20:48:57 UTC
Created attachment 773172
journalctl -xb

Comment 2 Chris Murphy 2013-07-13 21:08:34 UTC
Before mkfs.btrfs:
[root@f19s ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          3936       1416       2519          0         33        334
-/+ buffers/cache:       1048       2888
Swap:         7999          0       7999


Within a minute after mkfs.btrfs:
[root@f19s ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          3936       3848         87          0          0          8
-/+ buffers/cache:       3840         95
Swap:         7999         53       7946

The shell hangs, and then the ssh connection is closed by the remote host shortly thereafter.

Comment 3 Chris Murphy 2013-07-13 23:24:43 UTC
Created attachment 773194
journalctl -xb multi-user.target

Attachments 773171 and 773172 are from a single-user command-line boot. This one is multi-user.target, which takes quite a bit longer to recover from.

Comment 4 Chris Murphy 2013-07-14 01:50:06 UTC
Created attachment 773207
journalctl -b (mkfs.xfs)

In multi-user.target I'm able to get it to happen with mkfs.xfs also, so I don't think this is btrfs-specific. I think it could be an LVM thinp issue.

Comment 5 Chris Murphy 2013-07-14 03:21:52 UTC
This doesn't reproduce in qemu-kvm pointed at a 16TB (thin-provisioned) qcow2 file on the same bare-metal machine as the prior tests. It works with both xfs and btrfs, and the host's free memory doesn't go below 1500MB, including what's consumed by the VM.

So I think this is an LVM thinp bug. Either it should work as well as a qcow2 file for a VM, or I should get some kind of "not possible" message when creating the 16TB virtual LV.

Comment 6 Zdenek Kabelac 2013-07-14 08:46:00 UTC
Hmm, this looks like a dmeventd memory leak (some leaks were fixed upstream).

Please could you try to reproduce without monitoring (set monitoring = 0 in lvm.conf), and make sure dmeventd is not running? A sketch follows below.

(Also, what were the parameters for the thin pool? 'lvs -a -o all')
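A minimal sketch of the suggested change, assuming the stock lvm.conf layout (monitoring lives in the activation section):

  # /etc/lvm/lvm.conf
  activation {
      # other settings unchanged
      monitoring = 0
  }

  # after a reboot, verify dmeventd is not running:
  pgrep dmeventd || echo "dmeventd not running"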

Comment 7 Chris Murphy 2013-07-14 18:05:05 UTC
[root@f19s ~]# lvcreate -l 102900 -T vg1/thinp
  device-mapper: remove ioctl on  failed: Device or resource busy
  Logical volume "thinp" created

[  267.473860] bio: create slab <bio-1> at 1
[  267.660102] device-mapper: ioctl: unable to remove open device vg1-thinp
[  268.025393] bio: create slab <bio-1> at 1
[  268.071759] device-mapper: thin: Data device (dm-1) discard unsupported: Disabling discard passdown.


[root@f19s ~]# lvs
  LV     VG   Attr      LSize   Pool  Origin Data%  Move Log Copy%  Convert
  brick1 vg1  Vwi-a-tz-  16.00t thinp          0.00                        
  thinp  vg1  twi-a-tz- 401.95g                0.00 
  
/etc/lvm/lvm.conf monitoring is set to 0, and I confirmed dmeventd isn't running after a reboot.

mkfs.btrfs still takes a very long time, and I can't ssh into the computer. However, systemd doesn't kill mkfs.btrfs this time, and the formatting completes successfully. Attaching a new dmesg showing the errors.

Comment 8 Chris Murphy 2013-07-14 18:06:06 UTC
Created attachment 773396
dmesg monitoring=0

This is with the regular kernel, not the debug kernel. Is there any difference with the debug kernel for any of this?

Comment 9 Zdenek Kabelac 2013-07-14 18:17:18 UTC
So, a few issues:

The ioctl errors are a known issue - a workaround for non-clustered use will be deployed upstream soon - they are 100% harmless and can be ignored for now.

Dmeventd seems to be consuming quite a lot of memory - hopefully this will be addressed in the next lvm2 release.

As for the kernel question - are you using the distro debug kernel, or your own build?

There are dm thinp driver debug kernel options which are ONLY meant to be used by developers; they consume a huge amount of memory for various validation data structures. Unless you are a dm thinp developer, avoid those options and make sure they are not enabled in your kernel. Otherwise you would need either much more physical memory or significantly smaller device sizes.
(So yes, there can be a huge difference between debug and non-debug kernels.)
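One hedged way to check for those options, assuming the distro ships its kernel config in the usual /boot location (the dm thinp validation option is CONFIG_DM_DEBUG_BLOCK_STACK_TRACING in kernels of this era):

  # list dm debug options baked into the running kernel's config
  grep -E 'CONFIG_DM_DEBUG' /boot/config-$(uname -r)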

Comment 10 Chris Murphy 2013-07-14 18:23:33 UTC
(In reply to Zdenek Kabelac from comment #9)
> As for the kernel question - are you using the distro debug kernel, or your
> own build?

Distro debug kernel.

> There are dm thinp driver debug kernel options which are ONLY meant to be
> used by developers; they consume a huge amount of memory for various
> validation data structures. Unless you are a dm thinp developer, avoid
> those options and make sure they are not enabled in your kernel. Otherwise
> you would need either much more physical memory or significantly smaller
> device sizes.
> (So yes, there can be a huge difference between debug and non-debug kernels.)

Yeah, I mean in terms of the quality of the dmesg output for the purposes of this bug. If it's no different, then I'll use the non-debug kernel.

Comment 11 Zdenek Kabelac 2013-07-14 18:31:22 UTC
If there is no big need for snapshots - i.e. the primary purpose is to provision the space - increasing --chunksize at thin pool creation might greatly reduce memory requirements. The default chunk size is 64K, so using 512K (or even bigger) could mean a big speedup in your case; see the sketch below.

(With a lot of snapshots a smaller chunk size is an advantage, since small pieces can be shared - but for mostly space provisioning, larger chunks reduce the memory footprint significantly.)
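For a sense of scale: a 16TiB virtual LV mapped in 64KiB chunks needs 2^44 / 2^16 = 268,435,456 chunk mappings, while 512KiB chunks cut that to 2^25 = 33,554,432. A minimal sketch of recreating the pool with a larger chunk size, reusing the vg1/thinp names from comment 7 (the 512K value is illustrative):

  # thin pool with 512K chunks instead of the 64K default
  lvcreate -l 100%FREE --chunksize 512K -T vg1/thinp
  lvcreate -V 16T -T vg1/thinp -n brick1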

Comment 12 Chris Murphy 2013-07-14 18:49:35 UTC
There's no present need for snapshots. I'm just looking for an alternative to qcow2 files to present large virtual storage to two VM's for glusterfs familiarization. It seems like the host lvm presenting a virtual device to the VM is a bit more efficient than the VM writing btrfs to a qcow2 file on an XFS file system managed by the host.

Comment 13 Zdenek Kabelac 2013-07-14 19:05:09 UTC
Then you should maybe consider even bigger chunk sizes - it just depends on how much provisioning you need, since a write of a single byte obviously provisions a whole chunk. The maximum supported chunk size is 1GiB, but large chunks can have a negative impact when zeroing is enabled at the same time. If you do not need that feature on, then large chunk sizes with zeroing disabled (the default is enabled) will give you maximum speed; see the sketch below.

Upstream will support 'profiles' with some reasonable defaults for typical use-cases.
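A minimal sketch of a pool tuned along these lines for plain provisioning, assuming the same vg1 layout (the 1M chunk size is illustrative; anything up to the 1GiB maximum is allowed):

  # large chunks with zeroing disabled (-Z n; the default is -Z y)
  lvcreate -l 100%FREE --chunksize 1M -Z n -T vg1/thinp
  lvcreate -V 16T -T vg1/thinp -n brick1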

Comment 14 Zdenek Kabelac 2014-11-27 09:00:22 UTC
Is there still anything we can do better here?

Otherwise I think we should close this BZ.

Comment 15 Chris Murphy 2014-11-28 05:05:23 UTC
Uncertain; I haven't recently done significant testing on very large virtual-sized LVs backed by such small amounts of physical storage.

Comment 16 Fedora End Of Life 2015-01-09 22:12:06 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 reached end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to this bug being closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 17 Fedora End Of Life 2015-02-18 13:59:54 UTC
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.