Bug 866994 - tgz-out causes memory leak in guestfsd
Status: NEW
Product: Virtualization Tools
Classification: Community
Component: libguestfs
Assigned To: Richard W.M. Jones
Reported: 2012-10-16 09:54 EDT by Richard W.M. Jones
Modified: 2014-09-07 19:01 EDT (History)

Description Richard W.M. Jones 2012-10-16 09:54:09 EDT
Description of problem:

The log attached below was posted by infernix on IRC.  It apparently
occurs when running a tar-out command on a large ext3 filesystem.
The reporter notes that the tarball reaches 7.5 to 15 GB, while the
filesystem has 63 GB in use.

Version-Release number of selected component (if applicable):

libguestfs 1.18.8

How reproducible:

Fairly reproducible.

Steps to Reproduce:
1. None yet.

Additional info:

libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 59 94 20 2c | 27 57 93 f2 | 29 d2 32 c7 | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 2a 3e 7c c9 | 8f d3 cc 8a | 9a 92 89 79 | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 2d 35 12 50 | e6 2f b7 e0 | 96 dd de 6c | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 43 89 f9 43 | a6 13 6b 50 | 49 14 54 92 | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 40 4a 3c 1e | 30 1b cf 76 | 55 a2 e3 b7 | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | b8 92 26 f4 | b5 7b 98 4a | 22 69 8c fb | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 36 fa 20 6b | 6e 94 06 97 | 99 1b 83 4c | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 30 04 d5 72 | d8 c4 72 85 | 67 f8 50 c5 | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | eb b4 9f bd | 3e 3b e9 74 | b1 99 8d fd | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 21 3d 91 e5 | 70 24 9d c5 | 2a 32 4d 3e | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 31 43 45 b4 | 93 68 25 1d | 75 8c 7d 97 | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 3f cf df af | e5 5b e4 61 | fa 4f 29 0e | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | a0 c9 06 7a | 7c e0 4d 12 | 67 85 cd 94 | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 62 5b 04 b3 | ed ef b9 db | 7e 61 5f 1a | ...
[ 2406.090055] kthreadd invoked oom-killer: gfp_mask=0xd0, order=1, oom_adj=0, oom_score_adj=0
[ 2406.090889] kthreadd cpuset=/ mems_allowed=0
[ 2406.091302] Pid: 2, comm: kthreadd Not tainted 3.5.3 #1
[ 2406.091797] Call Trace:
[ 2406.092045]  [<ffffffff810b41a0>] ? dump_header.isra.11+0x6b/0x187
[ 2406.092631]  [<ffffffff81072491>] ? arch_local_irq_save+0x11/0x17
[ 2406.093204]  [<ffffffff815864ac>] ? _raw_spin_lock_irqsave+0x9/0x25
[ 2406.093796]  [<ffffffff81306d3a>] ? ___ratelimit+0xde/0xec
[ 2406.094312]  [<ffffffff810b43e9>] ? oom_kill_process.part.13.constprop.14+0x4b/0x25e
[ 2406.095054]  [<ffffffff81040898>] ? has_ns_capability_noaudit+0xd/0x14
[ 2406.095665]  [<ffffffff810b4b63>] ? out_of_memory+0x442/0x4a4
[ 2406.096211]  [<ffffffff810b86d0>] ? __alloc_pages_nodemask+0x640/0x7cf
[ 2406.096830]  [<ffffffff81035ab6>] ? copy_process+0x10b/0x1181
[ 2406.097381]  [<ffffffff8105b264>] ? check_preempt_curr+0x57/0x64
[ 2406.097948]  [<ffffffff81036c2f>] ? do_fork+0xe2/0x24a
[ 2406.098437]  [<ffffffff8102b9e4>] ? pvclock_clocksource_read+0x42/0xb2
[ 2406.099054]  [<ffffffff8105f36b>] ? set_next_entity+0x32/0x52
[ 2406.099603]  [<ffffffff81014618>] ? kernel_thread+0x80/0x88
[ 2406.100145]  [<ffffffff8105093f>] ? kthread_freezable_should_stop+0x37/0x37
[ 2406.100808]  [<ffffffff8158d7a0>] ? gs_change+0x13/0x13
[ 2406.101299]  [<ffffffff81050ce9>] ? kthreadd+0xed/0x12c
[ 2406.101790]  [<ffffffff81059646>] ? finish_task_switch+0x89/0xc7
[ 2406.102353]  [<ffffffff8158d7a4>] ? kernel_thread_helper+0x4/0x10
[ 2406.102934]  [<ffffffff81050bfc>] ? tsk_fork_get_node+0x1a/0x1a
[ 2406.103491]  [<ffffffff8158d7a0>] ? gs_change+0x13/0x13
[ 2406.103983] Mem-Info:
[ 2406.104207] Node 0 DMA per-cpu:
[ 2406.104539] CPU    0: hi:    0, btch:   1 usd:   0
[ 2406.104993] Node 0 DMA32 per-cpu:
[ 2406.105326] CPU    0: hi:  186, btch:  31 usd:   0
[ 2406.105781] active_anon:518 inactive_anon:7 isolated_anon:0
[ 2406.105781]  active_file:8 inactive_file:17 isolated_file:0
[ 2406.105781]  unevictable:472 dirty:0 writeback:0 unstable:0
[ 2406.105781]  free:1509 slab_reclaimable:118236 slab_unreclaimable:1089
[ 2406.105781]  mapped:3 shmem:12 pagetables:101 bounce:0
[ 2406.108408] Node 0 DMA free:1988kB min:88kB low:108kB high:132kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:12kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15636kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:13892kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 2406.111833] lowmem_reserve[]: 0 476 476 476
[ 2406.112303] Node 0 DMA32 free:4048kB min:2744kB low:3428kB high:4116kB active_anon:2072kB inactive_anon:28kB active_file:32kB inactive_file:56kB unevictable:1888kB isolated(anon):0kB isolated(file):0kB present:487864kB mlocked:0kB dirty:0kB writeback:0kB mapped:12kB shmem:48kB slab_reclaimable:459052kB slab_unreclaimable:4356kB kernel_stack:328kB pagetables:404kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1982 all_unreclaimable? yes
[ 2406.115942] lowmem_reserve[]: 0 0 0 0
[ 2406.116374] Node 0 DMA: 1*4kB 168*8kB 36*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1988kB
[ 2406.117562] Node 0 DMA32: 674*4kB 7*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 4048kB
[ 2406.118767] 508 total pagecache pages
[ 2406.119115] 0 pages in swap cache
[ 2406.119432] Swap cache stats: add 0, delete 0, find 0/0
[ 2406.119922] Free swap  = 0kB
[ 2406.120215] Total swap = 0kB
[ 2406.121595] 127982 pages RAM
[ 2406.121881] 4579 pages reserved
[ 2406.122180] 117 pages shared
[ 2406.122464] 121777 pages non-shared
[ 2406.122796] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[ 2406.123498] [   71]     0    71     5308       90   0     -17         -1000 udevd
[ 2406.124199] [   99]     0    99     5307       90   0     -17         -1000 udevd
[ 2406.124907] [  100]     0   100     5307       90   0     -17         -1000 udevd
[ 2406.125606] [  154]     0   154    24876       73   0       0             0 guestfsd
[ 2406.126348] [  160]     0   160     4326      130   0       0             0 tar
[ 2406.127032] [  161]     0   161     2146      108   0       0             0 gzip
[ 2406.127719] Out of memory: Kill process 154 (guestfsd) score 0 or sacrifice child
[ 2406.128415] Killed process 154 (guestfsd) total-vm:99504kB, anon-rss:292kB, file-rss:0kB
[ 2406.130070] guestfsd: page allocation failure: order:2, mode:0x40d0
[ 2406.130662] Pid: 154, comm: guestfsd Not tainted 3.5.3 #1
[ 2406.131167] Call Trace:
[ 2406.131407]  [<ffffffff810b5a9f>] ? warn_alloc_failed+0x11a/0x12d
[ 2406.131979]  [<ffffffff81585cd2>] ? _cond_resched+0x7/0x1c
[ 2406.132494]  [<ffffffff810b87b9>] ? __alloc_pages_nodemask+0x729/0x7cf
[ 2406.133110]  [<ffffffff810e23b0>] ? alloc_pages_current+0xc7/0xe4
[ 2406.133685]  [<ffffffff810b53d9>] ? __get_free_pages+0x6/0x34
[ 2406.134222]  [<ffffffff810e76f1>] ? kmalloc_order_trace+0x22/0x56
[ 2406.134798]  [<ffffffff8139d766>] ? port_fops_write+0x163/0x1b7
[ 2406.135357]  [<ffffffff812c2046>] ? security_file_permission+0x15/0x2b
[ 2406.135973]  [<ffffffff810f30e3>] ? vfs_write+0xa2/0xe9
[ 2406.136466]  [<ffffffff810f32c0>] ? sys_write+0x45/0x6b
[ 2406.136967]  [<ffffffff8158c4b9>] ? system_call_fastpath+0x16/0x1b
[ 2406.137546] Mem-Info:
[ 2406.137773] Node 0 DMA per-cpu:
[ 2406.138086] CPU    0: hi:    0, btch:   1 usd:   0
[ 2406.138538] Node 0 DMA32 per-cpu:
[ 2406.138868] CPU    0: hi:  186, btch:  31 usd:   0
[ 2406.139319] active_anon:518 inactive_anon:7 isolated_anon:0
[ 2406.139319]  active_file:8 inactive_file:17 isolated_file:0
[ 2406.139319]  unevictable:472 dirty:0 writeback:0 unstable:0
[ 2406.139319]  free:1509 slab_reclaimable:118236 slab_unreclaimable:1089
[ 2406.139319]  mapped:3 shmem:12 pagetables:101 bounce:0
[ 2406.141957] Node 0 DMA free:1988kB min:88kB low:108kB high:132kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:12kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15636kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:13892kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 2406.145379] lowmem_reserve[]: 0 476 476 476
[ 2406.145846] Node 0 DMA32 free:4048kB min:2744kB low:3428kB high:4116kB active_anon:2072kB inactive_anon:28kB active_file:32kB inactive_file:56kB unevictable:1888kB isolated(anon):0kB isolated(file):0kB present:487864kB mlocked:0kB dirty:0kB writeback:0kB mapped:12kB shmem:48kB slab_reclaimable:459052kB slab_unreclaimable:4356kB kernel_stack:328kB pagetables:404kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1982 all_unreclaimable? yes
[ 2406.149443] lowmem_reserve[]: 0 0 0 0
[ 2406.149865] Node 0 DMA: 1*4kB 168*8kB 36*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1988kB
[ 2406.151060] Node 0 DMA32: 674*4kB 7*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 4048kB
[ 2406.152245] 508 total pagecache pages
[ 2406.152597] 0 pages in swap cache
[ 2406.152911] Swap cache stats: add 0, delete 0, find 0/0
[ 2406.153396] Free swap  = 0kB
[ 2406.153675] Total swap = 0kB
[ 2406.155039] 127982 pages RAM
[ 2406.155327] 4579 pages reserved
[ 2406.155627] 117 pages shared
[ 2406.155901] 121777 pages non-shared
Killed
tar: -: Wrote only 4096 of 10240 bytes
tar: Error is not recoverable: exiting now
/init: 161: /init: reboot: not found
[ 2406.329592] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
[ 2406.329592] 
[ 2406.330467] Pid: 1, comm: init Not tainted 3.5.3 #1
[ 2406.330933] Call Trace:
[ 2406.331178]  [<ffffffff8157c01f>] ? panic+0xc1/0x1c7
[ 2406.331645]  [<ffffffff8103bb46>] ? do_exit+0x397/0x7aa
[ 2406.332137]  [<ffffffff8103c1da>] ? do_group_exit+0x75/0x9f
[ 2406.332665]  [<ffffffff8103c213>] ? sys_exit_group+0xf/0xf
[ 2406.333182]  [<ffffffff8158c4b9>] ? system_call_fastpath+0x16/0x1b
[ 2406.333851] Rebooting in 1 seconds..libguestfs: child_cleanup: 0x209d590: child process died
*stdin*:3: libguestfs: error: receive_file_data: parse error in reply callback
*stdin*:3: libguestfs: error: /tmp/guestfishdebug.tgz: error in chunked encoding
libguestfs: closing guestfs handle 0x209d590 (state 0)
Comment 1 Richard W.M. Jones 2012-10-16 10:42:16 EDT
I attempted to reproduce this using a Fedora 19
guest with an ext4 filesystem (note the reporter
said this only affects ext3 filesystems).

guestfish --ro \
  -a /dev/vg_data/F19Rawhidex64 -m /dev/vg_f18rawhidex64/lv_root \
  sh "ps aux" : \
  tar-out / /dev/null : \
  sh "ps aux"

The guestfsd process grew only modestly (1292 -> 1332 KB).

This test used 1.19.51, so several aspects of my setup
are not directly comparable with what the reporter found.
Comment 2 Richard W.M. Jones 2012-10-16 10:55:59 EDT
The command used was actually tgz-out, not tar-out.
I tried the above reproducer with s/tar-out/tgz-out/
(still with 1.19) but could not reproduce it.  Will
try on 1.18.
Comment 3 Richard W.M. Jones 2012-10-16 11:29:17 EDT
I couldn't reproduce this with ext3 filesystems
using libguestfs 1.16.32.  guestfsd size before
and after:

/tmp/ps1:root       144  0.0  0.2 97332 1264 ?        S    16:23   0:00 guestfsd
/tmp/ps2:root       144  0.0  0.2 97340 1300 ?        S    16:23   0:00 guestfsd

Next I will try 1.18.

Here is my test program.

------------------------
#!/bin/sh -

set -e

img=rhel3x64.img
fs=/dev/sda2
extra=-v

guestfish --ro -a $img -m $fs $extra <<EOF
sh "ps aux" | cat > /tmp/ps1
tgz-out / /dev/null
sh "ps aux" | cat > /tmp/ps2
EOF

grep guestfsd /tmp/ps1 /tmp/ps2
------------------------
Comment 4 Richard W.M. Jones 2012-10-16 11:36:51 EDT
Could not reproduce on 1.18.9 on Fedora either.

Possibilities include:

 - The Debian package that the reporter is using is broken in
   a Debian-specific way.

 - The leak only occurs while the tar command is running.
   (The test script only checks memory usage before and after the
   operation.)
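The second possibility could be tested by sampling guestfsd's resident
set size once per second while the command runs, instead of only before
and after.  A minimal sketch — the `sample_rss` helper is hypothetical,
not part of libguestfs, and assumes a Linux /proc filesystem is visible
wherever the daemon runs:

```shell
# sample_rss PID [INTERVAL] - print "epoch rss_kB" lines once per
# interval until the process exits.  RSS is taken from the VmRSS
# field of /proc/PID/status (reported in kB).
sample_rss() {
    pid=$1
    interval=${2:-1}
    while kill -0 "$pid" 2>/dev/null; do
        # the process can exit between the kill -0 check and this read,
        # so tolerate a missing status file / empty field
        rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
        [ -n "$rss" ] && echo "$(date +%s) $rss"
        sleep "$interval"
    done
    return 0
}
```

Run against the guestfsd PID inside the appliance (or against guestfish
on the host) while tgz-out is in flight, and a leak would show up as a
steadily growing second column.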
Comment 5 Richard W.M. Jones 2012-10-16 11:47:29 EDT
Reporter is using libguestfs_1.18.8-1 from Debian sid:
http://packages.debian.org/source/sid/libguestfs

I notice that tar creates an error file as it goes
along.  If the tar command were spewing error messages,
that might cause some kind of failure, particularly
if /tmp is on tmpfs.
Comment 6 Richard W.M. Jones 2012-10-17 05:32:55 EDT
I couldn't reproduce this on Debian Wheezy + libguestfs 1:1.18.1-1
using an ext4 guest.  Now trying with ext3 guest.
Comment 7 Richard W.M. Jones 2012-10-17 06:20:16 EDT
Nor on an ext3 guest.  The only possibilities I can
see are:

 - problem in the methodology of the test: the leak only occurs
   while tar is running

 - something about the reporter's guest's ext3 filesystem (eg.
   tar prints lot of warnings?)
Comment 8 infernix 2012-10-17 21:46:39 EDT
I have two entirely different ext3 filesystems on which I can reproduce this when using 512 MB of RAM for the libguestfs VM.  The only common factor is that they generate relatively large tgz files (27.5 GB and 29.4 GB).

Have you tested with really large filesystems like that?

Also, I run a 'debug sh "fsck.ext3 -fy $PARTITION 1>/dev/null 2>/devnull; echo $?"' in a separate libguestfs process and check its return code before I do the tgz backup, so it is a filesystem for which the fsck return code is less than 4.
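For reference, fsck's exit status is a bit mask (0 = no errors, 1 =
errors corrected, 2 = system should be rebooted, 4 = errors left
uncorrected, 8 = operational error), so the "less than 4" check accepts
clean filesystems and ones whose errors were all corrected.  A tiny
hypothetical helper expressing that gate — the function name is
illustrative, not from the reporter's script:

```shell
# fsck_is_clean_enough STATUS - succeed when an fsck exit status
# indicates no errors were left uncorrected (0 = clean, 1 = errors
# corrected, 2 = corrected but reboot advised; 4 and up mean trouble).
fsck_is_clean_enough() {
    [ "$1" -lt 4 ]
}
```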
Comment 9 Richard W.M. Jones 2012-12-11 17:46:59 EST
I'm afraid I still can't reproduce this on Fedora, so I'm
scrubbing it for 1.20.  (I have not yet tried a very large filesystem.)
However I will leave the bug open so we can look at it again
in a future release.
