Description of problem:

This log attached below was posted by infernix on IRC. It apparently appears when running a tar-out command on a large ext3 filesystem. The reporter notes that the tarball reaches 7.5-15 GB, and the filesystem has 63 GB used.

Version-Release number of selected component (if applicable):
libguestfs 1.18.8

How reproducible:
Fairly reproducible.

Steps to Reproduce:
1. None yet.

Additional info:

libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 59 94 20 2c | 27 57 93 f2 | 29 d2 32 c7 | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 2a 3e 7c c9 | 8f d3 cc 8a | 9a 92 89 79 | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 2d 35 12 50 | e6 2f b7 e0 | 96 dd de 6c | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 43 89 f9 43 | a6 13 6b 50 | 49 14 54 92 | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 40 4a 3c 1e | 30 1b cf 76 | 55 a2 e3 b7 | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | b8 92 26 f4 | b5 7b 98 4a | 22 69 8c fb | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 36 fa 20 6b | 6e 94 06 97 | 99 1b 83 4c | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 30 04 d5 72 | d8 c4 72 85 | 67 f8 50 c5 | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | eb b4 9f bd | 3e 3b e9 74 | b1 99 8d fd | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 21 3d 91 e5 | 70 24 9d c5 | 2a 32 4d 3e | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 31 43 45 b4 | 93 68 25 1d | 75 8c 7d 97 | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 3f cf df af | e5 5b e4 61 | fa 4f 29 0e | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | a0 c9 06 7a | 7c e0 4d 12 | 67 85 cd 94 | ...
libguestfs: recv_from_daemon: 8200 bytes: 00 00 00 00 | 00 00 20 00 | 62 5b 04 b3 | ed ef b9 db | 7e 61 5f 1a | ...
[ 2406.090055] kthreadd invoked oom-killer: gfp_mask=0xd0, order=1, oom_adj=0, oom_score_adj=0
[ 2406.090889] kthreadd cpuset=/ mems_allowed=0
[ 2406.091302] Pid: 2, comm: kthreadd Not tainted 3.5.3 #1
[ 2406.091797] Call Trace:
[ 2406.092045] [<ffffffff810b41a0>] ? dump_header.isra.11+0x6b/0x187
[ 2406.092631] [<ffffffff81072491>] ? arch_local_irq_save+0x11/0x17
[ 2406.093204] [<ffffffff815864ac>] ? _raw_spin_lock_irqsave+0x9/0x25
[ 2406.093796] [<ffffffff81306d3a>] ? ___ratelimit+0xde/0xec
[ 2406.094312] [<ffffffff810b43e9>] ? oom_kill_process.part.13.constprop.14+0x4b/0x25e
[ 2406.095054] [<ffffffff81040898>] ? has_ns_capability_noaudit+0xd/0x14
[ 2406.095665] [<ffffffff810b4b63>] ? out_of_memory+0x442/0x4a4
[ 2406.096211] [<ffffffff810b86d0>] ? __alloc_pages_nodemask+0x640/0x7cf
[ 2406.096830] [<ffffffff81035ab6>] ? copy_process+0x10b/0x1181
[ 2406.097381] [<ffffffff8105b264>] ? check_preempt_curr+0x57/0x64
[ 2406.097948] [<ffffffff81036c2f>] ? do_fork+0xe2/0x24a
[ 2406.098437] [<ffffffff8102b9e4>] ? pvclock_clocksource_read+0x42/0xb2
[ 2406.099054] [<ffffffff8105f36b>] ? set_next_entity+0x32/0x52
[ 2406.099603] [<ffffffff81014618>] ? kernel_thread+0x80/0x88
[ 2406.100145] [<ffffffff8105093f>] ? kthread_freezable_should_stop+0x37/0x37
[ 2406.100808] [<ffffffff8158d7a0>] ? gs_change+0x13/0x13
[ 2406.101299] [<ffffffff81050ce9>] ? kthreadd+0xed/0x12c
[ 2406.101790] [<ffffffff81059646>] ? finish_task_switch+0x89/0xc7
[ 2406.102353] [<ffffffff8158d7a4>] ? kernel_thread_helper+0x4/0x10
[ 2406.102934] [<ffffffff81050bfc>] ? tsk_fork_get_node+0x1a/0x1a
[ 2406.103491] [<ffffffff8158d7a0>] ? gs_change+0x13/0x13
[ 2406.103983] Mem-Info:
[ 2406.104207] Node 0 DMA per-cpu:
[ 2406.104539] CPU    0: hi:    0, btch:   1 usd:   0
[ 2406.104993] Node 0 DMA32 per-cpu:
[ 2406.105326] CPU    0: hi:  186, btch:  31 usd:   0
[ 2406.105781] active_anon:518 inactive_anon:7 isolated_anon:0
[ 2406.105781]  active_file:8 inactive_file:17 isolated_file:0
[ 2406.105781]  unevictable:472 dirty:0 writeback:0 unstable:0
[ 2406.105781]  free:1509 slab_reclaimable:118236 slab_unreclaimable:1089
[ 2406.105781]  mapped:3 shmem:12 pagetables:101 bounce:0
[ 2406.108408] Node 0 DMA free:1988kB min:88kB low:108kB high:132kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:12kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15636kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:13892kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 2406.111833] lowmem_reserve[]: 0 476 476 476
[ 2406.112303] Node 0 DMA32 free:4048kB min:2744kB low:3428kB high:4116kB active_anon:2072kB inactive_anon:28kB active_file:32kB inactive_file:56kB unevictable:1888kB isolated(anon):0kB isolated(file):0kB present:487864kB mlocked:0kB dirty:0kB writeback:0kB mapped:12kB shmem:48kB slab_reclaimable:459052kB slab_unreclaimable:4356kB kernel_stack:328kB pagetables:404kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1982 all_unreclaimable? yes
[ 2406.115942] lowmem_reserve[]: 0 0 0 0
[ 2406.116374] Node 0 DMA: 1*4kB 168*8kB 36*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1988kB
[ 2406.117562] Node 0 DMA32: 674*4kB 7*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 4048kB
[ 2406.118767] 508 total pagecache pages
[ 2406.119115] 0 pages in swap cache
[ 2406.119432] Swap cache stats: add 0, delete 0, find 0/0
[ 2406.119922] Free swap  = 0kB
[ 2406.120215] Total swap = 0kB
[ 2406.121595] 127982 pages RAM
[ 2406.121881] 4579 pages reserved
[ 2406.122180] 117 pages shared
[ 2406.122464] 121777 pages non-shared
[ 2406.122796] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[ 2406.123498] [   71]     0    71     5308       90   0     -17         -1000 udevd
[ 2406.124199] [   99]     0    99     5307       90   0     -17         -1000 udevd
[ 2406.124907] [  100]     0   100     5307       90   0     -17         -1000 udevd
[ 2406.125606] [  154]     0   154    24876       73   0       0             0 guestfsd
[ 2406.126348] [  160]     0   160     4326      130   0       0             0 tar
[ 2406.127032] [  161]     0   161     2146      108   0       0             0 gzip
[ 2406.127719] Out of memory: Kill process 154 (guestfsd) score 0 or sacrifice child
[ 2406.128415] Killed process 154 (guestfsd) total-vm:99504kB, anon-rss:292kB, file-rss:0kB
[ 2406.130070] guestfsd: page allocation failure: order:2, mode:0x40d0
[ 2406.130662] Pid: 154, comm: guestfsd Not tainted 3.5.3 #1
[ 2406.131167] Call Trace:
[ 2406.131407] [<ffffffff810b5a9f>] ? warn_alloc_failed+0x11a/0x12d
[ 2406.131979] [<ffffffff81585cd2>] ? _cond_resched+0x7/0x1c
[ 2406.132494] [<ffffffff810b87b9>] ? __alloc_pages_nodemask+0x729/0x7cf
[ 2406.133110] [<ffffffff810e23b0>] ? alloc_pages_current+0xc7/0xe4
[ 2406.133685] [<ffffffff810b53d9>] ? __get_free_pages+0x6/0x34
[ 2406.134222] [<ffffffff810e76f1>] ? kmalloc_order_trace+0x22/0x56
[ 2406.134798] [<ffffffff8139d766>] ? port_fops_write+0x163/0x1b7
[ 2406.135357] [<ffffffff812c2046>] ? security_file_permission+0x15/0x2b
[ 2406.135973] [<ffffffff810f30e3>] ? vfs_write+0xa2/0xe9
[ 2406.136466] [<ffffffff810f32c0>] ? sys_write+0x45/0x6b
[ 2406.136967] [<ffffffff8158c4b9>] ? system_call_fastpath+0x16/0x1b
[ 2406.137546] Mem-Info:
[ 2406.137773] Node 0 DMA per-cpu:
[ 2406.138086] CPU    0: hi:    0, btch:   1 usd:   0
[ 2406.138538] Node 0 DMA32 per-cpu:
[ 2406.138868] CPU    0: hi:  186, btch:  31 usd:   0
[ 2406.139319] active_anon:518 inactive_anon:7 isolated_anon:0
[ 2406.139319]  active_file:8 inactive_file:17 isolated_file:0
[ 2406.139319]  unevictable:472 dirty:0 writeback:0 unstable:0
[ 2406.139319]  free:1509 slab_reclaimable:118236 slab_unreclaimable:1089
[ 2406.139319]  mapped:3 shmem:12 pagetables:101 bounce:0
[ 2406.141957] Node 0 DMA free:1988kB min:88kB low:108kB high:132kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:12kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15636kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:13892kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 2406.145379] lowmem_reserve[]: 0 476 476 476
[ 2406.145846] Node 0 DMA32 free:4048kB min:2744kB low:3428kB high:4116kB active_anon:2072kB inactive_anon:28kB active_file:32kB inactive_file:56kB unevictable:1888kB isolated(anon):0kB isolated(file):0kB present:487864kB mlocked:0kB dirty:0kB writeback:0kB mapped:12kB shmem:48kB slab_reclaimable:459052kB slab_unreclaimable:4356kB kernel_stack:328kB pagetables:404kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1982 all_unreclaimable? yes
[ 2406.149443] lowmem_reserve[]: 0 0 0 0
[ 2406.149865] Node 0 DMA: 1*4kB 168*8kB 36*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1988kB
[ 2406.151060] Node 0 DMA32: 674*4kB 7*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 4048kB
[ 2406.152245] 508 total pagecache pages
[ 2406.152597] 0 pages in swap cache
[ 2406.152911] Swap cache stats: add 0, delete 0, find 0/0
[ 2406.153396] Free swap  = 0kB
[ 2406.153675] Total swap = 0kB
[ 2406.155039] 127982 pages RAM
[ 2406.155327] 4579 pages reserved
[ 2406.155627] 117 pages shared
[ 2406.155901] 121777 pages non-shared
Killed
tar: -: Wrote only 4096 of 10240 bytes
tar: Error is not recoverable: exiting now
/init: 161: /init: reboot: not found
[ 2406.329592] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
[ 2406.329592]
[ 2406.330467] Pid: 1, comm: init Not tainted 3.5.3 #1
[ 2406.330933] Call Trace:
[ 2406.331178] [<ffffffff8157c01f>] ? panic+0xc1/0x1c7
[ 2406.331645] [<ffffffff8103bb46>] ? do_exit+0x397/0x7aa
[ 2406.332137] [<ffffffff8103c1da>] ? do_group_exit+0x75/0x9f
[ 2406.332665] [<ffffffff8103c213>] ? sys_exit_group+0xf/0xf
[ 2406.333182] [<ffffffff8158c4b9>] ? system_call_fastpath+0x16/0x1b
[ 2406.333851] Rebooting in 1 seconds..
libguestfs: child_cleanup: 0x209d590: child process died
*stdin*:3: libguestfs: error: receive_file_data: parse error in reply callback
*stdin*:3: libguestfs: error: /tmp/guestfishdebug.tgz: error in chunked encoding
libguestfs: closing guestfs handle 0x209d590 (state 0)
I attempted to reproduce this using a Fedora 19 guest with an ext4 filesystem (note the reporter said this only affects ext3 filesystems):

guestfish --ro \
  -a /dev/vg_data/F19Rawhidex64 -m /dev/vg_f18rawhidex64/lv_root \
  sh "ps aux" : \
  tar-out / /dev/null : \
  sh "ps aux"

The guestfsd process grew only modestly (1292 -> 1332 KB). This was with 1.19.51, so several aspects of my test are not comparable with what the reporter saw.
The command used was actually tgz-out, not tar-out. I tried the above reproducer with s/tar-out/tgz-out/ (still with 1.19) but could not reproduce it. Will try on 1.18.
I couldn't reproduce this with ext3 filesystems using libguestfs 1.16.32. guestfsd size before and after:

/tmp/ps1:root 144 0.0 0.2 97332 1264 ? S 16:23 0:00 guestfsd
/tmp/ps2:root 144 0.0 0.2 97340 1300 ? S 16:23 0:00 guestfsd

Next I will try 1.18. Here is my test program:

------------------------
#!/bin/sh -
set -e
img=rhel3x64.img
fs=/dev/sda2
extra=-v
guestfish --ro -a $img -m $fs $extra <<EOF
sh "ps aux" | cat > /tmp/ps1
tgz-out / /dev/null
sh "ps aux" | cat > /tmp/ps2
EOF
grep guestfsd /tmp/ps1 /tmp/ps2
------------------------
Could not reproduce on 1.18.9 on Fedora either. Possibilities include:

- The Debian package that the reporter is using is broken in some Debian-specific way.
- The leak only occurs while the tar command is running. (The test script only checks memory usage before and after.)
Reporter is using libguestfs_1.18.8-1 from Debian sid: http://packages.debian.org/source/sid/libguestfs I notice that tar creates an error file as it goes along. If the tar command were spewing error messages, that might cause some kind of failure, particularly if /tmp is on tmpfs.
I couldn't reproduce this on Debian Wheezy + libguestfs 1:1.18.1-1 using an ext4 guest. Now trying with an ext3 guest.
Nor on an ext3 guest. The only possibilities I can see are:

- a problem in the methodology of the test: the leak only occurs while tar is running
- something about the reporter's guest's ext3 filesystem (e.g. tar prints lots of warnings?)
I have two entirely different ext3 filesystems on which I can reproduce this when using 512 MB of RAM for the libguestfs VM. The only common factor is that they generate relatively large tgz files (27.5 GB and 29.4 GB). Have you tested with really large filesystems like that?

Also, before I do the tgz backup I run

debug sh "fsck.ext3 -fy $PARTITION 1>/dev/null 2>/dev/null; echo $?"

in a separate libguestfs process and check its return code, so it is a filesystem for which the fsck return code is less than 4.
I'm afraid I still can't reproduce this on Fedora, so I'm scrubbing it for 1.20. (I have not yet tried a very large filesystem.) However I will leave the bug open so we can look at it again in a future release.