Description of problem:

In order to find out why the GlusterFS FUSE client leaks memory, I would like to use Valgrind's Massif tool (because Memcheck does not show any reasonable leaks). So, I install the GlusterFS packages + debug packages and run the following:

===
valgrind --tool=massif --smc-check=all --trace-children=yes --sim-hints=fuse-compatible /usr/sbin/glusterfs -N --volfile-server=glusterfs.example.com --volfile-id=some_volume /mnt/net/glusterfs/test
===

This command produces instant output:

===
==25482== Massif, a heap profiler
==25482== Copyright (C) 2003-2013, and GNU GPL'd, by Nicholas Nethercote
==25482== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==25482== Command: /usr/sbin/glusterfs -N --volfile-server=glusterfs.la.net.ua --volfile-id=mail_boxes /mnt/net/glusterfs/test
==25482==
==25483==
==25484==
===

Immediately after this I also get 2 files generated by Valgrind:

===
-rw------- 1 root root  20K вер  6 22:17 massif.out.25483
-rw------- 1 root root 9.1K вер  6 22:17 massif.out.25484
===

(both files are attached)

Then I start to manipulate files within the mounted volume, provoking the memory leak. After dancing around and assuming I see the memory leaking in top/htop output, I finally decide to unmount the volume to get my memory profile:

===
umount /mnt/net/glusterfs/test
===

Right after this command is executed, Valgrind shows me the following:

===
valgrind: m_mallocfree.c:304 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
valgrind: Heap block lo/hi size mismatch: lo = 1, hi = 0.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata.  If you fix any
invalid writes reported by Memcheck, this assertion failure will
probably go away.  Please try that before reporting this as a bug.

host stacktrace:
==25482==    at 0x3802FC56: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x3802FD84: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x3802FF06: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x3803D5E1: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x3807F6C5: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x380349EF: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x38034D53: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x3808E2D4: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x3808E55A: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x380B5B0D: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0xDEADBEEFDEADBEEE: ???
==25482==    by 0xDEADBEEFDEADBEEE: ???
==25482==    by 0xDEADBEEFDEADBEEE: ???

sched status:
  running_tid=3

Thread 3: status = VgTs_Runnable
==25482==    at 0x4C29037: free (in /usr/lib64/valgrind/vgpreload_massif-amd64-linux.so)
==25482==    by 0x67CE63B: __libc_freeres (in /usr/lib64/libc-2.17.so)
==25482==    by 0x4A246B4: _vgnU_freeres (in /usr/lib64/valgrind/vgpreload_core-amd64-linux.so)
==25482==    by 0x66A2E2A: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==25482==    by 0x66A2EB4: exit (in /usr/lib64/libc-2.17.so)
==25482==    by 0x1117E9: cleanup_and_exit (glusterfsd.c:1308)
==25482==    by 0x111914: glusterfs_sigwaiter (glusterfsd.c:2029)
==25482==    by 0x606DDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==25482==    by 0x6760CEC: clone (in /usr/lib64/libc-2.17.so)

Note: see also the FAQ in the source distribution.  It contains
workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance that fixing
those problems will prevent Valgrind aborting or crashing, especially if
it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org
In the bug report, send all the above text, the valgrind version, and
what OS and version you are using.  Thanks.
===

I clearly see something going wrong within PID 25482. Okay, let's check Valgrind's output:

===
-rw------- 1 root root  20K вер  6 22:17 massif.out.25483
-rw------- 1 root root 9.1K вер  6 22:17 massif.out.25484
===

No changes! The files did not get updated, and no output appeared for the misbehaving PID 25482. I see the 0xDEADBEEFDEADBEEE pattern in Valgrind's output, which means some memory gets corrupted. Okay, let's re-run Valgrind with the Memcheck tool, because that is what the output above suggests:

===
valgrind --leak-check=full --show-leak-kinds=all --log-file="valgrind_fuse.log" /usr/sbin/glusterfs -N --volfile-server=glusterfs.example.com --volfile-id=some_volume /mnt/net/glusterfs/test
===

valgrind_fuse.log is attached as well. I've noticed the following warnings/errors there for the main PID:

===
==26441== Thread 7:
==26441== Syscall param writev(vector[...]) points to uninitialised byte(s)
==26441==    at 0x675FEA0: writev (in /usr/lib64/libc-2.17.so)
==26441==    by 0xE664795: send_fuse_iov (fuse-bridge.c:158)
==26441==    by 0xE6649B9: send_fuse_data (fuse-bridge.c:197)
==26441==    by 0xE666F7A: fuse_attr_cbk (fuse-bridge.c:753)
==26441==    by 0xE6671A6: fuse_root_lookup_cbk (fuse-bridge.c:783)
==26441==    by 0x1451A937: io_stats_lookup_cbk (io-stats.c:1512)
==26441==    by 0x14301B3E: mdc_lookup_cbk (md-cache.c:867)
==26441==    by 0x13EEA226: qr_lookup_cbk (quick-read.c:446)
==26441==    by 0x13CD9B66: ioc_lookup_cbk (io-cache.c:260)
==26441==    by 0x1346515D: dht_revalidate_cbk (dht-common.c:985)
==26441==    by 0x1320F0F0: afr_discover_done (afr-common.c:2429)
==26441==    by 0x1320F0F0: afr_discover_cbk (afr-common.c:2474)
==26441==    by 0x12F9B6F8: client3_3_lookup_cbk (client-rpc-fops.c:2988)
==26441==  Address 0x168b538c is on thread 7's stack
==26441==  in frame #3, created by fuse_attr_cbk (fuse-bridge.c:723)
==26441==
==26441== Warning: invalid file descriptor -1 in syscall close()
==26441== Thread 3:
==26441== Invalid free() / delete / delete[] / realloc()
==26441==    at 0x4C2AD17: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==26441==    by 0x67D663B: __libc_freeres (in /usr/lib64/libc-2.17.so)
==26441==    by 0x4A246B4: _vgnU_freeres (in /usr/lib64/valgrind/vgpreload_core-amd64-linux.so)
==26441==    by 0x66AAE2A: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==26441==    by 0x66AAEB4: exit (in /usr/lib64/libc-2.17.so)
==26441==    by 0x1117E9: cleanup_and_exit (glusterfsd.c:1308)
==26441==    by 0x111914: glusterfs_sigwaiter (glusterfsd.c:2029)
==26441==    by 0x6075DC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==26441==    by 0x6768CEC: clone (in /usr/lib64/libc-2.17.so)
==26441==  Address 0x6a2d3d0 is 0 bytes inside data symbol "noai6ai_cached"
===

Could this be the reason for Massif to fail? (A possible follow-up run is sketched at the end of this comment.)

Version-Release number of selected component (if applicable):

GlusterFS 3.7.15, CentOS 7.2.

How reproducible:

Always.

Steps to Reproduce:

See above.

Actual results:

Massif tool does not provide reasonable output.

Expected results:

I want my memory to be profiled.

Additional info:

Feel free to ask me for any additional info.
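(Not part of the original report, just a sketch of a possible follow-up run: Memcheck's --track-origins=yes option can usually show where the uninitialised bytes reported for the writev() call above were created, at the cost of slower execution. The volfile server, volume name, and log file name below are placeholders, not values from the report.)

===
valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --log-file="valgrind_fuse_origins.log" /usr/sbin/glusterfs -N --volfile-server=glusterfs.example.com --volfile-id=some_volume /mnt/net/glusterfs/test
===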
Created attachment 1198408 [details] massif.out.25483
Created attachment 1198409 [details] massif.out.25484
Created attachment 1198410 [details] Memcheck output
Oleksandr,

Seems like it could be a problem with libc, as per the following:
http://valgrind.org/docs/manual/manual-core.html#manual-core.rareopts

Maybe you should run with '--run-libc-freeres=no'.

I also checked the other error, the one about the uninitialised parameter bytes. I looked at the kernel code, and the 'dummy' variable inside the structure is not used at all. Maybe it is reserved for adding extra functionality later without changing the structure layout/alignment, but I am not sure.

Pranith
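(A sketch of how the original Massif command might look with that option added; --run-libc-freeres is a core Valgrind option, so it should combine with --tool=massif, and the volfile server/volume below are the placeholders from the report:)

===
valgrind --tool=massif --smc-check=all --trace-children=yes --sim-hints=fuse-compatible --run-libc-freeres=no /usr/sbin/glusterfs -N --volfile-server=glusterfs.example.com --volfile-id=some_volume /mnt/net/glusterfs/test
===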
Yay, nice catch, Pranith! Seems to work now, thanks.