Bug 1373630 - Unable to profile GlusterFS FUSE client with Valgrind's Massif tool
Summary: Unable to profile GlusterFS FUSE client with Valgrind's Massif tool
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: fuse
Version: 3.7.15
Hardware: x86_64
OS: Linux
unspecified
low
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1372729
TreeView+ depends on / blocked
 
Reported: 2016-09-06 19:28 UTC by Oleksandr Natalenko
Modified: 2016-09-06 20:41 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-06 20:41:57 UTC
Regression: ---
Mount Type: fuse
Documentation: ---
CRM:
Verified Versions:


Attachments (Terms of Use)
massif.out.25483 (19.89 KB, text/plain)
2016-09-06 19:30 UTC, Oleksandr Natalenko
no flags Details
massif.out.25484 (9.01 KB, text/plain)
2016-09-06 19:31 UTC, Oleksandr Natalenko
no flags Details
Memcheck output (575.04 KB, text/plain)
2016-09-06 19:31 UTC, Oleksandr Natalenko
no flags Details

Description Oleksandr Natalenko 2016-09-06 19:28:32 UTC
Description of problem:

In order to find out why GlusterFS FUSE client leaks I would like to use Valgrind's Massif tool (because Memcheck does not show any reasonable leaks). So, I install GlusterFS packages + debug packages and run the following:

===
valgrind --tool=massif --smc-check=all --trace-children=yes --sim-hints=fuse-compatible /usr/sbin/glusterfs -N --volfile-server=glusterfs.example.com --volfile-id=some_volume /mnt/net/glusterfs/test
===

This command produces instant output:

===
==25482== Massif, a heap profiler
==25482== Copyright (C) 2003-2013, and GNU GPL'd, by Nicholas Nethercote
==25482== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==25482== Command: /usr/sbin/glusterfs -N --volfile-server=glusterfs.la.net.ua --volfile-id=mail_boxes /mnt/net/glusterfs/test
==25482== 
==25483== 
==25484==
===

Immediately after this I also get 2 files generated by Valgrind:

===
-rw------- 1 root root  20K вер  6 22:17 massif.out.25483
-rw------- 1 root root 9.1K вер  6 22:17 massif.out.25484
===

(both files are attached).

Then I start to manipulate files within mounted volume provoking memory to leak. After dancing around and assuming I see memory leaking in top/htop output, I finally decide to unmount volume to get my memory profile:

===
umount /mnt/net/glusterfs/test
===

Right after this command is executed, Valgrind shows me the following:

===
valgrind: m_mallocfree.c:304 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.                                                 valgrind: Heap block lo/hi size mismatch: lo = 1, hi = 0.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata.  If you fix any
invalid writes reported by Memcheck, this assertion failure will
probably go away.  Please try that before reporting this as a bug.                                                                    

host stacktrace:
==25482==    at 0x3802FC56: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x3802FD84: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x3802FF06: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x3803D5E1: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x3807F6C5: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x380349EF: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x38034D53: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x3808E2D4: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x3808E55A: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x380B5B0D: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0xDEADBEEFDEADBEEE: ???
==25482==    by 0xDEADBEEFDEADBEEE: ???
==25482==    by 0xDEADBEEFDEADBEEE: ???

sched status:                                                                                                                           running_tid=3

Thread 3: status = VgTs_Runnable
==25482==    at 0x4C29037: free (in /usr/lib64/valgrind/vgpreload_massif-amd64-linux.so)
==25482==    by 0x67CE63B: __libc_freeres (in /usr/lib64/libc-2.17.so)
==25482==    by 0x4A246B4: _vgnU_freeres (in /usr/lib64/valgrind/vgpreload_core-amd64-linux.so)
==25482==    by 0x66A2E2A: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==25482==    by 0x66A2EB4: exit (in /usr/lib64/libc-2.17.so)
==25482==    by 0x1117E9: cleanup_and_exit (glusterfsd.c:1308)
==25482==    by 0x111914: glusterfs_sigwaiter (glusterfsd.c:2029)
==25482==    by 0x606DDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==25482==    by 0x6760CEC: clone (in /usr/lib64/libc-2.17.so)


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.
===

I clearly see some crap happening within 25482 PID. Okay, lets check Valgrind's output:

===
-rw------- 1 root root  20K вер  6 22:17 massif.out.25483
-rw------- 1 root root 9.1K вер  6 22:17 massif.out.25484
===

No changes! Files didn't get updated, and output for misbehaving PID 25482 did not appear!

I see 0xDEADBEEFDEADBEEE pattern in Valgrind's output, and that means some memory gets corrupted.

Okay, let's re-run Valgrind with Memcheck tool, because this is what output above suggests:

===
valgrind --leak-check=full --show-leak-kinds=all --log-file="valgrind_fuse.log" /usr/sbin/glusterfs -N --volfile-server=glusterfs.example.com --volfile-id=some_volume /mnt/net/glusterfs/test
===

valgrind_fuse.log is attached as well. I've noticed there the following warnings/errors for main PID:

===
==26441== Thread 7:
==26441== Syscall param writev(vector[...]) points to uninitialised byte(s)
==26441==    at 0x675FEA0: writev (in /usr/lib64/libc-2.17.so)
==26441==    by 0xE664795: send_fuse_iov (fuse-bridge.c:158)
==26441==    by 0xE6649B9: send_fuse_data (fuse-bridge.c:197)
==26441==    by 0xE666F7A: fuse_attr_cbk (fuse-bridge.c:753)
==26441==    by 0xE6671A6: fuse_root_lookup_cbk (fuse-bridge.c:783)
==26441==    by 0x1451A937: io_stats_lookup_cbk (io-stats.c:1512)
==26441==    by 0x14301B3E: mdc_lookup_cbk (md-cache.c:867)
==26441==    by 0x13EEA226: qr_lookup_cbk (quick-read.c:446)
==26441==    by 0x13CD9B66: ioc_lookup_cbk (io-cache.c:260)
==26441==    by 0x1346515D: dht_revalidate_cbk (dht-common.c:985)
==26441==    by 0x1320F0F0: afr_discover_done (afr-common.c:2429)
==26441==    by 0x1320F0F0: afr_discover_cbk (afr-common.c:2474)
==26441==    by 0x12F9B6F8: client3_3_lookup_cbk (client-rpc-fops.c:2988)
==26441==  Address 0x168b538c is on thread 7's stack
==26441==  in frame #3, created by fuse_attr_cbk (fuse-bridge.c:723)
==26441== 
==26441== Warning: invalid file descriptor -1 in syscall close()
==26441== Thread 3:
==26441== Invalid free() / delete / delete[] / realloc()
==26441==    at 0x4C2AD17: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==26441==    by 0x67D663B: __libc_freeres (in /usr/lib64/libc-2.17.so)
==26441==    by 0x4A246B4: _vgnU_freeres (in /usr/lib64/valgrind/vgpreload_core-amd64-linux.so)
==26441==    by 0x66AAE2A: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==26441==    by 0x66AAEB4: exit (in /usr/lib64/libc-2.17.so)
==26441==    by 0x1117E9: cleanup_and_exit (glusterfsd.c:1308)
==26441==    by 0x111914: glusterfs_sigwaiter (glusterfsd.c:2029)
==26441==    by 0x6075DC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==26441==    by 0x6768CEC: clone (in /usr/lib64/libc-2.17.so)
==26441==  Address 0x6a2d3d0 is 0 bytes inside data symbol "noai6ai_cached"
===

Could this be the reason for Massif to fail?

Version-Release number of selected component (if applicable):

GlusterFS 3.7.15, CentOS 7.2.

How reproducible:

Always.

Steps to Reproduce:

See above.

Actual results:

Massif tool does not provide reasonable output.

Expected results:

I want my memory to be profiled.

Additional info:

Feel free to ask me for any additional info.

Comment 1 Oleksandr Natalenko 2016-09-06 19:30:47 UTC
Created attachment 1198408 [details]
massif.out.25483

Comment 2 Oleksandr Natalenko 2016-09-06 19:31:08 UTC
Created attachment 1198409 [details]
massif.out.25484

Comment 3 Oleksandr Natalenko 2016-09-06 19:31:28 UTC
Created attachment 1198410 [details]
Memcheck output

Comment 4 Pranith Kumar K 2016-09-06 20:19:55 UTC
Oleksandr,
   Seems like it could be problem with libc as per the following:
http://valgrind.org/docs/manual/manual-core.html#manual-core.rareopts

May be you should run with: '--run-libc-freeres=no'

I checked the other parameter not initialized error. I checked kernel code and the 'dummy' variable inside the strucutre is not used at all. May be it is used for either giving extra functionality without changing structures/alignment, not sure.

Pranith

Comment 5 Oleksandr Natalenko 2016-09-06 20:41:57 UTC
Yay, nice catch, Pranith! Seems to work now, thanks.


Note You need to log in before you can comment on or make changes to this bug.