1373630 – Unable to profile GlusterFS FUSE client with Valgrind's Massif tool

Bug 1373630 - Unable to profile GlusterFS FUSE client with Valgrind's Massif tool

Summary: Unable to profile GlusterFS FUSE client with Valgrind's Massif tool

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	fuse
Sub Component:
Version:	3.7.15
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	low
Target Milestone:	---
Assignee:	bugs@gluster.org
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1372729
TreeView+	depends on / blocked

Reported:	2016-09-06 19:28 UTC by Oleksandr Natalenko
Modified:	2016-09-06 20:41 UTC (History)
CC List:	3 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2016-09-06 20:41:57 UTC
Regression:	---
Mount Type:	fuse
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
massif.out.25483 (19.89 KB, text/plain) 2016-09-06 19:30 UTC, Oleksandr Natalenko	no flags	Details
massif.out.25484 (9.01 KB, text/plain) 2016-09-06 19:31 UTC, Oleksandr Natalenko	no flags	Details
Memcheck output (575.04 KB, text/plain) 2016-09-06 19:31 UTC, Oleksandr Natalenko	no flags	Details
View All

Description Oleksandr Natalenko 2016-09-06 19:28:32 UTC

Description of problem:

In order to find out why GlusterFS FUSE client leaks I would like to use Valgrind's Massif tool (because Memcheck does not show any reasonable leaks). So, I install GlusterFS packages + debug packages and run the following:

===
valgrind --tool=massif --smc-check=all --trace-children=yes --sim-hints=fuse-compatible /usr/sbin/glusterfs -N --volfile-server=glusterfs.example.com --volfile-id=some_volume /mnt/net/glusterfs/test
===

This command produces instant output:

===
==25482== Massif, a heap profiler
==25482== Copyright (C) 2003-2013, and GNU GPL'd, by Nicholas Nethercote
==25482== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==25482== Command: /usr/sbin/glusterfs -N --volfile-server=glusterfs.la.net.ua --volfile-id=mail_boxes /mnt/net/glusterfs/test
==25482== 
==25483== 
==25484==
===

Immediately after this I also get 2 files generated by Valgrind:

===
-rw------- 1 root root  20K вер  6 22:17 massif.out.25483
-rw------- 1 root root 9.1K вер  6 22:17 massif.out.25484
===

(both files are attached).

Then I start to manipulate files within mounted volume provoking memory to leak. After dancing around and assuming I see memory leaking in top/htop output, I finally decide to unmount volume to get my memory profile:

===
umount /mnt/net/glusterfs/test
===

Right after this command is executed, Valgrind shows me the following:

===
valgrind: m_mallocfree.c:304 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.                                                 valgrind: Heap block lo/hi size mismatch: lo = 1, hi = 0.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata.  If you fix any
invalid writes reported by Memcheck, this assertion failure will
probably go away.  Please try that before reporting this as a bug.                                                                    

host stacktrace:
==25482==    at 0x3802FC56: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x3802FD84: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x3802FF06: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x3803D5E1: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x3807F6C5: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x380349EF: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x38034D53: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x3808E2D4: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x3808E55A: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0x380B5B0D: ??? (in /usr/lib64/valgrind/massif-amd64-linux)
==25482==    by 0xDEADBEEFDEADBEEE: ???
==25482==    by 0xDEADBEEFDEADBEEE: ???
==25482==    by 0xDEADBEEFDEADBEEE: ???

sched status:                                                                                                                           running_tid=3

Thread 3: status = VgTs_Runnable
==25482==    at 0x4C29037: free (in /usr/lib64/valgrind/vgpreload_massif-amd64-linux.so)
==25482==    by 0x67CE63B: __libc_freeres (in /usr/lib64/libc-2.17.so)
==25482==    by 0x4A246B4: _vgnU_freeres (in /usr/lib64/valgrind/vgpreload_core-amd64-linux.so)
==25482==    by 0x66A2E2A: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==25482==    by 0x66A2EB4: exit (in /usr/lib64/libc-2.17.so)
==25482==    by 0x1117E9: cleanup_and_exit (glusterfsd.c:1308)
==25482==    by 0x111914: glusterfs_sigwaiter (glusterfsd.c:2029)
==25482==    by 0x606DDC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==25482==    by 0x6760CEC: clone (in /usr/lib64/libc-2.17.so)


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.
===

I clearly see some crap happening within 25482 PID. Okay, lets check Valgrind's output:

===
-rw------- 1 root root  20K вер  6 22:17 massif.out.25483
-rw------- 1 root root 9.1K вер  6 22:17 massif.out.25484
===

No changes! Files didn't get updated, and output for misbehaving PID 25482 did not appear!

I see 0xDEADBEEFDEADBEEE pattern in Valgrind's output, and that means some memory gets corrupted.

Okay, let's re-run Valgrind with Memcheck tool, because this is what output above suggests:

===
valgrind --leak-check=full --show-leak-kinds=all --log-file="valgrind_fuse.log" /usr/sbin/glusterfs -N --volfile-server=glusterfs.example.com --volfile-id=some_volume /mnt/net/glusterfs/test
===

valgrind_fuse.log is attached as well. I've noticed there the following warnings/errors for main PID:

===
==26441== Thread 7:
==26441== Syscall param writev(vector[...]) points to uninitialised byte(s)
==26441==    at 0x675FEA0: writev (in /usr/lib64/libc-2.17.so)
==26441==    by 0xE664795: send_fuse_iov (fuse-bridge.c:158)
==26441==    by 0xE6649B9: send_fuse_data (fuse-bridge.c:197)
==26441==    by 0xE666F7A: fuse_attr_cbk (fuse-bridge.c:753)
==26441==    by 0xE6671A6: fuse_root_lookup_cbk (fuse-bridge.c:783)
==26441==    by 0x1451A937: io_stats_lookup_cbk (io-stats.c:1512)
==26441==    by 0x14301B3E: mdc_lookup_cbk (md-cache.c:867)
==26441==    by 0x13EEA226: qr_lookup_cbk (quick-read.c:446)
==26441==    by 0x13CD9B66: ioc_lookup_cbk (io-cache.c:260)
==26441==    by 0x1346515D: dht_revalidate_cbk (dht-common.c:985)
==26441==    by 0x1320F0F0: afr_discover_done (afr-common.c:2429)
==26441==    by 0x1320F0F0: afr_discover_cbk (afr-common.c:2474)
==26441==    by 0x12F9B6F8: client3_3_lookup_cbk (client-rpc-fops.c:2988)
==26441==  Address 0x168b538c is on thread 7's stack
==26441==  in frame #3, created by fuse_attr_cbk (fuse-bridge.c:723)
==26441== 
==26441== Warning: invalid file descriptor -1 in syscall close()
==26441== Thread 3:
==26441== Invalid free() / delete / delete[] / realloc()
==26441==    at 0x4C2AD17: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==26441==    by 0x67D663B: __libc_freeres (in /usr/lib64/libc-2.17.so)
==26441==    by 0x4A246B4: _vgnU_freeres (in /usr/lib64/valgrind/vgpreload_core-amd64-linux.so)
==26441==    by 0x66AAE2A: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==26441==    by 0x66AAEB4: exit (in /usr/lib64/libc-2.17.so)
==26441==    by 0x1117E9: cleanup_and_exit (glusterfsd.c:1308)
==26441==    by 0x111914: glusterfs_sigwaiter (glusterfsd.c:2029)
==26441==    by 0x6075DC4: start_thread (in /usr/lib64/libpthread-2.17.so)
==26441==    by 0x6768CEC: clone (in /usr/lib64/libc-2.17.so)
==26441==  Address 0x6a2d3d0 is 0 bytes inside data symbol "noai6ai_cached"
===

Could this be the reason for Massif to fail?

Version-Release number of selected component (if applicable):

GlusterFS 3.7.15, CentOS 7.2.

How reproducible:

Always.

Steps to Reproduce:

See above.

Actual results:

Massif tool does not provide reasonable output.

Expected results:

I want my memory to be profiled.

Additional info:

Feel free to ask me for any additional info.

Comment 1 Oleksandr Natalenko 2016-09-06 19:30:47 UTC

Created attachment 1198408 [details]
massif.out.25483

Comment 2 Oleksandr Natalenko 2016-09-06 19:31:08 UTC

Created attachment 1198409 [details]
massif.out.25484

Comment 3 Oleksandr Natalenko 2016-09-06 19:31:28 UTC

Created attachment 1198410 [details]
Memcheck output

Comment 4 Pranith Kumar K 2016-09-06 20:19:55 UTC

Oleksandr,
   Seems like it could be problem with libc as per the following:
http://valgrind.org/docs/manual/manual-core.html#manual-core.rareopts

May be you should run with: '--run-libc-freeres=no'

I checked the other parameter not initialized error. I checked kernel code and the 'dummy' variable inside the strucutre is not used at all. May be it is used for either giving extra functionality without changing structures/alignment, not sure.

Pranith

Comment 5 Oleksandr Natalenko 2016-09-06 20:41:57 UTC

Yay, nice catch, Pranith! Seems to work now, thanks.

Note You need to log in before you can comment on or make changes to this bug.