| Summary: | GNFS from mainline Glusterfs-3.1-qa13 crashes while initiating SFS2008 | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Prithu Tiwari <prithu> |
| Component: | nfs | Assignee: | Shehjar Tikoo <shehjart> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | low | ||
| Version: | 3.1-alpha | CC: | gluster-bugs, prithu |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | --- | |
| Regression: | RTP | Mount Type: | nfs |
|
Description
Prithu Tiwari
2010-09-01 09:20:04 UTC
SFS crashed while initializing the test.
Tail of the sfsc003.r4k-2 file:
-------------------------------------------------------------------------------
Wed Sep 1 03:56:53 2010 Completed.
Wed Sep 1 03:56:53 2010 Sending DONE-MOUNT message to Prime Client(client3.gluster.priv).
Wed Sep 1 03:56:53 2010 Completed.
Wed Sep 1 03:56:53 2010 Waiting on DO-INIT message from Prime Client(client3.gluster.priv).
Wed Sep 1 03:56:58 2010 Received.
Wed Sep 1 03:56:58 2010 Initializing test directories.
Wed Sep 1 03:56:58 2010 Child 2 will create 1926 directories.
Wed Sep 1 03:56:58 2010 Child 3 will create 1926 directories.
Wed Sep 1 03:56:58 2010 Child 1 will create 1926 directories.
Wed Sep 1 03:56:58 2010 Child 0 will create 1926 directories.
Wed Sep 1 03:57:08 2010 Child 0 finished creating 1926 directories.
Wed Sep 1 03:57:08 2010 Child 0 will create 58925 files.
Wed Sep 1 03:57:08 2010 Child 1 finished creating 1926 directories.
Wed Sep 1 03:57:08 2010 Child 1 will create 58925 files.
lad_write() RPC call failed : RPC: Unable to receive
sfsnfs30: error in lad_write() at 168 for f0yeeeyd.d
sfsnfs30: sending Pid 0 Signal 2
sfsnfs3: caught unexpected SIGCHLD.
A child (PID: 19549) has abnormally exited with status of 65
Tell Prime to stop the benchmark.
Terminating all child processes.
sfsnfs3: sending Pid 0 Signal 2
------------------------------------------------------------------------------
I checked the GNFS server and found that GNFS had crashed with a core dump.
The backtrace from the core was:
------------------------------------------------------------------------------
(gdb) bt
#0 0x00000032df630265 in raise () from /lib64/libc.so.6
#1 0x00000032df631d10 in abort () from /lib64/libc.so.6
#2 0x00000032df6296e6 in __assert_fail () from /lib64/libc.so.6
#3 0x00002b51f378ac7d in __gf_free (free_ptr=<value optimized out>) at mem-pool.c:252
#4 0x00002aaaab362052 in xdr_free_write3args_nocopy (wa=0x2aaab42ef558) at ../../../../xlators/nfs/lib/src/xdr-nfs3.c:1894
#5 0x00002aaaab351270 in nfs3svc_write_vec (req=0x16661938, iob=<value optimized out>) at nfs3.c:1931
#6 0x00002aaaab36120d in nfs_rpcsvc_record_vectored_call_actor (conn=0x161da278) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2206
#7 0x00002aaaab361a66 in nfs_rpcsvc_update_vectored_state (conn=0x161da278) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2263
#8 0x00002aaaab361e38 in nfs_rpcsvc_record_update_state (conn=0x161da278, dataread=1328) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2359
#9 0x00002aaaab362028 in nfs_rpcsvc_conn_data_handler (fd=<value optimized out>, idx=19119, data=0x161da278, poll_in=1, poll_out=128,
poll_err=0) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2557
#10 0x00002b51f3789d77 in event_dispatch_epoll (event_pool=0x161cf7b8) at event.c:812
#11 0x00002aaaab360492 in nfs_rpcsvc_stage_proc (arg=<value optimized out>) at ../../../../xlators/nfs/lib/src/rpcsvc.c:64
#12 0x00000032dfe06617 in start_thread () from /lib64/libpthread.so.0
#13 0x00000032df6d3c2d in clone () from /lib64/libc.so.6
-------------------------------------------------------------------------------
The crash did not occur when I ran GNFS with the following command:
./dsh tc4 "export GLUSTERFS_DISABLE_MEM_ACCT=1;/opt/gnfs/sbin/glusterfs -f /share/shehjart/volfiles/gnfs-1v-4d.vol -l /tmp/gnnnn3"
The SFS test ran to completion, with all 5 iterations finishing. Performance is slightly lower, but this needs more investigation before it can be confirmed.

Prithu, please run another test with the following patch:
http://dev.gluster.com/~shehjart/0001-nfs3-Free-vectored-write-args-using-FREE-not-GF_FREE.patch
Do not export the environment variable; otherwise this patch will not have any effect.

I can confirm this also happens on a simple compilebench load:
-------------------------------------------------------------------------------
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.0git
/lib64/libc.so.6[0x32d5c302d0]
/lib64/libc.so.6(gsignal+0x35)[0x32d5c30265]
/lib64/libc.so.6(abort+0x110)[0x32d5c31d10]
/lib64/libc.so.6(__assert_fail+0xf6)[0x32d5c296e6]
/home/shehjart/glusterfsd-master/lib/libglusterfs.so.0(__gf_free+0x11d)[0x2b9d51b5dc7d]
/home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(xdr_free_write3args_nocopy+0x12)[0x2aaaabbda072]
/home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3svc_write_vec+0x90)[0x2aaaabbc9290]
/home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_rpcsvc_record_vectored_call_actor+0x6d)[0x2aaaabbd922d]
/home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_rpcsvc_update_vectored_state+0xf6)[0x2aaaabbd9a86]
/home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_rpcsvc_record_update_state+0x1d8)[0x2aaaabbd9e58]
/home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_rpcsvc_conn_data_handler+0x78)[0x2aaaabbda048]
/home/shehjart/glusterfsd-master/lib/libglusterfs.so.0[0x2b9d51b5cd77]
/home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_rpcsvc_stage_proc+0x12)[0x2aaaabbd84b2]
/lib64/libpthread.so.0[0x32d680673d]
/lib64/libc.so.6(clone+0x6d)[0x32d5cd3d1d]
-------------------------------------------------------------------------------
Compilebench command: [not included in the report; the crash log above was pasted again in its place]
The configuration is 4x3 distributed-replicated.

PATCH: http://patches.gluster.com/patch/4482 in master (nfs3: Free vectored write args using FREE not GF_FREE)

Prithu, do a git pull on mainline. The patch in the previous message fixes this problem.
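The patch title (use FREE, not GF_FREE) and the fact that the crash goes away with GLUSTERFS_DISABLE_MEM_ACCT=1 both point at an allocator mismatch: a buffer obtained from the plain allocator being released through the accounting one. The sketch below only illustrates that general failure mode. The header layout, the magic value and every name in it (acct_calloc, acct_owns, acct_free, foreign_pointer_is_rejected) are hypothetical and are not taken from mem-pool.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical accounting header placed in front of every tracked block. */
#define ACCT_MAGIC 0xCAFEBABEu

struct acct_header {
    uint32_t magic;  /* marks the block as coming from acct_calloc() */
    size_t   size;   /* payload size, kept for bookkeeping */
};

/* Allocate a zeroed payload preceded by an accounting header. */
static void *acct_calloc(size_t size)
{
    struct acct_header *h = calloc(1, sizeof(*h) + size);
    if (h == NULL)
        return NULL;
    h->magic = ACCT_MAGIC;
    h->size  = size;
    return h + 1;  /* the caller only ever sees the payload */
}

/* Return 1 if the bytes in front of ptr look like an accounting header. */
static int acct_owns(const void *ptr)
{
    struct acct_header h;
    memcpy(&h, (const char *)ptr - sizeof(h), sizeof(h));
    return h.magic == ACCT_MAGIC;
}

/* Accounting free: aborts via assert() when handed a foreign pointer. */
static void acct_free(void *ptr)
{
    if (ptr == NULL)
        return;
    assert(acct_owns(ptr));
    free((struct acct_header *)ptr - 1);
}

/*
 * Simulate a pointer that did not come from acct_calloc(). The raw block is
 * made large enough that the header-sized read in acct_owns() stays inside
 * it, so the check can be shown without touching memory we do not own.
 */
static int foreign_pointer_is_rejected(void)
{
    char *raw = calloc(1, sizeof(struct acct_header) + 32);
    if (raw == NULL)
        return -1;
    int rejected = !acct_owns(raw + sizeof(struct acct_header));
    free(raw);  /* released with the allocator that created it */
    return rejected;
}
```

Calling acct_free() on a block from acct_calloc() succeeds, while the simulated foreign pointer fails the header check that acct_free() asserts on. The rule the sketch illustrates is that each buffer must be released by the same allocator family that created it.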
I applied the patch to the qa13 code, but GNFS crashed again with the following backtrace:
-------------------------------------------------------------------------------
#0 0x00000032df630265 in raise () from /lib64/libc.so.6
#1 0x00000032df631d10 in abort () from /lib64/libc.so.6
#2 0x00000032df66a84b in __libc_message () from /lib64/libc.so.6
#3 0x00000032df6722ef in _int_free () from /lib64/libc.so.6
#4 0x00000032df67273b in free () from /lib64/libc.so.6
#5 0x00002aaaab3512b0 in nfs3svc_write_vec (req=0x2aaaaf6b5038, iob=<value optimized out>) at nfs3.c:1931
#6 0x00002aaaab36124d in nfs_rpcsvc_record_vectored_call_actor (conn=0x1e171048) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2206
#7 0x00002aaaab361aa6 in nfs_rpcsvc_update_vectored_state (conn=0x1e171048) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2263
#8 0x00002aaaab361e78 in nfs_rpcsvc_record_update_state (conn=0x1e171048, dataread=1328) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2359
#9 0x00002aaaab362068 in nfs_rpcsvc_conn_data_handler (fd=<value optimized out>, idx=16109, data=0x1e171048, poll_in=1, poll_out=128,
poll_err=0) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2557
#10 0x00002b1e88667d77 in event_dispatch_epoll (event_pool=0x1e15a7b8) at event.c:812
#11 0x00002aaaab3604d2 in nfs_rpcsvc_stage_proc (arg=<value optimized out>) at ../../../../xlators/nfs/lib/src/rpcsvc.c:64
#12 0x00000032dfe06617 in start_thread () from /lib64/libpthread.so.0
#13 0x00000032df6d3c2d in clone () from /lib64/libc.so.6
---------------------------------------------------------------------------------
I will now try again with a git pull from mainline.
(In reply to comment #7)
> I did a patch of the code(qa13) using the patch sent but the GNFS crashed again
> with following back-trace
Yes. That's expected. Mainline has it fixed.

I used GlusterFS mainline from git. SFS still crashed GNFS, with the following backtrace:
----------------------------------------------------------------------
#0 0x00000032df630265 in raise () from /lib64/libc.so.6
#1 0x00000032df631d10 in abort () from /lib64/libc.so.6
#2 0x00000032df6296e6 in __assert_fail () from /lib64/libc.so.6
#3 0x00002b5171057130 in gf_mem_set_acct_info (xl=0x2b5171281560, alloc_ptr=0x40b63fa0, size=48, type=73) at mem-pool.c:88
#4 0x00002b5171057472 in __gf_calloc (nmemb=1, size=<value optimized out>, type=73) at mem-pool.c:140
#5 0x00002aaaab33deb4 in nfs3svc_write_vecsizer (req=0x2aaaaf82c428, readsize=0x40b64010, newbuf=0x40b6401c) at nfs3.c:1902
#6 0x00002aaaab3542db in nfs_rpcsvc_handle_vectored_rpc_call (conn=0x1019ccf8) at ../../../../xlators/nfs/lib/src//rpcsvc.c:2144
#7 0x00002aaaab3548c4 in nfs_rpcsvc_update_vectored_state (conn=0x20fd) at ../../../../xlators/nfs/lib/src//rpcsvc.c:2254
#8 0x00002aaaab354bfc in nfs_rpcsvc_record_update_state (conn=0x1019ccf8, dataread=20) at ../../../../xlators/nfs/lib/src//rpcsvc.c:2357
#9 0x00002aaaab354d68 in nfs_rpcsvc_conn_data_handler (fd=<value optimized out>, idx=8447, data=0x1019ccf8, poll_in=1, poll_out=128,
poll_err=0) at ../../../../xlators/nfs/lib/src//rpcsvc.c:2555
#10 0x00002b51710562d7 in event_dispatch_epoll (event_pool=0x10171548) at event.c:812
#11 0x00002aaaab3535b2 in nfs_rpcsvc_stage_proc (arg=<value optimized out>) at ../../../../xlators/nfs/lib/src//rpcsvc.c:64
#12 0x00000032dfe06617 in start_thread () from /lib64/libpthread.so.0
#13 0x00000032df6d3c2d in clone () from /lib64/libc.so.6
---------------------------------------------------------------------------
Re-opened. Fix is on the way.

Prithu, please set this to resolved after you've verified that the crash is fixed.

PATCH: http://patches.gluster.com/patch/4582 in master (nfsrpc: Set THIS before vector sizing upcall) |
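The last backtrace fails inside gf_mem_set_acct_info during an allocation made by nfs3svc_write_vecsizer, and the fix is described as setting THIS before the vector sizing upcall. One plausible reading is that allocations are charged to a per-thread "current translator", so an upcall made while that pointer refers to a different translator can fail the accounting check. The sketch below illustrates that pattern only. The struct, the bounds check and all names (this_xl, acct_record, call_with_this, vecsizer_upcall) are assumptions for illustration and are not the GlusterFS implementation; only the value 73 is taken from `type=73` in the backtrace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical translator with its own per-type allocation counters. */
struct xlator {
    const char *name;
    size_t      num_types;  /* number of memory types this xlator tracks */
    size_t     *allocs;     /* allocs[type] counts allocations of that type */
};

/* Per-thread "current translator"; stands in for THIS in this sketch. */
static _Thread_local struct xlator *this_xl;

/* Charge one allocation to the current translator; -1 if it cannot track it. */
static int acct_record(size_t type)
{
    struct xlator *xl = this_xl;
    if (xl == NULL || type >= xl->num_types)
        return -1;
    xl->allocs[type]++;
    return 0;
}

typedef int (*upcall_fn)(void);

/* Run an upcall with the current translator set to its owner, then restore. */
static int call_with_this(struct xlator *owner, upcall_fn fn)
{
    assert(owner != NULL);
    struct xlator *saved = this_xl;
    this_xl = owner;
    int ret = fn();
    this_xl = saved;
    return ret;
}

/* Memory type used by the upcall; 73 matches type=73 in the backtrace. */
#define NFS_MEM_TYPE 73

/* Stand-in for a sizing upcall that allocates on behalf of the NFS xlator. */
static int vecsizer_upcall(void)
{
    return acct_record(NFS_MEM_TYPE);
}
```

If the upcall runs while the current translator is one that tracks fewer memory types, acct_record() returns -1 in this sketch (an assertion in the real crash). Wrapping the call in call_with_this() with the owning translator makes the same upcall succeed and leaves the caller's context unchanged afterwards.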