Bug 763231 (GLUSTER-1499)
Summary: | GNFS from mainline Glusterfs-3.1-qa13 crashes while initiating SFS2008 | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Prithu Tiwari <prithu> |
Component: | nfs | Assignee: | Shehjar Tikoo <shehjart> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | low | ||
Version: | 3.1-alpha | CC: | gluster-bugs, prithu |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | Type: | --- | |
Regression: | RTP | Mount Type: | nfs |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Prithu Tiwari
2010-09-01 09:20:04 UTC
SFS crashed while initializing the test. tail of sfsc003.r4k-2 file ------------------------------------------------------------------------------- Wed Sep 1 03:56:53 2010 Completed. Wed Sep 1 03:56:53 2010 Sending DONE-MOUNT message to Prime Client(client3.gluster.priv). Wed Sep 1 03:56:53 2010 Completed. Wed Sep 1 03:56:53 2010 Waiting on DO-INIT message from Prime Client(client3.gluster.priv). Wed Sep 1 03:56:58 2010 Received. Wed Sep 1 03:56:58 2010 Initializing test directories. Wed Sep 1 03:56:58 2010 Child 2 will create 1926 directories. Wed Sep 1 03:56:58 2010 Child 3 will create 1926 directories. Wed Sep 1 03:56:58 2010 Child 1 will create 1926 directories. Wed Sep 1 03:56:58 2010 Child 0 will create 1926 directories. Wed Sep 1 03:57:08 2010 Child 0 finished creating 1926 directories. Wed Sep 1 03:57:08 2010 Child 0 will create 58925 files. Wed Sep 1 03:57:08 2010 Child 1 finished creating 1926 directories. Wed Sep 1 03:57:08 2010 Child 1 will create 58925 files. lad_write() RPC call failed : RPC: Unable to receive sfsnfs30: error in lad_write() at 168 for f0yeeeyd.d sfsnfs30: sending Pid 0 Signal 2 sfsnfs3: caught unexpected SIGCHLD. A child (PID: 19549) has abnormally exited with status of 65 Tell Prime to stop the benchmark. Terminating all child processes. sfsnfs3: sending Pid 0 Signal 2 ------------------------------------------------------------------------------ Checked at the GNFS server and found GNFS had crashed with core dump. the back-trace of core was : ------------------------------------------------------------------------------ (gdb) bt #0 0x00000032df630265 in raise () from /lib64/libc.so.6 #1 0x00000032df631d10 in abort () from /lib64/libc.so.6 #2 0x00000032df6296e6 in __assert_fail () from /lib64/libc.so.6 #3 0x00002b51f378ac7d in __gf_free (free_ptr=<value optimized out>) at mem-pool.c:252 #4 0x00002aaaab362052 in xdr_free_write3args_nocopy (wa=0x2aaab42ef558) at ../../../../xlators/nfs/lib/src/xdr-nfs3.c:1894 #5 0x00002aaaab351270 in nfs3svc_write_vec (req=0x16661938, iob=<value optimized out>) at nfs3.c:1931 #6 0x00002aaaab36120d in nfs_rpcsvc_record_vectored_call_actor (conn=0x161da278) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2206 #7 0x00002aaaab361a66 in nfs_rpcsvc_update_vectored_state (conn=0x161da278) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2263 #8 0x00002aaaab361e38 in nfs_rpcsvc_record_update_state (conn=0x161da278, dataread=1328) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2359 #9 0x00002aaaab362028 in nfs_rpcsvc_conn_data_handler (fd=<value optimized out>, idx=19119, data=0x161da278, poll_in=1, poll_out=128, poll_err=0) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2557 #10 0x00002b51f3789d77 in event_dispatch_epoll (event_pool=0x161cf7b8) at event.c:812 #11 0x00002aaaab360492 in nfs_rpcsvc_stage_proc (arg=<value optimized out>) at ../../../../xlators/nfs/lib/src/rpcsvc.c:64 #12 0x00000032dfe06617 in start_thread () from /lib64/libpthread.so.0 #13 0x00000032df6d3c2d in clone () from /lib64/libc.so.6 ------------------------------------------------------------------------------- The crash did not occur when I ran GNFS with following command ./dsh tc4 "export GLUSTERFS_DISABLE_MEM_ACCT=1;/opt/gnfs/sbin/glusterfs -f /share/shehjart/volfiles/gnfs-1v-4d.vol -l /tmp/gnnnn3" The SFS test ran to completion with all 5 iterations finishing. Performance is slightly slower(less) but need to investigate more before it could be confirmed. Prithu, Please run another test with the following patch: http://dev.gluster.com/~shehjart/0001-nfs3-Free-vectored-write-args-using-FREE-not-GF_FREE.patch Do not export the environment variable otherwise this patch will not have any affect. I can confirm this happens on a simple compilebench load also: configuration details: argp 1 backtrace 1 dlfcn 1 fdatasync 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 3.1.0git /lib64/libc.so.6[0x32d5c302d0] /lib64/libc.so.6(gsignal+0x35)[0x32d5c30265] /lib64/libc.so.6(abort+0x110)[0x32d5c31d10] /lib64/libc.so.6(__assert_fail+0xf6)[0x32d5c296e6] /home/shehjart/glusterfsd-master/lib/libglusterfs.so.0(__gf_free+0x11d)[0x2b9d51b5dc7d] /home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(xdr_free_write3args_nocopy+0x12)[0x2aaaabbda072] /home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3svc_write_vec+0x90)[0x2aaaabbc9290] /home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_rpcsvc_record_vectored_call_actor+0x6d)[0x2aaaabbd922d] /home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_rpcsvc_update_vectored_state+0xf6)[0x2aaaabbd9a86] /home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_rpcsvc_record_update_state+0x1d8)[0x2aaaabbd9e58] /home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_rpcsvc_conn_data_handler+0x78)[0x2aaaabbda048] /home/shehjart/glusterfsd-master/lib/libglusterfs.so.0[0x2b9d51b5cd77] /home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_rpcsvc_stage_proc+0x12)[0x2aaaabbd84b2] /lib64/libpthread.so.0[0x32d680673d] /lib64/libc.so.6(clone+0x6d)[0x32d5cd3d1d] Compilebench command: configuration details: argp 1 backtrace 1 dlfcn 1 fdatasync 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 3.1.0git /lib64/libc.so.6[0x32d5c302d0] /lib64/libc.so.6(gsignal+0x35)[0x32d5c30265] /lib64/libc.so.6(abort+0x110)[0x32d5c31d10] /lib64/libc.so.6(__assert_fail+0xf6)[0x32d5c296e6] /home/shehjart/glusterfsd-master/lib/libglusterfs.so.0(__gf_free+0x11d)[0x2b9d51b5dc7d] /home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(xdr_free_write3args_nocopy+0x12)[0x2aaaabbda072] /home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs3svc_write_vec+0x90)[0x2aaaabbc9290] /home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_rpcsvc_record_vectored_call_actor+0x6d)[0x2aaaabbd922d] /home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_rpcsvc_update_vectored_state+0xf6)[0x2aaaabbd9a86] /home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_rpcsvc_record_update_state+0x1d8)[0x2aaaabbd9e58] /home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_rpcsvc_conn_data_handler+0x78)[0x2aaaabbda048] /home/shehjart/glusterfsd-master/lib/libglusterfs.so.0[0x2b9d51b5cd77] /home/shehjart/glusterfsd-master/lib/glusterfs/3.1.0git/xlator/nfs/server.so(nfs_rpcsvc_stage_proc+0x12)[0x2aaaabbd84b2] /lib64/libpthread.so.0[0x32d680673d] /lib64/libc.so.6(clone+0x6d)[0x32d5cd3d1d] Config is a 4x3 distributed-replicated config. PATCH: http://patches.gluster.com/patch/4482 in master (nfs3: Free vectored write args using FREE not GF_FREE) Prithu, do a git pull on mainline. The patch in the previous message fixes this problem. I did a patch of the code(qa13) using the patch sent but the GNFS crashed again with following back-trace ------------------------------------------------------------------------------- #0 0x00000032df630265 in raise () from /lib64/libc.so.6 #1 0x00000032df631d10 in abort () from /lib64/libc.so.6 #2 0x00000032df66a84b in __libc_message () from /lib64/libc.so.6 #3 0x00000032df6722ef in _int_free () from /lib64/libc.so.6 #4 0x00000032df67273b in free () from /lib64/libc.so.6 #5 0x00002aaaab3512b0 in nfs3svc_write_vec (req=0x2aaaaf6b5038, iob=<value optimized out>) at nfs3.c:1931 #6 0x00002aaaab36124d in nfs_rpcsvc_record_vectored_call_actor (conn=0x1e171048) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2206 #7 0x00002aaaab361aa6 in nfs_rpcsvc_update_vectored_state (conn=0x1e171048) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2263 #8 0x00002aaaab361e78 in nfs_rpcsvc_record_update_state (conn=0x1e171048, dataread=1328) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2359 #9 0x00002aaaab362068 in nfs_rpcsvc_conn_data_handler (fd=<value optimized out>, idx=16109, data=0x1e171048, poll_in=1, poll_out=128, poll_err=0) at ../../../../xlators/nfs/lib/src/rpcsvc.c:2557 #10 0x00002b1e88667d77 in event_dispatch_epoll (event_pool=0x1e15a7b8) at event.c:812 #11 0x00002aaaab3604d2 in nfs_rpcsvc_stage_proc (arg=<value optimized out>) at ../../../../xlators/nfs/lib/src/rpcsvc.c:64 #12 0x00000032dfe06617 in start_thread () from /lib64/libpthread.so.0 #13 0x00000032df6d3c2d in clone () from /lib64/libc.so.6 --------------------------------------------------------------------------------- But I would now try with git-pull from the mainline. (In reply to comment #7) > I did a patch of the code(qa13) using the patch sent but the GNFS crashed again > with following back-trace Yes. Thats expected. Mainline has it fixed. I used the Glusterfs-mainline from the git. The SFS still crashed the GNFS with the following backtrace ---------------------------------------------------------------------- #0 0x00000032df630265 in raise () from /lib64/libc.so.6 #1 0x00000032df631d10 in abort () from /lib64/libc.so.6 #2 0x00000032df6296e6 in __assert_fail () from /lib64/libc.so.6 #3 0x00002b5171057130 in gf_mem_set_acct_info (xl=0x2b5171281560, alloc_ptr=0x40b63fa0, size=48, type=73) at mem-pool.c:88 #4 0x00002b5171057472 in __gf_calloc (nmemb=1, size=<value optimized out>, type=73) at mem-pool.c:140 #5 0x00002aaaab33deb4 in nfs3svc_write_vecsizer (req=0x2aaaaf82c428, readsize=0x40b64010, newbuf=0x40b6401c) at nfs3.c:1902 #6 0x00002aaaab3542db in nfs_rpcsvc_handle_vectored_rpc_call (conn=0x1019ccf8) at ../../../../xlators/nfs/lib/src//rpcsvc.c:2144 #7 0x00002aaaab3548c4 in nfs_rpcsvc_update_vectored_state (conn=0x20fd) at ../../../../xlators/nfs/lib/src//rpcsvc.c:2254 #8 0x00002aaaab354bfc in nfs_rpcsvc_record_update_state (conn=0x1019ccf8, dataread=20) at ../../../../xlators/nfs/lib/src//rpcsvc.c:2357 #9 0x00002aaaab354d68 in nfs_rpcsvc_conn_data_handler (fd=<value optimized out>, idx=8447, data=0x1019ccf8, poll_in=1, poll_out=128, poll_err=0) at ../../../../xlators/nfs/lib/src//rpcsvc.c:2555 #10 0x00002b51710562d7 in event_dispatch_epoll (event_pool=0x10171548) at event.c:812 #11 0x00002aaaab3535b2 in nfs_rpcsvc_stage_proc (arg=<value optimized out>) at ../../../../xlators/nfs/lib/src//rpcsvc.c:64 #12 0x00000032dfe06617 in start_thread () from /lib64/libpthread.so.0 #13 0x00000032df6d3c2d in clone () from /lib64/libc.so.6 --------------------------------------------------------------------------- Re-opened. Fix is on the way. Prithu, please set this to resolved after you've verified that the crash is fixed. PATCH: http://patches.gluster.com/patch/4582 in master (nfsrpc: Set THIS before vector sizing upcall) |