| Summary: | [ glusterfs 2.0.6rc1 ] - "server SEG fault" | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Vinayak Hegde <vinayak> |
| Component: | protocol | Assignee: | Basavanagowda Kanur <gowda> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | low | ||
| Version: | 2.0.5 | CC: | aavati, gluster-bugs, gowda, guru, vijay |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | --- | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
few observations after analyzing the core: 1. readdir() was sent on a file-descriptor which refers to a file, not a directory. 2. process id (pid) of the process which sent readdir() does not match the process id (pid) of the process which owns the file-descriptor. 3. connection disconnect happened on client side, for the exact server which segfaulted. suspecting client-side bug might have triggered server segfault. in ideal situation, we should have encountered problem (segafault or other errors) on client side for client's fault. i'm making changes to protocol/server to ensure that the file-descriptor requestor owns the file-descriptor. > i'm making changes to
> protocol/server to ensure that the file-descriptor requestor owns the
> file-descriptor.
I'm not sure how you plan to do this. As a filesystem we can only know the opener of the fd. It is very much legal for the non-openers of the fd to access it, since fds can be passed on to other processes via fork()s and unix fd passing.
Avati
PATCH: http://patches.gluster.com/patch/943 in master (protocol/client: fixed registration of saved_fds) PATCH: http://patches.gluster.com/patch/943 in release-2.0 (protocol/client: fixed registration of saved_fds) |
3 server stripe write behind and io-cache on client side Error may not be reproducible. Running tools were exnihilate.sh,annihilate.sh,dbench,iozone [root@brick6 /]# du -hs core.32202-brick6 9.0M core.32202-brick6 [root@brick6 /]# ls -lh core.32202-brick6 -rw------- 1 root root 31M Jul 28 23:30 core.32202-brick6 The BT: # /share/guru/analyse_core.sh core.32202-brick6 Core was generated by `/opt/glusterfs/2.0.6rc1/sbin/glusterfsd -f server_stripe.vol -l server_log'. Program terminated with signal 11, Segmentation fault. #0 0x00002ad610e44f3c in gf_print_trace (signum=11) at ../../../libglusterfs/src/common-utils.c:385 385 if ((tmp->root->type == GF_OP_TYPE_FOP_REQUEST) || (gdb) bt #0 0x00002ad610e44f3c in gf_print_trace (signum=11) at ../../../libglusterfs/src/common-utils.c:385 #1 <signal handler called> #2 0x00002ad611905906 in posix_readdir (frame=0x2aaab004acb0, this=0xb6c3de0, fd=0x2aaab8053860, size=4096, off=4096) at ../../../../../xlators/storage/posix/src/posix.c:3524 #3 0x00002ad610e43a2d in default_readdir (frame=0x2aaab001f590, this=0xb6c47a0, fd=0x2aaab8053860, size=4096, off=4096) at ../../../libglusterfs/src/defaults.c:1400 #4 0x00002ad611d2147b in iot_readdir_wrapper (frame=0x2aaab8029180, this=0xb6c4fe0, fd=0x2aaab8053860, size=4096, offset=4096) at ../../../../../xlators/performance/io-threads/src/io-threads.c:1689 #5 0x00002ad610e52e3b in call_resume_wind (stub=0x2aaab8016660) at ../../../libglusterfs/src/call-stub.c:2661 #6 0x00002ad610e56fc2 in call_resume (stub=0x2aaab8016660) at ../../../libglusterfs/src/call-stub.c:4169 #7 0x00002ad611d21f93 in iot_worker_ordered (arg=0xb6c6850) at ../../../../../xlators/performance/io-threads/src/io-threads.c:1943 #8 0x0000003129c06307 in start_thread () from /lib64/libpthread.so.0 #9 0x00000031294d1ded in clone () from /lib64/libc.so.6 (gdb) thr [Current thread is 1 (process 32206)] (gdb) info thr 10 process 32202 0x00000031294c441b in write () from /lib64/libc.so.6 9 process 32203 0x00000031294983b1 in nanosleep () from /lib64/libc.so.6 8 process 32204 0x0000003129c0a6a7 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 7 process 32205 0x0000003129c0a6a7 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 6 process 32207 0x0000003129c0a6a7 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 5 process 32208 0x0000003129c0a6a7 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 4 process 32209 0x0000003129c0a6a7 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 3 process 32210 0x0000003129c0a6a7 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 2 process 32211 0x0000003129c0a6a7 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 * 1 process 32206 0x00002ad610e44f3c in gf_print_trace (signum=11) at ../../../libglusterfs/src/common-utils.c:385 (gdb) thr 10 [Switching to thread 10 (process 32202)]#0 0x00000031294c441b in write () from /lib64/libc.so.6 (gdb) bt #0 0x00000031294c441b in write () from /lib64/libc.so.6 #1 0x000000312946b643 in _IO_new_file_write () from /lib64/libc.so.6 #2 0x000000312946b556 in _IO_new_do_write () from /lib64/libc.so.6 #3 0x000000312946bb87 in _IO_new_file_sync () from /lib64/libc.so.6 #4 0x0000003129460d30 in fflush () from /lib64/libc.so.6 #5 0x00002ad610e3e2f5 in _gf_log (domain=0xb6c4fc0 "brick", file=0x2ad611f48aa0 "../../../../../xlators/protocol/server/src/server-protocol.c", function=0x2ad611f49e4b "server_flush", line=4110, level=GF_LOG_ERROR, fmt=0x2ad611f48aeb "invalid argument: state->fd") at ../../../libglusterfs/src/logging.c:201 #6 0x00002ad611f388d8 in server_flush (frame=0x2aaab401e8b0, bound_xl=0xb6c4fe0, hdr=0x2aaab4018f70, hdrlen=48, iobuf=0x0) at ../../../../../xlators/protocol/server/src/server-protocol.c:4110 #7 0x00002ad611f43489 in protocol_server_interpret (this=0xb6c58d0, trans=0x2aaab400d800, hdr_p=0x2aaab4018f70 "", hdrlen=48, iobuf=0x0) at ../../../../../xlators/protocol/server/src/server-protocol.c:7474 #8 0x00002ad611f43fd5 in protocol_server_pollin (this=0xb6c58d0, trans=0x2aaab400d800) at ../../../../../xlators/protocol/server/src/server-protocol.c:7755 #9 0x00002ad611f441e4 in notify (this=0xb6c58d0, event=2, data=0x2aaab400d800) at ../../../../../xlators/protocol/server/src/server-protocol.c:7811 #10 0x00002aaaaaaaeba4 in socket_event_poll_in (this=0x2aaab400d800) at ../../../../transport/socket/src/socket.c:714 #11 0x00002aaaaaaaeea2 in socket_event_handler (fd=7, idx=2, data=0x2aaab400d800, poll_in=1, poll_out=0, poll_err=0) at ../../../../transport/socket/src/socket.c:814 #12 0x00002ad610e5b0dd in event_dispatch_epoll_handler (event_pool=0xb6be320, events=0xb6c8570, i=0) at ../../../libglusterfs/src/event.c:804 #13 0x00002ad610e5b2b2 in event_dispatch_epoll (event_pool=0xb6be320) at ../../../libglusterfs/src/event.c:867 #14 0x00002ad610e5b5c8 in event_dispatch (event_pool=0xb6be320) at ../../../libglusterfs/src/event.c:975 #15 0x00000000004053a9 in main (argc=5, argv=0x7fff99c80ee8) at ../../../glusterfsd/src/glusterfsd.c:1226