Bug 1121230
Summary: | gluster NFS server process crashed three times while mounting a volume using the NFS protocol | | |
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rachana Patel <racpatel> |
Component: | gluster-nfs | Assignee: | santosh pradhan <spradhan> |
Status: | CLOSED WORKSFORME | QA Contact: | amainkar |
Severity: | medium | Docs Contact: | |
Priority: | medium | | |
Version: | rhgs-3.0 | CC: | racpatel, rhs-bugs, ssamanta, vagarwal |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2014-08-12 09:23:44 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Rachana Patel
2014-07-18 17:13:06 UTC

-> It was a distributed volume, and the mount was attempted with the command mount -t nfs <IP>:<vol_name> <mnt_point>.
-> No I/O was being done from any other mount, and no volume options were set on the volume.

I have tried several times (~20 times) on a 2x2 volume (build glusterfs-3.6.0.24-1.el6rhs.x86_64) to reproduce the issue, but with no success. Even Saurabh tried to reproduce the issue on his setup, but no luck yet.

This is the analysis I have at the moment:
=========================================

(1) The NFS server (glusterfsd) crashes while receiving the mount request, i.e. while processing in the socket layer [the glusterfsd/NFS server crashes before control reaches the NFS protocol layer]. The backtrace:

    Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/'.
    Program terminated with signal 11, Segmentation fault.
    #0  list_del_init (iobuf_pool=<value optimized out>, page_size=128) at list.h:88
    88          old->next->prev = old->prev;
    (gdb) where
    #0  list_del_init (iobuf_pool=<value optimized out>, page_size=128) at list.h:88
    #1  __iobuf_arena_unprune (iobuf_pool=<value optimized out>, page_size=128) at iobuf.c:227
    #2  0x00000035ff24fd85 in __iobuf_pool_add_arena (iobuf_pool=0x1c4c970, page_size=128, num_pages=1024) at iobuf.c:251
    #3  0x00000035ff2500dd in iobuf_get2 (iobuf_pool=0x1c4c970, page_size=44) at iobuf.c:603
    #4  0x00007f7113df6050 in __socket_proto_state_machine (this=0x1f3ee50) at socket.c:1989
    #5  socket_proto_state_machine (this=0x1f3ee50) at socket.c:2107
    #6  socket_event_poll_in (this=0x1f3ee50) at socket.c:2123
    #7  0x00007f7113df7c5d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x1f3ee50, poll_in=1, poll_out=0, poll_err=0) at socket.c:2240
    #8  0x00000035ff275907 in event_dispatch_epoll_handler (event_pool=0x1c684d0) at event-epoll.c:384
    #9  event_dispatch_epoll (event_pool=0x1c684d0) at event-epoll.c:445
    #10 0x0000000000407e93 in main (argc=11, argv=0x7fffb4964cc8) at glusterfsd.c:2023

(2) Something unusual I could see in the glusterfs ctx iobuf_pool:

    (gdb) frame 4
    #4  0x00007f7113df6050 in __socket_proto_state_machine (this=0x1f3ee50) at socket.c:1989
    1989        iobuf = iobuf_get2 (this->ctx->iobuf_pool,
    (gdb) p ((struct iobuf_pool *)this->ctx->iobuf_pool).purge[0]
    $1 = {next = 0xffffffff01c4cdb0, prev = 0x1c4c770}

The next pointer appears to have been corrupted somehow. (A minimal sketch of list_del_init(), showing why a corrupted next pointer faults at exactly this point, follows this comment.)

(3) In __iobuf_select_arena(), __iobuf_pool_add_arena() is executed if and only if "all arenas were full, find the right count to add" [as per the comment in the code]. What can cause all the arenas to be FULL before it goes to fetch one from the purge[] list? This needs to be checked with the people who work in the socket layer; I will send an email to Avati/gluster-devel. (A condensed sketch of this allocation path is also included below.)

(4) The issue is not reproducible after several attempts, so the severity should be brought down IMO. What do you say, Rachana?

Thanks,
Santosh

The issue is not reproducible after several attempts, not even by QE. As per my discussion with Rachana, I am lowering the severity to medium so that it can be investigated later if we hit it again. Otherwise it would be CLOSED.
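
For reference, the crash site in frames #0 and #1 can be read against a minimal sketch of the kernel-style intrusive list that glusterfs uses (libglusterfs/src/list.h). The sketch below follows the standard pattern rather than the shipped header, so the exact statement order and line numbers may differ; it is meant only to illustrate why a corrupted next pointer faults at exactly this point.

    /* Minimal, illustrative sketch of the intrusive doubly linked list
     * from glusterfs' list.h (not the verbatim source). */

    struct list_head {
            struct list_head *next;
            struct list_head *prev;
    };

    static inline void
    list_del_init (struct list_head *old)
    {
            /* list.h:88 in the backtrace is the statement that writes
             * through old->next.  With a plausible prev (0x1c4c770) but
             * a corrupted next (0xffffffff01c4cdb0, as printed from
             * purge[0]), the dereference faults and the process dies
             * with SIGSEGV. */
            old->prev->next = old->next;
            old->next->prev = old->prev;

            /* Re-initialise the node so it points at itself. */
            old->next = old;
            old->prev = old;
    }

Because list_del_init() does nothing more than pointer rewiring, a fault here usually means the node was already clobbered by something else (for example a use-after-free or an overrun into the iobuf_pool), which would be consistent with the corrupted purge[0].next value shown in point (2).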
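
To make point (3) easier to follow, here is a heavily condensed, self-contained sketch of the allocation path named in frames #1-#3. It is reconstructed from the backtrace and the quoted source comment rather than copied from iobuf.c, so the structure layouts, helper names (arena_unprune, pool_add_arena, select_arena), and constants such as GF_IOBUF_INDEX_MAX are stand-ins. The flow it illustrates: iobuf_get2() needs a buffer, the pool first looks for an arena that still has free (passive) iobufs, and only when every arena for that page size is full does it try to reuse a purged arena off the purge[] list, the step where list_del_init() crashes, before allocating a brand-new arena.

    /* Condensed, approximate sketch of the iobuf arena selection path
     * seen in frames #1-#3 of the backtrace; not the real iobuf.c. */

    #include <stdlib.h>

    struct list_head { struct list_head *next, *prev; };

    struct iobuf_arena {
            struct list_head list;         /* kept as the first member so the */
            int              passive_cnt;  /* sketch can cast list -> arena   */
    };

    #define GF_IOBUF_INDEX_MAX 8           /* hypothetical bucket count */

    struct iobuf_pool {
            struct list_head arenas[GF_IOBUF_INDEX_MAX]; /* active arenas per page size */
            struct list_head purge[GF_IOBUF_INDEX_MAX];  /* purged arenas per page size */
    };

    static void
    list_del_init (struct list_head *old)
    {
            old->prev->next = old->next;
            old->next->prev = old->prev;   /* frame #0: faults if next is garbage */
            old->next = old->prev = old;
    }

    /* Frame #1 (__iobuf_arena_unprune): reuse a purged arena if one is parked
     * on the purge[] list for this page size. */
    static struct iobuf_arena *
    arena_unprune (struct iobuf_pool *pool, int index)
    {
            struct list_head *head = &pool->purge[index];

            if (head->next == head)                /* purge list is empty */
                    return NULL;

            struct list_head *node = head->next;
            list_del_init (node);                  /* crash site when purge[] is corrupt */
            return (struct iobuf_arena *) node;    /* valid: list is the first member */
    }

    /* Frame #2 (__iobuf_pool_add_arena): reached only when no existing arena
     * has room. */
    static struct iobuf_arena *
    pool_add_arena (struct iobuf_pool *pool, int index)
    {
            struct iobuf_arena *arena = arena_unprune (pool, index);

            if (!arena)
                    arena = calloc (1, sizeof (*arena));   /* fall back to a new arena */
            return arena;
    }

    /* Rough equivalent of __iobuf_select_arena(): prefer an arena that still
     * has passive (free) iobufs; otherwise "all arenas were full, find the
     * right count to add" and go through pool_add_arena(). */
    static struct iobuf_arena *
    select_arena (struct iobuf_pool *pool, int index)
    {
            for (struct list_head *n = pool->arenas[index].next;
                 n != &pool->arenas[index]; n = n->next) {
                    struct iobuf_arena *a = (struct iobuf_arena *) n;
                    if (a->passive_cnt)
                            return a;
            }
            return pool_add_arena (pool, index);
    }

The open question from point (3) remains: what drives the pool into the "all arenas were full" branch for such a small page size during a plain mount, since that is the only path that touches the purge[] list whose head was found corrupted.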
-> It was distributed volume and tried to mount using command mount -t nfs <IP>:<vol_name> <mnt_point> -> no I/Os werr being donr from any other mount no volume options were set for the same. I have tried several times (~ 20 times) on 2x2 volume (build glusterfs-3.6.0.24-1.el6rhs.x86_64) to reproduce the issue but no success. Even Saurabh tried to repro the issue with his setup but no luck yet. This is the analysis I have at the moment: ========================================= (1) The NFS server (glusterfsd) crashes while getting the mount request i.e. while processing in the socket layer [ glusterfsd/NFS server crashes before the control comes to NFS protocol layer]. The backtrace: Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/'. Program terminated with signal 11, Segmentation fault. #0 list_del_init (iobuf_pool=<value optimized out>, page_size=128) at list.h:88 88 old->next->prev = old->prev; (gdb) where #0 list_del_init (iobuf_pool=<value optimized out>, page_size=128) at list.h:88 #1 __iobuf_arena_unprune (iobuf_pool=<value optimized out>, page_size=128) at iobuf.c:227 #2 0x00000035ff24fd85 in __iobuf_pool_add_arena (iobuf_pool=0x1c4c970, page_size=128, num_pages=1024) at iobuf.c:251 #3 0x00000035ff2500dd in iobuf_get2 (iobuf_pool=0x1c4c970, page_size=44) at iobuf.c:603 #4 0x00007f7113df6050 in __socket_proto_state_machine (this=0x1f3ee50) at socket.c:1989 #5 socket_proto_state_machine (this=0x1f3ee50) at socket.c:2107 #6 socket_event_poll_in (this=0x1f3ee50) at socket.c:2123 #7 0x00007f7113df7c5d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x1f3ee50, poll_in=1, poll_out=0, poll_err=0) at socket.c:2240 #8 0x00000035ff275907 in event_dispatch_epoll_handler (event_pool=0x1c684d0) at event-epoll.c:384 #9 event_dispatch_epoll (event_pool=0x1c684d0) at event-epoll.c:445 #10 0x0000000000407e93 in main (argc=11, argv=0x7fffb4964cc8) at glusterfsd.c:2023 (2) Something unusual I could see in the glusterfs ctx iobuf_pool: (gdb) frame 4 #4 0x00007f7113df6050 in __socket_proto_state_machine (this=0x1f3ee50) at socket.c:1989 1989 iobuf = iobuf_get2 (this->ctx->iobuf_pool, (gdb) p ((struct iobuf_pool *)this->ctx->iobuf_pool).purge[0] $1 = {next = 0xffffffff01c4cdb0, prev = 0x1c4c770} The next pointer seems to be corrupted somehow. (3) In __iobuf_select_arena(): __iobuf_pool_add_arena() gets executed if and only if "all arenas were full, find the right count to add" [as per the comment in code]. What can cause all the arenas to be FULL, before it goes to fetch one from purge[] list? It needs to be checked with people who work in socket layer. Will be sending an email to Avati/gluster-devel. (4) The issue is not reproducible with several attempts, so the severity should be brought down IMO. What do you say Rachana? Thanks, Santosh The issue is not reproducible with several attempts, not even by QE. As per my discussion with Rachana, I am lowering the severity to medium so that it can be investigated later if we hit it again. Otherwise it would be CLOSED. |