Bug 797119 - [glusterfs-3.3.0qa24]: glusterfs crashed in xattrop when replace-brick is given
Summary: [glusterfs-3.3.0qa24]: glusterfs crashed in xattrop when replace-brick is given
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: ---
Assignee: Raghavendra Bhat
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 817967
 
Reported: 2012-02-24 10:35 UTC by Raghavendra Bhat
Modified: 2013-07-24 17:32 UTC
CC: 2 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:32:14 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: glusterfs-3.3.0qa40
Embargoed:



Description Raghavendra Bhat 2012-02-24 10:35:45 UTC
Description of problem:
2x2 distributed-replicate setup with one FUSE client and one NFS client. While some tests were running on the FUSE client, a replace-brick command was issued and the source brick process crashed.

This is the backtrace of the core.

Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id mirror.10.1.11.130.export-'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f30a6bae956 in _xattrop_index_action (this=0x22b2800, inode=0x7f30a4366130, xattr=0x0)
    at ../../../../../xlators/features/index/src/index.c:452
452             trav = xattr->members_list;
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.5-6.el6.x86_64
(gdb) bt
#0  0x00007f30a6bae956 in _xattrop_index_action (this=0x22b2800, inode=0x7f30a4366130, xattr=0x0)
    at ../../../../../xlators/features/index/src/index.c:452
#1  0x00007f30a6baeb4e in fop_fxattrop_index_action (this=0x22b2800, inode=0x7f30a4366130, xattr=0x0)
    at ../../../../../xlators/features/index/src/index.c:493
#2  0x00007f30a6baf46e in index_fxattrop_cbk (frame=0x7f30aab05350, cookie=0x7f30aab052a4, this=0x22b2800, op_ret=0, op_errno=2, xattr=0x0)
    at ../../../../../xlators/features/index/src/index.c:648
#3  0x00007f30a6e21526 in afr_fxattrop_cbk (frame=0x7f30aab052a4, cookie=0x7f30aaaf06bc, this=0x22b14f0, op_ret=-1, op_errno=2, xattr=0x0)
    at ../../../../../xlators/cluster/afr/src/afr-common.c:2704
#4  0x00007f30a7079147 in client3_1_fxattrop_cbk (req=0x7f309c0436b8, iov=0x7f309c0436f8, count=1, myframe=0x7f30aaaf06bc)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:1462
#5  0x00007f30aba25919 in rpc_clnt_handle_reply (clnt=0x7f309c001470, pollin=0x2368400) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:796
#6  0x00007f30aba25cb6 in rpc_clnt_notify (trans=0x7f309c128620, mydata=0x7f309c0014a0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2368400)
    at ../../../../rpc/rpc-lib/src/rpc-clnt.c:915
#7  0x00007f30aba21da8 in rpc_transport_notify (this=0x7f309c128620, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2368400)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#8  0x00007f30a8732270 in socket_event_poll_in (this=0x7f309c128620) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1686
#9  0x00007f30a87327f4 in socket_event_handler (fd=14, idx=4, data=0x7f309c128620, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1801
#10 0x00007f30abc7c05c in event_dispatch_epoll_handler (event_pool=0x228bc20, events=0x22a55f0, i=0) at ../../../libglusterfs/src/event.c:794
#11 0x00007f30abc7c27f in event_dispatch_epoll (event_pool=0x228bc20) at ../../../libglusterfs/src/event.c:856
#12 0x00007f30abc7c60a in event_dispatch (event_pool=0x228bc20) at ../../../libglusterfs/src/event.c:956
#13 0x0000000000407dcc in main (argc=19, argv=0x7fffffb51738) at ../../../glusterfsd/src/glusterfsd.c:1612
(gdb)  f 0
#0  0x00007f30a6bae956 in _xattrop_index_action (this=0x22b2800, inode=0x7f30a4366130, xattr=0x0)
    at ../../../../../xlators/features/index/src/index.c:452
452             trav = xattr->members_list;
(gdb) p xattr
$1 = (dict_t *) 0x0
(gdb) up
#1  0x00007f30a6baeb4e in fop_fxattrop_index_action (this=0x22b2800, inode=0x7f30a4366130, xattr=0x0)
    at ../../../../../xlators/features/index/src/index.c:493
493             _xattrop_index_action (this, inode, xattr);
(gdb) p xattr
$2 = (dict_t *) 0x0
(gdb) up
#2  0x00007f30a6baf46e in index_fxattrop_cbk (frame=0x7f30aab05350, cookie=0x7f30aab052a4, this=0x22b2800, op_ret=0, op_errno=2, xattr=0x0)
    at ../../../../../xlators/features/index/src/index.c:648
648             fop_fxattrop_index_action (this, frame->local, xattr);
(gdb) p xattr
$3 = (dict_t *) 0x0
(gdb) up
#3  0x00007f30a6e21526 in afr_fxattrop_cbk (frame=0x7f30aab052a4, cookie=0x7f30aaaf06bc, this=0x22b14f0, op_ret=-1, op_errno=2, xattr=0x0)
    at ../../../../../xlators/cluster/afr/src/afr-common.c:2704
2704                    AFR_STACK_UNWIND (fxattrop, frame, local->op_ret, local->op_errno,
(gdb) p xattr
$4 = (dict_t *) 0x0
(gdb) p op_ret
$5 = -1
(gdb) p local->op_ret
$6 = 0
(gdb) l afr_fxattrop_cbk
2680
2681    int32_t
2682    afr_fxattrop_cbk (call_frame_t *frame, void *cookie,
2683                      xlator_t *this, int32_t op_ret, int32_t op_errno,
2684                      dict_t *xattr)
2685    {
2686            afr_local_t *local = NULL;
2687
2688            int call_count = -1;
2689
(gdb) 
2690            local = frame->local;
2691
2692            LOCK (&frame->lock);
2693            {
2694                    if (op_ret == 0)
2695                            local->op_ret = 0;
2696
2697                    local->op_errno = op_errno;
2698            }
2699            UNLOCK (&frame->lock);
(gdb) 
2700
2701            call_count = afr_frame_return (frame);
2702
2703            if (call_count == 0)
2704                    AFR_STACK_UNWIND (fxattrop, frame, local->op_ret, local->op_errno,
2705                                      xattr);
2706
2707            return 0;
2708    }
2709
(gdb) 

In afr_{f}xattrop_cbk we are not saving the xattr received from the subvolumes. Suppose the 1st subvolume returned success with op_ret 0 and a non-NULL xattr. Since we do not store the xattr in local, only local->op_ret is set to 0. If the op on the next subvolume fails, its op_ret is -1 and its xattr is NULL; the failure is not stored in local (because one of the subvolumes already returned success), but the final unwind sends the NULL xattr from this last callback.

Xlators above afr might segfault when they see op_ret == 0 and assume that xattr is present.




Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. create a volume and start it
2. mount it and put some data into the volume
3. give replace-brick
  
Actual results:

The source brick crashes

Expected results:

The source brick should not crash.

Additional info:

gluster volume info mirror
 
Volume Name: mirror
Type: Distributed-Replicate
Volume ID: f7ab6a61-4629-43e0-92c0-890f425b6afe
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.11.130:/export-xfs/mirror
Brick2: 10.1.11.131:/export-xfs/mirror
Brick3: 10.1.11.144:/export-xfs/mirror
Brick4: 10.1.11.145:/export-xfs/mirror
Options Reconfigured:
cluster.self-heal-daemon: on
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
geo-replication.indexing: on
features.limit-usage: /playground:22GB
features.quota: on
performance.stat-prefetch: on

Comment 1 Anand Avati 2012-03-12 12:23:04 UTC
CHANGE: http://review.gluster.com/2813 (cluster/afr: save the xattr obtained in the {f}xattrop_cbk in local) merged in master by Vijay Bellur (vijay)

Comment 2 Raghavendra Bhat 2012-05-09 09:57:01 UTC
Checked with glusterfs-3.3.0qa40. The replace-brick command no longer causes this crash, since the xattrs returned by the subvolumes are now stored properly.

