Bug 808067 - Brick process crashed upon add-brick and rebalance
Summary: Brick process crashed upon add-brick and rebalance
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: protocol
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: shishir gowda
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 817967
 
Reported: 2012-03-29 13:25 UTC by shylesh
Modified: 2015-12-01 16:45 UTC (History)

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:39:19 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description shylesh 2012-03-29 13:25:52 UTC
Description of problem:
While I/O was in progress, I ran add-brick and started a rebalance. This caused one of the brick processes to crash.

Version-Release number of selected component (if applicable):
3.3.0qa32

How reproducible:


Steps to Reproduce:
1. Create a distribute volume with 2 bricks.
2. Untar the kernel source and run rm -rf on the kernel directory.
3. While the remove is in progress, run add-brick and start a rebalance.
  
Actual results:
brick process crashed.

Expected results:


Additional info:
Program terminated with signal 11, Segmentation fault.
#0  0x00007fdadebab6fb in gf_server_check_setxattr_cmd (frame=0x7fdae2977fc0, dict=0x0) at server-helpers.c:1411
1411            for (pair = dict->members_list; pair; pair = pair->next) {
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64 libgcc-4.4.6-3.el6.x86_64


=============================================================================================================

(gdb) bt
#0  0x00007fdadebab6fb in gf_server_check_setxattr_cmd (frame=0x7fdae2977fc0, dict=0x0) at server-helpers.c:1411
#1  0x00007fdadebbee21 in server_setxattr (req=0x7fdade476910) at server3_1-fops.c:3865
#2  0x00007fdae3b2012e in rpcsvc_handle_rpc_call (svc=0xd0f470, trans=0xd24c30, msg=0xd40640) at rpcsvc.c:520
#3  0x00007fdae3b204d1 in rpcsvc_notify (trans=0xd24c30, mydata=0xd0f470, event=RPC_TRANSPORT_MSG_RECEIVED, data=0xd40640) at rpcsvc.c:616
#4  0x00007fdae3b25ee4 in rpc_transport_notify (this=0xd24c30, event=RPC_TRANSPORT_MSG_RECEIVED, data=0xd40640) at rpc-transport.c:498
#5  0x00007fdae06ab27c in socket_event_poll_in (this=0xd24c30) at socket.c:1686
#6  0x00007fdae06ab800 in socket_event_handler (fd=19, idx=8, data=0xd24c30, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
#7  0x00007fdae3d82640 in event_dispatch_epoll_handler (event_pool=0xcd3db0, events=0xd01660, i=0) at event.c:794
#8  0x00007fdae3d82863 in event_dispatch_epoll (event_pool=0xcd3db0) at event.c:856
#9  0x00007fdae3d82bee in event_dispatch (event_pool=0xcd3db0) at event.c:956
#10 0x000000000040801c in main (argc=19, argv=0x7fff84662fe8) at glusterfsd.c:1650

=============================================================================================================

(gdb) f 0
#0  0x00007fdadebab6fb in gf_server_check_setxattr_cmd (frame=0x7fdae2977fc0, dict=0x0) at server-helpers.c:1411
1411            for (pair = dict->members_list; pair; pair = pair->next) {
(gdb) l
1406
1407            conf = frame->this->private;
1408            if (!conf)
1409                    return 0;
1410
1411            for (pair = dict->members_list; pair; pair = pair->next) {
1412                    /* this exact key is used in 'io-stats' too.
1413                     * But this is better place for this information dump.
1414                     */
1415                    if (fnmatch ("*io*stat*dump", pair->key, 0) == 0) {
(gdb) p dict
$1 = (dict_t *) 0x0

=======================================================================================================

[2012-03-29 08:05:23.979830] I [server-handshake.c:571:server_setvolume] 0-dist-server: accepted client from 10.16.157.66:1015 (version: 3.3.0qa32)
[2012-03-29 08:05:23.982574] I [server-handshake.c:571:server_setvolume] 0-dist-server: accepted client from 10.16.157.63:1018 (version: 3.3.0qa32)
[2012-03-29 08:05:24.477978] W [dict.c:458:dict_ref] (-->/usr/local/lib/libglusterfs.so.0(+0x41ba1) [0x7fdae3d75ba1] (-->/usr/local/lib/glusterfs/3.3.0qa32/x
lator/performance/io-threads.so(iot_truncate_wrapper+0x23b) [0x7fdadf22267a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/features/locks.so(pl_truncate+0xe4
) [0x7fdadf438f66]))) 0-dict: dict is NULL
[2012-03-29 08:05:24.478026] E [posix.c:216:posix_stat] 0-dist-posix: lstat on /home/bricks/d4/.glusterfs/c6/71/c671964c-d9aa-4efe-82b9-96a4d9741d0b failed: 
No such file or directory
[2012-03-29 08:05:24.478037] E [posix.c:176:truncate_stat_cbk] 0-dist-locks: got error (errno=2, stderror=No such file or directory) from child
[2012-03-29 08:05:24.478055] E [posix.c:219:truncate_stat_cbk] 0-dist-locks: truncate failed with ret: -1, error: No such file or directory
[2012-03-29 08:05:24.478069] I [server3_1-fops.c:1206:server_truncate_cbk] 0-dist-server: 51: TRUNCATE <gfid:c671964c-d9aa-4efe-82b9-96a4d9741d0b> (c671964c-
d9aa-4efe-82b9-96a4d9741d0b) ==> -1 (No such file or directory)
[2012-03-29 08:05:24.617882] W [dict.c:458:dict_ref] (-->/usr/local/lib/libglusterfs.so.0(+0x41ba1) [0x7fdae3d75ba1] (-->/usr/local/lib/glusterfs/3.3.0qa32/x
lator/performance/io-threads.so(iot_truncate_wrapper+0x23b) [0x7fdadf22267a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/features/locks.so(pl_truncate+0xe4
) [0x7fdadf438f66]))) 0-dict: dict is NULL
[2012-03-29 08:05:24.617959] E [posix.c:216:posix_stat] 0-dist-posix: lstat on /home/bricks/d4/.glusterfs/d1/c0/d1c04d4a-8612-4d8b-8124-1ee4fb830d09 failed: 
No such file or directory
[2012-03-29 08:05:24.617977] E [posix.c:176:truncate_stat_cbk] 0-dist-locks: got error (errno=2, stderror=No such file or directory) from child
[2012-03-29 08:05:24.617992] E [posix.c:219:truncate_stat_cbk] 0-dist-locks: truncate failed with ret: -1, error: No such file or directory
[2012-03-29 08:05:24.618013] I [server3_1-fops.c:1206:server_truncate_cbk] 0-dist-server: 81: TRUNCATE <gfid:d1c04d4a-8612-4d8b-8124-1ee4fb830d09> (d1c04d4a-
8612-4d8b-8124-1ee4fb830d09) ==> -1 (No such file or directory)
[2012-03-29 08:05:24.683553] E [posix-helpers.c:659:posix_handle_pair] 0-dist-posix: /home/bricks/d4/.glusterfs/2c/53/2c5375f1-d472-48af-8dc3-7cc0a06fc762/kv
m/mmu_audit.c: key:trusted.glusterfs.dht.linkto error:File exists
[2012-03-29 08:05:24.683572] E [posix.c:1735:posix_create] 0-dist-posix: setting xattrs on /home/bricks/d4/.glusterfs/2c/53/2c5375f1-d472-48af-8dc3-7cc0a06fc
762/kvm/mmu_audit.c failed (File exists)
[2012-03-29 08:05:24.714598] E [posix.c:351:posix_setattr] 0-dist-posix: setattr (lstat) on /home/bricks/d4/.glusterfs/5d/74/5d744fbf-e73f-430c-ae26-4bd7b1de
110f failed: No such file or directory
[2012-03-29 08:05:24.714635] I [server3_1-fops.c:1731:server_setattr_cbk] 0-dist-server: 106: SETATTR <gfid:5d744fbf-e73f-430c-ae26-4bd7b1de110f> (5d744fbf-e
73f-430c-ae26-4bd7b1de110f) ==> -1 (No such file or directory)
[2012-03-29 08:05:24.730650] E [posix.c:816:posix_mknod] 0-dist-posix: mknod on /home/bricks/d4/.glusterfs/2c/53/2c5375f1-d472-48af-8dc3-7cc0a06fc762/kvm/x86
.h failed: File exists
[2012-03-29 08:05:24.730668] I [server3_1-fops.c:571:server_mknod_cbk] 0-dist-server: 113: MKNOD /linux-3.2.13/arch/x86/kvm/x86.h ==> -1 (File exists)
[2012-03-29 08:05:24.731558] I [server3_1-fops.c:817:server_getxattr_cbk] 0-dist-server: 114: GETXATTR (null) (trusted.glusterfs.pathinfo) ==> -1 (No such fi
le or directory)
[2012-03-29 08:05:24.751038] I [server3_1-fops.c:817:server_getxattr_cbk] 0-dist-server: 117: GETXATTR (null) (trusted.glusterfs.node-uuid) ==> -1 (No such f
ile or directory)
[2012-03-29 08:05:24.776183] I [server3_1-fops.c:52:server_statfs_cbk] 0-dist-server: 123: STATFS -1 (No such file or directory)
[2012-03-29 08:05:24.797732] I [server3_1-fops.c:817:server_getxattr_cbk] 0-dist-server: 125: GETXATTR (null) (trusted.glusterfs.node-uuid) ==> -1 (No such f
ile or directory)
[2012-03-29 08:05:24.822872] I [server3_1-fops.c:52:server_statfs_cbk] 0-dist-server: 131: STATFS -1 (No such file or directory)
[2012-03-29 08:05:24.843082] I [server3_1-fops.c:817:server_getxattr_cbk] 0-dist-server: 133: GETXATTR (null) (trusted.glusterfs.node-uuid) ==> -1 (No such f
ile or directory)
[2012-03-29 08:05:25.072301] E [posix-helpers.c:659:posix_handle_pair] 0-dist-posix: /home/bricks/d4/.glusterfs/2c/53/2c5375f1-d472-48af-8dc3-7cc0a06fc76
=====================================================================================================================

[2012-03-29 08:22:01.858086] W [client.c:112:client_grace_timeout] 7-dist-client-2: client grace timer expired, updating the lk-version to 84
[2012-03-29 08:22:03.860835] I [client.c:136:client_register_grace_timer] 7-dist-client-2: Registering a grace timer
[2012-03-29 08:22:13.868979] W [client.c:112:client_grace_timeout] 7-dist-client-2: client grace timer expired, updating the lk-version to 85
[2012-03-29 08:22:15.871717] I [client.c:136:client_register_grace_timer] 7-dist-client-2: Registering a grace timer
[2012-03-29 08:22:18.437998] W [client3_1-fops.c:2607:client3_1_lookup_cbk] 7-dist-client-2: remote operation failed: Transport endpoint is not connected. Pa
th: /
[2012-03-29 08:22:18.438049] E [iobuf.c:660:iobuf_unref] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client_lookup+0x15a) [0x7f09e6ecd2d
8] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client3_1_lookup+0x4d4) [0x7f09e6ee924a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/pr
otocol/client.so(client_submit_request+0x678) [0x7f09e6ecce10]))) 7-iobuf: invalid argument: iobuf
[2012-03-29 08:22:18.439558] W [client3_1-fops.c:2607:client3_1_lookup_cbk] 7-dist-client-2: remote operation failed: Transport endpoint is not connected. Pa
th: /.gdbinit
[2012-03-29 08:22:18.439619] E [iobuf.c:660:iobuf_unref] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client_lookup+0x15a) [0x7f09e6ecd2d
8] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client3_1_lookup+0x4d4) [0x7f09e6ee924a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/pr
otocol/client.so(client_submit_request+0x678) [0x7f09e6ecce10]))) 7-iobuf: invalid argument: iobuf
[2012-03-29 08:22:18.441276] W [client3_1-fops.c:2607:client3_1_lookup_cbk] 7-dist-client-2: remote operation failed: Transport endpoint is not connected. Pa
th: /glusterfs
[2012-03-29 08:22:18.441338] E [iobuf.c:660:iobuf_unref] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client_lookup+0x15a) [0x7f09e6ecd2d
8] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client3_1_lookup+0x4d4) [0x7f09e6ee924a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/pr
otocol/client.so(client_submit_request+0x678) [0x7f09e6ecce10]))) 7-iobuf: invalid argument: iobuf
[2012-03-29 08:22:18.442909] W [client3_1-fops.c:2607:client3_1_lookup_cbk] 7-dist-client-2: remote operation failed: Transport endpoint is not connected. Pa
th: /glusterfs
[2012-03-29 08:22:18.442966] E [iobuf.c:660:iobuf_unref] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client_lookup+0x15a) [0x7f09e6ecd2d
8] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client3_1_lookup+0x4d4) [0x7f09e6ee924a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/pr
otocol/client.so(client_submit_request+0x678) [0x7f09e6ecce10]))) 7-iobuf: invalid argument: iobuf
[2012-03-29 08:22:18.561264] W [client3_1-fops.c:2607:client3_1_lookup_cbk] 7-dist-client-2: remote operation failed: Transport endpoint is not connected. Pa
th: /system-supplied DSO at 0x7fff8476d000
[2012-03-29 08:22:18.561316] E [iobuf.c:660:iobuf_unref] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client_lookup+0x15a) [0x7f09e6ecd2d
8] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client3_1_lookup+0x4d4) [0x7f09e6ee924a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/pr
otocol/client.so(client_submit_request+0x678) [0x7f09e6ecce10]))) 7-iobuf: invalid argument: iobuf
[2012-03-29 08:22:18.562321] W [client3_1-fops.c:2607:client3_1_lookup_cbk] 7-dist-client-2: remote operation failed: Transport endpoint is not connected. Pa
th: /system-supplied DSO at 0x7fff8476d000-gdb.py
[2012-03-29 08:22:18.562372] E [iobuf.c:660:iobuf_unref] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client_lookup+0x15a) [0x7f09e6ecd2d
8] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client3_1_lookup+0x4d4) [0x7f09e6ee924a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/pr
otocol/client.so(client_submit_request+0x678) [0x7f09e6ecce10]))) 7-iobuf: invalid argument: iobuf
[2012-03-29 08:22:18.563119] W [client3_1-fops.c:2607:client3_1_lookup_cbk] 7-dist-client-2: remote operation failed: Transport endpoint is not connected. Pa
th: /.gdb_history
[2012-03-29 08:22:18.563203] W [client3_1-fops.c:2607:client3_1_lookup_cbk] 7-dist-client-2: remote operation failed: Transport endpoint is not connected. Pa
th: /.gdb_history
[2012-03-29 08:22:18.563247] E [iobuf.c:660:iobuf_unref] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client_lookup+0x15a) [0x7f09e6ecd2d
8] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client3_1_lookup+0x4d4) [0x7f09e6ee924a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/pr
otocol/client.so(client_submit_request+0x678) [0x7f09e6ecce10]))) 7-iobuf: invalid argument: iobuf
[2012-03-29 08:22:18.563297] E [iobuf.c:660:iobuf_unref] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client_lookup+0x15a) [0x7f09

Comment 1 Anand Avati 2012-04-17 08:15:06 UTC
CHANGE: http://review.gluster.com/3164 (protocol/server: Check if dict arg is NULL in setxattr) merged in master by Vijay Bellur (vijay)
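
For readers without access to the review, the kind of guard the change title describes can be sketched as follows. This is a minimal illustration, not the merged patch: the data_pair_t and dict_t types are simplified, hypothetical stand-ins for the GlusterFS structures, and check_setxattr_cmd is an invented name that only mirrors the loop shown in the gdb listing at server-helpers.c:1411.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the GlusterFS dict types. */
typedef struct data_pair {
        struct data_pair *next;
        const char       *key;
} data_pair_t;

typedef struct dict {
        data_pair_t *members_list;
} dict_t;

/* Counts keys matching the stat-dump pattern from the gdb listing.
 * Returns 0 for a NULL dict instead of dereferencing it, which is
 * the case that produced the SIGSEGV in frame #0. */
static int
check_setxattr_cmd (const dict_t *dict)
{
        const data_pair_t *pair    = NULL;
        int                matches = 0;

        if (!dict)
                return 0;

        for (pair = dict->members_list; pair; pair = pair->next) {
                if (fnmatch ("*io*stat*dump", pair->key, 0) == 0)
                        matches++;
        }

        return matches;
}
```

With this guard in place, a setxattr request that arrives with dict=0x0, as in the backtrace, returns early instead of crashing the brick process.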

Comment 2 shylesh 2012-04-20 10:02:32 UTC
Works on 3.3.0qa36.

