Created attachment 641261 [details]
mnt, bricks, vdsm, engine, rebalance logs

Description of problem:
Brick process crashed after performing add-brick and rebalance on a distribute volume serving as a VM store.

Version-Release number of selected component (if applicable):
glusterfs-fuse-3.3.0rhsvirt1-8.el6rhs.x86_64
vdsm-gluster-4.9.6-16.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.3.0rhsvirt1-8.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-8.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
glusterfs-rdma-3.3.0rhsvirt1-8.el6rhs.x86_64
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-debuginfo-3.3.0rhsvirt1-8.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch
glusterfs-geo-replication-3.3.0rhsvirt1-8.el6rhs.x86_64

How reproducible:

Steps to Reproduce:
1. Created a single-brick distribute volume.
2. Created a storage domain on this volume.
3. With VMs healthy, performed add-brick and started rebalance.
4. Rebalance completed successfully, but some time later the brick process core dumped.

Additional info:

bt
----
Core was generated by `/usr/sbin/glusterfsd -s localhost --volfile-id distribute.rhs-client37.lab.eng.'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f4a47dd12ac in ltable_dump (trav=0x2698ac0) at server.c:308
308             gf_proc_dump_build_key(key,
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.12.x86_64 libaio-0.3.107-10.el6.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6_2.5.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0  0x00007f4a47dd12ac in ltable_dump (trav=0x2698ac0) at server.c:308
#1  0x00007f4a47dd1847 in server_inode (this=<value optimized out>) at server.c:560
#2  0x000000335ea45012 in gf_proc_dump_xlator_info (top=<value optimized out>) at statedump.c:451
#3  0x000000335ea4578c in gf_proc_dump_info (signum=<value optimized out>) at statedump.c:774
#4  0x0000000000405d22 in glusterfs_sigwaiter (arg=<value optimized out>) at glusterfsd.c:1502
#5  0x000000335de077f1 in start_thread () from /lib64/libpthread.so.0
#6  0x000000335d6e5ccd in clone () from /lib64/libc.so.6
(gdb) l
303             char key[GF_DUMP_MAX_BUF_LEN] = {0,};
304             struct _locker *locker = NULL;
305             char locker_data[GF_MAX_LOCK_OWNER_LEN] = {0,};
306             int count = 0;
307
308             gf_proc_dump_build_key(key,
309                     "conn","bound_xl.ltable.inodelk.%s",
310                     trav->bound_xl->name);
311             gf_proc_dump_add_section(key);
312
(gdb) p trav->bound_xl
$1 = (xlator_t *) 0x0

volume info
===========
Volume Name: distribute
Type: Distribute
Volume ID: 11695105-f2d4-488c-b695-c29eb3dfa9be
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: rhs-client37.lab.eng.blr.redhat.com:/brick1
Brick2: rhs-client43.lab.eng.blr.redhat.com:/brick2
Options Reconfigured:
cluster.subvols-per-directory: 1
cluster.eager-lock: enable
storage.linux-aio: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

Attached the mnt, bricks, rebalance, vdsm and engine logs.
*** Bug 874928 has been marked as a duplicate of this bug. ***
Verified on 3.3.0.5rhs-40.el6rhs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0691.html