Created attachment 641261 [details]
mnt, bricks, vdsm, engine, rebalance logs

Description of problem:
Brick process crashed after performing add-brick and rebalance on a distribute volume serving as a VM store.

Version-Release number of selected component (if applicable):
glusterfs-fuse-3.3.0rhsvirt1-8.el6rhs.x86_64
vdsm-gluster-4.9.6-16.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.3.0rhsvirt1-8.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-8.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
glusterfs-rdma-3.3.0rhsvirt1-8.el6rhs.x86_64
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-debuginfo-3.3.0rhsvirt1-8.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch
glusterfs-geo-replication-3.3.0rhsvirt1-8.el6rhs.x86_64

How reproducible:

Steps to Reproduce:
1. Created a single-brick distribute volume.
2. Created a storage domain on this volume.
3. With VMs healthy, performed add-brick and started rebalance.
4. Rebalance completed successfully, but some time later the brick process core dumped.

Additional info:

bt
----
Core was generated by `/usr/sbin/glusterfsd -s localhost --volfile-id distribute.rhs-client37.lab.eng.'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f4a47dd12ac in ltable_dump (trav=0x2698ac0) at server.c:308
308             gf_proc_dump_build_key(key,
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.12.x86_64 libaio-0.3.107-10.el6.x86_64 libgcc-4.4.6-3.el6.x86_64 openssl-1.0.0-20.el6_2.5.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0  0x00007f4a47dd12ac in ltable_dump (trav=0x2698ac0) at server.c:308
#1  0x00007f4a47dd1847 in server_inode (this=<value optimized out>) at server.c:560
#2  0x000000335ea45012 in gf_proc_dump_xlator_info (top=<value optimized out>) at statedump.c:451
#3  0x000000335ea4578c in gf_proc_dump_info (signum=<value optimized out>) at statedump.c:774
#4  0x0000000000405d22 in glusterfs_sigwaiter (arg=<value optimized out>) at glusterfsd.c:1502
#5  0x000000335de077f1 in start_thread () from /lib64/libpthread.so.0
#6  0x000000335d6e5ccd in clone () from /lib64/libc.so.6
(gdb) l
303             char key[GF_DUMP_MAX_BUF_LEN] = {0,};
304             struct _locker *locker = NULL;
305             char locker_data[GF_MAX_LOCK_OWNER_LEN] = {0,};
306             int count = 0;
307
308             gf_proc_dump_build_key(key,
309                     "conn","bound_xl.ltable.inodelk.%s",
310                     trav->bound_xl->name);
311             gf_proc_dump_add_section(key);
312
(gdb) p trav->bound_xl
$1 = (xlator_t *) 0x0

volume info
===========
Volume Name: distribute
Type: Distribute
Volume ID: 11695105-f2d4-488c-b695-c29eb3dfa9be
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: rhs-client37.lab.eng.blr.redhat.com:/brick1
Brick2: rhs-client43.lab.eng.blr.redhat.com:/brick2
Options Reconfigured:
cluster.subvols-per-directory: 1
cluster.eager-lock: enable
storage.linux-aio: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off

Attached the mnt, bricks, rebalance, vdsm and engine logs.
*** Bug 874928 has been marked as a duplicate of this bug. ***
Verified on 3.3.0.5rhs-40.el6rhs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0691.html