Bug 1022510 - GlusterFS client crashes during add-brick and rebalance
Summary: GlusterFS client crashes during add-brick and rebalance
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 3.4.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Duplicates: 1019874 1104940
Depends On:
Blocks:
 
Reported: 2013-10-23 12:46 UTC by Samuli Heinonen
Modified: 2015-10-07 13:15 UTC
7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-07 13:15:42 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
GlusterFS client log during rebalance (20.50 KB, text/plain)
2013-10-23 12:46 UTC, Samuli Heinonen
Backtrace of coredump (6.21 KB, text/plain)
2013-10-23 12:47 UTC, Samuli Heinonen

Description Samuli Heinonen 2013-10-23 12:46:05 UTC
Created attachment 815391 [details]
GlusterFS client log during rebalance

Description of problem:
GlusterFS client crashes during rebalance after add-brick.


GlusterFS setup before add-brick

Volume Name: dev-el6-sata1
Type: Replicate
Volume ID: 840eccd5-b3fb-4dc8-b67d-966bd22e8557
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: boar1:/gluster/sata/brick1/dev-el6-sata1
Brick2: boar2:/gluster/sata/brick1/dev-el6-sata1
Options Reconfigured:
server.allow-insecure: on
performance.client-io-threads: enable
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 10
performance.quick-read: off
performance.io-cache: off
performance.stat-prefetch: off
network.remote-dio: enable


Version-Release number of selected component (if applicable):
Servers:
CentOS 6.4: 
glusterfs-fuse-3.4.1-1.el6.x86_64
glusterfs-server-3.4.1-1.el6.x86_64
glusterfs-libs-3.4.1-1.el6.x86_64
glusterfs-3.4.1-1.el6.x86_64
glusterfs-cli-3.4.1-1.el6.x86_64

Client:
RHEL 6.5 (beta)
glusterfs-3.4.1-2.el6.x86_64
glusterfs-libs-3.4.1-2.el6.x86_64
glusterfs-fuse-3.4.1-2.el6.x86_64
glusterfs-api-3.4.1-2.el6.x86_64
glusterfs-rdma-3.4.1-2.el6.x86_64
glusterfs-cli-3.4.1-2.el6.x86_64


Steps to Reproduce:
Backend filesystem is on logical volume mounted as:
/dev/mapper/sata--brick1-export on /gluster/sata/brick1 type xfs (rw,noatime,inode64,nobarrier)

For testing purposes the new bricks are on the same logical volume as the older ones.

1. gluster vol add-brick dev-el6-sata1 replica 2 boar1:/gluster/sata/brick1/dev-el6-sata2  boar2:/gluster/sata/brick1/dev-el6-sata2
2. gluster vol rebalance dev-el6-sata1 fix-layout start
3. gluster vol rebalance dev-el6-sata1 start
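
While reproducing, the rebalance and the client can be watched from a second terminal. A minimal sketch; the mount point and the client log file name below are assumptions (GlusterFS names the client log after the mount path):

gluster vol rebalance dev-el6-sata1 status
gluster vol status dev-el6-sata1
# client-side log (name assumes a /mnt/dev-el6-sata1 mount point)
tail -f /var/log/glusterfs/mnt-dev-el6-sata1.log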

Actual results:
GlusterFS client crashes during rebalance and the mount point becomes inaccessible ("Transport endpoint is not connected"). After the rebalance has finished, umount -fl is required to unmount the volume.
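
To recover the mount point after the crash, a forced lazy unmount followed by a remount should bring it back; a sketch, where the mount point is an assumption:

umount -fl /mnt/dev-el6-sata1
mount -t glusterfs boar1:/dev-el6-sata1 /mnt/dev-el6-sata1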


Expected results:
The GlusterFS client does not crash and the mount point remains usable during the rebalance.

Comment 1 Samuli Heinonen 2013-10-23 12:47:11 UTC
Created attachment 815392 [details]
Backtrace of coredump
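
A backtrace like the attached one can be generated from the core file with gdb; a sketch, assuming core dumps are enabled on the client and /usr/sbin/glusterfs is the FUSE client binary (the core file path is a placeholder):

# install glusterfs-debuginfo first, if available, for readable symbols
gdb /usr/sbin/glusterfs /path/to/core
(gdb) bt
(gdb) thread apply all bt full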

Comment 2 Joe Julian 2014-06-05 08:47:11 UTC
*** Bug 1104940 has been marked as a duplicate of this bug. ***

Comment 3 Joe Julian 2014-06-05 16:16:03 UTC
AFAICT, this bug occurs when a file is migrated to a different server and a FUSE cache invalidation is triggered.
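
One way to probe that hypothesis, purely as a diagnostic and not a fix: remount the client with the FUSE attribute and entry caches effectively disabled and check whether the crash still triggers during migration. A sketch; attribute-timeout and entry-timeout are standard glusterfs FUSE mount options, and the mount point is an assumption:

mount -t glusterfs -o attribute-timeout=0,entry-timeout=0 boar1:/dev-el6-sata1 /mnt/dev-el6-sata1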

Comment 4 Joe Julian 2014-06-06 03:48:23 UTC
I'm not sure if this is relevant.

On the source server, some of the files that were migrated to the destination still show as open in lsof, despite having been deleted.
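
Those open-but-deleted files can be listed on the source server with lsof; a sketch, assuming the brick path from the volume info above:

# deleted files still held open under the source brick
lsof -nP +D /gluster/sata/brick1/dev-el6-sata1 | grep -i deleted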

Comment 5 Susant Kumar Palai 2014-06-11 10:38:27 UTC
Hey Joe, a patch (http://review.gluster.org/#/c/8029/) has been sent that addresses the same crash, as part of bug 961615: https://bugzilla.redhat.com/show_bug.cgi?id=961615

Comment 6 Pranith Kumar K 2014-06-16 10:18:07 UTC
*** Bug 1019874 has been marked as a duplicate of this bug. ***

Comment 7 Joe Julian 2014-06-16 13:31:36 UTC
In bug 961615 (above) I tested the backport against 3.4.4. Prior to applying the patch I could crash the clients every time. After the patch I could not. (Yes, I reviewed and verified it.)
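
For anyone who wants to repeat that test, the Gerrit change can be cherry-picked onto a 3.4.4 tree and built locally; a sketch, where the Gerrit fetch URL/project and the patchset number (/1) are assumptions, use the latest patchset shown in the review:

git clone https://github.com/gluster/glusterfs.git && cd glusterfs
git checkout v3.4.4
git fetch http://review.gluster.org/glusterfs refs/changes/29/8029/1
git cherry-pick FETCH_HEAD
# standard autotools build
./autogen.sh && ./configure && make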

Comment 8 Niels de Vos 2015-05-17 21:57:24 UTC
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases. The last two releases before 3.7 are still maintained; at the moment these are 3.6 and 3.5.

This bug has been filed against the 3.4 release, and will not get fixed in a 3.4 version any more. Please verify whether newer versions are affected by the reported problem. If that is the case, update the bug with a note, and update the version if you can. If updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" field below the comment box to "bugs".

If there is no response by the end of the month, this bug will get automatically closed.

Comment 9 Kaleb KEITHLEY 2015-10-07 13:15:42 UTC
GlusterFS 3.4.x has reached end-of-life.

If this bug still exists in a later release, please reopen this bug and change the version, or open a new bug.

