Bug 983061 - [RHS-RHOS] Cinder fuse client crashed in afr_fd_has_witnessed_unstable_write after remove-brick operation
Status: CLOSED DUPLICATE of bug 978802
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assigned To: Amar Tumballi
QA Contact: Sudhir D
Depends On:
Blocks:
 
Reported: 2013-07-10 08:03 EDT by Anush Shetty
Modified: 2013-12-18 19:09 EST
CC: 8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-10 08:33:01 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Anush Shetty 2013-07-10 08:03:25 EDT
Description of problem: On a 6x2 Distributed-Replicate volume, we added 2 more bricks using add-brick to make it a 7x2 Distributed-Replicate cinder volume. We created 10 cinder volumes of 15G each and 10 Nova instances. Two pairs of replica bricks were then removed using remove-brick start, and the removal was committed. While trying to attach a cinder volume to an instance, the cinder fuse process crashed.
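A minimal sketch of that remove-brick sequence, with placeholder hosts and brick paths since the exact replica pair removed is not recorded in this report (status is polled until data migration completes before committing):

# gluster volume remove-brick cinder-vol <hostA>:/<brickN> <hostB>:/<brickM> start
# gluster volume remove-brick cinder-vol <hostA>:/<brickN> <hostB>:/<brickM> status
# gluster volume remove-brick cinder-vol <hostA>:/<brickN> <hostB>:/<brickM> commit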


Version-Release number of selected component (if applicable):
RHS: glusterfs-3.3.0.11rhs-1.el6rhs.x86_64
Cinder: openstack-cinder-2013.1.2-3.el6ost.noarch
Puddle repo:  http://download.lab.bos.redhat.com/rel-eng/OpenStack/Grizzly/2013-07-08.1/puddle.repo

How reproducible: Seen once so far; filing on first occurrence.

Steps to Reproduce:
1. Create 6x2 Distributed-Replicate volume
2. Configure cinder to use RHS 
3. Create cinder volumes
4. Remove brick operations on RHS volume
5. Attach a cinder volume to an instance (see the command sketch below)
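
A command-level sketch of steps 3 and 5, assuming the Grizzly-era CLI; the volume name, instance ID, volume ID and device path are placeholders (step 4 is sketched in the description above). The fuse crash was hit during the attach in step 5:

# cinder create --display-name cinder-vol-01 15
# nova volume-attach <instance-id> <cinder-volume-id> /dev/vdc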


Actual results:

Cinder fuse client crashed.

Expected results:

Attaching cinder volumes to the instances should continue to work seamlessly after remove-brick operations.

Additional info:

1. RHOS hostname: rhs-client28.lab.eng.blr.redhat.com

2. RHS nodes (hostname and IP address): 10.70.37.66, 10.70.37.173, 10.70.37.71, 10.70.37.158

3. RHS node from where the gluster commands were executed: 10.70.37.173

4. Volume info
# gluster volume info
 
Volume Name: cinder-vol
Type: Distributed-Replicate
Volume ID: 19f5abf1-5739-417a-bcff-e56d0a5baa74
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 10.70.37.66:/brick4/s1
Brick2: 10.70.37.173:/brick4/s2
Brick3: 10.70.37.66:/brick5/s3
Brick4: 10.70.37.173:/brick5/s4
Brick5: 10.70.37.71:/brick4/s7
Brick6: 10.70.37.158:/brick4/s8
Brick7: 10.70.37.71:/brick6/s11
Brick8: 10.70.37.158:/brick6/s12
Options Reconfigured:
storage.owner-gid: 165
storage.owner-uid: 165
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
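
For reference, the "Options Reconfigured" values above correspond to gluster volume set calls such as the following sketch (the order in which they were originally applied is not recorded here):

# gluster volume set cinder-vol storage.owner-uid 165
# gluster volume set cinder-vol storage.owner-gid 165
# gluster volume set cinder-vol performance.quick-read off
# gluster volume set cinder-vol network.remote-dio on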

5. Volume status
# gluster volume status cinder-vol
Status of volume: cinder-vol
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.70.37.66:/brick4/s1				24009	Y	2106
Brick 10.70.37.173:/brick4/s2				24009	Y	3243
Brick 10.70.37.66:/brick5/s3				24010	Y	2111
Brick 10.70.37.173:/brick5/s4				24010	Y	3249
Brick 10.70.37.71:/brick4/s7				24009	Y	2683
Brick 10.70.37.158:/brick4/s8				24009	Y	14982
Brick 10.70.37.71:/brick6/s11				24011	Y	2695
Brick 10.70.37.158:/brick6/s12				24011	Y	14992
NFS Server on localhost					38467	Y	15718
Self-heal Daemon on localhost				N/A	Y	15724
NFS Server on 10.70.37.66				38467	Y	4693
Self-heal Daemon on 10.70.37.66				N/A	Y	4699
NFS Server on 10.70.37.71				38467	Y	4660
Self-heal Daemon on 10.70.37.71				N/A	Y	4666
NFS Server on 10.70.37.158				38467	Y	25999
Self-heal Daemon on 10.70.37.158			N/A	Y	26005


6. Mount point on the client: 
/var/lib/cinder/volumes/cf55327cba40506e44b37f45f55af5e7
/var/lib/nova/mnt/cf55327cba40506e44b37f45f55af5e7
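
The fuse mounts can be confirmed on the client with generic checks like the following (using the hashed mount point above):

# grep glusterfs /proc/mounts
# df -hT /var/lib/nova/mnt/cf55327cba40506e44b37f45f55af5e7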

7. # tail /var/log/glusterfs/var-lib-nova-mnt-cf55327cba40506e44b37f45f55af5e7.log

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-07-10 15:53:01
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0.10rhs
/lib64/libc.so.6[0x362a232920]
/usr/lib64/glusterfs/3.3.0.10rhs/xlator/cluster/replicate.so(afr_fd_has_witnessed_unstable_write+0x32)[0x7f77902c4442]
/usr/lib64/glusterfs/3.3.0.10rhs/xlator/cluster/replicate.so(afr_fsync+0xa6)[0x7f77902eeaa6]
/usr/lib64/glusterfs/3.3.0.10rhs/xlator/cluster/distribute.so(dht_fsync+0x154)[0x7f77900883d4]
/usr/lib64/glusterfs/3.3.0.10rhs/xlator/performance/write-behind.so(wb_fsync+0x292)[0x7f778bdfbeb2]
/usr/lib64/glusterfs/3.3.0.10rhs/xlator/debug/io-stats.so(io_stats_fsync+0x14d)[0x7f778bbe1fcd]
/usr/lib64/libglusterfs.so.0(syncop_fsync+0x174)[0x3b97a50c94]
/usr/lib64/glusterfs/3.3.0.10rhs/xlator/mount/fuse.so(fuse_migrate_fd+0x36b)[0x7f779337d9ab]
/usr/lib64/glusterfs/3.3.0.10rhs/xlator/mount/fuse.so(fuse_handle_opened_fds+0xa4)[0x7f779337e674]
/usr/lib64/glusterfs/3.3.0.10rhs/xlator/mount/fuse.so(+0xe749)[0x7f779337e749]
/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x3b97a4c332]
/lib64/libc.so.6[0x362a243b70]
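
The log above carries only the in-process signal-handler backtrace. If a core file was preserved, a fuller trace can be extracted with gdb; this is a generic sketch that assumes glusterfs-debuginfo is installed and the core path is substituted:

# gdb /usr/sbin/glusterfs /path/to/core.<pid>
(gdb) thread apply all bt full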
Comment 3 shilpa 2013-07-10 08:19:41 EDT
Reproduced the same bug.
Comment 4 Pranith Kumar K 2013-07-10 08:33:01 EDT

*** This bug has been marked as a duplicate of bug 978802 ***
