Bug 1006124 - DISTRIBUTED-REPLICATE: Lock migration failures are seen after remove-brick commit
Summary: DISTRIBUTED-REPLICATE: Lock migration failures are seen after remove-brick co...
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1286180
 
Reported: 2013-09-10 04:52 UTC by shylesh
Modified: 2015-11-27 12:20 UTC
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1286180 (view as bug list)
Environment:
Last Closed: 2015-11-27 12:14:21 UTC
Embargoed:


Attachments

Description shylesh 2013-09-10 04:52:23 UTC
Description of problem:
Two instances of pingpong were running on the same file, on the same mount point, but from different terminals. Decommissioning the brick that contains this file (remove-brick start) causes the pingpong test to fail.

Version-Release number of selected component (if applicable):
[root@rhs1-gold myscripts]# rpm -qa| grep gluster
glusterfs-libs-3.4.0.32rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.32rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.32rhs-1.el6rhs.x86_64
samba-glusterfs-3.6.9-160.3.el6rhs.x86_64
gluster-swift-container-1.8.0-6.11.el6rhs.noarch
glusterfs-3.4.0.32rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.32rhs-1.el6rhs.x86_64


How reproducible:
Not sure

Steps to Reproduce:
1. Created a 3x2 distributed-replicate volume.
2. Created a single file on the mount point.
3. From two different terminals, ran pingpong on the file created above (note: both on the same mount point).
4. While pingpong was in progress, started remove-brick:
   gluster volume remove-brick <vol> <brick1> <brick2> start
5. Once the status was "COMPLETED", committed the operation.

Actual results:

The pingpong tests failed with the following errors:


lock at 3 failed! - Bad file descriptor
unlock at 2 failed! - Bad file descriptor
lock at 0 failed! - Bad file descriptor
unlock at 3 failed! - Bad file descriptor
lock at 1 failed! - Bad file descriptor
unlock at 0 failed! - Bad file descriptor
lock at 2 failed! - Bad file descriptor
unlock at 1 failed! - Bad file descriptor
lock at 3 failed! - Bad file descriptor
unlock at 2 failed! - Bad file descriptor
lock at 0 failed! - Bad file descriptor
unlock at 3 failed! - Bad file descriptor
lock at 1 failed! - Bad file descriptor
unlock at 0 failed! - Bad file descriptor
lock at 2 failed! - Bad file descriptor
unlock at 1 failed! - Bad file descriptor
lock at 3 failed! - Bad file descriptor
unlock at 2 failed! - Bad file descriptor
lock at 0 failed! - Bad file descriptor
unlock at 3 failed! - Bad file descriptor
lock at 1 failed! - Bad file descriptor
unlock at 0 failed! - Bad file descriptor
lock at 2 failed! - Bad file descriptor


 

Additional info:
===============

mnt log says
===========

[2013-09-10 04:26:08.396583] W [fuse-bridge.c:4634:fuse_setlk_resume] 0-glusterfs-fuse: 260298: LK() inode migration of (null) failed (Bad file descriptor)
[2013-09-10 04:26:08.396779] W [fuse-resolve.c:546:fuse_resolve_fd] 0-fuse-resolve: migration of basefd (ptr:0x164ce5c inode-gfid:a027e925-835c-4a14-8ecf-8d6cff700313) did not complete, failing fop with EBADF (old-subvolume:distr-rep-1 new-subvolume:distr-rep-1)
[2013-09-10 04:26:08.396836] W [fuse-bridge.c:4634:fuse_setlk_resume] 0-glusterfs-fuse: 260299: LK() inode migration of (null) failed (Bad file descriptor)
[2013-09-10 04:26:08.396984] W [fuse-resolve.c:546:fuse_resolve_fd] 0-fuse-resolve: migration of basefd (ptr:0x164ce5c inode-gfid:a027e925-835c-4a14-8ecf-8d6cff700313) did not complete, failing fop with EBADF (old-subvolume:distr-rep-1 new-subvolume:distr-rep-1)
[2013-09-10 04:26:08.397004] W [fuse-bridge.c:4634:fuse_setlk_resume] 0-glusterfs-fuse: 260300: LK() inode migration of (null) failed (Bad file descriptor)
[2013-09-10 04:26:08.397144] W [fuse-resolve.c:546:fuse_resolve_fd] 0-fuse-resolve: migration of basefd (ptr:0x164ce5c inode-gfid:a027e925-835c-4a14-8ecf-8d6cff700313) did not complete, failing fop with EBADF (old-subvolume:distr-rep-1 new-subvolume:distr-rep-1)
[2013-09-10 04:26:08.397174] W [fuse-bridge.c:4634:fuse_setlk_resume] 0-glusterfs-fuse: 260301: LK() inode migration of (null) failed (Bad file descriptor)
[2013-09-10 04:26:08.397264] W [fuse-resolve.c:546:fuse_resolve_fd] 0-fuse-resolve: migration of basefd (ptr:0x164ce5c inode-gfid:a027e925-835c-4a14-8ecf-8d6cff700313) did not complete, failing fop with EBADF (old-subvolume:distr-rep-1 new-subvolume:distr-rep-1)
[2013-09-10 04:26:08.397293] W [fuse-bridge.c:4634:fuse_setlk_resume] 0-glusterfs-fuse: 260302: LK() inode migration of (null) failed (Bad file descriptor)
[2013-09-10 04:26:08.397469] W [fuse-resolve.c:546:fuse_resolve_fd] 0-fuse-resolve: migration of basefd (ptr:0x164ce5c inode-gfid:a027e925-835c-4a14-8ecf-8d6cff700313) did not complete, failing fop with EBADF (old-subvolume:distr-rep-1 new-subvolume:distr-rep-1)
[2013-09-10 04:26:08.397501] W [fuse-bridge.c:4634:fuse_setlk_resume] 0-glusterfs-fuse: 260303: LK() inode migration of (null) failed (Bad file descriptor)
[2013-09-10 04:26:08.397651] W [fuse-resolve.c:546:fuse_resolve_fd] 0-fuse-resolve: migration of basefd (ptr:0x164ce5c inode-gfid:a027e925-835c-4a14-8ecf-8d6cff700313) did not complete, failing fop with EBADF (old-subvolume:distr-rep-1 new-subvolume:distr-rep-1)
[2013-09-10 04:26:08.397680] W [fuse-bridge.c:2787:fuse_flush_resume] 0-glusterfs-fuse: 260304: FLUSH() inode migration of (null) failed (Bad file descriptor)
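The EBADF pattern in the log above comes from fuse-resolve giving up on migrating the open fd to the new subvolume after the rebalance, so every subsequent lock fop on that fd is failed with "Bad file descriptor". The following is a minimal stand-alone Python sketch (not GlusterFS code, just an illustration of the errno) showing that an fcntl byte-range lock attempt on a dead descriptor fails with EBADF, which is exactly what pingpong then reports:

```python
import errno
import fcntl
import os
import tempfile

# Take and release a one-byte fcntl lock, as each pingpong iteration does.
fd, path = tempfile.mkstemp()
fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0)   # lock byte 0
fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0)   # unlock byte 0

# In the bug, the FUSE client fails to complete basefd migration after
# remove-brick, leaving the fd unusable. Closing the fd here simulates
# that: the next lock attempt fails with EBADF ("Bad file descriptor").
os.close(fd)
try:
    fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0)
except OSError as e:
    assert e.errno == errno.EBADF
os.unlink(path)
```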



Cluster info
===========
RHS nodes
-------
10.70.37.113
10.70.37.133
10.70.37.134
10.70.37.59

Mounted on 
-----------
/shylesh/distr-rep





Volume info
-----------
Volume Name: distr-rep
Type: Distributed-Replicate 
Volume ID: f39e29ab-8758-440a-ac1b-1d0d9fb4d28f
Status: Started
Number of Bricks: 3 x 2 = 6 
Transport-type: tcp
Bricks:
Brick1: 10.70.37.113:/brick3/distr-rep8
Brick2: 10.70.37.133:/brick3/distr-rep9
Brick3: 10.70.37.134:/brick3/distr-rep10
Brick4: 10.70.37.59:/brick3/distr-rep11
Brick5: 10.70.37.113:/brick3/distr-rep15
Brick6: 10.70.37.133:/brick3/distr-rep15


[root@rhs1-gold myscripts]# gluster v remove-brick distr-rep 10.70.37.134:/brick3/distr-rep10 10.70.37.59:/brick3/distr-rep11 start
volume remove-brick start: success
ID: 40240c87-cdd2-47c9-a554-15eb7c876a60


pingpong command
-----------------
/pingpong FILE 4
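pingpong here is presumably the ping_pong byte-range lock coherence tester from the Samba/CTDB tree, invoked with 4 lock bytes. The sketch below is a simplified single-process Python approximation (an assumption, not the actual C tool) of its lock/unlock cycle: lock the next byte modulo num, then release the previous one. The byte offsets in this cycle are the "lock at N" / "unlock at N" numbers in the failure output above.

```python
import fcntl
import os
import tempfile

def ping_pong_cycle(fd, num, loops):
    """Simplified ping_pong loop: hold byte i, lock byte (i+1) % num,
    then release byte i % num, for `loops` iterations."""
    fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0)                  # start holding byte 0
    for i in range(loops):
        fcntl.lockf(fd, fcntl.LOCK_EX, 1, (i + 1) % num)  # lock at (i+1) % num
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, i % num)        # unlock at i % num
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, loops % num)        # drop the last lock

# Single-process run against a scratch file, num=4 as in the reproducer.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, 4)
ping_pong_cycle(fd, 4, 8)
os.close(fd)
os.unlink(path)
```

The real tool runs two or more concurrent instances so the lock requests contend and block; any lock or unlock call that errors (here, with EBADF after the failed fd migration) produces the "lock at N failed!" lines.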


Attached the sosreports.

