Bug 861846 - Link to xattr is pointing to wrong subvolume after rebalance
Summary: Link to xattr is pointing to wrong subvolume after rebalance
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: shishir gowda
QA Contact: Sudhir D
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-10-01 06:40 UTC by shylesh
Modified: 2013-12-09 01:33 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-10-05 06:59:16 UTC
Embargoed:


Attachments
rebalance and client logs (66.01 KB, application/x-gzip), 2012-10-01 06:40 UTC, shylesh

Description shylesh 2012-10-01 06:40:48 UTC
Created attachment 619623 [details]
rebalance and client logs

Description of problem:
Created a distribute volume; after performing a rebalance, the linkto xattr of a link file points to the wrong subvolume.

Version-Release number of selected component (if applicable):
RHS 2.0.z update2

How reproducible:


Steps to Reproduce:
1. Created a 2-brick distribute volume which was serving as RHEVM storage.
2. Added one more brick and performed a rebalance.
3. Added another brick and rebalanced again; some failures were seen during the rebalance and the storage domain went down (see the CLI sketch below).
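
For reference, a minimal CLI sketch of the above sequence (host names and brick paths are taken from the volume info below; which bricks were original and which were added later is an assumption):

# gluster volume create rebal-dist transport tcp \
      rhs-gp-srv4.lab.eng.blr.redhat.com:/dist-rebal \
      rhs-gp-srv11.lab.eng.blr.redhat.com:/dist-rebal
# gluster volume start rebal-dist
# gluster volume add-brick rebal-dist rhs-gp-srv12.lab.eng.blr.redhat.com:/dist-rebal
# gluster volume rebalance rebal-dist start
# gluster volume rebalance rebal-dist status
# gluster volume add-brick rebal-dist rhs-gp-srv15.lab.eng.blr.redhat.com:/dist-rebal
# gluster volume rebalance rebal-dist start
# gluster volume rebalance rebal-dist status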
  

RHEVM client log says
======================
ages/7a17d651-e252-4e56-999e-5ed6e431c928
[2012-09-28 16:14:53.864944] I [dht-layout.c:698:dht_layout_dir_mismatch] 6-rebal-dist-dht: subvol: rebal-dist-client-0; inode layout - 0 - 1073741822; disk layout - 0 - 1431655764
[2012-09-28 16:14:53.864958] I [dht-common.c:596:dht_revalidate_cbk] 6-rebal-dist-dht: mismatching layouts for /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928
[2012-09-28 16:14:53.865353] I [dht-layout.c:593:dht_layout_normalize] 6-rebal-dist-dht: found anomalies in /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928. holes=0 overlaps=1
[2012-09-28 16:18:42.658611] W [dht-common.c:940:dht_lookup_everywhere_cbk] 6-rebal-dist-dht: multiple subvolumes (rebal-dist-client-2 and rebal-dist-client-1) have file /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315 (preferably rename the file in the backend, and do a fresh lookup)
[2012-09-28 16:18:42.659864] W [client3_1-fops.c:1114:client3_1_getxattr_cbk] 6-rebal-dist-client-2: remote operation failed: Permission denied. Path: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315 (bff4dc08-cede-45eb-b0cc-76b87f9d9f4c). Key: trusted.glusterfs.dht.linkto
[2012-09-28 16:18:42.659912] E [dht-helper.c:652:dht_migration_complete_check_task] 6-rebal-dist-dht: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315: failed to get the 'linkto' xattr Permission denied
[2012-09-28 16:18:42.659950] W [fuse-bridge.c:513:fuse_attr_cbk] 0-glusterfs-fuse: 9447643: STAT() /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315 => -1 (Structure needs cleaning)
[2012-09-28 16:18:42.663010] W [client3_1-fops.c:1114:client3_1_getxattr_cbk] 6-rebal-dist-client-2: remote operation failed: Permission denied. Path: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315 (bff4dc08-cede-45eb-b0cc-76b87f9d9f4c). Key: trusted.glusterfs.dht.linkto
[2012-09-28 16:18:42.663054] E [dht-helper.c:652:dht_migration_complete_check_task] 6-rebal-dist-dht: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315: failed to get the 'linkto' xattr Permission denied
[2012-09-28 16:18:49.214758] W [client3_1-fops.c:1183:client3_1_fgetxattr_cbk] 6-rebal-dist-client-3: remote operation failed: Permission denied
[2012-09-28 16:18:49.214809] E [dht-helper.c:652:dht_migration_complete_check_task] 6-rebal-dist-dht: (null): failed to get the 'linkto' xattr Permission denied
[2012-09-28 16:18:49.215216] I [client-helpers.c:100:this_fd_set_ctx] 6-rebal-dist-client-1: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315 (bff4dc08-cede-45eb-b0cc-76b87f9d9f4c): trying duplicate remote fd set. 
[2012-09-28 16:19:26.934252] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-09-28 16:19:26.943721] I [glusterfsd-mgmt.c:1568:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
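
For the layout-mismatch and anomaly messages above, the on-disk layout of the affected directory can be inspected directly on each brick with something like this (a sketch; the directory path is taken from the log and the brick root from the volume info below):

# getfattr -d -e hex -m trusted.glusterfs.dht \
      /dist-rebal/c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928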




Checking the multiple subvolumes that have the same file
===================================================
volume info
===========
Volume Name: rebal-dist
Type: Distribute
Volume ID: 3f5375fb-f39e-4ab9-8aba-f2efd967939d
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: rhs-gp-srv4.lab.eng.blr.redhat.com:/dist-rebal
Brick2: rhs-gp-srv11.lab.eng.blr.redhat.com:/dist-rebal
Brick3: rhs-gp-srv12.lab.eng.blr.redhat.com:/dist-rebal
Brick4: rhs-gp-srv15.lab.eng.blr.redhat.com:/dist-rebal
Options Reconfigured:
performance.quick-read: disable
performance.io-cache: disable
performance.stat-prefetch: disable
performance.read-ahead: disable
storage.linux-aio: disable
cluster.eager-lock: enable

===========================

client-1
=========
[root@rhs-gp-srv11 dist-rebal]#  getfattr -d -e hex -m .  c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.gfid=0xbff4dc08cede45ebb0cc76b87f9d9f4c

[root@rhs-gp-srv11 dist-rebal]# stat  c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
  File: `c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315'
  Size: 31543267328     Blocks: 33877064   IO Block: 4096   regular file
Device: fd1bh/64795d    Inode: 119         Links: 2
Access: (0660/-rw-rw----)  Uid: (   36/    vdsm)   Gid: (   36/     kvm)
Access: 2012-09-30 08:37:04.088846036 +0530
Modify: 2012-10-01 12:07:27.213939665 +0530
Change: 2012-10-01 12:07:27.213939665 +0530



client-2
==========
[root@rhs-gp-srv12 dist-rebal]# getfattr -d -e hex -m .  c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
trusted.gfid=0xbff4dc08cede45ebb0cc76b87f9d9f4c
trusted.glusterfs.dht.linkto=0x726562616c2d646973742d636c69656e742d3300

[root@rhs-gp-srv12 dist-rebal]# getfattr -d -e text -m .  c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
trusted.gfid="����E��v���L"
trusted.glusterfs.dht.linkto="rebal-dist-client-3"

[root@rhs-gp-srv12 dist-rebal]# stat c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
  File: `c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315'
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: fd11h/64785d    Inode: 268435582   Links: 2
Access: (1000/---------T)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-09-28 16:14:46.681867578 +0530
Modify: 2012-09-28 16:14:46.681867578 +0530
Change: 2012-09-28 16:14:46.681867578 +0530
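
Note (not from the original report): the size-0, mode-1000 (---------T) file on client-2 carrying a trusted.glusterfs.dht.linkto xattr is a DHT link file; the xattr value names the subvolume where the data file is expected to live. The hex value above decodes to the subvolume name plus a trailing NUL, which can be confirmed with, for example:

# echo 726562616c2d646973742d636c69656e742d3300 | xxd -r -p
rebal-dist-client-3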


client-3
==========

[root@rhs-gp-srv15 dist-rebal]# getfattr -d -e text -m .  c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
getfattr: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315: No such file or directory



Looking at the above output, the file is actually present on client-1 (rhs-gp-srv11), whereas the linkto xattr says client-3 (rhs-gp-srv15).
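
Following the hint in the dht_lookup_everywhere_cbk warning above ("preferably rename the file in the backend, and do a fresh lookup"), one possible manual cleanup would be to move the stale link file aside on the brick that holds it and then trigger a fresh lookup from a client mount. This is only a sketch of a workaround (mount point assumed), not something verified in this bug:

On rhs-gp-srv12 (client-2), move the stale link file out of the brick:
# mv /dist-rebal/c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315 /root/stale-linkfile.bak
From a client mount, force a fresh lookup:
# stat /mnt/rebal-dist/c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315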



Attached the complete rebalance and client logs.

Comment 1 shylesh 2012-10-01 06:52:14 UTC
[root@rhs-gp-srv11 dist-rebal]# rpm -qa | grep gluster
glusterfs-server-3.3.0rhsvirt1-6.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-fuse-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-geo-replication-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch

Comment 3 shishir gowda 2012-10-04 04:04:00 UTC
There seems to have been a brick down, which led the rebalance to fail. This might have caused the incorrect linkto xattrs to be set. Did a subsequent rebalance (once all bricks were up) fix the issue at hand?

[2012-09-28 06:44:41.590748] C [client-handshake.c:126:rpc_client_ping_timer_expired] 0-rebal-dist-client-0: server 10.70.36.8:24012 has not responded in the last 42 seconds, disconnecting.
[2012-09-28 06:44:41.591206] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x397f20f818] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb0) [0x397f20f4d0] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x397f20ef3e]))) 0-rebal-dist-client-0: forced unwinding frame type(GlusterFS 3.1) op(READ(12)) called at 2012-09-28 06:43:22.661185 (xid=0x95x)
[2012-09-28 06:44:41.591249] W [client3_1-fops.c:2720:client3_1_readv_cbk] 0-rebal-dist-client-0: remote operation failed: Transport endpoint is not connected
[2012-09-28 06:44:41.591294] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x397f20f818] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb0) [0x397f20f4d0] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x397f20ef3e]))) 0-rebal-dist-client-0: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2012-09-28 06:43:59.585065 (xid=0x96x)
[2012-09-28 06:44:41.591312] W [client-handshake.c:275:client_ping_cbk] 0-rebal-dist-client-0: timer must have expired
[2012-09-28 06:44:41.591326] I [client.c:2090:client_rpc_notify] 0-rebal-dist-client-0: disconnected
[2012-09-28 06:44:41.591349] W [dht-common.c:4696:dht_notify] 0-rebal-dist-dht: Received CHILD_DOWN. Exiting
[2012-09-28 06:44:41.591351] E [dht-rebalance.c:721:dht_migrate_file] 0-rebal-dist-dht: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/be5a3b2c-d055-4fd7-b7c3-b2c7c1bda987/4684b77c-5817-4069-bbc2-77b9586c3fd6.meta: failed to migrate data
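
A quick way to confirm whether all bricks are up and how the last rebalance ended (a sketch; run on any server in the cluster):

# gluster volume status rebal-dist
# gluster volume rebalance rebal-dist status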

Comment 4 shylesh 2012-10-04 04:40:09 UTC
(In reply to comment #3)
> There seems to be brick down, which led rebalance to fail. This might have
> caused the in-correct link-to attrs being set. Did a subsequent rebalance
> (once all bricks are up) fix the issue at hand?
> 
> [2012-09-28 06:44:41.590748] C
> [client-handshake.c:126:rpc_client_ping_timer_expired]
> 0-rebal-dist-client-0: server 10.70.36.8:24012 has not responded in the last
> 42 seconds, disconnecting.
> [2012-09-28 06:44:41.591206] E [rpc-clnt.c:373:saved_frames_unwind]
> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x397f20f818]
> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb0)
> [0x397f20f4d0] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)
> [0x397f20ef3e]))) 0-rebal-dist-client-0: forced unwinding frame
> type(GlusterFS 3.1) op(READ(12)) called at 2012-09-28 06:43:22.661185
> (xid=0x95x)
> [2012-09-28 06:44:41.591249] W [client3_1-fops.c:2720:client3_1_readv_cbk]
> 0-rebal-dist-client-0: remote operation failed: Transport endpoint is not
> connected
> [2012-09-28 06:44:41.591294] E [rpc-clnt.c:373:saved_frames_unwind]
> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x397f20f818]
> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb0)
> [0x397f20f4d0] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)
> [0x397f20ef3e]))) 0-rebal-dist-client-0: forced unwinding frame
> type(GlusterFS Handshake) op(PING(3)) called at 2012-09-28 06:43:59.585065
> (xid=0x96x)
> [2012-09-28 06:44:41.591312] W [client-handshake.c:275:client_ping_cbk]
> 0-rebal-dist-client-0: timer must have expired
> [2012-09-28 06:44:41.591326] I [client.c:2090:client_rpc_notify]
> 0-rebal-dist-client-0: disconnected
> [2012-09-28 06:44:41.591349] W [dht-common.c:4696:dht_notify]
> 0-rebal-dist-dht: Received CHILD_DOWN. Exiting
> [2012-09-28 06:44:41.591351] E [dht-rebalance.c:721:dht_migrate_file]
> 0-rebal-dist-dht:
> /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/be5a3b2c-d055-4fd7-b7c3-
> b2c7c1bda987/4684b77c-5817-4069-bbc2-77b9586c3fd6.meta: failed to migrate
> data


shishir,

Subsequent rebalance operations did not correct the issue; strangely, the "multiple subvolumes" message stopped appearing in the rebalance log. Further rebalance operations just report "completed" without any warnings in the log, but the linkto xattr still points somewhere else.

Another strange thing is the CHILD_DOWN messages you saw in the log: I did not see any child go down while the rebalance was running, yet the log says child down; I am not sure about this.


If you want, you can access the setup:
==================================
Volume Name: rebal-dist
Type: Distribute
Volume ID: 3f5375fb-f39e-4ab9-8aba-f2efd967939d
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: rhs-gp-srv4.lab.eng.blr.redhat.com:/dist-rebal
Brick2: rhs-gp-srv11.lab.eng.blr.redhat.com:/dist-rebal
Brick3: rhs-gp-srv12.lab.eng.blr.redhat.com:/dist-rebal
Brick4: rhs-gp-srv15.lab.eng.blr.redhat.com:/dist-rebal
Options Reconfigured:
performance.quick-read: disable
performance.io-cache: disable
performance.stat-prefetch: disable
performance.read-ahead: disable
storage.linux-aio: off
cluster.eager-lock: enable

Comment 5 shishir gowda 2012-10-04 06:36:42 UTC
Was the newly added bricks' directory ownership set to 36:36?
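
For reference, ownership on each brick root can be checked and, if needed, set like this (a sketch; the brick path is taken from the volume info above):

# stat -c '%u:%g (%U:%G)' /dist-rebal
# chown 36:36 /dist-rebal    # 36:36 = vdsm:kvm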

Comment 6 shylesh 2012-10-04 06:41:57 UTC
Yes, it was set:

[root@rhs-gp-srv11 rebal]# getfacl /rebal
getfacl: Removing leading '/' from absolute path names
# file: rebal
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x

Comment 7 shishir gowda 2012-10-04 06:52:34 UTC
Please provide the getfacl output for these files/dirs
/c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
/c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/
/c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
/c6cd740b-01d4-4c29-afde-5707984a6dcf

from all the bricks

Comment 8 shylesh 2012-10-04 07:11:48 UTC
(In reply to comment #7)
> Please provide the getfacl output for these files/dirs
> /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-
> 5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
> /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-
> 5ed6e431c928/
> /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
> /c6cd740b-01d4-4c29-afde-5707984a6dcf
> 
> from all the bricks

===============

 
Brick1
=======
[root@rhs-gp-srv4 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x

[root@rhs-gp-srv4 dist-rebal]# getfacl /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
getfacl: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315: No such file or directory



[root@rhs-gp-srv4 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x


[root@rhs-gp-srv4 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x


******************************
Brick2
=========
[root@rhs-gp-srv11 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x


[root@rhs-gp-srv11 dist-rebal]# getfacl /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
getfacl: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315: No such file or directory


[root@rhs-gp-srv11 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x

[root@rhs-gp-srv11 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x

******************************

brick3
======
[root@rhs-gp-srv12 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x


[root@rhs-gp-srv12 dist-rebal]# getfacl /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
getfacl: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315: No such file or directory



[root@rhs-gp-srv12 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x


[root@rhs-gp-srv12 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x


*********************************************************************
brick 4
========
[root@rhs-gp-srv15 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x


[root@rhs-gp-srv15 dist-rebal]# getfacl /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
getfacl: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315: No such file or directory

[root@rhs-gp-srv15 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x

[root@rhs-gp-srv15 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x

Comment 9 shishir gowda 2012-10-04 08:01:36 UTC
The client is complaining of ENOSPC (before the 2nd brick was added):

[2012-09-28 14:13:26.897456] W [client3_1-fops.c:876:client3_1_writev_cbk] 6-rebal-dist-client-0: remote operation failed: No space left on device
[2012-09-28 14:13:26.897664] W [fuse-bridge.c:2025:fuse_writev_cbk] 0-glusterfs-fuse: 9241163: WRITE => -1 (No space left on device)
[2012-09-28 14:13:26.906450] W [client3_1-fops.c:876:client3_1_writev_cbk] 6-rebal-dist-client-0: remote operation failed: No space left on device

Comment 10 shishir gowda 2012-10-04 08:36:53 UTC
volume rebal-dist-client-1
  9:     type protocol/client
 10:     option remote-host rhs-gp-srv11.lab.eng.blr.redhat.com
 11:     option remote-subvolume /dist-rebal
 12:     option transport-type tcp
 13: end-volume

[2012-09-28 16:13:34.213452] W [client3_1-fops.c:473:client3_1_open_cbk] 6-rebal-dist-client-1: remote operation failed: Permission denied. Path: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315 (bff4dc08-cede-45eb-b0cc-76b87f9d9f4c)
[2012-09-28 16:13:34.213511] E [dht-helper.c:884:dht_rebalance_inprogress_task] 6-rebal-dist-dht: (null): failed to send open() on target file at rebal-dist-client-1
[2012-09-28 16:13:40.749468] W [client3_1-fops.c:1183:client3_1_fgetxattr_cbk] 6-rebal-dist-client-1: remote operation failed: Permission denied
[2012-09-28 16:13:40.749538] E [dht-helper.c:652:dht_migration_complete_check_task] 6-rebal-dist-dht: (null): failed to get the 'linkto' xattr Permission denied
[2012-09-28 16:13:40.749578] W [fuse-bridge.c:1948:fuse_readv_cbk] 0-glusterfs-fuse: 9440501: READ => -1 (Structure needs cleaning)

SELinux seems to be enabled:
[root@rhs-gp-srv11 ~]# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   permissive
Mode from config file:          enforcing
Policy version:                 24
Policy from config file:        targeted

Please disable it and re-run your tests
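
For completeness, one common way to do that (immediately switch to permissive, and disable fully across reboots; a sketch, not part of the original comment):

# setenforce 0                                    # immediate, but only switches enforcing -> permissive
# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
# reboot                                          # a reboot is needed for "disabled" to take effect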

Comment 11 Gowrishankar Rajaiyan 2012-10-04 09:11:05 UTC
I understand that we need to disable SELinux; however, the current mode is permissive, so there is no SELinux enforcement here. We will _disable_ SELinux, but I don't see the point in re-running the tests; I am probably overlooking something here ... could you please educate us?

Comment 12 shishir gowda 2012-10-04 10:19:28 UTC
The reasons to suspect SELinux and to request a rerun of the tests are these:

1. The clients' (mount) open calls are failing with permission-denied errors once they have detected a rebalance in progress.
2. All these failed open calls are related to the RHS server in question (SELinux enabled).
3. There are getxattr failures, too, with permission-denied errors.
4. On the backend, all the permissions of the files/dirs look fine.
5. There is nothing else to suggest a failure from rebalance/client.
6. All other similar calls have passed through on the subvolumes where SELinux is disabled.

Hence, the previous comment.

Comment 14 shylesh 2012-10-05 06:59:16 UTC
Not reproducible after disabling SELinux.

