Created attachment 619623 [details]
rebalance and client logs

Description of problem:
Created a distribute volume; after performing a rebalance, the linkto attribute of a link file is pointing to the wrong subvolume.

Version-Release number of selected component (if applicable):
RHS 2.0.z update 2

How reproducible:

Steps to Reproduce:
1. Created a distribute volume of 2 bricks which was serving as RHEVM storage
2. Added one more brick and performed a rebalance
3. Added another brick and did a rebalance again; there were some failures seen in the rebalance and the storage domain was down

RHEVM client log says
======================
ages/7a17d651-e252-4e56-999e-5ed6e431c928
[2012-09-28 16:14:53.864944] I [dht-layout.c:698:dht_layout_dir_mismatch] 6-rebal-dist-dht: subvol: rebal-dist-client-0; inode layout - 0 - 1073741822; disk layout - 0 - 1431655764
[2012-09-28 16:14:53.864958] I [dht-common.c:596:dht_revalidate_cbk] 6-rebal-dist-dht: mismatching layouts for /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928
[2012-09-28 16:14:53.865353] I [dht-layout.c:593:dht_layout_normalize] 6-rebal-dist-dht: found anomalies in /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928. holes=0 overlaps=1
[2012-09-28 16:18:42.658611] W [dht-common.c:940:dht_lookup_everywhere_cbk] 6-rebal-dist-dht: multiple subvolumes (rebal-dist-client-2 and rebal-dist-client-1) have file /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315 (preferably rename the file in the backend, and do a fresh lookup)
[2012-09-28 16:18:42.659864] W [client3_1-fops.c:1114:client3_1_getxattr_cbk] 6-rebal-dist-client-2: remote operation failed: Permission denied. Path: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315 (bff4dc08-cede-45eb-b0cc-76b87f9d9f4c). Key: trusted.glusterfs.dht.linkto
[2012-09-28 16:18:42.659912] E [dht-helper.c:652:dht_migration_complete_check_task] 6-rebal-dist-dht: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315: failed to get the 'linkto' xattr Permission denied
[2012-09-28 16:18:42.659950] W [fuse-bridge.c:513:fuse_attr_cbk] 0-glusterfs-fuse: 9447643: STAT() /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315 => -1 (Structure needs cleaning)
[2012-09-28 16:18:42.663010] W [client3_1-fops.c:1114:client3_1_getxattr_cbk] 6-rebal-dist-client-2: remote operation failed: Permission denied. Path: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315 (bff4dc08-cede-45eb-b0cc-76b87f9d9f4c). Key: trusted.glusterfs.dht.linkto
[2012-09-28 16:18:42.663054] E [dht-helper.c:652:dht_migration_complete_check_task] 6-rebal-dist-dht: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315: failed to get the 'linkto' xattr Permission denied
[2012-09-28 16:18:49.214758] W [client3_1-fops.c:1183:client3_1_fgetxattr_cbk] 6-rebal-dist-client-3: remote operation failed: Permission denied
[2012-09-28 16:18:49.214809] E [dht-helper.c:652:dht_migration_complete_check_task] 6-rebal-dist-dht: (null): failed to get the 'linkto' xattr Permission denied
[2012-09-28 16:18:49.215216] I [client-helpers.c:100:this_fd_set_ctx] 6-rebal-dist-client-1: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315 (bff4dc08-cede-45eb-b0cc-76b87f9d9f4c): trying duplicate remote fd set.
[2012-09-28 16:19:26.934252] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-09-28 16:19:26.943721] I [glusterfsd-mgmt.c:1568:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing

checking the multiple subvolumes having the same file
===================================================

volume info
===========
Volume Name: rebal-dist
Type: Distribute
Volume ID: 3f5375fb-f39e-4ab9-8aba-f2efd967939d
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: rhs-gp-srv4.lab.eng.blr.redhat.com:/dist-rebal
Brick2: rhs-gp-srv11.lab.eng.blr.redhat.com:/dist-rebal
Brick3: rhs-gp-srv12.lab.eng.blr.redhat.com:/dist-rebal
Brick4: rhs-gp-srv15.lab.eng.blr.redhat.com:/dist-rebal
Options Reconfigured:
performance.quick-read: disable
performance.io-cache: disable
performance.stat-prefetch: disable
performance.read-ahead: disable
storage.linux-aio: disable
cluster.eager-lock: enable
===========================

client-1
=========
[root@rhs-gp-srv11 dist-rebal]# getfattr -d -e hex -m . c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.gfid=0xbff4dc08cede45ebb0cc76b87f9d9f4c

[root@rhs-gp-srv11 dist-rebal]# stat c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
  File: `c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315'
  Size: 31543267328  Blocks: 33877064  IO Block: 4096  regular file
Device: fd1bh/64795d  Inode: 119  Links: 2
Access: (0660/-rw-rw----)  Uid: (   36/    vdsm)   Gid: (   36/     kvm)
Access: 2012-09-30 08:37:04.088846036 +0530
Modify: 2012-10-01 12:07:27.213939665 +0530
Change: 2012-10-01 12:07:27.213939665 +0530

client-2
==========
[root@rhs-gp-srv12 dist-rebal]# getfattr -d -e hex -m . c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
trusted.gfid=0xbff4dc08cede45ebb0cc76b87f9d9f4c
trusted.glusterfs.dht.linkto=0x726562616c2d646973742d636c69656e742d3300

[root@rhs-gp-srv12 dist-rebal]# getfattr -d -e text -m . c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
trusted.gfid="����E��v���L"
trusted.glusterfs.dht.linkto="rebal-dist-client-3"

[root@rhs-gp-srv12 dist-rebal]# stat c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
  File: `c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315'
  Size: 0  Blocks: 0  IO Block: 4096  regular empty file
Device: fd11h/64785d  Inode: 268435582  Links: 2
Access: (1000/---------T)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-09-28 16:14:46.681867578 +0530
Modify: 2012-09-28 16:14:46.681867578 +0530
Change: 2012-09-28 16:14:46.681867578 +0530

client-3
==========
[root@rhs-gp-srv15 dist-rebal]# getfattr -d -e text -m . c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
getfattr: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315: No such file or directory

Looking at the above output, the file is actually present on client-1, whereas the linkto says client-3. Attached the complete rebalance and client logs.
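For reference, a minimal sketch of how a DHT link file can be spotted on a brick and its linkto target decoded; this assumes GNU find and xxd are available on the brick host, and the hex value is the one shown above with the 0x prefix and trailing NUL byte (00) stripped:

# Link files are 0-byte files whose mode is only the sticky bit (---------T), as in the stat output above:
[root@rhs-gp-srv12 dist-rebal]# find . -type f -perm 1000 -size 0
# Decode the hex-encoded linkto value into the subvolume name:
[root@rhs-gp-srv12 dist-rebal]# echo 726562616c2d646973742d636c69656e742d33 | xxd -r -p
rebal-dist-client-3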
[root@rhs-gp-srv11 dist-rebal]# rpm -qa | grep gluster
glusterfs-server-3.3.0rhsvirt1-6.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-fuse-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-geo-replication-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch
There seems to be a brick down, which led the rebalance to fail. This might have caused the incorrect linkto attrs being set. Did a subsequent rebalance (once all bricks are up) fix the issue at hand?

[2012-09-28 06:44:41.590748] C [client-handshake.c:126:rpc_client_ping_timer_expired] 0-rebal-dist-client-0: server 10.70.36.8:24012 has not responded in the last 42 seconds, disconnecting.
[2012-09-28 06:44:41.591206] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x397f20f818] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb0) [0x397f20f4d0] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x397f20ef3e]))) 0-rebal-dist-client-0: forced unwinding frame type(GlusterFS 3.1) op(READ(12)) called at 2012-09-28 06:43:22.661185 (xid=0x95x)
[2012-09-28 06:44:41.591249] W [client3_1-fops.c:2720:client3_1_readv_cbk] 0-rebal-dist-client-0: remote operation failed: Transport endpoint is not connected
[2012-09-28 06:44:41.591294] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x397f20f818] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb0) [0x397f20f4d0] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x397f20ef3e]))) 0-rebal-dist-client-0: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2012-09-28 06:43:59.585065 (xid=0x96x)
[2012-09-28 06:44:41.591312] W [client-handshake.c:275:client_ping_cbk] 0-rebal-dist-client-0: timer must have expired
[2012-09-28 06:44:41.591326] I [client.c:2090:client_rpc_notify] 0-rebal-dist-client-0: disconnected
[2012-09-28 06:44:41.591349] W [dht-common.c:4696:dht_notify] 0-rebal-dist-dht: Received CHILD_DOWN. Exiting
[2012-09-28 06:44:41.591351] E [dht-rebalance.c:721:dht_migrate_file] 0-rebal-dist-dht: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/be5a3b2c-d055-4fd7-b7c3-b2c7c1bda987/4684b77c-5817-4069-bbc2-77b9586c3fd6.meta: failed to migrate data
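(For reference: before re-running the rebalance, that all bricks are up can be confirmed with something like the sketch below, run from any server in the cluster.)

gluster volume status rebal-dist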
(In reply to comment #3)
> There seems to be a brick down, which led the rebalance to fail. This might
> have caused the incorrect linkto attrs being set. Did a subsequent rebalance
> (once all bricks are up) fix the issue at hand?
> [...]

Shishir,

Subsequent rebalance operations did not correct the issue. The strange thing is that it stopped logging this "multiple subvolumes" message in the rebalance log; further rebalance operations just say "Completed" without any warnings in the log, but the linkto is still pointing somewhere else.

Another strange thing is the child-down messages you saw in the log: I have not seen any child go down while the rebalance was happening, but the log still says child down; not sure about this issue.

If you want you can access the setup
==================================
Volume Name: rebal-dist
Type: Distribute
Volume ID: 3f5375fb-f39e-4ab9-8aba-f2efd967939d
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: rhs-gp-srv4.lab.eng.blr.redhat.com:/dist-rebal
Brick2: rhs-gp-srv11.lab.eng.blr.redhat.com:/dist-rebal
Brick3: rhs-gp-srv12.lab.eng.blr.redhat.com:/dist-rebal
Brick4: rhs-gp-srv15.lab.eng.blr.redhat.com:/dist-rebal
Options Reconfigured:
performance.quick-read: disable
performance.io-cache: disable
performance.stat-prefetch: disable
performance.read-ahead: disable
storage.linux-aio: off
cluster.eager-lock: enable
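(For reference, the per-node outcome of those later runs, including any failure counts behind the "Completed" status, can be re-checked with a command along these lines — a sketch, not output from this setup:)

gluster volume rebalance rebal-dist status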
Were the newly added bricks' directory permissions set to 36:36?
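(For reference, 36:36 maps to vdsm:kvm per the stat output above; a quick check on each brick root could look like this sketch, where the stat format string is just an illustration:)

stat -c '%u:%g (%U:%G) %n' /dist-rebal
# and, if a newly added brick were wrong, it could be corrected before rebalancing with:
chown 36:36 /dist-rebal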
Yes, it was set:

[root@rhs-gp-srv11 rebal]# getfacl /rebal
getfacl: Removing leading '/' from absolute path names
# file: rebal
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x
Please provide the getfacl output for these files/dirs from all the bricks:

/c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
/c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/
/c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
/c6cd740b-01d4-4c29-afde-5707984a6dcf
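(For convenience, a sketch that gathers these from every brick host in one go; passwordless ssh between the nodes is an assumption:)

dir=c6cd740b-01d4-4c29-afde-5707984a6dcf
img=$dir/images/7a17d651-e252-4e56-999e-5ed6e431c928
for h in rhs-gp-srv4 rhs-gp-srv11 rhs-gp-srv12 rhs-gp-srv15; do
  echo "===== $h ====="
  ssh $h.lab.eng.blr.redhat.com "cd /dist-rebal && getfacl $dir $dir/images $img $img/bcfe435a-08e6-4365-82b3-021a98d02315"
done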
(In reply to comment #7)
> Please provide the getfacl output for these files/dirs from all the bricks

===============
Brick1
=======
[root@rhs-gp-srv4 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x

[root@rhs-gp-srv4 dist-rebal]# getfacl /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
getfacl: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315: No such file or directory

[root@rhs-gp-srv4 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x

[root@rhs-gp-srv4 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x
******************************
Brick2
=========
[root@rhs-gp-srv11 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x

[root@rhs-gp-srv11 dist-rebal]# getfacl /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
getfacl: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315: No such file or directory

[root@rhs-gp-srv11 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x

[root@rhs-gp-srv11 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x
******************************
Brick3
======
[root@rhs-gp-srv12 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x

[root@rhs-gp-srv12 dist-rebal]# getfacl /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
getfacl: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315: No such file or directory

[root@rhs-gp-srv12 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x

[root@rhs-gp-srv12 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x
*********************************************************************
Brick4
========
[root@rhs-gp-srv15 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x

[root@rhs-gp-srv15 dist-rebal]# getfacl /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315
getfacl: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315: No such file or directory

[root@rhs-gp-srv15 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf/images/
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x

[root@rhs-gp-srv15 dist-rebal]# getfacl c6cd740b-01d4-4c29-afde-5707984a6dcf
# file: c6cd740b-01d4-4c29-afde-5707984a6dcf
# owner: vdsm
# group: kvm
user::rwx
group::r-x
other::r-x
The client is complaining of ENOSPC (before the 2nd brick was added):

[2012-09-28 14:13:26.897456] W [client3_1-fops.c:876:client3_1_writev_cbk] 6-rebal-dist-client-0: remote operation failed: No space left on device
[2012-09-28 14:13:26.897664] W [fuse-bridge.c:2025:fuse_writev_cbk] 0-glusterfs-fuse: 9241163: WRITE => -1 (No space left on device)
[2012-09-28 14:13:26.906450] W [client3_1-fops.c:876:client3_1_writev_cbk] 6-rebal-dist-client-0: remote operation failed: No space left on device
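(For reference, per-brick free space can be verified with something like this sketch; the first command runs on each server, and the `detail` form of volume status is assumed to be present in this build:)

df -h /dist-rebal
gluster volume status rebal-dist detail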
volume rebal-dist-client-1
 9:     type protocol/client
10:     option remote-host rhs-gp-srv11.lab.eng.blr.redhat.com
11:     option remote-subvolume /dist-rebal
12:     option transport-type tcp
13: end-volume

[2012-09-28 16:13:34.213452] W [client3_1-fops.c:473:client3_1_open_cbk] 6-rebal-dist-client-1: remote operation failed: Permission denied. Path: /c6cd740b-01d4-4c29-afde-5707984a6dcf/images/7a17d651-e252-4e56-999e-5ed6e431c928/bcfe435a-08e6-4365-82b3-021a98d02315 (bff4dc08-cede-45eb-b0cc-76b87f9d9f4c)
[2012-09-28 16:13:34.213511] E [dht-helper.c:884:dht_rebalance_inprogress_task] 6-rebal-dist-dht: (null): failed to send open() on target file at rebal-dist-client-1
[2012-09-28 16:13:40.749468] W [client3_1-fops.c:1183:client3_1_fgetxattr_cbk] 6-rebal-dist-client-1: remote operation failed: Permission denied
[2012-09-28 16:13:40.749538] E [dht-helper.c:652:dht_migration_complete_check_task] 6-rebal-dist-dht: (null): failed to get the 'linkto' xattr Permission denied
[2012-09-28 16:13:40.749578] W [fuse-bridge.c:1948:fuse_readv_cbk] 0-glusterfs-fuse: 9440501: READ => -1 (Structure needs cleaning)

SELinux seems to be enabled:

[root@rhs-gp-srv11 ~]# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   permissive
Mode from config file:          enforcing
Policy version:                 24
Policy from config file:        targeted

Please disable it and re-run your tests.
I understand that we need to disable SELinux; however, the current mode is permissive, so there is no SELinux enforcement here. We will _disable_ SELinux, but I don't see the point in re-running the tests. Probably I am overlooking something here ... could you please educate us?
The reasons to suspect SELinux and request a rerun of the tests are these:

1. The clients' (mnt) open calls are failing with permission-denied errors when they have detected a rebalance in progress.
2. All these failed open calls are related to the RHS server in question (SELinux enabled).
3. There are getxattr failures too, with permission-denied errors.
4. On the backend, all the permissions of the files/dirs look fine.
5. There is nothing else to suggest a failure from rebalance/client.
6. All other similar calls have passed through on the subvolumes where SELinux has been disabled.

Hence, the previous comment.
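For reference, a minimal sketch of disabling it on each RHS server (a reboot is required for the config-file change to take full effect):

# Ensure the running mode is at least permissive (already the case here, per sestatus):
setenforce 0
# Make the change persistent across reboots:
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
# Verify:
sestatus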
Not reproducible after disabling SELinux.