+++ This bug was initially created as a clone of Bug #1342374 +++
+++ This bug was initially created as a clone of Bug #1342372 +++
+++ This bug was initially created as a clone of Bug #1341796 +++
+++ This bug was initially created as a clone of Bug #1341034 +++

Description of problem:
From a snapshot taken during directory creation, the directories that were being created are not accessible. Snapshots taken later, without any I/O in progress, seem to have consistent data.

Volume Name: superman
Type: Tier
Volume ID: ba49611f-1cbc-4a25-a1a8-8a0eecfe6f76
Status: Started
Number of Bricks: 20
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 4 x 2 = 8
Brick1: 10.70.35.133:/bricks/brick7/reg-tier-3
Brick2: 10.70.35.10:/bricks/brick7/reg-tier-3
Brick3: 10.70.35.11:/bricks/brick7/reg-tier-3
Brick4: 10.70.35.225:/bricks/brick7/reg-tier-3
Brick5: 10.70.35.239:/bricks/brick7/reg-tier-3
Brick6: 10.70.37.60:/bricks/brick7/reg-tier-3
Brick7: 10.70.37.120:/bricks/brick7/reg-tier-3
Brick8: 10.70.37.101:/bricks/brick7/reg-tier-3
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick9: 10.70.37.101:/bricks/brick0/l1
Brick10: 10.70.37.120:/bricks/brick0/l1
Brick11: 10.70.37.60:/bricks/brick0/l1
Brick12: 10.70.35.239:/bricks/brick0/l1
Brick13: 10.70.35.225:/bricks/brick0/l1
Brick14: 10.70.35.11:/bricks/brick0/l1
Brick15: 10.70.35.10:/bricks/brick0/l1
Brick16: 10.70.35.133:/bricks/brick0/l1
Brick17: 10.70.37.101:/bricks/brick1/l1
Brick18: 10.70.37.120:/bricks/brick1/l1
Brick19: 10.70.37.60:/bricks/brick1/l1
Brick20: 10.70.35.239:/bricks/brick1/l1
Options Reconfigured:
features.barrier: disable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
nfs-ganesha: disable

'ls -l' from the mountpoint where the snapshot is activated (the '?' fields indicate that stat fails on these directories):
??????????? ? ? ? ? ? dir-1
??????????? ? ? ? ? ? dir-10
??????????? ? ? ? ? ? dir-11
??????????? ? ? ? ? ? dir-12
??????????? ? ? ? ? ? dir-13
??????????? ? ? ? ? ? dir-14
??????????? ? ? ? ? ? dir-15
??????????? ? ? ? ? ? dir-16
??????????? ? ? ? ? ? dir-17

gluster snapshot list
snapshot-superman-1_GMT-2016.05.31-04.54.11
snapshot-superman-2_GMT-2016.05.31-05.02.13
snapshot-superman-3_GMT-2016.05.31-05.08.25
snapshot-superman-4_GMT-2016.05.31-05.24.10

Snapshot 'snapshot-superman-1_GMT-2016.05.31-04.54.11' was taken during directory creation. The rest of the snapshots were taken later, without I/O in progress.

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-6.el7rhgs.x86_64

How reproducible:
1/1; exact reproducibility yet to be determined

Steps to Reproduce (a command sketch of these steps follows this report):
1. Create a 2 x (4+2) disperse volume.
2. Start a Linux untar operation and 'mkdir -p dir-{1..1000}/sd-{1..100}' from two different clients.
3. Attach a 4 x 2 hot tier.
4. Create a snapshot.
5. Activate the snapshot and list the directories.

Actual results:
Directories are inaccessible.

Expected results:
Directories should be accessible.

Additional info:
sosreports shall be attached shortly.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-05-31 02:55:40 EDT ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.
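For reference, a sketch of the reproduction steps above as commands. The hostnames and brick paths are taken from the volume info in the description; the attach-tier syntax is the one used in the glusterfs-3.7 line, and snapshot names get a GMT timestamp suffix appended by default:

    # 1. Create and start a 2 x (4+2) disperse volume, with quota enabled
    #    (quota turns out to be central to this bug; see the analysis below).
    gluster volume create superman disperse-data 4 redundancy 2 \
        10.70.37.101:/bricks/brick0/l1 10.70.37.120:/bricks/brick0/l1 \
        10.70.37.60:/bricks/brick0/l1 10.70.35.239:/bricks/brick0/l1 \
        10.70.35.225:/bricks/brick0/l1 10.70.35.11:/bricks/brick0/l1 \
        10.70.35.10:/bricks/brick0/l1 10.70.35.133:/bricks/brick0/l1 \
        10.70.37.101:/bricks/brick1/l1 10.70.37.120:/bricks/brick1/l1 \
        10.70.37.60:/bricks/brick1/l1 10.70.35.239:/bricks/brick1/l1
    gluster volume start superman
    gluster volume quota superman enable

    # 2. From two different clients mounted at /mnt/superman, run a Linux
    #    untar and a deep directory creation in parallel.
    cd /mnt/superman && mkdir -p dir-{1..1000}/sd-{1..100} &

    # 3. While the directory creation is running, attach a 4 x 2 replicated
    #    hot tier (brick list as shown in the volume info above).
    gluster volume attach-tier superman replica 2 \
        10.70.35.133:/bricks/brick7/reg-tier-3 10.70.35.10:/bricks/brick7/reg-tier-3 \
        10.70.35.11:/bricks/brick7/reg-tier-3 10.70.35.225:/bricks/brick7/reg-tier-3 \
        10.70.35.239:/bricks/brick7/reg-tier-3 10.70.37.60:/bricks/brick7/reg-tier-3 \
        10.70.37.120:/bricks/brick7/reg-tier-3 10.70.37.101:/bricks/brick7/reg-tier-3

    # 4. Take a snapshot while directory creation is still in progress.
    gluster snapshot create snapshot-superman-1 superman

    # 5. Activate the snapshot, mount it, and list the directories.
    gluster snapshot activate snapshot-superman-1_GMT-2016.05.31-04.54.11
    mount -t glusterfs 10.70.37.120:/snaps/snapshot-superman-1_GMT-2016.05.31-04.54.11/superman /mnt/snap
    ls -l /mnt/snap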
--- Additional comment from krishnaram Karthick on 2016-05-31 04:44:36 EDT ---

- Tried reproducing this issue; could not reproduce it.
- This is possible when the fix-layout on the hot tier is not complete at the time the snapshot is taken; this theory still has to be confirmed.

--- Additional comment from krishnaram Karthick on 2016-05-31 04:45:28 EDT ---

--- Additional comment from krishnaram Karthick on 2016-06-01 01:58:14 EDT ---

snapshot-1 was activated and mounted on 10.70.47.161 at '/mnt/superman':

[root@dhcp47-161 ~]# mount
...
10.70.37.120:/snaps/snapshot-superman-1_GMT-2016.05.31-04.54.11/superman on /mnt/superman type fuse.glusterfs (ro,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
...

--- Additional comment from krishnaram Karthick on 2016-06-01 03:15:19 EDT ---

Although this issue is not seen consistently, it was not seen in the 3.1.2 release. Proposing this bug as a blocker, to discuss and decide whether to take it in 3.1.3.

--- Additional comment from Nithya Balachandran on 2016-06-01 04:34:04 EDT ---

The logs indicate that the dht selfheal, and hence the lookup, fails for the directories that are not accessible.

From mnt-superman.log on 10.70.47.161:

[2016-06-01 05:21:39.776823] I [MSGID: 109063] [dht-layout.c:718:dht_layout_normalize] 0-87fda0f4b2404018904c3d49718497c5-tier-dht: Found anomalies in /dir-9 (gfid = 94224520-61a4-4d26-a2fa-152f2631a295). Holes=1 overlaps=0
[2016-06-01 05:21:39.783140] E [MSGID: 114031] [client-rpc-fops.c:321:client3_3_mkdir_cbk] 0-superman-client-17: remote operation failed. Path: /dir-9 [Invalid argument]
[2016-06-01 05:21:39.783278] E [MSGID: 114031] [client-rpc-fops.c:321:client3_3_mkdir_cbk] 0-superman-client-16: remote operation failed. Path: /dir-9 [Invalid argument]
[2016-06-01 05:21:39.784359] W [MSGID: 109005] [dht-selfheal.c:1172:dht_selfheal_dir_mkdir_cbk] 0-87fda0f4b2404018904c3d49718497c5-tier-dht: Directory selfheal failed: path = /dir-9, gfid = 94224520-61a4-4d26-a2fa-152f2631a295 [Invalid argument]
[2016-06-01 05:21:39.790632] W [fuse-resolve.c:66:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/dir-9: failed to resolve (Invalid argument)

From the snapshot brick (/run/gluster/snaps/87fda0f4b2404018904c3d49718497c5/brick3/reg-tier-3) logs for 0-superman-client-17:

[2016-06-01 05:19:24.493623] W [MSGID: 120022] [quota-enforcer-client.c:236:quota_enforcer_lookup_cbk] 0-87fda0f4b2404018904c3d49718497c5-quota: Getting cluster-wide size of directory failed (path: / gfid:00000000-0000-0000-0000-000000000001) [Invalid argument]
[2016-06-01 05:19:24.493696] E [MSGID: 115056] [server-rpc-fops.c:515:server_mkdir_cbk] 0-87fda0f4b2404018904c3d49718497c5-server: 5516: MKDIR /dir-9 (00000000-0000-0000-0000-000000000001/dir-9) client: dhcp47-161.lab.eng.blr.redhat.com-1099-2016/05/31-05:21:53:395888-superman-client-17-0-0 [Invalid argument]

On examining the quotad process using gdb, the operation fails in quotad_aggregator_lookup () -> qd_nameless_lookup ():

qd_nameless_lookup ()
{
        ...
        subvol = qd_find_subvol (this, volume_uuid);
        if (subvol == NULL) {
                op_errno = EINVAL;    /* <------ fails here */
                goto out;
        }
        ...
}

This is because snapshot volumes are not part of the quotad graph.

This is unrelated to tiering. Modifying the description accordingly.
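One way to see this on a live system (a sketch; /var/lib/glusterd/quotad/quotad.vol is the standard location where glusterd writes the quotad volfile, assumed here rather than taken from the logs):

    # quotad is a single daemon serving all quota-enabled volumes; its graph
    # is built from this volfile, which lists regular volumes only --
    # snapshot volumes never appear in it.
    cat /var/lib/glusterd/quotad/quotad.vol

    # The bricks of the activated snapshot therefore query quotad with a
    # volume UUID it cannot map to any subvolume, qd_find_subvol() returns
    # NULL, and the lookup fails with EINVAL as seen above.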
--- Additional comment from Rejy M Cyriac on 2016-06-01 07:41:59 EDT ---

Accepted as a Blocker for the RHGS 3.1.3 release at the Blocker Bug Triage meeting on 01 June 2016.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-06-01 09:58:07 EDT ---

Since this bug has been approved for the z-stream release of Red Hat Gluster Storage 3, through release flag 'rhgs-3.1.z+', and has been marked for the RHGS 3.1 Update 3 release through the Internal Whiteboard entry of '3.1.3', the Target Release is being automatically set to 'RHGS 3.1.3'.

--- Additional comment from Vijay Bellur on 2016-06-01 15:00:02 EDT ---

REVIEW: http://review.gluster.org/14608 (glusterd/snapshot: remove quota related options from snap volfile) posted (#1) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-06-02 03:16:08 EDT ---

REVIEW: http://review.gluster.org/14608 (glusterd/snapshot: remove quota related options from snap volfile) posted (#2) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-06-02 07:14:57 EDT ---

REVIEW: http://review.gluster.org/14608 (glusterd/snapshot: remove quota related options from snap volfile) posted (#3) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-06-02 07:54:07 EDT ---

REVIEW: http://review.gluster.org/14608 (glusterd/snapshot: remove quota related options from snap volfile) posted (#4) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-06-03 02:09:03 EDT ---

COMMIT: http://review.gluster.org/14608 committed in master by Rajesh Joseph (rjoseph)
------
commit 03d523504230c336cf585159266e147945f31153
Author: Mohammed Rafi KC <rkavunga>
Date:   Wed Jun 1 23:01:37 2016 +0530

    glusterd/snapshot: remove quota related options from snap volfile

    Enabling inode-quota on a snapshot volume is unnecessary, because a
    snapshot is a read-only volume, so quota does not need to be enforced
    on it. This patch removes the quota related options from the snapshot
    volfile.

    Change-Id: Iddabcb83820dac2384924a01d45abe1ef1e95600
    BUG: 1341796
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/14608
    Reviewed-by: Atin Mukherjee <amukherj>
    Reviewed-by: N Balachandran <nbalacha>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.com>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Rajesh Joseph <rjoseph>
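With the fix applied, a quick way to check that it took effect (a sketch; the path under /var/lib/glusterd/snaps/ follows the standard snapshot volfile layout, and <snapname>, <host>, and <volname> are placeholders):

    # The generated snapshot volfiles should no longer carry the quota
    # options copied from the parent volume.
    grep -ri quota /var/lib/glusterd/snaps/<snapname>/

    # And directory listings from an activated, mounted snapshot should now
    # succeed instead of failing with 'Invalid argument'.
    gluster snapshot activate <snapname>
    mount -t glusterfs <host>:/snaps/<snapname>/<volname> /mnt/snap
    ls -l /mnt/snap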
REVIEW: http://review.gluster.org/14630 (glusterd/snapshot: remove quota related options from snap volfile) posted (#1) for review on release-3.6 by mohammed rafi kc (rkavunga)
This bug is being closed as GlusterFS-3.6 will no longer be receiving bug fixes. This bug has been fixed in more recent GlusterFS releases. If you still face this bug with newer GlusterFS versions, please open a new bug.