+++ This bug was initially created as a clone of Bug #1342374 +++
+++ This bug was initially created as a clone of Bug #1342372 +++
+++ This bug was initially created as a clone of Bug #1341796 +++
+++ This bug was initially created as a clone of Bug #1341034 +++

Description of problem:
From a snapshot taken during directory creation, the directories that were being created are not accessible. Snapshots taken later, without any I/O in progress, seem to have consistent data.

Volume Name: superman
Type: Tier
Volume ID: ba49611f-1cbc-4a25-a1a8-8a0eecfe6f76
Status: Started
Number of Bricks: 20
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 4 x 2 = 8
Brick1: 10.70.35.133:/bricks/brick7/reg-tier-3
Brick2: 10.70.35.10:/bricks/brick7/reg-tier-3
Brick3: 10.70.35.11:/bricks/brick7/reg-tier-3
Brick4: 10.70.35.225:/bricks/brick7/reg-tier-3
Brick5: 10.70.35.239:/bricks/brick7/reg-tier-3
Brick6: 10.70.37.60:/bricks/brick7/reg-tier-3
Brick7: 10.70.37.120:/bricks/brick7/reg-tier-3
Brick8: 10.70.37.101:/bricks/brick7/reg-tier-3
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick9: 10.70.37.101:/bricks/brick0/l1
Brick10: 10.70.37.120:/bricks/brick0/l1
Brick11: 10.70.37.60:/bricks/brick0/l1
Brick12: 10.70.35.239:/bricks/brick0/l1
Brick13: 10.70.35.225:/bricks/brick0/l1
Brick14: 10.70.35.11:/bricks/brick0/l1
Brick15: 10.70.35.10:/bricks/brick0/l1
Brick16: 10.70.35.133:/bricks/brick0/l1
Brick17: 10.70.37.101:/bricks/brick1/l1
Brick18: 10.70.37.120:/bricks/brick1/l1
Brick19: 10.70.37.60:/bricks/brick1/l1
Brick20: 10.70.35.239:/bricks/brick1/l1
Options Reconfigured:
features.barrier: disable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
nfs-ganesha: disable

'ls -l' from the mountpoint where the snapshot is activated (the '?' fields indicate that stat fails on these directories):
??????????? ? ? ? ? ? dir-1
??????????? ? ? ? ? ? dir-10
??????????? ? ? ? ? ? dir-11
??????????? ? ? ? ? ? dir-12
??????????? ? ? ? ? ? dir-13
??????????? ? ? ? ? ? dir-14
??????????? ? ? ? ? ? dir-15
??????????? ? ? ? ? ? dir-16
??????????? ? ? ? ? ? dir-17

gluster snapshot list
snapshot-superman-1_GMT-2016.05.31-04.54.11
snapshot-superman-2_GMT-2016.05.31-05.02.13
snapshot-superman-3_GMT-2016.05.31-05.08.25
snapshot-superman-4_GMT-2016.05.31-05.24.10

Snapshot 'snapshot-superman-1_GMT-2016.05.31-04.54.11' was taken during directory creation. The rest of the snapshots were taken later, without I/O in progress.

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-6.el7rhgs.x86_64

How reproducible:
1/1; exact reproducibility yet to be determined

Steps to Reproduce (a command sketch of these steps follows this report):
1. Create a 2 x (4+2) disperse volume.
2. Start a Linux untar operation and 'mkdir -p dir-{1..1000}/sd-{1..100}' from two different clients.
3. Attach a 4 x 2 hot tier.
4. Create a snapshot.
5. Activate the snapshot and list the directories.

Actual results:
Directories are inaccessible.

Expected results:
Directories should be accessible.

Additional info:
sosreports shall be attached shortly.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-05-31 02:55:40 EDT ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.
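For reference, a sketch of the reproduction steps above as commands. The hostnames and brick paths are taken from the volume info in the description; the attach-tier syntax is the one used in the glusterfs-3.7 line, and snapshot names get a GMT timestamp suffix appended by default:

    # 1. Create and start a 2 x (4+2) disperse volume, with quota enabled
    #    (quota turns out to be central to this bug; see the analysis below).
    gluster volume create superman disperse-data 4 redundancy 2 \
        10.70.37.101:/bricks/brick0/l1 10.70.37.120:/bricks/brick0/l1 \
        10.70.37.60:/bricks/brick0/l1 10.70.35.239:/bricks/brick0/l1 \
        10.70.35.225:/bricks/brick0/l1 10.70.35.11:/bricks/brick0/l1 \
        10.70.35.10:/bricks/brick0/l1 10.70.35.133:/bricks/brick0/l1 \
        10.70.37.101:/bricks/brick1/l1 10.70.37.120:/bricks/brick1/l1 \
        10.70.37.60:/bricks/brick1/l1 10.70.35.239:/bricks/brick1/l1
    gluster volume start superman
    gluster volume quota superman enable

    # 2. From two different clients mounted at /mnt/superman, run a Linux
    #    untar and a deep directory creation in parallel.
    cd /mnt/superman && mkdir -p dir-{1..1000}/sd-{1..100} &

    # 3. While the directory creation is running, attach a 4 x 2 replicated
    #    hot tier (brick list as shown in the volume info above).
    gluster volume attach-tier superman replica 2 \
        10.70.35.133:/bricks/brick7/reg-tier-3 10.70.35.10:/bricks/brick7/reg-tier-3 \
        10.70.35.11:/bricks/brick7/reg-tier-3 10.70.35.225:/bricks/brick7/reg-tier-3 \
        10.70.35.239:/bricks/brick7/reg-tier-3 10.70.37.60:/bricks/brick7/reg-tier-3 \
        10.70.37.120:/bricks/brick7/reg-tier-3 10.70.37.101:/bricks/brick7/reg-tier-3

    # 4. Take a snapshot while directory creation is still in progress.
    gluster snapshot create snapshot-superman-1 superman

    # 5. Activate the snapshot, mount it, and list the directories.
    gluster snapshot activate snapshot-superman-1_GMT-2016.05.31-04.54.11
    mount -t glusterfs 10.70.37.120:/snaps/snapshot-superman-1_GMT-2016.05.31-04.54.11/superman /mnt/snap
    ls -l /mnt/snap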
--- Additional comment from krishnaram Karthick on 2016-05-31 04:44:36 EDT ---

- Tried reproducing this issue; could not reproduce it.
- This is possible when the fix-layout on the hot tier is not complete at the time the snapshot is taken; this theory still has to be confirmed.

--- Additional comment from krishnaram Karthick on 2016-05-31 04:45:28 EDT ---

--- Additional comment from krishnaram Karthick on 2016-06-01 01:58:14 EDT ---

snapshot-1 was activated and mounted on 10.70.47.161 at '/mnt/superman':

[root@dhcp47-161 ~]# mount
...
10.70.37.120:/snaps/snapshot-superman-1_GMT-2016.05.31-04.54.11/superman on /mnt/superman type fuse.glusterfs (ro,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
...

--- Additional comment from krishnaram Karthick on 2016-06-01 03:15:19 EDT ---

Although this issue is not seen consistently, it was not seen in the 3.1.2 release. Proposing this bug as a blocker, to discuss and decide whether to take it in 3.1.3.

--- Additional comment from Nithya Balachandran on 2016-06-01 04:34:04 EDT ---

The logs indicate that the dht selfheal, and hence the lookup, fails for the directories that are not accessible.

From mnt-superman.log on 10.70.47.161:

[2016-06-01 05:21:39.776823] I [MSGID: 109063] [dht-layout.c:718:dht_layout_normalize] 0-87fda0f4b2404018904c3d49718497c5-tier-dht: Found anomalies in /dir-9 (gfid = 94224520-61a4-4d26-a2fa-152f2631a295). Holes=1 overlaps=0
[2016-06-01 05:21:39.783140] E [MSGID: 114031] [client-rpc-fops.c:321:client3_3_mkdir_cbk] 0-superman-client-17: remote operation failed. Path: /dir-9 [Invalid argument]
[2016-06-01 05:21:39.783278] E [MSGID: 114031] [client-rpc-fops.c:321:client3_3_mkdir_cbk] 0-superman-client-16: remote operation failed. Path: /dir-9 [Invalid argument]
[2016-06-01 05:21:39.784359] W [MSGID: 109005] [dht-selfheal.c:1172:dht_selfheal_dir_mkdir_cbk] 0-87fda0f4b2404018904c3d49718497c5-tier-dht: Directory selfheal failed: path = /dir-9, gfid = 94224520-61a4-4d26-a2fa-152f2631a295 [Invalid argument]
[2016-06-01 05:21:39.790632] W [fuse-resolve.c:66:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/dir-9: failed to resolve (Invalid argument)

From the snapshot brick (/run/gluster/snaps/87fda0f4b2404018904c3d49718497c5/brick3/reg-tier-3) logs for 0-superman-client-17:

[2016-06-01 05:19:24.493623] W [MSGID: 120022] [quota-enforcer-client.c:236:quota_enforcer_lookup_cbk] 0-87fda0f4b2404018904c3d49718497c5-quota: Getting cluster-wide size of directory failed (path: / gfid:00000000-0000-0000-0000-000000000001) [Invalid argument]
[2016-06-01 05:19:24.493696] E [MSGID: 115056] [server-rpc-fops.c:515:server_mkdir_cbk] 0-87fda0f4b2404018904c3d49718497c5-server: 5516: MKDIR /dir-9 (00000000-0000-0000-0000-000000000001/dir-9) client: dhcp47-161.lab.eng.blr.redhat.com-1099-2016/05/31-05:21:53:395888-superman-client-17-0-0 [Invalid argument]

On examining the quotad process using gdb, the operation fails in quotad_aggregator_lookup () -> qd_nameless_lookup ():

qd_nameless_lookup ()
{
        ...
        subvol = qd_find_subvol (this, volume_uuid);
        if (subvol == NULL) {
                op_errno = EINVAL;    /* <------ fails here */
                goto out;
        }
        ...
}

This is because snapshot volumes are not part of the quotad graph.

This is unrelated to tiering. Modifying the description accordingly.
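One way to see this on a live system (a sketch; /var/lib/glusterd/quotad/quotad.vol is the standard location where glusterd writes the quotad volfile, assumed here rather than taken from the logs):

    # quotad is a single daemon serving all quota-enabled volumes; its graph
    # is built from this volfile, which lists regular volumes only --
    # snapshot volumes never appear in it.
    cat /var/lib/glusterd/quotad/quotad.vol

    # The bricks of the activated snapshot therefore query quotad with a
    # volume UUID it cannot map to any subvolume, qd_find_subvol() returns
    # NULL, and the lookup fails with EINVAL as seen above.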
--- Additional comment from Rejy M Cyriac on 2016-06-01 07:41:59 EDT ---

Accepted as a Blocker for the RHGS 3.1.3 release at the Blocker Bug Triage meeting on 01 June 2016.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-06-01 09:58:07 EDT ---

Since this bug has been approved for the z-stream release of Red Hat Gluster Storage 3, through release flag 'rhgs-3.1.z+', and has been marked for the RHGS 3.1 Update 3 release through the Internal Whiteboard entry of '3.1.3', the Target Release is being automatically set to 'RHGS 3.1.3'.

--- Additional comment from Vijay Bellur on 2016-06-01 15:00:02 EDT ---

REVIEW: http://review.gluster.org/14608 (glusterd/snapshot: remove quota related options from snap volfile) posted (#1) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-06-02 03:16:08 EDT ---

REVIEW: http://review.gluster.org/14608 (glusterd/snapshot: remove quota related options from snap volfile) posted (#2) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-06-02 07:14:57 EDT ---

REVIEW: http://review.gluster.org/14608 (glusterd/snapshot: remove quota related options from snap volfile) posted (#3) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-06-02 07:54:07 EDT ---

REVIEW: http://review.gluster.org/14608 (glusterd/snapshot: remove quota related options from snap volfile) posted (#4) for review on master by mohammed rafi kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-06-03 02:09:03 EDT ---

COMMIT: http://review.gluster.org/14608 committed in master by Rajesh Joseph (rjoseph)
------
commit 03d523504230c336cf585159266e147945f31153
Author: Mohammed Rafi KC <rkavunga>
Date:   Wed Jun 1 23:01:37 2016 +0530

    glusterd/snapshot: remove quota related options from snap volfile

    Enabling inode-quota on a snapshot volume is unnecessary, because a
    snapshot is a read-only volume, so quota does not need to be enforced
    on it. This patch removes the quota related options from the snapshot
    volfile.

    Change-Id: Iddabcb83820dac2384924a01d45abe1ef1e95600
    BUG: 1341796
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/14608
    Reviewed-by: Atin Mukherjee <amukherj>
    Reviewed-by: N Balachandran <nbalacha>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.com>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Rajesh Joseph <rjoseph>
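With the fix applied, a quick way to check that it took effect (a sketch; the path under /var/lib/glusterd/snaps/ follows the standard snapshot volfile layout, and <snapname>, <host>, and <volname> are placeholders):

    # The generated snapshot volfiles should no longer carry the quota
    # options copied from the parent volume.
    grep -ri quota /var/lib/glusterd/snaps/<snapname>/

    # And directory listings from an activated, mounted snapshot should now
    # succeed instead of failing with 'Invalid argument'.
    gluster snapshot activate <snapname>
    mount -t glusterfs <host>:/snaps/<snapname>/<volname> /mnt/snap
    ls -l /mnt/snap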
REVIEW: http://review.gluster.org/14630 (glusterd/snapshot: remove quota related options from snap volfile) posted (#1) for review on release-3.6 by mohammed rafi kc (rkavunga)
This bug is being closed as GlusterFS-3.6 will no longer be receiving bug fixes. This bug has been fixed in more recent GlusterFS releases. If you still face this bug with newer GlusterFS versions, please open a new bug.