Bug 1342374 - [quota+snapshot]: Directories are inaccessible from activated snapshot, when the snapshot was created during directory creation
Summary: [quota+snapshot]: Directories are inaccessible from activated snapshot, when the snapshot was created during directory creation
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: snapshot
Version: 3.7.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Mohammed Rafi KC
QA Contact:
URL:
Whiteboard:
Depends On: 1341034 1341796 1342372
Blocks: 1311817 1342375
 
Reported: 2016-06-03 06:11 UTC by Mohammed Rafi KC
Modified: 2016-06-28 12:19 UTC
7 users

Fixed In Version: glusterfs-3.7.12
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1342372
: 1342375 (view as bug list)
Environment:
Last Closed: 2016-06-28 12:19:23 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Mohammed Rafi KC 2016-06-03 06:11:28 UTC
+++ This bug was initially created as a clone of Bug #1342372 +++

+++ This bug was initially created as a clone of Bug #1341796 +++

+++ This bug was initially created as a clone of Bug #1341034 +++

Description of problem:
Directories that were being created at the time a snapshot was taken are not accessible from that snapshot.
Snapshots taken later, without any I/O in progress, appear to have consistent data.

Volume Name: superman
Type: Tier
Volume ID: ba49611f-1cbc-4a25-a1a8-8a0eecfe6f76
Status: Started
Number of Bricks: 20
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 4 x 2 = 8
Brick1: 10.70.35.133:/bricks/brick7/reg-tier-3
Brick2: 10.70.35.10:/bricks/brick7/reg-tier-3
Brick3: 10.70.35.11:/bricks/brick7/reg-tier-3
Brick4: 10.70.35.225:/bricks/brick7/reg-tier-3
Brick5: 10.70.35.239:/bricks/brick7/reg-tier-3
Brick6: 10.70.37.60:/bricks/brick7/reg-tier-3
Brick7: 10.70.37.120:/bricks/brick7/reg-tier-3
Brick8: 10.70.37.101:/bricks/brick7/reg-tier-3
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick9: 10.70.37.101:/bricks/brick0/l1
Brick10: 10.70.37.120:/bricks/brick0/l1
Brick11: 10.70.37.60:/bricks/brick0/l1
Brick12: 10.70.35.239:/bricks/brick0/l1
Brick13: 10.70.35.225:/bricks/brick0/l1
Brick14: 10.70.35.11:/bricks/brick0/l1
Brick15: 10.70.35.10:/bricks/brick0/l1
Brick16: 10.70.35.133:/bricks/brick0/l1
Brick17: 10.70.37.101:/bricks/brick1/l1
Brick18: 10.70.37.120:/bricks/brick1/l1
Brick19: 10.70.37.60:/bricks/brick1/l1
Brick20: 10.70.35.239:/bricks/brick1/l1
Options Reconfigured:
features.barrier: disable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
nfs-ganesha: disable

'ls -l' from the mountpoint where the snapshot is activated; the '?' fields below indicate that stat() failed on those entries:

???????????   ? ?    ?           ?            ? dir-1
???????????   ? ?    ?           ?            ? dir-10
???????????   ? ?    ?           ?            ? dir-11
???????????   ? ?    ?           ?            ? dir-12
???????????   ? ?    ?           ?            ? dir-13
???????????   ? ?    ?           ?            ? dir-14
???????????   ? ?    ?           ?            ? dir-15
???????????   ? ?    ?           ?            ? dir-16
???????????   ? ?    ?           ?            ? dir-17

gluster snapshot list
snapshot-superman-1_GMT-2016.05.31-04.54.11
snapshot-superman-2_GMT-2016.05.31-05.02.13
snapshot-superman-3_GMT-2016.05.31-05.08.25
snapshot-superman-4_GMT-2016.05.31-05.24.10

Snapshot 'snapshot-superman-1_GMT-2016.05.31-04.54.11' was taken during directory creation; the rest of the snapshots were taken later, without I/O in progress.

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-6.el7rhgs.x86_64

How reproducible:
Seen 1/1 so far; consistent reproducibility is yet to be determined.

Steps to Reproduce:
1. Create a 2 x (4+2) disperse volume.
2. Start a Linux untar operation and 'mkdir -p dir-{1..1000}/sd-{1..100}' from two different clients.
3. Attach a 4 x 2 hot tier.
4. Create a snapshot while the directory creation is in progress.
5. Activate the snapshot and list the directories (a shell sketch of these steps follows).
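
A minimal shell sketch of the steps above, assuming the gluster 3.7 CLI; host names, brick paths, mount points, and the snapshot name are illustrative, and 'force' is only needed because the sketch places multiple bricks of a set on the same host:

# 1. Create and start a 2 x (4+2) disperse volume (12 bricks).
gluster volume create superman disperse-data 4 redundancy 2 \
    host{1..4}:/bricks/brick{0..2}/l1 force
gluster volume start superman

# 2. From two clients, mount the volume and start creating directories.
mount -t glusterfs host1:/superman /mnt/superman
mkdir -p /mnt/superman/dir-{1..1000}/sd-{1..100} &

# 3. Attach a 4 x 2 distributed-replicate hot tier (8 bricks).
gluster volume tier superman attach replica 2 \
    host{1..4}:/bricks/brick7/tier-{a,b} force

# 4. Create a snapshot while the mkdirs are still running.
gluster snapshot create snap1 superman

# 5. Activate the snapshot, mount it, and list the directories.
gluster snapshot activate snap1_GMT-<timestamp>
mount -t glusterfs host1:/snaps/snap1_GMT-<timestamp>/superman /mnt/snap1
ls -l /mnt/snap1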

Actual results:
directories are inaccessible

Expected results:
directories should be accessible

Additional info:
sosreports will be attached shortly.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-05-31 02:55:40 EDT ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from krishnaram Karthick on 2016-05-31 04:44:36 EDT ---

 - Tried reproducing this issue; could not reproduce it.
 - This may happen when a snapshot is taken before fix-layout on the hot tier has completed; this theory is yet to be confirmed.

--- Additional comment from krishnaram Karthick on 2016-05-31 04:45:28 EDT ---



--- Additional comment from krishnaram Karthick on 2016-06-01 01:58:14 EDT ---

snapshot-1 was activated and mounted on 10.70.47.161 at '/mnt/superman':

[root@dhcp47-161 ~]# mount
...
10.70.37.120:/snaps/snapshot-superman-1_GMT-2016.05.31-04.54.11/superman on /mnt/superman type fuse.glusterfs (ro,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
...

--- Additional comment from krishnaram Karthick on 2016-06-01 03:15:19 EDT ---

Although this issue is not consistently reproducible, it was not seen in the 3.1.2 release.

Proposing this bug as a blocker, to discuss and decide whether to take the fix into 3.1.3.

--- Additional comment from Nithya Balachandran on 2016-06-01 04:34:04 EDT ---

The logs indicate that the DHT self-heal, and hence the lookup, fails for the directories that are not accessible.



From mnt-superman.log on 10.70.47.161:

[2016-06-01 05:21:39.776823] I [MSGID: 109063] [dht-layout.c:718:dht_layout_normalize] 0-87fda0f4b2404018904c3d49718497c5-tier-dht: Found anomalies in /dir-9 (gfid = 94224520-61a4-4d26-a2fa-152f2631a295). Holes=1 overlaps=0
[2016-06-01 05:21:39.783140] E [MSGID: 114031] [client-rpc-fops.c:321:client3_3_mkdir_cbk] 0-superman-client-17: remote operation failed. Path: /dir-9 [Invalid argument]
[2016-06-01 05:21:39.783278] E [MSGID: 114031] [client-rpc-fops.c:321:client3_3_mkdir_cbk] 0-superman-client-16: remote operation failed. Path: /dir-9 [Invalid argument]
[2016-06-01 05:21:39.784359] W [MSGID: 109005] [dht-selfheal.c:1172:dht_selfheal_dir_mkdir_cbk] 0-87fda0f4b2404018904c3d49718497c5-tier-dht: Directory selfheal failed: path = /dir-9, gfid = 94224520-61a4-4d26-a2fa-152f2631a295 [Invalid argument]
[2016-06-01 05:21:39.790632] W [fuse-resolve.c:66:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/dir-9: failed to resolve (Invalid argument)



From the snapshot brick (/run/gluster/snaps/87fda0f4b2404018904c3d49718497c5/brick3/reg-tier-3) logs for 0-superman-client-17:

[2016-06-01 05:19:24.493623] W [MSGID: 120022] [quota-enforcer-client.c:236:quota_enforcer_lookup_cbk] 0-87fda0f4b2404018904c3d49718497c5-quota: Getting cluster-wide size of directory failed (path: / gfid:00000000-0000-0000-0000-000000000001) [Invalid argument]
[2016-06-01 05:19:24.493696] E [MSGID: 115056] [server-rpc-fops.c:515:server_mkdir_cbk] 0-87fda0f4b2404018904c3d49718497c5-server: 5516: MKDIR /dir-9 (00000000-0000-0000-0000-000000000001/dir-9) client: dhcp47-161.lab.eng.blr.redhat.com-1099-2016/05/31-05:21:53:395888-superman-client-17-0-0 [Invalid argument]



Examining the quotad process with gdb shows that the operation fails in quotad_aggregator_lookup() -> qd_nameless_lookup():


qd_nameless_lookup () {
        ...
        subvol = qd_find_subvol (this, volume_uuid);
        if (subvol == NULL) {
                op_errno = EINVAL;     /* <------ fails here */
                goto out;
        }
        ...
}


This is because snapshot volumes are not part of the quotad graph.
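
For illustration, a minimal C sketch of the failing path described above; it mirrors the gdb analysis, but the names, includes, and matching logic are simplified rather than exact gluster source:

#include <string.h>
#include <glusterfs/xlator.h>   /* xlator_t, xlator_list_t */

/* Hedged sketch: quotad resolves the volume UUID sent with a quota
 * lookup by scanning the children of its own graph.  Snapshot volumes
 * are never added to that graph, so the scan finds nothing and the
 * caller fails the lookup with EINVAL. */
static xlator_t *
qd_find_subvol (xlator_t *this, char *volume_uuid)
{
        xlator_list_t *child = NULL;

        for (child = this->children; child; child = child->next) {
                if (strcmp (child->xlator->name, volume_uuid) == 0)
                        return child->xlator;   /* regular quota volume */
        }

        return NULL;   /* snapshot volume: caller sets op_errno = EINVAL */
}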


This is unrelated to tiering. Modifying the description accordingly.

--- Additional comment from Rejy M Cyriac on 2016-06-01 07:41:59 EDT ---

Accepted as Blocker for RHGS 3.1.3 release at the Blocker Bug Triage meeting on 01 June 2016

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-06-01 09:58:07 EDT ---

Since this bug has been approved for the z-stream release of Red Hat Gluster Storage 3, through release flag 'rhgs-3.1.z+', and has been marked for RHGS 3.1 Update 3 release through the Internal Whiteboard entry of '3.1.3', the Target Release is being automatically set to 'RHGS 3.1.3'

--- Additional comment from Vijay Bellur on 2016-06-01 15:00:02 EDT ---

REVIEW: http://review.gluster.org/14608 (glusterd/snapshot: remove quota related options from snap volfile) posted (#1) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-06-02 03:16:08 EDT ---

REVIEW: http://review.gluster.org/14608 (glusterd/snapshot: remove quota related options from snap volfile) posted (#2) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-06-02 07:14:57 EDT ---

REVIEW: http://review.gluster.org/14608 (glusterd/snapshot: remove quota related options from snap volfile) posted (#3) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-06-02 07:54:07 EDT ---

REVIEW: http://review.gluster.org/14608 (glusterd/snapshot: remove quota related options from snap volfile) posted (#4) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Vijay Bellur on 2016-06-03 02:09:03 EDT ---

COMMIT: http://review.gluster.org/14608 committed in master by Rajesh Joseph (rjoseph) 
------
commit 03d523504230c336cf585159266e147945f31153
Author: Mohammed Rafi KC <rkavunga>
Date:   Wed Jun 1 23:01:37 2016 +0530

    glusterd/snapshot: remove quota related options from snap volfile
    
    enabling inode-quota on a snapshot volume is unnecessary, because
    snapshot is a read-only volume. So we don't need to enforce quota
    on a snapshot volume.
    
    This patch will remove the quota related options from snapshot
    volfile.
    
    Change-Id: Iddabcb83820dac2384924a01d45abe1ef1e95600
    BUG: 1341796
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/14608
    Reviewed-by: Atin Mukherjee <amukherj>
    Reviewed-by: N Balachandran <nbalacha>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.com>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Rajesh Joseph <rjoseph>
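
For illustration, a minimal sketch of the approach the commit message describes; the helper name is hypothetical, and the dict keys are the quota options visible in the volume info above:

#include <glusterfs/dict.h>   /* dict_t, dict_del() */

/* Hedged sketch, not the actual patch: when glusterd generates the
 * volfile for a snapshot volume, the quota enforcement options can be
 * dropped from the option dictionary.  A read-only snapshot cannot
 * exceed a quota, and leaving the options in place makes the bricks
 * consult quotad, which has no graph entry for snapshot volumes. */
static void
snap_volfile_strip_quota_options (dict_t *opts)
{
        dict_del (opts, "features.quota");
        dict_del (opts, "features.inode-quota");
        dict_del (opts, "features.quota-deem-statfs");
}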

Comment 1 Vijay Bellur 2016-06-03 06:17:56 UTC
REVIEW: http://review.gluster.org/14629 (glusterd/snapshot: remove quota related options from snap volfile) posted (#1) for review on release-3.7 by mohammed rafi  kc (rkavunga)

Comment 2 Vijay Bellur 2016-06-03 15:38:34 UTC
COMMIT: http://review.gluster.org/14629 committed in release-3.7 by Atin Mukherjee (amukherj) 
------
commit 8d493c22deaf52db82f03c049f99d5ac5857769f
Author: Mohammed Rafi KC <rkavunga>
Date:   Wed Jun 1 23:01:37 2016 +0530

    glusterd/snapshot: remove quota related options from snap volfile
    
    enabling inode-quota on a snapshot volume is unnecessary, because
    snapshot is a read-only volume. So we don't need to enforce quota
    on a snapshot volume.
    
    This patch will remove the quota related options from snapshot
    volfile.
    
    Backport of>
    >Change-Id: Iddabcb83820dac2384924a01d45abe1ef1e95600
    >BUG: 1341796
    >Signed-off-by: Mohammed Rafi KC <rkavunga>
    >Reviewed-on: http://review.gluster.org/14608
    >Reviewed-by: Atin Mukherjee <amukherj>
    >Reviewed-by: N Balachandran <nbalacha>
    >NetBSD-regression: NetBSD Build System <jenkins.org>>
    >Smoke: Gluster Build System <jenkins.com>
    >CentOS-regression: Gluster Build System <jenkins.com>
    >Reviewed-by: Rajesh Joseph <rjoseph>
    
    (cherry picked from commit 03d523504230c336cf585159266e147945f31153)
    
    Change-Id: I3c41d8abe8631e57a5fc58e9a0d7550c558ab231
    BUG: 1342374
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/14629
    CentOS-regression: Gluster Build System <jenkins.com>
    Smoke: Gluster Build System <jenkins.com>
    Tested-by: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>

Comment 3 Kaushal 2016-06-28 12:19:23 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.12, please open a new bug report.

glusterfs-3.7.12 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-June/049918.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

