Bug 1384070 - inconsistent file permissions b/w write permission and sticky bits(---------T ) displayed when IOs are going on with md-cache enabled (and within the invalidation cycle)
Summary: inconsistent file permissions b/w write permission and sticky bits(---------T...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: md-cache
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Poornima G
QA Contact: Vivek Das
URL:
Whiteboard:
Depends On:
Blocks: 1351528 1392713 1401376
 
Reported: 2016-10-12 13:19 UTC by Nag Pavan Chilakam
Modified: 2018-11-30 05:39 UTC (History)

Fixed In Version: glusterfs-3.8.4-7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1392713
Environment:
Last Closed: 2017-03-23 06:09:09 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 0 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description Nag Pavan Chilakam 2016-10-12 13:19:43 UTC
Description of problem:
=======================
I enabled md-cache using the private build available at http://etherpad.corp.redhat.com/md-cache-3-2
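
The etherpad above is internal, so the exact settings used are not visible here. As a rough sketch only, md-cache with upcall-based invalidation is usually enabled through volume options like the ones below. The function name `enable_md_cache`, the `GLUSTER` dry-run variable and the timeout value of 600 are assumptions for illustration, not details from this report.

```shell
# Assumed setup, not taken from the report: enable md-cache with
# upcall-based cache invalidation on the volume named in $1.
# Set GLUSTER=echo to print the arguments instead of running gluster.
enable_md_cache() {
    vol=$1
    gluster_cmd=${GLUSTER:-gluster}
    "$gluster_cmd" volume set "$vol" features.cache-invalidation on &&
    "$gluster_cmd" volume set "$vol" features.cache-invalidation-timeout 600 &&
    "$gluster_cmd" volume set "$vol" performance.stat-prefetch on &&
    "$gluster_cmd" volume set "$vol" performance.cache-invalidation on &&
    "$gluster_cmd" volume set "$vol" performance.md-cache-timeout 600
}
```

Running it with `GLUSTER=echo` first gives a quick view of what would be applied before touching a real volume.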

I created a 2x2 volume and mounted it on two different clients.
From one client (client1) I created a 3 GB text file and started creating about 10 million hard links to it, as below
(mnt-distrep.log is the original text file):
for i in {1..10000000};do ln mnt-distrep.log hardlink.$i;done

The volume was also mounted on a second client, client2.

From client2 I was listing files with ll (ls -l), and everything looked fine.
I then changed the permissions of one of the hard links, hardlink.90, with chmod 0777 hardlink.90, while the hard-link creation was still running on client1.

After that, ls -l showed some of the hard links (probably the most recently created ones) with only the sticky bit set: ---------T

I saw that in the 2x2 volume, all the data files were created on one replica pair, while a ---------T file for each of them was created on the other replica pair. (This follows from DHT's cached and hashed subvolume concept, i.e. data files and link-to files.)
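
For readers unfamiliar with the marker: a DHT link-to file on a brick carries no permission bits other than the sticky bit, which `ls -l` prints as `---------T`. A minimal sketch of a check for that mode string follows. The function name `is_linkto_mode` is hypothetical, and the check only inspects the text of the mode column; it does not query Gluster or the link-to xattr.

```shell
# Hypothetical helper: succeed if the mode column printed by `ls -l`
# looks like a DHT link-to file (no permission bits, sticky bit only).
# A trailing "." or "+" (SELinux context / ACL marker) is tolerated.
is_linkto_mode() {
    case "$1" in
        ---------T|---------T.|---------T+) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_linkto_mode "---------T."` succeeds, while `is_linkto_mode "-rwxrwxrwx."` fails.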

I then picked one file that was displaying ---------T on client2 and kept running ll (ls -l) on it.
The output toggled inconsistently between the data file and the link-to file, as below:


[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
---------T. 21860 root root 0 Oct 12 17:33 hardlink.43363
[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
---------T. 21866 root root 0 Oct 12 17:33 hardlink.43363
[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
---------T. 21869 root root 0 Oct 12 17:33 hardlink.43363
[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
-rwxrwxrwx. 43729 root root 3156080712 Oct 12 17:29 hardlink.43363
[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
---------T. 21880 root root 0 Oct 12 17:33 hardlink.43363
[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
-rwxrwxrwx. 43748 root root 3156080712 Oct 12 17:29 hardlink.43363
[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
---------T. 21888 root root 0 Oct 12 17:33 hardlink.43363
[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
-rwxrwxrwx. 43758 root root 3156080712 Oct 12 17:29 hardlink.43363
[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
-rwxrwxrwx. 43764 root root 3156080712 Oct 12 17:29 hardlink.43363
[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
-rwxrwxrwx. 43768 root root 3156080712 Oct 12 17:29 hardlink.43363
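
To put a number on the toggling, a captured listing like the one above can be tallied by mode. The following is a sketch only; `tally_modes` is a hypothetical name, and it assumes the usual nine-column `ls -l` layout with prompt lines mixed in.

```shell
# Hypothetical helper: read `ls -l` output on stdin and print how many
# entries showed the link-to mode versus any other mode.
tally_modes() {
    awk '
        # Entry lines start with a file-type character and have at least
        # nine fields; shell prompt lines (starting with "[") are skipped.
        /^[-dlbcps]/ && NF >= 9 {
            if ($1 ~ /^---------T/) linkto++; else data++
        }
        END { printf "linkto=%d data=%d\n", linkto, data }
    '
}
```

Fed the ten samples above, it reports `linkto=5 data=5`, i.e. half of the lookups were answered from the link-to file.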



My understanding is that in this case, instead of caching the file attributes or serving them from the cache, the client fetches them from the bricks, and it intermittently fetches them from the hashed brick (the link-to file), which is not right.


Also, if I stopped the hard-link creation on the first client (Ctrl+C on its command line),
the listing then consistently displayed only the correct file permissions, as below:

[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
---------T. 21860 root root 0 Oct 12 17:33 hardlink.43363
[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
---------T. 21866 root root 0 Oct 12 17:33 hardlink.43363
[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
---------T. 21869 root root 0 Oct 12 17:33 hardlink.43363
[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
-rwxrwxrwx. 43729 root root 3156080712 Oct 12 17:29 hardlink.43363
[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
---------T. 21880 root root 0 Oct 12 17:33 hardlink.43363
[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
-rwxrwxrwx. 43748 root root 3156080712 Oct 12 17:29 hardlink.43363
[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
---------T. 21888 root root 0 Oct 12 17:33 hardlink.43363
[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
-rwxrwxrwx. 43758 root root 3156080712 Oct 12 17:29 hardlink.43363
[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
-rwxrwxrwx. 43764 root root 3156080712 Oct 12 17:29 hardlink.43363
[root@dhcp35-180 dhcp35-179.lab.eng.blr.redhat.com]# ll hardlink.43363
-rwxrwxrwx. 43768 root root 3156080712 Oct 12 17:29 hardlink.43363



Version-Release number of selected component (if applicable):
====
as in etherpad


How reproducible:


Steps to Reproduce:
1. Create an md-cache setup on a 2x2 volume.
2. Mount the volume on two clients.
3. From client1: create a file and start creating hard links to it in a loop, say 1 lakh (100,000) hard links.
4. From the other client, client2, change the permissions of an already-created hard link, say hardlink.10.
5. While the hard-link creation is in progress, keep issuing ls -l on hardlink.20 (any hard link that has already been created).
You will see that the file permissions are sometimes shown as expected and sometimes as just the sticky bit.
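
Steps 3 and 5 can be sketched as two small shell functions, one per client. This is an illustration under assumptions: the function names are hypothetical, and the directory passed in is assumed to be the mount point of the volume on each client.

```shell
# Client 1 (step 3): create COUNT hard links to SRC inside DIR.
# DIR is assumed to be the volume's mount point on this client.
make_hardlinks() {
    dir=$1 src=$2 count=$3
    i=1
    while [ "$i" -le "$count" ]; do
        ln "$dir/$src" "$dir/hardlink.$i" || return 1
        i=$((i + 1))
    done
}

# Client 2 (step 5): print the mode column of FILE, sampled N times,
# one mode string per line.
sample_modes() {
    file=$1 n=$2
    i=1
    while [ "$i" -le "$n" ]; do
        ls -l "$file" | awk '{ print $1 }'
        i=$((i + 1))
    done
}
```

On client1 one might run `make_hardlinks /mnt/vol mnt-distrep.log 100000 &`, and on client2 `chmod 0777 /mnt/vol/hardlink.10` followed by `sample_modes /mnt/vol/hardlink.20 50 | sort | uniq -c`; the path `/mnt/vol` is a placeholder. More than one distinct mode in that summary indicates the toggling described in this bug.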

Actual results:


Expected results:


Additional info:

Comment 3 Poornima G 2016-11-08 05:25:57 UTC
It's a nice finding.

Fix posted upstream: http://review.gluster.org/#/c/15789/

Comment 6 Atin Mukherjee 2016-12-05 03:39:38 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/91957/

Comment 8 Vivek Das 2017-01-12 05:07:26 UTC
Followed the steps to reproduce:

Steps to Reproduce:
1. Create an md-cache setup on a 2x2 volume.
2. Mount the volume on two clients.
3. From client1: create a file and start creating hard links to it in a loop, say 1 lakh (100,000) hard links.
4. From the other client, client2, change the permissions of an already-created hard link, say hardlink.10.
5. While the hard-link creation is in progress, keep issuing ls -l on hardlink.20 (any hard link that has already been created).
You will see that the file permissions are sometimes shown as expected and sometimes as just the sticky bit.

I created a 2 GB file and then created around 2 lakh (200,000) hard links to it in a loop, first over a CIFS mount and then over a FUSE mount, using two clients. I changed the permissions of one hard link and kept running ls -l on the mount point.

I did not see the sticky-bit-only mode in place of the file permissions.

Version
---------
samba-client-4.4.6-4.el7rhgs.x86_64
glusterfs-cli-3.8.4-11.el7rhgs.x86_64

Marking it as verified.

Comment 10 errata-xmlrpc 2017-03-23 06:09:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

