Bug 1315235 - [GSS] Two broken files on glusterfs fuse mount [NEEDINFO]
Status: ASSIGNED
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: 3.1
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Nithya Balachandran
QA Contact: storage-qa-internal@redhat.com
Whiteboard: dht-lookup-optimize, dht-gss, dht-gss...
Keywords: ZStream
Depends On:
Blocks: 1408949
Reported: 2016-03-07 05:16 EST by Bipin Kunal
Modified: 2017-09-28 13:27 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
bkunal: needinfo? (nbalacha)
bkunal: needinfo? (rhinduja)


Comment 36 Bipin Kunal 2016-03-31 03:35:28 EDT
Susant,

The file can only be created and modified from a client, but all the clients have access to it. So our hypothesis still holds, but we do not have the log messages needed to complete the RCA.


We do not have mount logs from the 120 clients (and we can't ask for them either). According to the customer, he has already checked the logs on all 120 clients, and the message "attempting deletion of stale" is not present in any of them.
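
A check of this kind can be sketched as follows. This is only an illustration, not the customer's actual procedure: the log directory (/var/log/glusterfs is assumed as the default location of the client logs), the file name patterns, and the function name find_stale_deletions are assumptions. Only the search string comes from this bug.

```shell
#!/bin/sh
# Assumption: client (mount) logs live in /var/log/glusterfs; override with LOG_DIR.
LOG_DIR="${LOG_DIR:-/var/log/glusterfs}"
# Search string quoted in this bug.
PATTERN="attempting deletion of stale"

# Hypothetical helper: print the name of every log file in directory $1
# that contains PATTERN. Handles plain logs and gzip-compressed rotated logs.
find_stale_deletions() {
    for f in "$1"/*.log "$1"/*.log-*.gz; do
        [ -e "$f" ] || continue          # skip unmatched glob patterns
        case "$f" in
            *.gz) gzip -dc "$f" | grep -qF "$PATTERN" && echo "$f" ;;
            *)    grep -qF "$PATTERN" "$f" && echo "$f" ;;
        esac
    done
    return 0
}

find_stale_deletions "$LOG_DIR"
```

Empty output means the message was not found in any log file in that directory. It would have to be run on each client separately.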

So we are again at a loss as to what happened here.

Thanks,
Bipin Kunal
Comment 37 Susant Kumar Palai 2016-03-31 03:40:29 EDT
Bipin, as mentioned in comments 32 and 34, can we have the brick logs?
Comment 38 Riyas Abdulrasak 2016-03-31 04:53:25 EDT
Hello Susant, 

The brick logs have been copied to the location below. Please have a look.

    # ssh your_kerb@collab-shell.usersys.redhat.com
    # cd /cases/01591676

The following files have been downloaded and extracted on collab-shell:
--------------------------------
	1M	chunk-www-media-rian21.log-20160228.gz
	1M	ghost-www-media-rhs.log-20160228.gz
	58M	rh-storage1-cmd_history_logs.tar.gz
	60M	rh-storage2-cmd_history_logs.tar.gz
	60M	rh-storage3-cmd_history_logs.tar.gz
	59M	rh-storage4-cmd_history_logs.tar.gz
	16M	sosreport-webfarmz-workflow3.01591676-20160309130028-8441.tar.xz
	1M	strace_chunk
	1M	strace_Gost_mp3
	1M	vol_files.tar.gz
	1M	www-media-rian21.log
	1M	www-media-rian21.log
	43M	www-media-rian21.log
	43M	www-media-rian21.log

Brick logs
-----------------

11M	rh-storage1-rhs1-rian21_projects_media2-media.log.gz
10M	rh-storage2-rhs1-rian21_projects_media2-media.log.gz
9M	rh-storage3-rhs1-rian21_projects_media2-media.log.gz
8M	rh-storage4-rhs1-rian21_projects_media2-media.log.gz

--------------------------------

Regards
Riyas
Comment 39 Susant Kumar Palai 2016-03-31 05:25:05 EDT
(In reply to Riyas Abdulrasak from comment #38)

We need the log files from 25 Feb onwards. The current brick logs only start from 29 Feb.
Comment 41 Susant Kumar Palai 2016-03-31 06:14:43 EDT
Here is the output for the above hypothesis.

[root@vm2 ~]# mount -t glusterfs vm2:/test1 /mnt2
[root@vm2 ~]# df
Filesystem           1K-blocks    Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root
                       6795192 4341728   2101620  68% /
tmpfs                  1912136       0   1912136   0% /dev/shm
/dev/sda1               487652   78219    383833  17% /boot
/dev/sdb               8378368   33124   8345244   1% /brick
vm2:/test1            16756736   66304  16690432   1% /mnt2
[root@vm2 ~]# cd /mnt2
[root@vm2 mnt2]# cat file
cat: file: No such file or directory
[root@vm2 mnt2]# 
[root@vm2 mnt2]# 
[root@vm2 mnt2]# 
[root@vm2 mnt2]# 
[root@vm2 mnt2]# ls
file
[root@vm2 mnt2]# cat file
hi
[root@vm2 mnt2]#
Comment 45 Bipin Kunal 2016-06-20 06:19:59 EDT
Susant,

What should our next action be here? The customer case is still open and the customer has been waiting for a proper RCA.

Do we have anything to proceed with?

In the previous update (comment #44) you provided information about a race condition. Is this race fixed in 3.1.3, or do we have an open BZ for it?

-Regards,
Bipin
Comment 46 Susant Kumar Palai 2016-06-21 02:43:30 EDT
Hi Riyas,
    As mentioned in comment 44, we have an RCA, but it cannot be confirmed because rmdir does not log linkto file deletion. However, we do have a workaround for the problem, as mentioned in comment 40.

   One thing we should do is thorough testing with the lookup-optimize option on, so that we can uncover more such bugs. What do you say?

Thanks,
Susant
Comment 49 Raghavendra G 2016-06-24 04:48:50 EDT
(In reply to Susant Kumar Palai from comment #48)
> (In reply to Riyas Abdulrasak from comment #47)
> > Hello Susant, 
> > 
> > I think it would be better if you can pass all the test cases to QE. 
> > 
> > Can we confirm to the customer that if he updates the cluster to 3.1.3 , he
> > won't hit the broken file issue again? Is the suspected cause is getting
> > fixed in 3.1.3?
> > 
> > Regards
> > Riyas
> Riyas,
>   Let me clarify few more things from my team and then we will have a clear
> picture.
> 
> Raghavendra/Nithya,
>   Will http://review.gluster.org/#/c/13852/ be sufficient to fix the
> rmdir-lookup race? And also do we have any other race which can lead to this
> situation?

No. We don't have a confirmed RCA for what made us lose the linkto files. Hence we cannot comment on whether patch #13852 is sufficient. We need to do more testing on this feature to uncover unknown scenarios that cause the loss of linkto files.

> 
> Note: My suggestion would be to ask customer to turn off lookup-optimize for
> time being. Raghavendra, what do you suggest?

Yes, setting "lookup-optimize=off" would make this problem go away and the files would become accessible again.
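
For reference, the workaround could be applied along these lines. This is a minimal sketch under stated assumptions: VOLNAME is a placeholder because the volume name is not given in this comment, the helper function and the DRY_RUN variable are hypothetical, and cluster.lookup-optimize is assumed to be the full name of the volume option referred to above as lookup-optimize.

```shell
#!/bin/sh
# VOLNAME is a placeholder; the real volume name is not stated in this comment.
VOLNAME="${VOLNAME:-VOLNAME}"
# By default only print the command; set DRY_RUN=0 to actually run it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: build the volume-set command.
# $1 = volume name, $2 = on|off
lookup_optimize_cmd() {
    printf 'gluster volume set %s cluster.lookup-optimize %s\n' "$1" "$2"
}

cmd=$(lookup_optimize_cmd "$VOLNAME" off)
if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"
else
    # Must be run on a storage node with the gluster CLI installed.
    $cmd
fi
```

The dry-run default is deliberate, so the command can be reviewed before it changes a production volume.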
Comment 50 Arun Vasudevan 2016-07-19 10:20:03 EDT
Hi Susant,

Can you provide the status of this BZ, please?
Comment 54 Nithya Balachandran 2016-10-26 05:31:37 EDT
We need to test lookup-optimize thoroughly to see if anything goes wrong. As of now, with the information available to us, there is no confirmed RCA.
