1315235 – [GSS] Two broken files on glusterfs fuse mount

Summary: [GSS] Two broken files on glusterfs fuse mount

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	distribute
Sub Component:
Version:	rhgs-3.1
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Nithya Balachandran
QA Contact:	storage-qa-internal@redhat.com
Docs Contact:
URL:
Whiteboard:	dht-lookup-optimize, dht-gss, dht-gss...
Depends On:
Blocks:	1408949

Reported:	2016-03-07 10:16 UTC by Bipin Kunal
Modified:	2021-06-10 11:11 UTC
CC List:	10 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-04-19 10:08:34 UTC
Embargoed:
Dependent Products:

Comment 36 Bipin Kunal 2016-03-31 07:35:28 UTC

Susant,

The file can only be created and modified from a client, but all of the clients have access to it. So our hypothesis still holds, but we do not have the log messages needed to complete the RCA.


We do not have the mount logs from the 120 clients (and we can't ask for them either). According to the customer, he has already checked all 120 logs, and the message "attempting deletion of stale" is not present in any of them.

So we are again unsure what happened here.
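
For anyone repeating the customer's check, here is a minimal sketch of how the search across client logs could be scripted. The function name `find_stale_deletions` and the idea of collecting one mount log per client into a single directory are assumptions for illustration, not something taken from the case data.

```shell
# Hypothetical helper: print the names of the log files that contain the
# stale-linkto deletion message. "$1" is an assumed directory (or a single
# file) holding uncompressed client mount logs.
find_stale_deletions() {
    log_dir="$1"
    pattern="attempting deletion of stale"
    # -r: recurse into directories, -l: print only matching file names,
    # -F: treat the pattern as a fixed string rather than a regex.
    # "|| true" keeps the exit status 0 when nothing matches.
    grep -rlF -- "$pattern" "$log_dir" || true
}
```

Empty output means the message was not found in any log under that directory. Rotated `.gz` logs would have to be decompressed first (or searched with `zgrep`), since plain `grep` does not look inside compressed files.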

Thanks,
Bipin Kunal

Comment 37 Susant Kumar Palai 2016-03-31 07:40:29 UTC

Bipin, as mentioned in comments 32 and 34, can we have the brick logs?

Comment 38 Riyas Abdulrasak 2016-03-31 08:53:25 UTC

Hello Susant, 

The brick logs have been copied to the location below. Please have a look.

    # ssh your_kerb.redhat.com
    # cd /cases/01591676

The following files have been downloaded and extracted on collab-shell:
--------------------------------
	1M	chunk-www-media-rian21.log-20160228.gz
	1M	ghost-www-media-rhs.log-20160228.gz
	58M	rh-storage1-cmd_history_logs.tar.gz
	60M	rh-storage2-cmd_history_logs.tar.gz
	60M	rh-storage3-cmd_history_logs.tar.gz
	59M	rh-storage4-cmd_history_logs.tar.gz
	16M	sosreport-webfarmz-workflow3.01591676-20160309130028-8441.tar.xz
	1M	strace_chunk
	1M	strace_Gost_mp3
	1M	vol_files.tar.gz
	1M	www-media-rian21.log
	1M	www-media-rian21.log
	43M	www-media-rian21.log
	43M	www-media-rian21.log

Brick logs
-----------------

11M	rh-storage1-rhs1-rian21_projects_media2-media.log.gz
10M	rh-storage2-rhs1-rian21_projects_media2-media.log.gz
9M	rh-storage3-rhs1-rian21_projects_media2-media.log.gz
8M	rh-storage4-rhs1-rian21_projects_media2-media.log.gz

--------------------------------

Regards
Riyas

Comment 39 Susant Kumar Palai 2016-03-31 09:25:05 UTC

(In reply to Riyas Abdulrasak from comment #38)

We need the log files from 25 Feb onwards. The current brick logs only start from 29 Feb.
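
To check which dates a given brick log covers before uploading it, something like the sketch below may help. It assumes each log entry starts with a bracketed timestamp of the form `[YYYY-MM-DD HH:MM:SS.usec]`; the function name `log_date_range` is made up for this example.

```shell
# Hypothetical helper: print the date of the first and of the last
# timestamped entry in a log file (plain or gzip-compressed).
log_date_range() {
    # gzip -cdf decompresses .gz input and passes plain files through unchanged.
    gzip -cdf -- "$1" |
        # Keep only the YYYY-MM-DD part of lines that begin with "[date time".
        sed -n 's/^\[\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\) .*/\1/p' |
        # Print the first and the last of the extracted dates.
        sed -n '1p;$p'
}
```

Running it on each of the four brick logs would show at a glance whether any of them reaches back to 25 Feb.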

Comment 41 Susant Kumar Palai 2016-03-31 10:14:43 UTC

Here is the output for the above hypothesis.

[root@vm2 ~]# mount -t glusterfs vm2:/test1 /mnt2
[root@vm2 ~]# df
Filesystem           1K-blocks    Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root
                       6795192 4341728   2101620  68% /
tmpfs                  1912136       0   1912136   0% /dev/shm
/dev/sda1               487652   78219    383833  17% /boot
/dev/sdb               8378368   33124   8345244   1% /brick
vm2:/test1            16756736   66304  16690432   1% /mnt2
[root@vm2 ~]# cd /mnt2
[root@vm2 mnt2]# cat file
cat: file: No such file or directory
[root@vm2 mnt2]# 
[root@vm2 mnt2]# 
[root@vm2 mnt2]# 
[root@vm2 mnt2]# 
[root@vm2 mnt2]# ls
file
[root@vm2 mnt2]# cat file
hi
[root@vm2 mnt2]#

Comment 45 Bipin Kunal 2016-06-20 10:19:59 UTC

Susant,

What should our next action be here? The customer case is still open, and the customer has been waiting for a proper RCA.

Do we have anything to proceed with?

In the previous update (comment #44) you provided information about a race condition. Is this race fixed in 3.1.3, or do we have an open BZ for it?

-Regards,
Bipin

Comment 46 Susant Kumar Palai 2016-06-21 06:43:30 UTC

Hi Riyas,
    As mentioned in comment 44, we have an RCA, but it cannot be confirmed because rmdir does not log linkto file deletion. However, we do have a workaround for the problem, as mentioned in comment 40.

   One thing we should do is thorough testing with the lookup-optimize option on, so that we can uncover more such bugs. What do you say?
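
As a starting point for such testing, here is a rough sketch for listing what look like linkto files on a brick, so that their presence can be compared before and after a test run. It assumes that DHT linkto files are zero-byte regular files with mode exactly 1000 (shown by `ls -l` as `---------T`); the function name `list_linkto_candidates` is hypothetical.

```shell
# Hypothetical helper: list candidate DHT linkto files under a brick path.
# Assumption: a linkto file is a zero-byte regular file with only the
# sticky bit set (mode 1000). The brick's internal .glusterfs directory
# is skipped so hard links to the same files are not listed twice.
list_linkto_candidates() {
    find "$1" -path "$1/.glusterfs" -prune -o \
        -type f -perm 1000 -size 0 -print
}
# On a real brick, each candidate could then be confirmed as root with:
#   getfattr -n trusted.glusterfs.dht.linkto -e text <path-on-brick>
```

A file that is returned by the hashed subvolume's lookup but has no matching entry in this list on the expected brick would be a candidate for the kind of lost linkto file discussed here.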

Thanks,
Susant

Comment 49 Raghavendra G 2016-06-24 08:48:50 UTC

(In reply to Susant Kumar Palai from comment #48)
> (In reply to Riyas Abdulrasak from comment #47)
> > Hello Susant, 
> > 
> > I think it would be better if you can pass all the test cases to QE. 
> > 
> > Can we confirm to the customer that if he updates the cluster to 3.1.3 , he
> > won't hit the broken file issue again? Is the suspected cause is getting
> > fixed in 3.1.3?
> > 
> > Regards
> > Riyas
> Riyas,
>   Let me clarify few more things from my team and then we will have a clear
> picture.
> 
> Raghavendra/Nithya,
>   Will http://review.gluster.org/#/c/13852/ be sufficient to fix the
> rmdir-lookup race? And also do we have any other race which can lead to this
> situation?

No. We don't have a confirmed RCA for what caused us to lose the linkto files. Hence we cannot comment on whether patch #13852 is sufficient. We need to do more testing on this feature to uncover the unknown scenarios that cause loss of linkto files.

> 
> Note: My suggestion would be to ask customer to turn off lookup-optimize for
> time being. Raghavendra, what do you suggest?

Yes, setting "lookup-optimize=off" would make this problem go away, and the file would become accessible again.
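
A minimal sketch of applying that workaround from a storage node follows. It assumes the option's full name is `cluster.lookup-optimize`; the wrapper `set_lookup_optimize` and the `GLUSTER` override variable (handy for a dry run) are invented for this example.

```shell
# Hypothetical wrapper around "gluster volume set" for the lookup-optimize option.
# Usage: set_lookup_optimize <VOLNAME> on|off
set_lookup_optimize() {
    vol="$1"
    state="$2"
    case "$state" in
        on|off) ;;
        *) echo "state must be 'on' or 'off'" >&2; return 2 ;;
    esac
    # GLUSTER can be set to "echo" to print the command instead of running it.
    ${GLUSTER:-gluster} volume set "$vol" cluster.lookup-optimize "$state"
}
```

For instance, `GLUSTER=echo set_lookup_optimize test1 off` only prints the command that would be run against the `test1` volume. Afterwards the current value could be checked with `gluster volume get test1 cluster.lookup-optimize`, if the installed version provides `volume get`.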

Comment 50 Arun Vasudevan 2016-07-19 14:20:03 UTC

Hi Susant,

Can you provide the status of this BZ, please?

Comment 54 Nithya Balachandran 2016-10-26 09:31:37 UTC

We need to test lookup-optimize thoroughly to see if anything goes wrong. As of now, with the information available to us, there is no confirmed RCA.