Susant,

The file can only be created and modified from a client, but all the clients have access to it. So our hypothesis still holds true, but we do not have the log messages to complete the RCA. We do not have mount logs from the 120 clients (and we cannot ask for them either). As per the customer, he has already verified all 120 logs and the message "attempting deletion of stale" is not present in any of them. So we are again in a dilemma about what happened here.

Thanks,
Bipin Kunal
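For reference, a quick way to re-check a client's mount log for that message is a plain grep (a sketch only; it assumes the default glusterfs log location on the clients and uses the quoted string as the search pattern):

# grep -i "attempting deletion of stale" /var/log/glusterfs/*.log*

A match anywhere would identify the client that removed the linkto file as stale.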
Bipin, as mentioned in comments 32 and 34, can we have the brick logs?
Hello Susant,

Brick-logs are copied to the below location. Please have a look.

# ssh your_kerb.redhat.com
# cd /cases/01591676

The following files have been downloaded and extracted on collab-shell:
--------------------------------
 1M  chunk-www-media-rian21.log-20160228.gz
 1M  ghost-www-media-rhs.log-20160228.gz
58M  rh-storage1-cmd_history_logs.tar.gz
60M  rh-storage2-cmd_history_logs.tar.gz
60M  rh-storage3-cmd_history_logs.tar.gz
59M  rh-storage4-cmd_history_logs.tar.gz
16M  sosreport-webfarmz-workflow3.01591676-20160309130028-8441.tar.xz
 1M  strace_chunk
 1M  strace_Gost_mp3
 1M  vol_files.tar.gz
 1M  www-media-rian21.log
 1M  www-media-rian21.log
43M  www-media-rian21.log
43M  www-media-rian21.log

Brick logs
-----------------
11M  rh-storage1-rhs1-rian21_projects_media2-media.log.gz
10M  rh-storage2-rhs1-rian21_projects_media2-media.log.gz
 9M  rh-storage3-rhs1-rian21_projects_media2-media.log.gz
 8M  rh-storage4-rhs1-rian21_projects_media2-media.log.gz
--------------------------------

Regards
Riyas
(In reply to Riyas Abdulrasak from comment #38)

We need the brick log files from 25 Feb onwards. The brick logs currently provided start from 29 Feb.
Here is the output demonstrating the above hypothesis.

[root@vm2 ~]# mount -t glusterfs vm2:/test1 /mnt2
[root@vm2 ~]# df
Filesystem                   1K-blocks    Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root   6795192 4341728   2101620  68% /
tmpfs                          1912136       0   1912136   0% /dev/shm
/dev/sda1                       487652   78219    383833  17% /boot
/dev/sdb                       8378368   33124   8345244   1% /brick
vm2:/test1                    16756736   66304  16690432   1% /mnt2
[root@vm2 ~]# cd /mnt2
[root@vm2 mnt2]# cat file
cat: file: No such file or directory
[root@vm2 mnt2]# ls
file
[root@vm2 mnt2]# cat file
hi
[root@vm2 mnt2]#
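To tie the transcript back to the hypothesis, the bricks can also be inspected directly for the linkto file (a sketch; the brick path /brick and the file name are taken from the test volume above, and trusted.glusterfs.dht.linkto is the DHT link-to xattr):

# ls -l /brick/file
# getfattr -d -m . -e hex /brick/file

A linkto file should show up as a zero-byte file with the sticky bit set, carrying the trusted.glusterfs.dht.linkto xattr that points at the cached subvolume. If the hashed subvolume has neither a data file nor a linkto file and lookup-optimize is on, a fresh named lookup returns ENOENT, which matches the "cat: file: No such file or directory" above; the subsequent ls appears to trigger a wider lookup that heals the entry, after which the file becomes accessible again.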
Susant,

What should our next action be here? The customer case is still open and the customer has been waiting for a proper RCA. Do we have anything to proceed with? In the previous update (comment #44) you provided information about a race condition. Is that race fixed in 3.1.3, or do we have an open BZ for it?

Regards,
Bipin
Hi Riyas,

As mentioned in comment 44 we have an RCA, but it cannot be confirmed because rmdir does not log linkto-file deletion. However, we do have a workaround for the problem, as mentioned in comment 40. One thing we should do is thorough testing with the lookup-optimize option on, so that we can uncover more such bugs. What do you say?

Thanks,
Susant
(In reply to Susant Kumar Palai from comment #48)
> (In reply to Riyas Abdulrasak from comment #47)
> > Hello Susant,
> >
> > I think it would be better if you can pass all the test cases to QE.
> >
> > Can we confirm to the customer that if he updates the cluster to 3.1.3, he
> > won't hit the broken-file issue again? Is the suspected cause getting
> > fixed in 3.1.3?
> >
> > Regards
> > Riyas
>
> Riyas,
> Let me clarify a few more things with my team and then we will have a clear
> picture.
>
> Raghavendra/Nithya,
> Will http://review.gluster.org/#/c/13852/ be sufficient to fix the
> rmdir-lookup race? And do we have any other race which can lead to this
> situation?

No. We don't have a confirmed RCA on what caused us to lose the linkto files. Hence we cannot comment on whether patch #13852 is sufficient. We need to do more testing on this feature to uncover unknown scenarios causing loss of linkto files.

> Note: My suggestion would be to ask the customer to turn off lookup-optimize
> for the time being. Raghavendra, what do you suggest?

Yes, turning "lookup-optimize=off" would make this problem go away and the file becomes accessible.
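For the record, that workaround maps to a single volume-set option; a sketch of the command (VOLNAME is a placeholder for the customer's volume name):

# gluster volume set VOLNAME cluster.lookup-optimize off

The change can be verified under "Options Reconfigured" in gluster volume info VOLNAME. With the option off, a lookup that misses on the hashed subvolume falls back to searching all subvolumes, so a missing linkto file no longer makes the file look absent.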
Hi Susant, can you please provide the status of this BZ?
We need to test lookup-optimize thoroughly to see if anything goes wrong. As of now, with the information available to us, there is no confirmed RCA.
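As a rough starting direction for that testing (a sketch only; the volume name test1, mount point /mnt2 and the loop count are hypothetical), one could keep cluster.lookup-optimize on and drive lots of renames, since renames that change the hashed subvolume are what leave linkto files behind:

# gluster volume set test1 cluster.lookup-optimize on
# mkdir /mnt2/dir
# for i in $(seq 1 1000); do touch /mnt2/dir/f$i; mv /mnt2/dir/f$i /mnt2/dir/g$i; done

while a second client loops over lookups (stat/ls) on the same names and a third removes and recreates the directory; afterwards, compare the bricks for data files whose hashed subvolume has no linkto file. QE would of course need proper multi-client scripts covering rename, rmdir and rebalance in parallel.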