Description of problem:
Read operations on files accessed through USS (.snaps) return the contents of the wrong source brick.

Version-Release number of selected component (if applicable):
[root@rhsauto001 arequal]# rpm -qa | grep glusterfs
samba-glusterfs-3.6.509-169.1.el6rhs.x86_64
glusterfs-cli-3.6.0.36-1.el6rhs.x86_64
glusterfs-api-3.6.0.36-1.el6rhs.x86_64
glusterfs-3.6.0.36-1.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.36-1.el6rhs.x86_64
glusterfs-fuse-3.6.0.36-1.el6rhs.x86_64
glusterfs-rdma-3.6.0.36-1.el6rhs.x86_64
glusterfs-libs-3.6.0.36-1.el6rhs.x86_64
glusterfs-server-3.6.0.36-1.el6rhs.x86_64
glusterfs-debuginfo-3.6.0.36-1.el6rhs.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a distributed-replicate volume (2 x 2 in this setup, see Additional info).
2. Set the volume options 'metadata-self-heal', 'entry-self-heal' and 'data-self-heal' to "off".
3. Set self-heal-daemon to off.
4. Create an NFS mount of the volume.
5. Create a file on the mount point, e.g. echo "file before snapshot" >> file
6. Bring one brick down and modify the file content, e.g. echo "B1 is down" >> file
7. Bring the brick back up.
8. Create snapshot snap1 and activate it.
9. Enable USS, e.g. gluster volume set testvol features.uss enable
10. Read the content of the file from snap1 via .snaps.
11. Observe that the file content is served from the wrong source brick.
(A consolidated command-line sketch of these steps is given at the end of this report.)

Actual results:
The file content comes from the brick that was down while the file was being modified.

Expected results:
The file content should come from the brick that was up while the file was being modified.

Additional info:
[root@rhsauto001 arequal]# gluster v info

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 8c67bd61-1de4-42f3-a919-b40d6fa1e009
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.47.143:/rhs/brick1/b1
Brick2: 10.70.47.145:/rhs/brick1/b2
Brick3: 10.70.47.150:/rhs/brick1/b3
Brick4: 10.70.47.151:/rhs/brick1/b4
Options Reconfigured:
cluster.self-heal-daemon: off
features.barrier: disable
features.uss: enable
cluster.entry-self-heal: off
cluster.data-self-heal: off
cluster.metadata-self-heal: off
performance.open-behind: off
performance.quick-read: off
performance.io-cache: off
performance.read-ahead: off
performance.write-behind: off
performance.readdir-ahead: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256

File content from the mount point:
[root@client testsnap]# cat file
file before snapshot
B1 is down

File content from USS:
[root@client testsnap]# cat file
file before snapshot
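For reference, below is a rough command-line sketch of the reproduction steps above. The mount path and the method used to bring the brick down and back up are assumptions added for illustration, not the exact commands that were run:

gluster volume set testvol cluster.metadata-self-heal off
gluster volume set testvol cluster.entry-self-heal off
gluster volume set testvol cluster.data-self-heal off
gluster volume set testvol cluster.self-heal-daemon off

mount -t nfs -o vers=3 10.70.47.143:/testvol /mnt/testvol       # NFS mount of the volume
echo "file before snapshot" >> /mnt/testvol/file

# Bring one brick down (for example, kill the brick process of Brick1), then:
echo "B1 is down" >> /mnt/testvol/file
gluster volume start testvol force                              # one way to restart the downed brick

gluster snapshot create snap1 testvol
gluster snapshot activate snap1
gluster volume set testvol features.uss enable

cat /mnt/testvol/.snaps/snap1/file                              # content is served from the wrong replica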
Vijaykumar and I did an RCA on this. We found that when a readdir has already happened, the lookup does not travel down to AFR because the snapview server returns cached data. As a result the data is not fetched from the correct brick, since AFR never gets a chance to pick the source. We are working to resolve this issue.
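One way to probe the readdir dependence described above (a hypothetical check; the fresh FUSE mount point /mnt/testvol-fresh is an assumption, and the caching behaviour over the NFS mount may differ):

mount -t glusterfs 10.70.47.143:/testvol /mnt/testvol-fresh
cat /mnt/testvol-fresh/.snaps/snap1/file     # path-based read, no prior readdir of the snapshot directory
umount /mnt/testvol-fresh

mount -t glusterfs 10.70.47.143:/testvol /mnt/testvol-fresh
ls -l /mnt/testvol-fresh/.snaps/snap1/       # readdir populates the snapview-server cache
cat /mnt/testvol-fresh/.snaps/snap1/file     # per the RCA, this read can be answered from the cache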
On the second run, we found that the data is being served from md-cache. We need to figure out how to effectively override md-cache in some of these corner cases.
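As a diagnostic only (not a fix), it might be worth checking whether the regular client-side md-cache is the instance involved. This assumes performance.stat-prefetch toggles md-cache in the client graph of testvol; it may well not reach the md-cache instance inside snapd's gfapi graph:

gluster volume set testvol performance.stat-prefetch off     # md-cache on/off switch for the client graph
ls -l /mnt/testvol/.snaps/snap1/
cat /mnt/testvol/.snaps/snap1/file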
Currently there is a known issue with AFR when md-cache is enabled and a readdirp is performed. The problem: when a readdirp is performed, both md-cache and gfapi cache the stat data and serve that cached data when a lookup comes from the client. Because of this the lookup never reaches AFR. AFR decides which source the data needs to be served from only in its lookup_cbk; since the lookup does not reach AFR, the wrong copy can be read. Also, because of a gfapi limitation, we cannot send an explicit lookup from the snapview server without creating a new 'glfs_object_t' handle.

The workaround can be one of the below:
1) Disable readdirp on the snapview server
2) Disable md-cache for snapshots in the snapview server
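Independent of the workaround chosen, the correct data can be confirmed to exist in the snapshot by reading the snapshotted brick of the replica that stayed up directly on the backend. The mount path below is an assumption (<snap-volname> is a placeholder); the actual snapshot brick paths are listed by 'gluster snapshot status snap1':

gluster snapshot status snap1                                # shows the snap volume name and brick paths
# On the node that stayed up during the write (10.70.47.145 in this setup):
cat /run/gluster/snaps/<snap-volname>/brick2/b2/file         # expected: both lines, unlike the USS read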