Bug 853595

Summary: path mismatch on the same gfid crashed the gsyncd.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Vidya Sakar <vinaraya>
Component: glusterfsAssignee: Amar Tumballi <amarts>
Status: CLOSED WONTFIX QA Contact: Sudhir D <sdharane>
Severity: high Docs Contact:
Priority: high    
Version: 2.0CC: aavati, gluster-bugs, rfortier, rhs-bugs, shaines, vbellur, vkoppad, vraman
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 828119 Environment:
Last Closed: 2012-10-17 12:21:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 828119    
Bug Blocks: 853209    

Description Vidya Sakar 2012-09-01 06:27:04 UTC
+++ This bug was initially created as a clone of Bug #828119 +++

Created attachment 589093 [details]
Client log file.

Description of problem: With geo-rep session between the master and slave, while syncing the data from master and slave, gsyncd crashed saying no file or directory. 

Version-Release number of selected component (if applicable):release3.3


How reproducible:only produced once 


Steps to Reproduce:
(most probably these steps hit the issue) 
1.start a geo-rep session between master and slave.
2.create large set of files distributed in many directories on master.(have it in a script) 
3.Let it sync to the slave.
4. Delete files on the master and start creating the large set of files before the slave deletes all the files. 
  
Actual results:It crashed the gsyncd and status became faulty 


Expected results: It should run properly 


Additional info:
############################################################################

[root@gqac019 slave]# tail -n 20 /usr/local/var/log/glusterfs/root-slave.log 
[2012-06-04 04:09:08.697413] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: (null)
[2012-06-04 04:09:08.697560] W [fuse-bridge.c:2812:fuse_xattr_cbk] 0-glusterfs-fuse: 353948: GETXATTR((null)) /3/0/1/2/3/4/file51 => -1 (No such file or directory)
[2012-06-04 04:09:32.385555] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: (null)
[2012-06-04 04:09:32.385667] W [fuse-bridge.c:2812:fuse_xattr_cbk] 0-glusterfs-fuse: 353957: GETXATTR((null)) /3/0/1/2/3/4/file51 => -1 (No such file or directory)
[2012-06-04 04:09:43.624654] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: (null)
[2012-06-04 04:09:43.624775] W [fuse-bridge.c:2812:fuse_xattr_cbk] 0-glusterfs-fuse: 353966: GETXATTR((null)) /3/0/1/2/3/4/file51 => -1 (No such file or directory)
[2012-06-04 04:09:52.134658] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: (null)
[2012-06-04 04:09:52.134695] W [fuse-bridge.c:2812:fuse_xattr_cbk] 0-glusterfs-fuse: 353975: GETXATTR((null)) /3/0/1/2/3/4/file51 => -1 (No such file or directory)
[2012-06-04 04:10:06.960034] W [client3_1-fops.c:418:client3_1_open_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d)
[2012-06-04 04:10:06.960844] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: trusted.glusterfs.dht.linkto
[2012-06-04 04:10:06.961682] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 353984: OPEN() /3/0/1/2/3/4/file51 => -1 (No such file or directory)
[2012-06-04 04:11:05.415308] W [client3_1-fops.c:418:client3_1_open_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d)
[2012-06-04 04:11:05.416808] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: trusted.glusterfs.dht.linkto
[2012-06-04 04:11:05.417413] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 354013: OPEN() /3/0/1/2/3/4/file51 => -1 (No such file or directory)
[2012-06-04 04:11:25.173683] W [client3_1-fops.c:418:client3_1_open_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d)
[2012-06-04 04:11:25.174469] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: trusted.glusterfs.dht.linkto
[2012-06-04 04:11:25.174893] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 354023: OPEN() /3/0/1/2/3/4/file51 => -1 (No such file or directory)
[2012-06-04 04:12:00.261285] W [client3_1-fops.c:418:client3_1_open_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d)
[2012-06-04 04:12:00.262096] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: trusted.glusterfs.dht.linkto
[2012-06-04 04:12:00.262676] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 354033: OPEN() /3/0/1/2/3/4/file51 => -1 (No such file or directory)

####################################################################


[root@gqac019 slave]# grep "04:11:05.41" /usr/local/var/log/glusterfs/bricks/exportdir-s*
/usr/local/var/log/glusterfs/bricks/exportdir-s3.log:[2012-06-04 04:11:05.414967] E [posix.c:1852:posix_open] 0-colon-s-posix: open on /exportdir/s3/3/0/1/2/3/4/.file51.3OuM32: No such file or directory
/usr/local/var/log/glusterfs/bricks/exportdir-s3.log:[2012-06-04 04:11:05.415216] I [server3_1-fops.c:1538:server_open_cbk] 0-colon-s-server: 176550: OPEN /3/0/1/2/3/4/.file51.3OuM32 (9bb5aaff-abff-42b8-9eb0-39227e1d187d) ==> -1 (No such file or directory)
/usr/local/var/log/glusterfs/bricks/exportdir-s3.log:[2012-06-04 04:11:05.416694] I [server3_1-fops.c:823:server_getxattr_cbk] 0-colon-s-server: 176551: GETXATTR /3/0/1/2/3/4/.file51.3OuM32 (trusted.glusterfs.dht.linkto) ==> -1 (No such file or directory)

########################################################################


 stat  /exportdir/s3/.glusterfs/9b/b5/9bb5aaff-abff-42b8-9eb0-39227e1d187d 
  File: `/exportdir/s3/.glusterfs/9b/b5/9bb5aaff-abff-42b8-9eb0-39227e1d187d'
  Size: 102000    	Blocks: 208        IO Block: 4096   regular file
Device: fd02h/64770d	Inode: 537687929   Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-06-04 03:11:10.000000000 -0400
Modify: 2012-06-04 03:10:00.000000000 -0400
Change: 2012-06-04 03:11:10.581365011 -0400
[root@gqac019 slave]# stat  /exportdir/s3/3/0/1/2/3/4/file
file0   file15  file21  file26  file30  file35  file45  file51  file57  file60  file65  file70  file76  file79  file82  file85  file90  file95  file99  
file1   file2   file22  file27  file33  file36  file49  file52  file59  file61  file67  file72  file77  file8   file83  file88  file92  file96  
file14  file20  file23  file29  file34  file39  file5   file56  file6   file64  file68  file74  file78  file81  file84  file89  file94  file98  
[root@gqac019 slave]# stat  /exportdir/s3/3/0/1/2/3/4/file51
  File: `/exportdir/s3/3/0/1/2/3/4/file51'
  Size: 102000    	Blocks: 208        IO Block: 4096   regular file
Device: fd02h/64770d	Inode: 537687929   Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-06-04 03:11:10.000000000 -0400
Modify: 2012-06-04 03:10:00.000000000 -0400
Change: 2012-06-04 03:11:10.581365011 -0400

--- Additional comment from amarts on 2012-07-05 04:52:42 EDT ---

Avati, is this possible because of HAS_ABSPATH() changes done for the performance?

Need a review for this scenario.

--- Additional comment from aavati on 2012-07-05 14:05:41 EDT ---

Looks like gsyncd was trying to synchronize the tempfile created by rsync ".file51.3OuM32" and is unable to find it by the time it attempts to open it (probably because it was renamed to "file51" by then?). I'm not sure how HAS_ABSPATH related change can affect this. The HAS_ABSPATH restored the behavior of 3.2.x into 3.3 (not a "new change" done to improve performance), so it should not introduce a new issue w.r.t using the right filename for a file than what the case was before.

--- Additional comment from amarts on 2012-07-06 03:45:22 EDT ---

(In reply to comment #2)
> Looks like gsyncd was trying to synchronize the tempfile created by rsync
> ".file51.3OuM32" and is unable to find it by the time it attempts to open it

Not exactly true, because on the client side, the file is resolved to the right path (file51)
---- snip from client log file ----
[2012-06-04 04:11:25.173683] W [client3_1-fops.c:418:client3_1_open_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d)
[2012-06-04 04:11:25.174469] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: trusted.glusterfs.dht.linkto
[2012-06-04 04:11:25.174893] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 354023: OPEN() /3/0/1/2/3/4/file51 => -1 (No such file or directory)
---- end of snip ----

but on the server side, because we only pass gfid, its resolved to wrong path (a temp path which got renamed now to actual file).

---- snip of server log file ----
/usr/local/var/log/glusterfs/bricks/exportdir-s3.log:[2012-06-04 04:11:05.414967] E [posix.c:1852:posix_open] 0-colon-s-posix: open on /exportdir/s3/3/0/1/2/3/4/.file51.3OuM32: No such file or directory
/usr/local/var/log/glusterfs/bricks/exportdir-s3.log:[2012-06-04 04:11:05.415216] I [server3_1-fops.c:1538:server_open_cbk] 0-colon-s-server: 176550: OPEN /3/0/1/2/3/4/.file51.3OuM32 (9bb5aaff-abff-42b8-9eb0-39227e1d187d) ==> -1 (No such file or directory)
/usr/local/var/log/glusterfs/bricks/exportdir-s3.log:[2012-06-04 04:11:05.416694] I [server3_1-fops.c:823:server_getxattr_cbk] 0-colon-s-server: 176551: GETXATTR /3/0/1/2/3/4/.file51.3OuM32 (trusted.glusterfs.dht.linkto) ==> -1 (No such file or directory)
---- end of snip ----

> (probably because it was renamed to "file51" by then?). I'm not sure how
> HAS_ABSPATH related change can affect this. The HAS_ABSPATH restored the
> behavior of 3.2.x into 3.3 (not a "new change" done to improve performance),
> so it should not introduce a new issue w.r.t using the right filename for a
> file than what the case was before.

My doubt on HAS_ABSPATH() came because, here, gfid based path still exists (seen with stat), but the path on server side doesn't exists anymore.

Comment 2 Vidya Sakar 2012-10-17 12:21:21 UTC
No plans to fix this in 2.0.z, closing this BZ. The fix will be tracked with 853209.