Bug 828119 - path mismatch on the same gfid crashed the gsyncd.
path mismatch on the same gfid crashed the gsyncd.
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: core (Show other bugs)
mainline
x86_64 Linux
high Severity high
: ---
: ---
Assigned To: Amar Tumballi
:
Depends On:
Blocks: 853209 853595
  Show dependency treegraph
 
Reported: 2012-06-04 05:55 EDT by Vijaykumar Koppad
Modified: 2014-08-24 20:49 EDT (History)
4 users (show)

See Also:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 853209 853595 (view as bug list)
Environment:
Last Closed: 2013-07-24 13:29:25 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Client log file. (368.71 KB, text/x-log)
2012-06-04 05:55 EDT, Vijaykumar Koppad
no flags Details

  None (edit)
Description Vijaykumar Koppad 2012-06-04 05:55:16 EDT
Created attachment 589093 [details]
Client log file.

Description of problem: With geo-rep session between the master and slave, while syncing the data from master and slave, gsyncd crashed saying no file or directory. 

Version-Release number of selected component (if applicable):release3.3


How reproducible:only produced once 


Steps to Reproduce:
(most probably these steps hit the issue) 
1.start a geo-rep session between master and slave.
2.create large set of files distributed in many directories on master.(have it in a script) 
3.Let it sync to the slave.
4. Delete files on the master and start creating the large set of files before the slave deletes all the files. 
  
Actual results:It crashed the gsyncd and status became faulty 


Expected results: It should run properly 


Additional info:
############################################################################

[root@gqac019 slave]# tail -n 20 /usr/local/var/log/glusterfs/root-slave.log 
[2012-06-04 04:09:08.697413] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: (null)
[2012-06-04 04:09:08.697560] W [fuse-bridge.c:2812:fuse_xattr_cbk] 0-glusterfs-fuse: 353948: GETXATTR((null)) /3/0/1/2/3/4/file51 => -1 (No such file or directory)
[2012-06-04 04:09:32.385555] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: (null)
[2012-06-04 04:09:32.385667] W [fuse-bridge.c:2812:fuse_xattr_cbk] 0-glusterfs-fuse: 353957: GETXATTR((null)) /3/0/1/2/3/4/file51 => -1 (No such file or directory)
[2012-06-04 04:09:43.624654] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: (null)
[2012-06-04 04:09:43.624775] W [fuse-bridge.c:2812:fuse_xattr_cbk] 0-glusterfs-fuse: 353966: GETXATTR((null)) /3/0/1/2/3/4/file51 => -1 (No such file or directory)
[2012-06-04 04:09:52.134658] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: (null)
[2012-06-04 04:09:52.134695] W [fuse-bridge.c:2812:fuse_xattr_cbk] 0-glusterfs-fuse: 353975: GETXATTR((null)) /3/0/1/2/3/4/file51 => -1 (No such file or directory)
[2012-06-04 04:10:06.960034] W [client3_1-fops.c:418:client3_1_open_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d)
[2012-06-04 04:10:06.960844] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: trusted.glusterfs.dht.linkto
[2012-06-04 04:10:06.961682] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 353984: OPEN() /3/0/1/2/3/4/file51 => -1 (No such file or directory)
[2012-06-04 04:11:05.415308] W [client3_1-fops.c:418:client3_1_open_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d)
[2012-06-04 04:11:05.416808] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: trusted.glusterfs.dht.linkto
[2012-06-04 04:11:05.417413] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 354013: OPEN() /3/0/1/2/3/4/file51 => -1 (No such file or directory)
[2012-06-04 04:11:25.173683] W [client3_1-fops.c:418:client3_1_open_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d)
[2012-06-04 04:11:25.174469] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: trusted.glusterfs.dht.linkto
[2012-06-04 04:11:25.174893] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 354023: OPEN() /3/0/1/2/3/4/file51 => -1 (No such file or directory)
[2012-06-04 04:12:00.261285] W [client3_1-fops.c:418:client3_1_open_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d)
[2012-06-04 04:12:00.262096] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: trusted.glusterfs.dht.linkto
[2012-06-04 04:12:00.262676] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 354033: OPEN() /3/0/1/2/3/4/file51 => -1 (No such file or directory)

####################################################################


[root@gqac019 slave]# grep "04:11:05.41" /usr/local/var/log/glusterfs/bricks/exportdir-s*
/usr/local/var/log/glusterfs/bricks/exportdir-s3.log:[2012-06-04 04:11:05.414967] E [posix.c:1852:posix_open] 0-colon-s-posix: open on /exportdir/s3/3/0/1/2/3/4/.file51.3OuM32: No such file or directory
/usr/local/var/log/glusterfs/bricks/exportdir-s3.log:[2012-06-04 04:11:05.415216] I [server3_1-fops.c:1538:server_open_cbk] 0-colon-s-server: 176550: OPEN /3/0/1/2/3/4/.file51.3OuM32 (9bb5aaff-abff-42b8-9eb0-39227e1d187d) ==> -1 (No such file or directory)
/usr/local/var/log/glusterfs/bricks/exportdir-s3.log:[2012-06-04 04:11:05.416694] I [server3_1-fops.c:823:server_getxattr_cbk] 0-colon-s-server: 176551: GETXATTR /3/0/1/2/3/4/.file51.3OuM32 (trusted.glusterfs.dht.linkto) ==> -1 (No such file or directory)

########################################################################


 stat  /exportdir/s3/.glusterfs/9b/b5/9bb5aaff-abff-42b8-9eb0-39227e1d187d 
  File: `/exportdir/s3/.glusterfs/9b/b5/9bb5aaff-abff-42b8-9eb0-39227e1d187d'
  Size: 102000    	Blocks: 208        IO Block: 4096   regular file
Device: fd02h/64770d	Inode: 537687929   Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-06-04 03:11:10.000000000 -0400
Modify: 2012-06-04 03:10:00.000000000 -0400
Change: 2012-06-04 03:11:10.581365011 -0400
[root@gqac019 slave]# stat  /exportdir/s3/3/0/1/2/3/4/file
file0   file15  file21  file26  file30  file35  file45  file51  file57  file60  file65  file70  file76  file79  file82  file85  file90  file95  file99  
file1   file2   file22  file27  file33  file36  file49  file52  file59  file61  file67  file72  file77  file8   file83  file88  file92  file96  
file14  file20  file23  file29  file34  file39  file5   file56  file6   file64  file68  file74  file78  file81  file84  file89  file94  file98  
[root@gqac019 slave]# stat  /exportdir/s3/3/0/1/2/3/4/file51
  File: `/exportdir/s3/3/0/1/2/3/4/file51'
  Size: 102000    	Blocks: 208        IO Block: 4096   regular file
Device: fd02h/64770d	Inode: 537687929   Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-06-04 03:11:10.000000000 -0400
Modify: 2012-06-04 03:10:00.000000000 -0400
Change: 2012-06-04 03:11:10.581365011 -0400
Comment 1 Amar Tumballi 2012-07-05 04:52:42 EDT
Avati, is this possible because of HAS_ABSPATH() changes done for the performance?

Need a review for this scenario.
Comment 2 Anand Avati 2012-07-05 14:05:41 EDT
Looks like gsyncd was trying to synchronize the tempfile created by rsync ".file51.3OuM32" and is unable to find it by the time it attempts to open it (probably because it was renamed to "file51" by then?). I'm not sure how HAS_ABSPATH related change can affect this. The HAS_ABSPATH restored the behavior of 3.2.x into 3.3 (not a "new change" done to improve performance), so it should not introduce a new issue w.r.t using the right filename for a file than what the case was before.
Comment 3 Amar Tumballi 2012-07-06 03:45:22 EDT
(In reply to comment #2)
> Looks like gsyncd was trying to synchronize the tempfile created by rsync
> ".file51.3OuM32" and is unable to find it by the time it attempts to open it

Not exactly true, because on the client side, the file is resolved to the right path (file51)
---- snip from client log file ----
[2012-06-04 04:11:25.173683] W [client3_1-fops.c:418:client3_1_open_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d)
[2012-06-04 04:11:25.174469] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: trusted.glusterfs.dht.linkto
[2012-06-04 04:11:25.174893] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 354023: OPEN() /3/0/1/2/3/4/file51 => -1 (No such file or directory)
---- end of snip ----

but on the server side, because we only pass gfid, its resolved to wrong path (a temp path which got renamed now to actual file).

---- snip of server log file ----
/usr/local/var/log/glusterfs/bricks/exportdir-s3.log:[2012-06-04 04:11:05.414967] E [posix.c:1852:posix_open] 0-colon-s-posix: open on /exportdir/s3/3/0/1/2/3/4/.file51.3OuM32: No such file or directory
/usr/local/var/log/glusterfs/bricks/exportdir-s3.log:[2012-06-04 04:11:05.415216] I [server3_1-fops.c:1538:server_open_cbk] 0-colon-s-server: 176550: OPEN /3/0/1/2/3/4/.file51.3OuM32 (9bb5aaff-abff-42b8-9eb0-39227e1d187d) ==> -1 (No such file or directory)
/usr/local/var/log/glusterfs/bricks/exportdir-s3.log:[2012-06-04 04:11:05.416694] I [server3_1-fops.c:823:server_getxattr_cbk] 0-colon-s-server: 176551: GETXATTR /3/0/1/2/3/4/.file51.3OuM32 (trusted.glusterfs.dht.linkto) ==> -1 (No such file or directory)
---- end of snip ----

> (probably because it was renamed to "file51" by then?). I'm not sure how
> HAS_ABSPATH related change can affect this. The HAS_ABSPATH restored the
> behavior of 3.2.x into 3.3 (not a "new change" done to improve performance),
> so it should not introduce a new issue w.r.t using the right filename for a
> file than what the case was before.

My doubt on HAS_ABSPATH() came because, here, gfid based path still exists (seen with stat), but the path on server side doesn't exists anymore.

Note You need to log in before you can comment on or make changes to this bug.