Created attachment 589093 [details] Client log file. Description of problem: With geo-rep session between the master and slave, while syncing the data from master and slave, gsyncd crashed saying no file or directory. Version-Release number of selected component (if applicable):release3.3 How reproducible:only produced once Steps to Reproduce: (most probably these steps hit the issue) 1.start a geo-rep session between master and slave. 2.create large set of files distributed in many directories on master.(have it in a script) 3.Let it sync to the slave. 4. Delete files on the master and start creating the large set of files before the slave deletes all the files. Actual results:It crashed the gsyncd and status became faulty Expected results: It should run properly Additional info: ############################################################################ [root@gqac019 slave]# tail -n 20 /usr/local/var/log/glusterfs/root-slave.log [2012-06-04 04:09:08.697413] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: (null) [2012-06-04 04:09:08.697560] W [fuse-bridge.c:2812:fuse_xattr_cbk] 0-glusterfs-fuse: 353948: GETXATTR((null)) /3/0/1/2/3/4/file51 => -1 (No such file or directory) [2012-06-04 04:09:32.385555] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: (null) [2012-06-04 04:09:32.385667] W [fuse-bridge.c:2812:fuse_xattr_cbk] 0-glusterfs-fuse: 353957: GETXATTR((null)) /3/0/1/2/3/4/file51 => -1 (No such file or directory) [2012-06-04 04:09:43.624654] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: (null) [2012-06-04 04:09:43.624775] W [fuse-bridge.c:2812:fuse_xattr_cbk] 0-glusterfs-fuse: 353966: GETXATTR((null)) /3/0/1/2/3/4/file51 => -1 (No such file or directory) [2012-06-04 04:09:52.134658] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: (null) [2012-06-04 04:09:52.134695] W [fuse-bridge.c:2812:fuse_xattr_cbk] 0-glusterfs-fuse: 353975: GETXATTR((null)) /3/0/1/2/3/4/file51 => -1 (No such file or directory) [2012-06-04 04:10:06.960034] W [client3_1-fops.c:418:client3_1_open_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d) [2012-06-04 04:10:06.960844] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: trusted.glusterfs.dht.linkto [2012-06-04 04:10:06.961682] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 353984: OPEN() /3/0/1/2/3/4/file51 => -1 (No such file or directory) [2012-06-04 04:11:05.415308] W [client3_1-fops.c:418:client3_1_open_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d) [2012-06-04 04:11:05.416808] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: trusted.glusterfs.dht.linkto [2012-06-04 04:11:05.417413] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 354013: OPEN() /3/0/1/2/3/4/file51 => -1 (No such file or directory) [2012-06-04 04:11:25.173683] W [client3_1-fops.c:418:client3_1_open_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d) [2012-06-04 04:11:25.174469] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: trusted.glusterfs.dht.linkto [2012-06-04 04:11:25.174893] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 354023: OPEN() /3/0/1/2/3/4/file51 => -1 (No such file or directory) [2012-06-04 04:12:00.261285] W [client3_1-fops.c:418:client3_1_open_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d) [2012-06-04 04:12:00.262096] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: trusted.glusterfs.dht.linkto [2012-06-04 04:12:00.262676] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 354033: OPEN() /3/0/1/2/3/4/file51 => -1 (No such file or directory) #################################################################### [root@gqac019 slave]# grep "04:11:05.41" /usr/local/var/log/glusterfs/bricks/exportdir-s* /usr/local/var/log/glusterfs/bricks/exportdir-s3.log:[2012-06-04 04:11:05.414967] E [posix.c:1852:posix_open] 0-colon-s-posix: open on /exportdir/s3/3/0/1/2/3/4/.file51.3OuM32: No such file or directory /usr/local/var/log/glusterfs/bricks/exportdir-s3.log:[2012-06-04 04:11:05.415216] I [server3_1-fops.c:1538:server_open_cbk] 0-colon-s-server: 176550: OPEN /3/0/1/2/3/4/.file51.3OuM32 (9bb5aaff-abff-42b8-9eb0-39227e1d187d) ==> -1 (No such file or directory) /usr/local/var/log/glusterfs/bricks/exportdir-s3.log:[2012-06-04 04:11:05.416694] I [server3_1-fops.c:823:server_getxattr_cbk] 0-colon-s-server: 176551: GETXATTR /3/0/1/2/3/4/.file51.3OuM32 (trusted.glusterfs.dht.linkto) ==> -1 (No such file or directory) ######################################################################## stat /exportdir/s3/.glusterfs/9b/b5/9bb5aaff-abff-42b8-9eb0-39227e1d187d File: `/exportdir/s3/.glusterfs/9b/b5/9bb5aaff-abff-42b8-9eb0-39227e1d187d' Size: 102000 Blocks: 208 IO Block: 4096 regular file Device: fd02h/64770d Inode: 537687929 Links: 2 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2012-06-04 03:11:10.000000000 -0400 Modify: 2012-06-04 03:10:00.000000000 -0400 Change: 2012-06-04 03:11:10.581365011 -0400 [root@gqac019 slave]# stat /exportdir/s3/3/0/1/2/3/4/file file0 file15 file21 file26 file30 file35 file45 file51 file57 file60 file65 file70 file76 file79 file82 file85 file90 file95 file99 file1 file2 file22 file27 file33 file36 file49 file52 file59 file61 file67 file72 file77 file8 file83 file88 file92 file96 file14 file20 file23 file29 file34 file39 file5 file56 file6 file64 file68 file74 file78 file81 file84 file89 file94 file98 [root@gqac019 slave]# stat /exportdir/s3/3/0/1/2/3/4/file51 File: `/exportdir/s3/3/0/1/2/3/4/file51' Size: 102000 Blocks: 208 IO Block: 4096 regular file Device: fd02h/64770d Inode: 537687929 Links: 2 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2012-06-04 03:11:10.000000000 -0400 Modify: 2012-06-04 03:10:00.000000000 -0400 Change: 2012-06-04 03:11:10.581365011 -0400
Avati, is this possible because of HAS_ABSPATH() changes done for the performance? Need a review for this scenario.
Looks like gsyncd was trying to synchronize the tempfile created by rsync ".file51.3OuM32" and is unable to find it by the time it attempts to open it (probably because it was renamed to "file51" by then?). I'm not sure how HAS_ABSPATH related change can affect this. The HAS_ABSPATH restored the behavior of 3.2.x into 3.3 (not a "new change" done to improve performance), so it should not introduce a new issue w.r.t using the right filename for a file than what the case was before.
(In reply to comment #2) > Looks like gsyncd was trying to synchronize the tempfile created by rsync > ".file51.3OuM32" and is unable to find it by the time it attempts to open it Not exactly true, because on the client side, the file is resolved to the right path (file51) ---- snip from client log file ---- [2012-06-04 04:11:25.173683] W [client3_1-fops.c:418:client3_1_open_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d) [2012-06-04 04:11:25.174469] W [client3_1-fops.c:1059:client3_1_getxattr_cbk] 0-colon-s-client-2: remote operation failed: No such file or directory. Path: /3/0/1/2/3/4/file51 (9bb5aaff-abff-42b8-9eb0-39227e1d187d). Key: trusted.glusterfs.dht.linkto [2012-06-04 04:11:25.174893] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 354023: OPEN() /3/0/1/2/3/4/file51 => -1 (No such file or directory) ---- end of snip ---- but on the server side, because we only pass gfid, its resolved to wrong path (a temp path which got renamed now to actual file). ---- snip of server log file ---- /usr/local/var/log/glusterfs/bricks/exportdir-s3.log:[2012-06-04 04:11:05.414967] E [posix.c:1852:posix_open] 0-colon-s-posix: open on /exportdir/s3/3/0/1/2/3/4/.file51.3OuM32: No such file or directory /usr/local/var/log/glusterfs/bricks/exportdir-s3.log:[2012-06-04 04:11:05.415216] I [server3_1-fops.c:1538:server_open_cbk] 0-colon-s-server: 176550: OPEN /3/0/1/2/3/4/.file51.3OuM32 (9bb5aaff-abff-42b8-9eb0-39227e1d187d) ==> -1 (No such file or directory) /usr/local/var/log/glusterfs/bricks/exportdir-s3.log:[2012-06-04 04:11:05.416694] I [server3_1-fops.c:823:server_getxattr_cbk] 0-colon-s-server: 176551: GETXATTR /3/0/1/2/3/4/.file51.3OuM32 (trusted.glusterfs.dht.linkto) ==> -1 (No such file or directory) ---- end of snip ---- > (probably because it was renamed to "file51" by then?). I'm not sure how > HAS_ABSPATH related change can affect this. The HAS_ABSPATH restored the > behavior of 3.2.x into 3.3 (not a "new change" done to improve performance), > so it should not introduce a new issue w.r.t using the right filename for a > file than what the case was before. My doubt on HAS_ABSPATH() came because, here, gfid based path still exists (seen with stat), but the path on server side doesn't exists anymore.