Bug 1627620 - SAS job aborts complaining about file doesn't exist
Summary: SAS job aborts complaining about file doesn't exist
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: fuse
Version: mainline
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On: 1627617
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-09-11 04:50 UTC by Raghavendra G
Modified: 2019-03-25 16:30 UTC (History)
8 users (show)

Fixed In Version: glusterfs-6.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1627617
Environment:
Last Closed: 2018-10-23 15:19:00 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments (Terms of Use)

Comment 1 Raghavendra G 2018-09-11 04:53:30 UTC
Description of problem:

10SEP2018_18_47_24_refresh_mp.log..di_wrap_mg_process_unit.9.log:ERROR: A lock is not available for P_BF.PARTITION_416_594_1_1.DATA.
10SEP2018_18_47_24_refresh_mp.log..di_wrap_mg_process_unit.9.log:ERROR: File P_BF.PARTITION_416_594_1_1.DATA does not exist.
10SEP2018_18_47_24_refresh_mp.log..di_wrap_mg_process_unit.9.log:ERROR: File P_BF.PARTITION_416_594_1_1.DATA does not exist.

Note that the actual file created on glusterfs is /sasdata/bulked/base_forecast/partition_416_594_1_1.sas7bdat as SAS masks file name

Comment 2 Raghavendra G 2018-09-11 04:54:13 UTC
From logs, I can see that /sasdata/bulked/base_forecast/partition_416_594_1_1.sas7bdat is a destination of rename:

sasdata.log:[2018-09-09 02:40:10.772241] I [MSGID: 109066] [dht-rename.c:2042:dht_rename] 0-sasdata-dht: renaming /bulked/base_forecast/partition_416_594_1_1.sas7bdat.lck (84ea03de-3032-44ff-81fc-8cd47cf14694) (hash=sasdata-replicate-0/cache=sasdata-replicate-0) => /bulked/base_forecast/partition_416_594_1_1.sas7bdat ((null)) (hash=sasdata-replicate-1/cache=<nul>) 
sasdata.log:[2018-09-09 02:40:10.861406] I [MSGID: 109066] [dht-rename.c:2042:dht_rename] 0-sasdata-dht: renaming /bulked/base_forecast/partition_416_594_1_1.sas7bdat.lck (cbb8c3dd-9317-4398-b92f-39146ab26bed) (hash=sasdata-replicate-0/cache=sasdata-replicate-0) => /bulked/base_forecast/partition_416_594_1_1.sas7bdat (84ea03de-3032-44ff-81fc-8cd47cf14694) (hash=sasdata-replicate-1/cache=sasdata-replicate-0) 
sasdata.log:[2018-09-10 13:43:16.713796] I [MSGID: 109066] [dht-rename.c:2042:dht_rename] 0-sasdata-dht: renaming /bulked/base_forecast/partition_416_594_1_1.sas7bdat.lck (b65e9f74-3151-40ae-b87b-2450a95796dd) (hash=sasdata-replicate-0/cache=sasdata-replicate-0) => /bulked/base_forecast/partition_416_594_1_1.sas7bdat ((null)) (hash=sasdata-replicate-1/cache=<nul>) 
sasdata.log:[2018-09-10 13:43:16.961512] I [MSGID: 109066] [dht-rename.c:2042:dht_rename] 0-sasdata-dht: renaming /bulked/base_forecast/partition_416_594_1_1.sas7bdat.lck (c36b2bc3-eebd-413f-91c4-894caa18e89c) (hash=sasdata-replicate-0/cache=sasdata-replicate-0) => /bulked/base_forecast/partition_416_594_1_1.sas7bdat (b65e9f74-3151-40ae-b87b-2450a95796dd) (hash=sasdata-replicate-1/cache=sasdata-replicate-0) 

So, what's causing SAS to error out is lack of rename atomicity on glusterfs. Note that as per rename atomicity rules, operations on destination should never fail. However, here they are failing. So, its a bug in glusterfs.

Normally path based operations happen in two parts:
* path is resolved into a unique handle. This is done through lookups
* Actual operations like open happen on this handle

Now when rename overwrites the destination, the inode associated with destination changes, so the previous handle/inode returned by lookup on dst before rename are no longer present in fs. But, when open happens on old handle, it fails.

[root@rhs-client22 glusterfs]# grep OPEN sasdata.log*
sasdata.log:[2018-09-09 01:34:14.671548] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 51854: OPEN a9d26828-24fe-4da9-b3a0-d1693bf21f34 resolution failed
sasdata.log:[2018-09-09 01:34:14.680534] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 51873: OPEN e129abc7-7762-46e5-a704-99d7ea586c1e resolution failed
sasdata.log:[2018-09-09 01:57:37.204277] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 563046: OPEN 0eb6df58-5b25-4f4e-aabc-df1a0f615b93 resolution failed
sasdata.log:[2018-09-09 02:40:10.820746] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 967246: OPEN d6e38db6-9fae-49e8-847e-bccdf15244a5 resolution failed
sasdata.log:[2018-09-09 02:40:10.937086] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 967646: OPEN d6e38db6-9fae-49e8-847e-bccdf15244a5 resolution failed
sasdata.log:[2018-09-09 02:40:16.551310] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 972175: OPEN d6e38db6-9fae-49e8-847e-bccdf15244a5 resolution failed
sasdata.log:[2018-09-09 02:40:16.604315] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 972201: OPEN d6e38db6-9fae-49e8-847e-bccdf15244a5 resolution failed
sasdata.log:[2018-09-09 02:40:16.687723] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 972240: OPEN d6e38db6-9fae-49e8-847e-bccdf15244a5 resolution failed
sasdata.log:[2018-09-09 02:41:27.403833] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 982942: OPEN 1c84e06c-8551-4f18-9777-2a1730919bcf resolution failed
sasdata.log:[2018-09-09 02:41:27.405453] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 982946: OPEN 1c84e06c-8551-4f18-9777-2a1730919bcf resolution failed
sasdata.log:[2018-09-09 02:41:27.407134] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 982952: OPEN 1c84e06c-8551-4f18-9777-2a1730919bcf resolution failed
sasdata.log:[2018-09-10 13:43:17.007100] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 1011630: OPEN b65e9f74-3151-40ae-b87b-2450a95796dd resolution failed
sasdata.log:[2018-09-10 13:47:50.980324] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 1081799: OPEN b65e9f74-3151-40ae-b87b-2450a95796dd resolution failed
sasdata.log:[2018-09-10 13:47:50.982690] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 1081806: OPEN b65e9f74-3151-40ae-b87b-2450a95796dd resolution failed
sasdata.log:[2018-09-10 13:49:45.617909] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 1117549: OPEN b65e9f74-3151-40ae-b87b-2450a95796dd resolution failed
sasdata.log:[2018-09-10 13:49:45.748022] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 1117585: OPEN b65e9f74-3151-40ae-b87b-2450a95796dd resolution failed

VFS provides a way out of this by a mechanism of retrying (exactly once) the lookup and operation like open on receving an error with errno ESTALE. What might've caused the issue here is that the error returned by Glusterfs is ENOENT instead of ESTALE (note the error reported by SAS, it says "P_BF.PARTITION_416_594_1_1.DATA does not exist" not "stale file handle").

We've evidence for that from fuse-dump of /sasdata,
2018-09-10T19:19:45.73953758+05:30 "GLUSTER\xf5" OPEN {Len:48 Opcode:14 Unique:1117585 Nodeid:140296383928016 Uid:0 Gid:0 Pid:29342 Padding:0} {Flags:32770 Unused:0}
2018-09-10T19:19:45.748056325+05:30 "GLUSTER\xf5" {Len:16 Error:-2 Unique:1117585} ""
2018-09-10T19:19:45.617027131+05:30 "GLUSTER\xf5" OPEN {Len:48 Opcode:14 Unique:1117549 Nodeid:140296383928016 Uid:0 Gid:0 Pid:29330 Padding:0} {Flags:32768 Unused:0} 
2018-09-10T19:19:45.617932073+05:30 "GLUSTER\xf5" {Len:16 Error:-2 Unique:1117549} "" 

Note the error returned in above two calls its -2 (ENOENT) and not ESTALE. This would prevent kernel from retrying.

Note that earlier as part of readdirplus, partition_416_594_1_1.sas7bdat resolved to Nodeid:140296383928016
[{Nodeid:140296383928016 Generation:0 EntryValid:1 AttrValid:1 EntryValidNsec:0 AttrValidNsec:0 Attr:{Ino:13293258653971224285 Size:196608 Blocks:264 Atime:1536586996 Mtime:1536586997 Ctime:1536586997 Atimensec:836653129 Mtimensec:125644779 Ctimensec:126644750 Mode:33188 Nlink:1 Uid:0 Gid:0 Rdev:0 Blksize:131072 Padding:0}} {Ino:13293258653971224285 Off:1441 Namelen:30 Type:8 Name:[]} partition_416_594_1_1.sas7bdat]

Earlier, I had submitted a patch to convert ENOENT errors to ESTALE,
https://review.gluster.org/#/c/glusterfs/+/18521/

But this patch had missed out the codepath of open failing when inode resolution fails. As per the logs above, inode resolution in fact failed:
[root@rhs-client22 glusterfs]# grep OPEN sasdata.log*
sasdata.log:[2018-09-09 01:34:14.671548] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 51854: OPEN a9d26828-24fe-4da9-b3a0-d1693bf21f34 resolution failed
sasdata.log:[2018-09-09 01:34:14.680534] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 51873: OPEN e129abc7-7762-46e5-a704-99d7ea586c1e resolution failed
sasdata.log:[2018-09-09 01:57:37.204277] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 563046: OPEN 0eb6df58-5b25-4f4e-aabc-df1a0f615b93 resolution failed
sasdata.log:[2018-09-09 02:40:10.820746] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 967246: OPEN d6e38db6-9fae-49e8-847e-bccdf15244a5 resolution failed
sasdata.log:[2018-09-09 02:40:10.937086] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 967646: OPEN d6e38db6-9fae-49e8-847e-bccdf15244a5 resolution failed
sasdata.log:[2018-09-09 02:40:16.551310] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 972175: OPEN d6e38db6-9fae-49e8-847e-bccdf15244a5 resolution failed
sasdata.log:[2018-09-09 02:40:16.604315] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 972201: OPEN d6e38db6-9fae-49e8-847e-bccdf15244a5 resolution failed
sasdata.log:[2018-09-09 02:40:16.687723] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 972240: OPEN d6e38db6-9fae-49e8-847e-bccdf15244a5 resolution failed
sasdata.log:[2018-09-09 02:41:27.403833] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 982942: OPEN 1c84e06c-8551-4f18-9777-2a1730919bcf resolution failed
sasdata.log:[2018-09-09 02:41:27.405453] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 982946: OPEN 1c84e06c-8551-4f18-9777-2a1730919bcf resolution failed
sasdata.log:[2018-09-09 02:41:27.407134] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 982952: OPEN 1c84e06c-8551-4f18-9777-2a1730919bcf resolution failed
sasdata.log:[2018-09-10 13:43:17.007100] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 1011630: OPEN b65e9f74-3151-40ae-b87b-2450a95796dd resolution failed
sasdata.log:[2018-09-10 13:47:50.980324] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 1081799: OPEN b65e9f74-3151-40ae-b87b-2450a95796dd resolution failed
sasdata.log:[2018-09-10 13:47:50.982690] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 1081806: OPEN b65e9f74-3151-40ae-b87b-2450a95796dd resolution failed
sasdata.log:[2018-09-10 13:49:45.617909] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 1117549: OPEN b65e9f74-3151-40ae-b87b-2450a95796dd resolution failed
sasdata.log:[2018-09-10 13:49:45.748022] E [fuse-bridge.c:2236:fuse_open_resume] 0-glusterfs-fuse: 1117585: OPEN b65e9f74-3151-40ae-b87b-2450a95796dd resolution failed

So, patching up those codepaths to convert ENOENT to ESTALE will fix the issue. Also, its recommended to make sure all inode based operations (like GETATTR, SETATTR etc) never fail with ENOENT, but instead fail with ESTALE.

Comment 3 Worker Ant 2018-09-11 05:05:54 UTC
REVIEW: https://review.gluster.org/21146 (mount/fuse: convert ENOENT to ESTALE in open(dir)_resume) posted (#1) for review on master by Raghavendra G

Comment 4 Worker Ant 2018-09-11 06:11:44 UTC
REVIEW: https://review.gluster.org/21147 (mount/fuse: return ESTALE instead of ENOENT on all inode based operations) posted (#1) for review on master by Raghavendra G

Comment 5 Worker Ant 2018-09-11 11:14:00 UTC
COMMIT: https://review.gluster.org/21146 committed in master by "Raghavendra G" <rgowdapp@redhat.com> with a commit message- mount/fuse: convert ENOENT to ESTALE in open(dir)_resume

This patch is continuation of commit
fb4b914ce84bc83a5f418719c5ba7c25689a9251.

<snip>
mount/fuse: never fail open(dir) with ENOENT

    open(dir) being an operation on inode should never fail with
    ENOENT. If gfid is not present, the appropriate error is
    ESTALE. This will enable kernel to retry open after a revalidate
    lookup.
</snip>

Earlier commit failed to fix codepath where error response is sent
back on gfid resolution failures in fuse_open(dir)_resume. Current
patch completes that work

Change-Id: Ia07e3cece404811703c8cfbac9b402ca5fe98c1e
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
updates: bz#1627620

Comment 6 Worker Ant 2018-10-20 01:08:23 UTC
COMMIT: https://review.gluster.org/21147 committed in master by "Raghavendra G" <rgowdapp@redhat.com> with a commit message- mount/fuse: return ESTALE instead of ENOENT on all inode based operations

This patch is continuation of commit
fb4b914ce84bc83a5f418719c5ba7c25689a9251. This patch extends that
logic to all inode based operations and not just open(dir).

<snip>

mount/fuse: never fail open(dir) with ENOENT

open(dir) being an operation on inode should never fail with
ENOENT. If gfid is not present, the appropriate error is ESTALE. This
will enable kernel to retry open after a revalidate lookup.

</snip>

Change-Id: I6313f520827e9af725485631cb6a9d9718243bc4
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Fixes: bz#1627620

Comment 7 Shyamsundar 2018-10-23 15:19:00 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 8 Shyamsundar 2019-03-25 16:30:38 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/


Note You need to log in before you can comment on or make changes to this bug.