Bug 1175551
Summary: | Intermittent open() failure after creating a directory | | |
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Louis R. Marascio <marascio> |
Component: | distribute | Assignee: | bugs <bugs> |
Status: | CLOSED EOL | QA Contact: | |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | 3.5.2 | CC: | bugs, ndevos, rgowdapp, srangana |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2016-06-17 15:58:07 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 1094724, 1121920 | | |
Bug Blocks: | | | |
Description
Louis R. Marascio
2014-12-18 02:09:21 UTC
> When a job starts it spawns numerous tasks in parallel. Each of these tasks,
> at startup, effectively does:
>
>   if [ ! -d /shared/scratchdir ]; then
>     mkdir -p /shared/scratchdir
>   fi

I suspect that this concurrent creation of the directory is the issue. Other bugs in which the same problem is observed have been linked to this one. It sounds possible that one process was able to create the directory on some of the bricks, and another process created the same directory on other bricks. This leads to a directory that has different GFIDs on different bricks.

A workaround would be to create the /shared/scratchdir directory before the job starts the additional tasks.

Adding the maintainers of DHT on CC, who should know more about this issue and the current state of the solution (http://review.gluster.org/4846).

(In reply to Niels de Vos from comment #1)
> A workaround would be to create the /shared/scratchdir directory before the
> job starts the additional tasks.

Yes, you're right about this. I changed the job flow to create the directory first and so far have not seen the failure again.

This bug is getting closed because version 3.5 is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.
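
For anyone hitting the same symptom, the workaround discussed in this bug (create the directory once, before the parallel tasks start) could look roughly like the sketch below. This is not the reporter's actual job script. The function names `launch_job` and `run_task`, the task count, and the `task.N` file names are all hypothetical placeholders; only the `/shared/scratchdir` path comes from the report.

```shell
#!/bin/sh
# Sketch of the workaround: the launcher creates the shared directory once,
# serially, and the parallel tasks never call mkdir themselves.

# run_task is a hypothetical stand-in for the real per-task work.
# It only uses the directory; it does not try to create it.
run_task() {
    scratch_dir=$1
    task_id=$2
    : > "$scratch_dir/task.$task_id"
}

# launch_job DIR COUNT
# Creates DIR from a single process, then starts COUNT tasks in parallel.
launch_job() {
    scratch_dir=$1
    num_tasks=$2

    # Single creation point: no other process races on this mkdir.
    mkdir -p "$scratch_dir" || return 1

    i=1
    while [ "$i" -le "$num_tasks" ]; do
        run_task "$scratch_dir" "$i" &
        i=$((i + 1))
    done

    # Block until every background task has finished.
    wait
}
```

With the path from this report, the job would be started as `launch_job /shared/scratchdir 8`, where 8 is an arbitrary example task count. The point of the design is simply that the `mkdir -p` runs in one process before any task exists, so the concurrent-creation case suspected above cannot occur from within the job.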