Bug 1647229
| Summary: | Errors w/ directory entries when running yarn | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | John Strunk <jstrunk> |
| Component: | fuse | Assignee: | Csaba Henk <csaba> |
| Status: | CLOSED WONTFIX | QA Contact: | Rahul Hinduja <rhinduja> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | rhgs-3.4 | CC: | amukherj, jahernan, jstrunk, rhopp, rhs-bugs, sabose, storage-qa-internal, sutan |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-01-17 09:33:21 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
John Strunk
2018-11-06 22:20:35 UTC
Looks like the fix is https://review.gluster.org/#/c/glusterfs/+/21147/ & https://review.gluster.org/#/c/glusterfs/+/21146/

This was fixed by Raghavendra while working on transactional workload support on glusterfs. Not sure if we could get it backported.

(In reply to Amar Tumballi from comment #6)
> Looks like the fix is https://review.gluster.org/#/c/glusterfs/+/21147/ &
> https://review.gluster.org/#/c/glusterfs/+/21146/
>
> This was fixed by Raghavendra while working on transactional workload
> support on glusterfs. Not sure if we could get it backported.

Those two patches were related to rename atomicity. However, with just the logs attached it is difficult to figure out whether this is a rename atomicity issue. Also, note that rename atomicity errors are transient and will go away if the file is accessed again. In this case, however, even after the application failed, ls still complained about ENOENT.

I would need the following data:

* strace of the application that failed (I think it's yarn here).
* fusedump (mount glusterfs with the option --dump-fuse=<path to dump>; note that the mount script doesn't recognise this option, so please copy the glusterfs mount's cmdline from ps and add the --dump-fuse option to it).
* Is the error persistent, or does it go away after some time?
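A minimal sketch of how the requested data might be captured, assuming a host mount at /mnt; the volfile server, volume name and output paths below are illustrative placeholders, not values from this bug:

```bash
# Hypothetical example; adjust paths and volume names to the actual setup.

# 1. strace of the failing application, run inside the container:
strace -f -tt -o /tmp/yarn.strace yarn

# 2. fusedump: the mount script does not accept --dump-fuse, so copy the
#    running client's command line from ps and re-run it with the option added.
ps ax | grep '[g]lusterfs'            # note the exact cmdline serving /mnt
umount /mnt
# Re-run the copied cmdline (arguments below are illustrative) plus --dump-fuse:
/usr/sbin/glusterfs --volfile-server=gluster-server1 --volfile-id=/myvol \
    --dump-fuse=/var/tmp/mnt.fusedump /mnt
```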
The errors seem to disappear after an unpredictable amount of time (can be a minute or so up to many minutes).

Requested logs made available via private link.

A note about correlating the logs: the fuse dump was taken on the host, so the volume is mounted on /mnt. The strace was run within the container, so the gluster volume is on /data. (/mnt/foo == /data/foo)

Based on the strace from comment #8:

[rgowdapp@rgowdapp 1647229]$ grep ENOENT strace.out | grep -wvE "stat|mkdir|open|lstat|connect|openat|access|statfs|epoll_ctl" -c
0
[rgowdapp@rgowdapp 1647229]$ grep "rename(" -w strace.out -c
0

* There are no renames. Hence this is not a rename atomicity issue, and hence the patches in comment #6 are unlikely to fix it.
* Also, ENOENT is not on an fd-based operation. Barring openat, all are path-based syscalls. We need to rule out whether openat failed because the fd didn't provide access-after-unlink semantics.
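A small follow-up check along those lines, assuming the same strace.out capture referenced above (hedged sketch, not output from this bug):

```bash
# List only the openat() calls that returned ENOENT (with line numbers), so
# each failure can be traced back to the path and flags it was issued with.
grep -n "openat(" strace.out | grep -w ENOENT
```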
Has any progress been made on this bug since the last update?

(In reply to Atin Mukherjee from comment #11)
> Has any progress been made on this bug since the last update?

Not much, except for comment #10, which says this is not a rename atomicity issue.

Hello,
Do you have any plan to solve this issue? What is the status? Is this issue WIP or planned for the next sprint?
Thanks,

Hello,
Sorry to ask again,
Could you let us know if there are any plans to investigate this issue? It is fine if you tell us that it is not critical or cannot be fixed in the short term. Though, knowing if this is going to be fixed in 1 week or 1 month or never would help us to move forward and find other solutions. This issue is quite critical for https://che.openshift.io which promotes Eclipse Che, Openshift but also Gluster as it is being used in the background.
Thanks,

(In reply to Sun Tan from comment #14)
> Hello,
> Sorry to ask again,
> Could you let us know if there are any plans to investigate this issue?
> It is fine if you tell us that it is not critical or cannot be fixed in
> the short term. Though, knowing if this is going to be fixed in 1 week or
> 1 month or never would help us to move forward and find other solutions.
> This issue is quite critical for https://che.openshift.io which promotes
> Eclipse Che, Openshift but also Gluster as it is being used in the background.
> Thanks,

Sorry about the inaction on my part. I've not analyzed the data uploaded in comment #8. I'll do so today and then comment on the timelines.

np, thank you. Let us know if you have any update.

Some updates:

* I couldn't find any syscalls on the problematic directory in the attached strace output:

[root@rgowdapp 1647229]# grep -c "/theia/node_modules/color" strace.out
0

* My attempts at recreating the issue failed.

[root@rgowdapp docker]# ln -s /usr/libexec/docker/docker-runc-current /usr/bin/docker-runc
[root@rgowdapp docker]# docker run -it --rm -v /mnt:/home docker.io/sunix/docker-centos-git-yarn bash
bash-4.2$ cd /home/
bash-4.2$ git clone https://github.com/theia-ide/theia.git
Cloning into 'theia'...
remote: Enumerating objects: 23, done.
remote: Counting objects: 100% (23/23), done.
remote: Compressing objects: 100% (21/21), done.
remote: Total 41666 (delta 5), reused 9 (delta 2), pack-reused 41643
Receiving objects: 100% (41666/41666), 111.26 MiB | 2.26 MiB/s, done.
Resolving deltas: 100% (27729/27729), done.
Checking out files: 100% (1495/1495), done.
bash-4.2$ cd theia/
bash-4.2$ yarn
yarn install v1.12.3
[1/5] Validating package.json...
[2/5] Resolving packages...
[3/5] Fetching packages...
warning monaco-languageclient.0: The engine "vscode" appears to be invalid.
warning vscode-base-languageclient.0: The engine "vscode" appears to be invalid.
info fsevents.4: The platform "linux" is incompatible with this module.
info "fsevents.4" is an optional dependency and failed compatibility check. Excluding it from installation.
[4/5] Linking dependencies...
warning " > istanbul-instrumenter-loader.1" has unmet peer dependency "webpack@^2.0.0 || ^3.0.0 || ^4.0.0".
warning " > tslint-language-service.9" has incorrect peer dependency "typescript@>= 2.3.1 < 3".
warning "workspace-aggregator-38ceceb9-bedc-4134-855a-7afa827982aa > @theia/application-manager > font-awesome-webpack.5-beta.2" has unmet peer dependency "font-awesome@>=4.3.0".
[5/5] Building fresh packages...
[-/13] ⠂ waiting...
[-/13] ⠂ waiting...
[3/13] ⠄ electron
[-/13] ⠄ waiting...
[-/13] ⠄ waiting...
error /home/theia/node_modules/electron: Command failed.
Exit code: 1
Command: node install.js
Arguments:
Directory: /home/theia/node_modules/electron
Output:
Downloading electron-v2.0.14-linux-x64.zip
Error: read ECONNRESET
/home/theia/node_modules/electron/install.js:54
throw err
^
bash-4.2$ ls -l /home/theia/node_modules/electron
total 372
-rw-r--r--. 1 default root 1060 Dec 12 07:39 LICENSE
-rw-r--r--. 1 default root 4563 Dec 12 07:39 README.md
-rwxr-xr-x. 1 default root 479 Dec 12 07:39 cli.js
-rw-r--r--. 1 default root 366142 Dec 12 07:39 electron.d.ts
-rw-r--r--. 1 default root 338 Dec 12 07:39 index.js
-rw-r--r--. 1 default root 1887 Dec 12 07:39 install.js
drwxr-xr-x. 4 default root 4096 Dec 12 07:45 node_modules
-rw-r--r--. 1 default root 814 Dec 12 07:39 package.json
bash-4.2$ ls -l /home/theia/node_modules/electron -l
total 372
-rw-r--r--. 1 default root 1060 Dec 12 07:39 LICENSE
-rw-r--r--. 1 default root 4563 Dec 12 07:39 README.md
-rwxr-xr-x. 1 default root 479 Dec 12 07:39 cli.js
-rw-r--r--. 1 default root 366142 Dec 12 07:39 electron.d.ts
-rw-r--r--. 1 default root 338 Dec 12 07:39 index.js
-rw-r--r--. 1 default root 1887 Dec 12 07:39 install.js
drwxr-xr-x. 4 default root 4096 Dec 12 07:45 node_modules
-rw-r--r--. 1 default root 814 Dec 12 07:39 package.json

Note that I ran into an error, but it was not exactly the error reported in this bug. As can be seen, the directory is accessible and the error code is different. Later I changed the log level of glusterfs and running yarn was successful (I think there might be a race).

* Trying to recreate it, I removed everything from the mount point, killed docker and repeated the test. This time I got EPERM errors:

bash-4.2$ ls -l
total 68
-rw-r--r--. 1 root root 12030 Oct 6 19:15 anaconda-post.log
lrwxrwxrwx. 1 root root 7 Oct 6 19:14 bin -> usr/bin
drwxr-xr-x. 5 root root 360 Dec 12 08:22 dev
drwxr-xr-x. 1 root root 4096 Dec 12 08:22 etc
drwxr-xr-x. 3 root root 4096 Dec 12 08:22 home
lrwxrwxrwx. 1 root root 7 Oct 6 19:14 lib -> usr/lib
lrwxrwxrwx. 1 root root 9 Oct 6 19:14 lib64 -> usr/lib64
drwxr-xr-x. 2 root root 4096 Apr 11 2018 media
drwxr-xr-x. 2 root root 4096 Apr 11 2018 mnt
drwxr-xr-x. 2 root root 4096 Apr 11 2018 opt
dr-xr-xr-x. 344 root root 0 Dec 12 08:22 proc
dr-xr-x---. 1 root root 4096 Nov 22 14:01 root
drwxr-xr-x. 1 root root 4096 Dec 12 08:22 run
lrwxrwxrwx. 1 root root 8 Oct 6 19:14 sbin -> usr/sbin
drwxr-xr-x. 2 root root 4096 Apr 11 2018 srv
dr-xr-xr-x. 13 root root 0 Nov 24 14:06 sys
drwxrwxrwt. 1 root root 4096 Nov 22 14:03 tmp
-rwxr-xr-x. 1 root root 195 Nov 22 14:00 uid_entrypoint
drwxr-xr-x. 1 root root 4096 Oct 6 19:14 usr
drwxr-xr-x. 1 root root 4096 Oct 6 19:14 var
bash-4.2$ touch usr/file
touch: cannot touch 'usr/file': Permission denied
bash-4.2$ yarn
yarn install v1.12.3
warning Skipping preferred cache folder "/home/user/.cache/yarn" because it is not writable.
warning Selected the next writable cache folder in the list, will be "/tmp/.yarn-cache-1001".
info No lockfile found.
[1/4] Resolving packages...
[2/4] Fetching packages...
[3/4] Linking dependencies...
[4/4] Building fresh packages...
error Could not write file "/yarn-error.log": "EACCES: permission denied, open '/yarn-error.log'"
error An unexpected error occurred: "EACCES: permission denied, mkdir '/node_modules'".
info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.
Error: ENOENT: no such file or directory, open '/home/user/.yarnrc'

Note that the directory is owned by root, but the user seems to be 'default':

bash-4.2$ whoami
default

This doesn't really look like a glusterfs issue. If you've a live setup, I can take a look at what went wrong.

Also, see whether you can stat the gfid handle of the problematic directory (symbolic links) on each brick. From the bug description it looks like the gfid handle couldn't be resolved.
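A hedged sketch of that gfid-handle check on one brick; the brick root and the GFID value are placeholders, and /theia/node_modules/color is only assumed to be the problematic directory based on the earlier grep:

```bash
# Placeholders: adjust the brick root and directory path to the real volume.
BRICK=/bricks/brick1
DIR_ON_BRICK=$BRICK/theia/node_modules/color

# Read the directory's GFID from its trusted.gfid xattr on the brick (run as root).
getfattr -n trusted.gfid -e hex "$DIR_ON_BRICK"

# For directories, the gfid handle under .glusterfs is a symlink named
# .glusterfs/<first 2 hex>/<next 2 hex>/<full gfid>. With a hypothetical
# GFID such as:
GFID=0123abcd-4567-89ef-0123-456789abcdef
ls -l "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
stat  "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
# A missing or dangling symlink here would match "gfid handle couldn't be resolved".
```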