Description of problem:

When running `yarn`, it fails with the following error (or similar):

error An unexpected error occurred: "ENOENT: no such file or directory, lstat '/home/theia/node_modules/color'".
info If you think this is a bug, please open a bug report with the information provided in "/home/theia/yarn-error.log".
info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.

Looking at the directory entries shows the problem:

$ ls -al /home/theia/node_modules/ | grep color
ls: cannot access /home/theia/node_modules/color: No such file or directory
ls: cannot access /home/theia/node_modules/cross-spawn: No such file or directory
d?????????? ? ?       ?    ?            ? color
drwxr-xr-x. 3 default root 4096 Nov  6 20:50 color-convert
drwxr-xr-x. 2 default root 4096 Nov  6 20:50 color-name
drwxr-xr-x. 2 default root 4096 Nov  6 20:50 color-string
drwxr-xr-x. 2 default root 4096 Nov  6 20:50 color-support
drwxr-xr-x. 3 default root 4096 Nov  6 20:50 colormin
drwxr-xr-x. 2 default root 4096 Nov  6 20:50 colornames
drwxr-xr-x. 2 default root 4096 Nov  6 20:50 colors
drwxr-xr-x. 2 default root 4096 Nov  6 20:50 colorspace
drwxr-xr-x. 2 default root 4096 Nov  6 20:50 css-color-names

Version-Release number of selected component (if applicable):

$ rpm -qa | grep gluster
glusterfs-cli-3.12.2-25.el7rhgs.x86_64
glusterfs-api-3.12.2-25.el7rhgs.x86_64
glusterfs-rdma-3.12.2-25.el7rhgs.x86_64
glusterfs-libs-3.12.2-25.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-25.el7rhgs.x86_64
glusterfs-fuse-3.12.2-25.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-geo-replication-3.12.2-25.el7rhgs.x86_64
glusterfs-server-3.12.2-25.el7rhgs.x86_64
glusterfs-3.12.2-25.el7rhgs.x86_64
python2-gluster-3.12.2-25.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
pcp-pmda-gluster-4.1.0-5.el7_6.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.2.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a 1x3 replicate volume according to the vol info below.
On a client machine:
2. sudo mount -t glusterfs node3:/supervol01 /mnt/
3. sudo chmod 777 /mnt/
4. sudo docker run -it --rm -v /mnt:/home docker.io/sunix/docker-centos-git-yarn bash
5. cd /home
6. git clone https://github.com/theia-ide/theia.git
7. cd theia
8. yarn
... error ...

Actual results:

$ yarn
yarn install v1.12.1
[1/5] Validating package.json...
[2/5] Resolving packages...
[3/5] Fetching packages...
warning monaco-languageclient.0: The engine "vscode" appears to be invalid.
warning vscode-base-languageclient.0: The engine "vscode" appears to be invalid.
info fsevents.4: The platform "linux" is incompatible with this module.
info "fsevents.4" is an optional dependency and failed compatibility check. Excluding it from installation.
[4/5] Linking dependencies...
warning " > istanbul-instrumenter-loader.1" has unmet peer dependency "webpack@^2.0.0 || ^3.0.0 || ^4.0.0".
warning " > tslint-language-service.9" has incorrect peer dependency "typescript@>= 2.3.1 < 3".
warning "workspace-aggregator-86df8aee-f807-4a0d-aa8e-75bffa368fee > @theia/application-manager > font-awesome-webpack.5-beta.2" has unmet peer dependency "font-awesome@>=4.3.0".
error An unexpected error occurred: "ENOENT: no such file or directory, lstat '/home/theia/node_modules/color'".
info If you think this is a bug, please open a bug report with the information provided in "/home/theia/yarn-error.log".
info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.
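A quick way for anyone triaging to confirm the symptom from the client side (just a sketch; the paths are the ones from this report and would need adjusting) is to compare what the directory listing returns with what a direct lookup returns, and to confirm the entry still exists on a brick:

$ ls /home/theia/node_modules/ | grep -x color     # readdir still returns the name
$ stat /home/theia/node_modules/color              # but the lookup itself fails with ENOENT
$ sudo getfattr -n trusted.gfid -e hex /bricks/supervol01/brick/theia/node_modules/color   # on a server node: the brick entry and its gfid xattr are still present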
Expected results:
File and directory creation during yarn's run should succeed.

Additional info:

$ sudo gluster vol info supervol01

Volume Name: supervol01
Type: Replicate
Volume ID: 558944e4-6880-4189-bfaa-dab2206b90bf
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node3:/bricks/supervol01/brick
Brick2: node4:/bricks/supervol01/brick
Brick3: node5:/bricks/supervol01/brick
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
client.ssl: on
server.ssl: on
performance.cache-refresh-timeout: 60
performance.cache-size: 134217728
performance.nl-cache: on
performance.md-cache-timeout: 300
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.enable-shared-storage: enable
auto-delete: enable

# The 'color' directory is present on all bricks

[vagrant@node5 ~]$ sudo getfattr -d -m . -e hex /bricks/supervol01/brick/theia/node_modules/color
getfattr: Removing leading '/' from absolute path names
# file: bricks/supervol01/brick/theia/node_modules/color
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x935efacfacf1453c8f14d8b8463c1a83
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

[vagrant@node4 ~]$ sudo getfattr -d -m . -e hex /bricks/supervol01/brick/theia/node_modules/color
getfattr: Removing leading '/' from absolute path names
# file: bricks/supervol01/brick/theia/node_modules/color
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x935efacfacf1453c8f14d8b8463c1a83
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

[vagrant@node3 ~]$ sudo getfattr -d -m . -e hex /bricks/supervol01/brick/theia/node_modules/color
getfattr: Removing leading '/' from absolute path names
# file: bricks/supervol01/brick/theia/node_modules/color
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x935efacfacf1453c8f14d8b8463c1a83
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

# The gfid symlinks also appear to be there

[vagrant@node3 ~]$ sudo ls -al /bricks/supervol01/brick/.glusterfs/93/5e
total 4
drwx------.   2 root root   50 Nov  6 15:50 .
drwx------. 154 root root 4096 Nov  6 15:50 ..
lrwxrwxrwx.   1 root root   54 Nov  6 15:50 935efacf-acf1-453c-8f14-d8b8463c1a83 -> ../../27/c9/27c9b8eb-f2b1-409a-942f-f52f1f3ebdf2/color

# Mount logs from client show

[root@jumper glusterfs]# tail -f mnt.log
[2018-11-06 20:15:52.072215] I [MSGID: 108002] [afr-common.c:5164:afr_notify] 0-supervol01-replicate-0: Client-quorum is met
[2018-11-06 20:15:52.072548] I [MSGID: 114035] [client-handshake.c:121:client_set_lk_version_cbk] 0-supervol01-client-1: Server lk version = 1
[2018-11-06 20:15:52.072681] I [MSGID: 114046] [client-handshake.c:1150:client_setvolume_cbk] 0-supervol01-client-0: Connected to supervol01-client-0, attached to remote volume '/bricks/supervol01/brick'.
[2018-11-06 20:15:52.072711] I [MSGID: 114047] [client-handshake.c:1161:client_setvolume_cbk] 0-supervol01-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2018-11-06 20:15:52.073960] I [MSGID: 114035] [client-handshake.c:121:client_set_lk_version_cbk] 0-supervol01-client-0: Server lk version = 1
[2018-11-06 20:15:52.074218] I [fuse-bridge.c:4216:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2018-11-06 20:15:52.074231] I [fuse-bridge.c:4846:fuse_graph_sync] 0-fuse: switched to graph 0
[2018-11-06 20:15:52.077333] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-supervol01-dht: Found anomalies in / (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0
[2018-11-06 20:42:04.068672] W [fuse-bridge.c:540:fuse_entry_cbk] 0-glusterfs-fuse: 1357579: MKDIR() /theia/node_modules/babel-generator => -1 (File exists)
[2018-11-06 20:42:04.089269] W [fuse-bridge.c:540:fuse_entry_cbk] 0-glusterfs-fuse: 1357587: MKDIR() /theia/node_modules/babel-generator => -1 (File exists)
[2018-11-06 20:50:45.797291] W [fuse-bridge.c:540:fuse_entry_cbk] 0-glusterfs-fuse: 1383150: MKDIR() /theia/node_modules/color => -1 (File exists)
[2018-11-06 20:50:45.869101] W [fuse-bridge.c:540:fuse_entry_cbk] 0-glusterfs-fuse: 1383175: MKDIR() /theia/node_modules/cross-spawn => -1 (File exists)

# Other info

- I didn't notice anything of interest in the brick or self-heal logs.
- After exiting the container, the same anomaly is visible under /mnt/theia/node_modules/. Re-mounting makes the problem go away:

[vagrant@jumper ~]$ ls -al /mnt/theia/node_modules/ | grep color
ls: cannot access /mnt/theia/node_modules/color: No such file or directory
ls: cannot access /mnt/theia/node_modules/cross-spawn: No such file or directory
d?????????? ? ?       ?    ?            ? color
drwxr-xr-x. 3 vagrant root 4096 Nov  6 15:50 color-convert
drwxr-xr-x. 3 vagrant root 4096 Nov  6 15:50 colormin
drwxr-xr-x. 2 vagrant root 4096 Nov  6 15:50 color-name
drwxr-xr-x. 2 vagrant root 4096 Nov  6 15:50 colornames
drwxr-xr-x. 2 vagrant root 4096 Nov  6 15:50 colors
drwxr-xr-x. 2 vagrant root 4096 Nov  6 15:50 colorspace
drwxr-xr-x. 2 vagrant root 4096 Nov  6 15:50 color-string
drwxr-xr-x. 2 vagrant root 4096 Nov  6 15:50 color-support
drwxr-xr-x. 2 vagrant root 4096 Nov  6 15:50 css-color-names
[vagrant@jumper ~]$ sudo umount /mnt/
[vagrant@jumper ~]$ sudo mount -t glusterfs node3:/supervol01 /mnt/
[vagrant@jumper ~]$ ls -al /mnt/theia/node_modules/ | grep color
drwxr-xr-x. 2 vagrant root 4096 Nov  6 15:50 color
drwxr-xr-x. 3 vagrant root 4096 Nov  6 15:50 color-convert
drwxr-xr-x. 3 vagrant root 4096 Nov  6 15:50 colormin
drwxr-xr-x. 2 vagrant root 4096 Nov  6 15:50 color-name
drwxr-xr-x. 2 vagrant root 4096 Nov  6 15:50 colornames
drwxr-xr-x. 2 vagrant root 4096 Nov  6 15:50 colors
drwxr-xr-x. 2 vagrant root 4096 Nov  6 15:50 colorspace
drwxr-xr-x. 2 vagrant root 4096 Nov  6 15:50 color-string
drwxr-xr-x. 2 vagrant root 4096 Nov  6 15:50 color-support
drwxr-xr-x. 2 vagrant root 4096 Nov  6 15:50 css-color-names
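One additional data point that might be worth collecting the next time this reproduces (an assumption on my part, not something verified in this report): since a full remount clears the bad entry, dropping the kernel's reclaimable dentries and inodes on the client while the anomaly is visible would show whether the stale entry lives in the kernel cache above FUSE or inside the gluster client process itself:

[vagrant@jumper ~]$ sync
[vagrant@jumper ~]$ echo 2 | sudo tee /proc/sys/vm/drop_caches    # free reclaimable dentries and inodes
[vagrant@jumper ~]$ ls -al /mnt/theia/node_modules/ | grep color  # recheck whether 'color' resolves again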
Looks like the fix is https://review.gluster.org/#/c/glusterfs/+/21147/ and https://review.gluster.org/#/c/glusterfs/+/21146/.

This was fixed by Raghavendra while working on transactional workload support in glusterfs. Not sure if we could get it backported.
(In reply to Amar Tumballi from comment #6)
> Looks like the fix is https://review.gluster.org/#/c/glusterfs/+/21147/ and
> https://review.gluster.org/#/c/glusterfs/+/21146/.
>
> This was fixed by Raghavendra while working on transactional workload
> support in glusterfs. Not sure if we could get it backported.

Those two patches are related to rename atomicity. However, with just the logs attached, it is difficult to tell whether this is a rename atomicity issue. Also, note that rename atomicity errors are transient and go away when the file is accessed again. In this case, however, even after the application failed, ls still complained about ENOENT.

I would need the following data:

* strace of the application that failed (I think it is yarn here).
* A fuse dump (mount glusterfs with the option --dump-fuse=<path to dump>; note that the mount script does not recognise this option, so copy the glusterfs mount's command line from ps and add the --dump-fuse option to it).
* Is the error persistent, or does it go away after some time?
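To make the collection concrete, here is one way the two traces could be gathered (a sketch only; the glusterfs command line below is illustrative, and the real one must be copied from ps as described above and then extended with --dump-fuse):

# strace the failing application, following child processes:
$ strace -f -o /tmp/yarn-strace.out yarn

# fuse dump: unmount, copy the running glusterfs client command line from ps,
# then re-run it with --dump-fuse added:
$ sudo umount /mnt
$ ps ax | grep '[g]lusterfs.*supervol01'
$ sudo /usr/sbin/glusterfs --volfile-server=node3 --volfile-id=supervol01 \
       --dump-fuse=/var/tmp/supervol01.fusedump /mnt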
The errors seem to disappear after an unpredictable amount of time (anywhere from a minute or so up to many minutes). The requested logs have been made available via a private link.

A note about correlating the logs: the fuse dump was taken on the host, so the volume is mounted on /mnt. The strace was run within the container, where the gluster volume is on /data (/mnt/foo == /data/foo).
Based on the strace from comment #8:

[rgowdapp@rgowdapp 1647229]$ grep ENOENT strace.out | grep -wvE "stat|mkdir|open|lstat|connect|openat|access|statfs|epoll_ctl" -c
0
[rgowdapp@rgowdapp 1647229]$ grep "rename(" -w strace.out -c
0

* There are no renames, so this is not a rename atomicity issue, and the patches in comment #6 are unlikely to fix it.
* Also, none of the ENOENT failures are on fd-based operations. Barring openat, all of these are path-based syscalls. We still need to rule out whether openat failed because the fd did not provide access-after-unlink semantics.
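A follow-up check along those lines (a sketch against the same strace.out; strace prints AT_FDCWD for path-relative openat calls, so anything left after filtering it out would be a genuinely fd-based access):

[rgowdapp@rgowdapp 1647229]$ grep 'openat(' strace.out | grep ENOENT | grep -vc 'AT_FDCWD'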
Has any progress been made on this bug since the last update?
(In reply to Atin Mukherjee from comment #11)
> Has any progress been made on this bug since the last update?

Not much, except for comment #10, which says this is not a rename atomicity issue.
Hello,
Do you have any plan to address this issue? What is the status? Is it work in progress, or planned for the next sprint?
Thanks,
Hello,
Sorry to ask again: could you let us know whether there are any plans to investigate this issue? It is fine if you tell us that it is not critical or cannot be fixed in the short term. However, knowing whether this will be fixed in one week, one month, or never would help us move forward and find other solutions. This issue is quite critical for https://che.openshift.io, which promotes Eclipse Che and OpenShift, but also Gluster, since it is used in the background.
Thanks,
(In reply to Sun Tan from comment #14)
> Hello,
> Sorry to ask again: could you let us know whether there are any plans to
> investigate this issue? It is fine if you tell us that it is not critical
> or cannot be fixed in the short term. However, knowing whether this will be
> fixed in one week, one month, or never would help us move forward and find
> other solutions. This issue is quite critical for https://che.openshift.io,
> which promotes Eclipse Che and OpenShift, but also Gluster, since it is
> used in the background.
> Thanks,

Sorry about the inaction on my part. I have not yet analyzed the data uploaded in comment #8. I'll do so today and then comment on the timelines.
No problem, thank you. Let us know when you have any update.
Some updates:

* I couldn't find any syscalls on the problematic directory in the attached strace output:

[root@rgowdapp 1647229]# grep -c "/theia/node_modules/color" strace.out
0

* My attempts at recreating the issue failed:

[root@rgowdapp docker]# ln -s /usr/libexec/docker/docker-runc-current /usr/bin/docker-runc
[root@rgowdapp docker]# docker run -it --rm -v /mnt:/home docker.io/sunix/docker-centos-git-yarn bash
bash-4.2$ cd /home/
bash-4.2$ git clone https://github.com/theia-ide/theia.git
Cloning into 'theia'...
remote: Enumerating objects: 23, done.
remote: Counting objects: 100% (23/23), done.
remote: Compressing objects: 100% (21/21), done.
remote: Total 41666 (delta 5), reused 9 (delta 2), pack-reused 41643
Receiving objects: 100% (41666/41666), 111.26 MiB | 2.26 MiB/s, done.
Resolving deltas: 100% (27729/27729), done.
Checking out files: 100% (1495/1495), done.
bash-4.2$ cd theia/
bash-4.2$ yarn
yarn install v1.12.3
[1/5] Validating package.json...
[2/5] Resolving packages...
[3/5] Fetching packages...
warning monaco-languageclient.0: The engine "vscode" appears to be invalid.
warning vscode-base-languageclient.0: The engine "vscode" appears to be invalid.
info fsevents.4: The platform "linux" is incompatible with this module.
info "fsevents.4" is an optional dependency and failed compatibility check. Excluding it from installation.
[4/5] Linking dependencies...
warning " > istanbul-instrumenter-loader.1" has unmet peer dependency "webpack@^2.0.0 || ^3.0.0 || ^4.0.0".
warning " > tslint-language-service.9" has incorrect peer dependency "typescript@>= 2.3.1 < 3".
warning "workspace-aggregator-38ceceb9-bedc-4134-855a-7afa827982aa > @theia/application-manager > font-awesome-webpack.5-beta.2" has unmet peer dependency "font-awesome@>=4.3.0".
[5/5] Building fresh packages...
[-/13] ⠂ waiting...
[-/13] ⠂ waiting...
[3/13] ⠄ electron
[-/13] ⠄ waiting...
[-/13] ⠄ waiting...
error /home/theia/node_modules/electron: Command failed.
Exit code: 1
Command: node install.js
Arguments:
Directory: /home/theia/node_modules/electron
Output:
Downloading electron-v2.0.14-linux-x64.zip
Error: read ECONNRESET
/home/theia/node_modules/electron/install.js:54
throw err
^
bash-4.2$ ls -l /home/theia/node_modules/electron
total 372
-rw-r--r--. 1 default root   1060 Dec 12 07:39 LICENSE
-rw-r--r--. 1 default root   4563 Dec 12 07:39 README.md
-rwxr-xr-x. 1 default root    479 Dec 12 07:39 cli.js
-rw-r--r--. 1 default root 366142 Dec 12 07:39 electron.d.ts
-rw-r--r--. 1 default root    338 Dec 12 07:39 index.js
-rw-r--r--. 1 default root   1887 Dec 12 07:39 install.js
drwxr-xr-x. 4 default root   4096 Dec 12 07:45 node_modules
-rw-r--r--. 1 default root    814 Dec 12 07:39 package.json

Note that I did run into an error, but it is not exactly the one reported in this bug: as can be seen, the directory is accessible and the error code is different. Later I changed the glusterfs log level and the yarn run was successful (I think there might be a race).
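For the record, the log-level change referred to above can be set per volume; this is just one way of doing it, the report does not say which method was actually used, and DEBUG is only an example level (supervol01 is the volume from this report):

$ sudo gluster volume set supervol01 diagnostics.client-log-level DEBUG
$ sudo gluster volume set supervol01 diagnostics.brick-log-level DEBUG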
* Trying to recreate, I removed everything from the mount point, killed docker, and repeated the test. This time I got EPERM errors:

bash-4.2$ ls -l
total 68
-rw-r--r--.   1 root root 12030 Oct  6 19:15 anaconda-post.log
lrwxrwxrwx.   1 root root     7 Oct  6 19:14 bin -> usr/bin
drwxr-xr-x.   5 root root   360 Dec 12 08:22 dev
drwxr-xr-x.   1 root root  4096 Dec 12 08:22 etc
drwxr-xr-x.   3 root root  4096 Dec 12 08:22 home
lrwxrwxrwx.   1 root root     7 Oct  6 19:14 lib -> usr/lib
lrwxrwxrwx.   1 root root     9 Oct  6 19:14 lib64 -> usr/lib64
drwxr-xr-x.   2 root root  4096 Apr 11  2018 media
drwxr-xr-x.   2 root root  4096 Apr 11  2018 mnt
drwxr-xr-x.   2 root root  4096 Apr 11  2018 opt
dr-xr-xr-x. 344 root root     0 Dec 12 08:22 proc
dr-xr-x---.   1 root root  4096 Nov 22 14:01 root
drwxr-xr-x.   1 root root  4096 Dec 12 08:22 run
lrwxrwxrwx.   1 root root     8 Oct  6 19:14 sbin -> usr/sbin
drwxr-xr-x.   2 root root  4096 Apr 11  2018 srv
dr-xr-xr-x.  13 root root     0 Nov 24 14:06 sys
drwxrwxrwt.   1 root root  4096 Nov 22 14:03 tmp
-rwxr-xr-x.   1 root root   195 Nov 22 14:00 uid_entrypoint
drwxr-xr-x.   1 root root  4096 Oct  6 19:14 usr
drwxr-xr-x.   1 root root  4096 Oct  6 19:14 var
bash-4.2$ touch usr/file
touch: cannot touch 'usr/file': Permission denied
bash-4.2$ yarn
yarn install v1.12.3
warning Skipping preferred cache folder "/home/user/.cache/yarn" because it is not writable.
warning Selected the next writable cache folder in the list, will be "/tmp/.yarn-cache-1001".
info No lockfile found.
[1/4] Resolving packages...
[2/4] Fetching packages...
[3/4] Linking dependencies...
[4/4] Building fresh packages...
error Could not write file "/yarn-error.log": "EACCES: permission denied, open '/yarn-error.log'"
error An unexpected error occurred: "EACCES: permission denied, mkdir '/node_modules'".
info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.
Error: ENOENT: no such file or directory, open '/home/user/.yarnrc'

Note that the directory is owned by root, but the user seems to be 'default':

bash-4.2$ whoami
default

This doesn't really look like a glusterfs issue. If you have a live setup, I can take a look at what went wrong. Also, please check whether you can stat the gfid handle (symbolic link) of the problematic directory on each brick; a sketch of that check follows below. From the bug description it looks like the gfid handle couldn't be resolved.
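For reference, a sketch of that gfid-handle check (the brick path and gfid are taken from the earlier getfattr output in this report; adjust per brick and volume). The handle path under .glusterfs is built from the first two and next two hex digits of the gfid, followed by the full uuid:

$ sudo getfattr -n trusted.gfid -e hex /bricks/supervol01/brick/theia/node_modules/color
# trusted.gfid=0x935efacfacf1453c8f14d8b8463c1a83  ->  935efacf-acf1-453c-8f14-d8b8463c1a83
$ sudo ls -l /bricks/supervol01/brick/.glusterfs/93/5e/935efacf-acf1-453c-8f14-d8b8463c1a83
$ sudo stat /bricks/supervol01/brick/.glusterfs/93/5e/935efacf-acf1-453c-8f14-d8b8463c1a83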