Description of problem:
When rh-nodejs10-npm is updated to the latest version it becomes broken, but only inside the container. On the machine itself npm --version works (same version).

Version-Release number of selected component (if applicable):
rh-nodejs10-npm-6.9.0-10.16.3.3.el7.s390x

How reproducible:
always

Steps to Reproduce:
1. podman run --user 0 -ti registry.access.redhat.com/rhscl/python-27-rhel7 bash
2. (app-root) yum update rh-nodejs10-npm
3. (app-root) rpm -qf /opt/rh/rh-nodejs10/root/usr/bin/npm
   rh-nodejs10-npm-6.9.0-10.16.3.3.el7.s390x
4. npm --version

Actual results:

internal/modules/cjs/loader.js:638
    throw err;
    ^

Error: Cannot find module '../lib/utils/unsupported.js'
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:636:15)
    at Function.Module._load (internal/modules/cjs/loader.js:562:25)
    at Module.require (internal/modules/cjs/loader.js:692:17)
    at require (internal/modules/cjs/helpers.js:25:18)
    at /opt/rh/rh-nodejs10/root/usr/bin/npm:19:21
    at Object.<anonymous> (/opt/rh/rh-nodejs10/root/usr/bin/npm:152:3)
    at Module._compile (internal/modules/cjs/loader.js:778:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:789:10)
    at Module.load (internal/modules/cjs/loader.js:653:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:593:12)

Additional info:
:latest is rhscl/python-27-rhel7:2.7-67.1568971388

(app-root) env
MANPATH=/opt/rh/python27/root/usr/share/man:/opt/rh/rh-nodejs10/root/usr/share/man:/opt/rh/httpd24/root/usr/share/man:
HOSTNAME=35d2f42fae7a
APP_ROOT=/opt/app-root
PIP_NO_CACHE_DIR=off
TERM=xterm
LIBRARY_PATH=/opt/rh/httpd24/root/usr/lib64
PYTHONUNBUFFERED=1
DESCRIPTION=Python 2.7 available as container is a base platform for building and running various Python 2.7 applications and frameworks. Python is an easy to learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python's elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms.
NODEJS_SCL=rh-nodejs10
X_SCLS=python27 rh-nodejs10 httpd24
LC_ALL=en_US.UTF-8
PYTHONIOENCODING=UTF-8
LD_LIBRARY_PATH=/opt/rh/python27/root/usr/lib64:/opt/rh/rh-nodejs10/root/usr/lib64:/opt/rh/httpd24/root/usr/lib64
VIRTUAL_ENV=/opt/app-root
PYTHON_VERSION=2.7
PATH=/opt/app-root/bin:/opt/rh/python27/root/usr/bin:/opt/rh/rh-nodejs10/root/usr/bin:/opt/rh/httpd24/root/usr/bin:/opt/rh/httpd24/root/usr/sbin:/opt/app-root/src/.local/bin/:/opt/app-root/src/bin:/opt/app-root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
STI_SCRIPTS_URL=image:///usr/libexec/s2i
PWD=/opt/app-root/src
STI_SCRIPTS_PATH=/usr/libexec/s2i
LANG=en_US.UTF-8
SUMMARY=Platform for building and running Python 2.7 applications
PS1=(app-root)
PLATFORM=el7
HOME=/opt/app-root/src
SHLVL=1
PYTHONPATH=/opt/rh/rh-nodejs10/root/usr/lib/python2.7/site-packages
XDG_DATA_DIRS=/opt/rh/python27/root/usr/share:/usr/local/share:/usr/share
PKG_CONFIG_PATH=/opt/rh/python27/root/usr/lib64/pkgconfig:/opt/rh/httpd24/root/usr/lib64/pkgconfig
container=oci
_=/usr/bin/env
There are also these two different tracebacks I got:

# npm install npm
fs.js:114
    throw err;
    ^

Error: EINVAL: invalid argument, readlink '/opt/rh'
    at Object.realpathSync (fs.js:1486:9)
    at toRealPath (internal/modules/cjs/loader.js:241:13)
    at Function.Module._findPath (internal/modules/cjs/loader.js:323:22)
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:633:25)
    at Function.Module._load (internal/modules/cjs/loader.js:562:25)
    at Function.Module.runMain (internal/modules/cjs/loader.js:831:12)
    at startup (internal/bootstrap/node.js:283:19)
    at bootstrapNodeJSCore (internal/bootstrap/node.js:622:3)

# npm --version
internal/modules/cjs/loader.js:638
    throw err;
    ^

Error: Cannot find module '/opt/rh/rh-nodejs10/root/usr/bin/npm'
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:636:15)
    at Function.Module._load (internal/modules/cjs/loader.js:562:25)
    at Function.Module.runMain (internal/modules/cjs/loader.js:831:12)
    at startup (internal/bootstrap/node.js:283:19)
    at bootstrapNodeJSCore (internal/bootstrap/node.js:622:3)

I really don't understand why the error is not always the same, even for the same command.
This issue is currently blocking the container rebuilds, so raising the priority. Info found so far:

* the issue is only visible with the newest update:
  $> rpm -qa | grep node
  rh-nodejs10-runtime-3.2-2.el7.s390x
  rh-nodejs10-npm-6.9.0-10.16.3.3.el7.s390x
  rh-nodejs10-nodejs-10.16.3-3.el7.s390x
* it is only reproducible inside an s390x container image (no matter what image it is -- python, nodejs, ...)
I tried to reproduce this in mock, but it works fine there. So what can be different in the container image?
Interesting finding: it starts to work once I run the container with --privileged. Not sure why yet.
Some more info I found:

* the problem started to occur with v10.16, which rebased the bundled npm from 6.4.1 to 6.9.0, so I expect the change happened in npm between those two versions
* RHOAR nodejs builds behave the same, so it is probably not related to the RHSCL packaging
* I don't understand why --privileged helps but adding the capabilities separately does not:

docker run -ti --rm -u 0 --cap-add=chown --cap-add=dac_override --cap-add=dac_read_search --cap-add=fowner --cap-add=fsetid --cap-add=kill --cap-add=setgid --cap-add=setuid --cap-add=setpcap --cap-add=linux_immutable --cap-add=net_bind_service --cap-add=net_broadcast --cap-add=net_admin --cap-add=net_raw --cap-add=ipc_lock --cap-add=ipc_owner --cap-add=sys_module --cap-add=sys_rawio --cap-add=sys_chroot --cap-add=sys_ptrace --cap-add=sys_pacct --cap-add=sys_admin --cap-add=sys_boot --cap-add=sys_nice --cap-add=sys_resource --cap-add=sys_time --cap-add=sys_tty_config --cap-add=mknod --cap-add=lease --cap-add=audit_write --cap-add=audit_control --cap-add=setfcap --cap-add=mac_override --cap-add=mac_admin --cap-add=syslog --cap-add=wake_alarm --cap-add=block_suspend registry.access.redhat.com/rhscl/nodejs-10-rhel7 bash
Created attachment 1623505 [details]
Strace output of npm --version

So, the trick is the --security-opt=seccomp=unconfined option (which --privileged also implies); with this option it works:

docker run -ti --rm -u 0 --security-opt=seccomp=unconfined registry.access.redhat.com/rhscl/nodejs-10-rhel7 bash

Attaching the strace output. Still not sure what exactly happens there, though.
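To narrow down which syscall the seccomp filter is rejecting, one way would be to compare strace output taken with and without the default profile. The following is only a rough sketch -- it assumes strace can be installed inside the image and that the container is given the ptrace capability so strace can attach; none of this has been verified here:

# host side: start the container with ptrace allowed
docker run -ti --rm -u 0 --cap-add=SYS_PTRACE registry.access.redhat.com/rhscl/nodejs-10-rhel7 bash
# inside the container:
yum -y install strace
strace -f -o /tmp/npm.trace scl enable rh-nodejs10 -- npm --version
grep EPERM /tmp/npm.trace | head
# a syscall that fails with EPERM here, but not when the same trace is taken under
# --security-opt=seccomp=unconfined, would be the likely culprit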
Another finding: I started with a working npm (6.4.1) and node (10.10) and installed a specific version of npm locally: npm install npm.1

That resulted in the file /node_modules/npm/bin/npm-cli.js. Then I installed just the node package from RHOAR and found out that

/usr/bin/node /node_modules/npm/bin/npm-cli.js --version

worked with 10.15.3 and stopped working with 10.16.0. So, based on that, it looks like the problem is somewhere in node itself, between versions 10.15.3 and 10.16.0. Worth mentioning that npm-cli.js breaks pretty early, during one of the require() calls. What is weird is that it is not always the same call.
Dan Horak, we're a bit lost here; I'd appreciate it if you or somebody from the team could take a look and see if you spot anything suspicious -- we're having issues on s390x in a container (seccomp is somehow involved, because when we turn it off for the container, it starts working). The issue is that we get various nodejs tracebacks, but the common thing is that node cannot load libraries. Any help is appreciated.
Is it when running the image on a RHEL-7 host? How about RHEL-7 Alt?
Not sure what host Honza tried the images on, but for the CI (where it fails as well afaik) we are running RHEL-ALT-7.6-updates-20190722.1 + RHEL-7.6-updates-20190403.1 according to QE
A minimal reproducer with a ubi7/s2i-base image:

#> docker run -ti -u 0 registry.access.redhat.com/ubi7/s2i-base bash
bash-4.2# yum -y update rh-nodejs10-nodejs
bash-4.2# rpm -qa|grep rh-nodejs
rh-nodejs10-runtime-3.2-2.el7.s390x
rh-nodejs10-nodejs-10.16.3-3.el7.s390x
rh-nodejs10-npm-6.9.0-10.16.3.3.el7.s390x
bash-4.2# scl enable rh-nodejs10 -- npm --version
internal/modules/cjs/loader.js:638
    throw err;
    ^

Error: Cannot find module '../lib/utils/unsupported.js'
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:636:15)
    at Function.Module._load (internal/modules/cjs/loader.js:562:25)
    at Module.require (internal/modules/cjs/loader.js:692:17)
    at require (internal/modules/cjs/helpers.js:25:18)
    at /opt/rh/rh-nodejs10/root/usr/bin/npm:19:21
    at Object.<anonymous> (/opt/rh/rh-nodejs10/root/usr/bin/npm:152:3)
    at Module._compile (internal/modules/cjs/loader.js:778:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:789:10)
    at Module.load (internal/modules/cjs/loader.js:653:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:593:12)
https://github.com/nodejs/node/issues/19702
https://github.com/npm/npm/issues/20206
https://github.com/nodejs/node/issues/19997

are previous instances of this. They all seem to have been "fixed" by installing an official node package, or by using some other mechanism to get a fresh/unbroken install of npm, which indicates that there might be a problem with the redhat build of node or npm (I'm not sure how RH builds its packages, or how or whether it partitions them into separate packages). It doesn't appear to be a node.js bug per se, but I don't find any statement about a root cause. I hope this is useful.
To be clear, I mentioned "redhat build of node" not in the sense of the compile/link of the node binary, but in the sense that node contains npm as a dependency, so node.js builds are one way to get the npm CLI.
I have tried the above steps on a RHEL container running on an S390 Ubuntu host and it works fine. A very similar issue was being looked into a few months ago by our team, where the same error was being produced in a Docker container. If I remember correctly, it was related to how Docker handles system calls between different host kernels. It had trouble doing so inside the container, hence the "path" to a module was not resolved correctly. I'm going to get some more info on this and confirm whether or not they are related.
Git bisect showed me that the difference is made by this patch: https://github.com/nodejs/node/commit/7e5ef4a0e1

Before this commit it works fine; after it, I start to see this issue. Now I am trying to look closer at which libuv change caused the difference.
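For the record, roughly how such a bisect can be driven between the two tags -- only a sketch, with the build and test steps simplified (the test command reuses the npm-cli.js setup from the earlier comment):

git clone https://github.com/nodejs/node.git && cd node
git bisect start v10.16.0 v10.15.3
# at each step: build node (./configure && make -j"$(nproc)"), copy the resulting
# binary into the failing container, and run the check from the earlier comment:
#   /usr/bin/node /node_modules/npm/bin/npm-cli.js --version
git bisect good    # or "git bisect bad", depending on the result
git bisect reset   # once the first bad commit is reported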
Created attachment 1624103 [details]
Proposed patch for reverting statx, which works on kernel 4.11 and later only

It looks like the problem is caused by the libuv statx() change that came in via this upstream PR: https://github.com/nodejs/node/pull/26707, i.e. this commit: https://github.com/nodejs/node/commit/7e5ef4a0e1 (the same one the bisect above pointed to). Based on the man page and the PR above, statx() is available on kernel 4.11 and later, so reverting this change for older kernels is probably OK.
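To tie this back to the earlier seccomp observation, another thing worth trying would be to allow just statx() on top of docker's default profile instead of running fully unconfined. Purely a sketch: it assumes jq on the host, that the failing docker exposes its default profile as a JSON file (the path differs between builds; /etc/docker/seccomp.json is used here only as an example), and that the profile uses the older one-entry-per-syscall format with a singular "name" key (newer profiles use a "names" array instead):

jq '.syscalls += [{"name": "statx", "action": "SCMP_ACT_ALLOW", "args": []}]' \
    /etc/docker/seccomp.json > /tmp/statx-allowed.json
docker run -ti --rm -u 0 --security-opt seccomp=/tmp/statx-allowed.json \
    registry.access.redhat.com/rhscl/nodejs-10-rhel7 scl enable rh-nodejs10 -- npm --version
# if npm works with this profile but fails with the stock one, that would confirm the
# rejected statx() call as the trigger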
Thanks for the info Honza, I can confirm that this is the same issue our team has dealt with before. As an alternative solution to reverting the change, using RHEL 8 or upgrading the docker daemon seems to also solve the problem.
(In reply to Milad Farazmand from comment #24)
> Thanks for the info Honza,
> I can confirm that this is the same issue our team has dealt with before. As
> an alternative solution to reverting the change, using RHEL 8 or upgrading
> the docker daemon seems to also solve the problem.

So can we say this is a kernel issue? Do you think it would be useful to let the kernel guys take a look into this?
Reported to nodejs upstream as https://github.com/nodejs/node/issues/29916
(In reply to Vít Ondruch from comment #27)
> (In reply to Milad Farazmand from comment #24)
> > Thanks for the info Honza,
> > I can confirm that this is the same issue our team has dealt with before. As
> > an alternative solution to reverting the change, using RHEL 8 or upgrading
> > the docker daemon seems to also solve the problem.
>
> So can we say this is a kernel issue? Do you think it would be useful to let
> the kernel guys take a look into this?

It may not be a kernel issue but rather a Docker problem. It has been addressed in this ticket: https://github.com/docker/for-linux/issues/208
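My reading of that ticket, stated with some caution: docker's default seccomp profile rejects syscalls it does not know about with EPERM (its defaultAction is SCMP_ACT_ERRNO) rather than ENOSYS, and the statx() fallback in the libuv bundled with node 10.16.x apparently treats only ENOSYS as "syscall not available", so under a docker whose profile predates statx the call fails hard instead of falling back to the older stat() family. A quick host-side check (same caveat as above: the profile path is only an example):

grep -c statx /etc/docker/seccomp.json
# 0 would mean statx is not whitelisted, so statx() inside the container returns EPERM;
# upgrading docker (whose newer default profile knows about statx) or moving to RHEL 8
# avoids this, matching the workaround mentioned in comment #24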
Vit, is the docker version being used for testing tied to the test infrastructure, or to the OS version being tested? If it is the former, is updating the version of docker in the test infra an option?
I've checked the logs and we were using a slightly older version of docker (docker-1.13.1-102.git7f2769b.el7.s390x). But we should be running the latest publicly available docker from RHEL-Extras.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3039