Bug 2060513

Summary: Nodejs includes some pre-compiled wasm code
Product: [Fedora] Fedora Reporter: Honza Horak <hhorak>
Component: nodejsAssignee: NodeJS Packaging SIG <nodejs-sig>
Status: NEW --- QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 37CC: fweimer, midawson, mrunge, nodejs-sig, rlau, sgallagh, thoger, thrcka, zsvetlik
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2133785 (view as bug list) Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2111861, 2121075, 2121093, 2121095, 2133785    

Description Honza Horak 2022-03-03 16:37:47 UTC
Description of problem:

At least in upstream, node.js code includes some pre-compiled code as part of some modules that are bundled in the nodejs. Like undici.js [1]. I don't think this follows the guidelines that require "No inclusion of pre-built binaries or libraries" [2].

I'm not sure yet what to do with that though.

[1] https://github.com/nodejs/node/blob/2d6aea7ff5ba41fb564c719fc896847a31e16d0b/deps/undici/undici.js#L1311-L1324
[2] https://fedoraproject.org/wiki/User:Vondruch/Draft_RawhideGuidelines#No_inclusion_of_pre-built_binaries_or_libraries

Comment 1 Richard Lau 2022-03-03 16:49:19 UTC
The upstream issue for this is https://github.com/nodejs/node/issues/42199, reported on behalf of another Linux distribution. 

So far the feature using undici has only landed in upstream's master branch and Node.js 17. It hasn't yet landed in LTS Node.js (e.g. 16, 14, 12) but fetch was a heavily requested feature so there is likely to be a community push to include it in Node.js 16 (although it will be behind a run-time flag). On current estimated schedules [1] we'd be looking at May 2022 at the earliest for landing in Node.js 16 as that's the next penciled in Node.js 16 semver-minor release.

[1] https://github.com/nodejs/Release/issues/658

Comment 2 Michael Dawson 2022-03-03 21:13:00 UTC
From my look at undici today to get a bit more up to speed:

1) The source for the generated undici.js file is available in this repo within the Node.js organization - https://github.com/nodejs/undici
2) The file which is in Node.js itself can be generated by:
  a) clone https://github.com/nodejs/undici
  b) cd to undici
  c) check out the corresponding version as identified in appropriate update in Node.js repo - https://github.com/nodejs/node/commit/dace5a804ead8bda32bf3d31af8323ede9e0b0a8 which identifies the specific hash that was used to build the file 
  d) npm install 
  e) npm run build:node
3) When I compared the result of doing that against what was checked it, it diff reported it as exactly the same
4) There is package-lock.json in the undici repo which I think would ensure the same versions of the tools were used to build the file

From that it seems like there is a reasonable likelihood that the same file content would result each time.

I repeated the process on one of the community s390 machines (which is BE) and it worked and generated the same file.

Comment 3 Michael Dawson 2022-03-03 23:07:01 UTC
Digging a bit deeper, I think the generation of the undici.js file only bundles/packages JavaScript and WASM that was already built/checked into the undici repository in https://github.com/nodejs/undici/tree/main/lib/llhttp.

Building the WASM itself include:

1) The C source for llhttp (which is vendored in from (https://github.com/nodejs/llhttp)  is in https://github.com/nodejs/undici/tree/main/deps/llhttp
2) The docker file https://github.com/nodejs/undici/blob/main/build/Dockerfile also pulls in a version of wasi-sdk
3) running npm run build:wasm rebuilds the wasm in a docker container and copies back out to the external file system (I had trouble with this due to userid mismatch)

Once I ran this I ended up with wasm files that diff told me were the same as the ones that were checked in.


The net of this however, is that suggesting that the WASM be built as part of the Node.js build process is not going to be reasonable. Not even considering the added complexity it would add to the overall build process, it requires docker, and that does not run on all of the supported platforms.

Comment 4 Michael Dawson 2022-03-03 23:30:34 UTC
A few questions I'd have so far:

1) Would having a copy of the undici repo vectored into the Node.js repo improve the situation? the undici repo is already in the Node.js org and it seems like the hash for the version in the undici repo is available. I think that is what the upstream issue https://github.com/nodejs/node/issues/42199 is asking for but not quite sure how it helps. 

2) How do we figure out if this case falls into https://fedoraproject.org/wiki/User:Vondruch/Draft_RawhideGuidelines#Use_of_pregenerated_code, or https://fedoraproject.org/wiki/User:Vondruch/Draft_RawhideGuidelines#No_inclusion_of_pre-built_binaries_or_libraries?  I can see arguments for either

Comment 5 Honza Horak 2022-03-04 08:13:19 UTC
(In reply to Honza Horak from comment #0)
> Draft_RawhideGuidelines#No_inclusion_of_pre-built_binaries_or_libraries

This is likely a better link for this:
https://docs.fedoraproject.org/en-US/packaging-guidelines/what-can-be-packaged/#prebuilt-binaries-or-libraries

Comment 6 Michael Dawson 2022-03-04 15:47:47 UTC
Looking at the motivations from the link:

> Compiler Flags: Pre-packaged program binaries and program libraries not built from the source code were probably not compiled with standard Fedora compiler flags for security and optimization.

I don't think this one applies in the WASM case as as there should be no Fedora specific flags etc. as the same WASM should just run across any platform and there are no OS specific options as farm as I know.

> Security: Pre-packaged program binaries and program libraries not built from the source code could contain parts that are malicious, dangerous, or just broken. Also, these are functionally impossible to patch.

This seems to be the part that would be a concern. The impossible to patch, is valid unless we have validated that we know how to build/replace. The malicious, dangerous part I assume is because it's hard/impossible to review what was committed. In the current context it is possible to review the original source since it's available so the concern would only be that whoever build/PR'd in the updated file maliciously added files that do not match the source that is claimed to be what was used. broken I don't think applies, since the functionality will be tested by the community test suite just as well as if the c code had been compiled instead of the process to build the WASM added.

Comment 7 Stephen Gallagher 2022-03-04 16:20:29 UTC
An option might be to build a separate undici package and have the Node.js packages BuildRequires: those to pull in the binaries produced from that package.


(In reply to Michael Dawson from comment #6)
> Looking at the motivations from the link:
> 
> > Compiler Flags: Pre-packaged program binaries and program libraries not built from the source code were probably not compiled with standard Fedora compiler flags for security and optimization.
> 
> I don't think this one applies in the WASM case as as there should be no
> Fedora specific flags etc. as the same WASM should just run across any
> platform and there are no OS specific options as farm as I know.
> 

Maybe none *today*, but that doesn't mean never.

> > Security: Pre-packaged program binaries and program libraries not built from the source code could contain parts that are malicious, dangerous, or just broken. Also, these are functionally impossible to patch.
> 
> This seems to be the part that would be a concern. The impossible to patch,
> is valid unless we have validated that we know how to build/replace.

A patch to replace one pre-compiled binary with another is itself problematic (both in terms of the size of the patch and that it will ALSO be running afoul of this policy)

> The
> malicious, dangerous part I assume is because it's hard/impossible to review
> what was committed. In the current context it is possible to review the
> original source since it's available so the concern would only be that
> whoever build/PR'd in the updated file maliciously added files that do not
> match the source that is claimed to be what was used.

Or that the compiler used was maliciously replaced, etc.

> broken I don't think
> applies, since the functionality will be tested by the community test suite
> just as well as if the c code had been compiled instead of the process to
> build the WASM added.

In this case, it usually means something like "it works fine in the upstream environment but doesn't play well with older/newer runtime libraries".

Comment 8 Michael Dawson 2022-03-04 18:19:25 UTC
> In this case, it usually means something like "it works fine in the upstream environment but doesn't play well with older/newer runtime libraries".

I don't quite follow. WASM runs in the V8 runtime and would not directly depend on any older/newer libraries

Comment 9 Michael Dawson 2022-03-04 18:22:40 UTC
> Maybe none *today*, but that doesn't mean never.

The whole purpose of WASM is that is be able to run across different environments. Any OS specific flags, etc. I think would apply to the runtime (V8 within Node.js in this case) on which it runs, otherwise the whole model is broken.

Comment 10 Michael Dawson 2022-03-07 16:34:51 UTC
https://github.com/nodejs/node/pull/42246 is adding the source code to the core repo itself, and a script that automates updates.

Comment 11 Tomas Hoger 2022-03-16 15:31:31 UTC
file / libmagic identifies wasm files by having \0asm as first four bytes:

https://github.com/file/file/blob/master/magic/Magdir/webassembly

Base64 encoded versions would start with AGFzb.  Searching for strings (or files) starting with that may be a way to detect wasm blobs being added to nodejs sources.

Comment 12 Stephen Gallagher 2022-03-17 16:02:25 UTC
From 16.14.0:

```
$ find . -path ./.git -prune -o -print -exec file {} \; |grep WebAssembly
./deps/v8/test/fuzzer/wasm/regress-1115280.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/test/fuzzer/wasm/regress-1127717.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/test/fuzzer/wasm/regress-1191853.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/test/fuzzer/wasm_async/regression-761784.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/test/fuzzer/wasm_async/valid.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/test/fuzzer/wasm_async/regress-1115431.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/test/mjsunit/wasm/incrementer.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/third_party/wasm-api/example/callback.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/third_party/wasm-api/example/finalize.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/third_party/wasm-api/example/global.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/third_party/wasm-api/example/hello.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/third_party/wasm-api/example/memory.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/third_party/wasm-api/example/reflect.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/third_party/wasm-api/example/serialize.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/third_party/wasm-api/example/table.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/third_party/wasm-api/example/threads.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/third_party/wasm-api/example/trap.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/third_party/wasm-api/example/hostref.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/third_party/wasm-api/example/multi.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./deps/v8/third_party/wasm-api/example/start.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/fixtures/es-modules/simple.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/fixtures/shared-memory.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/fixtures/simple.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/cant_dotdot.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/clock_getres.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/exitcode.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/fd_prestat_get_refresh.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/follow_symlink.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/getentropy.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/getrusage.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/gettimeofday.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/notdir.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/preopen_populates.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/read_file.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/read_file_twice.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/stdin.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/symlink_escape.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/symlink_loop.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/write_file.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/create_symlink.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/freopen.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/link.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/main_args.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/ftruncate.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/poll.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/readdir.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
./test/wasi/wasm/stat.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
```

And searching for strings:
```
$ grep -lr AGFzb 
deps/cjs-module-lexer/dist/lexer.js
deps/cjs-module-lexer/dist/lexer.mjs
```

From that, it doesn't look as if any of the *.wasm files are actually used at runtime. I think it's just the cjs-module-lexer embedded code that we need to worry about.

Comment 13 Michael Dawson 2022-05-17 22:01:08 UTC
Current search with "grep -lr AGFzb" on community source shows:

```
cjs-module-lexer/dist/lexer.mjs
cjs-module-lexer/dist/lexer.js
undici/src/lib/llhttp/llhttp_simd.wasm.js
undici/src/lib/llhttp/llhttp.wasm.js
undici/undici.js
```

which I think does capture the instances of wasm at this point.

Comment 15 Ben Cotton 2022-08-09 13:13:35 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 37 development cycle.
Changing version to 37.