Bug 1998924

Summary: brp-mangle-shebangs doesn't mangle shebangs of javascript (and other) executables, as their mime type does not match text/.*
Product: [Fedora] Fedora Reporter: Neal Gompa <ngompa13>
Component: redhat-rpm-configAssignee: Miro Hrončok <mhroncok>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhideCC: ajax, ffesti, fweimer, igor.raits, jonathan, j, mhroncok, pmatilai, sergio, torsava, zsvetlik
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: redhat-rpm-config-206-1.fc36 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-12-08 18:27:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2030427    

Description Neal Gompa 2021-08-30 02:03:42 UTC
Description of problem:
brp-mangle-shebangs seems to incorrectly detect the shebang for the yarn program in yarnpkg and do nothing useful.

It prints the following in the build log:
> mangling shebang in /usr/lib/node_modules/yarn/bin/yarn from /bin/sh to #!/usr/bin/sh

The shebang in yarn.js doesn't get detected at all, for some reason, and it never gets fixed from "#!/usr/bin/env node" to "#!/usr/bin/node".

Version-Release number of selected component (if applicable):
192-1.fc35

How reproducible:
Always

Steps to Reproduce:
1. Rebuild yarnpkg
2. Install built yarnpkg RPM

Actual results:
Nodejs isn't installed with it, so yarn doesn't work.

Expected results:
Nodejs is installed with it, and yarn works.

Additional info:
The build logs can be seen from this build: https://koji.fedoraproject.org/koji/buildinfo?buildID=1805526

Comment 1 Neal Gompa 2021-08-30 02:23:26 UTC
I'm making a workaround PR for this in yarnpkg for now: https://src.fedoraproject.org/rpms/yarnpkg/pull-request/1

Comment 2 Miro Hrončok 2021-08-30 08:43:33 UTC
I am getting a lot of:

npm WARN old lockfile FetchError: request to https://registry.npmjs.org/yn failed, reason: getaddrinfo EAI_AGAIN registry.npmjs.org
npm WARN old lockfile     at ClientRequest.<anonymous> (/usr/lib/node_modules/npm/node_modules/minipass-fetch/lib/index.js:97:14)
npm WARN old lockfile     at ClientRequest.emit (node:events:394:28)
npm WARN old lockfile     at TLSSocket.socketErrorListener (node:_http_client:447:9)
npm WARN old lockfile     at TLSSocket.emit (node:events:406:35)
npm WARN old lockfile     at emitErrorNT (node:internal/streams/destroy:157:8)
npm WARN old lockfile     at emitErrorCloseNT (node:internal/streams/destroy:122:3)
npm WARN old lockfile     at processTicksAndRejections (node:internal/process/task_queues:83:21)
npm WARN old lockfile  Could not fetch metadata for yn.0 FetchError: request to https://registry.npmjs.org/yn failed, reason: getaddrinfo EAI_AGAIN registry.npmjs.org
npm WARN old lockfile     at ClientRequest.<anonymous> (/usr/lib/node_modules/npm/node_modules/minipass-fetch/lib/index.js:97:14)
npm WARN old lockfile     at ClientRequest.emit (node:events:394:28)
npm WARN old lockfile     at TLSSocket.socketErrorListener (node:_http_client:447:9)
npm WARN old lockfile     at TLSSocket.emit (node:events:406:35)
npm WARN old lockfile     at emitErrorNT (node:internal/streams/destroy:157:8)
npm WARN old lockfile     at emitErrorCloseNT (node:internal/streams/destroy:122:3)
npm WARN old lockfile     at processTicksAndRejections (node:internal/process/task_queues:83:21) {
npm WARN old lockfile   code: 'EAI_AGAIN',
npm WARN old lockfile   errno: 'EAI_AGAIN',
npm WARN old lockfile   syscall: 'getaddrinfo',
npm WARN old lockfile   hostname: 'registry.npmjs.org',
npm WARN old lockfile   type: 'system'
npm WARN old lockfile }

Which makes it hard to reproduce, but hopefully the mockbuild will eventually move on.

Comment 3 Miro Hrončok 2021-08-30 09:29:02 UTC
So, the script has:

cd "$RPM_BUILD_ROOT"

# Large packages such as kernel can have thousands of executable files.
# We take care to not fork/exec thousands of "file"s and "grep"s,
# but run just two of them.
# (Take care to exclude filenames which would mangle "file" output).
find -executable -type f ! -path '*:*' ! -path $'*\n*' \
| file -N --mime-type -f - \
| grep -P ".+(?=: text/)" \
| {
... while ....
}



And I've checked that list on yarnpkg's buildroot. It does not contain /usr/lib/node_modules/yarn/bin/yarn.js at all and so it does not contain many others, like /usr/lib/node_modules/yarn/node_modules/uri-js/dist/es5/uri.all.js.map.


The culprit is in `grep -P ".+(?=: text/)"`: That grep filters out many valid use cases, such as:

./usr/lib/node_modules/yarn/bin/yarn.js: application/javascript
./usr/lib/node_modules/yarn/node_modules/uri-js/dist/es5/uri.all.js.map: application/json


So, the mime type detection here is too clever and all files not detected as "text/" are ignored completely, which exactly does not work for executable things, such as nodejs scripts.


I wonder whether we should "simplify" the text/binary detection by using `file --parameter bytes=8` -- my experiments show that this ideal amount of bytes to either detect application/octet-stream or text/plain. As a side effect, it could also make the script faster.

I am only afraid of the impact. This needs a copr impact check.

Comment 4 Florian Weimer 2021-08-30 09:55:53 UTC
Look at what we did with eu-elfclass.  file --mime-type is just not very reliable.

If you want to rewrite shell paths, you should check if a shell path is present at the start of the script, and not rely on some fuzzy notion of MIME type.

Comment 5 Miro Hrončok 2021-08-30 10:08:26 UTC
> If you want to rewrite shell paths, you should check if a shell path is present...

We do, but we only check "text files" -- we need to distinguish between "executable text file without shebang" (we strip the executable bit) and "executable binary file" (we leave it alone).

I'll check out eu-elfclass and see what we do there. Thanks.

Comment 6 Florian Weimer 2021-08-30 10:11:07 UTC
Sorry, typo: eu-elfclassify

It's just a C program that actually parses the file (with a real ELF parser) to determine what it's type is, rather than using the guesses that libmagic makes. (There are still heuristics, but that's to do with ELF, we're not cutting any corners.)

Comment 8 Miro Hrončok 2021-09-22 11:24:13 UTC
I've started a thread on devel (and packaging) list: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/K3QCBUXYR6ZA34I777X6F2RYJKKECJLM/

Comment 9 Sergio Basto 2021-09-25 02:39:44 UTC
(In reply to Miro Hrončok from comment #3)
> So, the script has:
> 
> cd "$RPM_BUILD_ROOT"
> 
> # Large packages such as kernel can have thousands of executable files.
> # We take care to not fork/exec thousands of "file"s and "grep"s,
> # but run just two of them.
> # (Take care to exclude filenames which would mangle "file" output).
> find -executable -type f ! -path '*:*' ! -path $'*\n*' \
> | file -N --mime-type -f - \
> | grep -P ".+(?=: text/)" \
> | {
> ... while ....
> }
> 
> 
> 
> And I've checked that list on yarnpkg's buildroot. It does not contain
> /usr/lib/node_modules/yarn/bin/yarn.js at all and so it does not contain
> many others, like
> /usr/lib/node_modules/yarn/node_modules/uri-js/dist/es5/uri.all.js.map.
> 
> 
> The culprit is in `grep -P ".+(?=: text/)"`: That grep filters out many
> valid use cases, such as:
> 
> ./usr/lib/node_modules/yarn/bin/yarn.js: application/javascript
> ./usr/lib/node_modules/yarn/node_modules/uri-js/dist/es5/uri.all.js.map:
> application/json
> 
> 
> So, the mime type detection here is too clever and all files not detected as
> "text/" are ignored completely, which exactly does not work for executable
> things, such as nodejs scripts.

instead use file --mime-type you may use --mime-encoding or none option 

file /usr/lib/node_modules/yarn/bin/yarn.js  -b
Node.js script, ASCII text executable

file /usr/lib/node_modules/yarn/bin/yarn.js --mime-encoding -b
us-ascii

is mime-encoding that is not binary, I think