Bug 1773216

Summary: speed up /usr/lib/rpm/redhat/brp-mangle-shebangs
Product: [Fedora] Fedora Reporter: Denys Vlasenko <dvlasenk>
Component: redhat-rpm-configAssignee: Florian Festi <ffesti>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: medium    
Version: rawhideCC: ajax, ffesti, fweimer, igor.raits, john.j5live, jonathan, j, mjw, packaging-team-maint, pmatilai, pmoravco, praiskup, vmukhame
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-12-03 14:35:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Denys Vlasenko 2019-11-16 16:58:48 UTC
For kernel builds, /usr/lib/rpm/redhat/brp-mangle-shebangs runs for 25 seconds.
It's mainly caused by this loop executing "file" and "grep" for every file in the tree:

fail=0
while IFS= read -r -d $'\0' f; do
  file -N --mime-type "$f" | grep -q -P ".+(?=: text/)" || continue
...
done < <(find -executable -type f -print0)

exit $fail

The following replacement runs one "file" and one "grep" process to examine all files:

find -executable -type f ! -path '*:*' ! -path $'*\n*' \
| file -N --mime-type -f - \
| grep -P ".+(?=: text/)" \
| {
fail=0
while IFS= read -r line; do
  f=${line%%:*}
...
done

exit $fail
}

The complicated "find" invocation is skipping any files with colons or newlines in their names.

Tested by me to work, and completes in 3 seconds.

Comment 1 Denys Vlasenko 2019-11-16 17:11:27 UTC
This script also has another issue: unsafe/incorrect command expansion:

trim() {
  printf '%s' "$*"
}
...
  read shebang_line < "$f" || :
  orig_shebang=$(trim $(echo "$shebang_line" | grep -Po "#!\K.*" || echo))

The "trimming", i.e. replacement of multiple spaces and removal of leading and trailing spaces, is achieved because "trim $(cmd)" construct has an unquoted $(), which is subject to word splitting.
BUT!
It is also subject to glob expansion - any ?s and *s will be attempted to be expanded as well - definitely NOT what we want!

I propose to replace this code

  read shebang_line < "$f" || :
  orig_shebang=$(trim $(echo "$shebang_line" | grep -Po "#!\K.*" || echo))
  shebang="$orig_shebang"
  if [ -n "$exclude_shebangs" ]; then
    echo "$shebang" | grep -q -E "$exclude_shebangs" && continue
  fi
  if [ -n "$exclude_shebangs_from" ]; then
    echo "$shebang" | grep -q -E -f "$exclude_shebangs_from" && continue
  fi

  if [ -z "$shebang" ]; then
    echo >&2 "*** WARNING: $f is executable but has empty or no shebang, removing executable bit"
    chmod -x "$f"
    touch -d "$ts" "$f"
    continue
  elif [ -n "${shebang##/*}" ]; then
    echo >&2 "*** ERROR: $f has shebang which doesn't start with '/' ($shebang)"
    fail=1
    continue
  fi

with a code which avoids the expansion issue, and which does not spawn any subprocesses for string manipulations (at the cost of being more elaborate):

  read shebang_line < "$f"
  orig_shebang="${shebang_line#\#!}"
  if [ "$orig_shebang" = "$shebang_line" ]; then
    echo >&2 "*** WARNING: $f is executable but has no shebang, removing executable bit"
    chmod -x "$f"
    touch -d "$ts" "$f"
    continue
  fi

  # Trim spaces
  while shebang="${orig_shebang//  / }"; [ "$shebang" != "$orig_shebang" ]; do
    orig_shebang="$shebang"
  done
  # Treat "#! /path/to " as "#!/path/to"
  orig_shebang="${orig_shebang# }"

  shebang="$orig_shebang"

  if [ -z "$shebang" ]; then
    echo >&2 "*** WARNING: $f is executable but has empty shebang, removing executable bit"
    chmod -x "$f"
    touch -d "$ts" "$f"
    continue
  fi

Comment 2 Denys Vlasenko 2019-11-16 17:13:19 UTC
while we are at it, running "stat" tool for every file:

ts=$(stat -c %y "$f")

is not necessary, can improve this by running it only when we know we will be using $ts value (it is used to "fix up" mtime of the files if we modify them).

Comment 3 Panu Matilainen 2019-11-20 08:02:11 UTC
Moving to redhat-rpm-config where brp-mangle-shebangs comes from.

Nice speed-ups. The rpm-team has other priorities at this time though, the best way to drive this forward is to submit a PR on redhat-rpm-config.

Comment 4 Igor Raits 2019-11-20 11:07:11 UTC
Yes, please send PR to redhat-rpm-config.

Comment 5 Denys Vlasenko 2019-11-21 11:17:03 UTC
(In reply to Igor Gnatenko from comment #4)
> Yes, please send PR to redhat-rpm-config.

I don't understand the above. Do you want me to open another BZ/ticket/issue/whatever somewhere else? Where?

Comment 6 Florian Festi 2019-11-21 11:36:09 UTC
redhat-rpm-config is special as the Fedora distgit is the actual upstream repository. So changes to redhat-rpm-config can be made my directly open pull requests against the Fedora package.

Comment 8 Panu Matilainen 2019-12-03 14:35:38 UTC
PR merged now, closing. Thanks for the patch!