Bug 1773216 - speed up /usr/lib/rpm/redhat/brp-mangle-shebangs
Summary: speed up /usr/lib/rpm/redhat/brp-mangle-shebangs
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: redhat-rpm-config
Version: rawhide
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
Assignee: Florian Festi
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-11-16 16:58 UTC by Denys Vlasenko
Modified: 2019-12-03 14:35 UTC (History)
13 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-03 14:35:38 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Denys Vlasenko 2019-11-16 16:58:48 UTC
For kernel builds, /usr/lib/rpm/redhat/brp-mangle-shebangs runs for 25 seconds.
It's mainly caused by this loop executing "file" and "grep" for every file in the tree:

fail=0
while IFS= read -r -d $'\0' f; do
  file -N --mime-type "$f" | grep -q -P ".+(?=: text/)" || continue
...
done < <(find -executable -type f -print0)

exit $fail

The following replacement runs one "file" and one "grep" process to examine all files:

find -executable -type f ! -path '*:*' ! -path $'*\n*' \
| file -N --mime-type -f - \
| grep -P ".+(?=: text/)" \
| {
fail=0
while IFS= read -r line; do
  f=${line%%:*}
...
done

exit $fail
}

The complicated "find" invocation is skipping any files with colons or newlines in their names.

Tested by me to work, and completes in 3 seconds.

Comment 1 Denys Vlasenko 2019-11-16 17:11:27 UTC
This script also has another issue: unsafe/incorrect command expansion:

trim() {
  printf '%s' "$*"
}
...
  read shebang_line < "$f" || :
  orig_shebang=$(trim $(echo "$shebang_line" | grep -Po "#!\K.*" || echo))

The "trimming", i.e. replacement of multiple spaces and removal of leading and trailing spaces, is achieved because "trim $(cmd)" construct has an unquoted $(), which is subject to word splitting.
BUT!
It is also subject to glob expansion - any ?s and *s will be attempted to be expanded as well - definitely NOT what we want!

I propose to replace this code

  read shebang_line < "$f" || :
  orig_shebang=$(trim $(echo "$shebang_line" | grep -Po "#!\K.*" || echo))
  shebang="$orig_shebang"
  if [ -n "$exclude_shebangs" ]; then
    echo "$shebang" | grep -q -E "$exclude_shebangs" && continue
  fi
  if [ -n "$exclude_shebangs_from" ]; then
    echo "$shebang" | grep -q -E -f "$exclude_shebangs_from" && continue
  fi

  if [ -z "$shebang" ]; then
    echo >&2 "*** WARNING: $f is executable but has empty or no shebang, removing executable bit"
    chmod -x "$f"
    touch -d "$ts" "$f"
    continue
  elif [ -n "${shebang##/*}" ]; then
    echo >&2 "*** ERROR: $f has shebang which doesn't start with '/' ($shebang)"
    fail=1
    continue
  fi

with a code which avoids the expansion issue, and which does not spawn any subprocesses for string manipulations (at the cost of being more elaborate):

  read shebang_line < "$f"
  orig_shebang="${shebang_line#\#!}"
  if [ "$orig_shebang" = "$shebang_line" ]; then
    echo >&2 "*** WARNING: $f is executable but has no shebang, removing executable bit"
    chmod -x "$f"
    touch -d "$ts" "$f"
    continue
  fi

  # Trim spaces
  while shebang="${orig_shebang//  / }"; [ "$shebang" != "$orig_shebang" ]; do
    orig_shebang="$shebang"
  done
  # Treat "#! /path/to " as "#!/path/to"
  orig_shebang="${orig_shebang# }"

  shebang="$orig_shebang"

  if [ -z "$shebang" ]; then
    echo >&2 "*** WARNING: $f is executable but has empty shebang, removing executable bit"
    chmod -x "$f"
    touch -d "$ts" "$f"
    continue
  fi

Comment 2 Denys Vlasenko 2019-11-16 17:13:19 UTC
while we are at it, running "stat" tool for every file:

ts=$(stat -c %y "$f")

is not necessary, can improve this by running it only when we know we will be using $ts value (it is used to "fix up" mtime of the files if we modify them).

Comment 3 Panu Matilainen 2019-11-20 08:02:11 UTC
Moving to redhat-rpm-config where brp-mangle-shebangs comes from.

Nice speed-ups. The rpm-team has other priorities at this time though, the best way to drive this forward is to submit a PR on redhat-rpm-config.

Comment 4 Igor Raits 2019-11-20 11:07:11 UTC
Yes, please send PR to redhat-rpm-config.

Comment 5 Denys Vlasenko 2019-11-21 11:17:03 UTC
(In reply to Igor Gnatenko from comment #4)
> Yes, please send PR to redhat-rpm-config.

I don't understand the above. Do you want me to open another BZ/ticket/issue/whatever somewhere else? Where?

Comment 6 Florian Festi 2019-11-21 11:36:09 UTC
redhat-rpm-config is special as the Fedora distgit is the actual upstream repository. So changes to redhat-rpm-config can be made my directly open pull requests against the Fedora package.

Comment 8 Panu Matilainen 2019-12-03 14:35:38 UTC
PR merged now, closing. Thanks for the patch!


Note You need to log in before you can comment on or make changes to this bug.