Bug 1392236

Summary: systemd-232 broke composing
Product: [Fedora] Fedora Reporter: Dennis Gilmore <dennis>
Component: systemdAssignee: systemd-maint
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhideCC: johannbg, kevin, lnykryn, mschmidt, msekleta, muadda, ssahani, s, systemd-maint, zbyszek
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-01-11 20:40:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
mock output from lorax run with systemd-232-4.fc26
none
kernel-install: avoid process substitution none

Description Dennis Gilmore 2016-11-06 16:18:52 UTC
Description of problem:
I have untagged all of the systemd-232 builds from rawhide as they broke the composes. lorax builds the runtime environment for anaconda. some changes in systemd-232 resulted in the kernel being installed in the bootloaderspec locations and not the traditional locations resulting in depmod being unable to find the files necessary as they were in different locations.


PLease fix up systemd packaging to ensure that it works as it did previously.

Comment 1 Zbigniew Jędrzejewski-Szmek 2016-11-06 16:26:28 UTC
Can you point me at some logs, or the failed image, or whatever? Or a command-line that I can run to reproduce the problem?

What systemd rpm did this use? There is systemd-232-1.fc26 and systemd-232-2.fc26 and systemd-232-3.fc26.
Was grubby installed, what version?

Comment 3 Zbigniew Jędrzejewski-Szmek 2016-11-07 05:56:54 UTC
232-1 was broken because /usr/bin/kernel-install had a syntax error
232-2 was disliked by rpm because it had %ghost %dir
232-3 was broken because it had %pre/%post scripts with unexpanded macros, because it was built in a chroot with systemd-232-2, which rpm couldn't unpack properly. Since you have untagged 232-2, simply building systemd-232-4 will be enough to fix the %pre/%post scripts.

I built 232-4 as scratch in koji [1], and it seems to work fine. In particular the modules and kernel are installed in the expected location, depmod works. So maybe everything is OK now. If you don't object, I'll rebuild 232-4 officially tomorrow, to have it in another compose.

[1] http://koji.fedoraproject.org/koji/taskinfo?taskID=16328109

Comment 4 Dennis Gilmore 2016-11-07 15:03:28 UTC
sure, worth a try

Comment 5 Dennis Gilmore 2016-11-08 18:24:11 UTC
Created attachment 1218644 [details]
mock output from lorax run with systemd-232-4.fc26

with systemd-232-4.fc26 installed depmod failed. something in the logic changes for kernel-install is not right

Comment 6 Kevin Fenzi 2016-11-08 18:33:35 UTC
It looks to me like it might be installing using the bootloader spec and then loraxes depmod fails because System.map isn't in the right place. 

Did anything change around how it detects if it should use that or not?

Comment 7 Zbigniew Jędrzejewski-Szmek 2016-11-08 18:55:35 UTC
The way that kernel-install detects what to do was reimplemented: there was an ugly downstream patch for grubby support in Fedora, and that was replaced by upstream logic to allow "plugins" to skip other plugins, and a Fedora "plugin" to use that if grubby is detected. But the end result was supposed to be exactly the same.

It would be good to look into this lorax chroot and see why it's failing... In the attached logs there are no details.

Comment 8 Michal Schmidt 2016-12-16 17:17:08 UTC
Created attachment 1232648 [details]
kernel-install: avoid process substitution

The problem is the RPM scriptlets (and hence kernel-install) are run in a chroot with no filesystems mounted, particularly no /dev and no /proc.

kernel-install relies on bash process substitution, e.g.:
===
dropindirs_sort()
{
#...
    readarray -t files < <(
        for d in "$@"; do
            for i in "$d/"*"$suffix"; do
                if [[ -e "$i" ]]; then
                    echo "${i##*/}"
                fi
            done
        done | sort -Vu
    )
#...
}
#...
readarray -t PLUGINS < <(
    dropindirs_sort ".install" \
        "/etc/kernel/install.d" \
        "/usr/lib/kernel/install.d"
)
===

bash implements process substitution using /dev/fd/N. (This is even documented in man bash. It also mentions named pipes as an alternative implementation, but that's a build-time decision. /dev/fd/N is used if it's detected by bash's ./configure.)

Attached is a patch to use here-strings instead.

Comment 9 Michal Schmidt 2016-12-16 17:29:06 UTC
The code using process substitution was present in kernel-install before v232, but it would not be reached in this situation, because the script would exit early, from:

if [[ -x /sbin/new-kernel-pkg ]]; then
#...
    # exit, if we can't find a boot loader spec conforming setup
    if ! [[ -d /boot/loader/entries || -L /boot/loader/entries ]]; then
        exit 0
    fi
fi

Comment 10 Kevin Fenzi 2017-01-02 19:31:34 UTC
Shall we apply this patch and see what happens with composes now? :)

Comment 11 Kevin Fenzi 2017-01-06 19:17:14 UTC
I've gone ahead and added this patch to a build and we are doing a rawhide compose to test it out now.

Comment 12 Kevin Fenzi 2017-01-08 16:56:32 UTC
ok. Applying that and also: 

@@ -460,7 +460,7 @@ if [ -f /etc/nsswitch.conf ] ; then
                 /^hosts:/ !b
                 /\<myhostname\>/ b
                 s/[[:blank:]]*$/ myhostname/
-                ' /etc/nsswitch.conf &>/dev/null
+                ' /etc/nsswitch.conf >/dev/null 2>&1 || :
 
         # remove mymachines from passwd and group lines of /etc/nsswitch.conf
         # https://bugzilla.redhat.com/show_bug.cgi?id=1284325
@@ -470,19 +470,19 @@ if [ -f /etc/nsswitch.conf ] ; then
         sed -i.bak -r -e '
                 s/^(passwd:.*) mymachines$/\1/;
                 s/^(group:.*) mymachines$/\1/;
-                ' /etc/nsswitch.conf &>/dev/null
+                ' /etc/nsswitch.conf >/dev/null 2>&1 || :
 
         # Add [!UNAVAIL=return] after resolve
         grep -E -q '^hosts:.*resolve[[:space:]]*($|[[:alpha:]])' /etc/nsswitch.conf &&
         sed -i.bak -e '
                 /^hosts:/ { s/resolve/& [!UNAVAIL=return]/}
-                ' /etc/nsswitch.conf &>/dev/null
+                ' /etc/nsswitch.conf >/dev/null 2>&1 || :
 
         # Add nss-systemd to passwd and group
         grep -E -q '^(passwd|group):.* systemd' /etc/nsswitch.conf ||
         sed -i.bak -r -e '
                 s/^(passwd|group):(.*)/\1: \2 systemd/
-                ' /etc/nsswitch.conf &>/dev/null
+                ' /etc/nsswitch.conf >/dev/null 2>&1 || :
 fi

gets composes working again. Without the above, there's scriptlet errors and anaconda exits. The other more correct fix for the above would be to add 'Requires(pre): grep sed' so those are installed when the libs post runs. I didn't do that because I wasn't sure if that would cause issues for small images. 

I guess we can close this now. There's some more issues to work though, but composes now work and this version is out in rawhide now. Or we can keep it open to make sure these changes make their way upstream...

Comment 13 Zbigniew Jędrzejewski-Szmek 2017-01-11 20:40:41 UTC
Michal, Kevin, thanks!

I don't understand why "&>/dev/null" does not work. bash(1) describes them as being exactly equivalent. Alas.

> There's some more issues to work though
Can you provide more details?

I submitted the kernel-install patch upstream as https://github.com/systemd/systemd/pull/5068.

Comment 14 Kevin Fenzi 2017-01-11 22:04:30 UTC
Well, the important part is the || : (which makes it always return 0/success. 

If there's no 'Requires(pre): grep sed' then those aren't going to be in the install root when doing an initial install and will error out.

Comment 15 Zbigniew Jędrzejewski-Szmek 2017-01-11 22:29:10 UTC
Ah, OK. You even wrote that in comment #12. I don't think Requires(pre) is needed, since those scriptlets are run %post. I added Requires(post) now.

Comment 16 Zbigniew Jędrzejewski-Szmek 2017-01-11 22:48:19 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #13)
> I don't understand why "&>/dev/null" does not work. bash(1) describes them
> as being exactly equivalent. Alas.
Oh, "&>" is fine. Somehow I didn't see that you also added "|| :". Now I feel stupid. So the change to replace "&>" can be reverted.