Bug 1392236
Summary: | systemd-232 broke composing | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Dennis Gilmore <dennis> | ||||||
Component: | systemd | Assignee: | systemd-maint | ||||||
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||
Severity: | unspecified | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | rawhide | CC: | johannbg, kevin, lnykryn, mschmidt, msekleta, muadda, ssahani, s, systemd-maint, zbyszek | ||||||
Target Milestone: | --- | ||||||||
Target Release: | --- | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2017-01-11 20:40:41 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
Description
Dennis Gilmore
2016-11-06 16:18:52 UTC
Can you point me at some logs, or the failed image, or whatever? Or a command-line that I can run to reproduce the problem? What systemd rpm did this use? There is systemd-232-1.fc26 and systemd-232-2.fc26 and systemd-232-3.fc26. Was grubby installed, what version? http://koji.fedoraproject.org/koji/taskinfo?taskID=16292408 we had different failures with systemd-232-2 due to it failing to install. the behaviour with systemd-232-1 was the same as systemd-232-3 from memory. I spent a whol eday digging into it and hitting different bugs in different states. we were only able to compose by going back to systemd-231 you can look at the various compose logs at https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20161104.n.0/ https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20161104.n.1 https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20161104.n.2/ https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20161104.n.3/ https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20161104.n.4/ https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20161104.n.5/ https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20161105.n.0/ https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20161105.n.1/ 232-1 was broken because /usr/bin/kernel-install had a syntax error 232-2 was disliked by rpm because it had %ghost %dir 232-3 was broken because it had %pre/%post scripts with unexpanded macros, because it was built in a chroot with systemd-232-2, which rpm couldn't unpack properly. Since you have untagged 232-2, simply building systemd-232-4 will be enough to fix the %pre/%post scripts. I built 232-4 as scratch in koji [1], and it seems to work fine. In particular the modules and kernel are installed in the expected location, depmod works. So maybe everything is OK now. If you don't object, I'll rebuild 232-4 officially tomorrow, to have it in another compose. [1] http://koji.fedoraproject.org/koji/taskinfo?taskID=16328109 sure, worth a try Created attachment 1218644 [details]
mock output from lorax run with systemd-232-4.fc26
with systemd-232-4.fc26 installed depmod failed. something in the logic changes for kernel-install is not right
It looks to me like it might be installing using the bootloader spec and then loraxes depmod fails because System.map isn't in the right place. Did anything change around how it detects if it should use that or not? The way that kernel-install detects what to do was reimplemented: there was an ugly downstream patch for grubby support in Fedora, and that was replaced by upstream logic to allow "plugins" to skip other plugins, and a Fedora "plugin" to use that if grubby is detected. But the end result was supposed to be exactly the same. It would be good to look into this lorax chroot and see why it's failing... In the attached logs there are no details. Created attachment 1232648 [details]
kernel-install: avoid process substitution
The problem is the RPM scriptlets (and hence kernel-install) are run in a chroot with no filesystems mounted, particularly no /dev and no /proc.
kernel-install relies on bash process substitution, e.g.:
===
dropindirs_sort()
{
#...
readarray -t files < <(
for d in "$@"; do
for i in "$d/"*"$suffix"; do
if [[ -e "$i" ]]; then
echo "${i##*/}"
fi
done
done | sort -Vu
)
#...
}
#...
readarray -t PLUGINS < <(
dropindirs_sort ".install" \
"/etc/kernel/install.d" \
"/usr/lib/kernel/install.d"
)
===
bash implements process substitution using /dev/fd/N. (This is even documented in man bash. It also mentions named pipes as an alternative implementation, but that's a build-time decision. /dev/fd/N is used if it's detected by bash's ./configure.)
Attached is a patch to use here-strings instead.
The code using process substitution was present in kernel-install before v232, but it would not be reached in this situation, because the script would exit early, from: if [[ -x /sbin/new-kernel-pkg ]]; then #... # exit, if we can't find a boot loader spec conforming setup if ! [[ -d /boot/loader/entries || -L /boot/loader/entries ]]; then exit 0 fi fi Shall we apply this patch and see what happens with composes now? :) I've gone ahead and added this patch to a build and we are doing a rawhide compose to test it out now. ok. Applying that and also: @@ -460,7 +460,7 @@ if [ -f /etc/nsswitch.conf ] ; then /^hosts:/ !b /\<myhostname\>/ b s/[[:blank:]]*$/ myhostname/ - ' /etc/nsswitch.conf &>/dev/null + ' /etc/nsswitch.conf >/dev/null 2>&1 || : # remove mymachines from passwd and group lines of /etc/nsswitch.conf # https://bugzilla.redhat.com/show_bug.cgi?id=1284325 @@ -470,19 +470,19 @@ if [ -f /etc/nsswitch.conf ] ; then sed -i.bak -r -e ' s/^(passwd:.*) mymachines$/\1/; s/^(group:.*) mymachines$/\1/; - ' /etc/nsswitch.conf &>/dev/null + ' /etc/nsswitch.conf >/dev/null 2>&1 || : # Add [!UNAVAIL=return] after resolve grep -E -q '^hosts:.*resolve[[:space:]]*($|[[:alpha:]])' /etc/nsswitch.conf && sed -i.bak -e ' /^hosts:/ { s/resolve/& [!UNAVAIL=return]/} - ' /etc/nsswitch.conf &>/dev/null + ' /etc/nsswitch.conf >/dev/null 2>&1 || : # Add nss-systemd to passwd and group grep -E -q '^(passwd|group):.* systemd' /etc/nsswitch.conf || sed -i.bak -r -e ' s/^(passwd|group):(.*)/\1: \2 systemd/ - ' /etc/nsswitch.conf &>/dev/null + ' /etc/nsswitch.conf >/dev/null 2>&1 || : fi gets composes working again. Without the above, there's scriptlet errors and anaconda exits. The other more correct fix for the above would be to add 'Requires(pre): grep sed' so those are installed when the libs post runs. I didn't do that because I wasn't sure if that would cause issues for small images. I guess we can close this now. There's some more issues to work though, but composes now work and this version is out in rawhide now. Or we can keep it open to make sure these changes make their way upstream... Michal, Kevin, thanks! I don't understand why "&>/dev/null" does not work. bash(1) describes them as being exactly equivalent. Alas. > There's some more issues to work though Can you provide more details? I submitted the kernel-install patch upstream as https://github.com/systemd/systemd/pull/5068. Well, the important part is the || : (which makes it always return 0/success. If there's no 'Requires(pre): grep sed' then those aren't going to be in the install root when doing an initial install and will error out. Ah, OK. You even wrote that in comment #12. I don't think Requires(pre) is needed, since those scriptlets are run %post. I added Requires(post) now. (In reply to Zbigniew Jędrzejewski-Szmek from comment #13) > I don't understand why "&>/dev/null" does not work. bash(1) describes them > as being exactly equivalent. Alas. Oh, "&>" is fine. Somehow I didn't see that you also added "|| :". Now I feel stupid. So the change to replace "&>" can be reverted. |