Bug 2018913 - /etc/resolv.conf is not a symlink after kickstart
Summary: /etc/resolv.conf is not a symlink after kickstart
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: distribution
Version: 35
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Packaging Maintenance Team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 2055033 2055070 2164378
Blocks: IoT
 
Reported: 2021-11-01 09:01 UTC by François Rigault
Modified: 2023-08-30 07:38 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-26 15:01:37 UTC
Type: Bug
Embargoed:


Attachments
/var/log/anaconda (852.24 KB, application/gzip)
2021-11-01 09:01 UTC, François Rigault
/tmp/packaging.log from installer (66.61 KB, text/plain)
2021-11-18 08:21 UTC, Radek Vykydal
/tmp/packaging.log from installer (177.92 KB, text/plain)
2021-11-18 08:23 UTC, Radek Vykydal
/tmp/packaging.log from installer environment (67.64 KB, text/plain)
2022-01-17 09:27 UTC, Radek Vykydal
output from rpm -U --deploops (11.93 KB, text/plain)
2022-02-16 09:51 UTC, Zbigniew Jędrzejewski-Szmek


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 2032085 | 1 | unspecified | CLOSED | Some variants are missing /etc/resolv.conf symlink (use systemd-resolved) | 2022-12-28 08:47:36 UTC

Description François Rigault 2021-11-01 09:01:08 UTC
Created attachment 1838811 [details]
/var/log/anaconda

Description of problem:

similar to https://bugzilla.redhat.com/show_bug.cgi?id=1933454
after performing an installation with kickstart, /etc/resolv.conf is not a symlink


Version-Release number of selected component (if applicable):

Fedora-Server-netinst-x86_64-35_Beta-1.2.iso


How reproducible:

Every time I tried


Steps to Reproduce:
1. perform a kickstart installation
2. cat /etc/resolv.conf 

Actual results:
cat /etc/resolv.conf 
# Generated by NetworkManager
nameserver 10.224.122.1



Expected results:
/etc/resolv.conf should be a symlink to /run/systemd/resolve/stub-resolv.conf
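The expected state can be checked mechanically. The following is a minimal sketch, not part of the original report: the function name resolv_conf_state and the returned labels are hypothetical, and it only assumes the target path named above.

```python
import os

# Target named in the expected results above.
EXPECTED_TARGET = "/run/systemd/resolve/stub-resolv.conf"


def resolv_conf_state(path="/etc/resolv.conf", expected=EXPECTED_TARGET):
    """Classify path as 'symlink-ok', 'symlink-other', 'regular-file' or 'missing'.

    Hypothetical helper for illustration only.
    """
    if os.path.islink(path):
        target = os.readlink(path)
        # A relative link target is interpreted relative to the link's directory;
        # an absolute target is returned unchanged by os.path.join.
        resolved = os.path.normpath(os.path.join(os.path.dirname(path), target))
        return "symlink-ok" if resolved == expected else "symlink-other"
    if os.path.exists(path):
        return "regular-file"
    return "missing"
```

On the system described in this report, where the file starts with "# Generated by NetworkManager", resolv_conf_state() would be expected to report a regular file rather than a symlink.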

Additional info:
service seems otherwise well started and fine

[frigo@fedows ~]$ dig @127.0.0.53 www.example.com +short
93.184.216.34
[frigo@fedows ~]$ resolvectl status
Global
         Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
  resolv.conf mode: foreign
Current DNS Server: 10.224.122.1
       DNS Servers: 10.224.122.1

Link 2 (enp1s0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 10.224.122.1
       DNS Servers: 10.224.122.1



Kickstart file:

lang en_US
keyboard us
timezone Europe/Monaco --utc
rootpw XXX --iscrypted
#platform x86, AMD64, or Intel EM64T
url --mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=fedora-35&arch=x86_64
repo --name=updates
user --name=frigo --groups=wheel --iscrypted --password=XXX
sshkey --username=frigo "ssh-rsa XXX"
network --bootproto=dhcp --device=enp1s0 --hostname=${name}
reboot
#text
#cdrom
zerombr
# clearpart --all --initlabel
autopart
selinux --enforcing
firewall --enabled
firstboot --disable
services --enabled sshd
%packages
@^workstation-product-environment
-nano
-nano-default-editor
vim
vim-default-editor
tcpdump
nc
tmux
strace
ethtool
mozilla-ublock-origin
%end
%post
echo "frigo ALL=(ALL)       NOPASSWD: ALL" | /usr/bin/tee /etc/sudoers.d/frigo
%end

Comment 1 François Rigault 2021-11-02 12:42:46 UTC
Actually, I also run a Fedora-Cloud-Base-35_Beta-1.2.x86_64.raw.xz image that has the same behavior: /etc/resolv.conf is a file managed by NetworkManager instead of a symlink.
Am I wrong in thinking /etc/resolv.conf should be a symlink there too?

Comment 2 Radek Vykydal 2021-11-12 08:45:45 UTC
In Fedora 34 systemd used to replace NetworkManager's /etc/resolv.conf via post install script of systemd package.

In Fedora 35 systemd-resolved service was split into a separate systemd-resolved package. The package/service is missing in F35 installer environment and I think it is the reason why the replacement in post install script of systemd-resolved is not triggered (systemctl -q is-enabled systemd-resolved.service &>/dev/null fails).
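To illustrate the guard described above, here is a rough Python model of a post-install step that only creates the symlink when "systemctl -q is-enabled systemd-resolved.service" succeeds. This is a sketch under assumptions, not the actual scriptlet from systemd.spec: the function names, the relative link target and the unconditional replacement once the guard passes are all hypothetical simplifications.

```python
import os
import subprocess

# Assumption: a relative target so the link also resolves inside a chroot.
STUB_TARGET = "../run/systemd/resolve/stub-resolv.conf"


def service_enabled(unit, runner=subprocess.run):
    """Return True only if `systemctl -q is-enabled UNIT` exits with status 0."""
    try:
        result = runner(
            ["systemctl", "-q", "is-enabled", unit],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # systemctl is not available: the guard fails without any visible output.
        return False
    return result.returncode == 0


def maybe_link_resolv_conf(etc_dir, enabled):
    """Replace ETC_DIR/resolv.conf with a symlink, but only when the guard passed."""
    if not enabled:
        return False
    path = os.path.join(etc_dir, "resolv.conf")
    if os.path.lexists(path):
        os.remove(path)
    os.symlink(STUB_TARGET, path)
    return True
```

The point of the sketch is that a failing guard leaves NetworkManager's file untouched and reports nothing, which matches the symptom in this bug.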

I'd like to ask systemd (and perhaps NM) developers how we should proceed here.

1. One possible fix could be adding the systemd-resolved package to installer environment and enabling the service (not tested yet) but isn't there some more future-proof way to make sure the symlink is created during installation ?

2. On a related note, we are still using NetworkManager generated resolv.conf in installer environment. Should we migrate to systemd-resolved ?
One installer specific thing here is that we are copying /etc/resolv.conf from installer environment to installed system root before installing packages (the reason according to the comment in the source code: "# make name resolution work for rpm scripts in chroot")
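As a minimal sketch of the copy step mentioned above (not Anaconda's actual code; the function name copy_resolv_conf and its parameters are hypothetical), the following shows how copying /etc/resolv.conf into the target root yields a regular file there, even when the source is itself a symlink.

```python
import os
import shutil


def copy_resolv_conf(sysroot, source="/etc/resolv.conf"):
    """Copy the installer's resolv.conf into SYSROOT/etc and return the new path."""
    dest = os.path.join(sysroot, "etc", "resolv.conf")
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    # shutil.copyfile follows symlinks by default, so the content is copied
    # and dest ends up as a regular file, not as a link.
    shutil.copyfile(source, dest)
    return dest
```

With such a copy in place, the installed system starts out with a plain file at /etc/resolv.conf, so any later step has to replace it explicitly to get the symlink.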

Comment 3 Radek Vykydal 2021-11-12 15:20:35 UTC
(In reply to Radek Vykydal from comment #2)
> In Fedora 34 systemd used to replace NetworkManager's /etc/resolv.conf via
> post install script of systemd package.
> 
> In Fedora 35 systemd-resolved service was split into a separate
> systemd-resolved package. The package/service is missing in F35 installer
> environment and I think it is the reason why the replacement in post install
> script of systemd-resolved is not triggered (systemctl -q is-enabled
> systemd-resolved.service &>/dev/null fails).

Interestingly, when installing Fedora 34 with Fedora 35 installer, the symlink is created, although it is dangling (perhaps because systemd-resolved is not present on F35 installer image) causing a crash - see https://bugzilla.redhat.com/show_bug.cgi?id=2019579#c18

Comment 4 Zbigniew Jędrzejewski-Szmek 2021-11-16 07:46:50 UTC
(In reply to Radek Vykydal from comment #2)
> In Fedora 35 systemd-resolved service was split into a separate
> systemd-resolved package. The package/service is missing in F35 installer
> environment and I think it is the reason why the replacement in post install
> script of systemd-resolved is not triggered (systemctl -q is-enabled
> systemd-resolved.service &>/dev/null fails).

Arrrgh, package splits are hard. This is a great explanation. I think this is all
true, with a minor correction: since systemd-resolved.rpm is not installed at all,
the scriptlet does not run at all (it's been moved to the new subpackage too).

> I'd like to ask systemd (and perhaps NM) developers how we should proceed
> here.
> 
> 1. One possible fix could be adding the systemd-resolved package to
> installer environment and enabling the service (not tested yet) but isn't
> there some more future-proof way to make sure the symlink is created during
> installation ?

Another option would be to not create the symlink at all (or remove it at some
point). It should then be created by tmpfiles in early boot.

I'm not sure what the best solution here is. From the systemd side, I very much
hope we can keep the split into various subpackages. In fact, there are near-future
plans to split out more subpackages for the purposes of minimization.

> 2. On a related note, we are still using NetworkManager generated
> resolv.conf in installer environment. Should we migrate to systemd-resolved ?
> One installer specific thing here is that we are copying /etc/resolv.conf
> from installer environment to installed system root before installing
> packages (the reason according to the comment in the source code: "# make
> name resolution work for rpm scripts in chroot")

I think this would be the way to go. It'd be better to have the same environment
here as in a default Fedora installation.

I don't know how the chroot is set up in this case, but nss-resolve uses
/run/dbus/system_bus_socket, so that'd have to be provided in the chroot
(or systemd-resolved should be running in the chroot).

Comment 5 Radek Vykydal 2021-11-18 08:11:26 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #4)
> (In reply to Radek Vykydal from comment #2)
> > In Fedora 35 systemd-resolved service was split into a separate
> > systemd-resolved package. The package/service is missing in F35 installer
> > environment and I think it is the reason why the replacement in post install
> > script of systemd-resolved is not triggered (systemctl -q is-enabled
> > systemd-resolved.service &>/dev/null fails).
> 
> Arrrgh, package splits are hard. This is a great explanation. I think this
> is all
> true, with a minor correction: since systemd-resolved.rpm is not installed
> at all,
> the scriptlet does not run at all (it's been moved to the new subpackage
> too).

Actually, systemd-resolved.rpm is installed on the target system, but for some reason the scriptlet is not run in the installer chroot.

When installing Fedora 34 tree (before the split, where the scriptlet is in systemd package) with Fedora 35 installer image (does not contain systemd-resolved) the symlink is created (but it is dangling I think because systemd-resolved service is not present in F35 installer environment, which results in https://bugzilla.redhat.com/show_bug.cgi?id=2019579#c18)

I also tried installing F35 with the F34 installer, and in that case the symlink is not created. So it seems like a combination of issues both in the installer environment (missing systemd-resolved?) and with rpm postinstall scripts (I haven't found out yet why the systemd-resolved postinstall script is not run).

I'll attach packaging logs for the aforementioned cases, but I need to investigate more here. It seems to me that there is a problem in running the scriptlet of the split-out package because:
- F34 installer installing F35 does not create symlink
- when I add and enable systemd-resolved service to F35 installer (although manually via updates.img so I am not 100% sure there is no issue here, I'll check it with properly created installer image with systemd-resolved) installing F35 does not create symlink either

> 
> > I'd like to ask systemd (and perhaps NM) developers how we should proceed
> > here.
> > 
> > 1. One possible fix could be adding the systemd-resolved package to
> > installer environment and enabling the service (not tested yet) but isn't
> > there some more future-proof way to make sure the symlink is created during
> > installation ?
> 
> Another option would be to not create the symlink at all (or remove it at
> some
> point). It should then be created by tmpfiles in early boot.

IIRC there were issues getting along with NetworkManager using tmpfiles?
But for Anaconda it would be easier solution I think.

> 
> I'm not sure what the best solution here is. From the systemd side, I very
> much
> hope we can keep the split into various subpackages. In fact, there are
> near-future
> plans to split out more subpackages for the purposes of minimization.

I think we should keep the split, we should be able to adapt Anaconda accordingly. We will probably add systemd-resolved to installer dependencies, we'll need it anyway when moving from NM to systemd-resolved in installer environment.

Comment 6 Radek Vykydal 2021-11-18 08:19:12 UTC
Created attachment 1842522 [details]
/tmp/packaging.log from installer

> Actually, systemd-resolved.rpm is installed on the target system, but for
> some reason the scriptlet is not run in the installer chroot.

Packaging log for F35 installation.

Comment 7 Radek Vykydal 2021-11-18 08:21:11 UTC
Created attachment 1842523 [details]
/tmp/packaging.log from installer

> When installing Fedora 34 tree (before the split, where the scriptlet is in
> systemd package) with Fedora 35 installer image (does not contain
> systemd-resolved) the symlink is created (but it is dangling I think because
> systemd-resolved service is not present in F35 installer environment, which
> results in https://bugzilla.redhat.com/show_bug.cgi?id=2019579#c18)

Packaging log for the case.

Comment 8 Radek Vykydal 2021-11-18 08:23:13 UTC
Created attachment 1842524 [details]
/tmp/packaging.log from installer

> I tried also installing F35 with F34 installer and in that case the symlink
> is not created. ...

Packaging log from the case.

Comment 9 Radek Vykydal 2021-11-18 13:50:43 UTC
PR with kickstart test: https://github.com/rhinstaller/kickstart-tests/pull/636

Comment 10 Zbigniew Jędrzejewski-Szmek 2021-12-23 09:48:16 UTC
The changes done for https://bugzilla.redhat.com/show_bug.cgi?id=2032085 should partially fix the issue: the symlink should be created even if systemd-resolved is not installed in the installer, but is installed in the target system.

So the only thing that remains would be to add systemd-resolved.rpm to the installer environment.
(This corresponds to item 3. in https://bugzilla.redhat.com/show_bug.cgi?id=2032085#c5).

Comment 11 Radek Vykydal 2022-01-14 16:26:08 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #10)
> The changes done for https://bugzilla.redhat.com/show_bug.cgi?id=2032085
> should partially fix the issue: the symlink should be created even if
> systemd-resolved is not installed in the installer, but is installed in the
> target system.

It does not seem to work in the installer environment. I was trying to debug why, and (regardless of the systemd-resolved service being enabled and running in the installer environment, which I tried as well) the symlink is not created: the check at https://src.fedoraproject.org/rpms/systemd/blob/rawhide/f/systemd.spec#_962 fails with "systemctl: command not found" (the message is redirected to /dev/null). Anaconda installs the packages in a single transaction in a chroot (/mnt/sysimage). When I try to run the check in chroot /mnt/sysimage from a %pre-install script (run right before package installation), the check succeeds.

> 
> So the only thing that remains would be to add systemd-resolved.rpm to the
> installer environment.
> (This corresponds to item 3. in
> https://bugzilla.redhat.com/show_bug.cgi?id=2032085#c5).

Yes, I'd like to do that and migrate to systemd-resolved in installer.

And perhaps address also item 1. from https://bugzilla.redhat.com/show_bug.cgi?id=2032085#c5. IIRC the reason for doing that (copying /etc/resolv.conf to chroot)

But irrespective of those two items, I think we need to make the systemd-resolved post script that creates the symlink work.

Comment 12 Zbigniew Jędrzejewski-Szmek 2022-01-14 17:40:35 UTC
Do you have a log from the installer handy?
That is strange, because systemd-resolved already has Requires(post): %{name},
so systemctl should be available.

Comment 13 Christoph Karl 2022-01-16 15:09:22 UTC
Maybe related to https://bugzilla.redhat.com/show_bug.cgi?id=2032085

Comment 14 Radek Vykydal 2022-01-17 09:27:55 UTC
Created attachment 1851304 [details]
/tmp/packaging.log from installer environment

I've built systemd package scratch build (on rawhide branch) with some naive debugging info added:

https://rvykydal.fedorapeople.org/systemd-resolved/dbg.patch
https://koji.fedoraproject.org/koji/taskinfo?taskID=81346952

And added repo with the package in kickstart
https://rvykydal.fedorapeople.org/systemd-resolved/ks.x.cfg
(repo --name systemd-resolved --baseurl http://10.43.136.2/users/rv/resolved/)

which I used for installation with the rawhide installer with the systemd-resolved package added (so /etc/resolv.conf was managed by systemd-resolved, and anaconda copied it as a file into the installer system chroot /mnt/sysimage where the package installation happens; see point 1. from https://bugzilla.redhat.com/show_bug.cgi?id=2032085#c5. The guard that fails would prevent creation of the symlink even if there were no /etc/resolv.conf file in the chroot.)

I am attaching packaging.log with the "DDDDD" messages.
There is also https://rvykydal.fedorapeople.org/systemd-resolved/dnf.librepo.log but that is not interesting, I think.

If you have any hints on how to get better / more debugging info from the installation (rpm scriptlets), I can reproduce with those.

Comment 15 Zbigniew Jędrzejewski-Szmek 2022-01-17 11:50:55 UTC
The attachment is private too.

Comment 16 Zbigniew Jędrzejewski-Szmek 2022-01-17 13:08:19 UTC
09:03:46,058 DBG dnf: Installed: systemd-250.2-8.fc36.x86_64                         <------------------
09:03:46,059 DBG dnf: Installed: systemd-libs-250.2-8.fc36.x86_64
09:03:46,059 DBG dnf: Installed: systemd-networkd-250.2-8.fc36.x86_64
09:03:46,059 DBG dnf: Installed: systemd-oomd-defaults-250.2-8.fc36.noarch
09:03:46,059 DBG dnf: Installed: systemd-pam-250.2-8.fc36.x86_64
09:03:46,059 DBG dnf: Installed: systemd-resolved-250.2-8.fc36.x86_64                <------------------
09:03:46,059 DBG dnf: Installed: systemd-udev-250.2-8.fc36.x86_64
09:03:46,060 DBG dnf: Installed: tpm2-tss-3.1.0-4.fc36.x86_64
09:03:46,060 DBG dnf: Installed: trousers-0.3.15-5.fc36.x86_64
09:03:46,060 DBG dnf: Installed: trousers-lib-0.3.15-5.fc36.x86_64
09:03:46,060 DBG dnf: Installed: tzdata-2021e-1.fc36.noarch
09:03:46,060 DBG dnf: Installed: unbound-libs-1.13.2-4.fc36.x86_64
09:03:46,060 DBG dnf: Installed: util-linux-2.37.2-1.fc36.x86_64
09:03:46,061 DBG dnf: Installed: util-linux-core-2.37.2-1.fc36.x86_64
09:03:46,061 DBG dnf: Installed: vim-data-2:8.2.4068-1.fc36.noarch
09:03:46,061 DBG dnf: Installed: vim-minimal-2:8.2.4068-1.fc36.x86_64
09:03:46,061 DBG dnf: Installed: which-2.21-31.fc36.x86_64
09:03:46,061 DBG dnf: Installed: whois-nls-5.5.11-1.fc36.noarch
09:03:46,061 DBG dnf: Installed: xkeyboard-config-2.34-1.fc36.noarch
09:03:46,062 DBG dnf: Installed: xz-5.2.5-7.fc35.x86_64
09:03:46,062 DBG dnf: Installed: xz-libs-5.2.5-7.fc35.x86_64                         <------------------
09:03:46,062 DBG dnf: Installed: yum-4.10.0-1.fc36.noarch
09:03:46,062 DBG dnf: Installed: zchunk-libs-1.1.15-3.fc36.x86_64
09:03:46,062 DBG dnf: Installed: zlib-1.2.11-30.fc35.x86_64                          <------------------
09:03:46,062 DBG dnf: Installed: zram-generator-1.1.1-2.fc36.x86_64
09:03:46,063 DBG dnf: Installed: zram-generator-defaults-1.1.1-2.fc36.noarch

So this cannot work: we install systemd first, but systemctl is linked to (among others) libs
provided by xz-libs and zlib.

I think this is a bug in rpm, unfortunately. systemd-resolved.rpm has Requires(post):systemd, which means
that it is saying "I expect to be able to call binaries provided by systemd.rpm" from the %post scriptlet.
systemd-resolved.rpm doesn't need to know or care what other dependencies systemd.rpm has. This means that
rpm must install all Requires for systemd.rpm *before* systemd-resolved.rpm is installed. And systemd.rpm
has Requires:libz.so.1()(64bit) and Requires:liblzma.so.5()(64bit), which are satisfied by zlib and
xz-libs. Without those packages being installed first, the Requires(post) dependency would be meaningless.

(One reason for rpm to *not* obey those deps would be a dependency loop. But both xz-libs
and zlib are good, and depend only on glibc. And glibc was installed early in the transaction, so
zlib and xz-libs could have been installed earlier too.)

Unfortunately this is likely to affect any package that calls systemctl in scriptlets and
has Requires(post):systemd for that reason.

As a work-around, I could add some Requires(post) deps to systemd to force the installation order,
if we can't fix this in rpm quickly.
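The ordering expectation argued above can be written down as a small dependency graph. This is only an illustrative sketch using Python's standard graphlib module (Python 3.9 or later); it models the dependencies named in this comment and is not how rpm computes its transaction order. The helper name install_order is hypothetical.

```python
from graphlib import TopologicalSorter

# package -> packages that have to be installed before it,
# restricted to the dependencies named in the comment above.
deps = {
    "glibc": set(),
    "zlib": {"glibc"},
    "xz-libs": {"glibc"},
    "systemd": {"zlib", "xz-libs"},      # Requires: libz.so.1, liblzma.so.5
    "systemd-resolved": {"systemd"},     # Requires(post): systemd
}


def install_order(graph):
    """Return one install order in which every package follows its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = install_order(deps)
```

Any order produced this way places zlib and xz-libs before systemd, and systemd before systemd-resolved, which is the property the %post scriptlet relies on.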

Comment 17 Panu Matilainen 2022-01-17 13:35:44 UTC
These things are in practise always a dependency loop which needs to be analyzed and fixed, blindly piling more Requires(post) will only make it worse in the long run. There are ample opportunities for loops now that weak dependencies are included in the order calculation.

I'd suggest that anaconda always run the transaction with rpm.RPMTRANS_FLAG_DEPLOOPS to make the loops visible in the logs for easier debugging.

Comment 18 Radek Vykydal 2022-02-03 10:41:38 UTC
For the record:

(In reply to Radek Vykydal from comment #11)
> (In reply to Zbigniew Jędrzejewski-Szmek from comment #10)
 
> > So the only thing that remains would be to add systemd-resolved.rpm to the
> > installer environment.
> > (This corresponds to item 3. in
> > https://bugzilla.redhat.com/show_bug.cgi?id=2032085#c5).
> 
> Yes, I'd like to do that and migrate to systemd-resolved in installer.

https://github.com/rhinstaller/anaconda/pull/3783

> And perhaps address also item 1. from
> https://bugzilla.redhat.com/show_bug.cgi?id=2032085#c5. IIRC the reason for
> doing that (copying /etc/resolv.conf to chroot)

https://github.com/rhinstaller/anaconda/pull/3814

Comment 19 Zbigniew Jędrzejewski-Szmek 2022-02-10 16:12:15 UTC
Sorry for dropping the ball here…

(In reply to Panu Matilainen from comment #17)
> These things are in practise always a dependency loop which needs to be
> analyzed and fixed, blindly piling more Requires(post) will only make it
> worse in the long run. There are ample opportunities for loops now that weak
> dependencies are included in the order calculation.

Can you confirm the intended rpm behaviour here:
C has Requires(post):B
B has Requires:A
Will rpm make sure to order A and B before C (or at least before C's %post)?

See my comment above: I checked manually, and there seem to be no other dependencies
involved: "both xz-libs and zlib are good, and depend only on glibc. And glibc was
installed early in the transaction, so zlib and xz-libs could have been installed earlier too."

Comment 20 Zbigniew Jędrzejewski-Szmek 2022-02-15 19:01:01 UTC
Panu, please provide some guidance here. Without knowing what the *intended* behaviour in rpm is,
it's hard to say if rpm is buggy or the packages are buggy or something else.

Comment 21 Panu Matilainen 2022-02-16 07:03:23 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #19)
> Can you confirm the intended rpm behaviour here:
> C has Requires(post):B
> B has Requires:A
> Will rpm make sure to order A and B before C (or at least before C's
> %post)?

Yes, rpm will order those A, B, C. IFF there are no unresolvable dependency loops in the picture. They're not always obvious.
 
> See my comment above: I checked manually, and there seem to be no other
> dependencies
> involved: "both xz-libs and zlib are good, and depend only on glibc. And
> glibc was
> installed early in the transaction, so zlib and xz-libs could have been
> installed earlier too."

Note that rpm nowadays uses weak dependencies for order calculation too, which is unfortunately a common source of loops. 'rpm -U --deploops *.rpm' gives you a loop analysis.

Comment 22 Zbigniew Jędrzejewski-Szmek 2022-02-16 07:46:57 UTC
> Note that rpm nowadays uses weak dependencies for order calculation too, which is unfortunately a common source of loops.

That sounds reasonable. If you decide to install a package, no matter if the dependency was
weak or strong, once it's decided to be satisfied, it should impact ordering in the same way.

--

But I'm an idiot — the log that Radek attached is just an alphabetical (*) listing of packages
installed. We want the part with "Installing …", not the summary at the end with "Installed …".

Radek, any chance you could attach that?

(*) dnf sorts all capital letters first, which doesn't make much sense in this context. That's
how we end up with NetworkManager and NetworkManager-libs at the top.
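The sorting effect described in the footnote is easy to reproduce. The snippet below is a small illustration with a made-up package list; it shows Python's default code-point ordering next to a case-insensitive ordering, and makes no claim about how dnf implements its listing.

```python
# Hypothetical sample of package names, for illustration only.
names = ["systemd", "NetworkManager-libs", "acl", "NetworkManager", "zlib"]

# Default ordering compares code points, so every capital letter sorts
# before every lowercase letter.
default_order = sorted(names)

# Case-insensitive ordering puts the names where a reader expects them.
caseless_order = sorted(names, key=str.casefold)

print(default_order)
print(caseless_order)
```

In the default ordering NetworkManager and NetworkManager-libs come first; with the casefold key they land between acl and systemd.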

Comment 23 Panu Matilainen 2022-02-16 08:07:03 UTC
> If you decide to install a package, no matter if the dependency was
> weak or strong, once it's decided to be satisfied, it should impact ordering in the same way.

Yup. Only, in presence of loops, strong ones should be favored over weak ones for loop-cutting and that doesn't currently happen, which is why loops are suddenly much more of an issue than they were before.

Comment 24 Zbigniew Jędrzejewski-Szmek 2022-02-16 09:42:29 UTC
Hmm, I filed a few bugs and pull requests to clean things up:

https://bugzilla.redhat.com/show_bug.cgi?id=2055011: dnf: please use case-insensitive listings when sorting packages alphabetically
https://src.fedoraproject.org/rpms/chrony/pull-request/7: chrony: Drop obsolete workaround in scriptlet
https://gitlab.com/cki-project/kernel-ark/-/merge_requests/1617: kernel: scriptlets: clean up declared installation dependencies, stop calling kernel-install in chroot or container
https://bugzilla.redhat.com/show_bug.cgi?id=2055033: rpm migration scriptlet should be converted to lua?
https://bugzilla.redhat.com/show_bug.cgi?id=2055070: util-linux-core scriptlet should be converted to lua?
There is also the issue that systemd-journal user is created too late, I'll look into it.

but none of those would fix the issue at hand.
The problem is that rpm schedules systemd-resolved before systemd.

This is easy to reproduce e.g. with:
$ sudo dnf install --releasever=rawhide --repo=fedora --installroot=/var/tmp/f36-test4 --nogpgcheck --setopt install_weak_deps=True rpm systemd systemd-resolved
...
  Installing       : os-prober-1.77-9.fc36.x86_64                                                                                                                 162/166 
  Running scriptlet: grub2-tools-1:2.06-14.fc36.x86_64                                                                                                            163/166 
  Installing       : grub2-tools-1:2.06-14.fc36.x86_64                                                                                                            163/166 
  Installing       : systemd-resolved-250.3-3.fc36.x86_64                                                                                                         164/166 
  Running scriptlet: systemd-resolved-250.3-3.fc36.x86_64                                                                                                         164/166 
  Installing       : systemd-250.3-3.fc36.x86_64                                                                                                                  165/166 
warning: group systemd-journal does not exist - using root

  Running scriptlet: systemd-250.3-3.fc36.x86_64                                                                                                                  165/166 
Creating group 'input' with GID 104.
Creating group 'kvm' with GID 36.
Creating group 'render' with GID 105.
Creating group 'sgx' with GID 106.
Creating group 'systemd-journal' with GID 190.
Creating group 'systemd-network' with GID 192.
Creating user 'systemd-network' (systemd Network Management) with UID 192 and GID 192.
Creating group 'systemd-oom' with GID 999.
Creating user 'systemd-oom' (systemd Userspace OOM Killer) with UID 999 and GID 999.
Creating group 'systemd-resolve' with GID 193.
Creating user 'systemd-resolve' (systemd Resolver) with UID 193 and GID 193.

  Installing       : systemd-udev-250.3-3.fc36.x86_64                                                                                                             166/166 
  Running scriptlet: systemd-udev-250.3-3.fc36.x86_64                                                                                                             166/166 
Created symlink /etc/systemd/system/sysinit.target.wants/systemd-boot-update.service → /usr/lib/systemd/system/systemd-boot-update.service.

  Running scriptlet: filesystem-3.16-2.fc36.x86_64                                                                                                                166/166 
  Running scriptlet: crypto-policies-scripts-20220203-2.git112f859.fc36.noarch                                                                                    166/166 
  Running scriptlet: rpm-4.17.0-8.fc36.x86_64                                                                                                                     166/166 
  Running scriptlet: authselect-libs-1.3.0-10.fc37.x86_64                                                                                                         166/166 
  Running scriptlet: ca-certificates-2021.2.52-3.fc36.noarch                                                                                                      166/166 
  Running scriptlet: grub2-common-1:2.06-14.fc36.noarch                                                                                                           166/166 
  Running scriptlet: systemd-udev-250.3-3.fc36.x86_64                                                                                                             166/166 
  Verifying        : acl-2.3.1-3.fc36.x86_64                                                                                                                        1/166 

The question is why is systemd-resolved installed before systemd.
systemd has Recommends:systemd-resolved,
and systemd-resolved has Requires(post):systemd.
rpm is installing those packages adjacent to each other, so it certainly could order systemd-resolved *after* systemd.

If the Requires(post) dependency is ignored, there isn't much that systemd-resolved scriptlet can do.
It needs systemctl to be present to do the checks it needs.
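The two dependencies above form a loop if both are used for ordering. The following sketch shows one way such a loop could be cut by dropping a weak (Recommends) edge while keeping the strong (Requires) one. It is a hypothetical illustration built on Python's graphlib (3.9 or later), not rpm's ordering algorithm; the function name and the strong/weak dictionaries are assumptions made for the example.

```python
from graphlib import CycleError, TopologicalSorter


def order_with_loop_cutting(strong, weak):
    """Order packages, dropping weak edges one at a time to break loops.

    strong and weak map a package to the set of packages to install first.
    Raises CycleError if a loop consists of strong edges only.
    """
    packages = set(strong) | set(weak)
    for before in list(strong.values()) + list(weak.values()):
        packages |= before
    weak = {pkg: set(before) for pkg, before in weak.items()}  # private copy
    while True:
        graph = {p: strong.get(p, set()) | weak.get(p, set()) for p in packages}
        try:
            return list(TopologicalSorter(graph).static_order())
        except CycleError as err:
            cycle = err.args[1]  # list of nodes, first and last are the same
            for a, b in zip(cycle, cycle[1:]):
                # Look for a weak edge between two neighbours in the loop.
                for pkg, dep in ((a, b), (b, a)):
                    if dep in weak.get(pkg, set()):
                        weak[pkg].discard(dep)
                        break
                else:
                    continue
                break
            else:
                raise  # nothing weak to cut in this loop


strong = {"systemd-resolved": {"systemd"}}   # Requires(post): systemd
weak = {"systemd": {"systemd-resolved"}}     # Recommends: systemd-resolved
order = order_with_loop_cutting(strong, weak)
```

With the weak edge cut, systemd comes before systemd-resolved, so systemctl would be present when the %post scriptlet runs.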

Comment 25 Zbigniew Jędrzejewski-Szmek 2022-02-16 09:51:21 UTC
Created attachment 1861435 [details]
output from rpm -U --deploops

Comment 26 Zbigniew Jędrzejewski-Szmek 2022-02-16 09:53:37 UTC
BTW, this isn't a "Strongly Connected Component". There is a loop, but "strongly connected" means
that there's an edge between all the nodes in the graph, which is very unlikely to be true.

Comment 27 Panu Matilainen 2022-02-16 11:07:48 UTC
I'll let you argue strongly connected semantics with ffesti whose baby the ordering code is :)

Other than that, a 63 member loop is not going to be installed correctly no matter how correct some individual dependencies within that set may be. That needs to be analyzed and cut down somewhere. Quite often these mega loops have centered around systemd getting pulled into a situation where it shouldn't through an innocent looking change somewhere else, but of course could be something else too.

One additional, newish trick (rpm >= 4.16) that can be used to cut down on loops is Requires(meta) for things that don't need install-time ordering, such as the common sub-package "Requires: foo = %{name}-%{version}-%{release}" deps.

Comment 28 Zbigniew Jędrzejewski-Szmek 2022-02-16 15:34:52 UTC
Arrrgh, there is no Recommends(meta). Do I get this right that Recommends is treated the same
as Requires for the purposes of install-time ordering, and you can opt-out of ordering with
Requires(meta), but you can't opt out of ordering for Recommends?

Comment 29 Zbigniew Jędrzejewski-Szmek 2022-02-16 17:27:39 UTC
https://src.fedoraproject.org/rpms/libpwquality/pull-request/3
https://src.fedoraproject.org/rpms/libfido2/pull-request/1

I also have a fix for the systemd issue and some more dependency trimming, but I need to give it
more testing before pushing.

Comment 30 Panu Matilainen 2022-02-17 07:39:12 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #28)
> Arrrgh, there is no Recommends(meta). Do I get this right that Recommends is
> treated the same
> as Requires for the purposes of install-time ordering, and you can opt-out
> of ordering with
> Requires(meta), but you can't opt out of ordering for Recommends?

Argh indeed. See https://github.com/rpm-software-management/rpm/pull/1022, this is sorted out upstream but not in any release yet. We need to get some maintenance releases out soon...

Comment 31 Panu Matilainen 2022-02-17 07:42:24 UTC
Oh BTW, forgot to add to yesterday's meta-comment: there are multiple candidates within rpm too, just been intentionally slow to adopt to avoid bootstrapping issues from older rpm versions. But all of Fedora is now meta-capable so that's not really an issue anymore.

Comment 32 Zbigniew Jędrzejewski-Szmek 2022-02-17 11:26:59 UTC
Panu, thanks!

--

(In reply to Zbigniew Jędrzejewski-Szmek from comment #29)
> I also have a fix for the systemd issue and some more dependency trimming

systemd-250.3-4.fc37 and systemd-250.3-4.fc36 dropped various requirements for
sed,acl,coreutils,shadow-utils,grep in systemd-libs and systemd (it's a bit messy, see the dist-git commits for details)
and fixed the bug with systemd-journal group being unknown.

--

OK, so we have a bunch of pull requests in flight, and a bunch of RFE bugs filed.
In a few days, we should test whether the issue is resolved.
It seems that rpm is innocent here and I was wrong in casting aspersions on it.
Once we have Recommends(meta), we can also start using it in various packages.

Summary:
https://github.com/rhinstaller/kickstart-tests/pull/636: kickstart test (MERGED)
https://github.com/rhinstaller/anaconda/pull/3783: Use systemd-resolved in installer environment (MERGED, released anaconda-36.16-1)
https://github.com/rhinstaller/anaconda/pull/3814: Do not copy /etc/resolv.conf to chroot before installation (MERGED, unreleased)
https://bugzilla.redhat.com/show_bug.cgi?id=2055011: dnf: please use case-insensitive listings when sorting packages alphabetically
https://src.fedoraproject.org/rpms/chrony/pull-request/7: chrony: Drop obsolete workaround in scriptlet (BUILT chrony-4.2-5.fc36)
https://gitlab.com/cki-project/kernel-ark/-/merge_requests/1617: kernel: scriptlets: clean up declared installation dependencies, stop calling kernel-install in chroot or container (under review)
https://bugzilla.redhat.com/show_bug.cgi?id=2055033: rpm migration scriptlet should be converted to lua? (POST)
https://bugzilla.redhat.com/show_bug.cgi?id=2055070: util-linux-core scriptlet should be converted to lua?
https://src.fedoraproject.org/rpms/libpwquality/pull-request/3: Remove dependency on pam and build-time warning about unversioned python
https://src.fedoraproject.org/rpms/libfido2/pull-request/1: Drop dependency on systemd-udev (BUILT libfido2-1.10.0-3.fc36)
https://bugzilla.redhat.com/show_bug.cgi?id=2055572: pam: please split out pam-libs subpackage
https://src.fedoraproject.org/rpms/grub2/pull-request/16: grub2: Drop one use of which and requirements on it

Comment 33 Peter Robinson 2022-02-17 18:29:30 UTC
> > I also have a fix for the systemd issue and some more dependency trimming
> 
> systemd-250.3-4.fc37 and systemd-250.3-4.fc36 dropped various requirements for
> sed,acl,coreutils,shadow-utils,grep in systemd-libs and systemd (it's a bit messy, see the dist-git commits for details)
> and fixed the bug with systemd-journal group being unknown.

Can we get a f35 fix too please?

Comment 34 Zbigniew Jędrzejewski-Szmek 2022-02-22 12:24:06 UTC
After looking into this, I'm still confused.

> Other than that, a 63 member loop is not going to be installed correctly no matter how correct some individual dependencies within that set may be.

This would mean that if "SCC" is reported, there is always a problem.
But let's take a very simple example: a minimal installation with glibc:

$ sudo dnf install --releasever=rawhide --repo=fedora --installroot=/var/tmp/f36-test8 --nogpgcheck --setopt install_weak_deps=False glibc
$ dnf --releasever=rawhide --repo=fedora download $(rpm --root=/var/tmp/f36-test8 -qa)
$ sudo rpm --root=/var/tmp/f36-test8a -U --deploops rpms8/*.rpm
warning: 2 Strongly Connected Components
warning: SCC #1: 5 members (5 external dependencies)
warning: 	glibc-2.35-2.fc37.x86_64
warning: 		-> glibc-minimal-langpack-2.35-2.fc37.x86_64
warning: 		-> glibc-common-2.35-2.fc37.x86_64
warning: 	ncurses-libs-6.2-9.20210508.fc36.x86_64
warning: 		-> glibc-2.35-2.fc37.x86_64
warning: 	bash-5.1.16-2.fc36.x86_64
warning: 		-> ncurses-libs-6.2-9.20210508.fc36.x86_64
warning: 		-> glibc-2.35-2.fc37.x86_64
warning: 	glibc-common-2.35-2.fc37.x86_64
warning: 		-> glibc-2.35-2.fc37.x86_64
warning: 		-> bash-5.1.16-2.fc36.x86_64
warning: 	glibc-minimal-langpack-2.35-2.fc37.x86_64
warning: 		-> glibc-common-2.35-2.fc37.x86_64
warning: 		-> glibc-2.35-2.fc37.x86_64
warning: SCC #2: 4 members (3 external dependencies)
warning: 	fedora-repos-37-0.1.noarch
warning: 		-> fedora-release-37-0.2.noarch
warning: 		-> fedora-repos-rawhide-37-0.1.noarch
warning: 	fedora-repos-rawhide-37-0.1.noarch
warning: 		-> fedora-repos-37-0.1.noarch
warning: 	fedora-release-common-37-0.2.noarch
warning: 		-> fedora-repos-37-0.1.noarch
warning: 		-> fedora-release-37-0.2.noarch
warning: 	fedora-release-37-0.2.noarch
warning: 		-> fedora-release-common-37-0.2.noarch

If we look into SCC #1, bash obviously requires glibc because it links to it. The loop is created by
glibc → glibc-common → bash, because glibc-common has %transfiletriggerin and %transfiletriggerpostun
using bash to call /sbin/ldconfig.

The scriptlets that use bash are only called *after* the transaction, so in fact
there doesn't seem to be any ordering loop: it should be totally OK to first
install all the packages and then call the scriptlets.

My questions:
1. does --deploops report loops that cause rpm to have problems with figuring out
   a correct transaction order, or just any loops that are found?

2. is rpm smart enough to figure out that posttrans scriptlets don't cause problems
   with ordering?

   Or in other words: in this example packages are installed correctly, because there
   is no actual loop and any installation order is fine. But could this "loop" here
   cause rpm to use an installation order that doesn't satisfy all declared requirements
   in some other cases? E.g. if bash had R(post):foo, and foo had R:glibc, and then
   rpm would use a different installation order than glibc, foo, bash because it
   thinks that glibc has an ordering requirement on bash?

3. (for the previous example, not seen here)
   What does -> and => mean in the output?
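
To make the SCC #1 discussion concrete, here is a minimal sketch that treats the edges printed by --deploops as a plain directed graph and computes its strongly connected components with Tarjan's algorithm. It is only a graph model, not rpm's ordering logic; the helper names (strongly_connected_components, loops, without_edge) are made up for illustration, and dropping the glibc-common -> bash edge is an assumption about how a post-transaction-only dependency might be treated, which is exactly the open question above.

```python
# Edges copied from the SCC #1 listing above: package -> packages it depends on.
DEPS = {
    "glibc": ["glibc-minimal-langpack", "glibc-common"],
    "ncurses-libs": ["glibc"],
    "bash": ["ncurses-libs", "glibc"],
    "glibc-common": ["glibc", "bash"],
    "glibc-minimal-langpack": ["glibc-common", "glibc"],
}


def strongly_connected_components(graph):
    """Tarjan's algorithm; returns a list of sets of nodes."""
    index = {}
    lowlink = {}
    stack = []
    on_stack = set()
    result = []
    counter = [0]

    def visit(node):
        index[node] = lowlink[node] = counter[0]
        counter[0] += 1
        stack.append(node)
        on_stack.add(node)
        for dep in graph.get(node, []):
            if dep not in index:
                visit(dep)
                lowlink[node] = min(lowlink[node], lowlink[dep])
            elif dep in on_stack:
                lowlink[node] = min(lowlink[node], index[dep])
        # node is the root of a component: pop its members off the stack.
        if lowlink[node] == index[node]:
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            result.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return result


def loops(graph):
    """Only components with more than one member are dependency loops."""
    return [c for c in strongly_connected_components(graph) if len(c) > 1]


def without_edge(graph, src, dst):
    """Copy of the graph with the single edge src -> dst removed."""
    return {
        pkg: [d for d in deps if not (pkg == src and d == dst)]
        for pkg, deps in graph.items()
    }


full = loops(DEPS)
# Hypothetical: ignore the scriptlet-only dependency of glibc-common on bash.
trimmed = loops(without_edge(DEPS, "glibc-common", "bash"))

print("as reported:", sorted(sorted(c) for c in full))
print("without glibc-common -> bash:", sorted(sorted(c) for c in trimmed))
```

In this model, removing that one edge takes bash and ncurses-libs out of the loop, but glibc, glibc-common and glibc-minimal-langpack still depend on each other, so a smaller component remains.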

Comment 35 Zbigniew Jędrzejewski-Szmek 2022-02-22 16:00:10 UTC
Anyway, https://src.fedoraproject.org/rpms/glibc/pull-request/54 to convert glibc-common %transfiletriggers to <lua>.

Comment 36 Zbigniew Jędrzejewski-Szmek 2022-02-23 23:36:36 UTC
I found another fun one: https://src.fedoraproject.org/rpms/setup/pull-request/8: Fix %post scriptlet to not require the shell

I made a build of systemd with two changes: a workaround for the reported lack of 'utmp' group by rpm
(see comment #24 above), and the %post scriptlet for resolved moved to %posttrans. So things
should now work fine if systemd-resolved is installed before systemd.

The build for rawhide failed on #2057735:
/usr/include/libaudit.h:42:10: fatal error: audit.h: No such file or directory

In my local testing, the symlink is now created as expected.
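
For anyone who wants to repeat that check on an installed tree, a small sketch follows. The function name resolv_conf_status and the root argument are hypothetical, chosen here for illustration; the code only reports whether etc/resolv.conf under the given root is a symlink and where it points, and makes no claim about what the correct target is.

```python
import os


def resolv_conf_status(root="/"):
    """Report whether <root>/etc/resolv.conf is a symlink.

    Returns a (kind, target) tuple; target is None unless kind is "symlink".
    """
    path = os.path.join(root, "etc/resolv.conf")
    if os.path.islink(path):
        # readlink returns the raw link text without resolving it.
        return ("symlink", os.readlink(path))
    if os.path.exists(path):
        return ("not a symlink", None)
    return ("missing", None)


# Check the running system; pass an install root to inspect a chroot instead.
print(resolv_conf_status("/"))
```

Passing the install root (for example the --installroot directory used in comment 34) inspects that tree instead of the running system.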

Comment 37 Fedora Update System 2022-02-24 06:40:25 UTC
FEDORA-2022-0bbb402870 has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-0bbb402870

Comment 38 Fedora Update System 2022-02-24 16:13:55 UTC
FEDORA-2022-0bbb402870 has been pushed to the Fedora 36 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-0bbb402870`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-0bbb402870

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 39 Radek Vykydal 2022-02-25 09:53:56 UTC
Seems to be fixed according to our kickstart tests. Thank you Zbigniew for all the fixes!

Comment 40 Fedora Update System 2022-02-25 20:48:49 UTC
FEDORA-2022-0bbb402870 has been pushed to the Fedora 36 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-0bbb402870`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-0bbb402870

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 41 Geoffrey Marr 2022-03-01 17:38:56 UTC
@zbyszek.pl Zbigniew, could you build this fix for F35? I think it will fix the problem with Fedora IoT F35 not composing.

Comment 42 Fedora Update System 2022-03-26 15:01:37 UTC
FEDORA-2022-0bbb402870 has been pushed to the Fedora 36 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 43 Zbigniew Jędrzejewski-Szmek 2023-08-30 07:38:56 UTC
F35 is EOL.

