Bug 2136916
| Summary: | The inst.sshd boot option no longer works | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Martin Kolman <mkolman> |
| Component: | systemd | Assignee: | Michal Sekletar <msekleta> |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | rawhide | CC: | anaconda-maint-list, asosedki, awilliam, crypto-team, dbelyavs, dominik, dtardon, dwalsh, fedoraproject, filbranden, flepied, hello, jjelen, jkonecny, jonathan, kellin, lkundrak, lnykryn, mattias.ellert, msekleta, ryncsn, ssahani, s, systemd-maint, tm, vanmeeuwen+fedora, vponcova, vslavik, w, yuwatana, zbyszek |
| Target Milestone: | --- | Keywords: | Bugfix, Reproducer, Triaged |
| Target Release: | --- | Flags: | fedora-admin-xmlrpc:
mirror+
|
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | systemd-252.3-594.fc38 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-12-08 22:38:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Attachments: | |||
|
Description
Martin Kolman
2022-10-21 20:38:19 UTC
Created attachment 1919497 [details]
package list for the 20221007 image where inst.sshd works
Created attachment 1919498 [details]
package list for the 20221008 image where inst.sshd no longer works
Created attachment 1919987 [details]
journal dump from the 20221007 image
This is a journal dump from the 20221007 image, with openssh-9.0p1-5. Note that there is *no* "Listening on sshd" line in it.
(In reply to Martin Kolman from comment #3) > Created attachment 1919987 [details] > journal dump from the 20221007 image > > This is a journal dump from the 20221007 image, with openssh-9.0p1-5. Note > that there is *no* "Listening on sshd" line in it. Slight correction - should have been "Listening on sshd.socket". Created attachment 1919988 [details]
journal dump from the 20221008 image
This is a journal dump from the 20221008 image with openssh-9.0p1-6 - note that there *is* now the "Listening on sshd.socket" line, that was not present on the 20221007 image.
Created attachment 1919989 [details]
journal dump from the 20221008 image but with inst.sshd boot option being used
This is journal dump from the 20221008 image (which has openssh-9.0p1-6) but this time the inst.sshd option has been set. Looking at the contents of the log file, we can see that the sshd.socket service is active:
"Oct 24 11:46:06 fedora systemd[1]: Listening on sshd.socket - OpenSSH Server Socket."
Also, as inst.sshd has been passed, the anaconda-sshd.service should be started:
Oct 24 11:46:07 fedora systemd[1]: Starting anaconda-sshd.service - OpenSSH server daemon...
But then:
subj=system_u:system_r:kernel_t:s0 msg='unit=anaconda-sshd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com systemd[1]: Started anaconda-sshd.service - OpenSSH server daemon.
Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com kernel: kauditd_printk_skb: 49 callbacks suppressed
Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com kernel: audit: type=1130 audit(1666611969.745:219): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:kernel_t:s0 msg='unit=anaconda-sshd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com systemd[1]: anaconda-direct.service - the anaconda installation program was skipped because no trigger condition checks were met.
Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com systemd[1]: Reached target anaconda.target - Anaconda System Services.
Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com systemd[1]: Starting anaconda.service - Anaconda...
Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com sshd[1842]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com systemd[1]: anaconda-sshd.service: Main process exited, code=exited, status=255/EXCEPTION
Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com sshd[1842]: error: Bind to port 22 on :: failed: Address already in use.
Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com systemd[1]: anaconda-sshd.service: Failed with result 'exit-code'.
Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com sshd[1842]: fatal: Cannot bind any address.
Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:kernel_t:s0 msg='unit=anaconda-sshd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com systemd[1]: anaconda-sshd.service: Consumed 2.089s CPU time.
Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com kernel: audit: type=1131 audit(1666611969.821:220): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:kernel_t:s0 msg='unit=anaconda-sshd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
So it looks like the SSH port (22) is already occupied by the sshd.socket service and "our" sshd service can't be started as a result.
This did not happen on older <=20221007 images with openssh <=openssh-9.0p1-6 - so anaconda-sshd.service was able to bind to port 22 successfully.
(In reply to Martin Kolman from comment #6) > Created attachment 1919989 [details] > journal dump from the 20221008 image but with inst.sshd boot option being > used > > This is journal dump from the 20221008 image (which has openssh-9.0p1-6) but > this time the inst.sshd option has been set. Looking at the contents of the > log file, we can see that the sshd.socket service is active: > > > "Oct 24 11:46:06 fedora systemd[1]: Listening on sshd.socket - OpenSSH > Server Socket." > > > Also, as inst.sshd has been passed, the anaconda-sshd.service should be > started: > > > Oct 24 11:46:07 fedora systemd[1]: Starting anaconda-sshd.service - OpenSSH > server daemon... > > > But then: > > > subj=system_u:system_r:kernel_t:s0 msg='unit=anaconda-sshd comm="systemd" > exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com systemd[1]: Started > anaconda-sshd.service - OpenSSH server daemon. > Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com kernel: > kauditd_printk_skb: 49 callbacks suppressed > Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com kernel: audit: > type=1130 audit(1666611969.745:219): pid=1 uid=0 auid=4294967295 > ses=4294967295 subj=system_u:system_r:kernel_t:s0 msg='unit=anaconda-sshd > comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? > res=success' > Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com systemd[1]: > anaconda-direct.service - the anaconda installation program was skipped > because no trigger condition checks were met. > Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com systemd[1]: Reached > target anaconda.target - Anaconda System Services. > Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com systemd[1]: Starting > anaconda.service - Anaconda... > Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com sshd[1842]: error: > Bind to port 22 on 0.0.0.0 failed: Address already in use. > Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com systemd[1]: > anaconda-sshd.service: Main process exited, code=exited, status=255/EXCEPTION > Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com sshd[1842]: error: > Bind to port 22 on :: failed: Address already in use. > Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com systemd[1]: > anaconda-sshd.service: Failed with result 'exit-code'. > Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com sshd[1842]: fatal: > Cannot bind any address. > Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com audit[1]: > SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 > subj=system_u:system_r:kernel_t:s0 msg='unit=anaconda-sshd comm="systemd" > exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed' > Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com systemd[1]: > anaconda-sshd.service: Consumed 2.089s CPU time. > Oct 24 11:46:09 dhcp113.anaconda.englab.brq.redhat.com kernel: audit: > type=1131 audit(1666611969.821:220): pid=1 uid=0 auid=4294967295 > ses=4294967295 subj=system_u:system_r:kernel_t:s0 msg='unit=anaconda-sshd > comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? > res=failed' > > > So it looks like the SSH port (22) is already occupied by the sshd.socket > service and "our" sshd service can't be started as a result. > > This did not happen on older <=20221007 images with openssh > <=openssh-9.0p1-6 - so anaconda-sshd.service was able to bind to port 22 > successfully. One more typo. :P Should have been <=openssh-9.0p1-5 - the last "known good" openssh version in this regard. First - thanks a lot to Vendula Poncova for noticing port 22 is being occupied by sshd.socket - that sent me in the right direction! :) So based on the logs attached to comments 3-7 it seems very likely this regression has been caused by a change in openssh-9.0p1-6. From the changelog: * Wed Oct 05 2022 Anthony Rabbito <hello> - 9.0p1-6 - Add a socket unit to ssh-agent user unit (rhbz#2125576) Looking into distgit, looks like these are the actual changes: https://src.fedoraproject.org/rpms/openssh/c/9417892cb75727f72757dec758d16d115356da68?branch=rawhide https://src.fedoraproject.org/rpms/openssh/c/11b8701db94897460eb06709597e25a1e8ea14f6?branch=rawhide https://src.fedoraproject.org/rpms/openssh/c/499c2eb7ecdc6c790450411a945c8c35b0e74301?branch=rawhide Even though neither directly touches sshd.socket, the ssh-agent.service and ssh-agent.socket changes must have somehow activated sshd.socket: diff --git a/ssh-agent.service b/ssh-agent.service index c215022..50a9ea1 100644 --- a/ssh-agent.service +++ b/ssh-agent.service @@ -5,6 +5,7 @@ ConditionEnvironment=!SSH_AGENT_PID Description=OpenSSH key agent Documentation=man:ssh-agent(1) man:ssh-add(1) man:ssh(1) +Requires=ssh-agent.socket [Service] Environment=SSH_AUTH_SOCK=%t/ssh-agent.socket @@ -12,3 +13,6 @@ ExecStart=/usr/bin/ssh-agent -a $SSH_AUTH_SOCK PassEnvironment=SSH_AGENT_PID SuccessExitStatus=2 Type=forking + +[Install] +Also=ssh-agent.socket diff --git a/ssh-agent.socket b/ssh-agent.socket new file mode 100644 index 0000000..d589cbc --- /dev/null +++ b/ssh-agent.socket @@ -0,0 +1,14 @@ +[Unit] +Description=OpenSSH key agent +Documentation=man:ssh-agent(1) man:ssh-add(1) man:ssh(1) + +[Socket] +ListenStream=%t/ssh-agent.socket +Service=ssh-agent.service +Priority=6 +Backlog=5 +SocketMode=0600 +DirectoryMode=0700 + +[Install] +WantedBy=sockets.target So as this has been clearly caused by a recent change in openssh, switching the component to openssh. Still please let us now if you need more information about the installation environment or want to test a prospective fix - thanks! :) Possible workaround: 0. make sure inst.sshd is passed as boot option 1: stop sshd socket service: systemctl stop sshd.socket 2. start anaconda sshd service: systemctl start anaconda-sshd Any updates on this ? This still blocking CI image refresh for our Web UI tests (making it more and more likely it breaks with the outdated image) as well as making any runtime installation issues hard to debug. A speedy fix or revert of the patch causing this issues would be appreciated! Hi, a concerned passerby here. Since we've ended up with two competitors for port 22: which one do we want to win? https://github.com/rhinstaller/anaconda/blob/master/data/systemd/anaconda-sshd.service or https://src.fedoraproject.org/rpms/openssh/blob/rawhide/f/sshd.socket ? Do I guess correctly that we can't leave sshd.socket on in the installation image since the user must opt into having SSH open to the public? In that case, can installation images just... not ship sshd.socket? needinfo'ing jkonecny at random^W^Was the latest non-comment committer to anaconda-sshd.service in hopes that he can shed more light on why it is like it is. Hi Alexander, honestly we shouldn't be fighting about the socket. Anaconda is not using anything so special we are using sshd anyway. We need to find a way how to work with the new solution in a way that inst.sshd will still work as expected. For that we need an input from the openssh maintainers. IMHO I would expect that our service would change and use their socket or maybe we should really just remove that from the installation environment because we don't want to have it auto-enabled anyway? For the record, bug 2125576 being addressed is what triggered the clash with anaconda-sshd.service. Dear colleagues, Unfortunately I don't understand the whole picture here so before changing this place I'd like to get some shared opinion. It looks to me like this can be fixed by just disabling sshd.socket on the boot.iso -- PR is here https://github.com/weldr/lorax/pull/1283 Hi Brian, thanks for coming with the fix but I would like to first find out if this service should really be enabled by default. The sshd.socket seems to be enable SSH by default on Fedora where I don't think it is intentional. Dmitry could you please confirm my assumption above? And if I'm correct than we need to resolve this issue and not enable the sshd.socket by default (in general I see that as security hole). (In reply to Jiri Konecny from comment #16) > Hi Brian, thanks for coming with the fix but I would like to first find out > if this service should really be enabled by default. That's a separate issue from the title of this bug. Although I now see this bug isn't assigned to anaconda, could have sworn it was when I looked at it earlier. Seems safer to me to disable it in lorax, no matter what. > > The sshd.socket seems to be enable SSH by default on Fedora where I don't > think it is intentional. > > Dmitry could you please confirm my assumption above? And if I'm correct than > we need to resolve this issue and not enable the sshd.socket by default (in > general I see that as security hole). As far as the boot.iso is concerned, yes it would be, if it worked. Luckily there is no config file and connections fail. I did some testing about the default state of the sshd.socket: In the installed Rawhide system: sshd.socket is disabled Installation environment Fedora 36: sshd.socket is disabled Installation environment Fedora Rawhide: sshd.socket is enabled (which takes port before anaconda-sshd.service we are using when inst.sshd) We don't want to have sshd.socket enabled in the installation environment and there is a regression in default enabled state. I also wonder why the sshd.socket is enabled in the installation environment but not in the installed system (maybe Anaconda prevents that or Lorax causing it?). Also more info about inst.sshd. It's just a simple service file: https://github.com/rhinstaller/anaconda/blob/master/data/systemd/anaconda-sshd.service So to make this works all sshd services/sockets have to be disabled. I'll attach logs from my discovery for comparation. Created attachment 1922894 [details]
Rawhide systemctl output
Created attachment 1922895 [details]
Fedora 26 systemctl output
Sorry the last attachment is of course Fedora 36. The description has a typo. (In reply to Brian Lane from comment #17) > (In reply to Jiri Konecny from comment #16) > > Hi Brian, thanks for coming with the fix but I would like to first find out > > if this service should really be enabled by default. > > That's a separate issue from the title of this bug. Although I now see this > bug isn't assigned to anaconda, could have sworn it was when I looked at it > earlier. Seems safer to me to disable it in lorax, no matter what. It is safer but we need to make this work also in Image Builder so Lorax fix would be just half of it. Another reason is that there could be more environments than just us which could face the same issue so it should be fixed appropriately. The sshd.socket should not be enabled by default, but openssh updated can not enable it itself anyway. The enabled services and sockets are in https://src.fedoraproject.org/rpms/fedora-release/blob/rawhide/f/90-default.preset and there is really only the sshd.service (which should be there) so if something enables you the sshd.socket I would probably look into the systemd. The change between -5 and -6 release in openssh could not have introduced anything related to that (unless systemd is terribly broken and by enabling user services it would enable also unrelated system socket. This would also go against all the packaging guidelines). So I think the openssh change is red herring. I'm still having a hard time understanding how this change https://src.fedoraproject.org/rpms/openssh/pull-request/35#request_diff caused this RHBZ. It does not touch sshd.socket and the change is for the opposite effect of sshd. Regardless I mentioned to Jiri on the side it can be worth using a Systemd drop-in/override for sshd.service taking the directives of https://github.com/rhinstaller/anaconda/blob/master/data/systemd/anaconda-sshd.service and just using the sshd unit with anaconda overrides. Regardless I think it's worth finding how how sshd magically starting being enabled by default in the first place. We contacted systemd team to help us with debugging and based on their debugging it seems that this is bug in systemd. This is minimal reproducer on the installed system: dnf -y --installroot=/var/lib/machines/f install passwd dnf fedora-release vim-minimal systemd systemd-networkd openssh-server output: ``` Running scriptlet: openssh-server-9.0p1-8.fc38.x86_64 Created symlink /etc/systemd/system/multi-user.target.wants/sshd.service → /usr/lib/systemd/system/sshd.service. Created symlink /etc/systemd/system/sockets.target.wants/sshd.socket → /usr/lib/systemd/system/sshd.socket. ``` Thanks a lot everyone for your help! Meanwhile as hotfix we would like to merge the lorax PR. That would also make the installation environment more robust in the future. The underlying root cause seems to be absence of mounted /proc in the environment where systemctl preset is called on the unit file. As a result all non-static unit files (i.e. such that have [Install] section) are enabled by preset operation. This is clearly wrong and major bug. Here is a minimal reproducer, $ rpm -q systemd systemd-252.2-591.fc38.x86_64 $ mount --bind / /mnt/root $ chroot /mnt/root $ (chroot) systemctl list-unit-files | grep debug-shell.service debug-shell.service disabled enabled As seen in the above output preset state (3rd column) of debug-shell.service is enabled which is clearly wrong. Preset state is reported correctly after mounting procfs. $ (chroot) mount -t proc proc /proc $ (chroot) systemctl list-unit-files | grep debug-shell.service debug-shell.service disabled disabled FEDORA-2022-7f1889cc8c has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2022-7f1889cc8c FEDORA-2022-7f1889cc8c has been pushed to the Fedora 38 stable repository. If problem still persists, please make note of it in this bug report. *** Bug 2141140 has been marked as a duplicate of this bug. *** |