Bug 2172912
| Summary: | Broken /dev/log socket created during boot in recovery, causing grub2-mkconfig to hang forever | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Renaud Métrich <rmetrich> |
| Component: | rear | Assignee: | Pavel Cahyna <pcahyna> |
| Status: | ASSIGNED --- | QA Contact: | Jakub Haruda <jharuda> |
| Severity: | high | Docs Contact: | Šárka Jana <sjanderk> |
| Priority: | high | ||
| Version: | 9.1 | CC: | jharuda, pcahyna |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | --- | Flags: | pcahyna: needinfo? (rmetrich) |
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
(In reply to Renaud Métrich from comment #0)
> Description of problem:
>
> With RHEL9, the /dev/log inode is supposed to be a symlink to
> /run/systemd/journal/dev-log.

Thank you for the analysis. Is it a new problem in RHEL 9, or has it existed in RHEL 8 as well? I see a similar situation in RHEL 8:

    # ls -l /dev/log
    lrwxrwxrwx. 1 root root 28 Feb 22 04:16 /dev/log -> /run/systemd/journal/dev-log

I don't know if this affects RHEL8. For sure the good inode is:

    # ls -l /dev/log
    lrwxrwxrwx. 1 root root 28 Feb 22 04:16 /dev/log -> /run/systemd/journal/dev-log

I am curious, though: how does having the correct systemd unit outside the chroot help the program running in the chroot? Is it because /run is shared, so that connecting to /run/systemd/journal/dev-log in the chroot actually connects to the daemon that runs outside?

It's because /dev/log outside the chroot is broken, which makes /dev/log inside the chroot broken as well, since it's a bind mount.

Hi Renaud,

thank you for the analysis again. I have looked into the details of systemd unit startup in the rescue system. IMO, your proposed workaround (copying all the systemd logging-related units) is not well suited for inclusion upstream, as ReaR needs to support many distros and these details vary among them. At the least, it would require a lot of difficult testing on all the supported distros. Therefore, I propose a less invasive solution.

I found that there are multiple problems with the current systemd units: nothing wants basic.target, and therefore the services/sockets that it contains never get started (this affects the /dev/log socket and the rsyslogd service that listens on it). Moreover, if I fix this, the socket starts very early and for some reason this does not work. If I order it after basic system initialization, everything starts working. The socket gets started, and when one attempts to log to it, rsyslogd is spawned and sends the messages to /var/log/messages. (/dev/log is not a symlink to /run/systemd/journal/dev-log, but I don't think it is a big problem.)

By the way, I can reproduce the problem as well using a simple for loop:

    for i in `seq 1 1000`; do echo foo$i; done

This hangs when the problem occurs, because the socket gets filled. With my fixes to the systemd units, it is fine: the output goes to /var/log/messages. I can also see the output from grub2-mkconfig (actually, from os-prober) there. So the problem you are seeing should be fixed. The changes are on my branch: https://github.com/pcahyna/rear/tree/rsyslog . What do you think?

Regarding RHEL 8, I see that the logs go into the systemd journal by default, so it seems that the problem does not occur there, and so I won't touch it.
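Because the symptom is a process that blocks forever, a reproducer is easier to use when it runs under a deadline. The following is only a sketch, not part of the proposed fix: `probe_with_deadline` is a hypothetical helper name, and it assumes coreutils `timeout` is available in the environment where it runs.

```shell
#!/bin/sh
# Hypothetical helper (not part of ReaR): run a command under a deadline.
# Prints "hang" if coreutils `timeout` had to stop the command (exit
# status 124), and "ok" if the command ended by itself, whatever its
# own exit status was. A command that itself exits with 124 would be
# misreported as a hang.
probe_with_deadline() {
    secs=$1
    shift
    if timeout "$secs" "$@" >/dev/null 2>&1; then
        status=0
    else
        status=$?
    fi
    if [ "$status" -eq 124 ]; then
        printf 'hang\n'
    else
        printf 'ok\n'
    fi
}
```

For example, assuming `logger` is present in the rescue system, something like `probe_with_deadline 30 sh -c 'for i in $(seq 1 1000); do logger "probe $i"; done'` could be expected to print `hang` when nothing drains /dev/log and `ok` once a listener is in place.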
Description of problem:

With RHEL9, the /dev/log inode is supposed to be a symlink to /run/systemd/journal/dev-log. But when booting the ReaR ISO, this is not the case: it is a regular socket with nobody listening on it. This causes no harm unless programs log to /dev/log; the socket then fills up, and once it is full, programs that log to it hang.

Any program can be affected, but it is usually grub2-mkconfig and its children (including os-prober), executing in the chroot after recovery:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
++ chroot /mnt/local /bin/bash --login -c 'grub2-mkconfig -o /boot/grub2/grub.cfg'
Generating grub configuration file ...
--> HANG
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

In this scenario, the hang happens when there are many mount points, which leads os-prober to scan all of them and send many debug messages such as "debug: /dev/mapper/vg-lvname is not an HFS+ partition: exiting" through /dev/log.

The exact root cause of the broken /dev/log socket is the use of templates in ReaR for some systemd services, e.g. /usr/share/rear/skel/default/usr/lib/systemd/system/syslog.socket
Such a template is not in sync with systemd's units on RHEL9, causing the issue.

The workaround consists of 2 operations, to be performed before recovering:

1. Tell ReaR to copy the standard systemd units to the ReaR ISO:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
COPY_AS_IS+=(
    /usr/lib/systemd/system/systemd-journald-dev-log.socket
    /usr/lib/systemd/system/systemd-journald.socket
    /usr/lib/systemd/system/systemd-journald.service
    /usr/lib/systemd/system/sockets.target.wants/systemd-journald-dev-log.socket
)
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

2. Delete /usr/share/rear/skel/default/usr/lib/systemd/system/syslog.socket

The proper solution is likely to remove all templates mapping systemd units and copy the systemd units to the ISO instead.

Version-Release number of selected component (if applicable):

rear-2.6-15

How reproducible:

Always

Steps to Reproduce:

1. Create a VM with many filesystems

    /dev/mapper/rhel-root / xfs defaults 0 0
    UUID=01d8a9ea-ee10-4ec2-b839-bac3c7e36db6 /boot xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint1 /datamntpoint1 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint10 /datamntpoint10 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint11 /datamntpoint11 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint12 /datamntpoint12 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint13 /datamntpoint13 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint14 /datamntpoint14 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint15 /datamntpoint15 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint16 /datamntpoint16 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint17 /datamntpoint17 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint18 /datamntpoint18 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint19 /datamntpoint19 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint2 /datamntpoint2 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint20 /datamntpoint20 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint21 /datamntpoint21 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint22 /datamntpoint22 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint23 /datamntpoint23 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint24 /datamntpoint24 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint25 /datamntpoint25 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint26 /datamntpoint26 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint3 /datamntpoint3 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint4 /datamntpoint4 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint5 /datamntpoint5 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint6 /datamntpoint6 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint7 /datamntpoint7 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint8 /datamntpoint8 xfs defaults 0 0
    /dev/mapper/rhel-datamntpoint9 /datamntpoint9 xfs defaults 0 0
    /dev/mapper/rhel-swap none swap defaults 0 0

2. Create a ReaR backup
3. Restore the backup

Actual results:

Hang while executing grub2-mkconfig

Expected results:

No hang, /dev/log socket being a symlink
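The expected and actual states of /dev/log described in this report can be told apart with standard `test` operators. The following is a minimal sketch only: `classify_dev_log` is a hypothetical helper name, not something shipped by ReaR or systemd, and it inspects the inode type without checking whether anything is listening on the socket.

```shell
#!/bin/sh
# Hypothetical helper (not part of ReaR): report what kind of inode sits
# at the given path (default: /dev/log).
# Prints "symlink:<target>", "socket", "other" or "missing".
classify_dev_log() {
    path=${1:-/dev/log}
    if [ -L "$path" ]; then
        # Symlink, possibly dangling. The target expected in this report
        # is /run/systemd/journal/dev-log.
        printf 'symlink:%s\n' "$(readlink "$path")"
    elif [ -S "$path" ]; then
        # Plain socket, as observed after booting the ReaR ISO.
        printf 'socket\n'
    elif [ -e "$path" ]; then
        printf 'other\n'
    else
        printf 'missing\n'
    fi
}
```

Running `classify_dev_log` in the rescue system before starting the recovery, and `classify_dev_log /mnt/local/dev/log` once the bind mount is in place, would show whether the state matches the symlink expected above or the plain socket from the actual results.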