2172912 – Broken /dev/log socket created during boot in recovery, causing grub2-mkconfig to hang forever

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 9
Classification:	Red Hat
Component:	rear
Sub Component:
Version:	9.1
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Pavel Cahyna
QA Contact:	Jakub Haruda
Docs Contact:	Šárka Jana
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2023-02-23 13:48 UTC by Renaud Métrich
Modified:	2023-11-07 12:57 UTC (History)
CC List:	2 users (show)
Fixed In Version:	rear-2.6-18.el9
Doc Type:	Bug Fix
Doc Text:	.The `rsyslog` logging service now starts at boot of the rescue system
Previously, the `rsyslog` service for message logging did not automatically start in the rescue system. The `/dev/log` socket kept receiving messages during the recovery process with no service listening at this socket. Consequently, the `/dev/log` socket was filled with messages and caused the recovery process to be stuck. For example, the `grub2-mkconfig` command to regenerate the GRUB configuration produces a high amount of log messages depending on the number of mounted file systems. If you used ReaR to recover systems with many mounted file systems, numerous log messages would fill the `/dev/log` socket, and the recovery process froze.
With this fix, the `systemd` units in the rescue system now include the sockets target in the boot procedure to start the logging socket at boot. As a result, the `rsyslog` service starts in the rescue environment when required, and the processes that need to log messages during recovery are no longer stuck. The recovery process completes successfully and you can find the log messages in the `/var/log/messages` file in the rescue RAM disk.
Clone Of:
Environment:
Last Closed:	2023-11-07 08:37:21 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	RHELPLAN-149796	None	None	None	2023-02-23 13:51:36 UTC
Red Hat Knowledge Base (Solution)	6999967	None	None	None	2023-02-28 09:34:49 UTC
Red Hat Product Errata	RHBA-2023:6571	None	None	None	2023-11-07 08:37:39 UTC

Description Renaud Métrich 2023-02-23 13:48:43 UTC

Description of problem:

With RHEL9, the /dev/log inode is supposed to be a symlink to /run/systemd/journal/dev-log.
But when booting the ReaR ISO, this is not the case: /dev/log is a regular socket with nothing listening on it.

This causes no harm unless programs log to /dev/log: the socket then fills up, and once it is full, programs writing to it hang.
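
A minimal sketch of how the state of /dev/log could be inspected in the rescue system. The helper name classify_log_path is hypothetical (it is not part of ReaR); it only distinguishes a symlink from a plain socket and does not check whether anything is listening.

```shell
# Hypothetical helper (not part of ReaR): report what kind of inode a path is.
classify_log_path() {
    path="$1"
    # Test for a symlink first: -S follows symlinks and would hide the difference.
    if [ -L "$path" ]; then
        printf 'symlink -> %s\n' "$(readlink "$path")"
    elif [ -S "$path" ]; then
        printf 'socket\n'
    elif [ -e "$path" ]; then
        printf 'other\n'
    else
        printf 'missing\n'
    fi
}

classify_log_path /dev/log
```

On a system in the expected state described above, the symlink branch would be taken; in the broken rescue system, the socket branch.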

Any program can be affected, but it is usually grub2-mkconfig and its children (including os-prober), which execute in the chroot after recovery:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
++ chroot /mnt/local /bin/bash --login -c 'grub2-mkconfig -o /boot/grub2/grub.cfg'
Generating grub configuration file ...

--> HANG
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

In this scenario, the hang happens when there are many mount points, which leads os-prober to scan all of them and send many debug messages such as "debug: /dev/mapper/vg-lvname is not an HFS+ partition: exiting" through /dev/log.

The exact root cause behind having the /dev/log socket broken is the usage of templates in ReaR for some systemd services, e.g. /usr/share/rear/skel/default/usr/lib/systemd/system/syslog.socket

This template is not in sync with the systemd units on RHEL9, which causes the issue.

The workaround consists of 2 operations, to be performed before recovering:

1. Tell ReaR to copy the standard systemd units to the ReaR ISO:

   -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
   COPY_AS_IS+=( /usr/lib/systemd/system/systemd-journald-dev-log.socket /usr/lib/systemd/system/systemd-journald.socket /usr/lib/systemd/system/systemd-journald.service /usr/lib/systemd/system/sockets.target.wants/systemd-journald-dev-log.socket )
   -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

2. Delete /usr/share/rear/skel/default/usr/lib/systemd/system/syslog.socket
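
Before applying the workaround, it may be worth confirming that the units named in COPY_AS_IS actually exist on the system being backed up. The following is only a sketch: the helper report_missing is hypothetical, and the path list is copied from the COPY_AS_IS line above.

```shell
# Unit paths taken from the COPY_AS_IS line in the workaround.
units="/usr/lib/systemd/system/systemd-journald-dev-log.socket
/usr/lib/systemd/system/systemd-journald.socket
/usr/lib/systemd/system/systemd-journald.service
/usr/lib/systemd/system/sockets.target.wants/systemd-journald-dev-log.socket"

# Hypothetical helper: print every path from the argument list that does not exist.
report_missing() {
    for f in "$@"; do
        # -e follows symlinks; -L additionally accepts a dangling symlink.
        if [ ! -e "$f" ] && [ ! -L "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Word splitting on $units is intentional: the paths contain no spaces.
report_missing $units
```

An empty output means all four paths are present; any path printed would be one that COPY_AS_IS cannot pick up.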

The proper solution is likely to remove all templates mapping systemd units and copy the systemd units to the ISO instead.

Version-Release number of selected component (if applicable):

rear-2.6-15

How reproducible:

Always

Steps to Reproduce:

1. Create a VM with many filesystems

   /dev/mapper/rhel-root   /                       xfs     defaults        0 0
   UUID=01d8a9ea-ee10-4ec2-b839-bac3c7e36db6 /boot                   xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint1 /datamntpoint1          xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint10 /datamntpoint10         xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint11 /datamntpoint11         xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint12 /datamntpoint12         xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint13 /datamntpoint13         xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint14 /datamntpoint14         xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint15 /datamntpoint15         xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint16 /datamntpoint16         xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint17 /datamntpoint17         xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint18 /datamntpoint18         xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint19 /datamntpoint19         xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint2 /datamntpoint2          xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint20 /datamntpoint20         xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint21 /datamntpoint21         xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint22 /datamntpoint22         xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint23 /datamntpoint23         xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint24 /datamntpoint24         xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint25 /datamntpoint25         xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint26 /datamntpoint26         xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint3 /datamntpoint3          xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint4 /datamntpoint4          xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint5 /datamntpoint5          xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint6 /datamntpoint6          xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint7 /datamntpoint7          xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint8 /datamntpoint8          xfs     defaults        0 0
   /dev/mapper/rhel-datamntpoint9 /datamntpoint9          xfs     defaults        0 0
   /dev/mapper/rhel-swap   none                    swap    defaults        0 0

2. Create a ReaR backup
3. Restore the backup

Actual results:

Hang while executing grub2-mkconfig

Expected results:

No hang, /dev/log socket being a symlink

Comment 1 Pavel Cahyna 2023-02-23 14:05:00 UTC

(In reply to Renaud Métrich from comment #0)
> Description of problem:
> 
> With RHEL9, the /dev/log inode is supposed to be a symlink to
> /run/systemd/journal/dev-log.

Thank you for the analysis. Is it a new problem in RHEL 9, or has it existed in RHEL 8 as well?
I see a similar situation in RHEL 8:

# ls -l /dev/log
lrwxrwxrwx. 1 root root 28 Feb 22 04:16 /dev/log -> /run/systemd/journal/dev-log

Comment 2 Renaud Métrich 2023-02-23 14:15:35 UTC

I don't know if this affects RHEL8.

For sure the good inode is:

# ls -l /dev/log
lrwxrwxrwx. 1 root root 28 Feb 22 04:16 /dev/log -> /run/systemd/journal/dev-log

Comment 3 Pavel Cahyna 2023-02-23 14:20:46 UTC

I am curious, though: how does having the correct systemd unit outside the chroot help the program running in the chroot? Is it because /run is shared, so that connecting to /run/systemd/journal/dev-log in the chroot actually connects to the daemon that runs outside?

Comment 4 Renaud Métrich 2023-02-23 14:29:29 UTC

It's because /dev/log outside the chroot is broken, which causes /dev/log inside the chroot to be broken as well, since it's a bind mount.

Comment 6 Pavel Cahyna 2023-06-16 11:40:00 UTC

Hi Renaud, thank you for the analysis again. I have looked into the details of systemd unit startup in the rescue system. IMO, your proposed workaround (to copy all the systemd logging-related units) is not well suited for inclusion upstream, as ReaR needs to support many distros and these details vary among them. At the very least, it would require a lot of difficult testing on all the supported distros. Therefore, I propose a less invasive solution.

I found that there are multiple problems with the current systemd units: nothing wants basic.target, and therefore the services/sockets that it contains never get started (this affects the /dev/log socket and the rsyslogd service that listens on it). Moreover, if I fix this, the socket starts very early and for some reason this does not work. If I order it after basic system initialization, everything starts working: the socket gets started, and when one attempts to log to it, rsyslogd is spawned and sends the messages to /var/log/messages. (/dev/log is not a symlink to /run/systemd/journal/dev-log, but I don't think that is a big problem.)

By the way, I can also reproduce the problem using a simple for loop:
for i in `seq 1 1000`; do echo foo$i; done
This hangs when the problem occurs, because the socket gets filled.
With my fixes to the systemd units, it is fine: the output goes to /var/log/messages. I can also see the output from grub2-mkconfig (actually, from os-prober) there. So the problem you are seeing should be fixed. The changes are on my branch: https://github.com/pcahyna/rear/tree/rsyslog . What do you think?
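
A reproduction like the loop above blocks the shell when the problem is present. A possible way to detect the hang without losing the session is to put a deadline on the loop. This is only a sketch: run_with_deadline is a hypothetical helper, and it assumes timeout(1) from GNU coreutils is available in the rescue system.

```shell
# Hypothetical helper: run a command with a deadline and report whether it finished.
# timeout(1) from GNU coreutils exits with status 124 when the deadline expires.
run_with_deadline() {
    secs="$1"
    shift
    timeout "$secs" "$@" >/dev/null 2>&1
    if [ $? -eq 124 ]; then
        echo hung
    else
        echo completed
    fi
}
```

For example, `run_with_deadline 30 sh -c 'for i in $(seq 1 1000); do logger "foo$i"; done'` would report whether a burst of 1000 messages gets through within 30 seconds. Using logger(1) as the writer to /dev/log is an assumption here; the loop quoted in the comment uses echo.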

Regarding RHEL 8, I see that the logs go into the systemd journal by default, so it seems that the problem does not occur there and so I won't touch it.

Comment 18 errata-xmlrpc 2023-11-07 08:37:21 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (rear bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6571
