2189428 – CentOS Stream 8 LXD and cloud images taking 2 minutes to get IP

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 2189428 - CentOS Stream 8 LXD and cloud images taking 2 minutes to get IP

Summary: CentOS Stream 8 LXD and cloud images taking 2 minutes to get IP

Keywords:
Status:	CLOSED DUPLICATE of bug 2174645
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	systemd
Sub Component:
Version:	CentOS Stream
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	systemd-maint
QA Contact:	Frantisek Sumsal
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2023-04-25 07:56 UTC by Lucas
Modified:	2023-04-25 13:21 UTC (History)
CC List:	4 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-04-25 09:15:03 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
Logs from the fresh LXD container (7.98 KB, text/plain) 2023-04-25 07:56 UTC, Lucas	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	RHELPLAN-155619	0	None	None	None	2023-04-25 07:57:03 UTC

Description Lucas 2023-04-25 07:56:29 UTC

Created attachment 1959737 [details]
Logs from the fresh LXD container

Description of problem:

We use centos stream 8 images on both LXD containers and on cloud VMs (Hetzner mostly in our case), and the base image currently seems to have an issue where its taking up to 2 minutes to finish the network startup. The issue seems related to the dhclient startup, but I might be wrong.

I've attached all the logs from a fresh LXD container startup where the issue can be seen clearly. The container boots at 15:52:50 and the logs simply stop at 15:52:51 until 15:54:21 where they eventually resume, resulting in a startup finishing around 15:54:23, where it finally has an accessible IP, and thus the network is finally working.

To add a bit more on that: We are also using podman centos stream 8 images (quay.io/centos/centos:stream8) and these are not experiencing the same issue.

Version-Release number of selected component (if applicable):


How reproducible:

It happens 100% of times I create LXD containers.


Steps to Reproduce:
The behavior is the same for our LXD containers and our cloud VMs. Just start a fresh container using a Centos Stream 8 base image and it takes around 1~2 minutes to startup its network.

For the LXD scenario, the image being used is `images:centos/8-Stream`, like this:
```
lxc launch images:centos/8-Stream test-lxd
```

Actual results:

The network service takes 2 minutes on average to startup and get any IP


Expected results:

In the previous version (and other similar images) that usually happens withing a few short seconds

Comment 1 Martin Osvald 🛹 2023-04-25 09:15:03 UTC

Thank you for reporting this problem.

Checking the attached log, at first glance there is no apparent issue with dhclient itself or any dhcp server side issue (getting from DHCPDISCOVER to DHCPACK is very fast):

~~~
Apr 20 15:54:21 test-lxd network[467]: Bringing up interface eth0:
Apr 20 15:54:21 test-lxd dhclient[630]: Created duid "\000\004\337\252\266\0058}C\201\204}L\255\204\034SK".
Apr 20 15:54:21 test-lxd dhclient[630]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8 (xid=0xca1db832)
Apr 20 15:54:21 test-lxd dhclient[630]: DHCPREQUEST on eth0 to 255.255.255.255 port 67 (xid=0xca1db832)
Apr 20 15:54:21 test-lxd dhclient[630]: DHCPOFFER from 10.155.213.1
Apr 20 15:54:21 test-lxd dhclient[630]: DHCPACK from 10.155.213.1 (xid=0xca1db832)
Apr 20 15:54:23 test-lxd NET[658]: /usr/sbin/dhclient-script : updated /etc/resolv.conf
Apr 20 15:54:23 test-lxd dbus-daemon[468]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service' requested by ':1.8' (uid=0 pid=660 comm="hostnamectl --transient " label="unconfined")
Apr 20 15:54:23 test-lxd systemd[1]: Starting Hostname Service...
Apr 20 15:54:23 test-lxd dbus-daemon[468]: [system] Successfully activated service 'org.freedesktop.hostname1'
Apr 20 15:54:23 test-lxd systemd[1]: Started Hostname Service.
Apr 20 15:54:23 test-lxd dhclient[630]: bound to 10.155.213.114 -- renewal in 1754 seconds.
Apr 20 15:54:23 test-lxd network[611]: Determining IP information for eth0... done.
Apr 20 15:54:23 test-lxd network[467]: [  OK  ]
~~~

On the other hand, I see the delay seems to arise earlier here:

~~~
Apr 20 15:52:51 test-lxd systemd-udevd[437]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Apr 20 15:52:51 test-lxd systemd[1]: Started udev Coldplug all Devices.
Apr 20 15:52:51 test-lxd systemd-udevd[438]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.

... note the delay happens here:

Apr 20 15:54:21 test-lxd systemd[1]: systemd-journal-flush.service: start operation timed out. Terminating.
Apr 20 15:54:21 test-lxd systemd[1]: systemd-journal-flush.service: Main process exited, code=killed, status=15/TERM
Apr 20 15:54:21 test-lxd systemd[1]: systemd-journal-flush.service: Failed with result 'timeout'.
~~~

I found a similar problem on the net:

https://unix.stackexchange.com/questions/740162/systemd-journal-flush-service-failed-with-result-timeout-failed-to-start-flush

There are already opened BZs for this for RHEL7 (WONTFIX) & RHEL8:

RHEL7 BZ:

[Bug 1649857 - systemd-journal-flush.service fails in timeout causing boot to be delayed by 1min30]
https://bugzilla.redhat.com/show_bug.cgi?id=1649857

RHEL8 BZ:

[Bug 2174645 - Failed to start Flush Journal to Persistent Storage (systemd-journal-flush.service)]
https://bugzilla.redhat.com/show_bug.cgi?id=2174645

When looking at the state of RHEL8 BZ I can see it is targeted to get fixed in RHEL8.8.

There is also a comment asking about CentOS 8 fix first:

https://bugzilla.redhat.com/show_bug.cgi?id=2174645#c26


Please, refer to the information shared in those links and BZs.

Given the information above, I am going to change the component of this to systemd and close it as a duplicate of bug 2174645.

*** This bug has been marked as a duplicate of bug 2174645 ***

Comment 2 Lucas 2023-04-25 11:07:29 UTC

Thanks! I was totally barking at the wrong tree here due to this behavior affecting initially the networking, but I can see now that it's on an earlier point. 

I'll play around with the workaround from stackexchange to see if that's feasible for our flow, and will also keep an eye on this bug 2174645

Appreciate the help and fast reply!

Note You need to log in before you can comment on or make changes to this bug.