Bug 1994804 - Long delay with AWS systems becoming available
Summary: Long delay with AWS systems becoming available
Keywords:
Status: CLOSED DUPLICATE of bug 1862930
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: cloud-init
Version: ---
Hardware: Unspecified
OS: Linux
unspecified
high
Target Milestone: beta
: ---
Assignee: sushil kulkarni
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-08-17 20:56 UTC by David Valin
Modified: 2022-03-07 01:00 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-18 12:52:02 UTC
Type: Bug
Target Upstream Version:


Attachments (Terms of Use)
systemd analyze-time output (148 bytes, text/plain)
2021-08-17 20:56 UTC, David Valin
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-93897 0 None None None 2021-08-17 20:57:51 UTC

Description David Valin 2021-08-17 20:56:25 UTC
Created attachment 1814954 [details]
systemd  analyze-time output

Description of problem:
On AWS, we are seeing on system creation, taking 122 seconds to be able to login to the system via ssh.  Ubuntu and Amazon linux is taking about 55 seconds

Version-Release number of selected component (if applicable):


How reproducible:

Steps to Reproduce:
1.Create an AWS instance with RHEL (was using ami-03d64741867e7bb94, c5.xlarge instance)
2.Time how long it takes before you can ssh into the system
3.

Actual results:
Time from cloud instance creation start to logging in, is taking about 122 seconds.

Expected results:
Expect to be on par with Ubuntu/Amazon Linux, which is around 55 seconds.

Additional info:

From dmesg files

[   10.003160] ppdev: user-space parallel port driver
[   62.901525] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   62.910524] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   62.928081] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   62.940125] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   63.511141] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

That is the large jump in time.  Amazon Linux shows this about 4 seconds into the run.

From 
systemd analyze-blame
         53.120s cloud-init-local.service
          8.451s kdump.service

So the cloud-init-local.service appears to be the culprit.

Looking at its log file 
2021-08-17 20:26:39,749 - util.py[DEBUG]: Running command ['ip', '-4', 'route', 'add', '172.31.0.1', 'dev', 'eth0', 'src', '172.31.4.239'] with allowed return codes [0] (shell=False, capture=True)
2021-08-17 20:26:39,760 - util.py[DEBUG]: Running command ['ip', '-4', 'route', 'add', 'default', 'via', '172.31.0.1', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2021-08-17 20:27:19,832 - util.py[DEBUG]: Resolving URL: http://169.254.169.254 took 40.061 seconds
2021-08-17 20:27:29,843 - util.py[DEBUG]: Resolving URL: http://instance-data.:8773 took 10.010 seconds

Appears that the URL resolution is what is causing the problems.

Log files attached

Comment 8 Amnon Ilan 2021-08-18 12:52:02 UTC

*** This bug has been marked as a duplicate of bug 1862930 ***

Comment 9 aboudr01 2022-03-03 15:17:06 UTC
bug 1862930 is a internal bug that I cannot view the status on. Is there any status on the resolution of it or any work arounds mentioned? We are still experiencing this issue with rhel 8.5.

Comment 10 Frank Liang 2022-03-04 01:28:15 UTC
(In reply to aboudr01 from comment #9)
> bug 1862930 is a internal bug that I cannot view the status on. Is there any
> status on the resolution of it or any work arounds mentioned? We are still
> experiencing this issue with rhel 8.5.
Are you using the images built by your self or created from running instances?
Does it appear only in your first launch? If yes, please try to cleanup below file before uploading your own image or creating from running instance.
#truncate -s 0 /etc/resolv.conf

Comment 11 aboudr01 2022-03-04 15:11:22 UTC
We are using a redhat provided 8.3 AMI in the c2s region which points to a yum repo updated with 8.5 rpms. It appears to only be a problem with the first boot. After stopping and starting the instance, when I run cloud-init analyze blame the first boot record shows

-- Boot Record 01 --
51.03100s (init-local/search-Ec2Local)

and the second

-- Boot Record 02 --
00.65400s (init-local/search-Ec2Local)

We are also creating our own images, starting from the redhat provided one, which are also having the issue. I will try the suggested work around of clearing resolv.conf to see if it resolves the issue for our own images.

Comment 12 aboudr01 2022-03-04 17:24:16 UTC
I have confirmed that clearing /etc/resolv.conf before making an image resolves the problem.

Thanks,
Aaron

Comment 13 Frank Liang 2022-03-07 01:00:19 UTC
(In reply to aboudr01 from comment #12)
> I have confirmed that clearing /etc/resolv.conf before making an image
> resolves the problem.
> 
> Thanks,
> Aaron

Thanks for your confirmation, we are working on writing an article about this topic.


Note You need to log in before you can comment on or make changes to this bug.