Bug 1630666

Summary: [Regression][RHEL7.6][cloud-init] cloud-init-local.service takes over 50s when creating EC2 instances
Product: Red Hat Enterprise Linux 7 Reporter: Chen Shi <cheshi>
Component: cloud-init Assignee: Eduardo Otubo <eterrell>
Status: CLOSED CURRENTRELEASE QA Contact: Frank Liang <xiliang>
Severity: high Docs Contact:
Priority: urgent    
Version: 7.6 CC: cheshi, huzhao, jgreguske, jomurphy, linl, sacpatil, vkuznets, xiliang
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-03 19:54:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Chen Shi 2018-09-19 03:16:44 UTC
[IMPORTANT] This issue is still under investigation and may also be related to the AWS infrastructure; I will keep updating this bug. Until I find something new, let's treat it as a regression.

Description of problem:
During T3 instance type verification, we found that cloud-init-local.service takes over 50s to start when an EC2 instance is created.

Version-Release number of selected component (if applicable):
kernel-3.10.0-933.el7.x86_64
cloud-init-18.2-1.el7.x86_64

RHEL Version:
RHEL7.6

How reproducible:
100%

Steps to Reproduce:
1. Create an instance.
2. $ sudo systemd-analyze time
3. $ sudo systemd-analyze blame
4. $ sudo systemd-analyze critical-chain
5. $ sudo systemd-analyze dot

Actual results:
cloud-init-local.service takes too long (~51s) to start.

Expected results:
Every service except kdump.service should start in less than 10s.
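
A minimal sketch, in Python, of how this expectation could be checked against the output of `systemd-analyze blame`. It assumes each blame line has the form "<time span> <unit name>" and that time spans use only the h/min/s/ms/us suffixes. The names `parse_span` and `slow_units` are hypothetical helpers, and the kdump.service and network.service figures in the sample are illustrative, not measured.

```python
import re

# Seconds per unit suffix (assumption: only these suffixes appear).
_UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6}
_SPAN_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(h|min|ms|us|s)")


def parse_span(text):
    """Convert a span such as '51.179s' or '1min 11.160s' to seconds."""
    parts = _SPAN_RE.findall(text)
    if not parts:
        raise ValueError("no time span found in %r" % text)
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def slow_units(blame_output, limit=10.0, exempt=("kdump.service",)):
    """Return (unit, seconds) pairs that take `limit` seconds or longer."""
    slow = []
    for line in blame_output.splitlines():
        # The unit name is the last whitespace-separated field on the line.
        fields = line.rsplit(None, 1)
        if len(fields) != 2:
            continue
        span, unit = fields
        seconds = parse_span(span)
        if seconds >= limit and unit not in exempt:
            slow.append((unit, seconds))
    return slow


if __name__ == "__main__":
    # Illustrative input: only the cloud-init-local figure is from this report.
    sample = (
        "  51.179s cloud-init-local.service\n"
        "  12.000s kdump.service\n"
        "   1.200s network.service\n"
    )
    for unit, seconds in slow_units(sample):
        print("%s took %.3fs" % (unit, seconds))
```

On a real instance the input would be the text captured from step 3 of the reproduction steps.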

Additional info:
1. The start time is more than 50s when creating an instance (issue present).
2. The start time is less than 1.5s when rebooting the instance.
3. This issue can also be reproduced on other instance types, such as m5 and t2.

** VMSize       Method     Kernel     Initrd     Userspace   Total      
** t3.2xlarge   create     2.484s     1.998s     1min11.160s 1min15.644s
** t3.2xlarge   reboot     2.505s     1.508s     21.736s     25.749s    
** t3.large     create     2.272s     1.939s     1min11.450s 1min15.662s
** t3.large     reboot     2.270s     1.513s     21.453s     25.238s    
** t3.medium    create     2.243s     1.984s     1min11.361s 1min15.589s
** t3.medium    reboot     2.245s     1.459s     22.123s     25.829s    
** t3.micro     create     2.285s     1.980s     56.121s     1min388ms  
** t3.micro     reboot     2.276s     1.483s     4.910s      8.671s     
** t3.nano      create     2.280s     2.043s     56.276s     1min600ms  
** t3.nano      reboot     2.268s     1.448s     4.625s      8.343s     
** t3.small     create     2.276s     1.942s     1min11.527s 1min15.745s
** t3.small     reboot     2.272s     1.524s     21.446s     25.243s    
** t3.xlarge    create     2.325s     1.714s     1min9.820s  1min13.860s
** t3.xlarge    reboot     2.324s     1.477s     20.122s     23.925s    

** VMSize       Method     cloud-init-local    
** t3.2xlarge   create     51.179s             
** t3.2xlarge   reboot     1.405s              
** t3.large     create     51.447s             
** t3.large     reboot     1.239s              
** t3.medium    create     51.508s             
** t3.medium    reboot     1.358s              
** t3.micro     create     51.101s             
** t3.micro     reboot     1.243s              
** t3.nano      create     51.506s             
** t3.nano      reboot     1.199s              
** t3.small     create     51.126s             
** t3.small     reboot     1.371s              
** t3.xlarge    create     51.020s             
** t3.xlarge    reboot     1.236s     

** VMSize       Method     Kernel     Initrd     Userspace   Total      
** m5.xlarge    create     2.265s     1.875s     1min8.625s  1min12.766s
** m5.xlarge    reboot     2.282s     1.397s     19.769s     23.449s    
** t2.xlarge    create     1.917s     1.605s     1min10.310s 1min13.833s
** t2.xlarge    reboot     1.995s     1.161s     20.050s     23.206s    

** VMSize       Method     cloud-init-local    
** m5.xlarge    create     51.378s             
** m5.xlarge    reboot     1.190s              
** t2.xlarge    create     51.379s             
** t2.xlarge    reboot     1.367s
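
To quantify the extra time spent on first boot, the reboot time can be subtracted from the create time for each instance type. The sketch below does that in Python with the cloud-init-local values copied from the tables above; the name `first_boot_penalty` is a hypothetical helper introduced only for illustration.

```python
# cloud-init-local.service start times in seconds, copied from the tables
# in this report: instance type -> (create, reboot).
TIMES = {
    "t3.2xlarge": (51.179, 1.405),
    "t3.large": (51.447, 1.239),
    "t3.medium": (51.508, 1.358),
    "t3.micro": (51.101, 1.243),
    "t3.nano": (51.506, 1.199),
    "t3.small": (51.126, 1.371),
    "t3.xlarge": (51.020, 1.236),
    "m5.xlarge": (51.378, 1.190),
    "t2.xlarge": (51.379, 1.367),
}


def first_boot_penalty(times):
    """Return the extra seconds spent on create compared with reboot."""
    return {
        vm: round(create - reboot, 3)
        for vm, (create, reboot) in times.items()
    }


if __name__ == "__main__":
    penalty = first_boot_penalty(TIMES)
    for vm, extra in sorted(penalty.items()):
        print("%-12s +%.3fs" % (vm, extra))
    print("min +%.3fs, max +%.3fs"
          % (min(penalty.values()), max(penalty.values())))
```

With these figures the difference is roughly 50s for every instance type, regardless of size or family.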

Comment 3 Chen Shi 2018-09-19 08:11:27 UTC
The results suggest that this is still a regression against RHEL7.5 and that the issue can be reproduced with the official build only.

https://docs.google.com/spreadsheets/d/1kcWGDtKnXSq-pTx9fTiPjdLw0iKCUKBY5XtjZbn8NrY

Comment 5 Eduardo Otubo 2019-02-15 13:51:45 UTC
Please check if this is related to https://bugzilla.redhat.com/show_bug.cgi?id=1625874