Bug 1925213

Summary: [Azure][RHEL-8] cloud-init.service costs 2 minutes in DSv4 VM
Product: Red Hat Enterprise Linux 8 Reporter: Yuxin Sun <yuxisun>
Component: cloud-initAssignee: Emanuele Giuseppe Esposito <eesposit>
Status: CLOSED CURRENTRELEASE QA Contact: Huijuan Zhao <huzhao>
Severity: medium Docs Contact:
Priority: medium    
Version: 8.4CC: eterrell, huzhao, jgreguske, litian, ribarry, xiachen, yacao
Target Milestone: rcKeywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-06-08 09:59:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Yuxin Sun 2021-02-04 15:47:59 UTC
Description of problem:
The DSv4 sizes don't have a temporary(ephemeral) disk. The cloud-init doesn't handle it properly and costs 2 minutes to wait for the ephemeral disk.
* This happened in both first boot and reboot.

Version-Release number of selected components (if applicable):

RHEL Version:
RHEL8.4
cloud-init-20.3-10.el8.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy a RHEL-8.4 VM  on Azure in Standard_D2s_v4 size.
2. Login the instance, check cloud-init log

Actual results:
# systemd-analyze blame
     2min 5.049s cloud-init.service

cloud-init.log:
2021-02-04 08:37:53,221 - handlers.py[DEBUG]: start: azure-ds/address_ephemeral_resize: address_ephemeral_resize
2021-02-04 08:37:53,221 - handlers.py[DEBUG]: start: azure-ds/wait-for-ephemeral-disk: wait for ephemeral disk
2021-02-04 08:37:53,221 - util.py[DEBUG]: Azure ephemeral disk: Waiting up to 120 seconds for the following files: ['/dev/disk/cloud/azure_resource']
2021-02-04 08:39:53,242 - util.py[DEBUG]: Azure ephemeral disk: Still missing files after 120 seconds: {'/dev/disk/cloud/azure_resource'}
2021-02-04 08:39:53,242 - DataSourceAzure.py[WARNING]: ephemeral device '/dev/disk/cloud/azure_resource' did not appear after 120 seconds.
...
2021-02-04 08:39:53,551 - util.py[WARNING]: Failed partitioning operation
Device /dev/disk/cloud/azure_resource did not exist and was not created with a udevadm settle.
2021-02-04 08:39:53,551 - util.py[DEBUG]: Failed partitioning operation
Device /dev/disk/cloud/azure_resource did not exist and was not created with a udevadm settle.
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cloudinit/config/cc_disk_setup.py", line 141, in handle
    func=mkpart, args=(disk, definition))
  File "/usr/lib/python3.6/site-packages/cloudinit/util.py", line 2292, in log_time
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/cloudinit/config/cc_disk_setup.py", line 780, in mkpart
    assert_and_settle_device(device)
  File "/usr/lib/python3.6/site-packages/cloudinit/config/cc_disk_setup.py", line 757, in assert_and_settle_device
    "with a udevadm settle." % device)
RuntimeError: Device /dev/disk/cloud/azure_resource did not exist and was not created with a udevadm settle.
2021-02-04 08:39:53,652 - cc_disk_setup.py[DEBUG]: setting up filesystems: [{'filesystem': 'ext4', 'device': 'ephemeral0.1'}]
2021-02-04 08:39:53,653 - cc_disk_setup.py[DEBUG]: ephemeral0.1 is mapped to disk=/dev/disk/cloud/azure_resource part=1
2021-02-04 08:39:53,653 - cc_disk_setup.py[DEBUG]: Creating new filesystem.
2021-02-04 08:39:53,653 - subp.py[DEBUG]: Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=True)
2021-02-04 08:39:53,666 - util.py[DEBUG]: Creating fs for /dev/disk/cloud/azure_resource took 0.014 seconds
2021-02-04 08:39:53,666 - util.py[WARNING]: Failed during filesystem operation
Device /dev/disk/cloud/azure_resource did not exist and was not created with a udevadm settle.
2021-02-04 08:39:53,667 - util.py[DEBUG]: Failed during filesystem operation
Device /dev/disk/cloud/azure_resource did not exist and was not created with a udevadm settle.
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cloudinit/config/cc_disk_setup.py", line 159, in handle
    func=mkfs, args=(definition,))
  File "/usr/lib/python3.6/site-packages/cloudinit/util.py", line 2292, in log_time
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/cloudinit/config/cc_disk_setup.py", line 883, in mkfs
    assert_and_settle_device(device)
  File "/usr/lib/python3.6/site-packages/cloudinit/config/cc_disk_setup.py", line 757, in assert_and_settle_device
    "with a udevadm settle." % device)
RuntimeError: Device /dev/disk/cloud/azure_resource did not exist and was not created with a udevadm settle.

Expected results:
cloud-init.service costs ~10s

Additional info:

Comment 1 John Ferlan 2021-02-17 12:07:11 UTC
Assigned to Rick for initial triage per bz process and age of bug created or assigned to virt-maint without triage.

Comment 3 Eduardo Otubo 2021-02-19 11:50:36 UTC
(In reply to Yuxin Sun from comment #0)
> Description of problem:
> The DSv4 sizes don't have a temporary(ephemeral) disk. The cloud-init
> doesn't handle it properly and costs 2 minutes to wait for the ephemeral
> disk.
> * This happened in both first boot and reboot.
> 
> Version-Release number of selected components (if applicable):
> 
> RHEL Version:
> RHEL8.4
> cloud-init-20.3-10.el8.noarch
> 
> How reproducible:
> 100%
> 
> Steps to Reproduce:
> 1. Deploy a RHEL-8.4 VM  on Azure in Standard_D2s_v4 size.
> 2. Login the instance, check cloud-init log
> 
> Actual results:
> # systemd-analyze blame
>      2min 5.049s cloud-init.service
> 
> cloud-init.log:
> 2021-02-04 08:37:53,221 - handlers.py[DEBUG]: start:
> azure-ds/address_ephemeral_resize: address_ephemeral_resize
> 2021-02-04 08:37:53,221 - handlers.py[DEBUG]: start:
> azure-ds/wait-for-ephemeral-disk: wait for ephemeral disk
> 2021-02-04 08:37:53,221 - util.py[DEBUG]: Azure ephemeral disk: Waiting up
> to 120 seconds for the following files: ['/dev/disk/cloud/azure_resource']
> 2021-02-04 08:39:53,242 - util.py[DEBUG]: Azure ephemeral disk: Still
> missing files after 120 seconds: {'/dev/disk/cloud/azure_resource'}
> 2021-02-04 08:39:53,242 - DataSourceAzure.py[WARNING]: ephemeral device
> '/dev/disk/cloud/azure_resource' did not appear after 120 seconds.
> ...
> 2021-02-04 08:39:53,551 - util.py[WARNING]: Failed partitioning operation
> Device /dev/disk/cloud/azure_resource did not exist and was not created with
> a udevadm settle.
> 2021-02-04 08:39:53,551 - util.py[DEBUG]: Failed partitioning operation
> Device /dev/disk/cloud/azure_resource did not exist and was not created with
> a udevadm settle.
> Traceback (most recent call last):
>   File "/usr/lib/python3.6/site-packages/cloudinit/config/cc_disk_setup.py",
> line 141, in handle
>     func=mkpart, args=(disk, definition))
>   File "/usr/lib/python3.6/site-packages/cloudinit/util.py", line 2292, in
> log_time
>     ret = func(*args, **kwargs)
>   File "/usr/lib/python3.6/site-packages/cloudinit/config/cc_disk_setup.py",
> line 780, in mkpart
>     assert_and_settle_device(device)
>   File "/usr/lib/python3.6/site-packages/cloudinit/config/cc_disk_setup.py",
> line 757, in assert_and_settle_device
>     "with a udevadm settle." % device)
> RuntimeError: Device /dev/disk/cloud/azure_resource did not exist and was
> not created with a udevadm settle.
> 2021-02-04 08:39:53,652 - cc_disk_setup.py[DEBUG]: setting up filesystems:
> [{'filesystem': 'ext4', 'device': 'ephemeral0.1'}]
> 2021-02-04 08:39:53,653 - cc_disk_setup.py[DEBUG]: ephemeral0.1 is mapped to
> disk=/dev/disk/cloud/azure_resource part=1
> 2021-02-04 08:39:53,653 - cc_disk_setup.py[DEBUG]: Creating new filesystem.
> 2021-02-04 08:39:53,653 - subp.py[DEBUG]: Running command ['udevadm',
> 'settle'] with allowed return codes [0] (shell=False, capture=True)
> 2021-02-04 08:39:53,666 - util.py[DEBUG]: Creating fs for
> /dev/disk/cloud/azure_resource took 0.014 seconds
> 2021-02-04 08:39:53,666 - util.py[WARNING]: Failed during filesystem
> operation
> Device /dev/disk/cloud/azure_resource did not exist and was not created with
> a udevadm settle.
> 2021-02-04 08:39:53,667 - util.py[DEBUG]: Failed during filesystem operation
> Device /dev/disk/cloud/azure_resource did not exist and was not created with
> a udevadm settle.
> Traceback (most recent call last):
>   File "/usr/lib/python3.6/site-packages/cloudinit/config/cc_disk_setup.py",
> line 159, in handle
>     func=mkfs, args=(definition,))
>   File "/usr/lib/python3.6/site-packages/cloudinit/util.py", line 2292, in
> log_time
>     ret = func(*args, **kwargs)
>   File "/usr/lib/python3.6/site-packages/cloudinit/config/cc_disk_setup.py",
> line 883, in mkfs
>     assert_and_settle_device(device)
>   File "/usr/lib/python3.6/site-packages/cloudinit/config/cc_disk_setup.py",
> line 757, in assert_and_settle_device
>     "with a udevadm settle." % device)
> RuntimeError: Device /dev/disk/cloud/azure_resource did not exist and was
> not created with a udevadm settle.
> 
> Expected results:
> cloud-init.service costs ~10s
> 
> Additional info:

@yuxisun can you confirm this happens only on instances that does not have ephemeral disk? Also, can you test for regression on rhel-8.4 as well?

Comment 4 Eduardo Otubo 2021-02-19 11:55:26 UTC
It does look like this problem is the same as in: https://bugs.launchpad.net/cloud-init/+bug/1901011

There's already a PR opened for this issue, reviews are positive but no merge yet. I'll ping the maintainers so we can speed up a little bit on this fix.
https://github.com/canonical/cloud-init/pull/800

Comment 5 Eduardo Otubo 2021-02-24 09:06:51 UTC
@yuxisun the PR is already merged but it a little late for 8.4. If you agree we could deffer it to 8.5. What do you think?

Comment 7 Eduardo Otubo 2021-02-24 10:36:50 UTC
@yuxisun the PR is already merged but it a little late for 8.4. If you agree we could deffer it to 8.5. What do you think?

Comment 8 Eduardo Otubo 2021-02-24 12:33:15 UTC
(In reply to Eduardo Otubo from comment #7)
> @yuxisun the PR is already merged but it a little late for 8.4.
> If you agree we could deffer it to 8.5. What do you think?

I don't know what caused the message to duplicate. But we're good. Postponing to rhel-8.5.0 :-)

Thanks!