Bug 2032524

Summary: [RHEL9] [Azure] cloud-init fails to configure the system
Product: Red Hat Enterprise Linux 9 Reporter: Neal Gompa <ngompa13>
Component: cloud-init Assignee: Emanuele Giuseppe Esposito <eesposit>
Status: CLOSED ERRATA QA Contact: Huijuan Zhao <huzhao>
Severity: high Docs Contact:
Priority: unspecified    
Version: CentOS Stream CC: adimania, apevec, atodorov, bstinson, carl, davide, davidmccheyne, daxelrod, dustymabe, eesposit, eterrell, extras-qa, fedora, francois.rigault, gholms, huzhao, jgreguske, jwboyer, lars, ldu, matt, mhayden, michel, mrezanin, ngompa13, shardy, s, xiachen, xiliang, yacao, yuxisun
Target Milestone: rc Keywords: Triaged
Target Release: --- Flags: pm-rhel: mirror+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: cloud-init-21.1-15.el9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1974262
: 2039697 (view as bug list) Environment:
Last Closed: 2022-05-17 12:26:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2039697    

Description Neal Gompa 2021-12-14 16:05:07 UTC
+++ This bug was initially created as a clone of Bug #1974262 +++

Description of problem:
cloud-init service fails when trying to provision on Azure with errors like the following:

2021-04-20 03:33:11,917 - util.py[WARNING]: Getting data from <class 'cloudinit.sources.DataSourceAzure.DataSourceAzure'> failed

Version-Release number of selected components (if applicable):
21.1-14.el9

How reproducible:
100%

Steps to Reproduce:

(Note, this is with Fedora 34 and 35, as there is no Azure CentOS image yet, but custom built ones on EL9 demonstrate this issue)

1. Create a Fedora 34 VM ("urn": "tunnelbiz:fedora:fedoraupdate:34.0.1") on Azure
2. Login and check cloud-init service status

Actual results:
[root@walafedora ~]# systemctl status cloud-init
× cloud-init.service - Initial cloud-init job (metadata service crawler)
     Loaded: loaded (/usr/lib/systemd/system/cloud-init.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Mon 2021-06-21 14:53:56 +08; 1h 27min ago
    Process: 709 ExecStart=/usr/bin/cloud-init init (code=exited, status=1/FAILURE)
   Main PID: 709 (code=exited, status=1/FAILURE)
        CPU: 535ms

Jun 21 14:53:56 walafedora cloud-init[806]: ci-info: +-------+-------------+---------+-----------+-------+
Jun 21 14:53:56 walafedora cloud-init[806]: ci-info: | Route | Destination | Gateway | Interface | Flags |
Jun 21 14:53:56 walafedora cloud-init[806]: ci-info: +-------+-------------+---------+-----------+-------+
Jun 21 14:53:56 walafedora cloud-init[806]: ci-info: |   2   |  multicast  |    ::   |    eth0   |   U   |
Jun 21 14:53:56 walafedora cloud-init[806]: ci-info: +-------+-------------+---------+-----------+-------+
Jun 21 14:53:56 walafedora cloud-init[806]: 2021-06-21 06:53:56,163 - util.py[WARNING]: Getting data from <class 'cloudinit.sources.DataSourceNone.DataSourceNone'> failed
Jun 21 14:53:56 walafedora cloud-init[806]: 2021-06-21 06:53:56,174 - util.py[WARNING]: No instance datasource found! Likely bad things to come!
Jun 21 14:53:56 walafedora systemd[1]: cloud-init.service: Main process exited, code=exited, status=1/FAILURE
Jun 21 14:53:56 walafedora systemd[1]: cloud-init.service: Failed with result 'exit-code'.
Jun 21 14:53:56 walafedora systemd[1]: Failed to start Initial cloud-init job (metadata service crawler).

/var/log/cloud-init.log:
2021-04-20 03:33:11,916 - handlers.py[DEBUG]: start: init-local/search-Azure: searching for local data from DataSourceAzure
2021-04-20 03:33:11,916 - __init__.py[DEBUG]: Seeing if we can get any data from <class 'cloudinit.sources.DataSourceAzure.DataSourceAzure'>
2021-04-20 03:33:11,917 - handlers.py[DEBUG]: finish: init-local/search-Azure: FAIL: no local data found from DataSourceAzure
2021-04-20 03:33:11,917 - util.py[WARNING]: Getting data from <class 'cloudinit.sources.DataSourceAzure.DataSourceAzure'> failed
2021-04-20 03:33:11,917 - util.py[DEBUG]: Getting data from <class 'cloudinit.sources.DataSourceAzure.DataSourceAzure'> failed
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/cloudinit/sources/__init__.py", line 759, in find_source
    s = cls(sys_cfg, distro, paths)
  File "/usr/lib/python3.9/site-packages/cloudinit/sources/DataSourceAzure.py", line 292, in __init__
    sources.DataSource.__init__(self, sys_cfg, distro, paths)
  File "/usr/lib/python3.9/site-packages/cloudinit/sources/__init__.py", line 211, in __init__
    self.ds_cfg = util.get_cfg_by_path(
  File "/usr/lib/python3.9/site-packages/cloudinit/util.py", line 735, in get_cfg_by_path
    if tok not in cur:
TypeError: argument of type 'NoneType' is not iterable
2021-04-20 03:33:11,922 - main.py[DEBUG]: No local datasource found

Expected results: 
No error in cloud-init
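
The TypeError in the traceback above comes from a membership test (`tok not in cur`) hitting None while walking a nested configuration. A minimal sketch of a None-tolerant lookup follows. The name `lookup_path` and the sample config are hypothetical, and this is not the cloud-init implementation; it only illustrates the failure mode under the assumption that one level of the config is present but empty.

```python
def lookup_path(cfg, path, default=None):
    """Walk a '/'-separated path through nested dicts, tolerating None levels."""
    cur = cfg
    for tok in path.split("/"):
        # Without the isinstance check, `tok not in None` raises
        # "TypeError: argument of type 'NoneType' is not iterable".
        if not isinstance(cur, dict) or tok not in cur:
            return default
        cur = cur[tok]
    return cur


# Hypothetical config: the "datasource" key exists but has no value.
sys_cfg = {"datasource": None}
ds_cfg = lookup_path(sys_cfg, "datasource/Azure", default={})
print(ds_cfg)
```

With the guard in place the lookup falls back to the default instead of aborting datasource construction.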

Additional info:

(From the cloned bug 1974262...)

It also fails against AzureStack with Fedora-Cloud-Base-35-1.2.x86_64.qcow2 (same with 34, so not a regression):

[   38.793452] cloud-init[651]: 2021-11-04 15:13:43,593 - azure.py[WARNING]: Error communicating with Azure fabric; You may experience connectivity issues: Unexpected error while running command.
[   38.842623] cloud-init[651]: Command: ['openssl', 'req', '-x509', '-nodes', '-subj', '/CN=LinuxTransport', '-days', '32768', '-newkey', 'rsa:2048', '-keyout', 'TransportPrivate.pem', '-out', 'TransportCert.pem']
[   38.896366] cloud-init[651]: Exit code: -
[   38.906487] cloud-init[651]: Reason: [Errno 2] No such file or directory: b'openssl'
[   38.928738] cloud-init[651]: Stdout: -
[   38.938581] cloud-init[651]: Stderr: -
[   39.178793] cloud-init[651]: 2021-11-04 15:13:44,646 - util.py[WARNING]: Failed partitioning operation
[   39.209486] cloud-init[651]: Error running partition command on /dev/sdb
[   39.231286] cloud-init[651]: 'NoneType' object has no attribute 'encode'
[   43.187537] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.

As a result, no ssh key is installed and the system is unusable.


cloud-init's azure.py depends on openssl, but the package dependency is missing.
Workaround: running
virt-customize --install openssl -a Fedora-Cloud-Base-35-1.2.x86_64.qcow2
makes it work.
(There is still one broken service:
Nov 04 15:48:15 fedora systemd[1]: Starting Rebuild Dynamic Linker Cache...
Nov 04 15:48:17 fedora ldconfig[618]: /sbin/ldconfig: Renaming of /etc/ld.so.cache~ to /etc/ld.so.cache failed: Permission denied
but at least ssh is working.)

Looks like the cloud image does not work on Azure :/

--- Additional comment from François Rigault on 2021-11-04 12:17:46 EDT ---

The gdisk package is also needed to fix the partitioning issue. With both packages installed, cloud-init seems to work as expected.


virt-customize --install gdisk --install openssl -a Fedora-Cloud-Base-35-1.2.x86_64.qcow2

--- Additional comment from Neal Gompa on 2021-12-14 10:52:09 EST ---

The fix here would be to add "gdisk" and "openssl" as required runtime dependencies for cloud-init.
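
Until the packaging fix lands, an image can be checked for the two external tools before it is uploaded. The following is a minimal sketch, not part of cloud-init: the helper name `missing_tools` is hypothetical, and it assumes the gdisk package is what provides the `sgdisk` binary on the image.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


# openssl is used for the Azure fabric certificate, sgdisk for disk setup.
missing = missing_tools(["openssl", "sgdisk"])
if missing:
    print("missing tools:", ", ".join(missing))
else:
    print("all required tools found")
```

The `which` parameter exists only so the check can be exercised without touching the real PATH.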

--- Additional comment from Neal Gompa on 2021-12-14 10:59:24 EST ---

PR proposed: https://src.fedoraproject.org/rpms/cloud-init/pull-request/23

Comment 2 Huijuan Zhao 2021-12-15 02:35:23 UTC
Tried it with RHEL-9 (cloud-init-21.1-14.el9) on Azure and did not hit this issue, as openssl and gdisk are pre-installed in the image.

Tried to remove openssl and gdisk from RHEL-9. Removing openssl failed because several other packages depend on it, and it is included in rhel-guest-image by default. So maybe there is no need to add openssl as a cloud-init dependency.

But after removing gdisk, there is an error during the partitioning operation, which should be called by cloud-utils-growpart:
$ cat /var/log/cloud-init.log
---------------------------------
2021-12-15 02:05:09,765 - util.py[WARNING]: Failed partitioning operation
Error running partition command on /dev/sda
'NoneType' object has no attribute 'encode'
2021-12-15 02:05:09,773 - util.py[DEBUG]: Failed partitioning operation
Error running partition command on /dev/sda
'NoneType' object has no attribute 'encode'
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/cloudinit/config/cc_disk_setup.py", line 491, in check_partition_gpt_layout
    out, _err = subp.subp(prt_cmd, update_env=LANG_C_ENV)
  File "/usr/lib/python3.9/site-packages/cloudinit/subp.py", line 253, in subp
    bytes_args = [
  File "/usr/lib/python3.9/site-packages/cloudinit/subp.py", line 254, in <listcomp>
    x if isinstance(x, bytes) else x.encode("utf-8")
AttributeError: 'NoneType' object has no attribute 'encode'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/cloudinit/config/cc_disk_setup.py", line 139, in handle
    util.log_time(logfunc=LOG.debug,
  File "/usr/lib/python3.9/site-packages/cloudinit/util.py", line 2409, in log_time
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/cloudinit/config/cc_disk_setup.py", line 808, in mkpart
    if check_partition_layout(table_type, device, layout):
  File "/usr/lib/python3.9/site-packages/cloudinit/config/cc_disk_setup.py", line 537, in check_partition_layout
    found_layout = get_dyn_func(
  File "/usr/lib/python3.9/site-packages/cloudinit/config/cc_disk_setup.py", line 431, in get_dyn_func
    return globals()[func_name](*func_args)
  File "/usr/lib/python3.9/site-packages/cloudinit/config/cc_disk_setup.py", line 493, in check_partition_gpt_layout
    raise Exception(
Exception: Error running partition command on /dev/sda
'NoneType' object has no attribute 'encode'
---------------------------------


Neal, did you use Fedora-Cloud-Base-35-1.2.x86_64.qcow2 to hit the issue? Is it a Fedora image that does not have openssl and gdisk pre-installed by default?

Based on the tests in RHEL-9 on Azure, IMO maybe we can add gdisk as a cloud-utils-growpart dependency in RHEL. Please correct me if anything is wrong. Thanks!

Comment 3 Neal Gompa 2021-12-15 08:49:50 UTC
(In reply to Huijuan Zhao from comment #2)
> Tried it with RHEL-9 (cloud-init-21.1-14.el9) on Azure and did not hit this
> issue, as openssl and gdisk are pre-installed in the image.
> 
> Tried to remove openssl and gdisk from RHEL-9. Removing openssl failed
> because several other packages depend on it, and it is included in
> rhel-guest-image by default. So maybe there is no need to add openssl as a
> cloud-init dependency.
> 
> But after removing gdisk, there is an error during the partitioning
> operation, which should be called by cloud-utils-growpart:
> 
> 
> Neal, did you use Fedora-Cloud-Base-35-1.2.x86_64.qcow2 to hit the issue?
> Is it a Fedora image that does not have openssl and gdisk pre-installed by
> default?
> 

That is the case, yes. I also tested with a custom CentOS Stream 9 image that I built, where it's possible to not have the openssl CLI tools installed.

As we need openssl and gdisk at the cloud-init level, it makes sense to guarantee that they are always installed alongside it.

> Based on the tests in RHEL-9 on Azure, IMO maybe we can add gdisk as a
> cloud-utils-growpart dependency in RHEL. Please correct me if anything is
> wrong. Thanks!

I think the dependency needs to be at the cloud-init level because sgdisk gets run by cloud-init stuff before passing it to cloud-utils-growpart. If cloud-utils-growpart *also* calls sgdisk, then it needs a gdisk dependency too, but cloud-init *definitely* needs the gdisk dependency.
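
The "'NoneType' object has no attribute 'encode'" message in the traceback is consistent with a command list whose first element is None, which is what a PATH lookup returns when the binary is absent. The sketch below reproduces that under the assumption that the partition command is built from a looked-up sgdisk path. The helper names `encode_args` and `build_partition_cmd` are hypothetical and only mirror the list comprehension shown in the traceback; they are not cloud-init functions.

```python
import shutil


def encode_args(args):
    """Encode each argument to bytes, as in the traceback's list comprehension."""
    return [x if isinstance(x, bytes) else x.encode("utf-8") for x in args]


def build_partition_cmd(device, tool="sgdisk", which=shutil.which):
    """Build a print-partition-table command, failing early if the tool is absent."""
    path = which(tool)
    if path is None:
        # A clear error here replaces the opaque AttributeError raised later.
        raise FileNotFoundError(f"{tool} not found on PATH")
    return [path, "-p", device]


# Simulate the tool being absent: the lookup result (None) leads the list.
try:
    encode_args([None, "-p", "/dev/sda"])
except AttributeError as err:
    print(err)
```

Checking the lookup result before building the command turns the failure into a message that names the missing tool.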

Comment 4 Huijuan Zhao 2021-12-20 04:24:28 UTC
Neal, thanks for the explanation and updates.
From the QE side, it is OK to add openssl and gdisk as dependencies at the cloud-init level, thanks!

Comment 18 errata-xmlrpc 2022-05-17 12:26:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: cloud-init), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2308