Bug 1460206 - [WALA] [cloud-init] cloud-init can't find Azure endpoint
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: cloud-init
Version: 7.4
Hardware: x86_64 Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Ryan McCabe
QA Contact: Vratislav Hutsky
Keywords: Regression, Triaged
Depends On:
Blocks:
Reported: 2017-06-09 07:18 EDT by Yuhui Jiang
Modified: 2018-05-15 22:34 EDT
CC List: 13 users

See Also:
Fixed In Version: cloud-init-0.7.9-9.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 19:23:42 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Yuhui Jiang 2017-06-09 07:18:11 EDT
Description of problem: 
cloud-init can't find the Azure endpoint server during provisioning on the Azure Platform, which causes cloud-init to use "DataSourceNone" as its datasource (the fallback value).
cloud-init looks for the Azure endpoint server in the "/run/cloud-init/dhclient.hooks" directory. This directory and its files are generated through "/etc/NetworkManager/dispatcher.d/cloud-init-azure-hook".
That file is a hook script that NetworkManager invokes during boot. However, the hook script first checks whether a marker file named "enabled" exists under /run/cloud-init.
If the marker file doesn't exist, the hook script does not generate the "/run/cloud-init/dhclient.hooks" directory or its files. The marker file is created by the systemd generator "/usr/lib/systemd/system-generators/cloud-init-generator".
Starting with 0.7.9-5, the generator was removed, so the marker file never exists, which ultimately causes this issue.
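A systemd generator is an executable that systemd runs very early in boot, passing it three output directories (normal, early, late). The sketch below is a hypothetical, heavily simplified illustration of the one step that matters for this bug, creating the marker file. It is not the shipped cloud-init-generator, which also decides whether cloud-init should be enabled at all; `write_marker` and the `RUN_DIR` override are assumptions added so the sketch can be tried outside /run.

```sh
#!/bin/sh
# Hypothetical, heavily simplified sketch: NOT the shipped cloud-init-generator.
# systemd would call a generator with three output directories (normal, early,
# late); this sketch ignores them and only creates the marker the Azure hook
# checks for.

# RUN_DIR is a hypothetical override so the sketch can run outside /run.
RUN_DIR="${RUN_DIR:-/run/cloud-init}"

write_marker() {
    mkdir -p "$RUN_DIR" || return 1
    # An empty file is enough: the hook only tests for existence ([ -e ... ]).
    : > "$RUN_DIR/enabled"
}
```

Without something creating this file on every boot (/run is a tmpfs), the hook's `is_enabled` check fails on each boot.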

In short:
"/usr/lib/systemd/system-generators/cloud-init-generator" was removed
       ==> the marker file "/run/cloud-init/enabled" is not generated
           ==> the NetworkManager hook script does not run the dhclient hook
               ==> "/run/cloud-init/dhclient.hooks" is not generated
                   ==> cloud-init can't find the Azure endpoint server
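The chain above can be checked link by link with a small diagnostic. This is a hypothetical helper (`check_azure_hook_chain` and the `ROOT` prefix are assumptions, not part of cloud-init), and it mirrors the chain only as described for the affected builds. The report does not say how 0.7.9-9 fixes the problem, so on a fixed build only the marker and dhclient.hooks results should be taken as meaningful.

```sh
#!/bin/sh
# Hypothetical diagnostic: walks the failure chain one link at a time.
# ROOT is an optional path prefix (empty means the real filesystem) so the
# check can be pointed at a test directory.

check_azure_hook_chain() {
    local root="${ROOT:-}"
    local generator="$root/usr/lib/systemd/system-generators/cloud-init-generator"
    local marker="$root/run/cloud-init/enabled"
    local hooks="$root/run/cloud-init/dhclient.hooks"

    # Each failed link returns a distinct status so scripts can tell them apart.
    [ -e "$generator" ] || { echo "missing generator: $generator"; return 1; }
    [ -e "$marker" ]    || { echo "missing marker: $marker"; return 2; }
    [ -d "$hooks" ]     || { echo "missing hook output: $hooks"; return 3; }
    echo "ok"
}
```

On an affected VM, running `check_azure_hook_chain` as-is would be expected to stop at the first check and return 1.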

The content of "/etc/NetworkManager/dispatcher.d/cloud-init-azure-hook" is below. The key point is the "[ -e /run/cloud-init/enabled ] || return 1" check in is_enabled():
#!/bin/sh
# This file is part of cloud-init. See LICENSE file for license information.

# This script hooks into NetworkManager(8) via its scripts
# arguments are 'interface-name' and 'action'
#
is_azure() {
    local dmi_path="/sys/class/dmi/id/board_vendor" vendor=""
    if [ -e "$dmi_path" ] && read vendor < "$dmi_path"; then
        [ "$vendor" = "Microsoft Corporation" ] && return 0
    fi
    return 1
}

is_enabled() {
    # only execute hooks if cloud-init is enabled and on azure
    [ -e /run/cloud-init/enabled ] || return 1
    is_azure
}


if is_enabled; then
    case "$1:$2" in
        *:up) exec cloud-init dhclient-hook up "$1";;
        *:down) exec cloud-init dhclient-hook down "$1";;
    esac
fi
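The hook's behavior can be tried without involving NetworkManager. NetworkManager calls dispatcher scripts with the interface name as `$1` and the action as `$2`. The sketch below reproduces the script's checks but prints the command instead of running `exec`. `would_run`, `MARKER`, and `VENDOR_FILE` are hypothetical names introduced for this dry run.

```sh
#!/bin/sh
# Dry-run variant of the dispatch logic in cloud-init-azure-hook.
# Prints the command the hook would exec instead of running it.
# MARKER and VENDOR_FILE are hypothetical overrides for testing.

MARKER="${MARKER:-/run/cloud-init/enabled}"
VENDOR_FILE="${VENDOR_FILE:-/sys/class/dmi/id/board_vendor}"

would_run() {
    local vendor=""
    # Same gate as is_enabled(): the marker file must exist...
    [ -e "$MARKER" ] || return 1
    # ...and the DMI board vendor must be Microsoft (is_azure()).
    if [ -e "$VENDOR_FILE" ] && read vendor < "$VENDOR_FILE"; then
        [ "$vendor" = "Microsoft Corporation" ] || return 1
    else
        return 1
    fi
    # $1 = interface name, $2 = action, matched exactly as the hook does.
    case "$1:$2" in
        *:up)   echo "cloud-init dhclient-hook up $1" ;;
        *:down) echo "cloud-init dhclient-hook down $1" ;;
        *)      return 1 ;;
    esac
}
```

With `MARKER` pointing at a missing file, `would_run eth0 up` prints nothing and returns 1, which is the situation on the affected builds.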


### Why was the generator removed?
See: https://bugzilla.redhat.com/show_bug.cgi?id=1440831


Version-Release number of selected component (if applicable): 
cloud-init-0.7.9-5.el7.x86_64.rpm and later (the latest build, 0.7.9-8, is also affected)

RHEL Version: 
RHEL-7.4

How reproducible: 
100%

Steps to Reproduce: 
1. Prepare a running VM in Azure and install cloud-init-0.7.9-8.el7.x86_64.rpm.
2. Add a new user that authenticates with a key pair (not a password) and has sudo privileges.
    (This ensures you can still log in to the VM even if provisioning fails.)
3. systemctl enable cloud-{init,init-local,config,final}
4. Edit /etc/waagent.conf and set Provisioning.Enabled=n and Provisioning.UseCloudInit=y.
5. Deprovision this VM using WALA, then use it as a template to create a new VM.
6. After provisioning finishes, log in to the VM.
7. Check whether cloud-init provisioned the VM successfully.
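Steps 3 and 4 can be scripted. This is a minimal sketch, assuming GNU sed (as shipped on RHEL) and waagent.conf's plain `Key=value` line format; `set_waagent_option` and `prepare_vm_for_cloud_init` are hypothetical helpers, not part of WALA.

```sh
#!/bin/sh
# Hypothetical helpers for steps 3 and 4 of the reproducer.

set_waagent_option() {
    # $1 = file, $2 = key (e.g. Provisioning.Enabled), $3 = new value
    local file="$1" key="$2" value="$3" pattern
    # Escape the dots in the key so grep/sed match them literally.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^$pattern=" "$file"; then
        # Rewrite the existing uncommented line in place (GNU sed -i).
        sed -i "s/^$pattern=.*/$key=$value/" "$file"
    else
        # Key not present: append it.
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

prepare_vm_for_cloud_init() {
    # Step 3: enable the four cloud-init units. Brace expansion is a bash
    # feature, so the unit names are spelled out for POSIX sh.
    systemctl enable cloud-init cloud-init-local cloud-config cloud-final || return 1
    # Step 4: hand provisioning over from WALA to cloud-init.
    set_waagent_option /etc/waagent.conf Provisioning.Enabled n || return 1
    set_waagent_option /etc/waagent.conf Provisioning.UseCloudInit y
}
```

Run `prepare_vm_for_cloud_init` as root before step 5. Appending a missing key instead of failing is a design choice so the helper also works when a key is absent from the file.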

Actual results: 
cloud-init does not provision successfully (it can't find the Azure endpoint server and falls back to "DataSourceNone").

......
2017-06-09 11:09:38,303 - azure.py[INFO]: Registering with Azure...
2017-06-09 11:09:38,303 - azure.py[DEBUG]: Finding Azure endpoint...
2017-06-09 11:09:38,304 - util.py[DEBUG]: Reading from /etc/cloud/cloud.cfg (quiet=False)
2017-06-09 11:09:38,304 - util.py[DEBUG]: Read 1150 bytes from /etc/cloud/cloud.cfg
2017-06-09 11:09:38,304 - util.py[DEBUG]: Attempting to load yaml from string of length 1150 with allowed root types (<type 'dict'>,)
2017-06-09 11:09:38,325 - util.py[DEBUG]: Reading from /etc/cloud/cloud.cfg.d/05_logging.cfg (quiet=False)
2017-06-09 11:09:38,325 - util.py[DEBUG]: Read 1821 bytes from /etc/cloud/cloud.cfg.d/05_logging.cfg
2017-06-09 11:09:38,325 - util.py[DEBUG]: Attempting to load yaml from string of length 1821 with allowed root types (<type 'dict'>,)
2017-06-09 11:09:38,334 - util.py[DEBUG]: Attempting to load yaml from string of length 0 with allowed root types (<type 'dict'>,)
2017-06-09 11:09:38,334 - util.py[DEBUG]: load_yaml given empty string, returning default
2017-06-09 11:09:38,335 - azure.py[DEBUG]: /run/cloud-init/dhclient.hooks not found.
2017-06-09 11:09:38,335 - azure.py[DEBUG]: Unable to find endpoint in dhclient logs.  Falling back to check lease files
2017-06-09 11:09:38,336 - azure.py[DEBUG]: Looking for endpoint in lease file /var/lib/dhcp/dhclient.eth0.leases
2017-06-09 11:09:38,336 - util.py[DEBUG]: Reading from /var/lib/dhcp/dhclient.eth0.leases (quiet=False)
......
2017-06-09 18:05:38,819 - DataSourceAzure.py[INFO]: Error communicating with Azure fabric; assume we aren't on Azure.
......
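The failure can be recognized in the log by the two messages quoted above. This is a minimal sketch; `classify_cloud_init_log` is a hypothetical helper, and it matches only these exact message strings, which may be worded differently in other cloud-init versions.

```sh
#!/bin/sh
# Hypothetical helper: looks for the failure signature from this report.

classify_cloud_init_log() {
    local log="${1:-/var/log/cloud-init.log}"
    [ -r "$log" ] || { echo "unreadable: $log"; return 2; }
    # Both messages must be present: the missing hook output, and the
    # datasource giving up on Azure.
    if grep -qF 'dhclient.hooks not found' "$log" &&
       grep -qF "assume we aren't on Azure" "$log"; then
        echo "affected"
        return 1
    fi
    echo "no failure signature"
}
```

With no argument it reads /var/log/cloud-init.log; it prints "affected" and returns 1 when both messages are present.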


Expected results: 
cloud-init provisions the VM successfully.

Additional info:
Comment 5 Yuhui Jiang 2017-06-19 04:32:41 EDT
I previously set the wrong status; I apologize for that. I have rolled the status back.

I used the new scratch build (https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13443396) to verify on RHEL-7.4-20170616.3.

Below are my steps:

1. Install cloud-init-0.7.9-9.el7.x86_64.rpm on RHEL 7.4 on the Azure Platform,
   and install WALA-2.2.12 on this VM.
2. Add a new user (authenticating with a key pair, with sudo privileges).
3. systemctl enable cloud-{init,init-local,config,final}
4. Edit /etc/waagent.conf and set Provisioning.Enabled=n and Provisioning.UseCloudInit=y.
5. Deprovision this VM using WALA, then use it as a template to create a new VM.
6. After provisioning finishes, log in to the VM.
7. Check /var/log/cloud-init.log.

And now cloud-init successfully do provision process.
Thanks!
Comment 6 Vratislav Hutsky 2017-06-19 08:56:19 EDT
I'm not entirely sure about this sentence in the previous comment:

"And now cloud-init successfully do provision process."

Does this mean that you were able to verify the issue?

Thank you for the clarification.
Comment 7 Yuhui Jiang 2017-06-19 22:44:21 EDT
Hi Vratislav,
Yes, I have verified it. The new scratch build (0.7.9-9) resolves the issue.
Sorry that my previous comment was confusing.
Thanks!
Comment 9 Yuhui Jiang 2017-06-27 02:52:31 EDT
Verified that this bug passes on RHEL-7.4-20170621.0 with cloud-init-0.7.9-9.el7.x86_64.rpm on the Azure Platform.

cloud-init successfully finds the endpoint on the Azure Platform.
Comment 10 errata-xmlrpc 2017-08-01 19:23:42 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2275
