Bug 1734100 - [upi-vmware] bootstrap vm hangs at first boot
Summary: [upi-vmware] bootstrap vm hangs at first boot
Keywords:
Status: CLOSED EOL
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Kathryn Alexander
QA Contact: liujia
Docs Contact: Vikram Goyal
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-07-29 15:44 UTC by Mario Abajo
Modified: 2020-05-18 06:56 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-18 06:56:25 UTC
Target Upstream Version:
jiajliu: needinfo-



Description Mario Abajo 2019-07-29 15:44:20 UTC
Description of problem:

The bootstrap vm hangs in the boot process as if no ignition data were supplied.
Even supplying "guestinfo.ignition.config.data" and "guestinfo.ignition.config.data.encoding" in the configuration parameters of the VM doesn't make any difference.


Version-Release number of selected component (if applicable): 
[root@bootstrap ~]# cat /etc/os-release 
NAME="Red Hat Enterprise Linux CoreOS"
VERSION="410.8.20190520.0"
VERSION_ID="4.1"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 410.8.20190520.0 (Ootpa)"
ID="rhcos"
ID_LIKE="rhel fedora"
ANSI_COLOR="0;31"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.1"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.1"
OSTREE_VERSION=410.8.20190520.0


How reproducible:

While trying to reproduce the issue in a lab, I found a way to trigger the error.
After some tests, this is what I found:
(first case, result OK)
- Deploy manually the rhcos-4.1.0-x86_64-vmware.ova file
- Supply "ignition config data" and "ignition config data encode" as requested
- Adjust hardware specs
- power on
- bootstrap vm deploys correctly (good)
- looking at the configuration parameters there is nothing similar to "guestinfo.ignition.config.data" defined

(second case, result FAIL)
- Deploy manually the rhcos-4.1.0-x86_64-vmware.ova file
- Avoid supplying "ignition config data" and "ignition config data encode" when requested.
- Adjust hardware specs
- Add "ignition config data" and "ignition config data encode" as configuration parameters as mentioned in the documentation chapter 4.1.9, point 7.vii of the procedure and in the CoreOS doc (https://coreos.com/os/docs/latest/booting-on-vmware.html)
- power on
- VM hangs indefinitely
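
For reference, the two configuration parameters added in the second case can be prepared from an Ignition file roughly as follows. This is a minimal sketch in Python, not something taken from the documentation: the function name `ignition_config_params` and the inline sample config are assumptions made for illustration only.

```python
import base64


def ignition_config_params(ignition_bytes):
    """Return the two guestinfo parameters for a raw Ignition config.

    Hypothetical helper: it only base64-encodes the bytes it is given and
    does not validate that they are a well-formed Ignition config.
    """
    encoded = base64.b64encode(ignition_bytes).decode("ascii")
    return {
        "guestinfo.ignition.config.data": encoded,
        "guestinfo.ignition.config.data.encoding": "base64",
    }


# Inline stand-in for the contents of an Ignition file (assumption).
params = ignition_config_params(b'{"ignition": {"version": "2.1.0"}}')
for name, value in params.items():
    print(name, "=", value)
```

The printed values are what would be pasted into the VM's configuration parameters; whether the guest then picks them up is exactly what this report questions.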

Looking inside the correctly deployed bootstrap VM, I managed to find the Ignition configuration data this way:

[root@bootstrap ~]# vmtoolsd --cmd 'info-get guestinfo.ovfEnv'
<Environment oe:id="" xmlns="http://schemas.dmtf.org/ovf/environment/1" xmlns:oe="http://schemas.dmtf.org/ovf/environment/1" xmlns:ve="http://www.vmware.com/schema/ovfenv" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><PropertySection><Property oe:key="guestinfo.ignition.config.data" oe:value="ewogImlnbml0aW9uIjogewogICAiY29uZmlnIjogewogICAgICJhcHBlbmQiOiBbCiAgICAgICB7CiAgICAgICAgICJzb3VyY2UiOiAiaHR0cDovLzE5Mi4xNjguMi4xOjgwODAvYm9vdHN0cmFwLmlnbiIsICAKICAgICAgICAgInZlcmlmaWNhdGlvbiI6IHt9CiAgICAgICB9CiAgICAgXQogICB9LAogICAidGltZW91dHMiOiB7fSwKICAgInZlcnNpb24iOiAiMi4xLjAiCiB9LAogIm5ldHdvcmtkIjoge30sCiAicGFzc3dkIjoge30sCiAic3RvcmFnZSI6IHt9LAogInN5c3RlbWQiOiB7fQp9Cg=="/><Property oe:key="guestinfo.ignition.config.data.encoding" oe:value="base64"/></PropertySection></Environment>
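
The property values in that output can be inspected offline. The following is a minimal sketch in Python, assuming the XML printed by vmtoolsd has been captured into a string; the function name `ignition_from_ovf_env` and the small sample config are hypothetical and not part of any tool mentioned here.

```python
import base64
import json
import xml.etree.ElementTree as ET

# Namespace behind the oe:key / oe:value attributes in the output above.
OVF_ENV_NS = "http://schemas.dmtf.org/ovf/environment/1"


def ignition_from_ovf_env(ovf_env_xml):
    """Return the Ignition config (as a dict) stored in an OVF environment.

    Hypothetical helper: raises KeyError if the config property is absent.
    """
    root = ET.fromstring(ovf_env_xml)
    props = {}
    for prop in root.iter("{%s}Property" % OVF_ENV_NS):
        key = prop.get("{%s}key" % OVF_ENV_NS)
        props[key] = prop.get("{%s}value" % OVF_ENV_NS, "")

    data = props["guestinfo.ignition.config.data"]
    encoding = props.get("guestinfo.ignition.config.data.encoding", "")
    if encoding == "base64":
        data = base64.b64decode(data).decode("utf-8")
    return json.loads(data)


# Build a small sample document (assumption) instead of the long value above.
sample_b64 = base64.b64encode(b'{"ignition": {"version": "2.1.0"}}').decode("ascii")
sample_xml = (
    '<Environment xmlns="%s" xmlns:oe="%s"><PropertySection>'
    '<Property oe:key="guestinfo.ignition.config.data" oe:value="%s"/>'
    '<Property oe:key="guestinfo.ignition.config.data.encoding" oe:value="base64"/>'
    "</PropertySection></Environment>" % (OVF_ENV_NS, OVF_ENV_NS, sample_b64)
)
print(ignition_from_ovf_env(sample_xml))
```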


(third case, result OK)
- Deploy manually the rhcos-4.1.0-x86_64-vmware.ova file
- Avoid supplying "ignition config data" and "ignition config data encode" when requested.
- Adjust hardware specs
- Add the additional configuration parameter "guestinfo.ovfEnv" via Ansible (I couldn't do it through the interface) with this data:

<Environment oe:id="" xmlns="http://schemas.dmtf.org/ovf/environment/1" xmlns:oe="http://schemas.dmtf.org/ovf/environment/1" xmlns:ve="http://www.vmware.com/schema/ovfenv" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><PropertySection><Property oe:key="guestinfo.ignition.config.data" oe:value="ewogImlnbml0aW9uIjogewogICAiY29uZmlnIjogewogICAgICJhcHBlbmQiOiBbCiAgICAgICB7CiAgICAgICAgICJzb3VyY2UiOiAiaHR0cDovLzE5Mi4xNjguMi4xOjgwODAvYm9vdHN0cmFwLmlnbiIsICAKICAgICAgICAgInZlcmlmaWNhdGlvbiI6IHt9CiAgICAgICB9CiAgICAgXQogICB9LAogICAidGltZW91dHMiOiB7fSwKICAgInZlcnNpb24iOiAiMi4xLjAiCiB9LAogIm5ldHdvcmtkIjoge30sCiAicGFzc3dkIjoge30sCiAic3RvcmFnZSI6IHt9LAogInN5c3RlbWQiOiB7fQp9Cg=="/><Property oe:key="guestinfo.ignition.config.data.encoding" oe:value="base64"/></PropertySection></Environment>

- power on
- bootstrap vm deploys correctly (good)
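
A value like the one used for "guestinfo.ovfEnv" in the third case could be generated along these lines. This is only a sketch in Python under stated assumptions: the function name `build_ovf_env` is hypothetical, and the document it emits omits the extra namespace declarations (ve, xml, xsi) present in the original; whether the guest accepts this reduced form was not tested in this report.

```python
import base64
from xml.sax.saxutils import quoteattr

OVF_ENV_NS = "http://schemas.dmtf.org/ovf/environment/1"


def build_ovf_env(ignition_json):
    """Wrap Ignition config text in a minimal OVF environment document."""
    encoded = base64.b64encode(ignition_json.encode("utf-8")).decode("ascii")
    properties = [
        ("guestinfo.ignition.config.data", encoded),
        ("guestinfo.ignition.config.data.encoding", "base64"),
    ]
    # quoteattr adds the surrounding quotes and escapes the attribute values.
    property_xml = "".join(
        "<Property oe:key=%s oe:value=%s/>" % (quoteattr(k), quoteattr(v))
        for k, v in properties
    )
    return (
        '<Environment oe:id="" xmlns="%s" xmlns:oe="%s">'
        "<PropertySection>%s</PropertySection></Environment>"
        % (OVF_ENV_NS, OVF_ENV_NS, property_xml)
    )


# Inline stand-in for an Ignition config (assumption).
print(build_ovf_env('{"ignition": {"version": "2.1.0"}}'))
```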


Expected results:

As indicated in the documentation, the bootstrap vm should boot by defining the configuration parameters "guestinfo.ignition.config.data" and "guestinfo.ignition.config.data.encoding".

Additional info:

Comment 1 Steve Milner 2019-07-29 16:42:24 UTC
Mario,

Thanks for the report. Has this been brought up with the UPI folks already?

Comment 2 Mario Abajo 2019-07-30 07:32:14 UTC
(In reply to Steve Milner from comment #1)
> Mario,
> 
> Thanks for the report. Has this been brought up with the UPI folks already?

Hello Steve,

No, not yet, but if you can give me someone to contact I would appreciate it. Thanks

Comment 21 Kathryn Alexander 2019-09-18 16:30:34 UTC
Hi Liu, will you please confirm the PR that Davis opened?

Comment 22 liujia 2019-09-19 07:51:36 UTC
Following method 2 added in pr16738, I added the compute-1 node to an existing cluster successfully.

# oc get node
NAME              STATUS   ROLES    AGE     VERSION
compute-0         Ready    worker   3h45m   v1.14.6+194d29900
compute-1         Ready    worker   5m23s   v1.14.6+194d29900
control-plane-0   Ready    master   3h45m   v1.14.6+194d29900

Comment 23 Kathryn Alexander 2019-09-20 15:10:37 UTC
Thank you! I've merged the change and am waiting for it to go live.

Comment 25 Mario Abajo 2019-11-20 13:42:06 UTC
Hi, I have to reopen this case, as the issue still persists in OCP 4.2 with vSphere 6.5u3.
See case 02522835.
Adding the "guestinfo.ovfEnv" parameter solved the issue.

Comment 26 Kathryn Alexander 2019-11-22 16:40:47 UTC
Davis, do you agree that adding the "guestinfo.ovfEnv" parameter should be required?

Comment 27 davis phillips 2019-11-25 15:40:39 UTC
It's pretty strange, because I've not been able to replicate this issue.

There was another github issue:
https://github.com/openshift/installer/issues/2537

This is from a configuration deployed via the UPI process highlighted in the blog post I wrote earlier this year.

vmtoolsd --cmd 'info-get guestinfo.ovfEnv'

   <PropertySection>
         <Property oe:key="guestinfo.ignition.config.data" oe:value="eyJpZ25pdGlvbiI6eyJjb25maWciOnsiYXBwZW5kIjpbeyJzb3VyY2UiOiJkYXRhOnRleHQvcGxhaW47Y2hhcnNldD11dGYtODtiYXNlNjQsZXlKcFoyNXBkR2x2YmlJNmV5SmpiMjVtYVdjaU9uc2lZWEJ3Wlc1a0lqcGJleUp6YjNWeVkyVWlPaUpvZEhSd2N6b3ZMMkZ3YVMxcGJuUXVkWEJwTG1VeVpTNWliM011Y21Wa2FHRjBMbU52YlRveU1qWXlNeTlqYjI1bWFXY3ZiV0Z6ZEdWeUlpd2lkbVZ5YVdacFkyRjBhVzl1SWpwN2ZYMWRmU3dpYzJWamRYSnBkSGtpT

..omitted..

So, the "guestinfo.ovfEnv" should be created from the vApp properties applied via each deployed template. Would it make more sense to add some information to a troubleshooting section?

Comment 28 Mario Abajo 2019-11-26 09:32:49 UTC
My two cents on this: to my knowledge, what Ignition does to load the Ignition data [1] is read the "ovfenv" parameter, not "ignition.config.data" as stated in the doc. So we rely on some vSphere component to fill "ovfenv" with the "ignition.config.data" and "ignition.config.data.encoding" parameters, but for some reason unknown (at least to me) this doesn't always happen. I think it would be better not to depend on a third party for this process, so we should read the same parameters that we fill.

[1] https://github.com/coreos/ignition/blob/befbc8677cc44b8ec089cfc7c5bfe015cfed88cd/internal/providers/vmware/vmware_amd64.go#L65
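
To make the proposal concrete, here is a rough sketch in Python of a lookup that reads the directly filled parameters first and only then falls back to the OVF environment. It is not Ignition's code and does not claim to mirror it: the function name `resolve_ignition_source`, the plain dict standing in for the VM's configuration parameters, and the lookup order are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

OVF_ENV_NS = "http://schemas.dmtf.org/ovf/environment/1"
DATA_KEY = "guestinfo.ignition.config.data"
ENCODING_KEY = "guestinfo.ignition.config.data.encoding"


def resolve_ignition_source(guestinfo):
    """Pick Ignition data and encoding from a dict of guestinfo values.

    `guestinfo` is a hypothetical stand-in for the VM's configuration
    parameters. Returns (data, encoding, origin); origin is "direct",
    "ovfEnv", or "none".
    """
    # Prefer the parameters the documentation tells the user to fill in.
    if guestinfo.get(DATA_KEY):
        return guestinfo[DATA_KEY], guestinfo.get(ENCODING_KEY, ""), "direct"

    # Otherwise fall back to properties embedded in guestinfo.ovfEnv.
    ovf_env = guestinfo.get("guestinfo.ovfEnv", "")
    if ovf_env:
        root = ET.fromstring(ovf_env)
        props = {
            p.get("{%s}key" % OVF_ENV_NS): p.get("{%s}value" % OVF_ENV_NS, "")
            for p in root.iter("{%s}Property" % OVF_ENV_NS)
        }
        if props.get(DATA_KEY):
            return props[DATA_KEY], props.get(ENCODING_KEY, ""), "ovfEnv"

    return None, None, "none"
```

With a lookup of this shape, a VM configured as in the second case of the description would resolve to the "direct" branch even if nothing populated the OVF environment.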

Comment 29 Kathryn Alexander 2019-12-04 16:37:12 UTC
Jia, does Mario's comment 28 align with your test, and do we need to pass guestinfo.ovfEnv?

Davis, we have not created the installation troubleshooting section yet, but if this is not currently required, I can file it as a good addition to the future troubleshooting docs.

Comment 30 liujia 2019-12-05 09:22:44 UTC
I did not hit the issue during past v4.1/v4.2/v4.3 UPI/vSphere tests. I tried launching the bootstrap node again, following https://docs.openshift.com/container-platform/4.2/installing/installing_vsphere/installing-vsphere.html step by step. It still works well both ways.

1. Generate the install-config/manifests/Ignition files (base64 encoded).
2. Upload the Ignition file to S3 and prepare the RHCOS template (rhcos-42.80.20191002.0) on vSphere.
3. Clone a VM for the bootstrap node using the 1st way in step 7 of the above doc (https://docs.openshift.com/container-platform/4.2/installing/installing_vsphere/installing-vsphere.html#installation-vsphere-machines_installing-vsphere)
'''
On the Customize hardware tab, click VM Options → Advanced.

From the Latency Sensitivity list, select High.

Click Edit Configuration, and on the Configuration Parameters window, click Add Configuration Params. Define the following parameter names and values:

guestinfo.ignition.config.data: Paste the contents of the base64-encoded Ignition config file for this machine type.

guestinfo.ignition.config.data.encoding: Specify base64.

disk.EnableUUID: Specify TRUE.
'''
After the above steps (with the VM still powered off), I checked the [Configure]-[Settings]-[vApp Options] page:
- The "OVF Settings" section is disabled with "The OVF environment is only available when the vm is powered on".

After powering on the VM, the bootstrap node starts successfully, and the OVF environment can now be checked from the above page.
<PropertySection>
         <Property oe:key="guestinfo.ignition.config.data" oe:value=""/>
         <Property oe:key="guestinfo.ignition.config.data.encoding" oe:value=""/>
</PropertySection>

And SSH into the bootstrap node succeeds.

4. Power off the above bootstrap VM.
5. Clone a VM for the bootstrap node using the 2nd way in step 7 of the above doc (https://docs.openshift.com/container-platform/4.2/installing/installing_vsphere/installing-vsphere.html#installation-vsphere-machines_installing-vsphere)
'''
On the Customize hardware tab, click VM Options → Advanced.

From the Latency Sensitivity list, select High.

Alternatively, prior to powering on the virtual machine add via vApp properties:

Navigate to a virtual machine from the vCenter Server inventory.

On the Configure tab, expand Settings and select vApp options.

Scroll down and under Properties apply the configurations from above.

'''
After the above steps, there are 3 items added to the [Configure]-[Settings]-[vApp Options]-[Properties] page, and
- The "OVF Settings" section is disabled with "The OVF environment is only available when the vm is powered on".

After powering on the VM, the bootstrap node starts successfully, and the OVF environment can now be checked from the above page.
<PropertySection>
         <Property oe:key="disk.EnableUUID" oe:value="TRUE"/>
         <Property oe:key="guestinfo.ignition.config.data" oe:value="xxxxxxx"/>
         <Property oe:key="guestinfo.ignition.config.data.encoding" oe:value="base64"/>
</PropertySection>
And SSH into the bootstrap node succeeds.
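
The two OVF environments observed in this comment differ in whether the Ignition properties carry values. A check like the following could make that comparison repeatable. It is a minimal sketch in Python, assuming the input is the full Environment document (with its namespace declarations) rather than the bare PropertySection fragments pasted above; the function name `ignition_property_status` is hypothetical.

```python
import xml.etree.ElementTree as ET

OVF_ENV_NS = "http://schemas.dmtf.org/ovf/environment/1"
IGNITION_KEYS = (
    "guestinfo.ignition.config.data",
    "guestinfo.ignition.config.data.encoding",
)


def ignition_property_status(ovf_env_xml):
    """Map each Ignition property key to 'set', 'empty', or 'missing'."""
    root = ET.fromstring(ovf_env_xml)
    found = {
        p.get("{%s}key" % OVF_ENV_NS): p.get("{%s}value" % OVF_ENV_NS, "")
        for p in root.iter("{%s}Property" % OVF_ENV_NS)
    }
    status = {}
    for key in IGNITION_KEYS:
        if key not in found:
            status[key] = "missing"
        elif found[key] == "":
            status[key] = "empty"
        else:
            status[key] = "set"
    return status
```

Applied to the first environment above, both keys would be reported as "empty"; applied to the second, both would be "set".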

