Bug 1698073 - Sometimes vsphere compute and control-plane nodes come up without open-vm-tools
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Steve Milner
QA Contact: Micah Abbott
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-09 14:42 UTC by Hemant Kumar
Modified: 2023-09-14 05:26 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-07 13:53:19 UTC
Target Upstream Version:
Embargoed:



Description Hemant Kumar 2019-04-09 14:42:29 UTC
It looks like two installs that use the same vSphere template (rhcos-latest) come up with two different versions of RHCOS on the compute and control-plane nodes. The bootstrap node comes up correctly; it is the compute and control-plane nodes that are broken.

Comment 1 davis phillips 2019-04-09 15:13:15 UTC
The machine-os-content for the release points to 410.8.20190329.0.

This particular build didn't include open-vm-tools. 

I have tested "410.8.20190330.0" and "410.8.20190401.0"; both of these subsequent builds include open-vm-tools. The 0329 image seems to be a one-off.

rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
  pivot://registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-04-08-195526@sha256:83e25d1681c42be4e9257e288d79d3d3591301e1d5897caff96cecc38d7878f0
              CustomOrigin: Managed by pivot tool
                   Version: 410.8.20190329.0 (2019-03-29T16:35:54Z)

● pivot://docker-registry-default.cloud.registry.upshift.redhat.com/redhat-coreos/ootpa@sha256:4582398a53ad79e7401be70df54c2ab0afb5543897752def56360b1d4f2d3bd4
              CustomOrigin: Provisioned from oscontainer
                   Version: 410.8.20190326.0 (2019-03-26T15:44:56Z)
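
For comparing nodes across installs, the status output above can be reduced to a list of deployments with their versions. The following is a minimal sketch, not an rpm-ostree API: the function names are hypothetical, and it assumes the plain-text layout shown in this comment (a pivot:// line per deployment, with "●" marking the booted one and a "Version:" line following it).

```python
SAMPLE_STATUS = """\
State: idle
AutomaticUpdates: disabled
Deployments:
  pivot://registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-04-08-195526@sha256:83e25d1681c42be4e9257e288d79d3d3591301e1d5897caff96cecc38d7878f0
              CustomOrigin: Managed by pivot tool
                   Version: 410.8.20190329.0 (2019-03-29T16:35:54Z)

● pivot://docker-registry-default.cloud.registry.upshift.redhat.com/redhat-coreos/ootpa@sha256:4582398a53ad79e7401be70df54c2ab0afb5543897752def56360b1d4f2d3bd4
              CustomOrigin: Provisioned from oscontainer
                   Version: 410.8.20190326.0 (2019-03-26T15:44:56Z)
"""


def parse_deployments(status_text):
    """Return one dict per deployment: pullspec, booted flag, version."""
    deployments = []
    for line in status_text.splitlines():
        stripped = line.strip()
        booted = stripped.startswith("●")  # "●" marks the booted deployment
        candidate = stripped.lstrip("●").strip()
        if candidate.startswith("pivot://"):
            deployments.append(
                {"pullspec": candidate, "booted": booted, "version": None}
            )
        elif stripped.startswith("Version:") and deployments:
            # e.g. "Version: 410.8.20190329.0 (2019-03-29T16:35:54Z)"
            deployments[-1]["version"] = stripped.split()[1]
    return deployments


def booted_version(status_text):
    """Return the version of the booted deployment, or None if none is marked."""
    for dep in parse_deployments(status_text):
        if dep["booted"]:
            return dep["version"]
    return None
```

Running booted_version(SAMPLE_STATUS) on the output above reports the version of the deployment marked with "●"; the other entry is the deployment written by the pivot tool.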

Comment 2 Hemant Kumar 2019-04-09 16:54:29 UTC
@davis - I am still confused about why two deployments (30 minutes apart) come up with different versions of RHCOS on the compute and control-plane nodes. I understand that it picked a slightly older version of the image, but why?

Comment 3 davis phillips 2019-04-09 19:18:56 UTC
The initial boot image is replaced by the machine-os-content supplied by the installer.

This happens after etcd establishes quorum but before kubelet.service starts. The pivot unit that performs the replacement:

[Unit]
Description=Pivot Tool
ConditionPathExists=/etc/pivot/image-pullspec
After=ignition-firstboot-complete.service
Before=kubelet.service

[Service]
# Need oneshot to delay kubelet
Type=oneshot
ExecStart=/usr/bin/pivot

[Install]
WantedBy=multi-user.target

The bootstrap node's bootkube.sh service pulls the release image, which contains a reference to the MCO (machine-config-operator) and also a reference to a newer machine-os-content. The bootkube.sh service runs the MCO in "bootstrap" mode to generate and serve Ignition to the master machines.

https://github.com/openshift/machine-config-operator/blob/9da96326a5ff737869709f5fa2e6c716df4dbaf4/docs/OSUpgrades.md

For a while the machine-os-content was pointing to the 0329 image, which is the only one that didn't include open-vm-tools. The subsequent builds do include it; I've tested and verified as much.
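
To check a node's build against what was found here, the version strings can be split and compared. This is a rough sketch under an assumption: the identifiers look like stream.YYYYMMDD.revision (the third field matches the date in each rpm-ostree Version line in this bug), but that format is inferred from this thread, not from a documented scheme. The names parse_build, lacks_open_vm_tools, and the known-bad set are hypothetical.

```python
from datetime import date, datetime

# Builds reported in this bug as missing open-vm-tools (only one so far).
KNOWN_MISSING_OPEN_VM_TOOLS = {"410.8.20190329.0"}


def parse_build(version):
    """Split e.g. '410.8.20190329.0' into (stream, build date, revision).

    Assumes four dot-separated fields with a YYYYMMDD third field.
    """
    major, minor, stamp, revision = version.split(".")
    built = datetime.strptime(stamp, "%Y%m%d").date()
    return (f"{major}.{minor}", built, int(revision))


def lacks_open_vm_tools(version):
    """True only for builds this thread identified as missing the package."""
    return version in KNOWN_MISSING_OPEN_VM_TOOLS


if __name__ == "__main__":
    for v in ["410.8.20190326.0", "410.8.20190329.0", "410.8.20190330.0"]:
        print(v, parse_build(v)[1].isoformat(), lacks_open_vm_tools(v))
```

The lookup is a plain allow/deny set on purpose: nothing in this bug suggests the problem follows from the date, only that the single 0329 build was affected.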

Comment 4 davis phillips 2019-04-09 21:42:58 UTC
I'm going to test now, but I believe the image has been updated.

[dphillip@control-plane-1 ~]$ skopeo inspect --authfile=~/.docker/config.json docker://registry.svc.ci.openshift.org/rhcos/machine-os-content:latest
{
    "Name": "registry.svc.ci.openshift.org/rhcos/machine-os-content",
    "Tag": "latest",
    "Digest": "sha256:ca664d88674d930afd6d727d0d6242668fe3e274f077abbe9d7375854b1bf788",
    "RepoTags": [
        "latest"
    ],
    "Created": "2019-04-08T22:07:35.285453782Z",
    "DockerVersion": "",
    "Labels": {
        "com.coreos.ostree-commit": "9df99f7dd9e11ba06ef83b006cfe256b4b9ab1e2acc30f47f5a6a4979f6d5f20",
        "version": "410.8.20190408.1"
    },
    "Architecture": "amd64",
    "Os": "linux",
    "Layers": [
        "sha256:4dd39f488cc2725f4711e0a0d66a986fec0aa282bc899633491955824f0551e3"
    ]
}
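
The fields that matter for this bug (digest, version label, ostree commit) can be pulled out of the inspect output with a few lines. A minimal sketch, assuming the JSON layout shown above; summarize_inspect is a hypothetical helper, and the sample is a trimmed copy of the output in this comment.

```python
import json

SAMPLE_INSPECT = """
{
    "Name": "registry.svc.ci.openshift.org/rhcos/machine-os-content",
    "Tag": "latest",
    "Digest": "sha256:ca664d88674d930afd6d727d0d6242668fe3e274f077abbe9d7375854b1bf788",
    "Created": "2019-04-08T22:07:35.285453782Z",
    "Labels": {
        "com.coreos.ostree-commit": "9df99f7dd9e11ba06ef83b006cfe256b4b9ab1e2acc30f47f5a6a4979f6d5f20",
        "version": "410.8.20190408.1"
    }
}
"""


def summarize_inspect(inspect_json):
    """Extract digest, version label, and ostree commit from inspect JSON."""
    info = json.loads(inspect_json)
    labels = info.get("Labels") or {}  # tolerate a missing or null Labels field
    return {
        "digest": info.get("Digest"),
        "version": labels.get("version"),
        "ostree_commit": labels.get("com.coreos.ostree-commit"),
        "created": info.get("Created"),
    }


if __name__ == "__main__":
    print(summarize_inspect(SAMPLE_INSPECT))
```

The version label gives the RHCOS build that nodes will pivot to, so it can be checked before starting an install.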

Comment 5 davis phillips 2019-04-09 22:07:14 UTC
It looks like we may be good to go. 

[core@control-plane-0 ~]$ rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-04-09-210205@sha256:ca664d88674d930afd6d727d0d6242668fe3e274f077abbe9d7375854b1bf788
              CustomOrigin: Managed by pivot tool
                   Version: 410.8.20190408.1 (2019-04-08T21:44:07Z)

  pivot://docker-registry-default.cloud.registry.upshift.redhat.com/redhat-coreos/ootpa@sha256:4582398a53ad79e7401be70df54c2ab0afb5543897752def56360b1d4f2d3bd4
              CustomOrigin: Provisioned from oscontainer
                   Version: 410.8.20190326.0 (2019-03-26T15:44:56Z)
[core@control-plane-0 ~]$ systemctl status vmtoolsd
● vmtoolsd.service - Service for virtual machines hosted on VMware
   Loaded: loaded (/usr/lib/systemd/system/vmtoolsd.service; disabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-04-09 22:03:58 UTC; 2min 18s ago
     Docs: http://github.com/vmware/open-vm-tools
 Main PID: 887 (vmtoolsd)
    Tasks: 1 (limit: 26213)
   Memory: 4.2M
      CPU: 52ms
   CGroup: /system.slice/vmtoolsd.service
           └─887 /usr/bin/vmtoolsd
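
One way to confirm that the node pivoted to the image inspected in comment 4 is to compare digests: the booted pullspec above ends in the same sha256 that skopeo reported. A small sketch of that check, with hypothetical helper names and the values copied from this bug:

```python
# Digest reported by skopeo inspect in comment 4.
EXPECTED_DIGEST = (
    "sha256:ca664d88674d930afd6d727d0d6242668fe3e274f077abbe9d7375854b1bf788"
)

# Booted pullspec from the rpm-ostree status output in this comment.
BOOTED_PULLSPEC = (
    "pivot://registry.svc.ci.openshift.org/openshift/"
    "origin-v4.0-2019-04-09-210205@sha256:"
    "ca664d88674d930afd6d727d0d6242668fe3e274f077abbe9d7375854b1bf788"
)


def pullspec_digest(pullspec):
    """Return the 'sha256:...' part after '@', or None if not pinned by digest."""
    _, sep, digest = pullspec.rpartition("@")
    if sep and digest.startswith("sha256:"):
        return digest
    return None


def pivoted_to(pullspec, expected_digest):
    """True when the pullspec is pinned to exactly the expected digest."""
    return pullspec_digest(pullspec) == expected_digest


if __name__ == "__main__":
    print(pivoted_to(BOOTED_PULLSPEC, EXPECTED_DIGEST))
```

A digest comparison is used rather than a tag or repository name because the registry paths differ between the two outputs while the content digest is the same.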

Comment 6 Steve Milner 2019-04-10 13:13:08 UTC
> It looks like we may be good to go. 

Agreed. With the latest images/payload you should be good. Will wait to hear Hemant's findings before resolving this bug.

Comment 7 Steve Milner 2019-05-06 15:57:54 UTC
Hemant,

Is this still an issue?

Comment 8 Red Hat Bugzilla 2023-09-14 05:26:41 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

