Description of problem:

After a successful installation of OpenShift 3.7 into a basic HA cluster hosted on vSphere 6.0, we attempted to enable persistent storage per the documentation at https://docs.openshift.com/container-platform/3.7/install_config/configuring_vsphere.html. After placing the vsphere.conf file on all the nodes and adding the appropriate items to master-config.yaml and node-config.yaml, restarting the atomic-openshift-node service resulted in a failure.

Version-Release number of selected component (if applicable):

# yum list installed atomic*
Loaded plugins: langpacks, product-id, protectbase, search-disabled-repos
0 packages excluded due to repository protections
Installed Packages
atomic-openshift.x86_64                    3.7.23-1.git.5.83efd71.el7    @rhel-7-server-ose-3.7-rpms
atomic-openshift-clients.x86_64            3.7.23-1.git.5.83efd71.el7    @rhel-7-server-ose-3.7-rpms
atomic-openshift-docker-excluder.noarch    3.7.23-1.git.5.83efd71.el7    @rhel-7-server-ose-3.7-rpms
atomic-openshift-excluder.noarch           3.7.23-1.git.5.83efd71.el7    @rhel-7-server-ose-3.7-rpms
atomic-openshift-master.x86_64             3.7.23-1.git.5.83efd71.el7    @rhel-7-server-ose-3.7-rpms
atomic-openshift-node.x86_64               3.7.23-1.git.5.83efd71.el7    @rhel-7-server-ose-3.7-rpms
atomic-openshift-sdn-ovs.x86_64            3.7.23-1.git.5.83efd71.el7    @rhel-7-server-ose-3.7-rpms
atomic-openshift-utils.noarch              3.7.23-1.git.0.bc406aa.el7    @rhel-7-server-ose-3.7-rpms
atomic-registries.x86_64                   1:1.22.1-1.gitd36c015.el7     @rhel-7-server-extras-rpms

How reproducible:

Steps to Reproduce:
1. Perform an advanced install of OCP 3.7 without configuring persistent storage
2. Place the /etc/origin/cloudprovider/vsphere.conf file on all nodes
3. Edit /etc/origin/master/master-config.yaml to add the cloud-provider/cloud-config arguments on the master nodes
4. Edit /etc/origin/node/node-config.yaml to add the cloud-provider/cloud-config arguments on all nodes (a sketch of both edits follows below)
5. Restart the atomic-openshift-master-api and atomic-openshift-master-controllers services on the masters
6. Restart the atomic-openshift-node service on the nodes

Actual results:
The atomic-openshift-node service fails to start.

Expected results:
The atomic-openshift-node service starts and the OCP cluster becomes operational with vSphere-backed persistent storage.
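For reference, the edits in steps 3 and 4 followed the linked documentation and amount to stanzas along these lines (a sketch showing only the relevant keys; the surrounding configuration is unchanged):

/etc/origin/master/master-config.yaml:

    kubernetesMasterConfig:
      apiServerArguments:
        cloud-provider:
          - "vsphere"
        cloud-config:
          - "/etc/origin/cloudprovider/vsphere.conf"
      controllerArguments:
        cloud-provider:
          - "vsphere"
        cloud-config:
          - "/etc/origin/cloudprovider/vsphere.conf"

/etc/origin/node/node-config.yaml:

    kubeletArguments:
      cloud-provider:
        - "vsphere"
      cloud-config:
        - "/etc/origin/cloudprovider/vsphere.conf"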
Error Log from service:

[root@dca-hd-osm-01 ~]# systemctl start atomic-openshift-node.service
Job for atomic-openshift-node.service failed because the control process exited with error code. See "systemctl status atomic-openshift-node.service" and "journalctl -xe" for details.

[root@dca-hd-osm-01 ~]# journalctl -xe
Apr 03 13:37:38 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-master-controllers[2234]: W0403 13:37:38.213966 2234 reflector.go:343] github.com/openshift/origin/pkg/apps/generated/informers/in
Apr 03 13:37:38 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:38.531540 26857 kubelet_node_status.go:270] Setting node annotation to enable volume controller attach/det
Apr 03 13:37:38 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:38.531594 26857 vsphere.go:690] The vSphere cloud provider does not support zones
Apr 03 13:37:38 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:38.537568 26857 kubelet_node_status.go:433] Recording NodeHasSufficientDisk event message for node dca-hd-
Apr 03 13:37:38 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:38.537649 26857 kubelet_node_status.go:433] Recording NodeHasSufficientMemory event message for node dca-h
Apr 03 13:37:38 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:38.537681 26857 kubelet_node_status.go:433] Recording NodeHasNoDiskPressure event message for node dca-hd-
Apr 03 13:37:38 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:38.537704 26857 kubelet_node_status.go:82] Attempting to register node dca-hd-osm-01
Apr 03 13:37:38 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: E0403 13:37:38.542476 26857 kubelet_node_status.go:106] Unable to register node "dca-hd-osm-01" with API server: nodes
Apr 03 13:37:39 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-master-api[2142]: I0403 13:37:39.219226 2142 rest.go:362] Starting watch for /apis/apps.openshift.io/v1/deploymentconfigs, rv=1916
Apr 03 13:37:39 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: W0403 13:37:39.995001 26857 sdn_controller.go:48] Could not find an allocated subnet for node: dca-hd-osm-01.int.dca.c
Apr 03 13:37:40 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:40.326744 26857 kubelet.go:1854] SyncLoop (ADD, "api"): ""
Apr 03 13:37:40 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: W0403 13:37:40.340769 26857 pod_container_deletor.go:77] Container "8114cab0b7ce141fc62113244159930c44fee6e937e2e00c6b
Apr 03 13:37:40 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: W0403 13:37:40.979799 26857 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 03 13:37:40 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: E0403 13:37:40.980033 26857 kubelet.go:2112] Container runtime network not ready: NetworkReady=false reason:NetworkPlu
Apr 03 13:37:41 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:41.743783 26857 kubelet_node_status.go:270] Setting node annotation to enable volume controller attach/det
Apr 03 13:37:41 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:41.744533 26857 vsphere.go:690] The vSphere cloud provider does not support zones
Apr 03 13:37:41 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:41.792712 26857 kubelet_node_status.go:433] Recording NodeHasSufficientDisk event message for node dca-hd-
Apr 03 13:37:41 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:41.792764 26857 kubelet_node_status.go:433] Recording NodeHasSufficientMemory event message for node dca-h
Apr 03 13:37:41 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:41.792802 26857 kubelet_node_status.go:433] Recording NodeHasNoDiskPressure event message for node dca-hd-
Apr 03 13:37:41 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:41.792875 26857 kubelet_node_status.go:82] Attempting to register node dca-hd-osm-01
Apr 03 13:37:41 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: E0403 13:37:41.798231 26857 kubelet_node_status.go:106] Unable to register node "dca-hd-osm-01" with API server: nodes
Apr 03 13:37:43 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: W0403 13:37:43.375967 26857 sdn_controller.go:48] Could not find an allocated subnet for node: dca-hd-osm-01.int.dca.c
Apr 03 13:37:44 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-master-api[2142]: I0403 13:37:44.972218 2142 rest.go:362] Starting watch for /api/v1/pods, rv=859 labels= fields= timeout=9m17s
Apr 03 13:37:45 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: W0403 13:37:45.982591 26857 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 03 13:37:45 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: E0403 13:37:45.982762 26857 kubelet.go:2112] Container runtime network not ready: NetworkReady=false reason:NetworkPlu
Apr 03 13:37:46 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: E0403 13:37:46.025411 26857 eviction_manager.go:238] eviction manager: unexpected err: failed GetNode: node 'dca-hd-os
Apr 03 13:37:46 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-master-api[2142]: I0403 13:37:46.508085 2142 rest.go:362] Starting watch for /apis/rbac.authorization.k8s.io/v1beta1/rolebindings,
Apr 03 13:37:48 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:48.198428 26857 kubelet_node_status.go:270] Setting node annotation to enable volume controller attach/det
Apr 03 13:37:48 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:48.198487 26857 vsphere.go:690] The vSphere cloud provider does not support zones
Apr 03 13:37:48 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:48.218214 26857 kubelet_node_status.go:433] Recording NodeHasSufficientDisk event message for node dca-hd-
Apr 03 13:37:48 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:48.218261 26857 kubelet_node_status.go:433] Recording NodeHasSufficientMemory event message for node dca-h
Apr 03 13:37:48 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:48.218281 26857 kubelet_node_status.go:433] Recording NodeHasNoDiskPressure event message for node dca-hd-
Apr 03 13:37:48 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:48.218308 26857 kubelet_node_status.go:82] Attempting to register node dca-hd-osm-01
Apr 03 13:37:48 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: E0403 13:37:48.222760 26857 kubelet_node_status.go:106] Unable to register node "dca-hd-osm-01" with API server: nodes
Apr 03 13:37:48 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: W0403 13:37:48.444719 26857 sdn_controller.go:48] Could not find an allocated subnet for node: dca-hd-osm-01.int.dca.c
Apr 03 13:37:50 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: W0403 13:37:50.988624 26857 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 03 13:37:50 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: E0403 13:37:50.988796 26857 kubelet.go:2112] Container runtime network not ready: NetworkReady=false reason:NetworkPlu
Apr 03 13:37:55 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:55.222975 26857 kubelet_node_status.go:270] Setting node annotation to enable volume controller attach/det
Apr 03 13:37:55 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:55.223728 26857 vsphere.go:690] The vSphere cloud provider does not support zones
Apr 03 13:37:55 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:55.247316 26857 kubelet_node_status.go:433] Recording NodeHasSufficientDisk event message for node dca-hd-
Apr 03 13:37:55 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:55.247755 26857 kubelet_node_status.go:433] Recording NodeHasSufficientMemory event message for node dca-h
Apr 03 13:37:55 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:55.248270 26857 kubelet_node_status.go:433] Recording NodeHasNoDiskPressure event message for node dca-hd-
Apr 03 13:37:55 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: I0403 13:37:55.248679 26857 kubelet_node_status.go:82] Attempting to register node dca-hd-osm-01
Apr 03 13:37:55 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: E0403 13:37:55.257465 26857 kubelet_node_status.go:106] Unable to register node "dca-hd-osm-01" with API server: nodes
Apr 03 13:37:55 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: W0403 13:37:55.992945 26857 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 03 13:37:55 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: E0403 13:37:55.994398 26857 kubelet.go:2112] Container runtime network not ready: NetworkReady=false reason:NetworkPlu
Apr 03 13:37:56 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: E0403 13:37:56.063436 26857 eviction_manager.go:238] eviction manager: unexpected err: failed GetNode: node 'dca-hd-os
Apr 03 13:37:56 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: W0403 13:37:56.065879 26857 sdn_controller.go:48] Could not find an allocated subnet for node: dca-hd-osm-01.int.dca.c

Additional info:

OCP Inventory File:

[OSEv3:vars]
############################################################################################
### Ansible Vars
############################################################################################
timeout=60
ansible_become=yes
ansible_ssh_user=ottoman

############################################################################################
### Openshift Basic Vars
############################################################################################
openshift_release=v3.7
openshift_deployment_type=openshift-enterprise
openshift_disable_check="disk_availability,memory_availability,package_version,docker_image_availability"
openshift_https_proxy=http://159.145.7.103:8080/
openshift_http_proxy=http://159.145.7.103:8080/
openshift_no_proxy='127.0.0.1,localhost,.int.dca.ca.gov,10.64.0.0/14,172.30.0.0/16'

## Enabling Metrics and Logging
openshift_metrics_install_metrics=true
#openshift_metrics_storage_kind=dynamic
openshift_logging_install_logging=true
#openshift_logging_storage_kind=dynamic

# LDAP auth
openshift_master_identity_providers=[{'name': 'Active Directory', 'challenge': 'true', 'login': 'true', 'kind': 'LDAPPasswordIdentityProvider', 'attributes': {'id': ['dn'], 'email': ['mail'], 'name': ['cn'], 'preferredUsername': ['mail']}, 'bindDN': 'CN=XXXX\, XXXXXXX@DCA,OU=BusinessFunction,DC=dca,DC=ca,DC=gov', 'bindPassword': 'XXXXXXX', 'ca': '/etc/origin/master/ldap-ca.crt', 'insecure': 'false', 'url': 'ldaps://XXXXXXXXXX.dca.ca.gov:636/OU=HQ,DC=dca,DC=ca,DC=gov?mail'},{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
openshift_master_htpasswd_users={'ottoman': '$apr1$ei17YaHy$6MUhs2t6oFQlb2HsJqqKk/'}
openshift_master_ldap_ca_file=/home/ottoman/repos/openShift/ldap/ldap-ca.crt

############################################################################################
### Openshift Networking
############################################################################################
os_sdn_network_plugin_name='redhat/openshift-ovs-subnet'
osm_cluster_network_cidr=10.64.0.0/14
openshift_master_cluster_method=native
openshift_master_cluster_hostname=dca-hp-hqosm.int.dca.ca.gov
openshift_master_cluster_public_hostname=dca-hp-hqosm.int.dca.ca.gov
openshift_master_default_subdomain=hqos.int.dca.ca.gov

############################################################################################
### OpenShift Hosts
############################################################################################
[OSEv3:children]
nodes
masters
etcd

[masters]
dca-hd-osm-[01:03].int.dca.ca.gov

[etcd]
dca-hd-osm-[01:03].int.dca.ca.gov

[nodes]
## These are the application nodes
dca-hd-osa-[01:03].int.dca.ca.gov openshift_node_labels="{'region': 'primary','zone': 'default'}"
## These are the infranodes
dca-hd-osi-[01:02].int.dca.ca.gov openshift_node_labels="{'region': 'infra','zone': 'default'}"
## These are the masters
dca-hd-osm-[01:03].int.dca.ca.gov

vsphere.conf:

[Global]
user = "<username>"
password = "<password>"
server = "dca-h-vc01.breeze.ca.gov"
port = 443
insecure-flag = 1
datacenter = "DCA-HQ-INT"
datastore = "dca-fc-hp-pure-tier1-03"
working-dir = "/DCA-HQ-INT/vm/App-Non-Prod/openshift"
vm-uuid = ""

[Disk]
scsicontrollertype = pvscsi

Workaround:

I don't have the cutoff part of the logs above, but in my troubleshooting this seems to be the critical error:

Apr 03 13:37:41 dca-hd-osm-01.int.dca.ca.gov atomic-openshift-node[26857]: E0403 13:37:41.798231 26857 kubelet_node_status.go:106] Unable to register node "dca-hd-osm-01" with API server: nodes

The cutoff part of the message is something to the effect that "dca-hd-osm-01" is forbidden from registering node "dca-hd-osm-01.int.dca.ca.gov".

Using this information and combining it with the note from the documentation ("After enabling the vSphere Cloud Provider, Node names are set to the VM names from the vCenter Inventory.") and the OCP 3.7 release notes about the change to the Node Authorizer (see: https://docs.openshift.com/container-platform/3.7/release_notes/ocp_3_7_release_notes.html#ocp-37-security), I suspected the node name change from FQDN to VMware name was not being authorized correctly. As a workaround to test this, I generated a new node configuration using the VMware name:

# oc adm create-node-config \
  --node-dir=/tmp/node-osm-01 \
  --node=dca-hd-osm-01 \
  --hostnames=dca-hd-osm-01.int.dca.ca.gov,10.130.12.52 \
  --certificate-authority="/etc/origin/master/ca.crt" \
  --signer-cert="/etc/origin/master/ca.crt" \
  --signer-key="/etc/origin/master/ca.key" \
  --signer-serial="/etc/origin/master/ca.serial.txt" \
  --node-client-certificate-authority="/etc/origin/master/ca.crt" \
  --master="https://dca-hp-hqosm.int.dca.ca.gov:8443"

I copied the new configuration into /etc/origin/node (except for node-config.yaml), then I modified node-config.yaml to use the VMware name for the nodeName entry. After these modifications, the atomic-openshift-node service started correctly.
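For clarity, the node-config.yaml edit referenced above amounts to a single changed entry (a sketch; the rest of the file is left as generated, and dca-hd-osm-01 is the VM name from the vCenter inventory):

    nodeName: dca-hd-osm-01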
After applying this change to all nodes in the cluster, the nodes were added to the cluster, I was then able to add a StorageClass object using the vsphere-volume provisioner (a sketch follows below), and PVCs were able to dynamically create PVs. This workaround, however, starts to break other things: openshift-ansible scripts that rely on FQDN node names now error out since the node names are now the VMware names, NO_PROXY settings have to be adjusted for docker/OCP, etc.
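For anyone reproducing this, the StorageClass looked roughly like the following (a sketch: the object name is arbitrary, and the datastore parameter is taken from the vsphere.conf above; kubernetes.io/vsphere-volume is the in-tree provisioner name):

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: vsphere-standard
    provisioner: kubernetes.io/vsphere-volume
    parameters:
      diskformat: thin
      datastore: dca-fc-hp-pure-tier1-03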
I don't have VMware at hand to check this, but all the other cloud providers *require* the machine hostname to be equal to the name of the VM. Did this recommendation work: https://docs.openshift.com/container-platform/3.7/install_config/configuring_vsphere.html#vsphere-applying-configuration-changes ? If not, then we need to update it to something that works. Or we might update the docs to recommend installing OpenShift on VMware machines that already have the right hostnames. You can fix the hostnames later with some manual effort, and it seems to me you did it correctly. It's not a workaround, it's the fix.
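For reference, the linked procedure amounts to re-registering each node under its new name, roughly (a sketch; verify the exact commands and flags against the 3.7 docs):

    oc adm manage-node dca-hd-osm-01.int.dca.ca.gov --schedulable=false
    oc adm drain dca-hd-osm-01.int.dca.ca.gov
    oc delete node dca-hd-osm-01.int.dca.ca.gov
    systemctl restart atomic-openshift-node    # node re-registers under its new (VM) name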
@Davis, do you have any other insight into who should set hostnames, and how, when installing OpenShift on VMware? Is our documentation accurate?
@Jan, yes, that is correct. Upon implementing the cloud provider configuration for vSphere, the hostname in OpenShift is changed to the virtual machine name. We circumvent the problem by using hostvars to assign openshift_name, setting it to the inventory_hostname in Ansible. Here is an example with a dynamic inventory role in Ansible: https://github.com/openshift/openshift-ansible-contrib/blob/master/reference-architecture/vmware-ansible/playbooks/roles/instance-groups/tasks/main.yaml#L35-L42 Here is an example with a static inventory in Ansible: https://gist.github.com/dav1x/b374f2e2becbeae1b6efd6c36aefd1ce The cluster installation and configuration just need to align with what the cloud provider expects. We could probably clarify that in the docs; I'll make myself a task to do so.
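To make that concrete (a sketch: the comment above says openshift_name, while many openshift-ansible 3.x examples use openshift_hostname; the linked role and gist are authoritative), the static-inventory variant is just a hostvar per node:

    [nodes]
    dca-hd-osm-01.int.dca.ca.gov openshift_hostname=dca-hd-osm-01.int.dca.ca.gov

and the dynamic-inventory variant is the equivalent task:

    - name: Align the OpenShift node name with the Ansible inventory name
      set_fact:
        openshift_hostname: "{{ inventory_hostname }}"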
I am re-assigning the bug to Docs then. BTW, should we update the docs for all the other cloud providers? It's exactly the same on all of them: the VM name should be the same as the hostname.
Thanks everyone for the input. I can confirm that when we renamed the VMs to their matching FQDNs and reinstalled OCP, everything worked as expected. If this requirement is documented for the other cloud providers, I must have missed it. Going forward, I think it would be very helpful to have this called out as a requirement in the documentation for using the vSphere cloud provider.
Hi, I'd just like to report exactly the same issue. While I have not tried the provided workaround, I'd like to offer help if needed. We've created a fully reproducible setup with Packer, Terraform, and Ansible. Kind regards, Yves
As specified above, this is more of a documentation issue. As of OpenShift 3.11, the VM name requirement is documented, but it is a little hidden. See https://docs.openshift.com/container-platform/3.11/install_config/configuring_vsphere.html#install-config-configuring-vsphere, then scroll down to "Configuring OpenShift Container Platform to use vSphere storage", about two-thirds of the way down the page, which lists the prerequisites. Technically the problem has now been documented, but it would be more helpful if this prerequisite were listed at the start of the page, before the actual configuration steps.
We've found the following issue:
- VMware vCenter VMs have the FQDN as their name (e.g. node-001.example.com)
- The Ansible inventory uses the same FQDN
- The VMs themselves behave correctly, only emitting the domain part of the hostname when the hostname command is run with the "-f" flag
I think this could be the issue: when the hostname inside the VM is set to the FQDN (so that there is no difference between running the hostname command with and without the "-f" flag), everything works.
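To make the check and fix concrete (a sketch using standard RHEL 7 tooling; node-001.example.com is a placeholder):

    # hostname
    node-001
    # hostname -f
    node-001.example.com
    # hostnamectl set-hostname node-001.example.com

After setting the static hostname to the FQDN, hostname and hostname -f agree, which matches the working configuration described above.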
OCP 3.6-3.10 is no longer on full support [1]. Marking un-triaged bugs CLOSED DEFERRED. If you have a customer case with a support exception or have reproduced on 3.11+, please reopen and include those details. When reopening, please set the Version to the appropriate version where reproduced. [1]: https://access.redhat.com/support/policy/updates/openshift