Description of problem:

When configuring the OpenStack provider on OpenShift per this doc
https://docs.openshift.com/enterprise/3.1/install_config/configuring_openstack.html
we hit an issue on the node side that prevents the node from becoming Ready from the master's perspective:

[openshift@plop-master-0 ~]$ oc get nodes
NAME                            LABELS                                                              STATUS     AGE
plop-node-compute-0.novalocal   kubernetes.io/hostname=plop-node-compute-0.novalocal,type=compute   NotReady   20m
plop-node-compute-1.novalocal   kubernetes.io/hostname=plop-node-compute-1.novalocal,type=compute   Ready      20m

In the node log I can see the following (I set the loglevel to 8 on the node to get maximum debug info):

13:33:38.671650   10746 config.go:98] Looking for [api], have seen map[api:{}]
13:33:38.671693   10746 kubelet.go:2149] SyncLoop (housekeeping)
13:33:45.135192   10746 openstack.go:251] openstack.Instances() called
13:33:45.179711   10746 openstack.go:288] Found 6 compute flavors
13:33:45.179741   10746 openstack.go:289] Claiming to support Instances
13:33:45.522736   10746 kubelet.go:969] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: Failed to find object

Version-Release number of selected component (if applicable):

[root@plop-node-compute-0 ~]# openshift version
openshift v3.1.1.6-16-g5327e56
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2

How reproducible:

Steps to Reproduce:
1. OpenStack IaaS with default configuration
2. Configure the OpenShift node with the OpenStack cloud provider
3.
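For context, the cloud provider configuration from the linked doc is roughly the following (the auth values below are placeholders, not from this deployment):

```yaml
# node-config.yaml: point the kubelet at the OpenStack cloud provider
kubeletArguments:
  cloud-provider:
    - "openstack"
  cloud-config:
    - "/etc/cloud.conf"
```

with /etc/cloud.conf holding the Keystone credentials, roughly:

```
[Global]
auth-url = https://keystone.example.com:5000/v2.0
username = openshift
password = <password>
tenant-id = <tenant uuid>
region = RegionOne
```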
Actual results:

Node is NotReady:

13:33:38.671650   10746 config.go:98] Looking for [api], have seen map[api:{}]
13:33:38.671693   10746 kubelet.go:2149] SyncLoop (housekeeping)
13:33:45.135192   10746 openstack.go:251] openstack.Instances() called
13:33:45.179711   10746 openstack.go:288] Found 6 compute flavors
13:33:45.179741   10746 openstack.go:289] Claiming to support Instances
13:33:45.522736   10746 kubelet.go:969] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: Failed to find object

Expected results:

Node is Ready

Additional info:

This seems related to the fact that the kubelet queries OpenStack with the kube node name in the form plop-node-compute-0.novalocal, but "nova list" shows instance names without the .novalocal domain. There was some work done related to this issue in https://bugzilla.redhat.com/show_bug.cgi?id=1273739 but it does not seem to cover all sides of the issue.
What do you have set for nodeName in node-config.yaml? Have you tried specifying "plop-node-compute-0" for that value?
nodeName is plop-node-compute-0.novalocal. It is configured by openshift-ansible via:

nodeName: {{ openshift.common.hostname | lower }}

Behind the error, we have this:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet.go#L1069
https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/openstack/openstack.go#L462

Shouldn't the ExternalID method query the metadata server for the instance ID instead of querying nova? If we need the instance ID, it is already obtained here:
https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/openstack/openstack.go#L177
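As a sketch of that idea: the local instance UUID can be read from the OpenStack metadata service at 169.254.169.254 without any nova query (and so without depending on nodeName). The `fetchInstanceID`/`instanceIDFromJSON` helpers below are illustrative, not the actual provider code:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io/ioutil"
	"net/http"
)

// instanceIDFromJSON extracts the "uuid" field from the meta_data.json
// document served by the OpenStack metadata service.
func instanceIDFromJSON(data []byte) (string, error) {
	var md struct {
		UUID string `json:"uuid"`
	}
	if err := json.Unmarshal(data, &md); err != nil {
		return "", err
	}
	if md.UUID == "" {
		return "", fmt.Errorf("no uuid in metadata document")
	}
	return md.UUID, nil
}

// fetchInstanceID queries the link-local metadata service; this only
// succeeds when run from inside an OpenStack instance.
func fetchInstanceID() (string, error) {
	resp, err := http.Get("http://169.254.169.254/openstack/latest/meta_data.json")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return instanceIDFromJSON(body)
}

func main() {
	id, err := fetchInstanceID()
	if err != nil {
		fmt.Println("not running on an OpenStack instance:", err)
		return
	}
	fmt.Println("instance-id:", id)
}
```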
I found this PR; it seems that the OpenStack integration is working for OpenShift Enterprise 3.2: https://github.com/openshift/openshift-ansible/pull/883/files
Applied this patch to openshift:

diff --git a/Godeps/_workspace/src/k8s.io/kubernetes/pkg/cloudprovider/providers/openstack/openstack.go b/Godeps/_workspace/src/k8s.io/kubernetes/pkg/cloudprovider/providers/openstack/openstack.go
index 5737867..a8337a4 100644
--- a/Godeps/_workspace/src/k8s.io/kubernetes/pkg/cloudprovider/providers/openstack/openstack.go
+++ b/Godeps/_workspace/src/k8s.io/kubernetes/pkg/cloudprovider/providers/openstack/openstack.go
@@ -463,6 +463,11 @@ func (i *Instances) NodeAddresses(name string) ([]api.NodeAddress, error) {
 // ExternalID returns the cloud provider ID of the specified instance (deprecated).
 func (i *Instances) ExternalID(name string) (string, error) {
+	str, err := readInstanceID()
+	if err == nil {
+		glog.V(4).Infof("(ExternalID) instance-id is %s", str)
+		return str, err
+	}
 	srv, err := getServerByName(i.compute, name)
 	if err != nil {
 		return "", err
@@ -472,6 +477,11 @@ func (i *Instances) ExternalID(name string) (string, error) {
 // InstanceID returns the cloud provider ID of the specified instance.
 func (i *Instances) InstanceID(name string) (string, error) {
+	str, err := readInstanceID()
+	if err == nil {
+		glog.V(4).Infof("(InstanceID) instance-id is %s", str)
+		return "/" + str, err
+	}
 	srv, err := getServerByName(i.compute, name)
 	if err != nil {
 		return "", err

It gets past:

E0330 12:13:52.145314    4710 kubelet.go:969] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: Failed to find object

but hits a downstream problem:

E0330 12:47:44.254257   10381 kubelet.go:1131] Unable to construct api.Node object for kubelet: failed to get node address from cloud provider: Failed to find object

The address lookup still happens by name. Basically all functions that call getServerByName() are going to have an issue, including getAddressesByName().
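To make the remaining mismatch concrete: nova knows the instance as plop-node-compute-0, while the kubelet queries with plop-node-compute-0.novalocal, so every getServerByName() lookup misses. A naive workaround (purely illustrative, not the upstream fix, and only valid if instance names never contain dots) would strip the DNS domain before the by-name lookup:

```go
package main

import (
	"fmt"
	"strings"
)

// shortName strips any DNS domain from a node name, e.g.
// "plop-node-compute-0.novalocal" -> "plop-node-compute-0",
// so the name matches what "nova list" reports.
func shortName(nodeName string) string {
	if i := strings.Index(nodeName, "."); i >= 0 {
		return nodeName[:i]
	}
	return nodeName
}

func main() {
	fmt.Println(shortName("plop-node-compute-0.novalocal"))
	fmt.Println(shortName("plop-node-compute-1"))
}
```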
So I have a potential fix here:
https://github.com/sjenning/kubernetes/commit/21188755852cb352a87a4789171f74b44bc007cd

The underlying assumption is that all functions that call readInstanceID() are getting information about the node that is running the code (note how readInstanceID() takes no name argument).

A potentially better fix, though a bigger code change, would be to cache the instance ID in the local struct that implements Instances; then we wouldn't have to call readInstanceID() every time.
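A sketch of that caching idea (the Instances struct, LocalInstanceID method, and readInstanceID stub below are stand-ins for the real provider code, not the actual fix):

```go
package main

import (
	"fmt"
	"sync"
)

// readInstanceID is a stand-in for the real function that reads the
// local instance ID from the config drive or metadata service.
var readInstanceID = func() (string, error) {
	return "d8e02d56-2648-49a3-bf97-6be8f1204f38", nil
}

// Instances is a stand-in for the provider's Instances implementation,
// extended with a cached local instance ID.
type Instances struct {
	once    sync.Once
	localID string
	idErr   error
}

// LocalInstanceID reads the instance ID once and caches the result for
// all later callers (ExternalID, InstanceID, NodeAddresses, ...).
func (i *Instances) LocalInstanceID() (string, error) {
	i.once.Do(func() {
		i.localID, i.idErr = readInstanceID()
	})
	return i.localID, i.idErr
}

func main() {
	i := &Instances{}
	id, err := i.LocalInstanceID()
	if err != nil {
		fmt.Println("could not determine instance id:", err)
		return
	}
	fmt.Println("cached instance-id:", id)
}
```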
Just saw the IRC ping. Scott Dodson is correct that the SDN will use the nodeName for resolving the IP address of the node (unless the NodeIP value is set, but that causes other issues, so we usually try to avoid setting it). Ideally, all of the cloud providers would use the metadata service for determining the host's id rather than using the value of nodeName, since that would avoid the requirement that the node name matches the value being queried for getting the instance information. I'm not sure what all information is needed for populating the node config, but I suspect most of it should be available directly from the metadata service without having to query the cloud API as well.
re: irc conversation on openshift-dev, it is late in the game for a code change in kube to work around this. It is, in fact, in the documentation that the nodeName needs to match the OpenStack instance name. The installer just doesn't do this right now and requires the user to edit node-config.yaml. This is an inconvenience but not a bug, per se.

I think there is agreement to let this be for 3.2 and look at improving the installer to use the instance name for nodeName in the future.

If no one objects, I'll close this.
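The manual edit in question amounts to roughly this (the host name is from this report; treat the snippet as illustrative):

```yaml
# node-config.yaml: nodeName must match the nova instance name
# as shown by "nova list", not the FQDN reported by the OS.
nodeName: plop-node-compute-0
```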
Oops, meant to leave this open for now awaiting objections
If this bug is getting closed, it would be nice to get a trello card added for tracking the upstream behavior change.
Here is the card. Feel free to expand on it. https://trello.com/c/dyHpMQw9/335-as-a-user-i-want-to-the-installer-to-configure-for-the-openstack-cloudprovider-without-having-to-manually-edit-the-node-configs
*** Bug 1375422 has been marked as a duplicate of this bug. ***
(In reply to Seth Jennings from comment #7)
> re: irc conversation on openshift-dev, it is late in the game for a code
> change in kube to workaround this. It is, in fact, in the documentation
> that the nodeName need to match the openstack instance name.

Based on that, I found that we need to use FQDNs as the instance names, because I was unable to find any other setting of the inventory variables (openshift_hostname and openshift_public_hostname) with which openshift-ansible would deploy the nodes correctly and remain OpenStack-aware. With FQDNs as instance names, it works.

> The installer
> just doesn't do this right now and requires user editing of the
> node-config.yaml. This is an inconvenience but not a bug, per say.
>
> I think there is agreement to let this be for 3.2 and look at improving the
> installer to use the instance name for nodeName in the future.
>
> If no one objects, I'll close this.
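For the record, the working setup corresponds to an inventory roughly like this, with the nova instances created under their FQDN names (host names are from this report; the rest is illustrative):

```
[nodes]
plop-node-compute-0.novalocal openshift_hostname=plop-node-compute-0.novalocal
plop-node-compute-1.novalocal openshift_hostname=plop-node-compute-1.novalocal
```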