Bug 1321964

Summary: Openstack cloud provider not initialized properly
Product: OpenShift Container Platform
Component: Node
Version: 3.1.0
Reporter: jawed <jkhelil>
Assignee: Seth Jennings <sjenning>
QA Contact: DeShuai Ma <dma>
CC: agoldste, aos-bugs, jdetiber, jeder, jhenner, jkhelil, jokerman, mmccomas
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Keywords: Reopened
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-04-05 17:58:39 UTC

Description jawed 2016-03-29 13:31:15 UTC
Description of problem:

When configuring the OpenStack cloud provider on OpenShift per this doc:
https://docs.openshift.com/enterprise/3.1/install_config/configuring_openstack.html
we hit an issue on the node side that prevents the node from becoming Ready from the master's perspective.

[openshift@plop-master-0 ~]$ oc get nodes
NAME                             LABELS                                                               STATUS     AGE
plop-node-compute-0.novalocal   kubernetes.io/hostname=plop-node-compute-0.novalocal,type=compute   NotReady   20m
plop-node-compute-1.novalocal   kubernetes.io/hostname=plop-node-compute-1.novalocal,type=compute   Ready      20m

In the node log, I can see the following (I set the log level to 8 on the node, trying to get maximum debug info):

13:33:38.671650   10746 config.go:98] Looking for [api], have seen map[api:{}]
13:33:38.671693   10746 kubelet.go:2149] SyncLoop (housekeeping)
13:33:45.135192   10746 openstack.go:251] openstack.Instances() called
13:33:45.179711   10746 openstack.go:288] Found 6 compute flavors
13:33:45.179741   10746 openstack.go:289] Claiming to support Instances
13:33:45.522736   10746 kubelet.go:969] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: Failed to find object



Version-Release number of selected component (if applicable):
[root@plop-node-compute-0 ~]# openshift version
openshift v3.1.1.6-16-g5327e56
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2

How reproducible:


Steps to Reproduce:
1. OpenStack IaaS with default configuration
2. Configure an OpenShift node with the OpenStack cloud provider

Actual results:
Node is NotReady 

13:33:38.671650   10746 config.go:98] Looking for [api], have seen map[api:{}]
13:33:38.671693   10746 kubelet.go:2149] SyncLoop (housekeeping)
13:33:45.135192   10746 openstack.go:251] openstack.Instances() called
13:33:45.179711   10746 openstack.go:288] Found 6 compute flavors
13:33:45.179741   10746 openstack.go:289] Claiming to support Instances
13:33:45.522736   10746 kubelet.go:969] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: Failed to find object

Expected results:
Node is ready

Additional info:
This seems to be related to the fact that the kubelet is querying OpenStack with the kube node name in the form
plop-node-compute-0.novalocal, but `nova list` shows instance names without the .novalocal domain.

There was some work done related to the issue here
https://bugzilla.redhat.com/show_bug.cgi?id=1273739

but it does not seem to cover all aspects of the issue.

Comment 1 Andy Goldstein 2016-03-29 15:10:29 UTC
What do you have set for nodeName in node-config.yaml? Have you tried specifying "plop-node-compute-0" for that value?

Comment 2 jawed 2016-03-30 15:33:29 UTC
nodeName is plop-node-compute-0.novalocal
It is configured by openshift-ansible via this template:
nodeName: {{ openshift.common.hostname | lower }}
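For reference, a minimal node-config.yaml fragment showing the workaround comment 1 suggests trying — this is a sketch, with all other keys in the file omitted:

```yaml
# node-config.yaml (fragment) — set nodeName to the Nova instance name
# without the .novalocal metadata domain, so cloud-provider lookups by
# name can find the instance.
nodeName: plop-node-compute-0
```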



Behind the error, we have this:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet.go#L1069

https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/openstack/openstack.go#L462

Shouldn't the ExternalID method query the metadata server to get the instance ID rather than querying nova?
If we need the instanceId, it is provided here by https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/openstack/openstack.go#L177

Comment 3 jawed 2016-03-30 15:49:01 UTC
I found this PR, but it seems that OpenStack integration is working for OpenShift Enterprise 3.2:
https://github.com/openshift/openshift-ansible/pull/883/files

Comment 4 Seth Jennings 2016-03-30 17:14:57 UTC
Applied this patch to openshift:

diff --git a/Godeps/_workspace/src/k8s.io/kubernetes/pkg/cloudprovider/providers/openstack/openstack.go b/Godeps/_workspace/src/k8s.io/kubernetes/pkg/cloudprovider/providers/openstack/openstack.go
index 5737867..a8337a4 100644
--- a/Godeps/_workspace/src/k8s.io/kubernetes/pkg/cloudprovider/providers/openstack/openstack.go
+++ b/Godeps/_workspace/src/k8s.io/kubernetes/pkg/cloudprovider/providers/openstack/openstack.go
@@ -463,6 +463,11 @@ func (i *Instances) NodeAddresses(name string) ([]api.NodeAddress, error) {
 
 // ExternalID returns the cloud provider ID of the specified instance (deprecated).
 func (i *Instances) ExternalID(name string) (string, error) {
+       str, err := readInstanceID()
+       if err == nil {
+       glog.V(4).Infof("(ExternalID) instance-id is %s", str)
+               return str, err
+       }
        srv, err := getServerByName(i.compute, name)
        if err != nil {
                return "", err
@@ -472,6 +477,11 @@ func (i *Instances) ExternalID(name string) (string, error) {
 
 // InstanceID returns the cloud provider ID of the specified instance.
 func (i *Instances) InstanceID(name string) (string, error) {
+       str, err := readInstanceID()
+       if err == nil {
+       glog.V(4).Infof("(InstanceID) instance-id is %s", str)
+               return "/" + str, err
+       }
        srv, err := getServerByName(i.compute, name)
        if err != nil {
                return "", err

It gets past the

E0330 12:13:52.145314    4710 kubelet.go:969] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: Failed to find object

But hit a downstream problem at

E0330 12:47:44.254257   10381 kubelet.go:1131] Unable to construct api.Node object for kubelet: failed to get node address from cloud provider: Failed to find object

The address lookup still happens by name.

Basically, all functions that call getServerByName() are going to have this issue, including getAddressesByName().
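To illustrate the mismatch driving those by-name lookups — the kubelet asks for plop-node-compute-0.novalocal while nova knows the instance as plop-node-compute-0 — here is a sketch of the normalization such a lookup would need. The helper name is hypothetical, and this is not the actual upstream fix:

```go
package main

import (
	"fmt"
	"strings"
)

// novaNameFor strips the ".novalocal" metadata domain that gets appended to
// guest hostnames, so a by-name query matches the instance name that
// `nova list` reports.
func novaNameFor(nodeName string) string {
	return strings.TrimSuffix(nodeName, ".novalocal")
}

func main() {
	fmt.Println(novaNameFor("plop-node-compute-0.novalocal"))
}
```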

Comment 5 Seth Jennings 2016-03-30 21:21:40 UTC
So I have a potential fix here
https://github.com/sjenning/kubernetes/commit/21188755852cb352a87a4789171f74b44bc007cd

The underlying assumption is that all functions that call readInstanceID() are getting information about the node that is running the code (note that readInstanceID() takes no name argument).

A potentially better fix, though a larger code change, would be to cache the instance ID in a local struct that implements Instances, so we don't have to call readInstanceID() every time.
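A sketch of that caching idea, assuming a struct that wraps whatever reader the provider uses (the type and field names here are illustrative, not the provider's):

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"sync"
)

// cachedInstances memoizes the local instance ID: the read function runs at
// most once, and every later call reuses the stored result.
type cachedInstances struct {
	read func() (string, error) // e.g. the provider's readInstanceID()
	once sync.Once
	id   string
	err  error
}

func (c *cachedInstances) localInstanceID() (string, error) {
	c.once.Do(func() {
		c.id, c.err = c.read()
	})
	return c.id, c.err
}

func main() {
	// Wire in a file-based reader; the path is an assumed cloud-init
	// location, not taken from the provider code.
	i := &cachedInstances{read: func() (string, error) {
		data, err := os.ReadFile("/var/lib/cloud/data/instance-id")
		if err != nil {
			return "", err
		}
		return strings.TrimSpace(string(data)), nil
	}}
	if id, err := i.localInstanceID(); err == nil {
		fmt.Println("instance id:", id)
	} else {
		fmt.Println("no local instance id:", err)
	}
}
```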

Comment 6 Jason DeTiberus 2016-03-31 00:36:53 UTC
Just saw the IRC ping. Scott Dodson is correct that the SDN will use the nodeName for resolving the IP address of the node (unless the NodeIP value is set, but that causes other issues, so we usually try to avoid setting it).

Ideally, all of the cloud providers would use the metadata service for determining the host's id rather than using the value of nodeName, since that would avoid the requirement that the node name matches the value being queried for getting the instance information. I'm not sure what all information is needed for populating the node config, but I suspect most of it should be available directly from the metadata service without having to query the cloud API as well.

Comment 7 Seth Jennings 2016-04-01 14:11:36 UTC
re: the IRC conversation on openshift-dev, it is late in the game for a code change in kube to work around this.  It is, in fact, in the documentation that the nodeName needs to match the OpenStack instance name.  The installer just doesn't do this right now and requires the user to edit node-config.yaml.  This is an inconvenience but not a bug, per se.

I think there is agreement to let this be for 3.2 and look at improving the installer to use the instance name for nodeName in the future.

If no one objects, I'll close this.

Comment 8 Seth Jennings 2016-04-01 14:12:51 UTC
Oops, meant to leave this open for now awaiting objections

Comment 9 Jason DeTiberus 2016-04-01 21:12:05 UTC
If this bug is getting closed, it would be nice to get a trello card added for tracking the upstream behavior change.

Comment 11 Seth Jennings 2016-09-13 18:27:25 UTC
*** Bug 1375422 has been marked as a duplicate of this bug. ***

Comment 12 Jaroslav Henner 2017-03-15 22:19:23 UTC
(In reply to Seth Jennings from comment #7)
> re: irc conversation on openshift-dev, it is late in the game for a code
> change in kube to workaround this.  It is, in fact, in the documentation
> that the nodeName need to match the openstack instance name.

Based on that, I found that we need to use FQDNs as instance names: I was unable to find any other setting of the inventory variables (openshift_hostname and openshift_public_hostname) such that openshift-ansible would deploy the nodes correctly and remain OpenStack-aware. With FQDNs as instance names, it works.


>  The installer
> just doesn't do this right now and requires user editing of the
> node-config.yaml.  This is an inconvenience but not a bug, per say.
> 
> I think there is agreement to let this be for 3.2 and look at improving the
> installer to use the instance name for nodeName in the future.
> 
> If no one objects, I'll close this.