Bug 1194471 - openshift-sdn-node doesn't correctly detect hostname
Summary: openshift-sdn-node doesn't correctly detect hostname
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.0.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Rajat Chopra
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-02-19 22:23 UTC by Erik M Jacobs
Modified: 2015-11-23 14:43 UTC
CC List: 9 users

Fixed In Version: openshift-sdn-0.4-1.git.0.bc3855b.el7ose
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-23 14:43:25 UTC
Target Upstream Version:
Embargoed:



Description Erik M Jacobs 2015-02-19 22:23:30 UTC
Description of problem:
The openshift-sdn-node documentation says:

#  -hostname="": Hostname as registered with master (for node mode),
#    will default to 'hostname -f'

output:
[root@ose3-master training]# hostname -f
ose3-master.example.com

But when sdn-node is started with only "-v=4" (no -hostname), the following occurs:

Feb 19 16:36:41 ose3-master openshift-sdn[2468]: E0219 16:36:41.130571 02468 controller.go:206] Could not find an allocated subnet for this minion (ose3-master)(100: Key not found (/registry/sdn/subnets/ose3-master) [23]). Waiting..

So it appears that, despite claiming to default to 'hostname -f', it actually defaults to the short name.

Version-Release number of selected component (if applicable):
0.4-2.git.0.eec1f90.el7ose
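To see the two values side by side on a host, the following minimal sketch may help, assuming a Linux host with the 'hostname' command on the PATH. Go's os.Hostname returns the kernel hostname without consulting /etc/hosts or DNS, while 'hostname -f' goes through the system resolver. The function name compareHostnames is hypothetical and is not part of openshift-sdn.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// compareHostnames (hypothetical helper) returns the kernel hostname as
// reported by os.Hostname and the name printed by 'hostname -f', which is
// resolved through the system resolver (and therefore /etc/hosts).
func compareHostnames() (kernel, full string, err error) {
	kernel, err = os.Hostname()
	if err != nil {
		return "", "", fmt.Errorf("os.Hostname: %w", err)
	}
	// Assumes the 'hostname' binary is installed and on the PATH.
	out, err := exec.Command("hostname", "-f").Output()
	if err != nil {
		return kernel, "", fmt.Errorf("hostname -f: %w", err)
	}
	return kernel, strings.TrimSpace(string(out)), nil
}
```

Going by the report above, on ose3-master this would presumably return ose3-master for the first value and ose3-master.example.com for the second.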

Comment 1 Rajat Chopra 2015-02-19 23:54:23 UTC
Fixed with https://github.com/openshift/openshift-sdn/pull/25
If /etc/hosts has several names for the public IP, e.g.:
192.168.133.2 ose3-master.example.com ose3-master

then 'hostname -f' picks up ose3-master.example.com, but Go's 'os.Hostname' returns ose3-master. This triggers a mismatch between the minion name that openshift-master sees and the one openshift-sdn-node uses.

The pull request above restores the documented behavior: openshift-sdn-node uses 'hostname -f' if the -hostname option is not specified.
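The documented fallback can be sketched as follows. This is not the code from the pull request above, only a minimal illustration under stated assumptions: an explicit -hostname value wins, otherwise the output of 'hostname -f' is used. resolveNodeName and lookupFQDN are hypothetical names; passing the lookup in as a function keeps the fallback testable without a real resolver.

```go
package main

import (
	"os/exec"
	"strings"
)

// lookupFQDN (hypothetical) runs 'hostname -f', the documented default
// for -hostname. Assumes the 'hostname' binary is on the PATH.
func lookupFQDN() (string, error) {
	out, err := exec.Command("hostname", "-f").Output()
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(out)), nil
}

// resolveNodeName (hypothetical) returns the -hostname value if one was
// given; otherwise it falls back to the supplied lookup (lookupFQDN in
// production, a stub in tests).
func resolveNodeName(flagValue string, lookup func() (string, error)) (string, error) {
	if name := strings.TrimSpace(flagValue); name != "" {
		return name, nil
	}
	return lookup()
}
```

A caller would use resolveNodeName(flagValue, lookupFQDN), so the node registers under the same name 'hostname -f' prints.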

Comment 3 Brenton Leanhardt 2015-02-20 11:48:21 UTC
We'll make sure this gets built today in OSE. For now it can be tested in Origin as soon as the PR is merged.

Comment 5 Anping Li 2015-02-26 02:11:29 UTC
Verified and passed.
1) The sdn-master can be started with only "-v=4"
[root@master222 sysconfig]# systemctl status openshift-sdn-master
openshift-sdn-master.service - OpenShift SDN Master
   Loaded: loaded (/usr/lib/systemd/system/openshift-sdn-master.service; disabled)
   Active: active (running) since Thu 2015-02-26 09:39:12 CST; 26min ago
     Docs: https://github.com/openshift/openshift-sdn
 Main PID: 13472 (openshift-sdn)
   CGroup: /system.slice/openshift-sdn-master.service
           └─13472 /usr/bin/openshift-sdn -v=4

Feb 26 09:43:00 master222.ose.com.cn openshift-sdn[13472]: I0226 09:43:00.452965 13472 registry.go:269] Issuing a minion event: &{compareAndSwap 0xc208004420 0xc208004540 25 106506 1}
Feb 26 09:43:00 master222.ose.com.cn openshift-sdn[13472]: I0226 09:43:00.457984 13472 registry.go:212] unmarshalling {"Minion":"192.168.0.224","Sub":"10.1.2.0/24"}
2) The sdn-node can connect to the master using the long hostname:
osc get nodes
NAME                   LABELS              STATUS
master222.ose.com.cn   <none>              Ready
node223.ose.com.cn     <none>              Ready

Comment 6 Johnny Liu 2015-02-26 03:30:17 UTC
One more step for comment 5: add the following line to /etc/hosts
<IP>   master222.ose.com.cn master222

Comment 7 Johnny Liu 2015-03-19 09:58:25 UTC
When /etc/hosts has the following lines, this bug can be reproduced, so I am reopening it.

# hostname
jialiu-node1.example.com
# hostname -f
jialiu-node1.example.com
# cat /etc/hosts
10.66.79.112	jialiu-node1 jialiu-node1.example.com
***NOTE:***
When the line is 
10.66.79.112	jialiu-node1.example.com jialiu-node1
This issue does not happen.
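Why the order of names on an /etc/hosts line matters can be sketched as follows, assuming the resolver consults /etc/hosts first (the common 'hosts: files dns' setting in /etc/nsswitch.conf). On such a line the first name after the address is the canonical name and the remaining names are aliases, and 'hostname -f' prints the canonical name of the matching entry. Comments 10 and 11 below are consistent with this: with "10.14.6.125 jia-minion jia-minion.v3-ose.com", 'hostname -f' prints jia-minion. canonicalName is a hypothetical, simplified helper: it ignores address families and only considers the first matching line.

```go
package main

import "strings"

// canonicalName (hypothetical, simplified) scans /etc/hosts-style text and
// returns the first name on the first line that lists name, either as the
// canonical name or as an alias.
func canonicalName(hosts, name string) (string, bool) {
	for _, line := range strings.Split(hosts, "\n") {
		// Strip trailing comments.
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i]
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue // need an address plus at least one name
		}
		for _, n := range fields[1:] {
			// Host name matching in /etc/hosts is case-insensitive.
			if strings.EqualFold(n, name) {
				return fields[1], true // first name is the canonical name
			}
		}
	}
	return "", false
}
```

With the short name listed first, looking up jialiu-node1.example.com yields jialiu-node1; with the FQDN listed first, it yields jialiu-node1.example.com, matching the note that reordering the line avoids the issue.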

Comment 8 Rajat Chopra 2015-04-28 17:06:42 UTC
Sorry, it is not clear how there is a mismatch again. The documentation says it will use 'hostname -f', and that is what it does, regardless of what is in /etc/hosts.

Comment 9 xjia 2015-06-01 09:59:25 UTC
Version:
3.0/2015-05-30.1/

Verify:
There is no longer an openshift-sdn-master service, so check the openshift-master service instead.
[root@jia-master ~]# systemctl status openshift-master
openshift-master.service - OpenShift Master
   Loaded: loaded (/usr/lib/systemd/system/openshift-master.service; enabled)
   Active: active (running) since Mon 2015-06-01 17:41:17 CST; 15min ago
     Docs: https://github.com/openshift/origin
 Main PID: 3681 (openshift)
   CGroup: /system.slice/openshift-master.service
           └─3681 /usr/bin/openshift start master --config=/etc/openshift/master/master-config.yaml --loglevel=4

Jun 01 17:56:19 jia-master.v3-ose.com openshift-master[3681]: I0601 17:56:19.622990    3681 reflector.go:241] Watch close - *api.Pod tota...ceived
Jun 01 17:56:20 jia-master.v3-ose.com openshift-master[3681]: I0601 17:56:20.118436    3681 reflector.go:241] Watch close - *api.Namespac...ceived
Jun 01 17:56:21 jia-master.v3-ose.com openshift-master[3681]: I0601 17:56:21.760152    3681 nodecontroller.go:279] Nodes ReadyCondition u...Heartb
Jun 01 17:56:21 jia-master.v3-ose.com openshift-master[3681]: vs {Capacity:map[memory:{Amount:3975819264.000 Format:BinarySI} pods:{Amoun...kubele
Jun 01 17:56:26 jia-master.v3-ose.com openshift-master[3681]: I0601 17:56:26.450394    3681 reflector.go:241] Watch close - *api.Namespac...ceived
Jun 01 17:56:26 jia-master.v3-ose.com openshift-master[3681]: I0601 17:56:26.672080    3681 reflector.go:241] Watch close - *api.LimitRan...ceived
Jun 01 17:56:26 jia-master.v3-ose.com openshift-master[3681]: I0601 17:56:26.849378    3681 reflector.go:241] Watch close - *api.ServiceA...ceived
Jun 01 17:56:27 jia-master.v3-ose.com openshift-master[3681]: I0601 17:56:27.047428    3681 reflector.go:241] Watch close - *api.Secret t...ceived
Jun 01 17:56:27 jia-master.v3-ose.com openshift-master[3681]: I0601 17:56:27.247934    3681 reflector.go:241] Watch close - *api.Resource...ceived
Jun 01 17:56:28 jia-master.v3-ose.com openshift-master[3681]: I0601 17:56:28.004772    3681 reflector.go:241] Watch close - *api.ServiceA...ceived
Hint: Some lines were ellipsized, use -l to show in full.
[root@jia-master ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.14.6.125 jia-master.v3-ose.com
10.14.6.116 jia-minion.v3-ose.com
[root@jia-master ~]# osc get nodes
NAME                    LABELS                     STATUS
jia-minion.v3-ose.com   region=primary,zone=east   Ready

Comment 10 xjia 2015-06-01 10:40:48 UTC
Sorry, it seems my steps were not sufficient.
Adding these steps for this bug:

[root@openshift-v3 training]# hostname 
jia-master.v3-ose.com
[root@openshift-v3 training]# hostname  -f 
jia-master
[root@openshift-v3 training]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.14.6.125 jia-minion jia-minion.v3-ose.com
10.14.6.116 jia-master jia-master.v3-ose.com
[root@openshift-v3 training]# 
[root@openshift-v3 training]# 
[root@openshift-v3 training]# systemctl status openshift-master
openshift-master.service - OpenShift Master
   Loaded: loaded (/usr/lib/systemd/system/openshift-master.service; enabled)
   Active: active (running) since Mon 2015-06-01 18:38:42 CST; 1min 21s ago
     Docs: https://github.com/openshift/origin
 Main PID: 3562 (openshift)
   CGroup: /system.slice/openshift-master.service
           └─3562 /usr/bin/openshift start master --config=/etc/openshift/master/master-config.yaml --loglevel=4

Jun 01 18:38:44 jia-master.v3-ose.com openshift-master[3562]: I0601 18:38:44.612345    3562 endpoints_controller.go:258] Finished syncing service "default/kubernetes-ro" endpoints. (1.066µs)
Jun 01 18:38:54 jia-master.v3-ose.com openshift-master[3562]: I0601 18:38:54.816009    3562 trace.go:57] Trace "getFromCache" (started 2015-06-01 18:38:54.815757954 +0800 CST):
Jun 01 18:38:54 jia-master.v3-ose.com openshift-master[3562]: [12.887µs] [12.887µs] Raw get done
Jun 01 18:38:54 jia-master.v3-ose.com openshift-master[3562]: [206.035µs] [193.148µs] Deep copied
Jun 01 18:38:54 jia-master.v3-ose.com openshift-master[3562]: [209.15µs] [3.115µs] END
Jun 01 18:39:13 jia-master.v3-ose.com openshift-master[3562]: I0601 18:39:13.170072    3562 endpoints_controller.go:258] Finished syncing service "default/kubernetes" endpoints. (19.228µs)
Jun 01 18:39:13 jia-master.v3-ose.com openshift-master[3562]: I0601 18:39:13.170154    3562 endpoints_controller.go:258] Finished syncing service "default/kubernetes-ro" endpoints. (1.29µs)
Jun 01 18:39:44 jia-master.v3-ose.com openshift-master[3562]: I0601 18:39:44.159335    3562 endpoints_controller.go:258] Finished syncing service "default/kubernetes" endpoints. (13.724µs)
Jun 01 18:39:44 jia-master.v3-ose.com openshift-master[3562]: I0601 18:39:44.159381    3562 endpoints_controller.go:258] Finished syncing service "default/kubernetes-ro" endpoints. (853ns)
Jun 01 18:39:55 jia-master.v3-ose.com openshift-master[3562]: I0601 18:39:55.030710    3562 nodecontroller.go:252] Creating timestamp entry for newly observed Node jia-minion.v3-ose.com

Comment 11 Johnny Liu 2015-06-01 13:12:33 UTC
Continuing from comment 10: this bug is specific to the node, not the master.

# hostname
jia-minion.v3-ose.com

# hostname -f
jia-minion

# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.14.6.125 jia-minion jia-minion.v3-ose.com
10.14.6.116 jia-master jia-master.v3-ose.com

# service openshift-node restart
Redirecting to /bin/systemctl restart  openshift-node.service

# osc get nodes
NAME                    LABELS                     STATUS
jia-minion.v3-ose.com   region=primary,zone=west   Ready

