Description of problem:

Docs for sdn-node:

# -hostname="": Hostname as registered with master (for node mode),
# will default to 'hostname -f' output:

[root@ose3-master training]# hostname -f
ose3-master.example.com

But when sdn-node is started only with "-v=4", the following occurs:

Feb 19 16:36:41 ose3-master openshift-sdn[2468]: E0219 16:36:41.130571 02468 controller.go:206] Could not find an allocated subnet for this minion (ose3-master)(100: Key not found (/registry/sdn/subnets/ose3-master) [23]). Waiting..

So it appears that, despite claiming to default to 'hostname -f', it actually defaults to the shortname.

Version-Release number of selected component (if applicable):
0.4-2.git.0.eec1f90.el7ose
Fixed with https://github.com/openshift/openshift-sdn/pull/25

If the /etc/hosts entry for the public IP lists several names, e.g.

192.168.133.2 ose3-master.example.com ose3-master

then 'hostname -f' picks up ose3-master.example.com, but 'os.Hostname' in golang picks ose3-master. That triggers a mismatch between the minion name as openshift-master sees it and the name openshift-sdn-node uses. The pull request above reverts to what the documentation says: openshift-sdn-node will use 'hostname -f' if the -hostname option is not specified.
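The lookup order described in this comment can be sketched in Go roughly as follows. This is an illustration only, not the code from the pull request: the function names resolveNodeName and hostnameF are hypothetical, and the fallback to os.Hostname when 'hostname -f' fails or prints nothing is an assumption of this sketch rather than something stated in this bug.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// resolveNodeName picks the name a node would register with.
// Order: explicit override, then the output of fqdn (e.g. 'hostname -f'),
// then short (e.g. os.Hostname). The last fallback is an assumption of
// this sketch, not documented behavior of openshift-sdn.
func resolveNodeName(override string, fqdn, short func() (string, error)) (string, error) {
	if override != "" {
		return override, nil
	}
	if name, err := fqdn(); err == nil {
		// Command output ends with a newline, so trim it before use.
		if name = strings.TrimSpace(name); name != "" {
			return name, nil
		}
	}
	return short()
}

// hostnameF runs the system 'hostname -f' command and returns its raw output.
func hostnameF() (string, error) {
	out, err := exec.Command("hostname", "-f").Output()
	return string(out), err
}

func main() {
	// Mirrors the -hostname option quoted in the description.
	hostname := flag.String("hostname", "", "Hostname as registered with master")
	flag.Parse()

	name, err := resolveNodeName(*hostname, hostnameF, os.Hostname)
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not determine hostname:", err)
		os.Exit(1)
	}
	fmt.Println(name)
}
```

Passing the two lookups in as functions keeps the ordering testable without depending on the host's own /etc/hosts or DNS setup.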
We'll make sure this gets built today in OSE. For now it can be tested in Origin as soon as the PR is merged.
Verified and passed.

1) The sdn-master can be started with only "-v=4"

[root@master222 sysconfig]# systemctl status openshift-sdn-master
openshift-sdn-master.service - OpenShift SDN Master
   Loaded: loaded (/usr/lib/systemd/system/openshift-sdn-master.service; disabled)
   Active: active (running) since Thu 2015-02-26 09:39:12 CST; 26min ago
     Docs: https://github.com/openshift/openshift-sdn
 Main PID: 13472 (openshift-sdn)
   CGroup: /system.slice/openshift-sdn-master.service
           └─13472 /usr/bin/openshift-sdn -v=4

Feb 26 09:43:00 master222.ose.com.cn openshift-sdn[13472]: I0226 09:43:00.452965 13472 registry.go:269] Issuing a minion event: &{compareAndSwap 0xc208004420 0xc208004540 25 106506 1}
Feb 26 09:43:00 master222.ose.com.cn openshift-sdn[13472]: I0226 09:43:00.457984 13472 registry.go:212] unmarshalling {"Minion":"192.168.0.224","Sub":"10.1.2.0/24"}

2) The sdn-node can connect to the master using the long hostname

osc get nodes
NAME                   LABELS    STATUS
master222.ose.com.cn   <none>    Ready
node223.ose.com.cn     <none>    Ready
One more step to add to comment 5: add the following line to /etc/hosts

<IP> master222.ose.com.cn master222
When /etc/hosts has the following line, this bug can be reproduced, so I am re-opening it.

# hostname
jialiu-node1.example.com
# hostname -f
jialiu-node1.example.com
# cat /etc/hosts
10.66.79.112 jialiu-node1 jialiu-node1.example.com

***NOTE:*** When the line is

10.66.79.112 jialiu-node1.example.com jialiu-node1

this issue does not happen.
Sorry, not clear how there is a mismatch again. The doc says it will pick 'hostname -f' and that is what it does irrespective of what is there in /etc/hosts.
Version: 3.0/2015-05-30.1/

Verify:
Now we don't have openshift-sdn-master service, so check openshift-master service instead.

[root@jia-master ~]# systemctl status openshift-master
openshift-master.service - OpenShift Master
   Loaded: loaded (/usr/lib/systemd/system/openshift-master.service; enabled)
   Active: active (running) since Mon 2015-06-01 17:41:17 CST; 15min ago
     Docs: https://github.com/openshift/origin
 Main PID: 3681 (openshift)
   CGroup: /system.slice/openshift-master.service
           └─3681 /usr/bin/openshift start master --config=/etc/openshift/master/master-config.yaml --loglevel=4

Jun 01 17:56:19 jia-master.v3-ose.com openshift-master[3681]: I0601 17:56:19.622990 3681 reflector.go:241] Watch close - *api.Pod tota...ceived
Jun 01 17:56:20 jia-master.v3-ose.com openshift-master[3681]: I0601 17:56:20.118436 3681 reflector.go:241] Watch close - *api.Namespac...ceived
Jun 01 17:56:21 jia-master.v3-ose.com openshift-master[3681]: I0601 17:56:21.760152 3681 nodecontroller.go:279] Nodes ReadyCondition u...Heartb
Jun 01 17:56:21 jia-master.v3-ose.com openshift-master[3681]: vs {Capacity:map[memory:{Amount:3975819264.000 Format:BinarySI} pods:{Amoun...kubele
Jun 01 17:56:26 jia-master.v3-ose.com openshift-master[3681]: I0601 17:56:26.450394 3681 reflector.go:241] Watch close - *api.Namespac...ceived
Jun 01 17:56:26 jia-master.v3-ose.com openshift-master[3681]: I0601 17:56:26.672080 3681 reflector.go:241] Watch close - *api.LimitRan...ceived
Jun 01 17:56:26 jia-master.v3-ose.com openshift-master[3681]: I0601 17:56:26.849378 3681 reflector.go:241] Watch close - *api.ServiceA...ceived
Jun 01 17:56:27 jia-master.v3-ose.com openshift-master[3681]: I0601 17:56:27.047428 3681 reflector.go:241] Watch close - *api.Secret t...ceived
Jun 01 17:56:27 jia-master.v3-ose.com openshift-master[3681]: I0601 17:56:27.247934 3681 reflector.go:241] Watch close - *api.Resource...ceived
Jun 01 17:56:28 jia-master.v3-ose.com openshift-master[3681]: I0601 17:56:28.004772 3681 reflector.go:241] Watch close - *api.ServiceA...ceived
Hint: Some lines were ellipsized, use -l to show in full.

[root@jia-master ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.14.6.125 jia-master.v3-ose.com
10.14.6.116 jia-minion.v3-ose.com

[root@jia-master ~]# osc get nodes
NAME                    LABELS                     STATUS
jia-minion.v3-ose.com   region=primary,zone=east   Ready
Sorry, seems my step is not enough. Add these steps for this bug:

[root@openshift-v3 training]# hostname
jia-master.v3-ose.com
[root@openshift-v3 training]# hostname -f
jia-master
[root@openshift-v3 training]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.14.6.125 jia-minion jia-minion.v3-ose.com
10.14.6.116 jia-master jia-master.v3-ose.com

[root@openshift-v3 training]# systemctl status openshift-master
openshift-master.service - OpenShift Master
   Loaded: loaded (/usr/lib/systemd/system/openshift-master.service; enabled)
   Active: active (running) since Mon 2015-06-01 18:38:42 CST; 1min 21s ago
     Docs: https://github.com/openshift/origin
 Main PID: 3562 (openshift)
   CGroup: /system.slice/openshift-master.service
           └─3562 /usr/bin/openshift start master --config=/etc/openshift/master/master-config.yaml --loglevel=4

Jun 01 18:38:44 jia-master.v3-ose.com openshift-master[3562]: I0601 18:38:44.612345 3562 endpoints_controller.go:258] Finished syncing service "default/kubernetes-ro" endpoints. (1.066µs)
Jun 01 18:38:54 jia-master.v3-ose.com openshift-master[3562]: I0601 18:38:54.816009 3562 trace.go:57] Trace "getFromCache" (started 2015-06-01 18:38:54.815757954 +0800 CST):
Jun 01 18:38:54 jia-master.v3-ose.com openshift-master[3562]: [12.887µs] [12.887µs] Raw get done
Jun 01 18:38:54 jia-master.v3-ose.com openshift-master[3562]: [206.035µs] [193.148µs] Deep copied
Jun 01 18:38:54 jia-master.v3-ose.com openshift-master[3562]: [209.15µs] [3.115µs] END
Jun 01 18:39:13 jia-master.v3-ose.com openshift-master[3562]: I0601 18:39:13.170072 3562 endpoints_controller.go:258] Finished syncing service "default/kubernetes" endpoints. (19.228µs)
Jun 01 18:39:13 jia-master.v3-ose.com openshift-master[3562]: I0601 18:39:13.170154 3562 endpoints_controller.go:258] Finished syncing service "default/kubernetes-ro" endpoints. (1.29µs)
Jun 01 18:39:44 jia-master.v3-ose.com openshift-master[3562]: I0601 18:39:44.159335 3562 endpoints_controller.go:258] Finished syncing service "default/kubernetes" endpoints. (13.724µs)
Jun 01 18:39:44 jia-master.v3-ose.com openshift-master[3562]: I0601 18:39:44.159381 3562 endpoints_controller.go:258] Finished syncing service "default/kubernetes-ro" endpoints. (853ns)
Jun 01 18:39:55 jia-master.v3-ose.com openshift-master[3562]: I0601 18:39:55.030710 3562 nodecontroller.go:252] Creating timestamp entry for newly observed Node jia-minion.v3-ose.com
Continuing from comment 10: this bug is specific to the node, not the master.

# hostname
jia-minion.v3-ose.com
# hostname -f
jia-minion
# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.14.6.125 jia-minion jia-minion.v3-ose.com
10.14.6.116 jia-master jia-master.v3-ose.com

# service openshift-node restart
Redirecting to /bin/systemctl restart openshift-node.service

# osc get nodes
NAME                    LABELS                     STATUS
jia-minion.v3-ose.com   region=primary,zone=west   Ready
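One way to read the 'hostname -f' output above: in /etc/hosts the first name after the IP address is the canonical name and the rest are aliases, so the order of names on the line decides what a files-based lookup reports. The Go sketch below only illustrates that ordering rule. It is not how 'hostname -f' or openshift-node resolve names (real resolution goes through the system resolver and may consult DNS), and the helper canonicalNameFor is a hypothetical name introduced for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// canonicalNameFor returns the first name listed for ip in hosts-file
// content, which is the canonical name under the /etc/hosts format.
// It ignores DNS and nsswitch ordering; that is a simplification.
func canonicalNameFor(hosts, ip string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(hosts))
	for sc.Scan() {
		line := sc.Text()
		// Drop trailing comments before splitting into fields.
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i]
		}
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == ip {
			return fields[1], true
		}
	}
	return "", false
}

func main() {
	// Entry as shown in this comment: short name listed first.
	shortFirst := "10.14.6.125 jia-minion jia-minion.v3-ose.com\n"
	// Same entry with the names swapped: long name listed first.
	longFirst := "10.14.6.125 jia-minion.v3-ose.com jia-minion\n"

	for _, hosts := range []string{shortFirst, longFirst} {
		name, _ := canonicalNameFor(hosts, "10.14.6.125")
		fmt.Println(name)
	}
	// Prints:
	// jia-minion
	// jia-minion.v3-ose.com
}
```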