Description of problem:

On an OSE v3 Beta node, once openshift-sdn-node is started, the following command hangs forever:

systemctl restart openshift-sdn-node

Kind Regards
Frederic
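For reference, a minimal way to see what the restart is blocked on while it hangs (assuming systemd and journald, as on RHEL 7) is to follow the unit's journal from a second terminal:

# terminal 1: the restart that hangs
systemctl restart openshift-sdn-node

# terminal 2: follow the unit's log output while it hangs
journalctl -u openshift-sdn-node -f

# and check the unit state with untruncated lines
systemctl status -l openshift-sdn-node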
This happens whenever openshift-sdn-node is attempting to reach the master but failing to do so. It will eventually kill the process and restart; at least that's been my experience. Does that happen for you as well?
Hi Scott,

Well, your previous message gave me a hint. In /var/log/messages on my OSE v3 master, which is also a node, I have the following information:

Feb 18 10:34:41 ose3-master openshift-sdn: E0218 10:34:41.828239 00908 controller.go:199] Could not find an allocated subnet for this minion (ose3-master.myredhat.com)(501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]). Waiting..

Now I am going to investigate in that area. If you have any idea why I am getting such messages, please feel free to let me know.

Thanks for your support and your time.
KR
Frederic
The "All the given peers are not reachable" message is coming from the embeded etcd client. If it can't reach the master it won't be able to find out any subnet configuration.
Indeed, I noticed that my openshift-sdn-master did not start correctly, and I do not know why. Here is what I get after running "systemctl start openshift-sdn-master":

systemctl status -l openshift-sdn-master
openshift-sdn-master.service - OpenShift SDN Master
   Loaded: loaded (/usr/lib/systemd/system/openshift-sdn-master.service; enabled)
   Active: inactive (dead) since Wed 2015-02-18 10:49:09 EST; 11s ago
     Docs: https://github.com/openshift/openshift-sdn
  Process: 2297 ExecStart=/usr/bin/openshift-sdn $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 2297 (code=exited, status=0/SUCCESS)

Feb 18 10:49:09 ose3-master.myredhat.com systemd[1]: Starting OpenShift SDN Master...
Feb 18 10:49:09 ose3-master.myredhat.com systemd[1]: Started OpenShift SDN Master.
Feb 18 10:49:09 ose3-master.myredhat.com openshift-sdn[2297]: I0218 10:49:09.854025 02297 main.go:108] Installing signal handlers
Feb 18 10:49:09 ose3-master.myredhat.com openshift-sdn[2297]: I0218 10:49:09.858536 02297 controller.go:47] Self IP : 192.168.122.20
Feb 18 10:49:09 ose3-master.myredhat.com openshift-sdn[2297]: E0218 10:49:09.859459 02297 controller.go:73] Error in initializing/fetching subnets - 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]

Do you have any idea what is wrong here?
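Since openshift-sdn-master talks to the etcd instance embedded in openshift-master, one quick check on the master host is whether openshift-master is actually up and whether anything is listening on the etcd client port at all. Port 4001 here is etcd's usual default and an assumption on my part:

# is the master process running?
systemctl is-active openshift-master

# is anything listening on the etcd client port?
ss -tlnp | grep 4001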
OK, I found out why. At first glance, it seems that my openshift-master.service was not started correctly. I am going to double-check now and get back to you.

Thanks for your time.
KR
/f
FYI, everything is now running correctly on my standalone server, which acts as both master and minion/node. I am going to check my other servers, which act as minion/node only.

[root@ose3-master ~]# systemctl | grep openshift
openshift-master.service      loaded active running   OpenShift Master
openshift-node.service        loaded active running   OpenShift Node
openshift-sdn-master.service  loaded active running   OpenShift SDN Master
openshift-sdn-node.service    loaded active running   OpenShift SDN Node

KR
/f
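As a side note, a compact way to re-check the same four units on a combined master/node host (just a convenience wrapper around systemctl is-active, nothing OpenShift-specific):

for s in openshift-master openshift-node openshift-sdn-master openshift-sdn-node; do
    printf '%-28s %s\n' "$s" "$(systemctl is-active $s)"
done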
Frederic,

If this was just after a reboot: yesterday we pushed out new packages that make service startup after a reboot better for openshift-master, though perhaps not perfect. Please update to openshift-0.3.0-0.git.147.4be9abc.el7ose, as it should help with that scenario.

I think this bug should still be considered on its own: openshift-sdn-node should restart more cleanly while attempting to connect to etcd, if possible.

-- Scott
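If it helps, once the updated repo is visible to the host, the newer build can usually be pulled in explicitly. The exact repo name is not shown in this report, so this is only a sketch:

yum clean all
yum update 'openshift*'
# or pin the exact build mentioned above:
yum install openshift-0.3.0-0.git.147.4be9abc.el7ose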
Hi Scott,

Here are the packages I am using:

openshift-0.3.0-0.git.146.c125b05.el7ose.x86_64
openshift-node-0.3.0-0.git.146.c125b05.el7ose.x86_64
openshift-sdn-0.4-1.git.0.4809789.el7ose.x86_64
openshift-sdn-node-0.4-1.git.0.4809789.el7ose.x86_64
openshift-master-0.3.0-0.git.146.c125b05.el7ose.x86_64
tuned-profiles-openshift-node-0.3.0-0.git.146.c125b05.el7ose.x86_64
openshift-sdn-master-0.4-1.git.0.4809789.el7ose.x86_64

I tried to update them but was told there were no packages marked for update.

Meanwhile, I have tested them on a standalone server acting as both master and node. That seems to work. But on my other servers, which act as node only, I still get the following message:

Could not find an allocated subnet for this minion (ose3-node1.myredhat.com)(501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]). Waiting..

On my master/node server I have the following services up and running:

openshift-master.service      loaded active running   OpenShift Master
openshift-node.service        loaded active running   OpenShift Node
openshift-sdn-master.service  loaded active running   OpenShift SDN Master
openshift-sdn-node.service    loaded active running   OpenShift SDN Node

On my node-only server I have the following services up and running:

openshift-node.service        loaded active running   OpenShift Node
openshift-sdn-node.service    loaded active running   OpenShift SDN Node

I am going to continue to investigate.

KR
/f
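One thing worth comparing between the working master/node host and a failing node-only host is the openshift-sdn-node sysconfig. The path /etc/sysconfig/openshift-sdn-node below is an assumption about where the EL7 package keeps it:

# show the effective settings, skipping comments;
# in particular, check that MASTER_URL points at the master's etcd client port
grep -v '^#' /etc/sysconfig/openshift-sdn-node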
OK, I think I found where the problem was. I had set the MASTER_URL value in the openshift-sdn-node configuration file based on the example provided inside that file. I initially did a copy/paste, and that was my mistake - see below:

# Example:
# MASTER_URL=https://10.0.0.1:4001
MASTER_URL=https://192.168.122.20:4001

The solution was to replace https with http, like this:

# Example:
# MASTER_URL=https://10.0.0.1:4001
MASTER_URL=http://192.168.122.20:4001

Now the logs look like this - which is normal:

Feb 18 11:45:06 ose3-node2.myredhat.com openshift-sdn[2622]: + grep -q '^OPTIONS='\''--insecure-registry=0.0.0.0/0 -b=lbr0 --mtu=1450 --selinux-enabled'\''' /etc/sysconfig/docker
Feb 18 11:45:06 ose3-node2.myredhat.com openshift-sdn[2622]: + cat
Feb 18 11:45:06 ose3-node2.myredhat.com openshift-sdn[2622]: + systemctl daemon-reload
Feb 18 11:45:06 ose3-node2.myredhat.com openshift-sdn[2622]: + systemctl restart docker.service
Feb 18 11:45:06 ose3-node2.myredhat.com openshift-sdn[2622]: I0218 11:45:06.922853 02622 controller.go:275] Output of adding table=0,cookie=0x32,priority=200,ip,in_port=9,nw_dst=10.1.0.0/24,actions=set_field:192.168.122....ut:10: (<nil>)
Feb 18 11:45:06 ose3-node2.myredhat.com openshift-sdn[2622]: I0218 11:45:06.939617 02622 controller.go:277] Output of adding table=0,cookie=0x32,priority=200,arp,in_port=9,nw_dst=10.1.0.0/24,actions=set_field:192.168.122...ut:10: (<nil>)
Feb 18 11:45:06 ose3-node2.myredhat.com openshift-sdn[2622]: I0218 11:45:06.943949 02622 controller.go:275] Output of adding table=0,cookie=0x97,priority=200,ip,in_port=9,nw_dst=10.1.1.0/24,actions=set_field:192.168.122....ut:10: (<nil>)
Feb 18 11:45:06 ose3-node2.myredhat.com openshift-sdn[2622]: I0218 11:45:06.958295 02622 controller.go:277] Output of adding table=0,cookie=0x97,priority=200,arp,in_port=9,nw_dst=10.1.1.0/24,actions=set_field:192.168.122...ut:10: (<nil>)
Feb 18 11:45:06 ose3-node2.myredhat.com openshift-sdn[2622]: I0218 11:45:06.962123 02622 controller.go:268] Output of adding table=0,cookie=0xd8,priority=200,ip,in_port=10,nw_dst=10.1.2.0/24,actions=output:9: (<nil>)
Feb 18 11:45:06 ose3-node2.myredhat.com openshift-sdn[2622]: I0218 11:45:06.965916 02622 controller.go:270] Output of adding table=0,cookie=0xd8,priority=200,arp,in_port=10,nw_dst=10.1.2.0/24,actions=output:9: (<nil>)

N.B. @Scott: you are right. When openshift-sdn-node hangs, the only way to restart it is to kill it first.

I am closing this ticket.

Kind Regards
Frederic
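For anyone hitting the same symptom: the https vs. http difference can be confirmed directly against the etcd client port. With the embedded etcd serving plain HTTP, a TLS request should fail while the plain one answers. The IP and port below are the ones from this report:

# expected to fail with a TLS/SSL error, since etcd is not serving TLS on 4001 here
curl -kv https://192.168.122.20:4001/version

# expected to succeed and return the etcd version
curl -v http://192.168.122.20:4001/version

# after fixing MASTER_URL, restart the SDN node service
systemctl restart openshift-sdn-node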