Bug 1263136

Summary: [networking_124] F5 router cannot run
Product: OKD
Reporter: zhaozhanqi <zzhao>
Component: Routing
Assignee: Rajat Chopra <rchopra>
Status: CLOSED WORKSFORME
QA Contact: zhaozhanqi <zzhao>
Severity: high
Docs Contact:
Priority: high
Version: 3.x
CC: aos-bugs, mmasters
Target Milestone: ---
Target Release: ---
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-10-20 06:37:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 2 Miciah Dashiel Butler Masters 2015-09-15 13:34:22 UTC
I would expect there to be a "kubernetes" service, created by the master, which would explain why the "openshift_default_kubernetes" pool exists.  There should be no route defined for the "kubernetes" service, so the F5 route synchronizer should not configure F5 BIG-IP to use the "openshift_default_kubernetes" pool for anything.

Similarly, I have seen a "router" service, so I would expect an "openshift_default_router" pool to exist.  It is not clear why that pool exists in some cases but not in yours, but like the "openshift_default_kubernetes" pool, it should not be used for anything anyway.

More important is what other pools exist.  If you have created services and routes for which the F5 route synchronizer is failing to create corresponding pools and policy rules, can you provide details on those routes and services, and the corresponding log output (if any) from the router? (`oc get routes -o json` and `oc get services -o json` for the former.)
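
For reference, gathering that information might look something like the following (assuming the router runs in the "default" project; the pod name is only a placeholder):

    # routes and services the F5 route synchronizer should be reconciling
    oc get routes -o json
    oc get services -o json
    # log output, if any, from the F5 router pod
    oc logs <f5-router-pod> -n default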

Comment 3 zhaozhanqi 2015-09-16 03:12:39 UTC
I can see the other services in the F5 pool list. openshift_default_router is also there, but it is constantly deleted and recreated because the router pod is unhealthy and keeps being recreated.

I0916 03:01:45.399543     808 manager.go:1492] pod "router-1-0ljxl_default" container "router" is unhealthy (probe result: failure), it will be killed and re-created.
I0916 03:02:05.420802     808 manager.go:1492] pod "router-1-0ljxl_default" container "router" is unhealthy (probe result: failure), it will be killed and re-created.
I0916 03:02:15.429068     808 manager.go:1492] pod "router-1-0ljxl_default" container "router" is unhealthy (probe result: failure), it will be killed and re-created.
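
(For reference, the probe configuration, recent events, and the router's own output could be inspected with something like the following, assuming the "default" project and taking the pod name from the log above:)

    # show the liveness/readiness probe settings and restart count
    oc describe pod router-1-0ljxl -n default
    # recent events, including probe failures and restarts
    oc get events -n default
    # logs from the previously killed router container
    oc logs router-1-0ljxl -n default --previous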


Here are some logs from journalctl; I hope they help you analyse the root cause.

Sep 16 03:08:42 ip-172-18-10-71 systemd-udevd[247]: error: /dev/dm-4: No such device or address
Sep 16 03:08:42 ip-172-18-10-71 docker[679]: time="2015-09-16T03:08:42.601160080Z" level=info msg="DELETE /containers/c148af20f47886bf82886e6550d16b39f9d332f23d4b790fbadcd403b54a2ac2?v=1"
Sep 16 03:08:42 ip-172-18-10-71 kernel: EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)
Sep 16 03:08:42 ip-172-18-10-71 kernel: SELinux: initialized (dev dm-4, type ext4), uses xattr
Sep 16 03:08:43 ip-172-18-10-71 docker[679]: time="2015-09-16T03:08:43.488763474Z" level=info msg="GET /version"
Sep 16 03:08:47 ip-172-18-10-71 systemd-udevd[247]: error: /dev/dm-5: No such device or address
Sep 16 03:08:48 ip-172-18-10-71 docker[679]: time="2015-09-16T03:08:48.491186801Z" level=info msg="GET /version"
Sep 16 03:08:48 ip-172-18-10-71 docker[679]: time="2015-09-16T03:08:48.868613664Z" level=info msg="GET /containers/json"
Sep 16 03:08:48 ip-172-18-10-71 docker[679]: time="2015-09-16T03:08:48.871177953Z" level=info msg="GET /containers/json"
Sep 16 03:08:48 ip-172-18-10-71 docker[679]: time="2015-09-16T03:08:48.873113538Z" level=info msg="GET /containers/json?all=1"
Sep 16 03:08:48 ip-172-18-10-71 docker[679]: time="2015-09-16T03:08:48.875927957Z" level=info msg="GET /containers/60e2fdb1edca7249b667e7fa0d79cf77acaf4b01ccdeebc5a8d16f9d28547862/json"
Sep 16 03:08:48 ip-172-18-10-71 docker[679]: time="2015-09-16T03:08:48.877493151Z" level=info msg="GET /containers/6700e6912c5b13b4f0b9a87c41fc32416d81b36821657fefac20da127dee517e/json"
Sep 16 03:08:48 ip-172-18-10-71 docker[679]: time="2015-09-16T03:08:48.881071068Z" level=info msg="GET /containers/6700e6912c5b13b4f0b9a87c41fc32416d81b36821657fefac20da127dee517e/json"
Sep 16 03:08:48 ip-172-18-10-71 docker[679]: time="2015-09-16T03:08:48.883292458Z" level=info msg="GET /containers/json?all=1"
Sep 16 03:08:48 ip-172-18-10-71 docker[679]: time="2015-09-16T03:08:48.885953010Z" level=info msg="GET /containers/60e2fdb1edca7249b667e7fa0d79cf77acaf4b01ccdeebc5a8d16f9d28547862/json"
Sep 16 03:08:48 ip-172-18-10-71 docker[679]: time="2015-09-16T03:08:48.887532882Z" level=info msg="GET /containers/6700e6912c5b13b4f0b9a87c41fc32416d81b36821657fefac20da127dee517e/json"
Sep 16 03:08:48 ip-172-18-10-71 docker[679]: time="2015-09-16T03:08:48.971541160Z" level=info msg="GET /containers/json"
Sep 16 03:08:49 ip-172-18-10-71 docker[679]: time="2015-09-16T03:08:49.074441202Z" level=info msg="GET /containers/json"
Sep 16 03:08:49 ip-172-18-10-71 docker[679]: time="2015-09-16T03:08:49.177522263Z" level=info msg="GET /containers/json"
Sep 16 03:08:49 ip-172-18-10-71 docker[679]: time="2015-09-16T03:08:49.280566313Z" level=info msg="GET /containers/json"

Comment 4 zhaozhanqi 2015-10-20 06:37:32 UTC
The F5 router can run now; in any case, this issue has been fixed.
Marking this issue as 'verified'.