Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1457330

Summary: Liveness probe port for custom routers wrong after OpenShift upgrades
Product: OpenShift Container Platform
Reporter: Javier Ramirez <javier.ramirez>
Component: Cluster Version Operator
Assignee: Russell Teague <rteague>
Status: CLOSED CURRENTRELEASE
QA Contact: Anping Li <anli>
Severity: high
Docs Contact:
Priority: high
Version: 3.4.1
CC: aos-bugs, jokerman, mmccomas, sdodson, simon.gunzenreiner
Target Milestone: ---
Keywords: NeedsTestCase
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1462721 (view as bug list)
Environment:
Last Closed: 2017-06-19 12:21:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments (Description / Flags):
Deployment configs for routers before upgrade / none
Deployment configs for routers after upgrade / none
Ansible log for the upgrade / none
Custom ansible playbook for custom routers creation / none

Description Javier Ramirez 2017-05-31 14:11:46 UTC
Description of problem:

For customer egress connectivity, we create many custom routers with the 'oc adm router' command. Each of those routers uses specific ports for HTTP, HTTPS, stats, etc. The liveness probe port of a router is typically set to the stats port.

When upgrading, we found that the routers ended up in CrashLoopBackOff after the upgrade scripts patched the image version.

This is because the liveness probe port matched the statistics port before the upgrade; after the upgrade, the liveness probe port is set to 1936 for all routers.
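To illustrate the failure mode (this is a minimal Python sketch, not the actual openshift-ansible code; all function and variable names here are hypothetical): the upgrade effectively rebuilds the liveness probe with the default stats port 1936 instead of reusing the port already configured on the router's deployment config.

```python
# Hypothetical sketch of the probe-patching logic; illustrative only,
# not the real openshift-ansible implementation.

DEFAULT_STATS_PORT = 1936  # default router stats port

def build_liveness_patch(dc, preserve_existing=True):
    """Build a livenessProbe patch for a router deployment config (dict).

    With preserve_existing=False this reproduces the bug: the probe
    port is reset to 1936 regardless of the router's custom stats port.
    """
    container = dc["spec"]["template"]["spec"]["containers"][0]
    current = container.get("livenessProbe", {}).get("httpGet", {})
    if preserve_existing:
        port = current.get("port", DEFAULT_STATS_PORT)
    else:
        port = DEFAULT_STATS_PORT
    return {
        "spec": {"template": {"spec": {"containers": [{
            "name": container["name"],
            "livenessProbe": {
                "httpGet": {"path": "/healthz", "port": port,
                            "host": "localhost", "scheme": "HTTP"},
                "initialDelaySeconds": 10,
                "timeoutSeconds": 1,
            },
        }]}}},
    }

# A custom router created with a non-default stats port, e.g. 11936:
dc = {"spec": {"template": {"spec": {"containers": [{
    "name": "router",
    "livenessProbe": {"httpGet": {"port": 11936}},
}]}}}}

buggy = build_liveness_patch(dc, preserve_existing=False)
fixed = build_liveness_patch(dc, preserve_existing=True)
```

With preserve_existing=False the probe port comes out as 1936 while haproxy listens for stats on 11936, so the probe fails and the pod goes into CrashLoopBackOff.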

Version-Release number of selected component (if applicable):
3.4.1.18

How reproducible:
Always

Steps to Reproduce:
1. Create custom router with specific port for stats
2. Upgrade the environment
3. 

Actual results:

The routers ended up in CrashLoopBackOff after the upgrade scripts patched the image version. We then worked around this by deleting and recreating the routers.

Expected results:

Routers keep using their configured ports after the upgrade.

Additional info:
We normally worked around this by deleting and recreating the routers. During the last upgrade we investigated the cause thoroughly, and during the upgrade of a production environment last night we were also able to collect the deployment configs before and after the upgrade.

As you can see in the attachments, the liveness probe port was the same as the statistics port before the upgrade; after the upgrade, the liveness probe port is set to 1936 for all routers.

In the ansible log we can see when this happens:
->  "{\"spec\":{\"template\":{\"spec\":{\"containers\":[{\"name\":\"router\",\"image\":\"openshift3/ose-haproxy-router:v3.4.1.18\",\"livenessProbe\":{\"tcpSocket\":null,\"httpGet\":{\"path\": \"/healthz\", \"port\": 1936, \"host\": \"localhost\", \"scheme\": \"HTTP\"},\"initialDelaySeconds\":10,\"timeoutSeconds\":1}}]}}}}", 
        "--api-version=v1"
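Un-escaping the patch string from that log line confirms the probe port is hard-coded. A quick Python check (the JSON below is the same payload as in the log above, written as a normal string literal):

```python
import json

# The livenessProbe patch from the Ansible log, un-escaped.
patch = ('{"spec":{"template":{"spec":{"containers":[{"name":"router",'
         '"image":"openshift3/ose-haproxy-router:v3.4.1.18",'
         '"livenessProbe":{"tcpSocket":null,"httpGet":{"path": "/healthz",'
         ' "port": 1936, "host": "localhost", "scheme": "HTTP"},'
         '"initialDelaySeconds":10,"timeoutSeconds":1}}]}}}}')

probe = json.loads(patch)["spec"]["template"]["spec"]["containers"][0]["livenessProbe"]
print(probe["httpGet"]["port"])  # -> 1936, regardless of the router's stats port
```

The patch carries a fixed port of 1936 (and nulls out any tcpSocket probe), so any router whose stats port differs gets a probe that can never succeed.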

Comment 1 Javier Ramirez 2017-05-31 14:16:08 UTC
Created attachment 1283806 [details]
Deployment configs for routers before upgrade

Comment 2 Javier Ramirez 2017-05-31 14:16:48 UTC
Created attachment 1283807 [details]
Deployment configs for routers after upgrade

Comment 3 Javier Ramirez 2017-05-31 14:17:30 UTC
Created attachment 1283808 [details]
Ansible log for the upgrade

Comment 4 Javier Ramirez 2017-05-31 14:19:05 UTC
Created attachment 1283809 [details]
custom ansible playbook for custom routers creation

Comment 5 Russell Teague 2017-06-16 20:13:45 UTC
Scott,
This bug was fixed[0] in 3.6 through the work of migrating to the oc_* modules. It could potentially be backported to 3.5 since we have the modules there, but I've not investigated the extent of that effort. Since we don't have the modules in 3.4, it would be a significant effort to backport to 3.4. What is your recommendation on moving forward with this issue?


[0] https://github.com/openshift/openshift-ansible/pull/3897/files

Comment 6 Scott Dodson 2017-06-19 00:39:45 UTC
Let's go ahead and backport to 3.5.

Javier, it sounded like your customer was able to work around the problem. If we ensure that this is fixed in the playbooks to upgrade to 3.5 is that acceptable?

Comment 7 Javier Ramirez 2017-06-19 11:09:23 UTC
(In reply to Scott Dodson from comment #6)
> Let's go ahead and backport to 3.5.
> 
> Javier, it sounded like your customer was able to work around the problem.
> If we ensure that this is fixed in the playbooks to upgrade to 3.5 is that
> acceptable?

Yes, that sounds good for the customer. If you know the bz number for the 3.5 backport, please let me know.

Comment 8 Russell Teague 2017-06-19 12:18:34 UTC
3.5 backport is https://bugzilla.redhat.com/show_bug.cgi?id=1462721