Bug 1462721 - [3.5] Liveness Probe Port for custom routers wrong after OpenShift upgrades
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Upgrade
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.5.z
Assigned To: Russell Teague
QA Contact: Anping Li
Depends On:
Blocks:
Reported: 2017-06-19 08:16 EDT by Russell Teague
Modified: 2017-06-29 09:33 EDT
CC: 8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The use of oc patch to update router images was resetting additional config items to their defaults, even if they were configured differently in the environment. The tasks were converted to use Ansible modules, which are much more precise and change only the provided parameters.
Story Points: ---
Clone Of: 1457330
Environment:
Last Closed: 2017-06-29 09:33:14 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker ID: Red Hat Product Errata RHBA-2017:1666
Priority: normal
Status: SHIPPED_LIVE
Summary: OpenShift Container Platform atomic-openshift-utils bug fix and enhancement
Last Updated: 2017-06-29 13:32:39 EDT

Description Russell Teague 2017-06-19 08:16:25 EDT
+++ This bug was initially created as a clone of Bug #1457330 +++

Description of problem:

For customer egress connectivity, we create many custom routers with the 'oc adm router' command. Each of those routers uses specific ports for HTTP, HTTPS, stats, and so on. The liveness probe port of a router is typically set to the stats port.
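
For illustration, a minimal sketch of such a router creation; the router name and port numbers here are hypothetical, not taken from this environment:

    # Hypothetical example: create a custom router whose stats port (and
    # therefore its liveness probe port) is 1937 instead of the default 1936.
    oc adm router router-custom \
        --ports='10080:10080,10443:10443' \
        --stats-port=1937 \
        --replicas=1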

When upgrading, we found that the routers ended up in CrashLoopBackOff after the upgrade scripts patched the image version.

This is because the liveness probe port was the same as the statistics port before the upgrade; after the upgrade, the liveness probe port is reset to 1936 for all routers.

Version-Release number of selected component (if applicable):
3.4.1.18

How reproducible:
Always

Steps to Reproduce:
1. Create custom router with specific port for stats
2. Upgrade the environment
3. Check the router pods and the liveness probe port in the router deployment config

Actual results:

The routers ended up in CrashLoopBackOff after the upgrade scripts patched the image version. As a workaround, we normally deleted the routers and recreated them.
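
For reference, a sketch of that delete-and-recreate workaround; the router name and flags are hypothetical:

    # Hypothetical workaround: remove the crash-looping router's objects,
    # then recreate the router with its original flags.
    oc delete dc/router-custom svc/router-custom
    oc adm router router-custom --ports='10080:10080,10443:10443' --stats-port=1937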

Expected results:

The routers should keep using their configured ports.

Additional info:
During the last upgrade we investigated in detail what causes the issue, and during an upgrade of a production environment last night we were also able to collect the deployment configs before and after the upgrade.

As you can see in the attachments, the liveness probe port was the same as the statistics port before the upgrade; after the upgrade, the liveness probe port is set to 1936 for all routers.

In the Ansible log we can see when this happens:
->  "{\"spec\":{\"template\":{\"spec\":{\"containers\":[{\"name\":\"router\",\"image\":\"openshift3/ose-haproxy-router:v3.4.1.18\",\"livenessProbe\":{\"tcpSocket\":null,\"httpGet\":{\"path\": \"/healthz\", \"port\": 1936, \"host\": \"localhost\", \"scheme\": \"HTTP\"},\"initialDelaySeconds\":10,\"timeoutSeconds\":1}}]}}}}", 
        "--api-version=v1"

--- Additional comment from Russell Teague on 2017-06-16 16:13:45 EDT ---

Scott,
This bug was fixed[0] in 3.6 through the work for migrating to oc_* modules.  This could potentially be backported to 3.5 since we have the modules, but I've not investigated the extent of that effort.  Since we don't have the modules in 3.4 it would be a significant effort to backport to 3.4. What is your recommendation on moving forward with this issue?


[0] https://github.com/openshift/openshift-ansible/pull/3897/files
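
Conceptually, the module-based fix changes only the image field instead of patching the whole container spec. As an analogy only (the actual fix uses the oc_* Ansible modules in the PR above, not this command):

    # Analogy: update just the router image, leaving livenessProbe and the
    # other container fields untouched.
    oc set image dc/router router=openshift3/ose-haproxy-router:v3.4.1.18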

--- Additional comment from Scott Dodson on 2017-06-18 20:39:45 EDT ---

Let's go ahead and backport to 3.5.

Javier, it sounded like your customer was able to work around the problem. If we ensure that this is fixed in the playbooks to upgrade to 3.5 is that acceptable?

--- Additional comment from Javier Ramirez on 2017-06-19 07:09:23 EDT ---

(In reply to Scott Dodson from comment #6)
> Let's go ahead and backport to 3.5.
> 
> Javier, it sounded like your customer was able to work around the problem.
> If we ensure that this is fixed in the playbooks to upgrade to 3.5 is that
> acceptable?

Yes, that sounds good for the customer. If you know the bz number for the 3.5 backport, please let me know.
Comment 1 Russell Teague 2017-06-19 08:50:50 EDT
Proposed: https://github.com/openshift/openshift-ansible/pull/4493
Comment 2 Russell Teague 2017-06-20 15:04:14 EDT
Merged: https://github.com/openshift/openshift-ansible/pull/4493
Comment 4 Anping Li 2017-06-28 05:35:24 EDT
Verified with openshift-ansible-3.5.89.

1. Create a router with stats-port=1937:
oadm router --stats-port=1937
2. After the upgrade, the port is still 1937:
[cloud-user@container--1 ~]$ oc get dc router -o json|grep -B 1 1937
                                "name": "STATS_PORT",
                                "value": "1937"
--
                                "path": "/healthz",
                                "port": 1937,
--
                            {
                                "containerPort": 1937,
                                "hostPort": 1937,
--
                                "path": "/healthz",
                                "port": 1937,
Comment 6 errata-xmlrpc 2017-06-29 09:33:14 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1666
