Bug 1392042 - Router pod failed after updating both clusterNetworkCIDR and serviceNetworkCIDR
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Dan Winship
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-11-04 15:58 UTC by Weibin Liang
Modified: 2016-11-07 12:42 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-07 12:42:59 UTC
Target Upstream Version:


Attachments
Test logs from both v3.4.0.21 and v3.3.1.4 (6.82 KB, text/plain)
2016-11-04 15:58 UTC, Weibin Liang

Description Weibin Liang 2016-11-04 15:58:04 UTC
Created attachment 1217444
Test logs from both v3.4.0.21 and v3.3.1.4

Description of problem:
The router pod failed after updating both clusterNetworkCIDR and serviceNetworkCIDR, with the error below:
[root@dhcp-41-92 master]# oc logs router1-1-deploy
error: couldn't get deployment router1-1: Get
https://10.121.224.1:443/api/v1/namespaces/https/replicationcontrollers/router1-1: x509: certificate is valid for
10.18.41.92, 172.30.0.1, not 10.121.224.1

Version-Release number of selected component (if applicable):
oc v3.4.0.21+ca4702d
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

How reproducible:
Reproducible


Steps to Reproduce:
[root@dhcp-41-92 master]# oc new-project https
Now using project "https" on server "https://dhcp-41-92.bos.redhat.com:8443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

to build a new example application in Ruby.
[root@dhcp-41-92 master]# oadm policy add-scc-to-user privileged -z https-user
[root@dhcp-41-92 master]# oadm router router1 --replicas=1  --service-account=https-user -n https --host-network=true
info: password for stats user admin has been set to 00D99QAAxy
--> Creating router router1 ...
    serviceaccount "https-user" created
    warning: clusterrolebinding "router-router1-role" already exists
    deploymentconfig "router1" created
    service "router1" created
--> Success
[root@dhcp-41-92 master]# oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/OSE3.3/one_nosecurity_route.json
route "route-1" created
service "endpoints" created
pod "endpoint-1" created
[root@dhcp-41-92 master]# oc get route
NAME      HOST/PORT             PATH      SERVICES    PORT      TERMINATION
route-1   hello-openshift.com             endpoints   <all>     
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
endpoint-1         0/1       ContainerCreating   0          6s
router1-1-deploy   0/1       ContainerCreating   0          6s
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
endpoint-1         0/1       ContainerCreating   0          9s
router1-1-deploy   0/1       ContainerCreating   0          9s
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
endpoint-1         0/1       ContainerCreating   0          10s
router1-1-deploy   0/1       ContainerCreating   0          10s
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
endpoint-1         0/1       ContainerCreating   0          11s
router1-1-deploy   0/1       ContainerCreating   0          11s
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
endpoint-1         0/1       ContainerCreating   0          21s
router1-1-deploy   0/1       ContainerCreating   0          21s
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS    RESTARTS   AGE
endpoint-1         1/1       Running   0          22s
router1-1-deploy   0/1       Error     0          22s
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS    RESTARTS   AGE
endpoint-1         1/1       Running   0          24s
router1-1-deploy   0/1       Error     0          24s
[root@dhcp-41-92 master]# oc logs router1-1-deploy
error: couldn't get deployment router1-1: Get
https://10.121.224.1:443/api/v1/namespaces/https/replicationcontrollers/router1-1: x509: certificate is valid for
10.18.41.92, 172.30.0.1, not 10.121.224.1


Actual results:
router1-1-deploy does not come up; the pod ends in the Error state.

Expected results:
router1-1-deploy should be in the Running state.

Additional info:
The same test passes on v3.3.1.4.
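
Editorial sketch: the x509 error is consistent with the API service address having moved along with the service network. The `kubernetes` service normally takes the first usable address of the service CIDR, while the certificate still lists only 172.30.0.1, the first address of the default 172.30.0.0/16. The Python below illustrates that check. The new CIDR `10.121.224.0/20` is an assumption (the report shows only the resulting address 10.121.224.1, not the prefix length), and the helper names `api_service_ip` and `cert_covers` are hypothetical.

```python
import ipaddress

def api_service_ip(service_cidr):
    """Return the first usable address of the service network."""
    network = ipaddress.ip_network(service_cidr)
    return network.network_address + 1

def cert_covers(ip, cert_ip_sans):
    """True if ip is one of the IP addresses the certificate is valid for."""
    allowed = {ipaddress.ip_address(san) for san in cert_ip_sans}
    return ipaddress.ip_address(str(ip)) in allowed

# IP addresses the certificate is valid for, taken from the x509 error message.
CERT_IP_SANS = ["10.18.41.92", "172.30.0.1"]

old_ip = api_service_ip("172.30.0.0/16")    # default service network
new_ip = api_service_ip("10.121.224.0/20")  # assumed new service network

print(old_ip, cert_covers(old_ip, CERT_IP_SANS))  # 172.30.0.1 True
print(new_ip, cert_covers(new_ip, CERT_IP_SANS))  # 10.121.224.1 False
```

Under these assumptions, any client that reaches the API through the new service address will reject the certificate until it is reissued with that address included.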

Comment 1 Ben Bennett 2016-11-04 16:43:33 UTC
I'll have to see what cert it is complaining about.

But one major difference between 3.3 and 3.4 is that we turn on extended validation of certs by default when you run 'oadm router'.  It was present in 3.3 but defaulted to off.

If that's the problem, then we need to decide whether that is something we want to allow too, or whether we want it to really be caught because it breaks the router. If the former, we need to change the code; if the latter, we need to update the migration docs.

But that's all just hand-waving until I analyze it a bit more.
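
Editorial sketch: as a first step toward seeing which certificate is involved, the names it covers can be read straight out of the error text. The Python below is a minimal, hedged example; the function name `parse_x509_mismatch` is hypothetical, and the regular expression assumes the "certificate is valid for ..., not ..." wording seen in this report.

```python
import re

def parse_x509_mismatch(message):
    """Return (valid_names, rejected_name) from the x509 error text, or None."""
    # Collapse line wraps so the pattern can match across them.
    flat = " ".join(message.split())
    match = re.search(r"x509: certificate is valid for (.+), not (\S+)", flat)
    if match is None:
        return None
    valid = [name.strip() for name in match.group(1).split(",")]
    return valid, match.group(2)

# Error text from `oc logs router1-1-deploy`, wrapped as in the report.
log = """error: couldn't get deployment router1-1: Get
https://10.121.224.1:443/api/v1/namespaces/https/replicationcontrollers/router1-1: x509: certificate is valid for
10.18.41.92, 172.30.0.1, not 10.121.224.1"""

valid, rejected = parse_x509_mismatch(log)
print("certificate covers:", valid)
print("address requested:", rejected)
```

Here the certificate covers 10.18.41.92 and 172.30.0.1, and the rejected address is 10.121.224.1, the host in the request URL.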

Comment 2 Dan Winship 2016-11-07 12:42:59 UTC
I'm going to close this bug to avoid further confusion; this is documenting a problem with the instructions that appear in https://github.com/openshift/openshift-docs/pull/3112. It is not a bug in any released or unreleased version of OpenShift itself.

