Bug 1392042

Summary: Router pod failed after updating both clusterNetworkCIDR and serviceNetworkCIDR
Product: OpenShift Container Platform Reporter: Weibin Liang <weliang>
Component: Networking Assignee: Dan Winship <danw>
Status: CLOSED NOTABUG QA Contact: Meng Bo <bmeng>
Severity: high Docs Contact:
Priority: high    
Version: 3.4.0 CC: aloughla, aos-bugs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2016-11-07 12:42:59 UTC Type: Bug
Attachments:
Description Flags
Test logs from both v3.4.0.21 and v3.3.1.4 none

Description Weibin Liang 2016-11-04 15:58:04 UTC
Created attachment 1217444
Test logs from both v3.4.0.21 and v3.3.1.4

Description of problem:
The router pod failed after updating both clusterNetworkCIDR and serviceNetworkCIDR, with the following error:
[root@dhcp-41-92 master]# oc logs router1-1-deploy
error: couldn't get deployment router1-1: Get
https://10.121.224.1:443/api/v1/namespaces/https/replicationcontrollers/router1-1: x509: certificate is valid for
10.18.41.92, 172.30.0.1, not 10.121.224.1

Version-Release number of selected component (if applicable):
oc v3.4.0.21+ca4702d
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

How reproducible: 
Reproducible


Steps to Reproduce:
[root@dhcp-41-92 master]# oc new-project https
Now using project "https" on server "https://dhcp-41-92.bos.redhat.com:8443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

to build a new example application in Ruby.
[root@dhcp-41-92 master]# oadm policy add-scc-to-user privileged -z https-user
[root@dhcp-41-92 master]# oadm router router1 --replicas=1  --service-account=https-user -n https --host-network=true
info: password for stats user admin has been set to 00D99QAAxy
--> Creating router router1 ...
    serviceaccount "https-user" created
    warning: clusterrolebinding "router-router1-role" already exists
    deploymentconfig "router1" created
    service "router1" created
--> Success
[root@dhcp-41-92 master]# oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/OSE3.3/one_nosecurity_route.json
route "route-1" created
service "endpoints" created
pod "endpoint-1" created
[root@dhcp-41-92 master]# oc get route
NAME      HOST/PORT             PATH      SERVICES    PORT      TERMINATION
route-1   hello-openshift.com             endpoints   <all>     
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
endpoint-1         0/1       ContainerCreating   0          6s
router1-1-deploy   0/1       ContainerCreating   0          6s
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
endpoint-1         0/1       ContainerCreating   0          9s
router1-1-deploy   0/1       ContainerCreating   0          9s
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
endpoint-1         0/1       ContainerCreating   0          10s
router1-1-deploy   0/1       ContainerCreating   0          10s
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
endpoint-1         0/1       ContainerCreating   0          11s
router1-1-deploy   0/1       ContainerCreating   0          11s
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
endpoint-1         0/1       ContainerCreating   0          21s
router1-1-deploy   0/1       ContainerCreating   0          21s
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS    RESTARTS   AGE
endpoint-1         1/1       Running   0          22s
router1-1-deploy   0/1       Error     0          22s
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS    RESTARTS   AGE
endpoint-1         1/1       Running   0          24s
router1-1-deploy   0/1       Error     0          24s
[root@dhcp-41-92 master]# oc logs router1-1-deploy
error: couldn't get deployment router1-1: Get
https://10.121.224.1:443/api/v1/namespaces/https/replicationcontrollers/router1-1: x509: certificate is valid for
10.18.41.92, 172.30.0.1, not 10.121.224.1
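
The x509 error above is consistent with the API server certificate still listing the old service IP after the service network was changed. The following is a minimal sketch of that check, not part of the original report. It relies on the Kubernetes convention that the API service gets the first host address of the service network. The old CIDR 172.30.0.0/16 is an assumption inferred from the 172.30.0.1 entry in the certificate, and the new CIDR 10.121.224.0/20 is hypothetical, since the report gives only the resulting address 10.121.224.1 and not the prefix length. The helper names are illustrative.

```python
import ipaddress
from typing import Iterable


def api_service_ip(service_cidr: str) -> str:
    """Return the first host address of the service network.

    Kubernetes assigns this address to the built-in API service.
    """
    network = ipaddress.ip_network(service_cidr)
    return str(network.network_address + 1)


def cert_covers(ip: str, cert_ip_sans: Iterable[str]) -> bool:
    """Return True if ip matches one of the certificate's IP entries."""
    target = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(san) == target for san in cert_ip_sans)


# IP addresses the certificate is valid for, copied from the error message.
cert_ip_sans = ["10.18.41.92", "172.30.0.1"]

# Assumed original service network (inferred from the certificate entry).
old_ip = api_service_ip("172.30.0.0/16")
# Hypothetical new service network; the prefix length is not in the report.
new_ip = api_service_ip("10.121.224.0/20")

print(old_ip, cert_covers(old_ip, cert_ip_sans))
print(new_ip, cert_covers(new_ip, cert_ip_sans))
```

Under these assumptions the old service IP is covered by the certificate and the new one is not, which matches the address named in the error.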


Actual results:
router1-1-deploy does not come up; it ends in the Error state.

Expected results:
router1-1-deploy should reach the Running state.

Additional info:
The same test passed in v3.3.1.4.

Comment 1 Ben Bennett 2016-11-04 16:43:33 UTC
I'll have to see what cert it is complaining about.

But one major difference between 3.3 and 3.4 is that we turn on extended validation of certs by default when you run 'oadm router'.  It was present in 3.3 but defaulted to off.

If that's the problem, then we need to decide whether this is something we want to allow too, or whether we really want it to be caught because it breaks the router. If the former, we need to change the code; if the latter, we need to update the migration docs.

But that's all just hand-waving until I analyze it a bit more.

Comment 2 Dan Winship 2016-11-07 12:42:59 UTC
I'm going to close this bug to avoid further confusion; this is documenting a problem with the instructions that appear in https://github.com/openshift/openshift-docs/pull/3112. It is not a bug in any released or unreleased version of OpenShift itself.