Created attachment 1217444
Test logs from both v3.4.0.21 and v3.3.1.4

Description of problem:
The router deployer pod failed after updating both clusterNetworkCIDR and serviceNetworkCIDR, with the error below:

[root@dhcp-41-92 master]# oc logs router1-1-deploy
error: couldn't get deployment router1-1: Get https://10.121.224.1:443/api/v1/namespaces/https/replicationcontrollers/router1-1: x509: certificate is valid for 10.18.41.92, 172.30.0.1, not 10.121.224.1

Version-Release number of selected component (if applicable):
oc v3.4.0.21+ca4702d
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

How reproducible:
Reproducible

Steps to Reproduce:
[root@dhcp-41-92 master]# oc new-project https
Now using project "https" on server "https://dhcp-41-92.bos.redhat.com:8443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

to build a new example application in Ruby.
[root@dhcp-41-92 master]# oadm policy add-scc-to-user privileged -z https-user
[root@dhcp-41-92 master]# oadm router router1 --replicas=1 --service-account=https-user -n https --host-network=true
info: password for stats user admin has been set to 00D99QAAxy
--> Creating router router1 ...
    serviceaccount "https-user" created
    warning: clusterrolebinding "router-router1-role" already exists
    deploymentconfig "router1" created
    service "router1" created
--> Success
[root@dhcp-41-92 master]# oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/OSE3.3/one_nosecurity_route.json
route "route-1" created
service "endpoints" created
pod "endpoint-1" created
[root@dhcp-41-92 master]# oc get route
NAME      HOST/PORT             PATH      SERVICES    PORT      TERMINATION
route-1   hello-openshift.com             endpoints   <all>
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
endpoint-1         0/1       ContainerCreating   0          6s
router1-1-deploy   0/1       ContainerCreating   0          6s
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
endpoint-1         0/1       ContainerCreating   0          9s
router1-1-deploy   0/1       ContainerCreating   0          9s
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
endpoint-1         0/1       ContainerCreating   0          10s
router1-1-deploy   0/1       ContainerCreating   0          10s
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
endpoint-1         0/1       ContainerCreating   0          11s
router1-1-deploy   0/1       ContainerCreating   0          11s
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
endpoint-1         0/1       ContainerCreating   0          21s
router1-1-deploy   0/1       ContainerCreating   0          21s
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS    RESTARTS   AGE
endpoint-1         1/1       Running   0          22s
router1-1-deploy   0/1       Error     0          22s
[root@dhcp-41-92 master]# oc get pods
NAME               READY     STATUS    RESTARTS   AGE
endpoint-1         1/1       Running   0          24s
router1-1-deploy   0/1       Error     0          24s
[root@dhcp-41-92 master]# oc logs router1-1-deploy
error: couldn't get deployment router1-1: Get https://10.121.224.1:443/api/v1/namespaces/https/replicationcontrollers/router1-1: x509: certificate is valid for 10.18.41.92, 172.30.0.1, not 10.121.224.1

Actual results:
router1-1-deploy ends in the Error state and the router never comes up.

Expected results:
router1-1-deploy should reach the Running state.

Additional info:
The same test passed in v3.3.1.4.
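
A note on reading the x509 error: the deployer pod reaches the API through the `kubernetes` service, which is normally assigned the first address of the service network. The certificate in the error lists 172.30.0.1 (the first address of the default 172.30.0.0/16 service network) but not 10.121.224.1. The sketch below, in Python, only illustrates that comparison. The new service CIDR `10.121.224.0/24` is an assumption, since the report does not state the prefix length, and the helper names are hypothetical, not part of any OpenShift tooling.

```python
import ipaddress
from typing import Iterable


def first_service_ip(service_cidr: str) -> str:
    """Return the first address after the network address of a CIDR.

    This is the address the `kubernetes` service is normally given.
    """
    net = ipaddress.ip_network(service_cidr)
    return str(net.network_address + 1)


def cert_covers(ip: str, cert_ip_sans: Iterable[str]) -> bool:
    """Check whether an IP appears among a certificate's IP entries."""
    target = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(san) == target for san in cert_ip_sans)


# IP entries taken from the error message in this report.
CERT_IP_SANS = ["10.18.41.92", "172.30.0.1"]

if __name__ == "__main__":
    # 172.30.0.0/16 is the default service network; 10.121.224.0/24 is an
    # assumed value for the updated one (prefix length not given in the report).
    for cidr in ("172.30.0.0/16", "10.121.224.0/24"):
        ip = first_service_ip(cidr)
        status = "covered" if cert_covers(ip, CERT_IP_SANS) else "NOT covered"
        print(f"{cidr}: kubernetes service IP {ip} is {status} by the certificate")
```

Under these assumptions, the old service IP is covered by the certificate and the new one is not, which matches the error text.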
I'll have to see what cert it is complaining about. But one major difference between 3.3 and 3.4 is that we turn on extended validation of certs by default when you run 'oadm router'. It was present in 3.3 but defaulted to off. If that's the problem, then we need to decide if that is something we want to allow too, or if we want it to really be caught because it breaks the router. If the former, we need to change the code, if the latter we need to update the migration docs. But that's all just hand-waving until I analyze it a bit more.
I'm going to close this bug to avoid further confusion; this is documenting a problem with the instructions that appear in https://github.com/openshift/openshift-docs/pull/3112. It is not a bug in any released or unreleased version of OpenShift itself.