Description of problem:
The image 'openshift3/ose' has been renamed to 'openshift3/ose-control-plane', so `oc diagnostics networkcheck` should be updated as well. Otherwise the diagnostic pod cannot run.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Run `oc diagnostics networkcheck`
2. Run `oc get pod --all-namespaces`
The events of the failed pod show that it is using the 'openshift3/ose' image:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulMountVolume 56s kubelet, ip-172-18-14-237.ec2.internal MountVolume.SetUp succeeded for volume "host-root-dir"
Normal SuccessfulMountVolume 56s kubelet, ip-172-18-14-237.ec2.internal MountVolume.SetUp succeeded for volume "default-token-6f768"
Normal SuccessfulMountVolume 56s kubelet, ip-172-18-14-237.ec2.internal MountVolume.SetUp succeeded for volume "kconfig-secret"
Normal BackOff 25s (x2 over 52s) kubelet, ip-172-18-14-237.ec2.internal Back-off pulling image "openshift3/ose:v3.10.0-0.29.0"
Warning Failed 25s (x2 over 52s) kubelet, ip-172-18-14-237.ec2.internal Error: ImagePullBackOff
Normal Pulling 11s (x3 over 55s) kubelet, ip-172-18-14-237.ec2.internal pulling image "openshift3/ose:v3.10.0-0.29.0"
Warning Failed 9s (x3 over 54s) kubelet, ip-172-18-14-237.ec2.internal Failed to pull image "openshift3/ose:v3.10.0-0.29.0": rpc error: code = Unknown desc = repository docker.io/openshift3/ose not found: does not exist or no pull access
Warning Failed 9s (x3 over 54s) kubelet, ip-172-18-14-237.ec2.internal Error: ErrImagePull
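To confirm which image a failing pod is trying to pull, the event text above can be filtered mechanically. The following is a minimal sketch: `failed_images` is a hypothetical helper name, and it assumes the events contain the `Failed to pull image "..."` message in the form shown in this report.

```shell
#!/bin/sh
# failed_images (hypothetical helper): read pod event text on stdin and
# print each distinct image named in a 'Failed to pull image "..."' message.
failed_images() {
    sed -n 's/.*Failed to pull image "\([^"]*\)".*/\1/p' | sort -u
}

# Demo using one event line copied from this report.
printf '%s\n' 'Warning Failed 9s (x3 over 54s) kubelet, ip-172-18-14-237.ec2.internal Failed to pull image "openshift3/ose:v3.10.0-0.29.0": rpc error: code = Unknown desc = repository docker.io/openshift3/ose not found: does not exist or no pull access' | failed_images

# Against a live cluster it would be fed from the pod description, e.g.:
#   oc describe pod <pod-name> -n <namespace> | failed_images
```

The pod name and namespace in the last comment are placeholders to fill in from the `oc get pod --all-namespaces` output.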
Expected results: no such error occurs and the pod runs.
There are two issues with this bug:
(1) Network diagnostics incorrectly detected the OpenShift environment as containerized. This is a consequence of the new way OpenShift is configured and run in 3.10, where it uses the node image to run SDN, etc. Containerized mode is no longer supported in 3.10, so the containerized-specific code should be removed from network diagnostics.
(2) The network diagnostic test pod uses a relative image path, which Docker does not resolve correctly. We need to add the appropriate registries to the Docker config depending on the environment (AWS, OpenStack, etc.).
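To illustrate issue (2), here is a rough sketch of how an unqualified image name ends up pointing at docker.io, which matches the "repository docker.io/openshift3/ose not found" error in the events. `qualify_image` is a hypothetical helper that only mimics Docker's default name normalization (a first path component containing `.` or `:`, or equal to `localhost`, is treated as a registry host). A daemon configured with additional registries may try those as well; that is not modeled here.

```shell
#!/bin/sh
# qualify_image (hypothetical helper): print the fully qualified form of an
# image reference under Docker's default normalization rules.
qualify_image() {
    ref=$1
    first=${ref%%/*}    # text before the first '/', or the whole ref if none
    case "$ref" in
        */*)
            case "$first" in
                *.*|*:*|localhost)
                    # First component looks like a registry host: keep as-is.
                    printf '%s\n' "$ref"
                    return 0
                    ;;
            esac
            # No registry host given: Docker falls back to docker.io.
            printf 'docker.io/%s\n' "$ref"
            ;;
        *)
            # Single-component name: docker.io plus the "library" namespace.
            printf 'docker.io/library/%s\n' "$ref"
            ;;
    esac
}

qualify_image "openshift3/ose:v3.10.0-0.29.0"
qualify_image "registry.reg-aws.openshift.com:443/openshift3/ose-control-plane:v3.10"
```

The first call prints a docker.io path, while the second is left unchanged because it already names a registry host.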
Issue (1) in comment#1 is fixed by https://github.com/openshift/origin/pull/19754
Ravi: Can you open a new bug for (2) please?
https://github.com/openshift/origin/pull/19754 has merged.
Created https://bugzilla.redhat.com/show_bug.cgi?id=1584494 for issue (2)
Issue (2) will be fixed by https://github.com/openshift/origin/pull/19901
When testing against the AWS internal registry, the --pod-image, --test-pod-image, and --test-pod-port options must be passed.
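As a sketch of what such an invocation might look like, the helper below only assembles and prints the command line; it does not run it. `networkcheck_cmd` is a hypothetical name, and the image and port values in the example call are placeholders (the images are ones mentioned elsewhere in this thread, and the port 8080 is an assumption), not a verified working combination.

```shell
#!/bin/sh
# networkcheck_cmd (hypothetical helper): print an `oc adm diagnostics
# networkcheck` command line with the three options set explicitly.
networkcheck_cmd() {
    pod_image=$1        # image for the diagnostic pod
    test_pod_image=$2   # image for the test pods
    test_pod_port=$3    # port the test pod image serves on
    printf 'oc adm diagnostics networkcheck --pod-image=%s --test-pod-image=%s --test-pod-port=%s\n' \
        "$pod_image" "$test_pod_image" "$test_pod_port"
}

# Placeholder values; adjust registry, tags, and port for the environment.
networkcheck_cmd \
    "registry.reg-aws.openshift.com:443/openshift3/ose-control-plane:v3.10" \
    "docker.io/openshift/hello-openshift" \
    "8080"
```

Printing the command first makes it easy to review the fully qualified image names before running anything against the cluster.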
@Ravi Sankar
I don't think the two PRs above resolve the issue in this bug. Could you check the bug title? The issue is that we need to use 'openshift3/ose-control-plane' instead of 'openshift3/ose'.
PR https://github.com/openshift/origin/pull/19901 is approved but not yet merged in master.
I thought PR 19901 only resolves the Docker registry issue. OK, I will try again after this PR is merged. Thanks.
Yes, this needs one more fix: https://github.com/openshift/origin/pull/20013
Moving this bug to MODIFIED, since PR https://github.com/openshift/origin/pull/20013 has only been merged in origin, not in OCP yet.
3.10 pr: https://github.com/openshift/origin/pull/20116
Verified this bug; the default image is now openshift3/ose-control-plane.
There is another issue: I found that the default registry 'registry.access.redhat.com/' is added again in 3.10. Is that expected?
Also, PR https://github.com/openshift/origin/pull/20013 says:
For testing on AWS, the user needs to manually pass the image params.
So the test pod image will use 'docker.io/openshift/hello-openshift'?
I tried using the default image via --test-pod-image='registry.reg-aws.openshift.com:443/openshift3/ose-deployer:v3.10', but it failed with the following error:
# oc adm diagnostics networkcheck --pod-image='registry.reg-aws.openshift.com:443/openshift3/ose-control-plane:v3.10' --test-pod-image='registry.reg-aws.openshift.com:443/openshift3/ose-deployer:v3.10'
[Note] Determining if client configuration exists for client/cluster diagnostics
Info: Successfully read a client config file at '/root/.kube/config'
Info: Using context for cluster-admin access: 'default/qe-zzhao-master-etcd-nfs-1:8443/system:admin'
[Note] Running diagnostic: NetworkCheck
Description: Create a pod on all schedulable nodes and run network diagnostics from the application standpoint
ERROR: [DNet2005 from diagnostic NetworkCheck@openshift/origin/pkg/oc/admin/diagnostics/diagnostics/cluster/network/run_pod.go:170]
Setting up test environment for network diagnostics failed: Failed to run network diags test pod and service: Failed to run network diags test pods, failed: 24, total: 24, details: error: --deployment or OPENSHIFT_DEPLOYMENT_NAME is required
[Note] Summary of diagnostics execution (version v3.10.10):
[Note] Errors seen: 1
I will create another bug to track this issue.