Bug 1572182

Summary: oc diagnostics networkcheck should use openshift3/ose-control-plane instead of openshift3/ose
Product: OpenShift Container Platform
Reporter: zhaozhanqi <zzhao>
Component: Networking
Assignee: Ravi Sankar <rpenta>
Status: CLOSED CURRENTRELEASE
QA Contact: Meng Bo <bmeng>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.10.0
CC: aos-bugs, bbennett, hongli, rpenta, wmeng, xtian
Target Milestone: ---
Target Release: 3.10.0
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: (1) Network diagnostics incorrectly considered the installation containerized. (2) The relative image path used by the diagnostic test pods was not correctly resolved by the kubelet.
Consequence: The network check diagnostics failed.
Fix: Network diagnostics now correctly determines that the install is non-containerized, and fully qualified image names are used for the test pods.
Result: Network check diagnostics works as expected.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-12-20 21:12:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1583500, 1584494, 1588768    
Bug Blocks:    

Description zhaozhanqi 2018-04-26 11:03:10 UTC
Description of problem:
The image 'openshift3/ose' has been renamed to 'openshift3/ose-control-plane', so `oc diagnostics networkcheck` should be updated accordingly; otherwise the diagnostic pod cannot run.

Version-Release number of selected component (if applicable):
v3.10.0-0.29.0

How reproducible:
always

Steps to Reproduce:
1. run `oc diagnostics networking`
2. oc get pod --all-namespaces

Actual results:

The failed pod is using the 'openshift3/ose' image; its events show:

  Type     Reason                 Age                From                                    Message
  ----     ------                 ----               ----                                    -------
  Normal   SuccessfulMountVolume  56s                kubelet, ip-172-18-14-237.ec2.internal  MountVolume.SetUp succeeded for volume "host-root-dir"
  Normal   SuccessfulMountVolume  56s                kubelet, ip-172-18-14-237.ec2.internal  MountVolume.SetUp succeeded for volume "default-token-6f768"
  Normal   SuccessfulMountVolume  56s                kubelet, ip-172-18-14-237.ec2.internal  MountVolume.SetUp succeeded for volume "kconfig-secret"
  Normal   BackOff                25s (x2 over 52s)  kubelet, ip-172-18-14-237.ec2.internal  Back-off pulling image "openshift3/ose:v3.10.0-0.29.0"
  Warning  Failed                 25s (x2 over 52s)  kubelet, ip-172-18-14-237.ec2.internal  Error: ImagePullBackOff
  Normal   Pulling                11s (x3 over 55s)  kubelet, ip-172-18-14-237.ec2.internal  pulling image "openshift3/ose:v3.10.0-0.29.0"
  Warning  Failed                 9s (x3 over 54s)   kubelet, ip-172-18-14-237.ec2.internal  Failed to pull image "openshift3/ose:v3.10.0-0.29.0": rpc error: code = Unknown desc = repository docker.io/openshift3/ose not found: does not exist or no pull access
  Warning  Failed                 9s (x3 over 54s)   kubelet, ip-172-18-14-237.ec2.internal  Error: ErrImagePull



Expected results:

No such error; the pod should be running.


Additional info:

Comment 1 Ravi Sankar 2018-05-17 07:23:10 UTC
There are two issues with this bug:

(1) Network diagnostics incorrectly detected the OpenShift environment as containerized because of the new way of configuring and running OpenShift in 3.10, where the node image is used to run the SDN, etc. With 3.10, containerized mode is no longer supported, so the containerized-specific code should be removed from network diagnostics.

(2) The network diagnostics test pod uses a relative image path that is not correctly resolved by docker. We need to add the appropriate registries to the docker config depending on the environment (AWS, OpenStack, etc.).
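As a sketch of that workaround (not the eventual fix): on RHEL-based 3.x hosts, the docker daemon resolves unqualified image names against a search list that can be extended in /etc/containers/registries.conf. The exact registry entries below are illustrative assumptions; use whichever registry serves your environment:

```toml
# /etc/containers/registries.conf (v1 TOML format used by the RHEL docker package)
# Unqualified names like "openshift3/ose-control-plane" are tried against
# these registries in order. Entries here are examples only.
[registries.search]
registries = ['registry.access.redhat.com', 'docker.io']
```

The daemon must be restarted after editing this file. The fix that eventually merged takes the other route: the test pods use fully qualified image names, so no per-host registry configuration is needed.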

Comment 2 Ravi Sankar 2018-05-17 17:35:31 UTC
Issue (1) in comment#1 is fixed by https://github.com/openshift/origin/pull/19754

Comment 3 Ben Bennett 2018-05-23 13:16:45 UTC
Ravi: Can you open a new bug for (2) please?

https://github.com/openshift/origin/pull/19754 has merged.

Comment 4 Ravi Sankar 2018-05-31 04:35:13 UTC
Created https://bugzilla.redhat.com/show_bug.cgi?id=1584494 for issue (2)

Comment 5 Ravi Sankar 2018-06-01 17:16:12 UTC
Issue (2) will be fixed by https://github.com/openshift/origin/pull/19901
Testing with AWS internal registry must pass --pod-image, --test-pod-image and --test-pod-port options.
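A sketch of such an invocation, using the flag values that appear later in comment 16; the registry host is environment-specific and assumed here. The command is printed rather than executed, since running it requires cluster-admin access to a live 3.10 cluster:

```shell
#!/bin/sh
# Sketch: pass fully qualified image names so docker never has to resolve
# a relative path against its configured registry search list.
# The registry host below is the internal AWS one from comment 16 — an
# environment-specific assumption; substitute your own registry.
POD_IMAGE="registry.reg-aws.openshift.com:443/openshift3/ose-control-plane:v3.10"
TEST_POD_IMAGE="docker.io/openshift/hello-openshift"
TEST_POD_PORT=8080

# Build the argument list, then print it instead of executing it.
set -- oc adm diagnostics networkcheck \
    "--pod-image=$POD_IMAGE" \
    "--test-pod-image=$TEST_POD_IMAGE" \
    "--test-pod-port=$TEST_POD_PORT"
printf '%s ' "$@"
echo
```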

Comment 8 zhaozhanqi 2018-06-06 02:54:20 UTC
@ Ravi Sankar

I don't think the two PRs above resolve the current bug. Could you check the bug title? The issue is that we need to use 'openshift3/ose-control-plane' instead of 'openshift3/ose'.

Comment 9 Ravi Sankar 2018-06-06 04:48:23 UTC
PR https://github.com/openshift/origin/pull/19901 is approved but not yet merged in master.

Comment 10 zhaozhanqi 2018-06-06 05:33:41 UTC
I thought PR 19901 only resolves the docker registry issue. OK, I will try again after this PR is merged. Thanks.

Comment 11 Ravi Sankar 2018-06-14 23:49:44 UTC
@zhaozhanqi 
Yes, this needs one more fix: https://github.com/openshift/origin/pull/20013

Comment 13 zhaozhanqi 2018-06-27 03:26:39 UTC
Moving this bug to MODIFIED,

since PR https://github.com/openshift/origin/pull/20013 has only been merged into origin, NOT into OCP yet.

Comment 14 Ravi Sankar 2018-06-27 16:45:06 UTC
3.10 pr: https://github.com/openshift/origin/pull/20116

Comment 16 zhaozhanqi 2018-07-02 08:00:53 UTC
Verified this bug; the default image is now openshift3/ose-control-plane.

There is another issue: I found the default registry 'registry.access.redhat.com/' is added again in 3.10. Is that expected?
And the PR https://github.com/openshift/origin/pull/20013 says:

For testing on AWS, users need to manually pass the image params
--pod-image=registry.reg-aws.openshift.com:443/openshift3/ose-control-plane:v3.10
--test-pod-image=docker.io/openshift/hello-openshift

Will the test pod image use 'docker.io/openshift/hello-openshift'?
I tried using the default image '--test-pod-image=registry.reg-aws.openshift.com:443/openshift3/ose-deployer:v3.10', but it failed with this error:

# oc adm diagnostics networkcheck --pod-image='registry.reg-aws.openshift.com:443/openshift3/ose-control-plane:v3.10' --test-pod-image='registry.reg-aws.openshift.com:443/openshift3/ose-deployer:v3.10'
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'
Info:  Using context for cluster-admin access: 'default/qe-zzhao-master-etcd-nfs-1:8443/system:admin'

[Note] Running diagnostic: NetworkCheck
       Description: Create a pod on all schedulable nodes and run network diagnostics from the application standpoint
       
ERROR: [DNet2005 from diagnostic NetworkCheck@openshift/origin/pkg/oc/admin/diagnostics/diagnostics/cluster/network/run_pod.go:170]
       Setting up test environment for network diagnostics failed: Failed to run network diags test pod and service: Failed to run network diags test pods, failed: 24, total: 24, details: error: --deployment or OPENSHIFT_DEPLOYMENT_NAME is required
       
[Note] Summary of diagnostics execution (version v3.10.10):
[Note] Errors seen: 1

I will create another bug to trace this issue.