Bug 1322314

Summary: Diagnostics container did not report the missing router pod
Product: OpenShift Container Platform
Reporter: Xia Zhao <xiazhao>
Component: oc
Assignee: Luke Meyer <lmeyer>
Status: CLOSED ERRATA
QA Contact: Wei Sun <wsun>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.2.0
CC: agoldste, aos-bugs, jokerman, mmccomas, tdawson, xiazhao
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-12 16:34:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
full diagnostics log attached none

Description Xia Zhao 2016-03-30 09:40:49 UTC
Problem description: 
When "oadm diagnostics --images='openshift3/ose-${component}:${version}'" is run while the router pod is not running, the checks inside the diagnostic container pass without reporting the missing router pod.

Version-Release number of selected component (if applicable):
openshift v3.2.0.8
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

How reproducible:
Always

Steps to Reproduce:
1. Log in to the OpenShift master and switch to the default project.
2. Scale the router down to 0 replicas:
# oc scale rc  router-1 --replicas=0
3. Make sure the router pod is not running and only the registry pod is running:
# oc get po
NAME                      READY     STATUS    RESTARTS   AGE
docker-registry-1-6mboi   1/1       Running   0          45s
4. Run diagnostics on OpenShift through the diagnostic container:
oadm diagnostics  --images='openshift3/ose-${component}:${version}'
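
The check in step 3 can also be done mechanically instead of by eye. The following is a minimal sketch and not part of the original report: count_running_router_pods is a hypothetical helper name, and it assumes that router pods are named with a "router-" prefix (as the router-1 replication controller in step 2 suggests) and that the listing has the NAME / READY / STATUS columns shown in step 3.

```shell
# Hypothetical helper (not an oc subcommand): reads `oc get po` style output
# on stdin and prints how many pods named router-* have STATUS "Running".
# Assumes column 1 is NAME and column 3 is STATUS, as in the listing above.
count_running_router_pods() {
  awk 'NR > 1 && $1 ~ /^router-/ && $3 == "Running" { n++ } END { print n + 0 }'
}
```

Piping the listing into it (oc get po | count_running_router_pods) should print 0 before step 4 is run.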

Actual Result:
The diagnostic container passed:
[Note] Running diagnostic: DiagnosticPod
       Description: Create a pod to run diagnostics from the application standpoint
       
Info:  Output from the diagnostic pod (image openshift3/ose-deployer:v3.2.0.8):
       [Note] Running diagnostic: PodCheckAuth
              Description: Check that service account credentials authenticate as expected
              
       Info:  Service account token successfully authenticated to master
       Info:  Service account token was authenticated by the integrated registry.
       
       [Note] Running diagnostic: PodCheckDns
              Description: Check that DNS within a pod works as expected
              
       [Note] Summary of diagnostics execution (version v3.2.0.8):
       [Note] Completed with no errors or warnings seen.


Expected Result:
The diagnostic container should fail and report the missing router pod.

Additional info:
The error message from the diagnostic container when the registry pod is missing:
       ERROR: [DP1014 from diagnostic PodCheckAuth@openshift/origin/pkg/diagnostics/pod/auth.go:173]
              Request to integrated registry timed out; this typically indicates network or SDN problems.

Comment 1 Andy Goldstein 2016-03-30 14:56:14 UTC
This works for me:

[Note] Running diagnostic: ClusterRouterName
       Description: Check there is a working router

ERROR: [DClu2007 from diagnostic ClusterRouter@openshift/origin/pkg/diagnostics/cluster/router.go:156]
       The "router" DeploymentConfig exists but has no running pods, so it
       is not available. Apps will not be externally accessible via the router.


Can you paste or attach the entire output from your oadm diagnostics run?

openshift v3.2.0.8
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

Comment 2 Luke Meyer 2016-03-30 18:32:54 UTC
Right, that particular diagnostic doesn't test for the router, because the router isn't very interesting from the perspective of inside a pod. It tests for the presence of the registry because build pods need to interact with the registry all the time and we would like to know if/why that is not working.

ClusterRouterName is the correct diagnostic for looking for a router.
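
One way to act on this is to look in the full diagnostics output for an error attributed to the router diagnostic, rather than relying on the DiagnosticPod summary. The following is a minimal sketch, assuming the output was saved to a file and that error entries follow the "ERROR: [<ID> from diagnostic <Name>@...]" shape quoted in this report. diag_has_error_from is a hypothetical helper name, not part of oadm.

```shell
# Hypothetical helper: succeeds if the saved diagnostics output in file $2
# contains an ERROR entry attributed to the diagnostic named $1.
# Leading whitespace is allowed because output relayed from the diagnostic
# pod is indented.
diag_has_error_from() {
  grep -Eq "^[[:space:]]*ERROR: \[[A-Za-z0-9]+ from diagnostic ${1}@" "$2"
}
```

With the output of the diagnostics run saved to diag.log (a file name chosen here for illustration), diag_has_error_from ClusterRouter diag.log returns success when an entry like the DClu2007 error in Comment 1 is present. The name used there is taken from the "from diagnostic ClusterRouter@" part of that error line.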

Comment 3 Xia Zhao 2016-03-31 03:49:14 UTC
Created attachment 1142057 [details]
full diagnostics log attached

Comment 4 Xia Zhao 2016-03-31 03:51:41 UTC
@lmeyer yes, I verified that ClusterRouterName does report the missing router pod. I now understand that the diagnostic container is not meant to report this. Would you please help set the bug status to ON_QA, and I will then close it?

Thanks,
Xia

Comment 5 Xia Zhao 2016-04-01 10:00:35 UTC
Set to VERIFIED per my comment #4.

Comment 7 errata-xmlrpc 2016-05-12 16:34:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064