Bug 1332856

Summary: Authentication invalidates when provider is down
Product: Red Hat CloudForms Management Engine
Reporter: Einat Pacifici <epacific>
Component: UI - OPS
Assignee: Greg Blomquist <gblomqui>
Status: CLOSED ERRATA
QA Contact: Einat Pacifici <epacific>
Severity: high
Priority: high
Version: 5.6.0
CC: azellner, bazulay, cpelland, dajohnso, dron, epacific, fsimonce, hkataria, jfrey, jhardy, mfeifer, mpovolny, obarenbo, simaishi
Target Milestone: GA
Target Release: 5.6.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: 5.6.0.11
Last Closed: 2016-06-29 15:57:15 UTC
Type: Bug
Attachments:
Attached: screenshot + evm.log + oc get pods result
Recreated evm.log and CFME screenshots

Description Einat Pacifici 2016-05-04 08:46:50 UTC
Created attachment 1153747 [details]
Attached: screenshot + evm.log + oc get pods result

Description of problem:
When viewing the list of Pods in CFME - Containers, the list contains pods that no longer exist.
The list also shows these deleted pods as running.

Version-Release number of selected component (if applicable):
5.6.0.4

How reproducible:
Always

Steps to Reproduce:
1. In CFME, create a provider and ensure there are several pods.
2. Delete some pods and add new ones (see the example commands below).
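
For example, on the OpenShift master (a minimal sketch; the pod name and image are placeholders):

[root@ose-master ~]# oc get pods                                   # note the pods CFME has inventoried
[root@ose-master ~]# oc delete pod my-pod1                         # remove an existing pod
[root@ose-master ~]# oc run my-pod2 --image=nginx --restart=Never  # create a replacement pod
[root@ose-master ~]# oc get pods                                   # the deleted pod should no longer appear here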

Actual results:
All pods, both the deleted ones and the new ones, are listed, and all of them are shown as running.

Expected results:
Only existing pods should be listed.

Comment 2 Federico Simoncelli 2016-05-04 16:56:05 UTC
Ari, please work with Einat to understand whether what she's seeing is an old issue we already fixed or something new. Thanks.

Comment 3 Ari Zellner 2016-05-05 11:55:41 UTC
Unable to reproduce. Einat, this may already have been fixed. If not, let's review it together when you're available.

Comment 4 Dave Johnson 2016-05-05 18:09:46 UTC
Hey Dafna, like we discussed, assigning this to you for a retest.

Comment 6 Einat Pacifici 2016-05-08 11:58:20 UTC
This is currently blocked by: Bug 1333258

Comment 7 Barak 2016-05-15 09:19:21 UTC
Einat, any updates?

Comment 8 Einat Pacifici 2016-05-18 07:40:58 UTC
Created attachment 1158643 [details]
Recreated evm.log and CFME screenshots

Comment 9 Einat Pacifici 2016-05-18 07:43:00 UTC
Barak, this issue is still occurring. I have attached screenshots and evm.log.
On the master I see:

[root@ose-master ~]# oc get pods --all-namespaces
NAMESPACE         NAME                         READY     STATUS    RESTARTS   AGE
default           management-metrics-1-6r0z3   0/1       Pending   0          8h
default           my-pod1                      1/1       Running   0          1m
default           router-1-978yd               0/1       Pending   0          8h
default           router-1-uigub               1/1       Running   0          1d
openshift-infra   hawkular-cassandra-1-8l8uz   1/1       Running   0          8h
openshift-infra   hawkular-metrics-w2c7h       1/1       Running   0          8h
openshift-infra   heapster-taryt               1/1       Running   4          8h
openshift-infra   stress-1-jl5lv               1/1       Running   0          8h
openshift-infra   stress1-1-cy2i2              1/1       Running   0          8h

Comment 10 Ari Zellner 2016-05-26 11:07:56 UTC
Einat, I'm having a hard time reproducing this and I'd like your help. Please show me your environment when this happens.

Comment 12 Einat Pacifici 2016-05-31 06:19:14 UTC
Dafna, the containers remain in CFME for as long as CFME is running, which means the obsolete/deleted containers stay visible (for the 7 days that CFME was running and available).
During this time the system that OpenShift was on (rhevm3) went down and was brought back up, and new containers were created as a result.
The visible result was that CFME showed the old containers as well as the new ones.

Comment 13 Federico Simoncelli 2016-06-01 08:59:34 UTC
As discussed with Ari yesterday, this is a side effect of authentication failures that deactivate the provider refresh workers: once those workers stop, the inventory is never refreshed, so deleted pods remain listed as running.

Ari, please add more information here.
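
To illustrate the failure mode (a hypothetical Ruby sketch with made-up names, not the actual ManageIQ code): if credential verification rescues every exception, a provider outage looks exactly like bad credentials, and the provider's refresh is deactivated.

# Hypothetical sketch of the failure mode; the names are illustrative,
# not the real ManageIQ API.
def verify_credentials(provider)
  provider.connect.get_pods                # any probe of the provider API
  provider.authentication_status = "Valid"
rescue StandardError
  # A refused connection (provider down) lands here too, so the
  # credentials are marked Invalid and the refresh worker is deactivated.
  provider.authentication_status = "Invalid"
end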

Comment 14 Jason Frey 2016-06-03 18:31:59 UTC
Federico, is there a separate BZ about the authentication failures that are deactivating the provider refresh workers?  I ask because *that* one should be marked as blocker as well, if so.

Comment 15 Federico Simoncelli 2016-06-03 20:34:19 UTC
(In reply to Jason Frey from comment #14)
> Federico, is there a separate BZ about the authentication failures that are
> deactivating the provider refresh workers?

Jason, not that I know of. I was thinking of using this one (unless you think it could be misleading).

> I ask because *that* one should be marked as blocker as well, if so.

Yes, this BZ is a blocker because it is the one about the authentication failures that are preventing the refresh worker from running (therefore "CFME shows pods that do not exist in openshift", as reported in the subject of the BZ).

Ari should confirm that (needinfo added in comment 13), or update us in case he has found any additional issue.

Comment 16 Ari Zellner 2016-06-06 21:36:11 UTC
This is a provider infrastructure problem with a possible fix here: https://github.com/ManageIQ/manageiq/pull/8912
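
The idea behind the fix, sketched with the same hypothetical names as above (see the linked PR for the real change): distinguish connectivity errors from genuine authentication errors, so that a provider outage does not invalidate the credentials and stop the refresh worker.

require 'timeout'

# Hypothetical auth-specific error class; illustrative only, not the
# real ManageIQ API.
class AuthenticationError < StandardError; end

def verify_credentials(provider)
  provider.connect.get_pods
  provider.authentication_status = "Valid"
rescue Errno::ECONNREFUSED, Timeout::Error
  # Provider unreachable: record an error state but keep the credentials,
  # so the refresh worker can resume once the provider is back up.
  provider.authentication_status = "Error"
rescue AuthenticationError
  # Only a genuine authentication failure invalidates the credentials.
  provider.authentication_status = "Invalid"
end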

Comment 17 Dave Johnson 2016-06-08 22:41:55 UTC
PR merged and backported.

Comment 19 errata-xmlrpc 2016-06-29 15:57:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1348