Bug 1384629 - OpenShift smartstate errors -unknown access error to pod management-infra/manageiq-img-scan-7f243: #<Net::HTTPBadRequest:0x00000010422df8>
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: SmartState Analysis
Version: 5.6.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: GA
Target Release: 5.9.0
Assignee: Erez Freiberger
QA Contact: brahmani
URL:
Whiteboard: container
Depends On:
Blocks: 1461558
Reported: 2016-10-13 17:39 UTC by Thomas Hennessy
Modified: 2020-09-10 09:51 UTC (History)

Fixed In Version: 5.9.0.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1461558
Environment:
Last Closed: 2018-03-06 15:44:53 UTC
Category: ---
Cloudforms Team: Container Management
Target Upstream Version:
Embargoed:



Description Thomas Hennessy 2016-10-13 17:39:35 UTC
Description of problem: SmartState analysis for the OpenShift provider fails


Version-Release number of selected component (if applicable): 5.6.2.1


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
I have processed the CFME logs and isolated the process logs for the two items identified in the case description:

from the logs associated with dz2lrcfp405.divbiz.net

the refresh worker process - 30183 => http error
Unexpected Exception during refresh: HTTP status code 403, User "system:serviceaccount:management-infra:management-admin" cannot list all componentstatuses in the cluster

and two smartstate worker processes: 29011 and 29019 both encountering errors but not while processing any smartstate message
===================================
from the logs associated with dz2lrcfp404.divbiz.net

no ems refresh error in this log set.

several smartstate scanning errors captured in worker process logs for pids 11690, 11698, 11706, and 11714
====================================

and for the logs associated with dz2lrcfp403.divbiz.net

the refresh worker processes - 25983, 11986, 14085 (and others) => http error

and several smartstate scanning errors captured in the worker process logs for pids 11730, 11714

I have collected the OpenShift logs into the location http://file.rdu.redhat.com/~thenness/SF-01720000-CFME-OpenShift/ under the directory "OpenShift Materials" and the CFME logs with the extracted processes noted above in the same http://file.rdu.redhat.com/~thenness/SF-01720000-CFME-OpenShift/ location under the directory "CFME Materials".

Comment 7 Thomas Hennessy 2016-11-14 16:35:32 UTC
Erez,
Thank you for responding.  I don't know if you missed the reference in the original text.

I have collected the OpenShift logs into the location http://file.rdu.redhat.com/~thenness/SF-01720000-CFME-OpenShift/ under the directory "OpenShift Materials" and the CFME logs with the extracted processes noted above in the same http://file.rdu.redhat.com/~thenness/SF-01720000-CFME-OpenShift/ location under the directory "CFME Materials".

there are three CFME instances in this environment.  

As I mentioned before, I have zero experience and/or training with OpenShift, so when you say *you* need more information, *I* need to know exactly what *you* need, since my role in this is only passing along to the customer what *you* need to resolve this issue.

Please advise.

Tom Hennessy

Comment 11 Sachin 2017-02-03 10:50:23 UTC
Erez,

Any update from your side? Do you want any other info from customer?

Comment 19 Sachin 2017-02-09 06:00:23 UTC
Erez,

Please find attached logs

Comment 31 Erez Freiberger 2017-03-30 14:32:19 UTC
Martin,
I have created a PR to fix this problem but I am still not sure how to replicate it on my own. I would appreciate it if you could explain how this proxy is defined exactly?

[1]https://github.com/ManageIQ/manageiq/pull/14578

Comment 33 Martin Eggen 2017-03-31 11:28:50 UTC
I have tested the modified scan job from the PR in our environment successfully.

Comment 41 brahmani 2017-11-16 14:31:56 UTC
Hi Erez!
I am looking for a way to replicate and verify this issue.
From what I understand, the problem is related to a specific customer's proxy settings that change the error messages from OpenShift, which causes problems when polling the healthz status of the image inspector pod.
Given that, do you have an idea how I could replicate this problem here?
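
The failure mode described in this comment can be sketched as follows. This is only an illustration under stated assumptions, not ManageIQ's actual scan job code: the names `PodState`, `classify_strict`, and `classify_tolerant` are hypothetical, and the assumption is that a poller waiting for the image inspector pod treats 404 as "not ready yet" and anything else unexpected as a hard error. The tolerant variant is one possible way to cope with a proxy that rewrites statuses; it is not necessarily what the fix does.

```python
from enum import Enum


class PodState(Enum):
    READY = "ready"
    NOT_READY = "not_ready"
    ERROR = "error"


def classify_strict(status: int) -> PodState:
    """Hypothetical strict poller: only 404 means 'pod not ready yet'."""
    if status == 200:
        return PodState.READY
    if status == 404:
        return PodState.NOT_READY
    # Any other status, including a 400 produced by a rewriting proxy,
    # is treated as an unknown access error.
    return PodState.ERROR


def classify_tolerant(status: int) -> PodState:
    """Hypothetical tolerant poller: any 4xx means 'keep waiting'."""
    if status == 200:
        return PodState.READY
    if 400 <= status < 500:
        return PodState.NOT_READY
    return PodState.ERROR


# A proxy that turns 404 into 400 breaks the strict poller only.
print(classify_strict(404), classify_strict(400), classify_tolerant(400))
```

With the strict classification, a rewritten 400 ends the wait with an error instead of retrying, which would match the "unknown access error to pod" message in the bug title.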

Comment 42 Erez Freiberger 2017-11-19 09:50:28 UTC
I never succeeded in replicating it myself. I would have tried doing so by creating a proxy between ManageIQ and OpenShift that changes 404 responses to 400 (HTTPBadRequest).
Also note that the fix was backported to Fine, so to reproduce it you will have to use an early Fine version.
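
A reproduction proxy along these lines might look like the sketch below. It is a minimal illustration, not a tested reproduction: it rewrites upstream 404 responses to 400 (the status that Net::HTTPBadRequest in the bug title represents) and passes other statuses through. The upstream address, the listening port, and the names `rewrite_status`, `RewritingProxy`, and `serve` are all assumptions. A real OpenShift API endpoint uses HTTPS and token authentication, which this sketch does not forward.

```python
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical upstream address; replace with the real endpoint under test.
UPSTREAM = "http://127.0.0.1:8080"


def rewrite_status(status: int) -> int:
    """Turn 404 into 400 and leave every other status unchanged."""
    return 400 if status == 404 else status


class RewritingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            with urllib.request.urlopen(UPSTREAM + self.path, timeout=10) as resp:
                status, body = resp.status, resp.read()
        except urllib.error.HTTPError as err:
            # urllib raises for 4xx/5xx; keep the upstream status and body.
            status, body = err.code, err.read()
        except OSError:
            # Connection failures and timeouts: report a gateway error.
            status, body = 502, b"upstream unreachable"
        self.send_response(rewrite_status(status))
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8888) -> None:
    """Run the proxy until interrupted (not called automatically)."""
    HTTPServer(("127.0.0.1", port), RewritingProxy).serve_forever()
```

Calling `serve()` and pointing the client at port 8888 instead of the upstream would then expose it to 400 responses wherever the upstream returns 404.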

Comment 43 Einat Pacifici 2017-12-04 09:19:06 UTC
Verified. After discussion with dev+PM this BZ cannot be reproduced. Hence, should this issue reappear this BZ can then be reopened.

