Description of problem:
Several deployments failed for me during hack day. From the web console, it was difficult to find information about why. Although I could have poked around with the CLI, I wanted to stay within the GUI.
- Clicking the word "FAILED" in the deployment overview did not take me to information about the failure.
- The logs for the deployment did not identify the underlying cause; they merely indicated a timeout.
- I ultimately found the reason under Browse/Events, but these events were in no way connected to my deployment. The event:
"hellow-2 Replication Controller Warning Failed create Error creating: pods "hellow-2-" is forbidden: service account jupierce1/jws-service-account was not found, retry after the service account is created 8 times in the last 3 minutes"

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Attempt to use the template jws30-tomcat8-mysql-s2i. Do not create a JWS service account in advance.
2. Leave all template parameters at their defaults.

Actual results:
No apparent way to directly analyze the DC's failure.

Expected results:
Information about the RC's failure correlated with the DC in the GUI.

Additional info:
I understand that this particular deployment failure is valid (https://bugzilla.redhat.com/show_bug.cgi?id=1313556). I'm just hoping that finding the underlying cause could be more intuitive.
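For reference, the root cause reported by the event is a missing service account. A minimal sketch of the resource that would have avoided the failure, assuming the namespace `jupierce1` and service account name `jws-service-account` from the event text (the exact name the template expects is defined by the jws30-tomcat8-mysql-s2i template, not by this report):

```yaml
# Hypothetical fix sketch: create the service account the template's
# deployment configs reference, so the RC can create pods.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jws-service-account
  namespace: jupierce1
```

Creating this object before (or shortly after) instantiating the template should let the replication controller proceed, since the event says the create is retried once the service account exists.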
Ideally, the cause of the failure would be written back to the deployment status so we could display it directly. I have opened a PR that adds a link to the Events tab to encourage users to check there: https://github.com/openshift/origin-web-console/pull/864
Is this not something that is being handled with conditions? Or are conditions only showing up on the DC and not the RCs?
We will have conditions on the RCs once the kube 1.5 rebase lands; then we can address this.
The conditions on RCs don't provide any useful information about what might have gone wrong. We now link you to both the log and the events for a failed deployment, which should help diagnose the problem.
Created attachment 1295345 [details] failed-deployment-help
This is probably the best improvement we can make any time soon; there are too many possible reasons (events) that could be the underlying cause of a failure.
Just linking to the overview redesign PR for this: https://github.com/openshift/origin-web-console/pull/1335
Checked this issue in openshift v3.6.139; the web console now displays logs and events to help users analyze the failure. See attachment. BTW, QE is checking many MODIFIED bugs to see whether they're verifiable. Since this is fixed, moving to VERIFIED. If you have other concerns, please let me know. Thanks!
Created attachment 1296107 [details] help debug link
Created attachment 1296108 [details] useful info
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1716