+++ This bug was initially created as a clone of Bug #1969477 +++

Description of problem:
The agent calls the GetNextSteps API and the request arrives at the service. While processing the request, the service runs `oc adm release info`. The command may take a long time, and the request eventually times out.

Version-Release number of selected component (if applicable):
Latest (2021/06/08)

How reproducible:
It seems to happen when there are connectivity issues to the registry. The command hangs for 15 seconds, then prints:
```
$ oc adm release info
error: unable to read image registry.ci...: Get "...": context deadline exceeded
```

Steps to Reproduce:
See above

Actual results:
The request takes a long time, and it seems that some intermediate HTTP proxy eventually gives up and returns a 504. This response is not declared in the swagger definition, so the agent gives a cryptic error message about swagger and 504.

Expected results:
The service should be able to tell more quickly that the command will fail, either by caching in advance or by having a stricter internal timeout.

Additional info:

--- Additional comment from itsoiref on 20210608T14:31:55

We saw that assisted-service runs "oc adm release info" for 2 images, "machine-config-operator" and "must-gather". The current mirrored registry (disconnected env) was slow, and each command took ~15 seconds to run. On the agent side, assisted-client has a default timeout of 30 seconds. This combination (2 x ~15 seconds against a 30-second limit) caused the agent to time out every time; the installation failed to start and kept moving from preparing back to ready and vice versa. We should add a cache per OCP version per image, as there is no benefit in running the command every time.

--- Additional comment from alazar on 20210608T14:59:33

Besides caching, maybe we should also enlarge the timeout on the agent's side?
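
For illustration, here is a minimal sketch of the two ideas raised above: a cache keyed per release image per image name, plus a strict internal timeout. It is not the actual assisted-service code. The names `releaseCache`, `runner`, `ocRunner`, `imageFor` and `demo` are hypothetical, the `example.invalid` pull specs are placeholders, and the sketch assumes the service would look images up with `oc adm release info --image-for=<name> <release>`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"strings"
	"sync"
	"time"
)

// releaseKey identifies one lookup: a release image plus a component image name.
type releaseKey struct{ releaseImage, imageName string }

// runner resolves an image name inside a release (hypothetical abstraction,
// so the command can be replaced by a fake in tests).
type runner func(ctx context.Context, releaseImage, imageName string) (string, error)

// ocRunner shells out to oc. Assumption: the service uses --image-for this way.
func ocRunner(ctx context.Context, releaseImage, imageName string) (string, error) {
	out, err := exec.CommandContext(ctx, "oc", "adm", "release", "info",
		"--image-for="+imageName, releaseImage).Output()
	return strings.TrimSpace(string(out)), err
}

// releaseCache remembers successful lookups and bounds each command with a timeout.
type releaseCache struct {
	mu      sync.Mutex
	entries map[releaseKey]string
	run     runner
	timeout time.Duration
}

func newReleaseCache(run runner, timeout time.Duration) *releaseCache {
	return &releaseCache{entries: map[releaseKey]string{}, run: run, timeout: timeout}
}

// imageFor returns the cached result if present; otherwise it runs the command
// under the internal timeout and caches the result only on success.
// Concurrent misses for the same key may each run the command once.
func (c *releaseCache) imageFor(ctx context.Context, releaseImage, imageName string) (string, error) {
	key := releaseKey{releaseImage, imageName}
	c.mu.Lock()
	cached, ok := c.entries[key]
	c.mu.Unlock()
	if ok {
		return cached, nil
	}

	ctx, cancel := context.WithTimeout(ctx, c.timeout)
	defer cancel()
	image, err := c.run(ctx, releaseImage, imageName)
	if err != nil {
		return "", fmt.Errorf("release info for %q in %q: %w", imageName, releaseImage, err)
	}

	c.mu.Lock()
	c.entries[key] = image
	c.mu.Unlock()
	return image, nil
}

// demo exercises the cache with fake runners so it works without oc installed.
func demo() (calls int, cachedSame bool, timedOut bool) {
	ctx := context.Background()
	fake := func(ctx context.Context, releaseImage, imageName string) (string, error) {
		calls++
		return "example.invalid/" + imageName + "@sha256:placeholder", nil
	}
	cache := newReleaseCache(fake, 5*time.Second)
	first, _ := cache.imageFor(ctx, "example.invalid/release:4.8", "must-gather")
	second, _ := cache.imageFor(ctx, "example.invalid/release:4.8", "must-gather")
	cachedSame = first != "" && first == second

	// A runner that hangs like a slow registry; the internal timeout cuts it off.
	hang := func(ctx context.Context, _, _ string) (string, error) {
		<-ctx.Done()
		return "", ctx.Err()
	}
	slow := newReleaseCache(hang, 50*time.Millisecond)
	_, err := slow.imageFor(ctx, "example.invalid/release:4.8", "machine-config-operator")
	timedOut = errors.Is(err, context.DeadlineExceeded)
	return calls, cachedSame, timedOut
}

func main() {
	calls, cachedSame, timedOut := demo()
	fmt.Println("underlying calls:", calls, "cached:", cachedSame, "timed out:", timedOut)
}
```

In a real service, `newReleaseCache(ocRunner, timeout)` would be created once and shared across requests. The timeout would be chosen well below the agent's 30-second client timeout, so that the service returns a declared error instead of leaving a proxy to answer with a 504.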
This bug doesn't block the OCP 4.8.0 release. The fix will be delivered to the ACM channel on the AI operator and verified then.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438