| Summary: | Unusual OpenShift Console Behavior | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Steven Walter <stwalter> |
| Component: | Networking | Assignee: | Dan Winship <danw> |
| Status: | CLOSED NOTABUG | QA Contact: | Meng Bo <bmeng> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.2.0 | CC: | aloughla, aos-bugs, bbennett, erich, jokerman, mmccomas, pweil, stwalter |
| Target Milestone: | --- | Keywords: | Performance |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-01-30 19:32:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Steven Walter
2016-11-21 20:29:49 UTC
Can you please get a tcpdump / wireshark trace of the traffic at the client side? Ideally one with the working Safari and one with Chrome. I'd be interested to see what connections exist and which end is initiating the teardown.

Are they still seeing this behavior? I'm verifying with them.

The Safari capture shows a connection being established at time 1.530214, being used for a bit of traffic, and then going idle at 1.834035. Then at 61.920059 (60 seconds plus epsilon later), the server cleanly closes its end of the connection (and then the client tries to get in a few words before closing the other end of the connection, but the server has already stopped listening to it, and so responds with RSTs).

So there doesn't seem to be anything "networky" going on there; the connection has gone idle, the server has apparently been configured to close any connections that are idle for longer than 60 seconds, and the client apparently isn't expecting this.

I don't know much about our web UI, but based on the "will not even allow me to login and it takes FOREVER to load the login page" comment, it seems like the problem is that some request that is supposed to happen quickly is getting "stuck" forever(-ish) somewhere in the backend, so we don't get a response before the proxy times out the connection? Looking at the logs on the server might show something.

(It's worth noting that in both pcaps there seems to be a lot of network lossage (retransmissions, duplicates, etc.) going on. But TCP seems to be coping with that lossage (because TCP), so I don't think it's related to this problem.)

Based on Dan's comment it looks like something is taking too long to respond in the application.
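A client-side capture like the one requested above could be taken with something like the following; the interface name (`en0`), console hostname, and output filenames are placeholders for this environment:

```shell
# Capture client-side traffic to the console host for later analysis in
# Wireshark. Repeat once while reproducing in Safari and once in Chrome,
# writing to separate files so the two runs can be compared.
# "en0" and "console.example.com" are placeholders.
sudo tcpdump -i en0 -w safari-console.pcap \
    host console.example.com and port 443
```

Opening the resulting `.pcap` files in Wireshark makes it straightforward to see which endpoint sends the first FIN or RST, i.e. which side is initiating the teardown.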
You can increase the timeouts:
- Globally: https://docs.openshift.com/container-platform/3.3/install_config/configuring_routing.html#install-config-configuring-route-timeouts
- Per-route: https://docs.openshift.com/container-platform/3.3/architecture/core_concepts/routes.html#haproxy-template-router (see the environment variable section)
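As a sketch of the two approaches from the docs linked above, assuming the default router deployment is named `router` and using a hypothetical route name `console`:

```shell
# Per-route: raise the idle timeout for one route using the documented
# haproxy.router.openshift.io/timeout annotation (route name "console"
# is a placeholder for the affected route).
oc annotate route console --overwrite \
    haproxy.router.openshift.io/timeout=2m

# Globally: set the router's default server-side timeout via an
# environment variable on the router deployment config (the "2m" value
# here is an example, not a recommendation).
oc set env dc/router ROUTER_DEFAULT_SERVER_TIMEOUT=2m
```

Note that raising the timeout only hides the symptom; per the analysis above, the underlying issue is that some backend request is taking longer than 60 seconds to respond.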