Hide Forgot
Description of problem: We are seeing frequent restarts of node-web-proxy on some nodes, with the following error in the supervisor log: 1386949409546:ERROR:[13/Dec/2013:10:43:29 -0500] - io_timeout: 300 1386949409547:ERROR:[13/Dec/2013:10:43:29 -0500] - Error listener on proxied request: Error: socket hang up at createHangUpError (http.js:1424:15) at Socket.socketCloseListener (http.js:1474:23) at Socket.EventEmitter.emit (events.js:117:20) at TCP.close (net.js:451:12) Version-Release number of selected component (if applicable): openshift-origin-node-proxy-1.17.1-1.el6oso.noarch Additional info: It appears that we're setting the socket timeout on the incoming request in the _requestHandler function in openshift-node-web-proxy/lib/proxy/ProxyServer.js , but we don't set a timeout handler on it. We only set a timeout handler on the proxy request to the backend. It seems logical that if the timeouts are the same, the incoming request socket will always timeout before the proxy request socket does, so we either need a handler there or a longer timeout (and the timeout may need to be much longer due to app unidling).
I completely misinterpreted the log. This error in the log is not what causes the crash. The relevant error is: /opt/rh/nodejs010/root/usr/lib/node_modules/ws/lib/WebSocket.js:175 else throw new Error('not opened'); ^ Error: not opened at WebSocket.send (/opt/rh/nodejs010/root/usr/lib/node_modules/ws/lib/WebSocket.js:175:16) at WebSocket.<anonymous> (/opt/rh/nodejs010/root/usr/lib/node_modules/openshift-node-web-proxy/lib/proxy/ProxyServer.js:494:8) at WebSocket.EventEmitter.emit (events.js:98:17) at Receiver.self._receiver.ontext (/opt/rh/nodejs010/root/usr/lib/node_modules/ws/lib/WebSocket.js:555:10) at Receiver.opcodes.1.finish (/opt/rh/nodejs010/root/usr/lib/node_modules/ws/lib/Receiver.js:397:14) at Receiver.expectHandler (/opt/rh/nodejs010/root/usr/lib/node_modules/ws/lib/Receiver.js:384:31) at Receiver.add (/opt/rh/nodejs010/root/usr/lib/node_modules/ws/lib/Receiver.js:93:24) at Socket.firstHandler (/opt/rh/nodejs010/root/usr/lib/node_modules/ws/lib/WebSocket.js:535:22) at Socket.EventEmitter.emit (events.js:95:17) at Socket.<anonymous> (_stream_readable.js:720:14) DEBUG: Program node /opt/rh/nodejs010/root/usr/lib/node_modules/openshift-node-web-proxy/bin/web-proxy.js --config /etc/openshift/web-proxy-config.json exited with code 8
I'm collecting more data about this. I have tcpdump data from one crash, and I've put the proxy on that node in debug mode to capture additional log data. The process dies about twice an hour, so I should have more details soon.
Hmm, this appears to duplicate https://bugzilla.redhat.com/show_bug.cgi?id=1034077 but that was release into production and we are still seeing some issues here. I need to do some follow-up to make sure exactly when that fix landed on our systems.
Ahh, looking more closely at the traceback, this is not the same. It's the other connection. It looks like proxy_ws.on('message', ...) in ProxyServer.js needs a similar fix to the one in ws.on('message', ...) in the earlier issue.
I'm thinking that this is primarily needing an error handler, so we can log errors, but not crash the whole proxy. But I am a good deal curious why the connections are being reset.
Since the bug is hard to reproduce... QE do regression testing for the existing websocket cases, no regression issue found. Move bug to verified.