Bug 1042938

Summary: node-web-proxy crashes on send call to a closed socket
Product: OpenShift Online Reporter: Andy Grimm <agrimm>
Component: ContainersAssignee: Vincent Batts <vbatts>
Status: CLOSED CURRENTRELEASE QA Contact: libra bugs <libra-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 2.xCC: bmeng, jgoulding
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-01-30 00:53:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Andy Grimm 2013-12-13 16:36:26 UTC
Description of problem:

We are seeing frequent restarts of node-web-proxy on some nodes, with the following error in the supervisor log:

1386949409546:ERROR:[13/Dec/2013:10:43:29 -0500] - io_timeout: 300
1386949409547:ERROR:[13/Dec/2013:10:43:29 -0500] - Error listener on proxied request: Error: socket hang up
    at createHangUpError (http.js:1424:15)
    at Socket.socketCloseListener (http.js:1474:23)
    at Socket.EventEmitter.emit (events.js:117:20)
    at TCP.close (net.js:451:12)

Version-Release number of selected component (if applicable):

openshift-origin-node-proxy-1.17.1-1.el6oso.noarch

Additional info:

It appears that we're setting the socket timeout on the incoming request in the _requestHandler function in openshift-node-web-proxy/lib/proxy/ProxyServer.js , but we don't set a timeout handler on it.  We only set a timeout handler on the proxy request to the backend.  It seems logical that if the timeouts are the same, the incoming request socket will always timeout before the proxy request socket does, so we either need a handler there or a longer timeout (and the timeout may need to be much longer due to app unidling).

Comment 1 Andy Grimm 2013-12-13 16:45:22 UTC
I completely misinterpreted the log.  This error in the log is not what causes the crash.  The relevant error is:

/opt/rh/nodejs010/root/usr/lib/node_modules/ws/lib/WebSocket.js:175
    else throw new Error('not opened');
         ^
Error: not opened
    at WebSocket.send (/opt/rh/nodejs010/root/usr/lib/node_modules/ws/lib/WebSocket.js:175:16)
    at WebSocket.<anonymous> (/opt/rh/nodejs010/root/usr/lib/node_modules/openshift-node-web-proxy/lib/proxy/ProxyServer.js:494:8)
    at WebSocket.EventEmitter.emit (events.js:98:17)
    at Receiver.self._receiver.ontext (/opt/rh/nodejs010/root/usr/lib/node_modules/ws/lib/WebSocket.js:555:10)
    at Receiver.opcodes.1.finish (/opt/rh/nodejs010/root/usr/lib/node_modules/ws/lib/Receiver.js:397:14)
    at Receiver.expectHandler (/opt/rh/nodejs010/root/usr/lib/node_modules/ws/lib/Receiver.js:384:31)
    at Receiver.add (/opt/rh/nodejs010/root/usr/lib/node_modules/ws/lib/Receiver.js:93:24)
    at Socket.firstHandler (/opt/rh/nodejs010/root/usr/lib/node_modules/ws/lib/WebSocket.js:535:22)
    at Socket.EventEmitter.emit (events.js:95:17)
    at Socket.<anonymous> (_stream_readable.js:720:14)
DEBUG: Program node /opt/rh/nodejs010/root/usr/lib/node_modules/openshift-node-web-proxy/bin/web-proxy.js --config /etc/openshift/web-proxy-config.json exited with code 8

Comment 2 Andy Grimm 2013-12-16 19:50:32 UTC
I'm collecting more data about this. I have tcpdump data from one crash, and I've put the proxy on that node in debug mode to capture additional log data.  The process dies about twice an hour, so I should have more details soon.

Comment 3 Andy Grimm 2013-12-18 21:35:00 UTC
Hmm, this appears to duplicate https://bugzilla.redhat.com/show_bug.cgi?id=1034077

but that was release into production and we are still seeing some issues here.  I need to do some follow-up to make sure exactly when that fix landed on our systems.

Comment 4 Andy Grimm 2013-12-19 14:32:25 UTC
Ahh, looking more closely at the traceback, this is not the same.  It's the other connection.  It looks like proxy_ws.on('message', ...) in ProxyServer.js needs a similar fix to the one in ws.on('message', ...) in the earlier issue.

Comment 6 Vincent Batts 2014-01-07 18:53:24 UTC
I'm thinking that this is primarily needing an error handler, so we can log errors, but not crash the whole proxy. But I am a good deal curious why the connections are being reset.

Comment 9 Meng Bo 2014-01-15 09:58:02 UTC
Since the bug is hard to reproduce... QE do regression testing for the existing websocket cases, no regression issue found.

Move bug to verified.