Bug 442102
Summary: | openais full library queue causes node to be fenced | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Corey Marthaler <cmarthal> |
Component: | openais | Assignee: | Steven Dake <sdake> |
Status: | CLOSED ERRATA | QA Contact: | |
Severity: | low | Docs Contact: | |
Priority: | urgent | ||
Version: | 5.2 | CC: | bstevens, cluster-maint, edamato, sghosh, syeghiay |
Target Milestone: | rc | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | 0.80.3-19.el5 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2009-01-20 20:46:50 UTC | Type: | --- |
Bug Blocks: | 509888 |
Description
Corey Marthaler
2008-04-11 18:39:09 UTC
Possibly blocking cluster mirror progress at this point. Setting flags to get consideration/discussion for 5.3.

The ipc system is designed such that one thread uses the poll system call in a loop. When the outbound IPC kernel buffers fill up, the poll thread is interrupted via the pthread_kill library function (which also sets POLLOUT), so that poll falls through when outbound messages must be sent, the kernel queue is full, and space then becomes available.

With this design, openais_response_send can queue messages for the ipc connection at a faster rate than the poll thread can process them. This causes poll to enter a loop in which it is always interrupted by pthread_kill as new messages are queued for the ipc connection. The result is that, in this state, new outgoing messages are not sent to the ipc connection, even though that connection has buffer space available in the kernel queues for outgoing messages.

This problem is easily fixed by sending the signal via pthread_kill only once, when the queue has filled, to interrupt the blocking poll call, instead of every time a new message is queued.

The cmirror clogd daemon has a priority inversion with openais. The fix requires a change to clogd to run at SCHED_RR priority 99, and also for the ipc dispatch thread to run at SCHED_RR priority 99, to avoid starvation of the outbound processing queues.

openais problems fixed in version openais-0.80.3-19.el5

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0074.html
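
For illustration only, the "signal once" idea from the description can be sketched as below. This is not the openais code: the class name `OutboundQueue`, the `wakeups_sent` counter, and the use of a self-pipe in place of pthread_kill are all assumptions made to keep the example small and runnable. It also simplifies the trigger, waking the poll loop once per burst of queued messages rather than at the moment the kernel queue fills.

```python
import os
import select
import threading
from collections import deque


class OutboundQueue:
    """Hypothetical stand-in for one connection's outbound message queue."""

    def __init__(self):
        self._lock = threading.Lock()
        self._messages = deque()
        self._wakeup_pending = False
        # A self-pipe stands in for the signal that interrupts the poll call.
        self._rfd, self._wfd = os.pipe()
        self.wakeups_sent = 0  # counter kept only so the behaviour is visible

    def enqueue(self, msg):
        """Queue a message; wake the poll loop only if no wakeup is pending."""
        with self._lock:
            self._messages.append(msg)
            if self._wakeup_pending:
                return  # an interrupt is already outstanding, do not send another
            self._wakeup_pending = True
            self.wakeups_sent += 1
        os.write(self._wfd, b"x")

    def poll_once(self, timeout=1.0):
        """One iteration of the poll loop: wait for a wakeup, then drain."""
        ready, _, _ = select.select([self._rfd], [], [], timeout)
        if not ready:
            return []
        os.read(self._rfd, 1)  # consume the single wakeup byte
        with self._lock:
            self._wakeup_pending = False
            drained = list(self._messages)
            self._messages.clear()
        return drained

    def close(self):
        os.close(self._rfd)
        os.close(self._wfd)
```

Queuing several messages back to back produces a single wakeup, so the poll loop gets a chance to drain the queue instead of being interrupted again for every new message.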