Bug 442102 - openais full library queue causes node to be fenced
Summary: openais full library queue causes node to be fenced
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: openais
Version: 5.2
Hardware: All
OS: Linux
urgent
low
Target Milestone: rc
: ---
Assignee: Steven Dake
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 509888
TreeView+ depends on / blocked
 
Reported: 2008-04-11 18:39 UTC by Corey Marthaler
Modified: 2016-04-26 14:25 UTC (History)
5 users (show)

Fixed In Version: 0.80.3-19.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-20 20:46:50 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2009:0074 0 normal SHIPPED_LIVE openais bug-fix update 2009-01-20 16:04:11 UTC

Description Corey Marthaler 2008-04-11 18:39:09 UTC
Description of problem:
During cmirror device failure testing on the latest rhel5 builds, I saw this
issue where after a 'Library queue is full' message on taft-02, it got fenced
due to dm-log-clustered timing out.

Here's what the test was doing:

Enabling device sdg on taft-01
Enabling device sdg on taft-02
Enabling device sdg on taft-03
Enabling device sdg on taft-04

Recreating PV /dev/sdg1
  WARNING: Volume group helter_skelter is not consistent
  WARNING: Volume Group helter_skelter is not consistent
  WARNING: Volume group helter_skelter is not consistent
Extending the recreated PV back into VG helter_skelter
Since we can't yet up convert existing mirrors, down converting to linear(s)
on taft-02 before re-converting back to original mirror(s)
Up converting linear(s) back to mirror(s) on taft-02...
taft-02: lvconvert -m 2 -b helter_skelter/nonsyncd_secondary_3legs_1
/dev/sde1:0-1000 /dev/sdg1:0-1000 /dev/sdh1:0-1000 /dev/sdf1:0-150
<fail name="taft-02_1" pid="27866" time="Fri Apr 11 11:07:08 2008" type="cmd"
duration="1207930028" ec="127" />
ALL STOP!
Didn't receive heartbeat for 120 seconds
couldn't up convert mirror nonsyncd_secondary_3legs_1 on taft-02

Here are the failure messages on taft-02:
Apr 11 11:01:46 taft-02 qarshd[10829]: Running cmdline: lvconvert -m 2 -b
helter_skelter/nonsyncd_secondary_3legs_1 /dev/sde1:0-1000 /dev/sdg1:0-1000
/dev/sdh1:0-1000 /dev/sdf1:0-150
Apr 11 11:01:57 taft-02 openais[6685]: [MAIN ] Library queue is full,
disconnecting library connection.
Apr 11 11:02:03 taft-02 kernel: device-mapper: dm-log-clustered: Request timed
out on DM_CLOG_RESUME:1209961 - retrying
Apr 11 11:02:18 taft-02 kernel: device-mapper: dm-log-clustered: Request timed
out on DM_CLOG_RESUME:1209962 - retrying
Apr 11 11:02:33 taft-02 kernel: device-mapper: dm-log-clustered: Request timed
out on DM_CLOG_RESUME:1209963 - retrying

Here are the messages from the node doing the fencing:
Apr 11 11:02:48 taft-01 clogd[6681]: 95:31) SEQ#=882307, UUID=AHec9A4F,
TYPE=DM_CLOG_GET_RESYNC_WORK, ORIG=1, RESP=YES, RSPR=1
Apr 11 11:02:48 taft-01 clogd[6681]: 96:32) SEQ#=882309, UUID=AHec9A4F,
TYPE=(null), ORIG=1, RESP=NO
Apr 11 11:02:48 taft-01 clogd[6681]: 97:33) SEQ#=882309, UUID=AHec9A4F,
TYPE=DM_CLOG_GET_RESYNC_WORK, ORIG=1, RESP=YES, RSPR=1
Apr 11 11:02:48 taft-01 clogd[6681]: 98:34) SEQ#=882311, UUID=AHec9A4F,
TYPE=(null), ORIG=1, RESP=NO
Apr 11 11:02:48 taft-01 clogd[6681]: 99:35) SEQ#=882311, UUID=AHec9A4F,
TYPE=DM_CLOG_GET_RESYNC_WORK, ORIG=1, RESP=YES, RSPR=1
Apr 11 11:02:48 taft-01 kernel: device-mapper: dm-log-clustered: Excessive delay
in request processing, 14 sec for DM_CLOG_SET_REGION_SYNC
Apr 11 11:03:18 taft-01 fenced[6643]: taft-02 not a cluster member after 30 sec
post_fail_delay
Apr 11 11:03:18 taft-01 fenced[6643]: fencing node "taft-02"

Version-Release number of selected component (if applicable):
2.6.18-88.el5
openais-0.80.3-15.el5
lvm2-2.02.32-3.el5
kmod-cmirror-0.1.8-1.el5

Comment 1 Kiersten (Kerri) Anderson 2008-09-09 15:33:15 UTC
Possibly blocking cluster mirror progress at this point,  Setting flags to get consideration/discussion for 5.3.

Comment 3 Steven Dake 2008-09-23 02:20:27 UTC
The ipc system is designed such that there is one thread that uses the poll system call in a loop.  When the outbound IPC kernel buffers are filled up, the poll thread is interrupted via the pthread_kill library function (which also sets POLLOUT) so that poll will allow fallthrough when outbound messages must be sent but the kernel queue is full and then becomes available.

It is possible with this design that the number of messages being sent to the ipc connection by openais_response_send occurs at a faster rate then the poll thread can be processed.  This causes poll to enter a loop always being interrupted by pthread_exit as new messages are queued for the ipc connection.  The result is that new outgoing messages are not sent in this state to the ipc connection, even though that ipc connection has buffer space available in the kernel queues for outgoing messages.

This problem is easily fixed by only sending a signal via pthread_exit once when the queue has filled to interrupt the blocking poll call instead of every time a new message is queued.

Comment 4 Steven Dake 2008-09-23 17:55:26 UTC
The cmirror clogd daemon has a priority inversion with openais.  The fix requires a change to clogd to run at sched_rr 99 and also for the ipc dispatch thread to run at sched_rr 99 to avoid starvation of the outbound processing queues.

Comment 5 Steven Dake 2008-09-23 17:56:08 UTC
openais problems fixed in version openais-0.80.3-19.el5

Comment 8 errata-xmlrpc 2009-01-20 20:46:50 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0074.html


Note You need to log in before you can comment on or make changes to this bug.