Description of problem: During cmirror device failure testing on the latest rhel5 builds, I saw an issue where, after a 'Library queue is full' message on taft-02, that node was fenced because dm-log-clustered timed out.

Here's what the test was doing:

Enabling device sdg on taft-01
Enabling device sdg on taft-02
Enabling device sdg on taft-03
Enabling device sdg on taft-04
Recreating PV /dev/sdg1
  WARNING: Volume group helter_skelter is not consistent
  WARNING: Volume Group helter_skelter is not consistent
  WARNING: Volume group helter_skelter is not consistent
Extending the recreated PV back into VG helter_skelter
Since we can't yet up convert existing mirrors, down converting to linear(s) on taft-02 before re-converting back to original mirror(s)
Up converting linear(s) back to mirror(s) on taft-02...
taft-02: lvconvert -m 2 -b helter_skelter/nonsyncd_secondary_3legs_1 /dev/sde1:0-1000 /dev/sdg1:0-1000 /dev/sdh1:0-1000 /dev/sdf1:0-150
<fail name="taft-02_1" pid="27866" time="Fri Apr 11 11:07:08 2008" type="cmd" duration="1207930028" ec="127" />
ALL STOP! Didn't receive heartbeat for 120 seconds
couldn't up convert mirror nonsyncd_secondary_3legs_1 on taft-02

Here are the failure messages on taft-02:

Apr 11 11:01:46 taft-02 qarshd[10829]: Running cmdline: lvconvert -m 2 -b helter_skelter/nonsyncd_secondary_3legs_1 /dev/sde1:0-1000 /dev/sdg1:0-1000 /dev/sdh1:0-1000 /dev/sdf1:0-150
Apr 11 11:01:57 taft-02 openais[6685]: [MAIN ] Library queue is full, disconnecting library connection.
Apr 11 11:02:03 taft-02 kernel: device-mapper: dm-log-clustered: Request timed out on DM_CLOG_RESUME:1209961 - retrying
Apr 11 11:02:18 taft-02 kernel: device-mapper: dm-log-clustered: Request timed out on DM_CLOG_RESUME:1209962 - retrying
Apr 11 11:02:33 taft-02 kernel: device-mapper: dm-log-clustered: Request timed out on DM_CLOG_RESUME:1209963 - retrying

Here are the messages from the node doing the fencing:

Apr 11 11:02:48 taft-01 clogd[6681]: 95:31) SEQ#=882307, UUID=AHec9A4F, TYPE=DM_CLOG_GET_RESYNC_WORK, ORIG=1, RESP=YES, RSPR=1
Apr 11 11:02:48 taft-01 clogd[6681]: 96:32) SEQ#=882309, UUID=AHec9A4F, TYPE=(null), ORIG=1, RESP=NO
Apr 11 11:02:48 taft-01 clogd[6681]: 97:33) SEQ#=882309, UUID=AHec9A4F, TYPE=DM_CLOG_GET_RESYNC_WORK, ORIG=1, RESP=YES, RSPR=1
Apr 11 11:02:48 taft-01 clogd[6681]: 98:34) SEQ#=882311, UUID=AHec9A4F, TYPE=(null), ORIG=1, RESP=NO
Apr 11 11:02:48 taft-01 clogd[6681]: 99:35) SEQ#=882311, UUID=AHec9A4F, TYPE=DM_CLOG_GET_RESYNC_WORK, ORIG=1, RESP=YES, RSPR=1
Apr 11 11:02:48 taft-01 kernel: device-mapper: dm-log-clustered: Excessive delay in request processing, 14 sec for DM_CLOG_SET_REGION_SYNC
Apr 11 11:03:18 taft-01 fenced[6643]: taft-02 not a cluster member after 30 sec post_fail_delay
Apr 11 11:03:18 taft-01 fenced[6643]: fencing node "taft-02"

Version-Release number of selected component (if applicable):
2.6.18-88.el5
openais-0.80.3-15.el5
lvm2-2.02.32-3.el5
kmod-cmirror-0.1.8-1.el5
This is possibly blocking cluster mirror progress at this point, so I'm setting flags to get consideration/discussion for 5.3.
The ipc system is designed such that a single thread calls poll in a loop. When the outbound IPC kernel buffers fill up, the poll thread is interrupted via the pthread_kill library function (which also sets POLLOUT) so that poll falls through when outbound messages must be sent while the kernel queue is full and later becomes available. With this design, it is possible for openais_response_send to queue messages to the ipc connection faster than the poll thread can process them. This causes poll to enter a loop in which it is constantly interrupted by pthread_kill as new messages are queued for the ipc connection. The result is that no new outgoing messages are sent to the ipc connection in this state, even though that ipc connection has buffer space available in the kernel queues for outgoing messages. This problem is easily fixed by sending the pthread_kill signal only once, when the queue fills, to interrupt the blocking poll call, instead of every time a new message is queued.
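The effect of the fix can be sketched in a few lines of C. This is an illustrative model, not the openais source: conn_info, queue_msg_signal_always, and queue_msg_signal_once are hypothetical names, and signals_sent stands in for the pthread_kill calls that would interrupt the blocking poll.

```c
#include <assert.h>

/* Hypothetical model of the outbound queue of one ipc connection. */
struct conn_info {
	int queued;        /* outbound messages waiting on POLLOUT */
	int signals_sent;  /* times the poll thread was interrupted */
};

/* Buggy behaviour: interrupt poll on every enqueue.  A sender that
 * queues faster than poll can drain keeps poll spinning on signal
 * delivery instead of writing to the connection. */
static void queue_msg_signal_always(struct conn_info *c)
{
	c->queued++;
	c->signals_sent++;  /* would be pthread_kill(poll_thread, ...) */
}

/* Fixed behaviour: interrupt poll only on the transition to a
 * non-empty backlog; after that, poll already knows it has work. */
static void queue_msg_signal_once(struct conn_info *c)
{
	c->queued++;
	if (c->queued == 1)
		c->signals_sent++;  /* single pthread_kill per backlog */
}
```

Queueing 1000 messages under the old scheme interrupts poll 1000 times; under the fixed scheme it interrupts poll once, leaving it free to drain the kernel queue.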
The cmirror clogd daemon has a priority inversion with openais. The fix requires changing clogd to run at SCHED_RR priority 99, and also running the ipc dispatch thread at SCHED_RR priority 99, to avoid starvation of the outbound processing queues.
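A minimal sketch of such a priority change, assuming standard POSIX scheduling APIs (this is not the actual clogd source; set_rr_99 is an illustrative name). Real-time scheduling requires root (CAP_SYS_NICE), so a caller must be prepared for EPERM.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Raise the given thread to SCHED_RR priority 99, as the fix
 * described above does for clogd and the ipc dispatch thread.
 * Returns 0 on success or an errno value (e.g. EPERM without
 * CAP_SYS_NICE). */
static int set_rr_99(pthread_t thread)
{
	struct sched_param param;

	memset(&param, 0, sizeof(param));
	param.sched_priority = 99;  /* priority used by the fix */

	/* For the whole process, one could instead call
	 * sched_setscheduler(0, SCHED_RR, &param). */
	return pthread_setschedparam(thread, SCHED_RR, &param);
}
```

With both the daemon and its ipc dispatch thread at the same real-time priority, neither can be starved by the other's work on the outbound queues, which is what resolves the inversion.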
openais problems fixed in version openais-0.80.3-19.el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-0074.html