Bug 444171

Summary: RHEL5 cmirror tracker: potential memory leak during I/O load
Product: Red Hat Enterprise Linux 5 Reporter: Corey Marthaler <cmarthal>
Component: cmirror Assignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: low Docs Contact:
Priority: low    
Version: 5.2 CC: agk, ccaulfie, dwysocha, edamato, heinzm, mbroz
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2009-01-20 21:27:15 UTC Type: ---

Description Corey Marthaler 2008-04-25 16:30:45 UTC
Description of problem:
I had I/O going to 4 gfs/cmirrors from 3 nodes (hayes-0[123]) in order to verify
bz 383291, and we noticed the following:

hayes-01:
Apr 25 11:05:16 hayes-01 clogd[3287]: Pending log list:
Apr 25 11:23:43 hayes-01 clogd[3287]: cluster_queue: 0
Apr 25 11:23:43 hayes-01 clogd[3287]: free_queue   : 4
Apr 25 11:23:43 hayes-01 clogd[3287]: Official log list:
Apr 25 11:23:43 hayes-01 clogd[3287]:
LVM-SXcMdjs1qf8Bj8Ll0xpIubQFgb37CYrQo53PXJVXDI5ekfiBozoacbRznqRirsBM
Apr 25 11:23:43 hayes-01 clogd[3287]:
LVM-SXcMdjs1qf8Bj8Ll0xpIubQFgb37CYrQ4Sf6gF304fMKVrJeh9qMjIING7SygXjg
Apr 25 11:23:43 hayes-01 clogd[3287]:
LVM-SXcMdjs1qf8Bj8Ll0xpIubQFgb37CYrQJ1X0wS5GRkfpPcUcYuydGLfHe1qO6Eea
Apr 25 11:23:43 hayes-01 clogd[3287]:
LVM-SXcMdjs1qf8Bj8Ll0xpIubQFgb37CYrQEGrWZe3lkJU1rWo705Dek6cPGGryLz2U
Apr 25 11:23:43 hayes-01 clogd[3287]: Pending log list:
Apr 25 11:23:44 hayes-01 clogd[3287]: Invalid request_type
device-mapper: dm-log-clustered: Request timed out on DM_CLOG_IN_SYNC:2226747 -
retrying
Apr 25 11:23:59 hayes-01 kernel: device-mapper: dm-log-clustered: Request timed
out on DM_CLOG_IN_SYNC:22267g


hayes-02:
Apr 25 11:23:02 hayes-02 kernel: device-mapper: dm-log-clustered: Request timed
out on DM_CLOG_IN_SYNC:21890g
Apr 25 11:23:28 hayes-02 clogd[3292]: kernel_recv:  Preallocated transfer
structs exhausted
Apr 25 11:23:37 hayes-02 clogd[3292]: cluster_queue: 0
Apr 25 11:23:37 hayes-02 clogd[3292]: free_queue   : 3
Apr 25 11:23:37 hayes-02 clogd[3292]: Official log list:
Apr 25 11:23:37 hayes-02 clogd[3292]:
LVM-SXcMdjs1qf8Bj8Ll0xpIubQFgb37CYrQo53PXJVXDI5ekfiBozoacbRznqRirsBM
Apr 25 11:23:37 hayes-02 clogd[3292]:
LVM-SXcMdjs1qf8Bj8Ll0xpIubQFgb37CYrQ4Sf6gF304fMKVrJeh9qMjIING7SygXjg
Apr 25 11:23:37 hayes-02 clogd[3292]:
LVM-SXcMdjs1qf8Bj8Ll0xpIubQFgb37CYrQJ1X0wS5GRkfpPcUcYuydGLfHe1qO6Eea
Apr 25 11:23:37 hayes-02 clogd[3292]:
LVM-SXcMdjs1qf8Bj8Ll0xpIubQFgb37CYrQEGrWZe3lkJU1rWo705Dek6cPGGryLz2U
Apr 25 11:23:37 hayes-02 clogd[3292]: Pending log list:
Apr 25 11:27:35 hayes-02 clogd[3292]: Invalid request_type
device-mapper: dm-log-clustered: Request timed out on DM_CLOG_IN_SYNC:2197508 -
retrying
Apr 25 11:27:50 hayes-02 kernel: device-mapper: dm-log-clustered: Request timed
out on DM_CLOG_IN_SYNC:21975g
Apr 25 11:28:10 hayes-02 clogd[3292]: kernel_recv:  Preallocated transfer
structs exhausted


hayes-03:
Apr 25 11:17:43 hayes-03 clogd[3296]: kernel_recv:  Preallocated transfer
structs exhausted
Apr 25 11:23:01 hayes-03 clogd[3296]: kernel_recv:  Preallocated transfer
structs exhausted
Apr 25 11:24:20 hayes-03 clogd[3296]: cluster_queue: 0
Apr 25 11:24:20 hayes-03 clogd[3296]: free_queue   : 4
Apr 25 11:24:20 hayes-03 clogd[3296]: Official log list:
Apr 25 11:24:20 hayes-03 clogd[3296]:
LVM-SXcMdjs1qf8Bj8Ll0xpIubQFgb37CYrQo53PXJVXDI5ekfiBozoacbRznqRirsBM
Apr 25 11:24:20 hayes-03 clogd[3296]:
LVM-SXcMdjs1qf8Bj8Ll0xpIubQFgb37CYrQ4Sf6gF304fMKVrJeh9qMjIING7SygXjg
Apr 25 11:24:20 hayes-03 clogd[3296]:
LVM-SXcMdjs1qf8Bj8Ll0xpIubQFgb37CYrQJ1X0wS5GRkfpPcUcYuydGLfHe1qO6Eea
Apr 25 11:24:20 hayes-03 clogd[3296]:
LVM-SXcMdjs1qf8Bj8Ll0xpIubQFgb37CYrQEGrWZe3lkJU1rWo705Dek6cPGGryLz2U
Apr 25 11:24:20 hayes-03 clogd[3296]: Pending log list:
Apr 25 11:24:46 hayes-03 clogd[3296]: Invalid request_type
device-mapper: dm-log-clustered: Request timed out on
DM_CLOG_MARK_REGION:2520799 - retrying
Apr 25 11:25:01 hayes-03 kernel: device-mapper: dm-log-clustered: Request timed
out on DM_CLOG_MARK_REGION:2g
Apr 25 11:25:22 hayes-03 clogd[3296]: kernel_recv:  Preallocated transfer
structs exhausted


Version-Release number of selected component (if applicable):
2.6.18-90.el5
kmod-cmirror-0.1.8-1.el5
lvm2-2.02.32-4.el5
lvm2-cluster-2.02.32-4.el5

Comment 1 Corey Marthaler 2008-04-25 18:01:24 UTC
I was using xiogen/xdoio for the I/O load. There were no failure scenarios of
any kind taking place during this.

On each node in cluster:

/dev/mapper/hayes-mirror1 on /mnt/mirror1 type gfs
(rw,hostdata=jid=2:id=131073:first=0)
/dev/mapper/hayes-mirror2 on /mnt/mirror2 type gfs
(rw,hostdata=jid=2:id=262145:first=0)
/dev/mapper/hayes-mirror3 on /mnt/mirror3 type gfs
(rw,hostdata=jid=2:id=393217:first=0)
/dev/mapper/hayes-mirror4 on /mnt/mirror4 type gfs
(rw,hostdata=jid=2:id=524289:first=0)


for i in 1 2 3 4; do
  xiogen -f buffered -m random -s read,write,readv,writev -t 100 -T 1000000 \
    -F 1000000:/mnt/mirror$i/$(hostname) | xdoio -n 15 -kvD &
done

Comment 2 Jonathan Earl Brassow 2008-05-16 18:53:45 UTC
commit 4eca59fdacf347d0315eff78487b642e17be2de7
Author: Jonathan Brassow <jbrassow>
Date:   Fri May 16 13:16:23 2008 -0500

    clogd:  Fix 444171 [memory leak]

    If there was an invalid request to the cluster
    log server, it would ignore it.  This is fine, but it
    did not free the memory structure holding the invalid
    request.

    I've also taken this opportunity to explicitly state
    all valid requests, and let the 'default' case handle
    the errors.
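
The pattern described in the commit message can be sketched roughly as follows. This is not the actual clogd code: the `sketch_*` names, the request structure, and the live-request counter are all assumptions made up for illustration. It only shows the shape of the fix, where every valid request type is listed explicitly, the `default` case reports the error, and the request structure is freed on both paths.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical request types, loosely named after the DM_CLOG_* requests
 * seen in the logs. The real values are not taken from clogd. */
enum sketch_request_type {
    SKETCH_IN_SYNC = 1,
    SKETCH_MARK_REGION = 2
};

/* Hypothetical stand-in for the structure that holds one request. */
struct sketch_request {
    int request_type;
};

/* Requests allocated but not yet freed; nonzero when idle means a leak. */
int sketch_live_requests = 0;

struct sketch_request *sketch_request_alloc(int request_type)
{
    struct sketch_request *rq = malloc(sizeof(*rq));

    if (rq == NULL)
        return NULL;
    rq->request_type = request_type;
    sketch_live_requests++;
    return rq;
}

void sketch_request_free(struct sketch_request *rq)
{
    free(rq);
    sketch_live_requests--;
}

/*
 * Handle one request and release it. Returns 0 for a valid request and
 * -1 for an invalid one. The structure is freed on both paths, so an
 * ignored request no longer leaks.
 */
int sketch_handle_request(struct sketch_request *rq)
{
    int r = 0;

    assert(rq != NULL);

    switch (rq->request_type) {
    case SKETCH_IN_SYNC:
    case SKETCH_MARK_REGION:
        /* Real processing of a valid request would happen here. */
        break;
    default:
        /* Anything not listed above is treated as an error. */
        fprintf(stderr, "Invalid request_type\n");
        r = -1;
        break;
    }

    sketch_request_free(rq);
    return r;
}
```

In this sketch, passing a request with an unknown type to `sketch_handle_request()` returns -1 and still leaves `sketch_live_requests` at 0, which is the behavior the fix is after.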


Comment 5 errata-xmlrpc 2009-01-20 21:27:15 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0158.html