Bug 519228 - Potential memory leak when stressing clvm activation operations
Summary: Potential memory leak when stressing clvm activation operations
Keywords:
Status: CLOSED DUPLICATE of bug 543146
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: lvm2-cluster
Version: 5.5
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: Christine Caulfield
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-08-25 18:17 UTC by Corey Marthaler
Modified: 2010-01-12 04:10 UTC
CC: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-12-02 08:06:45 UTC
Target Upstream Version:
Embargoed:



Description Corey Marthaler 2009-08-25 18:17:43 UTC
Description of problem:
I've reproduced this quite a few times now: if I let activator run for many days, it eventually ends up causing an OOM situation.

Activator is a test that basically just activates and deactivates volume groups containing clustered volumes (linears, stripes, mirrors) over and over.

I'll try running this with 'clvmd -d' and see if that gives us more info.
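
For reference, one iteration of the test boils down to roughly the following (just a sketch for illustration; the real test drives these commands on different cluster nodes via qarsh and does more verification, as the output below shows):

  # Sketch of one activator iteration (illustration only; the actual test
  # runs each command on a different node via qarsh).
  for vg in activator1 activator2 activator3; do
      vgchange -an $vg        # deactivate the clustered VG
  done
  dmsetup ls | grep -q activator && echo "volumes still present in device-mapper" >&2
  for vg in activator1 activator2 activator3; do
      vgchange -ay $vg        # reactivate the clustered VG
  done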

Iteration 7903 of 10000 started at Mon Aug 24 18:54:14 CDT 2009
Deactivating volume activator1 on grant-02                     
Deactivating volume activator2 on grant-03                     
Deactivating volume activator3 on grant-01                     
Verifying that all volumes are deactivated                     
Verifying that all volumes no longer exist in device-mapper    
Reactivating volume activator1 on grant-01                     
Reactivating volume activator2 on grant-03                     
Reactivating volume activator3 on grant-02                     

Iteration 7904 of 10000 started at Mon Aug 24 18:55:14 CDT 2009
Deactivating volume activator1 on grant-02
Deactivating volume activator2 on grant-01
Deactivating volume activator3 on grant-01
Verifying that all volumes are deactivated
Verifying that all volumes no longer exist in device-mapper
Reactivating volume activator1 on grant-03
Reactivating volume activator2 on grant-02
Reactivating volume activator3 on grant-03
  Error locking on node grant-03: Command timed out
  Error writing data to clvmd: Broken pipe
  Internal error: Volume Group activator3 was not unlocked
  Device '/dev/sdc3' has been left open.
  Device '/dev/sdc3' has been left open.
  Device '/dev/sdc2' has been left open.
  Device '/dev/sdc2' has been left open.
  Device '/dev/sdc6' has been left open.
  Device '/dev/sdc1' has been left open.
  Device '/dev/sdc1' has been left open.
  Device '/dev/sdc5' has been left open.
  Device '/dev/sdc5' has been left open.
  Device '/dev/sdc5' has been left open.
  Device '/dev/sdc1' has been left open.
  Device '/dev/sdc6' has been left open.
  Device '/dev/sdc6' has been left open.
  Device '/dev/sdc2' has been left open.
  Device '/dev/sdc3' has been left open.

Iteration 7905 of 10000 started at Mon Aug 24 18:57:44 CDT 2009
Deactivating volume activator1 on grant-03
  connect() failed on local socket: Connection refused
  WARNING: Falling back to local file-based locking.
  Volume Groups with the clustered attribute will be inaccessible.
  Skipping clustered volume group activator1
couldn't deactivate on grant-03


Aug 24 18:56:23 grant-03 lvm[12709]: activator3-mirror2 is now in-sync
Aug 24 18:56:25 grant-03 lvm[12709]: Monitoring mirror device activator3-mirror3 for events
Aug 24 18:56:25 grant-03 lvm[12709]: activator3-mirror3 is now in-sync
Aug 24 18:56:28 grant-03 lvm[12709]: Monitoring mirror device activator3-mirror4 for events
Aug 24 18:56:28 grant-03 lvm[12709]: activator3-mirror4 is now in-sync
Aug 24 18:57:53 grant-03 kernel: btimed invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0
Aug 24 18:57:53 grant-03 kernel:
Aug 24 18:57:53 grant-03 kernel: Call Trace:
Aug 24 18:57:53 grant-03 kernel:  [<ffffffff800c5b10>] out_of_memory+0x8e/0x2f3
Aug 24 18:57:53 grant-03 kernel:  [<ffffffff8000f487>] __alloc_pages+0x245/0x2ce
Aug 24 18:57:53 grant-03 kernel:  [<ffffffff80032768>] read_swap_cache_async+0x45/0xd8
Aug 24 18:57:53 grant-03 kernel:  [<ffffffff8000926e>] __handle_mm_fault+0xadc/0xf99
Aug 24 18:57:53 grant-03 kernel:  [<ffffffff880c4732>] :libata:ata_sff_pause+0x9/0x14
Aug 24 18:57:53 grant-03 kernel:  [<ffffffff880c6081>] :libata:ata_sff_qc_issue+0x11a/0x287
Aug 24 18:57:53 grant-03 kernel:  [<ffffffff880b8bd0>] :libata:ata_qc_issue+0x516/0x58e
Aug 24 18:57:53 grant-03 kernel:  [<ffffffff80066b25>] do_page_fault+0x4cb/0x830
Aug 24 18:57:53 grant-03 kernel:  [<ffffffff880bd0df>] :libata:ata_scsi_translate+0x140/0x16d
Aug 24 18:57:53 grant-03 kernel:  [<ffffffff8008b63f>] dequeue_task+0x18/0x37
Aug 24 18:56:49 grant-03 openais[12425]: [TOTEM] entering GATHER state from 12.
Aug 24 18:57:58 grant-03 kernel:  [<ffffffff80062fc8>] thread_return+0x62/0xfe
Aug 24 18:57:58 grant-03 openais[12425]: [TOTEM] Saving state aru 684320 high seq received 684320


Version-Release number of selected component (if applicable):
2.6.18-162.el5

lvm2-2.02.46-8.el5    BUILT: Thu Jun 18 08:06:12 CDT 2009
lvm2-cluster-2.02.46-8.el5    BUILT: Thu Jun 18 08:05:27 CDT 2009
cmirror-1.1.39-2.el5    BUILT: Mon Jul 27 15:39:05 CDT 2009
kmod-cmirror-0.1.22-1.el5    BUILT: Mon Jul 27 15:28:46 CDT 2009


How reproducible:
Eventually, every time.

Comment 1 Corey Marthaler 2009-08-31 14:46:35 UTC
Reproduced this again, but 'clvmd -d' didn't really provide any useful information on the node that hit OOM.
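
If it helps, one way to confirm that clvmd itself is the process growing would be to sample its RSS between iterations, something like this (just a sketch, not something that was run as part of this reproduction):

  # Sample clvmd's resident set size once a minute (sketch; assumes a single
  # clvmd process and a standard /proc layout).
  while pid=$(pidof clvmd); do
      echo "$(date '+%b %d %H:%M:%S') $(grep VmRSS /proc/$pid/status)"
      sleep 60
  done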

Iteration 7908 of 10000 started at Mon Aug 31 09:04:52 CDT 2009
Deactivating volume activator1 on grant-03
Didn't receive heartbeat for 120 seconds
couldn't deactivate on grant-03


Aug 31 09:05:05 grant-03 lvm[9756]: activator3-mirror3 is now in-sync
Aug 31 09:05:07 grant-03 lvm[9756]: Monitoring mirror device activator3-mirror4 for events
Aug 31 09:05:07 grant-03 lvm[9756]: activator3-mirror4 is now in-sync
Aug 31 09:05:09 grant-03 lvm[9756]: Monitoring mirror device activator3-mirror5 for events
Aug 31 09:05:09 grant-03 lvm[9756]: activator3-mirror5 is now in-sync
Aug 31 09:05:09 grant-03 xinetd[9183]: START: qarsh pid=11618 from=10.15.80.47
Aug 31 09:05:09 grant-03 qarshd[11618]: Talking to peer 10.15.80.47:46436
Aug 31 09:05:09 grant-03 qarshd[11618]: Running cmdline: vgchange -an activator1
Aug 31 09:05:28 grant-03 openais[9582]: [TOTEM] entering GATHER state from 12.
Aug 31 09:05:29 grant-03 openais[9582]: [TOTEM] Saving state aru a317 high seq received a317
Aug 31 09:05:29 grant-03 openais[9582]: [TOTEM] Storing new sequence id for ring 994
Aug 31 09:05:29 grant-03 openais[9582]: [TOTEM] entering COMMIT state.
Aug 31 09:05:29 grant-03 openais[9582]: [TOTEM] entering RECOVERY state.
Aug 31 09:05:29 grant-03 openais[9582]: [TOTEM] position [0] member 10.15.89.151:
Aug 31 09:05:29 grant-03 openais[9582]: [TOTEM] previous ring seq 2448 rep 10.15.89.151
Aug 31 09:05:29 grant-03 openais[9582]: [TOTEM] aru a318 high delivered a318 received flag 1
Aug 31 09:05:29 grant-03 openais[9582]: [TOTEM] position [1] member 10.15.89.152:
Aug 31 09:05:29 grant-03 openais[9582]: [TOTEM] previous ring seq 2448 rep 10.15.89.151
Aug 31 09:05:29 grant-03 openais[9582]: [TOTEM] aru a318 high delivered a318 received flag 1
Aug 31 09:05:29 grant-03 openais[9582]: [TOTEM] position [2] member 10.15.89.153:
Aug 31 09:05:29 grant-03 openais[9582]: [TOTEM] previous ring seq 2448 rep 10.15.89.151
Aug 31 09:05:31 grant-03 openais[9582]: [TOTEM] aru a317 high delivered a317 received flag 0
Aug 31 09:05:31 grant-03 openais[9582]: [TOTEM] Did not need to originate any messages in recovery.
Aug 31 09:05:31 grant-03 openais[9582]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 31 09:05:32 grant-03 openais[9582]: [CLM  ] New Configuration:
Aug 31 09:05:32 grant-03 openais[9582]: [CLM  ]         r(0) ip(10.15.89.151)
Aug 31 09:05:32 grant-03 openais[9582]: [CLM  ]         r(0) ip(10.15.89.152)
Aug 31 09:05:32 grant-03 openais[9582]: [CLM  ]         r(0) ip(10.15.89.153)
Aug 31 09:05:32 grant-03 openais[9582]: [CLM  ] Members Left:
Aug 31 09:05:32 grant-03 openais[9582]: [CLM  ] Members Joined:
Aug 31 09:05:32 grant-03 openais[9582]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 31 09:05:32 grant-03 openais[9582]: [CLM  ] New Configuration:
Aug 31 09:05:32 grant-03 openais[9582]: [CLM  ]         r(0) ip(10.15.89.151)
Aug 31 09:05:32 grant-03 openais[9582]: [CLM  ]         r(0) ip(10.15.89.152)
Aug 31 09:05:32 grant-03 openais[9582]: [CLM  ]         r(0) ip(10.15.89.153)
Aug 31 09:05:32 grant-03 openais[9582]: [CLM  ] Members Left:
Aug 31 09:05:32 grant-03 openais[9582]: [CLM  ] Members Joined:
Aug 31 09:05:32 grant-03 openais[9582]: [SYNC ] This node is within the primary component and will provide service.
Aug 31 09:05:32 grant-03 openais[9582]: [TOTEM] entering OPERATIONAL state.
Aug 31 09:05:32 grant-03 openais[9582]: [CLM  ] got nodejoin message 10.15.89.153
Aug 31 09:05:32 grant-03 openais[9582]: [CLM  ] got nodejoin message 10.15.89.151
Aug 31 09:05:32 grant-03 openais[9582]: [CLM  ] got nodejoin message 10.15.89.152
Aug 31 09:05:32 grant-03 openais[9582]: [CPG  ] got joinlist message from node 1
Aug 31 09:05:32 grant-03 openais[9582]: [CPG  ] got joinlist message from node 2
Aug 31 09:05:32 grant-03 openais[9582]: [CPG  ] got joinlist message from node 3
Aug 31 09:05:37 grant-03 lvm[9756]: No longer monitoring mirror device activator1-mirror1 for events
Aug 31 09:42:01 grant-03 syslogd 1.4.1: restart.



CLVMD[431d2940]: Aug 31 09:05:37 LVM thread waiting for work
CLVMD[d4f42820]: Aug 31 09:05:37 Read on local socket 10, len = 85
CLVMD[d4f42820]: Aug 31 09:05:37 check_all_clvmds_running
CLVMD[43bd3940]: Aug 31 09:05:37 Got pre command condition...
CLVMD[43bd3940]: Aug 31 09:05:37 Writing status 0 down pipe 12
CLVMD[43bd3940]: Aug 31 09:05:37 Waiting to do post command - state = 0
CLVMD[d4f42820]: Aug 31 09:05:37 read on PIPE 11: 4 bytes: status: 0
CLVMD[d4f42820]: Aug 31 09:05:37 background routine status was 0, sock_client=0x2aac0c6c3e40
CLVMD[d4f42820]: Aug 31 09:05:37 distribute command: XID = 43604
CLVMD[d4f42820]: Aug 31 09:05:37 add_to_lvmqueue: cmd=0x2aac07f4e7d0. client=0x2aac0c6c3e40, msg=0x2aac0c6c3540, len=85, csid=(nil), xid=43604
CLVMD[d4f42820]: Aug 31 09:05:37 Sending message to all cluster nodes
CLVMD[431d2940]: Aug 31 09:05:37 process_work_item: local
CLVMD[431d2940]: Aug 31 09:05:37 process_local_command: LOCK_LV (0x32) msg=0x2aac0c6c3f50, msglen =85, client=0x2aac0c6c3e40
CLVMD[431d2940]: Aug 31 09:05:37 do_lock_lv: resource 'QZThNqybhmuZkq2bWm63sRGQ69G8aiBxvVsVIxwpA4ebewLw6NaVthm3K0xFVGMt', cmd = 0x18 LCK_LV_DEACTIVATE (NULL|LV|NONBLOCK), flags = 0x84 (DMEVENTD_MONITOR )
CLVMD[d4f42820]: Aug 31 09:05:37 Reply from node grant-02: 0 bytes
CLVMD[d4f42820]: Aug 31 09:05:37 Got 1 replies, expecting: 3
CLVMD[d4f42820]: Aug 31 09:05:37 Reply from node grant-01: 0 bytes
CLVMD[d4f42820]: Aug 31 09:05:37 Got 2 replies, expecting: 3

Comment 2 Milan Broz 2009-12-02 08:06:45 UTC
I am almost sure that if this happens during activation, it is the leak from bug 543146.

*** This bug has been marked as a duplicate of bug 543146 ***

