Bug 164959

Summary: CRM# 619256 RHEL4 lvm2 memory allocation failures locking up system with snapshots
Product: Red Hat Enterprise Linux 4
Reporter: Issue Tracker <tao>
Component: lvm2
Assignee: Alasdair Kergon <agk>
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: 4.0
CC: bugzilla.redhat, jch, joshkel, kbsingh, k.georgiou, ksorensen, menscher, modus-bugzilla, nospam, olle, rkenna, tao
Hardware: All
OS: Linux
Fixed In Version: RHBA-2006-0137
Doc Type: Bug Fix
Last Closed: 2006-03-07 21:33:38 UTC
Bug Depends On: 173163, 173164, 173166
Bug Blocks: 168429

Comment 6 Alasdair Kergon 2005-09-21 19:54:31 UTC
*** Bug 168970 has been marked as a duplicate of this bug. ***

Comment 7 Alasdair Kergon 2005-09-21 19:59:32 UTC
*** Bug 166975 has been marked as a duplicate of this bug. ***

Comment 8 Alasdair Kergon 2005-09-21 20:08:49 UTC
This is the RHEL4 version of bug 132057

Comment 11 Damian Menscher 2005-09-28 18:17:01 UTC
According to the other bug report, this problem has been "fully understood"
since January?  What's the holdup on getting it fixed?

Logs from a CentOS 4.1 machine:

Sep 28 03:00:03 astro kernel: lvcreate: page allocation failure. order:0, mode:0xd0
Sep 28 03:00:03 astro kernel:  [<c013fa77>] __alloc_pages+0x28b/0x29d
Sep 28 03:00:03 astro kernel:  [<f8884a3b>] alloc_pl+0x27/0x3d [dm_mod]
Sep 28 03:00:03 astro kernel:  [<f8884b16>] client_alloc_pages+0x15/0x47 [dm_mod]
Sep 28 03:00:03 astro kernel:  [<f88854b6>] kcopyd_client_create+0x64/0x9f [dm_mod]
Sep 28 03:00:03 astro kernel:  [<f884b697>] snapshot_ctr+0x231/0x2b8 [dm_snapshot]
Sep 28 03:00:03 astro kernel:  [<f8881185>] dm_table_add_target+0xfc/0x169 [dm_mod]
Sep 28 03:00:03 astro kernel:  [<f888320c>] populate_table+0x8a/0xaf [dm_mod]
Sep 28 03:00:03 astro kernel:  [<f8883268>] table_load+0x37/0x123 [dm_mod]
Sep 28 03:00:03 astro kernel:  [<f8883ce3>] ctl_ioctl+0xd1/0x144 [dm_mod] 
Sep 28 03:00:03 astro kernel:  [<f8883231>] table_load+0x0/0x123 [dm_mod]
Sep 28 03:00:03 astro kernel:  [<c0165b5e>] sys_ioctl+0x227/0x269
Sep 28 03:00:03 astro kernel:  [<c02c7377>] syscall_call+0x7/0xb
Sep 28 03:00:03 astro kernel: Mem-info:
Sep 28 03:00:03 astro kernel: DMA per-cpu:
Sep 28 03:00:03 astro kernel: cpu 0 hot: low 2, high 6, batch 1
Sep 28 03:00:03 astro kernel: cpu 0 cold: low 0, high 2, batch 1
Sep 28 03:00:03 astro kernel: cpu 1 hot: low 2, high 6, batch 1 
Sep 28 03:00:03 astro kernel: cpu 1 cold: low 0, high 2, batch 1
Sep 28 03:00:03 astro kernel: cpu 2 hot: low 2, high 6, batch 1 
Sep 28 03:00:03 astro kernel: cpu 2 cold: low 0, high 2, batch 1
Sep 28 03:00:03 astro kernel: cpu 3 hot: low 2, high 6, batch 1 
Sep 28 03:00:03 astro kernel: cpu 3 cold: low 0, high 2, batch 1
Sep 28 03:00:03 astro kernel: Normal per-cpu:
Sep 28 03:00:03 astro kernel: cpu 0 hot: low 32, high 96, batch 16 
Sep 28 03:00:03 astro kernel: cpu 0 cold: low 0, high 32, batch 16
Sep 28 03:00:03 astro kernel: cpu 1 hot: low 32, high 96, batch 16
Sep 28 03:00:03 astro kernel: cpu 1 cold: low 0, high 32, batch 16
Sep 28 03:00:03 astro kernel: cpu 2 hot: low 32, high 96, batch 16
Sep 28 03:00:03 astro kernel: cpu 2 cold: low 0, high 32, batch 16
Sep 28 03:00:03 astro kernel: cpu 3 hot: low 32, high 96, batch 16
Sep 28 03:00:03 astro kernel: cpu 3 cold: low 0, high 32, batch 16
Sep 28 03:00:03 astro kernel: HighMem per-cpu:
Sep 28 03:00:03 astro kernel: cpu 0 hot: low 14, high 42, batch 7
Sep 28 03:00:03 astro kernel: cpu 0 cold: low 0, high 14, batch 7
Sep 28 03:00:03 astro kernel: cpu 1 hot: low 14, high 42, batch 7
Sep 28 03:00:03 astro kernel: cpu 1 cold: low 0, high 14, batch 7
Sep 28 03:00:03 astro kernel: cpu 2 hot: low 14, high 42, batch 7
Sep 28 03:00:03 astro kernel: cpu 2 cold: low 0, high 14, batch 7
Sep 28 03:00:03 astro kernel: cpu 3 hot: low 14, high 42, batch 7
Sep 28 03:00:03 astro kernel: cpu 3 cold: low 0, high 14, batch 7
Sep 28 03:00:03 astro kernel: 
Sep 28 03:00:03 astro kernel: Free pages:       14836kB (280kB HighMem)
Sep 28 03:00:03 astro kernel: Active:197720 inactive:36044 dirty:48 writeback:0 unstable:0 free:3709 slab:15165 mapped:158110 pagetables:2309
Sep 28 03:00:03 astro kernel: DMA free:12636kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:53734 all_unreclaimable? yes
Sep 28 03:00:03 astro kernel: protections[]: 0 0 0
Sep 28 03:00:03 astro kernel: Normal free:1920kB min:928kB low:1856kB high:2784kB active:670208kB inactive:140912kB present:901120kB pages_scanned:0 all_unreclaimable? no
Sep 28 03:00:03 astro kernel: protections[]: 0 0 0
Sep 28 03:00:03 astro kernel: HighMem free:308kB min:128kB low:256kB high:384kB active:120672kB inactive:3264kB present:130496kB pages_scanned:0 all_unreclaimable? no
Sep 28 03:00:03 astro kernel: protections[]: 0 0 0
Sep 28 03:00:03 astro kernel: DMA: 1*4kB 3*8kB 4*16kB 4*32kB 4*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12636kB
Sep 28 03:00:03 astro kernel: Normal: 480*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1920kB
Sep 28 03:00:03 astro kernel: HighMem: 91*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 364kB
Sep 28 03:00:04 astro kernel: Swap cache: add 1685, delete 1683, find 453/671, race 0+0
Sep 28 03:00:04 astro kernel: Free swap:       2096248kB
Sep 28 03:00:04 astro kernel: 262000 pages of RAM
Sep 28 03:00:04 astro kernel: 32624 pages of HIGHMEM 
Sep 28 03:00:04 astro kernel: 3522 reserved pages
Sep 28 03:00:04 astro kernel: 169391 pages shared
Sep 28 03:00:04 astro kernel: 2 pages swap cached
Sep 28 03:00:04 astro kernel: device-mapper: Could not create kcopyd client
Sep 28 03:00:04 astro kernel: device-mapper: error adding target to table
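
The per-zone free-list lines near the end of the dump above can be checked by hand: each `N*SkB` term means N free blocks of S kB. The following is a minimal Python sketch of that arithmetic, not part of any LVM or kernel tooling; `buddy_total_kb` is a hypothetical helper name introduced only for illustration.

```python
import re

def buddy_total_kb(line):
    """Sum the 'count*sizekB' terms of one free-list line, in kB."""
    # Each term is <number of free blocks>*<block size>kB; the trailing
    # "= NNNNkB" total has no '*' and is therefore not matched.
    terms = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) * int(size) for count, size in terms)

# Free-list lines copied from the log above (wrapped lines rejoined).
dma = ("DMA: 1*4kB 3*8kB 4*16kB 4*32kB 4*64kB 1*128kB "
       "1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12636kB")
normal = ("Normal: 480*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB "
          "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1920kB")

for name, line in (("DMA", dma), ("Normal", normal)):
    print(name, buddy_total_kb(line), "kB")
```

The sums agree with the totals the kernel printed (12636 kB and 1920 kB). For the Normal zone, every free block is a single 4 kB page.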



Comment 12 Alasdair Kergon 2005-10-03 15:40:06 UTC
*** Bug 169162 has been marked as a duplicate of this bug. ***

Comment 20 Alasdair Kergon 2005-12-01 19:51:37 UTC
There are now some changes as follows:

The steps necessary to create or activate a snapshot have been resequenced so
that a shortage of memory should no longer cause the system to lock up.

Changes are being made to the kernel, lvm2 and device-mapper packages for U3.

The same amount of memory as before is still needed, but the memory now gets
reserved *before* the snapshot becomes live, rather than during critical parts
of the process where failure could leave the machine in an unusable state.
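
The reserve-before-going-live ordering described above can be illustrated with a toy userspace model. This is only a sketch of the general pattern, not the actual kernel, lvm2 or device-mapper change; `Snapshot`, `reserve_pages`, `activate` and `PAGE_SIZE` are hypothetical names, and the "free pages" count is passed in by hand rather than measured.

```python
PAGE_SIZE = 4096  # illustrative page size in bytes (an assumption)

class Snapshot:
    """Toy stand-in for a snapshot; not a real device-mapper object."""
    def __init__(self, name):
        self.name = name
        self.pages = []
        self.live = False

def reserve_pages(count, free_pages):
    """Return `count` buffers, or raise MemoryError if too few pages are free."""
    if count > free_pages:
        raise MemoryError(f"need {count} pages, only {free_pages} free")
    return [bytearray(PAGE_SIZE) for _ in range(count)]

def activate(snapshot, pages_needed, free_pages):
    # Step 1: reserve everything first. A failure here propagates to the
    # caller while the snapshot is still inactive, so nothing is half done.
    pages = reserve_pages(pages_needed, free_pages)
    # Step 2: go live. Nothing past this point allocates memory.
    snapshot.pages = pages
    snapshot.live = True

snap = Snapshot("example")
try:
    activate(snap, pages_needed=8, free_pages=4)   # not enough memory
except MemoryError:
    pass
print(snap.live)   # still inactive after the failed attempt
activate(snap, pages_needed=8, free_pages=16)      # enough memory
print(snap.live)
```

The first attempt fails cleanly and prints `False`; the second succeeds and prints `True`. The amount of memory required is the same in both orderings; only the point at which a failure can occur moves.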


Work will continue separately aimed at providing control over the amount of
memory used.


Comment 22 Marc Bejarano 2006-02-07 19:00:00 UTC
I just ran into this.  Are the current bits available for beta testing the fix?
Is this issue being tracked in another open bug tracker?

Comment 24 Red Hat Bugzilla 2006-03-07 18:40:35 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0099.html


Comment 25 Red Hat Bugzilla 2006-03-07 21:33:38 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0137.html