Bug 146696

Summary: possible metadata corruption trying to activate VG
Product: [Retired] Red Hat Cluster Suite
Reporter: Corey Marthaler <cmarthal>
Component: lvm2-cluster
Assignee: Alasdair Kergon <agk>
Status: CLOSED WORKSFORME
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4
CC: agk, kanderso
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-02-01 16:24:42 UTC

Description Corey Marthaler 2005-01-31 20:48:55 UTC
Description of problem:
This may be related to bz146056, but unlike bz146056 these appear to
be real errors, and afterwards clvmd commands hang indefinitely.

I've reproduced this a few times now during the second recovery
attempt.  The node being brought back up sees these errors during the
VG activation:
  Huge memory allocation (size 1124073490) rejected - metadata corruption?
  Huge memory allocation (size 1916097632) rejected - metadata corruption?
  Huge memory allocation (size 1332770115) rejected - metadata corruption?
  Huge memory allocation (size 959853662) rejected - metadata corruption?

Afterwards, other clvmd commands, like this vgdisplay, hang (strace excerpt):
.
.
.
stat64("/dev/hda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(3, 1),
...}) = 0
stat64("/dev/sda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 1),
...}) = 0
stat64("/dev/sdb1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 17),
...}) = 0
stat64("/dev/sdc1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 33),
...}) = 0
stat64("/dev/sdd1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 49),
...}) = 0
stat64("/dev/hda2", {st_mode=S_IFBLK|0660, st_rdev=makedev(3, 2),
...}) = 0
stat64("/dev/hda3", {st_mode=S_IFBLK|0660, st_rdev=makedev(3, 3),
...}) = 0
time(NULL)                              = 1107203815
getpid()                                = 3922
stat64("/etc/lvm/archive", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
stat64("/etc/lvm/backup", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
stat64("/etc/lvm/lvm.conf", {st_mode=S_IFREG|0644, st_size=10126,
...}) = 0
open("/usr/lib/liblvm2clusterlock.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0P\7\0\000"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0555, st_size=7932, ...}) = 0
old_mmap(NULL, 10500, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x1ba000
old_mmap(0x1bc000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED,
3, 0x1000) = 0x1bc000
close(3)                                = 0
socket(PF_UNIX, SOCK_STREAM, 0)         = 3
connect(3, {sa_family=AF_UNIX, path=@clvmd}, 110

Also, the device node appears in /dev/mapper but never becomes a usable block device:
[root@morph-02 root]# ls /dev/mapper/gfs-gfs1
/dev/mapper/gfs-gfs1
[root@morph-02 root]# mount -t gfs /dev/mapper/gfs-gfs1 /mnt/gfs1
mount: /dev/mapper/gfs-gfs1 is not a valid block device
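The mount failure above can be confirmed along these lines (a sketch; the path is the one from this report, and on another system the device-mapper state will of course differ). A stale node in /dev/mapper can exist even when activation never created the underlying device, so testing with `-b` distinguishes the two cases:

```shell
# Sketch: check whether /dev/mapper/gfs-gfs1 is a real block device or a
# stale node left behind by a failed activation (path taken from this report).
if [ -b /dev/mapper/gfs-gfs1 ]; then
    echo "block device present - activation completed"
else
    echo "not a block device - activation never completed"
fi
```

On the reporter's node this would print the second message, matching the `mount: ... is not a valid block device` error.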


Version-Release number of selected component (if applicable):
[root@morph-03 root]# clvmd -V
Cluster LVM daemon version: 2.01.02 (2005-01-21)
Protocol version:           0.2.1


How reproducible:
Sometimes

Comment 1 Kiersten (Kerri) Anderson 2005-02-01 15:32:48 UTC
Adding to release blocker list

Comment 2 Christine Caulfield 2005-02-01 16:15:58 UTC
I can't reproduce this, but the LVM errors at the top of the bug report
look quite nasty - Alasdair??

Comment 3 Alasdair Kergon 2005-02-01 17:13:58 UTC
Not enough info, but the usual cause of those memory errors is corrupt lvm1
metadata.  We need -vvvv (or lvm2 logfile) output from the command giving the
memory error, or a dump of the data at the start of the device whose read
triggered it.
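The diagnostics requested above could be gathered roughly as follows (a sketch: the device path, VG name, and output filenames are placeholders; here a loopback-style image file stands in for the real PV so the example is self-contained):

```shell
# Stand-in for the real PV (e.g. /dev/sda1) so this sketch runs anywhere.
DEV=./fake_pv.img
dd if=/dev/zero of="$DEV" bs=1M count=1 2>/dev/null

# 1. Re-run the failing command with maximum verbosity (assumption: vgchange
#    triggered the error; use whichever command actually reported it):
#      vgchange -ay -vvvv gfs 2> vgchange-vvvv.log

# 2. Dump the start of the device, where LVM keeps its label and metadata
#    area, for inspection/attachment to the bug:
dd if="$DEV" of=pv-header.dump bs=512 count=128 2>/dev/null
wc -c pv-header.dump    # prints "65536 ./pv-header.dump"-style size check
```

Attaching both the verbose log and the header dump would let the metadata parse failure be traced to the exact on-disk bytes.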

Comment 4 Kiersten (Kerri) Anderson 2005-02-23 17:40:26 UTC
Removing from blocker list.  If it turns out to be reproducible, it will go
back on the list at that time.

Comment 5 Corey Marthaler 2005-03-14 21:50:26 UTC
Added a more descriptive summary.
This issue was only seen that one time.


Comment 6 Alasdair Kergon 2006-02-01 16:24:42 UTC
A year has passed without seeing this again, so closing it for now.