Bug 146696

Summary: possible metadata corruption trying to activate VG
Product: [Retired] Red Hat Cluster Suite
Reporter: Corey Marthaler <cmarthal>
Component: lvm2-cluster
Assignee: Alasdair Kergon <agk>
Status: CLOSED WORKSFORME
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4
CC: agk, kanderso
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-02-01 16:24:42 UTC

Description Corey Marthaler 2005-01-31 20:48:55 UTC
Description of problem:
This may be related to bz146056, but unlike bz146056 these appear to
be real errors, and afterwards clvmd commands hang indefinitely.

I've reproduced this a few times now during the second recovery
attempt.  The node being brought back up sees these errors during the
VG activation:
  Huge memory allocation (size 1124073490) rejected - metadata corruption?
  Huge memory allocation (size 1916097632) rejected - metadata corruption?
  Huge memory allocation (size 1332770115) rejected - metadata corruption?
  Huge memory allocation (size 959853662) rejected - metadata corruption?

Afterwards, other clvmd commands, like this vgdisplay, hang (strace excerpt):
.
.
.
stat64("/dev/hda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(3, 1),
...}) = 0
stat64("/dev/sda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 1),
...}) = 0
stat64("/dev/sdb1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 17),
...}) = 0
stat64("/dev/sdc1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 33),
...}) = 0
stat64("/dev/sdd1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 49),
...}) = 0
stat64("/dev/hda2", {st_mode=S_IFBLK|0660, st_rdev=makedev(3, 2),
...}) = 0
stat64("/dev/hda3", {st_mode=S_IFBLK|0660, st_rdev=makedev(3, 3),
...}) = 0
time(NULL)                              = 1107203815
getpid()                                = 3922
stat64("/etc/lvm/archive", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
stat64("/etc/lvm/backup", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
stat64("/etc/lvm/lvm.conf", {st_mode=S_IFREG|0644, st_size=10126,
...}) = 0
open("/usr/lib/liblvm2clusterlock.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0P\7\0\000"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0555, st_size=7932, ...}) = 0
old_mmap(NULL, 10500, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x1ba000
old_mmap(0x1bc000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED,
3, 0x1000) = 0x1bc000
close(3)                                = 0
socket(PF_UNIX, SOCK_STREAM, 0)         = 3
connect(3, {sa_family=AF_UNIX, path=@clvmd}, 110

Also, the device node appears in /dev/mapper but never becomes a usable block device:
[root@morph-02 root]# ls /dev/mapper/gfs-gfs1
/dev/mapper/gfs-gfs1
[root@morph-02 root]# mount -t gfs /dev/mapper/gfs-gfs1 /mnt/gfs1
mount: /dev/mapper/gfs-gfs1 is not a valid block device
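The mount failure above can be confirmed along these lines (a sketch; the path is the one from this report, and on another system the device-mapper state will of course differ). A stale node in /dev/mapper can exist even when activation never created the underlying device, so testing with `-b` distinguishes the two cases:

```shell
# Sketch: check whether /dev/mapper/gfs-gfs1 is a real block device or a
# stale node left behind by a failed activation (path taken from this report).
if [ -b /dev/mapper/gfs-gfs1 ]; then
    echo "block device present - activation completed"
else
    echo "not a block device - activation never completed"
fi
```

On the reporter's node this would print the second message, matching the `mount: ... is not a valid block device` error.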


Version-Release number of selected component (if applicable):
[root@morph-03 root]# clvmd -V
Cluster LVM daemon version: 2.01.02 (2005-01-21)
Protocol version:           0.2.1


How reproducible:
Sometimes

Comment 1 Kiersten (Kerri) Anderson 2005-02-01 15:32:48 UTC
Adding to release blocker list

Comment 2 Christine Caulfield 2005-02-01 16:15:58 UTC
I can't reproduce this, but the LVM errors at the top of the bug report
look quite nasty - Alasdair??

Comment 3 Alasdair Kergon 2005-02-01 17:13:58 UTC
Not enough info, but the usual cause of those memory errors is corrupt lvm1
metadata.  We need -vvvv (or lvm2 logfile) output from the command giving the
memory error, or a dump of the data at the start of the device whose read
triggered it.
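The diagnostics requested above could be gathered roughly as follows (a sketch: the device path, VG name, and output filenames are placeholders; here a loopback-style image file stands in for the real PV so the example is self-contained):

```shell
# Stand-in for the real PV (e.g. /dev/sda1) so this sketch runs anywhere.
DEV=./fake_pv.img
dd if=/dev/zero of="$DEV" bs=1M count=1 2>/dev/null

# 1. Re-run the failing command with maximum verbosity (assumption: vgchange
#    triggered the error; use whichever command actually reported it):
#      vgchange -ay -vvvv gfs 2> vgchange-vvvv.log

# 2. Dump the start of the device, where LVM keeps its label and metadata
#    area, for inspection/attachment to the bug:
dd if="$DEV" of=pv-header.dump bs=512 count=128 2>/dev/null
wc -c pv-header.dump    # prints "65536 ./pv-header.dump"-style size check
```

Attaching both the verbose log and the header dump would let the metadata parse failure be traced to the exact on-disk bytes.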

Comment 4 Kiersten (Kerri) Anderson 2005-02-23 17:40:26 UTC
Removing from blocker list.  If it turns out to be reproducible, it will go
back on the list at that time.

Comment 5 Corey Marthaler 2005-03-14 21:50:26 UTC
Added a more descriptive summary.
This issue was only seen that one time.


Comment 6 Alasdair Kergon 2006-02-01 16:24:42 UTC
A year has passed without seeing this again, so closing it for now.