Bug 149542

Summary: clvmd segfault on vgchange -an
Product: [Retired] Red Hat Cluster Suite
Component: lvm2-cluster
Version: 4
Hardware: All
OS: Linux
Status: CLOSED WORKSFORME
Severity: medium
Priority: medium
Reporter: Derek Anderson <danderso>
Assignee: Christine Caulfield <ccaulfie>
QA Contact: Cluster QE <mspqa-list>
CC: agk
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Last Closed: 2006-02-01 16:46:51 UTC

Description Derek Anderson 2005-02-23 21:14:49 UTC
Description of problem:
I'll need to try to narrow this down, but I'm logging everything here
before I forget something.

Setup: 4-node cluster.  link-10, link-11, and link-12 are x86; link-08
is x86_64 Opteron.

Created a pool device and GFS filesystem from another node attached to
this cluster.  Unmounted and unassembled the pool.  On the 6.1 cluster
(from link-12) I ran vgconvert -M2 to convert it to an LVM2 volume and
used gfs_tool to update the lock type to lock_dlm.  Mounted it and ran
some traffic (placemaker) during a meeting (~45 mins).  After that I
unmounted it and ran the new gfs_fsck on it twice.  I then ran
"vgchange -an" on link-12.  I had two other existing LVM2 volumes
managed by the cluster: /dev/VG1/LV1 and /dev/VG2/LV2.
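
For reference, a rough sketch of the command sequence (the logical
volume name under POOLIO1 and the mount point below are placeholders,
and the exact gfs_tool invocation used may have differed from what is
shown):

  # convert the pool-derived volume group to LVM2 metadata
  vgconvert -M2 POOLIO1
  # switch the GFS lock protocol to lock_dlm while the filesystem is unmounted
  gfs_tool sb /dev/POOLIO1/<lv> proto lock_dlm
  mount -t gfs /dev/POOLIO1/<lv> /mnt/gfs
  # ... ~45 minutes of placemaker I/O ...
  umount /mnt/gfs
  gfs_fsck /dev/POOLIO1/<lv>        # run twice
  vgchange -an                      # produced the errors shown below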

On link-12:
[root@link-12 /]# vgchange -an
  0 logical volume(s) in volume group "VG2" now active
  EOF reading CLVMD
  1 logical volume(s) in volume group "POOLIO1" now active
  Error writing data to clvmd: Broken pipe
  Error writing data to clvmd: Broken pipe
  Can't lock VG1: skipping
[root@link-12 /]# echo $?
0

Got this message in link-08 /var/log/messages:
Feb 23 15:04:08 link-08 udev[16927]: removing device node '/dev/dm-3'
Feb 23 15:04:08 link-08 kernel: clvmd[2287]: segfault at
000000335952d700 rip 00000032593684bd rsp 00000000413fc360 error 4
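
No core file or backtrace was captured.  If this recurs, something
along these lines (a generic sketch, assuming clvmd is installed as
/usr/sbin/clvmd) would turn the faulting address into a usable stack
trace:

  # allow clvmd to dump core, then restart it under that limit
  ulimit -c unlimited
  clvmd
  # after the next segfault, open the core in gdb and grab a backtrace
  gdb /usr/sbin/clvmd /path/to/core
  (gdb) bt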

Now clvmd is dead on link-08, but the cluster still thinks it is
running:
[root@link-08 tmp]# clu_state
Node  Votes Exp Sts  Name
   8    1    4   M   link-08-pvt
  10    1    4   M   link-10-pvt
  11    1    4   M   link-11-pvt
  12    1    4   M   link-12-pvt
Protocol version: 5.0.1
Config version: 234
Cluster name: MILTON
Cluster ID: 4812
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 4
Expected_votes: 4
Total_votes: 4
Quorum: 3
Active subsystems: 3
Node name: link-08-pvt
Node addresses: 10.1.1.158

Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[8 11 12 10]

DLM Lock Space:  "clvmd"                             2   3 run       -
[8 11 12 10]
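
So the clvmd process itself is gone on link-08 while the DLM lock
space is still listed as running.  A quick way to show the mismatch
(a sketch; uses the RHEL4 /proc/cluster interface):

  ps -C clvmd                   # no clvmd process left on link-08
  cat /proc/cluster/services    # DLM Lock Space "clvmd" still shown as run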

Version-Release number of selected component (if applicable):
[root@link-08 tmp]# clvmd -V
Cluster LVM daemon version: 2.01.04 (2005-02-09)
Protocol version:           0.2.1

How reproducible:
I'll try to boil down the steps necessary to reproduce this.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Christine Caulfield 2005-02-24 16:17:37 UTC
If clvmd is built with debugging enabled, then the output of clvmd -d
would be very helpful.
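
For example (a rough sketch; assumes the init script can be used to
stop the daemon on the node that segfaults):

  # on link-08: stop the running daemon, restart it in the foreground with debugging
  service clvmd stop
  clvmd -d 2>&1 | tee /tmp/clvmd-debug.log
  # then re-run "vgchange -an" on link-12 and attach the captured log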

It would also be useful to eliminate (or implicate) the pool
conversion.
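
One way to do that (a sketch with placeholder device names) is to
build a clustered VG natively with LVM2, skipping the pool/vgconvert
path entirely, and see whether deactivating it still kills clvmd on
another node:

  pvcreate /dev/sdX1                 # placeholder device
  vgcreate -c y VGTEST /dev/sdX1     # clustered VG created directly with LVM2
  lvcreate -L 1G -n LVTEST VGTEST
  vgchange -an VGTEST                # does clvmd on the other nodes survive?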

Comment 2 Kiersten (Kerri) Anderson 2005-03-02 20:55:54 UTC
Blocker bug - add it to the list

Comment 3 Christine Caulfield 2005-03-08 13:36:15 UTC
Marking this as NEEDINFO as there's nothing I can do with it in its
present state.

Comment 4 Kiersten (Kerri) Anderson 2005-03-22 19:06:17 UTC
Removing from Blocker list until it is recreated.