Description of problem:
I'll need to try to narrow this down, but I'm logging everything here before I forget something.

Setup: 4 node cluster. link-10, link-11, link-12 are x86. link-08 is x86_64 Opteron.

Created a pool device and GFS from another node attached to this cluster. Umounted and unassembled the pool. On the 6.1 cluster (from link-12) ran a vgconvert -M2 to make it an lvm2 vol and used gfs_tool df to update the lock type to lock_dlm. Mounted it and ran some traffic (placemaker) during a meeting (~45 mins). After that I umounted and ran the new gfs_fsck on it twice. I then ran "vgchange -an" on link-12. I had two other existing lvm2 volumes, /dev/VG1/LV1 and /dev/LV2/VG2, managed by the cluster.

On link-12:

[root@link-12 /]# vgchange -an
  0 logical volume(s) in volume group "VG2" now active
  EOF reading CLVMD
  1 logical volume(s) in volume group "POOLIO1" now active
  Error writing data to clvmd: Broken pipe
  Error writing data to clvmd: Broken pipe
  Can't lock VG1: skipping
[root@link-12 /]# echo $?
0

Got this message in link-08 /var/log/messages:

Feb 23 15:04:08 link-08 udev[16927]: removing device node '/dev/dm-3'
Feb 23 15:04:08 link-08 kernel: clvmd[2287]: segfault at 000000335952d700 rip 00000032593684bd rsp 00000000413fc360 error 4

Now the clvmd is dead on link-08, but the cluster thinks it is still running:

[root@link-08 tmp]# clu_state
Node  Votes Exp Sts  Name
   8    1    4   M   link-08-pvt
  10    1    4   M   link-10-pvt
  11    1    4   M   link-11-pvt
  12    1    4   M   link-12-pvt

Protocol version: 5.0.1
Config version: 234
Cluster name: MILTON
Cluster ID: 4812
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 4
Expected_votes: 4
Total_votes: 4
Quorum: 3
Active subsystems: 3
Node name: link-08-pvt
Node addresses: 10.1.1.158

Service          Name        GID LID State Code
Fence Domain:    "default"     1   2 run   -
[8 11 12 10]
DLM Lock Space:  "clvmd"       2   3 run   -
[8 11 12 10]

Version-Release number of selected component (if applicable):
[root@link-08 tmp]# clvmd -V
Cluster LVM daemon version: 2.01.04 (2005-02-09)
Protocol version: 0.2.1

How reproducible:
I'll try to boil down the steps necessary to reproduce this.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
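For triage, the kernel segfault line quoted above can be picked apart with a few lines of Python. This is only a rough sketch: the function name parse_segfault and the regular expression are made up for illustration, and it assumes the message layout seen in this report, with the trailing error field printed in hex and carrying the low three x86 page-fault error code bits (bit 0: page was present, bit 1: write access, bit 2: fault came from user mode).

```python
import re

# Hypothetical pattern matching the segfault message layout quoted in this report.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)

def parse_segfault(line):
    """Return the fields of a kernel segfault line as a dict, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # assumption: error code is printed in hex
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "rip": int(m.group("rip"), 16),
        "rsp": int(m.group("rsp"), 16),
        # Low three bits of the x86 page-fault error code.
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }

line = ("Feb 23 15:04:08 link-08 kernel: clvmd[2287]: segfault at "
        "000000335952d700 rip 00000032593684bd rsp 00000000413fc360 error 4")
info = parse_segfault(line)
print(info["comm"], info["pid"], hex(info["addr"]), hex(info["rip"]))
print(info["page_present"], info["write"], info["user_mode"])
```

Under those assumptions, "error 4" reads as a user-mode read of an unmapped page, which would fit clvmd dereferencing a bad pointer rather than writing to protected memory.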
If clvmd is built with debug, then the output of clvmd -d would be very helpful. It would also be useful to eliminate (or implicate) the pool conversion.
Blocker bug - add it to the list
Marking this as NEEDINFO as there's nothing I can do with it in its present state.
Removing from Blocker list until it is recreated.