Description of problem:
This is on a gulm cluster with only 1 filesystem and no I/O. I take down a node and then bring it back into the cluster.

After starting clvmd on the downed node:

[root@morph-04 root]# lvdisplay
  --- Logical volume ---
  LV Name                /dev/gfs/gfs0
  VG Name                gfs
  LV UUID                0mCDCP-rs45-oLRi-ncAm-GdNG-mMjZ-GlVBPd
  LV Write Access        read/write
  LV Status              NOT available
  LV Size                2.02 TB
  Current LE             530080
  Segments               4
  Allocation             inherit
  Read ahead sectors     0

[root@morph-04 root]# vgchange -ay
[ appears to hang for a couple of minutes or so ]
  Error locking on node morph-02.lab.msp.redhat.com: Command timed out
  Error locking on node morph-05.lab.msp.redhat.com: Command timed out
  Error locking on node morph-03.lab.msp.redhat.com: Command timed out
  Error locking on node morph-01.lab.msp.redhat.com: Command timed out
  1 logical volume(s) in volume group "gfs" now active

[root@morph-04 root]# lvdisplay
  --- Logical volume ---
  LV Name                /dev/gfs/gfs0
  VG Name                gfs
  LV UUID                0mCDCP-rs45-oLRi-ncAm-GdNG-mMjZ-GlVBPd
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                2.02 TB
  Current LE             530080
  Segments               4
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:0

Version-Release number of selected component (if applicable):
[root@morph-04 root]# clvmd -V
Cluster LVM daemon version: 2.01.02 (2005-01-21)
Protocol version:           0.2.1

How reproducible:
Always
strace of vgchange:

stat64("/dev/sda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 1), ...}) = 0
stat64("/dev/sda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 1), ...}) = 0
open("/dev/sda1", O_RDONLY|O_DIRECT|O_LARGEFILE|0x40000) = 5
fstat64(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 1), ...}) = 0
ioctl(5, 0x80041270, 0x83722e8) = 0
_llseek(5, 2048, [2048], SEEK_SET) = 0
read(5, "8\215\317v LVM2 x[5A%r0N*>\1\0\0\0\0\10\0\0\0\0\0\0"..., 512) = 512
_llseek(5, 3584, [3584], SEEK_SET) = 0
read(5, "gfs {\nid = \"EGxdQN-JxRQ-c5zu-OOc"..., 512) = 512
close(5) = 0
stat64("/dev/sdb1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 17), ...}) = 0
stat64("/dev/sdb1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 17), ...}) = 0
open("/dev/sdb1", O_RDONLY|O_DIRECT|O_LARGEFILE|0x40000) = 5
fstat64(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 17), ...}) = 0
ioctl(5, 0x80041270, 0x8372378) = 0
_llseek(5, 0, [0], SEEK_SET) = 0
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
close(5) = 0
stat64("/dev/sdb1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 17), ...}) = 0
stat64("/dev/sdb1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 17), ...}) = 0
open("/dev/sdb1", O_RDONLY|O_DIRECT|O_LARGEFILE|0x40000) = 5
fstat64(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 17), ...}) = 0
ioctl(5, 0x80041270, 0x8372378) = 0
_llseek(5, 2048, [2048], SEEK_SET) = 0
read(5, "8\215\317v LVM2 x[5A%r0N*>\1\0\0\0\0\10\0\0\0\0\0\0"..., 512) = 512
_llseek(5, 3584, [3584], SEEK_SET) = 0
read(5, "gfs {\nid = \"EGxdQN-JxRQ-c5zu-OOc"..., 512) = 512
close(5) = 0
stat64("/dev/sdc1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 33), ...}) = 0
stat64("/dev/sdc1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 33), ...}) = 0
open("/dev/sdc1", O_RDONLY|O_DIRECT|O_LARGEFILE|0x40000) = 5
fstat64(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 33), ...}) = 0
ioctl(5, 0x80041270, 0x8372408) = 0
_llseek(5, 0, [0], SEEK_SET) = 0
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
close(5) = 0
stat64("/dev/sdc1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 33), ...}) = 0
stat64("/dev/sdc1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 33), ...}) = 0
open("/dev/sdc1", O_RDONLY|O_DIRECT|O_LARGEFILE|0x40000) = 5
fstat64(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 33), ...}) = 0
ioctl(5, 0x80041270, 0x8372408) = 0
_llseek(5, 2048, [2048], SEEK_SET) = 0
read(5, "8\215\317v LVM2 x[5A%r0N*>\1\0\0\0\0\10\0\0\0\0\0\0"..., 512) = 512
_llseek(5, 3584, [3584], SEEK_SET) = 0
read(5, "gfs {\nid = \"EGxdQN-JxRQ-c5zu-OOc"..., 512) = 512
close(5) = 0
_llseek(4, 0, [0], SEEK_SET) = 0
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
_llseek(4, 2048, [2048], SEEK_SET) = 0
read(4, "8\215\317v LVM2 x[5A%r0N*>\1\0\0\0\0\10\0\0\0\0\0\0"..., 512) = 512
_llseek(4, 3584, [3584], SEEK_SET) = 0
read(4, "gfs {\nid = \"EGxdQN-JxRQ-c5zu-OOc"..., 512) = 512
close(4) = 0
open("/proc/devices", O_RDONLY|O_LARGEFILE) = 4
fstat64(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ded000
read(4, "Character devices:\n 1 mem\n 4 /"..., 1024) = 402
close(4) = 0
munmap(0xb7ded000, 4096) = 0
open("/proc/misc", O_RDONLY|O_LARGEFILE) = 4
fstat64(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ded000
read(4, " 62 gnbd_ctl\n184 microcode\n 63 d"..., 1024) = 53
close(4) = 0
munmap(0xb7ded000, 4096) = 0
stat64("/dev/mapper/control", {st_mode=S_IFCHR|0600, st_rdev=makedev(10, 63), ...}) = 0
open("/dev/mapper/control", O_RDWR|O_LARGEFILE) = 4
ioctl(4, 0xc134fd00, 0x838cdd8) = 0
ioctl(4, 0xc134fd07, 0x838cdd8) = -1 ENXIO (No such device or address)
ioctl(4, 0xc134fd07, 0x838cdd8) = -1 ENXIO (No such device or address)
write(3, "2\0\v\f\0\0\0\0\0\0\0\0C\0\0\0\0\31\0EGxdQNJxRQc5z"..., 85) = 85
read(3, "\1\4\320\0\310g\320\0\0\0\0\0\352\0\0\0\0m", 18) = 18
read(3, "orph-04.lab.msp.redhat.com\0n\0\0\0C"..., 234) = 234
write(2, " ", 2 ) = 2
write(2, "Error locking on node morph-04.l"..., 68Error locking on node morph-04.lab.msp.redhat.com: Command 8
write(2, "\n", 1 ) = 1
write(2, " ", 2 ) = 2
write(2, "Error locking on node morph-02.l"..., 68Error locking on node morph-02.lab.msp.redhat.com: Command 8
write(2, "\n", 1 ) = 1
write(2, " ", 2 ) = 2
write(2, "Error locking on node morph-05.l"..., 68Error locking on node morph-05.lab.msp.redhat.com: Command 8
write(2, "\n", 1 ) = 1
write(2, " ", 2 ) = 2
write(2, "Error locking on node morph-01.l"..., 68Error locking on node morph-01.lab.msp.redhat.com: Command 8
write(2, "\n", 1 ) = 1
ioctl(4, 0xc134fd07, 0x838cdd8) = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ded000
write(1, " 1 logical volume(s) in volume "..., 55 1 logical volume(s) in volume group "gfs" now active ) = 55
stat64("/proc/lvm/VGs/gfs", 0xbfe5e7d0) = -1 ENOENT (No such file or directory)
write(3, "3\0018\10\0\0\0\0\0\0\0\0\10\0\0\0\0\6\0V_gfs\0\0", 26) = 26
read(3, "\1\0\320\0\300i\320\0\0\0\0\0\"\0\0\0\0m", 18) = 18
read(3, "orph-03.lab.msp.redhat.com\0\0\0\0\0\0"..., 34) = 34
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
close(3) = 0
munmap(0x3ef000, 10500) = 0
open("/etc/lvm/.cache", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
fstat64(3, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7dec000
write(3, "# This file is automatically mai"..., 201) = 201
close(3) = 0
munmap(0xb7dec000, 4096) = 0
close(4) = 0
brk(0) = 0x83a5000
brk(0) = 0x83a5000
brk(0x839e000) = 0x839e000
brk(0) = 0x839e000
brk(0) = 0x839e000
brk(0) = 0x839e000
brk(0x839b000) = 0x839b000
brk(0) = 0x839b000
munmap(0xb7ded000, 4096) = 0
exit_group(0) = ?
A superfluous & next to a pointer meant that the node was never removed from the list when it went down.

Checking in daemons/clvmd/clvmd-gulm.c;
/cvs/lvm2/LVM2/daemons/clvmd/clvmd-gulm.c,v  <--  clvmd-gulm.c
new revision: 1.8; previous revision: 1.7
done
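For illustration only, here is a minimal sketch (invented names, not the actual clvmd-gulm.c code) of how a stray '&' on a pointer can make a list removal silently fail, so a dead node is never taken off the list:

struct node_info {
        struct node_info *next;
        char name[64];
};

static struct node_info *node_list;

/* Remove a dead node from the singly-linked node list. */
static void remove_node(struct node_info *dead)
{
        struct node_info **pp;

        for (pp = &node_list; *pp; pp = &(*pp)->next) {
                if (*pp == (struct node_info *)&dead) {  /* BUG: superfluous '&'; should be '*pp == dead' */
                        *pp = dead->next;                /* never reached, so the node is never unlinked */
                        return;
                }
        }
}

The comparison uses the address of the local pointer variable rather than the pointer it holds, so no list entry ever matches and every later broadcast still tries to talk to the downed node.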
Fix verified.
Originally, I only had to shoot one node and bring it back up once to see these locking errors when doing a vgchange. I verified all morning that when shooting one node and bringing it back up I no longer saw these errors. However, now that I'm shooting more than one node at a time, these errors show up again.
I suspect this is related to (or probably the same as) bug 146696. I have managed to reproduce it once, but it took ages and lots of machine reboots. My next guess was some sort of memory corruption in clvmd, but valgrind reports nothing.
OK. I think I've found a thread shutdown race that could cause this. Testing a fix.
This seems to fix the thread race.

Checking in daemons/clvmd/clvmd.c;
/cvs/lvm2/LVM2/daemons/clvmd/clvmd.c,v  <--  clvmd.c
new revision: 1.15; previous revision: 1.14
done
Saw this again today on tank-05 after taking down and bringing back up two slaves (tank-02 and tank-05).

From revolver:

starting ccsd on cluster
Starting ccsd on tank-02.lab.msp.redhat.com...pass
Starting ccsd on tank-05.lab.msp.redhat.com...pass
starting gulm on cluster
starting gulm on tank-02.lab.msp.redhat.com
starting gulm on tank-05.lab.msp.redhat.com
waiting for all nodes to join cluster...
starting clvmd
starting clvmd on tank-02.lab.msp.redhat.com
starting clvmd on tank-05.lab.msp.redhat.com
activating VGs on tank-02.lab.msp.redhat.com
Error locking on node tank-05.lab.msp.redhat.com: Command timed out
activating VGs on tank-05.lab.msp.redhat.com
[complete hang]

gulm appears to be completely happy:

[root@tank-05 ~]# gulm_tool getstats tank-01
I_am = Master
quorum_has = 5
quorum_needs = 3
rank = 0
quorate = true
GenerationID = 1108491624705648
run time = 3178
pid = 5738
verbosity = Default
failover = enabled

[root@tank-05 ~]# gulm_tool getstats tank-05
I_am = Slave
Master = tank-01.lab.msp.redhat.com
rank = 4
quorate = true
GenerationID = 1108491624705648
run time = 1821
pid = 5535
verbosity = Default
failover = enabled

strace of a hung vgchange -ay:

.
.
.
stat64("/dev/sda2", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 2), ...}) = 0
stat64("/dev/sda2", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 2), ...}) = 0
open("/dev/sda2", O_RDONLY|O_DIRECT|O_LARGEFILE|0x40000) = 4
fstat64(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 2), ...}) = 0
ioctl(4, BLKBSZGET, 0x8ca44f8) = 0
_llseek(4, 0, [0], SEEK_SET) = 0
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
close(4) = 0
stat64("/dev/sda2", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 2), ...}) = 0
stat64("/dev/sda2", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 2), ...}) = 0
open("/dev/sda2", O_RDONLY|O_DIRECT|O_LARGEFILE|0x40000) = 4
fstat64(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 2), ...}) = 0
ioctl(4, BLKBSZGET, 0x8ca44f8) = 0
_llseek(4, 2048, [2048], SEEK_SET) = 0
read(4, "\330\24\315Y LVM2 x[5A%r0N*>\1\0\0\0\0\10\0\0\0\0\0\0"..., 1024) = 1024
_llseek(4, 3072, [3072], SEEK_SET) = 0
read(4, "lume Group\"\nversion = 1\n\ndescrip"..., 1024) = 1024
close(4) = 0
stat64("/proc/lvm/VGs/gfs", 0xbfe55e10) = -1 ENOENT (No such file or directory)
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
write(3, "3\1\267\0\0\0\0\0\0\0\0\0\10\0\0\0\0\1\0V_gfs\0\0", 26) = 26
read(3,

strace of a hung vgdisplay:

.
.
.
stat64("/etc/lvm/.cache", {st_mode=S_IFREG|0600, st_size=126, ...}) = 0
stat64("/etc/lvm/.cache", {st_mode=S_IFREG|0600, st_size=126, ...}) = 0
open("/etc/lvm/.cache", O_RDONLY|O_LARGEFILE) = 3
mmap2(NULL, 126, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7ded000
munmap(0xb7ded000, 126) = 0
close(3) = 0
stat64("/dev/sda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 1), ...}) = 0
stat64("/dev/sda2", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 2), ...}) = 0
time(NULL) = 1108494024
getpid() = 5730
stat64("/etc/lvm/archive", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
stat64("/etc/lvm/backup", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
stat64("/etc/lvm/lvm.conf", {st_mode=S_IFREG|0644, st_size=10554, ...}) = 0
open("/usr/lib/liblvm2clusterlock.so", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0P\7\0\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0555, st_size=7932, ...}) = 0
old_mmap(NULL, 10500, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xcd3000
old_mmap(0xcd5000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0xcd5000
close(3) = 0
socket(PF_FILE, SOCK_STREAM, 0) = 3
connect(3, {sa_family=AF_FILE, path=@clvmd}, 110
the "hang" on tank-02 did eventually finish with the "Error" message and the vg did get activated, but the activation on tank-05 never did return. [root@tank-02 ~]# vgdisplay --- Volume group --- VG Name gfs System ID Format lvm2 Metadata Areas 2 Metadata Sequence No 2 VG Access read/write VG Status resizable MAX LV 0 Cur LV 1 Open LV 0 Max PV 0 Cur PV 2 Act PV 2 VG Size 1.27 TB PE Size 4.00 MB Total PE 333858 Alloc PE / Size 333858 / 1.27 TB Free PE / Size 0 / 0 VG UUID T7wRo7-kUxK-gV2o-qVqz-MUvo-bkir-U1cSZH
The TCP comms couldn't cope with more than one message arriving on the socket at one time - mainly because the clvmd code was written for cman, which is packet-based. This may not fix all your hangs with this code; I'm raising a separate bugzilla for a problem I found in the locking code whilst testing this.

Checking in daemons/clvmd/clvmd.c;
/cvs/lvm2/LVM2/daemons/clvmd/clvmd.c,v  <--  clvmd.c
new revision: 1.16; previous revision: 1.15
done
Checking in daemons/clvmd/tcp-comms.c;
/cvs/lvm2/LVM2/daemons/clvmd/tcp-comms.c,v  <--  tcp-comms.c
new revision: 1.6; previous revision: 1.5
done
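For background, a hedged sketch (invented names; not the actual tcp-comms.c change) of what stream-oriented receive code has to do that packet-oriented code doesn't: one read() on a TCP socket may return part of a message or several messages back to back, so the receiver has to buffer and split on the message length itself.

#include <stdint.h>
#include <string.h>
#include <unistd.h>

struct msg_header {
        uint32_t len;           /* total message length, header included */
};

static char inbuf[65536];
static size_t inbuf_used;

/* Pull whatever is available off the socket and dispatch every complete
 * message found; partial messages stay buffered for the next call.
 * A packet-based transport (cman) never needs this. */
static int read_messages(int fd, void (*dispatch)(const char *msg, uint32_t len))
{
        ssize_t n = read(fd, inbuf + inbuf_used, sizeof(inbuf) - inbuf_used);
        if (n <= 0)
                return -1;
        inbuf_used += n;

        for (;;) {
                struct msg_header hdr;

                if (inbuf_used < sizeof(hdr))
                        break;                          /* partial header: wait for more data */
                memcpy(&hdr, inbuf, sizeof(hdr));
                if (hdr.len < sizeof(hdr) || hdr.len > sizeof(inbuf))
                        return -1;                      /* corrupt length field */
                if (inbuf_used < hdr.len)
                        break;                          /* partial message: wait for more data */
                dispatch(inbuf, hdr.len);               /* complete message: hand it on */
                memmove(inbuf, inbuf + hdr.len, inbuf_used - hdr.len);
                inbuf_used -= hdr.len;
        }
        return 0;
}

Code that assumes "one read == one message" loses or mangles the second message whenever two arrive together, which matches the hangs seen here.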
Reassigning this back to me cos it still needs some work.
It's had some work!

Checking in daemons/clvmd/clvmd-gulm.c;
/cvs/lvm2/LVM2/daemons/clvmd/clvmd-gulm.c,v  <--  clvmd-gulm.c
new revision: 1.11; previous revision: 1.10
done
Checking in daemons/clvmd/tcp-comms.c;
/cvs/lvm2/LVM2/daemons/clvmd/tcp-comms.c,v  <--  tcp-comms.c
new revision: 1.7; previous revision: 1.6
done
Checking in daemons/clvmd/tcp-comms.h;
/cvs/lvm2/LVM2/daemons/clvmd/tcp-comms.h,v  <--  tcp-comms.h
new revision: 1.4; previous revision: 1.3
done

If anything similar happens again, it's really handy to have a debug clvmd run with -d (if that's at all possible); failing that, an strace of clvmd is more use to me than one of lvm... usually.
Even though this is supposedly fixed, I'm adding it to the blocker list in case the next build reveals it's not.
*** Bug 149080 has been marked as a duplicate of this bug. ***
This bug is still here.

Gulm 2.6.9-26.1 (built Mar 8 2005 14:32:42) installed
GFS 2.6.9-26.1 (built Mar 8 2005 14:32:55) installed
Lock_Nolock 2.6.9-26.1 (built Mar 8 2005 14:32:36) installed

[root@morph-02 ~]# rpm -qa | grep lvm2
lvm2-2.01.07-1.0.RHEL4
lvm2-cluster-2.01.07-1.0.RHEL4

LVM version:     2.01.07 (2005-03-08)
Library version: 1.01.00-ioctl (2005-01-17)
Driver version:  4.3.0

I shot only the master, and when trying to bring it back into the cluster it hung:

Load mods on morph-01:
scsi_mod...sd_mod...dm-mod...qla2300...lock_gulm...lock_harness...gfs...lock_nolock...gnbd...OK
starting ccsd on cluster
Starting ccsd on morph-01...pass
starting gulm on cluster
starting gulm on morph-01
waiting for all nodes to join cluster...
starting clvmd
starting clvmd on morph-01
activating VGs on morph-01
[HANG]

vgdisplay strace (appears same as comment #8):

.
.
.
stat64("/dev/sdd1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 49), ...}) = 0
stat64("/dev/sdd1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 49), ...}) = 0
open("/dev/sdd1", O_RDONLY|O_DIRECT|O_LARGEFILE|0x40000) = 4
fstat64(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 49), ...}) = 0
ioctl(4, BLKBSZGET, 0x8601618) = 0
_llseek(4, 2048, [2048], SEEK_SET) = 0
read(4, "A\222>\374 LVM2 x[5A%r0N*>\1\0\0\0\0\10\0\0\0\0\0\0"..., 512) = 512
_llseek(4, 12288, [12288], SEEK_SET) = 0
read(4, "gfs {\nid = \"2EU2wQ-zcHB-LuEO-5Vw"..., 512) = 512
close(4) = 0
stat64("/proc/lvm/VGs/gfs", 0xfeeb0900) = -1 ENOENT (No such file or directory)
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
write(3, "3\1J\0\0\0\0\0\0\0\0\0\10\0\0\0\0\1\0V_gfs\0\0", 26) = 26
read(3,

Sorry I don't have more clvmd info.
This is actually a completely new problem seemingly caused by the latest checkin. clvmd is unlocking some locks at startup (just in case they were left lying around) and gulm isn't responding. I need to chat to tadpol.
Slapped wrist for implicating other people in my own mistakes.

We no longer take out the lvm_thread_lock at LVM thread startup. Its only job now is to protect the jobs list, not the LVM processes themselves, which all happen in the LVM thread.

Checking in daemons/clvmd/clvmd.c;
/cvs/lvm2/LVM2/daemons/clvmd/clvmd.c,v  <--  clvmd.c
new revision: 1.18; previous revision: 1.17
done
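Roughly, the design described is the classic job-queue pattern; a minimal sketch with invented names (not the committed clvmd.c code), where the mutex only guards the queue and the LVM work itself runs unlocked inside the single LVM thread:

#include <pthread.h>

struct lvm_job {
        struct lvm_job *next;
        /* command details elided */
};

void run_lvm_command(struct lvm_job *job);       /* assumed to exist: does the actual LVM work */

static pthread_mutex_t lvm_thread_lock = PTHREAD_MUTEX_INITIALIZER;   /* protects job_queue only */
static pthread_cond_t  lvm_thread_cond = PTHREAD_COND_INITIALIZER;
static struct lvm_job *job_queue;

/* Called from the per-client worker threads: just queue the job. */
void queue_lvm_job(struct lvm_job *job)
{
        pthread_mutex_lock(&lvm_thread_lock);
        job->next = job_queue;                   /* LIFO for brevity; ordering ignored here */
        job_queue = job;
        pthread_cond_signal(&lvm_thread_cond);
        pthread_mutex_unlock(&lvm_thread_lock);
}

/* The single LVM thread: the lock is only held while touching the list. */
void *lvm_thread_fn(void *arg)
{
        (void)arg;
        for (;;) {
                struct lvm_job *job;

                pthread_mutex_lock(&lvm_thread_lock);
                while (!job_queue)
                        pthread_cond_wait(&lvm_thread_cond, &lvm_thread_lock);
                job = job_queue;
                job_queue = job->next;
                pthread_mutex_unlock(&lvm_thread_lock);

                run_lvm_command(job);            /* LVM work happens outside the lock */
        }
        return NULL;
}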
lvm2-2.01.07-1.0.RHEL4
lvm2-cluster-2.01.07-2.0.RHEL4

With the latest build, after a few attempts, the hang appears gone (at least for now), but I'm still seeing locking errors / command timeouts every once in a while when activating VGs, like in the original report. Once the error is reported and the command times out, the VG does get activated and the test continues on (again, just like originally).

starting ccsd on cluster
Starting ccsd on morph-01...pass
Starting ccsd on morph-02...pass
Starting ccsd on morph-03...pass
starting gulm on cluster
starting gulm on morph-01
starting gulm on morph-02
starting gulm on morph-03
waiting for all nodes to join cluster...
starting clvmd
starting clvmd on morph-01
starting clvmd on morph-02
starting clvmd on morph-03
activating VGs on morph-01
Error locking on node morph-04.lab.msp.redhat.com: Command timed out
activating VGs on morph-02
activating VGs on morph-03
Just another note: after further testing with revolver, the hang still appears to be gone and testing progress is not hindered by these errors; however, they are seen quite a bit.
Spoke too soon, after even more continued testing with more I/O run time in between revolver killings, the hang is indeed still there. :(
Can I have some more information please? We've been round this several times, all different issues, so just saying "it's still there" is no help at all. strace (or ideally debug logs) from clvmd, setting debug levels in lvm, and posting the syslog output would make this job much easier. Or maybe I could do a week's course on actually getting revolver to work.
It's actually only a 2-day seminar and it's not going to be presented until the Cluster Summit, but in the meantime I'm more than happy to answer any questions you have about revolver. The failure had the exact same symptoms as the original bug, so I assumed it was the same. If you really think it's a new bug, we can open a new one. I'll try to gather more info for you.
Reproduced this, after much time, and I reinstate comment #16.
Well, my problems look to have been fencing related, causing gulm to leave the node in the "expired" state. With fencing disabled I have managed to run revolver on 8 nodes for around 24 hours (317 iterations). If you can make this happen again, check the syslog for gulm-related messages. If there aren't any, then we'll have to stick a debugging clvmd on there.
I reproduced this today with fencing enabled, running revolver. After 7 iterations I saw the locking errors / command timeouts, and then after 11 iterations I saw the complete hang of the vgchange command. I tried another vgchange by hand to see if I could get an strace, but it worked just fine on all the nodes. I didn't see any out-of-the-ordinary messages as far as gulm is concerned. I'll try this with fencing disabled and see if I can get as far as you did.
I hit this this morning with fencing disabled and a debug clvmd. After three iterations of revolver, I saw the error on the first two iterations and on the third saw the hang. I put the clvmd debug logs from all 6 nodes over that revolver run, along with the revolver output, in: /home/msp/cmarthal/pub/bugs/146056
Thanks, that's pinpointed it. It's a startup race between the LVM thread and the worker thread for the vgchange. I've put a lock in there to hold any command activity until the LVM thread is fully initialised. This has got worse since all the new stuff went into the gulm initialisation for tidying locks etc. It also seems that my cluster machines are too slow to make it happen! Either a very fast box or SMP seems to be needed.

Checking in daemons/clvmd/clvmd.c;
/cvs/lvm2/LVM2/daemons/clvmd/clvmd.c,v  <--  clvmd.c
new revision: 1.20; previous revision: 1.19
done
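As a minimal sketch, with invented names rather than the actual clvmd.c change, the kind of startup barrier described looks like this: command/worker threads block until the LVM thread has finished initialising.

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t startup_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  startup_cond = PTHREAD_COND_INITIALIZER;
static bool lvm_thread_ready;

/* Called once by the LVM thread after it has finished initialising. */
void lvm_thread_init_done(void)
{
        pthread_mutex_lock(&startup_lock);
        lvm_thread_ready = true;
        pthread_cond_broadcast(&startup_cond);
        pthread_mutex_unlock(&startup_lock);
}

/* Called by command/worker threads before queuing any work, so a
 * vgchange arriving immediately after clvmd starts cannot race the
 * LVM thread's initialisation. */
void wait_for_lvm_thread(void)
{
        pthread_mutex_lock(&startup_lock);
        while (!lvm_thread_ready)
                pthread_cond_wait(&startup_cond, &startup_lock);
        pthread_mutex_unlock(&startup_lock);
}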
Harumph. This fixes the out-and-out "hang" but does not fix the timeout, which is something else (I really hate this code; I knew it would never work). I'm going to need to put some more logging in to catch that.
It looks like it might be gulm. I'm using 8 nodes, with 5 servers. If I kill the master and two slaves, then the two remaining slaves only see a node down event for the master, so they never know that the two slaves are dead. When those two nodes come back up and try to contact clvmd on the remaining slaves, the connection hangs because TCP thinks it's still connected. tadpol says this might "go away" with only 3 servers.
gulm is based around a single master server. Without a master, gulm nodes can only look for a new master; there should be no changes until the new master is found. With a single exception, the nodes that are slaves can elect to become a master (Arbitrating). This election includes the task of fencing the old Master, so that the node can reboot and rejoin without intervention from an administrator.

In the case with 5 servers, when the master and two slaves go down, everything should stop until the slave servers can elect a new Master server. This won't go away with 3 servers. Once a Master is elected, and the clients connect to it, there should be a stream of events to update connected processes.
[09:27am] pjc: hmm. So I need to hold off any more clvmd comms until they know that the servers have settled down.
[09:27am] tadpol: gulm clusters can't do much without a Master server. so until one shows up, they just spin looking for it.
[09:27am] pjc: will I get a core event when that happens ?
[09:28am] tadpol: think so. let me check quick.
[09:32am] tadpol: yeah, there should be a core->statechange
[09:34am] pjc: Ok, I'll see if I can work with that. I'll have a look on Monday
[09:34am] tadpol: ok.
[09:34am] pjc: stick some notes in bugzilla and assign it back to me.
[09:34am] tadpol: right.
[09:34am] pjc: thanks
[09:36am] pjc: so long as I get a nodedown event at some stage before it's quorate again I should be able to make it work (I think)
[09:36am] tadpol: erm.
[09:37am] tadpol: yeah. should be in that order.
[09:37am] tadpol: looks at the code.
[09:37am] tadpol: yeah it is.
Sorry, this still isn't working. Even when the cluster is quorate again I don't get DOWN events for the slave nodes, only UP ones. If I close down my comms sockets when I get an UP notification as well as a DOWN one, it seems to work - but it's a disgusting hack. I need a bath..
Well, without a major code change, what I can do is send a node down for all nodes when the master dies, then node up when things return. Is that workable?
I think that will actually make things worse. When clvmd gets a node down it closes the connection associated with that node. If the remote end has just sent something down the connection we'll then lose messages. What might work is if I ask for a node list every time the state goes from inquorate->quorate. I'll have a fiddle and see what I can come up with.
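A rough sketch of that idea, with invented function names (the real change is the patch attached in the next comment): on a state-change callback, re-sync membership when quorum is regained rather than relying on per-node DOWN events.

#include <stdbool.h>

void refresh_node_list(void);                  /* hypothetical: re-query gulm for current membership */
void close_conns_to_missing_nodes(void);       /* hypothetical: drop sockets for nodes no longer up */

static bool was_quorate;

/* Hypothetical gulm state-change callback: when the cluster goes from
 * inquorate back to quorate, re-read the full node list and clean up
 * connections to nodes we never received a DOWN event for. */
void gulm_state_change(bool now_quorate)
{
        if (now_quorate && !was_quorate) {
                refresh_node_list();
                close_conns_to_missing_nodes();
        }
        was_quorate = now_quorate;
}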
Created attachment 113093 [details]
Patch to LVM2 to work around gulm node notifications

OK, this patch is needed to make clvmd work around the gulm notification limitations. It tests out OK for me. If it doesn't work, I have no idea what we'll do.
The code is also committed to the head of CVS, but the patch will need adding to the RPM, as that is built against the stable version of LVM2.