Description of problem:
When new physical devices are discovered after clvmd has started, these devices can be used in pvcreate and vgcreate, but lvcreate fails with an internal LVM error on every node in the cluster.

Version-Release number of selected component (if applicable):
Cluster LVM daemon version: 2.01.14 (2005-08-04)
Protocol version: 0.2.1
LVM version: 2.01.14 (2005-08-04)
Library version: 1.01.04 (2005-08-02)
Driver version: 4.4.0

How reproducible:
After the new devices have been added, but before clvmd is restarted, the steps can be executed any number of times with the same results.

Steps to Reproduce:
1. vgscan -v
2. vgchange -a y
3. pvcreate /dev/md16 /dev/md17 /dev/md18 /dev/md19 /dev/md20
4. vgcreate -s 64k JoesPool /dev/md16 /dev/md17 /dev/md18 /dev/md19 /dev/md20
5. lvcreate -L 10485760k -i 5 -I 16k -n JoesVol JoesPool
6. lvremove /dev/JoesPool/JoesVol
7. vgremove JoesPool
8. pvremove /dev/md16 /dev/md17 /dev/md18 /dev/md19 /dev/md20

Actual results:
lvcreate produces the following:
  Error locking on node sqazero04: Internal lvm error, check syslog
  Error locking on node sqazero02: Internal lvm error, check syslog
  Error locking on node sqazero01: Internal lvm error, check syslog
  Error locking on node sqazero03: Internal lvm error, check syslog
  Failed to activate new LV.

Expected results:
lvcreate should complete successfully.

Additional info:
clvmd produces debug output such as:
  Volume group for uuid not found: Gcl4svzA7eybxdXNxfbiB3WcI4Kc2pH8qFkFyy5u0cf2p16K2DjGJ4OKa34jYBrr
The UUID for JoesPool is Gcl4sv-zA7e-ybxd-XNxf-biB3-WcI4-Kc2pH8
The UUID for JoesVol is qFkFyy-5u0c-f2p1-6K2D-jGJ4-OKa3-4jYBrr

As soon as clvmd is restarted on any node, that node no longer produces the internal LVM error. After clvmd has been restarted on every node, the logical volume activation succeeds.
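Note that the "uuid not found" string in the clvmd debug output appears to be the JoesPool UUID and the JoesVol UUID concatenated with the hyphens removed. A minimal sketch for confirming this on one of the failing nodes, using the JoesPool/JoesVol names from the steps above (exact vgs/lvs output formatting may vary by version):

  # Print the UUIDs as LVM displays them (with hyphens)
  vgs --noheadings -o vg_uuid JoesPool
  lvs --noheadings -o lv_uuid JoesPool/JoesVol

  # Concatenate them with spaces and hyphens stripped; the result should
  # match the string reported in the clvmd debug output
  echo "$(vgs --noheadings -o vg_uuid JoesPool | tr -d ' -')$(lvs --noheadings -o lv_uuid JoesPool/JoesVol | tr -d ' -')"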
Created attachment 123461 [details] steps taken and results
This should have been fixed in the long-closed bug #138396.
Bug #138396 was believed fixed in RHBA-2005-192, but this is happening in 2.01.14, which is later. The steps are slightly different from those in 138396: we are not restarting clvmd -- restarting clvmd makes the problem go away. Also, this behavior is exhibited reliably, not intermittently.
Does 'md' mean these are software RAID devices shared between nodes? What's their configuration? If so, can you reproduce without using 'md'? We also need to test with the latest U3 beta packages.
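For reference, the md layout on the reporting nodes could be captured with something along these lines (device names taken from the reproduction steps; this is only a suggestion, not output we already have):

  # Summary of all active md arrays on the node
  cat /proc/mdstat

  # Detailed configuration of one of the new devices
  mdadm --detail /dev/md16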
This is a duplicate of bz 138396. *** This bug has been marked as a duplicate of 138396 ***
A workaround for this issue that we've tested: stop clvmd on all the nodes in the cluster, add the new devices, discover the new devices on all the nodes, and then restart clvmd.

What we actually did:
1. We had a 3-disk/PV (400 GB) GFS filesystem with active I/O running from all the nodes.
2. Stopped clvmd:
   [root@link-02 ~]# service clvmd stop
   Deactivating VG link1: Can't deactivate volume group "link1" with 1 open logical volume(s)
   [FAILED]
   Deactivating VG link2: [ OK ]
   Stopping clvm: [ OK ]
   (Note that the deactivation fails because of the mounted filesystem with running I/O.)
3. Took 3 other unused disks, repartitioned them, rediscovered them on all nodes, and then restarted clvmd.
4. Created PVs out of those new partitions.
5. Grew the active VG and LV.
6. Grew the GFS filesystem.
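A rough command-level sketch of the same workaround. The VG name link1 is from the session above, but the partition names, LV name, size, and GFS mount point are placeholders for illustration only:

  # On every node: stop clvmd before adding the new devices
  service clvmd stop

  # Repartition the new disks, then rediscover the partitions on all nodes
  partprobe /dev/sdb /dev/sdc /dev/sdd

  # On every node: restart clvmd once the new devices are visible
  service clvmd start

  # Create PVs on the new partitions, then grow the VG, the LV,
  # and the GFS filesystem (names and sizes are placeholders)
  pvcreate /dev/sdb1 /dev/sdc1 /dev/sdd1
  vgextend link1 /dev/sdb1 /dev/sdc1 /dev/sdd1
  lvextend -L +400G /dev/link1/lvol0
  gfs_grow /mnt/gfs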