Bug 138396
Summary: | LVM2 processes fail to handle reconfiguration of underlying block devices while running
---|---
Product: | Red Hat Enterprise Linux 4
Component: | lvm2
Version: | 4.0
Hardware: | i686
OS: | Linux
Status: | CLOSED ERRATA
Severity: | medium
Priority: | high
Reporter: | Corey Marthaler <cmarthal>
Assignee: | Alasdair Kergon <agk>
CC: | casmith, ccaulfie, cevich, christian.gugliucci, dennis.preston, dwysocha, guy_albertelli, henry.harris, mbroz, mkpai, mspqa-list, nhappel, rkenna, sphadnis, syeghiay, tao
Keywords: | Reopened
Fixed In Version: | RHBA-2007-0847
Doc Type: | Bug Fix
Bug Blocks: | 144795
Last Closed: | 2007-11-21 21:13:07 UTC
Description
Corey Marthaler
2004-11-08 21:38:09 UTC
Try killing and restarting clvmd. If that fixes it then I suspect that the lvm internals of clvm have something cached. I usually only see this if the disk configuration is changed while clvmd is running, though. How are you 'cleaning up'? Only by using LVM commands (lvremove, vgremove, pvremove etc.)? (There is a known problem with caching that pjc refers to: we need to determine if this bug is a manifestation of the same problem or a new one.)

I reproduced this today. Our tool does change the disk configuration while clvmd is running, but it has a hack in it to stop and then start clvmd before going on, to avoid this bug. That seemed to be a workaround, however I'm starting to see this bug again even though we do the stop and then start. I clean up only with LVM commands (lvremove, vgremove, pvremove etc.). Here is how I reproduce:

1. start clvmd
2. clean up any old LVs
3. deactivate any old VGs
4. remove any old VGs
5. remove any old PVs
6. kill clvmd (to get around this bug)
7. partition up the disk how I want it
8. start clvmd
9. make PV
10. make VG
11. activate VG
12. make LV
13. see this bug

Also, I reproduced this while trying to make a linear volume, so I guess it's not just a stripe thing any more.

Does adding the missing 'vgscan' after 'partitioning up the disk' and before 'start clvmd' make any difference? I assume this is the known problem, namely that we don't yet have a mechanism for external events to trigger updates to the daemons' internal caches other than stopping and restarting them.

clvmd has to be up if I'm gonna run vgscan, else...

[root@morph-02 root]# /sbin/vgscan
  connect() failed on local socket: Connection refused
  Locking type 2 initialisation failed.
[root@morph-02 root]# /usr/sbin/vgscan
  Unknown locking type requested.
  Locking type 2 initialisation failed.

Reproduced this again today, even with the "stop clvmd, dice, start clvmd" hack.

This is happening to me today with the latest. I am _not_ partitioning disks out from underneath anything.

I just hit this, after restarting clvmd today:

clvmd -V
  Cluster LVM daemon version: 2.01.02 (2005-01-21)
  Protocol version: 0.2.1

Chop up /dev/sda into 2 parts, kill clvmd, restart clvmd, create PV, create VG, then:

vgchange -ay stripe_2_4
  Error locking on node tank-05: Internal lvm error, check syslog
  Error locking on node tank-03: Internal lvm error, check syslog
  Error locking on node tank-02: Internal lvm error, check syslog
  Error locking on node tank-01: Internal lvm error, check syslog
  Error locking on node tank-04: Internal lvm error, check syslog
  0 logical volume(s) in volume group "stripe_2_4" now active

From /var/log/messages:

Feb 1 10:22:14 tank-01 lvm[4491]: Volume group for uuid not found: qVWqyFDmSqMnZxTVMVbUifLtNb2OCAFOAmVzs364dCl5wREpgx28b4MiiZ0gTT5T

I hit what is shown in #10 again, several times today. Alasdair, is there some type of data you would like if I hit this again? I'd be happy to gather any state info you need. Again, this is happening even with the workaround of stopping and restarting clvmd.

Yes, clvmd was killed on every node, readpart was run on every node, clvmd was then restarted on every node, and a PV was created on every new partition on only one node. We'll try to get a script to reproduce this more reliably.

I have a script, /home/msp/cmarthal/138396, which for me will always hit this bug (it may take a few iterations though). Before running this script you have to have a gulm kernel cluster running (ccsd and lock_gulmd) and you have to have the dice tool (found in the sistina test tree) located on the first node in the nodelist you specify.
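Condensed into a script, the reproduction recipe above looks roughly like the following. This is a sketch only: the node names, the target device, and the ssh fan-out are assumptions on my part; the dice and readpart paths are the test-tree helpers used elsewhere in this report.

    #!/bin/bash
    # Hypothetical reproducer sketch for this bug. Assumes a running gulm
    # cluster (ccsd + lock_gulmd) and the sistina test tools on each node.
    NODES="morph-01 morph-02"   # assumed node names
    DEV=/dev/sda                # shared device being repartitioned

    # The "kill clvmd" hack: stop the daemon everywhere before dicing.
    for n in $NODES; do ssh "$n" killall clvmd; done

    # Repartition the shared device, then re-read the table on every node.
    /usr/tests/sts/bin/dice -d "$DEV" -p 2
    for n in $NODES; do ssh "$n" /usr/tests/sts/bin/readpart "$DEV"; done

    # Restart clvmd everywhere and rebuild the stack; activation is where
    # the "Error locking on node ..." failures show up.
    for n in $NODES; do ssh "$n" clvmd; done
    pvcreate ${DEV}1 ${DEV}2
    vgcreate stripe_2_4 ${DEV}1 ${DEV}2
    lvcreate -n lv0 -i 2 -L 50G stripe_2_4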
Basically, all this script does is perform the steps outlined in comment #4. I narrowed this down to just two nodes and still see it just as often using the script in comment #14. Let me know if/when you want to take a look at the morphs and we can step through this. LVM complains that it can't find a device with a specific uuid (for the LV creation), but pvdisplay and vgdisplay show that it's there.

make needed LVs
  Error locking on node morph-02.lab.msp.redhat.com: Internal lvm error, check syslog
  Error locking on node morph-01.lab.msp.redhat.com: Internal lvm error, check syslog
  Failed to activate new LV.

morph-01:
Mar 3 15:50:08 morph-01 lvm[8760]: Couldn't find device with uuid 'eza7Gw-JfC0-9XmP-3sc6-G80z-9qw4-dVajFe'.
Mar 3 15:50:08 morph-01 lvm[8760]: Couldn't find all physical volumes for volume group 138396.
Mar 3 15:50:08 morph-01 lvm[8760]: Volume group for uuid not found: x73Hqzj5u6wGiEYlZfpWGLy3MiIiQsbBtadzWTQ61

morph-02:
Mar 3 15:51:13 morph-02 lvm[7664]: Couldn't find device with uuid 'eza7Gw-JfC0-9XmP-3sc6-G80z-9qw4-dVajFe'.
Mar 3 15:51:13 morph-02 lvm[7664]: Couldn't find all physical volumes for volume group 138396.
Mar 3 15:51:13 morph-02 lvm[7664]: Volume group for uuid not found: x73Hqzj5u6wGiEYlZfpWGLy3MiIiQsbBtadzWTQ61

But again, the LV _does_ get created:

[root@morph-02 ~]# lvdisplay
  --- Logical volume ---
  LV Name /dev/138396/138396
  VG Name 138396
  LV UUID tadzWT-Q63r-GPuh-847U-KaRp-LwXz-9hKbJ1
  LV Write Access read/write
  LV Status NOT available
  LV Size 431.35 GB
  Current LE 110425
  Segments 5
  Allocation inherit
  Read ahead sectors 0

[root@morph-01 ~]# vgdisplay
  --- Volume group ---
  VG Name 138396
  System ID
  Format lvm2
  Metadata Areas 5
  Metadata Sequence No 2
  VG Access read/write
  VG Status resizable
  MAX LV 0
  Cur LV 1
  Open LV 0
  Max PV 0
  Cur PV 5
  Act PV 5
  VG Size 431.35 GB
  PE Size 4.00 MB
  Total PE 110425
  Alloc PE / Size 110425 / 431.35 GB
  Free PE / Size 0 / 0
  VG UUID x73Hqz-j5u6-wGiE-YlZf-pWGL-y3Mi-IiQsbB

[root@morph-01 ~]# pvdisplay
  (all five PVs show: VG Name 138396, PV Size 86.27 GB / not usable 0, Allocatable yes (but full),
   PE Size (KByte) 4096, Total PE 22085, Free PE 0, Allocated PE 22085)
  PV Name /dev/sda1   PV UUID zT04IC-xdge-NshQ-9lAX-CM33-FMlt-oaVNuW
  PV Name /dev/sda2   PV UUID eza7Gw-JfC0-9XmP-3sc6-G80z-9qw4-dVajFe
  PV Name /dev/sda3   PV UUID q3nV54-div4-08AU-VwJv-1WJk-xGV2-RNeyyr
  PV Name /dev/sda5   PV UUID mhBIdA-0dlo-6gKf-eXAB-lzNk-2PmP-X68AQd
  PV Name /dev/sda6   PV UUID vNzGnB-KZ10-Djt5-0LUh-FAfq-klgZ-6n9rkL

[root@morph-02 ~]# vgdisplay
  (output identical to morph-01 above)
[root@morph-02 ~]# pvdisplay
  (output identical to morph-01 above)

I've traced through this on 2 of the morphs and done a patch which attempts to work around the problem by doing a full rescan of /dev if a request is made for an object which can't be found. I've built experimental RPMs with the patch:

  lvm2-2.01.06-1.0.RHEL4
  lvm2-cluster-2.01.06-1.1.RHEL4

but they need thorough checking in case the change has broken other functionality - I expect at least one more iteration before this is final. I also hit a problem after killing clvmd - gulm still held onto locks, e.g. P_orphans caused 'pvs -a' to hang. Added in some VG locking fixes for gulm and released as 2.01.07.

Update: With the latest version of lvm2 running gulm or dlm, the locking issue appears to be gone, if and only if you do the kill-clvmd hack mentioned back up in comment #1. However, if you do not kill clvmd before dicing up the disk(s), you will still hit the exact same locking errors once you start clvmd back up and attempt to use that storage. This may or may not be a different issue than what agk fixed, I'm not sure. Latest:

  LVM version: 2.01.07 (2005-03-08)
  Library version: 1.01.00-ioctl (2005-01-17)
  Driver version: 4.3.0

Not sure how the CC list got dropped :(

Detailed testing has turned up a couple of issues:

- If a device disappears underneath clvmd that used to be part of a VG, clvmd continues to expect it to contain metadata for that VG, so it never gets as far as reading the new metadata to find out that the PV got removed from it.
- If a device is added underneath clvmd, the entry in the persistent filter doesn't get reset and clvmd continues to ignore the device.

Believed fixed in 2.01.08.

This sounds similar to bug 165832, not sure if they're related or not though.

Not.

This bz was accidentally closed a while back by the automated errata process. A simple test shows that this was never fixed.

*** Bug 178384 has been marked as a duplicate of this bug. ***

There's a lot on this bugzilla. Please give example(s) of exactly what you're seeing that's wrong in the U3 package. (And I need more information before being able to determine whether or not bug 178384 is the same problem as any mentioned on this bugzilla.)

QA believes that this is the same issue as bz 178384 because it appears to be the same underlying issue: clvmd not handling changes to storage. clvmd never appears to update its view of the storage if something storage-related changes.
In the case of bz 178384, that change is the addition of new storage. In the case that QA is seeing, that change is the reconfiguration of storage, which results in the "deletion" of old partitions and the "addition" of new partitions. QA has been working around this since long before U3. Here's an example.

I start with 6 devices, each with only one partition, one PV per partition:

[root@link-08 ~]# cat /proc/partitions
major minor  #blocks name
  3   0   78150744 hda
  3   1     104391 hda1
  3   2   18434587 hda2
  3   3    2024190 hda3
  3   4          1 hda4
  3   5     104391 hda5
  3   6   57480538 hda6
  8   0  142255575 sda
  8   1  142255543 sda1
  8  16  142255575 sdb
  8  17  142247542 sdb1
  8  32  142255575 sdc
  8  33  142247542 sdc1
  8  48  142255575 sdd
  8  49  142247542 sdd1
  8  64  284519182 sde
  8  65  284511150 sde1
  8  80  142255575 sdf
  8  81  142247542 sdf1

These all make one VG and from that there are 4 LVs:

[root@link-08 ~]# pvcreate /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
  Physical volume "/dev/sda1" successfully created
  Physical volume "/dev/sdb1" successfully created
  Physical volume "/dev/sdc1" successfully created
  Physical volume "/dev/sdd1" successfully created
  Physical volume "/dev/sde1" successfully created
  Physical volume "/dev/sdf1" successfully created
[root@link-08 ~]# vgcreate VG /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
  Volume group "VG" successfully created
[root@link-08 ~]# lvcreate -n LV1 -L 50G VG
  Logical volume "LV1" created
[root@link-08 ~]# lvcreate -n LV2 -L 50G VG
  Logical volume "LV2" created
[root@link-08 ~]# lvcreate -n LV3 -L 50G VG
  Logical volume "LV3" created
[root@link-08 ~]# lvcreate -n LV4 -L 50G VG
  Logical volume "LV4" created

I then delete all those LVs, the VG, and the PVs:

[root@link-08 ~]# lvremove -f /dev/VG/LV1 /dev/VG/LV2 /dev/VG/LV3 /dev/VG/LV4
  Logical volume "LV1" successfully removed
  Logical volume "LV2" successfully removed
  Logical volume "LV3" successfully removed
  Logical volume "LV4" successfully removed
[root@link-08 ~]# vgremove VG
  Volume group "VG" successfully removed
[root@link-08 ~]# pvremove /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
  Labels on physical volume "/dev/sda1" successfully wiped
  Labels on physical volume "/dev/sdb1" successfully wiped
  Labels on physical volume "/dev/sdc1" successfully wiped
  Labels on physical volume "/dev/sdd1" successfully wiped
  Labels on physical volume "/dev/sde1" successfully wiped
  Labels on physical volume "/dev/sdf1" successfully wiped

From here (with clvmd still running) I reconfigure (fdisk) the devices into having two partitions each:

[root@link-08 ~]# for i in a b c d e f; do /usr/tests/sts/bin/dice -d /dev/sd$i -p 2; done

I then re-scan those devices on each node in the cluster so that all the views are consistent:

[root@link-08 ~]# for i in a b c d e f; do /usr/tests/sts/bin/readpart /dev/sd$i; done
[root@link-08 ~]# cat /proc/partitions
major minor  #blocks name
  3   0   78150744 hda
  3   1     104391 hda1
  3   2   18434587 hda2
  3   3    2024190 hda3
  3   4          1 hda4
  3   5     104391 hda5
  3   6   57480538 hda6
  8   0  142255575 sda
  8   1   71127787 sda1
  8   2   71127787 sda2
  8  16  142255575 sdb
  8  17   71127787 sdb1
  8  18   71127787 sdb2
  8  32  142255575 sdc
  8  33   71127787 sdc1
  8  34   71127787 sdc2
  8  48  142255575 sdd
  8  49   71127787 sdd1
  8  50   71127787 sdd2
  8  64  284519182 sde
  8  65  142255574 sde1
  8  66  142255575 sde2
  8  80  142255575 sdf
  8  81   71127787 sdf1
  8  82   71127787 sdf2
253   0   55377920 dm-0
253   1    2031616 dm-1

[root@link-02 ~]# for i in a b c d e f; do /usr/tests/sts/bin/readpart /dev/sd$i; done
[root@link-02 ~]# cat /proc/partitions
major minor  #blocks name
  3   0   39082680 hda
  3   1     104391 hda1
  3   2   18434587 hda2
  3   3    2024190 hda3
  3   4          1 hda4
  3   5     104391 hda5
  3   6   18410458 hda6
  8   0  142255575 sda
  8   1   71127787 sda1
  8   2   71127787 sda2
  8  16  142255575 sdb
  8  17   71127787 sdb1
  8  18   71127787 sdb2
  8  32  142255575 sdc
  8  33   71127787 sdc1
  8  34   71127787 sdc2
  8  48  142255575 sdd
  8  49   71127787 sdd1
  8  50   71127787 sdd2
  8  64  284519182 sde
  8  65  142255574 sde1
  8  66  142255575 sde2
  8  80  142255575 sdf
  8  81   71127787 sdf1
  8  82   71127787 sdf2
253   0   16285696 dm-0
253   1    2031616 dm-1

[root@link-01 ~]# for i in a b c d e f; do /usr/tests/sts/bin/readpart /dev/sd$i; done
[root@link-01 ~]# cat /proc/partitions
  (identical to link-02's listing, minus the dm-0/dm-1 lines)

Now I try to create new PVs, a VG, and LVs out of the "new" storage:

[root@link-08 ~]# pvcreate /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2 /dev/sdf2
  Physical volume "/dev/sda1" successfully created
  Physical volume "/dev/sdb1" successfully created
  Physical volume "/dev/sdc1" successfully created
  Physical volume "/dev/sdd1" successfully created
  Physical volume "/dev/sde1" successfully created
  Physical volume "/dev/sdf1" successfully created
  Physical volume "/dev/sda2" successfully created
  Physical volume "/dev/sdb2" successfully created
  Physical volume "/dev/sdc2" successfully created
  Physical volume "/dev/sdd2" successfully created
  Physical volume "/dev/sde2" successfully created
  Physical volume "/dev/sdf2" successfully created
[root@link-08 ~]# vgcreate VG /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2 /dev/sdf2
  Volume group "VG" successfully created
[root@link-08 ~]# lvcreate -n LV1 -L 50G VG
  Error locking on node link-08: Internal lvm error, check syslog
  Error locking on node link-02: Internal lvm error, check syslog
  Error locking on node link-01: Internal lvm error, check syslog
  Failed to activate new LV.

LOG:
Jan 31 08:48:36 link-08 lvm[15764]: Couldn't find device with uuid 'cj5dJY-xdwf-hMMg-rdw0-ZMU5-4DTo-CVBPGI'.
Jan 31 08:48:36 link-08 lvm[15764]: Couldn't find all physical volumes for volume group VG.
  (the same pair of messages repeats a further eleven times)
Jan 31 08:48:36 link-08 lvm[15764]: Volume group for uuid not found: jfEU0TWAvYLHbgZJAiBeGNtVmbwKQuBasmLkzeg5aAdRwRPGASc5446gOm8BVMDF

Our workaround has been to stop clvmd, reconfigure the devices, re-scan them on each of the nodes, and then restart clvmd so that it too discovers the updated storage view.

We also attempted this the exact same way as described in bz 178384. We had a consistent view of the storage on each of the nodes, along with PVs, VGs, and LVs. We then zoned in two new devices and discovered them:

[root@link-01 host0]# echo "- - -" > /sys/class/scsi_host/host0/scan
[root@link-02 host0]# echo "- - -" > /sys/class/scsi_host/host0/scan
[root@link-08 host0]# echo "- - -" > /sys/class/scsi_host/host2/scan

We now have sdg and sdh on each node:

[root@link-08 host0]# cat /proc/partitions
major minor  #blocks name
  3   0   78150744 hda
  3   1     104391 hda1
  3   2   18434587 hda2
  3   3    2024190 hda3
  3   4          1 hda4
  3   5     104391 hda5
  3   6   57480538 hda6
  8   0  142255575 sda
  8   1  142247542 sda1
  8  16  142255575 sdb
  8  17  142247542 sdb1
  8  32  142255575 sdc
  8  33  142247542 sdc1
  8  48  142255575 sdd
  8  49  142247542 sdd1
  8  64  284519182 sde
  8  65  284511150 sde1
  8  80  142255575 sdf
  8  81  142247542 sdf1
253   0   55377920 dm-0
253   1    2031616 dm-1
253   2   52428800 dm-2
253   3   52428800 dm-3
  8  96  285700096 sdg
  8  97  285691927 sdg1
  8 112  285700096 sdh
  8 113  285691927 sdh1

[root@link-08 host0]# pvcreate /dev/sdg1
  Physical volume "/dev/sdg1" successfully created
[root@link-08 host0]# pvcreate /dev/sdh1
  Physical volume "/dev/sdh1" successfully created
[root@link-08 host0]# pvscan
  PV /dev/sda1  VG VG  lvm2 [135.66 GB / 135.66 GB free]
  PV /dev/sdb1  VG VG  lvm2 [135.66 GB / 135.66 GB free]
  PV /dev/sdc1  VG VG  lvm2 [135.66 GB / 135.66 GB free]
  PV /dev/sdd1  VG VG  lvm2 [135.66 GB / 135.66 GB free]
  PV /dev/sde1  VG VG  lvm2 [271.33 GB / 171.33 GB free]
  PV /dev/sdf1  VG VG  lvm2 [135.66 GB / 135.66 GB free]
  PV /dev/sdg1         lvm2 [272.46 GB]
  PV /dev/sdh1         lvm2 [272.46 GB]
[root@link-08 host0]# vgcreate VG2 /dev/sdg1 /dev/sdh1
  Volume group "VG2" successfully created
[root@link-08 host0]# lvscan
  ACTIVE '/dev/VG/LV1' [50.00 GB] inherit
  ACTIVE '/dev/VG/LV2' [50.00 GB] inherit
[root@link-08 host0]# lvcreate -n LV3 -L 50G VG2
  Error locking on node link-08: Internal lvm error, check syslog
  Error locking on node link-02: Internal lvm error, check syslog
  Error locking on node link-01: Internal lvm error, check syslog
  Failed to activate new LV.

OK, to clarify: This bugzilla only covers problems seen when the block devices that LVM2 is using as PVs are reconfigured in some way whilst an lvm2 process (such as clvmd) is running. It does *not* cover problems seen when a VG is deleted and a new one is then created with the same name whilst an lvm2 process is running - that's on bug 162704 and is related to bug 147361, which is about handling multiple distinct VGs that have identical names and are seen by the system concurrently.

*** Bug 171157 has been marked as a duplicate of this bug. ***

Changing to the lvm2 package, as the problem is in that package - it's not cluster-related.

A workaround for this issue that we've tested was to stop clvmd on all the nodes in the cluster, add your new devices, discover the new devices on all the nodes, and then restart clvmd. What we actually did:

1. We had a 3 disk/PV (400Gb) GFS filesystem with active I/O running from all the nodes.
2. Stopped clvmd:

[root@link-02 ~]# service clvmd stop
  Deactivating VG link1: Can't deactivate volume group "link1" with 1 open logical volume(s) [FAILED]
  Deactivating VG link2: [ OK ]
  Stopping clvm: [ OK ]

   (note that the deactivation will fail due to the mounted filesystem with running I/O)
3. Took 3 other unused disks, repartitioned them, rediscovered them on all nodes and then restarted clvmd.
4. Created PVs out of those new partitions.
5. Grew the active VG and LV.
6. Grew the GFS filesystem.

*** Bug 172888 has been marked as a duplicate of this bug. ***

The issue of "clvmd caches storage information" still exists, even when using lvm2-cluster-2.02.06-4.0.RHEL4. Is there any better workaround available than restarting clvmd on every cluster node at the same time (which is very ugly)? Here's what I am doing to reproduce this:

[root@cluster1 ~]# fdisk /dev/sdb

Command (m for help): p
Disk /dev/sdb: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot  Start  End  Blocks   Id  System
/dev/sdb1           1   60  481918+  8e  Linux LVM

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (61-130, default 61):
Using default value 61
Last cylinder or +size or +sizeM or +sizeK (61-130, default 130):
Using default value 130

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): 8e
Changed system type of partition 2 to 8e (Linux LVM)

Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at the next reboot.
Syncing disks.

[root@cluster1 ~]# partprobe
[root@cluster1 ~]# pvcreate /dev/sdb2
  Physical volume "/dev/sdb2" successfully created
[root@cluster1 ~]# vgextend vg_test /dev/sdb2
  Volume group "vg_test" successfully extended
[root@cluster1 ~]# lvextend -v -l +40 /dev/vg_test/lv_test
  Loaded external locking library liblvm2clusterlock.so
  Finding volume group vg_test
  Archiving volume group "vg_test" metadata (seqno 3).
  Extending logical volume lv_test to 628.00 MB
  Creating volume group backup "/etc/lvm/backup/vg_test" (seqno 4).
  Error locking on node cluster2: Internal lvm error, check syslog
  Error locking on node cluster1: Internal lvm error, check syslog
  Failed to suspend lv_test
[root@cluster1 ~]# rpm -q device-mapper lvm2 lvm2-cluster
device-mapper-1.02.07-4.0.RHEL4
lvm2-2.02.06-4.0.RHEL4
lvm2-cluster-2.02.06-4.0.RHEL4

Having the same problem trying to follow the cookbook for evaluation, using U3 (lvm2-cluster-2.02.01-1.2.RHEL4). Killing clvmd and restarting it after a failure of lvcreate allows the next lvcreate to work in my case.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

I've added a -R switch to clvmd that should help this. After any changes to devices in the cluster, simply run "clvmd -R" and it will tell all running clvmd instances to reload their device cache.

With the latest packages this isn't quite fixed. I ran the following scenario 3 different times, and each time I moved the "clvmd -R": first before the pvcreate, next before the vgcreate, and last before the lvcreate.

1. make sure all PVs/VGs/LVs are deleted
2. repartition the devices
3. probe the devices on all machines
4. clvmd -R on all machines
5. pvcreate
6. vgcreate
7. lvcreate

[root@taft-04 lvm]# lvcreate -n lv -l 30%VG vg
  Error locking on node taft-03: Volume group for uuid not found: cXJDZFj0nhN4PLAsY2h9CjawwsU6eKQGtYza3oOqJTnauWgluuSQOOBkrCVlQw7r
  Error locking on node taft-02: Volume group for uuid not found: cXJDZFj0nhN4PLAsY2h9CjawwsU6eKQGtYza3oOqJTnauWgluuSQOOBkrCVlQw7r
  Error locking on node taft-01: Volume group for uuid not found: cXJDZFj0nhN4PLAsY2h9CjawwsU6eKQGtYza3oOqJTnauWgluuSQOOBkrCVlQw7r
  Failed to activate new LV.

Again, this worked when I just stopped and restarted the clvmd services instead of doing a clvmd -R.

[root@taft-04 lvm]# rpm -q lvm2
lvm2-2.02.11-1.0.RHEL5
[root@taft-04 lvm]# rpm -q lvm2-cluster
lvm2-cluster-2.02.11-1.0.RHEL5
[root@taft-04 lvm]# rpm -q device-mapper
device-mapper-1.02.12-1.0.RHEL5

Since the code I was testing is technically rhel5, I'll leave this bz in the POST state for now and open a rhel5 bug for the issue (210724).

Still able to reproduce this issue on the latest rpms.

[root@link-08 lvm]# rpm -q lvm2
lvm2-2.02.13-1
[root@link-08 lvm]# rpm -q lvm2-cluster
lvm2-cluster-2.02.13-1
[root@link-08 lvm]# rpm -q device-mapper
device-mapper-1.02.12-3
[root@link-08 lvm]# uname -ar
Linux link-08 2.6.9-42.17.ELsmp #1 SMP Mon Oct 9 18:42:57 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux

Another fix applied for clvmd -R: please retest.

  lvm2-2.02.20-1.el4
  lvm2-cluster-2.02.20-1.el4

This still doesn't appear to work.

[root@link-07 ~]# pvscan
  PV /dev/sda1  lvm2 [135.66 GB]
  PV /dev/sdb1  lvm2 [135.66 GB]
  PV /dev/sdc1  lvm2 [135.66 GB]
  PV /dev/sdd1  lvm2 [135.66 GB]
  PV /dev/sde1  lvm2 [135.66 GB]
  Total: 5 [678.29 GB] / in use: 0 [0 ] / in no VG: 5 [678.29 GB]
[root@link-07 ~]# pvremove -f /dev/sd[abcde]1
  Labels on physical volume "/dev/sda1" successfully wiped
  Labels on physical volume "/dev/sdb1" successfully wiped
  Labels on physical volume "/dev/sdc1" successfully wiped
  Labels on physical volume "/dev/sdd1" successfully wiped
  Labels on physical volume "/dev/sde1" successfully wiped
[root@link-07 ~]# pvscan
  No matching physical volumes found

# repartition the devices
[root@link-07 ~]# for i in a b c d e; do /usr/tests/sts/bin/dice -p 3 -d /dev/sd$i; done

# reprobe the devices
[root@link-02 ~]# for i in a b c d e; do /usr/tests/sts/bin/readpart /dev/sd$i; done
[root@link-04 ~]# for i in a b c d e; do /usr/tests/sts/bin/readpart /dev/sd$i; done
[root@link-07 ~]# for i in a b c d e; do /usr/tests/sts/bin/readpart /dev/sd$i; done
[root@link-08 ~]# for i in a b c d e; do /usr/tests/sts/bin/readpart /dev/sd$i; done

[root@link-08 ~]# clvmd -R
[root@link-07 ~]# clvmd -R
[root@link-04 ~]# clvmd -R
[root@link-02 ~]# clvmd -R

[root@link-08 ~]# pvcreate /dev/sd[abcde][123]
  Physical volume "/dev/sda1" successfully created
  Physical volume "/dev/sda2" successfully created
  Physical volume "/dev/sda3" successfully created
  Physical volume "/dev/sdb1" successfully created
  Physical volume "/dev/sdb2" successfully created
  Physical volume "/dev/sdb3" successfully created
  Physical volume "/dev/sdc1" successfully created
  Physical volume "/dev/sdc2" successfully created
  Physical volume "/dev/sdc3" successfully created
  Physical volume "/dev/sdd1" successfully created
  Physical volume "/dev/sdd2" successfully created
  Physical volume "/dev/sdd3" successfully created
  Physical volume "/dev/sde1" successfully created
  Physical volume "/dev/sde2" successfully created
  Physical volume "/dev/sde3" successfully created
[root@link-08 ~]# vgcreate vg /dev/sd[abcde][123]
  Volume group "vg" successfully created
[root@link-07 ~]# vgscan
  Reading all physical volumes. This may take a while...
  Found volume group "vg" using metadata type lvm2
[root@link-07 ~]# lvcreate -L 100M -n feist vg
  Error locking on node link-08: Volume group for uuid not found: qg8sa1QQmq1pPusZvBvMxygg8FUrPJD1AhIeUZ0lOF7rZNRr2xUivUYjTJX6nu2X
  Error locking on node link-02: Volume group for uuid not found: qg8sa1QQmq1pPusZvBvMxygg8FUrPJD1AhIeUZ0lOF7rZNRr2xUivUYjTJX6nu2X
  Error locking on node link-04: Volume group for uuid not found: qg8sa1QQmq1pPusZvBvMxygg8FUrPJD1AhIeUZ0lOF7rZNRr2xUivUYjTJX6nu2X
  Error locking on node link-07: Volume group for uuid not found: qg8sa1QQmq1pPusZvBvMxygg8FUrPJD1AhIeUZ0lOF7rZNRr2xUivUYjTJX6nu2X
  Failed to activate new LV.

This bugzilla had previously been approved for engineering consideration but Red Hat Product Management is currently reevaluating this issue for inclusion in RHEL4.6.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
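Until the clvmd -R path works, the only workaround the thread has found reliable is the full restart cycle described in the comments above. A hedged sketch of that procedure scripted across nodes follows; the node names and the ssh fan-out are illustrative assumptions, and readpart is the test-tree helper used in this report:

    #!/bin/bash
    # Workaround sketch: restart clvmd around a storage reconfiguration.
    NODES="link-01 link-02 link-08"   # assumed node names

    # Stop clvmd everywhere before touching the devices.
    for n in $NODES; do ssh "$n" service clvmd stop; done

    # ... repartition / zone in the new devices here ...

    # Make every node re-read the partition tables, then bring clvmd back
    # so each daemon starts with a fresh view of the storage.
    for n in $NODES; do
        ssh "$n" 'for d in /dev/sd[a-f]; do /usr/tests/sts/bin/readpart $d; done'
    done
    for n in $NODES; do ssh "$n" service clvmd start; done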
Internal Status set to 'Resolved'
Status set to: Closed by Client
Resolution set to: 'Closed by Client'
This event sent from IssueTracker by lpleiman, issue 124182.

Just a comment that this pertains to newly added disks as well.

Created attachment 161986 [details]
Controlled reproduction logs
I managed to reproduce this on two nodes. Actually you only need one and I've
only included one node's logs here.
The steps that work for me on demand are as follows (a scripted form of the repeated wipes appears after the list):
# multipath -f mpath5 #! Disable multipath device
# rm /etc/lvm/.cache
# clvmd
# dd if=/dev/zero of=/dev/mpath/mpath0
# dd if=/dev/zero of=/dev/mpath/mpath1
# dd if=/dev/zero of=/dev/mpath/mpath2
# dd if=/dev/zero of=/dev/mpath/mpath3
# dd if=/dev/zero of=/dev/mpath/mpath4
# pvcreate /dev/mpath/mpath[01234]
# vgcreate vg1 /dev/mpath/mpath[01234]
# lvcreate -L50G vg1
<did lvscan on remote node to check active>
# vgchange -an
# lvremove vg1/lvol0
# vgremove vg1
# multipath #! enable mpath5
# dd if=/dev/zero of=/dev/mpath/mpath0
# dd if=/dev/zero of=/dev/mpath/mpath1
# dd if=/dev/zero of=/dev/mpath/mpath2
# dd if=/dev/zero of=/dev/mpath/mpath3
# dd if=/dev/zero of=/dev/mpath/mpath4
# dd if=/dev/zero of=/dev/mpath/mpath5
# pvcreate /dev/mpath/mpath[012345]
# vgcreate vg1 /dev/mpath/mpath[051234]
# lvcreate -L250G vg1
<Fails with locking error>
# clvmd -R
# lvcreate -L250G vg1
<Fails with locking error>
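As an aside, the repeated dd wipes in the steps above could be driven by a loop; a sketch (the bs/count limit is my shorthand for zeroing just the label area at the start of each map, whereas the original commands zeroed the entire devices):

    # Wipe the LVM label/metadata area on each multipath map.
    # Assumption: zeroing the first few MB is enough for pvcreate to
    # treat the device as blank; adjust or drop count= to wipe fully.
    for d in /dev/mpath/mpath[0-5]; do
        dd if=/dev/zero of="$d" bs=1M count=4
    done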
Well, those logs suggest that clvmd -R is correctly clearing the internal caches - but not repopulating them (a la vgscan) afterwards! So I reckon that do_refresh_cache() also needs to finish by calling a function to repopulate the cache. See whether adding get_vgids(cmd, 2) there works. Actually, calling lvmcache_label_scan(cmd, 2) directly would be better. If that doesn't work, work out how far into lvmcache_label_scan it gets before it bails out and we'll find a way to avoid that happening.

I put a call to lvmcache_label_scan(cmd, 2) after refresh_toolcontext and it doesn't seem to help. It does seem to be running through the whole of lvmcache_label_scan too...

Created attachment 168571 [details]
new debug log
Actually lvmcache_label_scan() seems to be being called later on anyway, though
the devices still don't turn up in the cache AFAICT.
Found it! init_full_scan_done(0); needs to be called too.

Checking in daemons/clvmd/lvm-functions.c;
/cvs/lvm2/LVM2/daemons/clvmd/lvm-functions.c,v  <--  lvm-functions.c
new revision: 1.32; previous revision: 1.31
done

We're also changing 'vgscan' so that it will issue 'clvmd -R' automatically. So if underlying devices change, all people have to do is run 'vgscan' on one node afterwards. Included in 2.02.27-2. Let's hope it's really fixed this time!

This does appear to be fixed in lvm2-2.02.27-2.el4/lvm2-cluster-2.02.27-2.el4! Marking verified.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0847.html
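For reference, with the fix described above (lvm2 2.02.27-2 and later), the post-reconfiguration procedure collapses to something like the following sketch. Device names here are illustrative; the key point, per the comments above, is that a single vgscan now issues clvmd -R to every running clvmd instance:

    # After repartitioning shared storage (lvm2 >= 2.02.27-2):
    # re-read the partition table on every node, e.g. with partprobe ...
    partprobe /dev/sda
    # ... then one vgscan on any node refreshes all clvmd device caches.
    vgscan
    pvcreate /dev/sda1 /dev/sda2
    vgcreate vg /dev/sda1 /dev/sda2
    lvcreate -n lv -L 50G vg   # should now activate cluster-wide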