Bug 147531

Summary: dm entries being left, causing storage usability problems
Product: [Retired] Red Hat Cluster Suite
Reporter: Corey Marthaler <cmarthal>
Component: lvm2-cluster
Assignee: Alasdair Kergon <agk>
Status: CLOSED NOTABUG
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4
CC: agk, tao
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2005-02-15 15:43:47 UTC

Description Corey Marthaler 2005-02-08 20:40:50 UTC
Description of problem:
After having a valid running gulm cluster with a mounted filesystem, I
rebooted everyone. Once everyone was back up, I started ccsd, gulm,
and clvmd on all nodes. I then deleted the leftover lv, vg, and pv and
verified on all nodes that they had indeed been removed. However, on some of
the nodes there are still leftover dm entries, which cause the storage
to report it is in use.

[root@tank-01 ~]# dmsetup ls
VolGroup00-LogVol01     (253, 0)
VolGroup00-LogVol00     (253, 1)

[root@tank-02 ~]# dmsetup ls
gfs-gfs0        (253, 2)
VolGroup00-LogVol01     (253, 0)
VolGroup00-LogVol00     (253, 1)

[root@tank-03 ~]# dmsetup ls
gfs-gfs0        (253, 2)
VolGroup00-LogVol01     (253, 0)
VolGroup00-LogVol00     (253, 1)

[root@tank-04 ~]# dmsetup ls
gfs-gfs0        (253, 2)
VolGroup00-LogVol01     (253, 0)
VolGroup00-LogVol00     (253, 1)

[root@tank-05 ~]# dmsetup ls
gfs-gfs0        (253, 2)
VolGroup00-LogVol01     (253, 0)
VolGroup00-LogVol00     (253, 1)


[root@tank-03 ~]# /tmp/STS/bin/dice -d /dev/sda -p 1
Using default ID type
Checking that no-one is using this disk right now ...
BLKRRPART: Device or resource busy

If I then do a dmsetup remove on all the nodes everything will be fine.
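For reference, the manual cleanup looks roughly like this on each affected
node (a minimal sketch; the gfs-gfs0 name is taken from the dmsetup listings
above):

   # remove the stale device-mapper entry left behind after the reboot
   dmsetup remove gfs-gfs0
   # confirm only the local root/swap volumes remain
   dmsetup ls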


Version-Release number of selected component (if applicable):
[root@tank-04 ~]# clvmd -V
Cluster LVM daemon version: 2.01.03 (2005-02-01)
Protocol version:           0.2.1


How reproducible:
Always

Comment 1 Corey Marthaler 2005-02-08 21:12:01 UTC
This isn't lock manager specific, the same thing happens on cman/dlm.

Comment 3 Corey Marthaler 2005-02-09 21:08:58 UTC
This is really easy to reproduce (a condensed command sketch of steps 4-9
follows the list below).

1. Get a gfs mounted on all nodes in the cluster (I had 3 gfs). 
2. Reboot everyone
3. verify that the dm entries are still there:
   [root@tank-02 ~]# dmsetup ls
   gfs-gfs1        (253, 3)
   gfs-gfs0        (253, 2)
   VolGroup00-LogVol01     (253, 0)
   VolGroup00-LogVol00     (253, 1)
   gfs-gfs2        (253, 4)

4. Deactivate the vg, then remove the lv, vg, and pv. 
5. Verify that all lvms are deleted on each node.
6. The dm entries will still be there:
   [root@tank-02 ~]# dmsetup ls
   gfs-gfs1        (253, 3)
   gfs-gfs0        (253, 2)
   VolGroup00-LogVol01     (253, 0)
   VolGroup00-LogVol00     (253, 1)
   gfs-gfs2        (253, 4)

7. Now make a new pv, vg, and lv
8. Make a filesystem on the lv and mount it.
9. The dm entries will still be there in addition to the newly created
one:
   [root@tank-02 ~]# dmsetup ls
   gfs-gfs1        (253, 3)
   gfs-gfs0        (253, 2)
   myvg-lvol0      (253, 5)
   VolGroup00-LogVol01     (253, 0)
   VolGroup00-LogVol00     (253, 1)
   gfs-gfs2        (253, 4)


With those entries there, nothing but LVM is able to do anything with
that storage.

10. If you reboot all the nodes again the first dm entries will go away:

[root@tank-05 ~]# dmsetup ls
myvg-lvol0      (253, 2)
VolGroup00-LogVol01     (253, 0)
VolGroup00-LogVol00     (253, 1)
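
For reference, steps 4 through 9 condensed into commands (a rough sketch only;
the VG/LV names, the /dev/sda1 PV, and the mkfs/mount options are illustrative,
taken loosely from the listings in this report):

   # step 4: deactivate the old VG cluster-wide, then remove lv, vg, and pv
   vgchange -an gfs
   lvremove gfs
   vgremove gfs
   pvremove /dev/sda1

   # step 5: LVM reports nothing left ...
   lvs; vgs; pvs

   # step 6: ... yet the stale dm entries remain on some nodes
   dmsetup ls

   # steps 7-8: create a new pv, vg, and lv, then mkfs and mount
   pvcreate /dev/sda1
   vgcreate myvg /dev/sda1
   lvcreate -L 100G -n lvol0 myvg
   gfs_mkfs -p lock_gulm -t tank:gfs0 -j 5 /dev/myvg/lvol0   # illustrative options
   mount -t gfs /dev/myvg/lvol0 /mnt/gfs0

   # step 9: the old gfs-gfs* entries are still listed next to myvg-lvol0
   dmsetup ls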


Comment 4 Alasdair Kergon 2005-02-09 22:50:45 UTC
Sorry, but I still don't understand what you're doing here.
 
Which of all those steps is the *first* one where something happens
that shouldn't?  Then home in on that step to see what's going wrong
and provide more details of what you're running and what output you
get.  Everything after the first place things go wrong is not interesting.

For example, is step 2 -> step 3 what you expected? I can't tell from
the information so far if it is.  Then step 5 to step 6 certainly
doesn't make any sense to me - if the 'lvms' are deleted on each node,
dmsetup surely won't show anything?  So is it step 4 or step 5 where
something is wrong?


Comment 5 Corey Marthaler 2005-02-09 23:42:32 UTC
"Then step 5 to step 6 certainly doesn't make any sense to me - if the
'lvms' are deleted on each node, dmsetup surely won't show anything?"

Exactly, but it does. :) hence this bug. That is where I expected the
dm entries to be gone (after step 5), but they remain. 

Comment 6 Alasdair Kergon 2005-02-10 14:17:26 UTC
So please document what you're doing in those steps: correct status
before, the one command you run that does the wrong thing, status
after it.

Comment 7 Corey Marthaler 2005-02-10 19:51:30 UTC
The command which is supposed to delete the dm entries is
the deactivation of the VG. So before doing the 'vgchange -an' you
will still see the dm entries, and afterwards you should not. 
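
In other words, the expected before/after looks like this (minimal sketch,
using the 'gfs' VG from the listings above):

   dmsetup ls          # before: gfs-gfs* entries are listed
   vgchange -an gfs    # cluster-wide deactivation through clvmd
   dmsetup ls          # expected: gfs-gfs* entries gone; actual: they remain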

Out of the blue this morning, I was no longer able to reproduce this
issue, the same issue that we testers have been seeing at will for
the past week. 

Now, this afternoon, when trying to reproduce this, the lvdisplay cmds
hang.

.
.
.
stat64("/dev/sda2", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 2),
...}) = 0
stat64("/dev/sda2", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 2),
...}) = 0
open("/dev/sda2", O_RDONLY|O_DIRECT|O_LARGEFILE|0x40000) = 4
fstat64(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 2), ...}) = 0
ioctl(4, BLKBSZGET, 0x999e4f8)          = 0
_llseek(4, 2048, [2048], SEEK_SET)      = 0
read(4, "\312\213Q. LVM2 x[5A%r0N*>\1\0\0\0\0\10\0\0\0\0\0\0"...,
1024) = 1024
_llseek(4, 20480, [20480], SEEK_SET)    = 0
read(4, "gfs {\nid = \"lYOiwj-NkXU-hnJX-fLR"..., 1024) = 1024
close(4)                                = 0
stat64("/proc/lvm/VGs/gfs", 0xbff148e0) = -1 ENOENT (No such file or
directory)
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
write(3, "3\1\30\0\0\0\0\0\0\0\0\0\10\0\0\0\0\1\0V_gfs\0\0", 26) = 26
read(3,


<occupational hypnotherapist> That's messed up. </occupational
hypnotherapist>

Comment 8 Corey Marthaler 2005-02-10 20:30:12 UTC
It turns out that the lvdisplay cmd was hanging as a result of all the
dm entries I had.  

I originally had 9 gfs filesystems mounted on all nodes, and I then
rebooted all of them to try and reproduce this issue.  When the nodes
came back up I saw the 9 dm entries and started ccsd, gulmd, clvmd,
and then attempted an lvdisplay, which would hang. So I rebooted again
and tried the exact same scenario. I repeated this 4 or 5 times,
each time resulting in hung lv cmds, until I decided to remove the dm
entries with 'dmsetup remove', at which point everything was fine
again. 

Comment 9 Corey Marthaler 2005-02-10 21:57:30 UTC
This bug appears to have morphed in the past 24 hours.
Now any node with dm entries after a reboot ends up with hung lvm cmds.

Node WITHOUT dm entries:

[root@tank-01 ~]# cat /proc/partitions
major minor  #blocks  name
   3     0   39082680 hda
   3     1     104391 hda1
   3     2   38973690 hda2
 253     0    1048576 dm-0
 253     1   37912576 dm-1
   8     0 1367490560 sda
   8     1  683742464 sda1
   8     2  683742465 sda2
[root@tank-01 ~]# dmsetup ls
VolGroup00-LogVol01     (253, 0)
VolGroup00-LogVol00     (253, 1)
[root@tank-01 ~]# lvdisplay
  --- Logical volume ---
  LV Name                /dev/gfs/gfs0
  VG Name                gfs
  LV UUID                pqDoZQ-jnXZ-y0Zb-kl2s-53pP-TYWb-pzd1g3
  LV Write Access        read/write
  LV Status              NOT available
  LV Size                260.82 GB
  Current LE             66771
  Segments               1
  Allocation             inherit
  Read ahead sectors     0

  --- Logical volume ---
  LV Name                /dev/gfs/gfs1
  VG Name                gfs
  LV UUID                8MYJp6-3K1l-HtTi-qwbK-VT2p-WgO8-C98BSl
  LV Write Access        read/write
  LV Status              NOT available
  LV Size                260.82 GB
  Current LE             66771
  Segments               1
  Allocation             inherit
  Read ahead sectors     0

  --- Logical volume ---
  LV Name                /dev/gfs/gfs2
  VG Name                gfs
  LV UUID                F7SWs4-I7qj-a5Ma-no8Y-p8ir-gCEn-Z2F8EM
  LV Write Access        read/write
  LV Status              NOT available
  LV Size                260.82 GB
  Current LE             66771
  Segments               2
  Allocation             inherit
  Read ahead sectors     0

  --- Logical volume ---
  LV Name                /dev/gfs/gfs3
  VG Name                gfs
  LV UUID                4jI7ey-jWUz-KBnZ-7xfQ-xPZT-nir9-s5gQBa
  LV Write Access        read/write
  LV Status              NOT available
  LV Size                260.82 GB
  Current LE             66771
  Segments               1
  Allocation             inherit
  Read ahead sectors     0

  --- Logical volume ---
  LV Name                /dev/gfs/gfs4
  VG Name                gfs
  LV UUID                uo4iyn-vALS-DQ6j-WM4m-8nGV-vtcR-6WTM9e
  LV Write Access        read/write
  LV Status              NOT available
  LV Size                260.84 GB
  Current LE             66774
  Segments               1
  Allocation             inherit
  Read ahead sectors     0


Node WITH dm entries:
[root@tank-04 ~]# cat /proc/partitions
major minor  #blocks  name

   3     0   39082680 hda
   3     1     104391 hda1
   3     2   38973690 hda2
   8     0 1367490560 sda
   8     1  683742464 sda1
   8     2  683742465 sda2
 253     0    1048576 dm-0
 253     1   37912576 dm-1
 253     2  273494016 dm-2
 253     3  273494016 dm-3
 253     4  273494016 dm-4
 253     5  273494016 dm-5
 253     6  273506304 dm-6
[root@tank-04 ~]# dmsetup ls
gfs-gfs1        (253, 3)
gfs-gfs0        (253, 2)
VolGroup00-LogVol01     (253, 0)
gfs-gfs4        (253, 6)
VolGroup00-LogVol00     (253, 1)
gfs-gfs3        (253, 5)
gfs-gfs2        (253, 4)
[root@tank-04 ~]# lvdisplay
[HANG]


Both of these nodes are in the same cluster and connected to the same
storage. The only difference is that one has an lpfc HBA and the other a qla2300.

Comment 10 Alasdair Kergon 2005-02-15 15:43:47 UTC
Reopen if it happens again.

Comment 11 Alasdair Kergon 2005-02-15 15:46:01 UTC
(Raise a separate bug for the new problem)