Bug 156110 - activating restored volumes after hardware failure can fail
Summary: activating restored volumes after hardware failure can fail
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: lvm2
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: LVM and device-mapper development team
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-04-27 16:46 UTC by Corey Marthaler
Modified: 2010-05-14 22:28 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-05-14 22:28:11 UTC
Target Upstream Version:
Embargoed:



Description Corey Marthaler 2005-04-27 16:46:07 UTC
Description of problem:
After my MSA1000 went down and was restored with a replacement drive from HP, I
went through the steps to get my data back.

I had 7 PVs on 7 LUNs, all grouped into 1 VG, which was then sliced into 5 LVs.
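
(For context, a minimal sketch of how a layout like this is typically carved up;
the device names and LV sizes below are taken from the pvscan/lvscan output
further down and are illustrative, not necessarily the exact commands originally
used:)

  pvcreate /dev/sd[a-g]1
  vgcreate -c y gfs /dev/sd[a-g]1
  for i in 0 1 2 3 4; do lvcreate -n gfs$i -L 189.92G gfs; done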


Here is the initial state of LVM after restoring the MSA and rebooting the nodes:

[root@tank-03 tmp]# vgscan
  Reading all physical volumes.  This may take a while...
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find all physical volumes for volume group gfs.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find all physical volumes for volume group gfs.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find all physical volumes for volume group gfs.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find all physical volumes for volume group gfs.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find all physical volumes for volume group gfs.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find all physical volumes for volume group gfs.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find all physical volumes for volume group gfs.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find all physical volumes for volume group gfs.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find all physical volumes for volume group gfs.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find all physical volumes for volume group gfs.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find all physical volumes for volume group gfs.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find all physical volumes for volume group gfs.
  Volume group "gfs" not found

[root@tank-03 tmp]# pvscan
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  PV /dev/sda1        VG gfs   lvm2 [135.66 GB / 0    free]
  PV /dev/sdb1        VG gfs   lvm2 [135.66 GB / 0    free]
  PV /dev/sdc1        VG gfs   lvm2 [135.66 GB / 0    free]
  PV unknown device   VG gfs   lvm2 [135.66 GB / 0    free]
  PV /dev/sde1        VG gfs   lvm2 [135.66 GB / 0    free]
  PV /dev/sdf1        VG gfs   lvm2 [135.66 GB / 0    free]
  PV /dev/sdg1        VG gfs   lvm2 [135.66 GB / 0    free]
  Total: 7 [949.59 GB] / in use: 7 [949.59 GB] / in no VG: 0 [0   ]

First, I created a backup of the config I had:
[root@tank-03 tmp]# vgcfgbackup -P
  Partial mode. Incomplete volume groups will be activated read-only.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Volume group "gfs" successfully backed up.
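
(A quick way to double-check that the partial backup recorded the missing PV --
the path here is the same one passed to --restorefile below, and the uuid should
show up under one of the pv stanzas:)

  grep -n 'Xynm7y' /etc/lvm/backup/gfs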

Second, I created a PV on the newly restored LUN:
[root@tank-03 backup]# pvcreate --restorefile /etc/lvm/backup/gfs --uuid
Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf /dev/sdd
  Couldn't find device with uuid 'Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf'.
  Physical volume "/dev/sdd" successfully created
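
(For reference, the documented recovery sequence would normally follow this
pvcreate with a vgcfgrestore to write the backed-up metadata onto the new PV --
a sketch of that step, not something shown in this transcript:)

  vgcfgrestore -f /etc/lvm/backup/gfs gfs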

[root@tank-03 backup]# pvscan
  PV /dev/sda1   VG gfs   lvm2 [135.66 GB / 0    free]
  PV /dev/sdb1   VG gfs   lvm2 [135.66 GB / 0    free]
  PV /dev/sdc1   VG gfs   lvm2 [135.66 GB / 0    free]
  PV /dev/sdd    VG gfs   lvm2 [135.66 GB / 0    free]
  PV /dev/sde1   VG gfs   lvm2 [135.66 GB / 0    free]
  PV /dev/sdf1   VG gfs   lvm2 [135.66 GB / 0    free]
  PV /dev/sdg1   VG gfs   lvm2 [135.66 GB / 0    free]
  Total: 7 [949.59 GB] / in use: 7 [949.59 GB] / in no VG: 0 [0   ]

[root@tank-03 backup]# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "gfs" using metadata type lvm2

When I tried to activate the VG on all nodes, it failed:

[root@tank-03 backup]# vgchange -ay gfs
  Error locking on node tank-01.lab.msp.redhat.com: Internal lvm error, check syslog
  Error locking on node tank-04.lab.msp.redhat.com: Internal lvm error, check syslog
  Error locking on node tank-02.lab.msp.redhat.com: Internal lvm error, check syslog
  Error locking on node tank-05.lab.msp.redhat.com: Internal lvm error, check syslog
[...]

In the syslog, it complains about the missing UUID:

Apr 27 09:33:45 tank-03 lvm[3098]: Volume group gfs metadata is inconsistent
Apr 27 09:33:45 tank-03 lvm[3098]: Volume group for uuid not found:
0ytQwKGjIB01ACCwicAQon4AB3tB1lMSXnBb816tMA0LbdcFSOrxAuNPeRlAY9v5
Apr 27 09:33:48 tank-03 lvm[3098]: Volume group gfs metadata is inconsistent
Apr 27 09:33:48 tank-03 lvm[3098]: Volume group for uuid not found:
0ytQwKGjIB01ACCwicAQon4AB3tB1lMSFoj7HlabxJFo8mmJeTGkV4f56mU4Phxq
Apr 27 09:33:48 tank-03 lvm[3098]: Volume group gfs metadata is inconsistent
Apr 27 09:33:48 tank-03 lvm[3098]: Volume group for uuid not found:
0ytQwKGjIB01ACCwicAQon4AB3tB1lMSELaSVIqMEmYTVatsdQFyELqx6a3414gt
Apr 27 09:33:48 tank-03 lvm[3098]: Volume group gfs metadata is inconsistent
Apr 27 09:33:48 tank-03 lvm[3098]: Volume group for uuid not found:
0ytQwKGjIB01ACCwicAQon4AB3tB1lMSlSU0gNZyl77aX33ECBFioGys3WpfphNL
Apr 27 09:33:48 tank-03 lvm[3098]: Volume group gfs metadata is inconsistent
Apr 27 09:33:48 tank-03 lvm[3098]: Volume group for uuid not found:
0ytQwKGjIB01ACCwicAQon4AB3tB1lMSXnBb816tMA0LbdcFSOrxAuNPeRlAY9v5
Apr 27 09:36:48 tank-03 clvmd: Cluster LVM daemon started - connected to CMAN
Apr 27 09:37:59 tank-03 lvm[3253]: Volume group gfs metadata is inconsistent
Apr 27 09:37:59 tank-03 lvm[3253]: Volume group for uuid not found:
0ytQwKGjIB01ACCwicAQon4AB3tB1lMSqnl7fuVgYhOxL0915LafYpzfxRtZTr8P
Apr 27 09:37:59 tank-03 lvm[3253]: Volume group gfs metadata is inconsistent
Apr 27 09:37:59 tank-03 lvm[3253]: Volume group for uuid not found:
0ytQwKGjIB01ACCwicAQon4AB3tB1lMSFoj7HlabxJFo8mmJeTGkV4f56mU4Phxq
Apr 27 09:37:59 tank-03 lvm[3253]: Volume group gfs metadata is inconsistent
Apr 27 09:37:59 tank-03 lvm[3253]: Volume group for uuid not found:
0ytQwKGjIB01ACCwicAQon4AB3tB1lMSELaSVIqMEmYTVatsdQFyELqx6a3414gt
Apr 27 09:37:59 tank-03 lvm[3253]: Volume group gfs metadata is inconsistent
Apr 27 09:37:59 tank-03 lvm[3253]: Volume group for uuid not found:
0ytQwKGjIB01ACCwicAQon4AB3tB1lMSlSU0gNZyl77aX33ECBFioGys3WpfphNL
Apr 27 09:37:59 tank-03 lvm[3253]: Volume group gfs metadata is inconsistent
Apr 27 09:37:59 tank-03 lvm[3253]: Volume group for uuid not found:
0ytQwKGjIB01ACCwicAQon4AB3tB1lMSXnBb816tMA0LbdcFSOrxAuNPeRlAY9v5

I then restarted clvmd, and after that it still failed.
I then changed the backup file a bit (max_pv = 0 -> 255), but according to agk
the changes I made would not have affected anything.

After this I tried the vgchange again and somehow it worked:
[root@tank-02 archive]# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "gfs" using metadata type lvm2
[root@tank-02 archive]# vgchange -ay gfs
  4 logical volume(s) in volume group "gfs" now active
[root@tank-02 archive]# lvscan
  inactive          '/dev/gfs/gfs0' [189.92 GB] inherit
  ACTIVE            '/dev/gfs/gfs1' [189.92 GB] inherit
  ACTIVE            '/dev/gfs/gfs2' [189.92 GB] inherit
  ACTIVE            '/dev/gfs/gfs3' [189.92 GB] inherit
  ACTIVE            '/dev/gfs/gfs4' [189.92 GB] inherit
[root@tank-02 archive]# lvscan
  inactive          '/dev/gfs/gfs0' [189.92 GB] inherit
  ACTIVE            '/dev/gfs/gfs1' [189.92 GB] inherit
  ACTIVE            '/dev/gfs/gfs2' [189.92 GB] inherit
  ACTIVE            '/dev/gfs/gfs3' [189.92 GB] inherit
  ACTIVE            '/dev/gfs/gfs4' [189.92 GB] inherit
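
(Note that gfs0 stayed inactive even though vgchange reported activating the
volume group; if needed, it could presumably be brought up on its own with
something like:)

  lvchange -ay gfs/gfs0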

Here is the backup file as it stands now:

# Generated by LVM2: Wed Apr 27 09:29:31 2005

contents = "Text Format Volume Group"
version = 1

description = "Created *after* executing 'vgcfgbackup -P'"

creation_host = "tank-03.lab.msp.redhat.com"    # Linux
tank-03.lab.msp.redhat.com 2.6.9-prep #1 SMP Fri Apr 22 14:25:10 EDT 2005 i686
creation_time = 1114608571      # Wed Apr 27 09:29:31 2005

gfs {
        id = "0ytQwK-GjIB-01AC-Cwic-AQon-4AB3-tB1lMS"
        seqno = 6
        status = ["RESIZEABLE", "READ", "WRITE", "CLUSTERED"]
        extent_size = 8192              # 4 Megabytes
        max_lv = 255
        max_pv = 255

        physical_volumes {

                pv0 {
                        id = "2cTk9n-nXJ8-MEk7-wsuX-vWvL-r4CF-N34PGK"
                        device = "/dev/sda1"    # Hint only

                        status = ["ALLOCATABLE"]
                        pe_start = 384
                        pe_count = 34728        # 135.656 Gigabytes
                }

                pv1 {
                        id = "cDskCY-TDfD-iFJI-cXdW-g1xx-Dv0D-IXvbp0"
                        device = "/dev/sdb1"    # Hint only

                        status = ["ALLOCATABLE"]
                        pe_start = 384
                        pe_count = 34728        # 135.656 Gigabytes
                }

                pv2 {
                        id = "ctpxNZ-nWZu-AaEE-Cx28-hqtB-eKot-lIJNN8"
                        device = "/dev/sdc1"    # Hint only

                        status = ["ALLOCATABLE"]
                        pe_start = 384
                        pe_count = 34728        # 135.656 Gigabytes
                }

                pv3 {
                        id = "Xynm7y-q4us-32gx-b2Q2-523C-Fa3C-Um75Gf"
                        device = "unknown device"       # Hint only

                        status = ["ALLOCATABLE"]
                        pe_start = 384
                        pe_count = 34728        # 135.656 Gigabytes
                }

                pv4 {
                        id = "onfQu4-Zsz6-DUnh-mhaZ-oZs9-pZSx-3BkNaW"
                        device = "/dev/sde1"    # Hint only

                        status = ["ALLOCATABLE"]
                        pe_start = 384
                        pe_count = 34728        # 135.656 Gigabytes
                }

                pv5 {
                        id = "vDmgPx-hZB4-b5ZW-WF7p-WHkH-3Unm-oWMv5V"
                        device = "/dev/sdf1"    # Hint only

                        status = ["ALLOCATABLE"]
                        pe_start = 384
                        pe_count = 34728        # 135.656 Gigabytes
                }

                pv6 {
                        id = "O2QJsf-H8xe-dCYT-grmc-5PFJ-g3aT-UsKR16"
                        device = "/dev/sdg1"    # Hint only

                        status = ["ALLOCATABLE"]
                        pe_start = 384
                        pe_count = 34728        # 135.656 Gigabytes
                }
        }

        logical_volumes {

                gfs0 {
                        id = "qnl7fu-VgYh-OxL0-915L-afYp-zfxR-tZTr8P"
                        status = ["READ", "WRITE", "VISIBLE"]
                        segment_count = 2

                        segment1 {
                                start_extent = 0
                                extent_count = 34728    # 135.656 Gigabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv0", 0
                                ]
                        }
                        segment2 {
                                start_extent = 34728
                                extent_count = 13891    # 54.2617 Gigabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv1", 0
                                ]
                        }
                }

                gfs1 {
                        id = "Foj7Hl-abxJ-Fo8m-mJeT-GkV4-f56m-U4Phxq"
                        status = ["READ", "WRITE", "VISIBLE"]
                        segment_count = 2

                        segment1 {
                                start_extent = 0
                                extent_count = 20837    # 81.3945 Gigabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv1", 13891
                                ]
                        }
                        segment2 {
                                start_extent = 20837
                                extent_count = 27782    # 108.523 Gigabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv2", 0
                                ]
                        }
                }

                gfs2 {
                        id = "ELaSVI-qMEm-YTVa-tsdQ-FyEL-qx6a-3414gt"
                        status = ["READ", "WRITE", "VISIBLE"]
                        segment_count = 3

                        segment1 {
                                start_extent = 0
                                extent_count = 6946     # 27.1328 Gigabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv2", 27782
                                ]
                        }
                        segment2 {
                                start_extent = 6946
                                extent_count = 34728    # 135.656 Gigabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv3", 0
                                ]
                        }
                        segment3 {
                                start_extent = 41674
                                extent_count = 6945     # 27.1289 Gigabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv4", 0
                                ]
                        }
                }

                gfs3 {
                        id = "lSU0gN-Zyl7-7aX3-3ECB-FioG-ys3W-pfphNL"
                        status = ["READ", "WRITE", "VISIBLE"]
                        segment_count = 2

                        segment1 {
                                start_extent = 0
                                extent_count = 27783    # 108.527 Gigabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv4", 6945
                                ]
                        }
                        segment2 {
                                start_extent = 27783
                                extent_count = 20836    # 81.3906 Gigabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv5", 0
                                ]
                        }
                }

                gfs4 {
                        id = "XnBb81-6tMA-0Lbd-cFSO-rxAu-NPeR-lAY9v5"
                        status = ["READ", "WRITE", "VISIBLE"]
                        segment_count = 2

                        segment1 {
                                start_extent = 0
                                extent_count = 13892    # 54.2656 Gigabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv5", 20836
                                ]
                        }
                        segment2 {
                                start_extent = 13892
                                extent_count = 34728    # 135.656 Gigabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv6", 0
                                ]
                        }
                }
        }
}
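
(A quick sanity check of the numbers in the file, assuming the usual 512-byte
sectors: extent_size = 8192 sectors = 4 MiB, so each PV is 34728 extents * 4 MiB
= 135.66 GB, matching pvscan, and gfs0 for example is (34728 + 13891) extents *
4 MiB = 189.92 GB, matching lvscan. Something like this reproduces the figure:)

  echo "scale=3; (34728 + 13891) * 4 / 1024" | bc    # 189.917, the gfs0 size lvscan reports as 189.92 GB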



Version-Release number of selected component (if applicable):
[root@tank-02 archive]# clvmd -V
Cluster LVM daemon version: 2.01.09 (2005-04-04)
Protocol version:           0.2.1

Comment 1 Christine Caulfield 2005-05-03 10:59:03 UTC
Looks like a job for agk.

Comment 2 Kiersten (Kerri) Anderson 2006-09-22 16:53:20 UTC
Devel ACK. Is this one a cluster problem or a core RHEL bug? If it is core RHEL,
we need to change the product and component fields.

Comment 3 Alasdair Kergon 2006-10-18 18:46:18 UTC
Cluster-specific, but any fix would go into core lvm2.

This simply looks like another manifestation of the 'clvmd internal cache not
getting updated' problem.

Comment 4 Milan Broz 2010-05-14 22:28:11 UTC
I think this was fixed with various changes in lvmcache code. If it is still reproducible, please reopen.

