Bug 2204480 - multisegment RAID1, allocator uses one disk for both legs
Summary: multisegment RAID1, allocator uses one disk for both legs
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: lvm2
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: LVM Team
QA Contact: cluster-qe
URL:
Whiteboard:
Depends On: 1518121
Blocks: 2204467
 
Reported: 2023-05-15 16:10 UTC by Marian Csontos
Modified: 2023-09-23 19:02 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1518121
Environment:
Last Closed: 2023-09-23 19:02:57 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links

  System                 ID               Private  Priority  Status    Summary  Last Updated
  Red Hat Issue Tracker  RHEL-8334        0        None      Migrated  None     2023-09-23 19:03:06 UTC
  Red Hat Issue Tracker  RHELPLAN-157273  0        None      None      None     2023-05-15 16:12:28 UTC

Description Marian Csontos 2023-05-15 16:10:57 UTC
+++ This bug was initially created as a clone of Bug #1518121 +++

Description of problem:
When creating a RAID1 LV with 2 legs spanning multiple disks, the allocator uses one of the disks for both legs.

Version-Release number of selected component (if applicable):
2.02.176

Affected versions: el7.2, el7.5; other versions not checked.

How reproducible:
100%

Steps to Reproduce:
- given three 8 GB disks /dev/sd[abc]:

    vgcreate vg /dev/sd[abc]
    lvcreate -n t1 -L 4G vg /dev/sda
    lvcreate -n t2 -L 4G vg /dev/sdb
    lvcreate -n r1 -m 1 -L 6G vg
    lvs -aoname,devices
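
If no spare disks are at hand, the three PVs can be backed by loop devices instead; a minimal sketch (the file paths and loop device names are illustrative, not part of the original report):

    truncate -s 8G /tmp/pv0.img /tmp/pv1.img /tmp/pv2.img
    losetup -f --show /tmp/pv0.img    # prints the allocated device, e.g. /dev/loop0
    losetup -f --show /tmp/pv1.img
    losetup -f --show /tmp/pv2.img
    vgcreate vg /dev/loop0 /dev/loop1 /dev/loop2    # then continue with the lvcreate steps above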


Actual results:

# lvs -aoname,devices
  LV            Devices                      
  r1            r1_rimage_0(0),r1_rimage_1(0)
  [r1_rimage_0] /dev/sdc(1)                  <---- sdc is used for both _rimage_0...
  [r1_rimage_0] /dev/sdb(1024)               
  [r1_rimage_1] /dev/sda(1025)               
  [r1_rimage_1] /dev/sdc(1023)               <---- ...as well as for _rimage_1
  [r1_rmeta_0]  /dev/sdc(0)                  
  [r1_rmeta_1]  /dev/sda(1024)               
  t1            /dev/sda(0)                  
  t2            /dev/sdb(0) 
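
For context (not from the original report): with the default 4 MiB extent size each 8 GB PV holds roughly 2047 extents, so after t1 and t2 about 1023 extents remain free on sda and sdb while all of sdc is free. Each 6 GiB leg needs 1536 extents plus one metadata extent, so one leg fits entirely on sdc and the other fits on the free space of sda plus sdb; no PV has to be shared between the legs. The free space the allocator had to work with can be inspected with, for example:

# pvs -o pv_name,pv_size,pv_free,pv_pe_count /dev/sd[abc]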

Expected results:
No PV is shared between the two legs, i.e. nothing backs both _rimage_0 and _rimage_1 (e.g. one leg entirely on /dev/sdc, the other on the free space of /dev/sda and /dev/sdb).

Additional info:

--- Additional comment from Marian Csontos on 2017-11-28 12:53:58 UTC ---

In case of a failure, such a device cannot be repaired:

# lvconvert --repair vg/r1
  WARNING: Disabling lvmetad cache for repair command.
  WARNING: Not using lvmetad because of repair.
  /dev/vg/r1: read failed after 0 of 4096 at 6442385408: Input/output error
  /dev/vg/r1: read failed after 0 of 4096 at 6442442752: Input/output error
  Couldn't find device with uuid UFK7K0-nGPE-76Rq-F5WC-xGig-UzXP-MnFuDz.
Attempt to replace failed RAID images (requires full device resync)? [y/n]: y
  Unable to replace all PVs from vg/r1 at once.
  Failed to replace faulty devices in vg/r1.

TODO: Test with devices in sync.

Workaround: `lvcreate -n r1 -m 1 -L 6G vg /dev/sd[cab]`
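
A quick way to verify that the two legs do not share a PV (a hedged helper, not part of the original report): it groups the PVs reported by lvs per raid image and prints any PV that backs both images; no output means the legs are on disjoint PVs.

  lvs -a --noheadings -o lv_name,devices vg |
    awk '$1 ~ /rimage_/ { pv = $2; sub(/\(.*/, "", pv); legs[pv] = legs[pv] " " $1 }
         END { for (pv in legs)
                   if (legs[pv] ~ /rimage_0/ && legs[pv] ~ /rimage_1/)
                       print pv " backs both legs:" legs[pv] }'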

--- Additional comment from Marian Csontos on 2017-11-28 14:23:30 UTC ---

Waited for sync. Even at 100% in sync the volume cannot be repaired. Also, only the first 4 GB can be read; reading from the second segment fails.

Read from first segment:

# dd if=/dev/vg/r1 of=/dev/null skip=4000 count=1 bs=1M
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00195199 s, 537 MB/s

Read from second segment fails:

# dd if=/dev/vg/r1 of=/dev/null skip=5000 count=1 bs=1M
dd: error reading ‘/dev/vg/r1’: Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00201181 s, 0.0 kB/s

Surprisingly, the RAID1 device health status reports 'DA':

# dmsetup status
vg-r1_rmeta_1: 0 8192 linear 
vg-r1_rimage_1: 0 8372224 linear 
vg-r1_rimage_1: 8372224 4210688 linear 
vg-t2: 0 8388608 linear 
vg-r1_rmeta_0: 0 8192 linear 
vg-r1_rimage_0: 0 8372224 linear 
vg-r1_rimage_0: 8372224 4210688 linear 
vg-t1: 0 8388608 linear 
vg_stacker_OTzj-root: 0 12582912 linear 
vg-r1: 0 12582912 raid raid1 2 DA 12582912/12582912 idle 0 0 -
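
For reference (a hedged one-liner, not from the original report): the sixth field of the dm-raid status line holds one health character per leg, 'A' for an alive, in-sync device and 'D' for a dead/failed one, so the 'DA' above claims only leg 0 is dead even though part of leg 1 also sits on the missing PV:

# dmsetup status vg-r1 | awk '$3 == "raid" { print "raid type:", $4, " leg health:", $6 }'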

And, for reference, only the segments:

# lvs --segments -aolv_name,pe_ranges,le_ranges
  WARNING: Not using lvmetad because a repair command was run.
  /dev/vg/r1: read failed after 0 of 4096 at 6442385408: Input/output error
  /dev/vg/r1: read failed after 0 of 4096 at 6442442752: Input/output error
  Couldn't find device with uuid jDRXQI-jGSW-BAOG-LB3h-aLhd-8fPb-kHbPZf.
  LV            PE Ranges                             LE Ranges                                
  r1            r1_rimage_0:0-1535 r1_rimage_1:0-1535 [r1_rimage_0]:0-1535,[r1_rimage_1]:0-1535
  [r1_rimage_0] [unknown]:1-1022                      [unknown]:1-1022                         
  [r1_rimage_0] /dev/sdb:1024-1537                    /dev/sdb:1024-1537                       
  [r1_rimage_1] /dev/sda:1025-2046                    /dev/sda:1025-2046                       
  [r1_rimage_1] [unknown]:1023-1536                   [unknown]:1023-1536                      
  [r1_rmeta_0]  [unknown]:0-0                         [unknown]:0-0                            
  [r1_rmeta_1]  /dev/sda:1024-1024                    /dev/sda:1024-1024

--- Additional comment from Steve D on 2018-10-18 12:12:26 UTC ---

I've just been bitten by this for the second time, though I could have sworn I specified the PVs manually. 2.02.176 (-4.1ubuntu3) on Ubuntu 18.04.

I also hit a whole load of scrub errors last night. The data stored in the fs seems fine; it looks like when I extended the RAID1 LV in question, the extensions didn't get synced. Still investigating that one; it may have been triggered by me trying to work around this bug.

Any thoughts / progress?

--- Additional comment from Heinz Mauelshagen on 2019-09-05 11:29:03 UTC ---

This behaves as if the 'anywhere' allocation policy had been applied.
Reworking the allocator gains importance!

Creating with the 'cling' allocation policy avoids the problem when the existing t[12] LVs are allocated on sd[ab] and sdc is free. First, the default allocation for comparison:

# lvcreate -y  -nr1 -l251 -m1 vg
  Logical volume "r1" created.

# lvs --noh -aoname,segperanges vg
  r1            r1_rimage_0:0-250 r1_rimage_1:0-250
  [r1_rimage_0] /dev/sdc:1-126    <---
  [r1_rimage_0] /dev/sdb:128-252
  [r1_rimage_1] /dev/sda:129-254
  [r1_rimage_1] /dev/sdc:127-251  <--- bogus collocation with rimage_0
  [r1_rmeta_0]  /dev/sdc:0-0
  [r1_rmeta_1]  /dev/sda:128-128
  t1            /dev/sda:0-127
  t2            /dev/sdb:0-127

# lvremove -y vg/r1
  Logical volume "r1" successfully removed

# lvcreate -y --alloc cling -nr1 -l100%FREE -m1 vg # could alternatively give the extent count, same result
  Logical volume "r1" created.

# lvs --noh -aoname,segperanges vg
  r1            r1_rimage_0:0-125 r1_rimage_1:0-125
  [r1_rimage_0] /dev/sdc:1-126
  [r1_rimage_1] /dev/sda:129-254
  [r1_rmeta_0]  /dev/sdc:0-0
  [r1_rmeta_1]  /dev/sda:128-128
  t1            /dev/sda:0-127
  t2            /dev/sdb:0-127

# lvextend -l+1 vg/r1
  Extending 2 mirror images.
  Insufficient suitable allocatable extents for logical volume r1: 2 more required


Now '--alloc anywhere' is needed to make use of the remaining space, and the PVs have to be listed as sdc, sdb in that
sequence (sdb, sdc does not work) to avoid the collocation:

# lvextend -l+1 vg/r1 
  Extending 2 mirror images.
  Insufficient suitable allocatable extents for logical volume r1: 2 more required

# lvextend -y --alloc anywhere -l+1 vg/r1 
  Extending 2 mirror images.
  Size of logical volume vg/r1 changed from 504.00 MiB (126 extents) to 508.00 MiB (127 extents).
  Logical volume vg/r1 successfully resized.

# lvs --noh -aoname,segperanges vg
  r1            r1_rimage_0:0-126 r1_rimage_1:0-126
  [r1_rimage_0] /dev/sdc:1-126    <---
  [r1_rimage_0] /dev/sdb:128-128
  [r1_rimage_1] /dev/sda:129-254
  [r1_rimage_1] /dev/sdc:127-127  <--- Bogus collocation again
  [r1_rmeta_0]  /dev/sdc:0-0
  [r1_rmeta_1]  /dev/sda:128-128
  t1            /dev/sda:0-127

In this case, reducing the raid1 in size or pvmove'ing the collocated extents off to another unrelated PV are options.

# lvreduce -fy -l-1 vg/r1
  WARNING: Reducing active logical volume to 504.00 MiB.
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
  Size of logical volume vg/r1 changed from 508.00 MiB (127 extents) to 504.00 MiB (126 extents).
  Logical volume vg/r1 successfully resized.

# lvextend -y -l+1 --alloc anywhere vg/r1 /dev/sdc /dev/sdb
  Extending 2 mirror images.
  Size of logical volume vg/r1 changed from 504.00 MiB (126 extents) to 508.00 MiB (127 extents).
  Logical volume vg/r1 successfully resized.

# lvs --noh -aoname,segperanges vg
  r1            r1_rimage_0:0-126 r1_rimage_1:0-126
  [r1_rimage_0] /dev/sdc:1-127                     
  [r1_rimage_1] /dev/sda:129-254                   
  [r1_rimage_1] /dev/sdb:128-128                   
  [r1_rmeta_0]  /dev/sdc:0-0                       
  [r1_rmeta_1]  /dev/sda:128-128                   
  t1            /dev/sda:0-127                     
  t2            /dev/sdb:0-127
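
A hedged sketch of the pvmove alternative mentioned above, applied to the collocated layout from the earlier lvextend (rimage_1 holding /dev/sdc:127-127) and assuming a spare, otherwise unused disk (here called /dev/sdd, not part of the original report) is added to the VG:

# vgextend vg /dev/sdd
# pvmove /dev/sdc:127 /dev/sdd

pvmove accepts a source PV with an extent range, so only the single collocated extent is relocated and the rest of the layout stays untouched.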

--- Additional comment from Heinz Mauelshagen on 2023-05-10 16:36:36 UTC ---

Fixed in commit 05c2b10c5d0a99993430ffbcef684a099ba810ad
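
To check whether a given lvm2 source tree already contains this commit (a hedged sketch, assuming the upstream repository is checked out in ./lvm2):

# git -C lvm2 merge-base --is-ancestor 05c2b10c5d0a99993430ffbcef684a099ba810ad HEAD && echo "fix present"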

Comment 3 RHEL Program Management 2023-09-23 19:02:11 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 4 RHEL Program Management 2023-09-23 19:02:57 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues@redhat.com. You can also visit https://access.redhat.com/articles/7032570 for general account information.

