1814909 – [lvmraid] how to reproduce a write error on raid device, make the "Volume Health" appears with "r" attr

Bug 1814909 - [lvmraid] how to reproduce a write error on raid device, make the "Volume Health" appears with "r" attr

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	lvm2
Sub Component:
Version:	7.7
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Jonathan Earl Brassow
QA Contact:	cluster-qe@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-03-19 01:47 UTC by xhe@redhat.com
Modified:	2021-09-03 12:41 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-05-09 06:24:45 UTC
Target Upstream Version:
Embargoed:

Attachments
Example shell script showing various tests (5.39 KB, text/plain) 2020-05-01 22:25 UTC, Jonathan Earl Brassow

Comment 5 Jonathan Earl Brassow 2020-03-19 21:33:38 UTC

There are lots of ways to simulate failure. I was considering transient failures, while you are simulating (to a degree) a more permanent failure. All are good! The rule could catch your cases as well (and would be more complete in that regard).

Firstly, it is useful to read the "Device Failure" section in the lvmraid(7) man page along with the 'lv_attr' section (specifically, the 9th character) of the lvs(8) man page. Let me give you a little more detail here...

A device can fail:
- completely and forever (permanent failure) - in which case, a 'p' is in the 9th character of the lv_attr field
- completely but for a short time (transient failure) - a 'p' will exist while it is gone, an 'r' will exist when it returns if the RAID has been written while gone
- sector(s) failure (a write error occurs somewhere on the device, but the rest is still good) - an 'r' will exist

Looking at it another way, if LVM cannot read its metadata at the front of the device, a 'p' is used. If LVM metadata can be read, but the RAID LV has encountered a problem with the device, an 'r' is printed. It is important to keep in mind that for the RAID LV to detect a problem, it needs to experience a failure. If you are never doing any I/O, the RAID will never notice a failure. Same goes for LVM too, I suppose. If you turn off a device and turn it back on without ever running an LVM command or doing any I/O to the RAID LV, neither will ever notice the failure that occurred.
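
To illustrate the distinction above, here is a minimal shell sketch that maps the 9th character of an lv_attr string to a short label. The helper name describe_health is hypothetical (it is not part of LVM), and only the 'p', 'r' and 'm' values discussed in this bug are handled explicitly; anything else is reported as "other".

```shell
# Sketch: map the 9th character of lv_attr (Volume Health) to a short label.
# describe_health is a hypothetical helper name, not an LVM command.
describe_health() {
    attr=$1
    # Extract the 9th character (1-indexed) of the attribute string.
    health=$(printf '%s' "$attr" | cut -c9)
    case "$health" in
        p) echo "partial" ;;            # LVM cannot read its metadata on a PV
        r) echo "refresh needed" ;;     # RAID saw a failure on a device that is present
        m) echo "mismatches exist" ;;   # found by a scrubbing "check"
        -) echo "ok" ;;
        *) echo "other ($health)" ;;
    esac
}
```

It could be fed from lvs, for example: describe_health "$(lvs --noheadings -o lv_attr vg1/lvm_raid1 | tr -d ' ')" (the VG and LV names here are just the ones used later in this bug).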

My specific issue of concern has to do with a transient failure where a write occurred to the array. The RAID will disable the device internally. When the device comes back, the user needs to either (r)efresh or (r)eplace the device to make the RAID LV operate normally again. The way you might test this is as follows:
1) create RAID LV (wait for sync to complete)
2) kill one of the devices (moving the device node or writing random data to the device will not do - it must be killed e.g. 'echo offline > /sys/block/$dev/device/state')
3) perform a write to the RAID LV (e.g. dd if=/dev/zero of=/dev/<vg>/<raid_lv> bs=4M count=10)
4) revive the device (e.g. 'echo running > /sys/block/$dev/device/state')
5) perform 'lvs'
This is an especially pernicious case for the customer, because the RAID runs in degraded mode - even though all the devices are active. It happens regularly.
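
Steps 2 to 5 above can be sketched as a small shell helper. This is only an illustration under assumptions: the RAID LV already exists and has finished syncing, the device exposes /sys/block/$dev/device/state as in step 2, and the function names (set_dev_state, run, transient_failure_test) and the DRY_RUN switch are made up for this example. With DRY_RUN=1, which is the default here, the commands are printed instead of executed.

```shell
# Sketch of steps 2-5. Assumes the RAID LV exists and is already in sync.
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to run them as root.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: write "offline" or "running" to the device's sysfs state file.
set_dev_state() {
    dev=$1
    state=$2
    if [ "$DRY_RUN" = 1 ]; then
        echo "echo $state > /sys/block/$dev/device/state"
    else
        echo "$state" > "/sys/block/$dev/device/state"
    fi
}

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: transient_failure_test <dev> <vg> <raid_lv>
transient_failure_test() {
    dev=$1
    vg=$2
    lv=$3
    set_dev_state "$dev" offline                            # step 2: kill the device
    run dd if=/dev/zero of="/dev/$vg/$lv" bs=4M count=10    # step 3: write to the RAID LV
    set_dev_state "$dev" running                            # step 4: revive the device
    run lvs -o lv_name,lv_attr "$vg/$lv"                    # step 5: inspect the attr field
}
```

For example, transient_failure_test sdg vg1 lvm_raid1 prints the four commands in order; rerunning with DRY_RUN=0 as root would actually take the device offline, so it should only be used on test devices.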

Let's look at the cases you are testing (valid cases that this insight rule could also catch):
From the Description:
1) you create the RAID and allow it to sync to 100%
2) you write garbage to one of the underlying devices - you have now destroyed the LVM metadata and contents of the RAID LV
What should be expected? LVM (via the 'lvs' command) will attempt to read the device and upon not seeing the LVM metadata will assume the device is missing and print 'p'. Strangely also, it is unlikely the RAID will catch this problem at all, since read and write operations are not interrupted or failed - to detect this type of problem in a RAID LV, you would need to perform a "scrubbing" operation (see lvmraid(7) man page) and the result of such an operation would be an 'm' in the 9th character if it weren't for the 'p' already present. So, you got exactly what should be expected.

From comment3:
1) you create the RAID and allow it to sync to 100%
2) you move the device node of the underlying device
What should be expected? The 'lvs' command will attempt to read the LVM metadata, but cannot locate the device. Thus, a 'p' is printed. Again strangely, the RAID LV is unlikely to register a problem because it already has the device in use and moving the device node makes no difference. (Also, you didn't mention doing any write operation to the RAID LV which would trigger it noticing the failure if the device had been properly failed.) When you move the device node back, LVM can now read the LVM metadata again and the 'p' goes away.

It seems to me that both the 'p' and 'r' characters are useful to have the Insights rule detect. However, notifying the user of the 'r' character seems particularly important because it is a problem that is less obvious to the user (they don't see errors when running LVM commands, like "WARNING: Device for PV 2ogi3F-9s3R-QwuM-oVcO-DmRk-n1lt-MpJgPJ not found or rejected by a filter.").

Hope this helps.

Comment 6 sheng.lao 2020-04-03 07:22:28 UTC

@Jonathan Earl Brassow, I get this kind of error.
# echo offline > /sys/block/vdc/device/status
-bash: /sys/block/vdc/device/status: Permission denied

# ls -l /sys/block/vdc/device/status
-r--r--r--. 1 root root 4096 4月   3 03:20 /sys/block/vdc/device/status

# uname -a
Linux localhost.localdomain 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/os-release 
NAME="CentOS Linux"
VERSION="7 (Core)"

Comment 7 sheng.lao 2020-04-03 07:44:51 UTC

The "scrubbing" operation failed too.
# mv /dev/vdb /tmp

# lvchange --syncaction check vg1         
  WARNING: Device for PV D9gmAu-jPk9-t2gV-KNI4-zapp-Goy3-dMhoP7 not found or rejected by a filter.
  Cannot change VG vg1 while PVs are missing.
  Consider vgreduce --removemissing.
  Cannot process volume group vg1

# lvs
  WARNING: Device for PV D9gmAu-jPk9-t2gV-KNI4-zapp-Goy3-dMhoP7 not found or rejected by a filter.
  LV        VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvm_raid1 vg1     Rwi-a-r-p-   4.99g

Comment 8 Jonathan Earl Brassow 2020-04-22 16:33:18 UTC

(In reply to sheng.lao from comment #6)
> @Jonathan Earl Brassow, I get this kind of error.
> # echo offline > /sys/block/vdc/device/status
> -bash: /sys/block/vdc/device/status: Permission denied
> 
> # ls -l /sys/block/vdc/device/status
> -r--r--r--. 1 root root 4096 4月   3 03:20 /sys/block/vdc/device/status
> 
> # uname -a
> Linux localhost.localdomain 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24
> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> 
> # cat /etc/os-release 
> NAME="CentOS Linux"
> VERSION="7 (Core)"

Odd. Are you running the command as the superuser?

Comment 9 Jonathan Earl Brassow 2020-04-22 16:39:13 UTC

(In reply to sheng.lao from comment #7)
> "scrubbing" operation failed too.
> # mv /dev/vdb /tmp

Please don't do that ^.  I don't think it does what you think it does (see comment5).

> 
> # lvchange --syncaction check vg1         
>   WARNING: Device for PV D9gmAu-jPk9-t2gV-KNI4-zapp-Goy3-dMhoP7 not found or
> rejected by a filter.
>   Cannot change VG vg1 while PVs are missing.
>   Consider vgreduce --removemissing.
>   Cannot process volume group vg1
> 

Yes, because the device node is missing.  Keep in mind that scrubbing operations (i.e. "check" and "repair") do not operate on a RAID LV with missing devices.

> # lvs
>   WARNING: Device for PV D9gmAu-jPk9-t2gV-KNI4-zapp-Goy3-dMhoP7 not found or
> rejected by a filter.
>   LV        VG      Attr       LSize   Pool Origin Data%  Meta%  Move Log
> Cpy%Sync Convert
>   lvm_raid1 vg1     Rwi-a-r-p-   4.99g

See comment5 - LVM will not be able to read the metadata, but the RAID is likely fine.  This too will yield the 'p' attribute.

Comment 10 Jonathan Earl Brassow 2020-04-22 17:00:26 UTC

(In reply to sheng.lao from comment #6)
> @Jonathan Earl Brassow, I get this kind of error.
> # echo offline > /sys/block/vdc/device/status
> -bash: /sys/block/vdc/device/status: Permission denied
> 
> # ls -l /sys/block/vdc/device/status
> -r--r--r--. 1 root root 4096 4月   3 03:20 /sys/block/vdc/device/status
> 
> # uname -a
> Linux localhost.localdomain 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24
> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> 
> # cat /etc/os-release 
> NAME="CentOS Linux"
> VERSION="7 (Core)"

FWIW, I just tried this on rhel7 and rhel8*
[root@null-04 ~]# cat /sys/block/sdg/device/state 
running
[root@null-04 ~]# echo offline > /sys/block/sdg/device/state
[root@null-04 ~]# dd if=/dev/sdg of=/dev/null bs=4k count=1
dd: failed to open '/dev/sdg': No such device or address
[root@null-04 ~]# echo running > /sys/block/sdg/device/state
[root@null-04 ~]# dd if=/dev/sdg of=/dev/null bs=4k count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000613042 s, 6.7 MB/s

Comment 11 Jonathan Earl Brassow 2020-05-01 17:34:18 UTC

Could we use 'dmsetup table' to scan for possible RAID LVs?  It is faster than 'lvs' and would provide us more valuable information.
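
As a rough sketch of that idea, the filter below reads 'dmsetup table' output on stdin and prints the names of devices whose target type is "raid". It assumes the usual one-line-per-segment layout "<name>: <start> <length> <target_type> <args...>"; the function name list_raid_dm is hypothetical.

```shell
# Sketch: list device-mapper devices whose target type is "raid".
# Reads `dmsetup table` output on stdin; each line is expected to look like
#   <name>: <start> <length> <target_type> <target args...>
# list_raid_dm is a hypothetical helper name.
list_raid_dm() {
    # Field 4 is the target type; strip the trailing ':' from the name in field 1.
    awk '$4 == "raid" { sub(/:$/, "", $1); print $1 }'
}
```

It would be used as: dmsetup table | list_raid_dm (run as root). The sub-LVs of a RAID LV (rimage/rmeta) normally use other target types such as linear, so they are not listed.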

Comment 12 Jonathan Earl Brassow 2020-05-01 22:25:13 UTC

Created attachment 1683897 [details]
Example shell script showing various tests

This example test script shows:
- a method for properly disabling and re-enabling test devices
- checking the 9th character of attr for failures and mismatches

Hopefully, this helps make things a bit clearer.

Comment 13 sheng.lao 2020-05-09 06:24:45 UTC

That is a great example. I learned many new things from it. Thanks.
