Bug 2142664

Summary: Error "Process '/sbin/mdadm -I /dev/dm-[x]' failed with exit code 1." on multipath devices when auto assemble is disabled
Product: Red Hat Enterprise Linux 8
Reporter: Diana Negrete <dnegrete>
Component: mdadm
Assignee: XiaoNi <xni>
Status: CLOSED NOTABUG
QA Contact: Storage QE <storage-qe>
Severity: medium
Priority: medium
Version: 8.6
CC: jmagrini, jpittman, ncroxon, nweddle, petar.ivanov, xni
Target Milestone: rc
Keywords: Triaged
Hardware: All
OS: Linux
Last Closed: 2023-06-14 00:49:23 UTC
Type: Bug
Attachments: journalctl

Description Diana Negrete 2022-11-14 19:41:49 UTC
Created attachment 1924328 [details]
journalctl

Description of problem:
UDEV is trying to incrementally assemble MD devices even though this is disallowed in the /etc/mdadm.conf


Version-Release number of selected component (if applicable):
localhost ~]$ uname -a
Linux localhost.localdomain 4.18.0-425.3.1.el8.x86_64 #1 SMP Fri Sep 30 11:45:06 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

localhost ~]$ mdadm -V
mdadm - v4.2 - 2021-12-30 - 5

How reproducible:
I was able to reproduce the issue easily

Steps to Reproduce:
1. Added SCSI devices

2. Added below to multipath.conf:
defaults {
  user_friendly_names yes
  find_multipaths no
}

3. Added the below to /etc/mdadm.conf:
# cat /etc/mdadm.conf
AUTO +imsm -1.x -all

4. localhost ~]$ cat /proc/mdstat
Personalities : 
unused devices: <none>
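
The failing udev call can also be exercised on demand. A minimal sketch (run as root), assuming the dm-2 name from the lsblk layout shown under Additional info; the journal should then show the same "failed with exit code 1" line:

# mdadm -I /dev/dm-2 -v; echo $?
# udevadm trigger --action=change --sysname-match=dm-2
# journalctl -b -u systemd-udevd | tail -n 5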


Actual results:
We are seeing the same errors the customer encountered in journalctl:

Nov 14 12:39:13 localhost.localdomain systemd-udevd[878]: Process '/sbin/mdadm -I /dev/dm-2' failed with exit code 1.
 
Nov 14 12:39:13 localhost.localdomain systemd-udevd[880]: Process '/sbin/mdadm -I /dev/dm-3' failed with exit code 1.

Expected results:
UDEV should not attempt to assemble the MD devices, as configured in the /etc/mdadm.conf file, and the error should not be seen

Additional info:

Below layout of devices and versions from reproduce:

localhost ~]$ uname -a
Linux localhost.localdomain 4.18.0-425.3.1.el8.x86_64 #1 SMP Fri Sep 30 11:45:06 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

localhost ~]$ mdadm -V
mdadm - v4.2 - 2021-12-30 - 5

localhost ~]$ lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda             8:0    0  256M  0 disk  
└─mpathb      253:3    0  256M  0 mpath 
sdb             8:16   0  256M  0 disk  
└─mpatha      253:2    0  256M  0 mpath 
sr0            11:0    1 1024M  0 rom   
vda           252:0    0   20G  0 disk  
├─vda1        252:1    0    1G  0 part  /boot
└─vda2        252:2    0   19G  0 part  
  ├─rhel-root 253:0    0   17G  0 lvm   /
  └─rhel-swap 253:1    0    2G  0 lvm   [SWAP]

localhost ~]$ rpm -qf /lib/udev/rules.d/64-md-raid-assembly.rules
mdadm-4.2-5.el8.x86_64

journalctl:
Nov 14 12:39:13 localhost.localdomain systemd-udevd[878]: Process '/sbin/mdadm -I /dev/dm-2' failed with exit code 1.
 Nov 14 12:39:13 localhost.localdomain systemd-udevd[880]: Process '/sbin/mdadm -I /dev/dm-3' failed with exit code 1.

The customer was able to work around the issue with the below patch:

# diff /lib/udev/rules.d/65-md-incremental.rules /etc/udev/rules.d/65-md-incremental.rules
58a59,60
> KERNEL=="dm-*", SUBSYSTEM=="block", ACTION=="change", ENV{ID_FS_TYPE}=="linux_raid_member", \
>      PROGRAM="/usr/bin/egrep -c ^AUTO.*-1\.x.*$ /etc/mdadm.conf", RESULT=="1", GOTO="dm_change_end"
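
To pick the override up and re-check, something like the following should be enough (a rough sketch, assuming the edited copy was installed as /etc/udev/rules.d/65-md-incremental.rules as shown above):

# udevadm control --reload-rules
# udevadm trigger --action=change --sysname-match='dm-*'
# journalctl -b -u systemd-udevd | grep 'mdadm -I'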

Comment 1 Diana Negrete 2022-11-14 19:47:27 UTC
The debug logs for the issue are attached as journalctl.out

Comment 3 XiaoNi 2022-11-21 07:32:29 UTC
(In reply to Diana Negrete from comment #2)
> Hi Xiao.  Could you please look at this bug when you're able?  The customer
> is wanting to eliminate the error messages seen.  Thanks for any help!

Hi Diana

It's the expected result. Incremental and Assemble both return -1 when AUTO
is used to deny a metadata type. Because md RAID has existed for a long
time and many customers use it, we can't change this behavior.
If we changed it to return 0, we might break customers who rely on -1
in their scripts to check the return value.

Thanks
Xiao
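
As an illustration of that concern, a hypothetical admin script that branches on the exit status would silently change behavior if mdadm started returning 0 here (this script is only an example, not from the report):

#!/bin/sh
# Hypothetical example: try to incrementally add a device and act on the result.
dev=$1
if mdadm -I "$dev"; then
    echo "$dev was accepted for incremental assembly"
else
    echo "$dev was not assembled (denied by AUTO, not a member, or error)" >&2
    exit 1
fi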

Comment 4 XiaoNi 2022-11-21 07:58:00 UTC
If you specify --verbose when incrementally adding the member disk, you'll see the output:

[root@storageqe-104 mdadm]# mdadm -I /dev/sdb -v
mdadm: /dev/sdb has metadata type 1.x for which auto-assembly is disabled
[root@storageqe-104 mdadm]# echo $?
1

Comment 5 Petar Ivanov 2022-11-21 12:40:39 UTC
(In reply to XiaoNi from comment #4)
> If you specify --verbose when incrementally adding the member disk, you'll
> see the output:
> 
> [root@storageqe-104 mdadm]# mdadm -I /dev/sdb -v
> mdadm: /dev/sdb has metadata type 1.x for which auto-assembly is disabled
> [root@storageqe-104 mdadm]# echo $?
> 1

Hi,

Well, no one stated that the issue was with mdadm itself. It does, and it should, return that RC when called. However, UDEV is calling for incremental assembly on a timer, which in our opinion should not happen as it's disallowed, and thus it's bound to leave an error in the logs. So, why call a function that you know is going to fail?

Best Regards, Petar.

Comment 6 Diana Negrete 2022-11-21 14:52:23 UTC
Thanks for responding Xiao.  Is it possible that we change the log level on the message so it will only show up with debugging?  Or should the udev rules be changed as mentioned in comment 5?  Thanks!

Comment 7 XiaoNi 2022-11-22 02:03:54 UTC
(In reply to Petar Ivanov from comment #5)
> (In reply to XiaoNi from comment #4)
> > If you specify --verbose when incrementally adding the member disk,
> > you'll see the output:
> > 
> > [root@storageqe-104 mdadm]# mdadm -I /dev/sdb -v
> > mdadm: /dev/sdb has metadata type 1.x for which auto-assembly is disabled
> > [root@storageqe-104 mdadm]# echo $?
> > 1
> 
> Hi,
> 
> Well, no one stated that the issue was with mdadm itself. It does, and it
> should, return that RC when called. However, UDEV is calling for incremental

Hi Petar

What's RC here?

> assembly on a timer, which in our opinion should not happen as it's

The incremental function is called from 65-md-incremental.rules rather than
by a timer, right? What timer do you mean here?


> disallowed, and thus it's bound to leave an error in the logs. So, why call
> a function that you know is going to fail?

We need to sort out the questions mentioned above first, then we can look at this
question.

Thanks
Xiao
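
For reference, which rules file actually invokes the incremental call on the affected system can be checked directly; a quick sketch, assuming the standard rule locations already mentioned in the description:

# grep -rn 'mdadm -I' /lib/udev/rules.d/ /etc/udev/rules.d/
# rpm -qf /lib/udev/rules.d/65-md-incremental.rules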

Comment 8 XiaoNi 2022-11-22 02:06:41 UTC
(In reply to Diana Negrete from comment #6)
> Thanks for responding Xiao.  Is it possible that we change the log level on
> the message so it will only show up with debugging?  Or should the udev
> rules be changed as mentioned in comment 5?  Thanks!

Hi Diana

The log is output by udev. mdadm returns -1, udev detects the error, and then
it outputs the log. I don't know whether we can change the log level to remove
this message.
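
If only the udev-side message is the concern, one rough sketch (not a supported change) would be a local override in /etc/udev/rules.d/ that wraps the call so a non-zero exit is swallowed; the match conditions below are illustrative only and would need to mirror the actual line in 65-md-incremental.rules:

# /etc/udev/rules.d/65-md-incremental.rules (local copy, illustrative sketch)
# Wrap the incremental call so its non-zero exit does not reach systemd-udevd.
KERNEL=="dm-*", SUBSYSTEM=="block", ACTION=="change", ENV{ID_FS_TYPE}=="linux_raid_member", \
        RUN+="/bin/sh -c '/sbin/mdadm -I $env{DEVNAME} || :'"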

Comment 18 XiaoNi 2023-07-24 13:33:40 UTC
(In reply to Diana Negrete from comment #17)
> 
> CU replied to Johns comment  with the following:
> 
> I need some time to prepare a detailed reply on this case (it's been more
> than 6 months). Don't close it!
> 
> Having said that I do have some points:
> 1) No one was talking about IMSM or DDF type arrays and I don't see how
> they would be impacted. As you can probably see in the workaround I have
> provided, the only arrays that will be impacted are the ones with
> ENV{ID_FS_TYPE}=="linux_raid_member", which I believe are of metadata type
> "0.9" and "1.x".
> IMSM would have a value of "isw_raid_member" and DDF respectively
> "ddf_raid_member". Neither of those types is subject to a "CHANGE" event
> delivered to UDEV (which is what is causing the issues).
> 
> 2) We're not saying that UDEV and MDADM are not working as intended. We're
> only saying they are not working together, and since UDEV is calling MDADM it
> probably should respect the configuration provided in /etc/mdadm.conf.

Yes, I understand it, because Petar said this in comment 5: mdadm.conf is used by mdadm,
and mdadm is used by the udev rule. But I don't think it's good to add a filter like the
workaround patch in the udev rule.

> 
> I would very much like to see the arguments of the developers on how the
> IMSM arrays would be impacted, since there is no explanation in the
> bugzilla.

I want to say that different customers have different requests. The workaround patch
specifies the device type (dm) and avoids assembling RAID with 1.x superblocks. If another
customer doesn't want auto assembly on raw devices that have 1.x superblocks, do we need to
add a similar filter in the udev rule? And maybe some customers will tell us they set the same
rule in mdadm.conf, but they only want to disallow auto assembly on raw devices and want the
udev rule to keep working on dm devices. What should we do then?

Thanks
Xiao

Comment 19 Petar Ivanov 2023-07-25 09:10:40 UTC
Hi Xiao,

Sorry to barge in on the conversation, but I would like to say that the provided workaround was designed to state the obvious, i.e. that it's a configuration issue and that it can be done.
Nobody is expecting you to merge it in. It's a crude thing that's not fit for general-purpose usage.
Now, what are we to do - just ignore the errors we're getting in the logs?

Best Regards, Peter.

Comment 20 XiaoNi 2023-07-25 10:29:24 UTC
Hi Peter

Sorry, from my side I have no better method. Right now mdadm and udev both do what they should do. To resolve this, mdadm would have to stop returning an error, but that may break other customers' cases. Do you have any ideas?

Thanks
Xiao

Comment 21 Petar Ivanov 2023-07-25 10:44:52 UTC
Hi Xiao,

Yes, as stated - do not call mdadm when incremental assembly is not allowed for the specific array type, regardless of whether it's dm or raw.
Now, what you're telling me is that the issue we're having is not important enough to even try to resolve - fair enough.

Best Regards, Peter.