Bug 783823

Summary: raid-check broken / sending ioctl 1261 to a partition
Product: Fedora
Reporter: Harald Reindl <h.reindl>
Component: mdadm
Assignee: Jes Sorensen <Jes.Sorensen>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Priority: unspecified
Version: 15
CC: agk, dledford, Jes.Sorensen, mbroz
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-01-23 10:52:46 UTC

Description Harald Reindl 2012-01-22 16:04:03 UTC
mdadm-3.2.3-3.fc15.x86_64
2.6.41.10-2.fc15.x86_64

Each call to "/sbin/mdadm --detail /dev/mdX" results in "sending ioctl 1261 to a partition" in dmesg, and running "/usr/sbin/raid-check" ends with "Unit mdmonitor.service entered failed state".

[root@srv-rhsoft:~]$ cat /proc/mdstat 
Personalities : [raid1] [raid10] 
md2 : active raid10 sdc3[0] sdd3[3] sda3[4] sdb3[5]
      3875222528 blocks super 1.1 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 10/29 pages [40KB], 65536KB chunk

md1 : active raid10 sdc2[0] sdd2[3] sda2[4] sdb2[5]
      30716928 blocks super 1.1 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md0 : active raid1 sdc1[0] sdd1[3] sda1[4] sdb1[5]
      511988 blocks super 1.0 [4/4] [UUUU]
      
unused devices: <none>
_______________________________________________

Yes, I am using the F16 systemd services, but that should not be a problem:

[root@srv-rhsoft:~]$ ls -l /etc/systemd/system/ | grep mdmonitor
-rw-r--r--  1 root root  330 2012-01-21 03:52 mdmonitor.service
-rw-r--r--  1 root root  255 2012-01-21 03:52 mdmonitor-takeover.service
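
(Editorial note, not part of the original comment: the PID file declared by that unit is what systemd complains about in the log below. A quick way to see which monitor command and PID file the pulled-in unit actually uses is to inspect it; the grep pattern is only an assumption about typical unit contents.)

# inspect the F16 unit that was copied in; typically it starts something like
# "mdadm --monitor --scan -f --pid-file=/var/run/mdadm/mdadm.pid"
cat /etc/systemd/system/mdmonitor.service
grep -E 'ExecStart|PIDFile' /etc/systemd/system/mdmonitor.service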
_______________________________________________

Jan 22 14:13:58 srv-rhsoft kernel: md: md2: data-check done.
Jan 22 14:13:58 srv-rhsoft kernel: md: delaying data-check of md0 until md1 has finished (they share one or more physical units)
Jan 22 14:13:58 srv-rhsoft kernel: md: data-check of RAID array md1
Jan 22 14:13:58 srv-rhsoft kernel: md: minimum _guaranteed_  speed: 50000 KB/sec/disk.
Jan 22 14:13:58 srv-rhsoft kernel: md: using maximum available idle IO bandwidth (but not more than 500000 KB/sec) for data-check.
Jan 22 14:13:58 srv-rhsoft kernel: md: using 128k window, over a total of 30716928k.
Jan 22 14:14:00 srv-rhsoft systemd[1]: PID 20560 read from file /var/run/mdadm/mdadm.pid does not exist. Your service or init script might be broken.
Jan 22 14:14:00 srv-rhsoft systemd[1]: mdmonitor.service: main process exited, code=killed, status=6
Jan 22 14:14:00 srv-rhsoft systemd[1]: Unit mdmonitor.service entered failed state.
Jan 22 14:16:20 srv-rhsoft kernel: md: md1: data-check done.
Jan 22 14:16:20 srv-rhsoft kernel: md: data-check of RAID array md0
Jan 22 14:16:20 srv-rhsoft kernel: md: minimum _guaranteed_  speed: 50000 KB/sec/disk.
Jan 22 14:16:20 srv-rhsoft kernel: md: using maximum available idle IO bandwidth (but not more than 500000 KB/sec) for data-check.
Jan 22 14:16:20 srv-rhsoft kernel: md: using 128k window, over a total of 511988k.
Jan 22 14:16:25 srv-rhsoft kernel: md: md0: data-check done.
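
(For context, added here and not part of the original report: the data-check messages above are what the kernel prints when a scrub is requested through sysfs, which is essentially all that the raid-check script does. A minimal sketch of that mechanism, using the array names from /proc/mdstat above:)

# request a scrub of each array; the kernel serializes checks of arrays
# that share physical disks, exactly as logged above
for dev in md0 md1 md2; do
    echo check > /sys/block/$dev/md/sync_action
done
# watch progress in /proc/mdstat; mismatches (if any) are counted here
cat /sys/block/md2/md/mismatch_cnt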

Comment 1 Jes Sorensen 2012-01-23 09:49:55 UTC
Do the raids get assembled without problems otherwise?

Which kernel are you running?

Does this happen if you use the normal Fedora 15 scripts?

Thanks,
Jes

Comment 2 Harald Reindl 2012-01-23 09:55:06 UTC
As said, kernel 2.6.41.10-2.fc15.x86_64.
Yes, the arrays work without any problems:

[root@rh:~]$ dmesg -c
[root@rh:~]$ /sbin/mdadm --detail /dev/md2 
/dev/md2:
        Version : 1.1
  Creation Time : Wed Jun  8 13:10:56 2011
     Raid Level : raid10
     Array Size : 3875222528 (3695.70 GiB 3968.23 GB)
  Used Dev Size : 1937611264 (1847.85 GiB 1984.11 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Jan 23 10:54:36 2012
          State : active 
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2
     Chunk Size : 512K

           Name : localhost.localdomain:2
           UUID : ea253255:cb915401:f32794ad:ce0fe396
         Events : 51010

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       35        1      active sync   /dev/sdc3
       2       8       51        2      active sync   /dev/sdd3
       3       8       19        3      active sync   /dev/sdb3

[root@rh:~]$ dmesg -c
scsi_verify_blk_ioctl: 306 callbacks suppressed
mdadm: sending ioctl 1261 to a partition!
mdadm: sending ioctl 1261 to a partition!
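
(As an aside, not from the original comment: the warning comes from scsi_verify_blk_ioctl(), which prints the ioctl command in hex. 0x1261 is _IO(0x12, 97), i.e. BLKFLSBUF, the flush-buffer-cache ioctl that mdadm apparently issues against the member partitions. This can be checked against the exported kernel headers:)

# confirm which ioctl 0x1261 is; expected line (assumption about header layout):
#   #define BLKFLSBUF  _IO(0x12,97)     -> (0x12 << 8) | 97 == 0x1261
grep BLKFLSBUF /usr/include/linux/fs.h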

Comment 3 Jes Sorensen 2012-01-23 10:07:13 UTC
I have tried reproducing the problem here on a Fedora 15 system with
three RAID devices. This is vanilla Fedora 15, though, with no F16 scripts
pulled in.

Please try with the non-test version of the kernel:
kernel-2.6.41.9-1.fc15.x86_64

This is what I get:

[root@monkeybay ~]# cat /proc/mdstat 
Personalities : [raid1] [raid10] 
md42 : active raid1 sde3[0] sdf3[1]
      19529656 blocks super 1.2 [2/2] [UU]
      
md126 : active raid10 sde1[0] sdf1[1] sdg1[2] sdh1[3]
      39053184 blocks 64K chunks 2 near-copies [4/4] [UUUU]
      
md125 : active raid10 sde2[0] sdh2[3] sdg2[1] sdf2[2]
      39063424 blocks 64K chunks 2 near-copies [4/4] [UUUU]
      
unused devices: <none>
[root@monkeybay ~]# raid-check 
[root@monkeybay ~]# 
[root@monkeybay ~]# dmesg|grep 1261
[root@monkeybay ~]# rpm -q kernel mdadm 
kernel-2.6.41.9-1.fc15.x86_64
mdadm-3.2.3-3.fc15.x86_64

Sounds more like a bug in ioctl processing in the test kernel.
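
(Editorial sketch, not from the comment: a quick way to compare the two kernels is to clear the ring buffer, repeat the kind of call that triggered the warning, and count the messages; the device name is taken from the output above.)

dmesg -c > /dev/null                          # clear the kernel ring buffer
/sbin/mdadm --detail /dev/md125 > /dev/null   # same kind of call as in comment 2
dmesg | grep -c 'sending ioctl .* to a partition'   # expect 0 on 2.6.41.9-1.fc15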

Comment 4 Harald Reindl 2012-01-23 10:50:38 UTC
Confirmed, see my kernel bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=783955

Thank you for your feedback!

Comment 5 Jes Sorensen 2012-01-23 10:52:46 UTC
Glad we're getting closer to the issue. Thanks for testing.

I am going to close this one since it's not an mdadm bug, but I'll
keep an eye on the kernel bug as well.

Cheers,
Jes