Bug 1073314
| Summary: | Reshape was stuck | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | XiaoNi <xni> |
| Component: | selinux-policy | Assignee: | Lukas Vrabec <lvrabec> |
| Status: | CLOSED ERRATA | QA Contact: | Zhang Yi <yizhan> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.0 | CC: | dhowells, dledford, eguan, eparis, Jes.Sorensen, lvrabec, mgrepl, mmalik, plautrba, pvrabec, ssekidde, xni, xzhou |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | selinux-policy-3.13.1-50.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1246035 (view as bug list) | Environment: | |
| Last Closed: | 2015-11-19 10:22:04 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1246035 | | |
We can hit this on an x86_64 host too while testing the RHEL 7.1 Alpha and Beta composes (mdadm-3.3.2-1.el7) with an upstream 3.18+ kernel; the 7.1 Beta kernel (-210) reproduces it too.
[root@dhcp-66-86-11 ~]# cat /proc/mdstat
Personalities : [faulty] [raid6] [raid5] [raid4]
md127 : active raid5 loop3[4] loop2[3] loop1[1] loop0[0]
63488 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
[>....................] reshape = 0.0% (0/31744) finish=395.2min speed=1K/sec
unused devices: <none>
dmesg
[61241.044600] md: bind<loop0>
[61241.044830] md: bind<loop1>
[61241.044962] md: bind<loop2>
[61241.072327] async_tx: api initialized (async)
[61241.210359] md: raid6 personality registered for level 6
[61241.210361] md: raid5 personality registered for level 5
[61241.210362] md: raid4 personality registered for level 4
[61241.211482] md/raid:md127: device loop1 operational as raid disk 1
[61241.211487] md/raid:md127: device loop0 operational as raid disk 0
[61241.212215] md/raid:md127: allocated 0kB
[61241.212335] md/raid:md127: raid level 5 active with 2 out of 3 devices, algorithm 2
[61241.213984] RAID conf printout:
[61241.213989] --- level:5 rd:3 wd:2
[61241.214004] disk 0, o:1, dev:loop0
[61241.214018] disk 1, o:1, dev:loop1
[61241.214029] md/raid456: discard support disabled due to uncertainty.
[61241.214030] Set raid456.devices_handle_discard_safely=Y to override.
[61241.214057] md127: detected capacity change from 0 to 65011712
[61241.214297] RAID conf printout:
[61241.214299] --- level:5 rd:3 wd:2
[61241.214301] disk 0, o:1, dev:loop0
[61241.214302] disk 1, o:1, dev:loop1
[61241.214303] disk 2, o:1, dev:loop2
[61241.215931] md: recovery of RAID array md127
[61241.215939] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[61241.215942] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[61241.215948] md: using 128k window, over a total of 31744k.
[61241.230132] md127: unknown partition table
[61241.471855] md: md127: recovery done.
[61241.481646] RAID conf printout:
[61241.481657] --- level:5 rd:3 wd:3
[61241.481661] disk 0, o:1, dev:loop0
[61241.481664] disk 1, o:1, dev:loop1
[61241.481666] disk 2, o:1, dev:loop2
[61251.409507] EXT4-fs (md127): mounting ext3 file system using the ext4 subsystem
[61251.412189] EXT4-fs (md127): mounted filesystem with writeback data mode. Opts: data=writeback
[61251.412197] SELinux: initialized (dev md127, type ext3), uses xattr
[61251.466479] md: bind<loop3>
[61251.473324] RAID conf printout:
[61251.473331] --- level:5 rd:3 wd:3
[61251.473334] disk 0, o:1, dev:loop0
[61251.473337] disk 1, o:1, dev:loop1
[61251.473339] disk 2, o:1, dev:loop2
[61251.490192] RAID conf printout:
[61251.490195] --- level:5 rd:4 wd:4
[61251.490196] disk 0, o:1, dev:loop0
[61251.490197] disk 1, o:1, dev:loop1
[61251.490198] disk 2, o:1, dev:loop2
[61251.490199] disk 3, o:1, dev:loop3
[61251.490667] md: reshape of RAID array md127
[61251.490670] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[61251.490672] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
[61251.490675] md: using 128k window, over a total of 31744k.
Hi,

Just to be clear I understand you correctly, you were able to reproduce this with 3.18+ and mdadm-3.3.2?

If this is the case, we should report the problem upstream.

Thanks,
Jes

(In reply to Jes Sorensen from comment #2)
> Hi,
>
> Just to be clear I understand you correctly, you were able to reproduce this
> with 3.18+ and mdadm-3.3.2?

Yes, I tested on RHEL7 Beta with an upstream 3.18+ kernel, which means that, except for the kernel, everything else is from the RHEL7 Beta compose.

>
> If this is the case, we should report the problem upstream.

I'm not sure how mdadm upstream works: should I report to some Bugzilla or just to a mailing list (and which mailing list)?

Thanks,
Eryu

(In reply to Eryu Guan from comment #3)
> (In reply to Jes Sorensen from comment #2)
> > Hi,
> >
> > Just to be clear I understand you correctly, you were able to reproduce this
> > with 3.18+ and mdadm-3.3.2?
>
> Yes, I tested on RHEL7 Beta with upstream 3.18+ kernel, which means except
> the kernel, everything else is from RHEL7 Beta compose.

Hi Eryu

Can you reproduce this every time? By the way, can we update upstream kernel with RHEL7 kernel directly? I remember I tried to do it, but it failed.

> >
> > If this is the case, we should report the problem upstream.
>
> I'm not sure how mdadm upstream works, report to some bugzilla or just to
> mail list (and which mail list)?

Yes, there is a mailing list, linux-raid.org. I'm trying to test this again. I'll send mail to it soon.

>
> Thanks,
> Eryu

(In reply to XiaoNi from comment #4)
>
> Hi Eryu
>
> Can you reproduce this every time? By the way, can we update upstream
> kernel with RHEL7 kernel directly? I remember I tried to do it, but it failed.

I tried once so far and hit the hang, but Xiong Zhou (xzhou@) hit this quite often when running beaker tasks.

I compiled the upstream kernel manually with the config file from RHEL7 (with some additional configurations).

>
> Yes, there is a mailing list, linux-raid.org. I'm trying to test
> this again. I'll send mail to it soon.

Please cc me too, thanks!

Eryu

Hi all

The problem reproduces 100% of the time, and I have already sent the mail to upstream.

Thanks
Xiao

Are there any files in the /dev/mqueue directory during the reshaping in permissive mode?

# ls -Z /dev/mqueue

I added some additional fixes in -40.el7. Please test it with this release. If it does not work, switch to permissive mode and see if it works in permissive mode. If so, please attach AVC msgs from permissive mode. Thanks.

[root@dhcp-10-40-2-201 ~]# audit2allow -i avc

#============= mdadm_t ==============

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t apm_bios_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t autofs_device_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t cpu_device_t:chr_file getattr;

#!!!! This avc can be allowed using the boolean 'daemons_use_tty'
allow mdadm_t devpts_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t dri_device_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t event_device_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t fuse_device_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t initctl_t:fifo_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t kmsg_device_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t loop_control_device_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t lvm_control_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t mouse_device_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t netcontrol_device_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t ppp_device_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t ptmx_t:chr_file getattr;

#!!!! This avc is allowed in the current policy
allow mdadm_t random_device_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t sound_device_t:chr_file getattr;

#!!!! This avc is allowed in the current policy
allow mdadm_t tmpfs_t:dir read;

#!!!! This avc can be allowed using the boolean 'daemons_use_tty'
allow mdadm_t tty_device_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t tun_tap_device_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t uhid_device_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t usbmon_device_t:chr_file getattr;

#!!!! This avc can be allowed using the boolean 'daemons_use_tty'
allow mdadm_t user_devpts_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t vfio_device_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t vhost_device_t:chr_file getattr;

#!!!! This avc has a dontaudit rule in the current policy
allow mdadm_t xserver_misc_device_t:chr_file getattr;

[root@dhcp-10-40-2-201 ~]# rpm -q selinux-policy
selinux-policy-3.13.1-42.el7.noarch

Hi,

Please, use:
# sesearch -D -s mdadm_t -t device_t -c chr_file

# rpm -qa | grep selinux-policy
selinux-policy-3.13.1-45.el7.noarch
selinux-policy-devel-3.13.1-37.el7.noarch
selinux-policy-targeted-3.13.1-45.el7.noarch

This is weird. On rhel7.2, I see dontaudit rules for these AVCs.

Could you set up a beaker machine with this issue?

Almost all AVCs listed in the last attachment are related to a chr_file class, except for this one:
----
type=SYSCALL msg=audit(08/25/2015 05:13:51.695:109) : arch=x86_64 syscall=stat success=yes exit=0 a0=0x21c9920 a1=0x7fffb5572e20 a2=0x7fffb5572e20 a3=0x100 items=0 ppid=30243 pid=30250 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=mdadm exe=/usr/sbin/mdadm subj=system_u:system_r:mdadm_t:s0-s0:c0.c1023 key=(null)
type=AVC msg=audit(08/25/2015 05:13:51.695:109) : avc: denied { getattr } for pid=30250 comm=mdadm path=/run/systemd/initctl/fifo dev="tmpfs" ino=12713 scontext=system_u:system_r:mdadm_t:s0-s0:c0.c1023 tcontext=system_u:object_r:initctl_t:s0 tclass=fifo_file
----
XiaoNi, I cannot connect to the server via ssh.

What about the AVCs which appear when you remove the dontaudit rules?

The following AVCs seem suspicious:
----
type=SYSCALL msg=audit(09/11/2015 08:11:05.391:574) : arch=x86_64 syscall=stat success=no exit=-13(Permission denied) a0=0x2433920 a1=0x7ffdc1f893e0 a2=0x7ffdc1f893e0 a3=0x100 items=0 ppid=32744 pid=32751 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=mdadm exe=/usr/sbin/mdadm subj=system_u:system_r:mdadm_t:s0-s0:c0.c1023 key=(null)
type=AVC msg=audit(09/11/2015 08:11:05.391:574) : avc: denied { getattr } for pid=32751 comm=mdadm path=/dev/loop-control dev="devtmpfs" ino=7718 scontext=system_u:system_r:mdadm_t:s0-s0:c0.c1023 tcontext=system_u:object_r:loop_control_device_t:s0 tclass=chr_file
----
type=SYSCALL msg=audit(09/11/2015 08:11:05.398:690) : arch=x86_64 syscall=newfstatat success=no exit=-13(Permission denied) a0=0x5 a1=0x2443d43 a2=0x7ffdc1f894a0 a3=0x100 items=0 ppid=32744 pid=32751 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=mdadm exe=/usr/sbin/mdadm subj=system_u:system_r:mdadm_t:s0-s0:c0.c1023 key=(null)
type=AVC msg=audit(09/11/2015 08:11:05.398:690) : avc: denied { getattr } for pid=32751 comm=mdadm path=/dev/mapper/control dev="devtmpfs" ino=7720 scontext=system_u:system_r:mdadm_t:s0-s0:c0.c1023 tcontext=system_u:object_r:lvm_control_t:s0 tclass=chr_file
----
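When many denials like the ones quoted above need to be compared, it can help to reduce each AVC record to its source type, target type, class and permissions. The following is only a minimal sketch, not part of audit2allow or any SELinux tooling: the names `parse_avc` and `to_rule` are hypothetical helpers, and the regular expression assumes the interpreted single-line record format shown in this report.

```python
import re

# Assumption: records look like the interpreted AVC lines quoted in this report.
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}.*?"
    r"scontext=(?P<scontext>\S+)\s+"
    r"tcontext=(?P<tcontext>\S+)\s+"
    r"tclass=(?P<tclass>\S+)"
)


def parse_avc(line):
    """Return the key fields of one AVC denial line, or None if it does not match."""
    m = AVC_RE.search(line)
    if m is None:
        return None
    path = re.search(r"path=(\S+)", line)
    return {
        "perms": m.group("perms").split(),
        # An SELinux context is user:role:type:level, so the type is field 2.
        "source_type": m.group("scontext").split(":")[2],
        "target_type": m.group("tcontext").split(":")[2],
        "tclass": m.group("tclass"),
        "path": path.group(1) if path else None,
    }


def to_rule(rec):
    """Format a parsed record in the same shape as the allow rules listed earlier."""
    perms = rec["perms"]
    perm_text = perms[0] if len(perms) == 1 else "{ " + " ".join(perms) + " }"
    return "allow {} {}:{} {};".format(
        rec["source_type"], rec["target_type"], rec["tclass"], perm_text
    )
```

Applied to the /dev/loop-control denial above, `parse_avc` yields source type `mdadm_t`, target type `loop_control_device_t` and class `chr_file`, which makes it easy to group the denials by target type before deciding which ones matter.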
# getenforce
Enforcing
# setsebool daemons_use_tty on
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 loop3[3] loop2[2] loop1[1] loop0[0]
4093952 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
[>....................] reshape = 3.5% (73216/2046976) finish=3.1min speed=10459K/sec
unused devices: <none>
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 loop3[3] loop2[2] loop1[1] loop0[0]
4093952 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
[=>...................] reshape = 7.9% (162816/2046976) finish=2.8min speed=10854K/sec
unused devices: <none>
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 loop3[3] loop2[2] loop1[1] loop0[0]
4093952 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
[===>.................] reshape = 15.5% (319248/2046976) finish=2.8min speed=9976K/sec
unused devices: <none>
#
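The difference between the stuck reshape reported at the top and the healthy one above is whether the block position in /proc/mdstat advances between samples. As a rough sketch of that check (the helper names `parse_progress` and `is_stalled` are hypothetical, and the pattern assumes the progress-line layout shown in this report), the snapshots can be compared like this:

```python
import re

# Assumption: the progress line looks like
#   "[>....] reshape = 3.5% (73216/2046976) finish=3.1min speed=10459K/sec"
PROGRESS_RE = re.compile(
    r"(?P<action>reshape|recovery|resync)\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)"
    r".*?speed=(?P<speed>\d+)K/sec"
)


def parse_progress(mdstat_text):
    """Extract the first sync progress entry from /proc/mdstat text, or None."""
    m = PROGRESS_RE.search(mdstat_text)
    if m is None:
        return None
    return {
        "action": m.group("action"),
        "done": int(m.group("done")),
        "total": int(m.group("total")),
        "speed_kb": int(m.group("speed")),
    }


def is_stalled(snapshots):
    """True if at least two snapshots show progress and the position never moved.

    snapshots: list of /proc/mdstat contents taken some time apart.
    """
    done = [p["done"] for p in map(parse_progress, snapshots) if p is not None]
    return len(done) >= 2 and done[-1] == done[0]
```

Feeding it two reads of /proc/mdstat taken a few seconds apart is enough to tell the two cases apart; for the three samples above the position moves from 73216 to 319248, so the reshape is not stalled.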
No special policy module is needed, but the daemons_use_tty boolean must be enabled.

# ps -efZ | grep mdadm
system_u:system_r:mdadm_t:s0 root 668 1 0 08:43 ? 00:00:00 /usr/sbin/mdadm --grow --continue /dev/md0
unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 root 690 319 0 08:43 pts/2 00:00:00 grep --color=auto mdadm
# ls -l /proc/668/fd
total 0
lr-x------. 1 root root 64 Sep 11 08:43 0 -> /dev/null
l-wx------. 1 root root 64 Sep 11 08:43 1 -> /dev/null
l-wx------. 1 root root 64 Sep 11 08:43 2 -> /dev/null
lrwx------. 1 root root 64 Sep 11 08:43 3 -> /sys/devices/virtual/block/md0/md/sync_action
lr-x------. 1 root root 64 Sep 11 08:43 4 -> /dev/loop0
lr-x------. 1 root root 64 Sep 11 08:43 5 -> /dev/loop1
lr-x------. 1 root root 64 Sep 11 08:43 6 -> /dev/loop2
lrwx------. 1 root root 64 Sep 11 08:43 7 -> /dev/loop3
#

commit 14a8d542325607b15689f919ddc91903f7664ee3
Author: Lukas Vrabec <lvrabec>
Date: Mon Sep 14 14:43:12 2015 +0200
Allow mdadm_t domain read/write to general ptys and unallocated ttys.
Resolves: #1073314
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2300.html
Description of problem:

Version-Release number of selected component (if applicable):
The kernel is 3.10.0-101.el7

How reproducible:
100% on the ppc64 and s390x platforms. It's not easy to reproduce this on the x86_64 platform.

Steps to Reproduce:
[root@ibm-p730-03-lp2 ~]# mdadm -CR /dev/md0 -l5 -n3 /dev/loop[0-2] --assume-clean
mdadm: /dev/loop0 appears to be part of a raid array: level=raid5 devices=3 ctime=Tue Mar 4 04:00:35 2014
mdadm: /dev/loop1 appears to be part of a raid array: level=raid5 devices=3 ctime=Tue Mar 4 04:00:35 2014
mdadm: /dev/loop2 appears to be part of a raid array: level=raid5 devices=3 ctime=Tue Mar 4 04:00:35 2014
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
[root@ibm-p730-03-lp2 ~]# mdadm /dev/md0 -a /dev/loop3
mdadm: added /dev/loop3
[root@ibm-p730-03-lp2 ~]# mdadm --grow --raid-devices 4 /dev/md0
mdadm: Need to backup 3072K of critical section..
[root@ibm-p730-03-lp2 ~]# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 loop3[3] loop2[2] loop1[1] loop0[0]
1022976 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
[>....................] reshape = 0.3% (1536/511488) finish=10092.8min speed=0K/sec
unused devices: <none>

The speed of the reshape is zero! But it works well on the x86_64 platform.

I'll give the dmesg information below:
[root@ibm-p730-03-lp2 ~]# dmesg
[ 8937.798375] md: bind<loop0>
[ 8937.798432] md: bind<loop1>
[ 8937.798475] md: bind<loop2>
[ 8937.801173] md/raid:md0: device loop2 operational as raid disk 2
[ 8937.801181] md/raid:md0: device loop1 operational as raid disk 1
[ 8937.801184] md/raid:md0: device loop0 operational as raid disk 0
[ 8937.801736] md/raid:md0: allocated 49362kB
[ 8937.801797] md/raid:md0: raid level 5 active with 3 out of 3 devices, algorithm 2
[ 8937.801802] RAID conf printout:
[ 8937.801804] --- level:5 rd:3 wd:3
[ 8937.801806] disk 0, o:1, dev:loop0
[ 8937.801807] disk 1, o:1, dev:loop1
[ 8937.801808] disk 2, o:1, dev:loop2
[ 8937.801829] md0: detected capacity change from 0 to 1047527424
[ 8937.805712] md0: unknown partition table
[ 8944.306297] md: bind<loop3>
[ 8944.326330] RAID conf printout:
[ 8944.326334] --- level:5 rd:3 wd:3
[ 8944.326336] disk 0, o:1, dev:loop0
[ 8944.326338] disk 1, o:1, dev:loop1
[ 8944.326340] disk 2, o:1, dev:loop2
[ 8948.374433] RAID conf printout:
[ 8948.374436] --- level:5 rd:4 wd:4
[ 8948.374439] disk 0, o:1, dev:loop0
[ 8948.374440] disk 1, o:1, dev:loop1
[ 8948.374441] disk 2, o:1, dev:loop2
[ 8948.374443] disk 3, o:1, dev:loop3
[ 8948.374501] md: reshape of RAID array md0
[ 8948.374514] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 8948.374521] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
[ 8948.374584] md: using 2048k window, over a total of 511488k.
[ 8948.618472] md: md_do_sync() got signal ... exiting
[ 8948.635154] md: reshape of RAID array md0
[ 8948.635160] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 8948.635166] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
[ 8948.635236] md: using 2048k window, over a total of 511488k.

Actual results:

Expected results:

Additional info: