Description of problem:
fstrim command failing on an mdadm RAID 5 device.

Version-Release number of selected component (if applicable):
kernel-3.10.0-327.4.5.el7.x86_64

How reproducible:
Create an mdadm device at the levels mentioned; mount; run fstrim on the mount point.

- Successful:
# vgcreate testtrim /dev/sdg1
# lvcreate -l 100%FREE -n lvtrimtest testtrim
# mkfs.xfs /dev/testtrim/lvtrimtest
# mount /dev/testtrim/lvtrimtest /storage
# fstrim -v /storage
/storage: 953,4 GiB (1023706796032 bytes) trimmed

- Failed:
# mdadm --create --verbose /dev/md0 --level=5 --raid-devices=6 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1
# mkfs.xfs /dev/md0
# mount /dev/md0 /storage
# fstrim -v /storage
fstrim: /storage: the discard operation is not supported

Actual results:
The command complains that discard is not supported.

Expected results:
The command should succeed.

Additional info:

Host: scsi0 Channel: 02 Id: 00 Lun: 00
  Vendor: DELL  Model: PERC H730  Rev: 4.24
  Type: Direct-Access  ANSI SCSI revision: 05
Host: scsi0 Channel: 00 Id: 06 Lun: 00
  Vendor: ATA  Model: Crucial_CT1024M5  Rev: MU01
  Type: Direct-Access  ANSI SCSI revision: 06

[john@host]$ cat /etc/modprobe.d/raid456.conf
options raid456 devices_handle_discard_safely=Y

md0, discard_granularity: 4194304
sde, discard_granularity: 4096
sdf, discard_granularity: 4096
sdg, discard_granularity: 4096
sdh, discard_granularity: 4096
sdi, discard_granularity: 4096
sdj, discard_granularity: 4096

md0, discard_max_bytes: 134217216
sde, discard_max_bytes: 134217216
sdf, discard_max_bytes: 134217216
sdg, discard_max_bytes: 134217216
sdh, discard_max_bytes: 134217216
sdi, discard_max_bytes: 134217216
sdj, discard_max_bytes: 134217216

md0, discard_zeroes_data: 0
sde, discard_zeroes_data: 1
sdf, discard_zeroes_data: 1
sdg, discard_zeroes_data: 1
sdh, discard_zeroes_data: 1
sdi, discard_zeroes_data: 1
sdj, discard_zeroes_data: 1

[john@host]$ cat /sys/module/raid456/parameters/devices_handle_discard_safely
Y

[john@host]$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdf1[1] sdi1[4] sdh1[3] sde1[0] sdg1[2] sdj1[6]
      5000360960 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

unused devices: <none>

/dev/md0: UUID="<omitted>" TYPE="xfs"
/dev/sde1: UUID="<omitted>" UUID_SUB="<omitted>" LABEL="0" TYPE="linux_raid_member"
/dev/sdi1: UUID="<omitted>" UUID_SUB="<omitted>" LABEL="0" TYPE="linux_raid_member"
/dev/sdg1: UUID="<omitted>" UUID_SUB="<omitted>" LABEL="0" TYPE="linux_raid_member"
/dev/sdf1: UUID="<omitted>" UUID_SUB="<omitted>" LABEL="0" TYPE="linux_raid_member"
/dev/sdj1: UUID="<omitted>" UUID_SUB="<omitted>" LABEL="0" TYPE="linux_raid_member"
/dev/sdh1: UUID="<omitted>" UUID_SUB="<omitted>" LABEL="0" TYPE="linux_raid_member"

[john@host]$ mount | grep md0
/dev/md0 on /storage type xfs (rw,relatime,attr2,inode64,sunit=1024,swidth=5120,noquota)

Please let me know if I can be of assistance.

John
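The per-device lists above (discard_granularity, discard_max_bytes, discard_zeroes_data) correspond to the standard sysfs queue attributes under /sys/block/<dev>/queue/. A small helper along these lines can collect them; this is a sketch, and the function name is illustrative:

```shell
# dump_discard_limits [SYSFS_BLOCK_DIR]
# Print the discard-related queue limits of every block device found
# under the given directory (default: /sys/block), one line per limit,
# in the same "dev, limit: value" form used in this report.
dump_discard_limits() {
    root="${1:-/sys/block}"
    for dev in "$root"/*; do
        name=$(basename "$dev")
        for limit in discard_granularity discard_max_bytes discard_zeroes_data; do
            f="$dev/queue/$limit"
            if [ -r "$f" ]; then
                printf '%s, %s: %s\n' "$name" "$limit" "$(cat "$f")"
            fi
        done
    done
}
```

Called with no argument it walks /sys/block, so it covers md0 and all member disks in one pass.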
John, I don't see any mention of you setting devices_handle_discard_safely here when loading the raid5 module. Per upstream, discard isn't enabled by default on raid5, so the behavior you list here is normal. I don't see any problem here.

Jes
Data from smartctl, 6 identical drives:

Device Model:     Crucial_CT1024M550SSD1
Serial Number:    14110C0B0346
LU WWN Device Id: 5 00a075 10c0b0346
Firmware Version: MU01
User Capacity:    1 024 209 543 168 bytes [1,02 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Feb 15 07:49:53 2016 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Does not work with less than 5 disks either. Created a 4-disk RAID 5:

# mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1
# mkfs.xfs -f /dev/md0
# mount /dev/md0 /storage
# fstrim -v /storage
fstrim: /storage1: the discard operation is not supported

With the 2 spare drives I created a RAID 0:

# mdadm --create --verbose /dev/md1 --level=0 --raid-devices=2 /dev/sdi1 /dev/sdj1
# mkfs.xfs /dev/md1
# mkdir /storage1
# mount /dev/md1 /storage1/
# fstrim -v /storage1
fstrim: /storage1: the discard operation is not supported

...and a RAID 1:

# umount /storage1
# mdadm --stop /dev/md1
# mdadm --zero-superblock /dev/sdi1 /dev/sdj1
# mdadm --create --verbose /dev/md1 --level=1 --raid-devices=2 /dev/sdi1 /dev/sdj1
# mkfs.xfs -f /dev/md1
# mount /dev/md1 /storage1/
# fstrim -v /storage1
/storage1: 953,3 GiB (1023574020096 bytes) trimmed
For raid0 you need to enable the devices_discard_performance module parameter. We had a problem with bad devices suffering from extremely slow discard speeds, and there is no way to test for it.

Jes
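If that parameter is wanted persistently, it can be set the same way the reporter set the raid456 one, via a modprobe.d drop-in. A sketch, assuming the parameter name given above (the file name is arbitrary):

```
# /etc/modprobe.d/raid0.conf
options raid0 devices_discard_performance=Y
```

After the module is next loaded, /sys/module/raid0/parameters/devices_discard_performance should read Y.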
Zhang Yi did some tests, and his results are as follows:

RAID5, 8 partitions: FAIL
RAID5, 7 partitions: FAIL
RAID5, 6 partitions: FAIL
RAID5, 5 partitions: PASS
RAID5, 4 partitions: PASS

I did some more digging, and I can reproduce this with the upstream kernel as well, so this is not RHEL-specific. Need to investigate why the number of devices impacts this.
I think I know what is wrong here. The kernel validates various queue limits before enabling discard support. However, in one of the checks it compares sectors with bytes instead of sectors with sectors, so the comparison wrongly fails once the stripe grows large enough. I have posted a patch upstream and will follow up once I hear back from the upstream maintainer.

Jes
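A minimal sketch of the mismatch, using the limits from this report (md0 discard_granularity of 4194304 bytes, i.e. the rounded-up whole-stripe size of the 6-device array, and member discard_max_bytes of 134217216). The variable names are illustrative, not the kernel's, but the arithmetic shows why the flawed comparison disables discard:

```shell
# Limits taken from the sysfs output in this report.
stripe_bytes=4194304                      # whole-stripe size of md0, in bytes
max_discard_sectors=$((134217216 / 512))  # member discard_max_bytes, converted to 512-byte sectors

# Buggy comparison: sectors on the left, bytes on the right.
# 262143 >= 4194304 is false, so discard is wrongly left disabled.
buggy_ok=$(( max_discard_sectors >= stripe_bytes ))

# Corrected comparison: convert the stripe size to sectors first.
# 262143 >= 8192 is true, so discard is enabled.
fixed_ok=$(( max_discard_sectors >= (stripe_bytes >> 9) ))

echo "buggy check enables discard: $buggy_ok"
echo "fixed check enables discard: $fixed_ok"
```

This also fits the test matrix above: with fewer data disks the stripe is small enough in bytes to slip under the sector limit, so small arrays passed while larger ones failed.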
Patch has been accepted upstream. It should get pulled in automatically once we sync md for 7.3. Jes
Hi Jes, thanks again. I'm making this bug public since there is no identifying information present, so the customer can follow along.

Also, once the changes get added into 7.3 and the customer upgrades, will the raid device need to be rebuilt? Or will we be fine just updating and rebooting into the new kernel?

Best Regards,
John Pittman
Global Support Services
Red Hat Inc.
Patch(es) committed to the kernel repository, and an interim kernel build is undergoing testing.
Patch(es) available on kernel-3.10.0-429.el7
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2574.html