Bug 1289346 - discard_zeroes_data is set to 0 in /sys for md device ; fstrim still functions
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Jes Sorensen
QA Contact: Zhang Yi
Depends On:
Reported: 2015-12-07 17:00 EST by John Pittman
Modified: 2015-12-10 09:51 EST (History)
2 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2015-12-10 09:51:49 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
proposed patch (848 bytes, text/plain)
2015-12-07 17:00 EST, John Pittman
no flags

Description John Pittman 2015-12-07 17:00:52 EST
Created attachment 1103385 [details]
proposed patch

Description of problem:

With raid456 enabled and raid456.devices_handle_discard_safely=Y, discard_zeroes_data is set to 0 in /sys for md device.

Version-Release number of selected component (if applicable):


How reproducible:

[root@host ~]# grep raid /boot/grub/grub.conf 
	kernel /vmlinuz-2.6.32-573.8.1.el6.x86_64 ro root=/dev/mapper/vg_host-LogVol01root rd_NO_LUKS LANG=en_US.UTF-8 rd_LVM_LV=vg_host/LogVol01root rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_LVM_LV=vg_host/LogVol00swap KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM printk.time=1 raid456.devices_handle_discard_safely=Y

[root@host ~]# lsmod | grep raid456
raid456                84075  0 
async_raid6_recov       6491  1 raid456
async_pq                4638  2 raid456,async_raid6_recov
async_xor               3522  3 raid456,async_raid6_recov,async_pq
async_memcpy            2172  2 raid456,async_raid6_recov
async_tx                2995  5 raid456,async_raid6_recov,async_pq,async_xor,async_memcpy

[root@host ~]# mdadm --create /dev/md1 --level=raid6 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.

[root@host ~]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md1 : active raid6 sde[3] sdd[2] sdc[1] sdb[0]
      234309632 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

[root@host ~]# for d in {discard_granularity,discard_max_bytes,discard_zeroes_data}
> do for i in $(ls -l /sys/block | egrep 'sdb|sdc|sdd|sde|md' | awk '{ print $9}')
> do echo $i, $d: `cat /sys/block/$i/queue/$d`
> done
> done
md1, discard_granularity: 33553920
sdb, discard_granularity: 33553920
sdc, discard_granularity: 33553920
sdd, discard_granularity: 33553920
sde, discard_granularity: 33553920
md1, discard_max_bytes: 2199023255040
sdb, discard_max_bytes: 2199023255040
sdc, discard_max_bytes: 2199023255040
sdd, discard_max_bytes: 2199023255040
sde, discard_max_bytes: 2199023255040
md1, discard_zeroes_data: 0           <====== believe this should be 1
sdb, discard_zeroes_data: 1
sdc, discard_zeroes_data: 1
sdd, discard_zeroes_data: 1
sde, discard_zeroes_data: 1
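For reference, the ad-hoc loop above can be written as a small shell function. This is a sketch, not part of the original report: the function name and the sysroot argument are assumptions added so the helper can be pointed at a directory other than /sys/block when testing.

```shell
#!/bin/sh
# Hedged sketch: print the three discard-related queue limits for each
# named block device. The sysroot argument is an assumption added here so
# the function can be exercised against a mock tree instead of /sys/block.
show_discard_limits() {
    sysroot=$1; shift
    for dev in "$@"; do
        for attr in discard_granularity discard_max_bytes discard_zeroes_data; do
            printf '%s, %s: %s\n' "$dev" "$attr" "$(cat "$sysroot/$dev/queue/$attr")"
        done
    done
}

# Typical use on a live system:
#   show_discard_limits /sys/block md1 sdb sdc sdd sde
```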

[root@host ~]# pvcreate /dev/md1
  Physical volume "/dev/md1" successfully created

[root@host ~]# vgcreate vg2 /dev/md1
  Volume group "vg2" successfully created

[root@host ~]# lvcreate -n lv_data1 -l 100%FREE vg2
  Logical volume "lv_data1" created.

[root@host ~]# mkfs.ext4 /dev/mapper/vg2-lv_data1 
mke2fs 1.41.12 (17-May-2010)
Discarding device blocks: done                            
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=128 blocks, Stripe width=256 blocks
14647296 inodes, 58576896 blocks
2928844 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
1788 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624, 11239424, 20480000, 23887872

Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 33 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

[root@host ~]# mkdir /test
[root@host ~]# mount /dev/vg2/lv_data1 /test
[root@host ~]# touch /test/testfile
[root@host ~]# rm /test/testfile
rm: remove regular empty file `/test/testfile'? y
[root@host ~]# fstrim -v /test/
/test/: 235969359872 bytes were trimmed

Actual results:

discard_zeroes_data set to 0

Expected results:

discard_zeroes_data should be set to 1 when discard is supported and raid456.devices_handle_discard_safely is set to Y

* Proposed patch created by maintenance engineering is attached.
Comment 1 Jes Sorensen 2015-12-08 09:37:18 EST

Could you elaborate on why you believe a raid5 array needs to report
discard_zeroes_data?

Whether or not data is zeroed has nothing to do with whether or not it was
discarded. All discard does is tell the underlying drives that they may
discard the data; knowing the data is zeroed just makes it faster for the
raid stack to do the resync work. The fact that an array does not zero data
on discard does not imply that discard doesn't work.
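To make the distinction above concrete, here is a minimal sketch (the function name and wording are mine, not from the report) of how a script might interpret the discard_zeroes_data value without conflating it with discard support:

```shell
#!/bin/sh
# Hedged sketch: interpret a discard_zeroes_data value read from sysfs.
# A value of 0 only means there is no zero-after-discard guarantee;
# discard itself (and thus fstrim) can still function, as seen in this bug.
interpret_dzd() {
    case "$1" in
        1) echo "discarded blocks are guaranteed to read back as zeroes" ;;
        0) echo "no zeroing guarantee on discard (discard itself may still work)" ;;
        *) echo "unexpected value: $1" ;;
    esac
}

# Typical use on a live system:
#   interpret_dzd "$(cat /sys/block/md1/queue/discard_zeroes_data)"
```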

Comment 2 John Pittman 2015-12-10 08:46:57 EST
Hi Jes,

Thanks a lot for the explanation; I was not aware of that.  I suppose then that if the discard_zeroes_data parameter exists for the md device in /sys, it should inherit, or be set to, the same value as the underlying disks, assuming all the underlying disks have the same discard_zeroes_data value.
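A minimal sketch of the inheritance suggested in this comment (the helper name is an assumption): the array-level value would be the most conservative member value, i.e. the minimum, so it is 1 only when every member zeroes on discard.

```shell
#!/bin/sh
# Hedged sketch: compute the discard_zeroes_data value an md device would
# report if it simply inherited the most conservative member value.
min_dzd() {
    min=1
    for v in "$@"; do
        [ "$v" -lt "$min" ] && min=$v
    done
    echo "$min"
}

# min_dzd 1 1 1 1  -> 1 (all members zero on discard)
# min_dzd 1 0 1 1  -> 0 (one member gives no guarantee)
```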

Comment 3 Jes Sorensen 2015-12-10 09:51:49 EST

There is no requirement for this parameter to be inherited in this case. For
raid456 in particular, the RAID stack has the freedom to write whatever it
finds most optimal to the device when it is being discarded. So while it
might be nice for it to zero data, I don't see the current behavior being a
problem.

I am going to close this as notabug; if you feel this is wrong, please reopen
with a case for it.

