Bug 1313377

Summary: lvm fails to discard logical volumes when discard_granularity is less than sector size
Product: Fedora
Reporter: Till Maas <opensource>
Component: lvm2
Assignee: LVM and device-mapper development team <lvm-team>
Status: CLOSED NEXTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 23
CC: agk, bmarzins, bmr, dwysocha, heinzm, jonathan, lvm-team, msnitzer, opensource, prajnoha, prockai, zkabelac
Hardware: Unspecified
OS: Unspecified
Fixed In Version: lvm2-2.02.150-2.fc24.x86_64
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-08-01 09:17:25 UTC
Bug Depends On: 1314687

Description Till Maas 2016-03-01 14:00:10 UTC
Description of problem:
In lib/metadata/pv_manip.c, discard_pv_segment() checks whether dev_discard_granularity(peg->pv->fmt->cmd->dev_types, peg->pv->dev) returns 0 and skips discarding when it does. However, this function reads from paths like

/sys//dev/block/8:0/queue/discard_granularity

that contain the granularity in bytes (see https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-block) and converts it to a sector count:

_dev_topology_attribute(dt, "queue/discard_granularity", dev, 0UL)
...
    if (sscanf(buffer, "%lu", &value) != 1) {
        log_warn("sysfs file %s not in expected format: %s", path, buffer);
        goto out_close;
    }

    log_very_verbose("Device %s: %s is %lu%s.",
             dev_name(dev), attribute, value, default_value ? "" : " bytes");

    result = value >> SECTOR_SHIFT;

For my SSD, the discard_granularity is 1, i.e. 1 byte, but value >> SECTOR_SHIFT converts this 1 into a 0, making lvremove skip the discard step.

Version-Release number of selected component (if applicable):
lvm2-2.02.132-2.fc23

How reproducible:
always

Steps to Reproduce:
1. get an SSD with discard_granularity 1
2. lvcreate a LV
3. lvremove the LV with issue_discards set to 1 in /etc/lvm/lvm.conf

Actual results:
no discards are issued for the LV data

Expected results:
discards are issued


Additional info:
Discards can be monitored with strace or 
blktrace -a issue -d /dev/sda -o - | blkparse -i - | grep 'D.*D'

It looks like this might affect F22 as well, if SSDs are reported with a discard_granularity < 512.

Comment 1 Zdenek Kabelac 2016-03-03 13:40:43 UTC
Have you tried other kernels?

We know there was a patchset in the recent 4.5 kernel which accidentally changed the reported value from being reported in bytes to being reported in sectors.
So isn't it possible that some portion of this patchset has been backported?

Normally the value should never be smaller than a sector (512B), other than 0.

Could you check this with other kernels?

What's the SSD device in use?

What's the kernel in use?

Comment 2 Till Maas 2016-03-03 14:37:21 UTC
	Model Number:       Crucial_CT500MX200SSD1                  
	Firmware Revision:  MU02    
	Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0

Kernel:
4.4.2-301.fc23.x86_64

On a Debian system with the same hardware, the kernel reports a granularity of 4096, which would be 8 sectors and matches the hdparm output, btw:
	   *	Data Set Management TRIM supported (limit 8 blocks)

I can try other kernels later (maybe), but currently a reboot is inconvenient. Btw., other tools such as fstrim or blkdiscard will happily discard blocks. Therefore, even if the kernel is reporting the wrong values, lvm2 could be more fault-tolerant.

Comment 3 Zdenek Kabelac 2016-03-03 16:18:39 UTC
Well, yep, but then we just hide kernel bugs, which are better exposed and fixed ASAP.

But in this case we could round up to 1 and issue a warning that something is not right.

Comment 4 Till Maas 2016-03-03 17:40:00 UTC
(In reply to Zdenek Kabelac from comment #3)
> Well yep but then we just hide kernel bugs, which are better to be exposed
> and fixed ASAP.

The problem is that the bug is not really exposed, since lvremove does not report that the volume was not discarded. Maybe the check whether or not to discard a volume could be performed before showing the prompt that asks whether to discard. It would also be nice if one could make lvremove actually fail when discard does not work. I would like to use discard as a security feature to make it harder to retrieve data after it was deleted, for example for privacy protection. In this use case, silently skipping the discard is bad.

> But in this case we could round-up to 1  in this case and issue warning that
> something is not right.

I was wondering whether rounding up would always be the right thing to do, because otherwise a sector might not be handled even though it needs to be. But I did not check which features use this information.

Comment 5 Alasdair Kergon 2016-03-03 22:30:04 UTC
Discard is about saving space, improving efficiency by reducing the number of blocks that need to be tracked, enabling faster reuse of storage in an over-provisioned system - things like that.  Any specific discard request comes with no guarantees.  Security or privacy?  No - don't rely on it for those purposes.

Comment 6 Alasdair Kergon 2016-03-03 22:43:34 UTC
As for the LVM side of things:

Yes, lvremove ought to be able to discard thin volumes before removing them.  We need to fix that.

If the kernel is giving us the wrong number because of a known bug in some particular kernels, we could consider correcting it, but it might not be worth the trouble if it's just a short-term error and the bad kernels will quickly be replaced with working ones.

Comment 7 Till Maas 2016-03-04 08:10:27 UTC
(In reply to Alasdair Kergon from comment #5)
> Discard is about saving space, improving efficiency by reducing the number
> of blocks that need to be tracked, enabling faster reuse of storage in an
> over-provisioned system - things like that.  Any specific discard request
> comes with no guarantees.  Security or privacy?  No - don't rely on it for
> those purposes.

Yes, initially many or most SSDs did not guarantee anything for trim commands, but this was changed later. Some SSDs (such as the one I use) guarantee that they will only return zeros after a trim command: https://en.wikipedia.org/wiki/Trim_%28computing%29#ATA
Therefore these can be used to ensure that data can only be recovered with a backdoored firmware or hardware access to the SSD, but not with access to the block device alone.

Comment 8 Till Maas 2016-03-04 08:16:55 UTC
(In reply to Alasdair Kergon from comment #6)
> As for the LVM side of things:
> 
> Yes, lvremove ought to be able to discard thin volumes before removing them.
> We need to fix that.

I copied this comment to bug 1314292 which tracks the thinpool problem.

> If the kernel is giving us the wrong number because of a known bug in some
> particular kernels, we could consider correcting it, but it might not be
> worth the trouble if it's just a short-term error and the bad kernels will
> quickly be replaced with working ones.

4.4.3-300.fc23.x86_64 still reports the granularity as "1" (this is one kernel newer than the one I used initially). Btw., according to the docs at https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-block, every non-zero value indicates that discard is supported, so in this respect one can argue that it is not such a big kernel bug.

Comment 9 Till Maas 2016-03-04 09:11:39 UTC
With kernel 4.3.5-300.fc23.x86_64 the reported granularity is 4096.

Comment 10 Till Maas 2016-03-04 09:16:59 UTC
I have now reported bug 1314687 to track the problem in the kernel.

Comment 11 Zdenek Kabelac 2016-03-11 13:19:13 UTC
I've pushed this 'workaround' patch upstream:

https://www.redhat.com/archives/lvm-devel/2016-March/msg00096.html

Comment 12 Mike McCune 2016-03-28 22:58:03 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions