Bug 801561

Summary:	tuned-adm enterprise-storage should not disable barriers on devices with write back cache
Product:	Red Hat Enterprise Linux 6	Reporter:	Eric Sandeen <esandeen>
Component:	tuned	Assignee:	Jan Vcelak <jvcelak>
Status:	CLOSED ERRATA	QA Contact:	Branislav Blaškovič <bblaskov>
Severity:	high	Docs Contact:
Priority:	high
Version:	6.2	CC:	azelinka, bblaskov, dcleal, hector.arteaga, jmoyer, jskarvad, rwheeler, tsmetana
Target Milestone:	rc
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	tuned-0.2.19-8.el6	Doc Type:	Bug Fix
Doc Text:	Cause: The tuned daemon runs with the 'enterprise-storage' profile enabled while a non-root, non-boot disk partition on a device with a 'write back' cache is mounted. Consequence: tuned remounts the partition with the 'nobarrier' option. This might corrupt the file system in the event of a power failure. Fix: A patch was applied so that the 'nobarrier' mount option is not enabled on partitions that reside on devices with a 'write back' cache. The check is limited to devices that the kernel sees as SCSI devices. Result: 'nobarrier' is not enabled on partitions on devices with a 'write back' cache.	Story Points:	---
Last Closed:	2013-02-21 10:05:09 UTC

Description Eric Sandeen 2012-03-08 20:47:49 UTC

Description of problem:

The enterprise-storage profile for tuned is too aggressive about disabling barriers on the storage that it finds.

        # Find non-root and non-boot partitions, disable barriers on them

This is far too lax a test: if the user has, for example, /opt or /var on a local disk as a separate filesystem, those will get barriers disabled as well as the enterprise storage on the SAN.

The key to knowing whether barriers can safely be disabled is the type of cache on the device. This is available in sysfs, e.g.:

/sys/block/sda/device/scsi_disk/2\:0\:0\:0/cache_type

Possible types are:
        "write through", "none", "write back",
        "write back, no read (daft)"

If the type is "write back" then barriers cannot be safely disabled unless it is known that the cache is battery-backed.
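A minimal sketch of the sysfs lookup described above, written in bash (the language of the helper proposed later in this bug). It assumes the two sysfs layouts that appear in this report: scsi_disk/H:C:T:L/cache_type, and the older scsi_disk:H:C:T:L/cache_type entry shown on RHEL5. The function names and the SYSFS_ROOT override are hypothetical; the override exists only so the functions can be tried against a fake tree.

```shell
# Print the cache_type that the SCSI layer reports for block device $1
# (for example "sda"); return 1 if no cache_type file is found.
# SYSFS_ROOT is a hypothetical override (default /sys) used for testing.
get_cache_type()
{
  local dev=$1 root=${SYSFS_ROOT:-/sys} f
  # Newer layout:         /sys/block/sda/device/scsi_disk/H:C:T:L/cache_type
  # Older (RHEL5) layout: /sys/block/sda/device/scsi_disk:H:C:T:L/cache_type
  for f in "$root/block/$dev/device/scsi_disk/"*/cache_type \
           "$root/block/$dev/device/scsi_disk:"*/cache_type
  do
    # An unmatched glob stays literal, so the -r test filters it out.
    if [ -r "$f" ]; then
      cat "$f"
      return 0
    fi
  done
  return 1
}

# Succeed if device $1 reports a write-back cache. This matches both
# "write back" and "write back, no read (daft)". A device without a
# readable cache_type is reported as NOT write back; the caller has to
# decide separately how to treat that case.
has_write_back_cache()
{
  local type
  type=$(get_cache_type "$1") || return 1
  case $type in
    *"write back"*) return 0 ;;
  esac
  return 1
}
```

For example, `has_write_back_cache sda && echo "sda: keep barriers"`. Devices that expose no cache_type at all (device-mapper devices, for instance) fall through as "not write back", which is why the unknown case is left to the caller.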

This will probably be tricky, because it is then up to the administrator to know this fact, and tuned would then probably need more configurability to specify which devices are truly safe after all. But I think it's worth it given the alternative, which is filesystem corruption.

Ideally anything with "write back" in cache_type should simply be skipped.  To be more flexible, a config file specifically allowing certain block devices might be useful.

Thanks,
-Eric
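One way the config-file idea above could look, as a sketch only: the allowlist file, its format (one device name per line, lines starting with "#" ignored) and the function name are all hypothetical. The cache type string is passed in as an argument, so the decision logic can be read and tested on its own.

```shell
# Decide whether barriers may be disabled on device $1, given its
# cache type $2 and the path $3 of a hypothetical allowlist file
# (e.g. devices the administrator knows have a battery-backed cache).
# Returns 0 if nobarrier may be used, 1 otherwise.
barriers_may_be_disabled()
{
  local dev=$1 cache_type=$2 allowlist=$3
  # Explicitly allowed by the administrator: trust that decision.
  if [ -n "$allowlist" ] && [ -r "$allowlist" ] &&
     grep -v '^[[:space:]]*#' "$allowlist" | grep -qxF -- "$dev"; then
    return 0
  fi
  # Otherwise skip anything whose cache type mentions "write back".
  case $cache_type in
    *"write back"*) return 1 ;;
    *) return 0 ;;
  esac
}
```

With an allowlist containing only `sdb`, a "write back" sda keeps its barriers while a "write back" sdb may have them disabled; a "write through" device is allowed either way.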

Comment 2 RHEL Program Management 2012-09-07 05:34:24 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 3 Ric Wheeler 2012-09-07 10:45:04 UTC

If this is happening, this will cause data loss and we should fix it.

Can we reconsider getting this done for 6.4 please?

Comment 4 RHEL Program Management 2012-09-20 12:33:06 UTC

This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 7 Jan Vcelak 2012-09-24 14:22:57 UTC

The proposed approach for detecting the cache type works only with SATA drives, and we didn't manage to find a universal way of doing this. In addition, the current approach does not work for any device-mapper drives (like LVM). These are ignored right now.

It looks like we will need to redesign and reimplement this whole feature. I'm not sure if we can afford this for RHEL-6.4 due to low QA capacity.

Comment 8 Ric Wheeler 2012-09-24 15:52:22 UTC

If we can fix it for SATA drives only (without breaking anything else), that would be a very solid improvement and would prevent potential data corruption in a very common configuration (laptops and desktops).

Thanks!

Comment 9 Eric Sandeen 2012-09-24 16:21:08 UTC

I agree with Ric. Keeping barriers enabled on SATA drives w/ write-back cache would be a very valuable incremental improvement, even if it's not a universal solution. Remember that the failure mode here is filesystem corruption; we really need to be sure that our automated tuning scripts will not expose customers to that.

But are you sure it's only valid for SATA drives? On RHEL5, my SAN-attached disk has a cache_type, and it shows "write through":

# cat /sys/block/sdb/device/scsi_disk\:1\:0\:0\:0/cache_type 
write through
# cat /sys/block/sdb/device/model 
RAID 5          
# cat /sys/block/sdb/device/vendor 
DGC     

Why do you say it is only valid for SATA drives?

Comment 10 Jaroslav Škarvada 2012-09-24 16:22:30 UTC

Is the following sdparm check enough?
# sdparm -g WCE=1 -H DEVICE

Comment 11 Jaroslav Škarvada 2012-09-24 16:24:31 UTC

(In reply to comment #7)
> In addition, the current approach does not work for any device-mapper drives (like LVM). These are ignored right now.
> 

We could parse the lsblk output to get the matching physical drives.

Comment 12 Eric Sandeen 2012-09-24 16:35:23 UTC

I'm not a SCSI guru, but yes, I think sdparm is probably the better way to go (sysfs can be difficult to work with!)

OK, fair point about software RAID over physical devices; that might take some work to get to. I'll ask the LVM folks whether this is exposed in any simpler way.

Comment 13 Ric Wheeler 2012-09-24 17:40:45 UTC

$ cat /sys/block/sda/device/scsi_disk/0\:0\:0\:0/cache_type 
write back

shows me the "write back" cache type - not sure if that is easier or harder to parse.

Comment 14 Ric Wheeler 2012-09-24 17:41:39 UTC

Note we should verify that enterprise luns (iSCSI or FC luns from NetApp, EMC, HP, IBM, etc) show the write through cache type.
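To collect that data from several arrays, a small survey helper could print vendor, model and cache type for every SCSI disk, much like the manual `cat` commands in comment 9. This is a sketch: the function name and the SYSFS_ROOT override are hypothetical, and the two sysfs layouts it handles are the ones quoted in this bug. `read -r` also trims the trailing padding that sysfs leaves in vendor and model strings (e.g. "DGC     ").

```shell
# Print "device | vendor | model | cache_type" for every disk that
# exposes a SCSI cache_type. SYSFS_ROOT is a hypothetical override
# (default /sys) so the helper can be run against a fake tree.
survey_cache_types()
{
  local root=${SYSFS_ROOT:-/sys} d dev vendor model f type
  for d in "$root"/block/*/device; do
    [ -d "$d" ] || continue
    # ".../block/sdb/device" -> "sdb"
    dev=${d%/device}; dev=${dev##*/}
    vendor= model= type=
    # read -r strips leading/trailing whitespace padding.
    [ -r "$d/vendor" ] && read -r vendor < "$d/vendor"
    [ -r "$d/model" ] && read -r model < "$d/model"
    # Same two layouts as above: scsi_disk/H:C:T:L and scsi_disk:H:C:T:L.
    for f in "$d/scsi_disk/"*/cache_type "$d/scsi_disk:"*/cache_type; do
      if [ -r "$f" ]; then read -r type < "$f"; break; fi
    done
    # Only SCSI disks expose cache_type; skip everything else.
    [ -n "$type" ] || continue
    printf '%s | %s | %s | %s\n' "$dev" "$vendor" "$model" "$type"
  done
}
```

Running `survey_cache_types` on a host attached to, say, an HP EVA or a NetApp LUN would show directly whether the array advertises "write through".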

Comment 15 Jaroslav Škarvada 2012-09-24 17:57:57 UTC

(In reply to comment #13)
> $ cat /sys/block/sda/device/scsi_disk/0\:0\:0\:0/cache_type 
> write back
> 
> shows me the "write back" cache type - not sure if that is easier or harder
> to parse.
>
No problem, with scsi_disk, we could get it by simple query.

>Note we should verify that enterprise luns (iSCSI or FC luns from NetApp, EMC,
>HP, IBM, etc) show the write through cache type.
>
Could you help us with it? We have very limited access to HW, only what we can publicly find in Beaker and it can take ages to get the machines from there.

Comment 16 Jeff Moyer 2012-09-24 18:24:13 UTC

(In reply to comment #15)
> (In reply to comment #13)
> > $ cat /sys/block/sda/device/scsi_disk/0\:0\:0\:0/cache_type 
> > write back
> > 
> > shows me the "write back" cache type - not sure if that is easier or harder
> > to parse.
> >
> No problem, with scsi_disk, we could get it by simple query.
> 
> >Note we should verify that enterprise luns (iSCSI or FC luns from NetApp, EMC,
> >HP, IBM, etc) show the write through cache type.
> >
> Could you help us with it? We have very limited access to HW, only what we
> can publicly find in Beaker and it can take ages to get the machines from
> there.

HP EVA advertises a write-through cache.

Comment 17 Jaroslav Škarvada 2012-09-24 23:11:28 UTC

(In reply to comment #11)
> (In reply to comment #7)
> > In addition, the current approach does not work for any device-mapper drives (like LVM). These are ignored right now.
> > 
> 
> We could parse the lsblk output to get the matching physical drives.

Proposed bash function to match mount points to physical drives (could be optimised):

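function find_disk
{
  local tree lines l
  # Flat device list, reversed so that each disk comes after its children.
  tree=$(lsblk -l 2>/dev/null | tac) || return
  # Line numbers of every entry mounted at "$1" (a volume spanning
  # several drives may appear more than once).
  lines=$(echo "$tree" | grep -n -- " $1\$" | cut -d':' -f1)
  for l in $lines
  do
    # The first "disk" entry from that line on is the backing drive.
    echo "$tree" | sed -n "$l,\$ p" | grep -m1 " disk " | sed 's/^\([^ ]\+\).*/\1/'
  done | sort -u   # duplicates need not be adjacent, so plain uniq is not enough
}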

Comment 19 Jan Vcelak 2012-09-27 11:57:12 UTC

Fix committed to upstream repository:
http://git.fedorahosted.org/cgit/tuned.git/commit/?h=1.0&id=64b45e8

This solution can handle LVM volumes on multiple physical devices. Barriers are not disabled if any of the underlying devices has a "write back" cache. We can check this only if the kernel handles the device through its SCSI layer (SATA, SAN, ...). If we cannot check the cache type, we assume that the barriers can be safely enabled. Barriers are also not disabled on the / and /boot partitions.
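The decision rules in this comment can be sketched as a single function. It is not the committed patch: the function name is hypothetical, and it takes the cache types of the underlying disks as arguments (e.g. found via find_disk and the sysfs cache_type files) rather than looking them up itself. The fallback for an unreadable cache type is worded ambiguously above; this sketch assumes, in line with the Doc Text's note that the fix is limited to devices seen as SCSI, that an unknown cache type (passed as an empty string) does not block nobarrier.

```shell
# Arguments: mount point, then the cache_type of each underlying disk
# ("" if it could not be read). Returns 0 if nobarrier may be applied.
may_disable_barriers()
{
  local mountpoint=$1 type
  shift
  # Never touch the root and boot filesystems.
  case $mountpoint in
    /|/boot) return 1 ;;
  esac
  # A single write-back disk anywhere under the volume keeps barriers on.
  for type in "$@"; do
    case $type in
      *"write back"*) return 1 ;;
    esac
  done
  # ASSUMPTION: an unknown ("") cache type does not block nobarrier.
  return 0
}
```

An LVM volume on one "write through" and one "write back" disk therefore keeps its barriers: `may_disable_barriers /srv "write through" "write back"` fails.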

Comment 21 Jan Vcelak 2012-09-27 14:32:31 UTC

Resolved in tuned-0.2.19-8.el6

Comment 23 Eric Sandeen 2012-09-28 20:27:03 UTC

I'm curious, what happens if an explicit "barrier" or "nobarrier" is placed in fstab; does tuned profile enterprise-storage (or default) override it?  I think it does, and I'm not sure it should.  Granted, that may be a separate bug/RFE ...

what do you think?
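Respecting an explicit fstab setting would first require noticing it. A sketch of that check, assuming a standard whitespace-separated fstab (mount point in field 2, options in field 4); the function name and the FSTAB override are hypothetical, and the old code did not do anything like this (see the reply below). If several barrier options are listed, the last one is reported, since later mount options generally override earlier ones.

```shell
# Print the explicit barrier-related option ("barrier", "nobarrier",
# "barrier=0" or "barrier=1") set in fstab for mount point $1, if any.
# FSTAB is a hypothetical override for the file to read (default /etc/fstab).
fstab_barrier_option()
{
  local want=$1 fstab=${FSTAB:-/etc/fstab}
  awk -v mp="$want" '
    /^[ \t]*#/ { next }                 # skip comment lines
    $2 == mp {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++)
        if (opts[i] ~ /^(nobarrier|barrier(=[01])?)$/) found = opts[i]
    }
    END { if (found != "") print found }
  ' "$fstab"
}
```

A tuned profile could then leave any mount point alone for which this prints something.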

Comment 24 Jan Vcelak 2012-10-02 11:25:26 UTC

> I'm curious, what happens if an explicit "barrier" or "nobarrier" is placed
> in fstab; does tuned profile enterprise-storage (or default) override it?  I
> think it does, and I'm not sure it should.  Granted, that may be a separate
> bug/RFE ...

This was not handled by the old code either, and it can definitely be improved. Please create a separate RFE in Bugzilla for that.

Comment 27 Branislav Blaškovič 2012-11-28 16:57:09 UTC

I've attached a SCSI disk with "Cache mode: writeback" in KVM.

  # cat /sys/block/sda/device/scsi_disk/2\:0\:0\:0/cache_type
  write back
  # rpm -q tuned
  tuned-0.2.19-7.el6.noarch
  # tuned-adm profile enterprise-storage
  # mount | grep "/dev/sda"
  /dev/sda on /mnt/test type ext4 (rw,barrier,nobarrier)
  ^ Bug reproduced                            ^

After upgrading the package

  # rpm -q tuned
  tuned-0.2.19-9.el6.noarch
  # tuned-adm profile enterprise-storage
  # mount | grep "/dev/sda"
  /dev/sda on /mnt/test type ext4 (rw)
  ^ bug verified                     ^


I will try to write an automated test later.
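The manual check above could be automated along these lines. This is a sketch: it assumes the /proc/mounts format (device, mount point, type, options, ...), and the function name and the MOUNTS override are hypothetical. The way a disabled barrier is displayed varies (the mtab output above even shows `barrier,nobarrier`), so it treats both `nobarrier` and `barrier=0` as disabled and lets the last barrier-related option win.

```shell
# Succeed if device $1 is mounted with barriers disabled.
# MOUNTS is a hypothetical override for the mount table (default /proc/mounts).
mounted_with_nobarrier()
{
  local dev=$1 mounts=${MOUNTS:-/proc/mounts}
  awk -v d="$dev" '
    $1 == d {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) {
        # Later options override earlier ones.
        if (opts[i] == "nobarrier" || opts[i] == "barrier=0") found = 1
        else if (opts[i] == "barrier" || opts[i] == "barrier=1") found = 0
      }
    }
    END { exit !found }
  ' "$mounts"
}
```

A test run could then be `tuned-adm profile enterprise-storage; mounted_with_nobarrier /dev/sda && echo FAIL`, which should print nothing with the fixed package on a write-back disk.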

Comment 29 errata-xmlrpc 2013-02-21 10:05:09 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0386.html