Bug 975052 - volume cloning is failing with an I/O error happening when the backing snapshot is wiped
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-packstack
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: async
Target Release: 4.0
Assignee: Martin Magr
QA Contact: Yogev Rabl
URL:
Whiteboard: storage
Depends On:
Blocks: 974575 975894
 
Reported: 2013-06-17 14:04 UTC by Giulio Fidente
Modified: 2016-04-26 21:32 UTC
CC List: 16 users

Fixed In Version: openstack-packstack-2013.2.1-0.1.dev691.el6ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 975894 (view as bug list)
Environment:
Last Closed: 2013-12-20 00:06:22 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1191812 0 None None None Never
OpenStack gerrit 33553 0 None None None Never
Red Hat Product Errata RHEA-2013:1859 0 normal SHIPPED_LIVE Red Hat Enterprise Linux OpenStack Platform Enhancement Advisory 2013-12-21 00:01:48 UTC

Description Giulio Fidente 2013-06-17 14:04:12 UTC
Description of problem:
cloning a volume using the --source-volid argument fails with the following error:

> Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf dd if=/dev/zero of=/dev/mapper/cinder--volumes-clone--snap--727c84cc--69ee--4b7d--bdb0--832b5086f1bb count=1024 bs=1M conv=fdatasync
Exit code: 1
Stdout: ''
Stderr: "/bin/dd: fdatasync failed for `/dev/mapper/cinder--volumes-clone--snap--727c84cc--69ee--4b7d--bdb0--832b5086f1bb': Input/output error\n1024+0 records in\n1024+0 records out\n1073741824 bytes (1.1 GB) copied, 19.7612 s, 54.3 MB/s\n"

NOTE: the actual dd from the origin snapshot into the destination volume works; it is the cleanup (zeroing) of the origin snapshot that fails


Version-Release number of selected component (if applicable):
openstack-cinder-2013.1.2-3.el6ost.noarch


Steps to Reproduce:
1. create a new empty volume (vol1)
2. create a new volume by cloning vol1 (see the CLI sketch below)
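
For reference, a rough sketch of these steps with the cinder CLI (volume names are hypothetical):

    cinder create --display-name vol1 1
    # note the id of vol1 from the output, then clone it
    cinder create --display-name vol2 --source-volid <id-of-vol1> 1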

Comment 1 Kushal Das 2013-06-17 14:17:34 UTC
I get the same error message when cloning the same way. The following Python code reproduces the error.

from cinderclient.v1 import client

username = 'admin'
api_key = 'openstack'
project_id = 'admin'
auth_url = 'http://127.0.0.1:35357/v2.0/'
cclient = client.Client(username, api_key, project_id, auth_url)

# create an empty 1 GB source volume
kargs = {'display_name': 'vol11'}
res1 = cclient.volumes.create(1, **kargs)

# wait for some time for the source volume to become available, then clone it
kargs = {'display_name': 'vol12', 'source_volid': res1.id}
res4 = cclient.volumes.create(1, **kargs)

The error message can be found at /var/log/volume.log

Comment 2 Eric Harney 2013-06-17 15:36:12 UTC
strace on the dd process (used to wipe the snapshot after the data copy):

read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
fdatasync(1)                            = -1 EIO (Input/output error)

Presumably introduced by https://review.openstack.org/#/c/20125/ , looks like /dev/zero does not always support fdatasync().



Kushal, you can probably work around this by setting volume_clear=none in cinder.conf if it is holding up your work. This is not generally recommended, as it is insecure.
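
For reference, that workaround is just the following cinder.conf fragment (a sketch, assuming the option sits in the [DEFAULT] section); it disables wiping entirely, so deleted data is not securely erased:

    [DEFAULT]
    # skip zeroing volumes/snapshots on delete -- insecure
    volume_clear = none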

Comment 3 Eric Harney 2013-06-17 15:44:28 UTC
(In reply to Eric Harney from comment #2)
> 
> Presumably introduced by https://review.openstack.org/#/c/20125/ , looks
> like /dev/zero does not always support fdatasync().
> 

Oops, actually the above error was showing that the LVM device failed fdatasync, not /dev/zero.

Comment 4 Eric Harney 2013-06-17 16:49:17 UTC
What's happening here is that the LVM snapshot being written to from /dev/zero becomes invalid -- seems to be "full".  (Shows as "I" in the Attr output of lvs, and "INACTIVE" in lvdisplay.)


Data% is at 100.00:
LV                                              VG             Attr      LSize   Pool Origin                                      Data%  Move Log Cpy%Sync Convert
  clone-snap-fd162a44-60bf-41c1-ba5f-2ffabf2989b9 cinder-volumes swi-I-s--   1.00g      volume-fd162a44-60bf-41c1-ba5f-2ffabf2989b9 100.00                          

[root@rhelvm cinder(keystone_admin)]# lvdisplay cinder-volumes/clone-snap-fd162a44-60bf-41c1-ba5f-2ffabf2989b9
  --- Logical volume ---
  LV Path                /dev/cinder-volumes/clone-snap-fd162a44-60bf-41c1-ba5f-2ffabf2989b9
  LV Name                clone-snap-fd162a44-60bf-41c1-ba5f-2ffabf2989b9
  VG Name                cinder-volumes
  LV UUID                bfVqn4-MwYG-lDFn-m5iC-0L0C-M3b7-brKjyC
  LV Write Access        read/write
  LV Creation host, time rhelvm.localdomain, 2013-06-17 12:44:49 -0400
  LV snapshot status     INACTIVE destination for volume-fd162a44-60bf-41c1-ba5f-2ffabf2989b9
  LV Status              available
  # open                 0
  LV Size                1.00 GiB
  Current LE             256
  COW-table size         1.00 GiB
  COW-table LE           256
  Snapshot chunk size    4.00 KiB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     1024
  Block device           253:3

Comment 5 Eric Harney 2013-06-17 17:06:55 UTC
The problem can be worked around by enabling LVM snapshot autoextend.  RHEL 6.4 has this disabled by default.

To enable, set "snapshot_autoextend_threshold" in /etc/lvm/lvm.conf.  (I set it to 75.)

This causes the snapshot to grow when it is full, so that the zeroing can complete.  The question then is, should we use this, or should Cinder actually be asking for a slightly bigger LVM snapshot?  (By default it creates one precisely the same size as the volume, but it seems that this isn't sufficient when you are expecting to wipe the entire volume size on it.)

Comment 6 Eric Harney 2013-06-18 18:24:05 UTC
Plan from the storage call, per abaron, is to fix this in z-stream by having packstack automatically enable LVM snapshot autoextend during installation. This will mitigate the problem.

Comment 10 Eric Harney 2013-06-18 19:13:49 UTC
Goal for packstack is to set:

snapshot_autoextend_threshold = 80
snapshot_autoextend_percent = 20

in the "activation" section of /etc/lvm/lvm.conf.

No other action is required to load this configuration.
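
For reference, a minimal sketch of the resulting /etc/lvm/lvm.conf fragment:

    activation {
        # start extending a snapshot LV once it is 80% full...
        snapshot_autoextend_threshold = 80
        # ...growing it by 20% of its current size each time
        snapshot_autoextend_percent = 20
    }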

Comment 11 Alasdair Kergon 2013-06-18 21:33:58 UTC
LVM has an existing option to issue discards against any extents it frees.  (Hardware that guarantees zeroing discarded extents could use this instead.)

I think we could modify LVM reasonably easily to write zeroes instead (or as well as) sending discards.  The price to pay would be a delay while waiting for the zeroing to complete, locking out any concurrent LVM update operations.

    # Issue discards to a logical volumes's underlying physical volume(s) when
    # the logical volume is no longer using the physical volumes' space (e.g.
    # lvremove, lvreduce, etc).  Discards inform the storage that a region is
    # no longer in use.  Storage that supports discards advertise the protocol
    # specific way discards should be issued by the kernel (TRIM, UNMAP, or
    # WRITE SAME with UNMAP bit set).  Not all storage will support or benefit
    # from discards but SSDs and thinly provisioned LUNs generally do.  If set
    # to 1, discards will only be issued if both the storage and kernel provide
    # support.
    # 1 enables; 0 disables.
    issue_discards = 0

Other LVM options would take longer to code, such as modifying lvremove and/or lvconvert to detach the snapshot "cow" volume, leaving it as a normal LV for you to wipe and remove.

If at some point the code starts using LVM's thin provisioning/snapshots, this already offers the facility to have any newly-allocated space filled with zeroes.

Note that the existing method - writing to the snapshot itself - will not overwrite the snapshot metadata (i.e. what blocks were mapped where and in what sequence).

Comment 12 Eric Harney 2013-06-18 22:30:37 UTC
For testing: when the LV snap is auto-extended, a message will appear in /var/log/messages like

Jun 18 18:09:30 rhel-lvm lvm[19104]: Snapshot cinder--volumes-clone--snap--af3affea--4689--41fb--a0b0--df34534bd5ae is now 90% full.
Jun 18 18:09:30 rhel-lvm lvm[19104]: Extending logical volume clone-snap-af3affea-4689-41fb-a0b0-df34534bd5ae to 1.20 GiB
Jun 18 18:09:42 rhel-lvm lvm[19104]: Monitoring snapshot cinder--volumes-clone--snap--af3affea--4689--41fb--a0b0--df34534bd5ae
Jun 18 18:09:42 rhel-lvm lvm[19104]: Logical volume clone-snap-af3affea-4689-41fb-a0b0-df34534bd5ae successfully resized
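
A simple way to watch for these messages during a test run might be something like:

    tail -f /var/log/messages | grep -E 'Snapshot .* full|Extending logical volume'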

Comment 13 Alan Pevec 2013-06-19 08:18:55 UTC
(In reply to Eric Harney from comment #3)
> > Presumably introduced by https://review.openstack.org/#/c/20125/ , looks
> > like /dev/zero does not always support fdatasync().
> Oops, actually the above error was showing that the LVM device failed
> fdatasync, not /dev/zero.

Adding pbrady (FYI) as he is the owner of the 20125 review.

Comment 14 Pádraig Brady 2013-06-19 08:28:43 UTC
So the fdatasync is a good thing here, since it pinpoints the error.
Where is the size mismatch occurring exactly? Are we zeroing the wrong size,
or creating LVs with the wrong size in Cinder?

Comment 15 Alan Pevec 2013-06-19 10:52:21 UTC
From the upstream review https://review.openstack.org/33347
"...there is a small percentage of space consumed by the metadata used to track the snapshot data, and this lives inside the space allocated for the snap LV itself."

Comment 16 Ian Wienand 2013-07-02 12:01:35 UTC
> Goal for packstack is to set:

> snapshot_autoextend_threshold = 80
> snapshot_autoextend_percent = 20

Unfortunately I don't think this is a sufficient solution.  I see this problem constantly on a CI job running tempest on a RHEL6 VM; it exhibits as a failure of tempest.api.volume.test_volumes_get:VolumesGetTestXML.test_volume_create_get_delete_as_clone.

I see LVM monitoring the created snapshot volume, but even with the extend threshold set to 50% it still never gets autoextended.  I believe this is because the dd from /dev/zero simply happens too fast.  To prove the mechanism works, I manually ran the dd over only 60% of the snapshot, and it did get autoextended.
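
For illustration, the manual check amounted to something along these lines (snapshot name is hypothetical; 614 MiB is roughly 60% of a 1 GiB snapshot):

    sudo dd if=/dev/zero of=/dev/cinder-volumes/clone-snap-<id> count=614 bs=1M conv=fdatasync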

Undoubtedly the lvm daemon missing the autoextend has a lot to do with the underpowered VM running on a busy server (but that's a CI environment for you :).  On the other hand, really fast storage probably exhibits the issue too.

The correct solution seems to be increasing the snapshot size at the outset, but that upstream review currently appears to have expired.

Comment 18 Alasdair Kergon 2013-07-02 12:23:20 UTC
I think the "correct" solution is in comment #11 - for lvm to make it easy for you to detach the device and zero the part that got used.

Comment 19 Eric Harney 2013-07-05 17:24:33 UTC
I agree that the autoextend change is not a sufficient solution (though I did not expect the behavior in comment #16) -- it was intended to get things working until we can have a more robust solution in place.

Increasing the snapshot size at the outset doesn't fix the problem as far as I can tell.  It may mean that the wipe succeeds, but only because you don't wipe the whole thing.  This can lead to some percentage of data not being wiped, if I understand correctly.

I think agk is correct that comment #11 is the only real way to fix this correctly.  (That, or recommend migration to the Thin LVM driver, which I think does not exhibit this problem.)

Comment 22 Yogev Rabl 2013-10-17 12:25:47 UTC
Verified on openstack-cinder-2013.2-0.11.rc1.el6ost.noarch.

Comment 25 errata-xmlrpc 2013-12-20 00:06:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html

