Description of problem:

Cloning a volume using the --source-volid argument is failing with the following:

Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf dd if=/dev/zero of=/dev/mapper/cinder--volumes-clone--snap--727c84cc--69ee--4b7d--bdb0--832b5086f1bb count=1024 bs=1M conv=fdatasync
Exit code: 1
Stdout: ''
Stderr: "/bin/dd: fdatasync failed for `/dev/mapper/cinder--volumes-clone--snap--727c84cc--69ee--4b7d--bdb0--832b5086f1bb': Input/output error\n1024+0 records in\n1024+0 records out\n1073741824 bytes (1.1 GB) copied, 19.7612 s, 54.3 MB/s\n"

NOTE: the actual dd from the origin snapshot into the destination volume works; it is the cleanup (wipe) of the origin snapshot that fails.

Version-Release number of selected component (if applicable):
openstack-cinder-2013.1.2-3.el6ost.noarch

Steps to Reproduce:
1. create a new empty volume (vol1)
2. create a new volume cloning vol1
I am hitting the same error message when trying to clone the same way. The following Python code reproduces the error:

    from cinderclient.v1 import client

    username = 'admin'
    api_key = 'openstack'
    project_id = 'admin'
    auth_url = 'http://127.0.0.1:35357/v2.0/'

    cclient = client.Client(username, api_key, project_id, auth_url)

    kargs = {'display_name': 'vol11'}
    res1 = cclient.volumes.create(1, **kargs)

    # Wait for some time for the volume to be created

    kargs = {'display_name': 'vol12', 'source_volid': res1.id}
    res4 = cclient.volumes.create(1, **kargs)

The error message can be found at /var/log/volume.log
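The "Wait for some time" step in the reproducer above can be made deterministic by polling the volume's status instead of sleeping blindly. A generic sketch of such a wait loop (driven here by a fake status source so it runs standalone; with cinderclient you would poll something like cclient.volumes.get(res1.id).status instead -- the helper name and thresholds are illustrative, not part of any API):

```python
import time

def wait_for_status(get_status, wanted='available',
                    timeout=60.0, interval=0.01):
    """Poll get_status() until it returns `wanted`, or raise on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = get_status()
        if status == wanted:
            return status
        if status == 'error':
            raise RuntimeError('volume went into error state')
        time.sleep(interval)
    raise TimeoutError('volume never became %s' % wanted)

# Fake status source standing in for the real volume-status lookup.
states = iter(['creating', 'creating', 'available'])
result = wait_for_status(lambda: next(states))
print(result)  # available
```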
strace on the dd process (used to wipe after the data copy):

read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
fdatasync(1) = -1 EIO (Input/output error)

Presumably introduced by https://review.openstack.org/#/c/20125/ ; it looks like /dev/zero does not always support fdatasync().

Kushal, you can probably work around this by setting volume_clear=none in cinder.conf if it is holding up your work. This is not generally recommended, as it is insecure (deleted volume data is left unwiped).
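The wipe that fails here is just a sequential zero-write followed by fdatasync. A minimal Python sketch of the same pattern (run against a temporary file rather than a real device node) shows where the EIO would surface: os.fdatasync() is called only after all the writes have already returned success, which matches the strace output above:

```python
import os
import tempfile

def zero_fill(path, count=4, bs=1024 * 1024):
    """Mimic `dd if=/dev/zero of=path count=... bs=... conv=fdatasync`."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        block = b"\0" * bs
        for _ in range(count):
            os.write(fd, block)      # like dd's successful write(1, ...) calls
        # conv=fdatasync: flush to stable storage before reporting success.
        # On the broken (full) LVM snapshot this is the call that hits EIO.
        os.fdatasync(fd)
    finally:
        os.close(fd)
    return os.path.getsize(path)

with tempfile.NamedTemporaryFile() as tmp:
    written = zero_fill(tmp.name, count=4)
    print(written)  # 4 MiB written and synced
```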
(In reply to Eric Harney from comment #2)
> Presumably introduced by https://review.openstack.org/#/c/20125/ , looks
> like /dev/zero does not always support fdatasync().

Oops, actually the above error was showing that the LVM device failed fdatasync, not /dev/zero.
What's happening here is that the LVM snapshot being written to from /dev/zero becomes invalid -- it appears to be "full". (It shows as "I" in the Attr output of lvs, and "INACTIVE" in lvdisplay.) Data% is at 100.00:

  LV                                              VG             Attr      LSize Pool Origin                                      Data%  Move Log Cpy%Sync Convert
  clone-snap-fd162a44-60bf-41c1-ba5f-2ffabf2989b9 cinder-volumes swi-I-s-- 1.00g      volume-fd162a44-60bf-41c1-ba5f-2ffabf2989b9 100.00

[root@rhelvm cinder(keystone_admin)]# lvdisplay cinder-volumes/clone-snap-fd162a44-60bf-41c1-ba5f-2ffabf2989b9
  --- Logical volume ---
  LV Path                /dev/cinder-volumes/clone-snap-fd162a44-60bf-41c1-ba5f-2ffabf2989b9
  LV Name                clone-snap-fd162a44-60bf-41c1-ba5f-2ffabf2989b9
  VG Name                cinder-volumes
  LV UUID                bfVqn4-MwYG-lDFn-m5iC-0L0C-M3b7-brKjyC
  LV Write Access        read/write
  LV Creation host, time rhelvm.localdomain, 2013-06-17 12:44:49 -0400
  LV snapshot status     INACTIVE destination for volume-fd162a44-60bf-41c1-ba5f-2ffabf2989b9
  LV Status              available
  # open                 0
  LV Size                1.00 GiB
  Current LE             256
  COW-table size         1.00 GiB
  COW-table LE           256
  Snapshot chunk size    4.00 KiB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     1024
  Block device           253:3
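The numbers in the lvdisplay output above explain why the snapshot hits 100%. The COW table is exactly 1.00 GiB, but each 4 KiB origin chunk copied into it also consumes a small amount of per-exception metadata inside that same space, so a full 1 GiB wipe of the origin can never fit. A rough back-of-the-envelope sketch (the 16-byte per-exception figure is an assumption about the dm-snapshot exception store, not something stated in this report):

```python
GiB = 1024 ** 3
KiB = 1024

volume_size = 1 * GiB     # origin LV size       ("LV Size        1.00 GiB")
cow_size = 1 * GiB        # snapshot COW table   ("COW-table size 1.00 GiB")
chunk_size = 4 * KiB      # "Snapshot chunk size 4.00 KiB"
meta_per_chunk = 16       # assumed bytes of exception metadata per chunk

# Fully rewriting the origin forces every chunk to be copied out.
chunks = volume_size // chunk_size
space_needed = chunks * (chunk_size + meta_per_chunk)

print(chunks)                   # 262144 chunks to copy
print(space_needed > cow_size)  # True: data + metadata exceed the COW table
print(space_needed - cow_size)  # ~4 MiB over, even before any rounding
```

So even under this optimistic estimate the snapshot is a few MiB too small, which is why a same-sized snapshot fills and goes INACTIVE mid-wipe.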
The problem can be worked around by enabling LVM snapshot autoextend. RHEL 6.4 has this disabled by default. To enable, set "snapshot_autoextend_threshold" in /etc/lvm/lvm.conf. (I set it to 75.) This causes the snapshot to grow when it is full, so that the zeroing can complete. The question then is, should we use this, or should Cinder actually be asking for a slightly bigger LVM snapshot? (By default it creates one precisely the same size as the volume, but it seems that this isn't sufficient when you are expecting to wipe the entire volume size on it.)
Plan from the storage call per abaron is to fix this in z-stream by having packstack automatically enable LVM autoextend during installation. This will mitigate this problem.
Goal for packstack is to set:

snapshot_autoextend_threshold = 80
snapshot_autoextend_percent = 20

in the "activation" section of /etc/lvm/lvm.conf. No other action is required to load this configuration.
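For reference, the resulting fragment of /etc/lvm/lvm.conf would look roughly like this (a sketch only; the comments are paraphrased and all other activation settings are omitted):

```
activation {
    # Auto-extend an LVM snapshot once it passes 80% full...
    snapshot_autoextend_threshold = 80
    # ...growing it by 20% of its current size each time.
    snapshot_autoextend_percent = 20
}
```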
LVM has an existing option to issue discards against any extents it frees. (Hardware that guarantees zeroing discarded extents could use this instead.) I think we could modify LVM reasonably easily to write zeroes instead of (or as well as) sending discards. The price to pay would be a delay while waiting for the zeroing to complete, locking out any concurrent LVM update operations.

    # Issue discards to a logical volume's underlying physical volume(s) when
    # the logical volume is no longer using the physical volumes' space (e.g.
    # lvremove, lvreduce, etc).  Discards inform the storage that a region is
    # no longer in use.  Storage that supports discards advertise the protocol
    # specific way discards should be issued by the kernel (TRIM, UNMAP, or
    # WRITE SAME with UNMAP bit set).  Not all storage will support or benefit
    # from discards but SSDs and thinly provisioned LUNs generally do.  If set
    # to 1, discards will only be issued if both the storage and kernel provide
    # support.
    # 1 enables; 0 disables.
    issue_discards = 0

Other LVM options would take longer to code, such as modifying lvremove and/or lvconvert to detach the snapshot "cow" volume, leaving it as a normal LV for you to wipe and remove.

If at some point the code starts using LVM's thin provisioning/snapshots, that already offers the facility to have any newly-allocated space filled with zeroes.

Note that the existing method - writing to the snapshot itself - will not overwrite the snapshot metadata (i.e. what blocks were mapped where and in what sequence).
For testing: when the LV snap is auto-extended, messages will appear in /var/log/messages like:

Jun 18 18:09:30 rhel-lvm lvm[19104]: Snapshot cinder--volumes-clone--snap--af3affea--4689--41fb--a0b0--df34534bd5ae is now 90% full.
Jun 18 18:09:30 rhel-lvm lvm[19104]: Extending logical volume clone-snap-af3affea-4689-41fb-a0b0-df34534bd5ae to 1.20 GiB
Jun 18 18:09:42 rhel-lvm lvm[19104]: Monitoring snapshot cinder--volumes-clone--snap--af3affea--4689--41fb--a0b0--df34534bd5ae
Jun 18 18:09:42 rhel-lvm lvm[19104]: Logical volume clone-snap-af3affea-4689-41fb-a0b0-df34534bd5ae successfully resized
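An automated test can check whether the autoextend actually fired by scanning syslog for the "Extending logical volume" messages above. A minimal sketch (the sample lines are copied verbatim from this comment; the helper name is illustrative):

```python
import re

# Sample lines copied from the /var/log/messages excerpt above.
LOG = """\
Jun 18 18:09:30 rhel-lvm lvm[19104]: Snapshot cinder--volumes-clone--snap--af3affea--4689--41fb--a0b0--df34534bd5ae is now 90% full.
Jun 18 18:09:30 rhel-lvm lvm[19104]: Extending logical volume clone-snap-af3affea-4689-41fb-a0b0-df34534bd5ae to 1.20 GiB
Jun 18 18:09:42 rhel-lvm lvm[19104]: Logical volume clone-snap-af3affea-4689-41fb-a0b0-df34534bd5ae successfully resized
"""

def autoextend_events(text):
    """Return (lv_name, new_size) pairs for every autoextend seen in syslog."""
    pattern = re.compile(r"Extending logical volume (\S+) to (\S+ \S+)")
    return pattern.findall(text)

events = autoextend_events(LOG)
print(events)  # [('clone-snap-af3affea-4689-41fb-a0b0-df34534bd5ae', '1.20 GiB')]
```

In a real test you would read /var/log/messages (or the journal) instead of the inline sample, and fail the run if no event was found for the expected LV.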
(In reply to Eric Harney from comment #3)
> > Presumably introduced by https://review.openstack.org/#/c/20125/ , looks
> > like /dev/zero does not always support fdatasync().
> Oops, actually the above error was showing that the LVM device failed
> fdatasync, not /dev/zero.

Adding pbrady (FYI) as he is the owner of the 20125 review.
So the fdatasync is a good thing here, since it pinpoints the error. Where exactly is the size mismatch occurring? Are we zeroing the wrong size, or creating LVs with the wrong size in cinder?
From the upstream review https://review.openstack.org/33347 "...there is a small percentage of space consumed by the metadata used to track the snapshot data, and this lives inside the space allocated for the snap LV itself."
> Goal for packstack is to set:
> snapshot_autoextend_threshold = 80
> snapshot_autoextend_percent = 20

Unfortunately I don't think this is a sufficient solution. I see this problem constantly on a CI job running tempest on a RHEL 6 VM; it exhibits as a failure of tempest.api.volume.test_volumes_get:VolumesGetTestXML.test_volume_create_get_delete_as_clone.

I see lvm monitoring the created snapshot volume, but even with the extend threshold set to 50% the snapshot still never gets autoextended. I believe this is because the dd from /dev/zero is simply happening too fast. To prove the mechanism was working, I manually did a dd of only 60% of the volume, and the snapshot did get autoextended.

Undoubtedly the lvm daemon missing the autoextend has a lot to do with the underpowered VM running on a busy server (but that's a CI environment for you :) On the other hand, really fast storage probably exhibits the issue too.

The correct solution seems to be increasing the snapshot size at the outset, but that upstream review currently appears to have expired.
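The "size it correctly at the outset" idea amounts to padding the snapshot for COW metadata when Cinder creates it. A hypothetical sizing helper sketching that approach (the 2% headroom and 100 MiB floor are illustrative assumptions, not what the upstream review actually does):

```python
def snap_size_mib(volume_size_mib, headroom_pct=2, min_extra_mib=100):
    """Size a wipeable LVM snapshot: origin size plus COW-metadata headroom.

    The exact padding values are guesses; the point is only that the COW
    table must be strictly larger than the origin if the whole origin is
    going to be rewritten (zeroed) through the snapshot.
    """
    extra = max(volume_size_mib * headroom_pct // 100, min_extra_mib)
    return volume_size_mib + extra

# A 1 GiB volume gets a ~1.1 GiB snapshot; a 100 GiB volume gets 102 GiB.
print(snap_size_mib(1024))    # 1124
print(snap_size_mib(102400))  # 104448
```

The returned size would then be passed to lvcreate in place of the exact origin size. As noted below, though, over-sizing alone only helps the wipe complete; it does not guarantee that every COW block is actually overwritten.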
I think the "correct" solution is in comment #11 - for lvm to make it easy for you to detach the device and zero the part that got used.
I agree that the autoextend change is not a sufficient solution (though I did not expect the behavior in comment #16) -- it was intended to get things working until we can have a more robust solution in place. Increasing the snapshot size at the outset doesn't fix the problem as far as I can tell. It may mean that the wipe succeeds, but only because you don't wipe the whole thing. This can lead to some percentage of data not being wiped, if I understand correctly. I think agk is correct that comment #11 is the only real way to fix this correctly. (That, or recommend migration to the Thin LVM driver, which I think does not exhibit this problem.)
verified on openstack-cinder-2013.2-0.11.rc1.el6ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2013-1859.html