Bug 1082941
| Field | Value |
| --- | --- |
| Summary | [vdsm] vmSnapshot fails on 'IOError: [Errno 22] Invalid argument' |
| Product | Red Hat Enterprise Virtualization Manager |
| Component | vdsm |
| Version | 3.4.0 |
| Hardware | x86_64 |
| OS | Unspecified |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | urgent |
| Keywords | Regression |
| Whiteboard | virt |
| Reporter | Elad <ebenahar> |
| Assignee | Arik <ahadas> |
| QA Contact | Gadi Ickowicz <gickowic> |
| CC | ahadas, amureini, bazulay, gklein, iheim, knesenko, lbopf, lpeer, michal.skrivanek, nlevinki, scohen, tnisan, yeylon |
| Target Milestone | --- |
| Target Release | 3.4.0 |
| Fixed In Version | vdsm-4.14.7-0.1.beta3.el6ev |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2014-06-09 13:30:00 UTC |
| Bug Blocks | 1052318, 1083560 |
| Attachments | engine, vdsm, libvirt, qemu and sanlock logs (attachment 881229) |
Checked also with vdsm-4.14.5-0.2.beta2.el6ev.x86_64 (av5).

Daniel, please take a look - is it possible that the engine is sending the wrong parameters due to the single disk snapshots refactoring?

(In reply to Allon Mureinik from comment #3)
> Daniel, please take a look - is it possible that the engine is sending the
> wrong parameters due to the single disk snapshots refactoring?

Doesn't seem so. The issue has been reproduced only a few times, and only on the specific environment. It looks like an IOError during the creation of the memory volume, so I'm not sure there's anything else we can do in the vdsm layer.

(In reply to Daniel Erez from comment #4)
> (In reply to Allon Mureinik from comment #3)
> > Daniel, please take a look - is it possible that the engine is sending the
> > wrong parameters due to the single disk snapshots refactoring?
>
> Doesn't seem so. The issue has been reproduced only a few times, and only on
> the specific environment. It looks like an IOError during the creation of the
> memory volume, so I'm not sure there's anything else we can do in the vdsm
> layer.

Seems like a virt issue with memory padding. Michal, can one of your guys please take a look? Note that Daniel could not consistently reproduce it.

We are not supposed to pad volumes on block devices. There's a check in vm.py that the storage domain type is file-based, and only in that case do we do the padding. The memory volumes resided on a block device, so we should figure out why vdsm thinks it is file-based.

(Sorry, I removed the 'blocks' field by mistake.)

The type of the storage pool, as returned by getStoragePoolInfo, is NFS. Tal, were changes made as part of the local/shared SD feature that I'm not aware of? How is this supposed to be checked now? If so, we need a better/new way to check whether the domain used for the memory volume is file-based or not.

Some insight from Gadi: when examining the memory padding function, the decision on how to pad is made by the STORAGE POOL type, e.g.:

```python
def _padMemoryVolume(memoryVolPath, spType, sdUUID):
    if spType == sd.NFS_DOMAIN:
        oop.getProcessPool(sdUUID).fileUtils. \
            padToBlockSize(memoryVolPath)
    else:
        fileUtils.padToBlockSize(memoryVolPath)
```

Now that we have introduced mixed storage types, this assumption is wrong. Instead, we should inspect the type of the domain itself, presumably by producing it (see the sketch after the comment thread below).

Verified on av7. Able to create a live snapshot with memory for a VM when the pool type was NFS (the master domain was NFS) and the disk was on an iSCSI domain.

lbopf - this is a regression that was introduced in the development cycle of 3.4.0 and fixed within it. There is no publicly available version with this problem. Why should we supply doc text for it?

Hi Allon,

At this stage, I am supplying doc text for all errata in my queue, since no flags have been set to the contrary. I am also quite new to this process. Am I to understand that all bugs marked as 'regression' usually do not require doc text? If you can confirm that this is the case, I will clear the text. Thanks.

Further to my last message: if a bug is attached to an erratum, then the assumption is that it will require doc text. If it doesn't require doc text, then please set the "requires_doc_text" flag to -. I will set the flag in this instance. Thanks.

(In reply to lbopf from comment #15)
> Hi Allon,
>
> At this stage, I am supplying doc text for all errata in my queue, since no
> flags have been set to the contrary. I am also quite new to this process.
> Am I to understand that all bugs marked as 'regression' usually do not
> require doc text? If you can confirm that this is the case, I will clear
> the text.
Yes, this is usually true. Marking a bug as a regression usually means that "version X had some functionality, an early build of X+1 broke it, and a later build un-broke it". From the perspective of a customer, who only consumes official releases, this is a moot point - they have version X with working functionality, they upgrade to X+1, and they observe that the functionality is still available. Obviously, this is not true for 100% of the cases, and sometimes caveats exist - but if they do, it's up to the developer who solved the bug to raise them with a "requires_doc_text?" flag.

(In reply to lbopf from comment #16)
> Further to my last message: if a bug is attached to an erratum, then the
> assumption is that it will require doc text. If it doesn't require doc text,
> then please set the "requires_doc_text" flag to -. I will set the flag in
> this instance.

Duly noted. I'll instruct my team to pay more attention to this.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0504.html
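To make Gadi's point above concrete: with mixed-type pools, the padding decision has to be keyed off the storage domain that actually holds the memory volume, not off the pool. The sketch below shows that direction only. It is not the actual patch shipped in vdsm-4.14.7-0.1.beta3.el6ev; the imports and the `sdCache.produce()` / `getStorageType()` lookups are assumptions about the surrounding vdsm code.

```python
# Sketch only - not the actual vdsm fix. Assumes sdCache.produce()
# returns a domain object exposing getStorageType(), and that
# sd.FILE_DOMAIN_TYPES lists the file-based domain types.
from storage import fileUtils, sd
from storage import outOfProcess as oop
from storage.sdc import sdCache


def _padMemoryVolume(memoryVolPath, sdUUID):
    # Decide based on the domain that actually holds the memory volume,
    # not on the pool type: with mixed-type pools the pool may report
    # NFS while the memory volume sits on an iSCSI (block) domain.
    dom = sdCache.produce(sdUUID)
    sdType = dom.getStorageType()
    if sdType in sd.FILE_DOMAIN_TYPES:
        if sdType == sd.NFS_DOMAIN:
            # NFS access goes through the out-of-process helper so a
            # stuck mount cannot block the main vdsm process.
            oop.getProcessPool(sdUUID).fileUtils.padToBlockSize(
                memoryVolPath)
        else:
            fileUtils.padToBlockSize(memoryVolPath)
    # Block domains need no padding: the LV is already block-aligned.
```

Producing the domain costs more than reading the cached pool type, but once a single pool can mix file and block domains, the domain itself is the only reliable source of truth.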
Created attachment 881229 [details]
engine, vdsm, libvirt, qemu and sanlock logs

Description of problem:
Live snapshot fails with the following error in vdsm.log:

```
Thread-32890::ERROR::2014-04-01 09:44:39,005::BindingXMLRPC::1081::vds::(wrapper) unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 1065, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/BindingXMLRPC.py", line 372, in vmSnapshot
    return vm.snapshot(snapDrives, snapMemVolHandle)
  File "/usr/share/vdsm/API.py", line 679, in snapshot
    return v.snapshot(snapDrives, memoryParams)
  File "/usr/share/vdsm/vm.py", line 4014, in snapshot
    _padMemoryVolume(memoryVolPath, spType, sdUUID)
  File "/usr/share/vdsm/vm.py", line 3881, in _padMemoryVolume
    padToBlockSize(memoryVolPath)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 297, in callCrabRPCFunction
    *args, **kwargs)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 199, in callCrabRPCFunction
    raise err
IOError: [Errno 22] Invalid argument
```

Version-Release number of selected component (if applicable):
RHEV-3.4-AV5
vdsm-4.14.2-0.2.el6ev.x86_64
libvirt-0.10.2-29.el6_5.5.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.7.x86_64
sanlock-2.8-1.el6.x86_64
rhevm-3.4.0-0.12.beta2.el6ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a live snapshot

Additional info:
engine, vdsm, libvirt, qemu and sanlock logs
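For readers of the traceback above: `IOError: [Errno 22] Invalid argument` is consistent with a pad-to-block-size helper, written for regular files, being handed a path that resolves to a block device. A minimal sketch of what such a helper typically does (illustrative, not vdsm's actual `padToBlockSize` implementation):

```python
import os

BLOCK_SIZE = 512  # assumed sector size; illustrative


def pad_to_block_size(path):
    # Round the file size up to the next multiple of BLOCK_SIZE by
    # extending it with truncate(). This is valid for regular files,
    # but ftruncate() on a block-device node (e.g. an LV under /dev/)
    # fails with EINVAL -> IOError: [Errno 22] Invalid argument.
    with open(path, "ab") as f:
        size = os.fstat(f.fileno()).st_size
        padded = ((size + BLOCK_SIZE - 1) // BLOCK_SIZE) * BLOCK_SIZE
        if padded != size:
            f.truncate(padded)
```

On a regular NFS-backed file the `truncate()` call simply extends the file with zeros; on a block device the kernel rejects `ftruncate()` with EINVAL, which would surface through the remote file handler as the Errno 22 seen here.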