Bug 1069610 - Failed to remove a snapshot after restarting vdsm while creating this snapshot
Summary: Failed to remove a snapshot after restarting vdsm while creating this snapshot
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: 3.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.5.1
Assignee: Adam Litke
QA Contact: Gil Klein
URL:
Whiteboard: storage
Depends On:
Blocks: 1193195
 
Reported: 2014-02-25 11:47 UTC by Meital Bourvine
Modified: 2016-02-10 17:36 UTC (History)

Fixed In Version: ovirt-3.5.1_rc1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-01-21 16:06:13 UTC
oVirt Team: Storage
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1158563 | 0 | unspecified | CLOSED | Failed to Delete First NFS snapshot with live merge | 2021-02-22 00:41:40 UTC
oVirt gerrit | 35096 | 0 | master | MERGED | storage: Search only the current image for children | Never
oVirt gerrit | 35769 | 0 | ovirt-3.5 | MERGED | storage: Search only the current image for children | Never

Internal Links: 1158563

Description Meital Bourvine 2014-02-25 11:47:32 UTC
Description of problem:
Failed to remove a snapshot after restarting vdsm while creating this snapshot

Version-Release number of selected component (if applicable):
ovirt-beta3

How reproducible:
100%

Steps to Reproduce:
1. Take a live snapshot
2. Restart the vdsm (setup with only 1 host)
3. Try to remove the snapshot

Actual results:
Removing the snapshot fails.

Expected results:
The snapshot is removed successfully.

Additional info:

Comment 3 Federico Simoncelli 2014-02-25 12:27:04 UTC
Looking at:

a431eaca-3c30-44b2-8dbf-e4512578c0f5::INFO::2014-02-24 14:27:40,205::fileVolume::166::Storage.Volume::(delete) Request to delete volume bee17580-01e3-42ed-9f74-3ad66cd7a1be

...

743f33c6-c3bd-4eca-b235-c540cd93011d::DEBUG::2014-02-24 14:27:40,280::volume::1083::Storage.Volume::(qemuRebase) (qemuRebase): REBASE /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/925943e9-f2e6-4866-b84d-40ffa8d41014/c2542f19-9e11-430a-8f93-a37d90101842 DONE

...

743f33c6-c3bd-4eca-b235-c540cd93011d::DEBUG::2014-02-24 14:27:40,355::utils::556::root::(execCmd) '/bin/grep -E -H PUUID.*c2542f19-9e11-430a-8f93-a37d90101842 /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/48e65b80-a5c3-4a24-9a80-e796bc60b7d9/1cba543c-ac87-4b13-9725-e752f21fdd1b.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/48e65b80-a5c3-4a24-9a80-e796bc60b7d9/f5ffcf7c-d9ec-4cef-800a-0fa020c8dab4.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/48e65b80-a5c3-4a24-9a80-e796bc60b7d9/65564fc5-ad6d-4eb4-b672-e99dbcc178ce.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/5a659c95-9e2b-487d-9f61-d53a43cadc54/3e9b8604-5265-4344-9637-29797c07400d.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/5a659c95-9e2b-487d-9f61-d53a43cadc54/320310dc-096e-4554-bd08-60d035de16d4.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/5a659c95-9e2b-487d-9f61-d53a43cadc54/bee17580-01e3-42ed-9f74-3ad66cd7a1be.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/7a73985a-5b34-4849-9594-97e67d606f1c/134b9d45-d111-4778-92cb-a23e87c37f0e.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/7a73985a-5b34-4849-9594-97e67d606f1c/173210d7-c51d-4c6d-9611-1639ebff821c.meta 
/rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/7a73985a-5b34-4849-9594-97e67d606f1c/e3b82b55-0db6-4c4d-8668-a10b4255bd62.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/925943e9-f2e6-4866-b84d-40ffa8d41014/f2d65add-9744-4209-bf67-a3ba04d3aed4.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/925943e9-f2e6-4866-b84d-40ffa8d41014/c2542f19-9e11-430a-8f93-a37d90101842.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/925943e9-f2e6-4866-b84d-40ffa8d41014/1f2f5a90-6325-4592-8b8f-049e3a5571f6.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/b54ddb97-3e0b-45c7-a209-114348c8dd4e/01ac5fae-86ca-4463-8e09-8d5c1f8fa960.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/b54ddb97-3e0b-45c7-a209-114348c8dd4e/fda248cf-5c41-47e8-8944-d66a9babc82c.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/b54ddb97-3e0b-45c7-a209-114348c8dd4e/f9afbcf8-1d1a-473f-b6ed-0830c24c18d6.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/b54ddb97-3e0b-45c7-a209-114348c8dd4e/a1f6402e-4190-4b5d-80e6-d39e3b24b05e.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/b26752fc-c757-4732-878c-974d9ec07ff3/57e854c5-e5d6-4177-91a8-bbdefa4b46b2.meta 
/rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/c036e111-6d11-4507-a168-2a6713d836b4/a73eefe7-4c2f-4a24-ba85-077cda014e53.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/4d55e4f8-12c4-4541-86f6-cc099e37de2e/1048d8b1-5981-4304-84af-a57f988983e7.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/4d55e4f8-12c4-4541-86f6-cc099e37de2e/e35a3f05-b54f-43f4-aa7c-9db0f6833e6d.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/4d55e4f8-12c4-4541-86f6-cc099e37de2e/e157459f-c94b-4630-9908-b39d34ece366.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/4d55e4f8-12c4-4541-86f6-cc099e37de2e/c4918016-d4e5-47fd-b006-d76b8a32080a.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/2a763a37-e490-4031-a89e-4421ea4ba5f4/79d2aa26-5b17-4dd9-bad5-74f7392719a8.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/09ada8f4-cde1-4e93-8393-f37eaa0938e5/b25617a4-3c7a-4ecb-bbe8-f0f7a9635c83.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/47155032-d826-4c75-8057-e8dfe01ed7cc/a7aff92d-e7ca-4736-a915-c705bbcd75b8.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/80f31bf6-6bff-4e24-9878-f1701ba3b6d3/b20bc727-00a6-4499-a62a-18aff775ded8.meta 
/rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/074355b5-15ae-4482-9c3d-fb40ebcf3d57/bf568268-d90e-4ef4-9d51-891b4664266f.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/1f6bdfc1-b9bf-468e-ad44-8c846934954c/4d8c1e67-f3aa-44cb-9542-502ac49f1eec.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/c350abec-3aa4-482d-bbb3-14572ef73ad6/87487222-f4ce-4820-9030-d4ab4727e448.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/0f706a98-64fe-43a8-a9a5-fad86630ddce/6d3ffbf4-16a7-4345-9267-6ceb76e5bc46.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/9fafa9e9-bbb9-4a90-8b7b-bc9d221f622e/b73b9bfe-4b02-48fb-ad37-9417df9f21d4.meta /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/c5055291-58a4-4cdb-aaf9-cbe7cb6bfa7c/3df646a0-66c2-44cf-8882-3eb1b4932caa.meta' (cwd None)
a431eaca-3c30-44b2-8dbf-e4512578c0f5::DEBUG::2014-02-24 14:27:40,366::fileVolume::529::Storage.Volume::(validateVolumePath) validate path for 3e9b8604-5265-4344-9637-29797c07400d
a431eaca-3c30-44b2-8dbf-e4512578c0f5::DEBUG::2014-02-24 14:27:40,368::fileVolume::529::Storage.Volume::(validateVolumePath) validate path for 3e9b8604-5265-4344-9637-29797c07400d
a431eaca-3c30-44b2-8dbf-e4512578c0f5::INFO::2014-02-24 14:27:40,379::image::829::Storage.Image::(__teardownSubChain) Teardown volume 3e9b8604-5265-4344-9637-29797c07400d from image 5a659c95-9e2b-487d-9f61-d53a43cadc54
743f33c6-c3bd-4eca-b235-c540cd93011d::DEBUG::2014-02-24 14:27:40,380::utils::576::root::(execCmd) FAILED: <err> = '/bin/grep: /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/5a659c95-9e2b-487d-9f61-d53a43cadc54/bee17580-01e3-42ed-9f74-3ad66cd7a1be.meta: No such file or directory\n'; <rc> = 2
743f33c6-c3bd-4eca-b235-c540cd93011d::ERROR::2014-02-24 14:27:40,382::image::1125::Storage.Image::(merge) rc: 2, out: [], err: ['/bin/grep: /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/5a659c95-9e2b-487d-9f61-d53a43cadc54/bee17580-01e3-42ed-9f74-3ad66cd7a1be.meta: No such file or directory']
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 1113, in merge
    sdDom, srcVolParams, volParams, reqSize, chain)
  File "/usr/share/vdsm/storage/image.py", line 931, in _baseCowVolumeMerge
    unsafe=False, rollback=True)
  File "/usr/share/vdsm/storage/volume.py", line 250, in rebase
    self.recheckIfLeaf()
  File "/usr/share/vdsm/storage/volume.py", line 762, in recheckIfLeaf
    childrenNum = len(self.getChildren())
  File "/usr/share/vdsm/storage/fileVolume.py", line 381, in getChildren
    matches = grepCmd(pattern, metaPaths)
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 214, in grepCmd
    raise ValueError("rc: %s, out: %s, err: %s" % (rc, out, err))
ValueError: rc: 2, out: [], err: ['/bin/grep: /rhev/data-center/mnt/10.35.160.108:_RHEV_jenkins-vm-28__nfs__2014__02__24__13__47__48__206415/9ef357e9-6fa8-4da9-9b30-0ad0df5f515d/images/5a659c95-9e2b-487d-9f61-d53a43cadc54/bee17580-01e3-42ed-9f74-3ad66cd7a1be.meta: No such file or directory']


It seems that at least one problem is a race introduced in:

 317b957 New getChildrenList implementation.


In fact, metaPaths caches all the meta files present on the storage domain in order to grep their content; at the same time, one or more volumes/images (unrelated to the relevant locked one) may be removed, so grep fails on a path that no longer exists.

    def getChildren(self):
        """ Return children volume UUIDs.

        Children can be found in any image of the volume SD.
        """
        domPath = self.imagePath.split('images')[0]
        metaPattern = os.path.join(domPath, 'images', '*', '*.meta')
        metaPaths = oop.getProcessPool(self.sdUUID).glob.glob(metaPattern)
        pattern = "%s.*%s" % (volume.PUUID, self.volUUID)
        matches = grepCmd(pattern, metaPaths)
        if matches:
            children = []
            for line in matches:
                volMeta = os.path.basename(line.split(':')[0])
                children.append(os.path.splitext(volMeta)[0])  # volUUID
        else:
            children = tuple()

        return tuple(children)
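
The race can be illustrated outside vdsm with a small sketch. This is not vdsm code: `find_children` is a hypothetical helper, and the `PUUID=<uuid>` metadata line is assumed from the grep pattern in the log above. Instead of handing a cached file list to grep, it reads each file in Python and skips any `.meta` file that disappears between the glob and the read, on the assumption that a removed volume cannot be a child.

```python
import errno
import glob
import os
import re


def find_children(dom_path, vol_uuid):
    """Return UUIDs of volumes whose metadata names vol_uuid as parent.

    Hypothetical helper for illustration only; not part of vdsm.
    """
    meta_pattern = os.path.join(dom_path, 'images', '*', '*.meta')
    # Same idea as the "PUUID.*<uuid>" pattern passed to grep.
    parent_re = re.compile(r'PUUID.*%s' % re.escape(vol_uuid))
    children = []
    for meta_path in glob.glob(meta_pattern):
        try:
            with open(meta_path) as f:
                content = f.read()
        except (IOError, OSError) as e:
            if e.errno == errno.ENOENT:
                # The volume was deleted after the glob ran; ignore it
                # instead of failing the whole search.
                continue
            raise
        if parent_re.search(content):
            vol_meta = os.path.basename(meta_path)
            children.append(os.path.splitext(vol_meta)[0])  # volUUID
    return tuple(children)
```

Tolerating a missing file only hides the symptom; it does nothing to reduce the number of files scanned.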


I am also worried that passing the (long) list of meta files of the entire domain to the grep command might not scale well (flooding the logs, hitting the command-line length limit, etc.).
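
One possible direction, matching the title of the gerrit patches linked from this bug ("Search only the current image for children"), is to narrow the glob to the volume's own image directory. The sketch below is not the merged change: both function names are hypothetical, and it assumes the `<domain>/images/<imgUUID>/<volUUID>.meta` layout seen in the log paths. It only contrasts the two glob patterns.

```python
import glob
import os


def domain_meta_pattern(image_path):
    """Pattern matching every .meta file in the domain (domain-wide search).

    Assumes image_path is <domain>/images/<imgUUID>.
    """
    images_dir = os.path.dirname(image_path)
    return os.path.join(images_dir, '*', '*.meta')


def image_meta_pattern(image_path):
    """Pattern matching only the .meta files of a single image directory."""
    return os.path.join(image_path, '*.meta')


def list_meta_files(pattern):
    """Expand a pattern into a sorted list of metadata file paths."""
    return sorted(glob.glob(pattern))
```

With the narrower pattern, the file list stays proportional to the length of one image chain rather than to the size of the domain, and files belonging to unrelated images, which may be deleted concurrently, are never touched.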

Comment 4 Itamar Heim 2014-03-02 05:42:43 UTC
Setting target release to current version for consideration and review. Please
do not push non-RFE bugs to an undefined target release, to make sure bugs are
reviewed for relevancy, fix, closure, etc.

Comment 5 Sandro Bonazzola 2014-03-04 09:23:03 UTC
This is an automated message.
Re-targeting all non-blocker bugs still open on 3.4.0 to 3.4.1.

Comment 6 Meital Bourvine 2014-04-09 10:56:25 UTC
What's the status of fixing this bug?
It causes many failures in automation tests.

Comment 7 Sandro Bonazzola 2014-06-11 07:04:48 UTC
This is an automated message:
oVirt 3.4.2 has been released.
This bug has been re-targeted from 3.4.2 to 3.4.3 since priority or severity were high or urgent.

Comment 8 Sandro Bonazzola 2014-06-11 07:05:24 UTC
This is an automated message:
oVirt 3.4.2 has been released.
This bug has been re-targeted from 3.4.2 to 3.4.3 since priority or severity were high or urgent.

Comment 10 Allon Mureinik 2014-07-03 12:08:48 UTC
Aharon, why is this an automation blocker?
I understand why this test case may fail, but what does this prevent you from testing?

Comment 11 Gadi Ickowicz 2014-07-21 13:34:42 UTC
(In reply to Allon Mureinik from comment #10)
> Aharon, why is this an automation blocker?
> I understand why this test case may fail, but what does this prevent you
> from testing?

This prevents us from running several test cases, but only those related to this specific scenario (restarting vdsm during a snapshot creation operation). We could skip these specific tests until this is fixed.

Comment 12 Adam Litke 2014-11-10 15:25:35 UTC
*** Bug 1158563 has been marked as a duplicate of this bug. ***

Comment 13 Adam Litke 2014-11-10 15:29:25 UTC
Hi Federico, 

We should try to get this fixed since it is impacting many live merge scenarios. In bug 1158563 you suggest that deleteVolumes should take an exclusive image lock, but I am not convinced that will solve the case where there are two concurrent deletes on the same domain involving different images. Can you clarify your suggestion in case I am missing something?

Comment 14 Adam Litke 2014-12-01 21:46:04 UTC
Removing no longer needed needinfo

Comment 15 Sandro Bonazzola 2015-01-15 14:25:50 UTC
This is an automated message: 
This bug should be fixed in oVirt 3.5.1 RC1, moving to QA

Comment 16 Sandro Bonazzola 2015-01-21 16:06:13 UTC
oVirt 3.5.1 has been released. If problems still persist, please make note of it in this bug report.

