Bug 1456504 - LSM status should be polled using the job and not the entities status
Summary: LSM status should be polled using the job and not the entities status
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-system-tests
Classification: Community
Component: General
Version: ---
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: milestone3
: ---
Assignee: Benny Zlotnik
QA Contact: Lukas Svaty
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-05-29 13:36 UTC by Dafna Ron
Modified: 2018-06-11 15:40 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-11 15:40:18 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.2+
rule-engine: planning_ack+
amureini: devel_ack+
ratamir: testing_ack+


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1460694 0 unspecified CLOSED It's possible to issue multiple concurrent LSM commands on the same disk 2021-02-22 00:41:40 UTC
oVirt gerrit 84338 0 None None None 2018-02-12 12:09:07 UTC
oVirt gerrit 89061 0 None None None 2018-03-15 13:33:52 UTC

Internal Links: 1460694

Description Dafna Ron 2017-05-29 13:36:26 UTC
Description of problem:

Having the lock in the Command scope causes a race condition if an operation runs after LSM tasks are done but the engine lock was not released yet.

Version-Release number of selected component (if applicable):

4.2

How reproducible:

race condition happens sporadically 

Steps to Reproduce:
1. run ost in ovirt-jenkins
2.
3.

Actual results:

some of the runs would fail on conflict error for disk lock 

Expected results:

hotunplug should not fail on conflict with disk lock 

Additional info:

https://gerrit.ovirt.org/#/c/77504/
https://gerrit.ovirt.org/#/c/77499/h

Comment 1 Tal Nisan 2017-10-18 11:08:23 UTC
Benny, I see the "see also" bugs are closed and verified, this issue is fixed as well as part of those fixes?

Comment 2 Benny Zlotnik 2017-10-30 11:05:55 UTC
(In reply to Tal Nisan from comment #1)
> Benny, I see the "see also" bugs are closed and verified, this issue is
> fixed as well as part of those fixes?

No, this issue currently waits for this: https://bugzilla.redhat.com/show_bug.cgi?id=1460701, so a sensible solution can be merged (it's currently fixed with a time.sleep(3))

Comment 3 Allon Mureinik 2017-11-23 16:23:40 UTC
(In reply to Benny Zlotnik from comment #2)
> (In reply to Tal Nisan from comment #1)
> > Benny, I see the "see also" bugs are closed and verified, this issue is
> > fixed as well as part of those fixes?
> 
> No, this issue currently waits for this:
> https://bugzilla.redhat.com/show_bug.cgi?id=1460701, so a sensible solution
> can be merged (it's currently fixed with a time.sleep(3))

So this is actually an OST bug? (i.e., OST isn't using the right mechanism to track LSM's completion)?

Comment 4 Benny Zlotnik 2017-11-28 09:04:58 UTC
(In reply to Allon Mureinik from comment #3)
> (In reply to Benny Zlotnik from comment #2)
> > (In reply to Tal Nisan from comment #1)
> > > Benny, I see the "see also" bugs are closed and verified, this issue is
> > > fixed as well as part of those fixes?
> > 
> > No, this issue currently waits for this:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1460701, so a sensible solution
> > can be merged (it's currently fixed with a time.sleep(3))
> 
> So this is actually an OST bug? (i.e., OST isn't using the right mechanism
> to track LSM's completion)?

Yes, I confused it with something else. The lock changing solution didn't work and caused another bug. I'll changed the description

Comment 5 Allon Mureinik 2018-03-04 10:10:09 UTC
Eyal - once OST code is merged, it's essentially live.
Is there any point to leave this BZ on MODIFIED?

I propose the following:
1. Move the ON_QA now.
2. If the LSM test passes for the next X build (let's say a week?), it can be marked as CLOSED CURRENTRELEASE.

Does this make sense to you?

Comment 6 Eyal Edri 2018-03-04 10:19:26 UTC
Sounds right, though maybe we should wait a bit more if it was a known race? 
anyway @Dafna can track it and if the test is failing again we can think if to reopen the bug.

I'm OK with moving to ON_QA.

Comment 7 Benny Zlotnik 2018-03-08 09:13:32 UTC
Moving back to POST for now as the patch was reverted since the 4.2 SDK wasn't built with the updated ovirt-engine-api-model yet

Comment 8 Allon Mureinik 2018-03-15 12:20:41 UTC
The master and 4.2 suites have been separated (see commit 9a861d4 for details). Can we un-revert this patch?

Comment 9 Benny Zlotnik 2018-03-15 12:36:15 UTC
(In reply to Allon Mureinik from comment #8)
> The master and 4.2 suites have been separated (see commit 9a861d4 for
> details). Can we un-revert this patch?

Done

Comment 10 Allon Mureinik 2018-03-21 12:49:06 UTC
Patch has been merged.
Setting BZ to ON_QA, and if we don't see any issues for a couple of weeks, I'll move it to CLOSED CURRENTRELEASE


Note You need to log in before you can comment on or make changes to this bug.