Bug 1482454
| Summary: | restarting engine cause all DC's to become non-responsive status for a couple of seconds | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Avihai <aefrat> | ||||
| Component: | BLL.Storage | Assignee: | Tal Nisan <tnisan> | ||||
| Status: | CLOSED WONTFIX | QA Contact: | Elad <ebenahar> | ||||
| Severity: | low | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 4.2.0 | CC: | aefrat, bugs, ebenahar, mburman | ||||
| Target Milestone: | --- | Keywords: | Automation, AutomationBlocker | ||||
| Target Release: | --- | Flags: | sbonazzo: ovirt-4.3- | ||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2018-07-16 08:42:21 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: | |||||||
Description
Avihai
2017-08-17 10:29:35 UTC
Created attachment 1314628: engine & vdsm logs
This issue fails all my tests that include an engine restart: the DC is down for ~2 seconds, and the automation test teardown operations (detach DC, remove DC...) fail because the DC is not available.

1. Is this a regression?
2. I would say that you either need to improve your tests to check for status, or wait a while after engine restart. let's assume we fix it and it's not 'non-responsive' . What status do you expect it to be? It won't be 'Active' for a while for sure.

As Avihai is on PTO for few days,

The issue is not in checking the status, but the fact that restarting the ovirt-engine service triggers re-initialization of the data center.

(In reply to Raz Tamir from comment #4)
> As Avihai is on PTO for few days,
>
> The issue is not in checking the status, but the fact that restarting the
> ovirt-engine service triggers re-initialization of the data center.

I can't see any re-initialization in the logs (although if you point me to something I'm missing that would be great).
As far as I can see, the engine just marks statuses as "unknown" until it gets confirmation that they are up.

(In reply to Allon Mureinik from comment #5)
> (In reply to Raz Tamir from comment #4)
> > As Avihai is on PTO for few days,
> >
> > The issue is not in checking the status, but the fact that restarting the
> > ovirt-engine service triggers re-initialization of the data center.
>
> I can't see any re-initialization in the logs (although if you point me to
> something I'm missing that would be great).
> As far as I can see, the engine just marks statuses as "unknown" until it
> gets confirmation that they are up.

CLOSE-NOTABUG / WONTFIX / DEFERRED?

(In reply to Yaniv Kaul from comment #3)
> 1. Is this a regression?

No. I checked, and this also occurs on 4.1.

> 2. I would say that you either need to improve your tests to check for
> status, or wait a while after engine restart. let's assume we fix it and
> it's not 'non-responsive' . What status do you expect it to be? It won't be
> 'Active' for a while for sure.

To clarify, the issues are:
1) The DC goes to the 'unresponsive' state after engine restart. Why? We did not restart VDSM, only the engine.
2) After engine restart, the DC state changes as follows:
A) The DC reaches the 'active' state
B) The DC goes to 'Unknown'
C) The DC changes back to 'active'

After engine restart, automation currently waits for the 'active' DC state. Because the DC then goes from 'active' to 'unknown', automation tries to perform actions on the DC and fails, as the DC is not available.

Sure, I can change the automation tests to wait for these state changes, but the question is: are these DC state changes by design or not?
IMHO, it does not look reasonable for the DC to be 'active' and then go to 'unresponsive'.

I would expect that:
1) After engine restart, the DC should not go to the 'unresponsive' state at all.
2) If the DC by design has to go to a state other than 'active', please change the DC to 'active' only when it is finally ready to work, and avoid toggling states ('active' -> 'unknown' -> 'active').

Closing old bugs, feel free to reopen if still needed.

This bug is alive and affecting our automation tests.
Fresh bug is BZ 1609565
Please fix this issue.
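For anyone adapting their automation in the meantime, the workaround suggested in the thread (check the status, or wait a while after engine restart) could be sketched roughly as below. This is only an illustration, not oVirt code: `wait_for_stable_status` and the `get_status` callable are hypothetical names, and how the status is actually fetched from the engine is left to the caller. The idea is to require several consecutive 'active' readings, so that the 'active' -> 'unknown' -> 'active' toggle described above is not mistaken for a ready DC.

```python
import time
from typing import Callable


def wait_for_stable_status(
    get_status: Callable[[], str],  # hypothetical: returns the current DC status string
    target: str = "active",
    required_consecutive: int = 3,
    timeout: float = 120.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll get_status() until it returns `target` several times in a row.

    Returns the number of polls performed. Raises TimeoutError if the
    status has not stabilised once `timeout` seconds have elapsed.
    """
    deadline = clock() + timeout
    consecutive = 0
    polls = 0
    while True:
        status = get_status()
        polls += 1
        # Any reading other than the target resets the streak, so a brief
        # 'active' before the drop to 'unknown' does not count as ready.
        consecutive = consecutive + 1 if status == target else 0
        if consecutive >= required_consecutive:
            return polls
        if clock() >= deadline:
            raise TimeoutError(
                f"status not stable; last status was {status!r} after {polls} polls"
            )
        sleep(interval)


if __name__ == "__main__":
    # Simulated readings reproducing the toggle reported in this bug.
    observed = iter(["active", "unknown", "active", "active", "active"])
    polls = wait_for_stable_status(lambda: next(observed), sleep=lambda _: None)
    print(polls)  # prints 5
```

The values for `required_consecutive`, `interval`, and `timeout` are guesses and would need tuning against how long the DC actually stays in the intermediate state in a given environment.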