Bug 1389764
| Summary: | Cluster compatibility upgrade to 3.6 still hitting race condition. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Frank DeLorey <fdelorey> |
| Component: | ovirt-engine | Assignee: | Nobody <nobody> |
| Status: | CLOSED DUPLICATE | QA Contact: | meital avital <mavital> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.6.9 | CC: | fdelorey, gklein, lsurette, michal.skrivanek, msivak, rbalakri, Rhev-m-bugs, srevivo, ykaul |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-01 10:55:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Frank DeLorey
2016-10-28 14:07:09 UTC
I am going to have the customer try the workaround from BZ 1369415 unless Engineering recommends otherwise.

During a time frame in which you expect no running VM to stop and no stopped VM to start, you can reduce the frequency of updates of guest agent NICs so that they do not interfere with the cluster upgrade. This can be done by changing the values of VdsRefreshRate and NumberVmRefreshesBeforeSave in the database (in the vdc_options table). By default VdsRefreshRate=3 and NumberVmRefreshesBeforeSave=5, which is why guest agent NICs are saved every 15 (3*5) seconds. You can change NumberVmRefreshesBeforeSave to 10000, restart ovirt-engine, upgrade the cluster, set NumberVmRefreshesBeforeSave back to 5, and then restart ovirt-engine again.

1) psql -U engine -c "update vdc_options set option_value = '1000' where option_name = 'NumberVmRefreshesBeforeSave';"
2) service ovirt-engine restart
3) update the VDI cluster to 3.6
4) psql -U engine -c "update vdc_options set option_value = '5' where option_name = 'NumberVmRefreshesBeforeSave';"
5) service ovirt-engine restart

Created attachment 1215012
Engine log from time of failure
The cluster update was attempted at 2016-10-26 14:30:13
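
The workaround quoted in the description can be sketched as two small shell helpers. This is a minimal sketch, not an official tool: the function names save_interval_seconds and refreshes_update_sql are hypothetical, while the table name, option names, and default values are the ones quoted from BZ 1369415 in the description.

```shell
# Hypothetical helpers; they only compute and print, they do not touch the database.

# Seconds between saves of guest agent NIC data:
# VdsRefreshRate (seconds per refresh) * NumberVmRefreshesBeforeSave.
save_interval_seconds() {
    echo $(( $1 * $2 ))
}

# Print the SQL statement that sets NumberVmRefreshesBeforeSave to the
# given value. The value must be a plain non-negative integer.
refreshes_update_sql() {
    case "$1" in
        ''|*[!0-9]*)
            echo "value must be a non-negative integer" >&2
            return 1
            ;;
    esac
    printf "update vdc_options set option_value = '%s' where option_name = 'NumberVmRefreshesBeforeSave';\n" "$1"
}
```

With these helpers, step 1 would look like `psql -U engine -c "$(refreshes_update_sql 10000)"` followed by `service ovirt-engine restart`, and step 4 would use `refreshes_update_sql 5`. `save_interval_seconds 3 5` prints 15, matching the default interval stated in the description.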
(In reply to Frank DeLorey from comment #0)
> Description of problem:
> Customer is running 3.6.9 and in one cluster is still hitting the race
> condition marked fixed in BZ 1369415.

According to the logs you are hitting a different issue, unrelated to bug 1369415. There seems to be another CpuProfile problem that is not fixed yet; I've only found bug 1386289, which is not in 3.6.9.
Your suggested workaround wouldn't help. The offending VM needs to be fixed (sometimes Edit VM and save helps) or removed. It's a bit tricky to see which one it is, but it should be the last VM ID in the log before the whole transaction rolls back.

Hi Michal,
Are you stating that the workaround from BZ 1386289 would help in this case?
Thanks,
Frank

I don't actually know; it might also be best for the SLA team to answer regarding the workaround (adding Martin). I see it fails with ACTION_TYPE_CPU_PROFILE_EMPTY, so editing that VM and assigning it a profile manually might resolve it. According to the log it should be VM "training-mb-6", but there might be more; it stops on the first VM it finds a problem with.

The customer stated that this was resolved when they implemented the fix for BZ 1386289. This BZ can probably be marked not-a-bug.

Meaning the resolution is not NOTABUG but rather a duplicate, especially since it has not been fixed upstream 3.6 and the downstream bug is on 3.6.10, which is not yet released.

*** This bug has been marked as a duplicate of bug 1386289 ***
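
To locate the offending VM as suggested in the comments, one option is to search the engine log for the failure code. A minimal sketch, assuming the attached engine log is available as a local file and that each failure is logged on a line containing the literal string ACTION_TYPE_CPU_PROFILE_EMPTY. The function name cpu_profile_failures is hypothetical, and the exact layout of the log lines is not shown in this report.

```shell
# Print "line_number:line" for every line in the given log file that
# mentions the CPU profile validation failure named in the comments.
cpu_profile_failures() {
    grep -n 'ACTION_TYPE_CPU_PROFILE_EMPTY' "$1"
}
```

Running `cpu_profile_failures engine.log` gives line numbers to start from; reading the log just before each hit should show the last VM ID handled before the transaction rolled back.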