Bug 1168540
| Summary: | [scale] RHEV-M with remote database is not functional; the DC crashes every hour. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Yuri Obshansky <yobshans> |
| Component: | ovirt-engine | Assignee: | Nobody <nobody> |
| Status: | CLOSED NOTABUG | QA Contact: | |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.5.0 | CC: | ecohen, gklein, iheim, laravot, lpeer, lsurette, michal.skrivanek, oourfali, pkliczew, pstehlik, rbalakri, Rhev-m-bugs, yeylon, yobshans |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | 3.5.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | infra | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-12-01 16:42:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1164308, 1164311 | | |
| Attachments: | engine.log, vdsm.log, screenshot errors (see description) | | |
Description
Yuri Obshansky
2014-11-27 09:21:14 UTC
Created attachment 961917 [details]
engine.log
Created attachment 961918 [details]
vdsm.log
Created attachment 961919 [details]
screenshot errors
It seems that bug #1102147 is related.

VM profile: 1 GB RAM, 2 CPUs, 10 GB disk.

The problem shouldn't be related to the remote DB: repoStats is called every minute, and lastCheck returns very high values, which I assume is related to the heavy load the host is under. You can increase the MaxStorageVdsTimeoutCheckSec config value so the domain won't be considered "problematic" based on those reported results (read also https://bugzilla.redhat.com/show_bug.cgi?id=1102147#c89).

Thread-10375::INFO::2014-11-27 10:15:13,996::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {'6a70504d-00df-4869-8695-0f442053f6b6': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.00100719', 'lastCheck': '0.5', 'valid': True}, '8bf73417-ab48-40db-8590-514f64fc75b0': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000568427', 'lastCheck': '69.8', 'valid': True}}

Thread-10375::INFO::2014-11-27 10:17:46,504::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {'6a70504d-00df-4869-8695-0f442053f6b6': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.00186739', 'lastCheck': '50.9', 'valid': True}, '8bf73417-ab48-40db-8590-514f64fc75b0': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000568427', 'lastCheck': '222.3', 'valid': True}}

So why can I load the host with 200 VMs when using a local database? As I mentioned above, that test passed successfully. How do you explain that?

(In reply to Yuri Obshansky from comment #7)
> So why can I load the host with 200 VMs when using a local database?
> As I mentioned above, that test passed successfully.

Can you describe the differences between those two runs? E.g., is this the same hardware? Is the storage shared with anyone/anything? Were the same scripts used to start all those VMs in both cases? Was there any difference in the rate at which they were created and/or started? What is the load on the host(s) with 100 VMs, highest CPU, highest I/O? Are the guests installed/doing anything?

Thanks,
michal

(In reply to Yuri Obshansky from comment #7)
> So why can I load the host with 200 VMs when using a local database?
> As I mentioned above, that test passed successfully.
> How do you explain that?

Hi Yuri,
I performed RCA on the log of the failure. As I mentioned, there is no evidence in this log that the DB caused the issue; the problem is in the coupling between host-storage and VM/host monitoring. Once there are logs of a successful run, we will be able to compare the two and analyze why it succeeded. As it seems to me, the DB isn't the cause here, and the success was a matter of luck.

Hi,
I tried to reproduce it several times, without success. It looks like you are right and this problem is not related to the remote database. Let's close it.
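For reference, the repoStats analysis described in the thread above can be automated. The following is a minimal sketch, not part of VDSM or RHEV-M, that scans a vdsm.log for repoStats responses and flags storage domains whose lastCheck exceeds a threshold. The log format is taken from the excerpts quoted in this bug; the 30-second threshold is an assumed default, and the effective limit is whatever MaxStorageVdsTimeoutCheckSec is set to on the engine (normally raised with `engine-config -s MaxStorageVdsTimeoutCheckSec=<seconds>` followed by an engine restart).

```python
#!/usr/bin/env python
# Hypothetical helper (not shipped with VDSM or RHEV-M): scan a vdsm.log
# for repoStats responses and flag storage domains whose lastCheck value
# exceeds a threshold. The line format is based on the repoStats excerpts
# quoted in this bug; the 30-second threshold is an assumed default for
# MaxStorageVdsTimeoutCheckSec.
import re
import sys

# Matches entries like:
#   '8bf73417-...': {'code': 0, ..., 'lastCheck': '222.3', 'valid': True}
DOMAIN_RE = re.compile(
    r"'(?P<uuid>[0-9a-f-]{36})':\s*\{[^}]*'lastCheck':\s*'(?P<last>[\d.]+)'"
)

def scan(path, threshold=30.0):
    """Yield (line_no, domain_uuid, lastCheck) for values above threshold."""
    with open(path) as log:
        for line_no, line in enumerate(log, 1):
            if 'repoStats' not in line:
                continue
            for match in DOMAIN_RE.finditer(line):
                last_check = float(match.group('last'))
                if last_check > threshold:
                    yield line_no, match.group('uuid'), last_check

if __name__ == '__main__':
    log_path = sys.argv[1] if len(sys.argv) > 1 else 'vdsm.log'
    for line_no, uuid, last_check in scan(log_path):
        print('line %d: domain %s lastCheck=%.1fs' % (line_no, uuid, last_check))
```

Run against the attached vdsm.log, a script like this would flag the 8bf73417-ab48-40db-8590-514f64fc75b0 domain at lastCheck values of 69.8 s and 222.3 s, matching the analysis above that the domain was reported "problematic" due to host load rather than the remote database.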