Bug 594476
Summary: | status check program for vm.sh & user-controlled error tolerance | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Benjamin Kahn <bkahn> |
Component: | rgmanager | Assignee: | Lon Hohberger <lhh> |
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
Severity: | medium | Docs Contact: | |
Priority: | urgent | ||
Version: | 5.4 | CC: | cluster-maint, djansa, edamato, fnadge, jkortus, jwest, lhh, pm-eus, psubrama, yeylon, ykaul |
Target Milestone: | rc | Keywords: | FutureFeature, ZStream |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | rgmanager-2.0.52-6.el5_5.8 | Doc Type: | Enhancement |
Doc Text: |
Previously, vm.sh only checked the status of the VM itself, not the status of any services inside. With this update, administrators may now use a newly provided status check program which checks the availability of services within virtual machines running Red Hat Enterprise Virtualization Manager. Timeouts for starting and stopping virtual machines are now configurable in cluster.conf. The start timeout is based on the status check program.
|
Last Closed: | 2010-08-25 06:33:37 UTC | Type: | --- |
Bug Depends On: | 583788 | ||
Bug Blocks: |
Description
Benjamin Kahn
2010-05-20 19:27:56 UTC
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=75c7ecae61d9400d084227304d5b7068fdaa310b
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=7a1e2af8cc37409b3a38fbe9b5945d186be9f2ec
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=61a7c0f26c4248778d656bcecb47375c419fafd6
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=ea76067bc05359c1582bfd84be4c1b3eb05b230a

Last patch has been updated:
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=805e0bae5db683fb33ec2e3c14b12c6380885494

We need some more improvements to the rhevm-check validation.

1. In the current state there is only one timeout interval: rhev-check runs every X minutes. Because a VM restart takes about 5 minutes, this is the minimum interval at which the test can run. That is unacceptable downtime for the rhevm node (a 5 min interval plus 5 min boot time causes 10 min of downtime). We need a way to reduce this timeout to a more reasonable value. One way to do this is to add two different types of intervals:
   a. interval = X - for regular testing
   b. after_failure_interval = Y - time to wait after the VM was restarted before the initial test

2. In the current state, after one failure of rhev-check.sh the rhevm node is rebooted, which is not the best way to go. We need to take into account scenarios in which the VM did not respond due to load or other causes. We need a way to test a few times before determining that the RHEVM VM is dead. For example, if rhev-check.sh returns an error once, keep retrying X times at Y intervals, and if all attempts fail, migrate the VM:
   __max_failures="5" __failure_expire_time="60"

3. In the current state the VM shutdown is executed using virsh shutdown, and after 15 sec the KVM process is killed, so the VM does not have time to shut down properly, which can (and will) lead to corruption (I had one). We need to increase the timeout between the shutdown of the VM and the killing of the process (100~120 sec should be fine).

It looks like at this stage rhev-check.sh does not work as expected: the 5 min timeout for starting a VM never ends.

1. Migrate the VM service.
2. As soon as the VM has been relocated, kill the KVM process on the server.
3. See that rhev-check keeps getting errors but does not try to migrate the service again after 5 min as expected, only after half an hour.

This will require a respin.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0647.html

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
previously, vm.sh only checked the status of the VM itself, not the status of any services inside. With this update, administrators may now use a newly provided status check program which checks the availability of services within virtual machines running Red Hat Enterprise Virtualization Manager. Timeouts for starting and stopping virtual machines are now configurable in cluster.conf. The start timeout is based on the status check program.

Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-previously, vm.sh only checked the status of the VM itself, not the status of any services inside. With this update, administrators may now use a newly provided status check program which checks the availability of services within virtual machines running Red Hat Enterprise Virtualization Manager. Timeouts for starting and stopping virtual machines are now configurable in cluster.conf. The start timeout is based on the status check program.
+Previously, vm.sh only checked the status of the VM itself, not the status of any services inside. With this update, administrators may now use a newly provided status check program which checks the availability of services within virtual machines running Red Hat Enterprise Virtualization Manager. Timeouts for starting and stopping virtual machines are now configurable in cluster.conf. The start timeout is based on the status check program.
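
For illustration only: the error tolerance requested in the description (__max_failures="5" __failure_expire_time="60") can be read as a sliding window in which the VM is recovered only once enough status checks have failed within the expiry time. The sketch below is one possible reading of that request, not the rgmanager implementation. The class FailureTolerance, the helper should_recover, and the choice to trigger on the failure that reaches the limit (rather than the one that exceeds it) are all assumptions made for this example.

```python
from collections import deque


class FailureTolerance:
    """Hypothetical helper: tolerate status-check failures in a time window."""

    def __init__(self, max_failures=5, failure_expire_time=60.0):
        # Names mirror the attributes quoted in the report; semantics assumed.
        self.max_failures = max_failures
        self.failure_expire_time = failure_expire_time
        self._failures = deque()  # timestamps (seconds) of recent failures

    def record_failure(self, now):
        """Record a failed check at time `now`.

        Returns True when recovery (restart or migration) should start.
        """
        self._failures.append(now)
        # Forget failures that are older than the expiry window.
        while self._failures and now - self._failures[0] > self.failure_expire_time:
            self._failures.popleft()
        return len(self._failures) >= self.max_failures


def should_recover(failure_times, max_failures=5, failure_expire_time=60.0):
    """Replay a list of failure timestamps and return one decision per failure."""
    tracker = FailureTolerance(max_failures, failure_expire_time)
    return [tracker.record_failure(t) for t in failure_times]
```

With these assumed semantics, five failures ten seconds apart trigger recovery on the fifth one, while failures spaced further apart than the expiry time never accumulate, so a VM that is only briefly unresponsive under load is left alone.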