Bug 821219
Summary: | prevent executing upgrade when asynchronous tasks are still running | ||
---|---|---|---|
Product: | [Retired] oVirt | Reporter: | Eli Mesika <emesika> |
Component: | ovirt-engine-installer | Assignee: | Moran Goldboim <mgoldboi> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | unspecified | CC: | acathrow, alourie, dyasny, hateya, iheim, mgoldboi, ykaul |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | integration infra | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-01-23 21:38:29 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
You can't tell an administrator who has a scheduled downtime window for the upgrade that they can't do it for a few hours. It is fine to warn the user about running tasks and cancel them if needed, but we can't have the admin blocked from performing an upgrade.

(In reply to comment #1)
As stated above, this is only a recommendation: the admin can tick the check-box immediately and run the upgrade without any wait. If the upgrade is done while async tasks are active and business entities or commands have changed, JBoss will not start, because the core code will try to restore objects from the binary representation stored in those tables. This is really dangerous, and I don't know whether we even have a simple recovery procedure for such cases. So the check is suggested as a recommendation, and the admin can force the upgrade to run whenever he likes via the additional check-box described above.

(In reply to comment #2)
Just had this incident today. The current behavior is not good, as upgrade.sh fails brutally.

(In reply to comment #3)
I don't think the problem is in upgrade.sh; in most cases it will complete successfully. But starting JBoss invokes the Backend compensation handling on startup, which assumes that if those tables are not clean it should try to roll back the failing command. Since the binary representation of the objects in the DB no longer matches the actual objects (because a code change modified the structure of some objects), this operation fails and the application does not start at all, even after a successful upgrade. This has already occurred several times in QE and could happen at customer sites as well. The patch does not affect the downtime needed for an upgrade; rather, in the pre-upgrade step it indicates when an upgrade is relatively safe, and lets the user ignore the warning at his own risk.

The code that prevents upgrading when there are async tasks in the system was merged to 3.1.
Description of problem:
Information about asynchronous tasks is persisted to the async_tasks table, and compensation data (used for rollbacks) is persisted to the business_entity_snapshot table, so we should avoid upgrading the system while asynchronous tasks are still running. The reason is that those tables hold binary data representing business entities and command parameters. Since these may change from version to version, any attempt to restore an object from its old binary representation will cause the system to crash.

Version-Release number of selected component (if applicable):

How reproducible:
Upgrade the system in the middle of an asynchronous task, when the upgrade changes some of the objects stored in the above tables (for example, a field was added to a business entity).

Steps to Reproduce:
1.
2.
3.

Actual results:
Upgrades done while asynchronous tasks are still running may lead to a crash in core code that prevents JBoss from running (since compensation data is checked at application startup).

Expected results:
The installer should check whether the async_tasks or business_entity_snapshot tables still contain records for running asynchronous tasks, and ask the user to wait for task completion if he tries to upgrade the system while such tasks are still running.
Since a crash may leave unclear junk in those tables, the installer should have a check-box labeled "Force delete asynchronous tasks meta-data". If the user marks this check-box, the database upgrade script will run with a -c flag that forces cleanup of the async_tasks and business_entity_snapshot tables.

Additional info:
You can check that there are no records in the async_tasks and business_entity_snapshot tables with
> echo "select (select count(*) from async_tasks) + (select count(*) from business_entity_snapshot);" | psql -U <user> --pset=tuples_only=on <database>
This returns 0 only if both tables are empty.
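
The expected behavior described above (check both tables, ask the user to wait, or proceed with cleanup if the check-box is marked) could be sketched roughly as follows. This is only an illustration, not the installer's actual code: the function names `count_rows` and `upgrade_decision`, the returned strings, and the injectable `run` parameter are all hypothetical. Each table is counted separately so that a single non-empty table is still detected.

```python
import subprocess

# The two tables named in this report that hold serialized task data.
TASK_TABLES = ("async_tasks", "business_entity_snapshot")


def count_rows(table, user, database, run=subprocess.run):
    """Return the number of rows in `table`, queried through psql.

    `run` is injectable (hypothetical design choice) so the function can be
    exercised without a live database.
    """
    result = run(
        ["psql", "-U", user, "--pset=tuples_only=on",
         "-c", f"select count(*) from {table};", database],
        capture_output=True, text=True, check=True,
    )
    # With tuples_only=on, psql prints just the padded number.
    return int(result.stdout.strip())


def upgrade_decision(counts, force_cleanup=False):
    """Decide what the installer should do, given per-table row counts.

    Returns one of three hypothetical outcomes:
      "proceed"              - both tables are empty
      "proceed-with-cleanup" - leftovers exist, but the user marked the
                               "Force delete asynchronous tasks meta-data"
                               check-box (the caller would then pass -c to
                               the database upgrade script, as proposed)
      "wait"                 - leftovers exist and no force flag was given
    """
    pending = {table: n for table, n in counts.items() if n > 0}
    if not pending:
        return "proceed"
    if force_cleanup:
        return "proceed-with-cleanup"
    return "wait"
```

A caller would build `counts` with something like `{t: count_rows(t, user, database) for t in TASK_TABLES}` and then act on the result of `upgrade_decision(counts, force_cleanup)`.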