Bug 1103787 - [RFE] rhc snapshot and restore ops should not return until all cartridges are available again
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Image
Version: 1.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Ben Parees
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-06-02 14:37 UTC by Oleg Fayans
Modified: 2016-12-01 00:28 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-06-11 22:53:35 UTC
Target Upstream Version:
Embargoed:


Attachments
wsg.py file for inserting test data into database (4.22 KB, text/x-python)
2014-06-02 14:37 UTC, Oleg Fayans

Description Oleg Fayans 2014-06-02 14:37:49 UTC
Created attachment 901472
wsg.py file for inserting test data into database

Description of problem:

When I snapshot an app that has a database cartridge installed and holds a considerable amount of data, I am unable to perform other database operations on the app for 2-3 seconds. Then the functionality is restored. This causes big-data scenarios to fail for no good reason.


Version-Release number of selected component (if applicable):
tested against latest devenv

How reproducible:
Always

Steps to Reproduce:
1. Create a python-2.7 app with the mysql cartridge installed
2. Deploy the attached wsgi.py file to the app's repo and git push
3. Access <app_url>/insert?size=10000 to insert 10,000 simple records into the database
4. rhc snapshot-save -a <app_name>
5. Access the <app_url>/insert?size=5000 again immediately

Actual results:
503 Service Unavailable

Expected results:
Should insert the records successfully
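
Steps 3-5 above can be scripted roughly as follows. This is only a sketch: the function names `fetch_status` and `reproduce` are made up for illustration, and it assumes `rhc` is on the PATH and already logged in, and that the /insert endpoint from the attached file answers plain GET requests.

```python
import subprocess
import urllib.error
import urllib.request


def fetch_status(url):
    """Return the HTTP status code of a GET request, including error codes."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 Service Unavailable


def reproduce(app_name, app_url, run=subprocess.run, status_of=fetch_status):
    """Run steps 3-5 and return the status of the insert issued after the snapshot."""
    # Step 3: seed the database through the app's /insert endpoint.
    status_of(f"{app_url}/insert?size=10000")
    # Step 4: take the snapshot; check=True raises if rhc exits non-zero.
    run(["rhc", "snapshot-save", "-a", app_name], check=True)
    # Step 5: insert again immediately; the report observes 503 here.
    return status_of(f"{app_url}/insert?size=5000")
```

Calling `reproduce("<app_name>", "<app_url>")` and getting 503 back would match the actual result described above.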


Additional info:

Comment 1 Aleksandar Kostadinov 2014-06-02 14:45:27 UTC
Additionally, as a user I would like to know that my app is in an operational state once certain rhc operations, such as snapshot create/restore, are complete. I don't expect to have to wait an indefinite time until the app is running normally.

That becomes increasingly important as the application grows and the user wants to automate particular kinds of usage, such as regular maintenance.

Comment 2 Ben Parees 2014-06-02 20:03:11 UTC
Currently the mysql cartridge is stopped as part of the snapshot process (and then restarted if it was started when the snapshot began), so that is likely the cause of the delay that you see.

Michal:  you worked in this space most recently, any idea why we stop the cartridge at the end of snapshot?  (postgres does it too).  Are we trying to ensure the DB doesn't write to other files during the snapshot operation?

Aleksandar:  Please open a separate RFE for your comment.

Comment 3 Aleksandar Kostadinov 2014-06-02 20:19:36 UTC
Ben, I think my comment is about the same thing the requester described, just from a different perspective. Basically, after snapshot create/restore one would expect the app to be available, just as one would after a start/restart operation.

Comment 4 Ben Parees 2014-06-02 20:33:17 UTC
Aleksandar:  Assuming the app was started prior to the snapshot/restore op, the app will be available soon after the snapshot/restore op completes (pending the need for a start/restart).  My understanding is you want a better report of when the app is actually back up.

From my understanding, the bug is questioning why the app becomes temporarily unavailable during a snapshot operation, not asking for additional information about when the app is available again.

Comment 5 Aleksandar Kostadinov 2014-06-04 09:04:37 UTC
Ben, my request is to have the app immediately available after a snapshot create/restore call.
I think the original report is about the same thing. I'll have Oleg confirm.

Comment 6 Oleg Fayans 2014-06-04 09:11:36 UTC
Aleksandar is right. When we perform rhc snapshot-save/snapshot-restore, the operation should (ideally) not return an exit code until the app is fully backed up/restored. I can imagine that the database restore is performed in a separate background thread and that it is hard to get a response from that process. But if this could be fixed, it would greatly improve the stability of our autotests.

Comment 7 Oleg Fayans 2014-06-04 10:54:32 UTC
In fact, we have a workaround for this bug in our scenarios: after each snapshot save/restore we wait 10 seconds, and then the scenario passes. Should we leave it as it is, Ben?

Comment 8 Aleksandar Kostadinov 2014-06-04 11:41:32 UTC
Actually, what we observe is that the web cart seems to be available after the snapshot operation, but the DB cart needs some more time. Is this correct, Oleg? I think that for our test suite, as well as for users, it would be better for all carts to be available after such an operation. Having random sleep times is very inconvenient and unreliable in scripts.
IMHO, if we are already handling the web cart properly, then perhaps that can be extended to the DB carts as well?

Regards,
Aleksandar
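
Until the rhc commands themselves block, a script could replace the fixed sleep with a bounded poll. The following is a minimal sketch, not anything rhc or OpenShift provides: `http_status` and `wait_until_available` are hypothetical helpers, and the URL passed in is assumed to be some page of the app that actually touches the database.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while the DB cartridge restarts
    except (urllib.error.URLError, OSError):
        return None


def wait_until_available(url, deadline=60.0, interval=1.0,
                         status_of=http_status, sleep=time.sleep,
                         clock=time.monotonic):
    """Poll url until it answers with a 2xx status; return True on success.

    Returns False if the deadline (in seconds) passes first.
    """
    give_up_at = clock() + deadline
    while True:
        status = status_of(url)
        if status is not None and 200 <= status < 300:
            return True
        if clock() >= give_up_at:
            return False
        sleep(interval)
```

A test would call `wait_until_available(...)` right after the snapshot command returns and fail only if it returns False, so the wait is as short as the cartridge restart allows and still has an upper bound.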

Comment 9 Ben Parees 2014-06-04 13:51:24 UTC
OK, thanks for the clarification. I've updated the title to hopefully better reflect what is being requested here.

Comment 10 Ben Parees 2014-06-04 14:01:52 UTC
To follow up on comment 2 (now irrelevant to this bug):

mfojtik indicated the DB stop is needed to ensure a consistent state of the DB files that are snapshotted. We're not clear on why both a dump is taken and the DB files are backed up; that would need further investigation.

