Bug 919379
| Summary: | The oo-admin-clear-pending-ops script will delete more data besides pending_op_groups object | | |
|---|---|---|---|
| Product: | OKD | Reporter: | Rony Gong 🔥 <qgong> |
| Component: | Pod | Assignee: | Rajat Chopra <rchopra> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | libra bugs <libra-bugs> |
| Severity: | low | Priority: | unspecified |
| Version: | 2.x | CC: | dmcphers |
| Hardware: | Unspecified | OS: | Unspecified |
| Doc Type: | Bug Fix | Type: | Bug |
| Last Closed: | 2013-04-02 14:27:03 UTC | | |
Description
Rony Gong 🔥
2013-03-08 09:59:45 UTC
That is because the op_group that was pasted as 'stuck' was 'add_features'. So it rolled back the features and emptied the application. That's what it is supposed to do.

As a further enhancement, the following can be done:
1. If the op_group was 'add_features' and a rollback is done, then the app should be deleted too.
2. If the op_group was 'delete' and an execute is performed, then the app should be deleted too.

Long term, the issue is that cartridge hooks are not re-entrant. That should be fixed. Keeping the bug open and thinking about other alternatives.

Commits pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/5ecd2fe772d6f71bbfbc75d66316a46cc58a1281
fix bug919379

https://github.com/openshift/origin-server/commit/64013e8c90ec66a42a7d7dfa60005c99e380a633
Merge pull request #1605 from rajatchopra/master

fix bug919379 - clear-pending-ops marks delete_app op

Tested on devenv_2978; same error as before.

@Rajat: regarding point 1 ("If the op_group was 'add_features' and a rollback is done, then the app should be deleted too"): on rollback, the data of the group_instance is deleted, but the data on the node, such as the uid and folder, is not. Is this by design?

Fixed with https://github.com/openshift/origin-server/pull/1773
If a rollback fails, the app will now NOT get deleted from mongo unless all gears have been cleared up.

Reassigned. On devenv_2998, same error as before. If an op with "op_type": "create_group_instance" is already in "state": "completed", why does it need to run op.execute and then raise the exception "roll back"? This removes the data of the group_instance.
@Rajat, do we need to filter out the ops whose "state" is "completed"?

op.execute is done on an op_group (like add_features) and not on a particular op within it (e.g. create_group_instance). The 'eligible_ops' function in pending_app_op_group.rb ensures that only non-complete ops are really executed.
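To illustrate the filtering described above, here is a minimal sketch of selecting only non-completed ops from a group. It is not the origin-server code: the `Op` struct, the state symbols, and this simplified `eligible_ops` are assumptions made for illustration, and the real function in pending_app_op_group.rb may differ.

```ruby
# Hypothetical stand-in for a pending op; not the origin-server model.
Op = Struct.new(:op_type, :state)

# Simplified assumption of the filtering step: keep only the ops
# that have not been marked completed yet.
def eligible_ops(ops)
  ops.reject { |op| op.state == :completed }
end

ops = [
  Op.new("create_group_instance", :completed),
  Op.new("init_gear", :init),
  Op.new("create_gear", :init)
]

# Only the two gear ops remain; the completed op is skipped.
pending = eligible_ops(ops).map(&:op_type)
puts pending.inspect
```

Under this model, executing the group never re-runs create_group_instance; it resumes from the first op that is still pending.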
If you do a copy-paste of the op_group after it was actually executed, then op_group.execute will start from the point where the copy was made.

To me, the sequence of operations is this:
1. An op_group with add_features is created in mongo.
2. The op_group goes ahead and executes up to create_group_instance, but init_gear/create_gear/configure etc. are still not complete.
3. A copy operation is done from mongo at this point.
4. The op_group goes ahead and completes itself, thereby creating a gear etc.
5. Now a paste is done of what was copied in #3, so mongo shows that the group_instance is complete but the gears still need to be created.
6. The admin_clear_pending_ops script comes around and executes this op_group. It fails at creating the gear because the gear already exists.
7. All completed ops are rolled back, which means the group_instance is emptied (but the gear is not deleted).

If the above is true, then we would see what you are seeing, but the flaw is that it would never occur in reality. Can you verify this by providing the broker/mcollective logs as well when the admin-clear-pending-ops operation takes place?

@Rajat, the above is true, so I agree with your opinion. Here are the logs from after the above steps happened, when running "oo-admin-clear-pending-ops".

Created attachment 716989 [details]
development.log
Created attachment 716990 [details]
mcollective-client.log
Created attachment 716992 [details]
mcollective.log
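
The seven-step sequence described in the comments above can be replayed in a small in-memory model. This is only a sketch under stated assumptions: `FakeApp`, `GearExistsError`, `execute_op_group`, and the hash layout of the ops are all hypothetical names, and the rollback logic is a simplification of what the comments describe, not the broker's implementation.

```ruby
# Hypothetical in-memory model; all names here are illustrative only.
class GearExistsError < StandardError; end

class FakeApp
  attr_reader :group_instances, :node_gears

  def initialize
    @group_instances = [] # what the datastore (mongo) records
    @node_gears = []      # what actually exists on the node
  end

  def create_group_instance
    @group_instances << "gi-1"
  end

  def remove_group_instance
    @group_instances.delete("gi-1")
  end

  def create_gear
    raise GearExistsError, "gear already exists" if @node_gears.include?("gear-1")
    @node_gears << "gear-1"
  end
end

# Run every op not marked completed; if gear creation fails,
# roll back the ops that are marked completed (assumed behavior).
def execute_op_group(app, ops)
  ops.each do |op|
    next if op[:state] == :completed
    app.public_send(op[:name])
    op[:state] = :completed
  end
  :ok
rescue GearExistsError
  ops.select { |op| op[:state] == :completed }.reverse_each do |op|
    app.public_send(op[:rollback]) if op[:rollback]
    op[:state] = :rolled_back
  end
  :rolled_back
end

app = FakeApp.new
# Steps 1-4: the original op_group ran to completion.
app.create_group_instance
app.create_gear

# Step 5: a stale copy is pasted back (group instance done, gear pending).
stale_ops = [
  { name: :create_group_instance, rollback: :remove_group_instance, state: :completed },
  { name: :create_gear, rollback: nil, state: :init }
]

# Steps 6-7: replaying the stale group fails on the gear and rolls back.
result = execute_op_group(app, stale_ops)
puts result
puts app.group_instances.inspect
puts app.node_gears.inspect
```

In this model the replay ends with the group instance removed from the datastore while the gear is still present on the node, which matches the mismatch reported in the bug.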