Bug 878205
Summary: | concurrent user actions result in inconsistencies in gear DB | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Luke Meyer <lmeyer> |
Component: | Node | Assignee: | Luke Meyer <lmeyer> |
Status: | CLOSED ERRATA | QA Contact: | libra bugs <libra-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 1.1.1 | CC: | bhatiam, bleanhar, gpei, xjia |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause:
MongoDB access was not done in a way that always guaranteed consistency.
Consequence:
If multiple alterations were performed to a user's application(s) concurrently, it was possible for some of them to get overwritten (thus lost) by others, making MongoDB inconsistent with the reality of the gears on the node. The canonical example was if the same app was scaled up by two separate logins concurrently, one of the gears would not be known to MongoDB.
Fix:
Distributed locking mechanisms were introduced with the DB schema and model refactor that went into OSE 1.2. Upgrade to OSE 1.2.
Result:
User actions should now be queued successfully, preserving consistency.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2013-07-09 19:49:27 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 855307 | ||
Bug Blocks: |
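The Doc Text above describes the fix as a distributed locking mechanism so that concurrent read-modify-write updates to an application's record cannot silently overwrite each other. As a rough illustration of the idea (this is a toy Python sketch with an in-memory store and invented names, not the actual OpenShift broker code, which is Ruby against MongoDB), a compare-and-swap on a version field forces a losing writer to re-read and retry instead of clobbering the other writer's gear:

```python
import threading

class GearStore:
    """Toy in-memory stand-in for the broker's application record.

    Illustrative only: models the optimistic compare-and-swap update that
    a locking scheme provides, so no concurrent scale-up is ever lost.
    """
    def __init__(self):
        self._lock = threading.Lock()          # models the broker-side lock
        self.doc = {"gears": [], "version": 0}

    def compare_and_swap(self, expected_version, new_gears):
        """Apply the update only if no concurrent writer got there first."""
        with self._lock:
            if self.doc["version"] != expected_version:
                return False                   # stale read; caller retries
            self.doc = {"gears": new_gears, "version": expected_version + 1}
            return True

def scale_up(store, gear_id):
    """Read-modify-write with retry: a lost race means re-read, not lost data."""
    while True:
        snapshot = dict(store.doc)
        new_gears = snapshot["gears"] + [gear_id]
        if store.compare_and_swap(snapshot["version"], new_gears):
            return

store = GearStore()
threads = [threading.Thread(target=scale_up, args=(store, "gear-%d" % i))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(store.doc["gears"]))  # all 3 gears recorded, none overwritten
```

Without the version check, three concurrent scale-ups could each read the same snapshot and the last write would win, leaving gears on the node that MongoDB never recorded — exactly the inconsistency oo-admin-chk reports below.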
Description
Luke Meyer
2012-11-19 20:45:54 UTC
Found one issue belonging to this bug against puddle: http://buildvm-devops.usersys.redhat.com/puddle/build/OpenShiftEnterprise/1.1.z/2013-03-21.1/

Description of problem:
After triggering 3 scale-up events at the same time, "oo-admin-chk" reports an error about an inconsistency between the node and MongoDB.

How reproducible: always

Steps to Reproduce:
1. Create a scalable app and disable auto-scaling.
2. Trigger 3 scale-up events at the same time:

```
for i in `seq 1 3`; do curl -k -X POST -H 'Accept: application/xml' -d event=scale-up --user gpei:redhat https://broker.rhn.com/broker/rest/domains/1010/applications/app/events & done
```

3. Run oo-admin-chk on the broker:

```
[root@broker ~]# oo-admin-chk
Check failed.
FAIL: user gpei has a mismatch in consumed gears (5) and actual gears (4)!
Gear 2c61a7ebb19a4a68a7bd2c8b5454f298 exists on node [node1.rhn.com, uid:1154] but does not exist in mongo database
```

Actual results:
Some gears exist on the node but do not exist in MongoDB. Sometimes, after triggering 3 or 5 scale-up events at the same time, the gear count reported for the scalable app via the REST API does not match the real number of gears on the nodes. QE would like to use this bug to track the multiple-concurrent-scale-up issue.

Version: http://buildvm-devops.usersys.redhat.com/puddle/build/OpenShiftEnterprise/1.2/2013-05-02.1

Verify: Scaled up 10 times; 4 failed, 6 succeeded. The failed requests reported: "Application is currently busy performing another operation. Please try again in a minute." Either way, the data in MongoDB agrees with the actual gear info.
```
[root@broker ~]# rhc apps
php @ http://php-jia1.osev2.com/ (uuid: 518309ae4052a73a05000006)
-----------------------------------------------------------------
  Created: 5:49 PM
  Gears:   6 (defaults to small)
  Git URL: ssh://518309ae4052a73a05000006.com/~/git/php.git/
  SSH:     518309ae4052a73a05000006.com

  php-5.3 (PHP 5.3)
  -----------------
    Scaling: x6 (minimum: 1, maximum: available) on small gears

  haproxy-1.4 (OpenShift Web Balancer)
  ------------------------------------
    Gears: Located with php-5.3

You have 1 applications

[root@broker ~]# oo-admin-chk -v
Started at: 2013-05-02 17:57:33 -0700
Time to fetch mongo data: 0.01s
Total gears found in mongo: 6
Time to get all gears from nodes: 20.277s
Total gears found on the nodes: 6
Total nodes that responded : 2
Checking application gears and ssh keys on corresponding nodes:
518309ae4052a73a05000006 : String... OK
51830a0a4052a73a05000028 : String... OK
51830a404052a73a05000035 : String... OK
51830a774052a73a05000042 : String... OK
51830ab14052a73a0500004f : String... OK
51830aef4052a7f30a000002 : String... OK
Checking node gears in application database:
51830a0a4052a73a05000028... OK
51830aef4052a7f30a000002... OK
51830a404052a73a05000035... OK
518309ae4052a73a05000006... OK
51830a774052a73a05000042... OK
51830ab14052a73a0500004f... OK
Success
Total time: 20.287s
Finished at: 2013-05-02 17:57:53 -0700
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1031.html
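After the OSE 1.2 fix, a concurrent request that hits the application lock is rejected with "Application is currently busy performing another operation. Please try again in a minute." rather than corrupting the database, so clients only need to retry. A minimal client-side retry sketch, assuming a hypothetical `do_scale_up` callable standing in for the REST POST shown in the reproduction steps (function names and the error shape are illustrative, not the rhc/broker API):

```python
import time

BUSY_MSG = "Application is currently busy performing another operation."

def scale_up_with_retry(do_scale_up, attempts=5, delay=1.0):
    """Retry a scale-up event that the broker rejects while the app is locked.

    `do_scale_up` stands in for the REST call; here it is assumed to raise
    RuntimeError carrying the broker's busy message when the app is locked.
    """
    for attempt in range(attempts):
        try:
            return do_scale_up()
        except RuntimeError as err:
            if BUSY_MSG not in str(err) or attempt == attempts - 1:
                raise                      # not a busy error, or out of tries
            time.sleep(delay)              # broker says "try again in a minute"

# Simulated broker: busy for the first two calls, then succeeds.
calls = {"n": 0}
def fake_scale_up():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError(BUSY_MSG + " Please try again in a minute.")
    return "scaled"

result = scale_up_with_retry(fake_scale_up, delay=0.01)
print(result)
```

This matches the verification above: some of the 10 concurrent scale-ups fail with the busy message, but a simple retry completes them and MongoDB stays consistent with the gears on the nodes.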