Red Hat Bugzilla – Full Text Bug Listing
|Summary:||concurrent user actions result in inconsistencies in gear DB|
|Product:||Atomic Enterprise Platform and OpenShift Enterprise||Reporter:||Luke Meyer <lmeyer>|
|Component:||Kubernetes||Assignee:||Luke Meyer <lmeyer>|
|Status:||CLOSED ERRATA||QA Contact:||libra bugs <libra-bugs>|
|Version:||1.1.1||CC:||bhatiam, bleanhar, gpei, xjia|
|Fixed In Version:||Doc Type:||Bug Fix|
Cause: MongoDB access was not done in a way that always guaranteed consistency.
Consequence: If multiple alterations were performed on a user's application(s) concurrently, some of them could be overwritten (and thus lost) by others, leaving MongoDB inconsistent with the actual state of the gears on the node. The canonical example: if the same app was scaled up by two separate logins concurrently, one of the gears would not be known to MongoDB.
Fix: Distributed locking mechanisms were introduced with the DB schema and model refactor that went into OSE 1.2. Upgrade to OSE 1.2.
Result: User actions should be successfully queued for consistency.
|Last Closed:||2013-07-09 15:49:27 EDT||Type:||Bug|
|Bug Depends On:||855307|
Description Luke Meyer 2012-11-19 15:45:54 EST
Description of problem:
There are currently race conditions that can cause changes to MongoDB to be overwritten by concurrent changes to the same user's apps/gears. This may cause gears to exist on node hosts that are unreferenced by MongoDB, or vice versa.

Version-Release number of selected component (if applicable):
OSE 1.0

How reproducible:
In the upstream bug, this was reliably reproduced by manually triggering multiple concurrent scale-up events against a scaled app. There are probably other cases of concurrent user actions with similar results.

Additional info:
This sort of problem can be detected by regularly running the "oo-admin-chk" command on the broker. Administrative action will be required to adjust gear usage counts (oo-admin-ctl-user), remove phantom apps from MongoDB (oo-admin-ctl-app), or remove unreferenced gears from node hosts.
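To make the race described above concrete, here is a minimal sketch of a lost update. It is not the broker's code: the in-memory `db` dict, the `read_app`/`write_app` helpers, and the gear names are all hypothetical, and it assumes each request reads the whole application document, modifies its own copy, and writes the whole document back.

```python
import copy

# Hypothetical stand-in for the application's document in MongoDB.
db = {"app": {"gears": ["gear-1"]}}

def read_app():
    # Each request works on its own copy of the whole document.
    return copy.deepcopy(db["app"])

def write_app(doc):
    # Whole-document write: replaces whatever is currently stored.
    db["app"] = doc

# Two concurrent scale-up requests both read before either one writes.
doc_a = read_app()
doc_b = read_app()

doc_a["gears"].append("gear-2")  # request A creates gear-2 on a node
doc_b["gears"].append("gear-3")  # request B creates gear-3 on a node

write_app(doc_a)
write_app(doc_b)  # overwrites request A's change

gears_on_nodes = {"gear-1", "gear-2", "gear-3"}
gears_in_db = set(db["app"]["gears"])
unreferenced = gears_on_nodes - gears_in_db
print(sorted(unreferenced))  # ['gear-2']
```

Under these assumptions, gear-2 exists on a node but is no longer referenced by the database, which is the kind of inconsistency this bug describes.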
Comment 2 Gaoyun Pei 2013-03-27 03:06:30 EDT
Found one issue belonging to this bug against puddle: http://buildvm-devops.usersys.redhat.com/puddle/build/OpenShiftEnterprise/1.1.z/2013-03-21.1/

Description of problem:
After triggering 3 scale-up events at the same time, "oo-admin-chk" reports an inconsistency between the node and MongoDB.

How reproducible:
always

Steps to Reproduce:
1. Create a scalable app and disable auto-scaling
2. Trigger 3 scale-up events at the same time
    for i in `seq 1 3 `; do curl -k -X POST -H 'Accept: application/xml' -d event=scale-up --user firstname.lastname@example.org:redhat https://broker.rhn.com/broker/rest/domains/1010/applications/app/events & done
3. Run oo-admin-chk on the broker
    [root@broker ~]# oo-admin-chk
    Check failed.
    FAIL: user email@example.com has a mismatch in consumed gears (5) and actual gears (4)!
    Gear 2c61a7ebb19a4a68a7bd2c8b5454f298 exists on node [node1.rhn.com, uid:1154] but does not exist in mongo database

Actual results:
Some gears exist on the node but do not exist in MongoDB.
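For anyone who wants to monitor for this condition, the following is a rough sketch of how the failure text quoted above could be parsed. It assumes the "FAIL:" and "Gear ... exists on node" messages each appear on their own line, which this listing does not confirm; the function name `parse_chk_output` and the regular expressions are illustrative only and are not part of oo-admin-chk.

```python
import re

# Patterns modeled on the messages quoted in this comment (assumed line layout).
FAIL_RE = re.compile(r"^FAIL: (.*)$")
ORPHAN_RE = re.compile(
    r"^Gear (\w+) exists on node \[([^,\]]+), uid:(\d+)\] "
    r"but does not exist in mongo database$"
)

def parse_chk_output(text):
    """Return (failure messages, gears found on nodes but missing from MongoDB)."""
    failures, orphans = [], []
    for raw in text.splitlines():
        line = raw.strip()
        m = FAIL_RE.match(line)
        if m:
            failures.append(m.group(1))
            continue
        m = ORPHAN_RE.match(line)
        if m:
            orphans.append(
                {"gear": m.group(1), "node": m.group(2), "uid": int(m.group(3))}
            )
    return failures, orphans

sample = """Check failed.
FAIL: user email@example.com has a mismatch in consumed gears (5) and actual gears (4)!
Gear 2c61a7ebb19a4a68a7bd2c8b5454f298 exists on node [node1.rhn.com, uid:1154] but does not exist in mongo database
"""

failures, orphans = parse_chk_output(sample)
print(len(failures), orphans[0]["gear"], orphans[0]["node"], orphans[0]["uid"])
```

A cron job or monitoring check could feed the captured output of oo-admin-chk into such a function and alert when either list is non-empty.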
Comment 3 Gaoyun Pei 2013-03-27 03:14:54 EDT
Sometimes, after I trigger 3 or 5 scale-up events at the same time and then check the gear count of the scalable app via the REST API, the result does not match the actual number of gears on the nodes. QE would like to use this bug to track the multiple concurrent scale-up issue.
Comment 5 xjia 2013-05-02 21:02:28 EDT
Version: http://buildvm-devops.usersys.redhat.com/puddle/build/OpenShiftEnterprise/1.2/2013-05-02.1

Verify:
Scaled up 10 times: 4 failed, 6 succeeded. Each failed attempt reported:
    Application is currently busy performing another operation. Please try again in a minute.
In all cases, the data in MongoDB is consistent with the actual gear info.

[root@broker ~]# rhc apps
php @ http://php-jia1.osev2.com/ (uuid: 518309ae4052a73a05000006)
-----------------------------------------------------------------
  Created: 5:49 PM
  Gears:   6 (defaults to small)
  Git URL: ssh://firstname.lastname@example.org/~/git/php.git/
  SSH:     email@example.com

  php-5.3 (PHP 5.3)
  -----------------
    Scaling: x6 (minimum: 1, maximum: available) on small gears

  haproxy-1.4 (OpenShift Web Balancer)
  ------------------------------------
    Gears: Located with php-5.3

You have 1 applications

[root@broker ~]# oo-admin-chk -v
Started at: 2013-05-02 17:57:33 -0700
Time to fetch mongo data: 0.01s
Total gears found in mongo: 6
Time to get all gears from nodes: 20.277s
Total gears found on the nodes: 6
Total nodes that responded : 2
Checking application gears and ssh keys on corresponding nodes:
518309ae4052a73a05000006 : String... OK
51830a0a4052a73a05000028 : String... OK
51830a404052a73a05000035 : String... OK
51830a774052a73a05000042 : String... OK
51830ab14052a73a0500004f : String... OK
51830aef4052a7f30a000002 : String... OK
Checking node gears in application database:
51830a0a4052a73a05000028... OK
51830aef4052a7f30a000002... OK
51830a404052a73a05000035... OK
518309ae4052a73a05000006... OK
51830a774052a73a05000042... OK
51830ab14052a73a0500004f... OK
Success
Total time: 20.287s
Finished at: 2013-05-02 17:57:53 -0700
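The "busy" rejections seen in this verification are what one would expect from a per-application lock that is tried rather than waited on. The sketch below illustrates that behavior only; it uses an in-process `threading.Lock` as a stand-in for the distributed lock in OSE 1.2, and the `App` class, its `scale_up` method, and the `during` hook are hypothetical names, not the broker's implementation.

```python
import threading

BUSY_MESSAGE = (
    "Application is currently busy performing another operation. "
    "Please try again in a minute."
)

class App:
    def __init__(self):
        self.gears = ["gear-1"]
        # In-process stand-in for a distributed, per-application lock.
        self._lock = threading.Lock()

    def scale_up(self, during=None):
        # Try the lock without blocking; reject the request if it is held.
        if not self._lock.acquire(blocking=False):
            return BUSY_MESSAGE
        try:
            if during is not None:
                during()  # simulates another request arriving mid-operation
            self.gears.append(f"gear-{len(self.gears) + 1}")
            return "ok"
        finally:
            self._lock.release()

app = App()
results = []
# The second request arrives while the first still holds the lock.
first = app.scale_up(during=lambda: results.append(app.scale_up()))
results.append(first)

print(results)
print(app.gears)
```

In this sketch the overlapping request is rejected and only one gear is added, so the recorded gear list stays consistent with the operations that actually completed.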
Comment 7 errata-xmlrpc 2013-07-09 15:49:27 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2013-1031.html