Description of problem:
Essentially, there are currently race conditions that can cause changes to MongoDB to be overwritten by concurrent changes to the same user's apps/gears. This may leave gears on node hosts that are not referenced in MongoDB, or vice versa.

Version-Release number of selected component (if applicable):
OSE 1.0

How reproducible:
In the upstream bug, this was reliably reproduced by manually triggering multiple concurrent scale-up events against a scaled app. Other concurrent user actions probably have similar results.

Additional info:
This sort of problem can be detected by regularly running the "oo-admin-chk" command on the broker. Administrative action will be required to adjust gear usage counts (oo-admin-ctl-user), remove phantom apps from MongoDB (oo-admin-ctl-app), or remove unreferenced gears from node hosts.
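To illustrate the kind of race described above, here is a toy model of a lost update. It is not broker code. A temp file is a hypothetical stand-in for the consumed-gear count stored in MongoDB, and a `mkdir`-based lock is a hypothetical stand-in for whatever serialization the broker actually uses. Three concurrent unlocked read-modify-write updates usually leave the count below 3, while the locked version always reaches 3.

```shell
#!/usr/bin/env bash
# Toy model of a lost update (hypothetical; not broker code). A temp file
# stands in for the consumed-gear count that a user's record holds in MongoDB.
set -u

counter=$(mktemp)
lockdir="$counter.lock"

# Unsafe read-modify-write: read the count, pause to widen the race window,
# then write count + 1. Concurrent callers can overwrite each other's result.
scale_up_unlocked() {
  n=$(cat "$counter")
  sleep 0.2
  echo $((n + 1)) > "$counter"
}

# The same update serialized by a lock. mkdir is atomic, so only one caller
# at a time can create the lock directory; the others spin until it is removed.
scale_up_locked() {
  until mkdir "$lockdir" 2>/dev/null; do sleep 0.01; done
  n=$(cat "$counter")
  sleep 0.2
  echo $((n + 1)) > "$counter"
  rmdir "$lockdir"
}

# Run COUNT copies of an update function concurrently and print the final value.
run_concurrently() {
  fn=$1
  count=$2
  echo 0 > "$counter"
  for _ in $(seq 1 "$count"); do "$fn" & done
  wait
  cat "$counter"
}

unlocked=$(run_concurrently scale_up_unlocked 3)
locked=$(run_concurrently scale_up_locked 3)
echo "unlocked: $unlocked of 3 updates survived"
echo "locked:   $locked of 3 updates survived"
rm -f "$counter"
```

The unlocked result depends on scheduling. The `sleep` only makes the overlap very likely, which mirrors how the report's mismatch appeared under concurrent scale-up events.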
Found an issue related to this bug against puddle: http://buildvm-devops.usersys.redhat.com/puddle/build/OpenShiftEnterprise/1.1.z/2013-03-21.1/

Description of problem:
After triggering 3 scale-up events at the same time, "oo-admin-chk" reports an inconsistency between the node and MongoDB.

How reproducible:
always

Steps to Reproduce:
1. Create a scalable app and disable auto-scaling.
2. Trigger 3 scale-up events at the same time:
   for i in `seq 1 3`; do curl -k -X POST -H 'Accept: application/xml' -d event=scale-up --user gpei:redhat https://broker.rhn.com/broker/rest/domains/1010/applications/app/events & done
3. Run oo-admin-chk on the broker:

[root@broker ~]# oo-admin-chk
Check failed.
FAIL: user gpei has a mismatch in consumed gears (5) and actual gears (4)!
Gear 2c61a7ebb19a4a68a7bd2c8b5454f298 exists on node [node1.rhn.com, uid:1154] but does not exist in mongo database

Actual results:
Some gears exist on the node but do not exist in MongoDB.
Sometimes, after I trigger 3 or 5 scale-up events at the same time, the gear count for the scalable app reported by the REST API does not match the actual number of gears on the nodes. QE would like to use this bug to track the multiple concurrent scale-up issue.
Version: http://buildvm-devops.usersys.redhat.com/puddle/build/OpenShiftEnterprise/1.2/2013-05-02.1

Verify:
Scaled up 10 times: 4 requests failed and 6 succeeded. Each failed request returned:
Application is currently busy performing another operation. Please try again in a minute.

Either way, the data in MongoDB is consistent with the actual gear info:

[root@broker ~]# rhc apps
php @ http://php-jia1.osev2.com/ (uuid: 518309ae4052a73a05000006)
-----------------------------------------------------------------
Created: 5:49 PM
Gears: 6 (defaults to small)
Git URL: ssh://518309ae4052a73a05000006.com/~/git/php.git/
SSH: 518309ae4052a73a05000006.com

php-5.3 (PHP 5.3)
-----------------
Scaling: x6 (minimum: 1, maximum: available) on small gears

haproxy-1.4 (OpenShift Web Balancer)
------------------------------------
Gears: Located with php-5.3

You have 1 applications

[root@broker ~]# oo-admin-chk -v
Started at: 2013-05-02 17:57:33 -0700
Time to fetch mongo data: 0.01s
Total gears found in mongo: 6
Time to get all gears from nodes: 20.277s
Total gears found on the nodes: 6
Total nodes that responded : 2
Checking application gears and ssh keys on corresponding nodes:
518309ae4052a73a05000006 : String... OK
51830a0a4052a73a05000028 : String... OK
51830a404052a73a05000035 : String... OK
51830a774052a73a05000042 : String... OK
51830ab14052a73a0500004f : String... OK
51830aef4052a7f30a000002 : String... OK
Checking node gears in application database:
51830a0a4052a73a05000028... OK
51830aef4052a7f30a000002... OK
51830a404052a73a05000035... OK
518309ae4052a73a05000006... OK
51830a774052a73a05000042... OK
51830ab14052a73a0500004f... OK
Success
Total time: 20.287s
Finished at: 2013-05-02 17:57:53 -0700
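The verification result, where some requests are rejected as "busy" while the gear data stays consistent, can be modeled with a non-blocking lock. The following is a hypothetical sketch and not the broker's actual mechanism, which this report does not describe. A temp directory holds a stand-in gear count and a `mkdir`-based per-application lock. Any request that cannot take the lock is rejected immediately instead of racing, so the recorded gear count always equals the starting count plus the number of successful requests.

```shell
#!/usr/bin/env bash
# Toy model (hypothetical; not broker code): a request that cannot take the
# per-application lock is rejected instead of racing with the one in progress.
set -u

workdir=$(mktemp -d)
echo 1 > "$workdir/gears"     # stand-in for the app's recorded gear count
: > "$workdir/log"            # one result line per request

scale_up() {
  # mkdir is atomic: while one request holds the lock, the others fail here.
  if ! mkdir "$workdir/app.lock" 2>/dev/null; then
    echo "busy: Application is currently busy performing another operation."
    return 1
  fi
  n=$(cat "$workdir/gears")
  sleep 0.2                   # simulate the time a gear creation takes
  echo $((n + 1)) > "$workdir/gears"
  rmdir "$workdir/app.lock"
  echo "ok"
}

requests=5
for _ in $(seq 1 "$requests"); do scale_up >> "$workdir/log" & done
wait

succeeded=$(grep -c '^ok$' "$workdir/log")
rejected=$(grep -c '^busy:' "$workdir/log")
gears=$(cat "$workdir/gears")
echo "succeeded=$succeeded rejected=$rejected recorded gears=$gears"
rm -rf "$workdir"
```

The split between succeeded and rejected requests varies from run to run, much like the 6/4 split seen during verification. The invariant that matters is that the count never drifts from the number of completed operations.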
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2013-1031.html