Bug 974268
| Summary: | oo-accept-node fails when gears are created / deleted while it's running | ||
|---|---|---|---|
| Product: | OpenShift Online | Reporter: | Thomas Wiest <twiest> |
| Component: | Containers | Assignee: | Rob Millner <rmillner> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | libra bugs <libra-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 2.x | CC: | bmeng, mfisher, xtian, yadu |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2013-06-24 14:54:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Thomas Wiest
2013-06-13 19:39:44 UTC
Tested on my C9 node that was creating 4000 gears, 5 at a time.

https://github.com/openshift/origin-server/pull/2858

Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/f2e95067fba4b8c55120043f1318d5d9250769c3
Bug 974268 - Squash error messages for gears which have been created or destroyed while the accept-node script is run.

Tested on devenv_3368 with the following method:

    [root@ip-10-60-129-152 ~]# for i in `seq 1 100` ;do oo-app-create --with-app-uuid 123123$i --with-container-uuid 123123$i --with-namespace dom1 --with-app-name app$i & done

While the oo-app-create commands are running, use oo-accept-node to check for transient issues.

    [root@ip-10-60-129-152 ~]# oo-accept-node
    FAIL: user 12312351 does not have quotas imposed
    FAIL: user 12312381 does not have quotas imposed
    2 ERRORS
    [root@ip-10-60-129-152 ~]# oo-accept-node
    PASS

It reports the gear issue on the first run and PASS on the following try. Assigning the bug back.

Narrowed down the set of places where the user list and quotas can get out of sync. Also now using the lock file from unix_user.rb as another way to determine if a gear create/delete ran.

Used the above script, and its mirror image with oo-app-destroy, in a loop 10 times. The oo-accept-node script running in a loop no longer fails with the following pull request.

https://github.com/openshift/origin-server/pull/2867

Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/3b2d0950c41dc82436a89224f57b16773e042e80
Bug 974268 - Narrow the window where user and quota data can get out of sync and set the start time prior to any other collection. Deal with a race condition with the lock files in unix_user.

Checked on devenv_3375: oo-accept-node no longer reports errors for either oo-app-create or oo-app-destroy when multiple operations run in parallel. Moving bug to verified.
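
For readers unfamiliar with the approach described in the commits, the following is a minimal sketch of the general idea, not the actual oo-accept-node code. It assumes the check records a start time before collecting anything else, and that each gear carries a timestamp of its last create/delete activity (for example, taken from a lock file). The `Gear` class, its fields, and `check_quotas` are hypothetical names introduced only for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class Gear:
    """Hypothetical record for one gear; not the real oo-accept-node data model."""
    uuid: str
    has_quota: bool
    changed_at: float  # assumed: time of last create/delete activity, e.g. a lock file mtime


def check_quotas(gears, start_time):
    """Return FAIL messages, skipping gears that changed at or after start_time."""
    failures = []
    for gear in gears:
        if gear.changed_at >= start_time:
            # The gear was created or destroyed while the check ran, so its
            # user and quota data may be out of sync. Do not report it.
            continue
        if not gear.has_quota:
            failures.append(f"FAIL: user {gear.uuid} does not have quotas imposed")
    return failures


# Record the start time before collecting any other data.
start_time = time.time()
gears = [
    Gear("stable-ok", has_quota=True, changed_at=start_time - 3600),
    Gear("stable-bad", has_quota=False, changed_at=start_time - 3600),
    Gear("in-flight", has_quota=False, changed_at=start_time + 1),
]
failures = check_quotas(gears, start_time)
print("\n".join(failures) if failures else "PASS")
```

In this sketch the gear that changed after the start time is ignored, while a gear that was stable before the run and lacks a quota is still reported. That reflects the intent stated in the commit messages: suppress only the transient errors, not genuine ones.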