During the V2 rollout we began using a new migration framework that can leave metadata lying around in gears if there are bugs in its cleanup feature. This bug tracks adding detection of this metadata to oo-accept-node, so that operations can easily find affected gears when this happens.
PRs made to stage and master for the fix.
Disregard above comment.
Check for migration-complete markers and premigration state. Also check for V1 cartridge directories. The check should be disabled by default in oo-accept-node and enabled with a CLI option.
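A minimal sketch of the check described above, written in Python for illustration only; it is not the oo-accept-node implementation. The marker locations (`.migration*` under app-root/data/ and `.premigration_state` under app-root/runtime/) and the `ruby-1.8` directory name are taken from the test comments in this bug. The function name `check_gear` and the `V1_CARTRIDGE_DIRS` set are hypothetical.

```python
import os

# Assumption: only "ruby-1.8" appears in this bug; a real check would
# need the full list of V1 cartridge directory names.
V1_CARTRIDGE_DIRS = {"ruby-1.8"}


def check_gear(gear_home):
    """Return the failure messages for one gear home directory."""
    gear = os.path.basename(os.path.normpath(gear_home))
    failures = []

    # Migration data: any entry in app-root/data whose name starts with ".migration".
    data_dir = os.path.join(gear_home, "app-root", "data")
    if os.path.isdir(data_dir):
        if any(name.startswith(".migration") for name in os.listdir(data_dir)):
            failures.append(f"directory {gear} contains migration data")

    # Pre-migration state: a single marker file in app-root/runtime.
    marker = os.path.join(gear_home, "app-root", "runtime", ".premigration_state")
    if os.path.exists(marker):
        failures.append(f"directory {gear} contains pre-migration state")

    # V1 cartridge directories sit directly in the gear home.
    if any(os.path.isdir(os.path.join(gear_home, d)) for d in V1_CARTRIDGE_DIRS):
        failures.append(f"directory {gear} contains a V1 cartridge directory")

    return failures
```

Returning a list of messages instead of printing lets the caller decide whether to run the check at all, which matches the requirement that it stay off unless a CLI option enables it.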
Commit pushed to master at https://github.com/openshift/origin-server
https://github.com/openshift/origin-server/commit/a37873bc4191673b4cb4bca644a12d38c9c41e3f
Fix bug 971955: migration metadata check in oo-accept-node
Create oo-admin-repair-node utility
Tested on devenv_3360. Create an app and log in to the app via SSH.
1. Touch .premigration_state under app-root/runtime/
2. Mkdir .migration_test under app-root/data/
3. Run oo-accept-node --include-upgrade-checks from the server side.
   [root@ip-10-203-50-14 app-root]# oo-accept-node --include-upgrade-checks
   FAIL: directory 51bab44c1b9b5f097c000001 contains migration data
   FAIL: directory 51bab44c1b9b5f097c000001 contains pre-migration state
   2 ERRORS
4. Run oo-admin-repair-node
   [root@ip-10-203-50-14 app-root]# oo-admin-repair-node clean-upgrade
   Removing /var/lib/openshift/51bab44c1b9b5f097c000001/app-root/data/.migration
   Removing /var/lib/openshift/51bab44c1b9b5f097c000001/app-root/data/.migration_test1
   Removing /var/lib/openshift/51bab44c1b9b5f097c000001/app-root/runtime/.premigration_state
5. Run oo-accept-node again
   [root@ip-10-203-50-14 app-root]# oo-accept-node --include-upgrade-checks
   PASS
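The removals printed by the clean-upgrade run above could be produced by logic along these lines. This is a hedged Python sketch, not the oo-admin-repair-node source: the function name `clean_upgrade_metadata` is hypothetical, and the set of paths it removes is inferred only from the "Removing ..." output in this comment.

```python
import os
import shutil


def clean_upgrade_metadata(gear_home):
    """Remove leftover migration metadata from one gear; return removed paths."""
    candidates = []

    # Anything in app-root/data whose name starts with ".migration".
    data_dir = os.path.join(gear_home, "app-root", "data")
    if os.path.isdir(data_dir):
        for name in sorted(os.listdir(data_dir)):
            if name.startswith(".migration"):
                candidates.append(os.path.join(data_dir, name))

    # The pre-migration state marker in app-root/runtime.
    candidates.append(
        os.path.join(gear_home, "app-root", "runtime", ".premigration_state")
    )

    removed = []
    for path in candidates:
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)   # real directory: remove recursively
        elif os.path.lexists(path):
            os.remove(path)       # regular file or symlink
        else:
            continue              # nothing to clean for this candidate
        print(f"Removing {path}")
        removed.append(path)
    return removed
```

Running it a second time on the same gear removes nothing and returns an empty list, so it is safe to rerun after a partial cleanup.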
This issue has been reproduced on devenv_3382.
Steps to Reproduce:
1. Create one app.
2. SSH into the app.
   1) Create one file named '.premigration_state' in 'app-root/runtime/'.
   2) Create some files whose names begin with '.migration' in 'app-root/data/', like '.migration122', '.migration-343.test'.
   3) Create a directory named 'ruby-1.8' in the gear home directory.
3. Run 'oo-accept-node --include-upgrade-checks'.
Actual results:
   [root@ip-10-114-221-243 ~]# oo-accept-node --include-upgrade-checks
   PASS
Expected results: should show errors like:
   FAIL: directory 57d01c80d6fd11e2ac2f12313d23a556 contains migration data
   FAIL: directory 57d01c80d6fd11e2ac2f12313d23a556 contains pre-migration state
   FAIL: directory 57d01c80d6fd11e2ac2f12313d23a556 contains a V1 cartridge directory
Commit pushed to master at https://github.com/openshift/origin-server
https://github.com/openshift/origin-server/commit/9d79bb4fe510f49847f2cd5be1f09e89eee57027
Fix bug 971955: load users correctly from /etc/passwd
The problem was caused by another change, one meant to keep the accept-node script from failing on gears that were being modified while it was running. That change was not loading the user list correctly.
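To illustrate what loading the user list from /etc/passwd involves, here is a small Python sketch that parses passwd-format text and keeps only the users whose home directory is under the gear base directory. It is an assumption-laden illustration, not the committed fix: the function name `load_gear_users` is hypothetical, the base path /var/lib/openshift is taken from the test output earlier in this bug, and selecting gear users by home directory is an assumed criterion.

```python
def load_gear_users(passwd_text, gear_base="/var/lib/openshift"):
    """Map user name -> {"uid", "home"} for users homed under gear_base."""
    prefix = gear_base.rstrip("/") + "/"
    users = {}
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # passwd(5) format: name:password:UID:GID:GECOS:directory:shell
        fields = line.split(":")
        if len(fields) != 7:
            continue  # skip malformed entries instead of failing the whole run
        name, _password, uid, _gid, _gecos, home, _shell = fields
        if not uid.isdigit():
            continue
        if home.startswith(prefix):
            users[name] = {"uid": int(uid), "home": home}
    return users
```

In use, the text would come from reading /etc/passwd once at startup; a caller that wants to tolerate gears deleted mid-run could then check that each home directory still exists before inspecting it.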
Tested this issue on Devenv_3396; it has been fixed.
   [root@ip-10-145-181-217 data]# oo-accept-node --include-upgrade-checks
   FAIL: directory 51c3b601b2ef9b0a12000001 contains a V1 cartridge directory
   FAIL: directory 51c3b601b2ef9b0a12000001 contains migration data
   FAIL: directory 51c3b601b2ef9b0a12000001 contains pre-migration state
   3 ERRORS