Bug 971062 - rhc migration dies if something happens to /tmp on the node.
rhc migration dies if something happens to /tmp on the node.
Status: CLOSED CURRENTRELEASE
Product: OpenShift Online
Classification: Red Hat
Component: Containers (Show other bugs)
2.x
Unspecified Unspecified
medium Severity medium
: ---
: ---
Assigned To: Paul Morie
libra bugs
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-06-05 11:12 EDT by Wesley Hearn
Modified: 2015-05-14 19:20 EDT (History)
2 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-06-24 10:51:20 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
migration log (16.39 KB, text/plain)
2013-06-14 07:52 EDT, Xiaoli Tian
no flags Details

  None (edit)
Description Wesley Hearn 2013-06-05 11:12:19 EDT
Description of problem:
Migrating app 'lattedonatte' gear 'lattedonatte' with uuid 'dae104ef647746479f4d8128af444573' on node 'ex-std-node19.prod.rhcloud.com' for user: mattdherrick@gmail.com
Failed to migrate with cmd: '/usr/bin/rhc-admin-migrate --login 'mattdherrick@gmail.com' --migrate-gear 'dae104ef647746479f4d8128af444573' --app-name 'lattedonatte' --version '2.0.28'' after 2 tries with exception: Failed migrating gear. Rerun with: /usr/bin/rhc-admin-migrate --login 'mattdherrick@gmail.com' --migrate-gear 'dae104ef647746479f4d8128af444573' --app-name 'lattedonatte' --version '2.0.28'
["/usr/bin/rhc-admin-migrate:82:in `block in migrate_gear'", "/opt/rh/ruby193/root/usr/share/ruby/timeout.rb:69:in `timeout'", "/usr/bin/rhc-admin-migrate:51:in `migrate_gear'", "/usr/bin/rhc-admin-migrate:569:in `block in migrate_from_file'", "/usr/bin/rhc-admin-migrate:567:in `each'", "/usr/bin/rhc-admin-migrate:567:in `migrate_from_file'", "/usr/bin/rhc-admin-migrate:710:in `<main>'"]
Output:
Migrating gear on node with: /usr/bin/rhc-admin-migrate --login 'mattdherrick@gmail.com' --migrate-gear 'dae104ef647746479f4d8128af444573' --app-name 'lattedonatte' --version '2.0.28'
Migrating on node...
***time_migrate_on_node_measured_from_broker=666***
Migrate on node output:
 Beginning V1 -> V2 migration
Inspecting gear at /var/lib/openshift/dae104ef647746479f4d8128af444573
Checking for V1 stop lock in ["php-5.3"]
V1 stop lock not detected
Marking step migrate_stop_lock complete
Stopping gear with uuid 'dae104ef647746479f4d8128af444573' on node 'ex-std-node19'
Carts to migrate: {:framework_carts=>["php-5.3"], :db_carts=>["mysql-5.1"], :plugin_carts=>["phpmyadmin-3.4"]}
Migrating cartridge php
Caught an exception during internal migration steps: No such file or directory - getcwd
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.12/lib/openshift-origin-node/model/v2_cart_model.rb:466:in `chdir'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.12/lib/openshift-origin-node/model/v2_cart_model.rb:466:in `secure_cartridge'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.12/lib/openshift-origin-node/model/v2_cart_model.rb:459:in `create_cartridge_directory'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/lib/migrate.rb:573:in `block in migrate_cartridge'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.12/lib/openshift-origin-node/utils/cgroups.rb:70:in `with_no_cpu_limits'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/lib/migrate.rb:571:in `migrate_cartridge'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/lib/migrate.rb:521:in `block in migrate_cartridges'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/lib/migrate.rb:516:in `each'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/lib/migrate.rb:516:in `migrate_cartridges'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/lib/migrate.rb:182:in `migrate'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/libra.rb:68:in `migrate_action'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/rpc/agent.rb:86:in `handlemsg'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:126:in `block (2 levels) in dispatch'
/opt/rh/ruby193/root/usr/share/ruby/timeout.rb:69:in `timeout'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:125:in `block in dispatch'
Migrate on node exit code: 1

Restarting mcollective on the node fixes it and it will continue. We are unsure what is happening to /tmp and this has happened across several nodes
Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 1 Paul Morie 2013-06-05 23:00:51 EDT
PRs submitted.  Explanation of the fix will be forthcoming.
Comment 2 Paul Morie 2013-06-11 18:01:08 EDT
This bug got away from me - I believe that QE has already tested this even thought it has been in MODIFIED.  You should see log statements about changing the working directory.  You should be able to see the current working directory of the process during the migration, as well.
Comment 3 Xiaoli Tian 2013-06-14 07:51:04 EDT
Hi, Paul

Had tried migration from v1 to v2 on devenv-stage_370, did  not find any log about "Changing working directory to /tmp" in migration log ( attached in next comment), broker log, node log etc.

Is this pull request the only fix for this bug:

https://github.com/openshift/origin-server/pull/2764 ?

Seems there's only function defined, where will the function be called?

Or is there any special steps needed to reproduce this bug?

Thanks for confirmation.
Comment 4 Xiaoli Tian 2013-06-14 07:52:10 EDT
(In reply to xiaoli from comment #3)
> Hi, Paul
> 
> Had tried migration from v1 to v2 on devenv-stage_370, did  not find any log

I meant to say "devenv-stage_375"
Comment 5 Xiaoli Tian 2013-06-14 07:52:47 EDT
Created attachment 761227 [details]
migration log

Migration log
Comment 6 Paul Morie 2013-06-14 09:54:01 EDT
xioli-

The before_processing_hook is an mcollective hook which is called by mcollective before the agent begins processing a request.  The log message is present in the mcollective log, rather than the migration log.
Comment 7 Xiaoli Tian 2013-06-17 03:10:49 EDT
Thanks for the information, Paul.

Verified it on devenv-stage_376, while running migration:

[root@ip-10-12-193-245 log]# cat mcollective.log |grep "Changing working directory"
D, [2013-06-15T13:45:54.480902 #9874] DEBUG -- : openshift.rb:35:in `before_processing_hook' Changing working directory to /tmp
D, [2013-06-17T02:48:43.838891 #4413] DEBUG -- : openshift.rb:35:in `before_processing_hook' Changing working directory to /tmp
D, [2013-06-17T02:57:31.431934 #4413] DEBUG -- : openshift.rb:35:in `before_processing_hook' Changing working directory to /tmp
D, [2013-06-17T02:57:39.963948 #4413] DEBUG -- : openshift.rb:35:in `before_processing_hook' Changing working directory to /tmp
D, [2013-06-17T02:57:49.149920 #4413] DEBUG -- : openshift.rb:35:in `before_processing_hook' Changing working directory to /tmp
D, [2013-06-17T02:57:50.236388 #4413] DEBUG -- : openshift.rb:35:in `before_processing_hook' Changing working directory to /tmp
D, [2013-06-17T02:57:51.268293 #4413] DEBUG -- : openshift.rb:35:in `before_processing_hook' Changing working directory to /tmp
D, [2013-06-17T02:57:53.810899 #4413] DEBUG -- : openshift.rb:35:in `before_processing_hook' Changing working directory to /tmp
D, [2013-06-17T02:58:04.509108 #4413] DEBUG -- : openshift.rb:35:in `before_processing_hook' Changing working directory to /tmp

Note You need to log in before you can comment on or make changes to this bug.