Description of problem:

Sometimes a 'git push' works flawlessly and sometimes it hangs indefinitely. It seems to hang more often when using a scaled application with more than one gear. Here's an example of the output when pushing new code to my scaled application with 3 gears:

[bbrowning@f19 ~/tmp/tb3cluster]$ git push
Counting objects: 9, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (5/5), 469 bytes | 0 bytes/s, done.
Total 5 (delta 3), reused 0 (delta 0)
remote: Stopping jbossas cart
remote: Sending SIGTERM to jboss:29004 ...
remote: CLIENT_RESULT: HAProxy instance is stopped
remote: CLIENT_RESULT: HAProxy instance is started
remote: SSH_CMD: ssh 5200fad44382ec95410000d8.62.29
remote: SSH_CMD: ssh 52010024e0b8cd75a4000050.249.97

The command hung at this point for over 30 minutes before I decided to CTRL+C it. Afterwards, git thinks the commit didn't get pushed to the remote, but in fact it did: I see my configuration change (a change to .openshift/config/standalone.xml) reflected in the running gears. All 3 gears have my pushed code and were restarted, despite what git and the hung 'git push' think.

Version-Release number of selected component (if applicable):

How reproducible:
Intermittent; seems more likely with scaled applications.

Steps to Reproduce:
1. Create a scaled application with 3 gears.
2. Make a code change and run 'git push'.
3. Observe whether the push hangs after the "remote: SSH_CMD" lines.

Actual results:
'git push' sometimes hangs indefinitely, even though the code is deployed to all gears and they restart.

Expected results:
'git push' returns once the deploy completes.

Additional info:
Hi Ben,

Have you added any action hooks for the application?

Thanks,
Mrunal
Yes, my application has a pre_start_jbossas-7 hook - https://github.com/openshift-quickstart/torquebox-quickstart/blob/master/.openshift/action_hooks/pre_start_jbossas-7 - that calls into https://github.com/openshift-quickstart/torquebox-quickstart/blob/master/.openshift/torquebox.sh Is there a specific exit code the hooks need to return for things to progress smoothly?
This may not be the issue, but could help: https://github.com/openshift-quickstart/torquebox-quickstart/blob/master/.openshift/action_hooks/pre_start_jbossas-7#L22 TORQUEBOX_HOME isn't set in this file, so torquebox is getting installed on every pre_start if I am reading the code correctly, which could slow down the deploy cycle.
TORQUEBOX_HOME is set via the sourced https://github.com/openshift-quickstart/torquebox-quickstart/blob/master/.openshift/torquebox.sh The pre_start_jbossas-7 script is executing fine and the new scaled up gear is booting and deploying the application fine. Everything's working great except the 'rhc scale-cartridge' command just hangs instead of exiting when things are scaled up.
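For reference, the once-per-gear install pattern being discussed looks roughly like this. This is a hypothetical sketch, not the quickstart's actual code; TORQUEBOX_HOME, the message strings, and the commented-out download step are placeholders:

```shell
#!/bin/bash
# Sketch of a pre_start hook fragment that installs TorqueBox at most
# once per gear. TORQUEBOX_HOME would normally come from the sourced
# torquebox.sh; the default here is only for illustration.
TORQUEBOX_HOME="${TORQUEBOX_HOME:-$HOME/torquebox}"

install_torquebox() {
    # A previous git push already installed TorqueBox on this gear:
    # skip the download entirely so the deploy stays fast.
    if [ -d "$TORQUEBOX_HOME" ]; then
        echo "TorqueBox already installed at $TORQUEBOX_HOME, skipping"
        return 0
    fi
    echo "Installing TorqueBox into $TORQUEBOX_HOME"
    mkdir -p "$TORQUEBOX_HOME"
    # A real hook would download and unzip the distribution here, e.g.:
    # curl -L "$TORQUEBOX_DIST_URL" -o /tmp/tb.zip && unzip /tmp/tb.zip -d "$TORQUEBOX_HOME"
}

install_torquebox
```

With this guard, only the first push to a gear pays the download/unzip cost; subsequent pushes hit the early return.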
The steps to download and unzip TorqueBox, followed by the bundle install, are adding to the long deploy times here. Did your rhc scale-cartridge commands eventually return?
Sometimes the command returns and sometimes it hangs indefinitely. I've waited more than 30 minutes on several occasions just to see if it would ever return, and it has not. The weird thing is that, as far as I can tell, the new code does get pushed to all the gears and everything restarts and comes back up fine. The 'git push' just doesn't return.
I tried this a few times yesterday but wasn't able to reproduce the git push hanging. I will keep on trying to see if we can get anything.
*** Bug 993248 has been marked as a duplicate of this bug. ***
I retried this several times yesterday. It did exceed 240 seconds, but the gears did eventually come up. I suggest converting the quickstart into a cartridge using the CDK. Also, the code in pre_start seems to be re-installing TorqueBox on every run, slowing down git pushes/scale-ups. As for the git push hanging, I wasn't able to reproduce that; I am wondering if it has something to do with your ssh/git settings. I am lowering severity on this one since it should not block the release.
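On the ssh-settings theory: one low-cost way to rule out a silently dropped ssh connection on the client side is to enable keepalives for the push. This is only a diagnostic suggestion, not a confirmed fix for this bug, and the `*.rhcloud.com` host pattern is an assumption about the application's domain:

```
# ~/.ssh/config on the machine running 'git push'
Host *.rhcloud.com
    ServerAliveInterval 30
    ServerAliveCountMax 4
```

If the connection is being dropped by an intermediate device, these settings should make the push fail with an explicit error instead of hanging forever, which would at least narrow down where the hang lives.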
I've seen a hang during scaling up with the Immutant cartridge as well (https://github.com/immutant/openshift-immutant-cart), which is very similar to my quickstart but converted to a cartridge. Thus, I don't think converting to a cartridge will help anything. The code does not install TorqueBox on every git push - it only does so once per gear and never again for that gear.
Also, regarding the 240 seconds mentioned - is that a configurable timeout that could just be increased?
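One other possibility worth checking, though not confirmed from anything in this bug: a hook that starts a long-running process in the background without detaching it. A child that inherits the hook's stdout/stderr can keep the ssh channel open after the hook itself exits, which shows up exactly as a 'git push' that hangs even though the deploy succeeded. A hypothetical illustration (the command being detached is a stand-in, not code from the quickstart):

```shell
#!/bin/bash
# If a pre_start hook launches an app server in the background, the
# inherited file descriptors can hold the push connection open until the
# server exits. Redirecting all three streams and disowning the job lets
# 'git push' return as soon as the hook finishes.

start_detached() {
    # "$@" is the long-running command. nohup plus full redirection
    # ensures nothing keeps the ssh channel open; disown drops the job
    # from this shell's job table.
    nohup "$@" > /tmp/hook-detached.log 2>&1 < /dev/null &
    disown 2>/dev/null || true
}

# Stand-in for starting an app server from a hook; 'sleep 60' is a
# placeholder for the real long-running process.
start_detached sleep 60
```

The problematic form would be a bare `./some_server.sh &` with no redirection, so the child keeps the hook's stdout open for as long as it runs.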