Description of problem:

Packstack freezes with this output:

    ...
    Copying Puppet modules and manifests...          [ DONE ]
    Applying Puppet manifests...
    Applying 172.16.0.16_prescript.pp
    Testing if puppet apply is finished : 172.16.0.16_prescript.pp  [ | ]

Waiting for even an hour doesn't help.

This seems to happen on two of our OpenStack systems, where we run packstack in a VM (installing OpenStack inside OpenStack). Somehow, it doesn't happen on a third one. We thought slow I/O might be the cause, but the problem was later reproduced on a bare-metal multi-node system (ask oblaut).

Version-Release number of selected component (if applicable):

Reproduced with RHOS3 on RHEL-6.4 in a VM, and with RDO Havana on both RHEL-6.4 and RHEL-6.5 in a VM. Oddly enough, it never happened on one of our images that had some packages pre-installed for RHOS3. Reproduced on both kernel 2.6.32-420.el6.x86_64 and kernel 2.6.32-358.118.1.openstack.el6.x86_64.

How reproducible:

Once it starts happening on a system, it seems to happen every time.

Steps to Reproduce:
1. packstack --allinone, but only on some systems

Additional info:

The problem seems to be in /usr/lib/python2.6/site-packages/packstack/puppet/modules/packstack/lib/facter/netns.py:

    117    # Check that iptables filtering and save/restore can be performed
    118    saved_iptables_state = e('%(ns1)s iptables-save').stdout.read()
    119    e('%(ns1)s iptables -A INPUT -p icmp --icmp-type 8 -j DROP')
    120    e('%(ns2)s ping -c 1 -w 1 %(address1_1)s', return_code=1)
    121    e('%(ns1)s iptables-restore', input=saved_iptables_state)
    122    e('%(ns2)s ping -c 1 -w 1 %(address1_1)s')

It gets stuck on line #121, where iptables-restore somehow doesn't get any input. Yet when I added a check to that file (whether saved_iptables_state is empty) before running packstack, it didn't get stuck... for the first time on that system. I'm guessing it could be some timing issue.

    # ps -ef|tail
    root 4595 2 0 16:25 ?     00:00:00 [flush-7:0]
    root 4777 3759 0 16:26 ?  00:00:00 sshd: root@pts/1
    root 4781 4777 0 16:26 pts/1  00:00:00 -bash
    root 5011 1 0 16:27 ?     00:00:00 bash -x
    root 5013 5011 0 16:27 ?  00:00:00 flock /var/tmp/packstack/b362fac0a09f4eb689fdc3ad0410747c/ps.lock puppet apply --modulepath /var/tmp/packstack/b362fac0a09f4eb689fdc3ad0410747c/modules /var/tmp/packstack/b362fac0a09f4eb689fdc3ad0410747c/manifests/172.16.0.15_prescript.pp
    root 5015 5013 0 16:27 ?  00:00:00 /usr/bin/ruby /usr/bin/puppet apply --modulepath /var/tmp/packstack/b362fac0a09f4eb689fdc3ad0410747c/modules /var/tmp/packstack/b362fac0a09f4eb689fdc3ad0410747c/manifests/172.16.0.15_prescript.pp
    root 5175 5015 0 16:27 ?  00:00:00 python /var/tmp/packstack/b362fac0a09f4eb689fdc3ad0410747c/modules/packstack/lib/facter/netns.py
    root 5209 5175 0 16:27 ?  00:00:00 iptables-restore
    root 5340 4781 2 16:28 pts/1  00:00:00 ps -ef
    root 5341 4781 0 16:28 pts/1  00:00:00 tail
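The hang pattern visible in the ps output above can be demonstrated with a minimal sketch (hypothetical illustration, not packstack code): a child process that reads stdin blocks forever if the parent leaves the stdin pipe open and merely waits, whereas closing stdin (for example via communicate()) lets the child see EOF and exit.

```python
import subprocess

# 'cat' stands in here for iptables-restore: both read stdin until EOF.
proc = subprocess.Popen(['cat'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE)

# Calling proc.wait() at this point would block forever: 'cat' keeps
# waiting for input on the still-open stdin pipe, exactly like the
# stuck iptables-restore process in the listing above.

# communicate() writes the (empty) input, closes stdin so the child
# sees EOF, and drains stdout, so it returns immediately.
out, _ = proc.communicate(input=b'')
print(proc.returncode)  # 0
```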
This problem can be worked around by terminating the installation when it's stuck and running it again.
Attached output from my setup: http://pastebin.test.redhat.com/168849

My setup includes 4 hosts:
- 1 controller
- 1 network controller
- 2 compute hosts

The hosts are installed with RHEL 6.5 and RHOS 4.0 puddle 10.3.

Output from 3 of the 4 hosts:

    [root@puma04 ~]# ps -ef|tail
    root 26170 1 0 16:38 ?      00:00:01 /usr/bin/ruby /usr/bin/puppet agent
    root 26469 5523 0 16:50 ?   00:00:00 sshd: root@pts/1
    root 26472 26469 0 16:51 pts/1  00:00:00 -bash
    root 26489 26472 0 16:53 pts/1  00:00:09 /usr/bin/python /usr/bin/packstack --answer-file=RHOS-Controller-OVS-Neutron
    root 26645 2 0 16:54 ?      00:00:00 [loop0]
    root 26765 1 0 16:57 ?      00:00:00 bash -x
    root 26766 26765 0 16:57 ?  00:00:00 flock /var/tmp/packstack/d082c52135bd4fdfad7b95e9e6b8fe08/ps.lock puppet apply --modulepath /var/tmp/packstack/d082c52135bd4fdfad7b95e9e6b8fe08/modules /var/tmp/packstack/d082c52135bd4fdfad7b95e9e6b8fe08/manifests/10.35.160.17_prescript.pp
    root 26767 26766 0 16:57 ?  00:00:00 /usr/bin/ruby /usr/bin/puppet apply --modulepath /var/tmp/packstack/d082c52135bd4fdfad7b95e9e6b8fe08/modules /var/tmp/packstack/d082c52135bd4fdfad7b95e9e6b8fe08/manifests/10.35.160.17_prescript.pp
    root 26850 26767 0 16:57 ?  00:00:00 python /var/tmp/packstack/d082c52135bd4fdfad7b95e9e6b8fe08/modules/packstack/lib/facter/netns.py
    root 26895 26850 0 16:57 ?  00:00:00 iptables-restore

    [root@puma34 ~]# ps -ef|tail
    root 5690 1 0 15:00 ttyS0   00:00:00 /sbin/agetty /dev/ttyS0 115200 vt100-nav
    root 5697 581 0 15:00 ?     00:00:00 /sbin/udevd -d
    root 5698 581 0 15:00 ?     00:00:00 /sbin/udevd -d
    ntp 6019 1 0 15:01 ?        00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
    root 26052 1 0 16:38 ?      00:00:01 /usr/bin/ruby /usr/bin/puppet agent
    root 26432 1 0 16:57 ?      00:00:00 bash -x
    root 26433 26432 0 16:57 ?  00:00:00 flock /var/tmp/packstack/7f25274dbeff4d2a907de85fda288ab6/ps.lock puppet apply --modulepath /var/tmp/packstack/7f25274dbeff4d2a907de85fda288ab6/modules /var/tmp/packstack/7f25274dbeff4d2a907de85fda288ab6/manifests/10.35.160.77_prescript.pp
    root 26434 26433 0 16:57 ?  00:00:00 /usr/bin/ruby /usr/bin/puppet apply --modulepath /var/tmp/packstack/7f25274dbeff4d2a907de85fda288ab6/modules /var/tmp/packstack/7f25274dbeff4d2a907de85fda288ab6/manifests/10.35.160.77_prescript.pp
    root 26590 26434 0 16:57 ?  00:00:00 python /var/tmp/packstack/7f25274dbeff4d2a907de85fda288ab6/modules/packstack/lib/facter/netns.py
    root 26626 26590 0 16:57 ?  00:00:00 iptables-restore

    [root@puma40 ~]# ps -ef|tail
    root 5695 1 0 15:00 tty6    00:00:00 /sbin/mingetty /dev/tty6
    root 5702 586 0 15:00 ?     00:00:00 /sbin/udevd -d
    root 5703 586 0 15:00 ?     00:00:00 /sbin/udevd -d
    ntp 6021 1 0 15:00 ?        00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
    root 26053 1 0 16:38 ?      00:00:01 /usr/bin/ruby /usr/bin/puppet agent
    root 26433 1 0 16:57 ?      00:00:00 bash -x
    root 26434 26433 0 16:57 ?  00:00:00 flock /var/tmp/packstack/35cc4280df0448cfad4cefd33f14b383/ps.lock puppet apply --modulepath /var/tmp/packstack/35cc4280df0448cfad4cefd33f14b383/modules /var/tmp/packstack/35cc4280df0448cfad4cefd33f14b383/manifests/10.35.160.89_prescript.pp
    root 26435 26434 0 16:57 ?  00:00:00 /usr/bin/ruby /usr/bin/puppet apply --modulepath /var/tmp/packstack/35cc4280df0448cfad4cefd33f14b383/modules /var/tmp/packstack/35cc4280df0448cfad4cefd33f14b383/manifests/10.35.160.89_prescript.pp
    root 26648 26435 0 16:57 ?  00:00:00 python /var/tmp/packstack/35cc4280df0448cfad4cefd33f14b383/modules/packstack/lib/facter/netns.py
    root 26684 26648 0 16:57 ?  00:00:00 iptables-restore
(In reply to Ofer Blaut from comment #3) > Attached output from my setup http://pastebin.test.redhat.com/168849 Reminder: please don't put internal URLs in public comments.
Ofer, is it a regression? A test blocker? Any known workarounds? If so, please set the proper flags.
Hi Haim,

I didn't experience the problem before. According to Martina it happens with RHOS3 on RHEL-6.4, so it is not a regression from her POV. Still, it blocks me. I don't know how Martina solved it on her setups (nor Dafna on hers).
The submitted patch should fix this issue. I could not reproduce the problem myself, so the patch is based only on my analysis. It would be good if you could apply it to your packstack installation (you will find the file in /usr/lib/python2.[67]/site-packages/packstack/puppet/modules/packstack/lib/facter/netns.py) and comment either here or in gerrit on whether the problem persists.
The problem is that execute() uses proc.communicate() whenever input is given. communicate() also reads and drains stdout and stderr, so a later reader can find them empty, depending on timing.

    def execute(cmd_string, check_error=True, return_code=0, input=None,
                block=True, error_msg='Error executing cmd'):
        print cmd_string
        cmd = cmd_string.split(' ')
        proc = subprocess.Popen(cmd,
                                stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        if input:
            proc.communicate(input=input)
                 ^^^^^^^^^^^
        elif block:
            proc.wait()

This means that saved_iptables_state can be non-deterministically empty:

    # Check that iptables filtering and save/restore can be performed
    saved_iptables_state = e('%(ns1)s iptables-save').stdout.read()
                                                      ^^^^^^^^^^^^^
    e('%(ns1)s iptables -A INPUT -p icmp --icmp-type 8 -j DROP')
    e('%(ns2)s ping -c 1 -w 1 %(address1_1)s', return_code=1)

which will cause the following to block forever:

    e('%(ns1)s iptables-restore', input=saved_iptables_state)
    e('%(ns2)s ping -c 1 -w 1 %(address1_1)s')
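One way execute() could avoid both pitfalls (a hedged sketch of the general shape of such a fix, not necessarily what the submitted patch does) is to always drain the child through communicate(): that closes stdin, so a tool like iptables-restore sees EOF and exits even on empty input, and it returns the captured output directly instead of leaving it in a pipe for the caller to read.

```python
import subprocess

def execute(cmd_string, check_error=True, return_code=0, input=None,
            block=True, error_msg='Error executing cmd'):
    # Parameter names mirror the snippet above; the body is an
    # illustration of the communicate()-based approach, not the patch.
    print(cmd_string)
    cmd = cmd_string.split(' ')
    proc = subprocess.Popen(cmd,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    if block or input is not None:
        # communicate() writes input (if any), closes stdin so the child
        # sees EOF, and drains stdout/stderr without deadlocking.
        out, err = proc.communicate(input=input)
        if check_error and proc.returncode != return_code:
            raise RuntimeError('%s: %s' % (error_msg, err))
        return out
    return proc
```

Callers would then take the returned output directly, e.g. saved_iptables_state = execute('iptables-save'), instead of calling .stdout.read() on the process object, and an empty saved state could no longer hang iptables-restore.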
Jardo, the cause is known already, but thanks for repeating it. I was asking Martina to test my fix, because I wasn't able to reproduce it.
The patch seems to be working. So far I've only tried it twice (with RHOS3 and RHOS4); I'll need a few more runs before I'd consider it verified, due to its non-deterministic nature.
Adding to beta milestone for MODIFIED/ON-QA bugs
Bug solved in the latest build; please move the bug to ON_QA.
Auto qa_ack+ for VERIFIED bugs so they can be added to RC erratum
I already wrote the bug text but I agree with Alan that it shouldn't be required.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2013-1859.html