Bug 1016773 - Packstack gets stuck on prescript.pp
Summary: Packstack gets stuck on prescript.pp
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-packstack
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: beta
Target Release: 4.0
Assignee: Maru Newby
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On:
Blocks: 1029403
 
Reported: 2013-10-08 16:29 UTC by Martina Kollarova
Modified: 2014-10-07 00:16 UTC

Fixed In Version: openstack-packstack-2013.2.1-0.7.dev806.el6ost
Doc Type: Bug Fix
Doc Text:
Cause: The netns compatibility check attempts to read data from stdin before it is available.
Consequence: Packstack gets stuck trying to apply prescript.pp
Fix: Update the netns compatibility check to safely read data from stdin.
Result: Packstack no longer hangs when applying prescript.pp
Clone Of:
: 1029403 (view as bug list)
Environment:
Last Closed: 2013-12-20 00:26:40 UTC
Target Upstream Version:
Embargoed:


Links
OpenStack gerrit 50615 (status: None, last updated: never)
Red Hat Product Errata RHEA-2013:1859 (status: SHIPPED_LIVE, Red Hat Enterprise Linux OpenStack Platform Enhancement Advisory, last updated: 2013-12-21 00:01:48 UTC)

Internal Links: 1096516

Description Martina Kollarova 2013-10-08 16:29:51 UTC
Description of problem:

Packstack freezes with this output:

...
Copying Puppet modules and manifests...                [ DONE ]
Applying Puppet manifests...
Applying 172.16.0.16_prescript.pp
Testing if puppet apply is finished : 172.16.0.16_prescript.pp [ | ]

Waiting even an hour doesn't help.

This seems to happen on 2 of our OpenStack systems, where we run packstack in a VM (installing OpenStack inside OpenStack). Somehow, it doesn't happen on a third one... We thought slow I/O might be the cause, but it was later reproduced on a bare-metal multi-node system (ask oblaut).

Version-Release number of selected component (if applicable):
Reproduced with RHOS 3 on RHEL 6.4 in a VM, and with RDO Havana on both RHEL 6.4 and 6.5 in a VM. Oddly enough, this never happened on one of our images that had some packages pre-installed for RHOS 3.

Reproduced on both kernel 2.6.32-420.el6.x86_64 and 2.6.32-358.118.1.openstack.el6.x86_64.

How reproducible:
Once it starts happening on a system, it seems to happen every time.

Steps to Reproduce:
1. packstack --allinone, but only on some systems


Additional info:

The problem seems to be in the file
/usr/lib/python2.6/site-packages/packstack/puppet/modules/packstack/lib/facter/netns.py


117 # Check that iptables filtering and save/restore can be performed
118 saved_iptables_state = e('%(ns1)s iptables-save').stdout.read()
119 e('%(ns1)s iptables -A INPUT -p icmp --icmp-type 8 -j DROP')
120 e('%(ns2)s ping -c 1 -w 1 %(address1_1)s', return_code=1)
121 e('%(ns1)s iptables-restore', input=saved_iptables_state)
122 e('%(ns2)s ping -c 1 -w 1 %(address1_1)s')

It gets stuck on line #121, where iptables-restore somehow doesn't get any input.

Yet when I added a check to that file for whether saved_iptables_state is empty before running packstack, it didn't get stuck, for the first time on that system. I'm guessing it could be some timing issue.
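
For illustration only (this is not the actual fix), a guard like the following before the iptables-restore call, using the e() helper from netns.py, would at least turn the silent hang into a visible failure when the saved state comes back empty:

    # Illustration only: fail loudly instead of hanging if iptables-save
    # produced no output; 'e' is the execute() helper defined in netns.py.
    if not saved_iptables_state:
        raise RuntimeError('iptables-save returned no output; '
                           'aborting netns compatibility check')
    e('%(ns1)s iptables-restore', input=saved_iptables_state)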


# ps -ef|tail
root      4595     2  0 16:25 ?        00:00:00 [flush-7:0]
root      4777  3759  0 16:26 ?        00:00:00 sshd: root@pts/1 
root      4781  4777  0 16:26 pts/1    00:00:00 -bash
root      5011     1  0 16:27 ?        00:00:00 bash -x
root      5013  5011  0 16:27 ?        00:00:00 flock /var/tmp/packstack/b362fac0a09f4eb689fdc3ad0410747c/ps.lock puppet apply --modulepath /var/tmp/packstack/b362fac0a09f4eb689fdc3ad0410747c/modules /var/tmp/packstack/b362fac0a09f4eb689fdc3ad0410747c/manifests/172.16.0.15_prescript.pp
root      5015  5013  0 16:27 ?        00:00:00 /usr/bin/ruby /usr/bin/puppet apply --modulepath /var/tmp/packstack/b362fac0a09f4eb689fdc3ad0410747c/modules /var/tmp/packstack/b362fac0a09f4eb689fdc3ad0410747c/manifests/172.16.0.15_prescript.pp
root      5175  5015  0 16:27 ?        00:00:00 python /var/tmp/packstack/b362fac0a09f4eb689fdc3ad0410747c/modules/packstack/lib/facter/netns.py
root      5209  5175  0 16:27 ?        00:00:00 iptables-restore
root      5340  4781  2 16:28 pts/1    00:00:00 ps -ef
root      5341  4781  0 16:28 pts/1    00:00:00 tail

Comment 1 Martina Kollarova 2013-10-08 16:34:16 UTC
This problem can be worked around by terminating the installation when it's stuck and running it again.

Comment 3 Ofer Blaut 2013-10-08 17:36:51 UTC
Attached output from my setup http://pastebin.test.redhat.com/168849

My setup includes 4 hosts
- 1 controller 
- 1 network controller
- 2 compute hosts

The hosts are installed with RHEL 6.5 and RHOS 4.0 puddle 10.3.


Output from 3 of the 4 hosts:


[root@puma04 ~]# ps -ef|tail
root     26170     1  0 16:38 ?        00:00:01 /usr/bin/ruby /usr/bin/puppet agent
root     26469  5523  0 16:50 ?        00:00:00 sshd: root@pts/1 
root     26472 26469  0 16:51 pts/1    00:00:00 -bash
root     26489 26472  0 16:53 pts/1    00:00:09 /usr/bin/python /usr/bin/packstack --answer-file=RHOS-Controller-OVS-Neutron
root     26645     2  0 16:54 ?        00:00:00 [loop0]
root     26765     1  0 16:57 ?        00:00:00 bash -x
root     26766 26765  0 16:57 ?        00:00:00 flock /var/tmp/packstack/d082c52135bd4fdfad7b95e9e6b8fe08/ps.lock puppet apply --modulepath /var/tmp/packstack/d082c52135bd4fdfad7b95e9e6b8fe08/modules /var/tmp/packstack/d082c52135bd4fdfad7b95e9e6b8fe08/manifests/10.35.160.17_prescript.pp
root     26767 26766  0 16:57 ?        00:00:00 /usr/bin/ruby /usr/bin/puppet apply --modulepath /var/tmp/packstack/d082c52135bd4fdfad7b95e9e6b8fe08/modules /var/tmp/packstack/d082c52135bd4fdfad7b95e9e6b8fe08/manifests/10.35.160.17_prescript.pp
root     26850 26767  0 16:57 ?        00:00:00 python /var/tmp/packstack/d082c52135bd4fdfad7b95e9e6b8fe08/modules/packstack/lib/facter/netns.py
root     26895 26850  0 16:57 ?        00:00:00 iptables-restore



[root@puma34 ~]# ps -ef|tail
root      5690     1  0 15:00 ttyS0    00:00:00 /sbin/agetty /dev/ttyS0 115200 vt100-nav
root      5697   581  0 15:00 ?        00:00:00 /sbin/udevd -d
root      5698   581  0 15:00 ?        00:00:00 /sbin/udevd -d
ntp       6019     1  0 15:01 ?        00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
root     26052     1  0 16:38 ?        00:00:01 /usr/bin/ruby /usr/bin/puppet agent
root     26432     1  0 16:57 ?        00:00:00 bash -x
root     26433 26432  0 16:57 ?        00:00:00 flock /var/tmp/packstack/7f25274dbeff4d2a907de85fda288ab6/ps.lock puppet apply --modulepath /var/tmp/packstack/7f25274dbeff4d2a907de85fda288ab6/modules /var/tmp/packstack/7f25274dbeff4d2a907de85fda288ab6/manifests/10.35.160.77_prescript.pp
root     26434 26433  0 16:57 ?        00:00:00 /usr/bin/ruby /usr/bin/puppet apply --modulepath /var/tmp/packstack/7f25274dbeff4d2a907de85fda288ab6/modules /var/tmp/packstack/7f25274dbeff4d2a907de85fda288ab6/manifests/10.35.160.77_prescript.pp
root     26590 26434  0 16:57 ?        00:00:00 python /var/tmp/packstack/7f25274dbeff4d2a907de85fda288ab6/modules/packstack/lib/facter/netns.py
root     26626 26590  0 16:57 ?        00:00:00 iptables-restore



[root@puma40 ~]# ps -ef|tail
root      5695     1  0 15:00 tty6     00:00:00 /sbin/mingetty /dev/tty6
root      5702   586  0 15:00 ?        00:00:00 /sbin/udevd -d
root      5703   586  0 15:00 ?        00:00:00 /sbin/udevd -d
ntp       6021     1  0 15:00 ?        00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
root     26053     1  0 16:38 ?        00:00:01 /usr/bin/ruby /usr/bin/puppet agent
root     26433     1  0 16:57 ?        00:00:00 bash -x
root     26434 26433  0 16:57 ?        00:00:00 flock /var/tmp/packstack/35cc4280df0448cfad4cefd33f14b383/ps.lock puppet apply --modulepath /var/tmp/packstack/35cc4280df0448cfad4cefd33f14b383/modules /var/tmp/packstack/35cc4280df0448cfad4cefd33f14b383/manifests/10.35.160.89_prescript.pp
root     26435 26434  0 16:57 ?        00:00:00 /usr/bin/ruby /usr/bin/puppet apply --modulepath /var/tmp/packstack/35cc4280df0448cfad4cefd33f14b383/modules /var/tmp/packstack/35cc4280df0448cfad4cefd33f14b383/manifests/10.35.160.89_prescript.pp
root     26648 26435  0 16:57 ?        00:00:00 python /var/tmp/packstack/35cc4280df0448cfad4cefd33f14b383/modules/packstack/lib/facter/netns.py
root     26684 26648  0 16:57 ?        00:00:00 iptables-restore

Comment 4 Alan Pevec 2013-10-08 17:57:30 UTC
(In reply to Ofer Blaut from comment #3)
> Attached output from my setup http://pastebin.test.redhat.com/168849

Reminder: please don't put internal URLs in public comments.

Comment 5 Haim 2013-10-09 06:08:38 UTC
Ofer, is it a regression? A test blocker? Are there any known workarounds? If so, please set the proper flags.

Comment 6 Ofer Blaut 2013-10-09 08:32:21 UTC
Hi Haim

I didn't experience the problem before.
According to Martina it happens with RHOS 3 on RHEL 6.4, so it is not a regression from her point of view.

Still, it blocks me, and I don't know how Martina worked around it on her setups (also Dafna).

Comment 7 Martin Magr 2013-10-09 12:59:07 UTC
The submitted patch should fix this issue. I could not reproduce it, so the patch is based only on my analysis. It would be good if you could apply the patch to your packstack installation (you will find the file at /usr/lib/python2.[67]/site-packages/packstack/puppet/modules/packstack/lib/facter/netns.py) and comment either here or in gerrit on whether the problem still persists.

Comment 8 Jaroslav Henner 2013-10-09 15:09:29 UTC
The problem is that execute() uses proc.communicate() when there is input. communicate() reads stdout and stderr itself, leaving them empty, depending on timing.

 15 def execute(cmd_string, check_error=True, return_code=0, input=None,
 16             block=True, error_msg='Error executing cmd'):
 17     print cmd_string
 18     cmd = cmd_string.split(' ')
 19     proc = subprocess.Popen(cmd,
 20                             stdin=subprocess.PIPE,
 21                             stdout=subprocess.PIPE,
 22                             stderr=subprocess.PIPE)
 23     if input: 
 24         proc.communicate(input=input)
        ^^^^^^^^
 25     elif block:
 26         proc.wait()

This means that saved_iptables_state can be non-deterministically empty:

            # Check that iptables filtering and save/restore can be performed
            saved_iptables_state = e('%(ns1)s iptables-save').stdout.read()
                                                              ^^^^^^^^^^^^^
            e('%(ns1)s iptables -A INPUT -p icmp --icmp-type 8 -j DROP')
            e('%(ns2)s ping -c 1 -w 1 %(address1_1)s', return_code=1)

which will cause the following to block forever:
            e('%(ns1)s iptables-restore', input=saved_iptables_state)
            e('%(ns2)s ping -c 1 -w 1 %(address1_1)s')
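
For reference, a sketch of one possible shape of a fix (illustration only, not the patch submitted to gerrit 50615): always use communicate(), which both captures stdout/stderr and closes the child's stdin, so the caller gets the saved state back and iptables-restore always sees EOF. Call sites would need to use the returned output instead of reading proc.stdout themselves.

    import subprocess

    def execute(cmd_string, check_error=True, return_code=0, input=None,
                block=True, error_msg='Error executing cmd'):
        # Sketch only: always drain the pipes with communicate() instead of
        # letting callers read proc.stdout after wait(). For simplicity this
        # version always blocks (the real helper also supports block=False).
        print cmd_string
        cmd = cmd_string.split(' ')
        proc = subprocess.Popen(cmd,
                                stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        # communicate() waits for the child, captures stdout/stderr and
        # closes stdin, so a child like iptables-restore receives EOF even
        # when the input is empty, instead of waiting forever for more data.
        out, err = proc.communicate(input=input)
        if check_error and proc.returncode != return_code:
            raise RuntimeError('%s: %s' % (error_msg, err))
        return out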

Comment 9 Martin Magr 2013-10-10 09:41:48 UTC
Jardo, the cause is known already, but thanks for repeating it. I was asking Martina to test my fix, because I wasn't able to reproduce it.

Comment 10 Martina Kollarova 2013-10-10 11:37:46 UTC
The patch seems to be working. So far I've only tried it twice (with RHOS 3 and RHOS 4); I'll need a few more runs before I'd consider it verified, due to its non-deterministic nature.

Comment 12 Scott Lewis 2013-11-18 20:35:36 UTC
Adding to beta milestone for MODIFIED/ON-QA bugs

Comment 13 Ofer Blaut 2013-11-20 09:40:27 UTC
The bug is solved in the latest build.

Please move it to ON_QA.

Comment 15 Scott Lewis 2013-12-16 18:56:47 UTC
Auto qa_ack+ for VERIFIED bugs so they can be added to RC erratum

Comment 18 Maru Newby 2013-12-17 12:53:42 UTC
I already wrote the bug text but I agree with Alan that it shouldn't be required.

Comment 20 errata-xmlrpc 2013-12-20 00:26:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html

