Description: Hi folks.

When using the rhel-osp-installer on RHEL-OSP6 (RHEL7.2) to build an OpenStack deployment, it seems that the Quickstack Puppet modules for nova are causing deployment failures whenever creating the cloned Pacemaker resources via "pcs resource create". This used to work when we were doing the FlexPod OpenStack CVD validation earlier this summer.

Getting the following errors on the Controllers:

    Notice: /Stage[main]/Quickstack::Pacemaker::Nova/Quickstack::Pacemaker::Resource::Generic[openstack-nova-novncproxy]/Exec[create openstack-nova-novncproxy resource]/returns: Error: When using 'op' you must specify an operation name and at least one option
    Error: /usr/sbin/pcs resource create openstack-nova-novncproxy systemd:openstack-nova-novncproxy clone interleave=true op monitor start-delay=10s returned 1 instead of one of [0]
    Error: /Stage[main]/Quickstack::Pacemaker::Nova/Quickstack::Pacemaker::Resource::Generic[openstack-nova-novncproxy]/Exec[create openstack-nova-novncproxy resource]/returns: change from notrun to 0 failed: /usr/sbin/pcs resource create openstack-nova-novncproxy systemd:openstack-nova-novncproxy clone interleave=true op monitor start-delay=10s returned 1 instead of one of [0]

I believe the following resources are affected by this:

    openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
    openstack-nova-consoleauth-clone [openstack-nova-consoleauth]
    openstack-nova-conductor-clone [openstack-nova-conductor]
    openstack-nova-api-clone [openstack-nova-api]
    openstack-nova-scheduler-clone [openstack-nova-scheduler]

If one patches the following file on the Installer (Puppet master):

    /etc/puppet/environments/production/modules/quickstack/manifests/pacemaker/resource/generic.pp

Line 26 to this, removing the "op":

    #$_operation_opts = "${operation_opts}"

The installation progresses further. Whether or not this is the correct action, I cannot say. We need someone to take a look at this.
Installer Node package versions:

    openstack-puppet-modules-2014.2.15-4.el7ost.noarch
    openstack-foreman-installer-3.0.26-1.el7ost.noarch
    foreman-discovery-image-7.0-20150227.0.el7ost.noarch
    foreman-proxy-1.6.0.30-6.el7ost.noarch
    rubygem-foreman_api-0.1.11-6.el7sat.noarch
    foreman-postgresql-1.6.0.49-6.el7ost.noarch
    foreman-installer-1.6.0-0.4.RC1.el7ost.noarch
    foreman-selinux-1.6.0.14-1.el7sat.noarch
    foreman-1.6.0.49-6.el7ost.noarch

Controller Nodes package versions:

    openstack-nova-common-2014.2.3-35.el7ost.noarch
    openstack-nova-api-2014.2.3-35.el7ost.noarch
    openstack-nova-scheduler-2014.2.3-35.el7ost.noarch
    openstack-nova-console-2014.2.3-35.el7ost.noarch
    openstack-nova-novncproxy-2014.2.3-35.el7ost.noarch
    openstack-nova-cert-2014.2.3-35.el7ost.noarch
    openstack-nova-conductor-2014.2.3-35.el7ost.noarch

External links:
Severity (U/H/M/L): H
Business Priority: Urgent
The suggested change seems likely harmless enough, with the one caveat potentially being neutron. This installer implemented the reference architecture found here [1], and I would have some concern that, since this neutron timeout setting would get dropped, there could be other issues if it is slow to start.

[1] https://github.com/beekhof/osp-ha-deploy/blob/Juno-RDO6/pcmk/neutron-server.scenario#L176
For what it's worth, we did get a successful deployment using the rhel-osp-installer after implementing the above change. However, that deployment was contingent on Bugzilla 1290684 being resolved as well. @Jason -- if there is a different change you can recommend that would not affect Neutron, please let me know and I will try it. In our environment at least, Neutron came up fine.
David, Any chance you could submit a patch to the kilo branch for astapor?
I just hit this on my system on CentOS 7. The latest pacemaker has a new error message, "When using 'op' you must specify an operation name and at least one option", which was not reported before.

I checked the command that was running:

    /usr/sbin/pcs resource create httpd systemd:httpd clone interleave=true op monitor start-delay=10s

which looked OK, so I then checked how pcs was parsing this. As soon as pcs sees the clone argument, it thinks everything after it is a clone arg. So previously the above command was not setting up the operation correctly (start-delay was added as a clone param).

Changing the command to

    /usr/sbin/pcs resource create httpd systemd:httpd op monitor start-delay=10s clone interleave=true

works correctly (the alternative is to fix the pcs argument parsing).
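The reordering described above can be sketched as a small shell helper. This is only an illustration, not the change that went into astapor: the function name build_pcs_create and its four positional arguments are hypothetical, and the helper only prints the command line instead of running pcs.

```shell
# Hypothetical helper (not part of pcs or astapor): print a
# "pcs resource create" command line with the "op" section placed
# before the "clone" section.
#   $1 resource name
#   $2 resource type, e.g. systemd:httpd
#   $3 operation options (empty string: no "op" section)
#   $4 clone options     (empty string: no "clone" section)
build_pcs_create() {
  local name="$1" type="$2" op_opts="$3" clone_opts="$4"
  local cmd="/usr/sbin/pcs resource create ${name} ${type}"

  # Emit "op ..." first so that it cannot be read as clone arguments.
  if [ -n "${op_opts}" ]; then
    cmd="${cmd} op ${op_opts}"
  fi

  # "clone ..." goes last, since everything after it is taken as clone args.
  if [ -n "${clone_opts}" ]; then
    cmd="${cmd} clone ${clone_opts}"
  fi

  printf '%s\n' "${cmd}"
}

build_pcs_create httpd systemd:httpd "monitor start-delay=10s" "interleave=true"
```

With the arguments shown, the helper prints the second (working) form of the httpd command from this comment.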
Pull request to fix this bug here: https://github.com/redhat-openstack/astapor/pull/567
David, If we are able to produce an updated rpm would you be able to help verify?
Hi Mike, Sure can, and would be happy to kick off a new build to verify, assuming 1290684 is patched too. That bug and this one prevent deployments from occurring.
Merged
The patch above that removes the 'op' word means that any options passed that should be 'op' actions and parameters are now clone parameters. It will allow the install, but the pacemaker resource setup will be wrong. You need to move the 'op' setup before the 'clone' setup (because pcs has some issues parsing its input). The version of astapor I am using also sometimes sends the clone parameter via the resource_params, so I stopped using operation_opts and added 'op xxx xxx=yy' into the resource_params wherever I was calling generic.pp.
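For anyone auditing their own calls to generic.pp, a rough way to spot the problematic ordering is to check whether the word 'op' appears after the word 'clone' in the argument string. The sketch below is an assumption-laden illustration: the function name op_follows_clone is hypothetical, and it only inspects a plain string, it does not talk to pcs or Puppet.

```shell
# Hypothetical check (not part of pcs or astapor): succeed (exit 0) when
# the word "op" appears after the word "clone" in the given argument
# string, i.e. the ordering that leaves the resource set up wrongly.
op_follows_clone() {
  local seen_clone=0 word
  # Deliberately unquoted so the string is split on whitespace; assumes
  # the arguments contain no glob characters.
  for word in $1; do
    case "${word}" in
      clone)
        seen_clone=1
        ;;
      op)
        if [ "${seen_clone}" -eq 1 ]; then
          return 0
        fi
        ;;
    esac
  done
  return 1
}

if op_follows_clone "clone interleave=true op monitor start-delay=10s"; then
  echo "op comes after clone: move the op section before clone"
fi
```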
I was able to test this with the assistance of a colleague of mine and can say that builds are successful now with the rhel-osp-installer. Builds no longer fail. We even did a clean install of the database, as I seem to remember that it was required when updating the openstack-foreman-installer package. Not sure about Comment #12 from Mark though, which has merit. Will leave that analysis to others though. Marking bug VERIFIED. Hope that's the right state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0284.html