Bug 1288270

Summary: Heat stack-update fails due to missing package pm-utils
Product: Red Hat OpenStack Reporter: Graeme Gillies <ggillies>
Component: rhosp-director Assignee: Mike Burns <mburns>
Status: CLOSED CURRENTRELEASE QA Contact: Omri Hochman <ohochman>
Severity: unspecified Docs Contact:
Priority: urgent    
Version: 7.0 (Kilo) CC: achernet, dprince, emacchi, ggillies, hbrock, jcoufal, kambiz, mburns, mcornea, rhel-osp-director-maint
Target Milestone: ga Keywords: TestOnly, Triaged
Target Release: 8.0 (Liberty)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-04-20 11:22:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Graeme Gillies 2015-12-04 00:18:14 UTC
Hi,

I have a deployed overcloud that is running fine, with the exception that ntp was never configured on the machines (even though I passed the parameter to the deploy command to do so, and the heat stack-show had the NtpServer parameter set).

So I tried to do a stack-update to reconverge the stack and hopefully run the puppet ntp class.

The stack-update failed, with the compute nodes reporting the following error:

$ heat deployment-output-show --format raw 9b72ae64-dc9e-4fea-a163-5d7d19fc0096 deploy_stdout
Notice: Compiled catalog for rhqe-bare-cmpt-4.localdomain in environment production in 1.82 seconds
Notice: /Stage[main]/Ntp::Config/File[/etc/ntp.conf]/content: content changed '{md5}c07b9a377faea45b96b7d3bf8976004b' to '{md5}1093a7410ab2fbc90d67f732da7e974d'
Notice: /Stage[main]/Ntp::Service/Service[ntp]: Triggered 'refresh' from 1 events
Notice: Finished catalog run in 3.62 seconds

$ heat deployment-output-show --format raw 9b72ae64-dc9e-4fea-a163-5d7d19fc0096 deploy_stderr
Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::vncproxy::host'; class ::nova::vncproxy has not been evaluated
Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::vncproxy::vncproxy_protocol'; class ::nova::vncproxy has not been evaluated
Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::vncproxy::port'; class ::nova::vncproxy has not been evaluated
Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::vncproxy::vncproxy_path'; class ::nova::vncproxy has not been evaluated
Warning: Scope(Class[Ceilometer::Agent::Compute]): This class is deprecated. Please use ceilometer::agent::polling with compute namespace instead.
Warning: The package type's allow_virtual parameter will be changing its default value from false to true in a future release. If you do not want to allow virtual packages, please explicitly set allow_virtual to false.
   (at /usr/share/ruby/vendor_ruby/puppet/type.rb:816:in `set_default')
Error: Could not update: Execution of '/usr/bin/yum -d 0 -e 0 -y list pm-utils' returned 1: Error: No matching Packages to list
Wrapped exception:
Execution of '/usr/bin/yum -d 0 -e 0 -y list pm-utils' returned 1: Error: No matching Packages to list
Error: /Stage[main]/Nova::Compute/Package[pm-utils]/ensure: change from absent to latest failed: Could not update: Execution of '/usr/bin/yum -d 0 -e 0 -y list pm-utils' returned 1: Error: No matching Packages to list
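When a deployment's stderr is long, the failing resource can be pulled out mechanically. The following is a minimal sketch, assuming the stderr text has already been fetched (for example with the heat deployment-output-show command above) and that failures follow the "Error: <resource path>/Package[<name>]/ensure: change from X to Y failed" shape seen in this output. The function name failed_packages is hypothetical, used here for illustration only.

```python
import re

# Matches Puppet resource-failure lines of the shape seen in deploy_stderr, e.g.
#   Error: /Stage[main]/Nova::Compute/Package[pm-utils]/ensure: change from absent to latest failed: ...
# Assumption: only Package resources are of interest here.
FAILED_PACKAGE = re.compile(
    r"^Error: (?P<path>/\S+?)/Package\[(?P<name>[^\]]+)\]/ensure: "
    r"change from (?P<old>\S+) to (?P<new>\S+) failed"
)


def failed_packages(stderr_text):
    """Return a list of (containing_path, package_name) for failed Package resources."""
    results = []
    for line in stderr_text.splitlines():
        match = FAILED_PACKAGE.match(line.strip())
        if match:
            results.append((match.group("path"), match.group("name")))
    return results
```

Lines such as "Error: Could not update: ..." are skipped on purpose, because they do not name the resource that failed.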

Looking at the puppet code for nova compute

https://github.com/redhat-openstack/openstack-puppet-modules/blob/master/nova/manifests/compute.pp#L222-L224

It wants the pm-utils package. This package doesn't seem to actually be in the overcloud image at all (checking using guestfish at least).
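The same check can be made on a running overcloud node rather than on the image. This is a minimal sketch, assuming rpm is available on the node and relying on `rpm -q <name>` exiting non-zero when the package is not installed. The helper names is_installed and missing_packages are hypothetical, and the `run` parameter exists only so the command runner can be swapped out.

```python
import subprocess


def is_installed(package, run=subprocess.run):
    """Return True if `rpm -q <package>` exits 0, meaning the package is installed."""
    result = run(
        ["rpm", "-q", package],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def missing_packages(packages, run=subprocess.run):
    """Return the subset of `packages` that rpm reports as not installed."""
    return [name for name in packages if not is_installed(name, run=run)]
```

On a compute node, calling missing_packages(["pm-utils"]) would return a non-empty list if the package is absent.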

So I don't know how this could ever have worked, and the fact that the puppet ntp class looks like it's running for the first time is surprising as well.

I'm not sure, but it seems some software deployments are perhaps never actually run at first stack deploy at all? That can't be right, can it?

Regards,

Graeme

Comment 3 Mike Burns 2015-12-08 14:01:56 UTC
Partial fix for this in the attached patch.  The patch simply adds pm-utils to new images.  

The upgrade part is somewhat more concerning though.  pm-utils is part of RHEL so it should be available on any host with the right repos enabled.  

Graeme, can you manually run yum list pm-utils on your compute node? The output of yum repolist would also be useful.

Thanks

Comment 4 Graeme Gillies 2015-12-09 00:05:28 UTC
Hi, so I think I may not have clarified enough what this bug is actually about.

I think it's a symptom of a much deeper problem, which is I'm not sure how/if puppet is actually working correctly in Director.

When I do a new deployment with director and the images we ship, not hooked up to CDN or any extra repos, it works. However, it really shouldn't work (right?).

It shouldn't work because our images don't contain pm-utils. In fact, if I look at the puppet output from a brand new deployment, I can see that puppet actually says

/Stage[main]/Nova::Compute/Package[pm-utils]/ensure:created

(Or something to that effect.) So the initial puppet run at deployment time is saying that the package is indeed installed, or that everything is OK, when it isn't.

If I understand things correctly, as it stands right now, director should never be able to deploy at all, as the compute nodes would break on the missing package.

So I think the larger question here is not how do we fix this, but why wasn't it broken in the first place?

Comment 5 Mike Burns 2015-12-09 13:17:07 UTC
Whether this is a blocker or not depends on whether you had CDN or other repos configured, IMO.  We need to know if it fails with the repos enabled.

There is a bug that the images are missing the package.  The reason this doesn't fail is because the initial deployment basically disables all yum installations on the assumption that all packages are pre-installed.  It overrides the ensure:created.
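The behaviour described here can be illustrated with a toy model. This is only a sketch of the reasoning, not director or Puppet code: every name in it (apply_package, installs_enabled, PackageMissing) is hypothetical, and the real mechanism that disables installation is not shown in this bug.

```python
class PackageMissing(Exception):
    """Raised when a package is neither installed nor available in any repo."""


def apply_package(name, installed, repo, installs_enabled):
    """Toy model of applying a package resource with ensure => latest.

    installed: set of package names present on the node (mutated on real install)
    repo: set of package names available from enabled repos
    installs_enabled: False models the initial deploy, True models a later update
    """
    if name in installed:
        return "unchanged"
    if not installs_enabled:
        # Initial deploy: assume the image already has everything, so report
        # success without checking or installing anything.
        return "created"
    if name not in repo:
        # Later run: a real install is attempted and fails without a repo.
        raise PackageMissing(name)
    installed.add(name)
    return "created"
```

In this model the first run reports "created" while the installed set stays empty, and a second run with installs enabled and no repos raises, which matches the sequence reported in this bug.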

Comment 6 Graeme Gillies 2015-12-10 01:20:43 UTC
I did not have the machines hooked up to cdn or any repos

Comment 9 Hugh Brock 2016-01-14 22:40:03 UTC
I don't understand what the corrective action is for this bug, I guess. If it's a bug about a missing package in the image, then we should fix it.

If it is a bug about disabling package installation on an image-based install, then I think that is expected behavior -- if that's a bad idea, then maybe we should change it, but we can't do that just on the basis of this bug.

Graeme, if you could clarify, I will either mark this bug as a blocker so it gets fixed, or flag it as an RFE.

Comment 10 Graeme Gillies 2016-01-15 00:22:19 UTC
Ok so I've had a bit more of a think about this and I think I can clearly explain what I would like to happen.

I think this bug should be kept open to track getting the pm-utils package installed onto the overcloud image. It's entirely reasonable to expect that doing a deployment, then immediately doing a stack update (without any yum repos), should work.

There needs to be another bugzilla opened to track getting rid of whatever hack we do for puppet that breaks its ability to install packages on initial deploy (the override ensure:created).

1) I would think you would want deploys (and thus CI) to break when you are missing a package from the image, so you can fix it straight away.

2) From an operator's perspective, it is incredibly confusing, because one of the first things you learn with puppet is how to use it to install a package. When you see director/puppet tell you that puppet has installed a package (or thinks it's installed), and it's *flat out lying to you*, it is frustrating and confusing, especially because when you run the deploy a second time the behaviour changes and it is suddenly trying to install packages.

Comment 11 Hugh Brock 2016-02-05 12:09:29 UTC
"There needs to be another bugzilla opened to track getting rid of whatever hack we do for puppet that breaks its ability to install packages on initial deploy (the override ensure:created)."

I believe we did in fact fix this on OSP 7.3 when it bit us on another package. Adding needinfo on @emacchi to be sure. If we did, we need to make sure that fix gets merged on trunk as well and backported to 8.

The missing pm-utils bug should be fixed as well.

Comment 12 Mike Burns 2016-02-05 12:16:31 UTC
The specific issue for this bug was to include pm-utils in the image build.  The patch for that was abandoned because puppet-nova no longer requires pm-utils.  It has been removed from the nova puppet module.

Testing:  Updates work without repos enabled, code review that puppet-nova does not pull in pm-utils.
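The code-review half of that test can be approximated with a search over the module's manifests. This is a minimal sketch, assuming a local checkout of puppet-nova and that a package requirement would appear as a quoted package name in a .pp file. The function name manifests_referencing and the checkout path in the usage note are hypothetical.

```python
import os
import re


def manifests_referencing(package, root):
    """Return sorted paths of .pp files under `root` that mention `package` in quotes."""
    pattern = re.compile(r"""['"]%s['"]""" % re.escape(package))
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".pp"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                if pattern.search(handle.read()):
                    hits.append(path)
    return sorted(hits)
```

For example, manifests_referencing("pm-utils", "/path/to/puppet-nova") returning an empty list would be consistent with the module no longer pulling in the package, although it is not a substitute for reading the manifests.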

Comment 16 Omri Hochman 2016-04-15 21:06:05 UTC
Verified - pm-utils was not installed on any of the overcloud nodes.
Upgrading the overcloud from 7.3 to 8.0 worked as expected.

Environment:
-------------
instack-undercloud-2.2.7-4.el7ost.noarch
instack-0.0.8-2.el7ost.noarch
openstack-heat-api-5.0.1-5.el7ost.noarch
openstack-heat-templates-0-0.8.20150605git.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-7.el7ost.noarch
python-heatclient-1.0.0-1.el7ost.noarch
openstack-heat-engine-5.0.1-5.el7ost.noarch
heat-cfntools-1.2.8-2.el7.noarch
openstack-heat-api-cloudwatch-5.0.1-5.el7ost.noarch
openstack-tripleo-heat-templates-0.8.14-7.el7ost.noarch