Bug 1455065

Summary: fail using collectd-environment on osp11 updated from osp10
Product: Red Hat OpenStack
Reporter: Cyril Lopez <cylopez>
Component: rhosp-director
Assignee: Sofer Athlan-Guyot <sathlang>
Status: CLOSED EOL
QA Contact: Amit Ugol <augol>
Severity: high
Docs Contact:
Priority: high
Version: 11.0 (Ocata)
CC: cylopez, dbecker, mbultel, mburns, mcornea, morazi, rhel-osp-director-maint, sathlang
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-22 12:30:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Cyril Lopez 2017-05-24 08:06:47 UTC
Description of problem:

On an OSP10 deployment fully upgraded to OSP11, collectd-environment cannot be used because the collectd package is not installed and the puppet code does not install it.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-6.0.0-10.el7ost.noarch

How reproducible:


Steps to Reproduce:
1. deploy osp10
2. upgrade to osp11
3. update the stack with collectd-environment

Puppet log in debug mode:
Debug: Executing: '/bin/rpm -q collectd-disk --nosignature --nodigest --qf %{NAME} %|EPOCH?{%{EPOCH}}:{0}| %{VERSION} %{RELEASE} %{ARCH}\n'
Debug: Executing: '/bin/rpm -q collectd-disk --nosignature --nodigest --qf %{NAME} %|EPOCH?{%{EPOCH}}:{0}| %{VERSION} %{RELEASE} %{ARCH}\n --whatprovides'
Warning: [norpm] Attempting to install collectd-disk but it will not be installed
Notice: /Stage[main]/Collectd::Plugin::Disk/Package[collectd-disk]/ensure: created
Debug: /Stage[main]/Collectd::Plugin::Disk/Package[collectd-disk]: The container Class[Collectd::Plugin::Disk] will propagate my refresh event
Debug: /Stage[main]/Collectd::Plugin::Memcached/Collectd::Plugin[memcached]/File[older_memcached.load]: Nothing to manage: no ensure and the resource doesn't exist
Debug: /Stage[main]/Collectd::Plugin::Memcached/Collectd::Plugin[memcached]/File[old_memcached.load]: Nothing to manage: no ensure and the resource doesn't exist
Debug: /Stage[main]/Collectd::Plugin::Disk/Collectd::Plugin[disk]/File[older_disk.load]: Nothing to manage: no ensure and the resource doesn't exist
Debug: /Stage[main]/Collectd::Plugin::Disk/Collectd::Plugin[disk]/File[old_disk.load]: Nothing to manage: no ensure and the resource doesn't exist
Debug: Class[Collectd::Plugin::Disk]: The container Stage[main] will propagate my refresh event
Debug: /Stage[main]/Collectd::Plugin::Interface/Collectd::Plugin[interface]/File[older_interface.load]: Nothing to manage: no ensure and the resource doesn't exist
Debug: /Stage[main]/Collectd::Plugin::Interface/Collectd::Plugin[interface]/File[old_interface.load]: Nothing to manage: no ensure and the resource doesn't exist
Debug: /Stage[main]/Collectd::Plugin::Load/Collectd::Plugin[load]/File[older_load.load]: Nothing to manage: no ensure and the resource doesn't exist
Debug: /Stage[main]/Collectd::Plugin::Load/Collectd::Plugin[load]/File[old_load.load]: Nothing to manage: no ensure and the resource doesn't exist
Debug: /Stage[main]/Collectd::Plugin::Memory/Collectd::Plugin[memory]/File[older_memory.load]: Nothing to manage: no ensure and the resource doesn't exist
Debug: /Stage[main]/Collectd::Plugin::Memory/Collectd::Plugin[memory]/File[old_memory.load]: Nothing to manage: no ensure and the resource doesn't exist
Debug: /Stage[main]/Collectd::Plugin::Processes/Collectd::Plugin[processes]/File[older_processes.load]: Nothing to manage: no ensure and the resource doesn't exist
Debug: /Stage[main]/Collectd::Plugin::Processes/Collectd::Plugin[processes]/File[old_processes.load]: Nothing to manage: no ensure and the resource doesn't exist
Debug: /Stage[main]/Collectd::Plugin::Processes/Concat[/etc/collectd.d/processes-config.conf]/Concat_file[/etc/collectd.d/processes-config.conf]: Nothing to manage: no ensure and the resource doesn't exist
Debug: /Stage[main]/Collectd::Plugin::Processes/Concat[/etc/collectd.d/processes-config.conf]/File[/etc/collectd.d/processes-config.conf]: Nothing to manage: no ensure and the resource doesn't exist
Debug: /Stage[main]/Collectd::Plugin::Tcpconns/Collectd::Plugin[tcpconns]/File[older_tcpconns.load]: Nothing to manage: no ensure and the resource doesn't exist
Debug: /Stage[main]/Collectd::Plugin::Tcpconns/Collectd::Plugin[tcpconns]/File[old_tcpconns.load]: Nothing to manage: no ensure and the resource doesn't exist
Debug: Executing: '/bin/systemctl is-active collectd'
Debug: Executing: '/bin/systemctl is-enabled collectd'
Debug: Executing: '/bin/systemctl unmask collectd'
Debug: Runing journalctl command to get logs for systemd start failure: journalctl -n 50 --since '5 minutes ago' -u collectd --no-pager
Debug: Executing: 'journalctl -n 50 --since '5 minutes ago' -u collectd --no-pager'
Error: Systemd start for collectd failed!
journalctl log for collectd:
-- No entries --

Error: /Stage[main]/Collectd::Service/Service[collectd]/ensure: change from stopped to running failed: Systemd start for collectd failed!
journalctl log for collectd:
-- No entries --

Debug: Class[Collectd::Service]: Resource is being skipped, unscheduling all events
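
The key line in the log above is the "[norpm]" warning: the package resource is reported as "created", but nothing is actually installed, so the collectd service then fails to start. As a rough sketch (not part of the original report), a saved puppet debug log could be scanned for such warnings like this. The function name skipped_packages and the sample_log variable are hypothetical; the sample lines are copied from the log above.

```python
import re

# Matches the warning seen in the log when package installation is suppressed.
NORPM_WARNING = re.compile(
    r"^Warning: \[norpm\] Attempting to install (?P<package>\S+) "
    r"but it will not be installed$"
)


def skipped_packages(log_text):
    """Return the package names from [norpm] warnings, in order of first appearance."""
    seen = []
    for line in log_text.splitlines():
        match = NORPM_WARNING.match(line.strip())
        if match and match.group("package") not in seen:
            seen.append(match.group("package"))
    return seen


# Hypothetical sample: three lines taken verbatim from the log above.
sample_log = """\
Warning: [norpm] Attempting to install collectd-disk but it will not be installed
Notice: /Stage[main]/Collectd::Plugin::Disk/Package[collectd-disk]/ensure: created
Error: Systemd start for collectd failed!
"""

print(skipped_packages(sample_log))  # prints ['collectd-disk']
```

Any package listed by such a scan would need to be present on the node before the puppet run for the dependent service to start.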

Comment 1 Sofer Athlan-Guyot 2017-06-02 13:05:52 UTC
Hi,

So there is a provision to install the package during upgrade[1].

Can you clarify what you mean by "3. update the stack with collectd-environment"?  Did you upgrade, then modify "role_data" to add     - OS::TripleO::Services::Collectd and redeploy?

[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/ocata/puppet/services/metrics/collectd.yaml#L130..L134

Comment 2 Cyril Lopez 2017-07-04 10:31:14 UTC
Hi Sofer,

My workflow was :
- install osp10
- upgrade to osp11
- update the stack (already updated) with collectd.

I didn't touch role_data. So I guess we don't know. I will try to double-check.

Comment 3 Sofer Athlan-Guyot 2017-07-17 11:20:05 UTC
Hi Cyril,

(In reply to Cyril Lopez from comment #2)
> Hi Sofer,
> 
> My workflow was :
> - install osp10
> - upgrade to osp11
> - update the stack (already updated) with collectd.

Can you describe exactly what you did there, for reference?

> 
> I didn't touch role_data. So I guess we don't know. I will try to
> double-check.

So I think that would explain it all.  The upgrade task described in the link above only runs during the upgrade if you have

  - OS::TripleO::Services::Collectd

in the role data associated with the controller.  As it's not there, the installation doesn't happen.
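
To illustrate the condition above, here is a minimal sketch that checks whether a service is listed for a given role. It assumes the role data has already been parsed into a list of dictionaries with "name" and "ServicesDefault" keys; the helper name role_has_service and the abbreviated service list are hypothetical and only serve the example.

```python
def role_has_service(roles, role_name, service):
    """Return True if `service` is listed in ServicesDefault for the named role."""
    for role in roles:
        if role.get("name") == role_name:
            return service in role.get("ServicesDefault", [])
    # The role itself is not defined in the role data.
    return False


# Hypothetical, heavily abbreviated role data for illustration only.
roles = [
    {
        "name": "Controller",
        "ServicesDefault": [
            "OS::TripleO::Services::Keystone",
            "OS::TripleO::Services::Ntp",
        ],
    },
]

print(role_has_service(roles, "Controller", "OS::TripleO::Services::Collectd"))
# prints False
```

With role data like this, the upgrade task for collectd would not be part of the upgrade, which matches the behaviour reported here.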

Later, the "::tripleo::profile::base::metrics::collectd" puppet manifest is kicked off by the "step_config", but package installation is disabled during the puppet run.

So we have a nice loop that makes the addition of new packages problematic for new roles.

It's not seen during an install of osp11 because the base image has the necessary packages.

We could solve this in different ways:
 - documentation: if you need a new role you *must* add it during upgrade ... not nice on the user IMHO;
 - activate pkg installation for new packages;
 - something else: we have to keep the overcloud always in sync with the latest base image, so maybe we should strive during upgrade to have all the new packages in the base image installed in the current overcloud.

I think the last one is the best.  Upgrade/update would also bring the current overcloud in line with what is in the latest base image.
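
As a rough sketch of that last option, the packages to install during upgrade would be the ones present in the latest base image but absent from the current overcloud node. The function name missing_packages and both package lists are hypothetical; how the lists would actually be collected is left open.

```python
def missing_packages(base_image_packages, overcloud_packages):
    """Return, sorted, the packages in the base image that the overcloud lacks."""
    return sorted(set(base_image_packages) - set(overcloud_packages))


# Hypothetical package lists for illustration only.
base_image = ["collectd", "collectd-disk", "puppet"]
overcloud = ["puppet"]

print(missing_packages(base_image, overcloud))
# prints ['collectd', 'collectd-disk']
```

Packages that exist only on the overcloud are deliberately ignored here, since the idea is to add what is missing, not to remove anything.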

This part needs further discussion, but with the arrival of docker images it may not be relevant anymore.

I have no clear-cut solution; I am trying to bring the discussion upstream.

Comment 4 Sofer Athlan-Guyot 2017-08-18 11:54:56 UTC
Thread started here: http://lists.openstack.org/pipermail/openstack-dev/2017-August/121271.html

Comment 5 Scott Lewis 2018-06-22 12:30:53 UTC
OSP11 is now retired, see details at https://access.redhat.com/errata/product/191/ver=11/rhel---7/x86_64/RHBA-2018:1828