Bug 1567232 - Upgrading overcloud Mitaka to Newton fails with os-collect-config timeout option
Summary: Upgrading overcloud Mitaka to Newton fails with os-collect-config timeout option
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: os-collect-config
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Sofer Athlan-Guyot
QA Contact: Shai Revivo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-13 15:37 UTC by Eric Beaudoin
Modified: 2018-06-11 14:16 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-11 14:16:30 UTC
Target Upstream Version:



Description Eric Beaudoin 2018-04-13 15:37:56 UTC
Description of problem:
Upgrading overcloud Mitaka to Newton fails with os-collect-config timeout option


Version-Release number of selected component (if applicable):
OSP9->OSP10


Actual results:
This was the error in the journal from os-collect-config:

Apr 12 17:11:28 cld1-compute-1.com os-collect-config[3392]: [--lockfile LOCKFILE]
Apr 12 17:11:28 cld1-compute-1.com os-collect-config[3392]: os-refresh-config: error: unrecognized arguments: --timeout 14400
Apr 12 17:11:28 cld1-compute-1.com os-collect-config[3392]: 2018-04-12 17:11:28.213 3392 ERROR os-collect-config [-] Command failed, will not cache new data. Command 'os-refresh-config
Apr 12 17:11:28 cld1-compute-1.com os-collect-config[3392]: 2018-04-12 17:11:28.213 3392 WARNING os-collect-config [-] Sleeping 1.00 seconds before re-exec.
Apr 12 17:11:29 cld1-compute-1.com os-collect-config[3392]: 2018-04-12 17:11:29.719 3392 WARNING os_collect_config.local [-] /var/lib/os-collect-config/local-data not found. Skipping
Apr 12 17:11:29 cld1-compute-1.com os-collect-config[3392]: 2018-04-12 17:11:29.719 3392 WARNING os_collect_config.local [-] No local metadata found (['/var/lib/os-collect-config/local
Apr 12 17:11:29 cld1-compute-1.com os-collect-config[3392]: usage: os-refresh-config [-h] [--print-base] [--print-phases]
Apr 12 17:11:29 cld1-compute-1.com os-collect-config[3392]: [--log-level {ERROR,WARN,CRITICAL,INFO,DEBUG}]
Apr 12 17:11:29 cld1-compute-1.com os-collect-config[3392]: [--lockfile LOCKFILE]

Expected results:
No error

Additional info:
Possible workaround:
I updated the os-refresh-config package manually on the overcloud nodes:

[stack@cld1-director-0 desjardins-monitoring]$ ansible -i inventories -m copy -sa "src=/home/stack/os-refresh-config-0.1.11-5.el7ost.noarch.rpm dest=/root/os-refresh-config-0.1.11-5.el7ost.noarch.rpm" overcloud
[stack@cld1-director-0 desjardins-monitoring]$ ansible -i inventories/cld1-overcloud -m shell -sa "yum -y localinstall /root/os-refresh-config-0.1.11-5.el7ost.noarch.rpm" overcloud

Related bug:
Upgrading overcloud Mitaka to Newton fails with os-collect-config timeout option
https://bugs.launchpad.net/tripleo/+bug/1632890

Comment 1 Sofer Athlan-Guyot 2018-04-23 14:29:28 UTC
Hi,

Something is strange here. I have a running osp9->osp10 upgraded environment, and this is what I get:

[1] [heat-admin@compute-0 ~]$ os-refresh-config --help | grep timeout
                         [--lockfile LOCKFILE] [--timeout TIMEOUT]
  --timeout TIMEOUT     Seconds until the current run will be terminated.


[2] [root@compute-0 ~]# yum whatprovides $(which os-refresh-config)
Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is not registered with an entitlement server. You can use subscription-manager to register.
os-refresh-config-5.1.0-1.el7ost.noarch : Refresh system configuration
Repo        : @rhelosp-10.0-puddle
Matched from:
Filename    : /bin/os-refresh-config

[3] grep os-refresh-config /var/log/yum.log
Apr 12 18:58:27 Updated: os-refresh-config-5.1.0-1.el7ost.noarch

That means that:
 1. os-refresh-config has the --timeout option
 2. it comes from the osp10 repo, and the version is vastly different from the one in the bz: I've got 5.1.0-1
 3. it was installed during the upgrade.

And I've got a successful upgrade.

Could you send over the yum.log file? It's as if os-refresh-config hadn't been upgraded. If that's the case, we will need to understand why.
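
The checks in [1] and [3] can be combined into one small script to run on an affected node. This is only a sketch, not something taken from the upgrade tooling: the function names supports_timeout and check_node are hypothetical, and it assumes rpm, grep and /var/log/yum.log are available on the node as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the help text on stdin mentions --timeout.
supports_timeout() {
    grep -q -e '--timeout'
}

# Hypothetical helper: report the installed version, whether the binary
# accepts --timeout, and what yum recorded for the package.
check_node() {
    rpm -q os-refresh-config
    if os-refresh-config --help 2>&1 | supports_timeout; then
        echo "os-refresh-config supports --timeout"
    else
        echo "os-refresh-config lacks --timeout, package probably not upgraded"
    fi
    grep os-refresh-config /var/log/yum.log
}
```

Calling check_node as root on a compute node prints the three pieces of information requested here. If the "lacks --timeout" line shows up, the yum.log entries should tell whether the package was touched during the upgrade.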


For reference, [4] is the code that was changed in osp10 to add the timeout.

[4] https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/puppet/role.role.j2.yaml#L109..L110

Comment 3 Sofer Athlan-Guyot 2018-06-11 14:16:30 UTC
Hi,

I'm closing this one for lack of information. If it's still an issue, please re-open the bz and include the information requested in https://bugzilla.redhat.com/show_bug.cgi?id=1567232#c1.

Regards,

