Description of problem:
Upgrading overcloud Mitaka to Newton fails with os-collect-config timeout option

Version-Release number of selected component (if applicable):
OSP9->OSP10

Actual results:
This was the error in the journal from os-collect-config:

Apr 12 17:11:28 cld1-compute-1.com os-collect-config[3392]: [--lockfile LOCKFILE]
Apr 12 17:11:28 cld1-compute-1.com os-collect-config[3392]: os-refresh-config: error: unrecognized arguments: --timeout 14400
Apr 12 17:11:28 cld1-compute-1.com os-collect-config[3392]: 2018-04-12 17:11:28.213 3392 ERROR os-collect-config [-] Command failed, will not cache new data. Command 'os-refresh-config
Apr 12 17:11:28 cld1-compute-1.com os-collect-config[3392]: 2018-04-12 17:11:28.213 3392 WARNING os-collect-config [-] Sleeping 1.00 seconds before re-exec.
Apr 12 17:11:29 cld1-compute-1.com os-collect-config[3392]: 2018-04-12 17:11:29.719 3392 WARNING os_collect_config.local [-] /var/lib/os-collect-config/local-data not found. Skipping
Apr 12 17:11:29 cld1-compute-1.com os-collect-config[3392]: 2018-04-12 17:11:29.719 3392 WARNING os_collect_config.local [-] No local metadata found (['/var/lib/os-collect-config/local
Apr 12 17:11:29 cld1-compute-1.com os-collect-config[3392]: usage: os-refresh-config [-h] [--print-base] [--print-phases]
Apr 12 17:11:29 cld1-compute-1.com os-collect-config[3392]: [--log-level {ERROR,WARN,CRITICAL,INFO,DEBUG}]
Apr 12 17:11:29 cld1-compute-1.com os-collect-config[3392]: [--lockfile LOCKFILE]

Expected results:
No error

Additional info:
Possible workaround: updated the package manually:

[stack@cld1-director-0 desjardins-monitoring]$ ansible -i inventories -m copy -sa "src=/home/stack/os-refresh-config-0.1.11-5.el7ost.noarch.rpm dest=/root/os-refresh-config-0.1.11-5.el7ost.noarch.rpm" overcloud
[stack@cld1-director-0 desjardins-monitoring]$ ansible -i inventories/cld1-overcloud -m shell -sa "yum -y localinstall /root/os-refresh-config-0.1.11-5.el7ost.noarch.rpm" overcloud

Related bug:
Upgrading overcloud Mitaka to Newton fails with os-collect-config timeout option
https://bugs.launchpad.net/tripleo/+bug/1632890
Hi,

Something is strange here. I have a running environment that was upgraded from osp9 to osp10, and this is what I see on it:

[1]
[heat-admin@compute-0 ~]$ os-refresh-config --help | grep timeout
[--lockfile LOCKFILE] [--timeout TIMEOUT]
--timeout TIMEOUT Seconds until the current run will be terminated.

[2]
[root@compute-0 ~]# yum whatprovides $(which os-refresh-config)
Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is not registered with an entitlement server. You can use subscription-manager to register.
os-refresh-config-5.1.0-1.el7ost.noarch : Refresh system configuration
Repo : @rhelosp-10.0-puddle
Matched from:
Filename : /bin/os-refresh-config

[3]
grep os-refresh-config /var/log/yum.log
Apr 12 18:58:27 Updated: os-refresh-config-5.1.0-1.el7ost.noarch

That means that:

1. os-refresh-config has the --timeout option.
2. It comes from the osp10 repo, and its version (5.1.0-1) is vastly different from the one in the bz.
3. It was installed during the upgrade.

And I've got a successful upgrade.

Could you send over the yum.log file? It looks as if os-refresh-config hadn't been upgraded on your nodes. If that's the case, we will need to understand why.

For reference, this is [4] the code that was changed in osp10 to add the timeout.

[4] https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/puppet/role.role.j2.yaml#L109..L110
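For anyone hitting the same symptom, the check in [1] can be scripted before starting the upgrade. The following is only a rough sketch in Python, not part of any fix: the function names advertises_flag and installed_help are made up for illustration, and the two sample strings are copied from the usage output quoted in this bz. It assumes nothing about os-refresh-config beyond the fact that it accepts --help.

```python
import re
import subprocess


def advertises_flag(help_text, flag="--timeout"):
    """Return True if `flag` appears as a standalone option in the help text.

    Hypothetical helper: it only inspects text, it does not run anything.
    """
    # Reject matches that are part of a longer option such as --timeout-extra.
    pattern = r"(?<![\w-])" + re.escape(flag) + r"(?![\w-])"
    return re.search(pattern, help_text) is not None


def installed_help(command="os-refresh-config"):
    """Run `<command> --help` on the local node and return its output.

    Hypothetical helper: raises FileNotFoundError if the command is absent.
    """
    result = subprocess.run(
        [command, "--help"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=False,
    )
    return result.stdout


# Usage text as printed in the failing journal (no --timeout).
OLD_USAGE = (
    "usage: os-refresh-config [-h] [--print-base] [--print-phases]\n"
    "[--log-level {ERROR,WARN,CRITICAL,INFO,DEBUG}]\n"
    "[--lockfile LOCKFILE]"
)

# Usage line seen on the successfully upgraded node.
NEW_USAGE = "[--lockfile LOCKFILE] [--timeout TIMEOUT]"
```

On a node, advertises_flag(installed_help()) returning False would suggest that the old os-refresh-config package is still installed, which matches the "unrecognized arguments: --timeout 14400" error in the description.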
Hi, I'm closing this one for lack of information. If it's still an issue, please re-open the bz with the information from https://bugzilla.redhat.com/show_bug.cgi?id=1567232#c1 in it. Regards,