Description of problem:
-----------------------
Attempt to add extra compute after overcloud upgrade to RHOS-11 fails due to timeout.

openstack overcloud deploy \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
--control-scale 3 \
--control-flavor controller \
--compute-scale 3 \
--compute-flavor compute \
--ceph-storage-scale 3 \
--ceph-storage-flavor ceph \
-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
-e /home/stack/virt/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml
...
2018-05-03 09:58:36Z [overcloud-Compute-2ju3dpbou7mu.2.NetworkDeployment]: CREATE_IN_PROGRESS state changed
2018-05-03 09:58:38Z [overcloud-Compute-2ju3dpbou7mu.2.NovaComputeConfig]: CREATE_COMPLETE state changed
2018-05-03 11:33:15Z [Compute]: UPDATE_FAILED UPDATE aborted
2018-05-03 11:33:16Z [overcloud]: UPDATE_FAILED Timed out
2018-05-03 11:33:16Z [overcloud-Compute-2ju3dpbou7mu.2]: CREATE_FAILED CREATE aborted
2018-05-03 11:33:16Z [overcloud-Compute-2ju3dpbou7mu]: UPDATE_FAILED Operation cancelled

Stack overcloud UPDATE_FAILED

Version-Release number of selected component (if applicable):

Checking logs on newly added compute:
-------------------------------------
May 03 07:30:22 compute-2 os-collect-config[3052]: /usr/libexec/os-refresh-config/configure.d/20-os-net-config: line 81: /etc/os-net-config/dhcp_all_interfaces.yaml: No such file or directory
May 03 07:30:22 compute-2 os-collect-config[3052]: + os-net-config -c /etc/os-net-config/dhcp_all_interfaces.yaml -v --detailed-exit-codes --cleanup
May 03 07:30:22 compute-2 os-collect-config[3052]: [2018/05/03 07:30:22 AM] [INFO] Using config file at: /etc/os-net-config/dhcp_all_interfaces.yaml
May 03 07:30:22 compute-2 os-collect-config[3052]: [2018/05/03 07:30:22 AM] [INFO] Using mapping file at: /etc/os-net-config/mapping.yaml
May 03 07:30:22 compute-2 os-collect-config[3052]: [2018/05/03 07:30:22 AM] [INFO] Ifcfg net config provider created.
May 03 07:30:22 compute-2 os-collect-config[3052]: [2018/05/03 07:30:22 AM] [ERROR] No config file exists at: /etc/os-net-config/dhcp_all_interfaces.yaml
May 03 07:30:22 compute-2 os-collect-config[3052]: + RETVAL=1
May 03 07:30:22 compute-2 os-collect-config[3052]: + [[ 1 == 2 ]]
May 03 07:30:22 compute-2 os-collect-config[3052]: + [[ 1 != 0 ]]
May 03 07:30:22 compute-2 os-collect-config[3052]: + echo 'ERROR: configuration of safe defaults failed.'
May 03 07:30:22 compute-2 os-collect-config[3052]: ERROR: configuration of safe defaults failed.
May 03 07:30:22 compute-2 os-collect-config[3052]: [2018-05-03 07:30:22,953] (os-refresh-config) [ERROR] during configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-config/configure.d']' returne
May 03 07:30:22 compute-2 os-collect-config[3052]: [2018-05-03 07:30:22,954] (os-refresh-config) [ERROR] Aborting...
May 03 07:30:22 compute-2 os-collect-config[3052]: Command failed, will not cache new data. Command 'os-refresh-config --timeout 14400' returned non-zero exit status 1
May 03 07:30:22 compute-2 os-collect-config[3052]: Sleeping 1.00 seconds before re-exec.

Packages:
---------
openstack-tripleo-heat-templates-6.2.12-2.el7ost.noarch
openstack-tripleo-validations-5.6.4-1.el7ost.noarch
python-tripleoclient-6.2.4-1.el7ost.noarch
openstack-tripleo-common-6.1.5-1.el7ost.noarch
openstack-tripleo-image-elements-6.1.3-1.el7ost.noarch
openstack-tripleo-heat-templates-6.2.12-2.el7ost.noarch
openstack-tripleo-ui-3.2.2-2.el7ost.noarch
openstack-tripleo-puppet-elements-6.2.5-1.el7ost.noarch
puppet-tripleo-6.5.11-1.el7ost.noarch
openstack-tripleo-0.0.8-0.3.4de13b3git.el7ost.noarch
rhosp-director-images-ipa-11.0-20180501.1.el7ost.noarch
rhosp-director-images-10.0-20180501.1.el7ost.noarch
rhosp-director-images-ipa-10.0-20180501.1.el7ost.noarch
rhosp-director-images-11.0-20180501.1.el7ost.noarch

Steps to Reproduce:
--------------------
1. Upgrade UC/OC to RHOS-11(2018-05-01.2)
2. Scale up overcloud

Actual results:
----------------
Scale up failed

Expected results:
-----------------
Scale up succeeds
Regarding:

May 03 07:30:22 compute-2 os-collect-config[3052]: /usr/libexec/os-refresh-config/configure.d/20-os-net-config: line 81: /etc/os-net-config/dhcp_all_interfaces.yaml: No such file or directory

Since 20-os-net-config itself creates /etc/os-net-config/dhcp_all_interfaces.yaml, starting at line 55, the only way this error message could occur is if the directory /etc/os-net-config/ did not exist. We need to track down where this directory gets created and why it is not created when scaling up in this case.

It appears that the logs for compute-2 are not in the Build artifacts link; I only see compute-0 and compute-1. Can you indicate where the logs in the initial bug comment were obtained?
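The reasoning above can be illustrated with a minimal sketch. It is written in Python rather than the shell used by 20-os-net-config, and it is not the script's actual code: the helper names are hypothetical, a temporary directory stands in for /etc/os-net-config, and the file content is a placeholder. It only shows that writing a file into a missing directory fails with "No such file or directory", and that the same write succeeds once the directory exists.

```python
import os
import tempfile


def write_config(path, content):
    """Write content to path. Raises FileNotFoundError if the parent directory is missing."""
    with open(path, "w") as handle:
        handle.write(content)


def try_write(path, content):
    """Return None on success, or the OS error text if the parent directory is missing."""
    try:
        write_config(path, content)
        return None
    except FileNotFoundError as exc:
        return exc.strerror


with tempfile.TemporaryDirectory() as root:
    # Hypothetical stand-in for /etc/os-net-config (assumption for illustration).
    conf_dir = os.path.join(root, "os-net-config")
    conf_file = os.path.join(conf_dir, "dhcp_all_interfaces.yaml")

    # First attempt: the directory has not been created, so the write fails.
    missing_dir_error = try_write(conf_file, "placeholder\n")

    # Second attempt: create the directory first, then the same write succeeds.
    os.makedirs(conf_dir)
    after_mkdir_error = try_write(conf_file, "placeholder\n")

    print("without directory:", missing_dir_error)
    print("with directory:", after_mkdir_error)
```

If this matches what happens on compute-2, checking whether /etc/os-net-config exists on the new node before os-refresh-config runs would confirm or rule out the theory.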
@Bob, the scale-up failed, so the inventory wasn't updated properly and log collection failed:

fatal: [compute-2]: FAILED! => {
    "changed": false,
    "module_stderr": "Shared connection to 172.16.0.11 closed.\r\n",
    "module_stdout": "Please login as the user \"heat-admin\" rather than the user \"root\".\r\n\r\n",
    "rc": 0
}
>Bob, You are also talking about old/new format. Can you point me about an example of
>each one ? I dont know which one I'm using

In Ocata the nic config files were changed to use a script instead of os-apply-config to drive os-net-config. The "old-style" nic config files can be identified by:

    Software Config to drive os-net-config to configure multiple interfaces
    group: os-apply-config

The "new-style" files don't use os-apply-config; instead they include the script:

    str_replace:
      template:
        get_file: ../../scripts/run-os-net-config.sh

The old style was still supported until Queens. In Queens a script is available in $THT/tools/yaml-nic-config-2-script.py to do the conversion.

We've seen some issues when upgrading and using the new-style configs, in that /etc/os-net-config/config.json was overwritten by os-apply-config, e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1514949.

I'm not sure yet what is going on here, or whether the problem you are seeing is the same as in comment 3. It would be useful to get some logs to help figure out what is going on.
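To answer "which one am I using" quickly, the two markers described above can be checked mechanically. The following is a rough sketch only: the function name and the sample fragments are hypothetical, and it relies solely on the two strings mentioned in this comment (group: os-apply-config for old-style, run-os-net-config.sh for new-style), so a template that uses neither or both is reported as such rather than guessed at.

```python
import re

# Markers taken from the comment above; everything else here is illustrative.
OLD_STYLE = re.compile(r"group:\s*os-apply-config")
NEW_STYLE = re.compile(r"run-os-net-config\.sh")


def classify_nic_config(text):
    """Classify nic config template text as old-style, new-style, mixed, or unknown."""
    has_old = bool(OLD_STYLE.search(text))
    has_new = bool(NEW_STYLE.search(text))
    if has_old and has_new:
        return "mixed"
    if has_old:
        return "old-style"
    if has_new:
        return "new-style"
    return "unknown"


# Small sample fragments built only from the markers quoted above.
old_sample = "properties:\n  group: os-apply-config\n"
new_sample = (
    "str_replace:\n"
    "  template:\n"
    "    get_file: ../../scripts/run-os-net-config.sh\n"
)

print(classify_nic_config(old_sample))
print(classify_nic_config(new_sample))
```

In practice the text would come from reading each nic config file referenced by the network environment file passed to the deploy command.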
Yuri - the 2nd set of logs looks quite different from the first; I don't see the os-net-config issues from the initial description. I do see many connectivity issues when trying to access the metadata server from controller-2.

I also see these libvirt issues in /var/log/messages on controller-2, which may be causing problems.

messages:May 11 13:58:06 controller-2 libvirtd: 2018-05-11 17:58:06.846+0000: 1649: error : logStrToLong_ui:2564 : Failed to convert 'virtio0' to unsigned int
messages:May 11 13:58:06 controller-2 libvirtd: 2018-05-11 17:58:06.849+0000: 1649: error : virPCIGetDeviceAddressFromSysfsLink:2643 : internal error: Failed to parse PCI config address 'virtio0'
messages:May 11 13:58:06 controller-2 libvirtd: 2018-05-11 17:58:06.850+0000: 1649: error : logStrToLong_ui:2564 : Failed to convert 'virtio1' to unsigned int
messages:May 11 13:58:06 controller-2 libvirtd: 2018-05-11 17:58:06.850+0000: 1649: error : virPCIGetDeviceAddressFromSysfsLink:2643 : internal error: Failed to parse PCI config address 'virtio1'
messages:May 11 13:58:06 controller-2 libvirtd: 2018-05-11 17:58:06.852+0000: 1649: error : logStrToLong_ui:2564 : Failed to convert 'virtio2' to unsigned int
messages:May 11 13:58:06 controller-2 libvirtd: 2018-05-11 17:58:06.852+0000: 1649: error : virPCIGetDeviceAddressFromSysfsLink:2643 : internal error: Failed to parse PCI config address 'virtio2'
messages:May 11 21:11:38 controller-2 libvirtd: 2018-05-11 21:11:38.808+0000: 2289: error : logStrToLong_ui:2564 : Failed to convert 'virtio0' to unsigned int
messages:May 11 21:11:38 controller-2 libvirtd: 2018-05-11 21:11:38.809+0000: 2289: error : virPCIGetDeviceAddressFromSysfsLink:2643 : internal error: Failed to parse PCI config address 'virtio0'
messages:May 11 21:11:38 controller-2 libvirtd: 2018-05-11 21:11:38.812+0000: 2289: error : logStrToLong_ui:2564 : Failed to convert 'virtio1' to unsigned int
messages:May 11 21:11:38 controller-2 libvirtd: 2018-05-11 21:11:38.812+0000: 2289: error : virPCIGetDeviceAddressFromSysfsLink:2643 : internal error: Failed to parse PCI config address 'virtio1'
messages:May 11 21:11:38 controller-2 libvirtd: 2018-05-11 21:11:38.813+0000: 2289: error : logStrToLong_ui:2564 : Failed to convert 'virtio2' to unsigned int
messages:May 11 21:11:38 controller-2 libvirtd: 2018-05-11 21:11:38.813+0000: 2289: error : virPCIGetDeviceAddressFromSysfsLink:2643 : internal error: Failed to parse PCI config address 'virtio2'
OSP11 is now retired, see details at https://access.redhat.com/errata/product/191/ver=11/rhel---7/x86_64/RHBA-2018:1828