Bug 1484533
Summary: | 51-hosts at scale fails to complete and does not report an error (need a backport to OSP10 overcloud image) | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Randy Rubins <rrubins> |
Component: | openstack-tripleo-image-elements | Assignee: | Ben Nemec <bnemec> |
Status: | CLOSED ERRATA | QA Contact: | Gurenko Alex <agurenko> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 10.0 (Newton) | CC: | aschultz, bnemec, dcadzow, dvd, emacchi, jslagle, mburns, ohochman, rhel-osp-director-maint, rhosp-bugs-internal, samccann, vcojot |
Target Milestone: | z5 | Keywords: | FeatureBackport, Triaged, ZStream |
Target Release: | 10.0 (Newton) | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | openstack-tripleo-image-elements-5.3.0-3.el7ost | Doc Type: | Bug Fix |
Doc Text: | In larger scale deployments (100 or more overcloud nodes), the 51-hosts script used to write all overcloud nodes into each /etc/hosts file would fail due to an "Argument list too long" error. This limitation has been fixed and should no longer block large-scale deployments. |
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2017-09-28 16:35:22 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Randy Rubins
2017-08-23 19:42:43 UTC
*** Bug 1484523 has been marked as a duplicate of this bug. ***

*** Bug 1427878 has been marked as a duplicate of this bug. ***

How did you work around this? We tried the patch, but we still see the same issue where 51-hosts gets stuck. However, if I manually run "os-refresh-config --log-level DEBUG" inside the node, I do not get any error; it goes through fine. Only during the deployment does node progress get stuck, and that is when we scale up to just 35 nodes. During deployment, we do not see "dib-run-parts Fri Sep 15 18:41:33 EDT 2017 51-hosts completed".

@bigswitch: Did you update your overcloud-full image with the patched 51-hosts file? We just used the libguestfs-tools utilities to update the image until the official fix gets released. Mounting the modified image via guestfish (http://libguestfs.org/guestfish.1.html) would be an easy way to make and/or validate the change.

We used virt-customize to upload the new 51-hosts file into the overcloud image:

virt-customize -a overcloud-full.qcow2 --upload /home/stack/templates/51-hosts:/usr/libexec/os-refresh-config/configure.d/51-hosts

Does your updated 51-hosts file look like this?
https://review.openstack.org/gitweb?p=openstack/tripleo-image-elements.git;a=blob_plain;f=elements/hosts/os-refresh-config/configure.d/51-hosts
And when you run 'os-refresh-config' manually, does it update /etc/hosts on that overcloud node properly?

Yes, it is the same file. When I run it manually, it does update /etc/hosts on the node.

Did you also update the existing overcloud nodes with the updated 51-hosts file? If you did, and it's still failing with the above AWK error, I'd recommend opening a support case. In our case, the IN_PROGRESS ("stuck") software deployments cleared upon rerunning the overcloud stack update.
To summarize, we updated the 51-hosts file on all existing overcloud nodes (144 computes and 3 controllers already built with the older overcloud image), updated the overcloud-full.qcow2 image, and re-uploaded it to glance before re-running the overcloud stack update to arrive at 160 computes and 3 controllers.

All patches are available on a fresh deployment (2017-09-07.2 build), but due to lack of hardware it has not been tested on the same setup.

[stack@undercloud-0 ~]$ rpm -q openstack-tripleo-image-elements
openstack-tripleo-image-elements-5.3.0-3.el7ost.noarch

Is there any setting you have used in the undercloud to scale to 140 nodes?

At only 35 nodes it's unlikely you're running into this problem. I would suggest opening a separate bug with details on the issues you're seeing.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2825
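
For readers wondering how a hosts-file update can hit "Argument list too long" at all: that message is the E2BIG error returned when a program is executed with arguments larger than the kernel allows, so a script that hands the full list of overcloud host entries to a helper as a command-line argument will eventually fail once the list grows large enough. The following is a minimal sketch of the general way around it, assuming the entries can be staged in a file first. It is not the shipped 51-hosts script; the function name and the marker strings `# HOSTS_START` / `# HOSTS_END` are hypothetical and chosen only for illustration.

```shell
#!/bin/bash
# Sketch only: NOT the shipped 51-hosts script.
# The function name and marker strings below are hypothetical.

# Replace everything between the marker lines in file $1 with the
# contents of file $2. Only the *path* of the entries file is passed
# on the command line; the entries themselves are read from disk, so
# their size is not bounded by the argument-length limit.
replace_hosts_block() {
    local hosts_file="$1"
    local entries_file="$2"
    local tmp
    tmp=$(mktemp)

    awk -v entries="$entries_file" '
        /^# HOSTS_START$/ {
            print                         # keep the start marker
            while ((getline line < entries) > 0) print line
            close(entries)
            skipping = 1                  # drop the old entries
            next
        }
        /^# HOSTS_END$/ { skipping = 0 }  # end marker is printed below
        !skipping { print }
    ' "$hosts_file" > "$tmp"

    cat "$tmp" > "$hosts_file"            # overwrite the file contents in place
    rm -f "$tmp"
}
```

A call would look like `replace_hosts_block /etc/hosts /tmp/new-entries`, where `/tmp/new-entries` is a hypothetical file holding one host entry per line. If the markers are absent, the file is left unchanged.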