Bug 1524808
| Summary: | VMs not able to reach metadata server because of l3 agent is throwing errors | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | PURANDHAR SAIRAM MANNIDI <pmannidi> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Brian Haley <bhaley> |
| Status: | CLOSED ERRATA | QA Contact: | Federico Ressi <fressi> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 10.0 (Newton) | CC: | amuller, beagles, bhaley, chrisw, jlibosva, lbezdick, mburns, nyechiel, pmannidi, ragiman, rhel-osp-director-maint, srevivo |
| Target Milestone: | async | Keywords: | Triaged, ZStream |
| Target Release: | 10.0 (Newton) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-tripleo-heat-templates-5.3.10-4.el7ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| | 1558197 | Environment: | |
| Last Closed: | 2018-06-27 23:30:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1558197, 1763193 | | |
Description
PURANDHAR SAIRAM MANNIDI
2017-12-12 06:43:03 UTC
Can you please attach the l3-agent.log file and/or an sosreport so this can be further diagnosed? From the "No buffer space available" error it looks like the neighbour table on the system is full, which shouldn't happen normally. One place to look for failures is /var/log/messages, which could show that the table is full. Looking at the net.ipv4.gc_thresh* sysctl settings would also be useful, since they tell you how many entries the table is configured for.

Sai, thanks for the info. From the l3-agent log it looks like the neighbour table has overflowed, since adding an ARP entry is failing. Can the customer check /var/log/messages for any related warnings? Also, can they provide the output of 'sysctl -a | grep gc_thresh'? They may just need to increase the size of the table for their workload.

Just checking if additional info on my previous comment is available.

Those numbers for gc_thresh* are very low if this is a large deployment. I actually found a bug and patch from less than a year ago that increased these values; I'll link it in the bug. That said, on the affected node they could try increasing these values and see if the problem persists:

# sysctl -w net.ipv4.neigh.default.gc_thresh1=1024
# sysctl -w net.ipv4.neigh.default.gc_thresh2=2048
# sysctl -w net.ipv4.neigh.default.gc_thresh3=4096

Those are the new default values. If that works then someone could look at backporting the changes.

The upstream backport of the OSPd/TripleO fix for this on the newton branch is https://review.openstack.org/#/c/532612/

I don't think you have to boot 5-6 instances; verifying that the gc_thresh settings are correct should be enough, as it shows the size of the neighbour table has been increased.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2018:2101
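
The verification suggested in the comments above (checking that the gc_thresh settings have been raised) could be scripted roughly as follows. This is only a sketch: the names `RECOMMENDED`, `parse_gc_thresh` and `undersized` are made up for illustration and are not part of any OpenStack or TripleO tooling. It assumes the usual `key = value` output format of `sysctl -a`, and it treats the 1024/2048/4096 values quoted above as the minimums to check against.

```python
import re

# Minimums taken from the "new default values" quoted in the comments above.
RECOMMENDED = {"gc_thresh1": 1024, "gc_thresh2": 2048, "gc_thresh3": 4096}

# Matches only the IPv4 default neighbour-table thresholds; IPv6 and
# per-interface lines in the sysctl output are deliberately ignored.
_LINE = re.compile(
    r"^net\.ipv4\.neigh\.default\.(gc_thresh[123])\s*=\s*(\d+)\s*$"
)


def parse_gc_thresh(sysctl_output):
    """Extract the IPv4 gc_thresh values from 'sysctl -a' style text."""
    values = {}
    for line in sysctl_output.splitlines():
        match = _LINE.match(line.strip())
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def undersized(values, recommended=RECOMMENDED):
    """Return the names of thresholds that are missing or below the minimum."""
    return sorted(
        name
        for name, minimum in recommended.items()
        if values.get(name, 0) < minimum
    )


if __name__ == "__main__":
    # Illustrative sample only (stock kernel defaults), not the values
    # reported by the customer in this bug.
    sample = (
        "net.ipv4.neigh.default.gc_thresh1 = 128\n"
        "net.ipv4.neigh.default.gc_thresh2 = 512\n"
        "net.ipv4.neigh.default.gc_thresh3 = 1024\n"
    )
    print(undersized(parse_gc_thresh(sample)))
```

On a real node, the text passed to `parse_gc_thresh` would be the output of 'sysctl -a | grep gc_thresh'. An empty list from `undersized` means all three thresholds are at or above the values listed above.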