Bug 1321720

Summary: getting unknown host when we ping CNAME after the application restart
Product: Red Hat Enterprise Linux 6
Reporter: Jayaraj <jdeenada>
Component: glibc
Assignee: Carlos O'Donell <codonell>
Status: CLOSED WONTFIX
QA Contact: qe-baseos-tools-bugs
Severity: high
Docs Contact:
Priority: urgent
Version: 6.6
CC: ashankar, cww, fweimer, jdeenada, mnewsome, pfrankli
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-09 18:02:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1269194

Description Jayaraj 2016-03-29 03:02:32 UTC
Description of problem:

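Issue: After the application is restarted, pinging the application host by its CNAME fails with an "unknown host" error for around 10-15 minutes. During this time, nslookup resolves the name successfully (nslookup queries the DNS server directly and bypasses nscd, whereas ping resolves names through glibc's NSS and therefore through nscd). After 10-15 minutes, ping works again. This happens every time the application is restarted.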

However, if nscd is stopped, ping works without any error.
If the customer uses the A record instead, the agents can communicate with the OMS immediately after the application restart. The issue occurs only when the CNAME is used.
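To measure the outage window during a future restart, a probe along the following lines could be run on an affected agent. It is a minimal sketch, assuming Python 3.6+ is available: `socket.getaddrinfo()` goes through glibc's NSS resolution, as ping's lookups do, so it sees whatever nscd has cached. The helper names `resolves` and `watch` and the hostname used below are hypothetical.

```python
import socket
import time


def resolves(name, resolver=socket.getaddrinfo):
    """Return True if `name` resolves through glibc's NSS path
    (which includes nscd when the daemon is running)."""
    try:
        resolver(name, None)
        return True
    except socket.gaierror:
        # This is the lookup failure that ping reports as "unknown host".
        return False


def watch(name, interval=5.0, duration=1200.0):
    """Poll `name` every `interval` seconds for `duration` seconds and
    print a timestamped line each time the result changes."""
    start = time.monotonic()
    last = None
    while time.monotonic() - start < duration:
        ok = resolves(name)
        if ok != last:
            elapsed = time.monotonic() - start
            state = "resolves" if ok else "unknown host"
            print(f"{elapsed:7.1f}s  {name}: {state}", flush=True)
            last = ok
        time.sleep(interval)
```

For example, running `watch("oms-alias.example.com")` (a placeholder for the real CNAME) on an agent while the application restarts would show how long the alias fails; repeating the run with nscd stopped would show whether the failure disappears, as reported above.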

Installed package versions:

nscd-2.12-1.149.el6_6.5.x86_64 

glibc-2.12-1.149.el6_6.5.i686
glibc-2.12-1.149.el6_6.5.x86_64 
glibc-common-2.12-1.149.el6_6.5.x86_64 
glibc-devel-2.12-1.149.el6_6.5.i686
glibc-devel-2.12-1.149.el6_6.5.x86_64
glibc-headers-2.12-1.149.el6_6.5.x86_64


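We have been unable to reproduce the issue. However, the customer worked around the problem by lowering the TTL of the CNAME record from 900 seconds to 20 seconds, the same as the TTL of the A record. The original 900-second TTL equals 15 minutes, which matches the upper end of the observed outage window.
This indicates that the issue lies with nscd caching the result incorrectly, similar to the issue documented in https://access.redhat.com/solutions/91233.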
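The caching hypothesis can be illustrated with a toy model. This is a hypothetical sketch, not nscd's actual code: it assumes the cache stores the whole CNAME-to-A answer under a single TTL and compares two policies, the smallest TTL in the chain (so no record in the answer outlives its own TTL) and the TTL of the first record, i.e. the CNAME. Under the second policy, a wrong result cached around the restart would persist for up to 900 s, while lowering the CNAME TTL to 20 s caps it at 20 s under either policy. The record names are placeholders and Python 3.7+ is assumed.

```python
from dataclasses import dataclass


@dataclass
class Record:
    name: str
    rtype: str  # "CNAME" or "A"
    ttl: int    # seconds


def chain_ttl(chain, policy="min"):
    """Seconds a cache would keep a whole CNAME chain under `policy`.

    "min":   smallest TTL in the chain, so no record outlives its own TTL.
    "first": TTL of the first record only (the CNAME); one behaviour that
             would be consistent with the workaround described above.
    """
    if not chain:
        raise ValueError("empty chain")
    if policy == "min":
        return min(r.ttl for r in chain)
    if policy == "first":
        return chain[0].ttl
    raise ValueError(f"unknown policy: {policy}")


# TTLs from this report: CNAME 900 s before the workaround, 20 s after; A record 20 s.
before = [Record("oms-alias.example.com", "CNAME", 900),
          Record("oms-host.example.com", "A", 20)]
after = [Record("oms-alias.example.com", "CNAME", 20),
         Record("oms-host.example.com", "A", 20)]

for label, chain in (("CNAME TTL 900 s", before), ("CNAME TTL 20 s", after)):
    print(f"{label}: min policy keeps the answer {chain_ttl(chain, 'min')} s, "
          f"first-record policy keeps it {chain_ttl(chain, 'first')} s")
```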

Since the issue occurs on RHEL 6.6 systems running "glibc-2.12-1.149.el6.x86_64" (a version already later than the one in which the above KB article says the issue was first resolved), further investigation is needed. Could you please check whether the previously reported bug has resurfaced or whether this is a new bug?

Comment 11 Jayaraj 2016-04-05 02:02:07 UTC
The customer has asked for time to reproduce the issue and capture tcpdump output, because they have currently applied the workaround of lowering the CNAME TTL to match the A record. I will update the BZ as soon as I hear back from the customer.