Description of problem:

After an application is restarted, pinging it by its CNAME fails with an "unknown host" error for around 10-15 minutes. nslookup resolves the name correctly during this time, and after 10-15 minutes ping works as well. This happens every time the application is restarted. If nscd is stopped, ping works without any error.

If the customer uses the A record, the agents can communicate with OMS immediately after the application restart. The issue occurs only when the CNAME is used.

Installed package versions:

nscd-2.12-1.149.el6_6.5.x86_64
glibc-2.12-1.149.el6_6.5.i686
glibc-2.12-1.149.el6_6.5.x86_64
glibc-common-2.12-1.149.el6_6.5.x86_64
glibc-devel-2.12-1.149.el6_6.5.i686
glibc-devel-2.12-1.149.el6_6.5.x86_64
glibc-headers-2.12-1.149.el6_6.5.x86_64

We are unable to reproduce the issue. However, the customer worked around the problem by lowering the TTL of the CNAME record from 900 seconds to 20 seconds, the same as the A record TTL. This indicates that the issue is with nscd caching incorrectly, similar to the issue documented in https://access.redhat.com/solutions/91233 .

The issue is occurring on RHEL 6.6 systems running "glibc-2.12-1.149.el6.x86_64", which is already newer than the version in which the issue described in that KB article was first fixed, so this needs some investigation. Could you please check whether the earlier bug has resurfaced or whether this is a new bug?
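To help measure the failure window when reproducing this, here is a minimal Python sketch. It is not something the customer ran. It polls the system resolver until the name resolves and reports how long that took. It assumes that socket.getaddrinfo goes through glibc's resolver path (and therefore through nscd when it is running), which is the same path ping depends on. The function names, the default timeout and the polling interval are illustrative choices.

```python
import socket
import time


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves via the given resolver call."""
    try:
        resolver(hostname, None)
        return True
    except socket.gaierror:
        # Equivalent of ping's "unknown host" error.
        return False


def wait_until_resolves(hostname, timeout=1200.0, interval=5.0,
                        resolver=socket.getaddrinfo,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until hostname resolves.

    Returns the elapsed seconds until the first successful lookup,
    or None if the name still fails once `timeout` seconds have passed.
    """
    start = clock()
    while True:
        if resolves(hostname, resolver):
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

Running wait_until_resolves() against the CNAME right after an application restart, once with nscd running and once with nscd stopped, should show whether the 10-15 minute window follows the nscd cache. The resolver, sleep and clock parameters exist only so the loop can be exercised without real DNS.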
The customer has asked for time to reproduce the issue and will come back with tcpdump output. They have currently applied a workaround by lowering the TTL of the CNAME record to match the A record TTL. I will update this BZ as soon as I hear back from the customer.