Bug 1238628 - nss_myhostname should use new glibc API to support union of result
Summary: nss_myhostname should use new glibc API to support union of result
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemd
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: systemd-maint
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On: 1374228 1319285
Blocks: 1295396 1313485
Reported: 2015-07-02 10:16 UTC by Siddhesh Poyarekar
Modified: 2021-07-13 08:40 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1319285
Environment:
Last Closed: 2019-09-18 12:59:48 UTC
Target Upstream Version:
Embargoed:


Description Siddhesh Poyarekar 2015-07-02 10:16:13 UTC
Description of problem:
When the DNS server is unreachable, getaddrinfo is expected to return EAI_AGAIN. This was fixed in RHEL-6 with bug 1044628, but the same fix is ineffective in RHEL-7. This is because a default RHEL-7 install also has the myhostname plugin, which returns NODATA, resulting in getaddrinfo returning EAI_NONAME.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
See reproducer in bug 1098042.

Actual results:
getaddrinfo returns EAI_NONAME

Expected results:
getaddrinfo returns EAI_AGAIN

Additional info:

I have marked this as a regression because it breaks behaviour from rhel-6.
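
The distinction matters to callers because EAI_AGAIN signals a temporary failure that is worth retrying, while EAI_NONAME says the name is not known. A minimal sketch of how an application might branch on the two codes follows. The enum and the helper names classify_gai and resolve are illustrative only and do not come from this report or from any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical classification, used only for this illustration. */
enum lookup_outcome {
    LOOKUP_OK,
    LOOKUP_RETRY,
    LOOKUP_NO_SUCH_HOST,
    LOOKUP_OTHER_ERROR
};

/* Map a getaddrinfo() return code to what the caller should do next. */
enum lookup_outcome classify_gai(int rc)
{
    switch (rc) {
    case 0:
        return LOOKUP_OK;
    case EAI_AGAIN:
        return LOOKUP_RETRY;         /* temporary failure, e.g. DNS server unreachable */
    case EAI_NONAME:
        return LOOKUP_NO_SUCH_HOST;  /* the name is not known */
    default:
        return LOOKUP_OTHER_ERROR;
    }
}

/* Resolve a host name and report the outcome. */
enum lookup_outcome resolve(const char *host)
{
    struct addrinfo hints = {0};
    struct addrinfo *res = NULL;

    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc == 0)
        freeaddrinfo(res);
    return classify_gai(rc);
}
```

With the behaviour described above, a caller written this way gives up immediately on RHEL-7 when the DNS server is down, where on RHEL-6 it would have retried.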

Comment 5 Lukáš Nykrýn 2016-03-07 15:52:01 UTC
Should be easy to fix -> devel_ack

Comment 7 Lukáš Nykrýn 2016-03-08 16:21:55 UTC
After some thinking I don't think that the suggested fix is correct. Based on that logic every nss plugin should return NSS_STATUS_TRYAGAIN in the case that it does not find the host, which is nonsense. I think that glibc should return EAI_AGAIN in the case that one of the modules returns NSS_STATUS_TRYAGAIN and return the EAI_NONAME in the case that every module ended with NSS_STATUS_NOTFOUND.

Reassigning to glibc for further comments.

Comment 10 Carlos O'Donell 2016-03-16 06:34:41 UTC
(In reply to Lukáš Nykrýn from comment #7)
> After some thinking I don't think that the suggested fix is correct. Based
> on that logic every nss plugin should return NSS_STATUS_TRYAGAIN in the case
> that it does not find the host, which is nonsense. 

Not finding the host is different from the nameserver being unreachable.

If the service can verify the host is not found then it does not need to return NSS_STATUS_TRYAGAIN, it can authoritatively return NSS_STATUS_NOTFOUND.

The case I outline below is that nss_myhostname is a special service and may need to bend the rules a bit to integrate with the expected behaviours (which is what this bug is about).

> I think that glibc should
> return EAI_AGAIN in the case that one of the modules returns
> NSS_STATUS_TRYAGAIN and return the EAI_NONAME in the case that every module
> ended with NSS_STATUS_NOTFOUND.

The glibc NSS services delegate to the next available service. The next service run is considered authoritative. This supports delegating answers if certain services are not yet initialized, connected to their data sources, or empty.

We cannot return EAI_AGAIN if one or more services return NSS_STATUS_TRYAGAIN because a later service might have a valid authoritative answer and we should not retry the query since that would degrade performance. Retrying the query might never succeed in this case. Imagine the case where the user's local nameserver is unreachable, but the user wants to query his hostname. In your suggestion the local DNS query would return with EAI_AGAIN, myhostname would succeed, and yet the result you suggest is returning EAI_AGAIN. That would break a use case for myhostname?

The case of returning EAI_NONAME if every service returned NSS_STATUS_NOTFOUND is not relevant to the case at hand. Changing the returned value violates the design behind NSS services allowing subsequent services to override results.

The root cause of the problem is that nss_myhostname is a special service that is designed to go at the end of the services list (overrides all services) but is not actually authoritative over all services. Conservatively nss_myhostname should:
(a) Identify if the input is something it can authoritatively lookup. If it is then proceed normally.
(b) If the service can't authoritatively lookup the input then return NSS_STATUS_TRYAGAIN to indicate that perhaps in the future there might be an answer from another service.

This would fix the use case where the user configured DNS servers are unreachable, and nss_myhostname answers authoritatively "That host is not found" when it should instead answer "Try again." Again, this is a special case because nss_myhostname is a unique NSS service.
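
Rules (a) and (b) can be sketched as a small decision function. This is only an illustration under stated assumptions: the names is_authoritative_for and conservative_status are hypothetical, and the authority check here covers just "localhost" and the machine's own host name, which is narrower than what the real nss_myhostname handles.

```c
#include <nss.h>      /* enum nss_status (glibc) */
#include <stdbool.h>
#include <strings.h>  /* strcasecmp */

/* Hypothetical authority check: only "localhost" and the local host
   name count as names this module can answer for in this sketch. */
bool is_authoritative_for(const char *name, const char *local_hostname)
{
    return strcasecmp(name, "localhost") == 0
        || strcasecmp(name, local_hostname) == 0;
}

/* Decide the status the module reports for a lookup of `name`. */
enum nss_status conservative_status(const char *name, const char *local_hostname)
{
    if (is_authoritative_for(name, local_hostname))
        return NSS_STATUS_SUCCESS;   /* (a) proceed normally */
    return NSS_STATUS_TRYAGAIN;      /* (b) another service may answer in the future */
}
```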

Does that clarify the bug request?

This problem is not unique to RHEL7 and does also exist in Fedora and upstream.

Comment 11 Lukáš Nykrýn 2016-03-16 09:46:25 UTC
One more question: I thought that NSS_STATUS_NOTFOUND is not necessarily authoritative. I thought that for such a setup you should use NOTFOUND=return.

Comment 12 Lukáš Nykrýn 2016-03-16 10:37:57 UTC
And another question. What would getaddrinfo return in the case that we make this change, dns says NSS_STATUS_NOTFOUND but later myhostname returns NSS_STATUS_TRYAGAIN?

Comment 13 Carlos O'Donell 2016-03-17 21:13:58 UTC
(In reply to Lukáš Nykrýn from comment #11)
> One more question: I thought that NSS_STATUS_NOTFOUND is not necessarily
> authoritative. I thought that for such a setup you should use NOTFOUND=return.

Sorry, let me try to clarify the terminology.

All services are authoritative for the databases they service.

There are default delegation rules that say "If this is not the last service, and the service returned NOTFOUND, then continue" (there is one such rule for every return type). If it is the last service, then the NOTFOUND result is final (no more services remain to delegate to).

You can override that rule by using "[NOTFOUND=return]" to force delegation to end if NOTFOUND is returned. This is currently used with `mdns4_minimal` to make it authoritative over Zeroconf hostnames. That means that if mdns4_minimal returns NOTFOUND it's because the Zeroconf hostname was not found, and no further resolution should happen (this avoids loading DNS servers with Zeroconf hostname queries).
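
The delegation rules described above can be modelled in a few lines. This is a simplified sketch, not glibc's implementation: struct service, run_chain, and the two scenario functions are hypothetical names, each service's answer is fixed in advance, and only the [NOTFOUND=return] override is modelled.

```c
#include <nss.h>      /* enum nss_status (glibc) */
#include <stdbool.h>
#include <stddef.h>

/* One entry of a "hosts:" line in this simplified model. */
struct service {
    const char *name;
    enum nss_status result;   /* what the service answers for the query */
    bool notfound_return;     /* true if configured with [NOTFOUND=return] */
};

/* Walk the chain and return the status of the last service that ran. */
enum nss_status run_chain(const struct service *chain, size_t n)
{
    enum nss_status status = NSS_STATUS_UNAVAIL;

    for (size_t i = 0; i < n; i++) {
        status = chain[i].result;
        if (status == NSS_STATUS_SUCCESS)
            break;   /* default rule: SUCCESS ends the lookup */
        if (status == NSS_STATUS_NOTFOUND && chain[i].notfound_return)
            break;   /* [NOTFOUND=return] override */
        /* default rule for every other status: continue with the next service */
    }
    return status;
}

/* Scenario from this bug: dns cannot reach its server, myhostname runs last. */
enum nss_status unreachable_dns_scenario(void)
{
    const struct service chain[] = {
        { "files",      NSS_STATUS_NOTFOUND, false },
        { "dns",        NSS_STATUS_TRYAGAIN, false },
        { "myhostname", NSS_STATUS_NOTFOUND, false },
    };
    return run_chain(chain, sizeof chain / sizeof chain[0]);
}

/* mdns4_minimal with [NOTFOUND=return]: dns is never consulted. */
enum nss_status zeroconf_scenario(void)
{
    const struct service chain[] = {
        { "mdns4_minimal", NSS_STATUS_NOTFOUND, true  },
        { "dns",           NSS_STATUS_SUCCESS,  false },
    };
    return run_chain(chain, sizeof chain / sizeof chain[0]);
}
```

In the first scenario the TRYAGAIN from dns is overwritten by the NOTFOUND from myhostname, which is the behaviour this bug reports. In the second, the lookup stops at mdns4_minimal even though dns would have succeeded.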

(In reply to Lukáš Nykrýn from comment #12)
> And another question. What would getaddrinfo return in the case that we make
> this change, dns says NSS_STATUS_NOTFOUND but later myhostname returns
> NSS_STATUS_TRYAGAIN?

You raise an interesting conundrum here.

The solution I had in mind was to move myhostname ahead of DNS. This isn't particularly appealing because it has been argued that DNS should be chosen as authoritative if it has entries for the names being looked up instead of relying on myhostname. Therefore this solution is out unless someone can show that there is a clear consensus on deciding which names *could* be resolved ahead of DNS.

In the example you cite, without moving the position of the myhostname service, we would incorrectly return TRYAGAIN (from the last service, i.e. myhostname) when it should be NOTFOUND. That is certainly a problem.

It would appear that the result of myhostname in this case is dependent on the results of the preceding service lookup, particularly if myhostname finds it is not authoritative for the lookup.

For example:

(a) If the previous service would have returned X for the lookup, and myhostname is not authoritative for the lookup, it should propagate the X result.

(b) If myhostname is authoritative it can return any result it wishes. Discussions about what is or is not authoritative can happen in another issue e.g. gateway discussion.

Doing (a) and (b) would allow:

hosts: files ... dns myhostname ...

To work correctly, where if dns returns NOTFOUND, and myhostname should not resolve a lookup, it should also return NOTFOUND.

If dns returns TRYAGAIN, and myhostname should not resolve a lookup, it should also return TRYAGAIN.

That is, myhostname would pass on the result of the last service as if it had not been present. It would be nice if there were an API to get the result of the last service, but there isn't one. We could add one, but it would force a glibc upgrade for everyone using a newer systemd that required the symbols in question.
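
Rules (a) and (b) for propagation amount to the following decision. This is a sketch only: propagating_status is a hypothetical name, and the `previous` argument stands in for information that no existing API supplies to an NSS module.

```c
#include <nss.h>      /* enum nss_status (glibc) */
#include <stdbool.h>

/* Decide what myhostname should report.
   authoritative: whether myhostname can answer this lookup itself
   own_result:    the status myhostname would report when authoritative
   previous:      status of the preceding service (ASSUMED to be known;
                  no existing API provides it) */
enum nss_status propagating_status(bool authoritative,
                                   enum nss_status own_result,
                                   enum nss_status previous)
{
    if (authoritative)
        return own_result;   /* (b) may return any result it wishes */
    return previous;         /* (a) behave as if myhostname were absent */
}
```

With "hosts: files ... dns myhostname", a NOTFOUND from dns stays NOTFOUND and a TRYAGAIN from dns stays TRYAGAIN whenever myhostname is not authoritative.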

The NSS service and delegations were never designed for a system where a service might only be able to answer a portion of the queries passed to it. The consequence of that is this bug report, and the fact that myhostname and mymachines are last in the list.

We have an `int __nss_configure_lookup (const char *dbname, const char *service_line);` to configure the hosts lookup for the current process which could be used to re-run the lookup with myhostname removed, and that would give you the answer that you want to propagate. Unfortunately the database is process global and changing it would interfere with other threads making similar lookups. The API __nss_configure_lookup was designed for testing and some compatibility NSS plugins.
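
For illustration, a call to that function might look like the sketch below. The wrapper name use_hosts_without_myhostname and the service line "files dns" are assumptions made for the example; the actual line would be whatever the system's hosts entry contains with myhostname removed.

```c
#include <nss.h>   /* declares __nss_configure_lookup on glibc */

/* Replace the "hosts" service list for the whole process.
   Every thread's host lookups are affected from this point on,
   which is why this is not usable as a workaround inside a module.
   The service line "files dns" is an assumed example. */
int use_hosts_without_myhostname(void)
{
    return __nss_configure_lookup("hosts", "files dns");
}
```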

Therefore I have no workaround to solve this problem.

It looks like you don't have enough information to return the correct result.

I think we should be able to fix this in RHEL7 if we extend a private API for nss_myhostname to use to get the result of the last service e.g. __nss_last_service_result().

Did I miss something in my analysis?

Comment 14 Carlos O'Donell 2016-03-18 00:51:36 UTC
(In reply to Carlos O'Donell from comment #13)
> That is, myhostname would pass on the result of the last service as if it
> had not been present. It would be nice if there were an API to get the
> result of the last service, but there isn't one. We could add one, but it
> would force a glibc upgrade for everyone using a newer systemd that required
> the symbols in question.

Given that NSS is always a plugin, we could have myhostname dlopen libc and look to see if it supports the new API, and use it to query the last service result and pass that along if possible, otherwise just do what it does today.
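
A rough sketch of that probing approach follows. Everything about the new API here is hypothetical: __nss_last_service_result is only the example name floated in comment 13, its signature is assumed, and the library name "libc.so.6" is assumed to be the glibc soname on the target. The helper names are illustrative as well.

```c
#include <dlfcn.h>
#include <nss.h>      /* enum nss_status (glibc) */
#include <stddef.h>

/* HYPOTHETICAL signature for the proposed private API. */
typedef enum nss_status (*last_result_fn)(void);

/* Look for the optional symbol in libc; NULL means it is not available. */
last_result_fn find_last_result_api(void)
{
    last_result_fn fn = NULL;
    void *libc = dlopen("libc.so.6", RTLD_LAZY);   /* assumed soname */

    if (libc == NULL)
        return NULL;
    /* POSIX-recommended way to store a dlsym() result in a function pointer. */
    *(void **)(&fn) = dlsym(libc, "__nss_last_service_result");
    /* The handle is left open on purpose: libc stays loaded anyway. */
    return fn;
}

/* Status to report when myhostname is not authoritative for a lookup. */
enum nss_status not_authoritative_status(void)
{
    last_result_fn fn = find_last_result_api();

    if (fn != NULL)
        return fn();              /* pass along the previous service's result */
    return NSS_STATUS_NOTFOUND;   /* otherwise keep today's behaviour */
}
```

On older glibc versions, linking may require -ldl.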

Comment 15 Lukáš Nykrýn 2016-03-18 12:40:12 UTC
So, if I get this correctly, we need to clone this bug for glibc to get a new API in glibc and then use it in myhostname (if it is present).

Comment 16 Carlos O'Donell 2016-03-18 15:43:18 UTC
(In reply to Lukáš Nykrýn from comment #15)
> So, if I get this correctly, we need to clone this bug for glibc to get a new
> API in glibc and then use it in myhostname (if it is present).

Yes exactly. It will likely be an internal API with a double underscore designed for this specific use case.

Comment 18 Lukáš Nykrýn 2016-03-23 10:05:54 UTC
Removing devel_ack, since we are waiting for 1319285.

Comment 20 Carlos O'Donell 2016-05-24 18:28:31 UTC
Raised issue upstream to have a conversation about the notion of supporting NSS service plugins that only answer a subset of requests in an authoritative way.

https://www.sourceware.org/ml/libc-alpha/2016-05/msg00554.html

Comment 21 Lukáš Nykrýn 2016-06-09 12:20:39 UTC
Since this bug depends on the change in glibc, I really don't think we can fix this in 7.3 timeframe.

Comment 22 Michal Sekletar 2016-08-26 11:24:35 UTC
This was moved to RHEL-7.4, removing blocker flag.

Comment 24 Carlos O'Donell 2016-09-08 09:50:09 UTC
Still working on the upstream API for this in glibc. Again, this is being tracked in bug 1319285.

Comment 25 Lukáš Nykrýn 2017-04-24 12:26:51 UTC
Since the feature in glibc is not ready yet -> 7.5

Comment 29 Michal Sekletar 2019-09-18 12:59:48 UTC
Closing this one, as it depends on the glibc change and the glibc team is not going to deliver the new API needed to fix this in RHEL-7.

https://bugzilla.redhat.com/show_bug.cgi?id=1319285#c11
