847878 – find -nouser reports files owned by valid users

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	glibc
Sub Component:
Version:	6.1
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Carlos O'Donell
QA Contact:	qe-baseos-tools-bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-08-13 21:51 UTC by Pedro
Modified:	2016-11-24 12:19 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2014-06-23 03:14:42 UTC
Target Upstream Version:
Embargoed:

Description Pedro 2012-08-13 21:51:14 UTC

Description of problem:

Running the command "find / -mount \( -nouser -o -nogroup \)" returns a *long* list of files. On closer inspection, those files are owned by valid users. For example, "ls -l" on the files shows that they are owned by the "oracle" user (it happens with other users as well). What's more, some files are shown as owned by "1041", which turns out to be "oracle" once I run "ypcat".

I took a look at system-config-authentication, and things seem to be in order. Errors are not being reported to /var/log/messages. I can actually "su" to the users that find says don't exist.

Version-Release number of selected component (if applicable):
find --version:
find (GNU findutils) 4.4.2
Features enabled: D_TYPE O_NOFOLLOW(enabled) LEAF_OPTIMISATION SELINUX FTS() CBO(level=0)

How reproducible:
It happens every time I run the find command as specified above.


Comment 2 Kamil Dudka 2012-08-26 22:41:51 UTC

Could you please attach the contents of your /etc/nsswitch.conf?

Comment 3 Pedro 2012-08-29 13:27:38 UTC

(In reply to comment #2)
> Could you please attach the contents of your /etc/nsswitch.conf?

I can't attach the file directly, but I can include what's on there:

passwd: files nis
shadow: files nis
group: files nis
hosts: files nis dns
bootparams: nis [NOTFOUND=return] files
ethers: files
netmasks: files
networks: files
protocols: files
rpc: files
services: files
netgroup: files nis
publickey: nis
automount: files nis
aliases: files nis


Thanks,
Pedro

Comment 4 Kamil Dudka 2012-09-02 14:05:27 UTC

What does the following command say on your box?

getent passwd 1041; echo $?

Comment 5 Pedro 2012-09-04 21:26:45 UTC

oracle:xxx:1041:1001:Oracle Admin:/home/oracle:/bin/csh
0

(where xxx is the shadow password)


Thanks again,
Pedro

Comment 6 Kamil Dudka 2012-09-04 22:26:06 UTC

Then uid 1041 is likely not the reason why those files were listed...

Please try the following two commands:

find / -xdev -nouser -printf "%U\n"
find / -xdev -nogroup -printf "%G\n"

Comment 7 Pedro 2012-09-05 13:12:47 UTC

I think some work was done on the system since I first posted (I haven't really been working on them for a while now). It used to be that there were a bunch of files under /tmp that had the issue, but now there are none. There's only one mount that has the issue now, it seems like. 

Running the first command returns a LOT of results, all saying "1041", and the second one also returns a lot of results, all "1001".
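
For output this repetitive, collapsing the IDs into counts makes it easier to see how many distinct owners are involved. A minimal sketch, assuming GNU find's -printf as used in comment 6; the helper name tally_ids is made up for illustration:

```shell
# Collapse a stream of numeric IDs (one per line, on stdin) into
# "count id" pairs, sorted numerically by ID.
# tally_ids is a hypothetical helper name, not an existing tool.
tally_ids() {
    sort -n | uniq -c
}

# Assumed usage, reusing the commands from comment 6:
#   find / -xdev -nouser  -printf '%U\n' | tally_ids
#   find / -xdev -nogroup -printf '%G\n' | tally_ids
```

Each ID that appears can then be checked individually with "getent passwd ID" or "getent group ID".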

Comment 8 Kamil Dudka 2012-09-05 13:50:57 UTC

The implementation of the -nouser predicate is so trivial that it cannot be wrong.  It does exactly what POSIX asks for:

    "The primary shall evaluate as true if the file belongs to a user ID for which the getpwuid() function defined in the System Interfaces volume of POSIX.1-2008 (or equivalent) returns NULL."

The only problem I can think of is that getpwuid() returns inconsistent results on subsequent calls in your case.  Please attach the following ltrace output:

ltrace -e getpwuid find /tmp -xdev -nouser -prune
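
A rough way to look for inconsistent lookups without ltrace is to repeat the same query through NSS many times and count the failures. The sketch below assumes that getent is available and that it consults the same nsswitch.conf sources as getpwuid(); count_lookup_failures and its optional third argument are hypothetical names, not part of any existing tool:

```shell
# Repeat one passwd lookup several times and print how many attempts failed.
#   $1 = UID to look up
#   $2 = number of attempts
#   $3 = lookup command (hypothetical hook; defaults to "getent passwd")
# getent exits non-zero when the key cannot be found.
count_lookup_failures() {
    uid=$1
    attempts=$2
    lookup=${3:-getent passwd}
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # $lookup is deliberately unquoted so "getent passwd" splits into two words.
        $lookup "$uid" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Assumed usage:
#   count_lookup_failures 1041 1000
```

A non-zero count for a UID that is known to exist would point at intermittent lookup failures rather than at find itself. If nscd is running, cached answers may hide such failures.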

Comment 9 RHEL Program Management 2012-09-07 05:33:47 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 10 Pedro 2012-09-07 13:50:51 UTC

I don't know if this is what you mean by inconsistent results, but after running your command in the directories where I was seeing the issue, I got one of two results:

getpwuid(1041, 0x7fffeb17b010, 0x9f83c0, -1, 256)     = 0x387cf8dda0
getpwuid(1041, 0x7fffeb17b010, 0x9f83c0, -1, 0)     = 0x387cf8dda0

From what I could tell, most of the results were the "256" kind.

I am sorry I can't provide the actual file with results, but the hosts in question are on a separate network with no internet access, and it's not an easy task to copy the things over. :/

Thanks,
Pedro

(In reply to comment #8)
> The implementation of the -nouser predicate is so trivial that it cannot be
> wrong.  It does exactly what POSIX asks for:
> 
>     "The primary shall evaluate as true if the file belongs to a user ID for
> which the getpwuid() function defined in the System Interfaces volume of
> POSIX.1-2008 (or equivalent) returns NULL."
> 
> The only problem I can think of is that getpwuid() returns inconsistent
> results on subsequent calls in your case.  Please attach the following
> ltrace output:
> 
> ltrace -e getpwuid find /tmp -xdev -nouser -prune

Comment 11 Kamil Dudka 2012-09-07 14:26:50 UTC

Both the calls you captured return a non-zero value, which means that a user entry was found.  You can easily check what a result for a nonexistent user looks like:

# install -o 777 -d xxx
# ltrace -e getpwuid find xxx -xdev -nouser -prune
getpwuid(777, 0x7fff3ebee350, 0x82f590, 0x7fff3ebee350, 0)       = 0
xxx
+++ exited (status 0) +++

Comment 12 Kamil Dudka 2012-09-08 21:40:33 UTC

As for the extra four arguments given to getpwuid, they are just unrelated values grabbed from the stack.  You can get rid of them by running the following command prior to running ltrace:

echo 'addr getpwuid(uint);' >> ~/.ltrace.conf

Comment 13 Pedro 2012-09-13 17:46:57 UTC

(In reply to comment #11)
> Both the calls you captured return a non-zero value, which means that a user
> entry was found.  You can easily check what a result for a nonexistent
> user looks like:
> 
> # install -o 777 -d xxx
> # ltrace -e getpwuid find xxx -xdev -nouser -prune
> getpwuid(777, 0x7fff3ebee350, 0x82f590, 0x7fff3ebee350, 0)       = 0
> xxx
> +++ exited (status 0) +++

That's the weird part. It claims to have files that are "unowned", but I am clearly able to find the users that it says don't exist.

I'm grasping at straws here, but could it have something to do with the NIS server, and the fact that it's running Solaris? I mean, it wouldn't make much sense since the files are local, and it doesn't happen with ALL files (it doesn't complain about all files owned by me, for example), but I thought it wouldn't hurt to ask.

Comment 14 Pedro 2012-09-13 18:10:43 UTC

Actually, never mind. I was able to do the same test on a RHEL 5.6 server that we have (using the same NIS server), and it did not report the same files.

Comment 15 Kamil Dudka 2012-09-13 18:40:54 UTC

It does not really matter whether the files are local or remote since the file system provides only their UIDs anyway.  It is the getpwuid() function that looks for the corresponding user entry.  Since both the NSS infrastructure and the NIS lookup provider belong to the glibc package, I am switching the component accordingly.

Comment 16 Jeff Law 2012-09-13 20:10:45 UTC

This sounds similar to something we fixed in RHEL 6.3.  Any chance you could try your test on a RHEL 6.3 system talking to the Solaris NIS server?

Comment 17 Pedro 2012-09-14 14:43:50 UTC

Unfortunately, I don't think that's an option. :/ If you could tell me what packages need updating (and where I could find them) I can try passing that on to the system owners. I think maybe in the future we'll move on to a newer RH release, but for now we're locked onto 6.1.

Thanks,
Pedro

Comment 18 Jeff Law 2012-09-14 18:24:55 UTC

I wasn't suggesting you update any machines to a newer release-- just to run the test from a RHEL 6.3 box (if you have one) to confirm/deny that you're bumping against the same problem we fixed in 6.3.

If you don't have a 6.3 machine handy, you might be able to try using a scratch 6.1 machine after first confirming the incorrect behaviour, then updating glibc* and nscd* and testing again.  Note however, that if you take this approach, make sure it's a scratch machine.

While I'm not immediately aware of any problems running a RHEL 6.3 glibc on a RHEL 6.1 system, it's not a configuration we test or support.

Comment 19 Jean-PIerre Melkonian 2013-07-31 06:25:46 UTC

I have had a similar problem since I started using Red Hat 6.
I have noticed with netstat that the number of ports in the TIME_WAIT state increases dramatically during the find -nouser run (foreign address 127.0.0.1:111 and also 127.0.0.1:148; my host is its own NIS server). I suspect that a new port is opened for each file scanned, to check the user in NIS.
find does not report errors, but evidently the owner of the file is not found in NIS.

Setting
net.ipv4.tcp_tw_recycle = 1
seems to correct the problem, but it is not always sufficient.

This does not occur on Red Hat 5, 4, 3, or other OSes.
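
To put a number on the TIME_WAIT growth while find is running, the kernel's socket table can be sampled directly instead of reading netstat output by eye. This is only a sketch: it assumes the IPv4 /proc/net/tcp layout (remote address in the third column as hex IP:PORT, state in the fourth, 06 meaning TIME_WAIT), and count_time_wait is a hypothetical helper name:

```shell
# Count TIME_WAIT sockets whose remote port equals $1.
# Reads text in /proc/net/tcp format on stdin (IPv4 only).
# count_time_wait is a hypothetical helper name.
count_time_wait() {
    # /proc/net/tcp prints ports as 4-digit uppercase hex, e.g. 111 -> 006F.
    port_hex=$(printf '%04X' "$1")
    awk -v port="$port_hex" '
        NR > 1 && $4 == "06" {          # skip the header; 06 = TIME_WAIT
            split($3, rem, ":")         # rem[1] = hex IP, rem[2] = hex port
            if (rem[2] == port) n++
        }
        END { print n + 0 }'
}

# Assumed usage, sampling the portmapper port once per second:
#   while sleep 1; do count_time_wait 111 < /proc/net/tcp; done
```

A count that climbs steadily with the number of files scanned would be consistent with one new connection per lookup, as suspected above.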

Comment 21 Carlos O'Donell 2014-06-23 03:14:42 UTC

We don't have sufficient data to figure out what's going on here. I have never seen a system behave as described by the original bug reporter. We would need a lot more data and testing to determine what's wrong. For starters you want to isolate yourself from the NIS server and try using local accounts only, and then work up slowly to determine exactly what causes the problem with the users. 

If this is still a problem please reopen the bug.
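
As a first step toward that isolation, it helps to know which of the affected UIDs are defined locally and which come only from NIS. A minimal sketch, assuming the standard colon-separated passwd format; local_user_for_uid is a hypothetical helper name:

```shell
# Print the login name for UID $1 from passwd-format text on stdin.
# Exits 0 if the UID is present, 1 if it is not.
# local_user_for_uid is a hypothetical helper name.
local_user_for_uid() {
    awk -F: -v uid="$1" '
        $3 == uid { print $1; found = 1; exit }
        END { exit !found }'
}

# Assumed usage:
#   local_user_for_uid 1041 < /etc/passwd || echo "1041 is not a local account"
```

A UID that is missing from /etc/passwd but resolves with "getent passwd" is being supplied by another NSS source, such as NIS.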

Comment 22 Pedro 2015-03-02 17:26:35 UTC

I've moved to a different company so I can't provide further info on this bug. I still get emails from bugzilla in regards to this bug, so I'll try closing this to see if the emails go away. Thanks!
