202454 – installation of screen(1) fails if nscd is running

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	shadow-utils
Sub Component:
Version:	4.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Peter Vrabec
QA Contact:	David Lawrence
Docs Contact:
URL:
Whiteboard:
Depends On:	191464
Blocks:

Reported:	2006-08-14 15:41 UTC by Elena Zannoni
Modified:	2007-11-30 22:07 UTC (History)
CC List:	5 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-10-11 15:30:36 UTC
Target Upstream Version:
Embargoed:

Attachments
strace output (53.56 KB, text/plain) 2006-08-22 15:20 UTC, Peter Vrabec
another strace (606.15 KB, text/plain) 2006-10-03 13:40 UTC, Peter Vrabec

Description Elena Zannoni 2006-08-14 15:41:39 UTC

+++ This bug was initially created as a clone of Bug #191464 +++

Description of problem:
If nscd is running and a clean installation (i.e. not an update) of screen is
performed, the group screen isn't created at the right moment, and /usr/bin/screen
and /var/run/screen are saved with the wrong access rights. screen then refuses to
start. If I uninstall screen (and also remove its group and /var/run/screen,
which isn't removed by the uninstall scriptlets) and install it again with nscd
turned off, the installation succeeds.

I was told that nscd is to blame, not screen itself.

Version-Release number of selected component (if applicable):
nscd-2.4-4, screen-4.0.2-12

How reproducible:
Always, well, I've tried it on one box only so far

Steps to Reproduce:
1. make sure you don't have screen installed at all (no group screen, no
/var/run/screen)
2. start nscd (unless it's already running)
3. install screen (using yum or plain rpm)
  
Actual results:
# yum install screen
...
Running Transaction
warning: group screen does not exist - using root
warning: group screen does not exist - using root
  Installing: screen                       ######################### [1/1]

Installed: screen.i386 0:4.0.2-12
Complete!
# screen
Directory '/var/run/screen' must have mode 777.
# rpm -V screen
.M....G..   /usr/bin/screen
......G..   /var/run/screen
# grep screen /etc/group
screen:x:84:

( ^ which means that group screen is created at some moment)

Expected results:
success

Additional info:
my nscd.conf looks good, I guess: (rpm -V nscd doesn't shout)
check-files             group           yes

-- Additional comment from drepper on 2006-05-29 01:15 EST --
This cannot be expected to work otherwise.  nscd is monitoring the change to
/etc/group etc but this isn't done for every access.  Otherwise there'd be no
acceleration.

What has to happen is that nscd is told to prune the cache using the -i
parameter.  So, who's adding the IDs?  rpm?  Whatever program is responsible
should run

  /usr/sbin/nscd -i passwd

after adding the UID to /etc/passwd and

  /usr/sbin/nscd -i group

after adding the GID to /etc/group.

I'm reassigning the bug to rpm for now.

-- Additional comment from n3npq on 2006-05-29 11:35 EST --
If nscd is now working from cache rather than backing store, with a periodic (and
imperfect) check of the backing store for cache consistency (a good idea imho),
then useradd should notify nscd that the cache is indeed invalid and should be
recreated.

That is a shadow-utils problem, not an rpm problem.

-- Additional comment from pvrabec on 2006-05-30 05:47 EST --
This problem is fixed by screen-4.0.2-13
%pre
/usr/sbin/groupadd -g 84 -r -f screen
service nscd force-reload

Even so, I don't know why nscd force-reload is needed. I thought groupadd
properly notifies nscd that it needs to flush its cache (#186803).



-- Additional comment from rbiba on 2006-05-30 06:06 EST --
Peter, are you sure that this is a clean solution? If nscd isn't running,
installation spits:

# rpm -ihv /usr/src/redhat/RPMS/i386/screen-4.0.2-13.i386.rpm
Preparing...                ########################################### [100%]
Reloading nscd: [FAILED]
   1:screen                 ########################################### [100%]

This is pretty much expected, but it shouldn't be printed.

-- Additional comment from jakub on 2006-05-30 07:47 EST --
service nscd force-reload is of course the wrong thing to do.
[ -x /usr/sbin/nscd ] && /usr/sbin/nscd -i group
is much better, but still, it is groupadd's duty to notify nscd.
shadow-utils already has in lib/nscd.c nscd_flush_cache routine which should
do the same thing as nscd -i, but there was a bug in it (#186803).
So, the question is, was this problem seen also with shadow-utils-4.0.14-6.FC5
(or later)?

-- Additional comment from pvrabec on 2006-05-30 08:05 EST --
Yes, it was.  I have compared the nscd_flush_cache() routine with nscd -i and
they seem to be equivalent.

-- Additional comment from jakub on 2006-05-30 08:19 EST --
Another problem could be that nscd_flush_cache in e.g. useradd.c or groupadd.c
is called before close_files.  That is before fclose on the /etc/group in
this case, so the addition to /etc/group might very well be just cached and
not written to disk yet.  So, you either need fflush on all the open files
not written yet, or call nscd_flush_cache after close_files.


-- Additional comment from pvrabec on 2006-05-30 09:28 EST --
Created an attachment (id=130228)
strace output

I have swapped those lines, but it doesn't help.
# strace -fo log rpm -i screen-4.0.2-12.i386.rpm
warning: group screen does not exist - using root
warning: group screen does not exist - using root

@@ -558,14 +558,15 @@
		find_new_gid ();

	grp_update ();
-	nscd_flush_cache ("group");

	close_files ();
+	nscd_flush_cache ("group");


-- Additional comment from jakub on 2006-05-30 11:50 EST --
I think the shadow-utils patch is needed, but there is also a glibc bug and
further shadow-utils changes are needed too.
The problem on the glibc side (well, nscd) is that nscd -i just writes the
request to the socket and doesn't wait for an ack that the cache has been
invalidated (or at least the invalidation initiated in a way that clients will
not use the old cached data).
The changes on shadow-utils side will need to be:
1) spawning /usr/sbin/nscd -i {passwd,group} rather than doing the INVALIDATE
by hand - INVALIDATE is an nscd-private request and while nscd can rely on the same
nscd version already running (if any) by restarting nscd in its scripts,
shadow-utils can't.

-- Additional comment from jakub on 2006-05-30 12:41 EST --
http://sources.redhat.com/ml/libc-hacker/2006-05/msg00023.html

-- Additional comment from pvrabec on 2006-05-31 05:56 EST --
Created an attachment (id=130259)
shadow-utils nscd.c candidate

What do you think about that, Jakub?

-- Additional comment from drepper on 2006-06-03 12:22 EST --
Re comment #11: you don't have error checking.  fork can fail, execl can fail,
waitpid can return EINTR.  It might be better to use posix_spawn() instead of
fork+exec and use TEMP_FAILURE_RETRY around the waitpid call and compare the
return value with the child PID.

Also, does the rest of the code already use perror?  If yes, fine, if not do
what the rest does.

And a nit: always add const where possible.  The parameter for the functions
should be const char *.
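
A minimal sketch of the spawn-and-wait pattern suggested in this comment,
assuming POSIX posix_spawn() and waitpid(). The name run_invalidate and its
path parameter are illustrative only; this is not the shadow-utils nscd.c
code, and the EINTR loop is a hand-written stand-in for glibc's
TEMP_FAILURE_RETRY macro.

```c
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Run "<path> -i <db>" and wait for it to finish.
   Returns 0 if the child exited with status 0, -1 otherwise.
   run_invalidate is a hypothetical name used only for this sketch. */
int run_invalidate(const char *path, const char *db)
{
    assert(path != NULL && db != NULL);

    /* posix_spawn wants char *const[], hence the casts away from const. */
    char *const argv[] = { (char *) path, (char *) "-i", (char *) db, NULL };
    pid_t pid;
    int status;

    /* posix_spawn returns an error number directly instead of setting errno. */
    int err = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    if (err != 0) {
        errno = err;
        return -1;
    }

    /* Retry waitpid when a signal interrupts it. */
    pid_t got;
    do {
        got = waitpid(pid, &status, 0);
    } while (got == -1 && errno == EINTR);

    /* Compare the return value with the child PID, as suggested above. */
    if (got != pid)
        return -1;

    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A caller would use something like run_invalidate("/usr/sbin/nscd", "group")
after the group file has been written and closed, and report a failure in
whatever way the rest of the code already does.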

-- Additional comment from pvrabec on 2006-06-04 15:42 EST --
Created an attachment (id=130458)
nscd.c candidate #2

Thanks for your comments, Ulrich. Let's see what Tomasz Kłoczko thinks about
this.

-- Additional comment from kloczek.pg.gda.pl on 2006-06-04 15:59 EST --
(In reply to comment #9)
> I think the shadow-utils patch is needed, but there is also a glibc bug and
> further shadow-utils changes are needed too.

shadow 4.0.2 was released more than four years ago, and the bug with not flushing
the nscd cache was fixed more than a year ago in the shadow source tree, in the
released shadow 4.0.11 (btw: 4.0.16 will be released tomorrow).


-- Additional comment from kloczek.pg.gda.pl on 2006-06-04 16:05 EST --
(In reply to comment #13)
> Created an attachment (id=130458)
> nscd.c candidate #2
> 
> Thanks for your comments, Ulrich. Let's see what Tomasz Kłoczko thinks about
> this.

Flushing the nscd cache by running "nscd -i <service>" is overkill. nscd provides
a way to flush a map cache by "talking" to the nscd socket, and this method has
been used for more than a year by the shadow tools (look at
http://cvs.pld.org.pl/shadow/nscd.c for how it is used now).


-- Additional comment from drepper on 2006-06-04 16:15 EST --
> Flushing nscd cache by run "nscd -i <service>" is overkill.

No, it's not.  Nobody is allowed to use that socket outside of glibc.  The
protocol is private.  That's the whole reason for the problem.  Stop commenting
on things you have no clue about.

-- Additional comment from kloczek.pg.gda.pl on 2006-06-04 16:28 EST --
Is there any problem with making this public? (Planned changes or so?)
Why doesn't glibc's own code also flush via "nscd -i <service>"? (Would that
make the glibc code more bloated?)

-- Additional comment from jakub on 2006-06-05 12:34 EST --
Yes, there have been changes e.g. to fix this bug on the glibc side.
See http://sources.redhat.com/ml/libc-hacker/2006-05/msg00023.html
Not sure what bloated glibc code you are talking about, the only place
in glibc which flushes the nscd caches is nscd command with -i option.

-- Additional comment from pvrabec on 2006-06-14 07:43 EST --
shadow-utils-4.0.16-3 uses "nscd -i"



-- Additional comment from rbiba on 2006-07-04 04:29 EST --
Thanks for the fix in rawhide. However, is there any chance the updated packages
will make it into FC5? There are other packages that complain when they're being
installed (including servers such as dovecot or bind) and it's pretty annoying to
fix the ownership issues manually.

-- Additional comment from pvrabec on 2006-07-10 05:10 EST --
(In reply to comment #20)
> Thanks for the fix in rawhide. However, is there any chance the updated packages
> will make it into FC5? There are other packages that complain when they're being
> installed (including servers such as dovecot or bind) and it's pretty annoying to
> fix the ownership issues manually.

shadow-utils update is useless until glibc is fixed in FC5


-- Additional comment from jakub on 2006-08-08 09:37 EST --
That's actually not true.  Even the shadow-utils change alone will fix this
issue, because if glibc doesn't write back an ACK that the database has been
already invalidated, shadow-utils will wait on the read until it fails
(and it will fail as soon as the database is invalidated, as old glibc will
just close the socket in that case).
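
The waiting behaviour described here can be sketched as a blocking read that
ends either with a reply or with the peer closing the connection. This is an
illustration only, not the shadow-utils or glibc code: the function name
wait_for_reply and the 32-bit reply size are assumptions, and nothing here
speaks the nscd protocol.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Block until the peer either sends something or closes the descriptor.
   Returns 1 if reply bytes arrived, 0 if the peer closed without replying,
   -1 on a read error. wait_for_reply is a hypothetical name. */
int wait_for_reply(int fd)
{
    int32_t reply;   /* assumed reply size, for illustration only */
    ssize_t n;

    assert(fd >= 0);

    /* Retry when a signal interrupts the read. */
    do {
        n = read(fd, &reply, sizeof reply);
    } while (n == -1 && errno == EINTR);

    if (n > 0)
        return 1;    /* peer acknowledged */
    return n == 0 ? 0 : -1;   /* 0: peer just closed its end */
}
```

Either a reply or a plain close lets the caller continue, which is why the
waiting side does not hang when the peer never writes an acknowledgement.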

Comment 2 Peter Vrabec 2006-08-21 14:41:52 UTC

Even though I know shadow-utils has a problem with invalidating the nscd cache, I need to
reproduce it against something else. Screen isn't a good example since it doesn't use its own
group in RHEL-4.

Comment 3 Ulrich Drepper 2006-08-21 15:21:08 UTC

You can write a trivial test program like (pseudo-code):

int main (void) {
  system ("adduser something");
  getpwnam ("something");
}

If the getpwnam call fails the bug isn't fixed.
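
A compilable variant of that test might look like the sketch below. The helper
name visible_after is hypothetical, and the extra lookup before the command is
an assumption added so that a running nscd has a chance to cache a negative
entry first; the command string and the user name are supplied by the caller.

```c
#include <assert.h>
#include <pwd.h>
#include <stddef.h>
#include <stdlib.h>

/* Look the name up, run cmd through the shell, then look the name up again.
   Returns 1 if the name resolves after cmd ran, 0 if it does not,
   -1 if cmd itself did not exit successfully.
   visible_after is a hypothetical helper for this sketch. */
int visible_after(const char *cmd, const char *name)
{
    assert(cmd != NULL && name != NULL);

    /* First lookup: if the user is missing, a cache may now hold
       a "no such user" entry for it. */
    (void) getpwnam(name);

    if (system(cmd) != 0)
        return -1;

    /* Second lookup: a stale cache would still report "no such user". */
    return getpwnam(name) != NULL ? 1 : 0;
}
```

Run as root with a command such as "adduser something" and the name
"something"; a return value of 0 would indicate that the lookup still fails
after the account was added.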

Comment 4 Peter Vrabec 2006-08-22 09:37:30 UTC

I can't reproduce it. I have tried #up2date {dovecot,named} and Ulrich's test program,
and neither triggered the problem. :-(

nscd (pid 15313) is running...
glibc-2.3.4-2.25
shadow-utils-4.0.3-60.RHEL4

Do you have any suggestions, Elena?

Comment 5 Jakub Jelinek 2006-08-22 10:07:26 UTC

As strace on RHEL4 U4 shows, shadow-utils-4.0.3-60.RHEL4 does quite bad things:
getent group screen; /usr/sbin/groupadd -g 84 -r -f screen; getent group screen
shows groupadd opening a wrong socket (/var/run/.nscd_socket instead of the
RHEL4+ /var/run/nscd/socket) and if that fails, sends SIGHUP signal to nscd
(this happens to work if nscd is woken up fast enough, but there is no guarantee
it will be).  If opening the socket succeeds (try for testing
ln -sf /var/run/nscd/socket /var/run/.nscd_socket), then it does another bogus
thing - writes the request in multiple writes, which nscd won't grok, as the
initial read is non-blocking and therefore groupadd is killed by SIGPIPE.

I don't think you need to spend too much time trying to reproduce it, just the
strace log is enough to find several severe bugs that simply have to be fixed
for RHEL4.5.

Comment 6 Peter Vrabec 2006-08-22 15:20:52 UTC

Created attachment 134644
strace output

How do you like this, Jakub? It's the strace output of shadow-utils-4.0.3, which
was patched to use the same mechanism to invalidate the nscd cache as
shadow-utils-4.0.17 from rawhide.

Comment 8 Peter Vrabec 2006-10-03 13:40:40 UTC

Created attachment 137646
another strace

I can't reproduce this problem on RHEL-4, even though I have used
screen-4.0.2-12.

Comment 11 RHEL Program Management 2006-10-11 15:30:36 UTC

Quality Engineering Management has reviewed and declined this request.  You may
appeal this decision by reopening this request.
