191464 – installation of screen(1) fails if nscd is running

Summary: installation of screen(1) fails if nscd is running

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	shadow-utils
Sub Component:
Version:	5
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Peter Vrabec
QA Contact:	David Lawrence
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	202454

Reported:	2006-05-12 09:01 UTC by Radek Bíba
Modified:	2007-11-30 22:11 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2006-06-14 11:43:52 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
strace output (424.76 KB, text/plain) 2006-05-30 13:28 UTC, Peter Vrabec
shadow-utils nscd.c candidate (411 bytes, text/x-csrc) 2006-05-31 09:56 UTC, Peter Vrabec
nscd.c candidate #2 (992 bytes, text/x-csrc) 2006-06-04 19:42 UTC, Peter Vrabec

Description Radek Bíba 2006-05-12 09:01:24 UTC

Description of problem:
If nscd is running and a clean installation (i.e. not an update) of screen is
performed, the group screen isn't created at the right moment and /usr/bin/screen
and /var/run/screen are saved with wrong access rights. screen then refuses to
start. If I uninstall screen (and also remove its group and /var/run/screen,
which isn't removed by the uninstall scriptlets) and install it again with nscd
turned off, the installation succeeds.

I was told that nscd is to blame, not screen itself.

Version-Release number of selected component (if applicable):
nscd-2.4-4, screen-4.0.2-12

How reproducible:
Always, although I've tried it on only one box so far

Steps to Reproduce:
1. make sure you don't have screen installed at all (no group screen, no
/var/run/screen)
2. start nscd (unless it's already running)
3. install screen (using yum or plain rpm)
  
Actual results:
# yum install screen
...
Running Transaction
  Installing: screen                                                
[1/1]warning: group screen does not exist - using root
  Installing: screen                       ######################## 
[1/1]warning: group screen does not exist - using root
  Installing: screen                       ######################### [1/1]

Installed: screen.i386 0:4.0.2-12
Complete!
# screen
Directory '/var/run/screen' must have mode 777.
# rpm -V screen
.M....G..   /usr/bin/screen
......G..   /var/run/screen
# grep screen /etc/group
screen:x:84:

(^ which means that the group screen is created at some point)

Expected results:
success

Additional info:
My nscd.conf looks fine, I guess (rpm -V nscd reports nothing):
check-files             group           yes

Comment 1 Ulrich Drepper 2006-05-29 05:15:05 UTC

This cannot be expected to work otherwise.  nscd is monitoring the change to
/etc/group etc but this isn't done for every access.  Otherwise there'd be no
acceleration.

What has to happen is that nscd is told to prune the cache using the -i
parameter.  So, who's adding the IDs?  rpm?  Whatever program is responsible
should run

  /usr/sbin/nscd -i passwd

after adding the UID to /etc/passwd and

  /usr/sbin/nscd -i group

after adding the GID to /etc/group.

I'm reassigning the bug to rpm for now.

Comment 2 Jeff Johnson 2006-05-29 15:35:08 UTC

If nscd now works from its cache rather than the backing store, with only a periodic (and imperfect)
check of the backing store for cache consistency (a good idea imho), then useradd should notify nscd
that its cache is invalid and should be recreated.

That is a shadow-utils problem, not an rpm problem.

Comment 3 Peter Vrabec 2006-05-30 09:47:58 UTC

This problem is fixed by screen-4.0.2-13
%pre
/usr/sbin/groupadd -g 84 -r -f screen
service nscd force-reload

I don't know why nscd force-reload is needed, though. I thought groupadd properly notified
nscd that it needs to flush its cache (#186803).

Comment 4 Radek Bíba 2006-05-30 10:06:50 UTC

Peter, are you sure that this is a clean solution? If nscd isn't running,
installation spits:

# rpm -ihv /usr/src/redhat/RPMS/i386/screen-4.0.2-13.i386.rpm
Preparing...                ########################################### [100%]
Reloading nscd: [FAILED]
   1:screen                 ########################################### [100%]

This is to be expected, but it shouldn't be printed.

Comment 5 Jakub Jelinek 2006-05-30 11:47:53 UTC

service nscd force-reload is of course the wrong thing to do.
[ -x /usr/sbin/nscd ] && /usr/sbin/nscd -i group
is much better, but it is still groupadd's duty to notify nscd.
shadow-utils already has an nscd_flush_cache routine in lib/nscd.c, which should
do the same thing as nscd -i, but there was a bug in it (#186803).
So the question is: was this problem also seen with shadow-utils-4.0.14-6.FC5
(or later)?

Comment 6 Peter Vrabec 2006-05-30 12:05:29 UTC

Yes, it was. I have compared the nscd_flush_cache() routine with nscd -i and they seem to be
equivalent.

Comment 7 Jakub Jelinek 2006-05-30 12:19:07 UTC

Another problem could be that nscd_flush_cache in e.g. useradd.c or groupadd.c
is called before close_files.  That is before fclose on the /etc/group in
this case, so the addition to /etc/group might very well be just cached and
not written to disk yet.  So, you either need fflush on all the open files
not written yet, or call nscd_flush_cache after close_files.
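
The buffering effect described here can be illustrated with a small C sketch. It is not shadow-utils code: the helper names on_disk_size and append_entry and the scratch file are assumptions made for illustration. It writes a group-style line through stdio and records the size another process would see before and after fflush.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Size of the file as another process (for example nscd) would see it,
   or -1 if stat fails. */
long on_disk_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Hypothetical demo helper: write `line` to `path` through stdio and
   report the on-disk size before and after the explicit flush. */
int append_entry(const char *path, const char *line,
                 long *before_flush, long *after_flush)
{
    assert(before_flush != NULL && after_flush != NULL);

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    if (fputs(line, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    /* A short line normally still sits in the stdio buffer here. */
    *before_flush = on_disk_size(path);
    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }
    /* Only now is the new entry visible to other readers. */
    *after_flush = on_disk_size(path);
    return fclose(fp) == 0 ? 0 : -1;
}
```

Any cache invalidation sent between the write and the flush would make the reader re-read a file that does not yet contain the new entry, which is why the notification has to come after fflush or fclose.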

Comment 8 Peter Vrabec 2006-05-30 13:28:26 UTC

Created attachment 130228 [details]
strace output

I have swapped those lines, but it doesn't help.
# strace -fo log rpm -i screen-4.0.2-12.i386.rpm
warning: group screen does not exist - using root
warning: group screen does not exist - using root

@@ -558,14 +558,15 @@
		find_new_gid ();

	grp_update ();
-	nscd_flush_cache ("group");

	close_files ();
+	nscd_flush_cache ("group");

Comment 9 Jakub Jelinek 2006-05-30 15:50:51 UTC

I think the shadow-utils patch is needed, but there is also a glibc bug and
further shadow-utils changes are needed too.
The problem on the glibc side (well, nscd) is that nscd -i just writes the
request to the socket and doesn't wait for an ack that the cache has been
invalidated (or at least the invalidation initiated in a way that clients will
not use the old cached data).
The changes on the shadow-utils side will need to be:
1) spawning /usr/sbin/nscd -i {passwd,group} rather than doing the INVALIDATE
by hand. INVALIDATE is a private nscd request, and while nscd can rely on the same
nscd version already running (if any) by restarting nscd in its scripts,
shadow-utils can't.

Comment 10 Jakub Jelinek 2006-05-30 16:41:51 UTC

http://sources.redhat.com/ml/libc-hacker/2006-05/msg00023.html

Comment 11 Peter Vrabec 2006-05-31 09:56:47 UTC

Created attachment 130259 [details]
shadow-utils nscd.c candidate

What do you think about that, Jakub?

Comment 12 Ulrich Drepper 2006-06-03 16:22:39 UTC

Re comment #11: you don't have error checking.  fork can fail, execl can fail,
waitpid can return EINTR.  It might be better to use posix_spawn() instead of
fork+exec and use TEMP_FAILURE_RETRY around the waitpid call and compare the
return value with the child PID.

Also, does the rest of the code already use perror?  If yes, fine, if not do
what the rest does.

And a nit: always add const where possible.  The parameter for the functions
should be const char *.
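
A minimal sketch of the pattern suggested above, not the code from either attachment: the function name run_invalidate and its return convention are assumptions. It spawns `prog -i db` with posix_spawn(), retries waitpid on EINTR with a portable loop (glibc's TEMP_FAILURE_RETRY macro does the same thing), and compares the return value with the child PID.

```c
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run `prog -i db` and wait for it to finish.
   Returns the child's exit status, or -1 if spawning or waiting failed.
   Depending on the C library, a missing binary shows up either as a
   posix_spawn error (-1 here) or as exit status 127. */
int run_invalidate(const char *prog, const char *db)
{
    assert(prog != NULL && db != NULL);

    char *const argv[] = { (char *)prog, (char *)"-i", (char *)db, NULL };
    pid_t pid;
    int status;

    /* posix_spawn returns an error number directly instead of setting errno. */
    int err = posix_spawn(&pid, prog, NULL, NULL, argv, environ);
    if (err != 0) {
        errno = err;
        return -1;
    }

    /* Retry when a signal interrupts the wait. */
    pid_t waited;
    do {
        waited = waitpid(pid, &status, 0);
    } while (waited == -1 && errno == EINTR);

    if (waited != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller in the spirit of this thread would invoke it as run_invalidate("/usr/sbin/nscd", "group") after the group file has been closed, and report a non-zero result with whatever error style the surrounding code already uses.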

Comment 13 Peter Vrabec 2006-06-04 19:42:02 UTC

Created attachment 130458 [details]
nscd.c candidate #2

Thanks for your comments, Ulrich. Let's see what Tomasz Kłoczko thinks about
this.

Comment 14 kloczek 2006-06-04 19:59:55 UTC

(In reply to comment #9)
> I think the shadow-utils patch is needed, but there is also a glibc bug and
> further shadow-utils changes are needed too.

shadow 4.0.2 was released more than four years ago, and the bug with not flushing
the nscd cache was fixed more than a year ago in the shadow source tree, in the
released shadow 4.0.11 (btw: 4.0.16 will be released tomorrow).

Comment 15 kloczek 2006-06-04 20:05:38 UTC

(In reply to comment #13)
> Created an attachment (id=130458)
> nscd.c candidate #2
> 
> Thanks for your comments, Ulrich. Let's see what Tomasz Kłoczko thinks about
> this.

Flushing the nscd cache by running "nscd -i <service>" is overkill. nscd provides
a way to flush the map cache by "talking" to the nscd socket, and the shadow tools
have used this method for more than a year (see http://cvs.pld.org.pl/shadow/nscd.c
for how it is used now).

Comment 16 Ulrich Drepper 2006-06-04 20:15:26 UTC

> Flushing the nscd cache by running "nscd -i <service>" is overkill.

No, it's not.  Nobody is allowed to use that socket outside of glibc.  The
protocol is private.  That's the whole reason for the problem.  Stop commenting
on things you have no clue about.

Comment 17 kloczek 2006-06-04 20:28:04 UTC

Is there any problem with making this public? (Planned changes or so?)
Why doesn't glibc code itself also flush by running "nscd -i <service>"? (To make
glibc code more bloated?)

Comment 18 Jakub Jelinek 2006-06-05 16:34:38 UTC

Yes, there have been changes e.g. to fix this bug on the glibc side.
See http://sources.redhat.com/ml/libc-hacker/2006-05/msg00023.html
Not sure what bloated glibc code you are talking about; the only place
in glibc which flushes the nscd caches is the nscd command with the -i option.

Comment 19 Peter Vrabec 2006-06-14 11:43:52 UTC

shadow-utils-4.0.16-3 uses "nscd -i".

Comment 20 Radek Bíba 2006-07-04 08:29:13 UTC

Thanks for the fix in rawhide. However, is there any chance the updated packages will
make it into FC5? There are other packages that scream when they're being installed
(including servers such as dovecot or bind) and it's pretty annoying to fix the
ownership issues manually.

Comment 21 Peter Vrabec 2006-07-10 09:10:03 UTC

(In reply to comment #20)
> Thanks for the fix in rawhide. However, is there any chance the updated packages will
> make it into FC5? There are other packages that scream when they're being installed
> (including servers such as dovecot or bind) and it's pretty annoying to fix the
> ownership issues manually.

A shadow-utils update is useless until glibc is fixed in FC5.

Comment 22 Jakub Jelinek 2006-08-08 13:37:57 UTC

That's actually not true.  Even the shadow-utils change alone will fix this
issue, because if glibc doesn't write back an ACK that the database has been
already invalidated, shadow-utils will wait on the read until it fails
(and it will fail as soon as the database is invalidated, as old glibc will
just close the socket in that case).
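
The waiting behaviour described here can be sketched in C as follows. This is an illustration only: the nscd protocol is private, so the 4-byte response and the function name wait_for_ack_or_close are assumptions, not the real wire format. The point is that a blocking read returns either when an acknowledgement arrives or when the peer closes its end.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: block until the peer either sends a response
   or closes the connection.
   Returns 1 if a full response arrived (newer peer that acknowledges),
           0 if the peer closed without responding (older peer),
          -1 on a read error or a short response. */
int wait_for_ack_or_close(int fd)
{
    assert(fd >= 0);

    int32_t response;  /* assumed response size, for illustration only */
    ssize_t n;

    /* Retry when a signal interrupts the read. */
    do {
        n = read(fd, &response, sizeof response);
    } while (n == -1 && errno == EINTR);

    if (n == (ssize_t)sizeof response)
        return 1;
    if (n == 0)
        return 0;  /* end of file: the peer closed its end */
    return -1;
}
```

In both the "acknowledged" and the "closed" case the caller resumes only after the peer has finished handling the request, which matches the argument that the change helps even against an older daemon.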
