Description of problem:
If nscd is running and a clean installation (i.e. not an update) of screen is performed, the group screen isn't created at the right moment, and /usr/bin/screen and /var/run/screen are saved with the wrong access rights. screen then refuses to start. If I uninstall screen (and also remove its group and /var/run/screen, which isn't removed by the uninstall scriptlets) and install it again with nscd turned off, the installation succeeds. I was told that nscd is to blame, not screen itself.

Version-Release number of selected component (if applicable):
nscd-2.4-4, screen-4.0.2-12

How reproducible:
Always (though I've tried it on only one box so far)

Steps to Reproduce:
1. make sure you don't have screen installed at all (no group screen, no /var/run/screen)
2. start nscd (unless it's already running)
3. install screen (using yum or plain rpm)

Actual results:
    # yum install screen
    ...
    Running Transaction
    Installing: screen [1/1]warning: group screen does not exist - using root
    Installing: screen ######################## [1/1]warning: group screen does not exist - using root
    Installing: screen ######################### [1/1]
    Installed: screen.i386 0:4.0.2-12
    Complete!
    # screen
    Directory '/var/run/screen' must have mode 777.
    # rpm -V screen
    .M....G.. /usr/bin/screen
    ......G.. /var/run/screen
    # grep screen /etc/group
    screen:x:84:
(^ which means that the group screen is created at some point)

Expected results:
success

Additional info:
my nscd.conf looks good, I guess (rpm -V nscd doesn't complain):
    check-files group yes
This cannot be expected to work otherwise. nscd monitors changes to /etc/group etc., but it doesn't check on every access; otherwise there would be no acceleration. What has to happen is that nscd is told to prune the cache using the -i parameter. So, who's adding the IDs? rpm? Whatever program is responsible should run
    /usr/sbin/nscd -i passwd
after adding the UID to /etc/passwd and
    /usr/sbin/nscd -i group
after adding the GID to /etc/group. I'm reassigning the bug to rpm for now.
If nscd is now working from cache rather than from the backing store, with a periodic (and imperfect) check of the backing store for cache consistency (a good idea imho), then useradd should notify nscd that the cache is invalid and should be recreated. That is a shadow-utils problem, not an rpm problem.
This problem is fixed by screen-4.0.2-13:
    %pre
    /usr/sbin/groupadd -g 84 -r -f screen
    service nscd force-reload
Even though I don't know why nscd force-reload is needed. I thought groupadd properly notifies nscd that it needs to flush its cache. (#186803)
Peter, are you sure that this is a clean solution? If nscd isn't running, the installation prints:
    # rpm -ihv /usr/src/redhat/RPMS/i386/screen-4.0.2-13.i386.rpm
    Preparing... ########################################### [100%]
    Reloading nscd: [FAILED]
    1:screen ########################################### [100%]
This is pretty much expected, but it shouldn't be printed.
service nscd force-reload is of course the wrong thing to do.
    [ -x /usr/sbin/nscd ] && /usr/sbin/nscd -i group
is much better, but still, it is groupadd's duty to notify nscd. shadow-utils already has an nscd_flush_cache routine in lib/nscd.c which should do the same thing as nscd -i, but there was a bug in it (#186803). So, the question is, was this problem also seen with shadow-utils-4.0.14-6.FC5 (or later)?
Yes, it was. I have compared the nscd_flush_cache() routine with nscd -i and they seem to be equivalent.
Another problem could be that nscd_flush_cache in e.g. useradd.c or groupadd.c is called before close_files. That is before fclose on the /etc/group in this case, so the addition to /etc/group might very well be just cached and not written to disk yet. So, you either need fflush on all the open files not written yet, or call nscd_flush_cache after close_files.
Created attachment 130228
strace output

I have switched those lines, but it doesn't help.
    # strace -fo log rpm -i screen-4.0.2-12.i386.rpm
    warning: group screen does not exist - using root
    warning: group screen does not exist - using root

    @@ -558,14 +558,15 @@
       find_new_gid ();
       grp_update ();
    -  nscd_flush_cache ("group");
       close_files ();
    +  nscd_flush_cache ("group");
I think the shadow-utils patch is needed, but there is also a glibc bug and further shadow-utils changes are needed too. The problem on the glibc side (well, nscd) is that nscd -i just writes the request to the socket and doesn't wait for an ack that the cache has been invalidated (or at least that the invalidation has been initiated in a way that clients will not use the old cached data). The changes on the shadow-utils side will need to be: 1) spawning /usr/sbin/nscd -i {passwd,group} rather than doing the INVALIDATE by hand. INVALIDATE is an nscd-private request, and while nscd can rely on the same nscd version already running (if any) by restarting nscd in its scripts, shadow-utils can't.
http://sources.redhat.com/ml/libc-hacker/2006-05/msg00023.html
Created attachment 130259
shadow-utils nscd.c candidate

What do you think about that, Jakub?
Re comment #11: you don't have error checking. fork can fail, execl can fail, waitpid can return EINTR. It might be better to use posix_spawn() instead of fork+exec, use TEMP_FAILURE_RETRY around the waitpid call, and compare the return value with the child PID. Also, does the rest of the code already use perror? If yes, fine; if not, do what the rest does. And a nit: always add const where possible. The parameter for the functions should be const char *.
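A minimal sketch of what these review points could look like together, assuming nothing about the actual attachment: the function name spawn_invalidate and the `program` parameter are hypothetical (the parameter exists only so the sketch can be exercised without nscd installed). The explicit loop does what glibc's TEMP_FAILURE_RETRY macro does, and perror/fprintf stand in for whatever error reporting the surrounding code uses.

```c
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Hypothetical helper: runs `<program> -i <service>` and waits for it.
 * Returns 0 if the child exited with status 0, -1 otherwise.
 */
int spawn_invalidate(const char *program, const char *service)
{
    char *const argv[] = {
        (char *) program, (char *) "-i", (char *) service, NULL
    };
    pid_t pid, waited;
    int status, err;

    assert(program != NULL && service != NULL);

    /* posix_spawn returns an error number directly instead of setting errno. */
    err = posix_spawn(&pid, program, NULL, NULL, argv, environ);
    if (err != 0) {
        fprintf(stderr, "posix_spawn(%s): %s\n", program, strerror(err));
        return -1;
    }

    /* Retry when interrupted by a signal (what TEMP_FAILURE_RETRY does). */
    do {
        waited = waitpid(pid, &status, 0);
    } while (waited == -1 && errno == EINTR);

    /* Compare against the child PID, not just against -1. */
    if (waited != pid) {
        perror("waitpid");
        return -1;
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return 0;
}
```

A caller in this thread's context would presumably pass "/usr/sbin/nscd" and "group" or "passwd"; whether a missing nscd binary should count as an error is left to the caller.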
Created attachment 130458
nscd.c candidate #2

Thanks for your comments, Ulrich. Let's see what Tomasz Kłoczko thinks about this.
(In reply to comment #9)
> I think the shadow-utils patch is needed, but there is also a glibc bug and
> further shadow-utils changes are needed too.

shadow 4.0.2 was released more than four years ago, and the bug with not flushing the nscd cache was fixed more than a year ago in the shadow source tree, in the released shadow 4.0.11 (btw: 4.0.16 will be released tomorrow).
(In reply to comment #13)
> Created an attachment (id=130458)
> nscd.c candidate #2
>
> Thanks for your comments, Ulrich. Let's see what Tomasz Kłoczko thinks about
> this.

Flushing the nscd cache by running "nscd -i <service>" is overkill. nscd provides a way to flush the map cache by "talking" to the nscd socket, and this method has been used by the shadow tools for more than a year (look at http://cvs.pld.org.pl/shadow/nscd.c for how it is used now).
> Flushing the nscd cache by running "nscd -i <service>" is overkill.

No, it's not. Nobody is allowed to use that socket outside of glibc. The protocol is private. That's the whole reason for the problem. Stop commenting on things you have no clue about.
Is there any problem with making this public? (Planned changes or so?) Why does no glibc code also flush by running "nscd -i <service>" (to make the glibc code more bloated?)
Yes, there have been changes e.g. to fix this bug on the glibc side. See http://sources.redhat.com/ml/libc-hacker/2006-05/msg00023.html Not sure what bloated glibc code you are talking about, the only place in glibc which flushes the nscd caches is nscd command with -i option.
shadow-utils-4.0.16-3 uses "nscd -i".
Thanks for the fix in rawhide. However, any chance the updated packages will make it into FC5? There are other packages that scream when they're being installed (including servers such as dovecot or bind) and it's pretty annoying to fix the ownership issues manually.
(In reply to comment #20)
> Thanks for the fix in rawhide. However, any chance the updated packages will make
> it into FC5? There are other packages that scream when they're being installed
> (including servers such as dovecot or bind) and it's pretty annoying to fix the
> ownership issues manually.

A shadow-utils update is useless until glibc is fixed in FC5.
That's actually not true. Even the shadow-utils change alone will fix this issue, because if glibc doesn't write back an ACK that the database has been already invalidated, shadow-utils will wait on the read until it fails (and it will fail as soon as the database is invalidated, as old glibc will just close the socket in that case).
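The "wait on the read until it fails" behavior can be sketched as follows. This only models the waiting pattern on a generic stream socket; it is not the nscd protocol, and the function name, enum, and buffer size are hypothetical. A reply of any kind stands for the ACK from a fixed nscd, while end-of-file stands for an older daemon closing the socket once it has handled the request.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum wait_result { WAIT_ERROR = -1, WAIT_PEER_CLOSED = 0, WAIT_ACKED = 1 };

/*
 * Hypothetical helper: blocks until the peer either sends something
 * (treated as an ACK) or closes the connection (read returns 0).
 */
enum wait_result wait_for_ack_or_close(int fd)
{
    char ack[4];
    ssize_t n;

    assert(fd >= 0);

    /* Retry if the read is interrupted by a signal. */
    do {
        n = read(fd, ack, sizeof ack);
    } while (n == -1 && errno == EINTR);

    if (n > 0)
        return WAIT_ACKED;        /* peer replied */
    if (n == 0)
        return WAIT_PEER_CLOSED;  /* peer closed without replying */
    return WAIT_ERROR;
}
```

Either outcome tells the caller that the peer is done with the request, which is why the waiting client works against both behaviors described above.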