Bug 191464
Summary: installation of screen(1) fails if nscd is running
Product: Fedora
Component: shadow-utils
Version: 5
Status: CLOSED RAWHIDE
Severity: medium
Priority: medium
Hardware: All
OS: Linux
Reporter: Radek Bíba <rbiba>
Assignee: Peter Vrabec <pvrabec>
QA Contact: David Lawrence <dkl>
CC: drepper, jakub, kloczek, prockai
Doc Type: Bug Fix
Last Closed: 2006-06-14 11:43:52 UTC
Bug Blocks: 202454
Description
Radek Bíba
2006-05-12 09:01:24 UTC
This cannot be expected to work otherwise. nscd monitors changes to /etc/group etc., but this check isn't done on every access; otherwise there would be no acceleration. What has to happen is that nscd is told to prune the cache using the -i parameter. So, who's adding the IDs? rpm? Whatever program is responsible should run /usr/sbin/nscd -i passwd after adding the UID to /etc/passwd and /usr/sbin/nscd -i group after adding the GID to /etc/group. I'm reassigning the bug to rpm for now.

If nscd now works from its cache rather than from the backing store, with a periodic (and imperfect) check of the backing store for cache consistency (a good idea imho), then useradd should notify nscd that its cache is indeed invalid and should be recreated. That is a shadow-utils problem, not an rpm one.

This problem is fixed by screen-4.0.2-13:

%pre
/usr/sbin/groupadd -g 84 -r -f screen
service nscd force-reload

Even though I don't know why nscd force-reload is needed. I thought groupadd properly notifies nscd that it needs to flush its cache (#186803).

Peter, are you sure that this is a clean solution? If nscd isn't running, installation prints:

# rpm -ihv /usr/src/redhat/RPMS/i386/screen-4.0.2-13.i386.rpm
Preparing... ########################################### [100%]
Reloading nscd: [FAILED]
1:screen ########################################### [100%]

This is to be expected, but it shouldn't be printed.

service nscd force-reload is of course the wrong thing to do.

[ -x /usr/sbin/nscd ] && /usr/sbin/nscd -i group

is much better, but still, it is groupadd's duty to notify nscd. shadow-utils already has an nscd_flush_cache routine in lib/nscd.c which should do the same thing as nscd -i, but there was a bug in it (#186803). So, the question is: was this problem also seen with shadow-utils-4.0.14-6.FC5 (or later)?

Yes, it was. I have compared the nscd_flush_cache() routine with nscd -i and they seem to be equivalent.

Another problem could be that nscd_flush_cache in e.g. useradd.c or groupadd.c is called before close_files, that is, before fclose on /etc/group in this case, so the addition to /etc/group might very well still be buffered and not yet written to disk. So you either need to fflush all open files that haven't been written yet, or call nscd_flush_cache after close_files.

Created attachment 130228
strace output
I have switched those lines, but it doesn't help.
# strace -fo log rpm -i screen-4.0.2-12.i386.rpm
warning: group screen does not exist - using root
warning: group screen does not exist - using root
@@ -558,14 +558,15 @@
find_new_gid ();
grp_update ();
- nscd_flush_cache ("group");
close_files ();
+ nscd_flush_cache ("group");
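The reordering in this hunk matters because stdio buffers writes: until fclose() (or fflush()) runs, a new /etc/group entry may exist only in the writing process's buffer, so nscd could re-read a file that doesn't contain it yet. The following C sketch demonstrates the effect on a temporary file instead of /etc/group; `buffered_write_demo`, `size_on_disk` and the sample entry are illustrative names and data, not shadow-utils code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Size of PATH as other processes (e.g. nscd) would see it, or -1. */
static long size_on_disk(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long) st.st_size : -1;
}

/* Hypothetical demo: write a group entry through stdio and record the
   file size before and after fclose(). Returns 0 on success. */
static int buffered_write_demo(long *before, long *after)
{
    static char buf[BUFSIZ];
    char path[] = "/tmp/group-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    FILE *fp = fdopen(fd, "w");
    if (fp == NULL) {
        close(fd);
        unlink(path);
        return -1;
    }
    /* Full buffering: a short write stays in user space until flushed. */
    setvbuf(fp, buf, _IOFBF, sizeof buf);

    fputs("screen:x:84:\n", fp);
    *before = size_on_disk(path);   /* entry not yet visible in the file */

    int rc = fclose(fp);            /* flushes the buffer to the file */
    *after = size_on_disk(path);    /* earliest safe point to flush nscd */

    unlink(path);
    return rc == 0 ? 0 : -1;
}
```

Calling `buffered_write_demo(&b, &a)` leaves `b` at 0 and `a` at 13 (the length of the sample line): anything that reads the file between the write and the close sees no new entry, which is why the cache flush belongs after close_files().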
I think the shadow-utils patch is needed, but there is also a glibc bug, and further shadow-utils changes are needed too. The problem on the glibc side (well, nscd) is that nscd -i just writes the request to the socket and doesn't wait for an acknowledgement that the cache has been invalidated (or at least that the invalidation has been initiated in a way that ensures clients will not use the old cached data). The changes on the shadow-utils side will need to be:

1) spawning /usr/sbin/nscd -i {passwd,group} rather than issuing the INVALIDATE request by hand. INVALIDATE is a private nscd request: nscd -i can rely on the same nscd version already running (if any), because the nscd scripts restart the daemon, but shadow-utils can't.

Created attachment 130259
shadow-utils nscd.c candidate
What do you think about that, Jakub?
Re comment #11: you don't have any error checking. fork can fail, execl can fail, and waitpid can return EINTR. It might be better to use posix_spawn() instead of fork+exec, use TEMP_FAILURE_RETRY around the waitpid call, and compare the return value with the child PID. Also, does the rest of the code already use perror? If yes, fine; if not, do what the rest does. And a nit: always add const where possible. The parameter for the functions should be const char *.

Created attachment 130458
nscd.c candidate #2
Thanks for your comments, Ulrich. Let's see what Tomasz Kłoczko thinks about this.
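The review points above (posix_spawn() instead of fork+exec, TEMP_FAILURE_RETRY around waitpid(), checking that waitpid() returned the child's PID, const char * parameters) could look roughly like the following C sketch. It is not the attached "nscd.c candidate #2", whose contents aren't reproduced here; `run_and_wait` is a hypothetical helper, and the access() guard is an assumption borrowed from the `[ -x /usr/sbin/nscd ] && /usr/sbin/nscd -i group` suggestion earlier in the thread.

```c
#define _GNU_SOURCE             /* TEMP_FAILURE_RETRY is a glibc extension */
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef TEMP_FAILURE_RETRY      /* fallback (GNU C) for libcs without it */
#define TEMP_FAILURE_RETRY(expr)                                        \
    ({ long int r_;                                                     \
       do r_ = (long int) (expr); while (r_ == -1L && errno == EINTR); \
       r_; })
#endif

extern char **environ;

/* Run "PROG OPT ARG" and wait for it. Returns 0 if the child exited
   with status 0, -1 otherwise (errno is set if spawning failed). */
static int run_and_wait(const char *prog, const char *opt, const char *arg)
{
    char *const argv[] = { (char *) prog, (char *) opt, (char *) arg, NULL };
    pid_t child;
    int status;

    int err = posix_spawn(&child, prog, NULL, NULL, argv, environ);
    if (err != 0) {
        errno = err;            /* posix_spawn returns the error number */
        return -1;
    }
    /* Restart on EINTR and make sure it is our child that was reaped. */
    if (TEMP_FAILURE_RETRY(waitpid(child, &status, 0)) != child)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Sketch of nscd_flush_cache: ask nscd to invalidate SERVICE ("passwd"
   or "group"). If no nscd binary is installed there is no cache to
   flush, so that case is not treated as an error. */
int nscd_flush_cache(const char *service)
{
    static const char nscd[] = "/usr/sbin/nscd";

    if (access(nscd, X_OK) != 0)
        return 0;
    return run_and_wait(nscd, "-i", service);
}
```

A caller such as groupadd would invoke `nscd_flush_cache("group")` after close_files(); whether a non-zero result is reported (with perror or otherwise) is left to the caller, following whatever convention the rest of the code uses.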
(In reply to comment #9)
> I think the shadow-utils patch is needed, but there is also a glibc bug and
> further shadow-utils changes are needed too.

shadow 4.0.2 was released more than four years ago, and the bug with not flushing the nscd cache was fixed more than a year ago in the shadow source tree, in the released shadow 4.0.11 (btw: 4.0.16 will be released tomorrow).

(In reply to comment #13)
> Created an attachment (id=130458)
> nscd.c candidate #2
>
> Thanks for your comments, Ulrich. Let's see what Tomasz Kłoczko thinks about
> this.

Flushing the nscd cache by running "nscd -i <service>" is overkill. nscd provides a way to flush a map cache by "talking" to the nscd socket, and the shadow tools have been using this method for more than a year (look at http://cvs.pld.org.pl/shadow/nscd.c for how it is used now).

> Flushing the nscd cache by running "nscd -i <service>" is overkill.
No, it's not. Nobody is allowed to use that socket outside of glibc. The
protocol is private. That's the whole reason for the problem. Stop commenting
on things you have no clue about.
Is there any problem with making it public? (Planned changes or so?) And why doesn't any glibc code also flush the cache via "nscd -i <service>" (to make glibc code more bloated?)

Yes, there have been changes, e.g. to fix this bug on the glibc side. See http://sources.redhat.com/ml/libc-hacker/2006-05/msg00023.html. I'm not sure what bloated glibc code you are talking about; the only place in glibc that flushes the nscd caches is the nscd command with the -i option.

shadow-utils-4.0.16-3 uses "nscd -i".

Thanks for the fix in rawhide. However, is there any chance the updated packages will make it into FC5? There are other packages that scream when they're being installed (including servers such as dovecot or bind), and it's pretty annoying to fix the ownership issues manually.

(In reply to comment #20)
> Thanks for the fix in rawhide. However, is there any chance the updated packages
> will make it into FC5? There are other packages that scream when they're being
> installed (including servers such as dovecot or bind), and it's pretty annoying
> to fix the ownership issues manually.

shadow-utils update is useless until glibc is fixed in FC5.

That's actually not true. Even the shadow-utils change alone will fix this issue: if glibc doesn't write back an ACK that the database has already been invalidated, shadow-utils will wait on the read until it fails (and it will fail as soon as the database is invalidated, as the old glibc just closes the socket in that case).
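The argument in the last comment rests on a blocking read(): the client sends its request and then waits until the server either answers or closes the connection, and either outcome means the request has been handled. The C sketch below models only that synchronisation pattern with a local socketpair(); the request string and the 4-byte ACK are placeholders, not nscd's private wire protocol, and `wait_for_ack_or_close` and `simulate_server` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Block until the peer sends a (placeholder) 4-byte ACK or closes the
   connection. Returns 1 for an ACK, 0 for EOF, -1 on error. Both 1 and
   0 mean the server has finished handling the request. */
static int wait_for_ack_or_close(int fd)
{
    int32_t ack;
    ssize_t n;
    do
        n = read(fd, &ack, sizeof ack);
    while (n == -1 && errno == EINTR);

    if (n == (ssize_t) sizeof ack)
        return 1;               /* newer server: explicit acknowledgement */
    if (n == 0)
        return 0;               /* older server: just closed the socket */
    return -1;                  /* read error or short read */
}

/* Hypothetical stand-in for the daemon: read the request, optionally
   reply with an ACK, then close. The client side runs in the same process. */
static int simulate_server(int send_ack)
{
    int sv[2];
    int result = -1;
    const char request[] = "invalidate group";   /* placeholder request */
    char seen[sizeof request];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Client: send the request. */
    if (write(sv[0], request, sizeof request) != (ssize_t) sizeof request)
        goto out;

    /* Server: consume the request, then answer or just hang up. */
    if (read(sv[1], seen, sizeof seen) != (ssize_t) sizeof seen)
        goto out;
    if (send_ack) {
        int32_t ack = 0;
        if (write(sv[1], &ack, sizeof ack) != (ssize_t) sizeof ack)
            goto out;
    }
    close(sv[1]);
    sv[1] = -1;

    /* Client: blocks until the ACK arrives or the server closes. */
    result = wait_for_ack_or_close(sv[0]);

out:
    close(sv[0]);
    if (sv[1] != -1)
        close(sv[1]);
    return result;
}
```

`simulate_server(1)` returns 1 (ACK received) and `simulate_server(0)` returns 0 (peer closed): in both cases the client's read() only returns once the server side is done, which is the property the comment relies on.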