Bug 35980
Summary: | in rc.sysinit, fsck can't find LABEL=/somefs only if run from initlog (see also 28768) | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | j. alan eldridge <alane> | ||||||||
Component: | e2fsprogs | Assignee: | Florian La Roche <laroche> | ||||||||
Status: | CLOSED RAWHIDE | QA Contact: | David Lawrence <dkl> | ||||||||
Severity: | medium | Docs Contact: | |||||||||
Priority: | medium | ||||||||||
Version: | 7.1 | CC: | notting | ||||||||
Target Milestone: | --- | ||||||||||
Target Release: | --- | ||||||||||
Hardware: | i386 | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2001-04-16 00:57:51 UTC | Type: | --- | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Attachments: |
|
Description
j. alan eldridge
2001-04-15 20:14:37 UTC
OK, I can add another datapoint. If I run the fsck from initlog immediately after running it directly from rc.sysinit, the problem does not present itself. so it looks like there's some weird 1st time effect. maybe. Created attachment 15404 [details]
strace output from fscked up run of initlog -c "fsck ....."
I've got an strace of fsck being run by initlog from rc.sysinit (yes! really!) that shows the fsck-up in action. File is attached, but here's the salient bits: 218 open("/proc/partitions", O_RDONLY) = 7 218 fstat64(7, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 218 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40017000 218 _llseek(7, 0, [0], SEEK_SET) = 0 218 read(7, "major minor #blocks name r"..., 1024) = 1024 218 read(7, "1 8032468 hdb7 3 0 6 40 0 0 0"..., 1024) = 147 218 read(7, "", 1024) = 0 218 _llseek(7, 0, [0], SEEK_SET) = 0 218 read(7, "major minor #blocks name r"..., 1024) = 1024 218 open("/dev/hda5", O_RDONLY) = 8 [ ok, we're just getting superblocks now ... then, we go back and read a bit more from /proc/partitions, still on fd 7. nothing fscks with fd 7 in the meantime. ] 218 open("/dev/hdb6", O_RDONLY) = 8 218 lseek(8, 1024, SEEK_SET) = 1024 218 read(8, "\240\260\36\0rH=\0\342\234\0\0=\300&\0\207\252\36\0\0\0"..., 136)= 136 218 close(8) = 0 218 read(7, " 3 71 8032468 hdb7 3 0 6"..., 1024) = 156 218 open("/dev/71", O_RDONLY) = -1 ENOENT (No such file or directory) [ EXCUSE ME? DID YOU SAY "/dev/71"??? WTF IS THAT? ] Now just have to see where that's coming from. Is something fscking with fd 7, is /proc/partitions getting fscked up, or is the whole damn thing on drugz? The solution is at hand, and it has grave implications. The problem is that /proc/partitions is changing in between the read calls. Fsck reads the first 1K, acts on it (checking a bunch of partitions and *changing /proc/partitions in the process*), and then stdio gets the "rest" of the file, only things aren't "lined up" with where they used to be. This explains the first time effect. I don't have an explanation for why running fsck directly from rc.sysinit didn't provoke the bug, but I'm going to put that at a lower priority. I'll have an answer for that one later on. So what's the solution? Well, you shouldn't use stdio to read /proc/something-that-might-change. In fact, I think in general, it may not be safe to read anything out of /proc except as 1 (atomic) call to read. So, that's fine, I'll just stat the fscker to see how big a buffer I need and ... can't do that, as stat on /proc/anything returns a size of 0. Houston, we have a problem. So what's the fix? Errrr.... suggest, for now, using a generous helping of setvbuf() magic on the code in fsck that reads /proc/partitions. That'll take care of LABEL problems. For now. In fsck. Only. In general, this is a real problem that demands a general solution. E.g., mount reads /proc/partitions. Doing a general audit of all code for calls that reference /proc/* is going to suck real hard, but it is probably a good idea. Also, a libprocfs.a probably needs to be born, with code to read the files using atomic ops. It's kinda gross, but this bug is grosser. /* read_proc_file(sz, ppch) reads the file "/proc/" + sz into a buffer using a single read call; the ptr to the buffer is placed in *ppch, and the size of the file (as determined by reading to end) is returned. */ size_t read_proc_file(char *sz, char **ppch) { const size_t cbIncr = 1024; size_t cbBuf = cbIncr; int fd = -1; char *pch = 0; char *szFn = malloc(strlen(sz) + 7); strcat(strcpy(szFn, "/proc/"), sz); fd = open(szFn, O_RDONLY); free(szFn); if (fd < 0) { *ppch = 0; return -1; } pch = (char *)malloc(cbBuf ); for (;;) { int n = read(fd, pch, cbBuf); if (n < cbBuf) break; lseek(fd, 0, SEEK_SET); pch = (char *)realloc(pch, cbBuf += cbIncr); } if (n <= 0) { free(pch); pch = 0; } close(fd); *ppch = pch; return n; } Captains Log. Final entry. Found why running fsck without initlog as parent didn't bomb out. Basically, running initlog to exec fsck results in different activity on the filesystems, in particular / (since that's all that's mounted at first). It's entirely luck that the contents of /proc/partitions was lined up differently with respect to the 1K boundary. In particular, the second read started a bit "earlier" in the file, early enough that it brought in a newline after a couple of characters, making a malformed line for the loop in get_device_by_label.c. The big loop in uuidcache_init() short circuits on this bad line of text. The code is: if (sscanf (line, " %d %d %d %[^\n ]", &ma, &mi, &sz, ptname) != 4) continue; After the newline, the line for hdb7 starts over again. It's all in the attachment after this note. A patch will be attached later tonight or tomorrow. Created attachment 15405 [details]
trace files, contents of /proc/partitions
Created attachment 15406 [details]
new spec file + patch file
assigning to e2fsprogs fix in 1.19-5 or newer Thanks. |