Bug 35980

Summary: in rc.sysinit, fsck can't find LABEL=/somefs only if run from initlog (see also 28768)
Product: [Retired] Red Hat Linux Reporter: j. alan eldridge <alane>
Component: e2fsprogsAssignee: Florian La Roche <laroche>
Status: CLOSED RAWHIDE QA Contact: David Lawrence <dkl>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.1CC: notting
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2001-04-16 00:57:51 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
strace output from fscked up run of initlog -c "fsck ....."
none
trace files, contents of /proc/partitions
none
new spec file + patch file none

Description j. alan eldridge 2001-04-15 20:14:37 UTC
Using e2fsprogs-1.19-3 and initscripts-5.64.2-1.

Conditions:

ONLY at system startup (in rc.sysinit)
ONLY when initlog -c "fsck -T -A -R -a $fsckoptions" is used

fsck will fail to find a filesystem to check, saying "Couldn't find
matching filesystem: LABEL=/mswin". (No, that's not a winbloze fs, it's
just another ext2 filesystem).

If fsck -T -A -R -a $fsckoptions is run directly from rc.sysinit, without
involving initlog, the error does not happen.

This is 100% reproducible, 110% maddening, and 120% pissing me off because
it's just damn near impossible to diagnose further. The conditions are just
too bizarre. 

We're two or three levels of fork/exec down the chain, we're in system
startup, we've got nothing mounted yet except /, and that's the only
conditions that make it occur. Fsck.

Comment 1 j. alan eldridge 2001-04-15 21:12:28 UTC
OK, I can add another datapoint.

If I run the fsck from initlog immediately after running it directly from
rc.sysinit, the problem does not present itself.

so it looks like there's some weird 1st time effect. maybe.

Comment 2 j. alan eldridge 2001-04-15 22:14:28 UTC
Created attachment 15404 [details]
strace output from fscked up run of initlog -c "fsck ....."

Comment 3 j. alan eldridge 2001-04-15 22:21:47 UTC
I've got an strace of fsck being run by initlog from rc.sysinit (yes! really!)
that shows the fsck-up in action. File is attached, but here's the salient bits:
218   open("/proc/partitions", O_RDONLY) = 7
218   fstat64(7, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
218   mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x40017000
218   _llseek(7, 0, [0], SEEK_SET)      = 0
218   read(7, "major minor  #blocks  name     r"..., 1024) = 1024
218   read(7, "1    8032468 hdb7 3 0 6 40 0 0 0"..., 1024) = 147
218   read(7, "", 1024)                 = 0
218   _llseek(7, 0, [0], SEEK_SET)      = 0
218   read(7, "major minor  #blocks  name     r"..., 1024) = 1024
218   open("/dev/hda5", O_RDONLY)       = 8

[ ok, we're just getting superblocks now ... then, we go back and read a bit
more from /proc/partitions, still on fd 7. nothing fscks with fd 7 in the
meantime. ]
218   open("/dev/hdb6", O_RDONLY)       = 8
218   lseek(8, 1024, SEEK_SET)          = 1024
218   read(8, "\240\260\36\0rH=\0\342\234\0\0=\300&\0\207\252\36\0\0\0"...,
136)= 136
218   close(8)                          = 0
218   read(7, "   3    71    8032468 hdb7 3 0 6"..., 1024) = 156
218   open("/dev/71", O_RDONLY)         = -1 ENOENT (No such file or directory)

[ EXCUSE ME? DID YOU SAY "/dev/71"??? WTF IS THAT? ]

Now just have to see where that's coming from. Is something fscking with fd 7,
is /proc/partitions getting fscked up, or is the whole damn thing on drugz? 



Comment 4 j. alan eldridge 2001-04-15 23:06:43 UTC
The solution is at hand, and it has grave implications. The problem is that
/proc/partitions is changing in between the read calls.  Fsck reads the first
1K, acts on it (checking a bunch of partitions and *changing /proc/partitions in
the process*), and then stdio gets the "rest" of the file, only things aren't
"lined up" with where they used to be.

This explains the first time effect. I don't have an explanation for why running
fsck directly from rc.sysinit didn't provoke the bug, but I'm going to put that
at a lower priority. I'll have an answer for that one later on.

So what's the solution? Well, you shouldn't use stdio to read
/proc/something-that-might-change. In fact, I think in general, it may not be
safe to read anything out of /proc except as 1 (atomic) call to read. So, that's
fine, I'll just stat the fscker to see how big a buffer I need and ... can't do
that, as stat on /proc/anything returns a size of 0. 

Houston, we have a problem.

So what's the fix? Errrr.... suggest, for now, using a generous helping of
setvbuf() magic on the code in fsck that reads /proc/partitions. That'll take
care of LABEL problems. For now. In fsck. Only.

In general, this is a real problem that demands a general solution. E.g., mount
reads /proc/partitions. Doing a general audit of all code for calls that
reference /proc/* is going to suck real hard, but it is probably a good idea.
Also, a libprocfs.a probably needs to be born, with code to read the files using
atomic ops. It's kinda gross, but this bug is grosser.

/*
	read_proc_file(sz, ppch)

	reads the file "/proc/" + sz into a buffer using a single read call;
	the ptr to the buffer is placed in *ppch, and the size of the file (as
	determined by reading to end) is returned.
*/

size_t read_proc_file(char *sz, char **ppch)
{
	const size_t cbIncr = 1024;
	size_t cbBuf = cbIncr;
	int fd = -1;
	char *pch = 0;
	char *szFn = malloc(strlen(sz) + 7);
	
	strcat(strcpy(szFn, "/proc/"), sz);
	fd = open(szFn, O_RDONLY);
	free(szFn);
	if (fd < 0) {
		*ppch = 0; return -1;
	}
	pch = (char *)malloc(cbBuf );
	for (;;) {
		int n = read(fd, pch, cbBuf);
		if (n < cbBuf) break;
		lseek(fd, 0, SEEK_SET);
		pch = (char *)realloc(pch, cbBuf += cbIncr);
	}
	if (n <= 0) {
		free(pch); pch = 0;
	}
	close(fd);
	*ppch = pch; return n;
}
		


Comment 5 j. alan eldridge 2001-04-15 23:39:19 UTC
Captains Log. Final entry.

Found why running fsck without initlog as parent didn't bomb out. Basically,
running initlog to exec fsck results in different activity on the filesystems,
in particular / (since that's all that's mounted at first). It's entirely luck
that the contents of /proc/partitions was lined up differently with respect to
the 1K boundary.

In particular, the second read started a bit "earlier" in the file, early enough
that it brought in a newline after a couple of characters, making a malformed
line for the loop in get_device_by_label.c.
The big loop in uuidcache_init() short circuits on this bad line of text. The
code is:

	if (sscanf (line, " %d %d %d %[^\n ]",
		    &ma, &mi, &sz, ptname) != 4)
		continue;

After the newline, the line for hdb7 starts over again.  It's all in the
attachment after this note. A patch will be attached later tonight or tomorrow.




Comment 6 j. alan eldridge 2001-04-15 23:40:17 UTC
Created attachment 15405 [details]
trace files, contents of /proc/partitions

Comment 7 j. alan eldridge 2001-04-16 00:53:34 UTC
Created attachment 15406 [details]
new spec file + patch file

Comment 8 Bill Nottingham 2001-04-16 00:57:23 UTC
assigning to e2fsprogs

Comment 9 Florian La Roche 2001-05-15 16:05:41 UTC
fix in 1.19-5 or newer

Thanks.