Bug 761858 (GLUSTER-126)

Summary: Immediate segfault when used for rootfs
Product: [Community] GlusterFS
Component: core
Version: 2.0.3
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: low
Reporter: Gordan Bobic <gordan>
Assignee: Anand Avati <aavati>
CC: amarts, chrisw, gluster-bugs, shehjart, vijay, vinayak
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Regression: RTNR
Attachments:
- Segfault strace
- Truncated libglusterfs.so.0.0.0
- The good version of libglusterfs.so.0.0.0 from my package
- root volume spec from node 1
- root volume spec from node 2

Description Anand Avati 2009-07-09 21:22:27 UTC
The strace looks as though glusterfsd is crashing at a very early stage. Can you please confirm that the binaries and libraries were built alright? Is there a core dump available?

Comment 1 Gordan Bobic 2009-07-10 00:16:25 UTC
2.0.3 segfaults when used for rootfs; 2.0.2 works fine. Attached is the strace of glusterfsd 2.0.3 as it segfaults. The crash occurs regardless of what parameters it is run with (the attached strace is from a run without any parameters).

glusterfsd is from the RPM package built using:
rpmbuild -tb glusterfs-2.0.3.tar.gz
on CentOS 5.3, kernel 2.6.18-128.1.16.el5

I can provide an initrd that causes the fault, but it is about 50MB. Please advise if I should attach it here (I'm concerned it may be too big).

Comment 2 Shehjar Tikoo 2009-07-10 03:41:50 UTC
git-bisect might help here.
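
For reference, a bisect between the two releases might look like this (a sketch; the v2.0.2/v2.0.3 tag names and the test script are assumptions):

git bisect start
git bisect bad v2.0.3      # first known-bad release
git bisect good v2.0.2     # last known-good release
# test-segfault.sh should build glusterfsd, run it, and exit non-zero on a
# segfault (exit 125 to skip commits that fail to build)
git bisect run ./test-segfault.sh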

Comment 3 Gordan Bobic 2009-07-10 08:31:38 UTC
Here are links to the grub.conf, kernel, and initrd used:

grub.conf: http://sentinel1.shatteredsilicon.net/grub.conf

kernel: http://sentinel1.shatteredsilicon.net/vmlinuz-2.6.18-128.1.16.el5

initrd: http://sentinel1.shatteredsilicon.net/initrd_sr_segfault-2.6.18-128.1.16.el5-glfs-2.0.3.img

The initrd is about 50MB, so I thought attaching it here would be ill-advised; hence the links.


The binary packages seemed to build OK, and they seem to work outside the initrd.

Once the boot fails with the initrd, it will drop to a prompt. At this point you can try running glusterfsd and it will segfault. If you run this on a machine (virtual or otherwise) with tools already installed, you can mount an ext3 partition with whatever debug tools are required and run them off that (you may need to set LD_LIBRARY_PATH so the libraries are found on the tools partition).
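
A sketch of that recovery procedure from the emergency prompt (the device name and mount point are assumptions):

# mount a partition that has the debug tools and matching libraries installed
mkdir /tools
mount -t ext3 /dev/sda5 /tools
# make the binaries and libraries on the tools partition visible
export PATH=/tools/usr/sbin:/tools/usr/bin:$PATH
export LD_LIBRARY_PATH=/tools/usr/lib64:/tools/lib64
strace -o /tools/glusterfsd.trace glusterfsd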

### WARNING ###
The initrd will dump its contents to /dev/md1 and switch root to it so that it can deallocate its memory. It assumes /boot is on /dev/md0 (shouldn't be relevant) and the final root is on /dev/md2 (not really relevant). Only run this in a VM or on a machine whose contents you can afford to lose, just in case something weird happens and it extracts its contents onto a file system you need to keep!

Please let me know if you require anything else to track this down.

Comment 4 Anand Avati 2009-07-12 04:55:48 UTC
Gordan, I ran the kernel and initrd images you provided in qemu. I got a shell with a "comoonics> " prompt. glusterfsd seems to run fine in there. Am I missing some step?

Avati

Comment 5 Gordan Bobic 2009-07-12 06:05:24 UTC
Something strange seems to be happening. I had this issue re-created on two different machines, with separately built initrds. The original segfaulting initrd got deleted when I rolled back, so the attached one was a rebuilt one. I just rebuilt the initrds on all my test nodes, and the problem has disappeared!

Since there is no longer a reproducible test case, I guess the bug might as well be closed. Sorry, I know just how frustrating this is. :-(

Comment 6 Gordan Bobic 2009-07-14 15:54:15 UTC
I may have found a clue as to what is happening.
I noticed on my gluster-root cluster that, after a recent rpm package reinstall, /usr/lib64/libglusterfs.so.0.0.0 was truncated! ldconfig was reporting it as truncated. A quick check revealed that the file was smaller than the one in the rpm package. Re-installing the package fixed the problem, but this is still quite disturbing. The rpm upgrade was carried out while the system was running 2.0.2. If this happened during the install for my old initrd (the one that was segfaulting), it would explain why glusterfsd failed as soon as it linked against the library.

It also explains why the full init instances would work: they seem to link against the already mmap()-ed version rather than the on-disk version, and the initrd was already providing an older (2.0.2) libglusterfs.

The worrying part here is that it implies that there was some kind of a truncation bug intermittently present as recently as 2.0.2.

If this was indeed the cause (I'm taking a guess here), then perhaps this bug should be relabeled as a truncation bug of some sort?
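
In case it recurs, the installed file can be checked against a pristine copy from the package along these lines (a sketch; the rpm file path is an assumption):

# "S" (size) or "5" (md5) in the first column marks a modified file
rpm -V glusterfs-common
# extract a pristine copy for a byte-level comparison
mkdir /tmp/pristine && cd /tmp/pristine
rpm2cpio /path/to/glusterfs-common-2.0.3-1.x86_64.rpm | cpio -idm
cmp usr/lib64/libglusterfs.so.0.0.0 /usr/lib64/libglusterfs.so.0.0.0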

Comment 7 Anand Avati 2009-07-14 17:01:45 UTC
Is this 2.0.2->upgrade truncation reproducible?

Comment 8 Gordan Bobic 2009-07-14 21:50:09 UTC
It isn't reliably reproducible. I must have tried to recreate it a dozen times by now, and the only two times that it actually occurred were when I wasn't paying attention (which makes it also the two times it bit me).

Comment 9 Anand Avati 2009-07-14 21:52:46 UTC
> It isn't reliably reproducible. I must have tried to recreate it a dozen times
> by now, and the only two times that it actually occurred were when I wasn't
> paying attention (which makes it also the two times it bit me).

Are you certain there was enough disk space in your image at the time of the upgrade?

Avati

Comment 10 Gordan Bobic 2009-07-14 21:59:26 UTC
The last time, this occurred on the root file system, which in this case measures 6TB, of which about 5TB is free. The initrd could theoretically have gotten truncated in the original instance because I was building it in a tmpfs on /tmp, but I have done that before with all the other images and it never caused a problem (a 200MB initrd with 2GB of RAM). So even if that explains one instance, it doesn't explain the other, where there was no space contention.

Note - this is just a freak example where it was the glusterfs library itself that seems to have gotten truncated. I have no easy way to check whether anything else suffered similar damage, but it hasn't happened to any of the other libraries (or ldconfig would have complained about those, too). So the issue does appear to be rare, and it may only be happening during the original write/overwrite. I also hadn't checked whether the file was literally truncated or more extensively corrupted; it didn't occur to me to save the broken copies until after I'd replaced them.
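
One way to sweep the whole system for similar damage (a sketch):

# verify every installed package against the rpm database; "S" (size) or
# "5" (md5) in the first column marks a file that no longer matches its package
rpm -Va 2>/dev/null | grep '^S\|^..5'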

Comment 11 Gordan Bobic 2009-07-24 06:11:52 UTC
I have just had this happen _again_, and it happened without any writes going to the files that got truncated!

The machine started up fine 9+ days ago
# uptime
 10:06:32 up 9 days, 14:25,  1 user,  load average: 0.24, 0.09, 0.03

and mounted all the file systems correctly.
# mount | grep gluster
glusterfs#/etc/glusterfs/root.vol on / type fuse (rw,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
glusterfs#/etc/glusterfs/root.vol on /cdsl.local type fuse (rw,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
/dev/md2 on /gluster type ext3 (rw,noatime,nodiratime,data=ordered)
/dev/md2 on /gluster type ext3 (rw,noatime,nodiratime,data=ordered)
glusterfs#/etc/glusterfs/home.vol on /home type fuse (rw,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
glusterfs#/etc/glusterfs/shared.vol on /shared type fuse (rw,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

The gluster packages haven't been updated since.

But now:

# glusterfsd
Segmentation fault

# strace glusterfsd
execve("/usr/sbin/glusterfsd", ["glusterfsd"], [/* 23 vars */]) = 0
brk(0)                                  = 0x18e0c000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b3fbce56000
uname({sys="Linux", node="raiden.shatteredsilicon.net", ...}) = 0
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=54514, ...}) = 0
mmap(NULL, 54514, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b3fbce57000
close(3)                                = 0
open("/usr/lib64/libglusterfs.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@\236\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=213368, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b3fbce65000
mmap(NULL, 2380816, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x2b3fbce66000
mprotect(0x2b3fbce99000, 2093056, PROT_NONE) = 0
mmap(0x2b3fbd098000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x32000) = 0x2b3fbd098000
mmap(0x2b3fbd09a000, 70672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2b3fbd09a000
close(3)                                = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

# ldconfig
ldconfig: file /usr/lib64/libglusterfs.so.0.0.0 is truncated
ldconfig: file /usr/lib64/libglusterfs.so.0 is truncated

The rootfs is gluster, and thus /usr/lib64 is on gluster (2.0.4).

I have attached the truncated libglusterfs.so.0.0.0, just in case it is useful for debug purposes.

The particularly worrying thing is that this means truncation can happen even when a file is only being _read_ (and mmap()-ed, since it is a shared library); no writes are required to cause truncation!

Comment 12 Gordan Bobic 2009-07-24 06:13:51 UTC
Created attachment 43 [details]
Truncated libglusterfs.so.0.0.0

Comment 13 Gordan Bobic 2009-07-24 06:22:44 UTC
Created attachment 44 [details]
The good version of libglusterfs.so.0.0.0 from my package

Comment 14 Gordan Bobic 2009-07-24 06:29:38 UTC
I have just added the good version of libglusterfs.so.0.0.0 from my package. I also did a diff by truncating the broken version to the length of the good version, and they still differ.

This is looking suspiciously like some kind of weird memory stomp that somehow ends up writing to a read-only mmap() area. The weird thing is that none of the other libraries have been affected so far.
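
For the record, that comparison can be reproduced along these lines (a sketch; the .good/.broken file names are stand-ins):

# trim the broken copy to the good copy's length, then compare byte-for-byte
GOODLEN=$(stat -c %s libglusterfs.so.0.0.0.good)
dd if=libglusterfs.so.0.0.0.broken of=/tmp/broken.trunc bs=1 count=$GOODLEN
cmp libglusterfs.so.0.0.0.good /tmp/broken.trunc || echo "contents differ, not just the length"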

Comment 15 Gordan Bobic 2009-07-24 06:32:04 UTC
This gets weirder:

Broken file:
# ls -la libglusterfs.so.0.0.0
-rwxr-xr-x 1 root root 213368 Jul 12 10:48 libglusterfs.so.0.0.0

Good file from the package (re-installed glusterfs-common):
# ls -la libglusterfs.so.0.0.0 
-rwxr-xr-x 1 root root 211032 Jul 12 10:48 libglusterfs.so.0.0.0

The correct file is _smaller_???

Comment 16 Gordan Bobic 2009-07-26 07:52:53 UTC
Just managed to get the problem to re-occur while I was actually looking, again with ~/.viminfo: the contents of the file being saved end up in .viminfo. The file itself seems to get saved correctly, but its contents also clobber .viminfo.

I'm doing this on a shared root as root user, and /root is shared between the two machines.

It just happened when the two machines were editing a file with the same path. (Note: in this case these are different files, even though the path is the same. /cdsl.local is bind-mounted from a different path, and unshared files live under it, symlinked from the shared file system, e.g.
/etc/gluster -> /cdsl.local/etc/gluster
with /cdsl.local bind-mounted from /cluster/<nodeid>,
so /etc/gluster/shared.vol is not the same file on the two machines even though the path is.) I am not sure whether the same path is relevant here; last time it didn't seem to be, as the clobbering resulted from pine and vim being open at the same time, and .pine-debug ended up being saved into .viminfo.
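
In other words, the per-node layout is set up roughly like this (a sketch; the node id is an assumption):

# each node bind-mounts its own unshared area over the common path...
mount --bind /cluster/1 /cdsl.local
# ...so a symlink in the shared tree resolves to a different backing file per node
ln -s /cdsl.local/etc/gluster /etc/gluster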

It cannot be reproduced reliably, either; it seems to happen randomly. It is starting to look like there is an FD pointer that gets leaked to a different operation on a different file, so one file can clobber another whose pointer is still around.

I will attach my root.vol files from the two nodes that are running.

Comment 17 Gordan Bobic 2009-07-26 07:53:59 UTC
Created attachment 45 [details]
root volume spec from node 1

Comment 18 Gordan Bobic 2009-07-26 07:54:19 UTC
Created attachment 46 [details]
root volume spec from node 2

Comment 19 Anand Avati 2009-07-26 13:23:07 UTC
> It cannot be reproduced reliably, either - it seems to happen randomly. It is
> starting to look like there is a FD pointer that gets leaked to a different
> operation on a different file, so a file can clobber another, whose pointer is
> still around.
> 
> I will attach my root.vol files from the two nodes that are running.

Did you have a chance to check whether the "corruption" is consistent across the files on all 3 servers, or did just one of the copies get corrupted this way? Is it possible to turn on logging temporarily for the period of debugging this problem? The answer on consistent corruption, together with the log file, might throw some more light. We are investigating other possible causes.
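
Something along these lines would answer the consistency question (a sketch; the hostnames and backend export path are assumptions):

# compare the backend copies directly on each server's export directory
for h in node1 node2; do
    ssh $h md5sum /gluster/usr/lib64/libglusterfs.so.0.0.0
done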

Avati

Comment 20 Gordan Bobic 2009-07-26 21:34:18 UTC
I'd love to have logging enabled, but unfortunately it seems impossible until syslog support is implemented. Logging refuses to work in the initrd root; I don't know why, but the log file never gets created.
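
For completeness, this is roughly what I would have tried, pointing the log at the ext3 backend instead of the initrd root (a sketch, assuming the 2.0.x -l/--log-file and -L/--log-level options behave as documented):

# write the log to the backend file system, which is known to be writable
glusterfsd -f /etc/glusterfs/root.vol -l /gluster/glusterfsd.log -L DEBUG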

I haven't checked the files on all servers. There are actually only two servers up; the third node doesn't exist yet. I've been having problems getting it to sync transparently from scratch, but that's a different bug report, which I'll file soon.

Comment 21 Anand Avati 2009-08-07 05:24:12 UTC
PATCH: http://patches.gluster.com/patch/943 in master (protocol/client: fixed registration of saved_fds)

Comment 22 Anand Avati 2009-08-07 05:24:29 UTC
PATCH: http://patches.gluster.com/patch/943 in release-2.0 (protocol/client: fixed registration of saved_fds)

Comment 23 Gordan Bobic 2009-08-07 11:23:42 UTC
Will this make it into the 2.0.6 release? I notice 2.0.5 never passed QA.

Comment 24 Vijay Bellur 2009-08-07 11:59:27 UTC
(In reply to comment #23)
> Will this make it into the 2.0.6 release? I notice 2.0.5 never passed QA.

Yes, this will be part of 2.0.6 release.

Comment 25 Vijay Bellur 2009-08-14 07:08:28 UTC
*** Bug 192 has been marked as a duplicate of this bug. ***

Comment 26 Vijay Bellur 2009-08-25 08:46:51 UTC
(In reply to comment #23)
> Will this make it into the 2.0.6 release? I notice 2.0.5 never passed QA.

Hi Gordan,

We believe this is fixed in 2.0.6. Please re-open if you happen to notice it again.

Thanks,
Vijay