Description of problem:
"rpm -i" hangs if there is a filesystem mount listed in /etc/mtab which is inaccessible, even if that filesystem is not used. For example, if there is a remote NFS share which has gone down, rpm -i will fail. rpm either should not be walking /etc/mtab and checking irrelevant mountpoints, or it should be able to timeout inaccessible filesystems.

Version-Release number of selected component (if applicable):
Verified on Fedora Core 5 (rpm 4.4.2-15.2) and RedHat 9 (rpm 4.2-0.69).

How reproducible:
Mount a remote filesystem via NFS. Stop nfsd on the remote system. Attempt to install a local rpm.

Steps to Reproduce:
1. mount remote:/dir /mnt/point
2. ssh remote "/etc/init.d/nfs stop"
3. rpm -i foo.rpm

Actual results:
rpm -i hangs in stat64("/mnt/point")

Expected results:
rpm -i should succeed.

Additional info:
strace output of "rpm -i". /mnt/fc5 is the location of my .rpm file; /mnt/iso is the stale nfs mountpoint.

getcwd("/mnt/fc5/Fedora/RPMS", 128) = 21
time(NULL) = 1146605416
open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 4
futex(0xa4eadc, FUTEX_WAKE, 2147483647) = 0
fstat64(4, {st_mode=S_IFREG|0644, st_size=539, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7b68000
read(4, "/dev/sda5 / ext3 rw 0 0\nproc /pr"..., 4096) = 539
stat64("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
stat64("/sys", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/dev/pts", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/altRoot", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/boot", {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
stat64("/dev/shm", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=40, ...}) = 0
stat64("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/proc/sys/fs/binfmt_misc", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/var/lib/nfs/rpc_pipefs", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/net", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/mnt/fc5", {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0
stat64("/mnt/iso",

At this point the rpm process must be killed with SIGKILL:

0xbfa2ea7c) = ? ERESTARTSYS (To be restarted)
+++ killed by SIGKILL +++
Process 6400 detached
FWIW, the stat() is there to detect stale mounts (when ESTALE is returned), and it is necessary to acquire st_dev for each mounted file system in order to do disk accounting correctly per mount.
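To illustrate why st_dev matters for per-mount accounting, here is a minimal sketch in Python. It is not rpm's code: the function name `bytes_needed_per_device` and its input format are assumptions made for illustration. It groups the bytes to be installed by the device ID of the directory they land in, which is the kind of bookkeeping that requires a stat() of each filesystem involved.

```python
import os
from collections import defaultdict


def bytes_needed_per_device(files):
    """Sum install sizes per filesystem, keyed by st_dev.

    `files` is an iterable of (existing_directory, size_in_bytes) pairs.
    Hypothetical helper for illustration; rpm's real accounting differs.
    """
    totals = defaultdict(int)
    for directory, size in files:
        # st_dev identifies the filesystem that holds `directory`.
        totals[os.stat(directory).st_dev] += size
    return dict(totals)
```

Each total could then be compared against the free space reported by os.statvfs() for a path on the same device. Note that every os.stat() call here can block in the same way the report describes if the directory lives on an unreachable NFS server.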
Created attachment 128825 [details] Code hangs calling stat() on a stale nfs mountpoint.
It's not clear, then, why the stat is hanging on the stale mountpoint if it should be returning ESTALE.

In my case, everything is being installed to a single partition (using rpm --root), so an option to disable per-mount accounting would solve my problem. I still want to check free space on the (known) target partition, so I can't use rpm --ignoresize.

The same hang can be triggered by: rpm -qp foo.rpm --qf '%{FSSIZES}'

I guess I could try to fake it by doing my own accounting with --qf '%{FILESIZES}', but this number seems to be a little smaller.

I attach a small code snippet which reproduces this same behavior with stat(). df does the same thing. Is this a bug, and if so, should it be moved upstream or downstream?

Note: it takes about a minute after the NFS server stops before stat decides something is wrong.
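One likely explanation for the missing ESTALE: on a hard-mounted NFS share the client typically keeps retrying while the server is unreachable, so stat() blocks instead of returning an error. The sketch below is not the attached snippet and not anything rpm does. It is an assumed workaround in Python (the name `stat_with_timeout` is hypothetical) that runs the stat in a worker thread so the caller can give up after a deadline.

```python
import os
import threading


def stat_with_timeout(path, timeout=5.0):
    """Return os.stat(path), or None if the call has not finished in time.

    Hypothetical helper: the worker thread may stay blocked in the kernel
    after we give up, so this only keeps the *caller* from hanging.
    """
    result = {}

    def worker():
        try:
            result["stat"] = os.stat(path)
        except OSError as exc:
            # ESTALE, ENOENT and similar errors are passed to the caller.
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        return None  # still blocked, e.g. on an unreachable NFS server
    if "error" in result:
        raise result["error"]
    return result["stat"]
```

A caller could treat a None result as "skip this mountpoint" when walking /etc/mtab. The blocked thread is not cleaned up, so this trades a hang for a leaked thread, and whether process exit is then delayed depends on the mount options.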
The hang is triggered by a stale NFS mount point. rpm is subject to behavior dictated by glibc, the kernel, and other standards. Avoiding a hard hang with ESTALE is not easily solved in general.
This is fixed in rpm-4.4.7 (at least) and later. UPSTREAM
Fixed upstream in rpm.org too now... FC5 is EOL, changing version to devel.
Fixed in next rawhide push, thanks for the patch, Jeff.
This bug is still present in the latest RPM. If an NFS mountpoint is inaccessible, then rpm -i does a stat(2) system call which hangs. I'm attaching two straces. The first is from a successful run of rpm -i. The second is from a run of rpm -i where I have deliberately made an NFS mountpoint unreachable.
Created attachment 422814 [details] Successful run of rpm -i
Created attachment 422816 [details] rpm -i hangs on stat when the NFS server is unreachable
RPM version 4.8.0-beta1
Dunno why you're using 4.8.0-beta1 at this point... Anyway, the issue has really been dealt with in rpm >= 4.8.0-8 in F13 and rawhide by only stat()'ing the filesystems the transaction will actually touch, so something like /home on NFS hanging won't cause rpm to hang unnecessarily.
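The idea of touching only the relevant filesystems can be sketched as follows. This is an illustration in Python, not the actual rpm change: `parse_mtab` and `mounts_for_paths` are hypothetical names, and the matching is done purely on path strings (longest mountpoint prefix), so symlinks that cross mounts and escaped characters in mtab entries are not handled.

```python
import os


def parse_mtab(text):
    """Return the mountpoint column (second field) of each mtab line."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            mounts.append(fields[1])
    return mounts


def mounts_for_paths(mountpoints, paths):
    """Pick, for each path, the longest mountpoint that is a prefix of it.

    No stat() is issued here, so unrelated stale mounts are never touched.
    """
    needed = set()
    for path in paths:
        path = os.path.normpath(path)
        best = None
        for mnt in mountpoints:
            mnt = os.path.normpath(mnt)
            prefix = mnt if mnt.endswith("/") else mnt + "/"
            if path == mnt or path.startswith(prefix):
                if best is None or len(mnt) > len(best):
                    best = mnt
        if best is not None:
            needed.add(best)
    return needed
```

With the mtab from the original report, a package installing only under /usr and /etc would map to "/" alone, so only "/" would need a stat() and the stale /mnt/iso entry would be skipped.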