Bug 190496

Summary: rpm --install hangs while walking mtab
Product: [Fedora] Fedora Reporter: Jesse Zbikowski <embeddedlinuxguy>
Component: rpmAssignee: Panu Matilainen <pmatilai>
Status: CLOSED RAWHIDE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: rawhideCC: liko, rjones
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: All   
OS: Linux   
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-06-10 08:18:37 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Description Flags
Code hangs calling stat() on a stale nfs mountpoint.
Successful run of rpm -i
rpm -i hangs on stat when the NFS server is unreachable none

Description Jesse Zbikowski 2006-05-02 22:04:04 UTC
Description of problem:

"rpm -i" hangs if there is a filesystem mount listed in /etc/mtab which is
inaccessible, even if that filesystem is not used.  For example, if there is a
remote NFS share which has gone down, rpm -i will fail.  rpm either should not
be walking /etc/mtab and checking irrelevant mountpoints, or it should be able
to timeout inaccessible filesystems.

Version-Release number of selected component (if applicable):

Verified on Fedora Core 5 (rpm 4.4.2-15.2) and RedHat 9 (rpm 4.2-0.69).

How reproducible:

Mount a remote filesystem via NFS.  Stop nfsd on the remote system.  Attempt to
install a local rpm.

Steps to Reproduce:
1. mount remote:/dir /mnt/point
2. ssh remote "/etc/init.d/nfs stop"
3. rpm -i foo.rpm
Actual results:

rpm -i hangs in stat64("/mnt/point")

Expected results:

rpm -i should succeed.

Additional info:

strace output of "rpm -i".  /mnt/fc5 is the location of my .rpm file; /mnt/iso
is the stale nfs mountpoint.

getcwd("/mnt/fc5/Fedora/RPMS", 128)     = 21
time(NULL)                              = 1146605416
open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 4
futex(0xa4eadc, FUTEX_WAKE, 2147483647) = 0
fstat64(4, {st_mode=S_IFREG|0644, st_size=539, ...}) = 0
read(4, "/dev/sda5 / ext3 rw 0 0\nproc /pr"..., 4096) = 539
stat64("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
stat64("/sys", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/dev/pts", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/altRoot", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/boot", {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
stat64("/dev/shm", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=40, ...}) = 0
stat64("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/proc/sys/fs/binfmt_misc", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/var/lib/nfs/rpc_pipefs", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/net", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/mnt/fc5", {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0

At this point the rpm process must be killed with SIGKILL:

 0xbfa2ea7c)          = ? ERESTARTSYS (To be restarted)
+++ killed by SIGKILL +++
Process 6400 detached

Comment 1 Jeff Johnson 2006-05-02 23:04:44 UTC
FWIW, the stat is attempting to avoid stale mounts (when ESTALE is returned) and is necessary
to acquire st_dev for the mounted file system in order to do disk accounting correctly per-mount

Comment 2 Jesse Zbikowski 2006-05-09 23:44:35 UTC
Created attachment 128825 [details]
Code hangs calling stat() on a stale nfs mountpoint.

Comment 3 Jesse Zbikowski 2006-05-09 23:47:36 UTC
It's not clear then why the stat is hanging on the stale mountpoint, if it
should be returning ESTALE.  In my case, everything is being installed to a
single partition (using rpm --root), so if there were an option to disable
per-mount accounting, that would solve my problem.  I still want to check free
space on the (known) target partition, so I can't use rpm --ignoresize.

The same hang can be triggered by

rpm -qp foo.rpm --qf '%{FSSIZES}'

I guess I could try to fake it by doing my own accounting with --qf
'%{FILESIZES}' but this number seems to be a little smaller.

I attach a small code snippet which reproduces this same behavior with stat() --
also, df does the same thing.  Is this a bug, and if so should it be moved up-
or down-stream?

Note, it takes about a minute after the nfs server stops before stat decides
something is wrong.

Comment 4 Jeff Johnson 2006-05-11 21:07:06 UTC
The hang is triggered by a stale NFS mount point. rpm is subject to
behavior dictated by glibc, the kernel, and other standards. Avoiding
a hard hang with ESTALE is not easily solved in general.

Comment 5 Jeff Johnson 2007-01-14 13:26:56 UTC
This is fixed in rpm-4.4.7 (at least) and later.


Comment 6 Panu Matilainen 2007-08-09 18:53:26 UTC
Fixed upstream in rpm.org too now...

FC5 is EOL, changing version to devel.

Comment 7 Panu Matilainen 2007-08-14 08:42:47 UTC
Fixed in next rawhide push, thanks for the patch Jeff.

Comment 8 Richard W.M. Jones 2010-06-10 07:57:51 UTC
This bug is still present in the latest RPM.

If an NFS mountpoint is inaccessible, then rpm -i does a stat(2) system
call which hangs.

I'm attaching two straces.  The first is from a successful run of rpm -i.  The
second is from a run of rpm -i where I have deliberately made an NFS
mountpoint unreachable.

Comment 9 Richard W.M. Jones 2010-06-10 07:59:11 UTC
Created attachment 422814 [details]
Successful run of rpm -i

Comment 10 Richard W.M. Jones 2010-06-10 07:59:51 UTC
Created attachment 422816 [details]
rpm -i hangs on stat when the NFS server is unreachable

Comment 11 Richard W.M. Jones 2010-06-10 08:00:38 UTC
RPM version 4.8.0-beta1

Comment 12 Panu Matilainen 2010-06-10 08:18:37 UTC
Dunno why you're using 4.8.0-beta1 at this point... 

Anyway, the issue has been really dealt with in rpm >= 4.8.0-8 in F13 and rawhide by only stat()'ing the filesystems the transaction will actually touch, so something like /home on NFS hanging wont cause rpm to hang unnecessarily.