Bug 190496 - rpm --install hangs while walking mtab
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: rpm
Version: rawhide
Hardware: All Linux
Priority: medium, Severity: medium
Assigned To: Panu Matilainen
Keywords: Reopened
Reported: 2006-05-02 18:04 EDT by Jesse Zbikowski
Modified: 2010-06-14 23:17 EDT (History)
Doc Type: Bug Fix
Last Closed: 2010-06-10 04:18:37 EDT

Attachments
Code hangs calling stat() on a stale nfs mountpoint. (360 bytes, text/plain)
2006-05-09 19:44 EDT, Jesse Zbikowski
Successful run of rpm -i (382.82 KB, text/plain)
2010-06-10 03:59 EDT, Richard W.M. Jones
rpm -i hangs on stat when the NFS server is unreachable (65.81 KB, text/plain)
2010-06-10 03:59 EDT, Richard W.M. Jones

Description Jesse Zbikowski 2006-05-02 18:04:04 EDT
Description of problem:

"rpm -i" hangs if /etc/mtab lists a mounted filesystem which is inaccessible,
even if that filesystem is not used by the install.  For example, if a remote
NFS share has gone down, rpm -i will hang.  rpm should either not walk
/etc/mtab and check irrelevant mountpoints, or be able to time out on
inaccessible filesystems.

Version-Release number of selected component (if applicable):

Verified on Fedora Core 5 (rpm 4.4.2-15.2) and Red Hat Linux 9 (rpm 4.2-0.69).

How reproducible:

Mount a remote filesystem via NFS.  Stop nfsd on the remote system.  Attempt to
install a local rpm.

Steps to Reproduce:
1. mount remote:/dir /mnt/point
2. ssh remote "/etc/init.d/nfs stop"
3. rpm -i foo.rpm
  
Actual results:

rpm -i hangs in stat64("/mnt/point")

Expected results:

rpm -i should succeed.

Additional info:

strace output of "rpm -i".  /mnt/fc5 is the location of my .rpm file; /mnt/iso
is the stale nfs mountpoint.

getcwd("/mnt/fc5/Fedora/RPMS", 128)     = 21
time(NULL)                              = 1146605416
open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 4
futex(0xa4eadc, FUTEX_WAKE, 2147483647) = 0
fstat64(4, {st_mode=S_IFREG|0644, st_size=539, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7b68000
read(4, "/dev/sda5 / ext3 rw 0 0\nproc /pr"..., 4096) = 539
stat64("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
stat64("/sys", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/dev/pts", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/altRoot", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/boot", {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
stat64("/dev/shm", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=40, ...}) = 0
stat64("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/proc/sys/fs/binfmt_misc", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/var/lib/nfs/rpc_pipefs", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/net", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/mnt/fc5", {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0
stat64("/mnt/iso",

At this point the rpm process must be killed with SIGKILL:

 0xbfa2ea7c)          = ? ERESTARTSYS (To be restarted)
+++ killed by SIGKILL +++
Process 6400 detached
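
The walk visible in the strace can be approximated with a short Python sketch. This is not rpm's code: parse_mtab and stat_mount_points are hypothetical helper names, and the only behavior taken from the report is that every mount point listed in /etc/mtab gets a stat() call, which blocks when one of them sits on an unreachable NFS server.

```python
import os
import re


def parse_mtab(text):
    """Return the mount points (second field) from mtab-formatted text."""
    mount_points = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        # mtab encodes special characters as octal escapes, e.g. \040 for a space.
        mount_point = re.sub(
            r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), fields[1]
        )
        mount_points.append(mount_point)
    return mount_points


def stat_mount_points(mount_points):
    """stat() every mount point in order and record its st_dev.

    A mount point whose NFS server is unreachable is expected to block
    inside os.stat() here, which is the hang described in this report.
    """
    devices = {}
    for mount_point in mount_points:
        try:
            devices[mount_point] = os.stat(mount_point).st_dev
        except OSError:
            # ESTALE, ENOENT and similar errors: no device number available.
            devices[mount_point] = None
    return devices
```

Feeding the contents of /etc/mtab through parse_mtab and then stat_mount_points on a machine with a dead, hard-mounted NFS share would be expected to stop at the same mount point as the strace does.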
Comment 1 Jeff Johnson 2006-05-02 19:04:44 EDT
FWIW, the stat is attempting to avoid stale mounts (when ESTALE is returned), and it is necessary
to acquire st_dev for the mounted file system in order to do disk accounting correctly per mount.
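
To illustrate why st_dev matters for per-mount accounting, here is a rough Python sketch. It is an assumption about the general technique, not rpm's implementation: bytes_needed_per_device and free_bytes are hypothetical names, and the input is assumed to be a list of (destination directory, size) pairs.

```python
import os
from collections import defaultdict


def bytes_needed_per_device(files, dev_of=None):
    """Sum the sizes of files grouped by the device of their destination.

    files is an iterable of (destination_directory, size_in_bytes).
    dev_of maps a directory to a device id; by default it uses st_dev
    from os.stat(), which is the call that can block on a dead NFS mount.
    """
    if dev_of is None:
        dev_of = lambda directory: os.stat(directory).st_dev
    needed = defaultdict(int)
    for directory, size in files:
        needed[dev_of(directory)] += size
    return dict(needed)


def free_bytes(mount_point):
    """Bytes available to unprivileged users on the given filesystem."""
    vfs = os.statvfs(mount_point)
    return vfs.f_bavail * vfs.f_frsize
```

Comparing each total from bytes_needed_per_device with free_bytes for the matching mount point gives a per-filesystem space check, which is why the device number of each mounted filesystem has to be known first.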
Comment 2 Jesse Zbikowski 2006-05-09 19:44:35 EDT
Created attachment 128825 [details]
Code hangs calling stat() on a stale nfs mountpoint.
Comment 3 Jesse Zbikowski 2006-05-09 19:47:36 EDT
It's not clear then why the stat is hanging on the stale mountpoint, if it
should be returning ESTALE.  In my case, everything is being installed to a
single partition (using rpm --root), so if there were an option to disable
per-mount accounting, that would solve my problem.  I still want to check free
space on the (known) target partition, so I can't use rpm --ignoresize.

The same hang can be triggered by

rpm -qp foo.rpm --qf '%{FSSIZES}'

I guess I could try to fake it by doing my own accounting with --qf
'%{FILESIZES}' but this number seems to be a little smaller.

I attach a small code snippet which reproduces the same behavior with stat();
df does the same thing.  Is this a bug, and if so, should it be moved upstream
or downstream?

Note, it takes about a minute after the nfs server stops before stat decides
something is wrong.
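
One way to keep a caller from blocking on such a mount point, offered only as a sketch and not as anything rpm does, is to run stat() in a helper thread and stop waiting after a deadline. probe_mount and its timeout parameter are hypothetical. The stuck thread is not cancelled; it stays blocked in the kernel until the server answers or the process exits.

```python
import os
import threading


def probe_mount(path, timeout=2.0):
    """stat() path in a helper thread, waiting at most timeout seconds.

    Returns ("ok", stat_result), ("error", exception) or ("timeout", None).
    """
    outcome = {}

    def worker():
        try:
            outcome["result"] = ("ok", os.stat(path))
        except OSError as exc:
            outcome["result"] = ("error", exc)

    # daemon=True so a thread stuck in stat() does not keep the
    # interpreter waiting at shutdown.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        return ("timeout", None)
    return outcome["result"]
```

A caller could treat a "timeout" result as "skip this filesystem", at the cost of leaving one blocked thread behind per unreachable mount.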
Comment 4 Jeff Johnson 2006-05-11 17:07:06 EDT
The hang is triggered by a stale NFS mount point. rpm is subject to
behavior dictated by glibc, the kernel, and other standards. Avoiding
a hard hang with ESTALE is not easily solved in general.
Comment 5 Jeff Johnson 2007-01-14 08:26:56 EST
This is fixed in rpm-4.4.7 (at least) and later.

UPSTREAM
Comment 6 Panu Matilainen 2007-08-09 14:53:26 EDT
Fixed upstream in rpm.org too now...

FC5 is EOL, changing version to devel.
Comment 7 Panu Matilainen 2007-08-14 04:42:47 EDT
Fixed in next rawhide push, thanks for the patch, Jeff.
Comment 8 Richard W.M. Jones 2010-06-10 03:57:51 EDT
This bug is still present in the latest RPM.

If an NFS mountpoint is inaccessible, then rpm -i does a stat(2) system
call which hangs.

I'm attaching two straces.  The first is from a successful run of rpm -i.  The
second is from a run of rpm -i where I have deliberately made an NFS
mountpoint unreachable.
Comment 9 Richard W.M. Jones 2010-06-10 03:59:11 EDT
Created attachment 422814 [details]
Successful run of rpm -i
Comment 10 Richard W.M. Jones 2010-06-10 03:59:51 EDT
Created attachment 422816 [details]
rpm -i hangs on stat when the NFS server is unreachable
Comment 11 Richard W.M. Jones 2010-06-10 04:00:38 EDT
RPM version 4.8.0-beta1
Comment 12 Panu Matilainen 2010-06-10 04:18:37 EDT
Dunno why you're using 4.8.0-beta1 at this point... 

Anyway, the issue has really been dealt with in rpm >= 4.8.0-8 in F13 and rawhide by only stat()'ing the filesystems the transaction will actually touch, so something like /home on NFS hanging won't cause rpm to hang unnecessarily.
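
The idea of only stat()'ing the filesystems a transaction touches can be sketched as follows. This is an illustration under simplifying assumptions, not the actual rpm change: owning_mount and mounts_touched are hypothetical names, and paths are matched to mount points by longest prefix without resolving symlinks.

```python
import os.path


def owning_mount(path, mount_points):
    """Return the longest mount point that is a prefix of path, or None."""
    path = os.path.normpath(path)
    best = None
    for mount_point in mount_points:
        mp = os.path.normpath(mount_point)
        # Match on whole path components so /home does not claim /homework.
        if mp == "/" or path == mp or path.startswith(mp + "/"):
            if best is None or len(mp) > len(best):
                best = mp
    return best


def mounts_touched(paths, mount_points):
    """Return the set of mount points that own at least one of paths."""
    touched = {owning_mount(path, mount_points) for path in paths}
    touched.discard(None)
    return touched
```

With mount points /, /home, /boot and /mnt/iso, a transaction writing only under /usr and /boot would select / and /boot, so a hung /home or /mnt/iso would never be stat()'ed.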
