Bug 190496 - rpm --install hangs while walking mtab
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: rpm   
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Panu Matilainen
QA Contact:
URL:
Whiteboard:
Keywords: Reopened
Depends On:
Blocks:
 
Reported: 2006-05-02 22:04 UTC by Jesse Zbikowski
Modified: 2010-06-15 03:17 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-06-10 08:18:37 UTC


Attachments
Code hangs calling stat() on a stale nfs mountpoint. (360 bytes, text/plain)
2006-05-09 23:44 UTC, Jesse Zbikowski
Successful run of rpm -i (382.82 KB, text/plain)
2010-06-10 07:59 UTC, Richard W.M. Jones
rpm -i hangs on stat when the NFS server is unreachable (65.81 KB, text/plain)
2010-06-10 07:59 UTC, Richard W.M. Jones

Description Jesse Zbikowski 2006-05-02 22:04:04 UTC
Description of problem:

"rpm -i" hangs if a filesystem listed in /etc/mtab is inaccessible, even if
that filesystem is not used by the install.  For example, if a remote NFS share
has gone down, rpm -i will hang.  rpm should either not walk /etc/mtab and check
irrelevant mountpoints, or it should be able to time out on inaccessible
filesystems.

Version-Release number of selected component (if applicable):

Verified on Fedora Core 5 (rpm 4.4.2-15.2) and Red Hat Linux 9 (rpm 4.2-0.69).

How reproducible:

Mount a remote filesystem via NFS.  Stop nfsd on the remote system.  Attempt to
install a local rpm.

Steps to Reproduce:
1. mount remote:/dir /mnt/point
2. ssh remote "/etc/init.d/nfs stop"
3. rpm -i foo.rpm
  
Actual results:

rpm -i hangs in stat64("/mnt/point")

Expected results:

rpm -i should succeed.

Additional info:

strace output of "rpm -i".  /mnt/fc5 is the location of my .rpm file; /mnt/iso
is the stale nfs mountpoint.

getcwd("/mnt/fc5/Fedora/RPMS", 128)     = 21
time(NULL)                              = 1146605416
open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 4
futex(0xa4eadc, FUTEX_WAKE, 2147483647) = 0
fstat64(4, {st_mode=S_IFREG|0644, st_size=539, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7b68000
read(4, "/dev/sda5 / ext3 rw 0 0\nproc /pr"..., 4096) = 539
stat64("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
stat64("/sys", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/dev/pts", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/altRoot", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/boot", {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
stat64("/dev/shm", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=40, ...}) = 0
stat64("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/proc/sys/fs/binfmt_misc", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/var/lib/nfs/rpc_pipefs", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/net", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/mnt/fc5", {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0
stat64("/mnt/iso",

At this point the rpm process must be killed with SIGKILL:

 0xbfa2ea7c)          = ? ERESTARTSYS (To be restarted)
+++ killed by SIGKILL +++
Process 6400 detached

Comment 1 Jeff Johnson 2006-05-02 23:04:44 UTC
FWIW, the stat is attempting to avoid stale mounts (when ESTALE is returned), and it is necessary
in order to acquire st_dev for the mounted file system so that disk accounting is done correctly per mount.
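
For illustration, the per-mount lookup described above can be sketched roughly as follows. This is a Python sketch, not rpm's actual C code; the names parse_mtab and device_ids are hypothetical. Each stat call here corresponds to one of the stat64 calls in the strace, and it is the call that can block when the NFS server behind a mount point stops responding.

```python
import os


def parse_mtab(text):
    """Return the mount points listed in mtab-formatted text.

    Each line has the form: device mountpoint fstype options dump pass.
    Octal escapes such as \\040 for spaces are not decoded in this sketch.
    """
    mount_points = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and not line.startswith("#"):
            mount_points.append(fields[1])
    return mount_points


def device_ids(mount_points, stat=os.stat):
    """Map st_dev to mount point, skipping mounts whose stat fails.

    A failing stat (for example ESTALE) raises OSError and is skipped.
    A stat that never returns cannot be skipped this way: the loop
    simply blocks on that mount point.
    """
    devices = {}
    for mount_point in mount_points:
        try:
            # Later entries with the same st_dev overwrite earlier ones.
            devices[stat(mount_point).st_dev] = mount_point
        except OSError:
            continue
    return devices
```

The stat parameter is injectable only so the sketch can be exercised without real mounts.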

Comment 2 Jesse Zbikowski 2006-05-09 23:44:35 UTC
Created attachment 128825
Code hangs calling stat() on a stale nfs mountpoint.

Comment 3 Jesse Zbikowski 2006-05-09 23:47:36 UTC
It's not clear then why the stat is hanging on the stale mountpoint, if it
should be returning ESTALE.  In my case, everything is being installed to a
single partition (using rpm --root), so if there were an option to disable
per-mount accounting, that would solve my problem.  I still want to check free
space on the (known) target partition, so I can't use rpm --ignoresize.
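
As a rough sketch of what checking free space on a single known target could look like outside of rpm: the helper names free_bytes and fits are hypothetical, and this does not correspond to any existing rpm option. It only queries the filesystem that holds the target directory, so it assumes that filesystem itself is responsive.

```python
import os


def free_bytes(target, statvfs=os.statvfs):
    """Bytes available to unprivileged users on the filesystem holding target."""
    vfs = statvfs(target)
    return vfs.f_bavail * vfs.f_frsize


def fits(target, needed_bytes, statvfs=os.statvfs):
    """True if needed_bytes does not exceed the free space under target."""
    return needed_bytes <= free_bytes(target, statvfs)
```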

The same hang can be triggered by

rpm -qp foo.rpm --qf '%{FSSIZES}'

I guess I could try to fake it by doing my own accounting with --qf
'%{FILESIZES}' but this number seems to be a little smaller.

I attach a small code snippet which reproduces this same behavior with stat() --
also, df does the same thing.  Is this a bug, and if so should it be moved
upstream or downstream?

Note, it takes about a minute after the nfs server stops before stat decides
something is wrong.


Comment 4 Jeff Johnson 2006-05-11 21:07:06 UTC
The hang is triggered by a stale NFS mount point. rpm is subject to
behavior dictated by glibc, the kernel, and other standards. Avoiding
a hard hang with ESTALE is not easily solved in general.
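
To make the difficulty concrete, here is one possible way a caller could bound the wait on a single stat. This is a sketch under assumptions, not what rpm does, and stat_with_timeout is a hypothetical name. It only lets the caller move on: the worker thread stays blocked in the kernel until the mount responds, which is part of why this is hard to solve in general.

```python
import os
import threading


def stat_with_timeout(path, timeout=5.0, stat=os.stat):
    """Return the stat result, or None if stat failed or did not finish in time."""
    result = {}

    def worker():
        try:
            result["st"] = stat(path)
        except OSError:
            # ESTALE, ENOENT and similar errors: report as "no result".
            pass

    # A daemon thread is used so a stuck stat is not joined at interpreter exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        # Still blocked; give up on this path and leave the thread behind.
        return None
    return result.get("st")
```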

Comment 5 Jeff Johnson 2007-01-14 13:26:56 UTC
This is fixed in rpm-4.4.7 (at least) and later.

UPSTREAM

Comment 6 Panu Matilainen 2007-08-09 18:53:26 UTC
Fixed upstream in rpm.org too now...

FC5 is EOL, changing version to devel.

Comment 7 Panu Matilainen 2007-08-14 08:42:47 UTC
Fixed in next rawhide push, thanks for the patch Jeff.

Comment 8 Richard W.M. Jones 2010-06-10 07:57:51 UTC
This bug is still present in the latest RPM.

If an NFS mountpoint is inaccessible, then rpm -i does a stat(2) system
call which hangs.

I'm attaching two straces.  The first is from a successful run of rpm -i.  The
second is from a run of rpm -i where I have deliberately made an NFS
mountpoint unreachable.

Comment 9 Richard W.M. Jones 2010-06-10 07:59:11 UTC
Created attachment 422814
Successful run of rpm -i

Comment 10 Richard W.M. Jones 2010-06-10 07:59:51 UTC
Created attachment 422816
rpm -i hangs on stat when the NFS server is unreachable

Comment 11 Richard W.M. Jones 2010-06-10 08:00:38 UTC
RPM version 4.8.0-beta1

Comment 12 Panu Matilainen 2010-06-10 08:18:37 UTC
Dunno why you're using 4.8.0-beta1 at this point... 

Anyway, the issue has been really dealt with in rpm >= 4.8.0-8 in F13 and rawhide by only stat()'ing the filesystems the transaction will actually touch, so something like /home on NFS hanging won't cause rpm to hang unnecessarily.
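
The idea of only looking at filesystems the transaction touches can be illustrated with a small sketch. This is not rpm's implementation; mount_point_for and touched_mount_points are hypothetical names, and the matching here is purely lexical (symlinks are not resolved). Only the mount points it returns would then need a stat.

```python
import posixpath


def mount_point_for(path, mount_points):
    """Return the longest mount point that is a path prefix of path."""
    path = posixpath.normpath(path)
    best = None
    for mount_point in mount_points:
        mp = posixpath.normpath(mount_point)
        if mp == "/":
            is_prefix = path.startswith("/")
        else:
            # Match whole path components, so /home does not match /homework.
            is_prefix = path == mp or path.startswith(mp + "/")
        if is_prefix and (best is None or len(mp) > len(best)):
            best = mp
    return best


def touched_mount_points(install_paths, mount_points):
    """Set of mount points that hold at least one of the install paths."""
    touched = {mount_point_for(p, mount_points) for p in install_paths}
    touched.discard(None)
    return touched
```

With mounts such as /, /boot, /home and /mnt/iso, a package that installs only under /usr and /boot yields just / and /boot, so a dead NFS server behind /mnt/iso or /home is never consulted.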

