190496 – rpm --install hangs while walking mtab

Summary: rpm --install hangs while walking mtab

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	rpm
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Panu Matilainen
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-05-02 22:04 UTC by Jesse Zbikowski
Modified:	2010-06-15 03:17 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2010-06-10 08:18:37 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Code hangs calling stat() on a stale nfs mountpoint. (360 bytes, text/plain) 2006-05-09 23:44 UTC, Jesse Zbikowski
Successful run of rpm -i (382.82 KB, text/plain) 2010-06-10 07:59 UTC, Richard W.M. Jones
rpm -i hangs on stat when the NFS server is unreachable (65.81 KB, text/plain) 2010-06-10 07:59 UTC, Richard W.M. Jones

Description Jesse Zbikowski 2006-05-02 22:04:04 UTC

Description of problem:

"rpm -i" hangs if a filesystem listed in /etc/mtab is inaccessible, even if
that filesystem is not used by the install.  For example, if a remote NFS
share has gone down, rpm -i hangs.  rpm should either not walk /etc/mtab and
check irrelevant mount points, or it should time out on inaccessible
filesystems.
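
To illustrate the second option (timing out on an inaccessible filesystem), here is a minimal Python sketch. It is not rpm code: the helper name `stat_dev_with_timeout` and the 5 second default are assumptions made for illustration. The idea is to run stat() in a child process so that the caller is never the one blocked in the system call.

```python
import subprocess
import sys

# Hypothetical helper, not part of rpm: the child process performs the stat()
# so that a hung mount point blocks the child instead of the caller.
_STAT_SNIPPET = "import os, sys; print(os.stat(sys.argv[1]).st_dev)"


def stat_dev_with_timeout(path, timeout=5.0):
    """Return st_dev of path, or None if stat() fails or takes too long."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", _STAT_SNIPPET, path],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        # Timed out (child was killed) or stat() raised an error in the child.
        return None
    return int(result.stdout)
```

On timeout, `subprocess.run` kills the child and waits for it to exit, so this only helps when the blocked stat() can still be killed, as the SIGKILL in the strace output later in this report suggests it can.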

Version-Release number of selected component (if applicable):

Verified on Fedora Core 5 (rpm 4.4.2-15.2) and Red Hat Linux 9 (rpm 4.2-0.69).

How reproducible:

Mount a remote filesystem via NFS.  Stop nfsd on the remote system.  Attempt to
install a local rpm.

Steps to Reproduce:
1. mount remote:/dir /mnt/point
2. ssh remote "/etc/init.d/nfs stop"
3. rpm -i foo.rpm
  
Actual results:

rpm -i hangs in stat64("/mnt/point")

Expected results:

rpm -i should succeed.

Additional info:

strace output of "rpm -i".  /mnt/fc5 is the location of my .rpm file; /mnt/iso
is the stale nfs mountpoint.

getcwd("/mnt/fc5/Fedora/RPMS", 128)     = 21
time(NULL)                              = 1146605416
open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 4
futex(0xa4eadc, FUTEX_WAKE, 2147483647) = 0
fstat64(4, {st_mode=S_IFREG|0644, st_size=539, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7b68000
read(4, "/dev/sda5 / ext3 rw 0 0\nproc /pr"..., 4096) = 539
stat64("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
stat64("/sys", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/dev/pts", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/altRoot", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/boot", {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
stat64("/dev/shm", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=40, ...}) = 0
stat64("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat64("/proc/sys/fs/binfmt_misc", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/var/lib/nfs/rpc_pipefs", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/net", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat64("/mnt/fc5", {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0
stat64("/mnt/iso",

At this point the rpm process must be killed with SIGKILL:

 0xbfa2ea7c)          = ? ERESTARTSYS (To be restarted)
+++ killed by SIGKILL +++
Process 6400 detached
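
The trace shows rpm reading /etc/mtab and then calling stat64() on every mount point in order, until it blocks on the stale one. A rough Python sketch of that walk follows. It is not rpm's C code, and both function names are hypothetical.

```python
import os
import re

# mtab encodes spaces and similar characters as octal escapes such as \040.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def mount_points_from_mtab(text):
    """Return the mount-point field of every entry in mtab-format text."""
    points = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # skip blank, malformed, and comment lines
        points.append(
            _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), fields[1])
        )
    return points


def stat_mount_points(points):
    """Map each reachable mount point to its st_dev, skipping failures."""
    devices = {}
    for point in points:
        try:
            # This call is where the process blocks if the mount is unreachable.
            devices[point] = os.stat(point).st_dev
        except OSError:
            continue
    return devices
```

Every entry is visited regardless of whether the package being installed has files on it, which is why a single unreachable mount is enough to stall the whole run.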

Comment 1 Jeff Johnson 2006-05-02 23:04:44 UTC

FWIW, the stat is there to detect and skip stale mounts (when ESTALE is returned). It is also needed
to get st_dev for each mounted file system so that disk accounting is done correctly per mount.
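
As a rough illustration of why st_dev matters for per-mount accounting, the Python sketch below groups the bytes to be installed by the device of each target directory and looks up the free space for a directory. The function names and the `(directory, size)` input shape are assumptions for this example, not rpm's data structures.

```python
import os


def bytes_needed_by_device(files):
    """Sum sizes per filesystem.

    files: iterable of (existing_directory, size_in_bytes) pairs.
    Returns a dict mapping st_dev to the total bytes destined for it.
    """
    needed = {}
    for directory, size in files:
        dev = os.stat(directory).st_dev  # identifies the filesystem
        needed[dev] = needed.get(dev, 0) + size
    return needed


def free_bytes(directory):
    """Bytes available to unprivileged users on directory's filesystem."""
    vfs = os.statvfs(directory)
    return vfs.f_bavail * vfs.f_frsize
```

Two directories with the same st_dev share one free-space budget, so the device number is the natural key for comparing what is needed against what is available.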

Comment 2 Jesse Zbikowski 2006-05-09 23:44:35 UTC

Created attachment 128825
Code hangs calling stat() on a stale nfs mountpoint.

Comment 3 Jesse Zbikowski 2006-05-09 23:47:36 UTC

It's not clear then why the stat is hanging on the stale mountpoint, if it
should be returning ESTALE.  In my case, everything is being installed to a
single partition (using rpm --root), so if there were an option to disable
per-mount accounting, that would solve my problem.  I still want to check free
space on the (known) target partition, so I can't use rpm --ignoresize.

The same hang can be triggered by

rpm -qp foo.rpm --qf '%{FSSIZES}'

I guess I could try to fake it by doing my own accounting with --qf
'%{FILESIZES}' but this number seems to be a little smaller.
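
One possible reason the FILESIZES sum comes out smaller, offered here as an assumption rather than a confirmed explanation: disk accounting usually charges each file in whole filesystem blocks, while FILESIZES are exact byte counts. The Python sketch below shows the difference; the function name and the 4096-byte block size are illustrative.

```python
def rounded_up_total(file_sizes, block_size=4096):
    """Sum of file sizes with each file rounded up to whole blocks."""
    # -(-a // b) is ceiling division for non-negative integers.
    return sum(-(-size // block_size) * block_size for size in file_sizes)


sizes = [1, 4096, 4097]
exact_total = sum(sizes)                  # 8194 bytes
block_total = rounded_up_total(sizes)     # 16384 bytes with 4096-byte blocks
```

With many small files the gap between the two totals can be considerable, so a hand-rolled check based on exact sizes would tend to underestimate the space required.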

I attach a small code snippet which reproduces this same behavior with stat() --
also, df does the same thing.  Is this a bug, and if so should it be moved up-
or down-stream?

Note, it takes about a minute after the nfs server stops before stat decides
something is wrong.

Comment 4 Jeff Johnson 2006-05-11 21:07:06 UTC

The hang is triggered by a stale NFS mount point. rpm is subject to
behavior dictated by glibc, the kernel, and other standards. Avoiding
a hard hang with ESTALE is not easily solved in general.

Comment 5 Jeff Johnson 2007-01-14 13:26:56 UTC

This is fixed in rpm-4.4.7 (at least) and later.

UPSTREAM

Comment 6 Panu Matilainen 2007-08-09 18:53:26 UTC

Fixed upstream in rpm.org too now...

FC5 is EOL, changing version to devel.

Comment 7 Panu Matilainen 2007-08-14 08:42:47 UTC

Fixed in next rawhide push, thanks for the patch Jeff.

Comment 8 Richard W.M. Jones 2010-06-10 07:57:51 UTC

This bug is still present in the latest RPM.

If an NFS mountpoint is inaccessible, then rpm -i does a stat(2) system
call which hangs.

I'm attaching two straces.  The first is from a successful run of rpm -i.  The
second is from a run of rpm -i where I have deliberately made an NFS
mountpoint unreachable.

Comment 9 Richard W.M. Jones 2010-06-10 07:59:11 UTC

Created attachment 422814
Successful run of rpm -i

Comment 10 Richard W.M. Jones 2010-06-10 07:59:51 UTC

Created attachment 422816
rpm -i hangs on stat when the NFS server is unreachable

Comment 11 Richard W.M. Jones 2010-06-10 08:00:38 UTC

RPM version 4.8.0-beta1

Comment 12 Panu Matilainen 2010-06-10 08:18:37 UTC

Dunno why you're using 4.8.0-beta1 at this point... 

Anyway, the issue has been properly dealt with in rpm >= 4.8.0-8 in F13 and rawhide by only stat()'ing the filesystems the transaction will actually touch, so something like /home on NFS hanging won't cause rpm to hang unnecessarily.
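
The idea of only looking at filesystems the transaction touches can be sketched in Python as follows. This is not how rpm implements it: the function names are hypothetical, and the sketch uses a plain longest-prefix match on path strings, which ignores symlinks and bind mounts.

```python
import posixpath


def mount_point_for(path, mount_points):
    """Return the longest mount point that contains path, or None."""
    path = posixpath.normpath(path)
    best = None
    for point in mount_points:
        point = posixpath.normpath(point)
        prefix = point if point.endswith("/") else point + "/"
        if path == point or path.startswith(prefix):
            if best is None or len(point) > len(best):
                best = point
    return best


def mounts_touched(file_paths, mount_points):
    """Set of mount points that hold at least one of file_paths."""
    touched = {mount_point_for(p, mount_points) for p in file_paths}
    touched.discard(None)
    return touched
```

For a package that installs only under /usr and /boot, a table containing /, /boot, /home and /mnt/iso yields just / and /boot, so an unreachable /home or /mnt/iso would never need to be examined.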
