Bug 80644 - Leak / infinite loop issue with knfsd and htree enabled partitions
Summary: Leak / infinite loop issue with knfsd and htree enabled partitions
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Red Hat Public Beta
Classification: Retired
Component: kernel
Version: phoebe
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Stephen Tweedie
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 79578
Reported: 2002-12-29 12:43 UTC by Peter van Egdom
Modified: 2007-04-18 16:49 UTC
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-01-15 21:09:39 UTC
Embargoed:



Description Peter van Egdom 2002-12-29 12:43:46 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021218

Description of problem:

I was preparing an NFS setup for testing a Phoebe network installation on an
older CD-ROM-less PC.

I copied each Phoebe 8.0.92 CD-ROM to the directory "/var/export/8.0.92"
with the following command:

 "cp -var /mnt/cdrom/RedHat/ /var/export/8.0.92/"

I then used "redhat-config-nfs" to set up this directory for NFS exporting.
The resulting exports file looks like this:

[root@powermate root]# cat /etc/exports
/var/export/8.0.92/      *(ro,sync)
[root@powermate root]# 

To test the NFS client and server, I mounted the directory on the same machine
that serves it, with the following command:

"mount -t nfs 10.0.0.4:/var/export/8.0.92/ /mnt/nfs/"

Here's the output of mount:

[root@powermate root]# mount
/dev/hda5 on / type ext3 (rw)
none on /proc type proc (rw)
usbdevfs on /proc/bus/usb type usbdevfs (rw)
/dev/hda3 on /boot type ext3 (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
none on /dev/shm type tmpfs (rw)
/dev/cdrom on /mnt/cdrom type iso9660 (ro,nosuid,nodev)
10.0.0.4:/var/export/8.0.92/ on /mnt/nfs type nfs (rw,addr=10.0.0.4)

NFS works, as these commands show:
[root@powermate nfs]# cd /mnt/nfs/
[root@powermate nfs]# ll
totaal 4
drwxr-xr-x    4 root     root         4096 dec 20 19:50 RedHat
[root@powermate nfs]#

Then I tried a find, just for fun. This was the result:

[root@powermate nfs]# find .
.
./RedHat
./RedHat/base
./RedHat/base/comps.rpm
./RedHat/base/comps.xml
./RedHat/base/hdlist
./RedHat/base/hdlist2
./RedHat/base/hdstg2.img
./RedHat/base/netstg2.img
./RedHat/base/stage2.img
./RedHat/base/TRANS.TBL
./RedHat/RPMS
[here]

At [here] the find command just hangs for a _very long time_ on this PC (512
MB, Pentium IV), and after a while find finally continues its output. *BUT*
it's stuck in an infinite loop.

During this loop the machine quickly became unresponsive to the mouse. I
managed to fire up vmstat in an xterm to watch the swap; this was the output of
"vmstat 1":

<snip>
 3  0  0  57764   6384   1364  55428    0  264     0   264  106   500  3 97  0
 2  0  0  58028   6380   1220  50596    4  472     4   472  111   375  3 97  0
 2  0  0  58292   6376    628  46224    0  444     0   444  111   361  2 98  0
 2  0  1  58556   6372    448  41400    0  264     0   264  106   325  1 99  0
 2  0  1  58820   6520    692  36156    0  272   276   272  177   439  1 99  0
 5  0  1  59072   6492    604  31220    0  256   972   256  198   528  6 94  0
 3  0  0  59084   6492    608  26404    0  496    36   496  210   491  5 95  0
 2  1  1  62220   6380    440  21668    0 5484   132  5500  451   565  0 100  0
 2  0  1  64156   6380    416  16680    0 2628    16  2716  323   338  6 94  0
 2  4  1  64300   6380    412  11796    0 2248   220  2244  511   426  7 93  0
 3  2  1  69048   6372    312   7400    0 6084   284  6108  470   381  3 89  8
 2  3  1  79660   6372    340   6876    4 11348   392 11384  400   528  5 95  0
 2  4  1  95368   6248    324   5448   28 16404   468 16412  442   394  1 72 27
 5  2  1 103828   6532    316   4584   40 8480   212  8484  331   449  1 30 69
 2  4  1 105984   6532    464   5484   72 2180  1672  2180  290   818  2 98  0
 3  6  1 110832   6516    564   6008   96 4856  1468  4856  289   584  4 96  0
 3  3  0 114928   6516    640   6636  152 4096  1016  4096  275   492  4 96  0
 2  4  0 116104   6388    680   7600  304 1208  1520  1208  277   608  4 63 33
   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 3 10  0 116500   6400    684   8444  436  396  1420   396  254   618  1  0 99
 2  9  0 116632   6408    712   8944  496  132  1028   132  264   601  0  1 99
 2  6  0 116768   6412    704   9192  592  136   928   136  271   642  1  0 99
 2  3  0 116896   6416    696   9760  460  128  1016   176  251   560  0  0 100
 5  0  1 117276   6520    696   9888  448  388   932   388  266   611  1  0 99
 1  4  0 117292   6516    704  10960  356   16  1436    16  242   596  1  0 99
 2  2  0 119932   6400    548  11412  528 2644  1072  2644  288   574  1  1 98
 3  0  0 121104   6560    480  12220  428 1172  1272  1172  268   779 11  2 87
 3  0  0 121464   6520    472  12244  180  360   188   380  159  1225 95  5  0
 5  0  0 121604   6520    656  12016   12  140   288   140  166  1349 98  2  0
 4  0  0 121988   6540    636  12036   32  384    48   384  122  1141 98  2  0
 3  0  0 122176   6520    628  12108  104  188   176   188  165  1348 95  5  0
 2  0  0 122308   6544    632  12188   60  132   152   132  240  1894 98  2  0
 3  0  0 122308   6580    644  12248    4    0    80     0  176  1610 99  1  0
 2  0  0 122440   6580    636  12256    0  132     0   132  107  1200 95  5  0
<snip>

Swap usage kept growing and growing, and if I hadn't stopped the find with
CTRL-C the machine would have gone berserk.
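The swap growth in the vmstat output above can be read off the swpd column. The sketch below is a hypothetical helper, `swap_growth`, not part of any tool. It assumes the older procps layout shown here (r b w swpd ...), where swpd is the 4th field; newer vmstat builds drop the "w" column, so swpd becomes field 3 and that number must be passed as the first argument.

```shell
#!/bin/sh
# swap_growth [FIELD]: hypothetical helper. Reads `vmstat` output on stdin and
# prints the swpd value of the first and last sample and their difference (KB).
# ASSUMPTION: swpd is field 4 (old "r b w swpd ..." layout) unless FIELD is given.
swap_growth() {
  field=${1:-4}
  awk -v f="$field" '
    $1 ~ /^[0-9]+$/ {              # data rows only; skips headers and <snip>
      if (first == "") first = $f
      last = $f
    }
    END {
      if (first == "") exit 1      # no samples found
      printf "first=%d last=%d delta=%d\n", first, last, last - first
    }'
}
```

Fed the excerpt above, it reports `first=57764 last=122440 delta=64676`, i.e. roughly 63 MB more swap in use between the first and last samples shown.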

To rule out "find" as the culprit, I also tried "ls", as in the following
example:

[peterve@powermate RPMS]$ pwd
/mnt/nfs/RedHat/RPMS
[peterve@powermate RPMS]$ ll
[here]

Again, at [here] the command waits for a long time, so it's not "find". In this
case too my machine quickly became unresponsive, so I pressed CTRL-C.

Anyway, I don't know which program is at fault here. 
The kernel? The NFS client? The NFS daemon? Me?

I'll be happy to provide more information if needed.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Copy the RH CDs to some directory and export that directory.
2. On the same Phoebe PC, NFS-mount that directory.
3. Run "ls" or "find" on the mount point; both freak out while listing its
   contents.
(Reproduced twice.)

Actual Results:  

Some kind of infinite loop was triggered while reading information from the
filesystem. During this loop the machine became unresponsive, and the memory
usage of the programs mentioned kept growing until the user pressed CTRL-C,
fearing his hard disk would start smoking.

Expected Results:  

Contents of directory should be displayed normally.

Additional info:

Problem seen with the following versions (stock install of Phoebe 8.0.92):

[root@powermate root]# cat /proc/version 
Linux version 2.4.20-2.2 (bhcompile.redhat.com) (gcc version
3.2.1 20021207 (Red Hat Linux 8.0 3.2.1-2)) #1 Fri Dec 20 13:02:59 EST 2002
[root@powermate root]# rpm -qa |grep -i nfs
redhat-config-nfs-1.0.2-2
nfs-utils-1.0.1-2.8
[root@powermate root]#

Comment 1 Peter van Egdom 2002-12-29 15:07:49 UTC
Around midday I tried an NFS install on my older CD-ROM-less Pentium/75, using
the P4 as the server.

I made little progress at first, until I saw the following line in
"/var/log/messages" on my P4:
powermate rpc.mountd: refused mount request from 10.0.0.1 for /var/exports/8.0.92 (/): no export entry

Hey, that's strange! This line appeared in the NFS server log *before* Anaconda
on the Pentium/75 asked me for the IP address and directory location of the
server containing the RPMs.
And I had even explicitly given Anaconda on the Pentium/75 the directory
location "/var/export/8.0.92".

This is extra odd, because the following page:

http://www.redhat.com/docs/manuals/linux/RHL-8.0-Manual/install-guide/s1-begininstall-net.html

has an example that talks about creating a directory (I quote) "...directory
you create such as /export/8.0/"

So it looks like Anaconda contains a hardcoded NFS path.

Anyway, I renamed my "/var/export/8.0.92" directory to "/var/exports/8.0.92" on
the server.
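The "no export entry" refusal above comes down to the requested path (/var/exports/8.0.92) not matching the exported one (/var/export/8.0.92/). On a live server, `exportfs -v` or `showmount -e` lists the active exports. For a quick offline check of the file itself, the sketch below uses `is_exported`, a hypothetical helper that only compares the first field of each /etc/exports line (ignoring trailing slashes); it does not model rpc.mountd's real matching rules (subdirectory mounts, wildcards, options).

```shell
#!/bin/sh
# is_exported PATH [FILE]: hypothetical helper. Exits 0 if PATH appears as the
# first field of a non-comment line in FILE (default /etc/exports).
# Trailing slashes are ignored on both sides. Simplified: this is NOT how
# rpc.mountd actually matches mount requests.
is_exported() {
  want=$(printf '%s\n' "$1" | sed 's:/*$::')
  file=${2:-/etc/exports}
  awk -v want="$want" '
    /^[ \t]*#/ { next }        # skip comment lines
    NF == 0    { next }        # skip blank lines
    {
      path = $1
      sub(/\/+$/, "", path)    # drop trailing slashes
      if (path == want) found = 1
    }
    END { if (found) exit 0; exit 1 }
  ' "$file"
}
```

With the original exports file, `is_exported /var/exports/8.0.92 || echo "not exported"` prints "not exported", while `/var/export/8.0.92` matches.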

After rebooting the P/75 (it had not timed out while searching for the
directory, which did not exist yet), I tried again. This time the server log
showed some positive messages:

Dec 29 15:06:52 powermate rpc.mountd: authenticated mount request from 10.0.0.1:613 for /var/exports/8.0.92 (/var/exports/8.0.92)
Dec 29 15:07:32 powermate last message repeated 2 times
Dec 29 15:08:52 powermate last message repeated 4 times
Dec 29 15:10:12 powermate last message repeated 4 times
Dec 29 15:11:13 powermate last message repeated 3 times
Dec 29 15:12:33 powermate last message repeated 4 times
Dec 29 15:13:53 powermate last message repeated 4 times
Dec 29 15:15:13 powermate last message repeated 4 times
Dec 29 15:16:33 powermate last message repeated 4 times
Dec 29 15:17:53 powermate last message repeated 4 times
Dec 29 15:19:13 powermate last message repeated 4 times
Dec 29 15:20:14 powermate last message repeated 3 times
Dec 29 15:21:34 powermate last message repeated 4 times
Dec 29 15:22:54 powermate last message repeated 4 times
Dec 29 15:24:14 powermate last message repeated 4 times
Dec 29 15:25:34 powermate last message repeated 4 times
Dec 29 15:26:54 powermate last message repeated 4 times
Dec 29 15:27:55 powermate last message repeated 3 times
Dec 29 15:29:15 powermate last message repeated 4 times
Dec 29 15:30:35 powermate last message repeated 4 times
Dec 29 15:31:55 powermate last message repeated 4 times
Dec 29 15:33:15 powermate last message repeated 4 times
Dec 29 15:33:35 powermate rpc.mountd: authenticated mount request from 10.0.0.1:613 for /var/exports/8.0.92 (/var/exports/8.0.92)

BUT it kept going on and on and on (lots of flashing lights on my hub). I
started an Ethereal session on the server and monitored the network traffic for
about 30 seconds, which produced a 37 MB tcpdump file containing *a lot* of the
following lines:

localhost.localdomain
/var/exports/8.0.92

This looks a lot like my original report in this bug: a hosed NFS server
configuration on a stock, 100% MD5-checked fresh install of Phoebe 8.0.92.

Hmmm..


Comment 2 Bill Nottingham 2003-01-01 05:44:30 UTC
The server volume that showed the weird results was formatted as ext3 by the
8.0.92 installer, yes? I'm guessing this may be an htree bug.

Comment 3 Peter van Egdom 2003-01-01 11:07:55 UTC
The following partitions were indeed created with the Anaconda 8.0.92 installer.
As noted in the release notes, they very probably have HTree enabled by default.

/dev/hda5 on / type ext3 (rw)
/dev/hda3 on /boot type ext3 (rw)

Is it safe for me to test with the HTree flag disabled, without corrupting
my data?


I quote from the release notes:

  "You can remove the HTree indexing feature from a filesystem by issuing
         the following command:

          tune2fs -O ^dir_index /dev/<filesystemdevice>

          You can then remove the indices from the directories by issuing the
          following command:

         e2fsck -fD /dev/<filesystemdevice>
     "


Comment 4 Peter van Egdom 2003-01-01 17:58:30 UTC
Some interesting news...

After backing up some of my data, I decided to remove the HTree indexing flag
from the partition that caused the NFS trouble on my system (see above).

I went to runlevel 1 (init 1) and entered the following commands.

tune2fs -O ^dir_index /dev/hda5
umount /dev/hda5
e2fsck -fD /dev/hda5

I answered "y" to a lot of e2fsck prompts and rebooted the machine.
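To confirm whether a filesystem still has HTree indexing before or after the tune2fs/e2fsck steps above, one can inspect the "Filesystem features:" line that `tune2fs -l` prints. The sketch below wraps that in a hypothetical helper, `has_dir_index`; it assumes the features line is formatted as "Filesystem features:" followed by space-separated feature names.

```shell
#!/bin/sh
# has_dir_index: hypothetical helper. Reads `tune2fs -l` output on stdin and
# exits 0 if the "Filesystem features:" line lists dir_index (HTree), else 1.
has_dir_index() {
  grep '^Filesystem features:' | grep -qw 'dir_index'
}
```

Usage, as root: `tune2fs -l /dev/hda5 | has_dir_index && echo "dir_index (HTree) is still enabled"`. After `tune2fs -O ^dir_index` the feature should no longer be listed.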

After booting into runlevel 5, I tried mounting "/var/exports/8.0.92/" over NFS
again:

/etc/rc.d/init.d/nfs start
mount -t nfs 10.0.0.4:/var/exports/8.0.92/ /mnt/nfs/

And hey! Doing a find in /mnt/nfs works as it should! Although I have not tried
a Phoebe 8.0.92 --> Phoebe 8.0.92 NFS installation yet, I'm feeling confident
that it will work this time, provided there are no Anaconda NFS client issues...

Well, it looks like there's an issue with nfsd and the htree patch in the kernel
provided with Phoebe 8.0.92.

Comment 5 Michael K. Johnson 2003-01-15 21:09:39 UTC
HTree has been disabled in the rawhide kernels as not quite ready for
prime time.

