From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows 95)

Description of problem:
I have two new systems, both running RH 7.1. I use NFS to mount file systems between the two systems. I am trying to copy an entire filesystem from one machine to the other using cpio. The copy always locks up in the same place. I have tried 2.4.2-2smp as well as 2.4.3-12smp, and it behaves the same on both kernels. I have also tried the copy using cp -a -v, and it still locks up, although in a different place.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Boot the system and log on. The mounted file systems are:
   nasstor2:staff on /nasstor2/staff
   nasstor2:bbx on /nasstor2/bbx
   nasstor2:imaging on /nasstor2/imaging
2. cd /nasstor2/staff/prodstaff
3. find . -print | cpio -pdvmu /staff/prodstaff

It copies files for several minutes, then locks up.

Actual Results:
Sep 13 14:18:55 nasstor3 kernel: nfs: server nasstor2 not responding, still trying
Sep 13 14:18:55 nasstor3 last message repeated 2 times
Sep 13 14:20:44 nasstor3 kernel: nfs: task 29003 can't get a request slot
Sep 13 14:20:44 nasstor3 kernel: nfs: task 29004 can't get a request slot
Sep 13 14:20:44 nasstor3 kernel: nfs: task 29005 can't get a request slot
Sep 13 14:20:44 nasstor3 kernel: nfs: task 29006 can't get a request slot
Sep 13 14:20:44 nasstor3 kernel: nfs: task 29007 can't get a request slot
Sep 13 14:20:44 nasstor3 kernel: nfs: task 29008 can't get a request slot
Sep 13 14:20:44 nasstor3 kernel: nfs: task 29009 can't get a request slot
Sep 13 14:20:44 nasstor3 kernel: nfs: task 29010 can't get a request slo

Expected Results:
Files would copy until finished.

Additional info:
Once this happens I can get the system back by running this script:

/etc/rc2.d/K75netfs stop
sleep 5
/etc/rc2.d/K75netfs start

When I do this I get:

Sep 13 14:28:37 nasstor3 kernel: nfs_statfs: statfs error = 5
Sep 13 14:28:37 nasstor3 umount: umount2: Device or resource busy
Sep 13 14:28:37 nasstor3 umount: umount: /nasstor2/staff: device is busy
Sep 13 14:28:38 nasstor3 netfs: Unmounting NFS filesystems: failed
Sep 13 14:28:46 nasstor3 netfs: Unmounting NFS filesystems (retry): succeeded
Sep 13 14:28:53 nasstor3 netfs: Mounting NFS filesystems: succeeded
Sep 13 14:28:53 nasstor3 netfs: Mounting other filesystems: succeeded

Hardware:
00:00.0 Host bridge: ServerWorks CNB20LE (rev 05)
00:00.1 Host bridge: ServerWorks CNB20LE (rev 05)
00:02.0 VGA compatible controller: ATI Technologies Inc 3D Rage IIC 215IIC [Mach 64 GT IIC] (rev 7a)
00:03.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
00:0f.0 ISA bridge: ServerWorks OSB4 (rev 4f)
00:0f.1 IDE interface: ServerWorks: Unknown device 0211
01:04.0 SCSI storage controller: Adaptec 7899P
01:04.1 SCSI storage controller: Adaptec 7899P
01:06.0 PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 03)
01:07.0 PCI bridge: Intel Corporation 80960RP [i960 RP Microprocessor/Bridge] (rev 05)
01:07.1 RAID bus controller: Mylex Corporation DAC960PX (rev 05)
01:08.0 PCI bridge: Intel Corporation 80960RP [i960 RP Microprocessor/Bridge] (rev 05)
01:08.1 RAID bus controller: Mylex Corporation DAC960PX (rev 05)
02:04.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 05)
02:05.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 05)

OS:
Linux nasstor3 2.4.3-12smp #1 SMP Fri Jun 8 14:38:50 EDT 2001 i686 unknown

This is reproducible and stops in exactly the same spot every time.

Bob Lawson
bobl

During the "problem time", can you ping the other host (i.e. is networking down)?
The network is not down as I am accessing both systems using telnet sessions from a PC and they continue to work. But I tested it anyway and yes I can ping between the two systems.
Ok, I have now tried it with version 2.4.9-6smp and I get the same thing happening. It runs to the same point, then stops and starts producing the message:

nfs: server nasstor2 not responding, still trying

followed later by:

nfs: task 31401 can't get a request slot
nfs: task 31402 can't get a request slot
.....

At this point no data moves between the two systems. After I interrupt the process I get:

nfs: server nasstor2 OK

But I still do not get a prompt back. A df hangs once it hits the mounted filesystems.

Bob

I have tried the e100 driver instead of the eepro100 driver, but the problem persists.
You say exactly the same spot. That's an important clue. Do you mean it stops on the same file each time?
The transfer ALWAYS stops at exactly the same spot! Something in the back of my head has been saying... content... content. Ok... this seems very unlikely, but I had to test to see if it was possible.

I took the two files where the problem happens:

DSC00036.JPG
DSC00037.JPG

Cpio prints the first file name but not the second one. I'm not sure whether cpio prints the name before or after it copies, so I took both files.

1) I move the files to a different location on the same filesystem, but one I am not trying to copy. Then I run my test again and it flies past the location it stopped at before.

2) I move the two files to a location which occurs earlier in the copy process. In both cases I have moved the files, so the inodes and data allocation should remain untouched and the files are only renamed. I run the test again and it stops at the same file, DSC00036.JPG! But DSC00036.JPG is now in a totally different directory and very close to the beginning of the cpio copy.

3) I once again move the files to an uncopied location. Now I copy the files back into a location to be copied. This should allocate new inodes and data blocks. I once again run the test, and it stops again at exactly the same file!

My thoughts:
It appears to be the contents of the file that are causing the problem. However, I can ftp the file off the system without any problem. If it was a problem with the adapter/driver, I would expect to see the problem when I used ftp. Ok... yes, I understand the protocols are different and the packaging of the information is different. But I have easily moved many >1GB files with no problems over these adapters. I do have problems with NFS, though. I also have problems with SCO<->Linux NFS.

I still have both the moved and the copied files, so if you need information from or about the files I can easily get it.

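One way to narrow down the content theory above, as a rough sketch only: split a suspect file into fixed-size pieces so each piece can be copied across the NFS mount separately. The helper name make_chunks, the "chunk." prefix and the 1024-byte default are assumptions chosen for illustration; nothing like this was run as part of this report.

```shell
# Hypothetical helper: split FILE into SIZE-byte pieces under DIR.
# Pieces are named chunk.aa, chunk.ab, ... by split(1).
make_chunks() {
    file=$1
    dir=$2
    size=${3:-1024}    # assumed default piece size in bytes
    mkdir -p "$dir" && split -b "$size" "$file" "$dir/chunk."
}
```

Copying the resulting pieces to the NFS mount one at a time (for example with cp) might show whether a single piece stalls on its own. A clean result would not rule out a pattern that only shows up when the data is packed into packets differently.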
Bit pattern dependent problems really have to be at the hardware level. They do happen, obscurely quite often, when you get a slightly bad card combined with bit patterns that are "worst case" for the ethernet clocking algorithm and encoding. Things worth trying include changing the hub port the card is plugged into (in case it's a hub problem), and changing the card. Bear in mind it's not clear which end (or something in the middle) may have the problem. Also check the error counter behaviour on the cards.

Alan
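A minimal sketch of one way to look at the error counters mentioned above, assuming the counters of interest are the per-interface ones Linux exposes in /proc/net/dev (two header lines, then one line per interface). The function name nic_errors is hypothetical, and the driver may keep further statistics that this does not show.

```shell
# Hypothetical helper: read text in /proc/net/dev format on stdin and
# print the receive/transmit error counters for each interface.
nic_errors() {
    awk 'NR > 2 {
        sub(/:/, " ")    # the interface name can be fused to the first counter
        printf "%s rx_errs=%s rx_frame=%s tx_errs=%s colls=%s carrier=%s\n",
               $1, $4, $7, $12, $15, $16
    }'
}
```

Running `nic_errors < /proc/net/dev` on both machines before and after reproducing the hang, and comparing the two outputs, would show whether any counter climbs while the copy is stuck.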