Created attachment 470602: Disk information of rx2660

Description of problem:
Situation 1: Writing a large file (> 1 GB, with dd or cp) to an ext2 filesystem puts the writing process into uninterruptible sleep once the written file reaches about 369 MB. The process in "D" state cannot be killed, and the system then cannot reboot normally.
Situation 2: I have also tested ext2 with RHEL 4.9 Beta on other ia64 servers, such as the rx7640 and BL860c. There, the write throughput is abnormally high, and once the ext2 filesystem is unmounted and remounted, all the files in it are lost and become invalid.

Version-Release number of selected component (if applicable):
System: HP Integrity Server-rx2660
IO cards and FW versions:
  P600 in slot (1)      2.04
  A7173A in slot (2)    fw_1.03.35.70-efi_1.05.04.00
  AB429A in slot (3)    fw_5.03.02-efi_2.2
  Core-IO LSI-1068
System FW (current firmware revisions):
  MP FW     : F.02.25
  BMC FW    : 05.26
  EFI FW    : ROM A 07.14, ROM B 07.14
  System FW : ROM A 04.15, ROM B 04.11, Boot ROM A
  PDH FW    : 50.07
  UCIO FW   : 03.0b
  PRS FW    : 00.08
  UpSeqRev: 02, DownSeqRev: 01

How reproducible:

Steps to Reproduce:
1. Install RHEL 4.8 GA (Default or Everything package set) on a disk attached to the Core-IO LSI-1068.
2. Update RHEL 4.8 GA to RHEL 4.9 Beta.
3. Log in and run mkfs.ext2 on a disk attached to one of the IO cards (the issue occurs with the P600, A7173A and AB429A; every disk tested reproduces it).
4. Mount the disk on a mount point and dd or cp a large file (> 1 GB) into the mount point directory, or create a file with touch and several directories with mkdir.
5. Check the dd process status with "ps ax", note the elapsed time, calculate the write throughput, and check the state of the files in the ext2 filesystem.
6. Unmount the ext2 filesystem, then mount it again.
7. Check the state of the files in the ext2 filesystem.

Actual results:
Situation 1: Writing a large file (> 1 GB, with dd or cp) to ext2 puts the process into uninterruptible sleep once the written file reaches about 369 MB. The "D" state process cannot be killed, and the system cannot reboot normally.
Situation 2: The write throughput is abnormally high, and once the ext2 filesystem is unmounted and remounted, all the files in it are lost and become invalid.

Expected results:
Situation 1: dd or cp of a large file to ext2 completes normally.
Situation 2: After ext2 is remounted, all existing files are intact.

Additional info:
Situation 1: Testing on the rx2660-12 server

[root@minxm ~]# uname -a
Linux minxm.rx2660-12 2.6.9-92.EL #1 SMP Mon Nov 29 14:42:44 EST 2010 ia64 ia64 ia64 GNU/Linux
[root@minxm ~]# fdisk /dev/cciss/c0d0
[root@minxm ~]# mkfs.ext2 /dev/cciss/c0d0p1
[root@minxm ~]# mount -t ext2 /dev/cciss/c0d0p1 /root/p600/
[root@minxm ~]# time dd if=/dev/zero of=/root/p600/dd bs=1M count=5k

Then check the written file size and the process status:

[root@minxm ~]# cd /root/p600/
[root@minxm p600]# ll
total 375744
-rw-r--r-- 1 root root 384335872 Dec 21 22:41 dd
drwx------ 2 root root 16384 Dec 21 22:40 lost+found
[root@minxm p600]# ll -h
total 367M
-rw-r--r-- 1 root root 367M Dec 21 22:41 dd
drwx------ 2 root root 16K Dec 21 22:40 lost+found
[root@minxm p600]# ps ax |grep dd
24134 ttyS0 D+ 0:00 dd if=/dev/zero of=/root/p600/dd bs=1M count=5k
24148 pts/0 S+ 0:00 grep dd
[root@minxm p600]# kill -9 24134
[root@minxm p600]# ps ax |grep dd
24134 ttyS0 D+ 0:00 dd if=/dev/zero of=/root/p600/dd bs=1M count=5k
24154 pts/0 S+ 0:00 grep dd
[root@minxm p600]# reboot

Broadcast message from root (pts/0) (Tue Dec 21 06:27:26 2010):

The system is going down for reboot NOW!
INIT: Switching to runlevel: 6
INIT: Sending processes the TERM signal
(The system hangs during reboot.)
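The hung dd above shows state "D+" in ps and survives kill -9: a task in uninterruptible sleep does not act on any signal, SIGKILL included, until it leaves that sleep. When gathering data for a hang like this, it can help to list every task in that state rather than grepping for one command. The following is a minimal sketch that is not part of the original report; list_dstate is a hypothetical name, and it assumes a Linux /proc where /proc/<pid>/status contains "Name:" and "State:" lines, as on 2.6 kernels.

```shell
#!/bin/sh
# list_dstate (hypothetical helper): print "PID STATE NAME" for every task
# whose /proc/<pid>/status reports "State: D" (uninterruptible sleep).
list_dstate() {
    for status in /proc/[0-9]*/status; do
        # A task can exit between the glob and the read; awk's error
        # message is discarded and nothing is printed for that task.
        awk -v f="$status" '
            /^Name:/  { sub(/^Name:[ \t]*/, ""); name = $0 }
            /^State:/ { state = $2 }
            END {
                if (state == "D") {
                    split(f, parts, "/")   # "/proc/<pid>/status" -> parts[3] is the PID
                    print parts[3], state, name
                }
            }' "$status" 2>/dev/null
    done
    return 0
}
```

Calling list_dstate while dd is stuck would be expected to show the dd task (PID 24134 in the transcript). It only says which tasks are blocked, not where; that is what the SysRq-w dump attached to this bug records.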
Situation 2: Testing on the rx7640-3 server

[root@maxcv ~]# uname -a
Linux maxcv.rx7640-3-p0.test 2.6.9-92.EL #1 SMP Mon Nov 29 14:42:44 EST 2010 ia64 ia64 ia64 GNU/Linux
[root@maxcv ~]# fdisk -lu /dev/sdb

Disk /dev/sdb: 72.8 GB, 72839168000 bytes
255 heads, 63 sectors/track, 8855 cylinders, total 142264000 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1              63    58605119    29302528+  83  Linux
[root@maxcv ~]# mkfs.ext2 /dev/sdb1
mke2fs 1.35 (28-Feb-2004)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
3662848 inodes, 7325632 blocks
366281 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
224 block groups
32768 blocks per group, 32768 fragments per group
16352 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
[root@maxcv ~]# mount -t ext2 /dev/sdb1 /root/ext2/
[root@maxcv ~]# ll -h dddd
-rw-r--r-- 1 root root 12G Dec 24 06:05 dddd
[root@maxcv ~]# time cp /root/dddd /root/ext2/

real    0m17.049s
user    0m0.362s
sys     0m16.372s

(Write throughput: about 722 MB/s.)

[root@maxcv ~]# cd /root/ext2/
[root@maxcv ext2]# mkdir 123
[root@maxcv ext2]# ll -h
total 13G
drwxr-xr-x 2 root root 4.0K Dec 24 09:29 123
-rw-r--r-- 1 root root 12G Dec 24 09:29 dddd
drwx------ 2 root root 16K Dec 24 09:28 lost+found
[root@maxcv ext2]# cd
[root@maxcv ~]# umount /root/ext2
[root@maxcv ~]# mount -t ext2 /dev/sdb1 /root/ext2/
[root@maxcv ~]# cd /root/ext2/
[root@maxcv ext2]# ll
total 16
?--------- ? ? ? ? ? 123
?--------- ? ? ? ? ? dddd
drwx------ 2 root root 16384 Dec 24 09:28 lost+found
[root@maxcv ext2]# mkdir 456
[root@maxcv ext2]# ll
total 20
?--------- ? ? ? ? ? 123
drwxr-xr-x 2 root root 4096 Dec 24 09:31 456
?--------- ? ? ? ? ? dddd
drwx------ 2 root root 16384 Dec 24 09:28 lost+found
[root@maxcv ext2]# rm -rf dddd
rm: cannot remove `dddd': Stale NFS file handle
[root@maxcv ext2]# cd
[root@maxcv ~]# umount /root/ext2
[root@maxcv ~]# mount -t ext2 /dev/sdb1 /root/ext2/
[root@maxcv ~]# cd /root/ext2
[root@maxcv ext2]# ll
total 16
?--------- ? ? ? ? ? 123
?--------- ? ? ? ? ? 456
?--------- ? ? ? ? ? dddd
drwx------ 2 root root 16384 Dec 24 09:28 lost+found

The file metadata appears to be corrupted.

For disk information from the rx2660, see the attachment "diskinfo-rx2660". For SysRq output from the rx2660, see the attachments "sysrq-w-rx2660" and "sysrq-t-rx2660".
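After each remount, ll shows "?---------" with "?" in every field for 123, 456 and dddd. Those lines appear to be what ls prints when it can still read a name from the directory but cannot stat the inode behind it; rm reports the error text as "Stale NFS file handle" (ESTALE). To check step 7 mechanically instead of by eye, something like the following could be used. It is a minimal sketch, not part of the original report: check_entries is a hypothetical name, it only examines non-hidden entries directly inside the given directory, and it relies on stat(1) without -L not following symlinks, which matches what "ls -l" does.

```shell
#!/bin/sh
# check_entries DIR (hypothetical helper): stat every non-hidden entry in
# DIR without following symlinks and report the ones that cannot be read.
# Returns 1 if any entry failed, 0 otherwise.
check_entries() {
    dir=$1
    bad=0
    for entry in "$dir"/*; do
        # An unmatched glob expands to the literal pattern; skip it. The
        # existence test is applied only to that case, because a damaged
        # entry would fail it too.
        if [ "$entry" = "$dir/*" ] && [ ! -e "$entry" ] && [ ! -L "$entry" ]; then
            continue
        fi
        # Keep stat's error text (stderr) and discard its normal output.
        if ! err=$(stat "$entry" 2>&1 >/dev/null); then
            printf 'UNREADABLE: %s: %s\n' "$entry" "$err"
            bad=$((bad + 1))
        fi
    done
    [ "$bad" -eq 0 ]
}
```

Run as "check_entries /root/ext2" after the remount; going by the listing above, 123, 456 and dddd would be expected to be reported and lost+found not.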
Created attachment 470603: sysrq-t output from the RHEL 4.9 Beta system on the rx2660
Created attachment 470604: sysrq-w output from the RHEL 4.9 Beta system on the rx2660
Is this a regression?
Hi Eric,

I can't confirm whether this is a regression: I started testing with the RHEL 4.9 Beta and hit this issue there. The issue does not occur on RHEL 4.8 GA.

Thanks,
-Li
If it occurs in 4.9 but not in 4.8, then it is a regression since 4.8. Could you please test the two kernels here: http://people.redhat.com/esandeen/.bz665521/ and tell me if the problem is present in one but not the other? Thanks, -Eric
Hi Eric,

The issue does not occur with kernel 2.6.9-89.44.EL, but it is present in kernel 2.6.9-89.45.EL.

Thanks,
-Li
Thank you. This is likely a dup of bug #662839; when we have that built I'll alert you for another test... Thanks, -Eric
Could you please tell us which build this fix will be included in? IO stress testing is currently blocked by this issue.
For reference, these are the changes in 89.45:

* Fri Oct 15 2010 Vivek Goyal <vgoyal> [2.6.9-89.45]
-scsi: scsi_do_req submitted commands (tape) never complete when device goes (Rob Evers) [636289]
-scsi: log msg when getting unit attention (Mike Christie) [585430]
-jbd: fix panic in jbd when running bashmemory (Josef Bacik) [488611]
-qla2xxx: work around hypertransport sync flood error on sun x4200 with qla2xxx (Chad Dupuis) [621621]
-aio: implement request batching for better merging and throughput (Jeff Moyer) [508377]
-fs: a bunch of patches to fix various nfsd/iget() races (Alexander Viro) [189918]
-net: bonding: add debug module option (Jiri Pirko) [247116]
-fix fd leaks if pipe() is called with an invalid address (Amerigo Wang) [509627]
(In reply to comment #8)
> Could you please tell which build this fix will be included in? Currently IO
> stress testing are blocked by this issue.

We're still discussing the fix, and will let you know.

Thanks,
-Eric
Can you please retest with the latest snapshots, I think this may be resolved now.
Hi Eric,

I can't find any update on RHN; the kernel version is still 2.6.9-92.EL.

Thanks,
-Dawei

(In reply to comment #11)
> Can you please retest with the latest snapshots, I think this may be resolved
> now.
I'm sorry; you can test a later snapshot by getting a kernel from the maintainer's URL at http://people.redhat.com/vgoyal/rhel4/RPMS.kernel/ -Eric
Hi Eric,

I ran some tests with kernel-2.6.9-100.EL.ia64.rpm, downloaded from the URL you supplied. ext2 works fine.

Thanks,
Dawei
Thank you for testing. I'll dup this bug to the other. *** This bug has been marked as a duplicate of bug 662839 ***