Bug 227225
| Summary: | unzip fails on an NFS-mounted partition | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Julian C. Dunn <jdunn> |
| Component: | unzip | Assignee: | Ivana Varekova <varekova> |
| Status: | CLOSED ERRATA | | |
| Severity: | medium | | |
| Priority: | medium | | |
| Version: | 5.3 | CC: | acase, cebbert, davej, juanino, matt.dey, ohudlick, tao, varekova, wtogami |
| Target Milestone: | rc | Keywords: | Regression |
| Hardware: | All | | |
| OS: | Linux | | |
| Fixed In Version: | RHBA-2008-0609 | Doc Type: | Bug Fix |
| Last Closed: | 2008-07-30 13:00:06 UTC | | |

Description
Julian C. Dunn
2007-02-03 18:46:19 UTC
I tried to reproduce your problem but was unsuccessful. It seems the disk you mount is full. Could you please attach the output of df -h (and the name of your home directory) if the problem still occurs?

Yes, I still have this problem. df -h shows:

    jupiter:~$ df -h
    Filesystem     Size  Used Avail Use% Mounted on
    /dev/md2       231G   18G  214G   8% /
    /dev/md0        99M   16M   78M  17% /boot
    tmpfs          500M     0  500M   0% /dev/shm
    /dev/hde1      115G   32G   83G  28% /stash
    fs:/usr/home    20G  3.2G   15G  18% /net/fs/usr/home
    fs:/var/mail   3.9G  189M  3.4G   6% /net/fs/var/mail

Sorry, if it's not obvious - my home directory is /home/staff/jdunn, and /home is a symlink to /net/fs/usr/home, automounted.

So it seems there is enough space. Could you please attach the zip archive that causes this problem?

Created attachment 147355: sample zip file

It happens on all such zip files but I'm attaching the specific one mentioned here just in case.

I still can't reproduce your problem. Could you please attach the output of cat /etc/fstab | grep "/net/fs/usr/home" here?

The home directory is not mounted in /etc/fstab; it's mounted over autofs using /net/fs.

Do you have permission to write to the ~/doc/aastra dir? (Could you attach the output of ls -l for this dir?)

Yes I do:

    jupiter:/net/fs/usr/home/staff/jdunn/doc$ ls -ld aastra
    drwxr-xr-x 2 jdunn games 1536 Feb 3 13:45 aastra/

It seems to me to be a kernel problem, so I'm reassigning this bug to kernel.

OS: RHEL5-u0-Server
Arch: x86_64
Package: unzip-5.52-2.2.1
Filesystem: NFS mounted via autofs (using LDAP)
Kernel: 2.6.18-8.1.6.el5 #1 SMP

Problem: I have a similar problem, but I don't believe it's kernel related, because I built the info-zip software from source and it worked absolutely fine (no other changes to the system). I did not get this error message, however:

    write error (disk full?). Continue? (y/n/^C)

The first time I run unzip I get zero-sized files (same as described above). If I run it again immediately afterwards I get the expected result, with the complete files. It seems likely that there is a race condition between creating and writing to the file when NFS is involved.

Example:

    # Test to show that zip file is okay:
    $ unzip -t test.zip
    Archive: test.zip
        testing: phd_comps2.sxc   OK
        testing: phd_comps.csv    OK
        testing: phd_comps.sxc    OK
    No errors detected in compressed data of test.zip.

    # Show that only the zip file exists
    $ ls -la
    total 22
    drwx------  2 acase acase   512 Aug 14 16:30 .
    drwx------ 10 acase acase  4608 Aug 14 12:05 ..
    -rw-------  1 acase acase 15887 Aug 14 12:01 test.zip

    # Extract the zip file results in empty files
    $ unzip -o test.zip
    Archive: test.zip
      inflating: phd_comps2.sxc
      inflating: phd_comps.csv
      inflating: phd_comps.sxc
    $ ls -la
    total 22
    drwx------  2 acase acase   512 Aug 14 16:31 .
    drwx------ 10 acase acase  4608 Aug 14 12:05 ..
    -rw-r--r--  1 acase acase     0 Aug 29  2006 phd_comps2.sxc
    -rw-r--r--  1 acase acase     0 Sep 15  2006 phd_comps.csv
    -rw-r--r--  1 acase acase     0 Aug 30  2006 phd_comps.sxc
    -rw-------  1 acase acase 15887 Aug 14 12:01 test.zip

    # Extract the same zip file while overwriting existing files
    # results in expected results
    $ unzip -o test.zip
    Archive: test.zip
      inflating: phd_comps2.sxc
      inflating: phd_comps.csv
      inflating: phd_comps.sxc
    $ ls -la
    total 40
    drwx------  2 acase acase   512 Aug 14 16:31 .
    drwx------ 10 acase acase  4608 Aug 14 12:05 ..
    -rw-r--r--  1 acase acase  8964 Aug 29  2006 phd_comps2.sxc
    -rw-r--r--  1 acase acase   996 Sep 15  2006 phd_comps.csv
    -rw-r--r--  1 acase acase  7834 Aug 30  2006 phd_comps.sxc
    -rw-------  1 acase acase 15887 Aug 14 12:01 test.zip

This bug may be unrelated, but they seemed very similar (except I didn't get an error). If I `echo $?` after the unzip it returns "0" both times.

Are you having the same problem on more recent kernels?

Yes, it still has the same problem in the current RHEL5 kernel (2.6.18-8.1.8.el5). I did discover some more interesting details, though: this only seems to happen on x86_64 systems.

    Works:  Linux 2.6.18-8.1.8.el5 #1 SMP i686 i686 i386
    Broken: Linux 2.6.18-8.1.8.el5 #1 SMP x86_64 x86_64 x86_64

I compiled unzip from source and it works fine on x86_64; it just seems to be the RHEL5 binary.

They're both x86_64 compiles; here's the file output:

Red Hat's unzip (`file /usr/bin/unzip`):

    /usr/bin/unzip: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped

My compiled unzip (`file /usr/local/bin/unzip`):

    /usr/local/bin/unzip: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped

The latest kernel I can run on this system (which is now F7) is the following:

    Linux jupiter.acf.aquezada.com 2.6.21-1.3228.fc7 #1 SMP Tue Jun 12 14:56:37 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

I still have the problem on that kernel.

I too am having no luck reproducing this problem. I've tried on both F8 (2.6.23-0.136.rc3.git7.fc7) and F7 (2.6.22.3-62.fc7) kernels.

Questions:
1) Is SELinux on? If so, please temporarily turn it off with the 'setenforce 0' command.
2) Who/what is the server?
3) Could you post a bzip2-compressed network trace of this problem? Something similar to:

    tshark -w /tmp/bz227225.pcap host <server> ; bzip2 /tmp/bz227225.pcap

I now have this kernel and the problem still occurs:

    Linux jupiter.acf.aquezada.com 2.6.22.5-76.fc7 #1 SMP Thu Aug 30 13:08:59 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

To answer your questions:
1) SELinux is not on.
2) The server is FreeBSD 6.2-STABLE:

    FreeBSD aphrodite.acf.aquezada.com 6.2-STABLE FreeBSD 6.2-STABLE #5: Mon Sep 17 16:46:38 EDT 2007 jdunn.aquezada.com:/usr/obj/usr/src/sys/APHRODITE i386

3) I captured the NFS traffic and will attach it shortly.

Created attachment 199781: network traffic capture of NFS traffic

Although I have not found any smoking gun in the network trace, do these failures occur using a different server?

After further review and looking at packets 546, 602, 836, 1120 and 1145 in the packet trace, it seems the server is returning I/O errors when the mode bits on a file are being changed. So I suspect that unzipping using a different server will work, and that this is the reason I'm not seeing this problem. Are any errors being logged to the system log file?

Testing on a different server is beyond my capabilities, unfortunately, because this is a home setup and I only have one server set up. I don't see any errors on either the NFS server or the client side. I'm interested in what Andrew Case's setup is (see comment #14), as he reports seeing this problem on RHEL5, only under x86_64, and only with the Red Hat-shipped binary. I will also try compiling 'unzip' from source, like he did, and see what happens.

Created attachment 207211: Network trace for unzipping phd_comp files from a test zip file

Our server is Solaris 10 running on SPARC. The relevant network trace information is attached above in plain text/bzip2 format.

WRT the network trace in comment #22, it appears there are write failures:

    reply ok 148 write ERROR: Permission denied

which might be a red herring, but it is a different error on a different op... Let me try this with an x86_64 client against a Solaris 10 server. Hopefully I'll see the problem...

Hello, I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel. http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try to assist you in resolving it if I can. There hasn't been much activity on this bug for a while. Could you tell me if you are still having problems with the latest kernel? If the problem no longer exists then please close this bug, or I'll do so in a few days if no additional information is lodged.

Wow. I'm having the same problem. I've isolated it a bit and can reproduce it easily. So far for me:

* It occurs on RedHat 5.1 (x86_64)
* It does --not-- occur on redhat 3 (i386) or 4 (x86_64)
* It does --not-- occur on a Netapp NFS mount
* It does --not-- occur on an NFS mount to a Solaris 10 box
* It does occur to an Isilon storage system (www.isilon.com)

When I strace I see this:

    open("1", O_WRONLY|O_CREAT, 037777777777) = 4

If I copy the binary from RH4 it works fine, and I see this:

    open("1", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4

Notice the 3777...: that looks like an overflow of the octal permissions, as I think it is the largest possible number. I can easily reproduce this and will do so in a webex if someone can help fix it.

(In reply to comment #26)
> Wow. I'm having the same problem. I've isolated it a bit and can reproduce it
> easily. So far for me:
>
> * It occurs on RedHat 5.1 (x86_64)
> * It does --not-- occur on redhat 3 (i386) or 4 (x86_64)
> * It does --not-- occur on a Netapp NFS mount
> * It does --not-- occur on an NFS mount to a Solaris 10 box
> * It does occur to an Isilon storage system (www.isilon.com)

This is a Fedora bug - have you tested with Fedora 8? If you are having an issue with RHEL please file a new bug against that distribution.

I am still not able to see this problem using the following combos:

    Client (x86_64)            Server
    f8 (2.6.23.9-85.fc8)       rhel5.2(beta) 2.6.18-78.el5
    f8 (2.6.23.9-85.fc8)       Solaris 10
    f8 (2.6.23.9-85.fc8)       NetApp (oldish) filer
    f8 (2.6.23.14-115.fc8)     rhel5.2(beta) 2.6.18-78.el5
    f8 (2.6.23.14-115.fc8)     Solaris 10
    f8 (2.6.23.14-115.fc8)     NetApp (oldish) filer
    RHEL5.1 (2.6.18-53)        rhel5.2(beta) 2.6.18-78.el5
    RHEL5.1 (2.6.18-53)        Solaris 10
    RHEL5.1 (2.6.18-53)        NetApp (oldish) filer

    Client (x86)               Server
    rawhide (2.6.24-23.fc9)    rhel5.2(beta) 2.6.18-78.el5
    rawhide (2.6.24-23.fc9)    Solaris 10
    rawhide (2.6.24-23.fc9)    NetApp (oldish) filer

I'm re-assigning this to RHEL as I think it might be a mis-file. At no point has Fedora been mentioned.

This bug is similar to bug 156959. I have fixed the issue in 5.52-2.2.1 by editing line #171 in the patch file unzip-5.52-near-4GB.patch to be:

    + fd = open(G.filename, O_WRONLY | O_LARGEFILE | O_CREAT | O_EXCL, 0666);

In my testing I've been using unzip-5.52 and above, so that's probably why I could not reproduce this problem...

The problem is very much dependent upon the NFS server in question. Some handle the incorrect call to open without an issue. Others choke on it. In our testing we have not been able to determine what decides whether it works or not, only that some NFS servers continue on without incident and others create a file with permissions that don't allow any further writing. In any case the open is being called incorrectly.
If the O_CREAT flag is passed, a mode argument is supposed to be passed as well or the results are undefined, and this is what is happening.

The Fedora bug 156959 seems to be tracking this better than the Enterprise Linux bug. Shouldn't there be an advisory coming and a bugfix for this?

(In reply to comment #34)
> The fedora bug 156959 seems to be tracking this better than the enterprise linux
> bug. Shouldn't there be an advisory coming and a bugfix for this?

This bug got a little sidetracked, though both this and the Fedora bug have the same root cause. Pending approval from development, fixes should be forthcoming for updates of RHEL and Fedora.

This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.

After compiling the source rpm for my client, they report that it works as expected.

Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by mbelangia
issue 167186

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0609.html
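
To illustrate the root cause discussed in this report (open() called with O_CREAT but without a valid mode argument), here is a minimal C sketch of creating an output file with an explicit mode, in the spirit of the corrected call quoted in the thread. The helper names create_output_file, created_mode and write_and_measure are hypothetical and do not come from the unzip sources. O_LARGEFILE from the quoted fix is left out because it needs extra feature-test macros on glibc; that omission is an assumption of this sketch, not a recommendation.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the unzip sources): create a new file
 * with an explicit mode argument accompanying O_CREAT. O_EXCL makes the
 * call fail if the file already exists.
 * Returns the file descriptor, or -1 with errno set.
 */
int create_output_file(const char *path)
{
    /* The kernel filters 0666 through the process umask. */
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
}

/*
 * Hypothetical helper: return the permission bits of path, or -1 if
 * the file cannot be inspected.
 */
int created_mode(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (int)(st.st_mode & 07777);
}

/*
 * Hypothetical check: create path, write len bytes, and return the
 * file size reported by fstat(). Returns -1 on any failure, including
 * a failing close().
 */
long write_and_measure(const char *path, const char *data, size_t len)
{
    struct stat st;
    int fd = create_output_file(path);

    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len || fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;
    return (long)st.st_size;
}
```

With a umask of 022, a file created this way should end up with mode 0644 and stay writable through the descriptor that created it. The sketch also checks the return value of close(), since on NFS a write error may only be reported at that point; whether that would have surfaced the zero-sized files described in this report is not something the thread establishes.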