Bug 227225

Summary: unzip fails on an NFS-mounted partition
Product: Red Hat Enterprise Linux 5
Reporter: Julian C. Dunn <jdunn>
Component: unzip
Assignee: Ivana Varekova <varekova>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 5.3
CC: acase, cebbert, davej, juanino, matt.dey, ohudlick, tao, varekova, wtogami
Target Milestone: rc
Keywords: Regression
Hardware: All
OS: Linux
Fixed In Version: RHBA-2008-0609
Doc Type: Bug Fix
Last Closed: 2008-07-30 13:00:06 UTC
Attachments:
  sample zip file (flags: none)
  network traffic capture of NFS traffic (flags: none)
  Network Trace for unzipping phd_comp files from a test zip file (flags: none)

Description Julian C. Dunn 2007-02-03 18:46:19 UTC
Description of problem:

unzip fails to work when a .zip file is located on an NFS partition.

Version-Release number of selected component (if applicable):

unzip-5.52-2.2.1

How reproducible:

Always

Steps to Reproduce:
1. Put a ZIP file on an NFS-mounted partition
2. Attempt to unzip it.
  
Actual results:

[my home directory is mounted over NFS]

jupiter:~/doc/aastra$ unzip php_classes_1.4.1.zip 
Archive:  php_classes_1.4.1.zip
  inflating: AastraIPPhone.class.php  
AastraIPPhone.class.php:  write error (disk full?).  Continue? (y/n/^C) 

[Responding "y" causes a file of zero bytes to be extracted]

Expected results:

jupiter:~/doc/aastra$ cp php_classes_1.4.1.zip /tmp
jupiter:~/doc/aastra$ cd /tmp
jupiter:/tmp$ unzip php_classes_1.4.1.zip 
Archive:  php_classes_1.4.1.zip
  inflating: AastraIPPhone.class.php  
  inflating: AastraIPPhoneDirectory.class.php  
  inflating: AastraIPPhoneDirectoryEntry.class.php  
  inflating: AastraIPPhoneExecute.class.php  
  inflating: AastraIPPhoneExecuteEntry.class.php  
  inflating: AastraIPPhoneInputScreen.class.php  
  inflating: AastraIPPhoneSoftkeyEntry.class.php  
  inflating: AastraIPPhoneStatus.class.php  
  inflating: AastraIPPhoneStatusEntry.class.php  
  inflating: AastraIPPhoneTextMenu.class.php  
  inflating: AastraIPPhoneTextMenuEntry.class.php  
  inflating: AastraIPPhoneTextScreen.class.php  
  inflating: License.txt             
  inflating: sample.php              


Additional info:

Comment 1 Ivana Varekova 2007-02-05 13:12:40 UTC
I tried to reproduce your problem but was unsuccessful.
It seems the disk you mount may be full. If the problem still affects you,
could you please attach the output of df -h here (and the name of your home
directory)?

Comment 2 Julian C. Dunn 2007-02-05 13:23:08 UTC
Yes, I still have this problem: df -h shows

jupiter:~$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2              231G   18G  214G   8% /
/dev/md0               99M   16M   78M  17% /boot
tmpfs                 500M     0  500M   0% /dev/shm
/dev/hde1             115G   32G   83G  28% /stash
fs:/usr/home           20G  3.2G   15G  18% /net/fs/usr/home
fs:/var/mail          3.9G  189M  3.4G   6% /net/fs/var/mail


Comment 3 Julian C. Dunn 2007-02-05 13:24:11 UTC
Sorry, in case it's not obvious: my home directory is /home/staff/jdunn, and
/home is a symlink to /net/fs/usr/home, which is automounted.

Comment 4 Ivana Varekova 2007-02-05 13:27:35 UTC
So it seems there is enough space. Could you please attach the zip archive
that causes this problem?

Comment 5 Julian C. Dunn 2007-02-05 13:47:11 UTC
Created attachment 147355 [details]
sample zip file

Comment 6 Julian C. Dunn 2007-02-05 13:47:38 UTC
It happens on all such zip files but I'm attaching the specific one mentioned
here just in case.

Comment 7 Ivana Varekova 2007-02-05 13:53:17 UTC
I still can't reproduce your problem. Could you please attach the output of 
cat /etc/fstab | grep "/net/fs/usr/home"
here?

Comment 8 Julian C. Dunn 2007-02-05 14:11:00 UTC
The home directory is not mounted in /etc/fstab, it's mounted over autofs using
/net/fs.

Comment 9 Ivana Varekova 2007-02-07 16:05:48 UTC
Do you have permissions to write to the ~/doc/aastra dir? (Could you attach the
output of ls -l for this dir?)

Comment 10 Julian C. Dunn 2007-02-09 00:08:40 UTC
Yes I do:

jupiter:/net/fs/usr/home/staff/jdunn/doc$ ls -ld aastra
drwxr-xr-x 2 jdunn games 1536 Feb  3 13:45 aastra/


Comment 11 Ivana Varekova 2007-02-09 10:53:55 UTC
It seems to me to be a kernel problem, so I'm reassigning this bug to kernel.

Comment 12 Andrew Case 2007-08-14 21:19:14 UTC
OS: RHEL5-u0-Server
Arch: x86_64
Package: unzip-5.52-2.2.1
Filesystem: NFS mounted via autofs (using LDAP)
Kernel: 2.6.18-8.1.6.el5 #1 SMP

Problem:

I have a similar problem, but I don't believe it's kernel related, because I
built the info-zip software from source and it worked absolutely fine (no other
changes to the system).

I did not get this error message however: 
   write error (disk full?).  Continue? (y/n/^C)


The first time I run unzip I get zero-sized files (same as described above). If
I run it again immediately afterwards, the files are extracted in full.

This suggests there may be a race condition between creating and writing
to the file when NFS is involved.


Example:

# Test to show that zip file is okay:
$ unzip -t test.zip 
Archive:  test.zip
    testing: phd_comps2.sxc           OK
    testing: phd_comps.csv            OK
    testing: phd_comps.sxc            OK
No errors detected in compressed data of test.zip.

# Show that only the zip file exists
$ ls -la
total 22
drwx------  2 acase acase   512 Aug 14 16:30 .
drwx------ 10 acase acase  4608 Aug 14 12:05 ..
-rw-------  1 acase acase 15887 Aug 14 12:01 test.zip

# Extract the zip file results in empty files
$ unzip -o test.zip 
Archive:  test.zip
  inflating: phd_comps2.sxc          
  inflating: phd_comps.csv           
  inflating: phd_comps.sxc           
$ ls -la
total 22
drwx------  2 acase acase   512 Aug 14 16:31 .
drwx------ 10 acase acase  4608 Aug 14 12:05 ..
-rw-r--r--  1 acase acase     0 Aug 29  2006 phd_comps2.sxc
-rw-r--r--  1 acase acase     0 Sep 15  2006 phd_comps.csv
-rw-r--r--  1 acase acase     0 Aug 30  2006 phd_comps.sxc
-rw-------  1 acase acase 15887 Aug 14 12:01 test.zip

# Extract the same zip file again, overwriting the existing files;
# this produces the expected results
$ unzip -o test.zip 
Archive:  test.zip
  inflating: phd_comps2.sxc          
  inflating: phd_comps.csv           
  inflating: phd_comps.sxc           
$ ls -la
total 40
drwx------  2 acase acase   512 Aug 14 16:31 .
drwx------ 10 acase acase  4608 Aug 14 12:05 ..
-rw-r--r--  1 acase acase  8964 Aug 29  2006 phd_comps2.sxc
-rw-r--r--  1 acase acase   996 Sep 15  2006 phd_comps.csv
-rw-r--r--  1 acase acase  7834 Aug 30  2006 phd_comps.sxc
-rw-------  1 acase acase 15887 Aug 14 12:01 test.zip


This bug may be unrelated, but they seemed very similar (except I didn't get an
error).  If I `echo $?` after the unzip it returns "0" both times.

Comment 13 Steve Dickson 2007-08-29 23:49:36 UTC
Are you having the same problem on more recent kernels?

Comment 14 Andrew Case 2007-08-30 02:31:38 UTC
Yes, it still has the same problem with the current RHEL5 kernel (2.6.18-8.1.8.el5).

I did discover some more interesting details, though: this only seems to happen
on x86_64 systems.

Works: Linux 2.6.18-8.1.8.el5 #1 SMP i686 i686 i386
Broken: Linux 2.6.18-8.1.8.el5 #1 SMP x86_64 x86_64 x86_64

I compiled unzip from source and it works fine on x86_64; the problem seems to
be limited to the RHEL5 binary. Both are x86_64 builds; here's the `file` output:

Red Hat's unzip (`file /usr/bin/unzip`): 
/usr/bin/unzip: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for
GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux 2.6.9,
stripped
My compiled unzip (`file /usr/local/bin/unzip`):
/usr/local/bin/unzip: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV),
for GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux 2.6.9,
stripped


Comment 15 Julian C. Dunn 2007-08-30 02:39:45 UTC
The latest kernel I can run on this system (which is now F7) is the following:

Linux jupiter.acf.aquezada.com 2.6.21-1.3228.fc7 #1 SMP Tue Jun 12 14:56:37 EDT
2007 x86_64 x86_64 x86_64 GNU/Linux

I still have the problem on that kernel.

Comment 16 Steve Dickson 2007-08-30 11:31:51 UTC
I too am having no luck reproducing this problem. I've tried both
an F8 (2.6.23-0.136.rc3.git7.fc7) and an F7 (2.6.22.3-62.fc7) kernel.

Questions:
1) Is SELinux on? If so, please temporarily turn it off with the
   'setenforce 0' command.
2) Who/what is the server?
3) Could you post a bzip2-compressed network trace of this problem?
   Something similar to:
      tshark -w /tmp/bz227225.pcap host <server> ; bzip2 /tmp/bz227225.pcap


Comment 17 Julian C. Dunn 2007-09-19 17:12:03 UTC
I now have this kernel and the problem still occurs.

Linux jupiter.acf.aquezada.com 2.6.22.5-76.fc7 #1 SMP Thu Aug 30 13:08:59 EDT
2007 x86_64 x86_64 x86_64 GNU/Linux

To answer your questions:

1) SELinux is not on.
2) The server is FreeBSD 6.2-STABLE:

FreeBSD aphrodite.acf.aquezada.com 6.2-STABLE FreeBSD 6.2-STABLE #5: Mon Sep 17
16:46:38 EDT 2007    
jdunn.aquezada.com:/usr/obj/usr/src/sys/APHRODITE  i386

3) I captured the NFS traffic and will attach it shortly.

Comment 18 Julian C. Dunn 2007-09-19 17:12:45 UTC
Created attachment 199781 [details]
network traffic capture of NFS traffic

Comment 19 Steve Dickson 2007-09-25 16:12:30 UTC
Although I have not found a smoking gun in the network trace,
do these failures also occur with a different server? 

Comment 20 Steve Dickson 2007-09-25 17:06:07 UTC
After further review and looking at packets 546, 602, 836,
1120 and 1145 in the packet trace, it seems the server is
returning IO errors when the mode bits on a file are being
changed. 

So I suspect that unzipping using a different server
will work and this is the reason I'm not seeing this
problem.

Are any errors of any kind being logged to the system
log file?


Comment 21 Julian C. Dunn 2007-09-26 12:44:08 UTC
Testing on a different server is beyond my capabilities, unfortunately, because
this is a home setup and I only have one server set up.

I don't see any errors in either the NFS server or client side.

I'm interested in what Andrew Case's setup is (see comment #14), as he reports
seeing this problem on RHEL5, only under x86_64, and only with the
Red Hat-shipped binary.

I will also try compiling 'unzip' from source, like he did, and see what happens.

Comment 22 Andrew Case 2007-09-26 15:07:15 UTC
Created attachment 207211 [details]
Network Trace for unzipping phd_comp files from a test zip file

Comment 23 Andrew Case 2007-09-26 15:11:52 UTC
Our server is Solaris 10 running on Sparc.

The relevant network trace information is attached above in plain text/bzip2 format.

Comment 24 Steve Dickson 2007-09-26 23:17:10 UTC
WRT the network trace in Comment #22, it appears there are
write failures:
      reply ok 148 write ERROR: Permission denied

which might be a red herring, but it is a different error on
a different op... 

Let me try this with an x86_64 client against a Solaris 10 server.
Hopefully I'll see the problem... 
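The prompt in the original report, "write error (disk full?)", guesses that the disk is full, while the trace above shows the server answering the write with "Permission denied". The sketch below is an illustration only. It is not unzip's actual error-handling code, and `is_out_of_space` and `report_write_failure` are hypothetical names. It shows how a failed write could report the real errno instead of assuming a full disk:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 only when the errno value really means
 * the disk (or the user's quota) is exhausted, 0 otherwise. */
int is_out_of_space(int err)
{
#ifdef EDQUOT
    if (err == EDQUOT) /* quota exceeded; not defined on every platform */
        return 1;
#endif
    return err == ENOSPC;
}

/* Hypothetical helper: print the actual reason a write failed rather
 * than always suggesting "disk full?". */
void report_write_failure(const char *filename, int err)
{
    if (is_out_of_space(err))
        fprintf(stderr, "%s: write error: disk or quota full (%s)\n",
                filename, strerror(err));
    else
        fprintf(stderr, "%s: write error: %s\n", filename, strerror(err));
}
```

When called with EACCES, the second helper prints the C library's text for that error rather than a disk-full hint. The exact wording depends on the C library and the locale.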

Comment 25 Christopher Brown 2008-01-09 00:07:26 UTC
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try and assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel?

If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.

Comment 26 Jerry Uanino 2008-02-08 03:23:41 UTC
Wow. I'm having the same problem.  I've isolated it a bit and can reproduce it
easily.  So far for me:

* It occurs on Red Hat 5.1 (x86_64)
* It does --not-- occur on Red Hat 3 (i386) or 4 (x86_64)
* It does --not-- occur on a Netapp NFS mount
* It does --not-- occur on an NFS mount to a Solaris 10 box
* It does occur on an Isilon storage system (www.isilon.com)

When I strace I see this:
open("1", O_WRONLY|O_CREAT, 037777777777) = 4

If I copy the binary from RH4 it works fine... and I see this:
open("1", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4

Notice the 037777777777: that looks like an overflow of the octal permissions,
as I think that is the largest possible number.

I can easily reproduce this and will do so in a webex if someone can help fix it.
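The mode in the strace output above can be decoded with a little arithmetic. 037777777777 octal is 0xFFFFFFFF, which is a 32-bit value with every bit set. That looks more like a missing or garbage mode argument than a deliberate permission value. The sketch below assumes Linux-like behaviour: the kernel keeps only the 12 permission bits (07777) of the requested mode and then clears the bits set in the umask. `effective_create_mode` is a hypothetical helper, not a system call.

```c
/* The mode unzip passed to open() in the strace output from comment 26. */
#define OBSERVED_MODE 037777777777u /* == 0xFFFFFFFF */

/* Hypothetical helper. Assumption: as on Linux, only the setuid, setgid,
 * sticky and rwx bits (07777) of the requested mode are kept, and the
 * bits set in the process umask are then cleared. */
unsigned int effective_create_mode(unsigned int requested,
                                   unsigned int umask_bits)
{
    return (requested & 07777u) & ~umask_bits;
}
```

Under that assumption, a umask of 022 turns the observed mode into 07755: the setuid, setgid and sticky bits plus rwxr-xr-x. By contrast, the 0600 used by the RHEL 4 binary and the 0666 in the later patch both yield ordinary file modes.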



Comment 27 Christopher Brown 2008-02-08 12:50:47 UTC
(In reply to comment #26)
> Wow. I'm having the same problem.  I've isloated it a bit and can reproduce it
> easily.  So far for me:
> 
> * It occurs on RedHat 5.1 (x86_64)
> * It does --not-- occur on redhat 3 (i386)  or 4 (x86_64)
> * It does --not-- occur on a Netapp NFS mount
> * It does --not-- occur on an NFS mount to a Solaris 10 box
> * It does occur to an Isilon storage system (www.isilon.com)

This is a Fedora bug - have you tested with Fedora 8? If you are having an issue
with RHEL please file a new bug against that distribution. 

Comment 28 Steve Dickson 2008-02-08 14:30:49 UTC
I am still not able to see this problem using following
combos.. 

Client(x86_64)              Server
f8 (2.6.23.9-85.fc8)       rhel5.2(beta) 2.6.18-78.el5
f8 (2.6.23.9-85.fc8)       Solaris 10
f8 (2.6.23.9-85.fc8)       NetApp (oldish) filer

f8 (2.6.23.14-115.fc8)     rhel5.2(beta) 2.6.18-78.el5
f8 (2.6.23.14-115.fc8)     Solaris 10
f8 (2.6.23.14-115.fc8)     NetApp (oldish) filer

RHEL5.1 (2.6.18-53)        rhel5.2(beta) 2.6.18-78.el5
RHEL5.1 (2.6.18-53)        Solaris 10
RHEL5.1 (2.6.18-53)        NetApp (oldish) filer


Client(x86)              Server
rawhide (2.6.24-23.fc9)       rhel5.2(beta) 2.6.18-78.el5
rawhide (2.6.24-23.fc9)       Solaris 10
rawhide (2.6.24-23.fc9)       NetApp (oldish) filer


Comment 29 Christopher Brown 2008-02-08 16:22:39 UTC
I'm re-assigning this to RHEL as I think it might be a mis-file. At no point
has Fedora been mentioned.

Comment 30 Matt Dey 2008-02-08 20:28:56 UTC
This bug is similar to bug 156959

I have fixed the issue in 5.52-2.2.1 by editing line #171 in the patch file 
unzip-5.52-near-4GB.patch to be

+    fd = open(G.filename, O_WRONLY | O_LARGEFILE | O_CREAT | O_EXCL, 0666);


Comment 31 Steve Dickson 2008-02-14 14:19:53 UTC
In my testing I've been using unzip-5.52 and above, so that's
probably why I could not reproduce this problem... 

Comment 32 Matt Dey 2008-02-14 15:22:29 UTC
The problem is very much dependent on the NFS server in question. Some handle
the incorrect call to open() without an issue; others choke on it. In our
testing we have not been able to determine what decides whether it works, only
that some NFS servers continue without incident while others create a file with
permissions that don't allow any further writing.

In any case, open() is being called incorrectly. If the O_CREAT flag is passed,
a mode argument must be passed as well; otherwise the resulting mode is
undefined, and that is what is happening here.
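The sketch below shows the corrected call pattern on a POSIX system. Whenever O_CREAT is passed, open() receives an explicit mode. Here that mode is 0666, as in the patch line above, and the process umask filters it. The patch also passes O_LARGEFILE; that flag is left out here to keep the example portable. `write_new_file` is a hypothetical helper, not part of unzip.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a new file and write len bytes to it.
 * Returns 0 on success, -1 on failure with errno set. */
int write_new_file(const char *path, const char *data, size_t len)
{
    /* O_CREAT requires the third (mode) argument; 0666 is reduced by the
     * umask, e.g. to 0644 with umask 022. O_EXCL fails if path exists. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, data + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            int saved = errno;
            close(fd);
            errno = saved; /* report the write error, not close()'s */
            return -1;
        }
        done += (size_t)n;
    }
    /* close() can report deferred write errors, so check it too. */
    return close(fd);
}
```

Check the return value of close() as well as write(). On NFS, an error from writing data back to the server may only be reported when the file is closed.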

Comment 34 Jerry Uanino 2008-05-23 13:53:31 UTC
The fedora bug 156959 seems to be tracking this better than the enterprise linux
bug.  Shouldn't there be an advisory coming and a bugfix for this?

Comment 37 James M. Leddy 2008-05-27 20:34:24 UTC
(In reply to comment #34)
> The fedora bug 156959 seems to be tracking this better than the enterprise linux
> bug.  Shouldn't there be an advisory coming and a bugfix for this?

This bug got a little sidetracked, though both this and fedora bugs have the
same root cause.  Pending approval from development, fixes should be forthcoming
for updates of RHEL and Fedora.

Comment 38 RHEL Program Management 2008-05-27 20:52:03 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 39 Issue Tracker 2008-05-28 17:35:19 UTC
After compiling the source rpm for my client, they report that it works as
expected.



Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by mbelangia 
 issue 167186

Comment 45 errata-xmlrpc 2008-07-30 13:00:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0609.html