Bug 627835

Summary: libguestfs protocol loses synchronization if you 'upload' before mounting disks
Product: Red Hat Enterprise Linux 6
Reporter: Jinxin Zheng <jzheng>
Component: libguestfs
Assignee: Richard W.M. Jones <rjones>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 6.1
CC: dallan, leiwang, llim, mbooth, mshao, rjones, syeghiay, virt-maint
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: libguestfs-1.7.17-24.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 576879
Environment:
Last Closed: 2011-12-06 10:42:02 UTC
Type: ---
Bug Depends On: 576879, 613593, 719879    
Bug Blocks: 584228, 591155, 591250    

Description Jinxin Zheng 2010-08-27 05:18:34 UTC
Cloned to RHEL 6 to ensure it gets fixed in 6.1.

+++ This bug was initially created as a clone of Bug #576879 +++

[Originally reported by Seth Vidal]

Description of problem:

guestfish <<EOF
> add f12-minimal.img
> run
> upload /var/tmp/guestfish-1.0.85-1.el5.7.x86_64.rpm /home/vmbuild/guestfish-1.0.85-1.el5.7.x86_64.rpm
> EOF
libguestfs: error: check_for_daemon_cancellation_or_eof: read 0x64 from daemon, expected 0xffffeeee
 
libguestfs: error: message length (536933877) > maximum possible size (4194304)
libguestfs: error: check_for_daemon_cancellation_or_eof: read 0x1 from daemon, expected 0xffffeeee

With another version of libguestfs:

$ cat test
#!/bin/sh -

guestfish -x <<EOF
add f12.img
run
upload a_big_file /
EOF
$ sh test
libguestfs: error: check_for_daemon_cancellation_or_eof: read 0x64 from daemon, expected 0xffffeeee

libguestfs: error: message length (536933877) > maximum possible size (4194304)
libguestfs: error: check_for_daemon_cancellation_or_eof: read 0x1 from daemon, expected 0xffffeeee

Version-Release number of selected component (if applicable):

libguestfs 1.0.87

How reproducible:

Always.

The problem here is we're uploading without first mounting
any disk.  Upload is failing and probably reporting an error,
but then the protocol loses synchronization and it's game over.
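
The numbers in the errors above fit a stream that has slipped out of frame. Here is a small sketch, assuming that each message starts with a 4-byte big-endian length word (an assumption about the wire format, it is not stated in this report). It only decodes the values printed in the error messages:

```python
import struct

MAX_MESSAGE_SIZE = 4194304    # "maximum possible size" from the error text
REPORTED_LENGTH = 536933877   # "message length" from the error text


def describe_length(length):
    """Return (hex form, whether it fits in one message) for a length word."""
    return hex(length), length <= MAX_MESSAGE_SIZE


# Assumption: the length is read as a 4-byte big-endian unsigned word.
raw = struct.pack(">I", REPORTED_LENGTH)
(decoded,) = struct.unpack(">I", raw)

print(describe_length(decoded))             # ('0x2000f5f5', False)
print(MAX_MESSAGE_SIZE == 4 * 1024 * 1024)  # True
```

The "length" 536933877 is 0x2000f5f5, the same word that shows up in the "read 0x2000f5f5 from daemon" errors in the later comments. So the library is most likely reading payload words where it expects a length or a flag.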

--- Additional comment from rjones on 2010-04-17 17:32:59 EDT ---

Fix posted:
http://git.annexia.org/?p=libguestfs.git;a=commitdiff;h=5922d7084d6b43f0a1a15b664c7082dfeaf584d0

--- Additional comment from rjones on 2010-04-18 05:29:19 EDT ---

Setting back to ASSIGNED, since we're still not quite there
with this patch.

><fs> sparse /tmp/test.img 10M
><fs> run
><fs> tar-in /tmp/foobar /blah
libguestfs: error: open: /tmp/foobar: No such file or directory
><fs> list-devices 
libguestfs: error: unexpected procedure number (69/7)
><fs> list-devices 
/dev/vda

The first error from list-devices shouldn't happen.

--- Additional comment from rjones on 2010-04-18 05:30:09 EDT ---

Another example:

><fs> tar-in /tmp/foobar /blah
libguestfs: error: open: /tmp/foobar: No such file or directory
><fs> ping-daemon
libguestfs: error: check_for_daemon_cancellation_or_eof: read 0x64 from daemon, expected 0xffffeeee
><fs> ping-daemon
libguestfs: error: check_for_daemon_cancellation_or_eof: read 0x2000f5f5 from daemon, expected 0xffffeeee

--- Additional comment from rjones on 2010-05-12 14:06:12 EDT ---

Here's a one line reproducer for the latest libguestfs:

$ ./fish/guestfish -N disk -- -tar-in /dev/nofile /blah : ping-daemon
libguestfs: error: open: /dev/nofile: No such file or directory
libguestfs: error: unexpected procedure number (69/92)
libguestfs: error: check_for_daemon_cancellation_or_eof: read 0x18 from daemon, expected 0xffffeeee

libguestfs: error: check_for_daemon_cancellation_or_eof: read 0x2000f5f5 from daemon, expected 0xffffeeee

--- Additional comment from rjones on 2010-05-12 14:22:26 EDT ---

OK I understand what's going on here.  Both ends simultaneously send
cancel messages:

library                 daemon
  |
  V
 sends RPC message -------+
  |                       |
  |                 receives RPC message
  |                       |
  V                       V
 opens file,        filesystem not mounted!
 error: not found!        |
  |                       |
  V                       V
 sends cancel       sends cancel
  +------->      <--------+
            !!!!
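
The diagram can be played through with a toy model. This is only a sketch: the Channel class, the string messages and the drain option are all hypothetical and are not libguestfs code. The drain option is not the posted patch either; it just shows that the stale cancel words are what break the next call.

```python
from collections import deque

CANCEL_FLAG = 0xFFFFEEEE  # the value the library says it expected


class Channel:
    """One direction of a toy stream, modelled as a queue of items."""

    def __init__(self):
        self.items = deque()

    def send(self, item):
        self.items.append(item)

    def recv(self):
        return self.items.popleft()


def failed_upload(to_daemon, to_library, drain):
    # Library: the RPC request goes out, then the local open() fails.
    to_daemon.send("REQUEST tar-in")
    to_daemon.send(CANCEL_FLAG)
    # Daemon: reads the request, finds no filesystem, and cancels too.
    assert to_daemon.recv() == "REQUEST tar-in"
    to_library.send(CANCEL_FLAG)
    if drain:
        # Hypothetical remedy: each side consumes the peer's cancel.
        assert to_daemon.recv() == CANCEL_FLAG
        assert to_library.recv() == CANCEL_FLAG


def ping(to_daemon, to_library):
    to_daemon.send("REQUEST ping")
    request = to_daemon.recv()  # daemon reads whatever is next in the stream
    reply = "REPLY ping" if request == "REQUEST ping" else "ERROR unexpected"
    to_library.send(reply)
    return to_library.recv()    # library reads whatever is next in the stream


def run(drain):
    to_daemon, to_library = Channel(), Channel()
    failed_upload(to_daemon, to_library, drain)
    return ping(to_daemon, to_library)


print(run(drain=False) == "REPLY ping")  # False: a stale cancel is read instead
print(run(drain=True) == "REPLY ping")   # True
```

Without draining, both directions still hold a cancel word when the next command starts, so each side misreads the other's next message.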

--- Additional comment from rjones on 2010-05-12 15:00:36 EDT ---

Patch posted upstream to fix the issue described in comment 5:
https://www.redhat.com/archives/libguestfs/2010-May/msg00061.html

Comment 1 Richard W.M. Jones 2010-11-24 09:09:09 UTC
Will probably be fixed by the rebase.  Needs QA to
verify that.

Comment 2 Richard W.M. Jones 2011-01-04 14:15:25 UTC
Going to claim that this is fixed by the
rebase.  QA please check this one carefully
since there are lots of corner cases in the
code, and we're not really sure that we have
fixed all of them properly.

Comment 4 Jinxin Zheng 2011-01-31 07:18:00 UTC
This is not completely fixed. In fact it looks worse:

guestfish <<EOF
add test.img
run
upload test.txt /test.txt
EOF

Running the above hangs.

So I am changing this back to ASSIGNED.

Comment 5 Richard W.M. Jones 2011-01-31 09:48:32 UTC
Fair enough, this isn't fixed.  In fact we suspected this
when the regression test started failing:
https://bugzilla.redhat.com/show_bug.cgi?id=576879#c7

I think I'm going to leave this one and not fix it for 6.1.
There's an easy workaround for users, and we can fix it
for 6.2 instead.

Comment 6 RHEL Program Management 2011-01-31 10:05:14 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.

Comment 8 Richard W.M. Jones 2011-07-12 10:48:31 UTC
Safest to fix this by rebasing (bug 719879).

Comment 9 Richard W.M. Jones 2011-08-10 17:24:39 UTC
Actually we have included all the relevant commits
so this should be fixed in 6.2.

Comment 10 Richard W.M. Jones 2011-08-10 17:26:43 UTC
This is what the correct output should be (verified
for me on RHEL 6.2 with libguestfs 1.7.17-24.el6):

$ guestfish -N disk -- -tar-in /dev/nofile /blah : ping-daemon : echo OK
libguestfs: error: open: /dev/nofile: No such file or directory
OK
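
For anyone scripting this check, here is a rough sketch of a filter that flags the desynchronization errors quoted in this report. The function name and the pattern list are hypothetical; the patterns are copied from the messages above and may not cover every variant.

```python
import re

# Patterns taken from the error messages quoted in this bug report.
DESYNC_PATTERNS = [
    re.compile(r"check_for_daemon_cancellation_or_eof: read 0x[0-9a-f]+ from daemon"),
    re.compile(r"unexpected procedure number \(\d+/\d+\)"),
    re.compile(r"message length \(\d+\) > maximum possible size \(\d+\)"),
]


def desync_errors(output):
    """Return the lines of output that look like protocol desynchronization."""
    return [
        line
        for line in output.splitlines()
        if any(pattern.search(line) for pattern in DESYNC_PATTERNS)
    ]


GOOD = """libguestfs: error: open: /dev/nofile: No such file or directory
OK
"""

BAD = """libguestfs: error: open: /dev/nofile: No such file or directory
libguestfs: error: unexpected procedure number (69/92)
libguestfs: error: check_for_daemon_cancellation_or_eof: read 0x18 from daemon, expected 0xffffeeee
"""

print(len(desync_errors(GOOD)))  # 0
print(len(desync_errors(BAD)))   # 2
```

The expected "open: ... No such file or directory" error is deliberately not matched, since it is the correct result of the failed tar-in.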

Comment 13 Jinxin Zheng 2011-08-26 09:10:17 UTC
Verified this using the reproducer in comment 10.

$ guestfish -N disk -- -tar-in /dev/nofile /blah : ping-daemon : echo OK

libguestfs-1.7.17-19:
The script hangs after printing
libguestfs: error: open: /dev/nofile: No such file or directory

libguestfs-1.7.17-26:
The script prints the following, then exits.
libguestfs: error: open: /dev/nofile: No such file or directory
OK

Comment 14 errata-xmlrpc 2011-12-06 10:42:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1512.html