Bug 576879 - libguestfs protocol loses synchronization if you 'upload' before mounting disks
Status: CLOSED UPSTREAM
Product: Virtualization Tools
Classification: Community
Component: libguestfs
Version: unspecified
Hardware: All
OS: Linux
Priority: high
Severity: high
Assigned To: Richard W.M. Jones
Duplicates: 624035
Depends On:
Blocks: 584228 591155 591250 627835
Reported: 2010-03-25 10:49 EDT by Richard W.M. Jones
Modified: 2011-07-14 15:04 EDT
Doc Type: Bug Fix
Clones: 627835
Last Closed: 2011-07-14 15:04:46 EDT
Description Richard W.M. Jones 2010-03-25 10:49:07 EDT
[Originally reported by Seth Vidal]

Description of problem:

guestfish <<EOF
> add f12-minimal.img
> run
> upload /var/tmp/guestfish-1.0.85-1.el5.7.x86_64.rpm /home/vmbuild/guestfish-1.0.85-1.el5.7.x86_64.rpm
> EOF
libguestfs: error: check_for_daemon_cancellation_or_eof: read 0x64 from daemon, expected 0xffffeeee
 
libguestfs: error: message length (536933877) > maximum possible size (4194304)
libguestfs: error: check_for_daemon_cancellation_or_eof: read 0x1 from daemon, expected 0xffffeeee

With another version of libguestfs:

$ cat test
#!/bin/sh -

guestfish -x <<EOF
add f12.img
run
upload a_big_file /
EOF
$ sh test
libguestfs: error: check_for_daemon_cancellation_or_eof: read 0x64 from daemon, expected 0xffffeeee

libguestfs: error: message length (536933877) > maximum possible size (4194304)
libguestfs: error: check_for_daemon_cancellation_or_eof: read 0x1 from daemon, expected 0xffffeeee

Version-Release number of selected component (if applicable):

libguestfs 1.0.87

How reproducible:

Always.

The problem here is we're uploading without first mounting
any disk.  Upload is failing and probably reporting an error,
but then the protocol loses synchronization and it's game over.
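
A caller can avoid the trigger described here by checking that something
is mounted before uploading.  A minimal sketch, assuming a launched handle
g that behaves like the libguestfs Python bindings (mounts() and upload());
the helper name upload_checked is hypothetical and is not part of libguestfs:

```python
def upload_checked(g, local_path, remote_path):
    """Upload only if the handle reports at least one mounted filesystem.

    g is assumed to expose mounts() and upload() like the libguestfs
    Python bindings; this helper itself is hypothetical.
    """
    if not g.mounts():
        # Fail in the caller instead of sending an upload that the
        # daemon can only reject.
        raise RuntimeError("no filesystem mounted; mount one before uploading")
    g.upload(local_path, remote_path)
```

This only sidesteps the failing upload.  It does not address the loss of
synchronization itself, which can be triggered by other failing uploads.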
Comment 2 Richard W.M. Jones 2010-04-18 05:29:19 EDT
Setting back to ASSIGNED, since we're still not quite there
with this patch.

><fs> sparse /tmp/test.img 10M
><fs> run
><fs> tar-in /tmp/foobar /blah
libguestfs: error: open: /tmp/foobar: No such file or directory
><fs> list-devices 
libguestfs: error: unexpected procedure number (69/7)
><fs> list-devices 
/dev/vda

The first error from list-devices shouldn't happen.
Comment 3 Richard W.M. Jones 2010-04-18 05:30:09 EDT
Another example:

><fs> tar-in /tmp/foobar /blah
libguestfs: error: open: /tmp/foobar: No such file or directory
><fs> ping-daemon
libguestfs: error: check_for_daemon_cancellation_or_eof: read 0x64 from daemon, expected 0xffffeeee
><fs> ping-daemon
libguestfs: error: check_for_daemon_cancellation_or_eof: read 0x2000f5f5 from daemon, expected 0xffffeeee
Comment 4 Richard W.M. Jones 2010-05-12 14:06:12 EDT
Here's a one line reproducer for the latest libguestfs:

$ ./fish/guestfish -N disk -- -tar-in /dev/nofile /blah : ping-daemon
libguestfs: error: open: /dev/nofile: No such file or directory
libguestfs: error: unexpected procedure number (69/92)
libguestfs: error: check_for_daemon_cancellation_or_eof: read 0x18 from daemon, expected 0xffffeeee

libguestfs: error: check_for_daemon_cancellation_or_eof: read 0x2000f5f5 from daemon, expected 0xffffeeee
Comment 5 Richard W.M. Jones 2010-05-12 14:22:26 EDT
OK I understand what's going on here.  Both ends simultaneously send
cancel messages:

library                 daemon
  |
  V
 sends RPC message -------+
  |                       |
  |                 receives RPC message
  |                       |
  V                       V
 opens file,        filesystem not mounted!
 error: not found!        |
  |                       |
  V                       V
 sends cancel       sends cancel
  +------->      <--------+
            !!!!
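
The effect of one unexpected word on a length-prefixed stream can be
sketched with a toy model.  This is not libguestfs code: the frame layout
(length word, then 0x2000f5f5, then a procedure number), the Stream class
and all function names are assumptions made for illustration.  Only the
constants and the wording of the messages are taken from the errors quoted
in this report.

```python
import struct

# Values taken from the error messages quoted in this report.
CANCEL_FLAG = 0xFFFFEEEE   # word the library expects when polling for a cancel
MESSAGE_MAX = 4194304      # "maximum possible size"
MAGIC = 0x2000F5F5         # word seen in comment 3; assumed to start the header
TAR_IN = 69                # first number in "(69/92)"; assumed to be tar-in


def frame(proc_nr, body=b""):
    """Toy message: big-endian length word, then MAGIC, proc_nr and a body."""
    payload = struct.pack(">II", MAGIC, proc_nr) + body
    return struct.pack(">I", len(payload)) + payload


class Stream:
    """In-memory stand-in for the daemon-to-library direction of the socket."""

    def __init__(self):
        self.buf = b""

    def write(self, data):
        self.buf += data

    def read_word(self):
        (word,) = struct.unpack(">I", self.buf[:4])
        self.buf = self.buf[4:]
        return word


def poll_for_cancel(stream):
    """Consume one word and complain unless it is the cancel flag."""
    word = stream.read_word()
    if word != CANCEL_FLAG:
        return "read 0x%x from daemon, expected 0x%x" % (word, CANCEL_FLAG)
    return None


def read_length(stream):
    """Read what should be a length word and sanity-check it."""
    length = stream.read_word()
    if length > MESSAGE_MAX:
        return "message length (%d) > maximum possible size (%d)" % (
            length, MESSAGE_MAX)
    return None


def simulate():
    stream = Stream()
    # The daemon has already queued its error reply.  92 bytes of padding
    # make the length word 0x64 as in the log; the real body is not modelled.
    stream.write(frame(TAR_IN, b"\0" * 92))
    # The library, unaware of that reply, polls for a cancel flag and eats
    # the length word instead.  Every later read is 4 bytes out of step.
    return [poll_for_cancel(stream), read_length(stream)]


for line in simulate():
    print(line)
```

In this model the second error appears because the word after the length
is read as a length: 536933877 is 0x2000f5f5 in decimal, the same value
that shows up in the later "read 0x2000f5f5 from daemon" messages.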
Comment 6 Richard W.M. Jones 2010-05-12 15:00:36 EDT
Patch posted upstream to fix the issue described in comment 5:
https://www.redhat.com/archives/libguestfs/2010-May/msg00061.html
Comment 7 Richard W.M. Jones 2010-10-27 08:23:10 EDT
Setting back to ASSIGNED, since the regression test for
this bug has started to hang.

See also:
http://git.annexia.org/?p=libguestfs.git;a=commitdiff;h=fb998000e60b32219c2bf839044cff59f499dff1
Comment 8 Richard W.M. Jones 2011-03-18 09:34:05 EDT
[Copy of a note sent to the mailing list]

I just pushed a commit which reenables two tests for [this bug]:

http://git.annexia.org/?p=libguestfs.git;a=commitdiff;h=dc8e4b057ecd3984d7c27c8ece54048b6a06d662

This is a really long-standing bug which we thought we'd fixed, but
which then turned up again.  It is currently *not* failing on my machine.  I
added some clearer debug messages to the code paths involved.

It seems to be highly timing-related and I doubt that it is fully
squashed, so it is quite probable that these tests will fail for
somebody somewhere.  If you can get it to fail with LIBGUESTFS_DEBUG=1
then please post the full log into the bug report.
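
One way to capture such a log is to run the reproducer from comment 4 with
LIBGUESTFS_DEBUG=1 set and save everything it prints.  A minimal sketch:
the helper name run_with_debug is hypothetical, and guestfish is assumed
to be on PATH (comment 4 ran it from the build tree as ./fish/guestfish).

```python
import os
import subprocess

# Reproducer from comment 4, assuming guestfish is on PATH.
REPRODUCER = ["guestfish", "-N", "disk", "--",
              "-tar-in", "/dev/nofile", "/blah", ":", "ping-daemon"]


def run_with_debug(cmd, log_path):
    """Run cmd with LIBGUESTFS_DEBUG=1 and save stdout and stderr to log_path."""
    env = dict(os.environ, LIBGUESTFS_DEBUG="1")
    with open(log_path, "wb") as log:
        # Merge stderr into stdout so the debug messages stay in order.
        proc = subprocess.run(cmd, env=env, stdout=log,
                              stderr=subprocess.STDOUT)
    return proc.returncode
```

For example, run_with_debug(REPRODUCER, "guestfish-debug.log") would leave
the full output in guestfish-debug.log (a file name chosen here only for
illustration), ready to attach to the bug.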
Comment 9 Richard W.M. Jones 2011-03-18 12:41:55 EDT
I think I've nailed this one finally.  Posted a patch
here, still testing it:

https://www.redhat.com/archives/libguestfs/2011-March/msg00090.html
Comment 11 Richard W.M. Jones 2011-03-18 16:03:57 EDT
Fixes included in 1.9.12.
Comment 12 Richard W.M. Jones 2011-04-12 13:35:30 EDT
*** Bug 624035 has been marked as a duplicate of this bug. ***
Comment 13 Richard W.M. Jones 2011-07-14 15:04:46 EDT
Haven't seen this for quite a while.  FIXED!
