Bug 1319347 - File upload when creating an image does not work correctly with python2-libguestfs (creates truncated file)
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: libguestfs
Version: 23
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Richard W.M. Jones
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-03-18 22:47 UTC by Adam Williamson
Modified: 2016-03-19 08:08 UTC (History)
6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-18 23:57:05 UTC
Type: Bug


Attachments
test.py (401 bytes, text/plain)
2016-03-18 23:02 UTC, Richard W.M. Jones

Description Adam Williamson 2016-03-18 22:47:19 UTC
Well this is pretty odd, but I did test it a couple of times and got the same result both times.

I have a little script, createhdds.py, which creates several pre-canned images for openQA testing purposes. It creates some of the images with python-guestfs. It can download a file from a remote server and 'upload' it to the disk image it's creating. The code looks like this:

# do file 'uploads'. in guestfs-speak that means transfer
# a file from the host to the image, we use it to mean
# download a file from an http server and transfer that
# to the image
for upload in self.uploads:
    # download the file to a temp file - borrowed from
    # fedora_openqa_schedule (which stole it from SO)
    with tempfile.NamedTemporaryFile(prefix="createhdds") as dltmpfile:
        resp = urlopen(upload['source'])
        while True:
            # This is the number of bytes to read between buffer
            # flushes. Value taken from the SO example.
            buff = resp.read(8192)
            if not buff:
                break
            dltmpfile.write(buff)
        # as with write, the dict must specify a target
        # partition and location ('target')
        partn = gfs.list_partitions()[int(upload['part'])-1]
        gfs.mount(partn, "/")
        gfs.upload(dltmpfile.name, upload['target'])

where 'gfs' is a guestfs.GuestFS instance (with python_return_dict=True) on which we've already created some partitions.

One of our images is supposed to include a copy of https://fedorapeople.org/groups/qa/updates/updates-unipony.img , which is 6509 bytes long. If I run the script through Python 2, then mount the created hard disk image and inspect the contents, the file is only 4096 bytes long. If I run the script through Python 3, the file is 6509 bytes (with the correct checksum).

I checked whether the problem is in the download code by changing it a bit to download to a regular file (not a NamedTemporaryFile) so the file would still be around and I could inspect it. With both Python 2 and 3, the downloaded file is 6509 bytes, so it seems the problem is definitely not with the download code, but in the guestfs upload.

The behaviour seems to be the same on current F23 and F24.

To reproduce, try this:

git clone https://bitbucket.org/rajcze/openqa_fedora_tools.git
dnf install python2-fedfind python3-fedfind python2-libguestfs python3-libguestfs python2-pexpect python3-pexpect
cd openqa_fedora_tools/tools
python2 ./createhdds.py updates_img
(inspect created disk_updates_img.img)
python3 ./createhdds.py updates_img
(inspect created disk_updates_img.img)

Comment 1 Richard W.M. Jones 2016-03-18 23:02:08 UTC
Created attachment 1137903 [details]
test.py

I can't reproduce this at the moment with a small test script (see
attachment).

Some things to check:

- Do you call g.shutdown() before closing the handle?  If you're
  writing to a disk image, then this will catch any errors if qemu
  is crashing during shutdown, whereas just g.close() will ignore qemu
  crashes.
  https://www.redhat.com/archives/libguestfs/2012-July/msg00014.html

- Can you capture the output when g.trace(1) is enabled?  Perhaps
  the wrong file is being uploaded?  It will be obvious from the trace.

Comment 2 Adam Williamson 2016-03-18 23:48:17 UTC
"- Can you capture the output when g.trace(1) is enabled?  Perhaps
  the wrong file is being uploaded?"

This seems very unlikely. There is no other file that could be uploaded.

"- Do you call g.shutdown() before closing the handle?  If you're
  writing to a disk image, then this will catch any errors if qemu
  is crashing during shutdown, whereas just g.close() will ignore qemu
  crashes."

only gfs.close(), it looks like. I'll try adding a gfs.shutdown() and see what happens.

Comment 3 Adam Williamson 2016-03-18 23:57:05 UTC
aha, I think I have it. It's a bug on my side, sorry. Code is missing a:

dltmpfile.close()

which is clearly a mistake on my part. (I have to do the upload inside the 'with tempfile.NamedTemporaryFile(prefix="createhdds") as dltmpfile:' block, because once we leave that block, the temp file is removed). It seems python3 does better with this case than python2. If I add a dltmpfile.close() right after the while True: block, it seems to solve the problem.
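For reference, the truncation mechanism can be reproduced with the standard library alone: writes to a NamedTemporaryFile go through Python's userspace write buffer, so another reader opening the file by its name (which is what gfs.upload does) sees only whatever has been flushed to disk so far. A minimal sketch (the 6509-byte size mirrors the updates image mentioned above; the prefix is illustrative):

```python
import os
import tempfile

# Write 6509 bytes (the size of the updates image discussed above)
# to a named temp file, then compare the on-disk size before and
# after an explicit flush.
data = b"x" * 6509
with tempfile.NamedTemporaryFile(prefix="createhdds-demo") as tmp:
    tmp.write(data)
    # The data may still sit in Python's write buffer at this point,
    # so the on-disk size can be smaller than len(data), even 0.
    size_before_flush = os.path.getsize(tmp.name)
    tmp.flush()  # push the userspace buffer out to the file
    size_after_flush = os.path.getsize(tmp.name)

print(size_before_flush, size_after_flush)
```

After the flush the on-disk size matches the data written, so an upload issued at that point would see the complete file.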

closing NOTABUG, meaning NOTYOURBUG. :)

Comment 4 Adam Williamson 2016-03-19 00:00:01 UTC
hmmm, actually, that's no good - closing a NamedTemporaryFile removes it. Using a .close() worked fine on my modified version which was using a regular file. I'll have to think of something else. Maybe this is a bug in python2 tempfile...hum.
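The removal behaviour noted above is standard: with the default delete=True, closing a NamedTemporaryFile deletes it from disk, which is why the close() fix cannot live inside the with block. A small sketch:

```python
import os
import tempfile

# With the default delete=True, close() removes the file from disk.
tmp = tempfile.NamedTemporaryFile(prefix="createhdds-demo")
path = tmp.name
exists_before_close = os.path.exists(path)
tmp.close()
exists_after_close = os.path.exists(path)

print(exists_before_close, exists_after_close)  # True False
```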

Comment 5 Adam Williamson 2016-03-19 00:07:19 UTC
for the record, I switched to tempfile.mkstemp and making the code deal with cleaning up the temp file after the upload. That seems to have done the trick.
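A sketch of that pattern, assuming the same download-then-upload flow as the original code (the placeholder bytes stand in for the downloaded data, and the gfs.upload call is shown as a comment since it needs a live guestfs handle):

```python
import os
import tempfile

# mkstemp returns an OS-level file descriptor plus a path, and does
# not delete the file automatically, so the file can be closed
# (flushing everything to disk) before being uploaded by name.
fd, path = tempfile.mkstemp(prefix="createhdds")
try:
    with os.fdopen(fd, "wb") as dlfile:
        dlfile.write(b"placeholder for the downloaded data")
    # The file is now closed and fully on disk, so it is safe to
    # pass its path to guestfs, e.g.:
    #   gfs.upload(path, upload['target'])
    uploaded_size = os.path.getsize(path)
finally:
    os.remove(path)  # caller is responsible for cleanup with mkstemp

print(uploaded_size)
```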

Comment 6 Richard W.M. Jones 2016-03-19 08:08:16 UTC
Glad you fixed it.  Do please add a g.shutdown() call though!  Very
occasionally we have had bugs in qemu which caused data corruption,
and those will only be caught and signalled back to your application
if you shut down the handle before closing it.

