Bug 1437393

Summary: snapshot created base on the image in https server will hang during booting
Product: Red Hat Enterprise Linux 7 Reporter: Suqin Huang <shuang>
Component: qemu-kvm-rhevAssignee: Paolo Bonzini <pbonzini>
Status: CLOSED ERRATA QA Contact: Suqin Huang <shuang>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.4CC: chayang, coli, hreitz, juzhang, knoel, michen, mrezanin, mtessun, ngu, pbonzini, qizhu, rjones, shuang, virt-maint, xfu
Target Milestone: rcKeywords: Regression, TestOnly
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.9.0-6.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-02 04:35:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1447590    
Bug Blocks:    

Description Suqin Huang 2017-03-30 09:00:33 UTC
Description of problem:


Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.9.0-0.el7.patchwork201703291116.x86_64
libcurl-7.29.0-42.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a snapshot
# qemu-img create -f qcow2 /home/overlay -b 'json:{"file.driver":"https", "file.url":"https://10.66.10.53/rhel73-64-virtio.qcow2", "file.sslverify":"off", "file.readahead":"64k"}'

2. boot up snapshot

/usr/libexec/qemu-kvm \
    -S \
    -vga std  \
    -device pvpanic,ioport=0x505,id=id4FDckd  \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-log,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -device ich9-usb-ehci1,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1d.4,firstport=4,bus=pci.0 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file=/home/overlay \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=03 \
    -device virtio-net-pci,mac=9a:0d:0e:0f:10:11,id=idKK4Mdq,vectors=4,netdev=idgjGCvt,bus=pci.0,addr=05  \
    -netdev tap,id=idgjGCvt,vhost=on,script=/etc/qemu-ifup \
    -m 2048  \
    -smp 2,maxcpus=2,cores=2,threads=1,sockets=1  \
    -cpu Nehalem \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -monitor stdio 

3.

Actual results:

Hang at the last step

From serial:

[    7.057310] systemd[1]: Inserted module 'ip_tables'
[    7.090429] systemd[1]: Relabelled /dev and /run in 28.931ms.

Welcome to Red Hat Enterprise Linux Server 7.3 (Maipo)!



Expected results:


Additional info:

1. kvm_stat

 Event                                        Total Current
 kvm_exit                                      1125	 48
 kvm_entry                                     1125	 48
 kvm_msr                                        516	 21
 kvm_apic                                       516	 21
 kvm_page_fault                                 480	 21
 kvm_vcpu_wakeup                                129	  5
 kvm_pv_eoi                                     129	  5
 kvm_eoi                                        129	  5
 kvm_inj_virq                                   129	  5
 kvm_apic_accept_irq                            129	  5


2. top info

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                           
25606 root      20   0 2661540 381316  14432 S   0.0  9.8   0:19.80 qemu-kvm 

3. host
processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz

4. base image:
rhel73-64-virtio.qcow2

Comment 2 Hanna Czenczek 2017-04-03 14:14:15 UTC
Hi,

Could you provide me with a URL I can use to reproduce the issue? Because on my machine pretty much the same setup (Live CD ISO though, but used as a backing file of a qcow2 image connected to virtio-blk) works just fine.

Thanks!

Max

Comment 3 Suqin Huang 2017-04-10 02:55:59 UTC
Hi Max,

I test with image here http://10.66.10.53/test/

Regards,
Suqin

Comment 4 Hanna Czenczek 2017-05-10 16:27:06 UTC
Hi,

Sorry for the long delay. Here is a short status update: I was able to reliably reproduce the issue now, and this upstream series seems to fix it: https://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg02432.html


Max

Comment 6 Hanna Czenczek 2017-05-19 12:49:18 UTC
*** Bug 1451637 has been marked as a duplicate of this bug. ***

Comment 7 Miroslav Rezanina 2017-05-23 08:14:13 UTC
Fix included in qemu-kvm-rhev-2.9.0-6.el7

Comment 8 Suqin Huang 2017-06-01 02:00:17 UTC
Result: Boot up snapshot successfully

Package: qemu-kvm-rhev-2.9.0-6.el7.x86_64

Steps:
1. Create a snapshot
# qemu-img create -f qcow2 /home/overlay -b 'json:{"file.driver":"https", "file.url":"https://10.66.10.53/rhel73-64-virtio.qcow2", "file.sslverify":"off", "file.readahead":"64k"}'

2. boot up snapshot

Comment 9 Suqin Huang 2017-06-01 02:00:56 UTC
According to comment8

update status to verified

Comment 10 Suqin Huang 2017-06-16 02:36:28 UTC
Not sure what happen, when I re-test with  

qemu-kvm-rhev-2.9.0-6.el7.x86_64 and qemu-kvm-rhev-2.9.0-10.el7.x86_64

The test failed 

# qemu-img info /home/rhel74.qcow2 
image: /home/rhel74.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 196K
cluster_size: 65536
backing file: json:{"file.driver":"https", "file.url":"https://10.66.10.53/test/rhel74-64-virtio.qcow2", "file.sslverify":"off", "file.readahead":"64k"}
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false


(qemu) qemu-kvm: curl: Operation timed out after 5029 milliseconds with 1409024 out of 2162688 bytes received


[**    ] A start job is running for /sysroot (32s / 1min 56s)[   36.070425] SGI XFS with ACLs, security attributes, no debug enabled
[   36.301993] XFS (dm-0): Mounting V5 Filesystem
[ ***  ] A start job is running for /sysroot (39s / 1min 56s)[   43.430840] blk_update_request: I/O error, dev vda, sector 24131016
[   43.431820] blk_update_request: I/O error, dev vda, sector 24132024
[   43.431820] blk_update_request: I/O error, dev vda, sector 24133048
[   43.431820] blk_update_request: I/O error, dev vda, sector 24134072
[   43.431820] blk_update_request: I/O error, dev vda, sector 24135096
[   43.541685] XFS (dm-0): metadata I/O error: block 0x11025c8 ("xlog_bread_noalign") error 5 numblks 8192
[***   ] A start job is running for /sysroot (40s / 1min 56s)[   43.579249] XFS (dm-0): failed to find log head
[   43.579254] XFS (dm-0): log mount/recovery failed: error -5
[   43.579414] XFS (dm-0): log mount failed
[FAILED] Failed to mount /sysroot.
See 'systemctl status sysroot.mount' for details.
[DEPEND] Dependency failed for Initrd Root File System.
[DEPEND] Dependency failed for Reload Configuration from the Real Root.
[  OK  ] Reached target Initrd File Systems.
[  OK  ] Stopped target Basic System.
[  OK  ] Stopped dracut pre-mount hook.
         Stopping dracut pre-mount hook...
[  OK  ] Stopped dracut initqueue hook.
         Stopping dracut initqueue hook...
[  OK  ] Closed Open-iSCSI iscsiuio Socket.
[  OK  ] Stopped target System Initialization.
[  OK  ] Started Emergency Shell.
         Starting Emergency Shell...
[  OK  ] Reached target Emergency Mode.
[  OK  ] Stopped dracut pre-trigger hook.
         Stopping dracut pre-trigger hook...
[  OK  ] Stopped dracut pre-udev hook.
         Stopping dracut pre-udev hook...
[  OK  ] Stopped dracut cmdline hook.

Comment 12 Suqin Huang 2017-06-16 02:57:30 UTC
I can boot up base image successfully

Comment 14 Suqin Huang 2017-06-20 01:48:12 UTC
1. # ping 10.66.10.53
PING 10.66.10.53 (10.66.10.53) 56(84) bytes of data.
64 bytes from 10.66.10.53: icmp_seq=1 ttl=60 time=0.917 ms
64 bytes from 10.66.10.53: icmp_seq=2 ttl=60 time=1.36 ms
64 bytes from 10.66.10.53: icmp_seq=3 ttl=60 time=1.25 ms
64 bytes from 10.66.10.53: icmp_seq=4 ttl=60 time=1.00 ms
64 bytes from 10.66.10.53: icmp_seq=5 ttl=60 time=0.905 ms
64 bytes from 10.66.10.53: icmp_seq=6 ttl=60 time=0.953 ms
64 bytes from 10.66.10.53: icmp_seq=7 ttl=60 time=1.06 ms
64 bytes from 10.66.10.53: icmp_seq=8 ttl=60 time=0.952 ms
64 bytes from 10.66.10.53: icmp_seq=9 ttl=60 time=1.10 ms
64 bytes from 10.66.10.53: icmp_seq=10 ttl=60 time=1.00 ms
64 bytes from 10.66.10.53: icmp_seq=11 ttl=60 time=0.930 ms
64 bytes from 10.66.10.53: icmp_seq=12 ttl=60 time=0.936 ms
q64 bytes from 10.66.10.53: icmp_seq=13 ttl=60 time=0.983 ms


2. it doesn't work as well when add "file.timeout":10

Comment 15 Paolo Bonzini 2017-06-20 12:37:28 UTC
Thanks.  Please write me by email the credentials for the system under test so that I can debug it!  (I erroneously thought it was due to long latency and tried debugging from home).

Comment 17 Suqin Huang 2017-06-21 09:16:13 UTC
it's tree related, not kernel related

keep the qemu-kvm version to qemu-kvm-rhev-2.9.0-12.el7.x86_64

20170504.0 tree + 3.10.0-663.el7.x86_64 --> works
20170504.0 tree + 3.10.0-682.el7.x86_64 --> works

20170616.3 tree + 3.10.0-663.el7.x86_64 --> don't work
20170616.3 tree + 3.10.0-682.el7.x86_64 --> don't work

Comment 19 Paolo Bonzini 2017-06-21 17:46:08 UTC
0504 has 5124 packages.

0525 has 5142 packages.

Of these, 4416 haven't changed (are present in both).

The machine that shuang set up has 977 packages installed, of which 52 come from either Beaker, EPEL (for Autotest) or layered products (Ceph, Gluster, qemu-kvm-rhev).

There are 141 packages that have been updated between 0504 and 0525.  After removing dependent packages (e.g. elfutils-libs for elfutils) and looking at the changelogs, this is a shortlist:

0504                                           0525                                           date of change
-------------------------------------------------------------------------------------------------------------
bind-libs-9.9.4-49.el7.x86_64                  bind-libs-9.9.4-50.el7.x86_64                  22-May-2017
bind-libs-lite-9.9.4-49.el7.x86_64             bind-libs-lite-9.9.4-50.el7.x86_64             22-May-2017
ca-certificates-2017.2.11-73.el7.noarch        ca-certificates-2017.2.14-71.el7.noarch        05-May-2017
dbus-1.6.12-17.el7.x86_64                      dbus-1.6.12-18.el7.x86_64                      11-May-2017
device-mapper-1.02.140-1.el7.x86_64            device-mapper-1.02.140-2.el7.x86_64            24-May-2017
device-mapper-event-1.02.140-1.el7.x86_64      device-mapper-event-1.02.140-2.el7.x86_64      24-May-2017
dhclient-4.2.5-53.el7.x86_64                   dhclient-4.2.5-58.el7.x86_64                   16-May-2017
dhcp-common-4.2.5-53.el7.x86_64                dhcp-common-4.2.5-58.el7.x86_64                16-May-2017
dhcp-libs-4.2.5-53.el7.x86_64                  dhcp-libs-4.2.5-58.el7.x86_64                  16-May-2017
efivar-libs-31-1.el7.x86_64                    efivar-libs-31-4.el7.x86_64                    10-May-2017
firewalld-0.4.4.4-3.el7.noarch                 firewalld-0.4.4.4-4.el7.noarch                 18-May-2017
iproute-3.10.0-82.el7.x86_64                   iproute-3.10.0-83.el7.x86_64                   18-May-2017
iputils-20160308-9.el7.x86_64                  iputils-20160308-10.el7.x86_64                 22-May-2017
irqbalance-1.0.7-9.el7.x86_64                  irqbalance-1.0.7-10.el7.x86_64                 16-May-2017
kernel-3.10.0-663.el7.x86_64                   kernel-3.10.0-671.el7.x86_64                   24-May-2017
libsss_idmap-1.15.2-24.el7.x86_64              libsss_idmap-1.15.2-33.el7.x86_64              24-May-2017
libsss_nss_idmap-1.15.2-24.el7.x86_64          libsss_nss_idmap-1.15.2-33.el7.x86_64          24-May-2017
libtirpc-0.2.4-0.9.el7.x86_64                  libtirpc-0.2.4-0.10.el7.x86_64                 17-May-2017
libvirt-3.2.0-4.el7.x86_64                     libvirt-3.2.0-6.el7.x86_64                     24-May-2017
lvm2-2.02.171-1.el7.x86_64                     lvm2-2.02.171-2.el7.x86_64                     24-May-2017
NetworkManager-1.8.0-0.4.rc3.el7.x86_64        NetworkManager-1.8.0-3.el7.x86_64              19-May-2017
nfs-utils-1.3.0-0.41.el7.x86_64                nfs-utils-1.3.0-0.43.el7.x86_64                24-May-2017
nss-3.28.4-6.el7.x86_64                        nss-3.28.4-8.el7.x86_64                        05-May-2017
nss-softokn-3.28.3-4.el7.x86_64                nss-softokn-3.28.3-5.el7.x86_64                17-May-2017
nss-sysinit-3.28.4-6.el7.x86_64                nss-sysinit-3.28.4-8.el7.x86_64                05-May-2017
nss-tools-3.28.4-6.el7.x86_64                  nss-tools-3.28.4-8.el7.x86_64                  05-May-2017
nss-util-3.28.4-2.el7.x86_64                   nss-util-3.28.4-3.el7.x86_64                   17-May-2017
openssh-7.4p1-5.el7.x86_64                     openssh-7.4p1-11.el7.x86_64                    22-May-2017
rpcbind-0.2.0-39.el7.x86_64                    rpcbind-0.2.0-41.el7.x86_64                    17-May-2017
rsyslog-8.24.0-10.el7.x86_64                   rsyslog-8.24.0-12.el7.x86_64                   10-May-2017
tuned-2.8.0-2.el7.noarch                       tuned-2.8.0-3.el7.noarch                       05-May-2017
unbound-libs-1.4.20-31.el7.x86_64              unbound-libs-1.4.20-33.el7.x86_64              25-May-2017
util-linux-2.23.2-39.el7.x86_64                util-linux-2.23.2-41.el7.x86_64                24-May-2017

I don't really see any culprits though.  If I have time tonight I'll repeat the test with a newer kernel.

Comment 20 Paolo Bonzini 2017-06-22 10:05:31 UTC
It's not a bug.  The 0525 machine has a proxy configured for whatever reason, so everything is slooooow.

After typing this in 0504:

  http_proxy=http://squid.apac.redhat.com:3128
  https_proxy=https://squid.apac.redhat.com:3128
  export http_proxy https_proxy

it starts failing like the others.  So I think the bug can stay VERIFIED!

Comment 21 CongLi 2017-06-23 05:19:41 UTC
(In reply to Paolo Bonzini from comment #20)
> It's not a bug.  The 0525 machine has a proxy configured for whatever
> reason, so everything is slooooow.
> 
> After typing this in 0504:
> 
>   http_proxy=http://squid.apac.redhat.com:3128
>   https_proxy=https://squid.apac.redhat.com:3128
>   export http_proxy https_proxy
> 
> it starts failing like the others.  So I think the bug can stay VERIFIED!

Confirmed, there is no more problems in QE side.

Thanks Paolo.

Comment 23 errata-xmlrpc 2017-08-02 04:35:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392