Bug 1086163
| Summary: | Guest os works abnormally after running 35 days; XFS is stuck at xlog_grant_head_wait() | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Xu Han <xuhan> |
| Component: | kernel | Assignee: | Radim Krčmář <rkrcmar> |
| Status: | CLOSED DUPLICATE | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.0 | CC: | acathrow, hhuang, juzhang, kwolf, lcapitulino, lersek, michen, virt-maint, vyasevic, xfu, xuhan |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-05-20 17:51:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Created attachment 884822 [details]
dmesg log
Guest dmesg log which was generated from 2nd mem dump.
host res info:
# free -m
total used free shared buffers cached
Mem: 3664 3561 102 1 0 512
-/+ buffers/cache: 3049 614
Swap: 3855 1900 1955
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 23
Model name: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
Stepping: 10
CPU MHz: 2826.179
BogoMIPS: 5652.35
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 6144K
NUMA node0 CPU(s): 0-3
Created attachment 884868 [details]
screen shot
Command:
mysql-test: sudo -u mysql ./mysql-test-run
mysql-bench: run-all-tests --server=mysql --cmp=mysql --user=root --password=$PW
I think the network outage is a symptom, not the cause. The fact that the dbus service can't receive a reply points to a wider issue with the VM. Attempts to switch consoles on the system (sending Alt-Ctl-3) failed. Based on the network config on the command line, I don't think this is network related. The VM is using an e1000 NIC without any acceleration. It might be plugged into an OVS switch on the host side, but at this point that isn't the issue. The issue is that the VM appears halted and doesn't respond to any key presses.

-vlad

Vlad, while I agree with you, I think that the starting point to debug this issue is to understand why the network is not working. That's the main symptom we have. Can you help me with that?
As far as I could look into this issue, it seems that systemd-logind calls bus_method_call_with_reply(), which will in turn call dbus_connection_send_with_reply_and_block() which is the function that's failing because a reply is not received within a timeout. This could happen because of several reasons. A non-working network is one of them.
Xu Han, I'm under the impression that you were able to keep a shell open in the guest before the problem happened. Is this true? If it is, can you please do the following:
1. Send the latest dmesg output (if you have it, and if attachment #884822 [details] is not the latest one already)
2. Run "systemd-loginctl list-sessions". If it doesn't work, run "strace systemd-loginctl list-sessions" (hopefully you have strace)
(In reply to Luiz Capitulino from comment #8)
> Xu Han, I'm under the impression that you were able to keep a shell open in the guest before the problem happened. Is this true? If it is, can you please do the following:

Yep, there is a login shell on tty1 running mysql tests. Unfortunately, it has been out of control since this issue happened. (I could access the guest as described in comment 4.) The dmesg output I pasted in comment 1 was fetched from the guest memory dump, and the memory dump was generated after this issue happened (comment 6, '2nd-guest-mem.dump'). Sorry, I have no idea how to gather more debug info :(

(In reply to Luiz Capitulino from comment #8)
> Vlad, while I agree with you, I think that the starting point to debug this issue is to understand why the network is not working. That's the main symptom we have. Can you help me with that?

Sure.

> As far as I could look into this issue, it seems that systemd-logind calls bus_method_call_with_reply(), which will in turn call dbus_connection_send_with_reply_and_block(), which is the function that's failing because a reply is not received within a timeout. This could happen for several reasons. A non-working network is one of them.

The network we are talking about here is either loopback (less likely) or unix domain sockets (more likely). This is internal to the system and not part of the qemu network stack itself.

I tried using the Magic SysRq key on the VM in question and it didn't respond. I was trying to get a stack dump of the current processes. The CPUs are still showing as halted; that might explain why the OS is not responding.

-vlad

Here's the latest info on this BZ:
1. We have no access to the guest whatsoever, as all vCPUs are halted. We do have access to QEMU through QMP, though, and we also have a guest memory dump.
2. Here's a stack dump of the systemd-logind process. Nothing unusual:
crash> bt 622
PID: 622 TASK: ffff8800babac1a0 CPU: 1 COMMAND: "systemd-logind"
#0 [ffff8800b9541d68] schedule at ffffffff815c83b0
#1 [ffff8800b9541de0] schedule_hrtimeout_range_clock at ffffffff815c779d
#2 [ffff8800b9541e78] schedule_hrtimeout_range at ffffffff815c77d3
#3 [ffff8800b9541e88] ep_poll at ffffffff811e86d3
#4 [ffff8800b9541f38] sys_epoll_wait at ffffffff811e9675
#5 [ffff8800b9541f80] system_call_fastpath at ffffffff815d2c99
3. Also, some guest network info. The ens4 lost its IP (considering it had one)
crash> net
NET_DEVICE NAME IP ADDRESS(ES)
ffff8800bad47000 lo 127.0.0.1
ffff8800b8842000 ens4
crash>
4. Now, something which looks puzzling to me: query-block says that the image file (which is partition /dev/sdb6) has actual-size=0:
{"return": [{"io-status": "ok", "device": "drive-virtio0", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "image": {"virtual-size": 32212254720, "filename": "/dev/sdb6", "format": "raw", "actual-size": 0}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "raw", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "/dev/sdb6", "encryption_key_missing": false}, "type": "unknown"}]}
qemu-img info tells the same thing. Xu Han, this image can't be empty, right? Kevin, is it possible for actual-size=0 to be reported even so (say, if QEMU wasn't able to determine the device's actual size)?
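For context on where actual-size comes from: on POSIX systems an "allocated size" is conventionally derived from stat()'s st_blocks field, which counts 512-byte units. A minimal Python sketch of that calculation (the helper name is mine; this models the behavior, it is not QEMU's code):

```python
import os

def actual_size(path):
    """Model of deriving an allocated size from stat(): st_blocks counts
    512-byte units actually allocated. For a block device node such as
    /dev/sdb6, fstat() reports st_blocks == 0, so the computed size is 0
    even though the device holds data."""
    st = os.stat(path)
    return st.st_blocks * 512
```

On a regular file this tracks allocated space (and can be smaller than the virtual size for sparse files); on a block device node it comes out as 0, matching the query-block output above.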
(In reply to Luiz Capitulino from comment #11)
> [...]
> qemu-img info tells the same thing. Xu Han, this image can't be empty, right? Kevin, is it possible to show actual-size=0 (say, QEMU wasn't able to determine the device's actual size).

Hi Xuhan,

Can you add your comment?

Best Regards,
Junyi

(In reply to Luiz Capitulino from comment #11)
> 4. Now, something which looks puzzling to me: query-block says that the image file (which is partition /dev/sdb6) has actual-size=0:
> [...]
> qemu-img info tells the same thing. Xu Han, this image can't be empty, right?

sdb6 is not empty. I got similar output on another VM (qcow2 format):

# qemu-img info /dev/sdb7
image: /dev/sdb7
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false

{"return": [{"io-status": "ok", "device": "drive-scsi0-0-0", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "image": {"virtual-size": 32212254720, "filename": "/dev/sdb7", "cluster-size": 65536, "format": "qcow2", "actual-size": 0, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "/dev/sdb7", "encryption_key_missing": false}, "type": "unknown"}]}

(In reply to Luiz Capitulino from comment #11)
> 3. Also, some guest network info. The ens4 lost its IP (considering it had one)
>
> crash> net
> NET_DEVICE NAME IP ADDRESS(ES)
> ffff8800bad47000 lo 127.0.0.1
> ffff8800b8842000 ens4

ens4 had an IP address which was generated by a DHCP server.

(In reply to Luiz Capitulino from comment #11)
> 4. Now, something which looks puzzling to me: query-block says that the image file (which is partition /dev/sdb6) has actual-size=0:
> [...]
> qemu-img info tells the same thing. Xu Han, this image can't be empty, right? Kevin, is it possible to show actual-size=0 (say, QEMU wasn't able to determine the device's actual size).

On block devices (as opposed to regular files), you always get an actual-size of 0, because that is what fstat() returns for st_blocks. I guess we could change the host_device driver to return the full size instead if this is confusing.

I find it confusing myself, but it's minor. I thought that the fact that actual-size=0 could have something to do with this bug, but it doesn't. Thanks.

This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

Radim, I couldn't find anything wrong in the guest; the next thing I'd do would be to check on the host side why the guest's vCPUs are stopped.

As long as the qemu process is running (which I imagine is the case, given that the monitor responds), one could also carefully attach to the qemu process and peek around the VCPU threads (just make sure you detach before exiting gdb).

I didn't find anything unusual in kvm/qemu:
- qemu scheduled vcpus, which periodically handled irqs
- I could switch consoles with alt+f<n> and write in them (login wasn't there, which is understandable given the systemd-logind failure)
- new systemd-logind timeouts popped up between writing
- SysRq didn't work because we have 'kernel.sysrq = 16' (insane modern default)

I was getting more desperate in trying to find out what could have gone wrong, messed up, and lost the guest after gdb hung when single-stepping qemu-kvm... So I looked into the guest dump files after that.
The guest was weird: a lot (113) of processes were 'UN' (uninterruptible), like dbus-daemon, systemd, and dhclient [1], which hadn't been scheduled for more than 22 hours. They were all waiting on xfs. I think it is the same bug 1052789. Going to dig deeper into xfs structures to confirm that, tomorrow.

---
1:
PID: 1633 TASK: ffff8800b96c8000 CPU: 3 COMMAND: "dhclient"
#0 [ffff8800b95b9ae8] __schedule at ffffffff815c7e9d
#1 [ffff8800b95b9b50] schedule at ffffffff815c83d9
#2 [ffff8800b95b9b60] xlog_grant_head_wait at ffffffffa01edfed [xfs]
#3 [ffff8800b95b9bb0] xlog_grant_head_check at ffffffffa01ee155 [xfs]
#4 [ffff8800b95b9bf0] xfs_log_reserve at ffffffffa01f188f [xfs]
#5 [ffff8800b95b9c20] xfs_trans_reserve at ffffffffa01ad524 [xfs]
#6 [ffff8800b95b9c68] xfs_vn_update_time at ffffffffa01a3b76 [xfs]
#7 [ffff8800b95b9ca8] update_time at ffffffff811bc575
#8 [ffff8800b95b9cd8] file_update_time at ffffffff811bc7e0
#9 [ffff8800b95b9d10] xfs_file_aio_write_checks at ffffffffa019b5ab [xfs]
#10 [ffff8800b95b9d58] xfs_file_buffered_aio_write at ffffffffa019b63e [xfs]
#11 [ffff8800b95b9de0] xfs_file_aio_write at ffffffffa019b821 [xfs]
#12 [ffff8800b95b9e30] do_sync_write at ffffffff811a1ec0
#13 [ffff8800b95b9f00] vfs_write at ffffffff811a25fd
#14 [ffff8800b95b9f40] sys_write at ffffffff811a2fc9
#15 [ffff8800b95b9f80] system_call_fastpath at ffffffff815d2c99
RIP: 00007faeb9e92c30 RSP: 00007fff5409b5c8 RFLAGS: 00010202
RAX: 0000000000000001 RBX: ffffffff815d2c99 RCX: 0000000000000033
RDX: 000000000000026b RSI: 00007faebb1a3000 RDI: 0000000000000004
RBP: 00007faebb1a3000 R8: 00007faebb189840 R9: 000000000000001f
R10: 00007faebb189840 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000026b R14: 00007faebd11c9f0 R15: 000000000000026b
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b

Wow, nice, it really looks like the same issue. Maybe Xu Han could run two VMs to reproduce the problem: one with the same kernel as in the description and the other with kernel-3.10.0-113.el7, and see which one reproduces the problem? Having to wait more than a month to get the results looks cumbersome though. But what are our options? Run the same test case as bug 1052789? Does anyone have better ideas to confirm it's the same issue by testing?

The generic/270 xfstest is executing right now; delays caused by the iscsi backing could have played a role in our original bug, but the test requires two dedicated partitions, so a raw file is used instead, which makes it almost useless for us. It took me a few hours to debug their compilation process, though, so I wanted to try at least. I wasn't very keen on reading xfs code after that; still, crash analysis is one of our better chances, as this bug might not be hit again. Does QE have enough capacity to run two guests?

It looks similar to 1091136 as well, but we have an empty xlog->cil->xc_cil.
crash> bt 12568
PID: 12568 TASK: ffff880086995780 CPU: 0 COMMAND: "kworker/0:0"
#0 [ffff88002313fc70] __schedule at ffffffff815c7e9d
#1 [ffff88002313fcd8] schedule at ffffffff815c83d9
#2 [ffff88002313fce8] xlog_grant_head_wait at ffffffffa01edfed [xfs]
#3 [ffff88002313fd38] xlog_grant_head_check at ffffffffa01ee155 [xfs]
#4 [ffff88002313fd78] xfs_log_reserve at ffffffffa01f188f [xfs]
#5 [ffff88002313fda8] xfs_trans_reserve at ffffffffa01ad524 [xfs]
#6 [ffff88002313fdf0] xfs_fs_log_dummy at ffffffffa019d775 [xfs]
#7 [ffff88002313fe10] xfs_log_worker at ffffffffa01f1168 [xfs]
#8 [ffff88002313fe28] process_one_work at ffffffff8107860b
#9 [ffff88002313fe70] worker_thread at ffffffff810793bb
#10 [ffff88002313fed0] kthread at ffffffff8107fc10
#11 [ffff88002313ff50] ret_from_fork at ffffffff815d2bec
crash> p &((struct xlog *)0xffff8800b89a5800)->l_cilp->xc_cil
$1 = (struct list_head *) 0xffff8800b8ba8e08
crash> p ((struct xlog *)0xffff8800b89a5800)->l_cilp->xc_cil
$2 = {
next = 0xffff8800b8ba8e08,
prev = 0xffff8800b8ba8e08
}
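For readers of that output: the kernel's struct list_head marks an empty list by having next and prev both point back at the head itself, which is exactly what $1/$2 show for xc_cil (next == prev == 0xffff8800b8ba8e08, the address of the head). A tiny Python model of that convention (class and method names mine):

```python
class ListHead:
    """Minimal model of the kernel's circular doubly-linked list_head."""
    def __init__(self):
        # An initialized, empty list points at itself in both directions.
        self.next = self
        self.prev = self

    def empty(self):
        # Same test as the kernel's list_empty(): head->next == head.
        return self.next is self

    def add(self, node):
        # list_add(): insert node right after the head.
        node.next = self.next
        node.prev = self
        self.next.prev = node
        self.next = node
```

So the $2 output above, with next == prev equal to the head's own address, means the CIL held no pending log items in this dump.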
XFS was slowly leaking log space until it couldn't reserve more, and a lot of processes got stuck at that point. (See http://oss.sgi.com/pipermail/xfs/2014-May/036276.html for the fix and details.) The patch is being backported.

*** This bug has been marked as a duplicate of bug 1091136 ***
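The pile-up of uninterruptible ('UN') tasks seen in the dump can also be surveyed on a live Linux system straight from /proc, without crash; a rough sketch (the helper name is mine, and /proc/<pid>/wchan availability depends on kernel config):

```python
import os

def uninterruptible_tasks(proc="/proc"):
    """Return (pid, comm, wchan) for tasks in D (uninterruptible sleep),
    roughly what `crash> ps` shows as 'UN' on a dump. wchan names the
    kernel function the task is blocked in, e.g. xlog_grant_head_wait
    for the processes in this report."""
    stuck = []
    for pid in filter(str.isdigit, os.listdir(proc)):
        try:
            with open(os.path.join(proc, pid, "stat")) as f:
                # Skip past "pid (comm)" -- comm may contain spaces.
                state = f.read().rsplit(")", 1)[1].split()[0]
            if state == "D":
                with open(os.path.join(proc, pid, "comm")) as f:
                    comm = f.read().strip()
                with open(os.path.join(proc, pid, "wchan")) as f:
                    wchan = f.read().strip()
                stuck.append((int(pid), comm, wchan))
        except OSError:
            continue  # task exited (or wchan unreadable) while scanning
    return stuck
```

On a healthy system this list is usually empty or transient; 113 entries all blocked in xfs functions for hours, as here, points at a stuck log reservation rather than slow I/O.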
Description of problem:
One guest running mysql bench & test for 35 days then works abnormally.
1. Cannot access the guest network.
2. Cannot log in to the guest through tty (got the messages below).

[3093270.843198] systemd-logind[622]: Failed to issue method call: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

Executing 'query-cpus' always gets:

{"return": [{"current": true, "CPU": 0, "pc": -2130427674, "halted": true, "thread_id": 2986}, {"current": false, "CPU": 1, "pc": -2130427674, "halted": true, "thread_id": 2987}, {"current": false, "CPU": 2, "pc": -2130427674, "halted": true, "thread_id": 2988}, {"current": false, "CPU": 3, "pc": -2130427674, "halted": true, "thread_id": 2989}]}

(PC did not change.)

Some guest info:

KERNEL: /usr/lib/debug/lib/modules/3.10.0-97.el7.x86_64/vmlinux
DUMPFILE: guest-mem.dump
CPUS: 4
DATE: Thu Apr 10 13:18:44 2014
UPTIME: 35 days, 19:10:13
LOAD AVERAGE: 113.00, 112.99, 112.95
TASKS: 298
NODENAME: mysql.el7.x86_64.longevity
RELEASE: 3.10.0-97.el7.x86_64
VERSION: #1 SMP Tue Feb 25 15:22:08 EST 2014
MACHINE: x86_64 (2826 Mhz)
MEMORY: 3 GB

Something to clarify: after finding this issue, I did 'sync' & 'drop_caches' on the host; I also did the following twice:

{ "execute": "dump-guest-memory", "arguments": { "protocol": "file:guest-mem.dump", "paging": true } }

Not sure if the above operations would affect debugging.

Version-Release number of selected component (if applicable):
guest kernel: 3.10.0-97.el7.x86_64
qemu-kvm-rhev-1.5.3-49.el7.x86_64
host kernel: 3.10.0-95.el7.x86_64

How reproducible:
1/1

Steps to Reproduce:
1.
2.
3.

Actual results:
Guest os worked abnormally.

Expected results:
Guest os works properly.
Additional info:

QEMU CLI:
/usr/libexec/qemu-kvm -M pc-i440fx-rhel7.0.0 -cpu Penryn -m 3G -smp 4,threads=1,cores=4,sockets=1 -enable-kvm -name dell-op760-02-RHEL-7.0-x86_64-mysql -uuid 1ab4f15a-054a-494e-a113-6fcd0af703d9 -nodefconfig -nodefaults -no-shutdown -k en-us -rtc base=utc,clock=host,driftfix=slew -qmp tcp:0:5000,server,nowait -boot order=c,menu=on -iscsi initiator-name=iqn.1994-05.com.redhat:longevity-test -vga qxl -global qxl-vga.vram_size=67108864 -spice port=6000,password=123@qwe -drive file=/dev/sdb6,if=none,id=drive-virtio0,cache=none,aio=native,rerror=stop,werror=stop -device virtio-blk-pci,drive=drive-virtio0,id=os-disk,scsi=off,bootindex=1 -netdev tap,id=tap0,vhost=off,script=/etc/ovs-ifup,downscript=/etc/ovs-ifdown -device e1000,netdev=tap0,mac=54:e1:d2:c3:b4:a3,id=net0 -chardev socket,id=charserial0,path=/var/local/qemu/dell-op760-02-RHEL-7.0-x86_64-mysql/console,server,nowait -device isa-serial,chardev=charserial0,id=serial0 -device virtio-serial-pci,id=virtio-serial0,max_ports=16 -chardev socket,id=channel0,path=/var/local/qemu/dell-op760-02-RHEL-7.0-x86_64-mysql/virtserial,server,nowait -device virtserialport,chardev=channel0,name=org.linux-kvm.port.0,bus=virtio-serial0.0,id=port1 -chardev socket,id=qemu-ga0,path=/var/local/qemu/dell-op760-02-RHEL-7.0-x86_64-mysql/qemu-ga,server,nowait -device virtserialport,chardev=qemu-ga0,name=org.qemu.guest_agent.0,bus=virtio-serial0.0,id=port2 -device virtio-balloon-pci,id=balloon0
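The "all vCPUs halted with a frozen PC" symptom from the description can be checked without eyeballing raw JSON by polling QMP and diffing two query-cpus snapshots. A hedged sketch (function names are mine; the host/port assume this report's `-qmp tcp:0:5000,server,nowait`):

```python
import json
import socket

def query_cpus(host="127.0.0.1", port=5000):
    """Issue query-cpus over a QMP TCP socket and return the 'return' list.
    QMP requires reading the greeting and negotiating capabilities first."""
    with socket.create_connection((host, port)) as s:
        f = s.makefile("rw")
        json.loads(f.readline())                                  # greeting
        f.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
        f.flush()
        json.loads(f.readline())                                  # empty ack
        f.write(json.dumps({"execute": "query-cpus"}) + "\n")
        f.flush()
        return json.loads(f.readline())["return"]

def pcs_stalled(before, after):
    """True if every vCPU is halted and its program counter has not moved
    between two query-cpus snapshots -- the symptom reported here."""
    return all(b["halted"] and a["halted"] and b["pc"] == a["pc"]
               for b, a in zip(before, after))
```

Called twice a few seconds apart, `pcs_stalled(query_cpus(), query_cpus())` returning True for a guest that should be busy is a strong hint the guest is wedged rather than merely off the network.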