Bug 1931929 - VDSM crash without any visible reason
Summary: VDSM crash without any visible reason
Keywords:
Status: CLOSED DUPLICATE of bug 1931331
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 8.4
Assignee: Peter Krempa
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-02-23 15:30 UTC by Pavel Zinchuk
Modified: 2021-04-21 04:58 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-10 12:29:41 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
maps file from issue with date 2021-02-23 14:38 (73.77 KB, text/plain)
2021-02-25 16:16 UTC, Pavel Zinchuk
core_backtrace file from issue with date 2021-02-23 14:38 (121.33 KB, text/plain)
2021-02-25 16:16 UTC, Pavel Zinchuk

Description Pavel Zinchuk 2021-02-23 15:30:31 UTC
Description of problem:
Randomly, on different oVirt nodes, the VDSM service unexpectedly shuts down with: (MainThread) [vds] Received signal 15, shutting down (vdsmd:73)

At the same time I see events from oVirt Engine:
 - VDSM ovirt-host-02.local command Get Host Statistics failed: Broken pipe
 - Invalid status on Data Center ClusterName. Setting Data Center status to Non Responsive (On host ovirt-host-02.local, Error: Network error during communication with the Host.).
 - VDSM command GetStoragePoolInfoVDS failed: Connection timeout for host ' ovirt-host-02.local', last response arrived 22501 ms ago
 - Host ovirt-host-02.local is non responsive

Within a couple of minutes the oVirt node restarts VDSM and the server reconnects to the oVirt cluster, but this unexpected VDSM crash triggers VM migrations.

Version-Release number of selected component (if applicable):
ovirt-engine version 4.4.4.7-1.el8

# rpm -qa | grep vdsm
vdsm-network-4.40.40-1.el8.x86_64
vdsm-client-4.40.40-1.el8.noarch
vdsm-4.40.40-1.el8.x86_64
vdsm-hook-ethtool-options-4.40.40-1.el8.noarch
vdsm-common-4.40.40-1.el8.noarch
vdsm-hook-fcoe-4.40.40-1.el8.noarch
vdsm-http-4.40.40-1.el8.noarch
vdsm-hook-openstacknet-4.40.40-1.el8.noarch
vdsm-yajsonrpc-4.40.40-1.el8.noarch
vdsm-jsonrpc-4.40.40-1.el8.noarch
vdsm-hook-vmfex-dev-4.40.40-1.el8.noarch
vdsm-python-4.40.40-1.el8.noarch
vdsm-gluster-4.40.40-1.el8.x86_64
vdsm-hook-vhostmd-4.40.40-1.el8.noarch
vdsm-api-4.40.40-1.el8.noarch

# rpm -qa | grep ovirt
ovirt-ansible-collection-1.2.4-1.el8.noarch
ovirt-host-dependencies-4.4.1-4.el8.x86_64
ovirt-host-4.4.1-4.el8.x86_64
ovirt-imageio-common-2.1.1-1.el8.x86_64
ovirt-openvswitch-ovn-common-2.11-0.2020061801.el8.noarch
ovirt-imageio-client-2.1.1-1.el8.x86_64
ovirt-python-openvswitch-2.11-0.2020061801.el8.noarch
python3-ovirt-setup-lib-1.3.2-1.el8.noarch
cockpit-ovirt-dashboard-0.14.17-1.el8.noarch
ovirt-vmconsole-host-1.0.9-1.el8.noarch
ovirt-vmconsole-1.0.9-1.el8.noarch
ovirt-openvswitch-ovn-host-2.11-0.2020061801.el8.noarch
ovirt-provider-ovn-driver-1.2.33-1.el8.noarch
ovirt-release44-4.4.4.1-1.el8.noarch
ovirt-hosted-engine-ha-2.4.5-1.el8.noarch
ovirt-openvswitch-ovn-2.11-0.2020061801.el8.noarch
ovirt-openvswitch-2.11-0.2020061801.el8.noarch
ovirt-imageio-daemon-2.1.1-1.el8.x86_64
ovirt-hosted-engine-setup-2.4.9-1.el8.noarch
python3-ovirt-engine-sdk4-4.4.8-1.el8.x86_64


How reproducible:
I don't have steps to reproduce.
The issue occurs on an oVirt host without any action from oVirt users.


Actual results:
The VDSM service crashes on the oVirt node.


Expected results:
The SuperVDSM server shouldn't terminate.


Additional info:

vdsm.log:
2021-02-23 14:38:27,587+0000 INFO  (jsonrpc/0) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': '935bc35f-8cf8-4d42-944e-14a6ccfe1154'} (api:129)
2021-02-23 14:38:27,587+0000 INFO  (jsonrpc/0) [api.virt] FINISH getStats return={'status': {'code': 1, 'message': "Virtual machine does not exist: {'vmId': '935bc35f-8cf8-4d42-944e-14a6ccfe1154'}"}} from=::1,60484, vmId=935bc35f-8cf8-4d42-944e-14a6ccfe1154 (api:54)
2021-02-23 14:38:27,587+0000 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call VM.getStats failed (error 1) in 0.00 seconds (__init__:312)
2021-02-23 14:38:29,305+0000 WARN  (qgapoller/1) [root] connection to libvirt broken. ecode: 1 edom: 7 (libvirtconnection:156)
2021-02-23 14:38:29,306+0000 CRIT  (qgapoller/1) [root] taking calling process down. (libvirtconnection:158)
2021-02-23 14:38:29,306+0000 WARN  (qgapoller/1) [root] connection to libvirt broken. ecode: 1 edom: 7 (libvirtconnection:156)
2021-02-23 14:38:29,307+0000 CRIT  (qgapoller/1) [root] taking calling process down. (libvirtconnection:158)
2021-02-23 14:38:29,307+0000 INFO  (qgapoller/1) [vds] Failed to get guest info for vm=d95e4196-2b5e-487b-97aa-daf9ad8b3272, error: internal error: client socket is closed (qemuguestagent:414)
2021-02-23 14:38:29,310+0000 INFO  (MainThread) [vds] Received signal 15, shutting down (vdsmd:73)
2021-02-23 14:38:29,311+0000 INFO  (MainThread) [root] Stopping DHCP monitor. (dhcp_monitor:106)
2021-02-23 14:38:29,541+0000 INFO  (MainThread) [root] Stopping Bond monitor. (bond_monitor:53)
2021-02-23 14:38:29,543+0000 INFO  (MainThread) [health] Stopping health monitor (health:71)
2021-02-23 14:38:29,556+0000 INFO  (MainThread) [vds] Stopping http server (http:85)
2021-02-23 14:38:29,556+0000 INFO  (http) [vds] Server stopped (http:75)
2021-02-23 14:38:29,557+0000 INFO  (MainThread) [jsonrpc.JsonRpcServer] Stopping JsonRPC Server (__init__:446)
2021-02-23 14:38:29,565+0000 INFO  (MainThread) [root] Unregistering all secrets (secret:94)
2021-02-23 14:38:29,565+0000 WARN  (MainThread) [root] connection to libvirt broken. ecode: 1 edom: 7 (libvirtconnection:156)
2021-02-23 14:38:29,565+0000 CRIT  (MainThread) [root] taking calling process down. (libvirtconnection:158)
2021-02-23 14:38:29,566+0000 INFO  (MainThread) [vds] Received signal 15, shutting down (vdsmd:73)
2021-02-23 14:38:29,566+0000 ERROR (MainThread) [vds] Exception raised (vdsmd:164)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/vdsmd.py", line 162, in run
    serve_clients(log)
  File "/usr/lib/python3.6/site-packages/vdsm/vdsmd.py", line 129, in serve_clients
    cif.prepareForShutdown()
  File "/usr/lib/python3.6/site-packages/vdsm/clientIF.py", line 391, in prepareForShutdown
    secret.clear()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/secret.py", line 96, in clear
    for virsecret in con.listAllSecrets():
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 5692, in listAllSecrets
    raise libvirtError("virConnectListAllSecrets() failed", conn=self)
libvirt.libvirtError: internal error: client socket is closed
2021-02-23 14:38:29,570+0000 INFO  (MainThread) [vds] Stopping threads (vdsmd:166)
2021-02-23 14:38:29,570+0000 INFO  (MainThread) [vds] Exiting (vdsmd:177)
2021-02-23 14:40:01,004+0000 INFO  (MainThread) [vds] (PID: 3722708) I am the actual vdsm 4.40.40.1 ovirt-host-02.local (4.18.0-240.10.1.el8_3.x86_64) (vdsmd:155)
2021-02-23 14:40:01,004+0000 INFO  (MainThread) [vds] VDSM will run with cpu affinity: frozenset({1}) (vdsmd:274)
2021-02-23 14:40:01,016+0000 INFO  (MainThread) [storage.HSM] START HSM init (hsm:376)
2021-02-23 14:40:01,021+0000 INFO  (MainThread) [storage.HSM] Creating data-center mount directory '/rhev/data-center/mnt' (hsm:381)
2021-02-23 14:40:01,021+0000 INFO  (MainThread) [storage.fileUtils] Creating directory: /rhev/data-center/mnt mode: None (fileUtils:201)
2021-02-23 14:40:01,528+0000 INFO  (MainThread) [storage.HSM] Unlinking file '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af/19822b30-5d5b-4b5b-a003-49a6537bde9f' (hsm:532)
2021-02-23 14:40:01,529+0000 INFO  (MainThread) [storage.HSM] Unlinking file '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af/24d3498c-9ddf-4b0c-a5ab-591c90daf695' (hsm:532)
2021-02-23 14:40:01,529+0000 INFO  (MainThread) [storage.HSM] Unlinking file '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af/35b65f82-7b9f-4304-ab09-e1ddd5d13fa3' (hsm:532)
2021-02-23 14:40:01,530+0000 INFO  (MainThread) [storage.HSM] Unlinking file '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af/de7fec7c-f7d3-4f9c-b378-0442a873fa69' (hsm:532)
2021-02-23 14:40:01,530+0000 INFO  (MainThread) [storage.HSM] Unlinking file '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af/4c76052f-9281-474c-9af7-c742ae8b2592' (hsm:532)
2021-02-23 14:40:01,530+0000 INFO  (MainThread) [storage.HSM] Unlinking file '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af/dc474146-14ca-4883-a042-72f8543e001c' (hsm:532)
2021-02-23 14:40:01,531+0000 INFO  (MainThread) [storage.HSM] Unlinking file '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af/f98a9d77-2e28-41c1-940a-da4a06ed2bab' (hsm:532)
2021-02-23 14:40:01,531+0000 INFO  (MainThread) [storage.HSM] Unlinking file '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af/7453f6ce-d98e-4378-b424-4de420eb1d47' (hsm:532)
2021-02-23 14:40:01,531+0000 INFO  (MainThread) [storage.HSM] Unlinking file '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af/mastersd' (hsm:532)
2021-02-23 14:40:01,532+0000 INFO  (MainThread) [storage.HSM] Removing directory '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af' (hsm:546)
2021-02-23 14:40:01,534+0000 INFO  (MainThread) [storage.check] Starting check service (check:91)
2021-02-23 14:40:01,535+0000 INFO  (hsm/init) [storage.StorageDomainCache] Refreshing storage domain cache (resize=True) (sdc:80)
2021-02-23 14:40:01,536+0000 INFO  (hsm/init) [storage.ISCSI] Scanning iSCSI devices (iscsi:442)
2021-02-23 14:40:01,546+0000 INFO  (MainThread) [storage.Dispatcher] Starting StorageDispatcher... (dispatcher:47)
2021-02-23 14:40:01,549+0000 INFO  (check/loop) [storage.asyncevent] Starting <EventLoop running=True closed=False at 0x140547172350944> (asyncevent:126)
2021-02-23 14:40:01,652+0000 INFO  (MainThread) [vdsm.api] START registerDomainStateChangeCallback(callbackFunc=functools.partial(<function clientIF.contEIOVms at 0x7fd3ac0a9b70>, <weakproxy at 0x7fd3ac424d68 to clientIF at 0x7fd3ac438588>)) from=internal, task_id=5568047d-91fd-48c3-8bb4-61844a3167ff (api:48)
2021-02-23 14:40:01,653+0000 INFO  (MainThread) [vdsm.api] FINISH registerDomainStateChangeCallback return=None from=internal, task_id=5568047d-91fd-48c3-8bb4-61844a3167ff (api:54)
2021-02-23 14:40:01,654+0000 INFO  (MainThread) [vds] Using libvirt for querying QEMU-GA (qemuguestagent:162)
2021-02-23 14:40:01,655+0000 INFO  (MainThread) [MOM] Preparing MOM interface (momIF:85)
2021-02-23 14:40:01,655+0000 INFO  (MainThread) [MOM] MOM: Using named unix socket: /run/vdsm/mom-vdsm.sock (momIF:94)
2021-02-23 14:40:01,656+0000 INFO  (MainThread) [root] Unregistering all secrets (secret:94)
2021-02-23 14:40:01,683+0000 INFO  (MainThread) [vds] Setting channels' timeout to 30 seconds. (vmchannels:224)
2021-02-23 14:40:01,684+0000 INFO  (MainThread) [vds] Starting QEMU-GA poller (qemuguestagent:176)
2021-02-23 14:40:01,687+0000 INFO  (MainThread) [vds.MultiProtocolAcceptor] Listening at :::54321 (protocoldetector:183)
2021-02-23 14:40:01,822+0000 INFO  (http) [vds] Server running (http:64)


supervdsm.log:
MainThread::DEBUG::2021-02-23 14:38:29,977::supervdsm_server::314::SuperVdsm.Server::(main) Terminated normally
Thread-1::DEBUG::2021-02-23 14:38:29,979::concurrent::265::root::(run) FINISH thread <Thread(Thread-1, stopped daemon 139762298169088)> (0)
MainThread::DEBUG::2021-02-23 14:38:40,079::__init__::64::blivet::(register_device_format) registered device format class DeviceFormat as None
MainThread::DEBUG::2021-02-23 14:38:40,092::__init__::64::blivet::(register_device_format) registered device format class BIOSBoot as biosboot
MainThread::DEBUG::2021-02-23 14:38:40,101::__init__::64::blivet::(register_device_format) registered device format class DiskLabel as disklabel
MainThread::DEBUG::2021-02-23 14:38:40,101::__init__::64::blivet::(register_device_format) registered device format class DMRaidMember as dmraidmember
MainThread::DEBUG::2021-02-23 14:38:40,111::__init__::64::blivet::(register_device_format) registered device format class Ext2FS as ext2
MainThread::DEBUG::2021-02-23 14:38:40,111::__init__::64::blivet::(register_device_format) registered device format class Ext3FS as ext3
MainThread::DEBUG::2021-02-23 14:38:40,111::__init__::64::blivet::(register_device_format) registered device format class Ext4FS as ext4
MainThread::DEBUG::2021-02-23 14:38:40,111::__init__::64::blivet::(register_device_format) registered device format class FATFS as vfat
MainThread::DEBUG::2021-02-23 14:38:40,112::__init__::64::blivet::(register_device_format) registered device format class EFIFS as efi
MainThread::DEBUG::2021-02-23 14:38:40,112::__init__::64::blivet::(register_device_format) registered device format class BTRFS as btrfs
MainThread::DEBUG::2021-02-23 14:38:40,112::__init__::64::blivet::(register_device_format) registered device format class GFS2 as gfs2
MainThread::DEBUG::2021-02-23 14:38:40,112::__init__::64::blivet::(register_device_format) registered device format class JFS as jfs
MainThread::DEBUG::2021-02-23 14:38:40,112::__init__::64::blivet::(register_device_format) registered device format class ReiserFS as reiserfs
MainThread::DEBUG::2021-02-23 14:38:40,113::__init__::64::blivet::(register_device_format) registered device format class XFS as xfs
MainThread::DEBUG::2021-02-23 14:38:40,113::__init__::64::blivet::(register_device_format) registered device format class HFS as hfs
MainThread::DEBUG::2021-02-23 14:38:40,113::__init__::64::blivet::(register_device_format) registered device format class AppleBootstrapFS as appleboot
MainThread::DEBUG::2021-02-23 14:38:40,113::__init__::64::blivet::(register_device_format) registered device format class HFSPlus as hfs+
MainThread::DEBUG::2021-02-23 14:38:40,113::__init__::64::blivet::(register_device_format) registered device format class MacEFIFS as macefi
MainThread::DEBUG::2021-02-23 14:38:40,114::__init__::64::blivet::(register_device_format) registered device format class NTFS as ntfs
MainThread::DEBUG::2021-02-23 14:38:40,114::__init__::64::blivet::(register_device_format) registered device format class NFS as nfs
MainThread::DEBUG::2021-02-23 14:38:40,114::__init__::64::blivet::(register_device_format) registered device format class NFSv4 as nfs4
MainThread::DEBUG::2021-02-23 14:38:40,114::__init__::64::blivet::(register_device_format) registered device format class Iso9660FS as iso9660
MainThread::DEBUG::2021-02-23 14:38:40,114::__init__::64::blivet::(register_device_format) registered device format class NoDevFS as nodev
MainThread::DEBUG::2021-02-23 14:38:40,114::__init__::64::blivet::(register_device_format) registered device format class DevPtsFS as devpts
MainThread::DEBUG::2021-02-23 14:38:40,114::__init__::64::blivet::(register_device_format) registered device format class ProcFS as proc
MainThread::DEBUG::2021-02-23 14:38:40,114::__init__::64::blivet::(register_device_format) registered device format class SysFS as sysfs
MainThread::DEBUG::2021-02-23 14:38:40,114::__init__::64::blivet::(register_device_format) registered device format class TmpFS as tmpfs
MainThread::DEBUG::2021-02-23 14:38:40,114::__init__::64::blivet::(register_device_format) registered device format class BindFS as bind
MainThread::DEBUG::2021-02-23 14:38:40,114::__init__::64::blivet::(register_device_format) registered device format class SELinuxFS as selinuxfs
MainThread::DEBUG::2021-02-23 14:38:40,115::__init__::64::blivet::(register_device_format) registered device format class USBFS as usbfs
MainThread::DEBUG::2021-02-23 14:38:40,115::__init__::64::blivet::(register_device_format) registered device format class EFIVarFS as efivarfs
MainThread::INFO::2021-02-23 14:38:40,134::__init__::55::program::(log_bd_message) Running [1] lvm version ...
MainThread::INFO::2021-02-23 14:38:40,144::__init__::55::program::(log_bd_message) stdout[1]:   LVM version:     2.03.09(2)-RHEL8 (2020-05-28)
  Library version: 1.02.171-RHEL8 (2020-05-28)
  Driver version:  4.42.0
  Configuration:   ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-default-dm-run-dir=/run --with-default-run-dir=/run/lvm --with-default-pid-dir=/run --with-default-locking-dir=/run/lock/lvm --with-usrlibdir=/usr/lib64 --enable-fsadm --enable-write_install --with-user= --with-group= --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --enable-pkgconfig --enable-cmdlib --enable-dmeventd --enable-blkid_wiping --with-cluster=internal --enable-cmirrord --with-udevdir=/usr/lib/udev/rules.d --enable-udev_sync --with-thin=internal --with-cache=internal --enable-lvmpolld --enable-lvmlockd-dlm --enable-lvmlockd-dlmcontrol --enable-lvmlockd-sanlock --enable-dbus-service --enable-notify-dbus --enable-dmfilemapd --with-writecache=internal --with-vdo=internal --with-vdo-format=/usr/bin/vdoformat --with-integrity=internal --disable-silent-rules

MainThread::INFO::2021-02-23 14:38:40,145::__init__::55::program::(log_bd_message) stderr[1]:
MainThread::INFO::2021-02-23 14:38:40,145::__init__::55::program::(log_bd_message) ...done [1] (exit code: 0)
MainThread::DEBUG::2021-02-23 14:38:40,162::__init__::64::blivet::(register_device_format) registered device format class LUKS as luks
MainThread::DEBUG::2021-02-23 14:38:40,162::__init__::64::blivet::(register_device_format) registered device format class Integrity as integrity
MainThread::DEBUG::2021-02-23 14:38:40,164::__init__::64::blivet::(register_device_format) registered device format class LVMPhysicalVolume as lvmpv
MainThread::DEBUG::2021-02-23 14:38:40,165::__init__::64::blivet::(register_device_format) registered device format class MDRaidMember as mdmember
MainThread::DEBUG::2021-02-23 14:38:40,165::__init__::64::blivet::(register_device_format) registered device format class MultipathMember as multipath_member
MainThread::DEBUG::2021-02-23 14:38:40,166::__init__::64::blivet::(register_device_format) registered device format class PPCPRePBoot as prepboot
MainThread::DEBUG::2021-02-23 14:38:40,167::__init__::64::blivet::(register_device_format) registered device format class SwapSpace as swap
MainThread::DEBUG::2021-02-23 14:38:40,446::supervdsm_server::274::SuperVdsm.Server::(main) Making sure I'm root - SuperVdsm
MainThread::DEBUG::2021-02-23 14:38:40,446::supervdsm_server::283::SuperVdsm.Server::(main) Parsing cmd args
MainThread::DEBUG::2021-02-23 14:38:40,446::supervdsm_server::286::SuperVdsm.Server::(main) Cleaning old socket /run/vdsm/svdsm.sock
MainThread::DEBUG::2021-02-23 14:38:40,446::supervdsm_server::290::SuperVdsm.Server::(main) Setting up keep alive thread
MainThread::DEBUG::2021-02-23 14:38:40,446::supervdsm_server::296::SuperVdsm.Server::(main) Creating remote object manager
Thread-1::DEBUG::2021-02-23 14:38:40,447::concurrent::258::root::(run) START thread <Thread(Thread-1, started daemon 140413524825856)> (func=<bound method Server.serve_forever of <multiprocessing.managers.Server object at 0x7fb4a45fb208>>, args=(), kwargs={})
MainThread::INFO::2021-02-23 14:38:40,448::fileUtils::252::storage.fileUtils::(chown) Changing owner for /run/vdsm/svdsm.sock, to (36:36)
MainThread::DEBUG::2021-02-23 14:38:40,448::cmdutils::130::root::(exec_cmd) /usr/bin/systemctl status lldpad.service (cwd None)
MainThread::DEBUG::2021-02-23 14:38:40,472::cmdutils::138::root::(exec_cmd) SUCCESS: <err> = b''; <rc> = 0
MainThread::DEBUG::2021-02-23 14:38:40,472::cmdutils::130::root::(exec_cmd) /usr/sbin/lldptool -ping (cwd None)
MainThread::DEBUG::2021-02-23 14:38:40,524::cmdutils::138::root::(exec_cmd) SUCCESS: <err> = b''; <rc> = 0
MainThread::DEBUG::2021-02-23 14:38:41,135::cmdutils::130::root::(exec_cmd) /usr/sbin/lldptool get-lldp -i eno1 adminStatus (cwd None)
MainThread::DEBUG::2021-02-23 14:38:41,283::cmdutils::138::root::(exec_cmd) SUCCESS: <err> = b''; <rc> = 0
MainThread::DEBUG::2021-02-23 14:38:41,290::cmdutils::130::root::(exec_cmd) /usr/sbin/lldptool get-lldp -i enp59s0f0 adminStatus (cwd None)
MainThread::DEBUG::2021-02-23 14:38:41,298::cmdutils::138::root::(exec_cmd) SUCCESS: <err> = b''; <rc> = 0
MainThread::DEBUG::2021-02-23 14:38:41,337::cmdutils::130::root::(exec_cmd) /usr/sbin/lldptool get-lldp -i eno2 adminStatus (cwd None)
MainThread::DEBUG::2021-02-23 14:38:41,415::cmdutils::138::root::(exec_cmd) SUCCESS: <err> = b''; <rc> = 0
MainThread::DEBUG::2021-02-23 14:38:41,422::cmdutils::130::root::(exec_cmd) /usr/sbin/lldptool get-lldp -i enp59s0f1 adminStatus (cwd None)
MainThread::DEBUG::2021-02-23 14:38:41,443::cmdutils::138::root::(exec_cmd) SUCCESS: <err> = b''; <rc> = 0
MainThread::DEBUG::2021-02-23 14:38:41,451::cmdutils::130::root::(exec_cmd) /usr/sbin/lldptool get-lldp -i eno3 adminStatus (cwd None)
MainThread::DEBUG::2021-02-23 14:38:41,504::cmdutils::138::root::(exec_cmd) SUCCESS: <err> = b''; <rc> = 0
MainThread::DEBUG::2021-02-23 14:38:41,523::cmdutils::130::root::(exec_cmd) /usr/sbin/lldptool get-lldp -i enp175s0 adminStatus (cwd None)
MainThread::DEBUG::2021-02-23 14:38:41,765::cmdutils::138::root::(exec_cmd) SUCCESS: <err> = b''; <rc> = 0
MainThread::DEBUG::2021-02-23 14:38:41,772::cmdutils::130::root::(exec_cmd) /usr/sbin/lldptool get-lldp -i enp175s1 adminStatus (cwd None)
MainThread::DEBUG::2021-02-23 14:38:41,932::cmdutils::138::root::(exec_cmd) SUCCESS: <err> = b''; <rc> = 0
MainThread::DEBUG::2021-02-23 14:38:41,976::cmdutils::130::root::(exec_cmd) /usr/sbin/lldptool get-lldp -i eno4 adminStatus (cwd None)
MainThread::DEBUG::2021-02-23 14:38:42,117::cmdutils::138::root::(exec_cmd) SUCCESS: <err> = b''; <rc> = 0
MainThread::DEBUG::2021-02-23 14:38:42,127::supervdsm_server::309::SuperVdsm.Server::(main) Started serving super vdsm object
restore-net::INFO::2021-02-23 14:38:43,161::restore_net_config::286::root::(restore) networks already restored. doing nothing.

Comment 1 Michal Skrivanek 2021-02-24 05:34:12 UTC
Please attach libvirt logs / journal. Apparently libvirt crashed/exited and in that case we restart vdsm
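For reference, one quick way to confirm that correlation on the affected host is to read the libvirtd and vdsmd journals side by side around the crash time. This is only a sketch; the timestamps below are taken from this report and should be adjusted as needed:

# journalctl -u libvirtd.service -u vdsmd.service --since "2021-02-23 14:30" --until "2021-02-23 14:45"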

Comment 2 RHEL Program Management 2021-02-24 05:34:17 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 3 Pavel Zinchuk 2021-02-24 06:36:24 UTC
(In reply to Michal Skrivanek from comment #1)
> Please attach libvirt logs / journal. Apparently libvirt crashed/exited and
> in that case we restart vdsm

Hi Michal, which libvirt log file or journal exactly do you need?
I can't provide the whole content of /var/log/libvirt/qemu/ because this is a production cluster with private VM names.

Based on the logs, the source of the issue is a stats request for a VM that does not exist on the oVirt node. This causes the crash:
[api.virt] FINISH getStats return={'status': {'code': 1, 'message': "Virtual machine does not exist: {'vmId': '935bc35f-8cf8-4d42-944e-14a6ccfe1154'}"}}
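For reference, the same stats request can be reproduced by hand with vdsm-client on the host; the vmID below is the one from the log above, and on a host that does not run the VM this normally just returns the same "Virtual machine does not exist" error. A minimal sketch:

# vdsm-client Host getVMList
# vdsm-client VM getStats vmID=935bc35f-8cf8-4d42-944e-14a6ccfe1154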

Comment 4 Pavel Zinchuk 2021-02-24 09:48:23 UTC
I have found the name of the VM that did not exist at the time the issue occurred on node-02.
It is the HostedEngine VM.

It turns out that an unexpected request for the stats of the HostedEngine VM, on a node that does not host this VM, causes this crash.

<vm href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154" id="935bc35f-8cf8-4d42-944e-14a6ccfe1154">
<actions>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/shutdown" rel="shutdown"/>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/start" rel="start"/>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/stop" rel="stop"/>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/suspend" rel="suspend"/>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/detach" rel="detach"/>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/ticket" rel="ticket"/>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/previewsnapshot" rel="previewsnapshot"/>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/reordermacaddresses" rel="reordermacaddresses"/>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/thawfilesystems" rel="thawfilesystems"/>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/undosnapshot" rel="undosnapshot"/>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/reboot" rel="reboot"/>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/cancelmigration" rel="cancelmigration"/>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/commitsnapshot" rel="commitsnapshot"/>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/clone" rel="clone"/>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/freezefilesystems" rel="freezefilesystems"/>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/logon" rel="logon"/>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/maintenance" rel="maintenance"/>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/migrate" rel="migrate"/>
<link href="/ovirt-engine/api/vms/935bc35f-8cf8-4d42-944e-14a6ccfe1154/export" rel="export"/>
</actions>
<name>HostedEngine</name>
<description>Hosted engine VM</description>
<comment/>

Comment 5 Tomáš Golembiovský 2021-02-24 12:16:05 UTC
(In reply to Pavel Zinchuk from comment #3)
> (In reply to Michal Skrivanek from comment #1)
> > Please attach libvirt logs / journal. Apparently libvirt crashed/exited and
> > in that case we restart vdsm
> 
> Hi Michal, which libvirt log file or journal exactly do you need?
> I can't provide the whole content of /var/log/libvirt/qemu/ because this is
> a production cluster with private VM names.

We don't need the content of /var/log/libvirt/qemu at the moment. Sending the content of the journal should be good enough to give us a starting point (the libvirt logs we require are part of the journal).

Comment 6 Pavel Zinchuk 2021-02-24 14:07:58 UTC
#  journalctl -u libvirtd.service --since "2021-02-23 14:00:00"
-- Logs begin at Tue 2021-02-16 18:51:27 GMT, end at Wed 2021-02-24 14:06:21 GMT. --
Feb 23 14:35:58 ovirt-host-02.local libvirtd[2909]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:36:10 ovirt-host-02.local libvirtd[2909]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:36:11 ovirt-host-02.local libvirtd[2909]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:37:13 ovirt-host-02.local libvirtd[2909]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:37:15 ovirt-host-02.local libvirtd[2909]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:38:17 ovirt-host-02.local libvirtd[2909]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:38:23 ovirt-host-02.local libvirtd[2909]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:38:29 ovirt-host-02.local systemd[1]: libvirtd.service: Main process exited, code=killed, status=11/SEGV
Feb 23 14:38:29 ovirt-host-02.local systemd[1]: libvirtd.service: Failed with result 'signal'.
Feb 23 14:38:29 ovirt-host-02.local systemd[1]: libvirtd.service: Service RestartSec=100ms expired, scheduling restart.
Feb 23 14:38:29 ovirt-host-02.local systemd[1]: libvirtd.service: Scheduled restart job, restart counter is at 1.
Feb 23 14:38:30 ovirt-host-02.local systemd[1]: Stopped Virtualization daemon.
Feb 23 14:38:30 ovirt-host-02.local systemd[1]: Starting Virtualization daemon...
Feb 23 14:38:30 ovirt-host-02.local systemd[1]: Started Virtualization daemon.
Feb 23 14:39:56 ovirt-host-02.local libvirtd[3710529]: libvirt version: 6.6.0, package: 7.3.el8 (CBS <cbs>, 2021-01-27-15:50:28, )
Feb 23 14:39:56 ovirt-host-02.local libvirtd[3710529]: hostname: ovirt-host-02.local
Feb 23 14:39:56 ovirt-host-02.local libvirtd[3710529]: Failed to open file '/proc/3713202/stat': No such file or directory
Feb 23 14:40:12 ovirt-host-02.local libvirtd[3710529]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:40:39 ovirt-host-02.local libvirtd[3710529]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:40:52 ovirt-host-02.local libvirtd[3710529]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:41:06 ovirt-host-02.local libvirtd[3710529]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:43:26 ovirt-host-02.local libvirtd[3710529]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:43:40 ovirt-host-02.local libvirtd[3710529]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:44:07 ovirt-host-02.local libvirtd[3710529]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:44:23 ovirt-host-02.local libvirtd[3710529]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:44:36 ovirt-host-02.local libvirtd[3710529]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:44:50 ovirt-host-02.local libvirtd[3710529]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:45:10 ovirt-host-02.local libvirtd[3710529]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:45:31 ovirt-host-02.local libvirtd[3710529]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:46:28 ovirt-host-02.local libvirtd[3710529]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:48:12 ovirt-host-02.local libvirtd[3710529]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:48:19 ovirt-host-02.local libvirtd[3710529]: Guest agent is not responding: Guest agent not available for now
Feb 23 14:49:21 ovirt-host-02.local libvirtd[3710529]: Guest agent is not responding: Guest agent not available for now

Comment 7 Tomáš Golembiovský 2021-02-24 17:00:49 UTC
This confirms our suspicion that vdsm is restarted because libvirtd crashed:

> Feb 23 14:38:29 ovirt-host-02.local systemd[1]: libvirtd.service: Main process exited, code=killed, status=11/SEGV

Unfortunately it looks like the core dump was not generated. Could you please check if 'abrt-cli list' shows anything?
If yes, could you provide the backtrace from libvirtd?
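For reference, a minimal sketch of how a full backtrace can be pulled from an abrt-captured libvirtd crash on RHEL 8. The problem-directory path is a placeholder to be taken from the 'abrt-cli list' output, and the debuginfo package names are assumptions (the debug repositories must be enabled for the install to work):

# abrt-cli list
# dnf debuginfo-install -y libvirt-daemon libvirt-libs glib2
# gdb -batch -ex 'thread apply all bt full' /usr/sbin/libvirtd /var/tmp/abrt/ccpp-<DATE>-<PID>/coredump > libvirtd-backtrace.txt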

Comment 8 Pavel Zinchuk 2021-02-24 22:02:54 UTC
Yes, 'abrt-cli list' shows the following:

# abrt-cli list
id a683eea721ef2c5f010ee406f635958712817c1f
reason:         stop_all_submonitors(): monitor.py:119:stop_all_submonitors:RuntimeError: dictionary changed size during iteration
time:           Thu Jul  2 13:30:34 2020
cmdline:        /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
package:        ovirt-hosted-engine-ha-2.4.3-1.el8
uid:            36 (vdsm)
count:          32
Directory:      /var/tmp/abrt/Python3-2020-07-02-13:30:34-69333

id 0ee3f1e103a4fe31468deebe9a4526886fe62146
reason:         connect(): decoder.py:355:raw_decode:json.decoder.JSONDecodeError: Invalid control character at: line 1 column 51 (char 50)
time:           Fri Dec 11 21:21:46 2020
cmdline:        /usr/bin/python3 -m ovirt_hosted_engine_setup.vm_status --json
package:        ovirt-hosted-engine-setup-2.4.8-1.el8
uid:            0 (root)
count:          2
Directory:      /var/tmp/abrt/Python3-2020-12-11-21:21:46-10054

id 7b2a37152b2b98f7e097402ecf08c5471e0696b8
reason:         libvirtd killed by SIGSEGV
time:           Tue Feb 23 14:38:24 2021
cmdline:        /usr/sbin/libvirtd
package:        libvirt-daemon-6.6.0-7.3.el8
uid:            0 (root)
count:          1
Directory:      /var/tmp/abrt/ccpp-2021-02-23-14:38:24-2909

id 7932d2460de0726bb49bb2ea7894e3f7f0703f31
reason:         print_json(): display_status.py:133:print_json:BrokenPipeError: [Errno 32] Broken pipe
time:           Sat Nov  7 00:53:10 2020
cmdline:        /bin/python3 /bin/gstatus -a -o json
uid:            0 (root)
count:          2
Directory:      /var/tmp/abrt/Python3-2020-11-07-00:53:10-136647

id 390a6a6f038b73df29746e1266027e7da1b44b3e
reason:         XML(): ElementTree.py:1315:XML:  File "<string>", line None
time:           Tue Feb 16 18:32:49 2021
cmdline:        /bin/python3 /bin/gstatus -a -o json
uid:            0 (root)
count:          1
Directory:      /var/tmp/abrt/Python3-2021-02-16-18:32:49-1143953

id 6b067a1f3461666bb8731843339226a52ae26c1c
reason:         connect_storage_server(): broker.py:114:_handle_quit:AttributeError: 'NoneType' object has no attribute 'close_connections'
time:           Tue Jul  7 06:21:55 2020
cmdline:        /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
package:        ovirt-hosted-engine-ha-2.4.3-1.el8
uid:            36 (vdsm)
count:          34
Directory:      /var/tmp/abrt/Python3-2020-07-07-06:21:55-1458

id 55b09ddf81526217f6df5b08b9005cb05a99388a
reason:         __vdsm_json_rpc_connect(): broker.py:114:_handle_quit:AttributeError: 'NoneType' object has no attribute 'close_connections'
time:           Sun Aug  2 08:31:22 2020
cmdline:        /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
package:        ovirt-hosted-engine-ha-2.4.4-1.el8
uid:            36 (vdsm)
count:          25
Directory:      /var/tmp/abrt/Python3-2020-08-02-08:31:22-1608

id 5fa179db944148c7e360b3c2a8caf2d8f2489b02
reason:         ecp22_start.cold.10(): lldpad killed by SIGSEGV
time:           Fri Jul  3 16:55:38 2020
cmdline:        /usr/sbin/lldpad -t
package:        lldpad-1.0.1-13.git036e314.el8
uid:            0 (root)
count:          16
Directory:      /var/tmp/abrt/ccpp-2020-07-03-16:55:38-73653

id f335d16e8d567091cf0bbbad239e2b65e0bec06c
reason:         broker.py:114:_handle_quit:AttributeError: 'NoneType' object has no attribute 'close_connections'
time:           Sat Nov  7 01:51:49 2020
cmdline:        /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
uid:            36
Directory:      /var/tmp/abrt/Python3-2020-11-07-01:51:49-9138
Reported:       cannot be reported

id efb1daf0d9a333a6944cbdd4a8d79769f7da1a8c
reason:         broker.py:114:_handle_quit:AttributeError: 'NoneType' object has no attribute 'close_connections'
time:           Sat Nov  7 01:39:56 2020
cmdline:        /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
uid:            36
Directory:      /var/tmp/abrt/Python3-2020-11-07-01:39:56-10853
Reported:       cannot be reported

id 242b25636aa5c61d73a4299de965284bd2aa950f
reason:         reload_service(): peer_eventsapi.py:142:reload_service:PermissionError: [Errno 13] Permission denied
time:           Fri Aug 21 14:46:37 2020
cmdline:        /usr/libexec/platform-python /usr/libexec/glusterfs/peer_eventsapi.py node-reload
package:        glusterfs-events-7.7-1.el8
uid:            0 (root)
count:          1
Directory:      /var/tmp/abrt/Python3-2020-08-21-14:46:37-1254122

id 7315cc9665ab61d4088a900d56140200372a81f6
reason:         define_volumes(): gstatus:134:main:TypeError: catching classes that do not inherit from BaseException is not allowed
time:           Fri Aug 14 08:00:32 2020
cmdline:        /bin/python3.6 /bin/gstatus -a -o json
uid:            0 (root)
count:          1508
Directory:      /var/tmp/abrt/Python3-2020-08-14-08:00:32-50758

id 075f0debff7de3dc02f3b5bdc77d2c74c5a06956
reason:         run(): subprocess.py:438:run:subprocess.CalledProcessError: Command 'gstatus -a -o json ' returned non-zero exit status 1.
time:           Fri Aug 14 08:06:19 2020
cmdline:        /usr/bin/python3 /var/lib/zabbix/gstatus_discovery.py
uid:            0 (root)
count:          43
Directory:      /var/tmp/abrt/Python3-2020-08-14-08:06:19-55709

id 9dca7c942dedc97cf413c84eb99d409d87e6de19
reason:         update_state(): gstatus:144:main:TypeError: catching classes that do not inherit from BaseException is not allowed
time:           Wed Aug  5 01:08:22 2020
cmdline:        /bin/python3.6 /bin/gstatus -a -o json
uid:            0 (root)
count:          3
Directory:      /var/tmp/abrt/Python3-2020-08-05-01:08:22-2506497

id f3caced4d854517cf79e61a4a1092d18375d6d2b
reason:         connect(): unixrpc.py:74:connect:FileNotFoundError: [Errno 2] No such file or directory
time:           Thu Jul  2 13:12:29 2020
cmdline:        /usr/bin/python3 -m ovirt_hosted_engine_setup.reinitialize_lockspace --force
package:        ovirt-hosted-engine-setup-2.4.4-1.el8
uid:            0 (root)
count:          1
Directory:      /var/tmp/abrt/Python3-2020-07-02-13:12:29-69342

Comment 9 Pavel Zinchuk 2021-02-24 22:05:10 UTC
I can't provide the content of the directory /var/tmp/abrt/ccpp-2021-02-23-14:38:24-2909 in public.
This is a production oVirt node with private hostnames.

If you need some specific information from /var/tmp/abrt/ccpp-2021-02-23-14:38:24-2909, please specify the required file or information.

Thanks.
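For reference, if individual text files from that problem directory are shared, private hostnames can be masked first with something like the following sketch (the sed pattern is only an assumption based on the host names visible in this report):

# sed -e 's/ovirt-host-[0-9]*\.local/HOSTNAME/g' /var/tmp/abrt/ccpp-2021-02-23-14:38:24-2909/core_backtrace > core_backtrace.scrubbed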

Comment 10 Pavel Zinchuk 2021-02-25 06:45:37 UTC
During the last day, we had 3 incidents with the same issue described previously.
Each time the same error "connection to libvirt broken. ecode: 1 edom: 7 (libvirtconnection:156)" appears in the logs.
After updating to the latest oVirt version 4.4.4.7-1.el8, the product has become unstable.

Can I ask you to increase the priority of this ticket to speed up solving the issue?


vdsm.log from the latest incident:
2021-02-25 04:54:09,912+0000 INFO  (jsonrpc/0) [api.virt] FINISH getStats return={'status': {'code': 1, 'message': "Virtual machine does not exist: {'vmId': '935bc35f-8cf8-4d42-944e-14a6ccfe1154'}"}} from=::1,49970, vmId=935bc35f-8cf8-4d42-944e-14a6ccfe1154 (api:54)
2021-02-25 04:54:09,912+0000 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call VM.getStats failed (error 1) in 0.00 seconds (__init__:312)
2021-02-25 04:54:10,317+0000 INFO  (jsonrpc/7) [api.host] START getAllVmStats() from=::ffff:192.168.45..51,44854 (api:48)
2021-02-25 04:54:10,341+0000 INFO  (jsonrpc/7) [api.host] FINISH getAllVmStats return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)} from=::ffff:192.168.45..51,44854 (api:54)
2021-02-25 04:54:11,091+0000 INFO  (jsonrpc/6) [api.host] START getStats() from=::ffff:192.168.45..51,44854 (api:48)
2021-02-25 04:54:11,309+0000 INFO  (jsonrpc/6) [vdsm.api] START repoStats(domains=()) from=::ffff:192.168.45..51,44854, task_id=cba96e56-37d8-4bf4-82de-dc8b44d2baa4 (api:48)
2021-02-25 04:54:11,310+0000 INFO  (jsonrpc/6) [vdsm.api] FINISH repoStats return={'7453f6ce-d98e-4378-b424-4de420eb1d47': {'code': 0, 'lastCheck': '5.8', 'delay': '0.000927147', 'valid': True, 'version': 5, 'acquired': True, 'actual': True}, 'dc474146-14ca-4883-a042-72f8543e001c': {'code': 0, 'lastCheck': '4.6', 'delay': '0.000855514', 'valid': True, 'version': 5, 'acquired': True, 'actual': True}, 'f98a9d77-2e28-41c1-940a-da4a06ed2bab': {'code': 0, 'lastCheck': '4.6', 'delay': '0.000876128', 'valid': True, 'version': 0, 'acquired': True, 'actual': True}, '24d3498c-9ddf-4b0c-a5ab-591c90daf695': {'code': 0, 'lastCheck': '3.1', 'delay': '0.000582697', 'valid': True, 'version': 5, 'acquired': True, 'actual': True}, '35b65f82-7b9f-4304-ab09-e1ddd5d13fa3': {'code': 0, 'lastCheck': '1.7', 'delay': '0.0005198', 'valid': True, 'version': 5, 'acquired': True, 'actual': True}, '4c76052f-9281-474c-9af7-c742ae8b2592': {'code': 0, 'lastCheck': '3.6', 'delay': '0.0011351', 'valid': True, 'version': 5, 'acquired': True, 'actual': True}, '19822b30-5d5b-4b5b-a003-49a6537bde9f': {'code': 0, 'lastCheck': '2.4', 'delay': '0.000368789', 'valid': True, 'version': 5, 'acquired': True, 'actual': True}, 'de7fec7c-f7d3-4f9c-b378-0442a873fa69': {'code': 0, 'lastCheck': '1.7', 'delay': '0.000478712', 'valid': True, 'version': 5, 'acquired': True, 'actual': True}} from=::ffff:192.168.45..51,44854, task_id=cba96e56-37d8-4bf4-82de-dc8b44d2baa4 (api:54)
2021-02-25 04:54:11,310+0000 INFO  (jsonrpc/6) [vdsm.api] START multipath_health() from=::ffff:192.168.45..51,44854, task_id=dac513a3-c3b5-4c7f-bca7-630bc1f9347c (api:48)
2021-02-25 04:54:11,310+0000 INFO  (jsonrpc/6) [vdsm.api] FINISH multipath_health return={} from=::ffff:192.168.45..51,44854, task_id=dac513a3-c3b5-4c7f-bca7-630bc1f9347c (api:54)
2021-02-25 04:54:11,357+0000 INFO  (jsonrpc/6) [api.host] FINISH getStats return={'status': {'code': 0, 'message': 'Done'}, 'info': (suppressed)} from=::ffff:192.168.45..51,44854 (api:54)
2021-02-25 04:54:13,109+0000 INFO  (jsonrpc/4) [vdsm.api] START repoStats(domains=['7453f6ce-d98e-4378-b424-4de420eb1d47']) from=::1,49970, task_id=e345ec5e-4ec1-4887-a6f8-37f9d2efcb4e (api:48)
2021-02-25 04:54:13,110+0000 INFO  (jsonrpc/4) [vdsm.api] FINISH repoStats return={'7453f6ce-d98e-4378-b424-4de420eb1d47': {'code': 0, 'lastCheck': '1.6', 'delay': '0.000927147', 'valid': True, 'version': 5, 'acquired': True, 'actual': True}} from=::1,49970, task_id=e345ec5e-4ec1-4887-a6f8-37f9d2efcb4e (api:54)
2021-02-25 04:54:15,732+0000 WARN  (qgapoller/2) [root] connection to libvirt broken. ecode: 1 edom: 7 (libvirtconnection:156)
2021-02-25 04:54:15,733+0000 CRIT  (qgapoller/2) [root] taking calling process down. (libvirtconnection:158)
2021-02-25 04:54:15,734+0000 WARN  (qgapoller/2) [root] connection to libvirt broken. ecode: 1 edom: 7 (libvirtconnection:156)
2021-02-25 04:54:15,734+0000 CRIT  (qgapoller/2) [root] taking calling process down. (libvirtconnection:158)
2021-02-25 04:54:15,734+0000 INFO  (qgapoller/2) [vds] Failed to get guest info for vm=648b771d-ef30-4fdf-a57e-aec187445eeb, error: internal error: client socket is closed (qemuguestagent:414)
2021-02-25 04:54:15,734+0000 WARN  (qgapoller/2) [root] connection to libvirt broken. ecode: 1 edom: 7 (libvirtconnection:156)
2021-02-25 04:54:15,734+0000 CRIT  (qgapoller/2) [root] taking calling process down. (libvirtconnection:158)
2021-02-25 04:54:15,734+0000 ERROR (qgapoller/2) [virt.periodic.Operation] <bound method QemuGuestAgentPoller._poller of <vdsm.virt.qemuguestagent.QemuGuestAgentPoller object at 0x7ff9581bbb70>> operation failed (periodic:188)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/periodic.py", line 186, in __call__
    self._func()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/qemuguestagent.py", line 346, in _poller
    if not self._runnable_on_vm(vm_obj):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/qemuguestagent.py", line 547, in _runnable_on_vm
    if not vm.isDomainRunning():
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 2426, in isDomainRunning
    status = self._dom.info()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 1518, in info
    if ret is None: raise libvirtError ('virDomainGetInfo() failed', dom=self)
libvirt.libvirtError: internal error: client socket is closed
2021-02-25 04:54:15,741+0000 INFO  (MainThread) [vds] Received signal 15, shutting down (vdsmd:73)
2021-02-25 04:54:15,742+0000 INFO  (MainThread) [root] Stopping DHCP monitor. (dhcp_monitor:106)
2021-02-25 04:54:15,842+0000 INFO  (MainThread) [root] Stopping Bond monitor. (bond_monitor:53)
2021-02-25 04:54:15,843+0000 INFO  (MainThread) [health] Stopping health monitor (health:71)
2021-02-25 04:54:15,853+0000 INFO  (MainThread) [vds] Stopping http server (http:85)
2021-02-25 04:54:15,853+0000 INFO  (http) [vds] Server stopped (http:75)
2021-02-25 04:54:15,853+0000 INFO  (MainThread) [jsonrpc.JsonRpcServer] Stopping JsonRPC Server (__init__:446)
2021-02-25 04:54:16,068+0000 INFO  (MainThread) [root] Unregistering all secrets (secret:94)
2021-02-25 04:54:16,068+0000 WARN  (MainThread) [root] connection to libvirt broken. ecode: 1 edom: 7 (libvirtconnection:156)
2021-02-25 04:54:16,068+0000 CRIT  (MainThread) [root] taking calling process down. (libvirtconnection:158)
2021-02-25 04:54:16,068+0000 INFO  (MainThread) [vds] Received signal 15, shutting down (vdsmd:73)
2021-02-25 04:54:16,068+0000 ERROR (MainThread) [vds] Exception raised (vdsmd:164)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/vdsmd.py", line 162, in run
    serve_clients(log)
  File "/usr/lib/python3.6/site-packages/vdsm/vdsmd.py", line 129, in serve_clients
    cif.prepareForShutdown()
  File "/usr/lib/python3.6/site-packages/vdsm/clientIF.py", line 391, in prepareForShutdown
    secret.clear()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/secret.py", line 96, in clear
    for virsecret in con.listAllSecrets():
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 5692, in listAllSecrets
    raise libvirtError("virConnectListAllSecrets() failed", conn=self)
libvirt.libvirtError: internal error: client socket is closed
2021-02-25 04:54:16,070+0000 INFO  (MainThread) [vds] Stopping threads (vdsmd:166)
2021-02-25 04:54:16,070+0000 INFO  (MainThread) [vds] Exiting (vdsmd:177)
2021-02-25 04:55:28,038+0000 INFO  (MainThread) [vds] (PID: 3511517) I am the actual vdsm 4.40.40.1 ovirt-host-05.local (4.18.0-240.10.1.el8_3.x86_64) (vdsmd:155)
2021-02-25 04:55:28,039+0000 INFO  (MainThread) [vds] VDSM will run with cpu affinity: frozenset({1}) (vdsmd:274)
2021-02-25 04:55:28,049+0000 INFO  (MainThread) [storage.HSM] START HSM init (hsm:376)
2021-02-25 04:55:28,052+0000 INFO  (MainThread) [storage.HSM] Creating data-center mount directory '/rhev/data-center/mnt' (hsm:381)
2021-02-25 04:55:28,053+0000 INFO  (MainThread) [storage.fileUtils] Creating directory: /rhev/data-center/mnt mode: None (fileUtils:201)
2021-02-25 04:55:28,408+0000 INFO  (MainThread) [storage.HSM] Unlinking file '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af/19822b30-5d5b-4b5b-a003-49a6537bde9f' (hsm:532)
2021-02-25 04:55:28,409+0000 INFO  (MainThread) [storage.HSM] Unlinking file '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af/24d3498c-9ddf-4b0c-a5ab-591c90daf695' (hsm:532)
2021-02-25 04:55:28,409+0000 INFO  (MainThread) [storage.HSM] Unlinking file '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af/35b65f82-7b9f-4304-ab09-e1ddd5d13fa3' (hsm:532)
2021-02-25 04:55:28,409+0000 INFO  (MainThread) [storage.HSM] Unlinking file '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af/de7fec7c-f7d3-4f9c-b378-0442a873fa69' (hsm:532)
2021-02-25 04:55:28,410+0000 INFO  (MainThread) [storage.HSM] Unlinking file '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af/4c76052f-9281-474c-9af7-c742ae8b2592' (hsm:532)
2021-02-25 04:55:28,410+0000 INFO  (MainThread) [storage.HSM] Unlinking file '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af/dc474146-14ca-4883-a042-72f8543e001c' (hsm:532)
2021-02-25 04:55:28,410+0000 INFO  (MainThread) [storage.HSM] Unlinking file '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af/f98a9d77-2e28-41c1-940a-da4a06ed2bab' (hsm:532)
2021-02-25 04:55:28,410+0000 INFO  (MainThread) [storage.HSM] Unlinking file '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af/7453f6ce-d98e-4378-b424-4de420eb1d47' (hsm:532)
2021-02-25 04:55:28,410+0000 INFO  (MainThread) [storage.HSM] Unlinking file '/rhev/data-center/9eaab007-b755-4563-93a0-5776944327af/mastersd' (hsm:532)
2021-02-25 04:55:28,412+0000 INFO  (MainThread) [storage.check] Starting check service (check:91)
2021-02-25 04:55:28,412+0000 INFO  (hsm/init) [storage.StorageDomainCache] Refreshing storage domain cache (resize=True) (sdc:80)
2021-02-25 04:55:28,413+0000 INFO  (hsm/init) [storage.ISCSI] Scanning iSCSI devices (iscsi:442)
2021-02-25 04:55:28,419+0000 INFO  (MainThread) [storage.Dispatcher] Starting StorageDispatcher... (dispatcher:47)
2021-02-25 04:55:28,421+0000 INFO  (check/loop) [storage.asyncevent] Starting <EventLoop running=True closed=False at 0x139714285661264> (asyncevent:126)
2021-02-25 04:55:28,469+0000 INFO  (MainThread) [vdsm.api] START registerDomainStateChangeCallback(callbackFunc=functools.partial(<function clientIF.contEIOVms at 0x7f11a05efb70>, <weakproxy at 0x7f11c057fd68 to clientIF at 0x7f11c0593748>)) from=internal, task_id=eb2b388e-d432-45eb-97e6-5a69ded28db9 (api:48)
2021-02-25 04:55:28,470+0000 INFO  (MainThread) [vdsm.api] FINISH registerDomainStateChangeCallback return=None from=internal, task_id=eb2b388e-d432-45eb-97e6-5a69ded28db9 (api:54)
2021-02-25 04:55:28,470+0000 INFO  (MainThread) [vds] Using libvirt for querying QEMU-GA (qemuguestagent:162)
2021-02-25 04:55:28,471+0000 INFO  (MainThread) [MOM] Preparing MOM interface (momIF:85)
2021-02-25 04:55:28,471+0000 INFO  (MainThread) [MOM] MOM: Using named unix socket: /run/vdsm/mom-vdsm.sock (momIF:94)
2021-02-25 04:55:28,471+0000 INFO  (MainThread) [root] Unregistering all secrets (secret:94)
2021-02-25 04:55:28,491+0000 INFO  (MainThread) [vds] Setting channels' timeout to 30 seconds. (vmchannels:224)
2021-02-25 04:55:28,491+0000 INFO  (MainThread) [vds] Starting QEMU-GA poller (qemuguestagent:176)
2021-02-25 04:55:28,493+0000 INFO  (MainThread) [vds.MultiProtocolAcceptor] Listening at :::54321 (protocoldetector:183)
2021-02-25 04:55:28,547+0000 INFO  (http) [vds] Server running (http:64)




# abrt-cli list
id 745c9c5f0430e3b6df6a2f14e6832bf96c926cc0
reason:         g_main_context_check(): libvirtd killed by SIGSEGV
time:           Thu Feb 25 04:54:11 2021
cmdline:        /usr/sbin/libvirtd
package:        libvirt-daemon-6.6.0-7.3.el8
uid:            0 (root)
count:          1
Directory:      /var/tmp/abrt/ccpp-2021-02-25-04:54:11-3986073

id 26740be4fd85d4c55dc0e001b3505ece4ca72f46
reason:         stop_all_submonitors(): monitor.py:119:stop_all_submonitors:RuntimeError: dictionary changed size during iteration
time:           Fri Aug 21 07:29:05 2020
cmdline:        /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
package:        ovirt-hosted-engine-ha-2.4.4-1.el8
uid:            36 (vdsm)
count:          8
Directory:      /var/tmp/abrt/Python3-2020-08-21-07:29:05-3019

id 5c16b5b21a0c603453a3006c874b6ad30d9b5685
reason:         connect_storage_server(): broker.py:114:_handle_quit:AttributeError: 'NoneType' object has no attribute 'close_connections'
time:           Thu Aug 20 08:49:53 2020
cmdline:        /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
package:        ovirt-hosted-engine-ha-2.4.4-1.el8
uid:            36 (vdsm)
count:          129
Directory:      /var/tmp/abrt/Python3-2020-08-20-08:49:53-31179

id f4c5ce95d41d55b15df8f4f5ad6b16244e374792
reason:         __vdsm_json_rpc_connect(): broker.py:114:_handle_quit:AttributeError: 'NoneType' object has no attribute 'close_connections'
time:           Thu Aug 20 09:59:33 2020
cmdline:        /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
package:        ovirt-hosted-engine-ha-2.4.4-1.el8
uid:            36 (vdsm)
count:          147
Directory:      /var/tmp/abrt/Python3-2020-08-20-09:59:33-12749

id 61417beee541dcb3f8df8b1f73b022c43408e8cb
reason:         get_ovf_from_dir(): query_ova.py:47:get_ovf_from_dir:FileNotFoundError: [Errno 2] No such file or directory: '/tmp/*.ova'
time:           Thu Sep  3 16:41:17 2020
cmdline:        /usr/bin/python3 /root/.ansible/tmp/ansible-tmp-1599151276.8783715-2813736-253499112333134/query_ova.py vm /tmp/*.ova True
uid:            0 (root)
count:          1
Directory:      /var/tmp/abrt/Python3-2020-09-03-16:41:17-3496217

id 93b942fdd204e3e5acbe21051b7f06a0b88abdbf
reason:         get_ovf_from_dir(): query_ova.py:47:get_ovf_from_dir:FileNotFoundError: [Errno 2] No such file or directory: 'll /tmp/vcp_20.2R1.10.ova'
time:           Thu Sep  3 11:35:32 2020
cmdline:        /usr/bin/python3 /root/.ansible/tmp/ansible-tmp-1599132932.4139793-2779162-216986455300300/query_ova.py vm 'll /tmp/vcp_20.2R1.10.ova' True
uid:            0 (root)
count:          1
Directory:      /var/tmp/abrt/Python3-2020-09-03-11:35:32-3230539

id 979832e576b63a9e36febfe3ec27e65b208e920a
reason:         execute_in_peers(): cliutils.py:134:execute_in_peers:gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Commit failed on prd-ovirt-host-07.estore.rot01.kwebbl.cloud. Error: Unable to end. Error : Success\nCommit failed on prd-ovirt-host-03.estore.rot01.kwebbl.cloud. Error: Unable to end. Error : Success\nCommit failed on prd-ovirt-host-06.estore.rot01.kwebbl.cloud. Error: Unable to end. Error : Success\nCommit failed on prd-ovirt-host-02.estore.rot01.kwebbl.cloud. Error: Unable to end. Error : Success\nCommit failed on prd-ovirt-host-01.estore.rot01.kwebbl.cloud. Error: Unable to end. Error : Success\n', 'gluster system:: execute eventsapi.py node-reload')
time:           Fri Aug 21 14:46:37 2020
cmdline:        /usr/libexec/platform-python /sbin/gluster-eventsapi webhook-add http://prd-ovirt-engine-01.live.rot01.kwebbl.cloud:80/ovirt-engine/services/glusterevents
package:        glusterfs-events-7.7-1.el8
uid:            0 (root)
count:          1
Directory:      /var/tmp/abrt/Python3-2020-08-21-14:46:37-113908

id 16d49298f5f821ba48d1bf66072e416077903304
reason:         glusterd_friend_rpc_create(): glusterfsd killed by SIGSEGV
time:           Fri Aug 21 11:22:21 2020
cmdline:        /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
package:        glusterfs-fuse-7.7-1.el8
uid:            0 (root)
count:          1
Directory:      /var/tmp/abrt/ccpp-2020-08-21-11:22:21-1802

id b2bc77f543e5fd6d5630ad787fc18c05317b86b8
reason:         ecp22_start.cold.10(): lldpad killed by SIGSEGV
time:           Fri Aug 21 09:04:03 2020
cmdline:        /usr/sbin/lldpad -t
package:        lldpad-1.0.1-13.git036e314.el8
uid:            0 (root)
count:          1
Directory:      /var/tmp/abrt/ccpp-2020-08-21-09:04:03-12794

id f6e48ce88537d1b60b0729394d5d5b7d9dfb0471
reason:         broker.py:114:_handle_quit:AttributeError: 'NoneType' object has no attribute 'close_connections'
time:           Fri Aug 21 08:33:40 2020
cmdline:        /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
uid:            36
Directory:      /var/tmp/abrt/Python3-2020-08-21-08:33:40-7300
Reported:       cannot be reported

id 6309a503b0d6e015d2fbc05ea9f4c499ff0f9f19
reason:         _call(): broker.py:114:_handle_quit:AttributeError: 'NoneType' object has no attribute 'close_connections'
time:           Thu Aug 20 09:42:21 2020
cmdline:        /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
package:        ovirt-hosted-engine-ha-2.4.4-1.el8
uid:            36 (vdsm)
count:          11
Directory:      /var/tmp/abrt/Python3-2020-08-20-09:42:21-7101

id de46d38de6f21dfdfe55e34be3a44876c0494e3d
reason:         _handle_quit(): broker.py:114:_handle_quit:AttributeError: 'NoneType' object has no attribute 'close_connections'
time:           Thu Aug 20 09:09:06 2020
cmdline:        /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
package:        ovirt-hosted-engine-ha-2.4.4-1.el8
uid:            36 (vdsm)
count:          3
Directory:      /var/tmp/abrt/Python3-2020-08-20-09:09:06-19319

Comment 11 Tomáš Golembiovský 2021-02-25 15:45:15 UTC
(In reply to Pavel Zinchuk from comment #9)
> I can't provide the content of the directory
> /var/tmp/abrt/ccpp-2021-02-23-14:38:24-2909 in public.
> This is a production oVirt node with private hostnames.

That's understandable.

> 
> If you need some specific information from
> /var/tmp/abrt/ccpp-2021-02-23-14:38:24-2909, please specify the required
> file or information.

Yes, having the coredump would be best, but that file will be huge. So for now just files 'core_backtrace' and 'maps'. Those two should not contain any sensitive information, but feel free to send me the files by email if you are concerned.

Also please report what versions of libvirt RPMs are installed in your node.

Comment 12 Pavel Zinchuk 2021-02-25 16:16:21 UTC
Created attachment 1759339 [details]
maps file from issue with date 2021-02-23 14:38

Comment 13 Pavel Zinchuk 2021-02-25 16:16:55 UTC
Created attachment 1759340 [details]
core_backtrace file from issue with date 2021-02-23 14:38

Comment 14 Pavel Zinchuk 2021-02-25 16:18:02 UTC
(In reply to Tomáš Golembiovský from comment #11)
> (In reply to Pavel Zinchuk from comment #9)
> > Content of directory /var/tmp/abrt/ccpp-2021-02-23-14:38:24-2909 I can't
> > provide in public.
> > This is production ovirt node with private hostnames.
> 
> That's understandable.
> 
> > 
> > If you need some specific information from
> > /var/tmp/abrt/ccpp-2021-02-23-14:38:24-290 please specify required file or
> > information.
> 
> Yes, having the coredump would be best, but that file will be huge. So for
> now just the files 'core_backtrace' and 'maps'. Those two should not contain
> any sensitive information, but feel free to send me the files by email if
> you are concerned.
> 
> Also, please report which versions of the libvirt RPMs are installed on your node.

I have attached the required files.

List of installed libvirt packages:
# rpm -qa | grep libvirt
libvirt-admin-6.6.0-7.3.el8.x86_64
libvirt-daemon-config-nwfilter-6.6.0-7.3.el8.x86_64
libvirt-daemon-driver-storage-scsi-6.6.0-7.3.el8.x86_64
libvirt-bash-completion-6.6.0-7.3.el8.x86_64
libvirt-daemon-driver-nodedev-6.6.0-7.3.el8.x86_64
libvirt-daemon-driver-storage-iscsi-6.6.0-7.3.el8.x86_64
libvirt-daemon-driver-secret-6.6.0-7.3.el8.x86_64
libvirt-daemon-kvm-6.6.0-7.3.el8.x86_64
libvirt-daemon-driver-storage-mpath-6.6.0-7.3.el8.x86_64
libvirt-daemon-driver-network-6.6.0-7.3.el8.x86_64
libvirt-libs-6.6.0-7.3.el8.x86_64
libvirt-daemon-6.6.0-7.3.el8.x86_64
libvirt-daemon-driver-storage-logical-6.6.0-7.3.el8.x86_64
libvirt-daemon-driver-storage-6.6.0-7.3.el8.x86_64
libvirt-daemon-driver-qemu-6.6.0-7.3.el8.x86_64
libvirt-daemon-driver-storage-gluster-6.6.0-7.3.el8.x86_64
libvirt-daemon-driver-storage-iscsi-direct-6.6.0-7.3.el8.x86_64
libvirt-lock-sanlock-6.6.0-7.3.el8.x86_64
python3-libvirt-6.6.0-1.el8.x86_64
libvirt-daemon-driver-nwfilter-6.6.0-7.3.el8.x86_64
libvirt-daemon-driver-storage-rbd-6.6.0-7.3.el8.x86_64
libvirt-daemon-config-network-6.6.0-7.3.el8.x86_64
libvirt-client-6.6.0-7.3.el8.x86_64
libvirt-daemon-driver-storage-core-6.6.0-7.3.el8.x86_64
libvirt-daemon-driver-storage-disk-6.6.0-7.3.el8.x86_64
libvirt-daemon-driver-interface-6.6.0-7.3.el8.x86_64

Comment 15 Tomáš Golembiovský 2021-02-27 14:58:49 UTC
This is rather bad. The stack of the crashed thread is smashed (the other threads are fine, waiting in poll or pthread_cond_*wait). That is unfortunate, because one first needs to figure out what exactly leads to the crash (and how to reproduce it) before anyone can do anything about the problem. You will need to enable debug logging in libvirtd on your machine; hopefully that will give us enough clues as to what is happening.
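Roughly, that means something like the following (a sketch; the filter set below is the one generally recommended for debugging and can be trimmed, and the log file can grow quickly, so watch the free space in /var/log):

  # /etc/libvirt/libvirtd.conf -- enable verbose logging to a file
  log_filters="3:remote 4:event 3:util.json 3:rpc 1:*"
  log_outputs="1:file:/var/log/libvirt/libvirtd.log"

  # restart the daemon so the new settings take effect
  systemctl restart libvirtd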

Comment 16 Pavel Zinchuk 2021-03-02 14:49:25 UTC
I've enabled libvirt debug logging on one production oVirt node.
Now waiting for the issue to occur again on this node.
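For reference, roughly along these lines (a sketch; the runtime virt-admin approach avoids restarting the daemon, but does not persist across restarts):

  # query the current settings
  virt-admin daemon-log-filters
  virt-admin daemon-log-outputs

  # switch on debug logging at runtime
  virt-admin daemon-log-outputs "1:file:/var/log/libvirt/libvirtd.log"
  virt-admin daemon-log-filters "3:remote 4:event 3:util.json 3:rpc 1:*"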

Comment 18 Peter Krempa 2021-03-08 12:47:06 UTC
Since libvirtd seems to have crashed, please also attach the backtrace from abrt.

Comment 19 Peter Krempa 2021-03-08 12:50:12 UTC
Umm, never mind, I have looked into the attachments now:


      , {   "frames":
              [ {   "address": 140074265750689
                ,   "build_id": "ade58d86662aceee2210a9ef12018705e978965d"
                ,   "build_id_offset": 990369
                ,   "function_name": "__poll"
                ,   "file_name": "/lib64/libc.so.6"
                }
              , {   "address": 140074277673398
                ,   "build_id": "58f989a97b7d84e636e5885db27a42281581baef"
                ,   "build_id_offset": 317878
                ,   "function_name": "g_main_context_iterate.isra.21"
                ,   "file_name": "/lib64/libglib-2.0.so.0"
                }
              , {   "address": 140074277674354
                ,   "build_id": "58f989a97b7d84e636e5885db27a42281581baef"
                ,   "build_id_offset": 318834
                ,   "function_name": "g_main_loop_run"
                ,   "file_name": "/lib64/libglib-2.0.so.0"
                }
              , {   "address": 140074331771774
                ,   "build_id": "3032f94742c49f5235533c10277403b2009a9a97"
                ,   "build_id_offset": 881534
                ,   "function_name": "virEventThreadWorker"
                ,   "file_name": "/lib64/libvirt.so.0"
                }
              , {   "address": 140074277838154
                ,   "build_id": "58f989a97b7d84e636e5885db27a42281581baef"
                ,   "build_id_offset": 482634
                ,   "function_name": "g_thread_proxy"
                ,   "file_name": "/lib64/libglib-2.0.so.0"
                }
              , {   "address": 140074273046858
                ,   "build_id": "634e9ea2594f8c82ddde20ab179bda7daee7e18f"
                ,   "build_id_offset": 33098
                ,   "function_name": "start_thread"
                ,   "file_name": "/lib64/libpthread.so.0"
                }
              , {   "address": 140074265796387
                ,   "build_id": "ade58d86662aceee2210a9ef12018705e978965d"
                ,   "build_id_offset": 1036067
                ,   "function_name": "__clone"
                ,   "file_name": "/lib64/libc.so.6"
                } ]
        } ]
}

This looks like the same problem as https://bugzilla.redhat.com/show_bug.cgi?id=1894045

Comment 20 Peter Krempa 2021-03-08 12:57:40 UTC
Umm, never mind again: that thread is okay; indeed, the crashed thread seems to have its stack corrupted. There is one more thread waiting for an agent response and all the other threads are idle, so we will indeed need to look into the logs to see what was happening.

Comment 21 Yash Mankad 2021-03-09 15:40:54 UTC
@pkrempa - looks like you agree that further investigation is required. Can I assign the bug to you?

Comment 22 Pavel Zinchuk 2021-03-10 10:47:54 UTC
(In reply to Peter Krempa from comment #18)
> Also since libvirtd seems to have crashed, please also attach the backtrace
> from abrt.

Hi, we still hit the issue on random servers. We can't enable debug logging on all servers, so we are waiting for the issue to occur again on the servers that have libvirtd debug logging enabled.

Peter, do you still need the backtrace from abrt? If so, please tell me how to get this information.

Comment 23 Peter Krempa 2021-03-10 12:29:41 UTC
Based on the backtrace, and more closely on other race-condition-induced crashes, I'm fairly certain that the root cause of this bug is the same as in https://bugzilla.redhat.com/show_bug.cgi?id=1931331 . The only difference is that in this instance the crash happened inside a guest-agent call, whereas in 1931331 it happened in a monitor command. Since both share the same backend code for accessing the monitor socket and are fixed by the same commit, we can assume that the same thing can happen here too.

The discrepancy in the ABRT backtrace not showing the crashed thread is probably a fluke. 

The backtrace of the crashed thread should look like:

Thread 1 (Thread 0x7f6576ffd700 (LWP 31672)):
#0  0x00007f66500000f0 in ?? ()
#1  0x00007f6677c9e7bd in g_source_unref_internal () from /lib64/libglib-2.0.so.0
#2  0x00007f6677c9e90e in g_source_iter_next () from /lib64/libglib-2.0.so.0
#3  0x00007f6677ca0e23 in g_main_context_prepare () from /lib64/libglib-2.0.so.0
#4  0x00007f6677ca18eb in g_main_context_iterate.isra () from /lib64/libglib-2.0.so.0
#5  0x00007f6677ca1d72 in g_main_loop_run () from /lib64/libglib-2.0.so.0
#6  0x00007f667b03937e in virEventThreadWorker (opaque=0x7f6650269980) at ../../src/util/vireventthread.c:120
#7  0x00007f6677cc9d4a in g_thread_proxy () from /lib64/libglib-2.0.so.0
#8  0x00007f667783814a in start_thread () from /lib64/libpthread.so.0
#9  0x00007f667714df23 in clone () from /lib64/libc.so.6

For reference, the waiting thread from the other bug:
 
Thread 2 (Thread 0x7f666f413700 (LWP 2838)):
#0  0x00007f667783e2fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f667b093aea in virCondWait (c=<optimized out>, m=<optimized out>) at ../../src/util/virthread.c:148
#2  0x00007f6631402104 in qemuMonitorSend () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#3  0x00007f6631411ca5 in qemuMonitorJSONCommandWithFd () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#4  0x00007f6631414813 in qemuMonitorJSONGetMemoryStats () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#5  0x00007f663142a79a in qemuDomainMemoryStatsInternal () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#6  0x00007f663142a8f4 in qemuDomainGetStatsBalloon () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
[truncated]

As such, I'll close this as a duplicate of 1931331 for now.

If you happen to collect a better backtrace and it would contradict my assumption, please reopen this bug.

Another possibility for producing a better coredump of the existing crash would be to use 'coredumpctl dump /usr/sbin/libvirtd > dump', but I'm not sure whether that will give a better result. If you happen to encounter another crash, its backtrace may also turn out to be better.
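A sketch of what that could look like, assuming systemd-coredump still has the core and the libvirt debuginfo packages can be installed:

  # extract the most recent libvirtd core to a file
  coredumpctl dump /usr/sbin/libvirtd --output=core.libvirtd

  # open it in gdb and capture backtraces of all threads
  gdb /usr/sbin/libvirtd core.libvirtd
  (gdb) thread apply all bt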

*** This bug has been marked as a duplicate of bug 1931331 ***

