Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1777485

Summary: oVirt Node 4.3.7 is unusable (multiple problems)
Product: [oVirt] ovirt-node
Reporter: Szymon Madej <szmadej>
Component: Installation & Update
Assignee: Yuval Turgeman <yturgema>
Status: CLOSED NOTABUG
QA Contact: peyu
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.3
CC: bugs, cshao, lsvaty, mavital, nlevy, peyu, qiyuan, sbonazzo, shlei, weiwang, yaniwang, yturgema
Target Milestone: ---
Flags: cshao: testing_ack?
Target Release: ---
Hardware: x86_64
OS: All
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-29 10:49:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Szymon Madej 2019-11-27 16:09:31 UTC
Description of problem:

oVirt Node 4.3.7 is unusable because of multiple problems in low-level system libraries.

Version-Release number of selected component (if applicable): 4.3.7

How reproducible: Always

Steps to Reproduce:
1. Install 4.3.7 (yum makecache; yum update)
2. Change default layer to ovirt-node-ng-4.3.7-0.20191121.0+1
3. Reboot 

Actual results:

Node is unusable. Multiple commands crash, e.g.:

1. firewalld - undefined symbol

# firewall-cmd --get-default-zone
Traceback (most recent call last):
  File "/usr/bin/firewall-cmd", line 24, in <module>
    from gi.repository import GObject
  File "/usr/lib64/python2.7/site-packages/gi/__init__.py", line 37, in <module>
    from . import _gi
ImportError: /lib64/libgio-2.0.so.0: undefined symbol: g_option_group_unref
#
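The ImportError above means the installed libglib-2.0 does not export a symbol that libgio-2.0 was built against, i.e. the image ships mismatched glib2 libraries. A minimal, self-contained sketch of how such a check can be done from Python (demonstrated against libc so it runs anywhere; on the affected node one would point it at /lib64/libglib-2.0.so.0 and the symbol g_option_group_unref):

```python
import ctypes
import ctypes.util

def exports_symbol(library, symbol):
    """Return True if the shared library exports the given dynamic symbol.

    ctypes resolves the name with dlsym(), which is the same lookup the
    dynamic loader performs when it reports "undefined symbol" at run time.
    """
    lib = ctypes.CDLL(library)
    return hasattr(lib, symbol)

# Demonstrated against libc so the sketch is self-contained.
libc = ctypes.util.find_library("c")
print(exports_symbol(libc, "printf"))              # exported by libc
print(exports_symbol(libc, "no_such_symbol_xyz"))  # not exported
```

On the node itself, `nm -D` or `objdump -T` on the library would give the same answer without Python.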


2. yum - SEGFAULT

# yum clean all
Loaded plugins: enabled_repos_upload, fastestmirror, imgbased-persist, package_upload, product-id, search-disabled-repos, subscription-manager, vdsmupgrade, versionlock
This system is not registered with an entitlement server. You can use subscription-manager to register.
Cleaning repos: centos-sclo-rh-release ovirt-4.3 ovirt-4.3-centos-gluster5 ovirt-4.3-centos-opstools ovirt-4.3-centos-ovirt43 ovirt-4.3-centos-qemu-ev ovirt-4.3-epel ovirt-4.3-virtio-win-latest sac-gluster-ansible
Other repos take up 2.2 M of disk space (use --verbose for details)
Uploading Enabled Repositories Report
Cannot upload enabled repos report, is this client registered?
Segmentation fault (core dumped)
#


3. Node activation - undefined symbol

When trying to activate the Node, it falls into the NonOperational state. NFS Storage Domains can't be activated. In /var/log/vdsm/vdsm.log I see tracebacks like these:

2019-11-27 16:09:05,979+0100 INFO  (ioprocess/6153) [IOProcess] (7f0b6cf7-2d64-4da9-8d29-58a6a2ed4e6c) Starting ioprocess (__init__:434)
2019-11-27 16:09:05,987+0100 INFO  (ioprocess/6159) [IOProcess] (8da37bcb-8e65-413c-8ee9-f81d16e82af8) Starting ioprocess (__init__:434)
2019-11-27 16:09:05,987+0100 WARN  (ioprocess/6153) [IOProcessClient] (7f0b6cf7-2d64-4da9-8d29-58a6a2ed4e6c) Invalid log message u'/usr/libexec/ioprocess: symbol lookup error: /usr/libexec/ioprocess: undefined symbol: g_uuid_string_random\n' (__init__:424)
2019-11-27 16:09:05,988+0100 ERROR (ioprocess/6153) [IOProcessClient] (7f0b6cf7-2d64-4da9-8d29-58a6a2ed4e6c) Communication thread failed (__init__:160)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 114, in _communicate
    raise Exception("FD closed")
Exception: FD closed
2019-11-27 16:09:05,989+0100 ERROR (monitor/7f0b6cf) [storage.Monitor] Setting up monitor for 7f0b6cf7-2d64-4da9-8d29-58a6a2ed4e6c failed (monitor:330)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 327, in _setupLoop
    self._setupMonitor()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 349, in _setupMonitor
    self._produceDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 159, in wrapper
    value = meth(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 367, in _produceDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
    domain.getRealDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py", line 145, in findDomain
    return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 378, in __init__
    manifest.sdUUID, manifest.mountpoint)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 853, in _detect_block_size
    block_size = iop.probe_block_size(mountpoint)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/outOfProcess.py", line 384, in probe_block_size
    return self._ioproc.probe_block_size(dir_path)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 602, in probe_block_size
    "probe_block_size", {"dir": dir_path}, self.timeout)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 448, in _sendCommand
    raise OSError(errcode, errstr)
OSError: [Errno 100001] ioprocess crashed unexpectedly
2019-11-27 16:09:05,990+0100 INFO  (monitor/7f0b6cf) [storage.Monitor] Domain 7f0b6cf7-2d64-4da9-8d29-58a6a2ed4e6c became INVALID (monitor:470)
2019-11-27 16:09:06,001+0100 WARN  (ioprocess/6159) [IOProcessClient] (8da37bcb-8e65-413c-8ee9-f81d16e82af8) Invalid log message u'/usr/libexec/ioprocess: symbol lookup error: /usr/libexec/ioprocess: undefined symbol: g_uuid_string_random\n' (__init__:424)
2019-11-27 16:09:06,002+0100 ERROR (ioprocess/6159) [IOProcessClient] (8da37bcb-8e65-413c-8ee9-f81d16e82af8) Communication thread failed (__init__:160)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 114, in _communicate
    raise Exception("FD closed")
Exception: FD closed
2019-11-27 16:09:06,002+0100 ERROR (monitor/8da37bc) [storage.Monitor] Setting up monitor for 8da37bcb-8e65-413c-8ee9-f81d16e82af8 failed (monitor:330)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 327, in _setupLoop
    self._setupMonitor()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 349, in _setupMonitor
    self._produceDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 159, in wrapper
    value = meth(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 367, in _produceDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
    domain.getRealDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py", line 145, in findDomain
    return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 378, in __init__
    manifest.sdUUID, manifest.mountpoint)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 853, in _detect_block_size
    block_size = iop.probe_block_size(mountpoint)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/outOfProcess.py", line 384, in probe_block_size
    return self._ioproc.probe_block_size(dir_path)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 602, in probe_block_size
    "probe_block_size", {"dir": dir_path}, self.timeout)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 448, in _sendCommand
    raise OSError(errcode, errstr)
OSError: [Errno 100001] ioprocess crashed unexpectedly
2019-11-27 16:09:06,003+0100 INFO  (monitor/8da37bc) [storage.Monitor] Domain 8da37bcb-8e65-413c-8ee9-f81d16e82af8 became INVALID (monitor:470)
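The "FD closed" exceptions in the log above are a downstream effect: the ioprocess helper dies at startup on the symbol lookup error, its side of the pipe closes, and the client's communication thread reads EOF. A minimal sketch of that failure mode (a hypothetical stand-in, not the real ioprocess client; the helper's crash is simulated with an immediate exit):

```python
import subprocess
import sys

# Start a "helper" that dies immediately -- standing in for the
# ioprocess binary aborting on the symbol lookup error.
helper = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.exit(1)"],
    stdout=subprocess.PIPE,
)

# The helper never writes anything, so the parent's read returns EOF
# (an empty bytes object) as soon as the pipe's write end closes.
data = helper.stdout.read()
helper.wait()

# A client layer typically surfaces this EOF as an error -- much like
# ioprocess raising Exception("FD closed") in its communication thread.
error = "FD closed" if data == b"" else None
print(error)
```

This is why fixing the underlying glib2 mismatch makes the "FD closed" and "ioprocess crashed unexpectedly" errors disappear as well.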


Expected results:

The Node should work normally after the upgrade.

Additional info:

Version 4.3.7 is unusable. I had to roll back to 4.3.5; after rolling back, everything works as before. Currently the upgraded node has the following layers:

# nodectl info
layers:
  ovirt-node-ng-4.3.7-0.20191121.0:
    ovirt-node-ng-4.3.7-0.20191121.0+1
  ovirt-node-ng-4.3.5.2-0.20190805.0:
    ovirt-node-ng-4.3.5.2-0.20190805.0+1
bootloader:
  default: ovirt-node-ng-4.3.7-0.20191121.0 (3.10.0-1062.4.3.el7.x86_64)
  entries:
    ovirt-node-ng-4.3.7-0.20191121.0 (3.10.0-1062.4.3.el7.x86_64):
      index: 0
      title: ovirt-node-ng-4.3.7-0.20191121.0 (3.10.0-1062.4.3.el7.x86_64)
      kernel: /boot/ovirt-node-ng-4.3.7-0.20191121.0+1/vmlinuz-3.10.0-1062.4.3.el7.x86_64
      args: "ro crashkernel=auto rd.lvm.lv=onn_ovirt-node001/ovirt-node-ng-4.3.7-0.20191121.0+1 rd.lvm.lv=onn_ovirt-node001/swap rhgb quiet LANG=en_US.UTF-8 img.bootid=ovirt-node-ng-4.3.7-0.20191121.0+1 kvm-intel.nested=1"
      initrd: /boot/ovirt-node-ng-4.3.7-0.20191121.0+1/initramfs-3.10.0-1062.4.3.el7.x86_64.img
      root: /dev/onn_ovirt-node001/ovirt-node-ng-4.3.7-0.20191121.0+1
    ovirt-node-ng-4.3.5.2-0.20190805.0 (3.10.0-957.27.2.el7.x86_64):
      index: 1
      title: ovirt-node-ng-4.3.5.2-0.20190805.0 (3.10.0-957.27.2.el7.x86_64)
      kernel: /boot/ovirt-node-ng-4.3.5.2-0.20190805.0+1/vmlinuz-3.10.0-957.27.2.el7.x86_64
      args: "ro crashkernel=auto rd.lvm.lv=onn_ovirt-node001/ovirt-node-ng-4.3.5.2-0.20190805.0+1 rd.lvm.lv=onn_ovirt-node001/swap rhgb quiet LANG=en_US.UTF-8 img.bootid=ovirt-node-ng-4.3.5.2-0.20190805.0+1 kvm-intel.nested=1"
      initrd: /boot/ovirt-node-ng-4.3.5.2-0.20190805.0+1/initramfs-3.10.0-957.27.2.el7.x86_64.img
      root: /dev/onn_ovirt-node001/ovirt-node-ng-4.3.5.2-0.20190805.0+1
current_layer: ovirt-node-ng-4.3.5.2-0.20190805.0+1

Comment 1 Szymon Madej 2019-11-29 10:49:30 UTC
I have found the root cause of the problem.
The described situation was caused by using the oVirt Node as a ManageIQ Conversion Host. Details are in bug 1778121.
I'm closing this bug.