Bug 1777485 - oVirt Node 4.3.7 is unusable (multiple problems)
Summary: oVirt Node 4.3.7 is unusable (multiple problems)
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: ovirt-node
Classification: oVirt
Component: Installation & Update
Version: 4.3
Hardware: x86_64
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Yuval Turgeman
QA Contact: peyu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-11-27 16:09 UTC by Szymon Madej
Modified: 2019-11-29 10:49 UTC (History)
12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-29 10:49:30 UTC
oVirt Team: Node
Embargoed:
cshao: testing_ack?



Description Szymon Madej 2019-11-27 16:09:31 UTC
Description of problem:

oVirt Node 4.3.7 is unusable because of multiple problems in low-level system libraries.

Version-Release number of selected component (if applicable): 4.3.7

How reproducible: Always

Steps to Reproduce:
1. Upgrade to 4.3.7 (yum makecache ; yum update)
2. Change default layer to ovirt-node-ng-4.3.7-0.20191121.0+1
3. Reboot 

Actual results:

The node is unusable; multiple commands crash, e.g.:

1. firewalld - undefined symbols

# firewall-cmd --get-default-zone
Traceback (most recent call last):
  File "/usr/bin/firewall-cmd", line 24, in <module>
    from gi.repository import GObject
  File "/usr/lib64/python2.7/site-packages/gi/__init__.py", line 37, in <module>
    from . import _gi
ImportError: /lib64/libgio-2.0.so.0: undefined symbol: g_option_group_unref
#
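
The missing symbol points at a glib2 library mismatch: g_option_group_unref was added in glib 2.44, so the libgio being loaded is newer than the libglib it resolves against. A quick way to confirm (a hedged sketch; exact package versions will differ per system):

# rpm -q glib2
# nm -D /lib64/libglib-2.0.so.0 | grep g_option_group_unref
# ldd /lib64/libgio-2.0.so.0 | grep libglib

The symbol lives in libglib (libgio only references it), so an empty nm result together with a libgio that expects it confirms the mismatch.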


2. yum - SEGFAULT

# yum clean all
Loaded plugins: enabled_repos_upload, fastestmirror, imgbased-persist, package_upload, product-id, search-disabled-repos, subscription-manager, vdsmupgrade, versionlock
This system is not registered with an entitlement server. You can use subscription-manager to register.
Cleaning repos: centos-sclo-rh-release ovirt-4.3 ovirt-4.3-centos-gluster5 ovirt-4.3-centos-opstools ovirt-4.3-centos-ovirt43 ovirt-4.3-centos-qemu-ev ovirt-4.3-epel ovirt-4.3-virtio-win-latest sac-gluster-ansible
Other repos take up 2.2 M of disk space (use --verbose for details)
Uploading Enabled Repositories Report
Cannot upload enabled repos report, is this client registered?
Segmentation fault (core dumped)
#
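
If core dumps are written locally, the faulting shared object can be read straight from the core (a hedged sketch; the core file name and location depend on kernel.core_pattern, and abrt may intercept it on EL7):

# ulimit -c unlimited
# yum clean all
Segmentation fault (core dumped)
# gdb -q /usr/bin/python2 <core file>
(gdb) bt

yum itself is Python, so the top frames of the backtrace should name the C library the crash actually happens in.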


3. Node activation - undefined symbol

When trying to activate the node, it ends up in the NonOperational state, and NFS storage domains can't be activated. In /var/log/vdsm/vdsm.log I see tracebacks like these:

2019-11-27 16:09:05,979+0100 INFO  (ioprocess/6153) [IOProcess] (7f0b6cf7-2d64-4da9-8d29-58a6a2ed4e6c) Starting ioprocess (__init__:434)
2019-11-27 16:09:05,987+0100 INFO  (ioprocess/6159) [IOProcess] (8da37bcb-8e65-413c-8ee9-f81d16e82af8) Starting ioprocess (__init__:434)
2019-11-27 16:09:05,987+0100 WARN  (ioprocess/6153) [IOProcessClient] (7f0b6cf7-2d64-4da9-8d29-58a6a2ed4e6c) Invalid log message u'/usr/libexec/ioprocess: symbol lookup error: /usr/libexec/ioprocess: undefined symbol: g_uuid_string_random\n' (__init__:424)
2019-11-27 16:09:05,988+0100 ERROR (ioprocess/6153) [IOProcessClient] (7f0b6cf7-2d64-4da9-8d29-58a6a2ed4e6c) Communication thread failed (__init__:160)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 114, in _communicate
    raise Exception("FD closed")
Exception: FD closed
2019-11-27 16:09:05,989+0100 ERROR (monitor/7f0b6cf) [storage.Monitor] Setting up monitor for 7f0b6cf7-2d64-4da9-8d29-58a6a2ed4e6c failed (monitor:330)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 327, in _setupLoop
    self._setupMonitor()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 349, in _setupMonitor
    self._produceDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 159, in wrapper
    value = meth(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 367, in _produceDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
    domain.getRealDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py", line 145, in findDomain
    return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 378, in __init__
    manifest.sdUUID, manifest.mountpoint)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 853, in _detect_block_size
    block_size = iop.probe_block_size(mountpoint)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/outOfProcess.py", line 384, in probe_block_size
    return self._ioproc.probe_block_size(dir_path)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 602, in probe_block_size
    "probe_block_size", {"dir": dir_path}, self.timeout)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 448, in _sendCommand
    raise OSError(errcode, errstr)
OSError: [Errno 100001] ioprocess crashed unexpectedly
2019-11-27 16:09:05,990+0100 INFO  (monitor/7f0b6cf) [storage.Monitor] Domain 7f0b6cf7-2d64-4da9-8d29-58a6a2ed4e6c became INVALID (monitor:470)
2019-11-27 16:09:06,001+0100 WARN  (ioprocess/6159) [IOProcessClient] (8da37bcb-8e65-413c-8ee9-f81d16e82af8) Invalid log message u'/usr/libexec/ioprocess: symbol lookup error: /usr/libexec/ioprocess: undefined symbol: g_uuid_string_random\n' (__init__:424)
2019-11-27 16:09:06,002+0100 ERROR (ioprocess/6159) [IOProcessClient] (8da37bcb-8e65-413c-8ee9-f81d16e82af8) Communication thread failed (__init__:160)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 114, in _communicate
    raise Exception("FD closed")
Exception: FD closed
2019-11-27 16:09:06,002+0100 ERROR (monitor/8da37bc) [storage.Monitor] Setting up monitor for 8da37bcb-8e65-413c-8ee9-f81d16e82af8 failed (monitor:330)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 327, in _setupLoop
    self._setupMonitor()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 349, in _setupMonitor
    self._produceDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 159, in wrapper
    value = meth(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 367, in _produceDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
    domain.getRealDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py", line 145, in findDomain
    return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 378, in __init__
    manifest.sdUUID, manifest.mountpoint)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 853, in _detect_block_size
    block_size = iop.probe_block_size(mountpoint)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/outOfProcess.py", line 384, in probe_block_size
    return self._ioproc.probe_block_size(dir_path)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 602, in probe_block_size
    "probe_block_size", {"dir": dir_path}, self.timeout)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 448, in _sendCommand
    raise OSError(errcode, errstr)
OSError: [Errno 100001] ioprocess crashed unexpectedly
2019-11-27 16:09:06,003+0100 INFO  (monitor/8da37bc) [storage.Monitor] Domain 8da37bcb-8e65-413c-8ee9-f81d16e82af8 became INVALID (monitor:470)
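
The ioprocess failure is the same class of problem: g_uuid_string_random was added in glib 2.52, and /usr/libexec/ioprocess cannot resolve it from the libglib it loads. It can be confirmed outside vdsm (a hedged sketch; the dynamic linker fails before main, so no arguments are needed):

# /usr/libexec/ioprocess
# nm -D /lib64/libglib-2.0.so.0 | grep g_uuid_string_random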


Expected results:

The node should work normally after the upgrade.

Additional info:

Version 4.3.7 is unusable. I had to roll back to 4.3.5. After rolling back, everything works as before. Currently that upgraded node has the following layers:

# nodectl info
layers:
  ovirt-node-ng-4.3.7-0.20191121.0:
    ovirt-node-ng-4.3.7-0.20191121.0+1
  ovirt-node-ng-4.3.5.2-0.20190805.0:
    ovirt-node-ng-4.3.5.2-0.20190805.0+1
bootloader:
  default: ovirt-node-ng-4.3.7-0.20191121.0 (3.10.0-1062.4.3.el7.x86_64)
  entries:
    ovirt-node-ng-4.3.7-0.20191121.0 (3.10.0-1062.4.3.el7.x86_64):
      index: 0
      title: ovirt-node-ng-4.3.7-0.20191121.0 (3.10.0-1062.4.3.el7.x86_64)
      kernel: /boot/ovirt-node-ng-4.3.7-0.20191121.0+1/vmlinuz-3.10.0-1062.4.3.el7.x86_64
      args: "ro crashkernel=auto rd.lvm.lv=onn_ovirt-node001/ovirt-node-ng-4.3.7-0.20191121.0+1 rd.lvm.lv=onn_ovirt-node001/swap rhgb quiet LANG=en_US.UTF-8 img.bootid=ovirt-node-ng-4.3.7-0.20191121.0+1 kvm-intel.nested=1"
      initrd: /boot/ovirt-node-ng-4.3.7-0.20191121.0+1/initramfs-3.10.0-1062.4.3.el7.x86_64.img
      root: /dev/onn_ovirt-node001/ovirt-node-ng-4.3.7-0.20191121.0+1
    ovirt-node-ng-4.3.5.2-0.20190805.0 (3.10.0-957.27.2.el7.x86_64):
      index: 1
      title: ovirt-node-ng-4.3.5.2-0.20190805.0 (3.10.0-957.27.2.el7.x86_64)
      kernel: /boot/ovirt-node-ng-4.3.5.2-0.20190805.0+1/vmlinuz-3.10.0-957.27.2.el7.x86_64
      args: "ro crashkernel=auto rd.lvm.lv=onn_ovirt-node001/ovirt-node-ng-4.3.5.2-0.20190805.0+1 rd.lvm.lv=onn_ovirt-node001/swap rhgb quiet LANG=en_US.UTF-8 img.bootid=ovirt-node-ng-4.3.5.2-0.20190805.0+1 kvm-intel.nested=1"
      initrd: /boot/ovirt-node-ng-4.3.5.2-0.20190805.0+1/initramfs-3.10.0-957.27.2.el7.x86_64.img
      root: /dev/onn_ovirt-node001/ovirt-node-ng-4.3.5.2-0.20190805.0+1
current_layer: ovirt-node-ng-4.3.5.2-0.20190805.0+1
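
For anyone needing the same rollback: one way is to point the bootloader back at the 4.3.5 entry (a hedged sketch; index 1 matches the entries listed above on this host):

# grub2-set-default 1
# reboot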

Comment 1 Szymon Madej 2019-11-29 10:49:30 UTC
I have found the root cause.
The described situation was caused by using the oVirt Node as a ManageIQ Conversion Host. Details are in bug 1778121.
I'm closing this bug.
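
For reference, packages installed out of band are kept across upgrades by the imgbased-persist plugin (visible in the yum plugin list above); listing them should show what the conversion-host role pulled into the new layer (a hedged sketch; the path follows imgbased convention):

# ls /var/imgbased/persisted-rpms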

