Bug 1653571

Summary: Block device creation fails with error: "[heketi] failed to create volume: server did not provide a message"
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rachael <rgeorge>
Component: cns-deploy-tool
Assignee: Niels de Vos <ndevos>
Status: CLOSED ERRATA
QA Contact: Prasanth <pprakash>
Severity: urgent
Priority: unspecified
Version: ocs-3.11
CC: amark, bgoyal, fabio.martinelli, hchiramm, jarrpa, kramdoss, luca.mercuri, madam, mrobson, ndevos, pkarampu, pprakash, prasanna.kalever, rhs-bugs, sankarshan, sarumuga, vbellur, vinug, xiubli
Keywords: Regression, TestBlocker, ZStream
Target Release: OCS 3.11.1
Hardware: Unspecified
OS: Unspecified
Fixed In Version: cns-deploy-7.0.0-9.el7rhgs
Last Closed: 2019-02-07 03:38:24 UTC
Type: Bug
Bug Depends On: 1662312    
Bug Blocks: 1644160    

Comment 5 Niels de Vos 2018-11-29 14:14:08 UTC
Bind-mounts are configured in cns-deploy and openshift-ansible, so the changes need to go there.

Comment 6 Niels de Vos 2018-11-29 14:28:51 UTC
Prasanna, is it possible to load the target_core_user module with the right parameters to prevent tcmu-runner from changing them? That would be a simple change/addition to /etc/modprobe.d/tcmu.conf. We may want to add that file if it is not provided yet.

Which parameters and which values would be needed? You can easily check this by loading the module with default parameters and comparing /sys/module/target_core_user/parameters/* before and after tcmu-runner modifies them.
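The before/after comparison suggested above can be scripted. The following Python sketch is only an illustration, not part of cns-deploy or tcmu-runner; the helper names `snapshot_params` and `diff_params` are hypothetical, and it assumes the module is already loaded so that the parameters directory exists.

```python
from pathlib import Path

# Where the kernel exposes target_core_user module parameters (path from the comment above).
PARAM_DIR = Path("/sys/module/target_core_user/parameters")


def snapshot_params(param_dir=PARAM_DIR):
    """Read every parameter file under param_dir into a {name: value} dict."""
    values = {}
    for entry in sorted(param_dir.iterdir()):
        if not entry.is_file():
            continue
        try:
            values[entry.name] = entry.read_text().strip()
        except OSError:
            # Some sysfs parameters are write-only; record them instead of failing.
            values[entry.name] = "<unreadable>"
    return values


def diff_params(before, after):
    """Return {name: (old, new)} for every parameter whose value differs."""
    changed = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changed[name] = (old, new)
    return changed
```

Call `snapshot_params()` right after loading the module, again after tcmu-runner has created a block device, and pass both results to `diff_params()`. The changed entries are the candidates for an `options target_core_user ...` line in /etc/modprobe.d/tcmu.conf.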

Comment 8 Niels de Vos 2018-12-05 09:30:47 UTC
(In reply to Niels de Vos from comment #6)

It seems that tcmu-runner depends on at least the following behaviour:

1. modifying target_core_user kernel parameters while adding new block devices,
   especially for resetting connections (this looks like a *bad* API)

2. writable access to /sys/class/uio to create/configure /dev/uioN devices

3. read access to dynamically created /dev/uioN devices


Because there is no guarantee that the required kernel modules are loaded on the host before the container starts, the /sys/module/target_core_user and /sys/class/uio directories may not exist yet. If these directories are missing, bind-mounts for them cannot be set up, and the failed bind-mount prevents the container from starting.

This means that (1) and (2) can only be solved by adding bind-mounts for their parent directories, which are guaranteed to exist (/sys/module and /sys/class).
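Points (1) and (2) reduce to a simple rule: bind-mount the parent directory, not the module-specific one. The Python sketch below expresses that rule; `bind_mount_sources` is a hypothetical helper a deployment tool could run against the host filesystem, not code from cns-deploy or openshift-ansible.

```python
from pathlib import Path

# Directories tcmu-runner needs (points 1 and 2 above). They exist only once the
# target_core_user and uio kernel modules are loaded on the host.
REQUIRED_PATHS = [Path("/sys/module/target_core_user"), Path("/sys/class/uio")]


def bind_mount_sources(required=REQUIRED_PATHS, host_root=Path("/")):
    """Return the sorted list of parent directories to bind-mount instead.

    Raises FileNotFoundError if a parent is missing on the host, because a
    bind-mount with a missing source would stop the container from starting.
    """
    sources = set()
    for path in required:
        parent = path.parent                        # e.g. /sys/module
        on_host = host_root / parent.relative_to("/")
        if not on_host.is_dir():
            raise FileNotFoundError(f"{parent} does not exist on the host")
        sources.add(parent)
    return sorted(sources)
```

With the defaults this yields /sys/class and /sys/module, and it succeeds even when neither module is loaded yet, since only the parents are checked.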

(3) is more difficult to solve, as CRI-O does not allow bind-mounting /dev anymore. It is unclear if uio is namespace-aware (unlikely). Creating /dev/uioN device nodes from within the container will most likely only create the device nodes on the host, where tcmu-runner cannot access them.
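Problem (3) can be observed by comparing the uio devices the kernel reports in sysfs with the device nodes actually present under /dev. The Python sketch below is a diagnostic aid only, and the function name is hypothetical; it relies on the standard `dev` attribute ("major:minor") that sysfs exposes for each character device class entry.

```python
from pathlib import Path


def uio_nodes_without_dev(sys_class_uio=Path("/sys/class/uio"), dev_dir=Path("/dev")):
    """Map each uio device known to the kernel to its "major:minor" number
    when no matching node exists under dev_dir (e.g. inside a container)."""
    missing = {}
    if not sys_class_uio.is_dir():
        return missing  # uio module not loaded, or /sys/class not visible
    for entry in sorted(sys_class_uio.iterdir()):
        dev_file = entry / "dev"                    # contains "major:minor"
        if dev_file.is_file() and not (dev_dir / entry.name).exists():
            missing[entry.name] = dev_file.read_text().strip()
    return missing
```

A non-empty result means the kernel has registered uioN, but the process running the check has no device node it can open.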


Prasanna, is that a complete description of what we discussed last week?

Comment 11 Niels de Vos 2018-12-27 15:23:44 UTC
https://github.com/gluster/gluster-kubernetes/pull/545 contains the changes to the daemonset templates that are needed to make gluster-block function again with recent versions of OCP.

Comment 23 errata-xmlrpc 2019-02-07 03:38:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0284