Bug 1577529 - [RFE] Support multiple hosts in posix storage domain path for cephfs
Summary: [RFE] Support multiple hosts in posix storage domain path for cephfs
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.20.23
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@ovirt.org
QA Contact: Avihai
URL:
Whiteboard:
Duplicates: 1557827
Depends On:
Blocks:
 
Reported: 2018-05-12 18:55 UTC by Sven Vogel
Modified: 2020-11-17 12:54 UTC (History)
14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1305529
Environment:
Last Closed: 2020-10-08 10:19:27 UTC
oVirt Team: Storage
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
vdsm.log (3.23 MB, text/plain)
2018-05-15 12:29 UTC, Sven Vogel
no flags Details
supervdsm.log (3.90 MB, text/plain)
2018-05-15 12:30 UTC, Sven Vogel
no flags Details

Description Sven Vogel 2018-05-12 18:55:48 UTC
I think this bug is not resolved in version 4.2.2 or 4.2.3.

If I try to add a Ceph storage domain, I get the following error. I tried different things, but it is not resolved.

I chose "POSIX compliant FS" and the following settings:

Path: host1.example.de:6789,host2.example.de:6789,host3.example.de:6789:/
VFS Type: ceph
Mount Options: e.g. noatime

I get the following error in /var/log/vdsm/supervdsm.log:

MainProcess|jsonrpc/2::DEBUG::2018-05-12 20:43:42,085::supervdsm_server::103::SuperVdsm.ServerCallback::(wrapper) return getHardwareInfo with {'systemProductName': 'NUC7i7DNKE', 'systemSerialNumber': 'DW1664633000177', 'systemFamily': 'Intel NUC', 'systemVersion': 'J85069-202', 'systemUUID': 'C28FBA12-9ED3-4743-9B8F-D45DDF134D2F', 'systemManufacturer': 'Intel Corporation'}
MainProcess|jsonrpc/4::DEBUG::2018-05-12 20:50:34,388::supervdsm_server::96::SuperVdsm.ServerCallback::(wrapper) call mount with (u'host1.example.de:6789,host2.example.de:6789,host3.example.de:6789', u'/rhev/data-center/mnt/host1.example.de:6789,host2.example.de:6789,host3.example.de:6789') {'vfstype': u'ceph', 'mntOpts': '', 'cgroup': None}
MainProcess|jsonrpc/4::DEBUG::2018-05-12 20:50:34,388::commands::65::root::(execCmd) /usr/bin/taskset --cpu-list 0-7 /usr/bin/mount -t ceph host1.example.de:6789,host2.example.de:6789,host3.example.de:6789 /rhev/data-host1.example.de:6789,host2.example.de:6789,host3.example.de:6789 (cwd None)
MainProcess|jsonrpc/4::DEBUG::2018-05-12 20:50:34,395::commands::86::root::(execCmd) FAILED: <err> = ''; <rc> = 1
MainProcess|jsonrpc/4::ERROR::2018-05-12 20:50:34,395::supervdsm_server::100::SuperVdsm.ServerCallback::(wrapper) Error in mount
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line 98, in wrapper
    res = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line 140, in mount
    cgroup=cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 277, in _mount
    _runcmd(cmd)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 305, in _runcmd
    raise MountError(rc, b";".join((out, err)))
MountError: (1, 'source mount path was not specified\nfailed to resolve source\n;')

As you can see, the trailing ':/' is stripped from the mount source:

/usr/bin/mount -t ceph host1.example.de:6789,host2.example.de:6789,host3.example.de:6789 /rhev/data-center/mnt/host1.example.de:6789,host2.example.de:6789,host3.example.de:6789

That produces the error.

Can anyone fix that, or please reopen the bug?

Thanks

Sven

+++ This bug was initially created as a clone of Bug #1305529 +++

Description of problem:
On POSIXFS storage domain creation, if nothing is given after '/' in the path, the '/' is ignored in the mount command that vdsm executes. 
For example, the following path, which has nothing after '/' is executed on vdsm without the '/':

Path = 10.35.65.18:/

VFS Type = ceph


jsonrpc.Executor/3::DEBUG::2016-01-26 17:51:04,338::mount::229::Storage.Misc.excCmd::(_runcmd) /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /usr/bin/mount -t ceph -o name=admin,secret=AQC3W1dWhplVLBAARW/zKtQzjafZDKAGfVpWbQ== 10.35.65.18: /rhev/data-center/mnt/10.35.65.18:___ (cwd None)


This is mainly relevant for cephfs.
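For illustration, the failure mode can be sketched as follows (the helper names are hypothetical, not vdsm's actual code): a normalization step that treats a bare ':/' as redundant drops exactly the suffix the ceph mount helper requires.

```python
# Illustrative sketch only -- these helpers are hypothetical, not vdsm code.

def naive_normalize(spec):
    # Split off everything after the last ':' and drop an "empty" root
    # path -- this reproduces the bug: "10.35.65.18:/" -> "10.35.65.18".
    host, _, path = spec.rpartition(":")
    if path in ("", "/"):
        return host
    return spec

def ceph_safe_normalize(spec):
    # Keep the spec verbatim: ceph needs the trailing ":/..." to resolve
    # the mount source.
    return spec

print(naive_normalize("10.35.65.18:/"))      # broken mount source
print(ceph_safe_normalize("10.35.65.18:/"))  # mounts correctly
```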


Version-Release number of selected component (if applicable):
vdsm-4.17.19-0.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Try to create a POSIXFS storage domain with a path that has nothing after '/'
For example:
Path = 10.35.65.18:/


Actual results:
mount command on vdsm is executed without the '/'

Expected results:
vdsm should execute the mount command with the exact syntax provided in the path.


Additional info:

The following syntax works while mounting manually from host:

# mount -t ceph ceph-1.qa.lab:6789:/ /mnt/cephfs/ -o name=admin,secret=<key>

(The key is not hidden in the actual mount command).

Tried to use this way for ceph based POSIX storage domain creation as follows:

=======================================
Path = 10.35.65.18:/

VFS Type = ceph

Mount Options = name=admin,secret=<key>
=======================================

The failure in vdsm.log:

jsonrpc.Executor/3::ERROR::2016-01-26 17:51:04,372::hsm::2473::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2470, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 234, in connect
    six.reraise(t, v, tb)
  File "/usr/share/vdsm/storage/storageServer.py", line 226, in connect
    self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
  File "/usr/share/vdsm/storage/mount.py", line 225, in mount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 241, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (1, 'source mount path was not specified\nfailed to resolve source\n;')



It seems that no matter how many '/' characters are given, vdsm ignores them all if nothing follows them. (I tried a path with '//' at the end; that didn't work either.)

This has to be fixed for CephFS to work as a VFS type for POSIX-compliant storage domain creation.

--- Additional comment from Tal Nisan on 2016-02-08 09:44:55 EST ---

Same behavior exists for NFS as well as described in bug 1228239

--- Additional comment from Mike McCune on 2016-03-28 18:44:25 EDT ---

This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune@redhat.com with any questions

--- Additional comment from Allon Mureinik on 2016-04-18 07:15:45 EDT ---

Patch posted, but at risk for the 3.6.6 date. Pushing out to 3.6.7, as this is not a blocker.

--- Additional comment from Idan Shaby on 2016-04-21 10:20:26 EDT ---

Allon, the fix that I am working on might be too risky for 3.6.z.
I think that we should postpone it to 4.0.

--- Additional comment from Sandro Bonazzola on 2016-05-02 06:07:01 EDT ---

Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.

--- Additional comment from Yaniv Lavi on 2016-05-23 09:21:51 EDT ---

oVirt 4.0 beta has been released, moving to RC milestone.

--- Additional comment from Yaniv Lavi on 2016-05-23 09:24:36 EDT ---

oVirt 4.0 beta has been released, moving to RC milestone.

--- Additional comment from Allon Mureinik on 2016-06-07 07:56:16 EDT ---

Merged on the 4.0 branch, setting to MODIFIED

--- Additional comment from Elad on 2016-07-27 08:28:31 EDT ---

On POSIXFS (ceph) storage domain creation, when nothing is given after '/', the mount command that vdsm executes now keeps the '/'.

supervdsm.log:

 MainProcess|jsonrpc.Executor/3::DEBUG::2016-07-27 15:24:42,814::commands::68::root::(execCmd) /usr/bin/taskset --cpu-list 0-3 /usr/bin/mount -t ceph -o name=admin,secret=AQC76JVXuYUIMxAAudC/3dUtqhLIOSL8AUuXrQ== 10.35.140.90:/ /rhev/data-center/mnt/10.35.140.90:_ (cwd None)


Storage domain creation succeeds.

Verified using:
vdsm-4.18.8-1.el7ev.x86_64
libcephfs1-10.2.2-29.el7cp.x86_64
ceph-base-10.2.2-29.el7cp.x86_64
ceph-common-10.2.2-29.el7cp.x86_64
python-cephfs-10.2.2-29.el7cp.x86_64
ceph-selinux-10.2.2-29.el7cp.x86_64
rhevm-4.0.2-0.1.rc.el7ev.noarch

Comment 1 Sven Vogel 2018-05-13 14:35:10 UTC
I tried other things.

If I use only one host:

Path: host1.example.de
Filesystem: ceph
FS Options: e.g. noatime

I get a success, but the domain will not be added.

/var/log/vdsm/supervdsm.log

MainProcess|jsonrpc/7::DEBUG::2018-05-13 16:27:12,150::supervdsm_server::96::SuperVdsm.ServerCallback::(wrapper) call mount with (u'host1.example.de:6789:/', u'/rhev/data-center/mnt/host1.example.de:6789:_') {'vfstype': u'ceph', 'mntOpts': u'noatime', 'cgroup': None}
MainProcess|jsonrpc/7::DEBUG::2018-05-13 16:27:12,150::commands::65::root::(execCmd) /usr/bin/taskset --cpu-list 0-7 /usr/bin/mount -t ceph -o noatime host1.example.de:6789:/ /rhev/data-center/mnt/host1.example.de:6789:_ (cwd None)
MainProcess|jsonrpc/7::DEBUG::2018-05-13 16:27:12,172::commands::86::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|jsonrpc/7::DEBUG::2018-05-13 16:27:12,172::supervdsm_server::103::SuperVdsm.ServerCallback::(wrapper) return mount with None

I get an error in the oVirt web UI like:

The error message for connection host1.example.de:6789:/ returned by VDSM was: General Exception

The mount seems to be created:

192.168.102.90:6789:/  442G    792M  441G    1% /rhev/data-center/mnt/host1.example.de:6789:_

Normally a single host seems like a bad idea if we use Ceph. The other problem: I don't know why the domain isn't added ...


thanks

Sven

Comment 2 Tal Nisan 2018-05-14 08:54:47 UTC
Idan, you've handled this issue for 4.0, can you have a look please?

Comment 3 Sven Vogel 2018-05-14 11:24:52 UTC
Hi Tal,

I didn't try oVirt version 4.0. I tried before and now with 4.2.2 and 4.2.3.

thanks

Sven

Comment 4 Idan Shaby 2018-05-15 09:07:02 UTC
(In reply to Tal Nisan from comment #2)
> Idan, you've handled this issue for 4.0, can you have a look please?

Sure, Sven can you please attach the full vdsm, supervdsm and engine logs?

Comment 5 Idan Shaby 2018-05-15 09:19:39 UTC
Hi Sven,

From what I know, we don't support multiple hosts in the "Path" field.
The right way to use a single remote server is "server:port:/path" where the "port:" part is not mandatory, and the "server" part can be a DNS name, an ipv4 address or an ipv6 address using quoted form.

Nir, any idea if we can workaround this inability and use multiple hosts in this case?

Comment 6 Nir Soffer 2018-05-15 10:00:24 UTC
Idan, I don't know about supporting multiple hosts. Maybe the mount command should use special mount options.

Sven, we must have vdsm and supervdsm logs.

Comment 7 Nir Soffer 2018-05-15 10:00:45 UTC
Elad, do you have a cephfs system for testing? Do we test cephfs using a posix storage domain?

Comment 8 Nir Soffer 2018-05-15 10:04:45 UTC
Sven, can you mount ceph using multiple host:port pairs in the path from the shell?

If you can, we need an example mount command.

Comment 9 Nir Soffer 2018-05-15 10:46:48 UTC
Currently we support:

    server:port:/path
    server:port:/

According to ceph docs http://docs.ceph.com/docs/cuttlefish/man/8/mount.ceph/
we need to support also:

    server1,server2,...:/
    server1,server2,...:/path
    server1:port,server2:port,...:/
    server1:port,server2:port,...:/path

We can allow this format when using vfstype=ceph.

Using multiple hosts allows mounting even if one of the hosts is down.

Next steps:
- get cephfs system for testing first
- change current code to allow server:port only when vfstype=ceph
- when vfstype=ceph, support multiple host[:port] separated by comma
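The format list above can be captured with a small validator. The sketch below is only an illustration of the proposed grammar (one or more host[:port] entries separated by commas, followed by ':/path'); it is not vdsm's actual parser.

```python
import re

# Sketch of a validator for the path formats listed above. The grammar is
# an assumption (host[:port] entries separated by commas, then ":/path");
# this is not vdsm's actual parser.
HOST = r"[A-Za-z0-9.\-]+|\[[0-9A-Fa-f:]+\]"   # DNS name, IPv4, or [IPv6]
ENTRY = rf"(?:{HOST})(?::\d+)?"               # host with optional :port
CEPH_SPEC = re.compile(rf"^{ENTRY}(?:,{ENTRY})*:(/.*)$")

def parse_ceph_spec(spec):
    """Split a ceph mount spec into (hosts, path) or raise ValueError."""
    m = CEPH_SPEC.match(spec)
    if m is None:
        raise ValueError("invalid ceph mount spec: %r" % spec)
    path = m.group(1)
    hosts = spec[: -len(":" + path)].split(",")
    return hosts, path

hosts, path = parse_ceph_spec(
    "host1.example.de:6789,host2.example.de:6789,host3.example.de:6789:/")
```

A single host[:port] spec like "10.35.65.18:/" parses the same way, so such a check could replace the current single-server validation when vfstype=ceph.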

Comment 10 Idan Shaby 2018-05-15 10:53:16 UTC
Thanks, Nir.
Tal, can you please target this bug further to Nir's comment?

Comment 11 Sven Vogel 2018-05-15 12:28:40 UTC
Hi Idan, hi Nir,

Yes, I saw that you only support a simple mount, but for future HA and usage of Ceph it would be good to be able to use multiple hosts.

--> If I use one host, I get an error like the one below.

"The error message for connection host1.example.de:6789:/ returned by VDSM was: General Exception"

I attached vdsm.log and supervdsm.log.

greets

Sven

Comment 12 Sven Vogel 2018-05-15 12:29:22 UTC
Created attachment 1436784 [details]
vdsm.log

Comment 13 Sven Vogel 2018-05-15 12:30:14 UTC
Created attachment 1436785 [details]
supervdsm.log

Comment 14 Sven Vogel 2018-05-15 12:31:57 UTC
(In reply to Nir Soffer from comment #9)
> Currently we support:
> 
>     server:port:/path
>     server:port:/
> 
> According to ceph docs http://docs.ceph.com/docs/cuttlefish/man/8/mount.ceph/
> we need to support also:
> 
>     server1,server2,...:/
>     server1,server2,...:/path
>     server1:port,server2:port,...:/
>     server1:port,server2:port,...:/path
> 
> We can allow this format when using vfstype=ceph.
> 
> Using multiple hosts allows mounting even if one of the hosts is down.
> 
> Next steps:
> - get cephfs system for testing first
> - change current code to allow server:port only when vfstype=ceph
> - when vfstype=ceph, support multiple host[:port] separated by comma

This sounds good, but it does not explain why I get an error with:

server:port:/
ceph
noatime

:)

Comment 15 Nir Soffer 2018-05-15 14:50:46 UTC
(In reply to Sven Vogel from comment #14)
...
> this sounds good but it will not clear the problem why i get a error with 
> 
> server:port:/

Current code supports server:port:/. If this does not work please file another
bug.

Comment 16 Eyal Shenitzky 2018-08-30 05:36:21 UTC
*** Bug 1557827 has been marked as a duplicate of this bug. ***

Comment 18 Matt Kimberley 2019-01-07 10:26:42 UTC
Hi,

We have the same issue with not being able to mount multiple ceph monitors in the target path, running under oVirt 4.2.7.5-1.el7. Using the format:

host:port,host:port,host:port:/

oVirt fails to parse the mount point. Using a single monitor:

host:port:/

oVirt successfully mounts the target.

For both attempts, VFS type "ceph" and mount options "name=admin,secret=<secret>" were used.

From an HA perspective, as mentioned previously: upon losing the mounted single Ceph monitor, all hosts continue to function with the mounted Ceph storage domain until a host is rebooted (in the absence of the Ceph monitor, the rebooted host cannot mount the Ceph storage domain, which is an issue from an availability point of view).

I can happily provide logs if they would be useful.

Comment 19 Emilio 2019-07-12 18:06:57 UTC
We are also seeing this. Again, if logs are needed please feel free to reach out.

Comment 20 Logan Brown 2019-08-19 20:23:19 UTC
I just encountered this issue while attempting to import a disk, and I was able to work around it by changing `FIELD_SEP = ","` to `FIELD_SEP = ";"` at line 81 of vdsm/storage/task.py on my SPM and restarting vdsm and supervdsm while in maintenance mode. 

Based on my brief reading of the code, the only impact this appears to have is changing the printed output from ParamList when casting it as a string. That being said, I'm not familiar enough with the VDSM codebase to know if this will have unexpected side effects.
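The separator collision Logan describes can be illustrated in miniature. This sketch is hypothetical and deliberately simplified (not vdsm's actual ParamList): when parameter values themselves contain commas, as multi-host ceph paths do, a comma FIELD_SEP no longer round-trips, while ';' does.

```python
# Hypothetical sketch of the separator collision described above -- this is
# not vdsm's actual ParamList, just an illustration of the failure mode.

FIELD_SEP = ","  # the default; the workaround changes this to ";"

def join_params(params, sep=FIELD_SEP):
    # Serialize a list of parameter values into one string.
    return sep.join(params)

def split_params(text, sep=FIELD_SEP):
    # Parse the serialized string back into parameter values.
    return text.split(sep)

params = ["host1:6789,host2:6789:/", "vfstype=ceph"]

# With "," the comma-containing multi-host path is torn apart on the way back:
assert split_params(join_params(params)) != params

# With ";" the multi-host path survives the round trip:
assert split_params(join_params(params, ";"), ";") == params
```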

Comment 21 Emilio 2020-02-04 13:51:38 UTC
Just poking this to see if any updates have been made in this area ahead of the oVirt 4.4.0 / VDSM 4.2.x release.

Comment 22 Michal Skrivanek 2020-06-23 12:34:24 UTC
This request is not currently committed to 4.4.z, moving it to 4.5

Comment 23 Tal Nisan 2020-10-08 10:19:27 UTC
Our solution for Ceph support is via Cinderlib: you can add a Cinder domain as "Managed Block Storage".

Comment 24 Emilio 2020-10-22 15:24:50 UTC
Just to add a comment for future reference on a possible workaround: when dealing with CephFS specifically, the mount can be done with a single Ceph monitor IP address and path, and the underlying kernel mount will resolve all monitors and work as expected. The catch is that if the specified monitor is unavailable when the mount is attempted, the mount will fail for the duration of that monitor's unavailability.

