Description of problem:
When adding glusterfs storage domains in parallel via ansible modules, the last one failed with:
Fault reason is "Operation Failed". Fault detail is "[Error creating a storage domain]". HTTP response code is 400.

In the ovirt engine log I see:
2018-05-04 11:24:37,389+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand] (default task-46) [5ea48b7d] Command 'CreateStorageDomainVDSCommand(HostName = host_mixed_1, CreateStorageDomainVDSCommandParameters:{hostId='2a5f8f74-d7de-419f-a43a-b612932d7411', storageDomain='StorageDomainStatic:{name='test_gluster_2', id='03cc8bca-218d-47ac-801e-327f966c1df7'}', args='storage.server.com:/GE_he2_volume03'})' execution failed: VDSGenericException: VDSErrorException: Failed to CreateStorageDomainVDS, error = Error creating a storage domain: ('storageType=7, sdUUID=03cc8bca-218d-47ac-801e-327f966c1df7, domainName=test_gluster_2, domClass=1, typeSpecificArg=storage.server.com:/GE_he2_volume03 domVersion=4',), code = 351

For the rest, see the attached log files. When I later tried to add this SD from the UI, I did not hit the issue again.

Version-Release number of selected component (if applicable):
rhv 4.2.3-5

How reproducible:
Not sure if it is easily reproducible, but as noted above, we hit this issue when adding these glusterfs SDs in parallel.

Actual results:
Creation fails with an error.

Expected results:
All SDs are added successfully.

Additional info:
Logs attached
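For reference, the failing request corresponds to adding a glusterfs data domain through the engine REST API, which is what the ansible modules drive under the hood. Below is a minimal sketch of the equivalent call using the oVirt Python SDK (ovirtsdk4); the engine URL and credentials are placeholders, and the host and gluster address/volume are taken from the engine log line above. This is illustrative only, not the exact playbook used.

# Minimal sketch (assumed URL/credentials) of the API call behind adding
# a glusterfs data storage domain; equivalent to what the ansible
# ovirt_storage_domain module does.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder
    username='admin@internal',                          # placeholder
    password='secret',                                   # placeholder
    insecure=True,
)

sds_service = connection.system_service().storage_domains_service()
new_sd = sds_service.add(
    types.StorageDomain(
        name='test_gluster_2',
        type=types.StorageDomainType.DATA,
        host=types.Host(name='host_mixed_1'),
        storage=types.HostStorage(
            type=types.StorageType.GLUSTERFS,
            address='storage.server.com',   # gluster server
            path='/GE_he2_volume03',        # gluster volume
        ),
    ),
)
connection.close()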
Have you seen anything on VDSM? Gluster logs?
Maor, can you take a look please?
It seems like the VDSM logs from the time the exception occurred (2018-05-04 11:24:37) are missing. Can you please add the VDSM logs of host_mixed_2 and host_mixed_1 covering the entire hour? From what we have so far, I can tell that the reason you probably succeeded in creating the SD from the GUI is that host_mixed_2 was used to create the storage domain. The exception that occurred at 11:24 happened when host_mixed_1 was used to create the storage domain.
I reproduced the issue when I ran it again on the same env, so I hope I have now attached all the logs you need.
It seems like the problem is that the glusterfs mount is in read-only mode (see [1]), which is why the host can't create the storage domain.

[1]
2018-05-09 17:26:15,020+0300 ERROR (jsonrpc/4) [storage.TaskManager.Task] (Task='a1c7ef58-1570-4b89-a885-74a5c4f4a1e5') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in createStorageDomain
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2591, in createStorageDomain
    storageType, domVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py", line 83, in create
    version)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py", line 50, in _preCreateValidation
    fileSD.validateFileSystemFeatures(sdUUID, domPath)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 104, in validateFileSystemFeatures
    oop.getProcessPool(sdUUID).directTouch(testFilePath)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/outOfProcess.py", line 320, in directTouch
    ioproc.touch(path, flags, mode)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 567, in touch
    self.timeout)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 451, in _sendCommand
    raise OSError(errcode, errstr)
OSError: [Errno 30] Read-only file system
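For clarity, here is a simplified sketch (not VDSM's actual implementation) of the validation that fails in the traceback above: VDSM probes a new file-based storage domain by creating a test file on the mount with O_DIRECT, and a read-only gluster mount surfaces as EROFS. The function name and example mount path below are placeholders.

# Simplified sketch of validateFileSystemFeatures/directTouch behaviour:
# attempt an O_DIRECT create of a test file; a read-only mount raises
# OSError with errno 30 (EROFS), exactly as seen in the traceback.
import errno
import os

def validate_writable_with_direct_io(domain_path):
    """Return True if the mount accepts O_DIRECT creates, False if read-only."""
    test_file = os.path.join(domain_path, '__DIRECT_IO_TEST__')
    try:
        fd = os.open(test_file, os.O_CREAT | os.O_WRONLY | os.O_DIRECT, 0o644)
        os.close(fd)
        return True
    except OSError as e:
        if e.errno == errno.EROFS:  # [Errno 30] Read-only file system
            return False
        raise

# Hypothetical usage on the gluster mount point:
# validate_writable_with_direct_io(
#     '/rhev/data-center/mnt/glusterSD/storage.server.com:_GE__he2__volume03')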
Created attachment 1434239 [details] rhev-data-center-mnt-glusterSD-gluster01.lab.eng.tlv2.redhat.com:_GE__he2__volume01
Created attachment 1434240 [details] rhev-data-center-mnt-glusterSD-gluster01.lab.eng.tlv2.redhat.com:_GE__he2__volume02
Created attachment 1434241 [details] rhev-data-center-mnt-glusterSD-gluster01.lab.eng.tlv2.redhat.com:_GE__he2__volume03
Created attachment 1434242 [details] mountpoint
Created attachment 1434244 [details] cli log
I've attached the gluster logs from Petr's attachment. Sahina, can you please help us understand why the glusterfs became read-only?
Krutika, could you take a look please? There are messages of the form "failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running." in the mount logs
(In reply to Sahina Bose from comment #16)
> Krutika, could you take a look please?
> There are messages of the form "failed to get the port number for remote
> subvolume. Please run 'gluster volume status' on server to see if brick
> process is running." in the mount logs

Sure.

Funny thing is that there is no log from GE_he2_volume03-client-1 indicating whether it successfully connected or not. And a fuse client is not supposed to process any IO (such as the "create" on __DIRECT_IO_TEST__) from the application until AFR has heard from all of its children.

In this case, either AFR hasn't heard from GE_he2_volume03-client-1 at all and still notified FUSE to go ahead with IO, or there is some codepath in which a CONNECT/DISCONNECT event from GE_he2_volume03-client-1 is NOT logged.

Let me dig into the client translator code and get back. Keeping the needinfo intact until then.

-Krutika
Could you please attach glusterd logs as well from all 3 hosts? You'll find them under /var/log/glusterfs and named glusterd.log. -Krutika
(In reply to Krutika Dhananjay from comment #18)
> Could you please attach glusterd logs as well from all 3 hosts?
>
> You'll find them under /var/log/glusterfs and named glusterd.log.
>
> -Krutika

Also attach all glusterd.log* files in case they got rotated over the weekend.

-Krutika
And also the log files under /var/log/glusterfs/bricks of all 3 hosts please ...
Seems like this issue is more glusterfs oriented, and it is currently waiting for needinfo, therefore postponing it to 4.3.
(In reply to Krutika Dhananjay from comment #18)
> Could you please attach glusterd logs as well from all 3 hosts?
>
> You'll find them under /var/log/glusterfs and named glusterd.log.
>
> -Krutika

Hey, we cannot reproduce this issue with the latest builds, so we cannot provide any more logs, and the env has already been re-provisioned multiple times... All logs from the first host were attached in Comment 8; you can find a copy of the whole /var/log/glusterfs folder in the logs under ./HostsLogs/tmp/ovirt-logs-hypervisor/glusterfs, but in any case I see there is no glusterd.log there.
Since the bug could not be reproduced and additional logs are therefore unavailable, closing as INSUFFICIENT_DATA. Please reopen if you manage to reproduce it in the future.