
Bug 1624692

Summary: [FUSE]: Crefi IO failed with Transport endpoint connection error.
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Ramakrishnan Periyasamy <rperiyas>
Component: CephFS
Assignee: Patrick Donnelly <pdonnell>
Status: CLOSED NOTABUG
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: medium
Docs Contact:
Priority: low
Version: 3.1
CC: ceph-eng-bugs, john.spray, pdonnell
Target Milestone: rc
Target Release: 3.*
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-30 20:59:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  /var/log/message log (flags: none)
  dmesg (flags: none)
  /var/log/message log second run (flags: none)

Description Ramakrishnan Periyasamy 2018-09-03 05:48:14 UTC
Created attachment 1480437 [details]
/var/log/message log

Description of problem:

Crefi IO failed with "Transport endpoint is not connected: '/mnt/fuse/fuse'". Observed this problem while running IO over the weekend.

IO Profile: for i in create rename; do sudo ./crefi.py --multi -b 2 -d 10 -n 10000000 -t sparse --random --min=1K --max=10K /mnt/fuse/fuse/dir4/ ; done

Error in Crefi IO:
--------------------
2018-08-31 10:17:01,476 - thread-0: Total files created -- 9500
Process thread-0:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/Crefi/crefi_helper.py", line 353, in multipledir
    dir_path)
  File "/home/ubuntu/Crefi/crefi_helper.py", line 129, in sparse_files
    create_sparse_file(fil, size, mins, maxs, rand)
  File "/home/ubuntu/Crefi/crefi_helper.py", line 52, in create_sparse_file
    os_wr(fil, data)
  File "/home/ubuntu/Crefi/crefi_helper.py", line 39, in os_wr
    os.write(fd, data)
OSError: [Errno 103] Software caused connection abort
1
Traceback (most recent call last):
  File "./crefi.py", line 123, in <module>
    main()
  File "./crefi.py", line 119, in main
    args.randname, args.threads)
  File "./crefi.py", line 22, in multiple
    os.makedirs(dir_path)
  File "/usr/lib64/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib64/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 107] Transport endpoint is not connected: '/mnt/fuse/fuse'
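
The two errnos in the tracebacks above, ECONNABORTED (103) during os.write and ENOTCONN (107) during mkdir, are what a client sees once the ceph-fuse daemon has died underneath the mount. As a minimal sketch of how a test harness could translate them into a clear "mount is gone" signal (safe_write is a hypothetical helper, not part of Crefi; errno values shown are the Linux ones from the log):

```python
import errno
import os

def safe_write(fd, data):
    """Write to fd, flagging the errnos that mean the FUSE mount is dead.

    Returns the number of bytes written, or None when the transport
    endpoint is gone. ENOTCONN (107) and ECONNABORTED (103) are the two
    errnos observed in this report after ceph-fuse was killed.
    """
    try:
        return os.write(fd, data)
    except OSError as e:
        if e.errno in (errno.ENOTCONN, errno.ECONNABORTED):
            return None  # mount needs an unmount/remount to recover
        raise  # any other errno is a different problem; propagate it
```

A caller that gets None back would stop the IO loop and remount rather than crash with a raw traceback, as Crefi did here.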


Call Trace in /var/log/messages:
----------------------------------
Aug 31 10:17:04 magna070 kernel: python invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
Aug 31 10:17:04 magna070 kernel: python cpuset=/ mems_allowed=0
Aug 31 10:17:04 magna070 kernel: CPU: 3 PID: 9277 Comm: python Kdump: loaded Not tainted 3.10.0-933.el7.x86_64 #1
Aug 31 10:17:04 magna070 kernel: Hardware name: Supermicro SYS-F627R3-RTB+/X9DRFR, BIOS 3.0b 04/24/2014
Aug 31 10:17:04 magna070 kernel: Call Trace:
Aug 31 10:17:04 magna070 kernel: [<ffffffffacd5fd2b>] dump_stack+0x19/0x1b
Aug 31 10:17:04 magna070 kernel: [<ffffffffacd5aece>] dump_header+0x90/0x229
Aug 31 10:17:04 magna070 kernel: [<ffffffffac900f4b>] ? cred_has_capability+0x6b/0x120
Aug 31 10:17:04 magna070 kernel: [<ffffffffac7ba154>] oom_kill_process+0x254/0x3d0
Aug 31 10:17:04 magna070 kernel: [<ffffffffac90102e>] ? selinux_capable+0x2e/0x40
Aug 31 10:17:04 magna070 kernel: [<ffffffffac7ba996>] out_of_memory+0x4b6/0x4f0
Aug 31 10:17:04 magna070 kernel: [<ffffffffac7c1414>] __alloc_pages_nodemask+0xaa4/0xbb0
Aug 31 10:17:04 magna070 kernel: [<ffffffffac8117b5>] alloc_pages_vma+0xb5/0x200
Aug 31 10:17:04 magna070 kernel: [<ffffffffac7e9d77>] handle_pte_fault+0x887/0xd10
Aug 31 10:17:04 magna070 kernel: [<ffffffffac7ec31d>] handle_mm_fault+0x39d/0x9b0
Aug 31 10:17:04 magna070 kernel: [<ffffffffacd6d547>] __do_page_fault+0x197/0x4f0
Aug 31 10:17:04 magna070 kernel: [<ffffffffacd6d8d5>] do_page_fault+0x35/0x90
Aug 31 10:17:04 magna070 kernel: [<ffffffffacd69758>] page_fault+0x28/0x30

Please check the attached system logs for more details about the trace.

The mount point is not accessible.

After unmounting and re-mounting, the FUSE mount was accessible again without any issues.
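
For reference, the recovery described above is a plain unmount/remount of the ceph-fuse client. A sketch of those steps (the mount point is from this report; the monitor address and client name are placeholders, adjust for the cluster):

```shell
# Lazy-unmount the dead FUSE mount; a plain umount may hang on ENOTCONN
sudo fusermount -uz /mnt/fuse

# Remount with ceph-fuse; monitor host and client name are assumptions
sudo ceph-fuse -n client.admin -m <mon-host>:6789 /mnt/fuse
```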

Version-Release number of selected component (if applicable):
ceph version 12.2.5-41.el7cp
Linux kernel 3.10.0-933.el7.x86_64

How reproducible:
1/1; not sure whether this can be reproduced.

Steps to Reproduce:
1. Configured a cluster with 3 MDS daemons (2 active, 1 standby).
2. Created an FS with 2 data pools.
3. Ran continuous IO from the same client; the IO tools used were crefi, smallfiles, and fio, running in different directories.

Actual results:
FUSE mount is not accessible, failing with a "Transport endpoint is not connected" error.

Expected results:
There should be no problem accessing the mount.

Additional info:
NA

Comment 3 Ramakrishnan Periyasamy 2018-09-03 05:50:50 UTC
Created attachment 1480438 [details]
dmesg

dmesg

Comment 5 Ramakrishnan Periyasamy 2018-09-03 07:21:32 UTC
Created attachment 1480459 [details]
/var/log/message log second run

Observed the issue again on the same setup with the same IO profile; attached /var/log/messages.

Comment 6 Patrick Donnelly 2018-09-04 16:19:20 UTC
Looks like ceph-fuse is using about 1.2GB of RAM:

> Aug 31 10:17:07 magna070 kernel: [31680]     0 31680   428603   307058     676        0             0 ceph-fuse

on a 32GB machine:

> Aug 31 10:17:07 magna070 kernel: 8379877 pages RAM

?

I don't see a bug here. The oom killer nuked ceph-fuse but it doesn't appear to have been leaking memory.
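
As a sanity check on the figures in the comment above: the kernel's oom-killer task dump reports sizes in 4 KiB pages, so the quoted numbers convert as follows (a small worked example, not part of the original report; 428603 is the ceph-fuse total_vm column, 307058 its rss column):

```python
PAGE = 4096  # the oom-killer task dump counts in 4 KiB pages

def pages_to_gib(pages):
    """Convert a page count from the oom-killer log to GiB."""
    return pages * PAGE / 2**30

rss_gib = pages_to_gib(307058)    # ceph-fuse resident set -> ~1.2 GiB
vm_gib = pages_to_gib(428603)     # ceph-fuse total_vm -> ~1.6 GiB
ram_gib = pages_to_gib(8379877)   # "8379877 pages RAM" -> ~32 GiB
```

The ~1.2 GiB resident set against ~32 GiB of RAM is consistent with the conclusion that ceph-fuse itself was not leaking; some other allocation pressure drove the machine out of memory and the oom killer picked ceph-fuse as a victim.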

Comment 9 Ramakrishnan Periyasamy 2018-09-05 06:36:58 UTC
Removing the blocker tag.