Bug 1624692
| Summary: | [FUSE]: Crefi IO failed with Transport endpoint connection error | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Ramakrishnan Periyasamy <rperiyas> |
| Component: | CephFS | Assignee: | Patrick Donnelly <pdonnell> |
| Status: | CLOSED NOTABUG | QA Contact: | ceph-qe-bugs <ceph-qe-bugs> |
| Severity: | medium | Priority: | low |
| Version: | 3.1 | CC: | ceph-eng-bugs, john.spray, pdonnell |
| Target Milestone: | rc | Target Release: | 3.* |
| Hardware: | Unspecified | OS: | Unspecified |
| Last Closed: | 2018-10-30 20:59:00 UTC | Type: | Bug |
Created attachment 1480438 [details]
dmesg
Created attachment 1480459 [details]
/var/log/message log second run
Observed the issue again in the same setup with the same IO profile; /var/log/messages is attached.
Looks like ceph-fuse is using about 1.2GB of RAM:

> Aug 31 10:17:07 magna070 kernel: [31680] 0 31680 428603 307058 676 0 0 ceph-fuse

on a 32GB machine:

> Aug 31 10:17:07 magna070 kernel: 8379877 pages RAM

I don't see a bug here. The OOM killer nuked ceph-fuse, but it doesn't appear to have been leaking memory. Removing the blocker tag.
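For context, the oom-killer task dump reports sizes in pages (the rss column for ceph-fuse is 307058). A minimal sketch of the arithmetic behind the ~1.2GB and 32GB figures, assuming the x86_64 default 4 KiB page size:

```python
PAGE_SIZE = 4096  # x86_64 default page size in bytes


def pages_to_gib(pages):
    """Convert an oom-killer page count to GiB."""
    return pages * PAGE_SIZE / 2**30


# Values taken from the kernel log lines quoted above:
# ceph-fuse rss column, and total system RAM in pages.
ceph_fuse_rss = pages_to_gib(307058)   # ceph-fuse resident set
total_ram = pages_to_gib(8379877)      # total RAM on magna070

print(f"ceph-fuse RSS: {ceph_fuse_rss:.2f} GiB of {total_ram:.2f} GiB RAM")
```

So the resident set was roughly 1.2 GiB against roughly 32 GiB of RAM, consistent with the comment above.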
Created attachment 1480437 [details]
/var/log/message log

Description of problem:
Crefi IO failed with "Transport endpoint is not connected: '/mnt/fuse/fuse'". Observed this problem while running IO over the weekend.

IO Profile:

```
for i in create rename; do sudo ./crefi.py --multi -b 2 -d 10 -n 10000000 -t sparse --random --min=1K --max=10K /mnt/fuse/fuse/dir4/ ; done
```

Error in Crefi IO:
--------------------

```
2018-08-31 10:17:01,476 - thread-0: Total files created -- 9500
Process thread-0:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/Crefi/crefi_helper.py", line 353, in multipledir
    dir_path)
  File "/home/ubuntu/Crefi/crefi_helper.py", line 129, in sparse_files
    create_sparse_file(fil, size, mins, maxs, rand)
  File "/home/ubuntu/Crefi/crefi_helper.py", line 52, in create_sparse_file
    os_wr(fil, data)
  File "/home/ubuntu/Crefi/crefi_helper.py", line 39, in os_wr
    os.write(fd, data)
OSError: [Errno 103] Software caused connection abort
Traceback (most recent call last):
  File "./crefi.py", line 123, in <module>
    main()
  File "./crefi.py", line 119, in main
    args.randname, args.threads)
  File "./crefi.py", line 22, in multiple
    os.makedirs(dir_path)
  File "/usr/lib64/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib64/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 107] Transport endpoint is not connected: '/mnt/fuse/fuse'
```

Call Trace in /var/log/messages:
----------------------------------

```
Aug 31 10:17:04 magna070 kernel: python invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
Aug 31 10:17:04 magna070 kernel: python cpuset=/ mems_allowed=0
Aug 31 10:17:04 magna070 kernel: CPU: 3 PID: 9277 Comm: python Kdump: loaded Not tainted 3.10.0-933.el7.x86_64 #1
Aug 31 10:17:04 magna070 kernel: Hardware name: Supermicro SYS-F627R3-RTB+/X9DRFR, BIOS 3.0b 04/24/2014
Aug 31 10:17:04 magna070 kernel: Call Trace:
Aug 31 10:17:04 magna070 kernel: [<ffffffffacd5fd2b>] dump_stack+0x19/0x1b
Aug 31 10:17:04 magna070 kernel: [<ffffffffacd5aece>] dump_header+0x90/0x229
Aug 31 10:17:04 magna070 kernel: [<ffffffffac900f4b>] ? cred_has_capability+0x6b/0x120
Aug 31 10:17:04 magna070 kernel: [<ffffffffac7ba154>] oom_kill_process+0x254/0x3d0
Aug 31 10:17:04 magna070 kernel: [<ffffffffac90102e>] ? selinux_capable+0x2e/0x40
Aug 31 10:17:04 magna070 kernel: [<ffffffffac7ba996>] out_of_memory+0x4b6/0x4f0
Aug 31 10:17:04 magna070 kernel: [<ffffffffac7c1414>] __alloc_pages_nodemask+0xaa4/0xbb0
Aug 31 10:17:04 magna070 kernel: [<ffffffffac8117b5>] alloc_pages_vma+0xb5/0x200
Aug 31 10:17:04 magna070 kernel: [<ffffffffac7e9d77>] handle_pte_fault+0x887/0xd10
Aug 31 10:17:04 magna070 kernel: [<ffffffffac7ec31d>] handle_mm_fault+0x39d/0x9b0
Aug 31 10:17:04 magna070 kernel: [<ffffffffacd6d547>] __do_page_fault+0x197/0x4f0
Aug 31 10:17:04 magna070 kernel: [<ffffffffacd6d8d5>] do_page_fault+0x35/0x90
Aug 31 10:17:04 magna070 kernel: [<ffffffffacd69758>] page_fault+0x28/0x30
```

Please check the attached system logs for more details about the trace. The mount point is not accessible; after unmounting and re-mounting, the FUSE mount was accessible again without any issues.

Version-Release number of selected component (if applicable):
ceph version 12.2.5-41.el7cp
Linux kernel 3.10.0-933.el7.x86_64

How reproducible:
1/1, not sure whether this can be reproduced.

Steps to Reproduce:
1. Configure a cluster with 3 MDS (2 active, 1 standby).
2. Create an FS with 2 data pools.
3. Keep running IO from the same client; the IO tools used were crefi, smallfiles, and fio, running in different directories.

Actual results:
FUSE mount not accessible, failing with a "Transport endpoint is not connected" error.

Expected results:
There should be no problem accessing the mount.

Additional info:
NA
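The two errno values in the tracebacks (103 ECONNABORTED, 107 ENOTCONN) are what a FUSE mount returns after its daemon dies; the unmount/remount workaround noted above can be sketched roughly as follows. This is illustrative only: the mount point is taken from the report, and the `fusermount`/`ceph-fuse` invocations are assumed defaults, not the exact commands used on this setup.

```python
import errno
import os
import subprocess

MOUNTPOINT = "/mnt/fuse/fuse"  # path from the report


def mount_is_healthy(path):
    """Return False if the FUSE mount is in the dead
    'Transport endpoint is not connected' state."""
    try:
        os.stat(path)
        return True
    except OSError as e:
        # errno 107 (ENOTCONN) / 103 (ECONNABORTED): fuse daemon is gone.
        if e.errno in (errno.ENOTCONN, errno.ECONNABORTED):
            return False
        raise


def remount(path):
    """Lazy-unmount the dead endpoint, then re-mount with ceph-fuse."""
    subprocess.run(["fusermount", "-uz", path], check=False)
    subprocess.run(["ceph-fuse", path], check=True)


if __name__ == "__main__":
    if not mount_is_healthy(MOUNTPOINT):
        remount(MOUNTPOINT)
```

A watchdog like this only papers over the symptom; here the root cause was the OOM killer terminating ceph-fuse, so the real fix is keeping the client's memory use within bounds.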