1566180 – [CephFS]: Kernel client with fstab entry is not coming up after reboot

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	Documentation
Sub Component:
Version:	3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	rc
Target Release:	3.3
Assignee:	Bara Ancincova
QA Contact:	subhash
Docs Contact:
URL:
Whiteboard:
Depends On:	1607590
Blocks:	1726134

Reported:	2018-04-11 17:38 UTC by Persona non grata
Modified:	2019-09-13 12:43 UTC
CC List:	13 users
Fixed In Version:	RHEL: ceph-12.2.5-47.el7cp Ubuntu: ceph_12.2.5-32redhat1
Doc Type:	Known Issue
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-09-13 12:43:14 UTC
Embargoed:
Dependent Products:

Attachments
Screenshot of kernel client's console (320.21 KB, image/jpeg) 2018-04-11 17:38 UTC, Persona non grata

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1607590	0	low	CLOSED	MDS should create health warning if it detects slow metadata writes	2021-02-22 00:41:40 UTC

Internal Links: 1607590

Description Persona non grata 2018-04-11 17:38:51 UTC

Created attachment 1420491
Screenshot of kernel client's console

Description of problem:
A kernel client (kclient) with an fstab entry for its mount point failed to come up after reboot. The clients are VMs. This happened while I/O was running on the other clients (2 FUSE and 1 kernel).

Version-Release number of selected component (if applicable):
Ceph: ceph version 12.2.4-6.el7cp (78f60b924802e34d44f7078029a40dbe6c0c922f) luminous (stable)
OS: RHEL 7.5


How reproducible:
Always

Steps to Reproduce:
1. Set up a Ceph cluster with 4 clients (2 FUSE and 2 kernel).
2. Add an fstab entry on each client and make sure that the mount point exists.
3. Run I/O in parallel on 3 clients and reboot the fourth client. After the reboot, I/O is started on the rebooted client.

Actual results:
The kernel client failed to come up after reboot, while the FUSE clients came up successfully.

Expected results:
Kernel clients should come up after reboot and resume I/O.

Additional info:
A screenshot of the client's console is attached.

Comment 6 Yan, Zheng 2018-04-12 00:49:40 UTC

Yes, this should be a QE setup issue. Does the fstab entry include _netdev?

Comment 9 Yan, Zheng 2018-04-12 12:56:58 UTC

Please attach the kernel dmesg output.

Comment 12 Yan, Zheng 2018-04-12 13:40:59 UTC

The mount did happen before the network was online. I don't know why the _netdev option did not work. Could you try putting _netdev at the beginning of the option list?

Comment 17 Yan, Zheng 2018-05-10 07:12:10 UTC

I think this is a mount(8) issue. Shreekar, could you check whether 'mount -a -O no_netdev' works as expected?

Comment 20 Yan, Zheng 2018-05-22 13:14:55 UTC

Have you tried an fstab entry with secret instead of secretfile?

Comment 21 Persona non grata 2018-05-22 17:04:40 UTC

Zheng,
Yes, same results.

Comment 22 Yan, Zheng 2018-06-19 13:42:18 UTC

FYI: this can be http://tracker.ceph.com/issues/24202

Comment 23 Yan, Zheng 2018-06-19 13:51:54 UTC

please ignore my previous comment

Comment 24 Yan, Zheng 2018-06-20 09:13:11 UTC

I still can't reproduce this locally. Could you set up a test environment for me?

Comment 25 Persona non grata 2018-06-20 19:05:31 UTC

Yes, I have mailed the setup info

Comment 26 Yan, Zheng 2018-06-20 23:51:55 UTC

I think this issue happens only under heavy I/O. I checked the MDS log: the MDS did get the session open request from the client, but it took several minutes to flush the log event for the session open. The client mount timed out before the log event was flushed.

2018-06-20 10:38:03.216253 7f3eae373700  5 mds.1.log _submit_thread 3851465531~256 : ESession client.825001 172.16.115.94:0/1151192669 open cmapv 179381
…
2018-06-20 10:40:42.718710 7f3eaf375700 10 mds.1.server _session_logged client.825001 172.16.115.94:0/1151192669 state_seq 1 open 179381

I think this is a cluster configuration issue. The CephFS data pool and metadata pool are on the same set of OSDs, so heavy data I/O significantly slows down metadata I/O.
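As a rough illustration of the delay described above, the following sketch computes the time between the two MDS log lines quoted in this comment. The helper name `log_time` is made up for this example, and the code assumes only that each log line starts with a date field and a time field in the format shown.

```python
from datetime import datetime

# Timestamp layout used at the start of the MDS log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def log_time(line):
    """Parse the leading date and time fields of a log line (hypothetical helper)."""
    date, time = line.split()[:2]
    return datetime.strptime(f"{date} {time}", FMT)


# The two log lines from the comment: journal submit and session logged.
submitted = ("2018-06-20 10:38:03.216253 7f3eae373700  5 mds.1.log _submit_thread "
             "3851465531~256 : ESession client.825001 172.16.115.94:0/1151192669 "
             "open cmapv 179381")
logged = ("2018-06-20 10:40:42.718710 7f3eaf375700 10 mds.1.server _session_logged "
          "client.825001 172.16.115.94:0/1151192669 state_seq 1 open 179381")

# Seconds between submitting the ESession event and seeing it logged.
delay_s = (log_time(logged) - log_time(submitted)).total_seconds()
print(f"session open took {delay_s:.3f} s to be logged")
```

For these two lines the gap is about 159.5 seconds, that is, roughly 2 minutes 40 seconds between the journal submit and the session being logged.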

Comment 27 Patrick Donnelly 2018-07-11 21:59:30 UTC

Zheng, I think we could detect this situation by monitoring the average RTT of object writes for the journal. Then we would be able to give a cluster health warning. What do you think?
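The idea of averaging journal write RTTs and warning past a threshold could be sketched as follows. This is only an illustration of the concept, not the MDS implementation: the class name `JournalLatencyMonitor`, the window size, and the 30-second threshold are all assumptions made for this example.

```python
from collections import deque


class JournalLatencyMonitor:
    """Hypothetical rolling-average monitor for journal write round-trip times."""

    def __init__(self, window=3, threshold_s=30.0):
        # Keep only the most recent `window` samples (assumed values).
        self.samples = deque(maxlen=window)
        self.threshold_s = threshold_s

    def record(self, rtt_s):
        """Add one observed write RTT, in seconds."""
        self.samples.append(rtt_s)

    def average(self):
        """Mean RTT over the current window, or 0.0 if no samples yet."""
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def is_slow(self):
        """True when the rolling average exceeds the warning threshold."""
        return self.average() > self.threshold_s


monitor = JournalLatencyMonitor(window=3, threshold_s=30.0)
for rtt in (0.02, 0.05, 0.03):      # healthy, made-up samples
    monitor.record(rtt)
healthy = monitor.is_slow()

for rtt in (159.5, 159.5, 159.5):   # samples on the order of the delay seen in the log
    monitor.record(rtt)
degraded = monitor.is_slow()

print(healthy, degraded)
```

Using a bounded window means a single slow write does not trigger the warning by itself, while sustained slowness does.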

Comment 28 Yan, Zheng 2018-07-12 01:10:29 UTC

yes, it makes sense to me

Comment 30 Patrick Donnelly 2018-07-23 19:38:49 UTC

Please retest this BZ with the fix from bz1607590. If the problem is related to the MDS suffering from slow writes to the OSDs, then this issue should be closed. (It's a cluster configuration issue and not a real bug.)

Comment 42 Yan, Zheng 2018-10-29 11:20:18 UTC

The client can't connect to the monitor. This looks more like a network issue.

Comment 44 Yan, Zheng 2018-11-06 09:44:46 UTC

If the network is not reachable, what else can CephFS do?

Comment 46 Persona non grata 2018-11-06 10:25:45 UTC

The network disconnect happens only when there is a kernel client fstab entry. A normal reboot of the kernel client without an fstab entry does not cause any issues such as a network disconnection.

Comment 47 Yan, Zheng 2018-11-06 12:06:59 UTC

Systemd tried to mount CephFS before the network was ready. The mount got stuck, so systemd had no chance to start the network.

I still think this is a mount(8) issue. Try putting _netdev before the other options.
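The suggestion to put _netdev before the other options can be sketched as a small helper that rewrites the options field of an fstab line. The function name `netdev_first` and the sample entry (monitor host `mon1`, mount point, and secret file path) are hypothetical and not taken from this bug. Whether the reordering actually resolves the problem is not confirmed in this thread.

```python
def netdev_first(fstab_line):
    """Return the fstab line with _netdev as the first mount option.

    Hypothetical helper: adds _netdev if it is missing, otherwise moves it
    to the front. Fields are re-joined with single spaces.
    """
    fields = fstab_line.split()
    if len(fields) < 4:
        raise ValueError("expected at least device, mount point, type, options")
    # Field 4 of an fstab entry is the comma-separated option list.
    options = [opt for opt in fields[3].split(",") if opt != "_netdev"]
    fields[3] = ",".join(["_netdev"] + options)
    return " ".join(fields)


# Made-up kernel client entry used only for illustration.
entry = "mon1:6789:/ /mnt/cephfs ceph name=admin,secretfile=/etc/ceph/secret.key,_netdev,noatime 0 0"
print(netdev_first(entry))
```

The helper only rewrites text. It does not touch /etc/fstab, so its output can be reviewed before any change is made by hand.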

Comment 57 Bara Ancincova 2019-09-13 12:43:14 UTC

The content is already published on the Portal:

https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/ceph_file_system_guide/index#automatically-mounting-the-ceph-file-system-as-a-kernel-client
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/ceph_file_system_guide/index#automatically-mounting-the-ceph-file-system-as-a-fuse-client
