.Client I/O sometimes fails for CephFS FUSE clients
Client I/O sometimes fails for Ceph File System (CephFS) clients mounted through Filesystem in Userspace (FUSE) with the error `transport endpoint shutdown` because of an assertion failure in the FUSE service. To work around this issue, unmount and then remount CephFS FUSE, and then restart the client I/O.
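The workaround can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `remount_cephfs_fuse` and the `DRY_RUN` switch are hypothetical, the mount point `/mnt/fuse` is inferred from the paths in this report, and `ceph-fuse` is assumed to find its configuration and keyring in the default locations. The lazy unmount flag (`-z`) is a choice made here because the endpoint is already dead; the report itself only says to unmount and remount.

```shell
#!/bin/sh
# Sketch of the workaround: detach the stale CephFS FUSE mount, then mount again.
# remount_cephfs_fuse and DRY_RUN are hypothetical names used for illustration.
# With DRY_RUN set to a non-empty value the commands are printed, not executed.

remount_cephfs_fuse() {
    mnt="$1"

    # Detach the stale mount left behind by the aborted ceph-fuse process.
    # -z requests a lazy unmount (an assumption, see the note above).
    ${DRY_RUN:+echo} fusermount -u -z "$mnt" || return 1

    # Start a new ceph-fuse process on the same mount point.
    # Assumes the default ceph.conf and keyring locations are usable.
    ${DRY_RUN:+echo} ceph-fuse "$mnt" || return 1
}
```

Calling `DRY_RUN=1 remount_cephfs_fuse /mnt/fuse` prints the two commands so they can be reviewed first; calling it without `DRY_RUN` runs them. Client I/O is restarted only after the remount succeeds.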
Description Ramakrishnan Periyasamy
2018-05-10 16:47:54 UTC
Description of problem:
dd IO failed with "dd: failed to open ‘/mnt/fuse/fuse/client1/dir1/1M_1/magna021_000000104856’: Cannot send after transport endpoint shutdown"
dd script used during testing:
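# Build a file name of the form <hostname>_<12-digit zero-padded index>.
function n {
    printf "%s_%012d" "$(hostname)" "$1"
}
dir=/mnt/fuse/fuse/client1/dir1
# Write 2**20/10 files of 4 KiB each; the current index is recorded in /root/1M_1_CURRENT.
time mkdir -p "$dir/1M_1" && for ((i = 0; i < 2**20/10; i++)); do
    echo "$i" > /root/1M_1_CURRENT
    dd status=none if=/dev/urandom of="$dir/1M_1/$(n "$i")" bs=4k count=1
done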
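To relate the failing path to the script above, the naming helper and the loop bound can be worked through by hand. The sketch below is an illustration only: `name_for` is a hypothetical variant of `n` that takes the host name as an argument, and `magna021` is taken from the file name in the error message.

```shell
#!/bin/sh
# Hypothetical variant of the n() helper with the host name passed explicitly.
name_for() {
    printf "%s_%012d" "$1" "$2"
}

# The loop runs while i < 2**20/10; 2**20 is 1048576, and integer division by 10 gives the count.
count=$((1048576 / 10))
last=$((count - 1))

echo "iterations: $count"
echo "last file:  $(name_for magna021 "$last")"
```

Under these assumptions the loop runs 104857 times (indices 0 through 104856), and the last file name is `magna021_000000104856`, which matches the file named in the error message.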
Fuse assert:
2018-05-10 10:20:50.709658 7f96cadb90c0 0 ~Inode: leftover objects on inode 0x0x10000082b16
2018-05-10 10:20:50.726976 7f96cadb90c0 -1 /builddir/build/BUILD/ceph-12.2.1/src/client/Inode.cc: In function 'Inode::~Inode()' thread 7f96cadb90c0 time 2018-05-10 10:20:50.709671
/builddir/build/BUILD/ceph-12.2.1/src/client/Inode.cc: 27: FAILED assert(oset.objects.empty())
ceph version 12.2.1-46.el7cp (b6f6f1b141c306a43f669b974971b9ec44914cb0) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55ea9e2857d0]
2: (Inode::~Inode()+0x5e5) [0x55ea9e250865]
3: (Client::put_inode(Inode*, int)+0x350) [0x55ea9e1cdda0]
4: (Client::unlink(Dentry*, bool, bool)+0x11a) [0x55ea9e1d2cca]
5: (Client::trim_dentry(Dentry*)+0x93) [0x55ea9e1d3433]
6: (Client::trim_cache(bool)+0x328) [0x55ea9e1d38c8]
7: (Client::tear_down_cache()+0x2eb) [0x55ea9e1e987b]
8: (Client::~Client()+0x53) [0x55ea9e1e9ab3]
9: (StandaloneClient::~StandaloneClient()+0x9) [0x55ea9e1ea1c9]
10: (main()+0x534) [0x55ea9e19c3a4]
11: (__libc_start_main()+0xf5) [0x7f96c7af23d5]
12: (()+0x212bf3) [0x55ea9e1a5bf3]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
Version-Release number of selected component (if applicable):
Client ceph version: 12.2.1-46.el7cp
Cluster daemon version: 12.2.4-10.el7cp (the cluster was upgraded from 12.2.1-46.el7cp to 12.2.4-10.el7cp).
How reproducible:
1/1
Steps to Reproduce:
1. Configure a cluster with MDS using 12.2.1-46.el7cp.
2. Create around 800k inodes in the file system.
3. Upgrade the cluster to 12.2.4-10.el7cp manually, following the MDS upgrade procedure.
4. Deactivate the active MDS. While it was in the stopping state, the client hung and the FUSE client asserted.
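Actual results:
Client IO failed with "Cannot send after transport endpoint shutdown" and the FUSE client asserted.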
Expected results:
Client IO should not fail
Additional info:
NA
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2018:2177