Description of problem:
dd IO failed with "dd: failed to open ‘/mnt/fuse/fuse/client1/dir1/1M_1/magna021_000000104856’: Cannot send after transport endpoint shutdown"

dd script used during testing:

function n {
    printf "%s_%012d" "$(hostname)" "$1"
}
dir=/mnt/fuse/fuse/client1/dir1
time mkdir -p $dir/1M_1 &&
for ((i = 0; i < 2**20/10; i++)); do
    echo "$i" > /root/1M_1_CURRENT
    dd status=none if=/dev/urandom of="$dir/1M_1/$(n "$i")" bs=4k count=1
done

Fuse assert:

2018-05-10 10:20:50.709658 7f96cadb90c0  0 ~Inode: leftover objects on inode 0x0x10000082b16
2018-05-10 10:20:50.726976 7f96cadb90c0 -1 /builddir/build/BUILD/ceph-12.2.1/src/client/Inode.cc: In function 'Inode::~Inode()' thread 7f96cadb90c0 time 2018-05-10 10:20:50.709671
/builddir/build/BUILD/ceph-12.2.1/src/client/Inode.cc: 27: FAILED assert(oset.objects.empty())

 ceph version 12.2.1-46.el7cp (b6f6f1b141c306a43f669b974971b9ec44914cb0) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55ea9e2857d0]
 2: (Inode::~Inode()+0x5e5) [0x55ea9e250865]
 3: (Client::put_inode(Inode*, int)+0x350) [0x55ea9e1cdda0]
 4: (Client::unlink(Dentry*, bool, bool)+0x11a) [0x55ea9e1d2cca]
 5: (Client::trim_dentry(Dentry*)+0x93) [0x55ea9e1d3433]
 6: (Client::trim_cache(bool)+0x328) [0x55ea9e1d38c8]
 7: (Client::tear_down_cache()+0x2eb) [0x55ea9e1e987b]
 8: (Client::~Client()+0x53) [0x55ea9e1e9ab3]
 9: (StandaloneClient::~StandaloneClient()+0x9) [0x55ea9e1ea1c9]
 10: (main()+0x534) [0x55ea9e19c3a4]
 11: (__libc_start_main()+0xf5) [0x7f96c7af23d5]
 12: (()+0x212bf3) [0x55ea9e1a5bf3]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---

Version-Release number of selected component (if applicable):
Client ceph version: 12.2.1-46.el7cp
Cluster daemons: 12.2.4-10.el7cp (cluster was upgraded from 12.2.1-46.el7cp to 12.2.4-10.el7cp)

How reproducible:
1/1

Steps to Reproduce:
1. Configure the cluster with MDS using 12.2.1-46.el7cp.
2. Create around 800k inodes in the FS.
3. Upgrade the cluster to 12.2.4-10.el7cp, following the MDS upgrade procedure manually.
4. Deactivate the active MDS; while it was in the stopping state, the client hung and the fuse assert fired.

Actual results:
Client IO hung and ceph-fuse crashed with the assert above.

Expected results:
Client IO should not fail.

Additional info:
NA
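For reference, the reproduction loop can be run as a standalone POSIX shell sketch. The target directory and iteration count below are placeholders, not the values from the actual test (which wrote ~100k files to the ceph-fuse mount at /mnt/fuse/fuse/client1/dir1):

```shell
#!/bin/sh
# Minimal sketch of the reproduction loop from the report.
# Placeholder assumptions: /tmp/repro_dir stands in for the CephFS
# mount, and only 3 iterations run instead of the original 2**20/10.

# File-naming helper from the report: hostname plus a 12-digit
# zero-padded index, matching names like magna021_000000104856
# seen in the dd error message.
n() {
    printf "%s_%012d" "$(hostname)" "$1"
}

dir=/tmp/repro_dir
mkdir -p "$dir/1M_1"

for i in 0 1 2; do
    # One 4 KiB file of random data per iteration, as in the report.
    dd status=none if=/dev/urandom of="$dir/1M_1/$(n "$i")" bs=4k count=1
done

ls "$dir/1M_1"
```

To exercise the actual failure, point dir at the ceph-fuse mount and raise the iteration count; in the original run the "Cannot send after transport endpoint shutdown" error appeared while the deactivated MDS was in the stopping state.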
https://github.com/ceph/ceph/pull/22168
The cherry-pick for 3.1 is done: https://bugzilla.redhat.com/show_bug.cgi?id=1585029
Manual cluster upgrade passed without any issues. FS sanity and regression automation runs passed.
a) FS sanity in Jenkins is clean and results are posted to Polarion: http://cistatus.ceph.redhat.com/ui/#cephci/launches/New_filter1%7Cpage.page=1&page.size=50&filter.cnt.name=sanity_fs&pag…
b) http://pulpito.ceph.redhat.com/vasu-2018-06-21_16:30:55-fs-luminous-distro-basic-argo/
Moving this bug to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2177