[Migrated from RT] - ticket 979 - [http://support.gluster.com/rt/Ticket/Display.html?id=979]

Wed Apr 22 03:13:15 2009 guru - Ticket created

version: glusterfs 2.0.0 rc8

* 100 TB cluster
* 7 server distribute over replicate
* Rebooted brick7 (which was exporting brick7-ib and brick7-tcp)
* brick7-ib and brick8-tcp were afr'd, brick6-ib and brick7-tcp were afr'd. So bringing down one server and back up should have been handled without errors.

Kernel compile (on brick3):

i=0; while true; do ((i++)); echo "======= $i =======" | tee -a results; /opt/qa/tools/kernel_compile.sh linux-2.6.29.tar.bz2 || break; rm -rf linux-2.6.29 || break; done
..
======= 2 =======
Extracting Tarball ..

bzip2: I/O or other error, bailing out. Possible reason follows.
bzip2: File descriptor in bad state
Input file = (stdin), output file = (stdout)
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

dbench (on brick6):

i=0; while true; do echo "===== $i ====="; ((i++)); /opt/benchmarks/dbench-4.0/bin/dbench -s -S -F 48 || break; done
..
48 5943 5.32 MB/sec execute 168 sec latency 17396.433 ms
48 5943 5.29 MB/sec execute 169 sec latency 18397.434 ms
48 5944 5.26 MB/sec execute 170 sec latency 19207.415 ms
[5645] open ./clients/client17/~dmtmp/PARADOX/ANSWER.DB failed for handle 11085 (No such file or directory)
(5646) ERROR: handle 11085 was not found
[5825] open ./clients/client13/~dmtmp/SEED/MEDIUM.FIL failed for handle 11121 (No such file or directory)
(5826) ERROR: handle 11121 was not found
Child failed with status 1
[root@brick6 dbench]# [5712] open ./clients/client6/~dmtmp/PARADOX/STUDENTS.XG0 failed for handle 11093 (No such file or directory)
[5825] open ./clients/client1/~dmtmp/SEED/MEDIUM.FIL failed for handle 11121 (No such file or directory)

-------------------------------------------------------------------------------

# Wed Apr 22 17:33:27 2009 gowda - Correspondence added

On Wed Apr 22 03:13:15 2009, guru wrote:
> * 100 TB cluster
> * 7 server distribute over replicate
> * Rebooted brick7 (which was exporting brick7-ib and brick7-tcp)
> * brick7-ib and brick8-tcp were afr'd, brick6-ib and brick7-tcp were
> afr'd. So bringing down one server and back up should have been handled
> without errors.
>
>
> Kernel compile (on brick3):
>
> i=0; while true; do ((i++)); echo "======= $i =======" | tee -a results;
> /opt/qa/tools/kernel_compile.sh linux-2.6.29.tar.bz2 || break; rm -rf
> linux-2.6.29 || break; done
> ..
> ======= 2 =======
> Extracting Tarball ..
>
> bzip2: I/O or other error, bailing out. Possible reason follows.
> bzip2: File descriptor in bad state
> Input file = (stdin), output file = (stdout)
> tar: Unexpected EOF in archive
> tar: Unexpected EOF in archive
> tar: Error is not recoverable: exiting now
>

From the log messages for brick3, I saw that afr7 (a subvolume of dht) is down, and the kernel tarball is hashed to afr7 (I assumed that the backend is still valid). EBADFD is expected when a subvolume is completely down.

>
>
> dbench (on brick6):
> i=0; while true; do echo "===== $i ====="; ((i++));
> /opt/benchmarks/dbench-4.0/bin/dbench -s -S -F 48 || break; done
> ..
> 48 5943 5.32 MB/sec execute 168 sec latency 17396.433 ms
> 48 5943 5.29 MB/sec execute 169 sec latency 18397.434 ms
> 48 5944 5.26 MB/sec execute 170 sec latency 19207.415 ms
> [5645] open ./clients/client17/~dmtmp/PARADOX/ANSWER.DB failed for
> handle 11085 (No such file or directory)
> (5646) ERROR: handle 11085 was not found
> [5825] open ./clients/client13/~dmtmp/SEED/MEDIUM.FIL failed for handle
> 11121 (No such file or directory)
> (5826) ERROR: handle 11121 was not found
> Child failed with status 1
> [root@brick6 dbench]# [5712] open
> ./clients/client6/~dmtmp/PARADOX/STUDENTS.XG0 failed for handle 11093
> (No such file or directory)
> [5825] open ./clients/client1/~dmtmp/SEED/MEDIUM.FIL failed for handle
> 11121 (No such file or directory)

-- gowda

--------------------------------------------------------------------------------

# Wed Apr 22 17:45:57 2009 gowda - Correspondence added

Please note that the client log from brick6 is 3.8 GB. Please specify the circumstances under which dbench failed.

-- gowda

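One way to record the circumstances of a failure is to timestamp every iteration of the test loop, so that the failing run can be matched against the server reboot time and the relevant part of a multi-gigabyte client log. The following is a minimal sketch, assuming bash; the function name run_until_failure and the log file name are hypothetical and were not part of the original test setup.

```shell
#!/bin/bash
# Hypothetical helper (not from the original ticket): run a command
# repeatedly, logging a UTC timestamp at the start and end of each
# iteration together with its exit status. Stops at the first failure.
#   $1  log file to append to
#   $2  maximum number of iterations (0 means loop until failure)
#   $@  command to run
run_until_failure() {
    local log=$1 max=$2
    shift 2
    local i=0 status
    while [ "$max" -eq 0 ] || [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        echo "===== $i ===== start $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$log"
        "$@"
        status=$?
        echo "===== $i ===== exit $status $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$log"
        # Propagate the failing exit status to the caller.
        [ "$status" -eq 0 ] || return "$status"
    done
    return 0
}

# Example invocation with the dbench command line from the ticket
# (commented out so the sketch can be sourced without running dbench):
# run_until_failure results.log 0 /opt/benchmarks/dbench-4.0/bin/dbench -s -S -F 48
```

The last "exit" line in the log then gives the failure time and status, which narrows the window to search in the client log.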
Observed a similar error in a simple AFR configuration too. The setup is a simple 2-way client-side AFR. dbench was running when one of the servers was killed and restarted. dbench failed with the error:

[21874] unlink ./clients/client3/~dmtmp/EXCEL/SALES1.XLS failed (No such file or directory) - expected NT_STATUS_OK
ERROR: child 3 failed at line 21874

Spec files and log files are in /share/tickets/<bug id>.