Currently the GFS2 journal is mapped in the same way as any other file on disk. Since we know that the currently active journal cannot change while it's in use, it makes sense to scan all of the indirect pointer blocks at mount time and then create an in-core extent map. This can then be used for bmap operations, which will be much faster (and use less memory) than reading the indirect blocks every time. Since we expect the journal to be laid out roughly linearly on disk, the entire journal should be representable with very few extents.
Created attachment 282321 [details]
Bob's proposed patch

This patch is my first attempt at solving this problem. The good news is that it seems to function correctly: it figures out the journal blocks in all cases using the new method. It hasn't been put through a serious test yet, though. The bad news is that it doesn't seem to speed things up much, at least with my bobzone program, which hardly registers a difference. That's not a very stressful test, so this isn't too surprising. Steve, I can assume ownership of this one if you want. First I wanted you to have a look at my approach and tell me if this is what you had in mind. If you like it, I can send it to cluster-devel for inclusion upstream, but it should probably be tested further before I do.
Some comments:

+struct gfs2_journal_extent {
+	struct list_head extent_list;
+
+	unsigned int lblock; /* First logical block */
+	u64 dblock; /* First disk block */
+	u64 blocks;
+};

The logical block needs to be a u64, and "blocks" cannot be more than a u32 due to the max size of an rgrp. Also, this structure can probably be private to bmap.c (see below). There is no need for log_bmap to have a fall-back in case of a missing extent list; just fail in that case, as it should never happen. I'd suggest moving log_bmap and your new map_journal_extents() into bmap.c to keep all the block mapping code in one place. Also, you don't have to map the journal one block at a time. You might as well map as many blocks as possible with a single call to gfs2_block_map(). The function gfs2_write_alloc_required() has some hints on how to do that.
Actually, scratch that last remark: gfs2_allocate_page_backing() is a far better example (only in upstream atm).
My first prototype of this fix had lblock as 64 bits. However, AFAICT the lblock can only be 32 bits due to the way it's passed in: log_bmap gets the lblock as an unsigned int, which comes from sdp->sd_log_flush_head, which is also an unsigned int. If this is a problem, we have several other code issues we need to fix, and we should fix them across the board. Should that be the case, please open a new bugzilla record for that work.

As for this one, my code to implement this was built into my performance enhancements for bug #253990. As such, it also appears in upstream GFS2. As per your suggestion, I removed the fall-back case from comment #2. For that reason, I'm closing this as a duplicate of 253990. Perhaps I should have separated the two, but we were under a time crunch. Also, I'm not worried about doing the journal mapping one block at a time, since it's only done at mount time. Perhaps we can consider improving it as an enhancement in the future.

*** This bug has been marked as a duplicate of 253990 ***