Bug 1595389 - RFE: add alternative loop detection to list command that avoids large memory usage for large lists
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: crash
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Dave Wysochanski
QA Contact: Emma Wu
URL:
Whiteboard:
Depends On:
Blocks: 1647768
 
Reported: 2018-06-26 19:49 UTC by Dave Wysochanski
Modified: 2019-08-06 12:41 UTC (History)
CC List: 7 users

Fixed In Version: crash-7.2.3-9.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-06 12:41:17 UTC
Target Upstream Version:
Embargoed:


Attachments
POC untested v1 patch for a tortoise and hare loop detection algo for the 'list' command (4.29 KB, text/plain)
2018-06-27 20:25 UTC, Dave Wysochanski
no flags Details
POC v2 patch (lightly tested) for a tortoise and hare loop detection algo for the 'list' command (2.99 KB, patch)
2018-06-27 21:41 UTC, Dave Wysochanski
no flags Details | Diff
strace of crash process doing 'list -T -H 0xffff8ac03c81fc28 > /dev/null' - shows larger reads which is not obvious why (2.94 MB, text/plain)
2018-06-28 00:32 UTC, Dave Wysochanski
no flags Details
RFC PATCH Brent algo for list command loop detection - more efficient than Floyd / tortoise and hare (3.74 KB, text/plain)
2018-06-28 07:19 UTC, Dave Wysochanski
no flags Details
RFC patch (under testing) with do_list() replacement using Brent's algorithm for cycle detection (7.69 KB, text/plain)
2018-06-28 09:52 UTC, Dave Wysochanski
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:2071 0 None None None 2019-08-06 12:41:21 UTC

Description Dave Wysochanski 2018-06-26 19:49:36 UTC
Description of problem:
Filing this per the discussion on https://www.redhat.com/archives/crash-utility/2018-June/msg00014.html, since it looks non-trivial but doable.

A good suggestion for a possible algo was given by Jeremy:
https://www.redhat.com/archives/crash-utility/2018-June/msg00017.html

Note that we cannot just remove the hq_enter() function though:
https://www.redhat.com/archives/crash-utility/2018-June/msg00027.html

Probably focus on refactoring tools.c:do_list() function for now.

Also it's possible the hashing itself could be improved for better performance, see HQ_INDEX() definition for example.

Version-Release number of selected component (if applicable):
crash-7.3.2

How reproducible:
Easy with a list_head of many entries (we've seen this with various vmcores having dentry cache issues for example).

Steps to Reproduce:
1. Start crash on a vmcore containing a large list
2. Enumerate the list with 'list -H'
list -H 0xffff8ac03c81fc28 

Actual results:
Ever-increasing memory usage; the rate of entries processed per minute also slows down significantly after a few hours (we saw a factor-of-10 slowdown after about 12 hours).

Expected results:
The list command can detect loops without using any memory

Additional info:
We have sample vmcores I'll put in a private note.  While it may seem pathological, vmcore sizes keep increasing, and it would be very useful to have a more optimized list enumeration for faster vmcore diagnosis.

I intend to take a crack at this unless someone else beats me to it.  Right now I have a few other items on my plate, so it may be a couple of weeks for me.

Comment 3 Dave Anderson 2018-06-27 14:13:27 UTC
(In reply to Dave Wysochanski from comment #0)
> ...
> Probably focus on refactoring tools.c:do_list() function for now.
> 
> Also it's possible the hashing itself could be improved for better
> performance, see HQ_INDEX() definition for example.
> ...
> Additional info:
> We have sample vmcores I'll put as a private note.  While it may seem
> pathological we have increasing vmcore sizes and it would be very useful to
> have a more optimized list enumeration for faster vmcore diagnosis.
> 
> I intend to take a crack at this unless someone else beats me to it. 
> Right now I have a few other items on my plate so it may be a couple weeks
> for me.

Thanks Dave, I appreciate it.

If I may suggest a few ground rules first.

In addition to the user "list" command, the do_list() and hq_xxxx() facilities
are used internally throughout the crash utility proper for various purposes,
and are exported for use by extension modules, so making wholesale changes to
them (i.e., refactoring the do_list() function) would be extremely risky.

Since this issue is (for now) specific to the user list command, I would
prefer that you generate an alternative function to do_list() that is only
used by the list command.  That way you could avoid working around
the hq_open/hq_enter/hq_close issues, and direct your efforts to making
the function do exactly what you come up with as an alternative to hashing
for loop detection.  Once that alternative do_list()-type function is
proven, then perhaps other users of do_list() could make the transition.

With respect to tweaking the current hash queue code, that's something
that certainly could be entertained.  For example, as I recall, the HQ_SHIFT
value was originally selected because the prime user of hash queues is/was
for task management, which used to be based upon the task_struct, which was
located in the kernel stack page.  And even now, the task_struct is almost
a page size in length.  So I guess it's a compromise as to what the best 
shift value should be used.  And w/respect to the hash entry memory usage,
it looks like it would be easy enough to dial back the ht->memptr size in restore_sanity().

Comment 4 Dave Wysochanski 2018-06-27 14:23:42 UTC
(In reply to Dave Anderson from comment #3)
> (In reply to Dave Wysochanski from comment #0)
> > ...
> > Probably focus on refactoring tools.c:do_list() function for now.
> > 
> > Also it's possible the hashing itself could be improved for better
> > performance, see HQ_INDEX() definition for example.
> > ...
> > Additional info:
> > We have sample vmcores I'll put as a private note.  While it may seem
> > pathological we have increasing vmcore sizes and it would be very useful to
> > have a more optimized list enumeration for faster vmcore diagnosis.
> > 
> > I intend to take a crack at this unless someone else beats me to it. 
> > Right now I have a few other items on my plate so it may be a couple weeks
> > for me.
> 
> Thanks Dave, I appreciate it.
> 
> If I may suggest a few ground rules first.
> 
> In addition to the user "list" command, the do_list() and hq_xxxx()
> facilities
> are used internally throughout the crash utility proper for various purposes,
> and are exported for use by extension modules,  so making wholesale changes
> to
> them (i.e., refactoring the do_list() function), would be extremely risky.
> 
> Since this issue is (for now) specific to the user list command, I would
> prefer that you generate an alternative function to do_list() that is only
> used by the list command.  That way you could avoid working around
> the hq_open/hq_enter/hq_close issues, and direct your efforts to making
> the function do exactly what you come up with as an alternative to hashing
> for loop detection.  Once that alternative do_list()-type function is
> proven, then perhaps other users of do_list() could make the transition.
> 
Understood, and what you explain here is exactly along the lines I was thinking.  I want to provide an alternative loop detection algorithm for the 'list' command that does not use memory, then prove it out and see where else (if anywhere) it might be useful.

> With respect to tweaking the current hash queue code, that's something
> that certainly could be entertained.  For example, as I recall, the HQ_SHIFT
> value was originally selected because the prime user of hash queues is/was
> for task management, which used to be based upon the task_struct, which was
> located in the kernel stack page.  And even now, the task_struct is almost
> a page size in length.  So I guess it's a compromise as to what the best 
> shift value should be used.  And w/respect to the hash entry memory usage,
> it looks like it would be easy enough to dial back the ht->memptr size in
> restore_sanity().

I have the crash-debuginfo installed on optimus now, and I want to run that vmcore and the list command again and look at the data structures under gdb.  I suspect the hash may not be working well yet, but I need to see what is going on.

Comment 5 Dave Wysochanski 2018-06-27 15:44:17 UTC
(In reply to Dave Wysochanski from comment #4)
> (In reply to Dave Anderson from comment #3)
> > (In reply to Dave Wysochanski from comment #0)
> 
> > With respect to tweaking the current hash queue code, that's something
> > that certainly could be entertained.  For example, as I recall, the HQ_SHIFT
> > value was originally selected because the prime user of hash queues is/was
> > for task management, which used to be based upon the task_struct, which was
> > located in the kernel stack page.  And even now, the task_struct is almost
> > a page size in length.  So I guess it's a compromise as to what the best 
> > shift value should be used.  And w/respect to the hash entry memory usage,
> > it looks like it would be easy enough to dial back the ht->memptr size in
> > restore_sanity().
> 
> I have the crash-debuginfo installed on optimus now and I want to run that
> vmcore and the list command again and look at the data structures under gdb.
> I suspect the hash may not be working well yet but I need to see what is
> going on.

FWIW, an initial check of the hash_table->queue_heads[*].qcnt after a few minutes revealed a mostly even distribution across the queue_heads.  However, I didn't run into the problem until after a few hours, so I'm going to let it run, look at the rate of additions to the file, then stop gdb and look again to see if there's anything interesting going on or whether it's just really large.

Comment 6 Dave Wysochanski 2018-06-27 20:25:13 UTC
Created attachment 1455126 [details]
POC untested v1 patch for a tortoise and hare loop detection algo for the 'list' command

Comment 7 Dave Wysochanski 2018-06-27 20:26:45 UTC
Hey Dave A - do you have a vmcore with a list loop in it?

What do you think of my POC approach v1 in comment #6?  (Sorry, ignore the makefile changes; they are not necessary.)

Comment 8 Dave Wysochanski 2018-06-27 21:41:27 UTC
Created attachment 1455134 [details]
POC v2 patch (lightly tested) for a tortoise and hare loop detection algo for the 'list' command

Comment 9 Dave Wysochanski 2018-06-27 22:13:43 UTC
FWIW, I'm running the patch from comment #8 as 'new-crash' and timing a crashrc file with the following command:
list -T -H 0xffff8ac03c81fc28 > /dev/null

Not that it means much given I'm not controlling workload on the machine or anything but it's a sanity test.

The original command where hashing was turned off took around 10 hours.

Comment 10 Dave Wysochanski 2018-06-28 00:29:09 UTC
One more note - there's a bug in my second patch - this is obviously wrong:
			ld->flags &= LIST_ALLOCATE;

Should be
			ld->flags &= ~LIST_ALLOCATE;

Comment 11 Dave Wysochanski 2018-06-28 00:32:03 UTC
Created attachment 1455139 [details]
strace of crash process doing 'list -T -H 0xffff8ac03c81fc28 > /dev/null' - shows larger reads which is not obvious why

Comment 12 Dave Wysochanski 2018-06-28 00:34:46 UTC
Comment on attachment 1455139 [details]
strace of crash process doing 'list -T -H 0xffff8ac03c81fc28 > /dev/null' - shows larger reads which is not obvious why

There are a lot of ~1k reads from the disk going on here; it looks like crash is reading outside of the list_head, possibly trying to cache part of the structure?  I can understand the 24-byte read and the lseek, but I'm not sure about the larger read.

write(19, "ffff8a978a7abc40\n", 17)     = 17
write(19, "ffff8a978a7abdc0\n", 17)     = 17
lseek(3, 2122052384, SEEK_SET)          = 2122052384
read(3, "\324\247\20*\30\0\0\0\301\4\0\0\2\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
lseek(3, 103784949716, SEEK_SET)        = 103784949716
read(3, "\0 \200\4\10\0\2\0\0\0\310\206\335\211\230\212\377\377HK\211\243\222\212\377\377\200|\3 \276\212"..., 1217) = 1217   <---- NOT OBVIOUS WHY???
write(19, "ffff8a978a7abf40\n", 17)     = 17
write(19, "ffff8a978a7ac140\n", 17)     = 17
write(19, "ffff8a978a7ac2c0\n", 17)     = 17
write(19, "ffff8a978a7ac440\n", 17)     = 17
write(19, "ffff8a978a7ac5c0\n", 17)     = 17
write(19, "ffff8a978a7ac740\n", 17)     = 17
write(19, "ffff8a978a7ac8c0\n", 17)     = 17
write(19, "ffff8a978a7aca40\n", 17)     = 17
write(19, "ffff8a978a7acbc0\n", 17)     = 17
write(19, "ffff8a978a7acd40\n", 17)     = 17
lseek(3, 2122052408, SEEK_SET)          = 2122052408
read(3, "\225\254\20*\30\0\0\0\236\4\0\0\2\0\0\0\0\0\0\0\0\0\0\0", 24) = 24
lseek(3, 103784950933, SEEK_SET)        = 103784950933
read(3, "\0\0016_dOeSnotExist_.db\0+\0\0\r\300c\245\300\377\377\377"..., 1182) = 1182  <---- NOT OBVIOUS WHY???
write(19, "ffff8a978a7acec0\n", 17)     = 17
write(19, "ffff8a978a7ad040\n", 17)     = 17
write(19, "ffff8a978a7ad1c0\n", 17)     = 17

Comment 13 Dave Wysochanski 2018-06-28 00:38:04 UTC
Ok, that is normal - it's reading the dump file.  I guess we cannot read a small amount because of the dump format, so we are reading around 1k.  Confirmed in gdb.

Comment 14 Dave Wysochanski 2018-06-28 07:19:33 UTC
Created attachment 1455200 [details]
RFC PATCH Brent algo for list command loop detection - more efficient than Floyd / tortoise and hare

Comment 15 Dave Wysochanski 2018-06-28 08:33:26 UTC
I realize these initial patches do not take the approach of a separate do_list()-type function from comment #3, so they probably won't get accepted.  These were the minimal changes I could think of to get a test working and help me understand a little better.

I think I understand a bit more of what is going on now.  I knew do_list() was used by many callers, but I didn't understand it well and didn't understand the hq_* functions.  I see now that sometimes the hash is used for memory allocation at the end (callers setting the LIST_ALLOCATE flag), as well as for existence checks (callers of hq_entry_exists), so in some places the hash is used for more than cycle detection.  The LIST_ALLOCATE flag is not set for the list command, though, and there are no calls to hq_entry_exists, so it looks like the hq_* functions are only used for cycle detection there.  So far I have not found a reason that a new non-storage cycle detector would be a non-starter for the list command, and I cannot see any downside so far.

Comment 17 Dave Wysochanski 2018-06-28 09:52:54 UTC
Created attachment 1455222 [details]
RFC patch (under testing) with do_list() replacement using Brent's algorithm for cycle detection

Comment 18 Dave Wysochanski 2018-06-28 09:58:02 UTC
There is a lot more duplication in the patch in comment #17, but that is my understanding of the approach asked for in comment #3.  I am not sure it is correct, though.  It's also less clear what is changing there, and it's possible more can be cut out, or we could refactor at least some of the duplication (for example, the 'if (CRASHDEBUG(1)) {' part at the top).

Comment 19 Dave Anderson 2018-06-28 13:23:43 UTC
(In reply to Dave Wysochanski from comment #7)
> Hey Dave A - do you have a vmcore with a list loop in it?
> 

I don't.  But it would be simple to inject an error into the
readmem() function.  Generate a long list from any vmcore, and
dump the output into a file.  Then take an address near the
very beginning (other than the first entry), and find an
address near the end of the list.  Then at the top of
readmem() do something like:

   if ((addr == <address_near_end>) &&
        STREQ(pc->curcmd, "list") &&
        STREQ(type, "list entry")) {
            ulong *listptr;

            listptr = (ulong *)buffer;
            *listptr = <address_near_beginning>;
            return TRUE;
   }

Comment 20 Dave Wysochanski 2018-06-28 15:15:46 UTC
(In reply to Dave Anderson from comment #19)
> (In reply to Dave Wysochanski from comment #7)
> > Hey Dave A - do you have a vmcore with a list loop in it?
> > 
> 
> I don't.  But it would be simple to inject an error into the
> readmem() function.  

Great suggestion thanks Dave!

Comment 22 Dave Wysochanski 2018-06-28 22:01:27 UTC
One difference with this simplified non-hash-table Brent algorithm is that the patch from comment #14 won't give you the precise location of the first duplicate, but it will tell you there is a loop in the list.  Is that a showstopper?

Comment 23 Dave Wysochanski 2018-06-29 00:22:50 UTC
Ok, forget my question - there are other portions of these algorithms that give the start of the cycle - duh.  Let me see how complicated that is and whether it is worthwhile.

Comment 24 Dave Wysochanski 2018-06-29 01:09:22 UTC
Ok, so for Floyd: if a loop is detected, we basically have to reset the tortoise to the start, make the hare move at the same speed, and step them until they meet.  This may mean a worst case of traversing the list twice to find the start of the loop.  It is probably ok.

I will have to check Brent and see how much it improves on this.

Seems doable, so we should be able to remove the hash table from the list function with a bit more work.

Comment 25 Dave Anderson 2018-06-29 14:44:04 UTC
Just playing with your comment #17 Brent patch, and yeah, I see that the
"duplicate list entry" value is all over the place, such that it's more
confusing than helpful to show it.  In any case, it's pretty nifty how
it works -- although honestly I haven't really made sense of *how* it 
actually works.

Anyway, if we keep that algorithm in place, it would make more sense to 
make the call to the new function as an alternative "list" option
that specifically serves to check whether a given list loops around.

Comment 26 Dave Wysochanski 2018-06-29 15:52:05 UTC
(In reply to Dave Anderson from comment #25)
> Just playing with your comment #17 Brent patch, and yeah, I see that the
> "duplicate list entry" value is all over the place, such that it's more
> confusing than helpful to show it.  In any case, it's pretty nifty how
> it works -- although honestly I haven't really made sense of *how* it 
> actually works.
> 
> Anyway, if we keep that algorithm in place, it would make more sense to 
> make the call to the new function as an alternative "list" option
> that specifically serves to check whether a given list loops around.

Yeah, that patch from comment #17 will not fly.  I have a v2 patch I'm working on to identify the start of the loop - i.e., the first duplicate - and then stop.  I'm seeing how close to identical output I can get, but I'm not sure.  I think it will likely print a few more list entries along the loop before it stops.  Basically this is the tradeoff of not using a hash table but a mathematical algorithm instead.  I'm not sure if that is enough?  If it prints the same line as the hash table, i.e.
list: duplicate list entry: ffff8abdf3bb5d00

but the list output before that contains a few more items (some in the loop), would that be acceptable or not?  You get rid of the overhead of the hash table, but the output is slightly longer than before (you still know the exact location, as with the hash table).

The Brent algo is better than Floyd, as far as I can tell, in that you don't have to read from two points in the list (file) at the same time unless you actually have a loop.  With Brent, only the faster pointer actually advances through the list; the slower pointer stays put and teleports to the faster one's position each time the search interval doubles.  When the faster pointer eventually comes back around and hits the slower one, you know you're in a loop.  Once you are in the loop, with Brent you pay the penalty of reading two places in the file, but with Floyd you pay that penalty even when there is no loop, so I don't like it as much.

I would like to at least get to a stopping point and then make a call whether this is worthwhile or not.

Comment 27 Dave Wysochanski 2018-06-29 16:07:26 UTC
Here is a concrete example of what I have now.

crash-7.3.2 (with error inject):
Working directory /cores/retrace/tasks/795082132/misc.
crash-debug> list -H 0xffff8ac03c81fc28
ffff8abdf38b7d00
ffff8abdf38b7ac0
ffff8abdf38b7f40
ffff8abe0edb4b00
ffff8abdf3bb5d00
ffff8abdf3bb5b80
ffff8abdf3bb5640
ffff8abe0e92b100
ffff8abe0e92ad40
DEBUG brent - injecting error - next = ffff8abe0ead2200 points back to ffff8abdf3bb5d00
ffff8abdf3bb5d00

list: duplicate list entry: ffff8abdf3bb5d00
crash-debug> 


Example output of Brent v2:

new-crash-v2-debug> list -H 0xffff8ac03c81fc28
ffff8abdf38b7d00
ffff8abdf38b7ac0
ffff8abdf38b7f40
ffff8abe0edb4b00
ffff8abdf3bb5d00
ffff8abdf3bb5b80
ffff8abdf3bb5640
ffff8abe0e92b100
ffff8abe0e92ad40
DEBUG brent - injecting error - next = ffff8abe0ead2200 points back to ffff8abdf3bb5d00
ffff8abdf3bb5d00
ffff8abdf3bb5b80  <---------- NEW OUTPUT
DEBUG brent - injecting error - next = ffff8abe0ead2200 points back to ffff8abdf3bb5d00

list: duplicate list entry: ffff8abdf3bb5d00
new-crash-v2-debug>

Comment 28 Dave Wysochanski 2018-06-30 00:22:30 UTC
Ok, I did a Floyd part (b) and I got identical output now:

new-crash-floyd-b> list -H 0xffff8ac03c81fc28
ffff8abdf38b7d00
ffff8abdf38b7ac0
ffff8abdf38b7f40
ffff8abe0edb4b00
ffff8abdf3bb5d00
ffff8abdf3bb5b80
ffff8abdf3bb5640
ffff8abe0e92b100
ffff8abe0e92ad40
ffff8abdf3bb5d00

list: duplicate list entry: ffff8abdf3bb5d00
new-crash-floyd-b> 


I will need to test some more, and I think I figured out what was wrong with Brent.  I'd like to get a Brent part (b) in there - that would be awesome.

Comment 30 Dave Wysochanski 2018-07-01 11:32:47 UTC
Ok, I have Brent fully working now.  Here is the output I get from Brent:
new-crash-brent-v1-debug> list -H 0xffff8ac03c81fc28
ffff8abdf38b7d00
ffff8abdf38b7ac0
ffff8abdf38b7f40
ffff8abe0edb4b00
ffff8abdf3bb5d00
ffff8abdf3bb5b80
ffff8abdf3bb5640
ffff8abe0e92b100
ffff8abe0e92ad40
ffff8abdf3bb5d00
ffff8abdf3bb5b80
ffff8abdf3bb5640
list: loop detected, loop length: 5
list: length from start to loop: 4
list: duplicate list entry: ffff8abdf3bb5d00
new-crash-brent-v1-debug>

While different, I think this may actually be superior to the current output.  The reason is that you see part of the repeat, and you get the loop length as well as the distance from the start of the list to the loop.

Brent adds almost no overhead in the plain list-traversal case.  If there is a loop, it adds the overhead of having to re-traverse the list (similar to Floyd), but I think this is a good tradeoff, since the cost of a list advance is fixed, whereas the hash table's cost keeps growing and is incurred even when no loop is present.

I am working on a really good patch header to explain how it works (it's not as difficult and obtuse as it may seem), and can clean up the code to be exactly a duplicate function to do_list() but with this different algorithm.  I can also add whatever documentation is necessary.  I still need a better test plan and error injection but I'll work on that soon.

Comment 33 Dave Wysochanski 2018-07-02 01:24:33 UTC
I don't think with either Floyd or Brent we will be able to guarantee identical output to existing crash with hash on in the loop case.  I'm not 100% sure it's impossible with Floyd but I'm certain it's impossible with Brent.

So if identical output to existing crash with hash on is a requirement, I'm not sure either algorithm will work.

Comment 34 Dave Anderson 2018-07-02 13:15:54 UTC
(In reply to Dave Wysochanski from comment #33)
> I don't think with either Floyd or Brent we will be able to guarantee
> identical output to existing crash with hash on in the loop case.  I'm not
> 100% sure it's impossible with Floyd but I'm certain it's impossible with
> Brent.
> 
> So if identical output to existing crash with hash on is a requirement, I'm
> not sure either algorithm will work.

The output in comment #30 seems OK/harmless/useful, although it may argue 
for a unique "list" command option whose sole purpose is to check for
loops.

Comment 35 Dave Wysochanski 2018-07-10 22:25:53 UTC
Posted the Brent implementation to upstream crash-utility list with a '-B' option for the list command, retaining the existing list function untouched.

Comment 37 Dave Anderson 2018-07-11 20:34:02 UTC
Patch applied upstream:

https://github.com/crash-utility/crash/commit/6596f1121b89162f96d1e1825c2905b83b59bec1

  The existing "list" command uses a hash table to detect duplicate
  items as it traverses the list.  The hash table approach has worked
  well for many years.  However, with increasing memory sizes and list
  sizes, the overhead of the hash table can be substantial, often
  leading to commands running for a very long time.  For large lists,
  we have found that the existing hash based approach may slow the
  system to a crawl and possibly never complete.  You can turn off
  the hash with "set hash off" but then there is no loop detection; in
  that case, loop detection must be done manually after dumping the
  list to disk or some other method.  This patch is an implementation
  of the cycle detection algorithm from R. P. Brent as an alternative
  algorithm for the "list" command.  The algorithm both avoids the
  overhead of the hash table and yet is able to detect a loop.  In
  addition, further loop characteristics are printed, such as the
  distance to the start of the loop as well as the loop length.
  An excellent description of the algorithm can be found here on
  the crash-utility mailing list:

  https://www.redhat.com/archives/crash-utility/2018-July/msg00019.html

  A new "list -B" option has been added to the "list" command to
  invoke this new algorithm rather than using the hash table.  In
  addition to low memory usage, the output of the list command is
  slightly different when a loop is detected.  In addition to printing
  the first duplicate entry, the length of the loop and the distance
  to the loop are output.
  (dwysocha)

Comment 52 errata-xmlrpc 2019-08-06 12:41:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2071

