Bug 1757026 - unable to remove duplicate guest devices due to memory
Summary: unable to remove duplicate guest devices due to memory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Providers
Version: 5.10.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.10.13
Assignee: Adam Grare
QA Contact: Jaroslav Henner
Docs Contact: Red Hat CloudForms Documentation
URL:
Whiteboard:
Depends On: 1746600
Blocks:
Reported: 2019-09-30 12:24 UTC by Satoe Imaishi
Modified: 2019-12-03 06:55 UTC

Fixed In Version: 5.10.13.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1746600
Environment:
Last Closed: 2019-12-03 06:55:43 UTC
Category: ---
Cloudforms Team: RHEVM
Target Upstream Version:


Links
System                  ID              Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2019:4047  None      None    None     2019-12-03 06:55:50 UTC

Comment 3 CFME Bot 2019-10-04 17:10:32 UTC
New commit detected on ManageIQ/manageiq/hammer:

https://github.com/ManageIQ/manageiq/commit/1b32824661efb0835714c0816d6fbb6b4635dea5
commit 1b32824661efb0835714c0816d6fbb6b4635dea5
Author:     Gregg Tanzillo <gtanzill@redhat.com>
AuthorDate: Thu Aug 29 07:52:27 2019 -0400
Commit:     Gregg Tanzillo <gtanzill@redhat.com>
CommitDate: Thu Aug 29 07:52:27 2019 -0400

    Merge pull request #19219 from agrare/add_tool_to_cleanup_duplicate_host_guest_devices

    Add a tool to cleanup duplicate host guest_devices

    (cherry picked from commit 775ae0231932b28b637a1861e76019c44c3af640)

    https://bugzilla.redhat.com/show_bug.cgi?id=1757026

 tools/cleanup_duplicate_host_guest_devices.rb | 50 +
 1 file changed, 50 insertions(+)

Comment 4 Jaroslav Henner 2019-10-11 13:32:23 UTC
From the BZ#1746600
(In reply to Tuan from comment #4)
> Here is a reproducer with the customers DB: 10.10.181.72  admin/smartvm
> 
> 
> Customer db available here:
> http://file.rdu.redhat.com/tuado/logs/telconet02458379/
> 
> 
> Tuan

# file vmdb_production.dump 
vmdb_production.dump: data

[root@host-192-168-200-6 ~]# strings vmdb_production.dump
a$Y._
Y2*f
Vl!T
N'L7
(nm.
eP0!)Z
kr)	c


# hexdump -C vmdb_production.dump | head
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
0c8f8000  1b 00 00 00 2b 02 00 00  78 9c 9d 94 5b 4f e2 50  |....+...x...[O.P|

This shows there is a huge block of zeros at the start of the file, so it doesn't look like a DB dump.
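
The same check can be scripted. Below is a minimal Ruby sketch, not part of the tool or of the original analysis, that counts how many NUL bytes a file starts with. The method name `leading_zero_bytes` and the chunk size are assumptions made for illustration. In the hexdump above, the first non-zero data appears at offset 0x0c8f8000, which is roughly 201 MiB into the file.

```ruby
# Count the NUL bytes at the start of a file, reading it in chunks so that
# a large dump does not have to fit in memory.
# NOTE: leading_zero_bytes is a hypothetical helper, not part of ManageIQ.
def leading_zero_bytes(path, chunk_size: 1 << 20)
  zeros = ("\0" * chunk_size).b # binary string of NULs used for fast comparison
  count = 0
  File.open(path, "rb") do |io|
    while (chunk = io.read(chunk_size))
      if chunk == zeros.byteslice(0, chunk.bytesize)
        count += chunk.bytesize # whole chunk is zeros, keep going
      else
        # first chunk containing data: add the offset of its first non-zero byte
        return count + chunk.each_byte.find_index { |b| b != 0 }
      end
    end
  end
  count # the file is empty or consists only of zeros
end

if __FILE__ == $PROGRAM_NAME && ARGV.any?
  puts "#{leading_zero_bytes(ARGV[0])} leading zero bytes"
end
```

A result in the hundreds of megabytes, as here, suggests the file was truncated or corrupted in transfer rather than being a usable dump.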



Tuan, can you please help? Thanks.

Comment 5 Jaroslav Henner 2019-10-14 17:15:56 UTC
The tool doesn't seem to help much. It seems to remove roughly one device per half minute.

After starting the tool:
[root@dhcp-8-198-90 vmdb]# ./tools/cleanup_duplicate_host_guest_devices.rb -e 1000000000041 --no-dry-run
Found 1095218 duplicate Guest Devices...
**** THIS WILL MODIFY YOUR DATABASE ****
     Press Enter to Continue: 

There is no feedback that it is doing anything.

With the DB Tuan provided, after an hour of running time:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                              
32968 root      20   0 9645376   8.9g   7288 R 100.0 76.9  66:43.05 ruby  

It consumed 76.9% of memory (and is still growing slowly), and yes, this process is indeed the tool:
sed 's/\x0/ /g' < /proc/32968/cmdline 
ruby ./tools/cleanup_duplicate_host_guest_devices.rb -e 1000000000041


I have this many GuestDevices:
GuestDevice.count
PostgreSQLAdapter#log_after_checkout, connection_pool: size: 5, connections: 1, in use: 1, waiting_in_queue: 0
=> 1621372
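
To put the observed rate in perspective, here is a small back-of-the-envelope sketch. It assumes the rough rate of one device per 30 seconds stays constant and uses the 1095218 duplicates the tool reported; the helper name `estimated_days` is made up for illustration.

```ruby
SECONDS_PER_DAY = 86_400.0

# Estimate the wall-clock days needed to process record_count records
# at a constant cost of seconds_per_record each (an assumption).
def estimated_days(record_count, seconds_per_record)
  record_count * seconds_per_record / SECONDS_PER_DAY
end

puts format("%.1f days", estimated_days(1_095_218, 30)) # prints "380.3 days"
```

At that rate the cleanup would take more than a year, so the one-by-one approach is not practical for a database of this size.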


Also, I think it needs slightly better documentation:

1) It was not clear to me what to fill in for the ems-name or ems-id (ExtManagementSystem). I had to read the source and check the DB to figure out that it is the name/id of the provider.

2) The command-line --help is not clear about what the default is: --dry-run or --no-dry-run?

  -d, --dry-run, --no-dry-run    Just print out what would be done without modifying anything         
                                 (default: true)

3) I think the customer should be instructed to run this tool on the providers they have added. Perhaps in the documentation/errata?
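
As an illustration of how help text for such a negatable flag could state its default unambiguously, here is a sketch using Ruby's standard-library OptionParser. This is not the tool's actual option handling, which is not shown in this report; the `parse_options` helper, the banner, and the description strings are assumptions.

```ruby
require "optparse"

# Parse command-line arguments for a hypothetical cleanup script.
# Dry-run is the default; only an explicit --no-dry-run enables changes.
def parse_options(argv)
  opts = { :dry_run => true }
  OptionParser.new do |parser|
    parser.banner = "Usage: cleanup_sketch.rb [options]"
    parser.on("--ems-id ID", Integer,
              "ID of the provider (ExtManagementSystem) to clean up") do |id|
      opts[:ems_id] = id
    end
    parser.on("--[no-]dry-run",
              "Only print what would be deleted (this is the default);",
              "pass --no-dry-run to actually modify the database") do |flag|
      opts[:dry_run] = flag
    end
  end.parse!(argv.dup) # dup so the caller's array is left untouched
  opts
end

p parse_options(["--ems-id", "1000000000041"])
p parse_options(["--ems-id", "1000000000041", "--no-dry-run"])
```

Spelling out both the default and the flag that overrides it in the description avoids the ambiguity of a bare "(default: true)" next to two opposite flags.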


I am keeping the VMs for further investigation.

Comment 6 Jaroslav Henner 2019-10-14 17:32:18 UTC
I modified the script:

[root@dhcp-8-198-90 vmdb]# diff ./tools/cleanup_duplicate_host_guest_devices.rb  ./tools/cleanup_duplicate_host_guest_devices.moje.rb
48c48,51
< GuestDevice.destroy(guest_devices_to_delete) unless opts[:dry_run]
---
> for d in guest_devices_to_delete do
>   puts "Destroying #{d}"
>   GuestDevice.destroy(d)
> end


And after 5 minutes of working I see 
Found 1095132 duplicate Guest Devices...
**** THIS WILL MODIFY YOUR DATABASE ****
     Press Enter to Continue: 

Destroying 1000000037020
Destroying 1000000037393
Destroying 1000000037802
Destroying 1000000038119
Destroying 1000000038582
Destroying 1000000038979
Destroying 1000000039306

And from what I have read, it doesn't seem that GuestDevice.destroy(some_array) should do anything much different (I mean, I don't think it really does any batch destroy or anything more optimal than a one-by-one destroy).
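
One common alternative to destroying records one at a time is to process the ids in fixed-size slices and report progress per slice. The following is a plain-Ruby sketch of that pattern only. It is not the tool's code; the helper name `each_slice_with_progress` and the slice size of 100 are assumptions, and the block stands in for whatever per-slice delete would be used.

```ruby
# Yield ids in slices of slice_size, printing a progress line for each one.
# Returns the total number of slices.
# NOTE: hypothetical helper for illustration, not part of ManageIQ.
def each_slice_with_progress(ids, slice_size: 100, out: $stdout)
  total = (ids.size.to_f / slice_size).ceil
  ids.each_slice(slice_size).with_index(1) do |slice, number|
    out.puts "Destroying slice #{number} of #{total}..."
    yield slice
  end
  total
end

deleted = []
# In a Rails console the block body could be something like
# GuestDevice.where(:id => slice).delete_all, which issues one DELETE per
# slice but skips callbacks and dependent-association cleanup.
each_slice_with_progress((1..250).to_a) { |slice| deleted.concat(slice) }
puts "Processed #{deleted.size} ids"
```

For the 1095218 duplicates reported in comment 5, a slice size of 100 would give 10953 slices. Because `delete_all` bypasses callbacks, any dependent rows would have to be removed explicitly.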

Comment 11 CFME Bot 2019-10-18 13:36:36 UTC
New commit detected on ManageIQ/manageiq/hammer:

https://github.com/ManageIQ/manageiq/commit/3174a3e8a522070f145c3c5e058b98c542b744e4
commit 3174a3e8a522070f145c3c5e058b98c542b744e4
Author:     Keenan Brock <keenan@thebrocks.net>
AuthorDate: Tue Oct 15 11:24:28 2019 -0400
Commit:     Keenan Brock <keenan@thebrocks.net>
CommitDate: Tue Oct 15 11:24:28 2019 -0400

    Merge pull request #19235 from agrare/turbo_button_for_guest_device_cleanup

    Make destroying guest_devices faster

    (cherry picked from commit 236cbdf91adad47c9e6ddd07f62848f482996098)

    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1757026

 tools/cleanup_duplicate_host_guest_devices.rb | 25 +-
 1 file changed, 24 insertions(+), 1 deletion(-)

Comment 12 Jaroslav Henner 2019-10-30 09:49:33 UTC
I got an exception.

[root@dhcp-8-197-201 vmdb]# ./tools/cleanup_duplicate_host_guest_devices.rb -e 1000000000041 --no-dry-run
Found 1095218 duplicate Guest Devices...
**** THIS WILL MODIFY YOUR DATABASE ****
     Press Enter to Continue: 

Destroying slice 1 of 10953...
/opt/rh/cfme-gemset/gems/activerecord-5.0.7.2/lib/active_record/reflection.rb:173:in `join_keys': wrong number of arguments (given 0, expected 1) (ArgumentError)
	from ./tools/cleanup_duplicate_host_guest_devices.rb:59:in `block (2 levels) in <main>'
	from ./tools/cleanup_duplicate_host_guest_devices.rb:57:in `each'
	from ./tools/cleanup_duplicate_host_guest_devices.rb:57:in `block in <main>'
	from ./tools/cleanup_duplicate_host_guest_devices.rb:53:in `each'
	from ./tools/cleanup_duplicate_host_guest_devices.rb:53:in `each_slice'
	from ./tools/cleanup_duplicate_host_guest_devices.rb:53:in `with_index'
	from ./tools/cleanup_duplicate_host_guest_devices.rb:53:in `<main>'

Comment 13 Adam Grare 2019-10-30 14:58:27 UTC
Looks like activerecord changed their API between 5.0 (used in cfme 5.10) and 5.1 (used in cfme 5.11+).

Comment 15 CFME Bot 2019-11-11 14:15:56 UTC
New commit detected on ManageIQ/manageiq/hammer:

https://github.com/ManageIQ/manageiq/commit/3062fcaecccb3f01474ed9be43f4e082fbb6338a
commit 3062fcaecccb3f01474ed9be43f4e082fbb6338a
Author:     Adam Grare <agrare@redhat.com>
AuthorDate: Wed Oct 30 11:03:36 2019 -0400
Commit:     Adam Grare <agrare@redhat.com>
CommitDate: Wed Oct 30 11:03:36 2019 -0400

    In active_record 5.0 join_keys takes a klass arg

    In active_record 5.1 join_keys takes no arguments [0] but in 5.0 it
    takes a required klass argument [1].

    [0] https://apidock.com/rails/v5.1.7/ActiveRecord/Reflection/AbstractReflection/join_keys
    [1] https://apidock.com/rails/ActiveRecord/Reflection/BelongsToReflection/join_keys

    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1757026

 tools/cleanup_duplicate_host_guest_devices.rb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
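
When a script has to run against both signatures, one option is to check the method's arity at runtime. The sketch below demonstrates the idea with stand-in classes rather than real ActiveRecord reflections; the class names, the `JoinKeys` struct values, and the `join_keys_for` helper are all hypothetical, and this is not necessarily how the commit above fixed the tool.

```ruby
JoinKeys = Struct.new(:key, :foreign_key)

# Stand-in mimicking the 5.0-style signature: a required klass argument.
class ReflectionWithKlassArg
  def join_keys(_klass)
    JoinKeys.new("id", "hardware_id")
  end
end

# Stand-in mimicking the 5.1-style signature: no arguments.
class ReflectionWithoutArg
  def join_keys
    JoinKeys.new("id", "hardware_id")
  end
end

# Call join_keys with or without klass depending on what the method accepts.
def join_keys_for(reflection, klass)
  if reflection.method(:join_keys).arity.zero?
    reflection.join_keys
  else
    reflection.join_keys(klass)
  end
end

p join_keys_for(ReflectionWithKlassArg.new, Object).foreign_key
p join_keys_for(ReflectionWithoutArg.new, Object).foreign_key
```

Since the hammer branch only has to support one ActiveRecord version, simply passing the argument there is the smaller change; an arity check is mainly useful for code shared across branches.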

Comment 16 Jaroslav Henner 2019-11-25 15:02:58 UTC
I restored the DB from Tuan (sha256sum faa449dab7f6e60d28259de57b1902d94f12971bae84509745bdcc22b0bf0141). I removed the devices from both providers, used fixauth to reset passwords, set the providers' zone to the default one, and pointed the providers to our system; the providers then got refreshed.

Comment 18 errata-xmlrpc 2019-12-03 06:55:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:4047

