Description of problem:
The customer has more than one configuration file, each pointing to a *different vCenter*. virt-who is able to reach all of them and collect the info, BUT the way it prepares the data and uploads it to Satellite differs from what is expected (see below), *causing an error on the Satellite side*.

Version-Release number of selected component (if applicable):
virt-who-0.21.7-1.el7_5.noarch
Satellite 6.3.x

How reproducible:
Only in the customer environment. I'm not able to reproduce it locally.

Steps to Reproduce:
1. Configure multiple vCenters
2. Execute virt-who -o
3. Check the output

Actual results:
We see one *Sending updated Host-to-guest mapping to* line for each *Hosts-to-guests mapping for config* line. For a customer with 4 config files, we see:
---
Hosts-to-guests mapping for config conf_file_1
Sending updated Host-to-guest mapping to "ORG"
Hosts-to-guests mapping for config conf_file_2
Sending updated Host-to-guest mapping to "ORG"
Hosts-to-guests mapping for config conf_file_3
Sending updated Host-to-guest mapping to "ORG"
Hosts-to-guests mapping for config conf_file_4
Sending updated Host-to-guest mapping to "ORG"
---

Expected results:
Just one *Sending updated Host-to-guest mapping to* line, as below:
---
Hosts-to-guests mapping for config conf_file_1
Hosts-to-guests mapping for config conf_file_2
Hosts-to-guests mapping for config conf_file_3
Hosts-to-guests mapping for config conf_file_4
Sending updated Host-to-guest mapping to "ORG"
---

Additional info:
More detail below.

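The difference between the actual and expected results above can be checked mechanically on captured virt-who -o output. The following is a minimal sketch, not part of virt-who: the function name count_mapping_events is hypothetical, and it assumes the two message strings appear in the output exactly as quoted in this report (possibly preceded by a timestamp or log prefix).

```python
def count_mapping_events(output):
    """Count per-config mapping lines and upload lines in virt-who -o output.

    Returns a (configs, uploads) tuple. The matched strings are taken
    from the report; any prefix before them on the line is ignored.
    """
    configs = 0
    uploads = 0
    for line in output.splitlines():
        if "Hosts-to-guests mapping for config" in line:
            configs += 1
        elif "Sending updated Host-to-guest mapping to" in line:
            uploads += 1
    return configs, uploads


def uploads_once(output):
    """True when all configs are followed by a single upload (expected case)."""
    configs, uploads = count_mapping_events(output)
    return configs > 0 and uploads == 1
```

With the output quoted under "Actual results", this would report four mapping lines and four uploads; with the output under "Expected results", four mapping lines and one upload.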
Waldirio,

I see one error in the candlepin log file:

ERROR org.hornetq.core.server - HQ224016: Caught exception java.lang.IllegalStateException: Can't write records bigger than the bufferSize(102400) on the journal

I would recommend setting the candlepin.audit.hornetq.large_msg_size property in the /etc/candlepin/candlepin.conf file to something larger than 102400 to see if it makes a difference. Please try setting that value to 1024000 and see if that fixes the problem.

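To confirm which value is actually in effect in the config file, something like the following could be run against the contents of /etc/candlepin/candlepin.conf. This is only a sketch: the helper names are hypothetical, and it assumes the file uses plain key=value lines with # marking comments. The 102400 threshold is the bufferSize quoted in the HQ224016 message.

```python
PROPERTY = "candlepin.audit.hornetq.large_msg_size"
JOURNAL_BUFFER_SIZE = 102400  # bufferSize reported in the HQ224016 error


def read_large_msg_size(conf_text):
    """Return the configured large_msg_size as an int, or None if unset.

    Assumes key=value lines; blank lines and lines starting with '#'
    (including a commented-out property) are ignored. If the property
    appears more than once, the last occurrence wins.
    """
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == PROPERTY:
            value = int(val.strip())
    return value


def exceeds_journal_buffer(conf_text):
    """True when the property is set to something larger than 102400."""
    value = read_large_msg_size(conf_text)
    return value is not None and value > JOURNAL_BUFFER_SIZE
```

For example, passing the text of a file containing the line candlepin.audit.hornetq.large_msg_size=1024000 to exceeds_journal_buffer would return True, while a file with that line commented out would return False.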
Waldirio, if setting the larger value does not fix the problem, please scan the candlepin log file for new errors. If you are no longer seeing the HQ224016 error (or something else that stands out) in the candlepin log, then the next place to look would be the foreman log.
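Scanning the candlepin log for the error code could be done along these lines. This is a small illustrative sketch with a hypothetical helper name; it only looks for the HQ224016 code quoted earlier in this thread and makes no assumption about the rest of the log line format.

```python
def find_error_lines(log_text, error_code="HQ224016"):
    """Return (line_number, line) pairs for log lines containing error_code.

    Line numbers are 1-based so they match what an editor or grep -n shows.
    """
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if error_code in line
    ]
```

An empty result after the config change would suggest that the HQ224016 error is gone and that the foreman log is the next place to look.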
On my reproducer, increasing large_msg_size to 1024000 as per c#4 resolves the failed upload issue.

I'll be attaching log files and virt-who -o output files (in a tarball) following this scheme:

* Running virt-who on the capsule, identified as $CAPSULE in the logs
* Two hypervisors, $HYPERVISOR01 and $HYPERVISOR02
* Two virt-who config files, virt-who-config-1.conf and virt-who-config-2.conf, pointing to the respective hypervisors.

Run virt-who -o -c virt-who-config-1.conf on capsule, capture command output on capsule command line and production.log on satellite.
* Files:
  - virt-who-hypervisor01.log
  - virt-who-1-output.out

Run virt-who -o -c virt-who-config-2.conf on capsule, capture command output on capsule command line and production.log on satellite.
* Files:
  - virt-who-hypervisor02.log
  - virt-who-2-output.out

Run virt-who -o on capsule, capture output, and production.log on satellite with a backtrace:
* Files:
  - virt-who-all-together-07.log
  - virt-who-all-together-output-07.out

Add line to /etc/candlepin/candlepin.conf:
candlepin.audit.hornetq.large_msg_size=1024000

Restart tomcat service, run virt-who -o a few times on capsule (see why below). Then run virt-who -o on capsule, capture command output, and production.log with no backtrace.
* Files:
  - virt-who-all-together.log
  - virt-who-all-together-output.out

## NOTE: not every run of virt-who was failing before modifying the config file; only between 20% and 50% of runs were failing.

## NOTE 2: after modifying the candlepin config file to increase or decrease large_msg_size, the first run of virt-who goes as if no modification was made. Example:
1. Candlepin is running with increased large_msg_size=1024000. No tracebacks in the logs after 10 runs of virt-who -o.
2. Reduce large_msg_size by commenting out the line in the candlepin conf file.
3. Restart tomcat.
4. Run virt-who -o, no traceback appears.
5. Run virt-who -o again 30 seconds after the last one finished, get a traceback.
6. Further runs of virt-who -o will produce a traceback between 20% and 50% of the time.

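Since the failure is intermittent, repeated runs are needed to estimate how often it occurs. A rough sketch of such a loop follows. It is not what was used for the numbers above: all function names are hypothetical, and saw_traceback is a caller-supplied check (for example, one that inspects production.log on the satellite after each run), because virt-who -o run on the capsule may not itself report the failure.

```python
import subprocess
import time


def run_virt_who_once():
    """Run virt-who in one-shot mode, as done throughout this report."""
    subprocess.run(["virt-who", "-o"], check=False)


def run_repeatedly(runs, delay_seconds, saw_traceback, run_once=run_virt_who_once):
    """Run virt-who `runs` times and record whether each run failed.

    saw_traceback is a hypothetical callable that returns True when the
    most recent run produced a traceback; how it detects that is up to
    the caller.
    """
    results = []
    for _ in range(runs):
        run_once()
        results.append(bool(saw_traceback()))
        time.sleep(delay_seconds)
    return results


def failure_rate(results):
    """Fraction of runs that produced a traceback (0.0 for no runs)."""
    return sum(results) / len(results) if results else 0.0
```

Given NOTE 2, the first run after a tomcat restart should probably be excluded from the tally, since it behaves as if the config had not changed.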
Created attachment 1470037
virt-who outputs and snippets from production.log

As described in my latest comment.

Created attachment 1503629
trace from 6.5 snap 1

This BZ has become far too unwieldy. Customer issues are being attached simply because there is a problem with virt-who, not because they match the specific scenario described here. This will not help get the issues resolved; just the opposite. Individual problems are likely to be ignored.

If the issue is about a db error in candlepin because the IN clause is too large, you should use:
https://bugzilla.redhat.com/show_bug.cgi?id=1599752

If the issue is that you are using multiple users to do hypervisor updates for one org, instead of a single one, you should use:
https://bugzilla.redhat.com/show_bug.cgi?id=1667545

If the issue is that you are trying to submit a hypervisor in the report that does not have a value for the field you have chosen as the hostname, then you should use:
https://bugzilla.redhat.com/show_bug.cgi?id=1667522