Bug 461706 - PACKET socket 'read()' consumes lots of cpu during heavy write activity
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Thomas Graf
Red Hat Kernel QE team
Depends On:
Reported: 2008-09-09 21:16 EDT by starlight
Modified: 2014-06-18 04:29 EDT (History)
3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2012-07-02 09:56:28 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description starlight 2008-09-09 21:16:16 EDT
Description of problem:

While running a network stress test at 540k pps with a PACKET socket 
application, observed that a thread waiting in either 'read()' or 
'poll()' [mmap'ed mode] and receiving absolutely zero packets 
consumes 30% again as much CPU as the thread writing data to the 
same socket, and the writer thread consumes about 10-15% more CPU 
than it does when no thread is waiting on the socket.

Looks like a lock contention issue where the pended reader thread is 
run after a lock release.  Inefficient, of course, and it was the 
cause of data loss in the socket write-side queue.  Eliminating the 
reader removed the data loss and lowered overall CPU consumption.

Suspect this may be a fundamental issue with all sockets, but
did not delve into the possibility.

Version-Release number of selected component (if applicable):

kernel 2.6.18-92.1.10.el5

How reproducible:

Write 280,000 pps on each of two separate packet sockets that
have a reader thread blocked on a 'read()' or 'poll()'.
Each socket is bound to a different NIC.

Observe CPU consumption in 'top' with threads view active.

Note:  Writer is bound to one quad-core CPU, as performance
is much worse if it floats across both CPUs.  IRQs are also
bound to CPUs on the same node.
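The binding and observation steps above might look like this; the writer binary name and the IRQ number are placeholders, not values from the original report:

```shell
# Pin the writer to the four cores of one physical CPU (cores 0-3 here).
taskset -c 0-3 ./writer &

# Find the NIC IRQ numbers, then bind each to a core on the same node
# by writing a CPU mask (0x0f = cores 0-3).  IRQ 16 is an example only.
grep eth /proc/interrupts
echo 0f > /proc/irq/16/smp_affinity

# Watch per-thread CPU use; 'top -H' shows the threads view directly.
top -H
```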

Actual results:

Idle reader thread waiting on the socket consumes
up to 25% of a core, and the writer thread consumes
10-25% more CPU than it does when no reader is waiting.

Expected results:

Idle reader thread waiting on socket should consume
zero CPU and writer thread should consume less CPU
and be less prone to dropping data.
Comment 3 Thomas Graf 2012-05-15 05:42:11 EDT
Can you still reproduce this issue on RHEL5.8?
Comment 4 starlight 2012-05-15 09:40:08 EDT
Probably.  Before I spend/waste two or three
hours firing up the test configuration, are
you actually likely to fix this?  It's only
been four short years since this issue was
opened, and RHEL 6 has arrived in the interim.
Correcting this issue would probably require
big changes to the socket logic, and I've
seen RH mark much simpler issues WONTFIX
due to an aversion to modifying mature
kernels that aren't seeing significant

On the other hand we're finding the newer
kernel performance is exceedingly bad,
so it would be nice to have this one
last a few more years.

If you're just looking for an excuse to
close this issue, tell me now and don't
waste my time.
Comment 5 Thomas Graf 2012-05-15 11:52:16 EDT
(In reply to comment #4)
> If you're just looking for an excuse to
> close this issue, tell me now and don't
> waste my time.

I don't need an excuse to close this. If I wanted to close it, I would have done so.

I'm considering fixing this for 5.9 if it is doable without major changes.

Note that this bug report is not backed by a support case, which means it will receive much lower priority than others. You also never provided a reproducer for this problem.
Comment 6 starlight 2012-05-15 12:00:51 EDT
Ok, I'll re-test sometime in the next ten days.

Unless significant restructuring of the socket logic has happened since the original report, it's probable that the issue still exists.

This is not a big issue for us--we reported it because it seems the sort of thing that ought to be fixed for the general performance benefit, even under less intense loading scenarios.
