Use this portal to open public enhancement requests against IBM Power Systems products, including IBM i. To view all of your ideas submitted to IBM, create and manage groups of Ideas, or create an idea explicitly set to be either visible by all (public) or visible only to you and IBM (private), use the IBM Unified Ideas Portal (https://ideas.ibm.com).
We invite you to shape the future of IBM, including product roadmaps, by submitting ideas that matter to you the most. Here's how it works:
Start by searching and reviewing ideas and requests to enhance a product or service. Take a look at ideas others have posted, and add a comment, vote, or subscribe to updates on them if they matter to you. If you can't find what you are looking for,
Post an idea.
Get feedback from the IBM team and other customers to refine your idea.
Follow the idea through the IBM Ideas process.
Welcome to the IBM Ideas Portal (https://www.ibm.com/ideas) - Use this site to find out additional information and details about the IBM Ideas process and statuses.
IBM Unified Ideas Portal (https://ideas.ibm.com) - Use this site to view all of your ideas, create new ideas for any IBM product, or search for ideas across all of IBM.
ideasibm@us.ibm.com - Use this email to suggest enhancements to the Ideas process or request help from IBM for submitting your Ideas.
Available in the December 2021 GA of AIX 7.3
Hello,
One more aspect should be considered. If poll() is monitoring several FDs that have POLLEXCL set and events are pending on several of those FDs at the same time, then poll() should return only one event per call. Successive poll() calls should return events from the ready FDs by selecting a single FD in round-robin fashion.
This logic is required to preserve the load balancing provided by poll() when several processes/threads are polling, so that jobs are split between them evenly; otherwise one process would grab all the events while the others process none.
If IBM considers the earlier "POLLEXCL" semantics, where several FDs can be returned at once, to be suitable, then please add a new flag such as "POLLONE" to activate one-event-per-call delivery in round-robin fashion.
From the Mavimax perspective, for our middleware to work effectively (and this is the whole purpose of this change request), we need that when several exclusive FDs are monitored, only one event is returned per poll() call.
So two major changes:
- POLLEXCL: an FD event wakes only one thread's/process's poll() call (when several are polling concurrently).
- POLLONE: the local process receives only one event per call from the several monitored FDs, in round-robin fashion, so that the other FDs can trigger other threads'/processes' poll() calls when there are concurrent events.
- From the Mavimax perspective, this "POLLONE" behavior could also be built into the POLLEXCL flag logic automatically.
Illustrated here:
--------------------------------------------------------------------
struct pollfd fds[3];
int ret;
fds[0].fd = fd1;
fds[0].events = POLLIN | POLLEXCL | POLLONE;
fds[1].fd = fd2;
fds[1].events = POLLIN | POLLEXCL | POLLONE;
fds[2].fd = fd3;
fds[2].events = POLLIN | POLLEXCL | POLLONE;
/* Assume that fd1, fd2 and fd3 each have pending POLLIN events. */
/* Then the 1st call to: */
poll(fds, 3, TIMEOUT * 1000);
/* shall return 1, with fds[0].revents set to POLLIN and all other revents 0 */
/* The 2nd call to: */
poll(fds, 3, TIMEOUT * 1000);
/* shall return 1, with fds[1].revents set to POLLIN and all other revents 0 */
/* The 3rd call to: */
poll(fds, 3, TIMEOUT * 1000);
/* shall return 1, with fds[2].revents set to POLLIN and all other revents 0 */
/* The 4th call wraps around: */
poll(fds, 3, TIMEOUT * 1000);
/* shall return 1, with fds[0].revents set to POLLIN and all other revents 0 */
--------------------------------------------------------------------
Attachment (Description)
Hello,
- This affects poll(), due to the fact that it supports System V message queue polling; pollset(), according to the documentation, does not support it.
- But we did some testing with pollset() over unnamed pipes, with threads that each monitor the pipe via their own new pollset, and we get the same thundering-herd issue. This scenario with several pollsets (rather than one) is a must-have for our middleware, as we basically run several executables on shared resources, so a pollset object cannot be shared.
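For reference, the pollset() calls used in that kind of test follow this general shape (a minimal sketch based on the documented AIX pollset API; AIX-only, error handling omitted, and the helper name is ours, not taken from the attached pollset2.c):

```c
/* Minimal pollset() usage sketch (AIX-only): create a pollset,
 * register an fd for POLLIN, and block until it becomes ready. */
#include <sys/poll.h>
#include <sys/pollset.h>

int wait_on_fd(int read_fd)
{
    pollset_t ps = pollset_create(-1);         /* -1: no fixed fd limit */

    struct poll_ctl ctl;
    ctl.cmd = PS_ADD;                          /* add the fd to the pollset */
    ctl.events = POLLIN;
    ctl.fd = read_fd;
    pollset_ctl(ps, &ctl, 1);

    struct pollfd results[1];
    int n = pollset_poll(ps, results, 1, -1);  /* -1: block until ready */

    pollset_destroy(ps);
    return n;                                  /* number of ready fds */
}
```

When each thread creates its own pollset over the same shared fd, every event wakes all of those pollsets, which is exactly the thundering-herd behavior shown in the test runs below.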
Ideally, IBM could:
1. Add a flag such as POLLEXCL to poll(), for a single wakeup when an event appears on a shared resource (socket/pipe/msgqueue).
2. Add the same flag to pollset().
3. Add support for msgqueue monitoring via the pollset() API.
For us, change 1 is critical; if IBM could also do changes 2 and 3, that would be even better.
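For comparison, Linux already offers this kind of single-wakeup semantics on shared fds via the epoll EPOLLEXCLUSIVE flag (kernel 4.5+); the requested POLLEXCL flag could behave analogously. A minimal sketch (the helper name is ours):

```c
/* Sketch: register a pipe read end with EPOLLEXCLUSIVE so that, when
 * several threads or processes wait on the same registered fd, the
 * kernel wakes only one waiter per event instead of all of them. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

static int wait_one_event(void)
{
    int fds[2];
    if (pipe(fds) != 0) { perror("pipe"); exit(1); }

    int epfd = epoll_create1(0);
    struct epoll_event ev;
    ev.events = EPOLLIN | EPOLLEXCLUSIVE;  /* exclusive wakeup, Linux >= 4.5 */
    ev.data.fd = fds[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) != 0) {
        perror("epoll_ctl");
        exit(1);
    }

    if (write(fds[1], "x", 1) != 1) {      /* make the fd readable */
        perror("write");
        exit(1);
    }

    struct epoll_event out;
    int n = epoll_wait(epfd, &out, 1, 1000);
    close(fds[0]);
    close(fds[1]);
    close(epfd);
    return n;                              /* number of ready events */
}
```

Note that EPOLLEXCLUSIVE is only valid with EPOLL_CTL_ADD, and it limits wakeups across waiters rather than limiting events per call, so it covers change 1 but not the POLLONE round-robin behavior described above.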
Here is a test run of pollset() on unnamed pipes (source attached as pollset2.c):
-------------------------------------------------------------------------------------------
$ cc pollset2.c -lpthreads
$ ./a.out 1 1000000
Wait 5s for threads to start... (num_threads=1 num_msg=1000000)
server: FD = R: 3 W: 4
server: START 2020-08-30 16:37:57
server: STOP 2020-08-30 16:38:03 DELTA sec: 6
server: Messages 1000000 sent successfully
server: waiting 1s... M_msg_proc=967252, num_msg=1000000
server: Messages 1000000
server: COMPLETED 2020-08-30 16:38:04 WASTED WAEKUPS: 0
server: done waiting. remove threads...
server: done waiting. remove pipes...
-------------------------------------------------------------------------------------------
With one thread, the test runs for 6 seconds and gets 0 wasted wakeups.
Then the same test with 500 polling threads:
-------------------------------------------------------------------------------------------
$ ./a.out 500 1000000
Wait 5s for threads to start... (num_threads=500 num_msg=1000000)
server: FD = R: 3 W: 4
server: START 2020-08-30 16:38:15
server: STOP 2020-08-30 16:39:22 DELTA sec: 67
server: Messages 1000000 sent successfully
server: Messages 1000000
server: COMPLETED 2020-08-30 16:39:22 WASTED WAEKUPS: 1178438
server: done waiting. remove threads...
server: done waiting. remove pipes...
-------------------------------------------------------------------------------------------
We see that the same workload took 67 seconds, with 1178438 wasted (thundering-herd) wakeups. So load balancing across multiple processes (or independent threads) actually worsened the run time by roughly 1000%.
Can you please clarify if this enhancement request is for the poll() or pollset() service? Would one be preferable over the other?
Note that pollset() is the scalable I/O event notification service for AIX and is similar to Linux epoll().
The attached example source code and results show the unwanted effect of the thundering-herd problem, i.e. 1 thread vs. 500 threads waiting on poll(): 1 minute vs. 11 minutes for the same workload. With the introduction of the POLLEXCL flag, such a workload should take 1 minute regardless of the polling thread count.
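The same effect can be reproduced portably with plain poll() on a pipe (a minimal sketch of the same experiment; all names here are ours, not taken from the attached test programs): several threads poll the shared read end, every written byte wakes all of them, and all but one find nothing to read.

```c
/* Thundering-herd demo: NUM_THREADS threads poll() the same pipe read end.
 * Every byte written wakes every poller, but only one read() succeeds;
 * the others get EAGAIN and are counted as wasted wakeups. */
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NUM_THREADS 4
#define NUM_MSGS    1000

static int read_fd;                        /* shared read end of the pipe */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long msgs_received;                 /* successful 1-byte reads */
static long wasted_wakeups;                /* poll() woke us, read() got EAGAIN */
static volatile int done;

static void *worker(void *arg)
{
    (void)arg;
    struct pollfd pfd = { .fd = read_fd, .events = POLLIN };
    while (!done) {
        if (poll(&pfd, 1, 100) <= 0)       /* short timeout so we can exit */
            continue;
        char c;
        ssize_t n = read(read_fd, &c, 1);  /* read end is non-blocking */
        pthread_mutex_lock(&lock);
        if (n == 1)
            msgs_received++;
        else if (n < 0 && errno == EAGAIN)
            wasted_wakeups++;              /* another thread took the byte */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

static long run_demo(void)
{
    int fds[2];
    if (pipe(fds) != 0 || fcntl(fds[0], F_SETFL, O_NONBLOCK) != 0) {
        perror("pipe");
        exit(1);
    }
    read_fd = fds[0];

    pthread_t tids[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&tids[i], NULL, worker, NULL);

    for (int i = 0; i < NUM_MSGS; i++)
        if (write(fds[1], "x", 1) != 1) {  /* each byte wakes all pollers */
            perror("write");
            exit(1);
        }

    for (;;) {                             /* wait until every byte is consumed */
        pthread_mutex_lock(&lock);
        long got = msgs_received;
        pthread_mutex_unlock(&lock);
        if (got == NUM_MSGS)
            break;
        usleep(1000);
    }
    done = 1;
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tids[i], NULL);

    printf("received=%ld wasted=%ld\n", msgs_received, wasted_wakeups);
    close(fds[0]);
    close(fds[1]);
    return msgs_received;
}
```

With NUM_THREADS set to 1 the wasted count typically stays at 0; raising the thread count makes the wasted wakeups grow, mirroring the pollset2.c results above.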
Attachment (Description): Test results on AIX 7.2. Contains two runs with a bulk of 10M messages: in the first test case, with one receiver thread (poll() + msgrcv()), the run completes within 1 minute; in the second, with 500 threads doing poll() + msgrcv(), it completes within 11 minutes.
Attachment (Description): Test program example: the main program sends a bulk of messages, and a number of threads receive them via poll(). May be used to test the thundering-herd issue with poll().