https://bz.apache.org/bugzilla/show_bug.cgi?id=53555
--- Comment #59 from Thomas Jarosch <***@intra2net.com> ---
Hi Stefan,
the patch from #55 seems to make things scale a lot better.
Also the status output is very helpful.
ServerLimit was changed back to 16 before the tests.
I did a graceful restart at 13:09:35h.
/server-status at 14:19:36h (*before* the next graceful restart):
-----------------------
Total accesses: 23693 - Total Traffic: 200.0 MB
100 requests currently being processed, 150 idle workers
+------+-------+---------------+-------------------+-------------+--------------------------------+
|      |       |               |    Connections    |   Threads   |       Async connections        |
| Slot | PID   | Stopping      |-------+-----------+------+------+---------+------------+---------|
|      |       |               | total | accepting | busy | idle | writing | keep-alive | closing |
|------+-------+---------------+-------+-----------+------+------+---------+------------+---------|
| 0    | 19952 | yes (old gen) | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 1    | 20006 | yes (old gen) | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 2    | 20060 | yes (old gen) | 5     | no        | 0    | 0    | 0       | 0          | 0       |
| 3    | 20160 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 4    | 20224 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 5    | 20725 | no            | 2     | yes       | 2    | 48   | 0       | 0          | 0       |
| 6    | 27470 | no            | 50    | yes       | 50   | 0    | 0       | 0          | 0       |
| 7    | 24389 | yes           | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 8    | 27104 | no            | 18    | yes       | 18   | 32   | 0       | 0          | 0       |
| 9    | 27346 | no            | 3     | yes       | 3    | 47   | 0       | 0          | 0       |
| 10   | 22579 | yes           | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 11   | 27674 | no            | 29    | yes       | 27   | 23   | 0       | 3          | 0       |
| 13   | 25055 | yes           | 8     | no        | 0    | 0    | 0       | 0          | 0       |
| 14   | 25350 | yes           | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 15   | 25475 | yes           | 5     | no        | 0    | 0    | 0       | 0          | 0       |
|------+-------+---------------+-------+-----------+------+------+---------+------------+---------|
| Sum  | 15    | 10            | 137   |           | 100  | 150  | 0       | 3          | 0       |
+------+-------+---------------+-------+-----------+------+------+---------+------------+---------+
.G.G...............G............................................
..............G.....G.....G.........G..............G............
.........G.....G...G..................GG........................
...........................G........G.....................______
___________R_______________R________________RRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRR.....................G............
.....G...G......___R____R_RR_R__R______RRRRR__R__R______RR__R_R_
_R________________R__________________R_____________RGG__RRRRRRR_
_RRRR___R____RR__RR____R__R_W__RRRRR_RRRGGGGGGGGGGGGGGG
-----------------------
As you can see, there are still processes from "old gen" after one hour.
This is due to long-running HTTP CONNECT requests to google / dropbox / etc.
GracefulShutdownTimeout will probably help here. Maybe
a default value of one hour would make sense
for httpd in general?
Next graceful restart at 14:19:51h.
Errors start to appear in the log two seconds later:
[Wed Oct 26 14:19:53.926229 2016] [mpm_event:error] [pid 19951:tid 3071850240]
AH: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit.
/server-status at 14:20:06h:
-----------------------
Total accesses: 23744 - Total Traffic: 200.9 MB
8 requests currently being processed, 42 idle workers
+------+-------+---------------+-------------------+-------------+--------------------------------+
|      |       |               |    Connections    |   Threads   |       Async connections        |
| Slot | PID   | Stopping      |-------+-----------+------+------+---------+------------+---------|
|      |       |               | total | accepting | busy | idle | writing | keep-alive | closing |
|------+-------+---------------+-------+-----------+------+------+---------+------------+---------|
| 0    | 19952 | yes (old gen) | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 1    | 20006 | yes (old gen) | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 2    | 20060 | yes (old gen) | 5     | no        | 0    | 0    | 0       | 0          | 0       |
| 3    | 20160 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 4    | 20224 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 5    | 20725 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 6    | 27470 | yes (old gen) | 42    | no        | 0    | 0    | 0       | 0          | 0       |
| 7    | 24389 | yes (old gen) | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 8    | 27104 | yes (old gen) | 18    | no        | 0    | 0    | 0       | 0          | 0       |
| 9    | 27346 | yes (old gen) | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 10   | 22579 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 11   | 27674 | yes (old gen) | 24    | no        | 0    | 0    | 0       | 0          | 0       |
| 12   | 28054 | no            | 9     | yes       | 8    | 42   | 0       | 2          | 0       |
| 13   | 25055 | yes (old gen) | 8     | no        | 0    | 0    | 0       | 0          | 0       |
| 14   | 25350 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 15   | 25475 | yes (old gen) | 5     | no        | 0    | 0    | 0       | 0          | 0       |
|------+-------+---------------+-------+-----------+------+------+---------+------------+---------|
| Sum  | 16    | 15            | 133   |           | 8    | 42   | 0       | 2          | 0       |
+------+-------+---------------+-------+-----------+------+------+---------+------------+---------+
.G.G...............G............................................
..............G.....G.....G.........G..............G............
.........G.....G...G..................GG........................
...........................G........G...........................
...........G...............G................G.GGGGG.G.G..GGGGGG.
GGGGGGGGGGGGGG.GGGGGGGGGGG.GGG.....................G............
.....G...G......GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG_
______RRRR____RRRW_______________________________GGGGGGGGGGGGGGG
-----------------------
The forward proxy became unresponsive again.
/server-status at 14:29:16h:
-----------------------
Total accesses: 24453 - Total Traffic: 226.8 MB
50 requests currently being processed, 0 idle workers
+------+-------+---------------+-------------------+-------------+--------------------------------+
|      |       |               |    Connections    |   Threads   |       Async connections        |
| Slot | PID   | Stopping      |-------+-----------+------+------+---------+------------+---------|
|      |       |               | total | accepting | busy | idle | writing | keep-alive | closing |
|------+-------+---------------+-------+-----------+------+------+---------+------------+---------|
| 0    | 19952 | yes (old gen) | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 1    | 20006 | yes (old gen) | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 2    | 20060 | yes (old gen) | 5     | no        | 0    | 0    | 0       | 0          | 0       |
| 3    | 20160 | yes (old gen) | 1     | no        | 0    | 0    | 0       | 0          | 0       |
| 4    | 20224 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 5    | 20725 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 6    | 27470 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 7    | 24389 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 8    | 27104 | yes (old gen) | 1     | no        | 0    | 0    | 0       | 0          | 0       |
| 9    | 27346 | yes (old gen) | 1     | no        | 0    | 0    | 0       | 0          | 0       |
| 10   | 22579 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 11   | 27674 | yes (old gen) | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 12   | 28054 | no            | 51    | yes       | 50   | 0    | 0       | 0          | 1       |
| 13   | 25055 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 14   | 25350 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 15   | 25475 | yes (old gen) | 4     | no        | 0    | 0    | 0       | 0          | 0       |
|------+-------+---------------+-------+-----------+------+------+---------+------------+---------|
| Sum  | 16    | 15            | 86    |           | 50   | 0    | 0       | 0          | 1       |
+------+-------+---------------+-------+-----------+------+------+---------+------------+---------+
.G.G...............G............................................
..............G.....G.....G.........G..............G............
.........G.....G...G...................G........................
...........................G........G...........................
...........G...............G....................................
...........G.............G.........................G............
.....G..........GGGGGGGRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRWRGGGGGGGG
-----------------------
As you can see, there was plenty of room in the scoreboard now,
but the process list slots were used up by old processes
serving just a handful of connections.
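
To make the slot accounting explicit, here is a small Python sketch using the
numbers from the 14:29:16h status output. The tuple layout and the variable
names are only for illustration, they are not anything httpd exposes:

```python
# Process slots from the 14:29:16h /server-status output:
# (slot, pid, stopping, total connections)
SERVER_LIMIT = 16  # ServerLimit used for this test run

processes = [
    (0, 19952, True, 3), (1, 20006, True, 3), (2, 20060, True, 5),
    (3, 20160, True, 1), (4, 20224, True, 2), (5, 20725, True, 2),
    (6, 27470, True, 2), (7, 24389, True, 2), (8, 27104, True, 1),
    (9, 27346, True, 1), (10, 22579, True, 2), (11, 27674, True, 3),
    (12, 28054, False, 51), (13, 25055, True, 2), (14, 25350, True, 2),
    (15, 25475, True, 4),
]

# Split into processes in graceful shutdown and processes still accepting.
stopping = [p for p in processes if p[2]]
accepting = [p for p in processes if not p[2]]

stopping_connections = sum(p[3] for p in stopping)
free_slots = SERVER_LIMIT - len(processes)

print(f"stopping:  {len(stopping)} processes, {stopping_connections} connections")
print(f"accepting: {len(accepting)} processes")
print(f"free process slots: {free_slots}")
```

So 15 of the 16 process slots are held by stopping processes that together
serve 35 connections, and the single accepting process has no slot left to
get help from.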
One option would be to increase ServerLimit to, let's say, 128,
but that also raises the resource limits during normal operation.
If I raise ServerLimit too much, I have to lower the thread count again.
Sounds a bit like the prefork MPM...
Another option would be to add a config setting to ignore
processes for the ServerLimit calculation if they are
in graceful shutdown mode. They probably don't consume
a lot of resources and we can have a GracefulShutdownTimeout
of one hour to expire them, too.
Third option (my preferred one): have a dedicated GracefulShutdownLimit
that's separate from ServerLimit. If we have too many processes,
start killing off the oldest processes from the graceful shutdown list.
Processes in graceful shutdown mode don't count towards ServerLimit.
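
A rough sketch of how the third option could behave, in Python rather than
httpd's C. GracefulShutdownLimit is only the setting proposed here and does
not exist in httpd. The Proc record, its "started" field and both function
names are made up for illustration:

```python
from typing import List, NamedTuple


class Proc(NamedTuple):
    pid: int
    started: float   # hypothetical start timestamp in seconds
    stopping: bool   # True while the process is in graceful shutdown


def counted_for_server_limit(procs: List[Proc]) -> int:
    """Count only processes that are not in graceful shutdown."""
    return sum(1 for p in procs if not p.stopping)


def pids_to_kill(procs: List[Proc], graceful_shutdown_limit: int) -> List[int]:
    """Return the oldest stopping processes beyond the proposed limit."""
    stopping = sorted((p for p in procs if p.stopping), key=lambda p: p.started)
    excess = max(len(stopping) - graceful_shutdown_limit, 0)
    return [p.pid for p in stopping[:excess]]


# Made-up start times; the PIDs are borrowed from the status output.
procs = [
    Proc(19952, 100.0, True),
    Proc(20006, 110.0, True),
    Proc(20060, 120.0, True),
    Proc(28054, 400.0, False),
]

print(counted_for_server_limit(procs))
print(pids_to_kill(procs, 2))
```

With a limit of 2 and three stopping processes, only the oldest one (19952)
would be selected, and only one process would count towards ServerLimit.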
I've raised ServerLimit to 32 on the box again.
The users can't be annoyed too much ;)
Cheers,
Thomas
PS: Forget the idea from #58 about cross-talk between anonymous shared memory
segments. It's not the case.