Discussion:
[Bug 53555] Scoreboard full error with event/ssl
b***@apache.org
2015-06-03 00:42:35 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #12 from ***@gmail.com ---
After migrating from worker MPM to event MPM with Apache 2.4.7 we are seeing
this same problem.

Server version: Apache/2.4.7 (Ubuntu)
Ubuntu Trusty 14.04.2 LTS
Linux 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

We explicitly moved to event MPM for this workload, which proxies thousands of
mostly-idle HTTP Keep-Alive connections, because event MPM doesn't require a
thread per Keep-Alive connection. Although our number of clients is fairly
consistent and we have MaxConnectionsPerChild=0, we observe Apache processes
going into the Gracefully finishing state (GGGGGG in server-status) until
eventually Apache no longer accepts connections.

If we set MinSpareThreads and MaxSpareThreads equal to MaxRequestWorkers (so
Apache doesn't attempt to scale down processes), the issue goes away. That is
expected, but it suggests (maybe?) that the problem is related to Apache
scaling down.

Since client connections can stay open for hours or days, Apache processes
remain in this state for a very long time, eventually rejecting client
connections and becoming wedged.

Our clients are not browsers - Apache is being used for a mid-tier load
balancer/proxy with client connections that are very long lived (long
Keep-Alive times).

248 requests/sec - 0.7 MB/second - 3114 B/request
2 requests currently being processed, 38 idle workers
PID    Connections       Threads      Async connections
       total  accepting  busy  idle   writing  keep-alive  closing
28483  1642   no         0     0      0        1642        0
29672  553    yes        1     19     0        552         0
29696  9      no         0     0      0        9           0
29588  173    no         0     0      0        173         0
29618  1      no         0     0      0        1           0
29644  6      no         0     0      0        6           0
29719  30     no         0     0      0        30          0
29743  237    yes        1     19     0        236         0
Sum    2651              2     38     0        2649        0
GGGGGGGGGGGGGGGGGGGG________W___________GGGGGGGGGGGGGGGGGGGGGGWG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGG________W___________................................
........
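A scoreboard dump like the one above is easier to compare over time as counts per state. The following is a minimal Python sketch, not part of mod_status; the helper names are hypothetical, and the legend covers only states relevant to this thread, taken from the mod_status scoreboard key.

```python
from collections import Counter

# Subset of the mod_status scoreboard key relevant to this thread.
LEGEND = {
    "_": "waiting for connection",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "G": "gracefully finishing",
    ".": "open slot with no current process",
}


def tally(scoreboard: str) -> Counter:
    """Count each slot state, ignoring the line breaks in the server-status dump."""
    return Counter(ch for ch in scoreboard if not ch.isspace())


def summarize(scoreboard: str) -> str:
    """One line per state: symbol, meaning, count, and share of all slots."""
    counts = tally(scoreboard)
    total = sum(counts.values())
    rows = []
    for state, n in counts.most_common():
        meaning = LEGEND.get(state, "other")
        rows.append(f"{state}  {meaning:<35} {n:>4}  ({100 * n / total:.0f}%)")
    return "\n".join(rows)


if __name__ == "__main__":
    # Scoreboard copied from comment #12.
    board = """\
GGGGGGGGGGGGGGGGGGGG________W___________GGGGGGGGGGGGGGGGGGGGGGWG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGG________W___________................................
........"""
    print(summarize(board))
```

Running it periodically against the same server shows whether the share of G slots keeps growing.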
--
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-***@httpd.apache.org
For additional commands, e-mail: bugs-***@httpd.apache.org
b***@apache.org
2015-06-05 11:36:29 UTC

--- Comment #13 from Olivier Jaquemet <***@jalios.com> ---
We are seeing the same symptoms here:

Server Version: Apache/2.4.7 (Ubuntu) SVN/1.8.8 mod_jk/1.2.37 OpenSSL/1.0.1f
Ubuntu 14.04.2 LTS
Linux 3.13.0-52-generic #86-Ubuntu SMP Mon May 4 04:32:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Many log entries like:
[mpm_event:error] [pid 6332:tid 140558940702592] AH00485: scoreboard is full, not at MaxRequestWorkers

From server-status:

Right after start:
__RR___________R________________________W__________________W____
___________.....................................................
......................

After one hour:

___________________W_____GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGG_______W___W_____________............................
......................

Two hours later:

GGGGGGGGGGGGGGGGGGGGGGGGGW_W_____W________W____W__GGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGG

Is there anything we can provide to help diagnose the issue?

Do you know of a configuration workaround?

b***@apache.org
2015-06-05 16:38:39 UTC

--- Comment #14 from ScottE <***@gmail.com> ---
In case others find it useful, we mitigated this with several changes:

1. Increased MinSpareThreads and MaxSpareThreads, as well as the range between
them. Making Apache less aggressive about scaling down the number of servers
makes it less likely to run into this issue. Our new values are:

MinSpareThreads = MaxRequestWorkers / 4
MaxSpareThreads = MinSpareThreads * 3

2. Lowered MaxKeepAliveRequests. Looking at a histogram of request counts per
connection on an equivalent Apache running the worker MPM (the first value in
the Acc column), I found a very long tail of a few connections extending out to
our old value, but a clear cluster at the low end. Our new MaxKeepAliveRequests
is a bit beyond that cluster, but significantly lower than the old value. This
lets servers recycle more quickly when they scale down without significantly
affecting client connections, since relatively few connections will be closed
early.

3. Increased AsyncWorkerFactor. When Apache servers are scaling down (in the
Gracefully finishing state), this lets the remaining servers pick up the slack
by handling more client connections in total (for connections in HTTP
Keep-Alive, this does not require more workers); previously these processes
had reached their connection limit and were rejecting new ones. Event MPM does
a reasonably good job of spreading load between processes, and with our larger
spare threads range we now tend to have more live processes as well.
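The first and third changes can be expressed as simple arithmetic. The Python sketch below applies ScottE's ratios and, as an assumption based on the event MPM documentation for AsyncWorkerFactor, approximates how many connections one child accepts before it stops accepting (ThreadsPerChild plus AsyncWorkerFactor times its idle workers). It also checks a rule the documentation states for worker and event: MaxSpareThreads should be at least MinSpareThreads plus ThreadsPerChild, otherwise httpd adjusts it. Function names and the 400/25 sizing are hypothetical; check both rules against the documentation for your httpd version.

```python
from __future__ import annotations


def spare_threads(max_request_workers: int) -> tuple[int, int]:
    """ScottE's ratios: MinSpareThreads = MaxRequestWorkers / 4, MaxSpareThreads = 3 * Min."""
    min_spare = max_request_workers // 4
    max_spare = min_spare * 3
    return min_spare, max_spare


def max_spare_is_valid(min_spare: int, max_spare: int, threads_per_child: int) -> bool:
    """Assumed rule from the worker/event docs: Max >= Min + ThreadsPerChild."""
    return max_spare >= min_spare + threads_per_child


def per_process_connection_limit(threads_per_child: int,
                                 async_worker_factor: float,
                                 idle_workers: int) -> int:
    """Approximate connections one event child takes before it stops
    accepting: ThreadsPerChild + AsyncWorkerFactor * idle workers."""
    return int(threads_per_child + async_worker_factor * idle_workers)


if __name__ == "__main__":
    max_request_workers, threads_per_child = 400, 25  # hypothetical sizing
    lo, hi = spare_threads(max_request_workers)
    print(f"MinSpareThreads {lo}")
    print(f"MaxSpareThreads {hi}")
    print("MaxSpareThreads satisfies the assumed rule:",
          max_spare_is_valid(lo, hi, threads_per_child))
    # Effect of raising AsyncWorkerFactor for a child with 20 threads, 19 idle.
    for factor in (2, 4, 8):
        limit = per_process_connection_limit(20, factor, 19)
        print(f"AsyncWorkerFactor {factor}: ~{limit} connections")
```

Raising the factor raises the per-child cap without adding threads, which is the effect described in item 3.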

We also considered lowering KeepAliveTimeout, and built a histogram similar to
the one for MaxKeepAliveRequests from a worker MPM configuration (using the SS
column, seconds since the beginning of the most recent request, as a
reasonable analog). That histogram showed a nice distribution for us, so
lowering the timeout would have affected clients without helping this
workload.
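The histogram approach in item 2 and the paragraph above amounts to picking a percentile of observed per-connection values. A minimal Python sketch, assuming you have already collected one number per connection (the first Acc value for requests, or the SS value for seconds) from a worker MPM server-status page; the sample data and function names are hypothetical.

```python
from __future__ import annotations


def percentile_cutoff(samples: list[int], percent: int) -> int:
    """Smallest observed value v such that at least `percent`% of samples are <= v."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(samples)
    # Integer ceiling of percent * n / 100, turned into a 0-based index.
    index = (percent * len(ordered) + 99) // 100 - 1
    return ordered[index]


def share_closed_early(samples: list[int], limit: int) -> float:
    """Fraction of connections that made more than `limit` requests."""
    return sum(1 for s in samples if s > limit) / len(samples)


if __name__ == "__main__":
    # Hypothetical per-connection request counts (first number of "Acc").
    requests_per_connection = [1, 1, 2, 3, 3, 4, 6, 9, 12, 15, 18, 40, 500, 2000]
    cutoff = percentile_cutoff(requests_per_connection, 80)
    print("candidate MaxKeepAliveRequests:", cutoff)
    print("share of connections closed early:",
          share_closed_early(requests_per_connection, cutoff))
```

The percentile (80 here) is a judgment call about how many long-lived connections you are willing to close early; the same helpers apply to SS samples when considering KeepAliveTimeout.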

These are the values that worked for us, with our workload, to mitigate this
issue. Of course your workload and values will be different, but this may be a
reasonable strategy to try as well.

b***@apache.org
2015-07-07 16:03:23 UTC

***@kace.com changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@kace.com

b***@apache.org
2015-08-08 11:33:50 UTC

Leho Kraav @lkraav <***@kraav.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@kraav.com

b***@apache.org
2015-08-08 13:29:43 UTC

--- Comment #15 from Leho Kraav @lkraav <***@kraav.com> ---
With 2.4.16, the following configuration hits "scoreboard is full" after 3-4 reloads:

StartServers 2
MinSpareThreads 50
MaxSpareThreads 150
ThreadsPerChild 25
MaxRequestWorkers 200
MaxConnectionsPerChild 10000

Any advice?
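One rough way to reason about how many reloads a configuration like this tolerates is to count scoreboard process slots. The Python sketch below assumes the event MPM default ServerLimit of 16 (the configuration above does not set it) and that each active child, and each old child still gracefully finishing after a reload, holds one process slot, as the event MPM's ServerLimit documentation describes. Helper names are hypothetical.

```python
def active_children(max_request_workers: int, threads_per_child: int) -> int:
    """Child processes needed to supply MaxRequestWorkers threads (ceiling division)."""
    return -(-max_request_workers // threads_per_child)


def graceful_headroom(max_request_workers: int, threads_per_child: int,
                      server_limit: int = 16) -> int:
    """Process slots left for old children that are still gracefully finishing.

    server_limit defaults to 16, assumed to be the event MPM default.
    """
    return server_limit - active_children(max_request_workers, threads_per_child)


if __name__ == "__main__":
    # Values from the configuration above; ServerLimit is not set there.
    print("active children:", active_children(200, 25))
    print("slots for gracefully finishing children:", graceful_headroom(200, 25))
```

With these values, 8 active children leave 8 slots for children left over from earlier generations; if several reloads each leave long-lived children behind, those slots can run out. Raising ServerLimit adds headroom, though it does not address why old children linger.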

b***@apache.org
2015-08-25 15:09:19 UTC

--- Comment #16 from ***@safe-mail.net ---
This is certainly a bug and not a configuration issue. I have had this error
happen with the default (Debian) configuration and other people online report
the same. I have had this happen with mpm_event and mpm_worker.

It's very reproducible. It happens with almost any thread related settings I
have tried. It stops new requests from being served and is a serious problem.

There is some bug with the way Apache handles its servers/threads. This is not
something that can be fixed by tweaking the configuration. At best it might be
mitigated by setting:

StartServers 1
ServerLimit X
ThreadsPerChild XXX
ThreadLimit <ThreadsPerChild>
MaxRequestWorkers <ServerLimit * ThreadLimit>
MinSpareThreads <MaxRequestWorkers>
MaxSpareThreads <MaxRequestWorkers>
MaxRequestsPerChild 0

In other words, configure httpd so that threads stay alive forever, and the
buggy code responsible for killing and reusing threads is never hit. Of
course, this means always using the maximum amount of RAM, since threads never
die even when there is no traffic.
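The template above derives most of its values from ServerLimit and ThreadsPerChild. A minimal Python sketch that fills it in for hypothetical inputs (4 children of 64 threads); the function names are illustrative only. MaxRequestsPerChild, used in the template, is the older name of MaxConnectionsPerChild, which httpd 2.4 still accepts.

```python
from __future__ import annotations


def pinned_config(server_limit: int, threads_per_child: int) -> dict[str, int]:
    """Fill in the 'threads never die' template from the comment above."""
    workers = server_limit * threads_per_child  # ServerLimit * ThreadLimit
    return {
        "StartServers": 1,
        "ServerLimit": server_limit,
        "ThreadsPerChild": threads_per_child,
        "ThreadLimit": threads_per_child,
        "MaxRequestWorkers": workers,
        "MinSpareThreads": workers,
        "MaxSpareThreads": workers,
        "MaxRequestsPerChild": 0,  # older name of MaxConnectionsPerChild
    }


def render(config: dict[str, int]) -> str:
    """Emit one 'Directive value' line per entry, in template order."""
    return "\n".join(f"{name} {value}" for name, value in config.items())


if __name__ == "__main__":
    print(render(pinned_config(server_limit=4, threads_per_child=64)))
```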

b***@apache.org
2015-08-25 15:50:48 UTC

--- Comment #17 from Leho Kraav @lkraav <***@kraav.com> ---
(In reply to gobbledance from comment #16)
> This is certainly a bug and not a configuration issue. I have had this error
> happen with the default (Debian) configuration and other people online
> report the same. I have had this happen with mpm_event and mpm_worker.
> It's very reproducible. It happens with almost any thread related settings I
> have tried. It stops new requests from being served and is a serious problem.
I have found no way around it with a variety of worker configuration
parameters. It looks like the best bet would be to have fail2ban or a similar
tool monitor the error_log and restart the server when the scoreboard hits
this DoS condition.

b***@apache.org
2015-09-09 10:06:25 UTC

Peter <***@blunix.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Hardware|Sun |Other
OS|SunOS |Linux
Version|2.4.6 |2.4.7

--- Comment #18 from Peter <***@blunix.org> ---
I'm also affected by this bug running Apache/2.4.7 (Ubuntu) on 14.04. As a
hotfix, I set up a log-watching daemon that force-restarts apache2 whenever the
"scoreboard is full" line shows up in error.log.

Has anyone tested this with the current stable release 2.4.16?
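The log-watching workaround from comments #17 and #18 can be sketched in a few lines of Python. This is a hypothetical illustration, not the daemon used there: it follows the error log like `tail -f`, looks for the AH00485 message ID, and runs a restart command with a cooldown so that a burst of messages does not cause a restart loop. The restart command (Ubuntu 14.04 style `service apache2 restart`) and the 300-second cooldown are assumptions, and the sketch does not reopen the file after log rotation.

```python
import os
import subprocess
import sys
import time
from typing import Iterator, Optional

# Assumptions; adjust for your system.
RESTART_CMD = ["service", "apache2", "restart"]
COOLDOWN_SECONDS = 300
MARKER = "AH00485"  # "scoreboard is full, not at MaxRequestWorkers"


def is_scoreboard_full(line: str) -> bool:
    return MARKER in line


def should_restart(now: float, last_restart: Optional[float]) -> bool:
    """Allow a restart only if none happened within the cooldown window."""
    return last_restart is None or now - last_restart >= COOLDOWN_SECONDS


def follow(path: str) -> Iterator[str]:
    """Yield lines appended to `path`, starting at the current end (like tail -f).

    Does not handle log rotation.
    """
    with open(path, "r", errors="replace") as log:
        log.seek(0, os.SEEK_END)
        while True:
            line = log.readline()
            if line:
                yield line
            else:
                time.sleep(1)


def watch(path: str) -> None:
    last_restart: Optional[float] = None
    for line in follow(path):
        now = time.monotonic()
        if is_scoreboard_full(line) and should_restart(now, last_restart):
            subprocess.run(RESTART_CMD, check=False)
            last_restart = now


if __name__ == "__main__" and len(sys.argv) > 1:
    watch(sys.argv[1])  # e.g. /var/log/apache2/error.log
```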

b***@apache.org
2015-09-29 22:17:45 UTC

--- Comment #19 from Stefan Fritsch <***@sfritsch.de> ---
(In reply to ScottE from comment #12)
> Our clients are not browsers - Apache is being used for a mid-tier load
> balancer/proxy with client connections that are very long lived (long
> Keep-Alive times).
This seems to be a problem that should not be too difficult to fix. When a
process is shutting down, it should close its keepalive connections. Can you
please check if the attached patch helps?


The case where long-running transfers are keeping a process from shutting down
is much more difficult to fix.

b***@apache.org
2015-09-29 22:18:15 UTC

--- Comment #20 from Stefan Fritsch <***@sfritsch.de> ---
Created attachment 33154
--> https://bz.apache.org/bugzilla/attachment.cgi?id=33154&action=edit
close keepalive connections if process is shutting down

b***@apache.org
2015-09-29 22:22:41 UTC

Stefan Fritsch <***@sfritsch.de> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@sfritsch.de

b***@apache.org
2015-10-02 15:33:35 UTC

***@mightytikigod.com changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@mightytikigod.com

b***@apache.org
2015-10-03 15:31:46 UTC

Stefan Fritsch <***@sfritsch.de> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #33154|0 |1
is obsolete| |

--- Comment #21 from Stefan Fritsch <***@sfritsch.de> ---
Created attachment 33158
--> https://bz.apache.org/bugzilla/attachment.cgi?id=33158&action=edit
exit some threads early during graceful shutdown of a process

The attached diff against the 2.4.x branch makes unneeded threads exit earlier
during graceful shutdown of a process. This then allows new processes to use
the freed scoreboard slots.

I am interested in real-life experiences with this patch. It has two known
problems, though:

- If httpd is shut down (ungracefully) while there are some old processes
around serving long lasting requests, those processes won't die peacefully but
will be SIGKILLed by the parent after 10 seconds.

- server-status shows incomplete information (that is, even more incomplete
than in 2.4 ;) )

b***@apache.org
2015-10-03 23:28:29 UTC

--- Comment #22 from ***@mightytikigod.com ---
I have applied the patch on our own production server, which experiences this
problem sometimes twice a day, and sometimes not for a week or so.

So now we wait. I will report immediately if the problem recurs, and I will
also report in a week if the problem does not recur.

PS: If "Graceful, but sigkill after 10 seconds" were an actual option, I would
probably use it all the time.

b***@apache.org
2015-10-05 08:37:17 UTC

--- Comment #23 from Yann Ylavic <***@gmail.com> ---
(In reply to Stefan Fritsch from comment #21)
> - If httpd is shut down (ungracefully) while there are some old processes
> around serving long lasting requests, those processes won't die peacefully
> but will be SIGKILLed by the parent after 10 seconds.
Wasn't that already the case for ungraceful stop/restart?
> - server-status shows incomplete information (that is, even more incomplete
> than in 2.4 ;) )
How about not setting SERVER_GRACEFUL in close_listeners() and worker_thread()?
The old generation's state could be relevant, since the new generation does not
"steal" the scoreboard now (until the old worker exits).

b***@apache.org
2015-10-05 09:10:26 UTC

Jean-Loup C. <***@hfox.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC|***@hfox.org |

b***@apache.org
2015-10-05 15:31:04 UTC

Ludovico Cavedon <***@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC|***@gmail.com |

b***@apache.org
2015-10-05 22:25:35 UTC

--- Comment #24 from Stefan Fritsch <***@sfritsch.de> ---
(In reply to bucky from comment #22)
> I have applied the patch on our own production server, which experiences
> this problem sometimes twice a day, and sometimes not for a week or so.
Thanks for that already.


(In reply to Yann Ylavic from comment #23)
> (In reply to Stefan Fritsch from comment #21)
> > - If httpd is shut down (ungracefully) while there are some old processes
> > around serving long lasting requests, those processes won't die peacefully
> > but will be SIGKILLed by the parent after 10 seconds.
> Wasn't that already the case for ungraceful stop/restart?
Normally, those child processes should react to the SIGTERM that is sent first.
But that is currently broken by my patch.
> > - server-status shows incomplete information (that is, even more incomplete
> > than in 2.4 ;) )
> How about not setting SERVER_GRACEFUL in close_listeners() and
> worker_thread()?
> The old generation's state could be relevant, since the new generation does
> not "steal" the scoreboard now (until the old worker exits).
Yes, that would probably be better, I'll have to test that. But it would not
fix the incompleteness I was referring to: The old and the new process have
only one process slot in the scoreboard, which makes the async overview table
show sometimes the info from the old and sometimes from the new process,
depending on who updated it last.

b***@apache.org
2015-10-05 22:40:15 UTC

--- Comment #25 from Yann Ylavic <***@gmail.com> ---
(In reply to Stefan Fritsch from comment #24)
> (In reply to Yann Ylavic from comment #23)
> > How about not setting SERVER_GRACEFUL in close_listeners() and
> > worker_thread()?
> > The old generation's state could be relevant, since the new generation does
> > not "steal" the scoreboard now (until the old worker exits).
> Yes, that would probably be better, I'll have to test that. But it would not
> fix the incompleteness I was referring to: The old and the new process have
> only one process slot in the scoreboard, which makes the async overview
> table show sometimes the info from the old and sometimes from the new
> process, depending on who updated it last.
It seems to me that the new generation's worker threads are not started now
unless their scoreboard slot is marked SERVER_DEAD (it was also SERVER_GRACEFUL
before attachment 33158).
So AIUI, there shouldn't be two workers using the same slot.

b***@apache.org
2015-10-05 23:00:30 UTC

--- Comment #26 from Stefan Fritsch <***@sfritsch.de> ---
(In reply to Yann Ylavic from comment #25)

This technical discussion has been moved to the dev mailing list.

b***@apache.org
2015-10-10 22:59:30 UTC

--- Comment #27 from ***@mightytikigod.com ---
It's been a week.

The scoreboard errors haven't stopped altogether. Every so often I still get
one per second for a short time, but now they last only about 1 or 2 minutes.

I haven't gotten any lockups since I applied the patch.

b***@apache.org
2015-10-11 07:33:53 UTC

--- Comment #28 from Leho Kraav @lkraav <***@kraav.com> ---
mod_h2 did some significant cleanups for resource handling in the 0.9.x branch.
"Scoreboard full" errors seem to have been completely eliminated for me. I now
get several weeks of uptime with no issues. So it looks like how well
individual external modules clean up is directly related to this issue.

b***@apache.org
2015-10-11 22:40:15 UTC

--- Comment #29 from ***@mightytikigod.com ---
I'm confused. To my knowledge, mod_h2 is a 3rd party module. Is it somehow an
integral part of the latest httpd (2.4.16)?

b***@apache.org
2015-10-12 05:44:27 UTC

--- Comment #30 from Leho Kraav @lkraav <***@kraav.com> ---
(In reply to bucky from comment #29)
> I'm confused. To my knowledge, mod_h2 is a 3rd party module. Is it somehow
> an integral part of the latest httpd (2.4.16)?
Yes, it is already part of trunk and backported to 2.4.x.

b***@apache.org
2015-10-12 07:18:05 UTC
> mod_h2 did some significant cleanups for resource handling in the 0.9.x
> branch. "Scoreboard full" errors seem to have been completely eliminated for
> me.
mod_http2 (being released in 2.4.17) has its own connection handling (somewhat
apart from the MPM, for now), and shouldn't be seen as a workaround for this
issue.
The more testing Stefan's proposed patch (for MPM event) gets without
mod_http2, the sooner it can be backported into a release.

b***@apache.org
2015-10-12 07:36:41 UTC

--- Comment #32 from Stefan Eissing <***@eissing.org> ---
The fixes I did in mod_http2, mentioned by Leho, were related only to the fact
that early 0.9.x versions of that module did not properly mark connections for
reclaiming, so cleanup work did not always run, leading to memory loss and
wasted scoreboard handles.

That has been fixed in mod_http2 alone and does not affect other connections.
Since the bug also happens without the module, its presence is not a
mitigation.

If Stefan's patch does not fix it, we should review again whether there are
races that prevent cleanup from happening in the HTTP/1.1 cases.

b***@apache.org
2015-11-02 18:28:43 UTC

--- Comment #33 from Thierry Bastian <***@filewave.com> ---
We got into a situation where the users of our product were stuck with
processes in the G state, with severe performance issues in those cases. We
tried patch https://bz.apache.org/bugzilla/attachment.cgi?id=33158&action=diff
on a couple of installs and it made things much, much better. One install used
to get stuck with 2000 clients coming in at roughly the same time; now it
handles 10K gracefully.
Hope that helps.

b***@apache.org
2015-11-25 16:04:07 UTC

***@compodata.com changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@compodata.com

b***@apache.org
2015-11-30 10:00:07 UTC

Bernhard Friedreich <***@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

b***@apache.org
2015-12-27 15:15:23 UTC

Luca Toscano <***@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

b***@apache.org
2016-01-07 11:07:49 UTC

Chris Mear <***@feedmechocolate.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@feedmechocolate.com

b***@apache.org
2016-01-30 15:53:01 UTC

David Galloway <***@redhat.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@redhat.com

b***@apache.org
2016-03-08 09:25:18 UTC

Sander Hoentjen <***@hoentjen.eu> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@hoentjen.eu

b***@apache.org
2016-03-25 17:20:42 UTC

--- Comment #34 from Leho Kraav @lkraav <***@kraav.com> ---
I'm hitting this on a production server with 2.4.18 now. Can't apply custom
patches here.

ServerLimit 30
MaxRequestWorkers 30
MaxConnectionsPerChild 600
KeepAlive On
KeepAliveTimeout 1
MaxKeepAliveRequests 20
Timeout 50

mod_h2 isn't enabled here.

From the discussion above, I can't tell clearly whether any core developers
have confirmed this to be a bug or a configuration issue.

b***@apache.org
2016-03-29 07:35:20 UTC

--- Comment #35 from Sander Hoentjen <***@hoentjen.eu> ---
After applying the patch I ran into "No space left on device: AH00023: Couldn't
create the proxy mutex". I haven't seen that issue without the patch.

Log says:
[Sat Mar 26 07:00:34.857694 2016] [core:emerg] [pid 787770:tid 140551243081696] (28)No space left on device: AH00023: Couldn't create the proxy mutex
[Sat Mar 26 07:00:34.857764 2016] [proxy:crit] [pid 787770:tid 140551243081696] (28)No space left on device: AH02478: failed to create proxy mutex
AH00016: Configuration Failed

# ipcs -s

------ Semaphore Arrays --------
key semid owner perms nsems
0x00000000 0 root 600 1
0x00000000 65537 root 600 1
0x00000000 131074 apache 600 1
0x7a00179d 59899907 zabbix 600 13
0x00000000 3866628 apache 600 1
0x00000000 3899397 apache 600 1
0x00000000 3932166 apache 600 1
0x00000000 21397511 apache 600 1
0x00000000 21495816 apache 600 1
0x00000000 21528585 apache 600 1
0x00000000 21561354 apache 600 1
0x00000000 21594123 apache 600 1
0x00000000 21626892 apache 600 1
0x00000000 21659661 apache 600 1
0x00000000 29294606 apache 600 1
0x00000000 29327375 apache 600 1
0x00000000 29360144 apache 600 1
0x00000000 29392913 apache 600 1
0x00000000 29425682 apache 600 1
0x00000000 29458451 apache 600 1
0x00000000 29884436 apache 600 1
0x00000000 29917205 apache 600 1
0x00000000 29949974 apache 600 1
0x00000000 29982743 apache 600 1
0x00000000 30015512 apache 600 1
0x00000000 30048281 apache 600 1
0x00000000 30310426 apache 600 1
0x00000000 30343195 apache 600 1
0x00000000 30375964 apache 600 1
0x00000000 30408733 apache 600 1
0x00000000 30441502 apache 600 1
0x00000000 30474271 apache 600 1
0x00000000 30736416 apache 600 1
0x00000000 30769185 apache 600 1
0x00000000 30801954 apache 600 1
0x00000000 30834723 apache 600 1
0x00000000 30867492 apache 600 1
0x00000000 30900261 apache 600 1
0x00000000 30998566 apache 600 1
0x00000000 31031335 apache 600 1
0x00000000 31064104 apache 600 1
0x00000000 31096873 apache 600 1
0x00000000 31129642 apache 600 1
0x00000000 31162411 apache 600 1
0x00000000 31260716 apache 600 1
0x00000000 31293485 apache 600 1
0x00000000 31326254 apache 600 1
0x00000000 31359023 apache 600 1
0x00000000 31391792 apache 600 1
0x00000000 31424561 apache 600 1
0x00000000 37257266 apache 600 1
0x00000000 37290035 apache 600 1
0x00000000 37322804 apache 600 1
0x00000000 37355573 apache 600 1
0x00000000 37388342 apache 600 1
0x00000000 37421111 apache 600 1
0x00000000 37519416 apache 600 1
0x00000000 37552185 apache 600 1
0x00000000 37584954 apache 600 1
0x00000000 37617723 apache 600 1
0x00000000 37650492 apache 600 1
0x00000000 37683261 apache 600 1
0x00000000 37781566 apache 600 1
0x00000000 37814335 apache 600 1
0x00000000 37847104 apache 600 1
0x00000000 37879873 apache 600 1
0x00000000 37912642 apache 600 1
0x00000000 37945411 apache 600 1
0x00000000 38043716 apache 600 1
0x00000000 38076485 apache 600 1
0x00000000 38109254 apache 600 1
0x00000000 38142023 apache 600 1
0x00000000 38174792 apache 600 1
0x00000000 38207561 apache 600 1
0x00000000 41091146 apache 600 1
0x00000000 41123915 apache 600 1
0x00000000 41156684 apache 600 1
0x00000000 41189453 apache 600 1
0x00000000 41222222 apache 600 1
0x00000000 41254991 apache 600 1
0x00000000 44466256 apache 600 1
0x00000000 44499025 apache 600 1
0x00000000 44531794 apache 600 1
0x00000000 44564563 apache 600 1
0x00000000 44597332 apache 600 1
0x00000000 44630101 apache 600 1
0x00000000 49315926 apache 600 1
0x00000000 49348695 apache 600 1
0x00000000 49381464 apache 600 1
0x00000000 49414233 apache 600 1
0x00000000 49447002 apache 600 1
0x00000000 49479771 apache 600 1
0x00000000 49578076 apache 600 1
0x00000000 49610845 apache 600 1
0x00000000 49643614 apache 600 1
0x00000000 49676383 apache 600 1
0x00000000 49709152 apache 600 1
0x00000000 49741921 apache 600 1
0x00000000 55574626 apache 600 1
0x00000000 55607395 apache 600 1
0x00000000 55640164 apache 600 1
0x00000000 55672933 apache 600 1
0x00000000 55705702 apache 600 1
0x00000000 55738471 apache 600 1
0x00000000 58785896 apache 600 1
0x00000000 58818665 apache 600 1
0x00000000 58851434 apache 600 1
0x00000000 58884203 apache 600 1
0x00000000 58916972 apache 600 1
0x00000000 58949741 apache 600 1
0x00000000 61571182 apache 600 1
0x00000000 61603951 apache 600 1
0x00000000 61636720 apache 600 1
0x00000000 61669489 apache 600 1
0x00000000 61702258 apache 600 1
0x00000000 61735027 apache 600 1
0x00000000 63635572 apache 600 1
0x00000000 63668341 apache 600 1
0x00000000 63701110 apache 600 1
0x00000000 63733879 apache 600 1
0x00000000 63766648 apache 600 1
0x00000000 63799417 apache 600 1
0x00000000 65372282 apache 600 1
0x00000000 65405051 apache 600 1
0x00000000 65437820 apache 600 1
0x00000000 65470589 apache 600 1
0x00000000 65503358 apache 600 1
--
b***@apache.org
2016-03-30 16:53:49 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #36 from ScottE <***@gmail.com> ---
(In reply to Sander Hoentjen from comment #35)
Couldn't create the proxy mutex" I haven't seen that issue without the patch.
Hi Sander, I don't believe this is related to the patch - I've seen this happen
(on vanilla 2.4.7) with a bad configuration and something like daemontools
constantly restarting Apache. This is likely a valid bug, where Apache can leak
mutexes under some conditions, but I don't think it's caused by the patch.
--
b***@apache.org
2016-03-31 08:11:19 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #37 from Sander Hoentjen <***@hoentjen.eu> ---
(In reply to ScottE from comment #36)
Post by b***@apache.org
(In reply to Sander Hoentjen from comment #35)
Couldn't create the proxy mutex" I haven't seen that issue without the patch.
Hi Sander, I don't believe this is related to the patch - I've seen this
happen (on vanilla 2.4.7) with a bad configuration and something like
daemontools constantly restarting Apache. This is likely a valid bug, where
Apache can leak mutexes under some conditions, but I don't think it's caused
by the patch.
Well, we run Apache 2.4 with the event MPM on tens of servers, and apart from the
bug in this ticket they are doing fine. On one of them we applied the patch (no
other changes) and got AH00023, so while I believe there are other ways to
trigger it, it seems the patch can also play a role.
--
b***@apache.org
2016-04-07 11:05:30 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

Mike Williams <***@comodo.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@comodo.com
--
b***@apache.org
2016-04-07 14:36:03 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #38 from Mike Williams <***@comodo.com> ---
(In reply to Thierry Bastian from comment #33)
Post by b***@apache.org
WE got into a situation where the users of our product were stuck with G.
We've got severe performance issues in those cases. We've tried patch
https://bz.apache.org/bugzilla/attachment.cgi?id=33158&action=diff on a
couple of installs and it made things much much better. On one install it
would get stuck with 2000 clients coming in at roughly the same time. Now it
can handle 10K gracefully.
Hope that helps.
I've been trying that today after an update from 2.2.x to 2.4.18, but I still
get the "scoreboard is full, ..." error.


One server looks like this when emitting the "scoreboard is full, ..." error, a
few moments before becoming entirely unresponsive.


179 requests currently being processed, 461 idle workers
PID    Connections      Threads    Async connections
       total accepting  busy idle  writing keep-alive closing
25580    205 no           15   49        0        147      44
21331    293 no            0    0        0          0     292
19389      1 yes           0    0        0          0       0
25924    164 no           12   52        0        151       0
23217    432 no           15   49        0        146     270
23361    457 no           18   46        0        140     298
24175    458 no           13   51        0        149     297
20428    246 yes           0    0        0          0     244
21641    439 no           17   47        0        145     283
21739    435 no           16   48        0        143     277
23506    448 no           18   46        0        139     293
26180     30 yes          41   23        0          3       0
20174      2 no            0    0        0          0       1
20527    209 no            0    0        0          0     208
22470    448 no           14   50        0        149     287
20551    209 no            0    0        0          0     209
Sum     4476             179  461        0       1312    3003

R_R_R______R_R___R_________W_______R__R_R_WR________R__R___R____
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
______W_R_R____R__R__R________________R__RR________R_____R_R____
___R_RR________WRR____R__R_R______R__________WR______R____R___R_
R________R______RR__R__RR___R______RR___RRR______R__R___RR_R____
RR______________W__R_______R_________R_____RRW_____R____RWR_____
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
_____R___R_____R_R_R_RR_R___R_W_R__R__R___R______R____R_R_______
R______RR__R_R__RR_________R____R___R___RRR________R_______R___R
_________R__RR_______RR__R___R___R_____RRR____R_R_RR___R____R_W_
R___R_RRW___RRRR_RRRRRRRR_WRR_RR_RRRRRRRRRRRRRR__RRRRR__________
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
_____W______R__R____R_________R_____R_RWR_RR_R_______R____R_____
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG


Shortly afterwards all the Gs are cleared and it gets back to doing useful work
for a while.
Sometimes "a while" can be 15 minutes, other times less than 1 second.
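A small helper makes snapshots like the one above easier to track over time. This is a sketch that assumes the status text comes from mod_status's machine-readable view (`/server-status?auto`), which reports the slot map on a line starting with `Scoreboard:`. The function name `scoreboard_counts` is hypothetical. The helper counts each slot state, so a growing share of `G` (gracefully finishing) slots shows up before the "scoreboard is full" error does.

```python
from collections import Counter


def scoreboard_counts(status_text: str) -> Counter:
    """Count slot states in the 'Scoreboard:' line of mod_status ?auto output.

    Each character is one slot, e.g. '_' waiting for connection,
    'R' reading request, 'W' sending reply, 'G' gracefully finishing,
    '.' open slot with no process.
    """
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            return Counter(board)
    raise ValueError("no 'Scoreboard:' line in status output")


if __name__ == "__main__":
    sample = "Total Accesses: 10\nScoreboard: GG_W_R..\n"
    counts = scoreboard_counts(sample)
    total = sum(counts.values())
    print(f"G slots: {counts['G']} of {total}")
```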
--
b***@apache.org
2016-04-11 20:56:58 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #39 from Stefan Fritsch <***@sfritsch.de> ---
In summary, the problem is that old processes that are shutting down, but are
still processing some long-lasting connections, take up all the open scoreboard
slots. It can be triggered in two ways:

a) when doing a graceful restart (apachectl graceful)

b) when the server load goes down in a way that causes httpd to stop some
processes. This is particularly problematic because when the load increases
again, httpd will try to start more processes. If the pattern repeats, the
number of processes can rise quite a bit.
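Scenario b) can be illustrated with a deliberately simplified model. This is a sketch under strong assumptions and does not reproduce httpd's actual algorithm. In the model, surplus processes are retired all at once when load falls, whereas real httpd retires them gradually, driven by MaxSpareThreads. A retired process is also assumed never to finish its long-lived connections, so it keeps its scoreboard slot for the rest of the run. `simulate_slots` is a hypothetical name.

```python
import math
from typing import Optional, Sequence


def simulate_slots(server_limit: int, threads_per_child: int,
                   busy_threads: Sequence[int]) -> Optional[int]:
    """Return the first step at which a needed process cannot get a
    scoreboard slot, or None if the load pattern never exhausts the slots."""
    active = 0    # processes accepting new connections
    stopping = 0  # processes in graceful shutdown ('G'), still holding a slot
    for step, load in enumerate(busy_threads):
        wanted = math.ceil(load / threads_per_child)
        if wanted < active:
            # Load dropped: surplus processes begin shutting down but, by
            # assumption, never finish, so their slots stay occupied.
            stopping += active - wanted
            active = wanted
        elif wanted > active:
            free = server_limit - active - stopping
            if wanted - active > free:
                return step  # where httpd would report "scoreboard is full"
            active = wanted
    return None


if __name__ == "__main__":
    steady = [200] * 10
    oscillating = [200, 50] * 5
    print(simulate_slots(16, 25, steady))       # None
    print(simulate_slots(16, 25, oscillating))  # 4
```

With a constant load the 16 slots are enough. When the same peak alternates with dips, the slots run out on the third peak. In the terms of this model, fix 2) would move processes from `stopping` back to `active` instead of consuming free slots.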

I think two things should be done:

1) Allow processes that are gracefully shutting down to use some extra
scoreboard slots. This is necessary to fix a) and will help a bit with b). To
keep these extra processes from taking too many resources, they should try to
release resources to the OS as soon as possible.

2) When some process is doing idle shutdown in situation b) and httpd wants
more active processes due to rising load, it should not start new processes but
rather tell the finishing processes to abort shutdown and resume full
operation. This helps with b) but not with a). It is also a lot more invasive
to implement than 1).


My previous patch https://bz.apache.org/bugzilla/attachment.cgi?id=33158 did 1)
to some extent by allowing re-use of some scoreboard slots. I will post a new
patch in a minute.


As a configuration, I recommend the following (this holds even without any
patch):

MaxSpareThreads - MinSpareThreads >= 2 * ThreadsPerChild

Higher values of the difference may work better. This reduces the likelihood of
situation b) occurring.
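The recommended relation is easy to check mechanically. Here is the reasoning behind it. Stopping one process removes ThreadsPerChild idle threads at once. If the band between MinSpareThreads and MaxSpareThreads is narrower than that, a single shutdown can push the idle count below the minimum and trigger a new spawn. A band of at least two processes' worth of threads gives some hysteresis. This is a minimal sketch; the function name and the sample values are illustrative and are not taken from any configuration in this thread.

```python
def spare_gap_ok(min_spare_threads: int, max_spare_threads: int,
                 threads_per_child: int) -> bool:
    """True if MaxSpareThreads - MinSpareThreads >= 2 * ThreadsPerChild."""
    return max_spare_threads - min_spare_threads >= 2 * threads_per_child


if __name__ == "__main__":
    print(spare_gap_ok(75, 250, 25))  # True: gap 175 >= 50
    print(spare_gap_ok(75, 100, 25))  # False: gap 25 < 50
```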
--
b***@apache.org
2016-04-11 20:59:05 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

Stefan Fritsch <***@sfritsch.de> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #33158|0 |1
is obsolete| |

--- Comment #40 from Stefan Fritsch <***@sfritsch.de> ---
Created attachment 33749
--> https://bz.apache.org/bugzilla/attachment.cgi?id=33749&action=edit
Allow to use more scoreboad slots

The new patch goes a step further and allows in total 10 times as many
processes as configured by MaxRequestWorkers / ThreadsPerChild, though
ServerLimit is still honored. The factor 10 is currently hard-coded but would
probably become configurable in the end.


If you use the patch, you should also set

ServerLimit >= 10 * MaxRequestWorkers / ThreadsPerChild

A smaller value may make sense, though, if you are short of RAM.
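The ServerLimit rule can be computed directly. This is a sketch: the function name and the sample values are hypothetical, and `factor=10` mirrors the multiplier that attachment 33749 is described as hard-coding. The result is rounded up so the inequality still holds when the ratio is not an integer.

```python
import math


def min_server_limit(max_request_workers: int, threads_per_child: int,
                     factor: int = 10) -> int:
    """Smallest ServerLimit satisfying
    ServerLimit >= factor * MaxRequestWorkers / ThreadsPerChild."""
    return math.ceil(factor * max_request_workers / threads_per_child)


if __name__ == "__main__":
    print(min_server_limit(400, 25))  # 160
    print(min_server_limit(30, 25))   # 12
```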
--
b***@apache.org
2016-04-11 21:05:09 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #41 from Stefan Fritsch <***@sfritsch.de> ---
(In reply to Sander Hoentjen from comment #35)
Couldn't create the proxy mutex" I haven't seen that issue without the patch.
[Sat Mar 26 07:00:34.857694 2016] [core:emerg] [pid 787770:tid
140551243081696] (28)No space left on device: AH00023: Couldn't create the
proxy mutex
[Sat Mar 26 07:00:34.857764 2016] [proxy:crit] [pid 787770:tid
140551243081696] (28)No space left on device: AH02478: failed to create
proxy mutex
AH00016: Configuration Failed
You could try a different Mutex type; on Linux, pthread may work best.
Alternatively, you may try to increase the allowed resources, possibly shared
memory. How that is done depends on your OS.
--
b***@apache.org
2016-04-11 21:18:39 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #42 from Stefan Fritsch <***@sfritsch.de> ---
Created attachment 33750
--> https://bz.apache.org/bugzilla/attachment.cgi?id=33750&action=edit
same as above, but for trunk

Attaching the same patch, but for trunk.


(In reply to Stefan Fritsch from comment #40)
Created attachment 33749 [details]
Allow to use more scoreboad slots
That patch is for 2.4 and also includes these commits from trunk:

https://svn.apache.org/r1703241
https://svn.apache.org/r1705922
https://svn.apache.org/r1706523
https://svn.apache.org/r1738464
https://svn.apache.org/r1738466
https://svn.apache.org/r1738486
https://svn.apache.org/r1738631
https://svn.apache.org/r1738632
https://svn.apache.org/r1738633
https://svn.apache.org/r1738635
--
b***@apache.org
2016-04-12 07:16:04 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #43 from Sander Hoentjen <***@hoentjen.eu> ---
(In reply to Stefan Fritsch from comment #41)
Post by b***@apache.org
(In reply to Sander Hoentjen from comment #35)
Couldn't create the proxy mutex" I haven't seen that issue without the patch.
[Sat Mar 26 07:00:34.857694 2016] [core:emerg] [pid 787770:tid
140551243081696] (28)No space left on device: AH00023: Couldn't create the
proxy mutex
[Sat Mar 26 07:00:34.857764 2016] [proxy:crit] [pid 787770:tid
140551243081696] (28)No space left on device: AH02478: failed to create
proxy mutex
AH00016: Configuration Failed
You could try using different Mutex types. On Linux, pthread may work best.
Or you may try to increase the allowed ressources, possibly shared memory.
How that is done depends on your OS.
But does anything in the patch change this? Without your patch we never ran
into that issue.
Would the new patch behave differently in this regard?
--
b***@apache.org
2016-04-12 07:49:37 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #44 from Yann Ylavic <***@gmail.com> ---
(In reply to Sander Hoentjen from comment #43)
Post by b***@apache.org
Would the new patch behave differently in this regard?
Your issue is probably not related to the patch.
It is usually caused by an unclean shutdown of httpd (e.g. kill -9) or a crash
of the parent process (you should see this in the system logs), possibly if you
upgraded the binaries while httpd was still running.
The number of SysV IPC semaphores on the system is limited; if the previous
ones were not cleanly deleted on shutdown, the new startup won't complete.
As suggested by Stefan, you could use another Mutex mechanism (pthread), which
does not leak on unclean shutdown (even if httpd is killed).
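One way to confirm a semaphore leak like the one in comment #35 is to count semaphore arrays per owner in `ipcs -s` output and watch whether the apache count keeps growing across restarts. This is a minimal sketch; the function name is hypothetical. It assumes the column layout shown in comment #35 (key, semid, owner, perms, nsems) and skips any row that does not start with a hexadecimal key.

```python
from collections import Counter


def semaphore_arrays_by_owner(ipcs_output: str) -> Counter:
    """Count SysV semaphore arrays per owner in `ipcs -s` output."""
    owners = Counter()
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows begin with a hex key such as 0x00000000; owner is column 3.
        if len(fields) >= 5 and fields[0].startswith("0x"):
            owners[fields[2]] += 1
    return owners


if __name__ == "__main__":
    sample = """------ Semaphore Arrays --------
key semid owner perms nsems
0x00000000 0 root 600 1
0x7a00179d 59899907 zabbix 600 13
0x00000000 3866628 apache 600 1
0x00000000 3899397 apache 600 1
"""
    print(semaphore_arrays_by_owner(sample))
```

On a live system you would pass in the real output instead, for example `subprocess.run(["ipcs", "-s"], capture_output=True, text=True).stdout`. Leftover arrays can be removed with `ipcrm -s <semid>`, presumably only while httpd is stopped.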
--
b***@apache.org
2016-04-30 05:36:48 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #45 from mbs <***@gmail.com> ---
I was able to manage this issue by reducing the GracefulShutdownTimeout value
and increasing the MaxClients / MaxRequestWorkers value to make more room in the
Apache scoreboard.

I also reduced MaxKeepAliveRequests at the global Apache level.

For more info:
https://www.tectut.com/2016/04/workaround-for-scoreboard-is-full-not-at-maxrequestworkers
--
b***@apache.org
2016-08-21 21:12:44 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

Terry Burton <***@terryburton.co.uk> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@terryburton.co.uk
--
b***@apache.org
2016-09-02 11:59:02 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

Valentin Gjorgjioski <***@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

--- Comment #46 from Valentin Gjorgjioski <***@gmail.com> ---
This is hitting me as well and causing a lot of trouble.

When is this going to be fixed?

What is the recommendation for a production server?

Is it better to upgrade to 2.4.18, or to use a 2.4.10 backport?

Or, if going back, which version is best for 14.04.5 LTS?
--
b***@apache.org
2016-09-02 15:24:08 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #47 from ***@mightytikigod.com ---
(In reply to Valentin Gjorgjioski from comment #46)
Post by b***@apache.org
Hitting me as well and making lot of troubles.
Is it better if upgrade to 2.4.18? 2.4.10 backport?
Upgrading to 2.4.18 hasn't helped everyone, but it did help me. The
"centos-sclo-rh" repository was a solution in my situation.
--
b***@apache.org
2016-09-02 15:48:45 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #48 from Luca Toscano <***@gmail.com> ---
(In reply to Valentin Gjorgjioski from comment #46)
Post by b***@apache.org
Hitting me as well and making lot of troubles.
Hi Valentin,

Can you give us a bit more detail about your use case? Does the full-scoreboard
issue happen regularly after certain events, or randomly? What is your
configuration (if you can share it) and httpd version? It would help a lot :)

Luca
--
b***@apache.org
2016-09-02 16:38:42 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #49 from Valentin Gjorgjioski <***@gmail.com> ---
Hi,


This started happening after a recent Ubuntu upgrade. The Apache version was
the same before and after: Ubuntu is 14.04.5 LTS, Apache is 2.4.7.

This is a high-load production server that had been working for 1.5 years
without any problems.

Here is some log of that update, when the problem started:

[UPGRADE] apache2:amd64 2.4.7-1ubuntu4.9 -> 2.4.7-1ubuntu4.13
[UPGRADE] apache2-bin:amd64 2.4.7-1ubuntu4.9 -> 2.4.7-1ubuntu4.13
[UPGRADE] apache2-data:amd64 2.4.7-1ubuntu4.9 -> 2.4.7-1ubuntu4.13
[UPGRADE] apache2-mpm-worker:amd64 2.4.7-1ubuntu4.9 -> 2.4.7-1ubuntu4.13
[UPGRADE] apache2-utils:amd64 2.4.7-1ubuntu4.9 -> 2.4.7-1ubuntu4.13
[INSTALL] php5-mysqlnd:amd64
[UPGRADE] php5-cli:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-common:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-curl:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-fpm:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-gd:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-intl:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-pgsql:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-pspell:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-readline:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-recode:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-sqlite:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-tidy:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-xmlrpc:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-xsl:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19


Here is what I narrowed it down to:
1. After this upgrade I needed to DISABLE the opcache in PHP, because fatal
errors and segmentation faults started appearing with WordPress.
2. Because of 1., the server got an even higher load.
3. The higher load caused a full scoreboard and hit MaxRequestWorkers.

What I found were two problems:

1. When high load occurs and MaxRequestWorkers is hit, Apache stops responding
(dies). It should slow down and not accept new requests until a slot is free,
but it shouldn't stop responding. I think I saw this reported elsewhere, e.g.:
https://www.digitalocean.com/community/questions/apache2-crash-on-ubuntu-14-04-maxrequestworkers-issue

2. Once I found a way to solve the high-load problem (enabling WordPress cache
plugins), the second problem started, mainly on Apache reload (log rotation)
but even on a regular basis WHEN MaxConnectionsPerChild is not 0 and/or
pm.max_requests is not 0. This is a problem because children exit after a
certain number of requests, then get stuck in the "G" state and never
complete. This fills the scoreboard and you end up with that error. Once you
set these to 0, the problem more or less disappears.
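A back-of-the-envelope estimate shows why a non-zero MaxConnectionsPerChild combined with long-lived connections can fill the scoreboard. This sketch applies Little's law: the average number of items in a state equals the arrival rate times the average time spent in that state. It rests on two assumptions. First, connections arrive at a steady rate. Second, each retiring child stays in "G" for about `drain_seconds` while its last connections finish. The names and numbers are hypothetical.

```python
def expected_g_processes(conn_per_sec: float, max_conn_per_child: int,
                         drain_seconds: float) -> float:
    """Average number of children in graceful shutdown ('G').

    Children retire at about conn_per_sec / max_conn_per_child per second;
    by Little's law the average number lingering in 'G' is that rate times
    how long each one lingers.
    """
    if max_conn_per_child == 0:
        return 0.0  # 0 means children are never recycled for this reason
    return conn_per_sec * drain_seconds / max_conn_per_child


if __name__ == "__main__":
    # 100 new connections/s, MaxConnectionsPerChild 600, 300 s to drain
    print(expected_g_processes(100, 600, 300))  # 50.0
```

Each of those children holds a scoreboard slot, so in this model the result has to fit within ServerLimit alongside the active processes. Setting the limit to 0 removes this source of `G` processes entirely.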

The workaround is to set these to 0 and hope all scripts are good with no
memory leaks, to lower memory usage in php.ini, and to restart the server each
day (restart rather than reload on logrotate).


Another very important trick I learned during this: ALWAYS restart php-fpm and
Apache together. Failing to do so leads to some instability.

That workaround works for me, but I would like to understand why this happens
and how we can prevent it (especially Apache dying when MaxRequestWorkers is
reached).
--
b***@apache.org
2016-09-05 13:18:21 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #50 from Luca Toscano <***@gmail.com> ---
Thanks a lot for the details, Valentin. I'll add my thoughts inline:

(In reply to Valentin Gjorgjioski from comment #49)
Post by b***@apache.org
This started happening after recent upgrade of Ubuntu. Apache was the same,
and now it is the same. Ubuntu is 14.04.5 LTS, Apache is 2.4.7.
This is a very old version of httpd, so if you could, it would be really great
to upgrade from Trusty to something more recent to see the differences.
Post by b***@apache.org
This is high load, production server. Working for 1.5 year without any
problems so far.
[UPGRADE] apache2:amd64 2.4.7-1ubuntu4.9 -> 2.4.7-1ubuntu4.13
[UPGRADE] apache2-bin:amd64 2.4.7-1ubuntu4.9 -> 2.4.7-1ubuntu4.13
[UPGRADE] apache2-data:amd64 2.4.7-1ubuntu4.9 -> 2.4.7-1ubuntu4.13
[UPGRADE] apache2-mpm-worker:amd64 2.4.7-1ubuntu4.9 -> 2.4.7-1ubuntu4.13
[UPGRADE] apache2-utils:amd64 2.4.7-1ubuntu4.9 -> 2.4.7-1ubuntu4.13
[INSTALL] php5-mysqlnd:amd64
[UPGRADE] php5-cli:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-common:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-curl:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-fpm:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-gd:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-intl:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-pgsql:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-pspell:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-readline:amd64 5.5.9+dfsg-1ubuntu4.14 ->
5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-recode:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-sqlite:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-tidy:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-xmlrpc:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5-xsl:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
[UPGRADE] php5:amd64 5.5.9+dfsg-1ubuntu4.14 -> 5.5.9+dfsg-1ubuntu4.19
1. After this upgrade I needed to DISABLE the opcache in PHP, because
problems started with fatal errors and segmentation faults with wordpress.
2. Because of the 1. the server got even higher load.
3. Higher load caused full scoreboard, and maxRequestWorkersk.
Stating the obvious, but the httpd issue seems to be a consequence of all the
PHP upgrades that happened at the same time. Have you tried rolling back the
last upgrade to see whether the issue persists?
Post by b***@apache.org
1. When high load occurs and MaxReqeustWorkers is hit, the apache stops
responding (dies). It should slow down, should not accept new requests until
free slot, but it shouldn't stop responding. I think I saw this reported
https://www.digitalocean.com/community/questions/apache2-crash-on-ubuntu-14-
04-maxrequestworkers-issue
Would you mind including the logs and/or more details about this? Again, it
would be really great to know if the problem is the same with a more recent
version of httpd.
Post by b***@apache.org
2. When I found a way to solve the problem with high load (enable wp cache
plugins), now the second problem started, mainly on apache reload (log
rotation) or even on regular basis WHEN MaxConnectionsPerChild is different
from 0, and/or when pm.max_requests is different from 0. Why this is a
problem - because children are dying after certain numbers of requests, and
then they get stuck into "G" state, and never completing. This is filling
your scoreboard and you are ending with that error. Once you set these to 0,
problem more or less disappears.
Do you have long timeouts (proxy, etc.) in your httpd configuration? That would
be useful information for us; in the past, long proxy timeouts have exacerbated
the issue you described.
Post by b***@apache.org
Workaround is setting these to 0, and hoping all scripts are good, no memory
leaks, lowering memory usage in php.ini, and restaring the server each day
(on logrotate restart and not reload).
Very important trick that I learned in during this is also this one: ALWAYS
restart php-fpm and apache together. Failing to do so leads to some
instabilities.
For me that workaround work, but I would like to hear why this happens, and
how we can prevent it (especially the problem when Apache dies when
MaxRequestWorkers is readched).
As written above it would be great to know more about the "Apache dies" part.
Any detail that you could share with us would be really appreciated.

Thanks!

Luca
--
b***@apache.org
2016-09-05 15:57:47 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #51 from Valentin Gjorgjioski <***@gmail.com> ---
Hi Luca,

At the moment, upgrading from Trusty is not really an option; I'm mostly scared
of PHP 7 and the compatibility issues that might arise. Maybe next year.

I haven't tried to roll back; I wasn't even sure how to do that, or whether it
is easy.

The DigitalOcean link is from another user, but I'm experiencing exactly the
same thing. Unfortunately there is nothing in the log except the message stated
there.

I'm not sure what counts as a long timeout, but the default (300 seconds?!) for
php-fpm over sockets is probably long. And yes, I guess this exacerbates the
issue. No proxies are defined. It seems to me that when some processes hang on
the PHP side, they are not killed on the Apache side and the connection is not
released, not even after those 5 minutes. It gets stuck there and that's it.

By "Apache dies" I mean: the Apache processes are there, using no CPU,
accepting no connections, and only a restart helps. Nothing in the logs.

I just switched to prefork. I think it will be stable for now. I had tons of
problems these past 5 days; I don't know why I didn't switch to prefork
earlier. It seems like a good workaround for me right now.
--
b***@apache.org
2016-09-05 19:20:46 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #52 from Valentin Gjorgjioski <***@gmail.com> ---
Hi,

Now I believe I have a clear picture of what is going on:

1. I'm using FastCGI, which is obviously a dead and unsupported project?!

2. I'm not sure whether there is a directive such as a connect timeout (fcgid
has one). It seems there is either no timeout or a very large one.

3. When Apache gets hit hard, php-fpm gets hit hard as well. In my case PHP-FPM
started having trouble keeping up when I disabled the opcache mentioned
earlier, so it got stuck with a longer and longer queue. Apache then keeps
sending requests to php-fpm even after php-fpm has reached its limit
(pm.max_children). In that scenario php-fpm stops spawning new processes, but
somehow the old processes get stuck. Apache keeps doing this until the
scoreboard is full. At that point CPU usage is very low; it looks like some
I/O blocking, with many Apache processes (1500?!) waiting to open the socket,
but the socket is not available.

However, at this point it is not very clear to me why Apache builds up the
queue and why the queue is not getting emptied. There is no high processor usage;
it seems that php-fpm/apache got stuck and nothing can be done. Could this be
Apache not handling sockets properly?

4. This happens even with prefork, so in this case it is not an mpm_event problem.
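Point 3 above describes Apache continuing to hand requests to a backend pool that has stopped draining. The toy model below is written only for illustration: it assumes a fixed number of arrivals per tick, a pool capped at a hypothetical `max_children`, and some children that are stuck and never finish. The function name and parameters are invented for this sketch, and this is not how Apache or php-fpm actually schedule work. It shows that once stuck children push capacity below the arrival rate, the queue grows without bound.

```c
#include <assert.h>

/* Toy model (hypothetical, not httpd or php-fpm code): each tick,
 * `arrivals` new requests join the queue, and each child that is not
 * stuck finishes one request per tick. Returns the queue length after
 * `ticks` ticks. */
long queue_after(int ticks, int arrivals, int max_children, int stuck)
{
    assert(ticks >= 0 && arrivals >= 0);
    assert(stuck >= 0 && stuck <= max_children);

    int healthy = max_children - stuck;   /* children still making progress */
    long queue = 0;

    for (int t = 0; t < ticks; t++) {
        queue += arrivals;                /* the front end keeps forwarding work */
        long served = queue < healthy ? queue : healthy;
        queue -= served;                  /* only healthy children drain it */
    }
    return queue;
}
```

With 10 arrivals per tick and 10 children, nothing queues up while all children are healthy; with 3 of them stuck, the queue grows by 3 per tick and reaches 180 after 60 ticks.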


Workaround for the next month or so: optimize the PHP side and lower the load so
PHP-FPM can handle requests in a timely manner. An Ubuntu upgrade that includes a more
stable PHP opcache should also help.

Long-term solution: there must be a general solution to this problem.
Either it is time to move to nginx, or it is time to move to a better FastCGI
module. By the way, what would you suggest at this point: what is the easiest
migration path from fastcgi to another Apache module?
b***@apache.org
2016-09-05 20:48:21 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555
> However, at this point it is not very clear to me why Apache builds up the
> queue and the queue is not getting emptied - there is no high processor
> usage, it seems that php-fpm/apache got stuck and nothing can be done. Could
> be this apache not handling sockets properly?

I'd suggest starting a thread on ***@httpd.apache.org.

If you can get this error, you should be able to find some processes trying to
exit but hanging on the way out waiting for requests to complete. Showing
their backtrace with gdb (or pstack) will tell us exactly what they're doing.

Your MPM configuration will also tell us if you have unnecessary process churn.
b***@apache.org
2016-09-05 21:45:56 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

Stefan Fritsch <***@sfritsch.de> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #33750|0 |1
is obsolete| |

--- Comment #54 from Stefan Fritsch <***@sfritsch.de> ---
Created attachment 34201
--> https://bz.apache.org/bugzilla/attachment.cgi?id=34201&action=edit
Use all scoreboard entries up to ServerLimit, for trunk

New patch: This time use the whole scoreboard up to the configured ServerLimit.
Also fixed some issues with the previous patch.
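The core idea of the new patch, as described, is that a new child may take any free scoreboard slot below ServerLimit. A minimal sketch of that search, assuming a hypothetical `slot_state` array in which processes in graceful shutdown still hold their slots. This is not the code in attachment 34201; the enum and function are invented to illustrate searching the whole range up to ServerLimit rather than a smaller prefix of it.

```c
#include <assert.h>

/* Hypothetical slot states; in httpd the real information lives in the
 * scoreboard, which this sketch does not model. */
enum slot_state { SLOT_FREE, SLOT_ACTIVE, SLOT_STOPPING };

/* Return the first free slot among slots[0 .. search_limit-1], or -1 if
 * every slot in that range is held by an active or stopping process.
 * Passing ServerLimit as search_limit searches the whole scoreboard. */
int find_free_slot(const enum slot_state *slots, int search_limit)
{
    assert(search_limit >= 0);
    for (int i = 0; i < search_limit; i++) {
        if (slots[i] == SLOT_FREE)
            return i;
    }
    return -1;
}
```

With two stopping and two active processes in the first four slots, searching only those four finds nothing, while searching all six slots (standing in for ServerLimit) returns slot 4.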
b***@apache.org
2016-09-05 21:50:31 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

Stefan Fritsch <***@sfritsch.de> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #33749|0 |1
is obsolete| |

--- Comment #55 from Stefan Fritsch <***@sfritsch.de> ---
Created attachment 34202
--> https://bz.apache.org/bugzilla/attachment.cgi?id=34202&action=edit
Use all scoreboard entries up to ServerLimit, for 2.4

Same as above, but for 2.4.

This contains the trunk patch plus these commits from trunk:

r1705922
r1706523
r1738464
r1738466
r1738486
r1738628
r1738631
r1738632
r1738633
r1738635
r1756848
r1757009
r1757011
r1757029
r1757030
r1757031
r1757056
r1757061

It would be really nice if someone could give this a try in a real-life setup.
b***@apache.org
2016-09-05 21:58:13 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #56 from Valentin Gjorgjioski <***@gmail.com> ---
From what I understand, Apache can't do anything about this; it seems to be
correct behavior. It waits on the socket for its output. Timeouts are
high (30 seconds), so on a busy server, if all php-fpm processes working on that
socket are occupied (not returning a result), the queue keeps getting bigger.
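As a rough back-of-the-envelope estimate, and not something measured in this thread, Little's law (L = λW) relates the request rate to the number of workers blocked on a stalled backend: if requests arrive at λ per second and each one waits out the 30-second timeout, about λ × 30 of them are waiting at any moment. A small helper with hypothetical inputs:

```c
#include <assert.h>

/* Little's law: average number in the system = arrival rate * time spent
 * in the system. Used here only as a rough estimate of how many front-end
 * workers end up blocked waiting on a saturated backend socket. */
double blocked_workers(double arrivals_per_sec, double wait_sec)
{
    assert(arrivals_per_sec >= 0.0 && wait_sec >= 0.0);
    return arrivals_per_sec * wait_sec;
}
```

At a hypothetical 50 requests per second, `blocked_workers(50.0, 30.0)` gives 1500, so a modest request rate against a backend that never answers is enough to tie up a large pool of workers.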

And indeed, every time this crash happened I found timeouts in the error logs (just
for certain web sites), which I had missed previously.

It seems the problem is in php-fpm and started with my recent upgrade.
The opcache problems also started there, and I replaced mysql with mysqlnd
in that update. With so many changes something broke, but I think there is
nothing wrong with Apache. The problem should be in php-fpm, in php-mysqlnd,
or maybe in the web sites themselves.


In the end, it would be great if Apache provided the ability to limit the number of
processes per virtual host (as php-fpm allows). That would also make it
much easier to isolate and solve the problem.
b***@apache.org
2016-10-25 21:46:14 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #57 from Thomas Jarosch <***@intra2net.com> ---
Hi Stefan,

thanks for trying to solve the "scoreboard full" issue :)

I was hit by it badly today: the affected machine
is a forward proxy, and traffic stalled almost completely.

Some background info:
- event mpm on httpd 2.4.23
- forward proxy setup via mod_proxy
- 280 real users + other machines. ~370 clients
- server load is around 0.2, plenty of free RAM
- file descriptor limit is 1024
- logrotate sends a graceful restart every hour

If the problem occurs, httpd doesn't even respond
to the /server-status page reliably.

A small script logs the /server-status page every 30s to disk.
Specific case: logrotate sends a "graceful restart" at 13h.

/server-status output at 13:04:24h:
-------------------
Total accesses: 8801 - Total Traffic: 74.6 MB
75 requests currently being processed, 125 idle workers
+---------------------------------------------------------------------------+
| | Connections | Threads | Async connections |
| PID |-------------------+-------------+---------------------------------|
| | total | accepting | busy | idle | writing | keep-alive | closing ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 14906 | 7 | yes | 6 | 44 | 0 | 1 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 14959 | 9 | yes | 9 | 41 | 0 | 0 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 15014 | 3 | no | 0 | 0 | 0 | 0 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 15015 | 49 | yes | 50 | 0 | 0 | 0 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 15329 | 3 | no | 0 | 0 | 0 | 0 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 15893 | 15 | no | 0 | 0 | 0 | 0 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 17762 | 11 | yes | 10 | 40 | 0 | 1 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| Sum | 97 |   | 75 | 125 | 0 | 2 | 0 ||
+---------------------------------------------------------------------------+

_________R_____R__________________R___R___R__R________R______R_R
R_____R__R_________________R__R____RGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGRRRRRRRRRRRRRRRRRRRRRRRRRRRWRRRRRRRRRRRRRR
RRRRRRRRGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGR__________R__R_____
_______R_RR_________R_RR_R____
-------------------


/server-status output at 13:15:25h:
-------------------
Total accesses: 12929 - Total Traffic: 90.9 MB
87 requests currently being processed, 63 idle workers
+---------------------------------------------------------------------------+
| | Connections | Threads | Async connections |
| PID |-------------------+-------------+---------------------------------|
| | total | accepting | busy | idle | writing | keep-alive | closing ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 14906 | 18 | yes | 16 | 34 | 0 | 2 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 14959 | 27 | yes | 26 | 24 | 0 | 2 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 15014 | 2 | no | 0 | 0 | 0 | 0 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 15015 | 2 | no | 0 | 0 | 0 | 0 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 15329 | 2 | no | 0 | 0 | 0 | 0 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 18564 | 45 | yes | 45 | 5 | 0 | 0 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 17762 | 39 | no | 0 | 0 | 0 | 0 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 18078 | 44 | no | 0 | 0 | 0 | 0 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| Sum | 179 | | 87 | 63 | 0 | 4 | 0 ||
+---------------------------------------------------------------------------+

_____R__R___R_RR_RR_R_RR__R_____R_R___R_R_____R___W_RR__RR_RR__R
RR__R_RR____RRRRR_R_RR___R_RR_RR____GGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGRRRRRR
RRRRRRRRR_RRRRRRRRR_RRRR_RRRRRRRRRRR_R_RRRRRGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGG
-------------------


/server-status at 13:25:20h:
(httpd hardly responding anymore):
-------------------
Total accesses: 14630 - Total Traffic: 97.4 MB
50 requests currently being processed, 0 idle workers
+---------------------------------------------------------------------------+
| | Connections | Threads | Async connections |
| PID |-------------------+-------------+---------------------------------|
| | total | accepting | busy | idle | writing | keep-alive | closing ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 14906 | 36 | no | 0 | 0 | 0 | 0 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 14959 | 2 | yes | 0 | 0 | 0 | 0 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 15014 | 2 | no | 0 | 0 | 0 | 0 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 15015 | 2 | no | 0 | 0 | 0 | 0 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 15329 | 2 | no | 0 | 0 | 0 | 0 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 18564 | 50 | yes | 50 | 0 | 0 | 1 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 17762 | 3 | no | 0 | 0 | 0 | 0 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| 18078 | 1 | no | 0 | 0 | 0 | 0 | 0 ||
|-------+-------+-----------+------+------+---------+------------+---------||
| Sum | 98 | | 50 | 0 | 0 | 1 | 0 ||
+---------------------------------------------------------------------------+

GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGWRRRRR
RRRRWRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGG
-------------------

I can provide more /server-status output if needed.

After around 30 mins, the external "mon" watchdog
kills httpd and restarts it. Traffic continues to flow.


httpd config:
-------------------
Timeout 300
KeepAliveTimeout 300

<IfModule mpm_event_module>
# Number of concurrent connections is: ServerLimit * ThreadsPerChild
# Result: 16 * 50 -> 800
#
StartServers 1
ServerLimit 16
ThreadLimit 50
ThreadsPerChild 50
MaxConnectionsPerChild 1000
</IfModule>

No other performance related settings.

-------------------
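The formula in the config comment is a product of two limits. A minimal restatement (the function name is hypothetical): the product counts worker threads, so with the event MPM it bounds requests being actively processed rather than open connections, since idle keep-alive connections are handled without a dedicated thread. It is also only an upper bound; a lower MaxRequestWorkers would cap threads further.

```c
#include <assert.h>

/* Worker-thread ceiling implied by the config comment:
 * ServerLimit processes, each running ThreadsPerChild threads. */
int thread_capacity(int server_limit, int threads_per_child)
{
    assert(server_limit > 0 && threads_per_child > 0);
    return server_limit * threads_per_child;
}
```

`thread_capacity(16, 50)` reproduces the 800 in the comment; the later change to ServerLimit 32 doubles the ceiling to 1600 threads.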

I've now increased ServerLimit to 32 and disabled
logrotate as a quick fix. It has held up so far.
Occasionally I still see the "scoreboard full" message,
even though there are just ~160 active connections and some processes
are (still?) in the graceful shutdown state.


I'll put the patch from #55 on the production machine tomorrow :o)
It already runs on my own proxy and the one from my department.

Anything else to watch out for?

I can provide gdb backtraces if you tell
me to look for something specific, too.

Triggering a graceful restart during peak traffic might be a good test...

Cheers,
Thomas
b***@apache.org
2016-10-26 06:39:03 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #58 from Thomas Jarosch <***@intra2net.com> ---
Another info about my setup:

There are two other httpd instances running on different ports.
One is using the event MPM, the other one prefork MPM.

I didn't configure an explicit ScoreBoardFile, so the scoreboard is in
anonymous shared memory. Could there be cross-talk of those three httpds?
b***@apache.org
2016-10-26 13:44:54 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #59 from Thomas Jarosch <***@intra2net.com> ---
Hi Stefan,

the patch from #55 seems to make things scale a lot better.
Also the status output is very helpful.

ServerLimit was changed back to 16 before the tests.
I did a graceful restart at 13:09:35h.

/server-status at 14:19:36h (*before* the next graceful restart):
-----------------------
Total accesses: 23693 - Total Traffic: 200.0 MB
100 requests currently being processed, 150 idle workers
+-------------------------------------------------------------------------------------------------+
|      |       |               | Connections       | Threads     | Async connections              |
| Slot | PID   | Stopping      |-------------------+-------------+--------------------------------|
|      |       |               | total | accepting | busy | idle | writing | keep-alive | closing |
|------+-------+---------------+-------+-----------+------+------+---------+------------+---------|
| 0    | 19952 | yes (old gen) | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 1    | 20006 | yes (old gen) | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 2    | 20060 | yes (old gen) | 5     | no        | 0    | 0    | 0       | 0          | 0       |
| 3    | 20160 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 4    | 20224 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 5    | 20725 | no            | 2     | yes       | 2    | 48   | 0       | 0          | 0       |
| 6    | 27470 | no            | 50    | yes       | 50   | 0    | 0       | 0          | 0       |
| 7    | 24389 | yes           | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 8    | 27104 | no            | 18    | yes       | 18   | 32   | 0       | 0          | 0       |
| 9    | 27346 | no            | 3     | yes       | 3    | 47   | 0       | 0          | 0       |
| 10   | 22579 | yes           | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 11   | 27674 | no            | 29    | yes       | 27   | 23   | 0       | 3          | 0       |
| 13   | 25055 | yes           | 8     | no        | 0    | 0    | 0       | 0          | 0       |
| 14   | 25350 | yes           | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 15   | 25475 | yes           | 5     | no        | 0    | 0    | 0       | 0          | 0       |
|------+-------+---------------+-------+-----------+------+------+---------+------------+---------|
| Sum  | 15    | 10            | 137   |           | 100  | 150  | 0       | 3          | 0       |
+-------------------------------------------------------------------------------------------------+

.G.G...............G............................................
..............G.....G.....G.........G..............G............
.........G.....G...G..................GG........................
...........................G........G.....................______
___________R_______________R________________RRRRRRRRRRRRRRRRRRRR
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRR.....................G............
.....G...G......___R____R_RR_R__R______RRRRR__R__R______RR__R_R_
_R________________R__________________R_____________RGG__RRRRRRR_
_RRRR___R____RR__RR____R__R_W__RRRRR_RRRGGGGGGGGGGGGGGG

-----------------------

As you can see, there are still processes from "old gen" after one hour.
This is due to long running HTTP CONNECT requests to google / dropbox / etc.

GracefulShutdownTimeout will probably help here. Maybe
a default value of one hour would make sense
for httpd in general?
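A sketch of what a per-child graceful shutdown deadline could look like. As noted later in the thread, the children do not currently honor GracefulShutdownTimeout, so this is purely hypothetical: the `struct old_child` record, its `retired_at` field, and `expired_children` are invented for illustration and are not httpd APIs.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical record for a child left over from an older generation. */
struct old_child {
    int pid;
    time_t retired_at;   /* when a graceful restart retired its generation */
};

/* Collect the children that have lingered for at least `timeout_sec`
 * seconds; these would be force-stopped under a GracefulShutdownTimeout-
 * style policy. Their indices are written to `expired` (the caller
 * provides room for n entries) and the number found is returned. */
int expired_children(const struct old_child *kids, int n, time_t now,
                     time_t timeout_sec, int *expired)
{
    assert(n >= 0 && timeout_sec >= 0);
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (now - kids[i].retired_at >= timeout_sec)
            expired[count++] = i;
    }
    return count;
}
```

With a one-hour timeout, only a child retired a full hour ago is selected; long-lived CONNECT tunnels on younger children are left alone until they, too, reach the deadline.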


Next graceful restart at 14:19:51h.

Errors start to appear in the log two seconds later:

[Wed Oct 26 14:19:53.926229 2016] [mpm_event:error] [pid 19951:tid 3071850240]
AH: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit.


/server-status at 14:20:06h:
-----------------------
Total accesses: 23744 - Total Traffic: 200.9 MB
8 requests currently being processed, 42 idle workers
+-------------------------------------------------------------------------------------------------+
|      |       |               | Connections       | Threads     | Async connections              |
| Slot | PID   | Stopping      |-------------------+-------------+--------------------------------|
|      |       |               | total | accepting | busy | idle | writing | keep-alive | closing |
|------+-------+---------------+-------+-----------+------+------+---------+------------+---------|
| 0    | 19952 | yes (old gen) | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 1    | 20006 | yes (old gen) | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 2    | 20060 | yes (old gen) | 5     | no        | 0    | 0    | 0       | 0          | 0       |
| 3    | 20160 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 4    | 20224 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 5    | 20725 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 6    | 27470 | yes (old gen) | 42    | no        | 0    | 0    | 0       | 0          | 0       |
| 7    | 24389 | yes (old gen) | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 8    | 27104 | yes (old gen) | 18    | no        | 0    | 0    | 0       | 0          | 0       |
| 9    | 27346 | yes (old gen) | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 10   | 22579 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 11   | 27674 | yes (old gen) | 24    | no        | 0    | 0    | 0       | 0          | 0       |
| 12   | 28054 | no            | 9     | yes       | 8    | 42   | 0       | 2          | 0       |
| 13   | 25055 | yes (old gen) | 8     | no        | 0    | 0    | 0       | 0          | 0       |
| 14   | 25350 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 15   | 25475 | yes (old gen) | 5     | no        | 0    | 0    | 0       | 0          | 0       |
|------+-------+---------------+-------+-----------+------+------+---------+------------+---------|
| Sum  | 16    | 15            | 133   |           | 8    | 42   | 0       | 2          | 0       |
+-------------------------------------------------------------------------------------------------+

.G.G...............G............................................
..............G.....G.....G.........G..............G............
.........G.....G...G..................GG........................
...........................G........G...........................
...........G...............G................G.GGGGG.G.G..GGGGGG.
GGGGGGGGGGGGGG.GGGGGGGGGGG.GGG.....................G............
.....G...G......GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG_
______RRRR____RRRW_______________________________GGGGGGGGGGGGGGG
-----------------------



The forward proxy became unresponsive again.
/server-status at 14:29:16h:
-----------------------
Total accesses: 24453 - Total Traffic: 226.8 MB
50 requests currently being processed, 0 idle workers
+-------------------------------------------------------------------------------------------------+
|      |       |               | Connections       | Threads     | Async connections              |
| Slot | PID   | Stopping      |-------------------+-------------+--------------------------------|
|      |       |               | total | accepting | busy | idle | writing | keep-alive | closing |
|------+-------+---------------+-------+-----------+------+------+---------+------------+---------|
| 0    | 19952 | yes (old gen) | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 1    | 20006 | yes (old gen) | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 2    | 20060 | yes (old gen) | 5     | no        | 0    | 0    | 0       | 0          | 0       |
| 3    | 20160 | yes (old gen) | 1     | no        | 0    | 0    | 0       | 0          | 0       |
| 4    | 20224 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 5    | 20725 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 6    | 27470 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 7    | 24389 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 8    | 27104 | yes (old gen) | 1     | no        | 0    | 0    | 0       | 0          | 0       |
| 9    | 27346 | yes (old gen) | 1     | no        | 0    | 0    | 0       | 0          | 0       |
| 10   | 22579 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 11   | 27674 | yes (old gen) | 3     | no        | 0    | 0    | 0       | 0          | 0       |
| 12   | 28054 | no            | 51    | yes       | 50   | 0    | 0       | 0          | 1       |
| 13   | 25055 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 14   | 25350 | yes (old gen) | 2     | no        | 0    | 0    | 0       | 0          | 0       |
| 15   | 25475 | yes (old gen) | 4     | no        | 0    | 0    | 0       | 0          | 0       |
|------+-------+---------------+-------+-----------+------+------+---------+------------+---------|
| Sum  | 16    | 15            | 86    |           | 50   | 0    | 0       | 0          | 1       |
+-------------------------------------------------------------------------------------------------+

.G.G...............G............................................
..............G.....G.....G.........G..............G............
.........G.....G...G...................G........................
...........................G........G...........................
...........G...............G....................................
...........G.............G.........................G............
.....G..........GGGGGGGRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
RRRRRRRWRGGGGGGGG
-----------------------

As you can see, there was plenty of room in the scoreboard now,
but the process list slots were used up by old processes
serving just a handful of connections.
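The 14:29:16 snapshot makes this concrete. The check below uses only the per-slot connection totals and the stopping flags copied from that table (the array and function names are ad hoc): with ServerLimit 16, stopping processes hold 15 of the 16 slots while serving just 35 connections between them, leaving a single slot, and its 50 threads, for all new traffic.

```c
#include <assert.h>

/* Per-slot data copied from the 14:29:16 /server-status table:
 * total connections held by each process, and whether it is stopping. */
static const int conns[16]    = {3, 3, 5, 1, 2, 2, 2, 2, 1, 1, 2, 3, 51, 2, 2, 4};
static const int stopping[16] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1};

static_assert(sizeof conns / sizeof conns[0] ==
              sizeof stopping / sizeof stopping[0], "one entry per slot");

/* Number of scoreboard slots held by processes in graceful shutdown. */
int count_stopping_slots(void)
{
    int slots = 0;
    for (int i = 0; i < 16; i++)
        slots += stopping[i];
    return slots;
}

/* Connections still being served by those stopping processes. */
int conns_on_stopping_slots(void)
{
    int held = 0;
    for (int i = 0; i < 16; i++)
        if (stopping[i])
            held += conns[i];
    return held;
}
```

The two results, 15 slots and 35 connections, show the imbalance: roughly two connections per blocked slot, against 51 connections on the one slot still accepting.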


One option would be to increase ServerLimit to, let's say, 128,
but that also raises the resource limits during normal operation.
If I raise ServerLimit too much, I have to lower the thread count again.
Sounds a bit like the prefork mpm...

Another option would be to add a config setting to ignore
processes for the ServerLimit calculation if they are
in graceful shutdown mode. They probably don't consume
a lot of resources and we can have a GracefulShutdownTimeout
of one hour to expire them, too.

Third option (my preferred one): have a separate GracefulShutdownLimit
that's independent of ServerLimit. If there are too many such processes,
start killing off the oldest process from the graceful shutdown list.
Processes in graceful shutdown mode would not count toward ServerLimit.
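The third option amounts to a simple eviction rule. This is a hypothetical illustration, not an existing httpd directive or API: `GracefulShutdownLimit` is only the name proposed above, and `struct stopping_child` and `pick_victim` are invented for the example. When more processes are in graceful shutdown than the limit allows, the one retired earliest is chosen to be killed.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical: one entry per process in graceful shutdown. */
struct stopping_child {
    int pid;
    time_t retired_at;   /* when this process entered graceful shutdown */
};

/* If more than `limit` processes are in graceful shutdown, return the
 * index of the oldest one (the candidate to kill); otherwise return -1. */
int pick_victim(const struct stopping_child *list, int n, int limit)
{
    assert(n >= 0 && limit >= 0);
    if (n <= limit)
        return -1;

    int oldest = 0;
    for (int i = 1; i < n; i++) {
        if (list[i].retired_at < list[oldest].retired_at)
            oldest = i;
    }
    return oldest;
}
```

Killing the oldest first favors connections that started recently, which are more likely to be short requests than the long-running CONNECT tunnels described above; whether that trade-off is acceptable is a policy question the sketch does not settle.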


I've raised ServerLimit to 32 on the box again.
The users can't be annoyed too much ;)

Cheers,
Thomas

PS: Forget the idea of cross-talk between anonymous shared memory segments
from #58. It's not the case.
b***@apache.org
2016-11-04 13:57:23 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #60 from Stefan Fritsch <***@sfritsch.de> ---
(In reply to Thomas Jarosch from comment #59)
> the patch from #55 seems to make things scale a lot better.
> Also the status output is very helpful.

Glad to hear that and thanks for testing it.

> As you can see, there are still processes from "old gen" after one hour.
> This is due to long running HTTP CONNECT requests to google / dropbox / etc.

There is no way to determine if such connections can be "safely" interrupted or
if they are in the middle of a long download.

> Probably GracefulShutdownTimeout will help here, may be
> having a default value of one hour might make sense
> for httpd in general?

Currently the children won't honor GracefulShutdownTimeout, but that should be
added.

> As you can see, there was plenty of room in the scoreboard now,
> but the process list slots were used up by old processes
> serving just a handful of connections.
> One option would be to increase ServerLimit to let's say 128,
> but that also raises the resource limits during normal operation.
> If I raise ServerLimit too much, I have to lower the thread count again.
> Sounds a bit like the prefork mpm...

During normal operation, the number of threads will be limited by
MaxRequestWorkers. The idea of my patch is that you can increase ServerLimit
quite a bit without using too many resources. The processes serving old
connections should terminate most of their threads and free most of their
memory, so the resource usage should not be too high. But of course it depends
on how many old connections are still open.

> Another option would be to add a config setting to ignore
> processes for the ServerLimit calculation if they are
> in graceful shutdown mode. They probably don't consume
> a lot of resources and we can have a GracefulShutdownTimeout
> of one hour to expire them, too.

You are confusing ServerLimit with MaxRequestWorkers here. While the latter is
a number of threads and not processes, it does what you think ServerLimit
should do.
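The distinction can be expressed as two independent checks. A simplified model, assuming (hypothetically) that a new child starts only when a scoreboard slot below ServerLimit is free *and* the worker threads of active processes stay within MaxRequestWorkers, with threads of stopping processes not counted, in line with the explanation that such processes release most of their threads. The function and the MaxRequestWorkers value of 800 used below are illustrative assumptions; the real event MPM logic is more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model (not the event MPM's actual code):
 * - ServerLimit caps scoreboard process slots, including slots still
 *   held by processes in graceful shutdown;
 * - MaxRequestWorkers caps worker threads of active processes only. */
bool can_start_child(int slots_in_use, int server_limit,
                     int active_threads, int threads_per_child,
                     int max_request_workers)
{
    assert(threads_per_child > 0);
    bool slot_free = slots_in_use < server_limit;
    bool under_cap = active_threads + threads_per_child <= max_request_workers;
    return slot_free && under_cap;
}
```

In the 14:29:16 situation (16 of 16 slots in use, 50 active threads), no child can start although the thread cap is far away; raising ServerLimit to 32 adds slots without raising the thread cap, which is the effect the patch relies on.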

> Third option (preferred one): Have an own GracefulShutdownLimit
> that's separate from ServerLimit. If we have too many processes,
> start killing of oldest process from the graceful shutdown list.
> Process in graceful shutdown mode don't count for ServerLimit.

Yes, we could do that, too. But first I need something like
GracefulShutdownTimeout to work for the old child processes.


If you have any more experience with the patch, I am certainly interested, even
if it has simply run for some time without exposing (new) bugs.

Cheers,
Stefan
b***@apache.org
2016-11-04 23:04:30 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #61 from Yann Ylavic <***@gmail.com> ---
A quick note about the patch (unfortunately I could not carry out my testing
since a colleague reused the machine, resetting my local patches/work
altogether...).

Anyway, there is possibly an issue with retained->total_daemons which is
incremented (unconditionally) whenever a child is created (make_child), but not
always decremented when one finishes (server_main_loop, depending on whether or
not it died smoothly and it still uses a scoreboard slot).

IOW, I think this hunk:
ps->quiescing = 0;
+ retained->total_daemons--;

should probably be moved up here:
ap_wait_or_timeout(&exitwhy, &status, &pid, pconf, ap_server_conf);
if (pid.pid != -1) {
+ retained->total_daemons--;

Will restart my tests ASAP...
b***@apache.org
2016-11-06 20:36:40 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #62 from Stefan Fritsch <***@sfritsch.de> ---
(In reply to Yann Ylavic from comment #61)
Post by b***@apache.org
> Anyway, there is possibly an issue with retained->total_daemons which is
> incremented (unconditionally) whenever a child is created (make_child), but
> not always decremented when one finishes (server_main_loop, depending on
> whether or not it died smoothly and it still uses a scoreboard slot).
> ps->quiescing = 0;
> + retained->total_daemons--;
> ap_wait_or_timeout(&exitwhy, &status, &pid, pconf, ap_server_conf);
> if (pid.pid != -1) {
> + retained->total_daemons--;
No, I think the code in the patch is correct: There is only one case where the
code will return from the function before reaching the "if (child_slot >= 0) {"
block which contains the "retained->total_daemons--;" line. And in this case
the whole server will exit, so correct counting is not an issue any more.

On the other hand, total_daemons must not be decremented if child_slot < 0,
because in this case the dead process was not a worker process (but e.g. a
cgid-process).

But this should be made clearer, either by rearranging the code or by adding
some comments.
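The counting rule described here can be modelled in a few lines of C. This is a simplified sketch, not the event MPM source. Only total_daemons and the child_slot convention (-1 for a process that is not a worker child) come from the discussion; the struct and function names are hypothetical. The early-return path, where the whole server exits, is not modelled.

```c
/* Simplified model of the retained->total_daemons bookkeeping. */
struct retained_model {
    int total_daemons; /* worker children currently alive */
};

/* Called whenever a worker child is forked (cf. make_child). */
void on_child_created(struct retained_model *r)
{
    r->total_daemons++;
}

/* Called when the parent reaps a process. child_slot is the scoreboard
 * slot of the dead process, or -1 if it was not a worker child (e.g. the
 * cgid daemon). Such processes were never counted, so they must not be
 * subtracted either. */
void on_process_reaped(struct retained_model *r, int child_slot)
{
    if (child_slot >= 0)
        r->total_daemons--;
}
```

Decrementing unconditionally as soon as any pid is reaped would let the counter drift below the real number of worker children every time a non-worker process exits.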
b***@apache.org
2016-11-07 10:24:40 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #63 from ***@tikon.ch ---
We have successfully used the patch from #55 for 50 days now on a mid-sized
production server with 1-2 million hits per day. No issues encountered. The
previous issues disappeared (we think the original bug had been abused in a DoS
attack, but we might be wrong on this).
b***@apache.org
2016-11-21 20:21:36 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

Jim Jagielski <***@apache.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@apache.org

--- Comment #64 from Jim Jagielski <***@apache.org> ---
Comment on attachment 34202
--> https://bz.apache.org/bugzilla/attachment.cgi?id=34202
Use all scoreboard entries up to ServerLimit, for 2.4

This looks good. It should be proposed for backport!!
b***@apache.org
2016-11-21 20:48:11 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #65 from Stefan Fritsch <***@sfritsch.de> ---
The rest of the trunk patch was committed as

r1770750
r1770752
b***@apache.org
2016-12-02 23:35:57 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

Dru <***@treshna.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@treshna.com
b***@apache.org
2016-12-06 17:25:44 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

Eric Covener <***@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
Keywords| |FixedInTrunk
b***@apache.org
2016-12-31 00:18:31 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

Eric Covener <***@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED

--- Comment #66 from Eric Covener <***@gmail.com> ---
Fixed in 2.4.25
b***@apache.org
2017-01-25 11:37:55 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

Thomas Jarosch <***@intra2net.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@intra2net.com
b***@apache.org
2017-01-25 12:00:44 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #67 from Thomas Jarosch <***@intra2net.com> ---
Hi Stefan,

(In reply to Stefan Fritsch from comment #60)
Post by b***@apache.org
Post by b***@apache.org
>> The patch from #55 seems to make things scale a lot better.
>> Also the status output is very helpful.
> Glad to hear that and thanks for testing it.
Sorry, I didn't see your reply, as Bugzilla
didn't add me to CC: automatically, which is rather
odd since it's the default setting.
Post by b***@apache.org
Post by b***@apache.org
>> Probably GracefulShutdownTimeout will help here; maybe
>> having a default value of one hour might make sense
>> for httpd in general?
> Currently the children won't honor GracefulShutdownTimeout. But that should
> be added.
Very nice.
Post by b***@apache.org
Post by b***@apache.org
>> Third option (preferred one): Have a dedicated GracefulShutdownLimit
>> that's separate from ServerLimit. If we have too many processes,
>> start killing off the oldest process from the graceful shutdown list.
>> Processes in graceful shutdown mode don't count toward ServerLimit.
> Yes, we could do that, too. But first I need something like
> GracefulShutdownTimeout to work for the old child processes.
OK.

In the meantime I've decreased the ThreadLimit to 5 and increased the
ServerLimit to 160 and more. The results with these settings are very good:
no more user complaints (see below).

Otherwise, those long-running HTTP CONNECT sessions were still maxing out the
total number of allowed processes.
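The arithmetic behind this workaround can be sketched in C, assuming the settings mean ThreadLimit 5 and ServerLimit 160; the helper names are hypothetical. The scoreboard is sized ServerLimit x ThreadLimit. The new settings therefore give somewhat fewer thread entries than the event MPM defaults (ServerLimit 16, ThreadLimit 64), but ten times as many process slots. That matters when a single long-lived CONNECT session can keep an old child, and its process slot, alive indefinitely.

```c
/* Worker-thread entries in the scoreboard, which is sized
 * ServerLimit x ThreadLimit. */
int scoreboard_thread_slots(int server_limit, int thread_limit)
{
    return server_limit * thread_limit;
}

/* Process slots still available for new children when n_lingering old
 * children are each pinned by at least one long-lived connection. */
int free_process_slots(int server_limit, int n_lingering)
{
    return n_lingering >= server_limit ? 0 : server_limit - n_lingering;
}
```

With 16 lingering children, the default ServerLimit of 16 leaves no room at all for new children, while a ServerLimit of 160 still leaves 144 slots.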
Post by b***@apache.org
> If you have any more experiences with the patch I am certainly interested.
> Even if it has simply run for some time without (new) bugs exposed.
The patch has been deployed to about 3,000 servers since November 2016, with
workloads ranging from 10 users to 400+ users. After applying your patch plus
the ThreadLimit change, there have been no more complaints :)

I've also diffed httpd 2.4.23 + the patch against the version of the code that
landed in 2.4.25, and it's exactly the same. I'm going to roll out 2.4.25 to
those boxes soon.

Thanks again!
Thomas
b***@apache.org
2017-01-31 10:06:49 UTC
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

--- Comment #68 from Luca Toscano <***@gmail.com> ---
*** Bug 56101 has been marked as a duplicate of this bug. ***