Author Topic: ConnectException without any other clue  (Read 14945 times)

subes

  • Full Member
  • ***
  • Posts: 154
    • View Profile
ConnectException without any other clue
« on: December 20, 2011, 08:16:30 AM »
Our production systems encounter the dreadful ConnectException again:

Code: [Select]
2011/12/15 14:51:42:932 : http-10.1.200.34-8080-6 : [com.creamtec.ajaxswing.ServletHelper] Failed to process a request for session = org.apache.catalina.session.StandardSessionFacade@593bdac8 (Exception java.rmi.ConnectException, Connection refused to host: 172.25.161.113; nested exception is:
java.net.ConnectException: Connection refused: connect)
java.rmi.ConnectException: Connection refused to host: 172.25.161.113; nested exception is:
java.net.ConnectException: Connection refused: connect
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:601)
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:198)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:110)
at com.creamtec.ajaxswing.ClientAgentImpl_Stub.processHttpRequest(Unknown Source)
at com.creamtec.ajaxswing.ServletHelper.delegateRequestToClientAgent(ServletHelper.java:549)
at com.creamtec.ajaxswing.ServletHelper.doPocessHttpRequest(ServletHelper.java:381)
at com.creamtec.ajaxswing.ServletHelper.processHttpRequest(ServletHelper.java:293)
at com.creamtec.ajaxswing.AjaxSwingRouter.doPost(AjaxSwingRouter.java:141)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:637)
at com.creamtec.ajaxswing.AjaxSwingRouter.service(AjaxSwingRouter.java:106)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:864)
at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1665)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused: connect
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.<init>(Socket.java:375)
at java.net.Socket.<init>(Socket.java:189)
at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22)
at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128)
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:595)
... 23 more

We checked all the previous causes for this, like:
- 2 network interfaces where one is blocked by firewall
- System.exit() that closes the whole JVM instead of only one session
- OutOfMemory issues
- Filesystem has no space left
- clientAgent.sh not executable

The interesting thing is, that these problems occur on both load balanced servers at nearly the same time. The machines are not at all on high load. The users as it seems get error pages when trying to get new sessions. Already running sessions seem not to be affected. This currently occurs with our own version of the 3.2.3 release. The problem only occured some days ago and previously worked flawlessly for some time.

The logs can be downloaded here: https://www.zeta-uploader.com/558570766
The download will expire in 30 days.

Do you have any other ideas why this could occur?

subes

  • Full Member
  • ***
  • Posts: 154
    • View Profile
Re: ConnectException without any other clue
« Reply #1 on: December 20, 2011, 08:25:57 AM »
Also checked for the license, this seems to be ok aswell with the following output:

Code: [Select]
2011/12/20 06:21:19:843 : main : [com.creamtec.ajaxswing.core.LicenseManager] Reading lic...
2011/12/20 06:21:19:843 : main : [com.creamtec.ajaxswing.core.LicenseManager] Initializing un

Dev Team

  • Customer
  • Hero Member
  • *
  • Posts: 593
    • View Profile
Re: ConnectException without any other clue
« Reply #2 on: December 21, 2011, 11:58:52 AM »
For 20111220_loglevel6.zip logs are for December 20, while latest connection exceptions in ajaxswing.logs are from December 16

For system1_loglevel4.zip any logs (any requests) in first JVM are done 10 minutes before ConnectionException and second JVM is started after that, you're saying these errors occurs when user is trying to get new session? On guess here might be that old JVM is stopped since user session expired (no actions, even pings), and then user opens application again having old session cookie, so he gets to same server and it finds old JVM by session cookie and opens RMI connection resulting in connection refused since that JVM was already turned off. (what is your clientsPerJVM and session timeout value? does timeout match with time difference between requests?) But this should not happen since AjaxSwing would know that session expired and start new instead of connecting to old JVM.

Do you have any custom mechanisms for session handling? Is it possible that you close application by your own timeout and AjaxSwing for some time still thinks that JVM/instance is alive (your timeout smaller then AjaxSwing timeout)?

One case when you get ConnectionException without any additional logs during new session start is if there's not enough PermGen memory in JVM to start new session.

subes

  • Full Member
  • ***
  • Posts: 154
    • View Profile
Re: ConnectException without any other clue
« Reply #3 on: December 22, 2011, 12:04:06 PM »
Those are our properties:
Code: [Select]
router.preloadWaitForApp=false
agent.appStartClass=<...>
router.retireJVMAfterClients=20
router.inProcess=false
router.clientsPerJVM=20
common.classPath=<...>
router.sessionTimeout=900
Timeout thus is 15 minutes and clientsPerJVM is 20.

We have no own mechanism for session timeouts or session handling. Only our System.exit() replacement that is invoked by user actions:
Code: [Select]
package de.bertelsmann.general;

import java.security.AccessControlException;

import org.apache.log4j.Logger;

import com.creamtec.ajaxswing.AjaxSwingManager;
import com.creamtec.ajaxswing.core.ClientAgent;

public class SystemExitHelper {

private SystemExitHelper() {
}

@SuppressWarnings("rawtypes")
public static void initSystemExit(Class caller, int status) {
Logger log = Logger.getLogger(caller);
log.info("Excecute system exit with status " + status);
if (AjaxSwingManager.isAjaxSwingRunning()) {
if ((ClientAgent.getCurrentInstance() == null)) {
throw new Error("System.exit hätte ausgeführt werden sollen, jedoch ist der ClientAgent schon heruntergefahren.");
} else {
try{
ClientAgent.getCurrentInstance().exit();
}catch(AccessControlException e){
//Siehe: http://creamtec.com/forums/index.php?topic=942.0
if(!"exitVM".equals(e.getPermission().getName())){
throw e;
}
}
}
} else {
System.exit(status);
}
}
}
This should be no problem, because AjaxSwings SecurityManager calls this himself aswell on a System.exit() catch. We introduced this, because the SecurityManager seemed to not be able to catch all System.exit() calls and thus JVM were shutdown completely. This problem aswell caused ConnectExceptions, but with different symptoms than we have now.
Another project with AjaxSwing 3.2.5_2 runs fine with that code currently, only our other project with 3.2.3 has problems with this right now.

About the PermGen memory, we have the following setting in clientAgent.bat:
Code: [Select]
set JVM_OPTS=-Xms512m -Xmx1024m -XX:MaxPermSize=256m %JVM_OPTS%Or do you mean the AjaxSwingRouter inside Tomcat has not enough PermGen?

We also made sure that enableJDK16.bat is executed on the target machine and the asboot.jar and ajaxswing.jar gets copied from ajaxswing/lib/ to WEB-INF/lib and our own client/lib. But the ConnectExceptions still occur there.

subes

  • Full Member
  • ***
  • Posts: 154
    • View Profile
Re: ConnectException without any other clue
« Reply #4 on: December 22, 2011, 12:15:43 PM »
Just noticed that we changed ClientsPerJVM and RetireJVM parameters from 20 to 5 when we changed the loglevel from 4 to 6. Thus since 20.12.1011 we have this setting on 5 instead of 20.

subes

  • Full Member
  • ***
  • Posts: 154
    • View Profile
Re: ConnectException without any other clue
« Reply #5 on: December 22, 2011, 12:17:56 PM »
I've uploaded a new batch of logs from today: https://www.zeta-uploader.com/1259506115
The download will expire in 30 days.

Dev Team

  • Customer
  • Hero Member
  • *
  • Posts: 593
    • View Profile
Re: ConnectException without any other clue
« Reply #6 on: December 23, 2011, 11:34:01 AM »
Do you have any information from users on when they get it?
1) Where they working with application and error suddenly appeared?
2) Or, was it during new session request
3) Or, after doing some action after long waiting from previous action

Based on log it seems that ConnectionException happens between ending old JVM and starting new one. Not sure if ConnectionException is a reason for new JVM startup or caused by startup.

You were saying you were not getting these errors previously and last thing you changed was changing ClientsPerJVM from 20 to 5? Can you try ClientPerJVM=5, but RetireJVM=20? Seems like there might be a race condition during JVM retire when it happens same time as new JVM startup.

Did you change anything else before ConnectionException started to appear?

subes

  • Full Member
  • ***
  • Posts: 154
    • View Profile
Re: ConnectException without any other clue
« Reply #7 on: January 09, 2012, 06:04:33 AM »
There was nothing particularly changed before the ConnectExceptions occured.

The settings currently are:
router.clientsPerJVM=5
router.retireJVMAfterClients=5

We changed it from 20/20 to 5/5 after the ConnectExceptions in this particular case started to emerge to maybe decrease the impact of the bug if it again causes the whole jvm to crash. Though this is not the case since now.

We tried to use 5/20 and 20/5 that you suggested and it changed nothing, the ConnectExceptions still occured.
We were unable to reproduce this yet on our testing machines.

With the information I got I can tell that users get this on 2) during new session request. Though I don't know if they previously closed their open session or restarted the browser. I would suspect it is the first case.

subes

  • Full Member
  • ***
  • Posts: 154
    • View Profile
Re: ConnectException without any other clue
« Reply #8 on: January 09, 2012, 07:25:42 AM »
Maybe we have a change that could be the cause of this. Since 15.12.2011 we have the ConnectExceptions. Since 13.12.2011 we installed a script on both production servers that restarts tomcat on 03:00 am evey night.

Could it be the case that simply restarting tomcat leaves some ClientAgents running which somehow get reused later without being fully capable of something or getting killed later somehow? Have you encountered problems like this previously? Maybe it would be better to restart the servers instead of the tomcats?

Though this idea is quite vague, because the exceptions occur during the day, while the tomcats are restarted during night. This is quite a loose connection.

subes

  • Full Member
  • ***
  • Posts: 154
    • View Profile
Re: ConnectException without any other clue
« Reply #9 on: January 10, 2012, 10:52:48 AM »
I was able to reproduce this with a stress test locally. The ConnectExceptions occur both with AjaxSwing 3.2.6 and AjaxSwing 3.2.3. I've done this with our application and 5 parallel test users sequencially logging in and out.

I am currently working on trying to isolate this problem into a sample, which was unsuccessful for now.

Dev Team

  • Customer
  • Hero Member
  • *
  • Posts: 593
    • View Profile
Re: ConnectException without any other clue
« Reply #10 on: January 10, 2012, 11:31:47 AM »
Looking forward for a sample.

Script for server restart should not cause ConnectionException, even if it would, that should happen in next 2-3 minutes after restart because old ClientAgent would exit after that with timeout.

subes

  • Full Member
  • ***
  • Posts: 154
    • View Profile
Re: ConnectException without any other clue
« Reply #11 on: January 11, 2012, 08:08:06 AM »
I was able to reproduce the problem in a sample. I've sent it to support@creamtec.com and to Alexander Ponomarenko who is familiar with our samples.

I am now experimenting with trying out some workarounds like disabling router.preloadApp.

subes

  • Full Member
  • ***
  • Posts: 154
    • View Profile
Re: ConnectException without any other clue
« Reply #12 on: January 12, 2012, 07:49:15 AM »
Were you able to download the sample and reproduce the problem?

Here are the results of some workarounds I've tried:
1) Changing to router.preloadApp=false does not help.
2) The problem is also reproducible both with common.classPath and agent.classpath.
3) I've tried to change to router.inProcess=false which seems to help in the sample. This might be because ClientAgents don't get created in the first place and thus cannot eat up the memory. Though we cannot go with this workaround in our application because it is incompatible with this setting at the moment.

subes

  • Full Member
  • ***
  • Posts: 154
    • View Profile
Re: ConnectException without any other clue
« Reply #13 on: January 16, 2012, 08:21:43 AM »
We were able to find a connection between the sample we've sent you and the problem we encountered in our production systems.
Here you can download the logs that show the ConnectExceptions on the 10.01.2012:
https://www.zeta-uploader.com/531935087

I've also attached the picture of the statistics of the memory usage on the machine on that day. The logs show that between 10:30 and 12:10 lots of ConnectExceptions occured. The statistics identically show that the memory usage is on its peak for that day during that time. Similarly to our sample, when the ConnectExceptions appear, lots of memory gets freed up in a short time.

Interesting about this is the fact, that the machine still had about 3GB of RAM left when the peak occured. This leads to the conclusion that it was not the machines memory limit that was reached (in our test environment our machines have only 3GB RAM, which is way less than our production machines have), but instead another one before that. We currently have the suspicion that the tomcat process suffers from not having enough max memory configured. Thus we are going to monitor our tomcat process to verify that.

subes

  • Full Member
  • ***
  • Posts: 154
    • View Profile
Re: ConnectException without any other clue
« Reply #14 on: January 16, 2012, 08:24:33 AM »
I've also tested router.inProcess=false in a minimal stress test over the weekend. This was done with 3 parallel clients in a lighweight login-logout-test seperate from our application. After about 3 days, tomcat used up all the available memory and let sessions crash with OutOfMemoryExceptions. Thus using this setting as a workaround is impossible.
« Last Edit: January 16, 2012, 08:28:57 AM by subes »