ode-user mailing list archives

From Chris Taylor <saurs...@yahoo.com>
Subject Re: Client calling retired process?
Date Tue, 25 Nov 2008 17:06:03 GMT
We are planning to move our ODE deployment to a node separate from the other application
instances. When we do this, I'll change the logging configuration as you suggested and capture
what happens.

In the meantime, this is causing a secondary issue: when we hit the original OOM, a lot of
rescheduled jobs build up (sometimes well over a hundred), apparently for requests
that cannot be satisfied.  When the server starts up again, it immediately pegs at full capacity
trying to satisfy them.  Other than deleting the rescheduled jobs from ODE_JOB, is there
some way to configure ODE to limit how many of these it reschedules, so that they
don't back it up?
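
To gauge how large that backlog is per process version, one option is to count the "Scheduled job failed" entries in the server log by pid. The following is only a rough sketch: it assumes each entry sits on a single line in the format quoted further down in this thread, and the class name FailedJobCounter and its method are made up for illustration, not part of ODE.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ODE: tallies failed scheduled jobs per process id.
public class FailedJobCounter {

    // Assumed log format: "... Scheduled job failed; jobDetail={type=..., pid={ns}Name-NN, ...}"
    // The group captures "{namespace}LocalName-version".
    private static final Pattern PID = Pattern.compile(
            "Scheduled job failed; jobDetail=\\{.*?pid=(\\{[^}]*\\}[^,}\\s]+)");

    public static Map<String, Integer> countByPid(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = PID.matcher(line);
            if (m.find()) {
                // Increment the counter for this pid, starting at 1.
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java FailedJobCounter <path-to-log>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        countByPid(lines).forEach((pid, n) -> System.out.println(n + "\t" + pid));
    }
}
```

Run against the server log, this prints one line per pid with the number of failed jobs, which would show whether the failures are all aimed at retired versions.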




________________________________
From: Matthieu Riou <matthieu@offthelip.org>
To: user@ode.apache.org
Sent: Tuesday, November 25, 2008 10:20:07 AM
Subject: Re: Client calling retired process?

On Mon, Nov 24, 2008 at 7:14 AM, Chris Taylor <saursoor@yahoo.com> wrote:

> Some more information regarding this error:
>
> We are still seeing this even with the ODE trunk 1.2.1 deployment. It
> occurs quite rarely, but the catalyst seems to be an OutOfMemoryError raised
> by ODE when a new request comes in:
>

Reviewing the code again, I couldn't spot anything that would produce this
behavior. Neither the process nor the process data is stored in structures that
would be sensitive to OOM. One thing that could help is a debug log of
BpelEngineImpl from when the problem occurs, since routing a message to a given
process happens in BpelEngineImpl.route(). So you could just set that
logger to debug and see what it shows the next time it happens.
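
As a sketch of that logger change, assuming the deployment is configured through a log4j 1.x properties file (where that file lives in your WAR is an assumption to verify, commonly WEB-INF/classes/log4j.properties), a single line along these lines should enable debug output for just that class:

```properties
# Assumption: log4j 1.x properties syntax. The logger name is the fully
# qualified class name that appears in the stack traces in this thread.
log4j.logger.org.apache.ode.bpel.engine.BpelEngineImpl=DEBUG
```

Scoping the level to the one class keeps the rest of the engine at its current verbosity.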

Thanks,
Matthieu


>
>
> java.lang.OutOfMemoryError
>
> at
> org.apache.ode.bpel.engine.MyRoleMessageExchangeImpl$ResponseFuture.get(MyRoleMessageExchangeImpl.java:201)
>
> at
> org.apache.ode.axis2.ODEService.onAxisMessageExchange(ODEService.java:149)
>
> at
> org.apache.ode.axis2.hooks.ODEMessageReceiver.invokeBusinessLogic(ODEMessageReceiver.java:67)
>
> at
> org.apache.ode.axis2.hooks.ODEMessageReceiver.invokeBusinessLogic(ODEMessageReceiver.java:50)
>
> at
> org.apache.axis2.receivers.AbstractMessageReceiver.receive(AbstractMessageReceiver.java:96)
>
> at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:145)
>
> at
> org.apache.axis2.transport.http.HTTPTransportUtils.processHTTPPostRequest(HTTPTransportUtils.java:275)
>
> at org.apache.axis2.transport.http.AxisServlet.doPost(AxisServlet.java:120)
>
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:763)
>
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)
>
> at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1075)
>
> at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:550)
>
> at
> com.ibm.ws.wswebcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:478)
>
> at
> com.ibm.ws.webcontainer.servlet.CacheServletWrapper.handleRequest(CacheServletWrapper.java:90)
>
> at
> com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:744)
>
> at
> com.ibm.ws.wswebcontainer.WebContainer.handleRequest(WebContainer.java:1455)
>
> at com.ibm.ws.webcontainer.channel.WCChannelLink.ready(WCChannelLink.java:115)
>
> at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:458)
>
> at
> com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleNewInformation(HttpInboundLink.java:387)
>
> at com.ibm.ws.http.channel.inbound.impl.HttpICLReadCallback.complete(HttpICLReadCallback.java:102)
>
> at
> com.ibm.ws.tcp.channel.impl.AioReadCompletionListener.futureCompleted(AioReadCompletionListener.java:165)
>
> at com.ibm.io.async.AbstractAsyncFuture.invokeCallback(AbstractAsyncFuture.java:217)
>
> at com.ibm.io.async.AsyncChannelFuture.fireCompletionActions(AsyncChannelFuture.java:161)
>
> at com.ibm.io.async.AsyncFuture.completed(AsyncFuture.java:136)
>
> at com.ibm.io.async.ResultHandler.complete(ResultHandler.java:195)
>
> at com.ibm.io.async.ResultHandler.runEventProcessingLoop(ResultHandler.java:743)
>
> at com.ibm.io.async.ResultHandler$2.run(ResultHandler.java:873)
>
> at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1473)
>
>
>
> After WebSphere recovers, and until we redeploy the process in question
> as a new version, ODE attempts to route subsequent requests to a
> retired version.
>
>
>
> [11/20/08 14:29:26:968 CST] 00000046 SystemOut O 14:29:26,967 ERROR
> [BpelEngineImpl] Scheduled job failed; jobDetail={type=INVOKE_INTERNAL,
> pid={http://eclipse.org/bpel/sample}AdminYNProcess-195, inmem=true,
> mexid=4611686018427387977}
>
> org.apache.ode.bpel.runtime.InvalidProcessException: Process is retired.
>
> at
> org.apache.ode.bpel.engine.PartnerLinkMyRoleImpl.invokeNewInstance(PartnerLinkMyRoleImpl.java:173)
>
> at
> org.apache.ode.bpel.engine.BpelProcess.invokeProcess(BpelProcess.java:204)
>
> at
> org.apache.ode.bpel.engine.BpelProcess.handleWorkEvent(BpelProcess.java:372)
>
> at
> org.apache.ode.bpel.engine.BpelEngineImpl.onScheduledJob(BpelEngineImpl.java:326)
>
> at
> org.apache.ode.bpel.engine.BpelServerImpl.onScheduledJob(BpelServerImpl.java:373)
>
> at
> org.apache.ode.scheduler.simple.SimpleScheduler$4$1.call(SimpleScheduler.java:337)
>
> at
> org.apache.ode.scheduler.simple.SimpleScheduler$4$1.call(SimpleScheduler.java:336)
>
> at
> org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:174)
>
> at
> org.apache.ode.scheduler.simple.SimpleScheduler$4.call(SimpleScheduler.java:335)
>
> at
> org.apache.ode.scheduler.simple.SimpleScheduler$4.call(SimpleScheduler.java:332)
>
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:284)
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:665)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:690)
>
> at java.lang.Thread.run(Thread.java:810)
>
> Attached is the Java core dump file from the time of the original
> OutOfMemoryError, showing that it was caused by excessive garbage
> collection.  The VM this runs under allocates 1 GB of memory on the heap.
>
> - Chris Taylor
>
>  ------------------------------
> *From:* Matthieu Riou <matthieu@offthelip.org>
> *To:* user@ode.apache.org
> *Cc:* Dave Cecchi <dave.cecchi@perficient.com>
> *Sent:* Thursday, October 16, 2008 10:40:57 AM
> *Subject:* Re: Client calling retired process?
>
> On Wed, Oct 15, 2008 at 9:27 AM, Chris Taylor <saursoor@yahoo.com> wrote:
>
> > Matthieu, yes, I would appreciate it if you could put that latest built WAR
> > somewhere.  We have attempted to build with buildr without success.
> >
>
> Here it is:
>
> http://people.apache.org/~mriou/ode-axis2-war-1.2.1-SNAPSHOT.war
>
> Let me know how it goes.
>
> Cheers,
> Matthieu
>
>
> >
> >
> >
> > ----- Original Message ----
> > From: Matthieu Riou <matthieu@offthelip.org>
> > To: user@ode.apache.org
> > Sent: Monday, October 13, 2008 1:30:56 PM
> > Subject: Re: Client calling retired process?
> >
> > > On Mon, Oct 13, 2008 at 10:55 AM, Chris Taylor <saursoor@yahoo.com> wrote:
> >
> > > Thanks, Matthieu.  Some background:
> > >
> > > > We're running ODE 1.2 on WebSphere 6.1, with Oracle 10g as the process
> > > > store.
> > > >
> > > > This scenario consistently fails in the manner I described, but it seems
> > > > only for certain processes.
> > > >
> > > > So, for example, if I have the following:
> > > >
> > > > ProcessA-20
> > > > ProcessB-21
> > > > ProcessC-22
> > > >
> > > > deployed in my environment, the scenario would be that something causes
> > > > ProcessA-20 to hang, at which point it goes into recovery mode and spawns
> > > > an ODE job to retry.  From this point on, not only do new requests to
> > > > ProcessA get routed to the now-retired ProcessA-19, but new requests to
> > > > ProcessB also get routed to the now-retired ProcessB-20!  The weird thing is,
> > > > ProcessC-22 is apparently unaffected.  It still gets calls legitimately
> > > > routed to its latest versioned deployment, ProcessC-22.
> > > >
> > > > I do not know if this happens under other scenarios unrelated to recovery.
> > > > I think I just do not have enough data points yet to say.
> > >
> > >
> >
> > > If you have a reproducible test scenario, it would be great if you could
> > > try it with the current stable branch. I fixed something related to what
> > > you're describing a couple of months ago. If doing a build is an issue for
> > > you, I can upload the WAR to a public place.
> >
> > Thanks,
> > Matthieu
> >
> >
> > >
> > >
> > >
> > > ----- Original Message ----
> > > From: Matthieu Riou <matthieu@offthelip.org>
> > > To: user@ode.apache.org
> > > Sent: Monday, October 13, 2008 12:33:18 PM
> > > Subject: Re: Client calling retired process?
> > >
> > > > On Mon, Oct 13, 2008 at 8:17 AM, Chris Taylor <saursoor@yahoo.com> wrote:
> > >
> > > > > Thanks, Alexis, but I'm no closer to fully understanding why this occurs.
> > > > > It now happens periodically, almost every day, with different deployed
> > > > > processes.  Although I don't understand it, I have done some research into
> > > > > the behaviour.  Here's a scenario:
> > > > >
> > > > > We'll deploy ProcessA-19, then retire it with a ProcessA-20 deployment.  At
> > > > > some point it, or another process, will fail and attempt to go into recovery
> > > > > mode (excuse me if I state this incorrectly).  At this point ODE will create
> > > > > a scheduled job in an attempt to retry the service later.
> > > > >
> > > > > Here's where it gets screwy.  From then on, all new calls to ProcessA will
> > > > > not route to ProcessA-20; instead, ODE will attempt to route them to
> > > > > ProcessA-19, which is of course retired.  ODE does not recover from this.
> > > > > It seems the only way to compensate is to redeploy ProcessA as ProcessA-21.
> > > > > New requests will then route correctly.
> > > > >
> > > > > Any idea here?
> > > >
> > >
> > > > I'll have to ask a few more questions to narrow it down and make sure I
> > > > understand correctly:
> > > >
> > > >  * Does the exact same scenario sometimes work and sometimes not?
> > > >  * Does it always happen in relation to recovery and retry, or did you
> > > > see it happen in other situations as well?
> > > >  * Which version of ODE are you using? Have you tried a recent 1.X
> > > > branch?
> > >
> > > Thanks,
> > > Matthieu
> > >
> > >
> > > >
> > > >
> > > >
> > > > ----- Original Message ----
> > > > From: Alexis Midon <midon@intalio.com>
> > > > To: user@ode.apache.org
> > > > Sent: Wednesday, October 8, 2008 7:26:54 PM
> > > > Subject: Re: Client calling retired process?
> > > >
> > > > Hi Chris,
> > > >
> > > > > No new executions can be started on a retired process, but running
> > > > > instances can still finish their job. [1]
> > > > >
> > > > > I'm not really familiar with this part of the code, but after looking at
> > > > > it, it seems to me that the deployment of a new version is not atomic,
> > > > > meaning that a process could be flagged as retired while the creation of a
> > > > > new instance is in progress, hence your exception.
> > > > >
> > > > > Does that make sense for your scenario? Is it possible that the process
> > > > > gets retired while messages are coming in?
> > > >
> > > > [1] further details here:
> > > > http://ode.apache.org/user-guide.html#UserGuide-Versioning
> > > >
> > > >
> > > >
> > > > > On Wed, Oct 8, 2008 at 11:37 AM, Chris Taylor <saursoor@yahoo.com> wrote:
> > > >
> > > > > > Okay, I have a deployment bundle (called GetCodes) that includes 5
> > > > > > processes.  4 of the processes make calls to the fifth (it's an abstraction
> > > > > > layer of process business logic).  When I deploy this "GetCodes" bundle
> > > > > > using the DeploymentService utility, I can see an incremented deployment
> > > > > > (say, GetCodes-40) alongside previous iterations.
> > > > > >
> > > > > > Occasionally, I'll have a client making SOAP calls to one of the processes
> > > > > > under this logical bundle that will fail with the following error:
> > > > > >
> > > > > > InvalidProcessException: Process is retired.
> > > > > >
> > > > > > In the logs, it's clear that ODE is directing this client call to
> > > > > > GetCodes-39, though the client isn't explicitly attempting to call a
> > > > > > specific version (is that even possible?).  Any clue why some clients
> > > > > > are periodically, and erroneously, directed by ODE to a retired process
> > > > > > version?
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> >
> >
>
>
>



      