ode-user mailing list archives

From "Matthieu Riou" <matth...@offthelip.org>
Subject Re: INTERNAL ERROR: No ENTRY for RESPONSE CHANNEL 69
Date Mon, 17 Dec 2007 19:52:38 GMT
On Dec 17, 2007 8:23 AM, René Bos <r.bos@pagelink.nl> wrote:

> Hello!!
>
> I did some research with one of my colleagues and found a strange thing. I
> turned on PostgreSQL logging and saw this:
> 2007-12-17 15:20:00 LOG:  execute <unnamed>: SELECT t0.CORRELATOR_ID,
> t1.MESSAGE_ROUTE_ID, t1.CORRELATION_KEY, t1.CORR_ID, t1.GROUP_ID,
> t1.ROUTE_INDEX, t1.PROCESS_INSTANCE_ID FROM ODE_CORRELATOR t0 INNER JOIN
> ODE_MESSAGE_ROUTE t1 ON t0.CORRELATOR_ID = t1.CORR_ID WHERE (
> t0.CORRELATOR_KEY = $1 AND t0.PROC_ID = $2) ORDER BY t0.CORRELATOR_ID ASC
> 2007-12-17 15:20:00 DETAIL:  parameters: $1 = '104.saveOrAanbieden', $2 =
> '51'
>
> When I executed this myself, I found that it returned two rows (I
> displayed everything from both tables):
> 53;"104.saveOrAanbieden";51;174;"103~nl.pagelink.torque.opm.ObjectenMut_20581##1188214622828283";"69";0;53;63
> 53;"104.saveOrAanbieden";51;302;"103~nl.pagelink.torque.opm.ObjectenMut_20581##1188214622828283";"149";0;53;63
>
> It looks like old routes are not cleaned up, so when execution reaches
> findRoute in PartnerLinkMyRoleImpl it can return an old route with the
> wrong channel. Another possibility is that when the process passes the
> saveOrAanbieden receive for the second time, it creates a new route while
> the earlier one still exists because it was never removed (perhaps it was
> not meant to be removed; I don't know exactly how this works).
>
> Please note that the problem appears only when the process reaches
> saveOrAanbieden or approveOrDisapprove for the second time (because of the
> while loop used in the process).
>
> In the following code fragment from PartnerLinkMyRoleImpl I see that it
> returns the first route found. Note that this is the Ode 1.1 source, not
> the current trunk (because we use Ode 1.1).
>
> // Try to find a route for one of our keys.
> for (CorrelationKey key : keys) {
>     messageRoute = correlator.findRoute(key);
>     if (messageRoute != null) {
>         if (__log.isDebugEnabled()) {
>             __log.debug("INPUTMSG: " + correlatorId + ": ckey " + key
>                     + " route is to " + messageRoute);
>         }
>         matchedKey = key;
>         break;
>     }
> }
>
> I hope you can see exactly what the problem is and suggest a fix. Because
> the crashing processes (2 of them) are already running at a customer, we
> would like a solution soon.
> Can you please tell us if we can make a temporary fix in the source so that
> we can make our customer happy again? We are thinking of something that
> finds only the newest route and discards the previous ones. Maybe an
> ORDER BY in a query? We don't know where.
> Also, I was thinking of removing the break in the code fragment above;
> could this fix the problem?
>

That would add more uncertainty. As a quick hack you could change
CorrelatorDAOImpl.findRoute to return the latest route (the one with the
highest groupId, which is actually the channel id) when there is more than
one, instead of the first one that matches the correlation.
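
A minimal sketch of that quick hack, assuming a hypothetical in-memory Route type rather than ODE's real DAO classes (the actual CorrelatorDAOImpl.findRoute works against the database and its code is not shown in this thread). It also assumes the group id is always a decimal number stored as text, as in the two rows quoted above, so it is compared numerically; compared as strings, "69" would sort after "149".

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class LatestRouteSketch {

    /** Hypothetical stand-in for a row of ODE_MESSAGE_ROUTE; not ODE's real DAO type. */
    static final class Route {
        final long routeId;
        final String correlationKey;
        final String groupId; // assumption: a decimal channel id stored as text

        Route(long routeId, String correlationKey, String groupId) {
            this.routeId = routeId;
            this.correlationKey = correlationKey;
            this.groupId = groupId;
        }
    }

    /** Returns the matching route with the numerically highest group id, if any. */
    static Optional<Route> latestRoute(List<Route> routes, String correlationKey) {
        return routes.stream()
                .filter(r -> r.correlationKey.equals(correlationKey))
                // Parse before comparing so that "149" ranks above "69".
                .max(Comparator.comparingLong((Route r) -> Long.parseLong(r.groupId)));
    }

    public static void main(String[] args) {
        String key = "103~nl.pagelink.torque.opm.ObjectenMut_20581##1188214622828283";
        List<Route> routes = Arrays.asList(
                new Route(174, key, "69"),
                new Route(302, key, "149"));
        latestRoute(routes, key).ifPresent(r ->
                System.out.println("route " + r.routeId + ", channel " + r.groupId));
    }
}
```

With the two rows from the query above this prints "route 302, channel 149". It only hides the symptom; the stale route is still there.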

However you should really check whether you see a delete happening on
ODE_MESSAGE_ROUTE at some point, both in your environment and the
environment where it breaks. You should never have two routes matching the
same correlation on a given correlator.
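
One way to watch for that invariant is to group the routes by correlator and correlation key and report any group holding more than one route. The following is only a sketch over a hypothetical in-memory Route type (fields named after the ODE_MESSAGE_ROUTE columns in the query above), not a query against the real ODE schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DuplicateRouteCheckSketch {

    /** Hypothetical stand-in for a row of ODE_MESSAGE_ROUTE; not ODE's real DAO type. */
    static final class Route {
        final long routeId;
        final long correlatorId;
        final String correlationKey;

        Route(long routeId, long correlatorId, String correlationKey) {
            this.routeId = routeId;
            this.correlatorId = correlatorId;
            this.correlationKey = correlationKey;
        }
    }

    /**
     * Maps "correlatorId|correlationKey" to the ids of the routes sharing it,
     * keeping only the entries that violate the one-route-per-correlation rule.
     * The "|" separator is an illustrative choice, not an ODE convention.
     */
    static Map<String, List<Long>> findDuplicates(List<Route> routes) {
        Map<String, List<Long>> byKey = new LinkedHashMap<>();
        for (Route r : routes) {
            String groupKey = r.correlatorId + "|" + r.correlationKey;
            byKey.computeIfAbsent(groupKey, k -> new ArrayList<>()).add(r.routeId);
        }
        // Drop every group that holds a single route; what remains is suspicious.
        byKey.values().removeIf(ids -> ids.size() < 2);
        return byKey;
    }

    public static void main(String[] args) {
        String key = "103~nl.pagelink.torque.opm.ObjectenMut_20581##1188214622828283";
        List<Route> routes = Arrays.asList(
                new Route(174, 53, key),
                new Route(302, 53, key));
        findDuplicates(routes).forEach((k, ids) ->
                System.out.println(k + " -> routes " + ids));
    }
}
```

For the two rows quoted above, this reports routes 174 and 302 under the same correlator 53 and correlation key.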

Matthieu


>
> Thanks!!
>
> René
>
> -----Original Message-----
> From: René Bos [mailto:r.bos@pagelink.nl]
> Sent: Saturday 15 December 2007 13:43
> To: user@ode.apache.org
> Subject: RE: INTERNAL ERROR: No ENTRY for RESPONSE CHANNEL 69
>
> Yeah, I also searched for a difference between the two configurations, but
> could not find anything. One difference was the Java versions, 5 and 6. But
> it doesn't work with either of them on the working machine. Another
> difference is Win 2000 vs Win XP on the test machine, but I don't think
> that has to be a problem. Another thing is that the test machine is a lot
> faster, with more RAM and 2 cores.
>
> The strange thing is that I copied the entire Tomcat folder from my
> machine to the test machine (to the same location) and also copied the
> databases in use. But the problem still exists.
>
> I now remember something that could be useful too. When the error comes
> up, a UKNOWN_ENDPOINT status is set on the message in the message exchange
> table. But after some time (more than half an hour), when I restarted
> Tomcat, some of the UKNOWN_ENDPOINT messages were processed. Not all. That
> has happened to me a few times.
>
> I'm not at work at the moment but I think we used (on both machines
> because they are copies):
> ode-axis2.db.mode=EXTERNAL
> ode-axis2.db.ext.dataSource=java:comp/env/jdbc/OdeDS
>
> And OdeDS is configured in Tomcat 5.5.23.
>
> At the moment I'm thinking of a timing problem or something. But I find it
> very strange!
> I have a database dump (SQL) with 3 processes deployed, but with only one
> process instance. And that process instance failed with the error. Maybe
> you can do something with that?
>
> Thanks!
>   Rene
>
> -----Original Message-----
> From: matthieu.riou@gmail.com on behalf of Matthieu Riou
> Sent: Fri 14-12-2007 18:14
> To: user@ode.apache.org
> Subject: Re: INTERNAL ERROR: No ENTRY for RESPONSE CHANNEL 69
>
> Sounds to me like a transaction manager problem; when channels can't be
> found it's usually a missing commit somewhere. Since it works on your
> machine and not on the others, and since problems with unfound channels
> usually don't happen on a normal configuration, I'd lean toward a
> configuration problem. Which leads me to the questions: what is the
> difference between your configuration and the configuration on your test
> machine? Postgres? Are you running in internal, embedded or external mode?
>
> Thanks,
> Matthieu
>
> On Dec 14, 2007 8:12 AM, René Bos <r.bos@pagelink.nl> wrote:
>
> >  Hello!
> >
> >
> >
> > I have a problem with two of my processes. I'm running Ode 1.1 with a
> > PostgreSQL database. I attached one of the processes so you can see the
> > BPEL code. I also attached the error.
> >
> >
> >
> > The error appears sometimes when I do the following calls:
> >
> > Initiate
> > saveOrAanbieden with completionValue save
> > saveOrAanbieden with completionValue aanbieden
> >
> > Or when I do:
> >
> > Initiate
> > saveOrAanbieden with completionValue aanbieden
> > approveOrDisapprove with completionValue disapprove
> > saveOrAanbieden with completionValue aanbieden or save
> >
> >
> >
> > The strange thing is that the problem does not exist on my local
> > workstation, but it does on another testing machine!
> >
> > On the testing machine it sometimes shows up, other times not.
> >
> >
> >
> > Can you tell me if something has been fixed in this area? Or can you help
> > me by checking my process or reproducing it?
> >
> >
> >
> > Rene
> >
>
>
