ode-user mailing list archives

From "Matthieu Riou" <matth...@offthelip.org>
Subject Re: INTERNAL ERROR: No ENTRY for RESPONSE CHANNEL 69
Date Mon, 17 Dec 2007 20:45:39 GMT
After some additional testing I've found the bug. We remove the route from
a collection, and the relation is supposed to cascade the delete in OpenJPA,
but I realized the route was never actually deleted: the removal from the
collection is simply ignored.

So I've added an explicit delete, and it's now properly issued against the
database (I checked the SQL logs). This has been committed to SVN, but if you
want to retrofit it onto your own codebase, the corresponding patch is at the
end of this e-mail.

Cheers,
Matthieu

Index: src/main/java/org/apache/ode/dao/jpa/CorrelatorDAOImpl.java
===================================================================
--- src/main/java/org/apache/ode/dao/jpa/CorrelatorDAOImpl.java (revision 601374)
+++ src/main/java/org/apache/ode/dao/jpa/CorrelatorDAOImpl.java (working copy)
@@ -42,7 +42,7 @@

 @Entity
 @Table(name="ODE_CORRELATOR")
-public class CorrelatorDAOImpl implements CorrelatorDAO {
+public class CorrelatorDAOImpl extends OpenJPADAO implements CorrelatorDAO {

     @Id @Column(name="CORRELATOR_ID")
     @GeneratedValue(strategy=GenerationType.AUTO)
@@ -109,9 +109,10 @@
     void removeLocalRoutes(String routeGroupId, ProcessInstanceDAO target) {
         for (Iterator itr=_routes.iterator(); itr.hasNext(); ) {
             MessageRouteDAOImpl mr = (MessageRouteDAOImpl)itr.next();
-            if ( mr.getGroupId().equals(routeGroupId) &&
-                    mr.getTargetInstance().equals(target))
+            if ( mr.getGroupId().equals(routeGroupId) && mr.getTargetInstance().equals(target)) {
                 itr.remove();
+                getEM().remove(mr);
+            }
         }
     }
 }
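
For readers retrofitting the change by hand, the pattern in the patch can be sketched without OpenJPA. This is only an illustration under stated assumptions: `Route`, `FakeStore` and `Correlator` are hypothetical stand-ins (not ODE classes), and `FakeStore.remove` plays the role of `getEM().remove(mr)`. It shows that removing an element through the iterator only touches the in-memory collection, so the stored row disappears only because of the explicit delete.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class ExplicitRemoveSketch {

    // Hypothetical stand-in for MessageRouteDAOImpl; not the real ODE class.
    static final class Route {
        final String groupId;
        final String target;

        Route(String groupId, String target) {
            this.groupId = groupId;
            this.target = target;
        }
    }

    // Hypothetical stand-in for the persistence layer: a row stays in the
    // "database" until remove() is called on it explicitly.
    static final class FakeStore {
        final Set<Route> rows = new HashSet<>();

        void persist(Route r) { rows.add(r); }

        void remove(Route r) { rows.remove(r); }
    }

    // Hypothetical stand-in for CorrelatorDAOImpl.
    static final class Correlator {
        final List<Route> routes = new ArrayList<>();
        final FakeStore store;

        Correlator(FakeStore store) { this.store = store; }

        void addRoute(Route r) {
            routes.add(r);
            store.persist(r);
        }

        void removeLocalRoutes(String routeGroupId, String target) {
            for (Iterator<Route> itr = routes.iterator(); itr.hasNext(); ) {
                Route mr = itr.next();
                if (mr.groupId.equals(routeGroupId) && mr.target.equals(target)) {
                    itr.remove();      // drops the route from the in-memory collection only
                    store.remove(mr);  // explicit delete, mirroring getEM().remove(mr)
                }
            }
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        Correlator c = new Correlator(store);
        c.addRoute(new Route("69", "instance-63"));
        c.addRoute(new Route("149", "instance-63"));

        c.removeLocalRoutes("69", "instance-63");

        // prints: 1 in memory, 1 in store
        System.out.println(c.routes.size() + " in memory, " + store.rows.size() + " in store");
    }
}
```

Commenting out the `store.remove(mr)` line in this sketch leaves two rows in the store, which corresponds to the stale route seen in the quoted report below.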


On Dec 17, 2007 11:52 AM, Matthieu Riou <matthieu@offthelip.org> wrote:

> On Dec 17, 2007 8:23 AM, René Bos <r.bos@pagelink.nl> wrote:
>
> > Hello!!
> >
> > I did some research with one of my colleagues and found a strange thing.
> > I turned on PostgreSQL logging and saw this:
> > 2007-12-17 15:20:00 LOG:  execute <unnamed>: SELECT t0.CORRELATOR_ID,
> > t1.MESSAGE_ROUTE_ID , t1.CORRELATION_KEY, t1.CORR_ID, t1.GROUP_ID,
> > t1.ROUTE_INDEX, t1.PROCESS_INSTANCE_ID FROM ODE_CORRELATOR t0 INNER JOIN
> > ODE_MESSAGE_ROUTE t1 ON t0.CORRELATOR_ID = t1.CORR_ID WHERE (
> > t0.CORRELATOR_KEY = $1 AND t0.PROC_ID = $2) ORDER BY t0.CORRELATOR_ID ASC
> > 2007-12-17 15:20:00 DETAIL:  parameters: $1 = '104.saveOrAanbieden', $2
> > = '51'
> >
> > When I executed this myself, I found that it returned two rows (I
> > displayed all rows from both tables):
> > 53;"104.saveOrAanbieden";51;174;"103~nl.pagelink.torque.opm.ObjectenMut_20581##1188214622828283";"69";0;53;63
> > 53;"104.saveOrAanbieden";51;302;"103~nl.pagelink.torque.opm.ObjectenMut_20581##1188214622828283";"149";0;53;63
> >
> >
> > It looks like old routes are not cleaned up, so when it reaches
> > findRoute in PartnerLinkMyRoleImpl it can return an old route with the
> > wrong channel.
> > Another possibility would be that when the process reaches the
> > saveOrAanbieden receive for the second time, it creates a new route, but a
> > route already existed because it was not removed (and was perhaps not meant
> > to be removed; I don't know exactly how this works).
> >
> > Please note that the problem appears only when it reaches
> > saveOrAanbieden or approveOrDisapprove for the second time (because of the
> > while loop that is used).
> >
> > In the following code fragment from PartnerLinkMyRoleImpl I see that it
> > returns the first route found. Note that this is the Ode 1.1 source, not
> > the current trunk (because we use Ode 1.1).
> >
> > // Try to find a route for one of our keys.
> > for (CorrelationKey key : keys) {
> >        messageRoute = correlator.findRoute(key);
> >        if (messageRoute != null) {
> >                if (__log.isDebugEnabled()) {
> >                        __log.debug("INPUTMSG: " + correlatorId + ": ckey " + key + " route is to " + messageRoute);
> >                }
> >                matchedKey = key;
> >                break;
> >        }
> > }
> >
> > I hope you can see exactly what the problem is and give us a fix.
> > Because the crashing processes (2 of them) are already running at a
> > customer, we would like to get a solution quickly.
> > Can you please tell us if we can make a temporary fix in the source so
> > that we can make our customer happy again? We are thinking of something
> > that finds only the newest route and discards the previous ones. Maybe an
> > ORDER BY in a query? We don't know where.
> > Also, I was thinking of removing the break in the code fragment above;
> > could this fix the problem?
> >
>
> That would add more uncertainty. As a quick hack you could change
> CorrelatorDAOImpl.findRoute to return the latest route (the one with the
> highest groupId, which is actually the channel id) when there is more than
> one, instead of the first one that matches the correlation.
>
> However you should really check whether you see a delete happening on
> ODE_MESSAGE_ROUTE at some point, both in your environment and the
> environment where it breaks. You should never have two routes matching the
> same correlation on a given correlator.
>
> Matthieu
>
>
> >
> > Thanks!!
> >
> > René
> >
> > -----Original Message-----
> > From: René Bos [mailto:r.bos@pagelink.nl]
> > Sent: Saturday 15 December 2007 13:43
> > To: user@ode.apache.org
> > Subject: RE: INTERNAL ERROR: No ENTRY for RESPONSE CHANNEL 69
> >
> > Yeah, I also searched for a difference between the two configurations,
> > but could not find anything. One difference was the Java versions, 5 and 6.
> > But it doesn't work with both of them on the working machine. Another
> > difference is Win 2000 vs Win XP on the test machine, but that doesn't have
> > to be a problem, I think. Another thing is that the test machine is a lot
> > faster, with more RAM and 2 cores.
> >
> > The strange thing is that I copied the entire Tomcat folder from my
> > machine to the test machine (to the same location) and also copied the
> > databases used. But then the problem still exists.
> >
> > I now remember something that could be useful too. When the error comes
> > up, a UKNOWN_ENDPOINT status is set on the message in the message exchange
> > table. But after some time (more than half an hour), when I restarted
> > Tomcat, some of the UKNOWN_ENDPOINTs were processed. Not all. That has
> > happened to me a few times.
> >
> > I'm not at work at the moment but I think we used (on both machines
> > because they are copies):
> > ode-axis2.db.mode=EXTERNAL
> > ode-axis2.db.ext.dataSource=java:comp/env/jdbc/OdeDS
> >
> > And OdeDS is configured in Tomcat 5.5.23.
> >
> > At the moment I'm thinking of a timing problem or something. But I find
> > it very strange!
> > I have a database dump (SQL) with 3 processes deployed, but with only
> > one process instance. And that process instance failed with the error.
> > Maybe you can do something with that?
> >
> > Thanks!
> >   Rene
> >
> > -----Original Message-----
> > From: matthieu.riou@gmail.com on behalf of Matthieu Riou
> > Sent: Fri 14-12-2007 18:14
> > To: user@ode.apache.org
> > Subject: Re: INTERNAL ERROR: No ENTRY for RESPONSE CHANNEL 69
> >
> > Sounds to me like a transaction manager problem; when channels can't be
> > found it's usually a missing commit somewhere. Since it works on your
> > machine and not on the others, and since problems with unfound channels
> > usually don't happen on a normal configuration, I'd lean toward a
> > configuration problem. Which leads me to the questions: what is the
> > difference between your configuration and the configuration on your test
> > machine? Postgres? Are you running in internal, embedded or external
> > mode?
> >
> > Thanks,
> > Matthieu
> >
> > On Dec 14, 2007 8:12 AM, René Bos < r.bos@pagelink.nl> wrote:
> >
> > >  Hello!
> > >
> > >
> > >
> > > > I have a problem with two of my processes. I'm running Ode 1.1 with a
> > > > PostgreSQL database. I attached one of the processes so you can see
> > > > the BPEL code. I also attached the error.
> > >
> > >
> > >
> > > The error appears sometimes when I do the following calls:
> > >
> > > Initiate
> > >
> > > saveOrAanbieden with completionValue save
> > >
> > > saveOrAanbieden with completionValue aanbieden
> > >
> > >
> > >
> > > Or when I do:
> > >
> > > Initiate
> > >
> > > saveOrAanbieden with completionValue aanbieden
> > >
> > > approveOrDisapprove with completionValue disapprove
> > >
> > > saveOrAanbieden with completionValue aanbieden or save
> > >
> > >
> > >
> > > > The strange thing is that the problem does not exist on my local
> > > > workstation, but it does on another testing machine!
> > > >
> > > > On the testing machine it sometimes shows up, other times not.
> > >
> > >
> > >
> > > > Can you tell me if something has been fixed in this area? Or can you
> > > > help me by checking my process or reproducing it?
> > >
> > >
> > >
> > > Rene
> > >
> >
> >
>
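
The quick hack suggested in the quoted reply (return the route with the highest groupId when several match the same correlation) could look roughly like the sketch below. It is not the ODE implementation: `Route` and `findLatestRoute` are hypothetical names, and the sketch assumes group ids are numeric channel ids, as in the two rows shown in the thread ("69" and "149"). The ids are parsed as numbers because a plain string comparison would rank "69" above "149".

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LatestRouteSketch {

    // Hypothetical stand-in for a message route row; not the real ODE class.
    static final class Route {
        final String correlationKey;
        final String groupId; // assumed to hold a numeric channel id

        Route(String correlationKey, String groupId) {
            this.correlationKey = correlationKey;
            this.groupId = groupId;
        }
    }

    // Returns the matching route with the numerically highest groupId,
    // or an empty Optional when no route matches the correlation key.
    static Optional<Route> findLatestRoute(List<Route> routes, String correlationKey) {
        return routes.stream()
                .filter(r -> r.correlationKey.equals(correlationKey))
                .max(Comparator.comparingLong((Route r) -> Long.parseLong(r.groupId)));
    }

    public static void main(String[] args) {
        String key = "103~nl.pagelink.torque.opm.ObjectenMut_20581##1188214622828283";
        List<Route> routes = Arrays.asList(
                new Route(key, "69"),    // stale route left behind by the bug
                new Route(key, "149"));  // route created on the second receive

        // prints: 149
        System.out.println(findLatestRoute(routes, key).map(r -> r.groupId).orElse("none"));
    }
}
```

As the reply itself notes, this only masks the symptom; two routes matching the same correlation on one correlator should not exist in the first place.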
