lucenenet-dev mailing list archives

From Peter Mateja <peter.mat...@gmail.com>
Subject Re: Lucene project announcement
Date Thu, 18 Nov 2010 19:54:09 GMT
Karell,

This is probably drifting way too far off topic, but you could consider
using index sharding by document hashing, with multiple cloud drives to take
advantage of multiple simultaneous writers, and a MultiSearcher to
aggregate search results across all indices (with the potential deal breaker
that TF-IDF statistics would be isolated to each individual index, possibly
affecting your relevance scores). Of course, this all depends on the profile
of what you're trying to accomplish.
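A minimal sketch of the sharding idea follows. It uses Python for brevity rather than C#, and toy in-memory indices instead of the real Lucene.NET API. Documents are routed to a shard by a stable hash of their id. A query fans out to every shard and the hits are merged by score (roughly the role a MultiSearcher plays). Each shard computes IDF from its own document counts only. The `Shard` class, `shard_for`, `multi_search`, and the smoothed IDF formula are hypothetical illustrations, not Lucene's actual scoring.

```python
import hashlib
import math
from collections import defaultdict


def shard_for(doc_id: str, num_shards: int) -> int:
    """Route a document to a shard using a stable hash of its id."""
    # hashlib instead of hash(): Python salts str hashes per process,
    # which would send the same document to different shards on restart.
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class Shard:
    """Toy inverted index standing in for one index on its own drive."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.num_docs = 0

    def add(self, doc_id: str, text: str) -> None:
        self.num_docs += 1
        for term in text.lower().split():
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def idf(self, term: str) -> float:
        # Local IDF: uses only this shard's document counts (the caveat above).
        df = len(self.postings.get(term, {}))
        return math.log((self.num_docs + 1) / (df + 1)) + 1.0

    def search(self, term: str):
        """Return (score, doc_id) pairs scored with tf * local idf."""
        idf = self.idf(term)
        return [(tf * idf, doc_id) for doc_id, tf in self.postings.get(term, {}).items()]


def build_shards(docs: dict, num_shards: int):
    """Distribute {doc_id: text} across shards by document hash."""
    shards = [Shard() for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id, num_shards)].add(doc_id, text)
    return shards


def multi_search(shards, term: str, k: int = 10):
    """Fan the query out to every shard and merge the hits by score."""
    hits = []
    for shard in shards:
        hits.extend(shard.search(term))
    return sorted(hits, reverse=True)[:k]
```

For example, if "lucene" appears in one of two documents on one shard and in both documents on another, the first shard computes the higher IDF. A document there then outranks equally matching documents on the other shard purely because of where it was hashed, which is the relevance caveat mentioned above.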

Peter Mateja
peter.mateja@gmail.com



On Thu, Nov 18, 2010 at 1:28 PM, Karell Ste-Marie
<stemarie@brain-bank.com> wrote:

> Alex,
>
> I stand corrected and offer my apologies. Lucene.NET will run on Azure:
>
> http://social.msdn.microsoft.com/Forums/en/windowsazure/thread/eefa15f3-fef3-4ade-bc3a-ec7fb8e137d9
> It's not elegant, but it will work using an Azure Drive (the equivalent of,
> say, a filesystem drive).
>
> My claim was based on the outdated fact that when we started using
> Lucene.NET a few years back, it needed to create temporary files before
> inserting them into the indexes. It seems that this is either no longer a
> problem now that Azure is out of CTP, or never was.
>
> I will say that while it does work, it seems that only one Azure node may
> use an IndexWriter at any given time, which means that only one node may
> gain write access at once. An Azure Cloud Drive is only ever writable by one
> worker process at a time, since the Azure infrastructure switches the
> "shared drive" to read-only mode for other nodes. Therefore, if you have,
> say, two applications using Lucene.NET at the same time, they will not both
> be able to update the index unless you place Lucene.NET on a central node
> which the other two nodes then access (a client/server model). This is not
> a Lucene.NET limitation at this point, it is an Azure Drive limitation;
> however, the fact that Lucene.NET needs a filesystem forces this limitation
> upon us.
>
>
>
> Karell Ste-Marie
> C.I.O. - BrainBank Inc
>
>
> -----Original Message-----
> From: Alex Thompson [mailto:pierogitus@hotmail.com]
> Sent: Thursday, November 18, 2010 1:20 PM
> To: lucene-net-dev@lucene.apache.org
> Subject: RE: Lucene project announcement
>
> Karell,
> What would be the problem with Lucene.Net in Azure?
>
> -----Original Message-----
> From: Karell Ste-Marie [mailto:stemarie@brain-bank.com]
> Sent: Thursday, November 18, 2010 6:01 AM
> To: lucene-net-dev@lucene.apache.org
> Cc: dev@lucene.apache.org
> Subject: RE: Lucene project announcement
>
> I'm forced to politely disagree with some of these thoughts; let me explain
> why:
>
> In order for this technique to be successful, it seems that as much work
> is being poured into the porting technique as into the port itself. From
> my point of view, this looks like double the work for benefits that are
> perhaps not as good as they could be (I am not saying in any way that
> Lucene.NET today is not good; it is quite good and is the result of great
> efforts from a lot of very dedicated people), since it follows a Java-style
> design which is great for the Java world, but perhaps not always optimal
> for the C# world. The project should be doing one thing, either:
> A) make a great Java to C# porting tool
> B) make a great search engine in C#
>
> As an example, it would be a hair-pulling experience to take Lucene.NET as
> it is today and use it on Microsoft Azure, an environment that is
> specifically designed for .NET applications.
>
> As I said before, besides using Lucene.NET itself I haven't contributed
> much, and only in discussions; I haven't committed any code. However, I
> will say this: I personally neither know nor care about the Java language,
> just as I'm sure many of you don't care about Prolog. In order to help
> out, I feel that I need to be able to read and understand the Lucene
> version in order to make the same stuff happen in the Lucene.NET version.
> Does this mean I have to be both a Java and a C# developer at the same
> time?
>
> Mathematicians have been using math to explain algorithms for years; it is
> a universal language that is understood (to different levels) by all.
>
> How those functional algorithms are implemented in an imperative language
> makes no difference, so long as they are implemented and produce the
> intended result.
>
> I think that in the end, there should be at least 3 projects for Lucene:
> 1. The Lucene algorithms, in a platform-neutral language: let the search
>    engine gurus define how this should be done without having to worry
>    about imperative programming and the hacks needed to get there; either a
>    compiler or a manual process would be used to implement these algorithms.
> 2. Lucene architecture of the project(s): perhaps a lot of UML here, in a
>    format that can be fed to a tool to quickly produce skeleton files.
> 3.x. Lucene language-specific versions.
>
> As Grant points out, it is up to the community to make a decision, so
> let's all get together and see if we can collectively reach one.
>
> And for the record, I personally think that when an open source project has
> 3+ ports to the same language, there is a problem. What that problem is,
> however, I won't venture a guess.
>
> I make these comments for the good of the project(s); it is in no way my
> intention to offend anyone. I salute all the work and effort done thus far;
> we would not be here were it not for everyone involved.
>
>
> Karell Ste-Marie
> C.I.O. - BrainBank Inc
>
> -----Original Message-----
> From: Alex Thompson [mailto:pierogitus@hotmail.com]
> Sent: Thursday, November 18, 2010 3:58 AM
> To: lucene-net-dev@lucene.apache.org
> Subject: RE: Lucene project announcement
>
> I don't think Lucene.Net staying a line-by-line port is craziness. We're
> not saying that Lucene.Net is the one true implementation and there can be
> no others. I see Lucene.Net as part of a spectrum of solutions.
>
> On one end of the spectrum is IKVM. If you want all the Java Lucene
> features immediately and the constraints of IKVM work for your scenario,
> then great, off you go.
> Then there is Lucene.Net. This is good if IKVM doesn't work for you, you
> want short lag time behind Java Lucene (yes, this needs improvement, but
> we're working on it), and you want the ability to read Java Lucene
> books/examples and apply them relatively seamlessly to your .NET code.
> Then, on the other end of the spectrum, are the forks
> (wrapper/extension/refactor etc.) that try to make things ideal for the
> .NET world.
>
> I think it's clear there is interest and support for both Lucene.Net and
> the forks. They should both exist and be complementary, not competitive.
> The forks provide greater flexibility and greater exposure, so more users
> and contributors can get involved. Lucene.Net provides the benefits listed
> above and provides an avenue for features to trickle down from Java Lucene
> to the forks.
>
> So, bottom line: there is no one-size-fits-all implementation. Lucene.Net
> (as a line-by-line port) provides good value to a significant user base
> and (assuming we can optimize the porting) takes relatively little effort,
> so it is a useful part of the spectrum.
>
> Alex
>
> -----Original Message-----
> From: Andrew Busby [mailto:andrew.busby@aimstrategic.com]
> Sent: Wednesday, November 17, 2010 5:06 PM
> To: lucene-net-dev@lucene.apache.org
> Subject: RE: Lucene project announcement
>
> Dear All,
>
> I have not spoken up on this issue yet, but I felt that I could not sit
> in silence any more.
>
> I completely understand the standpoint of the current development team and
> agree with the goals that they are setting out to achieve.
>
> Keep the index format compatible with the Java version: check, great work.
>
> Ensure that a search on the .NET version will return the exact same
> results as the Java version: check, great work.
>
> But,
>
> This is where the sense seems to end. The "it must be a direct port of
> Java" stance is complete craziness.
>
> I am not talking about the use of Java conventions; I do not care if I
> have to get a value using something with a prefix of "get". I am talking
> about not making the best use of the tool at hand, in this case .NET.
>
> There is also a clear indication that the community (maybe just the vocal
> ones) is saying: we want to help, but we feel we can't or are not inclined
> to.
>
> Open source software is about enjoyment, yet this project basically says:
> if you want to help, just translate this code file from Java to C#.
>
> Was this not a punishment at school? Translate this passage from Latin to
> English during your break time!
>
> This whole discussion started because the Lucene.Net project made an
> announcement that it needs help because things are not working. It now
> appears that we are going to carry on using the same model. Isn't the
> definition of insanity "continuing to do the same thing and expecting a
> different result"?
>
> If an automated tool is going to do the work, great, but is a community
> even needed? I doubt there will be much for anyone to get involved with,
> except fixing API conflicts between NUnit and JUnit, which can probably be
> scripted anyway.
>
> I have seen several people rush out and create their own forks with big
> promises (I know one of them is even being backed by codeproject.com).
> Would it not be better to try to channel all of the energy of these people
> into a branch, homed within Apache, which is the best place for it, and see
> what they come up with?
>
> It is a no-lose situation: the current trunk will continue as is, but
> something great may appear that everyone is happy with and end this unrest.
>
> Before everyone shouts that people should be putting their efforts into the
> current trunk version: it is just not going to happen. You cannot jump up
> and down and say "we are in charge, you must commit our way (it says so on
> the web page), or your energy is not welcome."
>
> In reality, watching the current events unfold, I cannot see much
> changing. Maybe one or two new committers, but most people will just wait
> for the new automated tool to get set up, for the Java guys to fix the bugs
> and the tool to keep the versions up to date, or for the current committers
> to get really "pissed off" at continually coming under fire and give up
> (the worst possible outcome).
>
> Having said all of that, I just want to say thank you to all of the
> Lucene.Net committers that have got us to this point. You should be proud
> of what you have achieved, and that you actually do have a community that
> wants to see the project continue.
>
> Anyway, that's just how I see things.
>
> Thanks,
>
> Andrew
>
>
>
