lucenenet-dev mailing list archives

From Zachary Gramana <>
Subject Re: Reviving DistributedSearch
Date Mon, 20 Aug 2012 16:37:51 GMT

Thanks for the feedback. You may not have been looking for a reply this long and complex, but I wanted to share my thinking and validate some assumptions with the group before I get too much further down the road.

Let me walk you through where my thinking is at, and see what you think.

First, some observations:

* MultiSearcher and RemoteSearchable are deprecated in Java Lucene starting with 3.1, and for good reason: among other issues, they have bugs related to scoring.

* IndexReader, as the service interface, results in excessive network chatter. Query, to my mind, sounds like the right abstraction: parse an incoming query request once, distribute the Query objects to core instances, then merge the results. IndexSearcher in 3.3 implements a TopDocs merge method, so this approach seems promising. It would also enable each core to use a request queue to handle concurrent requests. Query, Filter, etc., have been marked serializable for a long time.
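The scatter/gather flow above (parse once, fan the query out, merge the per-core results) can be sketched in a language-neutral way. This is only an illustration of the merge step, not Lucene.NET's actual API; the hit tuples and core names are hypothetical:

```python
import heapq
import itertools

def merge_top_docs(per_core_hits, k):
    """Merge already-ranked (score, core, doc_id) hit lists from each
    core into a single global top-k, analogous to merging TopDocs."""
    # Each core's list is sorted by descending score, so an n-way heap
    # merge only ever inspects the head of each list.
    merged = heapq.merge(*per_core_hits, key=lambda hit: -hit[0])
    return list(itertools.islice(merged, k))

# The "cores" here are just precomputed, pre-ranked hit lists; a real
# deployment would send the same parsed Query object to each core.
core_a = [(0.9, "a", 12), (0.4, "a", 7)]
core_b = [(0.8, "b", 3), (0.7, "b", 9), (0.1, "b", 1)]
top = merge_top_docs([core_a, core_b], 3)
```

Because each core returns its hits already ranked, the merge is cheap relative to the search itself, which is what makes the one-round-trip-per-query design attractive.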

* I like Solr's separated Web/Core approach. The remoting-based approaches buy into a few
of the 8 fallacies of distributed computing. The web/core approach, not so much.

* Java Lucene has recently delegated distributed search to Solr (and ElasticSearch, Katta, IndexTank, etc.) in v3.1 and later. This says that (a) distributed search is hard, and (b) it requires solving problems that are beyond the scope of Lucene. Unfortunately, it also highlights the lack of a .NET Solr analog.

These observations lead me to the following questions:

1. Jeez, it would be nice if we had a .NET Solr-ish project. Kidding, kidding. Kind of.
2. Should distributed search live in Contribs, or in another project altogether?
3. Is there value in an in-between solution for #2? Perhaps something like a Solr-Core-only implementation, or a reference implementation that tackles a limited set of requirements?

I should disclose here that my interest in this code is part of a broader project that I'm running at my place of employment. The project will be released as open source once it hits minimum viable product (it's not proprietary, just early in development). It is tightly integrated with ServiceStack, and it is currently self-hosted, with an IIS host coming.

That said, Web API is very ServiceStack-like, though ServiceStack has some additional benefits: .NET 3.5 and Mono support, out-of-the-box protocol buffers integration (plus around another two dozen serialization formats, including a very fast JSON serializer), nice cache and auth interfaces, and a simple plugin architecture. It's also based on the request/response pattern using strongly-typed DTOs, which I am a big proponent of. My project leverages these features quite a bit.
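For anyone unfamiliar with the strongly-typed DTO style, here is a minimal sketch of the pattern itself. The DTO names and handler are invented for illustration; this is not ServiceStack's actual API:

```python
from dataclasses import dataclass, field

# Request and response are plain, strongly-typed DTOs: every field the
# service accepts or returns is declared up front, which is what makes
# the contract easy to serialize (JSON, protocol buffers, etc.).
@dataclass
class SearchRequest:
    query: str
    page: int = 1
    page_size: int = 10

@dataclass
class SearchResponse:
    total: int
    doc_ids: list = field(default_factory=list)

def handle(request: SearchRequest) -> SearchResponse:
    # A real service would hand the parsed query to the search cores;
    # this stub just returns fixed hits to show the typed round trip.
    hits = ["doc-1", "doc-2"]
    return SearchResponse(total=len(hits), doc_ids=hits)

response = handle(SearchRequest(query="lucene"))
```

The appeal is that the wire format falls out of the DTO definitions, so the same request type can travel as JSON between browsers and the web tier, or as protocol buffers between cores.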

I anticipate following a model similar to Solr's Web/Core split. The biggest questions I'm currently wrestling with are #2 and #3. Should the core be able to stand alone in a limited capacity? If so, does it make sense for it to live in Contribs? I would naturally prefer to use ServiceStack to build it, consistent with the rest of my project. I would also take advantage of its protocol buffers support to improve performance, since this would be a peer-to-peer API rather than a client-server API. However, if a standalone core were to live in Contribs, I would want to make sure most people have a comfort level with that dependency.

When I think of all of the features that need to be implemented in a core, like configuration
and authentication, I start heading back towards distributed search living outside of Contribs.

- Zack

On Aug 17, 2012, at 8:43 PM, Nicholas Paldino [.NET/C# MVP] <> wrote:

> Zach,
> Just a suggestion: maybe go the Web API route with self-hosting (which allows for something
> more RESTful and with good bindings for JSON, XML, et al.):
> - Nick
