Date: Mon, 27 Apr 2009 10:48:02 +0300
Message-ID: <5937015f0904270048p7ec68ee2rd5e04bdd3abff513@mail.gmail.com>
In-Reply-To: <103c59c90904262249i1a77e31cu9207c189fb1a013f@mail.gmail.com>
Subject: Re: Luke-0.9.x cannot open index files
From: digy digy
To: lucene-net-dev@incubator.apache.org

It is not a bug in Lucene.Net; as my sample code shows, Lucene.Net works well with Chinese field names. I think it is a bug in Luke.

DIGY

On Mon, Apr 27, 2009 at 8:49 AM, Floyd Wu wrote:

> Hi Digy,
> Thanks for your help.
> But if a Chinese field name is the problem, will it be "fixed" in Lucene.Net,
> or how can I avoid this problem?
>
> Chinese field names are by design and probably not avoidable.
>
> Floyd
>
> 2009/4/25 Digy
>
> > I think I found the bug.
> > Here is the dump of the original index:
> >
> > NUMDOCS: 3
> > MAXDOCS: 7
> > DELETED(0): True
> > DELETED(1): True
> > DELETED(2): False
> > DELETED(3): True
> > DELETED(4): True
> > DELETED(5): False
> > DELETED(6): False
> > TERM(0): _l_activationdatetime:552877632000000000
> > TERM(1): _l_author:admin
> > TERM(2): _l_bookmarkcount:0
> > TERM(3): _l_clix:0
> > TERM(4): _l_clix:1
> > TERM(5): _l_creationdatetime:633427319866778624
> > TERM(6): _l_creationdatetime:633427324812559872
> > TERM(7): _l_creationdatetime:633760609388437504
> > TERM(8): _l_deactivationdatetime:155377824000000000
> > TERM(9): _l_deactivationdatetime:155378687999969792
> > TERM(10): _l_document_class:1
> > TERM(11): _l_document_class:98305
> > TERM(12): _l_folder:163841
> > TERM(13): _l_folder:163843
> > TERM(14): _l_hidden:aaa
> > TERM(15): _l_last_modified_datetime:633427319866778624
> > TERM(16): _l_last_modified_datetime:633427324812559872
> > TERM(17): _l_last_modified_datetime:633760609388437504
> > TERM(18): _l_meta:abc
> > TERM(19): _l_meta:abc.ppt
> > TERM(20): _l_meta:ddx
> > TERM(21): _l_meta:doc
> > TERM(22): _l_meta:xyz
> > TERM(23): _l_meta:名
> > TERM(24): _l_meta:問
> > TERM(25): _l_meta:有
> > TERM(26): _l_meta:檔
> > TERM(27): _l_meta:測
> > TERM(28): _l_meta:看
> > TERM(29): _l_meta:試
> > TERM(30): _l_meta:還
> > TERM(31): _l_meta:題
> > TERM(32): _l_parentdocument:196609
> > TERM(33): _l_parentdocument:327681
> > TERM(34): _l_parentdocument:557057
> > TERM(35): _l_ratingavg:0
> > TERM(36): _l_ratingmedian:0
> > TERM(37): _l_ratingstdev:0
> > TERM(38): _l_ratingsum:0
> > TERM(39): _l_read_permission:admin
> > TERM(40): _l_rootdocument:196609
> > TERM(41): _l_rootdocument:327681
> > TERM(42): _l_rootdocument:557057
> > TERM(43): _l_state:0
> > TERM(44): _l_state:2
> > TERM(45): _l_summary:2123456789
> > TERM(46): _l_summary:abc
> > TERM(47): _l_summary:abc.ppt
> > TERM(48): _l_summary:ddx
> > TERM(49): _l_summary:doc
> > TERM(50): _l_summary:xyz
> > TERM(51): _l_summary:有
> > TERM(52): _l_summary:還
> > TERM(53): _l_title:123
> > TERM(54): _l_title:class
> > TERM(55): _l_title:default
> > TERM(56): _l_title:document
> > TERM(57): _l_title:名
> > TERM(58): _l_title:問
> > TERM(59): _l_title:檔
> > TERM(60): _l_title:測
> > TERM(61): _l_title:看
> > TERM(62): _l_title:試
> > TERM(63): _l_title:題
> > TERM(64): _l_unique_key:196609
> > TERM(65): _l_unique_key:327681
> > TERM(66): _l_unique_key:557057
> > TERM(67): _l_version:1
> > TERM(68): 作者:123
> > TERM(69): 摘要:2123456789
> > TERM(70): 摘要:abc
> > TERM(71): 摘要:abc.ppt
> > TERM(72): 摘要:ddx
> > TERM(73): 摘要:doc
> > TERM(74): 摘要:xyz
> > TERM(75): 摘要:有
> > TERM(76): 摘要:還
> > TERM(77): 標題:123
> > TERM(78): 標題:class
> > TERM(79): 標題:default
> > TERM(80): 標題:document
> > TERM(81): 標題:名
> > TERM(82): 標題:問
> > TERM(83): 標題:檔
> > TERM(84): 標題:測
> > TERM(85): 標題:看
> > TERM(86): 標題:試
> > TERM(87): 標題:題
> > TERM(88): 關鍵詞:123
> >
> > And here is some sample code: it reads the docs from the original index and
> > then writes them to a new one.
> >
> > void CreateNewIndex(string OrgIndex)
> > {
> >     IndexReader reader = IndexReader.Open(OrgIndex);
> >     IndexWriter writer = new IndexWriter("Floyd", new Lucene.Net.Analysis.WhitespaceAnalyzer(), true);
> >
> >     for (int i = 0; i < reader.MaxDoc(); i++)
> >     {
> >         // Skip documents that are marked as deleted.
> >         if (reader.IsDeleted(i)) continue;
> >
> >         Lucene.Net.Documents.Document orgDoc = reader.Document(i);
> >         System.Collections.IList fields = orgDoc.GetFields();
> >
> >         Lucene.Net.Documents.Document newDoc = new Lucene.Net.Documents.Document();
> >         foreach (Lucene.Net.Documents.Field field in fields)
> >         {
> >             Lucene.Net.Documents.Field newField = new Lucene.Net.Documents.Field(
> >                 // Base64-encode the field name so that it contains no Chinese characters.
> >                 System.Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(field.Name())),
> >                 //field.Name(), // swap this line in to keep the original (Chinese) field name
> >                 field.StringValue(),
> >                 field.IsStored() ? Lucene.Net.Documents.Field.Store.YES : Lucene.Net.Documents.Field.Store.NO,
> >                 field.IsTokenized() ? Lucene.Net.Documents.Field.Index.TOKENIZED : Lucene.Net.Documents.Field.Index.UN_TOKENIZED);
> >
> >             newDoc.Add(newField);
> >         }
> >         writer.AddDocument(newDoc);
> >     }
> >
> >     writer.Close();
> >     reader.Close();
> > }
> >
> > If some field names are Chinese, then Luke returns “read past EOF”. But if
> > those field names are replaced with non-Chinese names, then it works.
> >
> > DIGY
> >
> > -----Original Message-----
> > From: Granroth, Neal V. [mailto:neal.granroth@thermofisher.com]
> > Sent: Friday, April 24, 2009 8:53 PM
> > To: lucene-net-dev@incubator.apache.org
> > Subject: Luke-0.9.x cannot open index files
> >
> > Digy,
> >
> > Some additional information from the discussion on the lucene-net-user list
> > with Floyd Wu.
> >
> > I ran some further tests using Java Lucene 2.3.2 and JDK 1.5.
> >
> > The Java equivalents of the two small test applications I use to inspect an
> > index and compact it, function identically to the .NET versions (that were
> > built with VS2005 and Lucene.NET 2.3.1).
> >
> > That Luke cannot open the index appears to be a problem within Luke.
> >
> > Even if Floyd's index contains some odd entries, Java Lucene 2.3.2 does not
> > flag the index as corrupt; and both the Java and .NET versions report the
> > same index content before and after the optimize operation.
> >
> > -- Neal
> >
> > Neal Granroth
> > Software Engineer, Molecular Spectroscopy
> > Thermo Fisher Scientific
> > 5225 Verona Road, Madison, WI 53711
> > neal.granroth@thermofisher.com
> > Tel: 608-276-5645
> > Fax: 608-276-6328
> > www.thermofisher.com
> >
> > -----Original Message-----
> > From: Digy (JIRA) [mailto:jira@apache.org]
> > Sent: Wednesday, April 08, 2009 6:28 PM
> > To: lucene-net-dev@incubator.apache.org
> > Subject: [jira] Commented: (LUCENENET-169) Changes to make Lucene.NET
> > compatible with ASP.NET Medium Trust Level, in hosting environments (like
> > GoDaddy...)
> >
> > [ https://issues.apache.org/jira/browse/LUCENENET-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697335#action_12697335 ]
> >
> > Digy commented on LUCENENET-169:
> > --------------------------------
> >
> > Although you can overcome all of them somehow;
> >
> > * controlling the lifetime of IndexWriter/IndexReader in a natural manner,
> > * reopening the IndexReader only when needed, using (for example)
> >   FileSystemWatcher,
> > * providing a separation between the data and business layers,
> > * providing an interface to other apps that may want to write their own user
> >   interface,
> > * accessing a single search service from different web apps or from
> >   load-balanced web servers,
> > * controlling the lifetime of the searching/indexing code (without being
> >   affected by the automatic restart of the IIS processes when, for example,
> >   some memory limit is exceeded),
> > * the ability to access some system resources that can be restricted by IIS,
> > etc.
> >
> > make me think a separate search service is a better idea. But in the end, it
> > is your design decision.
> > (Think of a WebApp + Solr in the Java world.)
> >
> > DIGY
> >
> > > Changes to make Lucene.NET compatible with ASP.NET Medium Trust Level, in
> > > hosting environments (like GoDaddy...)
> > > -------------------------------------------------------------------------
> > >
> > > Key: LUCENENET-169
> > > URL: https://issues.apache.org/jira/browse/LUCENENET-169
> > > Project: Lucene.Net
> > > Issue Type: Improvement
> > > Environment: ASP.NET
> > > Reporter: Corey Trager
> > > Attachments: FSDirectory.patch
> > >
> > > Microsoft has a configuration file for shared hosting for what they call
> > > "Medium Trust".
> > > There are a couple places in FSDirectory.cs that violate the restrictions
> > > of Medium Trust, but I coded workarounds, shown below.
> > >
> > > #1)
> > > // Corey Trager, Oct 2008: Commented call to GetTempPath to workaround permission restrictions at shared host.
> > > // LOCK_DIR isn't used anyway.
> > > public static readonly System.String LOCK_DIR = null; // SupportClass.AppSettings.Get("Lucene.Net.lockDir", System.IO.Path.GetTempPath());
> > >
> > > #2)
> > > /// Returns an array of strings, one for each Lucene index file in the directory.
> > > public override System.String[] List()
> > > {
> > >     /* Changes by Corey Trager, Oct 2008, to workaround permission restrictions at shared host */
> > >     System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(directory.FullName);
> > >     System.IO.FileInfo[] files = dir.GetFiles();
> > >     string[] list = new string[files.Length];
> > >     for (int i = 0; i < files.Length; i++)
> > >     {
> > >         list[i] = files[i].Name;
> > >     }
> > >     return list;
> > >     /* end of changes */
> > >     // System.String[] files = SupportClass.FileSupport.GetLuceneIndexFiles(directory.FullName, IndexFileNameFilter.GetFilter());
> > >     // for (int i = 0; i < files.Length; i++)
> > >     // {
> > >     //     System.IO.FileInfo fi = new System.IO.FileInfo(files[i]);
> > >     //     files[i] = fi.Name;
> > >     // }
> > >     // return files;
> > > }
> >
> > --
> > This message is automatically generated by JIRA.
> > -
> > You can reply to this email to add a comment to the issue online.
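
The field-name substitution that CreateNewIndex performs earlier in this thread (replacing each field name with the Base64 form of its UTF-8 bytes, so that the new index contains only ASCII field names) can be sketched in isolation. This is a minimal illustration only: FieldNameCodec and its Encode/Decode methods are hypothetical names that do not exist in Lucene.Net, and the Decode direction is an assumption about how an application might map the encoded names back to the original Chinese ones when reading such an index.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of Lucene.Net): converts field names to
// ASCII-only Base64 strings and back again.
public static class FieldNameCodec
{
    // Encode a field name as the Base64 form of its UTF-8 bytes,
    // as the sample code in the thread does inline.
    public static string Encode(string fieldName)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(fieldName));
    }

    // Recover the original field name from its Base64 form.
    public static string Decode(string encodedName)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(encodedName));
    }
}

public static class Program
{
    public static void Main()
    {
        string original = "\u6A19\u984C"; // one of the Chinese field names from the dump
        string encoded = FieldNameCodec.Encode(original);
        string decoded = FieldNameCodec.Decode(encoded);

        Console.WriteLine(encoded);             // ASCII-only name to store in the index
        Console.WriteLine(decoded == original); // the round trip should be lossless
    }
}
```

Note that Base64 output may contain '+', '/' and '=' characters, so an application that builds query strings from these encoded names would probably need to escape them; that concern is outside what the thread discusses.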