lucenenet-dev mailing list archives

From: michael herndon <mhern...@michaelherndon.com>
Subject: Re: Paul: why are we using sbyte?
Date: Tue, 08 Apr 2014 12:35:37 GMT
Try using a projection with .Select(o => (sbyte)o).ToArray();

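using System.Linq;
using System.Text;

public static class StringExtensions // illustrative name; extension methods need a static class
{
    public static sbyte[] GetBytes(this string str, string encoding)
    {
        // Copies every byte into a new sbyte[]; in the default unchecked context 128-255 wrap to -128..-1
        return Encoding.GetEncoding(encoding)
            .GetBytes(str)
            .Select(o => (sbyte)o).ToArray();
    }
}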

http://dotnetfiddle.net/rfAOeB
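A minimal sketch of the projection approach, assuming a UTF-8 encoding name and a hypothetical ProjectionSketch class (not part of Lucene.Net). Select(...).ToArray() allocates a genuine sbyte[], so the result has no hidden byte[] identity. The cost is one extra copy per call:

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper class for illustration; not part of Lucene.Net.
public static class ProjectionSketch
{
    // Copies each encoded byte into a brand-new sbyte[] (the projection approach).
    // unchecked makes the wrap explicit: 0x80..0xFF become -128..-1 even if the
    // project is compiled with overflow checking enabled.
    public static sbyte[] GetSBytes(string str, string encoding)
    {
        return Encoding.GetEncoding(encoding)
            .GetBytes(str)
            .Select(o => unchecked((sbyte)o))
            .ToArray();
    }

    // The projected array's runtime type really is sbyte[],
    // so Array.Copy into another sbyte[] has nothing to object to.
    public static sbyte[] CopyWithArrayCopy(sbyte[] source)
    {
        var target = new sbyte[source.Length];
        Array.Copy(source, target, source.Length);
        return target;
    }
}
```

For example, GetSBytes("\u00e9", "utf-8") should yield { -61, -87 }, since UTF-8 encodes U+00E9 as 0xC3 0xA9.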


On Mon, Apr 7, 2014 at 6:20 PM, Paul Irwin <pirwin@feature23.com> wrote:

> Yes, that works until you call Array.Copy, which Lucene does left and
> right. The call to Array.Copy will throw an exception if the array has
> been (what it considers) improperly cast like that.
>
> See all the references to System.arraycopy (Java equiv of Array.Copy) here:
>
> https://github.com/apache/lucene-solr/search?q=%22System.ArrayCopy%22&type=Code
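Paul's objection can be checked with a small sketch. ArrayCopySketch is a hypothetical helper, not Lucene.Net code. The double cast leaves the object's runtime type as byte[], and Array.Copy validates the runtime element types of both arrays rather than their declared types. Paul reports that this failed on the runtime he was using. Because the copy rules for same-size signed and unsigned element types are a runtime detail, the helper reports the outcome instead of assuming it:

```csharp
using System;

// Hypothetical helper class for illustration; not part of Lucene.Net.
public static class ArrayCopySketch
{
    // Reinterprets a byte[] as sbyte[] without copying; the runtime type stays byte[].
    public static sbyte[] Reinterpret(byte[] bytes) => (sbyte[])(Array)bytes;

    // Attempts Array.Copy from the reinterpreted array into a genuine sbyte[].
    // Array.Copy checks the arrays' runtime element types (byte vs. sbyte), not
    // their static types, so it may reject the copy. Returns false if it does.
    public static bool TryArrayCopy(byte[] bytes, out sbyte[] copy)
    {
        sbyte[] view = Reinterpret(bytes);
        var target = new sbyte[view.Length];
        try
        {
            Array.Copy(view, target, view.Length);
            copy = target;
            return true;
        }
        catch (Exception e) when (e is ArrayTypeMismatchException || e is InvalidCastException)
        {
            copy = Array.Empty<sbyte>();
            return false;
        }
    }
}
```

Copying with Buffer.BlockCopy, or making a real sbyte[] up front as in the projection above, avoids relying on this check at all.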
>
>
>
>
> On Mon, Apr 7, 2014 at 6:11 PM, Itamar Syn-Hershko <itamar@code972.com> wrote:
>
> > You don't need to array-copy - a simple cast should work. Can you test
> > this as well?
> >
> > I have this and it seems to work:
> >
> > public static sbyte[] getBytes(this string str, string encoding)
> >         {
> >             return
> > (sbyte[])(Array)Encoding.GetEncoding(encoding).GetBytes(str);
> >         }
> >
> > --
> >
> > Itamar Syn-Hershko
> > http://code972.com | @synhershko <https://twitter.com/synhershko>
> > Freelance Developer & Consultant
> > Author of RavenDB in Action <http://manning.com/synhershko/>
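The following sketch makes Itamar's cast concrete, using a hypothetical CastViewSketch class. The CLR considers byte[] and sbyte[] cast-compatible, so the cast succeeds without copying. The object is still a byte[] underneath, and that is exactly the situation Paul's Array.Copy concern is about:

```csharp
using System;
using System.Text;

// Hypothetical helper class for illustration; not part of Lucene.Net.
public static class CastViewSketch
{
    // The double cast through Array compiles, and the CLR accepts it at run time
    // because byte[] and sbyte[] are cast-compatible. No copy is made: the
    // returned reference points at the same byte[] object.
    public static sbyte[] GetBytesView(string str, string encoding)
    {
        return (sbyte[])(Array)Encoding.GetEncoding(encoding).GetBytes(str);
    }

    // Shows that both references share one buffer: a write through the
    // sbyte[] reference is visible when the same slot is read as a byte.
    public static bool SharesStorage(byte[] bytes)
    {
        sbyte[] view = (sbyte[])(Array)bytes;
        view[0] = -1; // stores the bit pattern 0xFF
        return object.ReferenceEquals(view, bytes) && bytes[0] == 255;
    }
}
```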
> >
> >
> > On Tue, Apr 8, 2014 at 1:03 AM, Paul Irwin <pirwin@feature23.com> wrote:
> >
> > > There were a few spots where I added a byte[] version as well for
> > > convenience, but not everywhere. And you have to use BlockCopy... you get
> > > an exception if you try to Array.Copy an sbyte[] to a byte[] or vice
> > > versa, even though the storage in memory is virtually identical.
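Here is Paul's Buffer.BlockCopy workaround, sketched with hypothetical helper names. Buffer.BlockCopy copies the raw bytes of primitive arrays, so the source and destination element types do not have to match. Its offsets and count are in bytes, which for 1-byte elements equals the array length:

```csharp
using System;

// Hypothetical helper class for illustration; not part of Lucene.Net.
public static class BlockCopySketch
{
    // sbyte[] -> byte[]: same bits, read back as 0..255.
    public static byte[] ToBytes(sbyte[] source)
    {
        var result = new byte[source.Length];
        Buffer.BlockCopy(source, 0, result, 0, source.Length);
        return result;
    }

    // byte[] -> sbyte[]: same bits, read back as -128..127.
    public static sbyte[] ToSBytes(byte[] source)
    {
        var result = new sbyte[source.Length];
        Buffer.BlockCopy(source, 0, result, 0, source.Length);
        return result;
    }
}
```

Both directions produce arrays whose runtime type matches their declared type, so later Array.Copy calls on them behave normally.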
> > >
> > > And feel free to use my code for your Java-to-C# porting project; it
> > > does Pascal casing and .NET naming conventions (I for interfaces,
> > > etc.). It uses Roslyn for C# generation:
> > > https://github.com/paulirwin/javatocsharp
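The thread doesn't show how javatocsharp implements its naming rules. Purely as an illustration, a naive version of the two conventions Paul mentions (Pascal-cased members and an I prefix for interfaces) might look like this. NamingSketch and its methods are hypothetical, and a real converter also has to handle acronyms, keywords, and name collisions:

```csharp
using System;

// Hypothetical helper class for illustration; not the javatocsharp implementation.
public static class NamingSketch
{
    // Naive camelCase -> PascalCase: upper-case the first character only,
    // e.g. "getBytes" -> "GetBytes".
    public static string ToPascalCase(string javaName)
    {
        if (string.IsNullOrEmpty(javaName)) return javaName;
        return char.ToUpperInvariant(javaName[0]) + javaName.Substring(1);
    }

    // .NET convention: interface names get an "I" prefix,
    // e.g. Java's "IndexableField" -> "IIndexableField".
    public static string ToInterfaceName(string javaInterfaceName)
    {
        return "I" + ToPascalCase(javaInterfaceName);
    }
}
```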
> > >
> > >
> > > On Mon, Apr 7, 2014 at 5:04 PM, Itamar Syn-Hershko <itamar@code972.com> wrote:
> > >
> > > > I'm pretty sure there's no need to BlockCopy, as the underlying binary
> > > > representation is the same. I'm just wondering whether we should change
> > > > this internally, or find the places where it hurts and provide a byte[]
> > > > API as well.
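Itamar's second option could look roughly like this: keep sbyte[] internally and add byte[] entry points where it hurts. BinaryValueSketch is a hypothetical type, not the Lucene.Net Field API. It converts a byte[] into a genuine sbyte[] copy, so the stored array never carries the byte[] runtime type into later Array.Copy calls:

```csharp
using System;

// Hypothetical type for illustration only; it is not the Lucene.Net Field API.
public sealed class BinaryValueSketch
{
    private readonly sbyte[] value;

    // Core constructor keeps sbyte[], matching the signed-byte Java port.
    public BinaryValueSketch(sbyte[] value)
    {
        this.value = value ?? throw new ArgumentNullException(nameof(value));
    }

    // Convenience overload for callers holding a byte[] (for example, the result
    // of Encoding.GetBytes). Array.ConvertAll builds a real sbyte[] copy.
    public BinaryValueSketch(byte[] value)
        : this(Array.ConvertAll(
            value ?? throw new ArgumentNullException(nameof(value)),
            b => unchecked((sbyte)b)))
    {
    }

    public int Length => value.Length;

    public sbyte this[int index] => value[index];
}
```

The byte[] overload trades one copy per call for a public API that accepts what .NET encoders return directly.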
> > > >
> > > > I'm working on porting the tests now. I think we'd better have all tests
> > > > ported and running (and passing) before we make these kinds of decisions.
> > > >
> > > > BTW, it is now much easier to port tests: you basically copy-paste and
> > > > almost everything works. I'm also working with a friend on some Java-to-C#
> > > > auto-conversion, including camelCase to PascalCase, using Reflection.
> > > >
> > > >
> > > >
> > > > On Mon, Apr 7, 2014 at 11:49 PM, Paul Irwin <pirwin@feature23.com> wrote:
> > > >
> > > > > Hey Itamar,
> > > > >
> > > > > There was existing Lucene.net code that used sbyte, but one of the things I
> > > > > ran into while porting is that Java heavily uses negative constants for
> > > > > bytes, since its bytes are signed. Also, IIRC, there were some
> > > > > greater-than/less-than comparisons that would break if the values wrapped
> > > > > around to between 128 and 255. I tried going down the route of making
> > > > > everything byte instead of sbyte but kept running into incompatibilities.
> > > > > It was easier, and arguably more true to the Java code, to keep it sbyte.
> > > > > Using Buffer.BlockCopy instead of Array.Copy (the .NET equivalent of Java's
> > > > > System.arraycopy) works to transform the sbyte arrays to byte arrays.
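Here is a small illustration of the comparison problem Paul describes, with hypothetical helper names. The same bit patterns order differently depending on whether they are read as sbyte (like Java's signed byte) or as byte. The usual Java idiom b & 0xFF for reading a signed byte as unsigned carries over to C# unchanged:

```csharp
using System;

// Hypothetical helper class for illustration; not part of Lucene.Net.
public static class SignednessSketch
{
    // Java's byte is signed, so bit patterns 0x80..0xFF compare as -128..-1.
    // Keeping sbyte preserves that ordering when comparisons are ported verbatim.
    public static bool LessThanSigned(sbyte a, sbyte b) => a < b;

    // With byte, the same bit patterns are 128..255, so the ordering flips
    // for any pair that straddles 0x7F/0x80.
    public static bool LessThanUnsigned(byte a, byte b) => a < b;

    // Java idiom for reading a signed byte as unsigned: the sbyte is
    // sign-extended to int, then masked back to 0..255.
    public static int ToUnsigned(sbyte b) => b & 0xFF;
}
```

For instance, the bit pattern 0x80 is less than 0x7F when read as an sbyte (-128 < 127) but greater when read as a byte (128 > 127). That is why ported comparisons break when the element type is switched.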
> > > > >
> > > > > I'm open to any suggestions, and please by all means try changing it,
> > > > > but it became a royal pain, and once I got it working with sbyte I
> > > > > didn't pursue the matter further.
> > > > >
> > > > > Paul
> > > > >
> > > > >
> > > > > On Mon, Apr 7, 2014 at 4:41 PM, Itamar Syn-Hershko <itamar@code972.com> wrote:
> > > > >
> > > > > > Hi Paul,
> > > > > >
> > > > > > Please refer to this commit:
> > > > > >
> > > > > > https://github.com/apache/lucene.net/commit/8c23317c905d79823fd168ede778820439c8b163
> > > > > >
> > > > > > Why have you moved to using sbyte?
> > > > > >
> > > > > > I know this is one of the differences between Java and .NET, but we are
> > > > > > on .NET and should allow using byte.
> > > > > >
> > > > > > Having the Field implementation expect sbyte[] is almost useless, as
> > > > > > Encoding.GetEncoding(encoding).GetBytes(str), for example, returns
> > > > > > byte[].
> > > > > >
> > > > > > Can we change it back, please, so it uses byte everywhere, especially in
> > > > > > the public-facing API?
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Paul Irwin
> > > > > Lead Software Engineer
> > > > > feature[23]
> > > > >
> > > > > Email: pirwin@feature23.com
> > > > > Cell: 863-698-9294
> > > > >
> > > >
> > >
> > >
> > >
> > >
> >
>
>
>
>
