groovy-users mailing list archives

From Keegan Witt <keeganw...@gmail.com>
Subject Re: UTF16 BOM in new PrintWriter() vs withPrintWriter()
Date Tue, 09 Jun 2015 12:11:23 GMT
That's only available in Java 7.  Isn't Groovy still targeting 1.6 for the
non-indy version?
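Since `StandardCharsets` needs Java 7, a Java 6-compatible way to get the same object comparison is to resolve the two charsets once with `Charset.forName`. This is a minimal sketch, assuming the only goal is to decide which BOM (if any) to write. `BomCheck` and `bomFor` are hypothetical names, not Groovy API. On Java 7+ the two constants could simply be `StandardCharsets.UTF_16BE` and `StandardCharsets.UTF_16LE`.

```java
import java.nio.charset.Charset;

public class BomCheck {
    // Resolved once at class load. Charset.forName has existed since Java 1.4,
    // so this avoids the Java 7-only StandardCharsets class.
    private static final Charset UTF_16BE = Charset.forName("UTF-16BE");
    private static final Charset UTF_16LE = Charset.forName("UTF-16LE");

    // Hypothetical helper: returns the BOM to write for the named charset,
    // or an empty array when no BOM is needed. Charset.equals compares
    // canonical names, so case variants such as "utf-16le" also match.
    static byte[] bomFor(String charsetName) {
        Charset cs = Charset.forName(charsetName);
        if (UTF_16BE.equals(cs)) {
            return new byte[] {(byte) 0xFE, (byte) 0xFF};
        } else if (UTF_16LE.equals(cs)) {
            return new byte[] {(byte) 0xFF, (byte) 0xFE};
        }
        return new byte[0];
    }

    public static void main(String[] args) {
        System.out.println(bomFor("utf-16le").length); // 2
        System.out.println(bomFor("UTF-8").length);    // 0
    }
}
```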

-Keegan
On Jun 9, 2015 7:56 AM, "Guillaume Laforge" <glaforge@gmail.com> wrote:

> Well spotted!
>
> You could also compare against the StandardCharsets constants instead of going through
> the name comparison:
>
> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/StandardCharsets.html
>
> 2015-06-09 13:49 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>
>> No, it's a Groovy bug.
>>
>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>     if ("UTF-16BE".equals(charset)) {
>>         writeUtf16Bom(stream, true);
>>     } else if ("UTF-16LE".equals(charset)) {
>>         writeUtf16Bom(stream, false);
>>     }
>> }
>>
>> should be
>>
>> private static void writeUTF16BomIfRequired(final String charset, final OutputStream stream) throws IOException {
>>     if ("UTF-16BE".equals(Charset.forName(charset).name())) {
>>         writeUtf16Bom(stream, true);
>>     } else if ("UTF-16LE".equals(Charset.forName(charset).name())) {
>>         writeUtf16Bom(stream, false);
>>     }
>> }
>>
>> in org.codehaus.groovy.runtime.ResourceGroovyMethods.  We'll probably
>> want to fix that regardless of what we decide on the *withPrintWriter*
>> question.  I'll open a Jira and a PR.
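To see what the proposed change buys, the original and fixed conditions can be compared side by side. This minimal sketch assumes the only difference of interest is how the charset name is matched. `isUtf16LeExact` and `isUtf16LeCanonical` are hypothetical names that mirror the two conditions, not methods in `ResourceGroovyMethods`.

```java
import java.nio.charset.Charset;

public class CharsetNameCheck {
    // Mirrors the original check: exact, case-sensitive string comparison.
    static boolean isUtf16LeExact(String charset) {
        return "UTF-16LE".equals(charset);
    }

    // Mirrors the proposed fix: resolve the name first, then compare
    // the charset's canonical name.
    static boolean isUtf16LeCanonical(String charset) {
        return "UTF-16LE".equals(Charset.forName(charset).name());
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-16LE", "utf-16le", "Utf-16Le"}) {
            System.out.println(name + ": exact=" + isUtf16LeExact(name)
                    + ", canonical=" + isUtf16LeCanonical(name));
        }
    }
}
```

`Charset.forName` matches names case-insensitively and resolves aliases, so only the canonical check accepts `utf-16le`. It also throws `IllegalCharsetNameException` or `UnsupportedCharsetException` for a bad name, where the exact comparison would silently skip the BOM.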
>>
>> -Keegan
>>
>>
>>
>> On Tue, Jun 9, 2015 at 3:21 AM, Guillaume Laforge <glaforge@gmail.com>
>> wrote:
>>
>>> From Groovy's point of view (i.e., when you're coding in Groovy), the BOM
>>> is automatically discarded when you use one of our reader methods
>>> (withReader, etc.), so it's transparent whether the BOM is there or not.
>>>
>>> I tend to think that always having the BOM is a good thing (I even
>>> thought it was mandatory), but Groovy should detect the endianness
>>> regardless.
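Discarding the BOM on read matters because the JDK's `UTF-16LE` and `UTF-16BE` decoders do not strip it: per the `java.nio.charset.Charset` Javadoc, they decode a leading BOM as U+FEFF (ZERO WIDTH NO-BREAK SPACE). The sketch below shows one way a reader can drop it. It illustrates the idea under that assumption and is not Groovy's actual implementation; `skipBom` and `decode` are hypothetical helpers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class BomSkippingReader {
    // Hypothetical helper: drops one leading U+FEFF, which is how a BOM
    // surfaces after decoding with UTF-16LE or UTF-16BE.
    static Reader skipBom(Reader in) throws IOException {
        PushbackReader pr = new PushbackReader(in, 1);
        int first = pr.read();
        if (first != -1 && first != '\uFEFF') {
            pr.unread(first); // not a BOM: put the character back
        }
        return pr;
    }

    // Decodes all bytes with the named charset, optionally skipping a leading BOM.
    static String decode(byte[] bytes, String charsetName, boolean dropBom) {
        try {
            Reader r = new InputStreamReader(
                    new ByteArrayInputStream(bytes), Charset.forName(charsetName));
            if (dropBom) {
                r = skipBom(r);
            }
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            r.close();
            return sb.toString();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xFF, (byte) 0xFE, 0x20, 0x00}; // withPrintWriter output
        System.out.println(decode(bytes, "UTF-16LE", false).length()); // 2: U+FEFF and the space
        System.out.println(decode(bytes, "UTF-16LE", true).length());  // 1: just the space
    }
}
```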
>>>
>>> Happy to hear what others think too about all this though.
>>>
>>> Guillaume
>>>
>>>
>>> 2015-06-08 23:20 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>
>>>> The code as-is today writes the BOM regardless of platform.  I just
>>>> tested on Linux with the same results.  I think there are 2 parts to the
>>>> question of "what's the correct behavior?"
>>>>
>>>> 1.  Should the BOM be written at all, particularly when the platform is
>>>> Windows?
>>>> 2.  Should the behavior of *withPrintWriter* differ (even if the
>>>> difference is to be smarter) from the behavior of *new PrintWriter*?
>>>>
>>>> *Discussion*
>>>> 1.  Strictly speaking, yes, because RFC 2781
>>>> <http://tools.ietf.org/html/rfc2781> states in section 4.3 that text without
>>>> a BOM should be interpreted as big-endian.  However, in practice, many applications
>>>> disregard the RFC and assume little-endian because that's what Windows
>>>> does
>>>> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd374101%28v=vs.85%29.aspx>.
>>>> Because of this, the behavior could be changed so that when writing
>>>> UTF-16LE on Windows, it doesn't write the BOM.  But in my opinion, it's
>>>> best practice to always write a BOM when working with UTF-16, and Java
>>>> should have done this in its PrintWriter implementation.
>>>>
>>>> 2.  This is a tough one.  Arguably, *withPrintWriter* is doing the
>>>> smarter, more correct behavior, but the typical user would assume this is
>>>> just a shorthand convenience for newing up a PrintWriter (I certainly
>>>> did).  So the question is, is it better to just document this difference in
>>>> the GroovyDoc?  Or to change the behavior to be closer to Java?  And if the
>>>> latter, what breakages would that cause within Groovy itself?  Making that
>>>> change could break folks in production, because they could rely on that BOM
>>>> being there, in cases for example where the file is created on Windows, but
>>>> then processed on Linux or when working with a third party library that is
>>>> more picky about the presence of a BOM.
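The RFC 2781 default from point 1 is visible in the JDK itself: the `java.nio.charset.Charset` Javadoc says the generic `UTF-16` decoder honors a leading BOM but falls back to big-endian without one. This minimal sketch decodes the two byte sequences from the original test with that charset; the class and method names are illustrative.

```java
import java.nio.charset.Charset;

public class Utf16Default {
    // The generic "UTF-16" charset: its decoder honors a leading BOM and
    // defaults to big-endian when none is present.
    private static final Charset UTF_16 = Charset.forName("UTF-16");

    // Decodes with the generic UTF-16 charset and returns the resulting text.
    static String decode(byte[] bytes) {
        return new String(bytes, UTF_16);
    }

    public static void main(String[] args) {
        byte[] noBom = {0x20, 0x00};                           // new PrintWriter output
        byte[] leBom = {(byte) 0xFF, (byte) 0xFE, 0x20, 0x00}; // withPrintWriter output

        // No BOM: read as big-endian, so 20 00 becomes U+2000.
        System.out.printf("no BOM: U+%04X%n", (int) decode(noBom).charAt(0));
        // FF FE BOM: switches to little-endian and drops the BOM, giving U+0020.
        System.out.printf("LE BOM: U+%04X%n", (int) decode(leBom).charAt(0));
    }
}
```

So the BOM-less `20 00` reads back as U+2000 (EN QUAD) rather than a space for any reader that applies the RFC default, which is the kind of mismatch the BOM guards against.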
>>>>
>>>> -Keegan
>>>>
>>>> On Mon, Jun 8, 2015 at 4:32 PM, Guillaume Laforge <glaforge@gmail.com>
>>>> wrote:
>>>>
>>>>> Now... whether this is what should be done is the real question :-)
>>>>> Does Windows manage to open UTF-16 files without BOMs?
>>>>>
>>>>> 2015-06-08 22:17 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>>>
>>>>>> I forgot to mention that.  Yes, I ran the test mentioned on Windows.
>>>>>>
>>>>>> On Mon, Jun 8, 2015 at 3:54 PM, Guillaume Laforge <glaforge@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> That's a good question.
>>>>>>> I guess this is happening on Windows? (I haven't tried here, since
>>>>>>> I'm on OS X)
>>>>>>> I think BOMs were mandatory in text files on Windows.
>>>>>>>
>>>>>>> 2015-06-08 17:53 GMT+02:00 Keegan Witt <keeganwitt@gmail.com>:
>>>>>>>
>>>>>>>> I've always taken a perverse pleasure in character encoding
>>>>>>>> problems.  I was intrigued by this SO question
>>>>>>>> <http://stackoverflow.com/questions/30538461/why-groovy-file-write-with-utf-16le-produce-bom-char> on
>>>>>>>> UTF-16 BOMs in Java vs Groovy.
>>>>>>>>
>>>>>>>> It appears using withPrintWriter(charset) produces a BOM whereas new
>>>>>>>> PrintWriter(file, charset) does not.  As demonstrated here:
>>>>>>>>
>>>>>>>> File file = new File("tmp.txt")try {
>>>>>>>>     String text = " "
>>>>>>>>     String charset = "UTF-16LE"
>>>>>>>>
>>>>>>>>     file.withPrintWriter(charset) { it << text }
>>>>>>>>     println "withPrintWriter"
>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it)
}
>>>>>>>>
>>>>>>>>     PrintWriter w = new PrintWriter(file, charset)
>>>>>>>>     w.print(text)
>>>>>>>>     w.close()
>>>>>>>>     println "\n\nnew PrintWriter"
>>>>>>>>     file.getBytes().each { System.out.format("%02x ", it)
}} finally {
>>>>>>>>     file.delete()}
>>>>>>>>
>>>>>>>> Outputs
>>>>>>>>
>>>>>>>> withPrintWriter
>>>>>>>> ff fe 20 00
>>>>>>>>
>>>>>>>> new PrintWriter
>>>>>>>> 20 00
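For comparison on the plain-Java side, the JDK's own UTF-16 charsets show where a BOM does and does not appear: per the `java.nio.charset.Charset` Javadoc, `UTF-16BE` and `UTF-16LE` never write a BOM, while the unmarked `UTF-16` charset encodes big-endian with a leading big-endian BOM. This minimal sketch uses `String.getBytes` instead of a file; the class and method names are illustrative.

```java
import java.nio.charset.Charset;

public class Utf16Encodings {
    // Encodes text and formats the bytes like the Groovy test ("%02x ").
    static String encodeHex(String text, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(Charset.forName(charsetName))) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-16LE", "UTF-16BE", "UTF-16"}) {
            System.out.println(name + ": " + encodeHex(" ", name));
        }
    }
}
```

This prints `20 00`, `00 20` and `fe ff 00 20` respectively, so plain Java only emits a BOM through the generic `UTF-16` charset, and then only a big-endian one.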
>>>>>>>>
>>>>>>>>
>>>>>>>> Is this difference in behavior intentional?  It seems kinda odd to
>>>>>>>> me.
>>>>>>>>
>>>>>>>> -Keegan
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Guillaume Laforge
>>>>>>> Groovy Project Manager
>>>>>>> Product Ninja & Advocate at Restlet <http://restlet.com>
>>>>>>>
>>>>>>> Blog: http://glaforge.appspot.com/
>>>>>>> Social: @glaforge <http://twitter.com/glaforge> / Google+
>>>>>>> <https://plus.google.com/u/0/114130972232398734985/posts>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>
