ant-dev mailing list archives

From Anton Vodonosov <>
Subject Re: Including national characters int Tar Entry names into archive.
Date Tue, 30 Jun 2009 17:01:29 GMT

Hello Stefan,

On 2009-05-26, you wrote:

> On 2009-05-19, aborisevich <> wrote:
> > I have found the following bug using the present
> > package. A tar archive was created on one system (for example Windows
> > XP, default charset CP-1251). This tar archive contains TarEntries
> > that were named using national characters like German umlauts. Then
> > this archive file was copied to a Linux system (default charset UTF-8).
> > After unpacking the archive file there, information was lost
> > (the TarEntry names were lost). There is a possible solution for this
> > problem.
> While I agree that the current handling is unfortunate, your solution
> (UTF-8 encoding file names) probably doesn't really help either.
> There are various dialects of the tar file format and your solution
> would create yet another one only extractable by Ant.
> So far I haven't found a description of the latest POSIX tar format
> but if you follow the public information of older formats they are
> extraordinarily vague about file names that contain non-ASCII
> characters.  There simply doesn't seem to be a common method to encode
> them.

> If you take BSD's tar(5) man page
> (e.g. <>)
> you'll see in the Pax section that pax specifically puts non-ASCII
> file names into a separate entry and the manual points out this could
> hold non-ASCII characters (which sort of implies the "normal" name
> part was ASCII only).

> I don't think there is a real solution.  Implementing POSIX 2001
> compliance in the tar package and making Ant use that (at the whim of
> a user option) would be a long term plan, though.
> Stefan

Of course, hardcoding UTF-8 is not an option.

But why not allow the user to specify the file name encoding as a
parameter (which would default to the current Java platform encoding)?

In this case the default behavior will be unchanged, while interested
users have the flexibility to choose a particular encoding.

The current implementation allows national characters in file names
anyway, but with an encoding parameter the user will have more control
over how they are represented.
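
To illustrate the idea, here is a minimal sketch of how an optional
encoding parameter could behave. The class and method names
(TarNameEncodingSketch, encodeName, decodeName) are hypothetical and are
not part of Ant's tar package; only the standard java.nio.charset.Charset
API is assumed. Passing null stands for "parameter not set", in which
case the platform default is used, as happens today.

```java
import java.nio.charset.Charset;

// Hypothetical illustration only; not Ant's actual tar implementation.
class TarNameEncodingSketch {

    // Resolve the user-supplied encoding name, or fall back to the
    // platform default when the parameter is not set (null).
    static Charset resolve(String encoding) {
        return (encoding == null)
                ? Charset.defaultCharset()
                : Charset.forName(encoding);
    }

    // Turn an entry name into the bytes that would be written to the header.
    static byte[] encodeName(String name, String encoding) {
        return name.getBytes(resolve(encoding));
    }

    // Turn header bytes back into an entry name when reading an archive.
    static String decodeName(byte[] raw, String encoding) {
        return new String(raw, resolve(encoding));
    }

    public static void main(String[] args) {
        String name = "gr\u00fc\u00dfe.txt"; // "gruesse.txt" with u-umlaut and sharp s

        byte[] latin1 = encodeName(name, "ISO-8859-1");
        byte[] utf8 = encodeName(name, "UTF-8");

        System.out.println(latin1.length); // 9: one byte per character
        System.out.println(utf8.length);   // 11: the two non-ASCII characters take two bytes each

        // Reading with the same encoding that was used for writing restores the name.
        System.out.println(decodeName(utf8, "UTF-8").equals(name)); // true

        // Reading with a different encoding does not.
        System.out.println(decodeName(utf8, "ISO-8859-1").equals(name)); // false
    }
}
```

The writer and the reader still have to agree on the encoding, so this
does not make archives portable by itself; it only gives the user
explicit control instead of relying on whatever the platform default
happens to be.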

I am interested in this possibility and can provide a patch if you
find the solution reasonable.

Best regards,
- Anton
