ant-dev mailing list archives

From Stefan Bodewig <>
Subject Re: Including national characters in Tar Entry names into archive.
Date Tue, 26 May 2009 12:27:03 GMT
On 2009-05-19, aborisevich <> wrote:

> I have found the following bug in the present
> package. A tar archive was created on one system (for example Windows
> XP, default charset CP-1251). This tar archive contains TarEntries
> whose names use national characters like German umlauts. Then
> this archive file was copied to a Linux system (default charset UTF-8),
> and after unpacking the archive there, information was lost
> (the TarEntry names were lost). There is a possible solution for this
> problem.
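The loss described above can be reproduced without tar at all. What follows is a minimal Java sketch, assuming the writer encoded the name with windows-1251 and the reader decoded the stored header bytes as UTF-8. The class and method names (`CharsetMismatchDemo`, `roundTrip`) are hypothetical. A Cyrillic name is used because windows-1251 is a Cyrillic code page with no umlauts.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Hypothetical helper: encode a file name with the writer's charset, then
    // decode the raw bytes with the reader's charset. This mimics a tar header
    // that stores the name as plain bytes without recording their encoding.
    static String roundTrip(String name, Charset writer, Charset reader) {
        byte[] headerBytes = name.getBytes(writer);   // unmappable chars become '?'
        return new String(headerBytes, reader);       // malformed input becomes U+FFFD
    }

    public static void main(String[] args) {
        // "Файл.txt" written with escapes so the source file encoding does not matter
        String original = "\u0424\u0430\u0439\u043B.txt";
        Charset windows = Charset.forName("windows-1251"); // assumed Windows XP default
        Charset linux = StandardCharsets.UTF_8;            // assumed Linux default

        String extracted = roundTrip(original, windows, linux);
        System.out.println("original:  " + original);
        System.out.println("extracted: " + extracted);
        System.out.println("same name? " + original.equals(extracted));
    }
}
```

Only the ASCII suffix survives. The Cyrillic bytes are not valid UTF-8, so the decoder replaces them, and the original name cannot be recovered from the result.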

While I agree that the current handling is unfortunate, your solution
(encoding file names as UTF-8) probably doesn't really help either.

There are various dialects of the tar file format and your solution
would create yet another one only extractable by Ant.

So far I haven't found a description of the latest POSIX tar format,
but the publicly available descriptions of older formats are
extraordinarily vague about file names that contain non-ASCII
characters.  There simply doesn't seem to be a common method to encode
them.

If you take BSD's tar(5) man page
(e.g. <>)
you'll see in the pax section that pax specifically puts non-ASCII
file names into a separate header entry, and the manual points out that
this entry may hold non-ASCII characters (which sort of implies that the
"normal" name field was meant to be ASCII only).
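For illustration, here is a rough Java sketch of a single pax extended-header record in the form POSIX.1-2001 describes: `<length> <keyword>=<value>\n`, where the length counts every byte of the record including its own digits, and the value is UTF-8. This is not the API of Ant's tar package. `PaxRecordSketch` and `paxRecord` are hypothetical names, and a real writer would still have to wrap such records in a separate header entry (typeflag `x`) padded to 512-byte blocks.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class PaxRecordSketch {

    // Hypothetical helper: builds one record "<length> <keyword>=<value>\n".
    // <length> is the decimal byte count of the whole record, including the
    // length digits themselves; keyword and value are encoded as UTF-8.
    static byte[] paxRecord(String keyword, String value) {
        byte[] body = (keyword + "=" + value + "\n").getBytes(StandardCharsets.UTF_8);
        int base = body.length + 1; // +1 for the space after the length field

        // The length field counts its own digits, so iterate until it is stable.
        int len = base;
        while (len != base + Integer.toString(len).length()) {
            len = base + Integer.toString(len).length();
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] prefix = (len + " ").getBytes(StandardCharsets.US_ASCII);
        out.write(prefix, 0, prefix.length);
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "Файл.txt" written with escapes so the source file encoding does not matter
        byte[] rec = paxRecord("path", "\u0424\u0430\u0439\u043B.txt");
        System.out.print(new String(rec, StandardCharsets.UTF_8));
        System.out.println("record bytes: " + rec.length);
    }
}
```

The loop is needed because adding the length digits can push the total past a power of ten: a record whose other fields total 98 bytes needs the length field `101`, not `100`.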

I don't think there is a real solution.  Implementing POSIX.1-2001
compliance in the tar package and making Ant use it (controlled by
a user option) would be a long-term plan, though.
