portals-jetspeed-dev mailing list archives

From "Aurelien Pernoud" <apern...@sopragroup.com>
Subject RE: possible character encoding bug
Date Thu, 22 May 2003 13:40:55 GMT

I've run into the same trouble with UTF-8 encoding in my app. As far as I
could work out, here is the problem:

When a browser sends a request to a web server, it should send the
Content-Type header with its charset (in your case UTF-8). Unfortunately,
most of them don't (IE 5 and 6 don't, and even Mozilla doesn't! I think
that's more to act "like IE", but anyway...). When no encoding is
specified, the servlet API treats the request as ISO-8859-1.

In servlet 2.2 (Tomcat 3) there is no way to specify the encoding of the
request (in 2.3 you have request.setCharacterEncoding, which solves
everything), so people came up with this "trick", now also included in
Jetspeed: you get the request parameter as if it were ISO-8859-1 (because
the browser didn't say explicitly that it was UTF-8), then you turn the
string back into bytes and decode them with the right character encoding
(UTF-8).
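The trick described above can be sketched in plain Java. This is only an illustration, not the Jetspeed code: the class and method names are made up, and it uses java.nio.charset.StandardCharsets (Java 7 and later) instead of charset name strings to avoid checked exceptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the ISO-8859-1 -> UTF-8 re-decoding trick.
class RedecodeSketch {

    // Simulates a container that receives UTF-8 bytes from the browser
    // but decodes them as ISO-8859-1 because no charset was declared.
    static String misdecode(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // The trick: ISO-8859-1 maps every byte to exactly one char, so
    // getBytes(ISO_8859_1) recovers the raw bytes, which are then
    // decoded with the charset the browser actually used.
    static String redecode(String misdecoded, Charset actual) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        String sent = "\u00fcbel";            // "übel", written with an escape
        String garbled = misdecode(sent);     // 5 chars: the umlaut became 2
        String fixed = redecode(garbled, StandardCharsets.UTF_8);
        System.out.println(garbled.length());
        System.out.println(fixed.equals(sent));
    }
}
```

The trick only works when the container really did decode the parameter as ISO-8859-1; if it already used the right charset, re-decoding corrupts the string instead.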

Unfortunately, Tomcat 3 and Tomcat 4 don't behave the same way. Tomcat 4
handles this trick perfectly, because request.getCharacterEncoding returns
null, but for some reason I don't know, Tomcat 3 returns ISO-8859-1 there.
Are you by any chance using Tomcat 3?
If so, you can change the parameter parser to be used in
TurbineResources.properties:

services.RunDataService.default.parameter.parser=org.apache.jetspeed.util.parser.DefaultJetspeedParameterParser
#services.RunDataService.default.parameter.parser=org.apache.turbine.util.parser.DefaultParameterParser

The other one may work in your case. It is a real mess, and I haven't found
any way to make it work and stay compatible with API 2.2 on all webapp
servers. The only way to get rid of this for good is to move to 2.3 :(

The relevant code is not the line you pointed to but this, in the setRequest
method:

        if ( req.getCharacterEncoding() != null )
        {
            enc = req.getCharacterEncoding();
        }

That is where the parameter parser tries to find out the request encoding.
Under Tomcat 4 this is fine; under Tomcat 3, getCharacterEncoding isn't null,
so we end up decoding ISO-8859-1 as ISO-8859-1...
This test is there "in case" the browser really sent its encoding, but as
none does (I've tested the most used ones), maybe it should be thrown
away... I don't know. In my app, that's what I finally did.
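To make the difference between the two containers concrete, here is a small standalone sketch of that null test. It does not use the servlet API: the "reported" argument stands in for whatever request.getCharacterEncoding() returns, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the encoding choice made in setRequest.
class EncodingChoiceSketch {

    // Mirrors the null test: use the encoding reported by the request
    // if there is one, otherwise fall back to the configured default.
    static Charset choose(String reported, Charset configured) {
        return reported != null ? Charset.forName(reported) : configured;
    }

    // Re-decodes a parameter that the container read as ISO-8859-1.
    static String decodeParam(String raw, Charset enc) {
        return new String(raw.getBytes(StandardCharsets.ISO_8859_1), enc);
    }

    public static void main(String[] args) {
        // UTF-8 bytes for "übel" as the container hands them over.
        String raw = new String("\u00fcbel".getBytes(StandardCharsets.UTF_8),
                                StandardCharsets.ISO_8859_1);

        // Tomcat 4-like case: nothing reported, so the default is used
        // and the parameter is repaired.
        String ok = decodeParam(raw, choose(null, StandardCharsets.UTF_8));
        System.out.println(ok.equals("\u00fcbel"));

        // Tomcat 3-like case: ISO-8859-1 is reported, so the re-decoding
        // is a no-op and the parameter stays garbled.
        String bad = decodeParam(raw, choose("ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(bad.equals(raw));
    }
}
```

Dropping the test, as I did in my app, amounts to always passing null as the reported encoding, which is only safe as long as browsers keep not declaring a charset.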

For more info on encoding troubles with the 2.2 API, see:
http://www.jguru.com/faq/printablefaq.jsp?topic=I18N

I hope I was clear enough, but as you see this is an awful bug.
Aurelien

Joachim Müller wrote:

> hi, I just want to check back before I submit this
> to bugzilla:
>
>
> there is a possible character encoding bug in
>
> org.apache.jetspeed.util.parser.DefaultJetspeedParameterParser
>
> line 151
>
> return new String(str.getBytes("8859_1"), getCharacterEncoding());
>
>
> this leads to errors using german umlaute when rundata parameter
> are encoded with UTF-8. (eg. in the user name: try user name
> übel, create an account and try to edit the account)
>
> if the rundata encoding is UTF-8 this leads to errors creating the
> string with umlauten. does somebody put the fixed encoding here on
> purpose? If not I would propose this modification:
>
> return new String(str.getBytes(getCharacterEncoding()),
> getCharacterEncoding());



