portals-jetspeed-dev mailing list archives

From Joachim Müller <joac...@wemove.com>
Subject Re: possible character encoding bug
Date Thu, 22 May 2003 16:25:41 GMT

wow, this sounds like a lot of hassle...

maybe I was a bit too quick, but the problem was resolved by
changing the encoding in media.xreg from UTF8 to 8859_1.
this works for me, because I only use German and English.
(quick test: mozilla 1.3, IE6, winxp)

I am using tomcat 4.1.18, and request.getCharacterEncoding()
returns null, as expected.

but again: isn't it dangerous to hard-code the String-to-byte-array
conversion with 8859_1, not knowing what character encoding
the request was sent with?

the String(byte[], charsetName) definition states that the byte[]
must match the charsetName encoding. therefore my thought was
to make the byte[] encoding match the charsetName. but maybe my
thinking is wrong?
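
A small standalone sketch of the two conversions, to make the question concrete. It assumes the container decoded the raw UTF-8 request bytes as ISO-8859-1 (the servlet default described in the quoted reply below); the class and method names are hypothetical and are not Jetspeed code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Assumption: stands in for a container that decodes request bytes
    // as ISO-8859-1 because the browser sent no charset.
    static String containerDecode(byte[] rawRequestBytes) {
        return new String(rawRequestBytes, StandardCharsets.ISO_8859_1);
    }

    // Same shape as new String(str.getBytes(from), to) in the parser.
    static String redecode(String str, Charset from, Charset to) {
        return new String(str.getBytes(from), to);
    }

    public static void main(String[] args) {
        String typed = "\u00fcbel"; // "übel" as typed into the form
        byte[] wire = typed.getBytes(StandardCharsets.UTF_8);
        String misdecoded = containerDecode(wire); // two chars for the umlaut

        // Current code: recover the original bytes, then decode as UTF-8.
        String current = redecode(misdecoded,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        // Proposed change: encode and decode with the same charset.
        String proposed = redecode(misdecoded,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        // Case where getCharacterEncoding() reports ISO-8859-1 itself.
        String latin1Both = redecode(misdecoded,
                StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);

        System.out.println(current.equals(typed));         // true
        System.out.println(proposed.equals(misdecoded));   // true
        System.out.println(latin1Both.equals(misdecoded)); // true
    }
}
```

under that assumption the fixed 8859_1 step is what restores the original bytes, because ISO-8859-1 maps every byte value to exactly one character. using the same charset on both sides just returns the string unchanged, so it cannot repair a value the container has already decoded with the wrong charset.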


joachim


> -----Original Message-----
> From: Aurelien Pernoud [mailto:apernoud@sopragroup.com]
> Sent: Thursday, 22 May 2003 15:41
> To: 'Jetspeed Developers List'
> Subject: RE: possible character encoding bug
> 
> 
> 
> In fact I've run into trouble with UTF-8 encoding in my app too. As
> far as I got, here's the problem:
> 
> When a browser sends a request to a web server, it should send the
> Content-Type header with its charset (in your case UTF-8). But
> unfortunately, most of them don't (IE 5 and 6, and even Mozilla doesn't! I
> think that's more to act "like IE", but anyway...). When no encoding is
> specified, the servlet API says it's ISO-8859-1.
> 
> In servlet 2.2 (tomcat 3), there is no way to specify the encoding
> of the request (in 2.3 you have request.setCharacterEncoding, which saves
> everything), so people came up with this "trick", now also included in
> Jetspeed: you get the request parameter as if it were 8859-1 (because
> browsers didn't say explicitly that it was UTF-8), and then you use the
> correct character encoding (UTF-8) to decode the string.
> 
> Unfortunately, tomcat 3 and tomcat 4 don't work the same way. Tomcat 4
> handles this trick perfectly, because request.getCharacterEncoding returns
> null, but I don't know why tomcat 3 returns ISO-8859-1 there. Are you by any
> chance using tomcat 3?
> If so, you can change the parameter parser to be used in
> TurbineResources.properties:
> 
> services.RunDataService.default.parameter.parser=org.apache.jetspeed.util.parser.DefaultJetspeedParameterParser
> # services.RunDataService.default.parameter.parser=org.apache.turbine.util.parser.DefaultParameterParser
> 
> The other one may work fine in your case. That is a real mess, and I haven't
> found any way to make it work fine and be API 2.2 compatible with all webapp
> servers. The only way to get rid of this is to definitely move to 2.3 :(
> 
> The relevant code is not what you said but here, in the setRequest method:
> 
>         if ( req.getCharacterEncoding() != null )
>         {
>             enc = req.getCharacterEncoding();
>         }
> 
> That's where the parameter parser tries to find out the request encoding.
> Under tomcat 4, ok; under tomcat 3, getCharacterEncoding isn't null, so
> we try to decode 8859-1 to 8859-1...
> This final test is there "in case" the browser really sent its encoding, but
> as none does (I've tested the most used ones) maybe it should be thrown
> away... I don't know. Here in my app that's what I finally did.
> 
> For more info on encoding troubles with the 2.2 API, see this:
> http://www.jguru.com/faq/printablefaq.jsp?topic=I18N
> 
> I hope I was clear enough, but as you see this is an awful bug.
> Aurelien
> 
> Joachim Müller wrote:
> 
> > hi, I just want to check back before I submit this
> > to bugzilla:
> >
> >
> > there is a possible character encoding bug in
> >
> > org.apache.jetspeed.util.parser.DefaultJetspeedParameterParser
> >
> > line 151
> >
> > return new String(str.getBytes("8859_1"), getCharacterEncoding());
> >
> >
> > this leads to errors with German umlauts when rundata parameters
> > are encoded with UTF-8. (e.g. in the user name: try user name
> > übel, create an account and try to edit the account)
> >
> > if the rundata encoding is UTF-8 this leads to errors creating the
> > string with umlauts. did somebody put the fixed encoding here on
> > purpose? If not I would propose this modification:
> >
> > return new String(str.getBytes(getCharacterEncoding()),
> > getCharacterEncoding());
> 
> 