portals-jetspeed-dev mailing list archives

From Shinsuke SUGAYA <shins...@yahoo.co.jp>
Subject Re: possible character encoding bug
Date Thu, 22 May 2003 17:04:41 GMT
Hi,

DefaultJetspeedParameterParser applies the following priorities
when determining the encoding of the form data:

   1) the character encoding of the request body (req.getCharacterEncoding())
   2) character-set parameter in media.xreg
   3) content.defaultencoding in JetspeedResources.properties
   4) US-ASCII
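The lookup order above amounts to a simple fallback chain. Below is a minimal sketch, not the actual Jetspeed code. The class and method names (EncodingPriority, resolve) are hypothetical. The three parameters only stand in for req.getCharacterEncoding(), the character-set value from media.xreg and content.defaultencoding.

```java
/**
 * Hypothetical sketch of the encoding priority described above.
 * Not the real DefaultJetspeedParameterParser implementation.
 */
final class EncodingPriority {

    private EncodingPriority() {}

    /**
     * Returns the first non-null candidate, falling back to US-ASCII.
     *
     * @param requestEncoding stands in for req.getCharacterEncoding()  (priority 1)
     * @param mediaCharset    stands in for character-set in media.xreg (priority 2)
     * @param defaultEncoding stands in for content.defaultencoding     (priority 3)
     */
    static String resolve(String requestEncoding, String mediaCharset, String defaultEncoding) {
        if (requestEncoding != null) {
            return requestEncoding;
        }
        if (mediaCharset != null) {
            return mediaCharset;
        }
        if (defaultEncoding != null) {
            return defaultEncoding;
        }
        return "US-ASCII"; // priority 4: last-resort default
    }

    public static void main(String[] args) {
        // Tomcat 4, browser sent no charset: the request encoding is null
        System.out.println(resolve(null, "UTF-8", "ISO-8859-1"));         // UTF-8
        // Tomcat 3 reports ISO-8859-1, which overrides the media.xreg setting
        System.out.println(resolve("ISO-8859-1", "UTF-8", "ISO-8859-1")); // ISO-8859-1
    }
}
```

The second call shows the Tomcat 3 problem. If the container always reports ISO-8859-1, step 1) always yields a value, and steps 2) to 4) are never consulted.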

I had thought it better for Jetspeed to use the encoding of the
requested page. However, Tomcat 3 seems to return ISO-8859-1 from
getCharacterEncoding(), so Jetspeed cannot decode the form data
correctly. I therefore think that 1) should be removed to fix this
issue. Please delete the following code and check whether the problem
goes away.

DefaultJetspeedParameterParser.java:
@@ -127,10 +127,6 @@
              }

          }
-        if ( req.getCharacterEncoding() != null )
-        {
-            enc = req.getCharacterEncoding();
-        }
          setCharacterEncoding( enc );
      }
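The effect of the patch can be simulated with the ISO-8859-1 round trip that the parser performs: new String(str.getBytes("8859_1"), getCharacterEncoding()), at line 151 of the same class. This sketch assumes the container has already decoded the UTF-8 form bytes as ISO-8859-1. The names FormDecodingDemo, containerDecode and redecode are hypothetical. StandardCharsets.ISO_8859_1 is used in place of the "8859_1" alias.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical simulation of the parser's ISO-8859-1 round trip. */
final class FormDecodingDemo {

    private FormDecodingDemo() {}

    /** Servlet 2.2 container, no charset sent by the browser: bytes are read as ISO-8859-1. */
    static String containerDecode(byte[] rawBody) {
        return new String(rawBody, StandardCharsets.ISO_8859_1);
    }

    /** Parser fix-up: recover the original bytes via ISO-8859-1, then decode them with enc. */
    static String redecode(String str, String enc) {
        return new String(str.getBytes(StandardCharsets.ISO_8859_1), Charset.forName(enc));
    }

    public static void main(String[] args) {
        // The user name from the bug report (u-umlaut followed by "bel"), UTF-8 encoded
        byte[] submitted = "\u00fcbel".getBytes(StandardCharsets.UTF_8);
        String received = containerDecode(submitted); // 5 chars of mojibake

        // Before the patch on Tomcat 3: enc = req.getCharacterEncoding() = ISO-8859-1
        System.out.println(redecode(received, "ISO-8859-1").equals(received)); // true: nothing fixed

        // After the patch: enc comes from media.xreg or JetspeedResources, e.g. UTF-8
        System.out.println(redecode(received, "UTF-8").equals("\u00fcbel"));   // true
    }
}
```

With enc set to ISO-8859-1, the round trip is an identity. This is the "decode 8859-1 to 8859-1" situation described in the quoted reply. With UTF-8 configured, the original name comes back.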

Please let me know if you have any problems.

Regards,
  shinsuke

Aurelien Pernoud wrote:
> In fact, I've run into encoding trouble with UTF-8 in my app too. As far
> as I can tell, here's the problem:
> 
> When a browser sends a request to a web server, it should send the
> Content-Type header with its charset (in your case UTF-8). Unfortunately,
> most browsers don't (IE 5 and 6, and even Mozilla doesn't! I suspect it's
> to act "like IE", but anyway...). When no encoding is specified, the
> servlet API says it's ISO-8859-1.
> 
> In Servlet 2.2 (Tomcat 3) there is no way to specify the encoding of the
> request (2.3 has request.setCharacterEncoding(), which solves everything),
> so people came up with a "trick", now also included in Jetspeed: you read
> the request parameter as if it were ISO-8859-1 (because the browser didn't
> say explicitly that it was UTF-8), turn it back into bytes with ISO-8859-1,
> and then decode those bytes with the right character encoding (UTF-8).
> 
> Unfortunately, Tomcat 3 and Tomcat 4 don't behave the same way. Tomcat 4
> handles this trick perfectly, because request.getCharacterEncoding()
> returns null, but for some reason Tomcat 3 returns ISO-8859-1 there. Are
> you by any chance using Tomcat 3?
> If so, you can change the parameter parser to be used in
> TurbineResources.properties:
> 
> services.RunDataService.default.parameter.parser=org.apache.jetspeed.util.parser.DefaultJetspeedParameterParser
> #
> services.RunDataService.default.parameter.parser=org.apache.turbine.util.parser.DefaultParameterParser
> 
> The other one may work in your case. It's a real mess, and I haven't
> found any way to make it work correctly while staying Servlet 2.2
> compatible on all web containers. The only way out is to move to 2.3 :(
> 
> The relevant code is not the line you quoted but this one, in the
> setRequest method:
> 
>         if ( req.getCharacterEncoding() != null )
>         {
>             enc = req.getCharacterEncoding();
>         }
> 
> That's where the parameter parser tries to determine the request encoding.
> Under Tomcat 4 this works; under Tomcat 3, getCharacterEncoding() isn't
> null, so we end up decoding ISO-8859-1 as ISO-8859-1 and nothing is fixed...
> This final test is there "in case" the browser really sent its encoding,
> but as none does (I've tested the most used ones) maybe it should be
> thrown away... I don't know. That's what I finally did in my app.
> 
> For more info on encoding trouble with the 2.2 API, see:
> http://www.jguru.com/faq/printablefaq.jsp?topic=I18N
> 
> I hope I was clear enough, but as you see this is an awful bug.
> Aurelien
> 
> Joachim Müller wrote:
> 
> 
>>hi, I just want to check back before I submit this
>>to bugzilla:
>>
>>
>>there is a possible character encoding bug in
>>
>>org.apache.jetspeed.util.parser.DefaultJetspeedParameterParser
>>
>>line 151
>>
>>return new String(str.getBytes("8859_1"), getCharacterEncoding());
>>
>>
>>this leads to errors with German umlauts when RunData parameters
>>are encoded in UTF-8 (e.g. in the user name: try the user name
>>übel, create an account and then try to edit the account).
>>
>>if the RunData encoding is UTF-8, creating the string with umlauts
>>fails. Did somebody put the fixed encoding here on purpose? If not,
>>I would propose this modification:
>>
>>return new String(str.getBytes(getCharacterEncoding()),
>>getCharacterEncoding());
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: jetspeed-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: jetspeed-dev-help@jakarta.apache.org
> 
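This concerns Joachim's proposal quoted above. Encoding a string and decoding it again with the same charset returns it unchanged for UTF-8 (barring unpaired surrogates). So the proposed line would leave the container's ISO-8859-1 mojibake as it is. The hard-coded 8859_1 is what undoes that decoding. Below is a small sketch comparing the two forms. The names ProposalCheck, original and proposed are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical comparison of the existing line 151 and the proposed replacement. */
final class ProposalCheck {

    private ProposalCheck() {}

    /** Existing form: undo the container's ISO-8859-1 decoding, then decode with enc. */
    static String original(String str, String enc) {
        return new String(str.getBytes(StandardCharsets.ISO_8859_1), Charset.forName(enc));
    }

    /** Proposed form: encode and decode with the same charset (an identity for UTF-8). */
    static String proposed(String str, String enc) {
        Charset cs = Charset.forName(enc);
        return new String(str.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // u-umlaut + "bel" submitted as UTF-8 but decoded by the container as ISO-8859-1
        String received = new String("\u00fcbel".getBytes(StandardCharsets.UTF_8),
                                     StandardCharsets.ISO_8859_1);
        System.out.println(original(received, "UTF-8").equals("\u00fcbel")); // true: repaired
        System.out.println(proposed(received, "UTF-8").equals(received));    // true: unchanged
    }
}
```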






