flume-user mailing list archives

From Deepak Subhramanian <deepak.subhraman...@gmail.com>
Subject Re: Json over netcat source
Date Thu, 08 May 2014 17:24:08 GMT
Hi Ashish,

Thanks for the solution. I made the changes and I can see the JSON message
now. There is a JIRA raised on the same issue.

https://issues.apache.org/jira/browse/FLUME-2126
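
For anyone reading this message before the quoted thread below: the change swaps builder.field(fieldName, tmp) for builder.field(fieldName, tmp.string()). The sketch below shows why a type-dispatching method can end up storing an object address. It is only an illustration under assumptions: TinyBuilder and field() are hypothetical stand-ins written for this example, not the real XContentBuilder API.

```java
// Hypothetical stand-in classes; this is NOT the real Elasticsearch API.
final class FieldDispatchSketch {

    /** Stand-in for the nested builder: holds JSON text, does not override toString(). */
    static final class TinyBuilder {
        private final String json;

        TinyBuilder(String json) {
            this.json = json;
        }

        /** Returns the JSON text held by this builder. */
        String string() {
            return json;
        }
    }

    /** Type-dispatching writer: any type it does not recognise falls through to toString(). */
    static String field(String name, Object value) {
        if (value instanceof String) {
            return name + "=" + value;
        } else if (value instanceof Number) {
            return name + "=" + value;
        } else {
            // Final else branch: only the default Object.toString() is available,
            // which yields the class name plus a hash code.
            return name + "=" + value.toString();
        }
    }

    public static void main(String[] args) {
        TinyBuilder tmp = new TinyBuilder("{\"action\":{\"id\":\"00001\"}}");
        System.out.println(field("@message", tmp));          // object identity, not JSON
        System.out.println(field("@message", tmp.string())); // the JSON text itself
    }
}
```

Passing the string form takes the String branch, so the JSON text is indexed, but still as a single string value.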


From Hive, when I load JSON data it automatically splits the JSON fields into
different columns. For some reason the ESSink doesn't load it the same way.
I am not sure if I am setting the correct type. There is a parameter,
es.input.json, that I have to set to true in the Hive table. Is there a similar
variable I have to set for ESSink?

Here is the raw data I am getting in Kibana.

{
  "_index": "test-2014-05-08",
  "_type": "parsed_logs",
  "_id": "7qSBgRx-Q_GLaCDWARs_Cg",
  "_score": null,
  "_source": {
    "@message": "{\"action\":{\"id\":\"00001\"}}",
    "@timestamp": "2014-05-08T16:48:44.180Z",
    "@type": "application/json",
    "@fields": {
      "_attachment_mimetype": "application/json",
      "timestamp": "1399567724180",
      "_type": "application/json",
      "type": "application/json"
    }
  },
  "sort": [
    1399567724180
  ]
}
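
In the document above, "@message" holds the whole JSON body as one escaped string, so there are no separate fields to split into columns. The sketch below contrasts the two shapes using plain string handling. It is only an illustration: the class and method names are made up for this example, the escaping covers just backslashes and double quotes, and it says nothing about how ESSink should be configured.

```java
// Illustrative only: hand-built JSON text, no Flume or Elasticsearch classes involved.
final class MessageFieldSketch {

    /** Minimal escaping (backslash and double quote only) for text placed inside a JSON string. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Writes the body as a single JSON string value, the shape seen in Kibana above. */
    static String asStringField(String name, String jsonBody) {
        return "{\"" + name + "\":\"" + escape(jsonBody) + "\"}";
    }

    /** Splices the body in as nested JSON; assumes jsonBody is already valid JSON. */
    static String asObjectField(String name, String jsonBody) {
        return "{\"" + name + "\":" + jsonBody + "}";
    }

    public static void main(String[] args) {
        String body = "{\"action\":{\"id\":\"00001\"}}";
        System.out.println(asStringField("@message", body)); // one opaque string
        System.out.println(asObjectField("@message", body)); // nested object with action.id
    }
}
```

Only the second shape exposes action.id as its own field to whatever indexes the document.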



On Sun, Apr 13, 2014 at 4:56 PM, Ashish <paliwalashish@gmail.com> wrote:

> A little more on the issue.
>
> builder.field(fieldName, tmp); calls the XContentBuilder API, where the class
> type is determined and the appropriate method is called. Since tmp, which is
> an instance of XContentBuilder, doesn't match any of the defined if
> conditions, it goes to the final else, where tmp.toString() is called and the
> field(String, String) method is invoked, so we get the object address in the
> index.
>
> Replacing
> builder.field(fieldName, tmp);
> with
> builder.field(fieldName, tmp.string());
>
> shall make things work, but I am not sure if this would be the best way to
> use the API.
>
> Got the answer from ES user list :)
>
> http://elasticsearch-users.115913.n3.nabble.com/Issue-with-posting-json-data-to-elastic-search-via-Flume-td4054017.html
>
> Can ES experts comment on the best way forward?
>
>
>
> On Sun, Apr 13, 2014 at 8:10 PM, Ashish <paliwalashish@gmail.com> wrote:
>
>> Have been able to reproduce the problem locally using the existing test
>> cases inside ES Sink. The problem does exist.
>>
>> Did some initial investigation: the framework is able to detect the JSON
>> content and tries to add it as a complex field.
>> The timestamp is added only if present in the header.
>>
>> In the class org.apache.flume.sink.elasticsearch.ContentBuilderUtil
>>
>> public static void addComplexField(XContentBuilder builder, String fieldName,
>>       XContentType contentType, byte[] data) throws IOException {
>>     XContentParser parser = null;
>>     try {
>>       XContentBuilder tmp = jsonBuilder();
>>       parser = XContentFactory.xContent(contentType).createParser(data);
>>       parser.nextToken();
>>       tmp.copyCurrentStructure(parser);
>>       builder.field(fieldName, tmp); // <<<< This is where we might have an
>>                                      // issue (the real action happens inside
>>                                      // this method call)
>>
>> Can someone familiar with this part look further into this? I shall debug
>> further as soon as I have free cycles.
>>
>> thanks
>> ashish
>>
>>
>>
>> On Fri, Apr 11, 2014 at 5:24 PM, Deepak Subhramanian <
>> deepak.subhramanian@gmail.com> wrote:
>>
>>> Thanks Simon. I am also struggling, with no luck. I tried using the
>>> latest Flume Elasticsearch sink jar built from 1.5SNAPSHOT, but still no
>>> luck. I will try to see if it is an issue with the Elasticsearch API. When I
>>> loaded JSON using Hive, it loaded the JSON properly, but we have to pass the
>>> property es.input.json in Hive. Is there a way to pass the same in Flume?
>>>
>>> CREATE EXTERNAL TABLE json (data STRING)
>>> STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
>>> TBLPROPERTIES('es.resource' = '...',
>>>               'es.input.json' = 'yes');
>>>
>>>
>>
>>
>> --
>> thanks
>> ashish
>>
>> Blog: http://www.ashishpaliwal.com/blog
>> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>>
>
>
>
> --
> thanks
> ashish
>
> Blog: http://www.ashishpaliwal.com/blog
> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>



-- 
Deepak Subhramanian
