flume-user mailing list archives

From Mark Bramnik <mark.bram...@gmail.com>
Subject Flume Elastic Search Sink TTL question
Date Sun, 15 Mar 2015 16:44:56 GMT
Hi,
I have a question about the TTL setting in the Elasticsearch sink of Apache Flume.


I'm working on an Elasticsearch + Flume integration.
I'm using Elasticsearch version 1.4.1 and Flume version 1.5.2.
Both are running locally on my machine.

In Flume, my ElasticSearch sink is configured as follows:

agent.sinks.elasticSearchSink.type=org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.elasticSearchSink.channel = fileChannel
agent.sinks.elasticSearchSink.hostNames=localhost:9300
agent.sinks.elasticSearchSink.indexName=platform
agent.sinks.elasticSearchSink.indexType=platformtype
agent.sinks.elasticSearchSink.ttl=1m
agent.sinks.elasticSearchSink.batchSize=1000
agent.sinks.elasticSearchSink.serializer=org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer

Note that the TTL is set to 1m (1 minute) for the sake of the test.

ES starts empty, with the default configuration.

Now I send the log events from my system into Flume and see that they get
stored in Elasticsearch.
For example, after 3 events have been added to Flume, I see this in the ES REST
interface:

>> GET: http://localhost:9200/_search
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 1,
      "hits": [
         {
            "_index": "platform-2015-03-15",
            "_type": "platformtype",
            "_id": "AUweJbNImsCLrYBu-7gJ",
            "_score": 1,
            "_source": {
               "@message": "",
               "@fields": { … }
            }
         },
         {
            "_index": "platform-2015-03-15",
            "_type": "platformtype",
            "_id": "AUweJbNImsCLrYBu-7gI",
            "_score": 1,
            "_source": {
               "@message": "",
               "@fields": { … }
            }
         },
         {
            "_index": "platform-2015-03-15",
            "_type": "platformtype",
            "_id": "AUweJbNJmsCLrYBu-7gK",
            "_score": 1,
            "_source": {
               "@message": "",
               "@fields": { … }
            }
         }
      ]
   }
}

Now, I would expect the messages to be deleted after a minute, but
unfortunately that's not the case.

I can see that the ES index doesn't include any TTL definition at all:

>> GET: http://localhost:9200/_all/platformtype/_mapping
{
   "platform-2015-03-15": {
      "mappings": {
         "platformtype": {
            "properties": {
               "@fields": {
                  "properties": { … } // the event properties, all are present here
               }
            }
         }
      }
   }
}


So these messages stay in ES forever.

I know that _ttl is disabled by default in ES as stated here:

http://www.elastic.co/guide/en/elasticsearch/reference/master/mapping-ttl-field.html


So I tried enabling the TTL manually to examine the behavior:
>> PUT: http://localhost:9200/_all/platformtype/_mapping
with body:
{"platformtype" : {"_ttl" : {"enabled" : true, "default" : "2m"}}}

The request is acknowledged (the TTL has been set):
{
   "acknowledged": true
}
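
For anyone who wants to reproduce the mapping change from a script, here is a minimal sketch that builds the same PUT request with the Python standard library. The helper name build_ttl_mapping_request is hypothetical, and the host, port and type name are simply the ones from my local setup above. The sketch only constructs the request; sending it is left as a commented-out line.

```python
import json
import urllib.request


def build_ttl_mapping_request(base_url, doc_type, default_ttl):
    """Build (but do not send) a PUT that enables _ttl on a type mapping.

    Hypothetical helper: the URL layout and body mirror the manual request
    shown above (ES 1.x style mapping with a per-type _ttl section).
    """
    body = {doc_type: {"_ttl": {"enabled": True, "default": default_ttl}}}
    url = "{}/_all/{}/_mapping".format(base_url, doc_type)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_ttl_mapping_request("http://localhost:9200", "platformtype", "2m")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually apply the mapping against a running node:
# urllib.request.urlopen(req)
```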

Note that I intentionally used 2m here, unlike the 1 minute defined in the
Flume sink configuration.
So, now I can see the following in the mapping:

>> GET: http://localhost:9200/_all/platformtype/_mapping

{
   "platform-2015-03-15": {
      "mappings": {
         "platformtype": {
            "_ttl": {
               "enabled": true,
               "default": 120000
            },
            "properties": { … }
         }
      }
   }
}
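
The mapping reports the default as 120000, which is the 2m I sent expressed in milliseconds. A minimal sketch of that conversion follows, assuming the usual unit suffixes for time values (ms, s, m, h, d, w); the function name ttl_to_millis and the unit table are my own illustration, not code taken from Flume or ES.

```python
import re

# Assumed unit suffixes and their length in milliseconds.
_UNIT_MS = {
    "ms": 1,
    "s": 1000,
    "m": 60 * 1000,
    "h": 60 * 60 * 1000,
    "d": 24 * 60 * 60 * 1000,
    "w": 7 * 24 * 60 * 60 * 1000,
}


def ttl_to_millis(ttl):
    """Convert a TTL string such as '1m' or '2m' to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d|w)", ttl.strip())
    if match is None:
        raise ValueError("unrecognised TTL value: {!r}".format(ttl))
    value, unit = match.groups()
    return int(value) * _UNIT_MS[unit]


print(ttl_to_millis("2m"))  # 120000, the value shown in the mapping
print(ttl_to_millis("1m"))  # 60000, the value from the sink configuration
```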

OK, now I add 3 more events to Flume, so there are 6 events in total in
ES.
I wait about 1 minute and the messages get deleted (it takes less than 2
minutes), which means that the ES sink's TTL setting does take effect.
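
One reading that is consistent with both observations can be sketched as a small model. It assumes that ES honours a per-document TTL only when _ttl is enabled in the type mapping, and that a TTL supplied with the document takes precedence over the mapping default. The function effective_ttl_ms is purely illustrative and is not part of Flume or ES.

```python
def effective_ttl_ms(ttl_enabled, mapping_default_ms=None, doc_ttl_ms=None):
    """Model of the TTL that would apply to a single document.

    Assumption: with _ttl disabled in the mapping, any TTL sent with the
    document is ignored and the document never expires (None).
    """
    if not ttl_enabled:
        return None
    if doc_ttl_ms is not None:
        # A TTL supplied with the document wins over the mapping default.
        return doc_ttl_ms
    return mapping_default_ms


# Before the manual mapping change: sink sends 1m, _ttl is disabled.
print(effective_ttl_ms(False, doc_ttl_ms=60000))

# After enabling _ttl with a 2m default: the sink's 1m applies.
print(effective_ttl_ms(True, mapping_default_ms=120000, doc_ttl_ms=60000))
```

Under this model the first three events never expire, and the later ones expire after the sink's 1 minute rather than the mapping's 2 minutes, which matches what I saw.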

So I'm confused. I had assumed that the TTL on the index would work by default,
based solely on the Flume ElasticSearch sink configuration, but it looks like I was
wrong.
Could you please explain whether this is a bug in the ES sink or intended
behavior, and how it should work?

Thanks and have a nice day,
Mark Bramnik
