flume-user mailing list archives

From Balasubramanian Jayaraman <balasubramanian.jayara...@autodesk.com>
Subject Reg S3 Flume HDFS SINK Compression
Date Wed, 20 May 2015 03:51:15 GMT

I am trying to write Flume events to Amazon S3 in compressed format. My Flume configuration is given below. I am facing data loss: with the configuration below, if I publish 20000 events, I receive only 1000 events and all the other data is lost. But when I disable the rollCount, rollSize and rollInterval configurations, all the events are received, but 2000 small files are created. Is there anything wrong with my configuration settings? Should I add any other configurations?

    injector.sinks.s3_3store.type = hdfs
    injector.sinks.s3_3store.channel = disk_backed4
    injector.sinks.s3_3store.hdfs.fileType = CompressedStream
    injector.sinks.s3_3store.hdfs.codeC = gzip
    injector.sinks.s3_3store.hdfs.serializer = TEXT
    injector.sinks.s3_3store.hdfs.path = s3n://CID:SecretKey@bucketName/dth=%Y-%m-%d-%H
    injector.sinks.s3_3store.hdfs.filePrefix = events-%{receiver}
    # Roll when files reach 256M; close files left idle for 10 minutes
    injector.sinks.s3_3store.hdfs.rollCount = 0
    injector.sinks.s3_3store.hdfs.idleTimeout = 600
    injector.sinks.s3_3store.hdfs.rollSize = 268435456
    #injector.sinks.s3_3store.hdfs.rollInterval = 3600
    # Flush data to buckets every 10k events
    injector.sinks.s3_3store.hdfs.batchSize = 10000
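
For comparison, here is a sketch of the same sink with time-based rolling re-enabled and a smaller batch size. The values and the channel line are illustrative assumptions, not a verified fix: with hdfs.rollCount = 0 and hdfs.rollInterval commented out, files roll only on size (256 MB of compressed output), so the sink depends entirely on rollSize and the idle timeout to close files.

```properties
# Sketch only: parameter values are illustrative assumptions, not a verified fix.
injector.sinks.s3_3store.type = hdfs
injector.sinks.s3_3store.channel = disk_backed4
injector.sinks.s3_3store.hdfs.fileType = CompressedStream
injector.sinks.s3_3store.hdfs.codeC = gzip
injector.sinks.s3_3store.hdfs.path = s3n://CID:SecretKey@bucketName/dth=%Y-%m-%d-%H
# Roll on size (256 MB) or time (1 h), whichever comes first; never on event count.
injector.sinks.s3_3store.hdfs.rollCount = 0
injector.sinks.s3_3store.hdfs.rollSize = 268435456
injector.sinks.s3_3store.hdfs.rollInterval = 3600
injector.sinks.s3_3store.hdfs.idleTimeout = 600
# Keep the sink batch size no larger than the channel's transactionCapacity
# (the channel setting below is an assumed value), or takes from the channel
# can fail and events back up.
injector.sinks.s3_3store.hdfs.batchSize = 1000
injector.channels.disk_backed4.transactionCapacity = 10000
```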
