It seems the problem only occurs when using a load-balancing sink processor with a group of sinks that use LZO compression.

Looking into the details.

2015-09-08 10:19 GMT+08:00 Shady Xu <shadyxu@gmail.com>:
Using different prefixes does not fix the problem. Any other ideas?

2015-09-07 10:19 GMT+08:00 Shady Xu <shadyxu@gmail.com>:
Yes, I have several sinks that all write to subdirectories of /user/data. Two of them, grouped under a load-balancing sink processor, write to the same directory. I will try setting different prefixes for the load-balancing sinks.

If you don't see this as a bug, please make it clear in the documentation.

2015-09-07 0:24 GMT+08:00 Hari Shreedharan <hshreedharan@cloudera.com>:
Do you have multiple sinks writing to the same directory? If yes, that could cause issues like this. Can you use different prefixes for each sink if you want them to write to the same directory?
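
For reference, a minimal sketch of what per-sink prefixes could look like in the agent's properties file. The agent name a1, the channel c1, the sink name hdfs-sink1, the group name g1, and the prefix values are assumptions for illustration; only the /user/log path, the load-balancing processor, the LZO compression, and the hdfs-sink2 name are taken from the report and its log. Source and channel definitions are omitted.

```properties
# Hypothetical agent "a1"; hdfs-sink1, g1, c1 and the prefix values are illustrative.
a1.sinks = hdfs-sink1 hdfs-sink2
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = hdfs-sink1 hdfs-sink2
a1.sinkgroups.g1.processor.type = load_balance

# First sink: writes to the shared directory with its own file prefix.
a1.sinks.hdfs-sink1.type = hdfs
a1.sinks.hdfs-sink1.channel = c1
a1.sinks.hdfs-sink1.hdfs.path = /user/log
a1.sinks.hdfs-sink1.hdfs.filePrefix = data-sink1
a1.sinks.hdfs-sink1.hdfs.fileType = CompressedStream
a1.sinks.hdfs-sink1.hdfs.codeC = lzop

# Second sink: same directory, different prefix, so file names cannot collide.
a1.sinks.hdfs-sink2.type = hdfs
a1.sinks.hdfs-sink2.channel = c1
a1.sinks.hdfs-sink2.hdfs.path = /user/log
a1.sinks.hdfs-sink2.hdfs.filePrefix = data-sink2
a1.sinks.hdfs-sink2.hdfs.fileType = CompressedStream
a1.sinks.hdfs-sink2.hdfs.codeC = lzop
```

With distinct hdfs.filePrefix values, the two sinks should no longer generate the same file name in /user/log, even if both roll a file within the same millisecond.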


On Sunday, September 6, 2015, Shady Xu <shadyxu@gmail.com> wrote:
Hi all,

Has anyone experienced the exception below? I am using LZO compression, and when local data that hasn't been uploaded to HDFS accumulates to several gigabytes, this error occurs and Flume cannot recover from it.

29 Aug 2015 18:50:37,031 WARN  [SinkRunner-PollingRunner-LoadBalancingSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.append:555)  - Caught IOException writing to HDFSWriter (write beyond end of stream). Closing file (/user/log/data.1440845433925.lzo.tmp) and rethrowing exception.
29 Aug 2015 18:50:37,039 INFO  [SinkRunner-PollingRunner-LoadBalancingSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.close:363)  - Closing /user/log/data.1440845433925.lzo.tmp
29 Aug 2015 18:50:37,064 INFO  [hdfs-sink2-call-runner-3] (org.apache.flume.sink.hdfs.BucketWriter$8.call:629)  - Renaming /user/log/data.1440845433925.lzo.tmp to /user/log/data.1440845433925.lzo
29 Aug 2015 18:50:37,069 INFO  [SinkRunner-PollingRunner-LoadBalancingSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink$1.run:394)  - Writer callback called.
29 Aug 2015 18:50:37,086 WARN  [SinkRunner-PollingRunner-LoadBalancingSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:455)  - HDFS IO error
java.io.IOException: write beyond end of stream
        at com.hadoop.compression.lzo.LzopOutputStream.write(LzopOutputStream.java:134)
        at java.io.OutputStream.write(OutputStream.java:75)
        at org.apache.flume.serialization.BodyTextEventSerializer.write(BodyTextEventSerializer.java:71)
        at org.apache.flume.sink.hdfs.HDFSCompressedDataStream.append(HDFSCompressedDataStream.java:126)
        at org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:550)
        at org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:547)
        at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
        at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
        at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)


--

Thanks,
Hari