flume-user mailing list archives

From "Guyle M. Taber" <gu...@gmtech.net>
Subject Re: Flume truncating files at about 2060 characters
Date Mon, 31 Aug 2015 21:03:20 GMT
Perfect Iain. Worked like a charm.


> On Aug 31, 2015, at 11:19 AM, iain wright <iainwrig@gmail.com> wrote:
> 
> I'd expect it to work with any source; I've used it with the exec and spooling directory sources.
> 
> Cheers,
> 
> -- 
> Iain Wright
> 
> This email message is confidential, intended only for the recipient(s) named above and
may contain information that is privileged, exempt from disclosure under applicable law. If
you are not the intended recipient, do not disclose or disseminate the message to anyone except
the intended recipient. If you have received this message in error, or are not the named recipient(s),
please immediately notify the sender by return email, and delete all copies of this message.
> 
> On Mon, Aug 31, 2015 at 11:14 AM, Guyle M. Taber <guyle@gmtech.net> wrote:
> Fantastic.
> 
> So with this deserializer setting, it’s not dependent on the source being a logger type?
> 
> 
>> On Aug 31, 2015, at 11:12 AM, iain wright <iainwrig@gmail.com> wrote:
>> 
>> Hi Guyle,
>> 
>> We ran into the same thing.
>> 
>> Please see https://flume.apache.org/FlumeUserGuide.html#line
>> 
>> On the originating source (where the event enters Flume for the first time), increase maxLineLength, e.g.:
>> ...
>> agent1.sources.source1.deserializer.maxLineLength = 1048576
>> ...
>> 
>> Best,
>> 
>> -- 
>> Iain Wright
>> 
>> 
>> On Mon, Aug 31, 2015 at 11:03 AM, Guyle M. Taber <guyle@gmtech.net> wrote:
>> I’m using an Avro sink to send events to HDFS, and with long content lines we’re seeing our lines truncated at about the 2060-character mark. How can I prevent long lines from being truncated when using an Avro sink in this fashion?
>> 
>> Here’s a snippet of an event from the raw logs before Flume is involved. I’ve toggled hidden characters so you can see the EOL character being inserted, which breaks up the event into two lines.
>> 
>> …utm_campaign=%E5%81%A5%E5%BA%B7%E7%BE%8E%E6%8A%A4&camp=%E5%81%A5%E5%BA%B7%E7%BE%8E%E6%8A%A4^Isearch-term[=]^Isession-id[=]720D69AB19F1DD17D27A948C9B31D380^Istore-id[=]^Itracking-ticket-id[=]^Itracking-ticket-number[=]^Ievent-session-id[=]98df4905-51ab-43a9-92d9-35d879a69b9a
$
>> 
>> Here’s a snippet of an event that gets truncated.
>> 
>> …utm_campaign=%E5%81%A5%E5%BA%B7%E7%BE%8E%E6%8A%A4&camp=%E5%81%A5%E5%BA%$
>> 
>> B7%E7%BE%8E%E6%8A%A4^Isearch-term[=]^Isession-id[=]720D69AB19F1DD17D27A948C9B31D380^Istore-id[=]^Itracking-ticket-id[=]^Itracking-ticket-number[=]^Ievent-session-id[=]98df4905-51ab-43a9-92d9-35d879a69b9a
$
>> 
>> Here is our sink on the sending node.
>> 
>> agent.sinks = AvroSink
>> agent.sinks.AvroSink.type = avro
>> agent.sinks.AvroSink.channel = memoryChannel
>> agent.sinks.AvroSink.hostname = flume.mydomain.int
>> agent.sinks.AvroSink.port = 4169
>> agent.sinks.AvroSink.batchSize = 0
>> agent.sinks.AvroSink.rollSize = 0
>> agent.sinks.AvroSink.rollInterval = 0
>> agent.sinks.AvroSink.rollCount = 0
>> agent.sinks.AvroSink.idleTimeout = 0
>> agent.sinks.AvroSink.useLocalTimeStamp = true
>> 
>> Here is our sink on the HDFS receiving side.
>> 
>> dp1.sinks.sinkCN.type = hdfs
>> dp1.sinks.sinkCN.channel = channelCN
>> dp1.sinks.sinkCN.hdfs.filePrefix = %{basename}-
>> dp1.sinks.sinkCN.hdfs.path = hdfs://sf1-hadoopnn1.mydomain.int/flume/events/ods/cn/fe_event/%{host}/%y-%m-%d
>> dp1.sinks.sinkCN.hdfs.fileType = DataStream
>> dp1.sinks.sinkCN.hdfs.writeFormat = Text
>> dp1.sinks.sinkCN.hdfs.rollSize = 0
>> dp1.sinks.sinkCN.hdfs.rollCount = 0
>> dp1.sinks.sinkCN.hdfs.batchSize = 5000
>> 
> 
> 
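
For readers landing on this thread, a fuller sketch of where the suggested setting sits may help. It assumes a spooling directory source; the names agent1, source1 and channel1 and the spoolDir path are placeholders that do not come from the thread. According to the Flume User Guide, the LINE deserializer's maxLineLength defaults to 2048 characters, and the remainder of a longer line appears in a subsequent event, which is consistent with the break near the 2060-character mark described in the original question.

# Hypothetical agent, source and channel names, for illustration only
agent1.sources = source1
agent1.sources.source1.type = spooldir
# Placeholder path; point this at the directory the source should watch
agent1.sources.source1.spoolDir = /var/log/app/spool
agent1.sources.source1.channels = channel1
# LINE is the default deserializer; it is spelled out here for clarity
agent1.sources.source1.deserializer = LINE
# Raise the per-event line limit (value taken from the reply quoted above)
agent1.sources.source1.deserializer.maxLineLength = 1048576

As the quoted reply says, the limit is applied on the source where events first enter Flume, so the Avro and HDFS sink settings should not need to change for this particular symptom.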

