flume-user mailing list archives

From Denes Arvay <de...@cloudera.com>
Subject Re: Ingestion to Solr is very slow
Date Thu, 23 Feb 2017 15:39:39 GMT
Hi,

The Flume config looks OK to me. One minor thing: I'd suggest trying the
memory channel, which can speed things up a little.
The morphline part might be the bottleneck; could you please share its
config as well?
Some sample input files would also help with the debugging.

Besides these, I'd recommend profiling the agent with a Java profiler (e.g.
jvisualvm).
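
As a rough sketch of the memory channel suggestion above, the channel
section of the posted config could be swapped out along these lines. The
channel name MemChannel and all numeric values are illustrative assumptions,
not tuned recommendations; the source and sink names are taken from the
posted config.

```properties
# Sketch only: the name MemChannel and every number below are assumptions to tune.
agent.channels = MemChannel

# The memory channel keeps events on the JVM heap. It is faster than the file
# channel, but events still in the channel are lost if the agent stops or crashes.
agent.channels.MemChannel.type = memory
agent.channels.MemChannel.capacity = 10000
agent.channels.MemChannel.transactionCapacity = 1000

# Point the source and the sink at the new channel.
agent.sources.SpoolDirSrc.channels = MemChannel
agent.sinks.SolrSink.channel = MemChannel

# A sink batch cannot take more events in one transaction than the channel's
# transactionCapacity allows, so keep batchSize at or below it.
agent.sinks.SolrSink.batchSize = 1000
```

With events of roughly 10 KB each, 10000 buffered events come to something
on the order of 100 MB, so the agent's heap would need to be sized with that
in mind.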

Regards,
Denes


On Fri, Feb 17, 2017 at 12:00 AM Anatharaman, Srinatha (Contractor) <
Srinatha_Anantharaman@comcast.com> wrote:

Hi,

I have a large set of small files, each around 7 to 10 KB in size.
In total I have 350K files amounting to around 6 GB.

I have tried many different Flume configuration options, but whatever I
change, Solr takes 2 seconds to ingest each file.

agent.sources = SpoolDirSrc
agent.channels = FileChannel
agent.sinks = SolrSink

# Configure Source

agent.sources.SpoolDirSrc.channels = fileChannel
agent.sources.SpoolDirSrc.type = spooldir
agent.sources.SpoolDirSrc.spoolDir = /app/home/solr/final
agent.sources.SpoolDirSrc.basenameHeader = true
#agent.sources.SpoolDirSrc.batchSize = 100000

agent.sources.SpoolDirSrc.fileHeader = true
agent.sources.SpoolDirSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder

# Use a channel that buffers events in memory
agent.channels.FileChannel.type = file
agent.channels.FileChannel.capacity = 1000
agent.channels.FileChannel.transactionCapacity = 1000

#agent.channels.FileChannel.transactionCapacity = 10000

# Configure Solr Sink

agent.sinks.SolrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.SolrSink.morphlineFile = /etc/flume/conf/morphline.conf
#agent.sinks.SolrSink.batchsize = 100000
#agent.sinks.SolrSink.batchDurationMillis = 5000
agent.sinks.SolrSink.channel = fileChannel
agent.sinks.SolrSink.morphlineId = morphline1
agent.sinks.SolrSink.tika.config = tikaConfig.xml
agent.sinks.SolrSink.rollCount = 0
agent.sinks.SolrSink.rollInterval = 0
agent.sinks.SolrSink.rollsize = 100000000
agent.sinks.SolrSink.idleTimeout = 0
agent.sinks.SolrSink.batchSize = 100000
agent.sinks.SolrSink.txnEventMax = 10000000

agent.sources.SpoolDirSrc.channels = FileChannel
agent.sinks.SolrSink.channel = FileChannel



My collection has 2 shards and a replication factor of 1.

Kindly let me know how I can improve this.



Regards,

~Sri
