Denes,

 

Please find my Morphline config file below. I tried the memory channel, but found that the agent runs faster with the file channel.

 

solrLocator : {
  collection : esearch
  zkHost : "codesolr-as-r2p:2181"
}

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands : [
      { detectMimeType { includeDefaultMimeTypes : true } }

      {
        solrCell {
          solrLocator : ${solrLocator}
          captureAttr : true
          lowernames : true
          capture : [_attachment_body, _attachment_mimetype, basename, content, content_encoding, content_type, file, meta, text]
          parsers : [
            { parser : org.apache.tika.parser.txt.TXTParser }
          ]
          fmap : { content : text }
        }
      }

      { generateUUID { field : id } }

      { sanitizeUnknownSolrFields { solrLocator : ${solrLocator} } }

      { logDebug { format : "output record: {}", args : ["@{}"] } }

      { loadSolr : { solrLocator : ${solrLocator} } }
    ]
  }
]

 

 

A sample text file looks like this:

 

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 

 

Received: from abc.net ([11.222.333.444])

        by abc.abc.net with bizsmtp

        id djfAJSD*jKDHJKD; Sun, 01 Jan 2010 12:31:51 +0000

Received: from xya.xyz.net ([99.888.777.666])

        by xyz.xyz.net with SMTP

        id jhcfhchABHDJHDD*HDJhsdjcfjh; Sun, 01 Jan 2019 02:31:50 +0000

Received: from smtp.abccbc.abcbcbcb.com ([11.111.22.34])

        by pqrs.pqrs.net with SMTP

        id JHDJHJDHJHD*USDHCFJNHSD*; Sun, 01 Jan 2010 02:31:51 +0000

X-Xfinity-Message-Heuristics: IPv6:N;TLS=0;SPF=1;DMARC=

Received: from portalmail (unknown [777.33.2.90])

        by smtp.ajhjhdjjdfh-ajhdjkjsd.com (Postfix) with ESMTP id HDJHDJDSJKS

        for <PQRS@abc.net>; Sat, 31 Dec 2010 18:31:49 -0800 (PST)

From: "abc_abc@abc.com"

To: qqqq@abc.net

Message-ID: <999999999.888.3449859489586.JavaMail.VV@mortalmail>

Subject: 111-2343444434  You got a email, LLC ("abc")

MIME-Version: 1.0

Content-Type: text/plain; charset=UTF-8

Content-Transfer-Encoding: quoted-printable

X-CMAE-Envelope: kjsjdsjdjdjvf9jd/12djhfjhd83hjnr38/jfjjvgf95kjg905j95ygjmt59ytjmgh95ijmhjkt6h

9085jghty89jhn596ijyiuh96ijmhj90t5ui9kjio6i5uy096i5jki650ui6o7kuoki

 

-----BEGIN PGP SIGNED MESSAGE-----

Hash: SHA1

 

ABC of jdjdhjhjdhjvfh Use of fkjkfjo9r5nmfkf90trmbklgftob

ABC ID: 111-34345454545

Action Date: 01 Jan 2010 02:31:33 GMT

 

ABC Corporation

 

Dear Sir or Madam:

 

dhjdhjfddsfjufnkdfjkjdjnhjfdjk832nhjkfg8nsdvjvnhjvjkffdjkvjhdfhjfjbhjfhnb

jchjvhfjvjhjjnxj4328uiwejf3uivcnj3490uncvrgu890jvkfjviujrfig94uvnjfvvgjhg89

hdfg9urvnjfijuhvirjsgu9rjdnvidj9ujbvgbi9rbdfgjbi9tujfbvkrniujv bnrtbjiuj

jdfjvb9utrjgnbg90ujrjmf043ikvjkfjvfrjopfr0gjvkfdjvfjovgfdovofdodopigif04jvkerj

ibjhidfjbikjfdbjibr9gikfdjgvr905jfkjgvgvj9ufkjbvfiugtjgkjb90tvbjkjfdjbffkjjfb

kjffkjbkfjkjff9g4rjdf044jn v90dfjvgr0irkjkvjfb09ua[vbjksoohfrijugb9jkvjkjkfjf

 

 

Regards,

 

XYZ

 

*pgp public key is available on the key server at http://xyz.git.edu

 

Note: The information transmitted in this Notice is intended only for the p=

erson or entity to which it is addressed and may contain confidential and/o=

r privileged material.  Any review, reproduction, retransmission, dissemina=

tion or other use of, or taking of any action in reliance upon, this inform=

ation by persons or entities other than the intended recipient is prohibite=

d.  If you received this in error, please contact the sender and delete the=

material from all computers.

 

This infringement notice contains an XML tag that can be used to automate t=

he processing of this data.  If you would like more information on how to u=

se this tag please contact XYZ.

 

 

- - ---Start ACNS XML

<?xml version=3D"1.0" encoding=3D"UTF-8"?>

<Infringement xmlns=3D"http://www.acns.net/ACNS" xmlns:xsi=3D"http://www.w3=

.org/2001/XMLSchema-instance" xsi:schemaLocation=3D"http://www.acns.net/ACN=

S http://www.acns.net/v1.2/ACNS2v1_2.xsd">

    <Case>

        <ID>00000000</ID>

        <Status>Open</Status>

    </Case>

    <Complainant>

        <Entity>XYZ USA, Inc</Entity>

        <Contact>XYZ</Contact>

        <Address>P.O. Box 000, North XYZ, KA 00000</Address>

        <Phone>999999999</Phone>

        <Email>abc@abc.com</Email>

    </Complainant>

   <Service_Provider>

        <Entity>ABC Corporation</Entity>

        <Email>abc@abc.net</Email>

    </Service_Provider>

    <Source>

        <TimeStamp>2016-12-31T23:15:40.000Z</TimeStamp>

        <IP_Address>11.22.33.444</IP_Address>

        <Port>55555</Port>

        <Type>BitTorrent</Type>

        <Number_Files>1</Number_Files>

        <Deja_Vu>No</Deja_Vu>

    </Source>

    <Content>

        <Item>

            <TimeStamp>2016-12-31T23:15:40.000Z</TimeStamp>

            <Title>Power</Title>

            <FileName>Power </FileName>

            <FileSize>000000000</FileSize>

            <URL>dht</URL>

        </Item>

    </Content>

</Infringement>

- - ---End ACNS XML

-----BEGIN PGP SIGNATURE-----

Version: GnuPG v2.0.22 (MingW32)

 

xjsdh78h23e7u2he3279y3hjdhe7823jhd3783gddey373hyfu37ru3rh892rhf2

23897EBHCA8ENHD         q0jc39ujdkjd9rj8287hcd833hrnj390unce90ru3jrifj9r

930jh3ier390hnd9d23ujf3249u9uifoje9frjfij90fvu394ujfjc0f9u9vjfv9

 

-----END PGP SIGNATURE-----

 

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 

 

I will try profiling it.

 

Regards,

~Sri

 

From: Denes Arvay [mailto:denes@cloudera.com]
Sent: Thursday, February 23, 2017 10:40 AM
To: user@flume.apache.org
Subject: Re: Ingestion to Solr is very slow

 

Hi,

 

The Flume config seems OK to me. One minor thing: I'd suggest trying the memory channel, which can speed things up a little.

The morphline part might be the bottleneck; could you please share its config as well?

Some sample input files would also help with debugging.
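
For reference, a memory channel for an agent like this one could be declared roughly as follows. This is only a sketch: the channel name MemChannel and both capacity values are illustrative assumptions, not settings taken from this thread, and the source and sink have to be pointed at the new channel name.

```properties
# Sketch only: MemChannel and the numbers below are assumed for illustration
agent.channels = MemChannel
agent.channels.MemChannel.type = memory
# Maximum number of events held in the channel
agent.channels.MemChannel.capacity = 10000
# Maximum number of events per put/take transaction
agent.channels.MemChannel.transactionCapacity = 1000

# Bind the existing source and sink to the memory channel
agent.sources.SpoolDirSrc.channels = MemChannel
agent.sinks.SolrSink.channel = MemChannel
```

Events held in a memory channel are lost if the agent process stops, which is the usual trade-off against the file channel.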

 

Besides these, I'd recommend profiling the agent with a Java profiler (e.g. jvisualvm).

 

Regards,

Denes

 

 

On Fri, Feb 17, 2017 at 12:00 AM Anatharaman, Srinatha (Contractor) <Srinatha_Anantharaman@comcast.com> wrote:

Hi,

 

I have a large set of small files; each file is around 7-10 KB in size.

In total I have 350K files, amounting to around 6 GB.

I have tried many changes to my Flume configuration, but whatever I change, Solr takes 2 seconds to ingest each file.

 

 

agent.sources = SpoolDirSrc
agent.channels = FileChannel
agent.sinks = SolrSink

# Configure Source
agent.sources.SpoolDirSrc.channels = fileChannel
agent.sources.SpoolDirSrc.type = spooldir
agent.sources.SpoolDirSrc.spoolDir = /app/home/solr/final
agent.sources.SpoolDirSrc.basenameHeader = true
#agent.sources.SpoolDirSrc.batchSize = 100000

agent.sources.SpoolDirSrc.fileHeader = true
agent.sources.SpoolDirSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder

# Use a file channel, which buffers events on disk
agent.channels.FileChannel.type = file
agent.channels.FileChannel.capacity = 1000
agent.channels.FileChannel.transactionCapacity = 1000

#agent.channels.FileChannel.transactionCapacity = 10000

# Configure Solr Sink
agent.sinks.SolrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.SolrSink.morphlineFile = /etc/flume/conf/morphline.conf
#agent.sinks.SolrSink.batchsize = 100000
#agent.sinks.SolrSink.batchDurationMillis = 5000
agent.sinks.SolrSink.channel = fileChannel
agent.sinks.SolrSink.morphlineId = morphline1
agent.sinks.SolrSink.tika.config = tikaConfig.xml
agent.sinks.SolrSink.rollCount = 0
agent.sinks.SolrSink.rollInterval = 0
agent.sinks.SolrSink.rollsize = 100000000
agent.sinks.SolrSink.idleTimeout = 0
agent.sinks.SolrSink.batchSize = 100000
agent.sinks.SolrSink.txnEventMax = 10000000

# Bind the source and sink to the channel (these later assignments replace the "fileChannel" values set above)
agent.sources.SpoolDirSrc.channels = FileChannel
agent.sinks.SolrSink.channel = FileChannel
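
A note on the sink section above: as far as the Flume User Guide documents it, MorphlineSolrSink is configured through morphlineFile, morphlineId, batchSize and batchDurationMillis, while names such as rollCount, rollInterval, rollsize, idleTimeout and txnEventMax resemble HDFS sink settings and are not listed for this sink. A sink batch size is also normally kept no larger than the channel's transactionCapacity. A trimmed-down sketch of the sink section under those assumptions follows; the batch values are illustrative, not tuned recommendations.

```properties
# Sketch only: batch values are assumptions chosen to match transactionCapacity = 1000 above
agent.sinks.SolrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.SolrSink.channel = FileChannel
agent.sinks.SolrSink.morphlineFile = /etc/flume/conf/morphline.conf
agent.sinks.SolrSink.morphlineId = morphline1
# Maximum number of events taken from the channel per transaction
agent.sinks.SolrSink.batchSize = 1000
# Maximum time in milliseconds before a partial batch is processed
agent.sinks.SolrSink.batchDurationMillis = 1000
```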

 

My collection has 2 shards and a replication factor of 1.

 

Please let me know how I can improve this.

 

Regards,

~Sri