Hello Folks:

    I would like to request your help with Flume's configuration for replicating files from one node to another; we currently have an issue with files being lost during the replication process.

    The following diagram represents the current architecture where Flume is running, replicating files in Avro format to HDFS and Solr.


    When we checked the information at both destinations, we found that not all of it had been replicated from the source; some files are lost.

    Below is the configuration file for Node 2:


Node 2:

a1.sources = r1

# Describe/configure the source

a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 50001

a1.channels = c1 c2


# Use a channel c1 which buffers events in memory

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000


# Use a channel c2 which buffers events in memory

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
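
As a note for the reviewers: as far as we know, a memory channel drops any buffered events if the agent process stops or the channel fills up, which could explain part of the loss. A durable alternative we are considering is the file channel; a minimal sketch for c1 (the directories below are placeholders, not our real paths):

# Sketch: file channel as a durable alternative to the memory channel
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/flume/checkpoint/c1
a1.channels.c1.dataDirs = /var/flume/data/c1
a1.channels.c1.capacity = 1000000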


# Interceptor definition, in case multiplexing is used

// The customer wants to use the replicating selector. Is it necessary to keep the interceptor declaration in this file, given that the interceptor is only used for multiplexing? //

a1.sources.r1.interceptors = pcgInterceptor
a1.sources.r1.interceptors.pcgInterceptor.type = pcg.PcgInterceptor$Builder
a1.sources.r1.interceptors.pcgInterceptor.solrServer= http://XXX.YYY.ZZZ.WWW:NN/solr/pcgs_dt_datos_panel_cntrl_shard1_replica1/
a1.sources.r1.interceptors.pcgInterceptor.paramKeys =  TPO_REG,START_YEAR,START_MONTH,START_DAY,START_HOUR,START_MINUTE,START_SECONDS,END_YEAR,END_MONTH,END_DAY,END_HOUR,END_MINUTE,END_SECONDS,COD_EST,PROCESS_NAME,TPO_MLL,SGL_SIS,SGL_SUB_SIS,NOM_TAR,COD_SEC,PSO_ARQ,REG_LEI,REG_PCS,REG_RCH,NOM_ARQ,FEC_OPE,DISK,MEMORY,CPU,PID,RANKING,END_LINE
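
If the interceptor really is only needed for multiplexing, our understanding (to be confirmed) is that it could be disabled simply by commenting out its declaration; Flume should then ignore the remaining interceptor properties. For example:

# Sketch: disabling the interceptor chain (to be confirmed)
# a1.sources.r1.interceptors = pcgInterceptor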

# Define channel selector and define mapping

a1.sources.r1.selector.type = replicating
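
If we understand the documentation correctly, with the replicating selector every listed channel must accept each event, so a full or failing c1 can block delivery to c2 as well. A channel can be marked optional so that a failed write to it does not fail the whole event; a sketch of what we believe the syntax is:

# Sketch: marking c2 as optional under the replicating selector
a1.sources.r1.selector.optional = c2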
a1.sinks = k1 k2


# Sink definitions (destinations)
# Describe the first sink, k1 (Solr), which stores the manager's data only; it is associated with channel c1

a1.sinks.k1.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
a1.sinks.k1.channel = c1
a1.sinks.k1.morphlineFile = k1.conf
a1.sinks.k1.morphlineId = pcg
a1.sinks.k1.isProductionMode = true
a1.sinks.k1.batchSize = 1
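
One thing we are unsure about: batchSize = 1 makes the Solr sink take and commit events one at a time, which is slow and may cause the memory channel to fill up under load. A larger batch might look like this (values are illustrative only):

# Sketch: larger Solr sink batch (values are placeholders)
a1.sinks.k1.batchSize = 100
a1.sinks.k1.batchDurationMillis = 1000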


# Describe the second sink, k2 (HDFS), which stores the developer's data only; it is associated with channel c2

a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c2
a1.sinks.k2.hdfs.path = hdfs://pcg/pcgs_dt_evnto
a1.sinks.k2.hdfs.rollInterval = 0
a1.sinks.k2.hdfs.rollSize = 1073741824
a1.sinks.k2.hdfs.rollCount = 0
a1.sinks.k2.hdfs.idleTimeout = 28800
a1.sinks.k2.hdfs.kerberosPrincipal = ingest@CORP.CORP
a1.sinks.k2.hdfs.kerberosKeytab = /home/ingest/ingest.keytab


# Bind the source and sink to the channel
# a1.sources.spoolDirectory.channels = c1 c2

a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

I would appreciate your feedback on this question.

Best Regards
PEHC