Hello Folks:
I would like to ask for your help with a Flume configuration that replicates files from one node to another. We currently have an issue with files being lost during the replication process.
The following diagram represents the current architecture, in which Flume replicates files in Avro format to HDFS and Solr.
When we check the data at both destinations, we find that not all of it was replicated from the source: files are being lost.
Below is the configuration file for Node 2:
a1.sources = r1
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 50001
a1.channels = c1 c2
# Use a channel c1 which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
# Use a channel c2 which buffers events in memory
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
# Interceptor definition (for the multiplexing case)
# NOTE: The customer wants to use replicating. Is it necessary to keep the
# interceptor declaration in this file? As far as we understand, the
# interceptor is only needed for multiplexing.
a1.sources.r1.interceptors = pcgInterceptor
a1.sources.r1.interceptors.pcgInterceptor.type = pcg.PcgInterceptor$Builder
a1.sources.r1.interceptors.pcgInterceptor.solrServer = http://XXX.YYY.ZZZ.WWW:NN/solr/pcgs_dt_datos_panel_cntrl_shard1_replica1/
a1.sources.r1.interceptors.pcgInterceptor.paramKeys = TPO_REG,START_YEAR,START_MONTH,START_DAY,START_HOUR,START_MINUTE,START_SECONDS,END_YEAR,END_MONTH,END_DAY,END_HOUR,END_MINUTE,END_SECONDS,COD_EST,PROCESS_NAME,TPO_MLL,SGL_SIS,SGL_SUB_SIS,NOM_TAR,COD_SEC,PSO_ARQ,REG_LEI,REG_PCS,REG_RCH,NOM_ARQ,FEC_OPE,DISK,MEMORY,CPU,PID,RANKING,END_LINE
# Define the channel selector and mapping
a1.sources.r1.selector.type = replicating
a1.sinks = k1 k2
# Sink (destination) definitions
# Describe the first sink, k1 (Solr), to store manager's data only; it is associated with channel c1
a1.sinks.k1.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
a1.sinks.k1.channel = c1
a1.sinks.k1.morphlineFile = k1.conf
a1.sinks.k1.morphlineId = pcg
a1.sinks.k1.isProductionMode = true
a1.sinks.k1.batchSize = 1
# Describe the second sink, k2 (HDFS), to store developer's data only; it is associated with channel c2
a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c2
a1.sinks.k2.hdfs.path = hdfs://pcg/pcgs_dt_evnto
a1.sinks.k2.hdfs.rollInterval = 0
a1.sinks.k2.hdfs.rollSize = 1073741824
a1.sinks.k2.hdfs.rollCount = 0
a1.sinks.k2.hdfs.idleTimeout = 28800
a1.sinks.k2.hdfs.kerberosPrincipal = ingest@CORP.CORP
a1.sinks.k2.hdfs.kerberosKeytab = /home/ingest/ingest.keytab
# Bind the source and sinks to the channels
# a1.sources.spoolDirectory.channels = c1 c2
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
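For reference, here is a minimal sketch of how the Node 2 source section might look with the interceptor removed. It assumes that pcgInterceptor exists only to support multiplexing and does not add headers or fields that the morphline in k1.conf depends on; we have not confirmed this. In Flume, the replicating selector sends a copy of every event to each channel listed in a1.sources.r1.channels without inspecting event headers. It is also the default when selector.type is not set, so it does not need an interceptor of its own.

```properties
# Sketch: Node 2 source using the replicating selector with no interceptor.
# Assumption: pcgInterceptor is not required by the Solr morphline (k1.conf).
a1.sources = r1
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 50001
a1.sources.r1.channels = c1 c2
# Replicating copies every event to both c1 and c2 (also Flume's default selector)
a1.sources.r1.selector.type = replicating
```

In this sketch, the channel and sink sections (c1, c2, k1, k2) would stay exactly as in the configuration above.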