From: Bob Metelsky <bob.metelsky@gmail.com>
Date: Mon, 2 Feb 2015 18:49:29 -0500
Subject: Re: Simple- Just copying plain files into the cluster (hdfs) using flume - possible?
To: user@flume.apache.org

Steve - I appreciate your time on this...

Yes, I want to use Flume to copy .xml or .whatever files from a server outside the cluster to HDFS. That server does have Flume installed on it.

I'd like the same behavior as the "spooling directory" source, but from a remote machine --> to HDFS.

So, from all my reading, Flume looks like it's completely designed for streaming "live" logs and program outputs...

It doesn't seem to be known for being a file watcher that grabs files as they show up, then ships and writes them to HDFS. Or can it?

OK, I can see fragmentation being an issue with individual "small" files, but doesn't the "spooling directory" behaviour face the same issue?

I've done quite a bit of reading, but one can easily get into the weeds :) - All I need to do is this simple task.
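From the docs, I'm imagining something like the two-agent setup below: a spooling directory source on server1 forwarding over Avro to an agent on server2 that writes to HDFS. This is only an untested sketch of what I think it would look like - the hostnames, ports, and directory paths are placeholders:

    # --- agent "a1" on server1 (outside the cluster) ---
    a1.sources = src1
    a1.channels = ch1
    a1.sinks = snk1

    # Watch the drop directory. Note: spooldir emits one event per line
    # by default, so files arrive in HDFS as reassembled text, not as
    # byte-for-byte copies of the originals.
    a1.sources.src1.type = spooldir
    a1.sources.src1.spoolDir = /data/xml-drop
    a1.sources.src1.fileHeader = true
    a1.sources.src1.channels = ch1

    a1.channels.ch1.type = memory
    a1.channels.ch1.capacity = 10000

    # Forward events over Avro RPC to the agent inside the cluster
    a1.sinks.snk1.type = avro
    a1.sinks.snk1.hostname = server2.example.com
    a1.sinks.snk1.port = 4141
    a1.sinks.snk1.channel = ch1

    # --- agent "a2" on server2 (inside the Hadoop cluster) ---
    a2.sources = src2
    a2.channels = ch2
    a2.sinks = snk2

    a2.sources.src2.type = avro
    a2.sources.src2.bind = 0.0.0.0
    a2.sources.src2.port = 4141
    a2.sources.src2.channels = ch2

    a2.channels.ch2.type = memory
    a2.channels.ch2.capacity = 10000

    # Write to HDFS. A large rollSize batches many small inputs into
    # fewer, bigger HDFS files - which I gather addresses the
    # fragmentation concern.
    a2.sinks.snk2.type = hdfs
    a2.sinks.snk2.hdfs.path = hdfs://namenode:8020/flume/xml
    a2.sinks.snk2.hdfs.fileType = DataStream
    a2.sinks.snk2.hdfs.writeFormat = Text
    a2.sinks.snk2.hdfs.rollInterval = 0
    a2.sinks.snk2.hdfs.rollSize = 134217728
    a2.sinks.snk2.hdfs.rollCount = 0
    a2.sinks.snk2.channel = ch2

Is something along these lines the right approach, or am I off in the weeds again?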
Thanks

On Mon, Feb 2, 2015 at 5:17 PM, Steve Morin wrote:
> So you want 1-to-1 replication of the logs to HDFS?
>
> As a footnote, people usually don't do this because the log files are often
> too small (think fragmentation), which causes performance problems when used
> on Hadoop.
>
> On Feb 2, 2015, at 13:30, Bob Metelsky wrote:
>
> Hi, I have a simple requirement.
>
> On server1 (NOT in the cluster, but with Flume installed),
> I have a process that constantly generates XML files in a known directory.
>
> I need to transfer them to server2 (IN the Hadoop cluster)
> and into HDFS as XML files.
>
> From what I'm reading, Avro, Thrift RPC, et al. are designed for other uses.
>
> Is there a way to have Flume just copy over plain files? txt, xml...
> I'm thinking there should be, but I can't find it.
>
> The closest I see is the "spooling directory" source, but that seems to
> assume the files are already inside the cluster.
>
> Can Flume do this? Is there an example? I've read the Flume documentation
> and nothing is jumping out.
>
> Thanks!
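P.S. If the sketch above is on the right track, I assume the two agents would be started roughly like this (config directory and file names are placeholders):

    # on server1
    flume-ng agent --conf /etc/flume/conf --conf-file server1.conf --name a1

    # on server2 (inside the cluster)
    flume-ng agent --conf /etc/flume/conf --conf-file server2.conf --name a2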