Thanks, Ahmed Vila.

I will keep the suggestions you mentioned above in mind when I design the Flume agent.

Thanks & Regards,

Shiva Ram

On Fri, Oct 2, 2015 at 3:12 PM, Ahmed Vila <avila@devlogic.eu> wrote:
Hi Shiva,

If your files are immutable (once a file is placed in the directory, it is never changed afterwards), the best source to use is the Spooling Directory source.
If the files are mutable, avoid the Spooling Directory source: if a file is modified after it has been placed in the spooling directory, Flume will throw an exception and stop the source, and you'll have to restart it.
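For example, a minimal Spooling Directory setup might look like the sketch below. The agent name `agent1`, the path `/data/incoming`, the file channel and the placeholder logger sink are my assumptions; please check the property names against the Flume User Guide for your version.

```properties
# Hypothetical agent "agent1": ingests immutable files dropped into /data/incoming
agent1.sources = spool
agent1.channels = ch1
agent1.sinks = out

agent1.sources.spool.type = spooldir
agent1.sources.spool.channels = ch1
# Directory to watch (placeholder path); files must not change once placed here
agent1.sources.spool.spoolDir = /data/incoming
# Fully ingested files are renamed with this suffix (.COMPLETED is the default)
agent1.sources.spool.fileSuffix = .COMPLETED
# Store the absolute path of the original file in each event's headers
agent1.sources.spool.fileHeader = true

# Durable channel, so buffered events survive an agent restart
agent1.channels.ch1.type = file

# Placeholder sink for local testing; replace with an avro or hdfs sink
agent1.sinks.out.type = logger
agent1.sinks.out.channel = ch1
```

You would start it with something like `flume-ng agent -n agent1 -c conf -f spool.conf`, where `spool.conf` is whatever file holds the lines above.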

You can run Flume on a different server from the one where the files reside and mount that folder locally via NFS or similar.
That isn't an option if you would have to mount the source folder across a firewall, between two networks, or over the internet.

With the Exec source, cross-node execution is hard to achieve, because the source runs the actual shell command you give it, and that command would have to run on the remote node.
Even if you manage it, it will be very slow due to constant SSH negotiation.
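For comparison, this is roughly what an Exec source looks like when it runs locally on the node that holds the data. The log path and the `user@fileserver` host are hypothetical; the commented-out line is the SSH-based remote variant described above, i.e. the pattern to avoid.

```properties
# Hypothetical agent "agent1" with an Exec source running next to the file
agent1.sources = tailer
agent1.channels = ch1
agent1.sinks = out

agent1.sources.tailer.type = exec
agent1.sources.tailer.channels = ch1
# tail -F keeps following the file across log rotation
agent1.sources.tailer.command = tail -F /var/log/app/app.log
# Run the command through a shell so pipes and globs work
agent1.sources.tailer.shell = /bin/bash -c
# Remote variant (avoid): every (re)start pays for a new SSH session
# agent1.sources.tailer.command = ssh user@fileserver tail -F /var/log/app/app.log
# Re-run the command if it exits (e.g. after a dropped connection)
agent1.sources.tailer.restart = true

agent1.channels.ch1.type = memory
agent1.sinks.out.type = logger
agent1.sinks.out.channel = ch1
```

Keep in mind that the Exec source gives no delivery guarantee: if the command dies or the channel is full, the data it produced in the meantime can be lost.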

Either way, I would strongly recommend running Flume on the same node as the source folder, or at least as close to the source as possible, e.g. on the same network.
That way you minimize the influence of network jitter and dropouts on the source. All sources that pull data will fail ungracefully if they encounter an error while fetching data, and you'll end up restarting Flume.

If HDFS is on another network or across the internet, I would suggest chaining two Flume agents, one on each side of the wire: an Avro sink on the source node and an Avro source on the destination node. They support the fundamentals for such a harsh transport environment, such as serialization, compression and SSL over a single TCP connection, so only one port needs to be open.
Then configure the Flume agent on the destination to drain into HDFS via the HDFS sink.
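A sketch of that two-tier setup follows. The agent names `edge` and `collector`, host names, port 4545, paths and keystore passwords are all placeholders I made up. Each agent would normally live in its own file on its own host; they are shown together here for brevity (Flume only loads the properties of the agent named with `-n`).

```properties
# ---- Source node (hypothetical agent "edge"), next to the files ----
edge.sources = spool
edge.channels = ch1
edge.sinks = avroOut

edge.sources.spool.type = spooldir
edge.sources.spool.spoolDir = /data/incoming
edge.sources.spool.channels = ch1

edge.channels.ch1.type = file

edge.sinks.avroOut.type = avro
edge.sinks.avroOut.channel = ch1
# Collector address; this single port is the only one that must be open
edge.sinks.avroOut.hostname = collector.example.com
edge.sinks.avroOut.port = 4545
# Must match the compression-type of the receiving Avro source
edge.sinks.avroOut.compression-type = deflate
edge.sinks.avroOut.ssl = true
edge.sinks.avroOut.truststore = /etc/flume/truststore.jks
edge.sinks.avroOut.truststore-password = changeit

# ---- Destination node (hypothetical agent "collector"), near HDFS ----
collector.sources = avroIn
collector.channels = ch1
collector.sinks = hdfsOut

collector.sources.avroIn.type = avro
collector.sources.avroIn.channels = ch1
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 4545
collector.sources.avroIn.compression-type = deflate
collector.sources.avroIn.ssl = true
collector.sources.avroIn.keystore = /etc/flume/keystore.jks
collector.sources.avroIn.keystore-password = changeit
collector.sources.avroIn.keystore-type = JKS

collector.channels.ch1.type = file

collector.sinks.hdfsOut.type = hdfs
collector.sinks.hdfsOut.channel = ch1
collector.sinks.hdfsOut.hdfs.path = hdfs://namenode:8020/flume/incoming/%Y-%m-%d
# Resolve the %Y-%m-%d escapes with the agent's clock (no timestamp header needed)
collector.sinks.hdfsOut.hdfs.useLocalTimeStamp = true
# Write raw event bodies instead of the default SequenceFile format
collector.sinks.hdfsOut.hdfs.fileType = DataStream
```

Start the collector first (`flume-ng agent -n collector ...`) and then the edge agent (`-n edge`), so the Avro sink has something to connect to.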


On Fri, Oct 2, 2015 at 7:08 AM, Shiva Ram <shivaram.hadoop2015@gmail.com> wrote:
A set of files is placed on a remote server (not a Hadoop cluster node). Which source type is suitable for collecting these files from the remote server into HDFS using Flume? From my initial study of Flume, I learned that the "Exec" and "Spooling Directory" source types can be used to collect these files. Does the Flume service need to run on the remote server (the source system I want to get the data from)? Thanks.

Thanks & Regards,

Shiva Ram

--

Best regards,

Ahmed Vila | Senior software developer
DevLogic | Sarajevo | Bosnia and Herzegovina

Office : +387 33 942 123 
Mobile: +387 62 139 348

Website: www.devlogic.eu 
E-mail   : avila@devlogic.eu
---------------------------------------------------------------------
This e-mail and any attachment is for authorised use by the intended recipient(s) only. This email contains confidential information. It should not be copied, disclosed to, retained or used by, any party other than the intended recipient. Any unauthorised distribution, dissemination or copying of this E-mail or its attachments, and/or any use of any information contained in them, is strictly prohibited and may be illegal. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender directly via email. Any emails that you send to us may be monitored by systems or persons other than the named communicant for the purposes of ascertaining whether the communication complies with the law and company policies.