flume-user mailing list archives

From =?ks_c_5601-1987?B?YXNodXRvc2gov8DHwsfDt6fG+7Czud/GwCk=?= <sharma.ashut...@kt.com>
Subject Need Suggestion - Log Store
Date Fri, 07 Sep 2012 02:33:38 GMT
Hi All,

I know my query is not strictly about Flume, but it is closely tied to a Flume-based
solution, and it may help others understand the design of such a solution.
Here is the background: we have 300+ servers running 20+ apps.
These apps generate five different types of logs, depending on their function. I am designing
a solution to collect all these logs from the hosts and store them in a Hadoop cluster.
We want to analyze the logs for monitoring, current trends, etc. I want
to design the solution from both the collection and the analysis points of view, and it should
be robust enough to support the requirements at both ends. So I need your help designing the
log storage so that we can analyze the logs efficiently.
According to my design, I defined the structure of the log store as follows:
<MainDirectory>…<LogType>…<Host>…<Date>…<logfile>  // rolling interval is 1 min.

I think the above directory structure is good enough for storing the logs, and it keeps
analysis simple because it clearly identifies which server, log type, and date the data
belongs to. But in my PoC I ended up with lots of small files, as each host generates
20-50 log events per second, each about 240 bytes. We expect the systems to generate more
logs in the future, e.g. 100-150. Given these numbers, should I change my
directory structure by removing the host directory, combining the logs from all available
sources for a particular log type, and storing them date-wise? In that case I don't want
to lose the host information associated with each log event, so I can store the host
as part of the log event itself. The changed directory structure would then have no host
level: logs would be grouped by log type and then by date.

So, what would be the ideal directory structure for storing the log data?
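
To put rough numbers on the small-files concern, here is a back-of-the-envelope sketch in Python. It assumes exactly 300 hosts and 5 log types, that every host writes every log type each minute, and that the full per-host rate of 20-50 events per second lands in a single file (so the sizes are upper bounds). The merged layout assumes one file per log type per minute across all hosts, which depends on how the collectors are arranged. The function names are only illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def files_per_day(log_types, writers, roll_minutes=1):
    """Files created per day when each (log type, writer) pair rolls its own file."""
    return log_types * writers * (MINUTES_PER_DAY // roll_minutes)


def bytes_per_file(events_per_sec, event_bytes, roll_minutes=1):
    """Approximate size of one rolled file for a single writer."""
    return events_per_sec * event_bytes * 60 * roll_minutes


# Figures from the description above; the exact counts are assumptions.
HOSTS = 300
LOG_TYPES = 5
EVENT_BYTES = 240

# Layout with a host directory: one file per log type per host per minute.
per_host_files = files_per_day(LOG_TYPES, HOSTS)

# Layout without a host directory: assumed one file per log type per minute.
merged_files = files_per_day(LOG_TYPES, 1)

print("files/day with host directory:   ", per_host_files)
print("files/day without host directory:", merged_files)
print("max bytes per per-host file at 20 events/s:", bytes_per_file(20, EVENT_BYTES))
print("max bytes per per-host file at 50 events/s:", bytes_per_file(50, EVENT_BYTES))
```

Under these assumptions the per-host layout creates a few million files per day, each well under 1 MB, which is far below typical HDFS block sizes of 64 MB or more. Raising the rolling interval (the `roll_minutes` argument) reduces the file count proportionally in either layout.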

Please share your inputs, or suggest a forum where I could get suggestions
from experts.

Thanks & Regards,
Ashutosh Sharma
