incubator-general mailing list archives

From Alex Karasulu <akaras...@apache.org>
Subject Re: [DISCUSS] Eagle incubator proposal
Date Tue, 20 Oct 2015 15:02:20 GMT
Hi Arun,

Eagle sounds very promising. I just had a discussion with someone about
this exact need. I do however agree with Greg on the name. As far as I can
see, besides the name, your weakest point is the all-eBay-employed team.
It's not a blocker and can be fixed during incubation. Good luck to you.

Alex


On Tue, Oct 20, 2015 at 5:51 PM, Manoharan, Arun <armanoharan@ebay.com>
wrote:

> Hi Greg,
>
> Thank you for reviewing the proposal.
>
> Originally we thought Eagle might already be trademarked by someone, but I
> went through the eBay legal team to get clearance for the name. We
> will look into it again to see whether there are potential problems.
>
> Thanks,
> Arun
>
> On 10/20/15, 1:52 AM, "Greg Stein" <gstein@gmail.com> wrote:
>
> >Hey there, Arun! ... I have no commentary on the proposal itself, as it
> >looks like a great proposal. I would suggest being a bit wary of the name,
> >as "Eagle" is a *very* popular PCB design program.
> >
> >On Mon, Oct 19, 2015 at 10:33 AM, Manoharan, Arun <armanoharan@ebay.com>
> >wrote:
> >
> >> Hello Everyone,
> >>
> >> My name is Arun Manoharan. I am currently a product manager on the
> >> Analytics platform team at eBay Inc.
> >>
> >> I would like to start a discussion on Eagle and its joining the ASF as
> >> an incubation project.
> >>
> >> Eagle is a monitoring solution for Hadoop that instantly identifies
> >> access to sensitive data, recognizes attacks and malicious activities,
> >> and takes action in real time. Eagle supports a wide variety of
> >> policies on HDFS data and Hive. Eagle also provides machine learning
> >> models for detecting anomalous user behavior in Hadoop.
> >>
> >> The proposal is available on the wiki here:
> >> https://wiki.apache.org/incubator/EagleProposal
> >>
> >> The text of the proposal is also available at the end of this email.
> >>
> >> Thanks for your time and help.
> >>
> >> Thanks,
> >> Arun
> >>
> >> <COPY of the proposal in text format>
> >>
> >> Eagle
> >>
> >> Abstract
> >> Eagle is an open source monitoring solution for Hadoop that instantly
> >> identifies access to sensitive data, recognizes attacks and malicious
> >> activities in Hadoop, and takes action.
> >>
> >> Proposal
> >> Eagle audits access to HDFS files and to Hive and HBase tables in real
> >> time, enforces policies defined on sensitive data access, and alerts on
> >> or blocks a user's access to that sensitive data in real time. Eagle
> >> also creates user profiles based on typical access behaviour for HDFS
> >> and Hive and sends alerts when anomalous behaviour is detected. Eagle
> >> can also import sensitive data information classified by external
> >> classification engines to help define its policies.
> >>
> >> Overview of Eagle
> >> Eagle has 3 main parts.
> >> 1. Data collection and storage - Eagle collects data from various
> >> Hadoop logs in real time using the Kafka/YARN API and uses HDFS and
> >> HBase for storage.
> >> 2. Data processing and policy engine - Eagle allows users to create
> >> policies based on various metadata properties on HDFS, Hive and HBase
> >> data.
> >> 3. Eagle services - Eagle services include the policy manager, the
> >> query service and the visualization component. Eagle provides an
> >> intuitive user interface to administer Eagle and an alert dashboard to
> >> respond to real time alerts.
> >>
> >> Data Collection and Storage:
> >> Eagle provides a programming API for integrating any data source into
> >> the Eagle policy evaluation framework. For example, Eagle HDFS audit
> >> monitoring collects data from Kafka, which is populated by a NameNode
> >> log4j appender or by a Logstash agent. Eagle Hive monitoring collects
> >> Hive query logs from running jobs through the YARN API, which is
> >> designed to be scalable and fault-tolerant. Eagle uses HBase to store
> >> metadata and metrics data, and also supports relational databases
> >> through a configuration change.
> >>
> >> Data Processing and Policy Engine:
> >> Processing Engine: Eagle provides a stream processing API that is an
> >> abstraction over Apache Storm and can be extended to other streaming
> >> engines. This abstraction allows developers to assemble data
> >> transformation, filtering, external data joins etc. without being
> >> physically bound to a specific streaming platform. The Eagle streaming
> >> API allows developers to easily integrate business logic with the Eagle
> >> policy engine; internally, the Eagle framework compiles the business
> >> logic execution DAG into the program primitives of the underlying
> >> stream infrastructure, e.g. Apache Storm. For example, Eagle HDFS
> >> monitoring transforms each audit log entry from the NameNode into an
> >> object and joins it with sensitivity metadata and security zone
> >> metadata, which are generated by external programs or configured by
> >> the user. Eagle Hive monitoring filters running jobs to get the Hive
> >> query string, parses the query string into an object and then joins
> >> it with sensitivity metadata.
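The HDFS example above (turn an audit log entry into an object, then join it with sensitivity metadata) can be sketched roughly as follows. This is only an illustration in plain Python, not Eagle's streaming API: the `AuditEvent` class, both function names, the sample log line and the path-prefix metadata map are all assumptions made for the sketch, and the `key=value` layout is the commonly seen NameNode audit format rather than something the proposal specifies.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# key=value pairs as they commonly appear in a NameNode audit log line
# (assumed layout; paths containing whitespace are not handled here).
_FIELD = re.compile(r"(\w+)=(\S+)")


@dataclass
class AuditEvent:
    """Hypothetical event object; not an Eagle class."""
    user: str
    cmd: str
    src: str
    allowed: bool
    sensitivity: Optional[str] = None


def parse_audit_line(line: str) -> AuditEvent:
    """Transform one raw audit log line into an object."""
    fields = dict(_FIELD.findall(line))
    return AuditEvent(
        user=fields["ugi"],
        cmd=fields["cmd"],
        src=fields["src"],
        allowed=fields["allowed"] == "true",
    )


def join_sensitivity(event: AuditEvent, metadata: Dict[str, str]) -> AuditEvent:
    """Attach the tag of the longest path prefix that covers event.src."""
    best = None
    for prefix in metadata:
        covers = event.src == prefix or event.src.startswith(prefix.rstrip("/") + "/")
        if covers and (best is None or len(prefix) > len(best)):
            best = prefix
    event.sensitivity = metadata[best] if best is not None else None
    return event


# Illustrative input only; the values are made up.
SAMPLE = ("2015-10-20 15:02:20,123 INFO FSNamesystem.audit: allowed=true "
          "ugi=alice (auth:SIMPLE) ip=/10.0.0.1 cmd=open "
          "src=/data/pii/users.csv dst=null perm=null")

event = join_sensitivity(parse_audit_line(SAMPLE),
                         {"/data/pii": "PII", "/data": "INTERNAL"})
```

In a real deployment the join would run inside the streaming engine and the metadata would come from the external classification sources the proposal mentions; the longest-prefix rule here is just one plausible choice.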
> >> Alerting Framework: The Eagle Alert Framework includes a stream
> >> metadata API, a scalable policy engine framework and an extensible
> >> policy engine framework. The stream metadata API allows developers to
> >> declare an event schema, including which attributes constitute an
> >> event, the type of each attribute, and how to dynamically resolve
> >> attribute values at runtime when a user configures a policy. The
> >> scalable policy engine framework allows policies to be executed on
> >> different physical nodes in parallel, and also lets you define your own
> >> policy partitioner class. The policy engine framework, together with
> >> the stream partitioning capability provided by all streaming platforms,
> >> makes sure policies and events can be evaluated in a fully distributed
> >> way. The extensible policy engine framework allows developers to plug
> >> in a new policy engine with a few lines of code. The WSO2 Siddhi CEP
> >> engine is the policy engine that Eagle supports as a first-class
> >> citizen.
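One way to picture the "policy partitioner" idea above is a deterministic hash that spreads policies over a fixed number of workers, so that each policy is evaluated on exactly one node. The sketch below is an assumption-laden illustration in plain Python; `partition_for`, `assign_policies` and the policy ids are hypothetical names, and the proposal does not say which partitioning scheme Eagle actually uses.

```python
import hashlib
from typing import Dict, List


def partition_for(policy_id: str, num_partitions: int) -> int:
    """Map a policy id to a partition index in [0, num_partitions).

    A stable digest is used instead of the built-in hash(), which is
    randomized per process for strings and so would not agree across nodes.
    """
    digest = hashlib.md5(policy_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def assign_policies(policy_ids: List[str], num_partitions: int) -> Dict[int, List[str]]:
    """Group policy ids by the partition that should evaluate them."""
    assignment: Dict[int, List[str]] = {p: [] for p in range(num_partitions)}
    for policy_id in policy_ids:
        assignment[partition_for(policy_id, num_partitions)].append(policy_id)
    return assignment


# Hypothetical policy ids, for illustration only.
policies = ["pii-read-alert", "hive-sensitive-column", "bulk-delete-guard"]
assignment = assign_policies(policies, num_partitions=4)
```

Because the mapping depends only on the policy id and the partition count, every node can compute the same assignment without coordination; changing the partition count reshuffles policies, which a production partitioner would likely want to mitigate.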
> >> Machine Learning module: Eagle provides capabilities to define user
> >> activity patterns, or user profiles, for Hadoop users based on their
> >> behaviour in the platform. These user profiles are modeled using
> >> machine learning algorithms and used to detect anomalous user
> >> activities. Eagle uses eigenvalue decomposition and density estimation
> >> algorithms to generate user profile models. The module reads data from
> >> HDFS audit logs, preprocesses and aggregates it, and generates models
> >> using Spark programming APIs. Once models are generated, Eagle uses
> >> its stream processing engine for near-real-time anomaly detection to
> >> determine whether any user's activities are suspicious.
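To make the density estimation idea above concrete, here is a minimal one-dimensional sketch: fit a Gaussian kernel density estimate to a user's historical activity counts and flag new observations that fall in a low-density region. It is plain Python rather than Spark, and the function names, bandwidth, threshold and sample counts are all assumptions chosen for illustration; Eagle's actual models and features are not described in enough detail here to reproduce.

```python
import math
from typing import Callable, Sequence


def gaussian_kde(samples: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Return a density function estimated from samples with a Gaussian kernel."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        # Sum one Gaussian bump per historical sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def is_anomalous(density: Callable[[float], float], x: float, threshold: float) -> bool:
    """Flag x when the estimated density at x falls below the threshold."""
    return density(x) < threshold


# Hypothetical history: file reads per hour for one user.
history = [40, 42, 38, 45, 41, 39, 43]
profile = gaussian_kde(history, bandwidth=3.0)

typical = is_anomalous(profile, 41, threshold=1e-3)    # near the usual range
unusual = is_anomalous(profile, 400, threshold=1e-3)   # far outside it
```

The "train offline, score in the stream" split described in the proposal corresponds here to building `profile` once from history and then calling `is_anomalous` per incoming event.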
> >>
> >> Eagle Services:
> >> Query Service: Eagle provides a SQL-like service API to support
> >> comprehensive computation over huge data sets on the fly, e.g.
> >> filtering, aggregation, histogram, sorting, top, arithmetic
> >> expressions, pagination etc. HBase is the data storage that Eagle
> >> supports as a first-class citizen; relational databases are supported
> >> as well. For HBase storage, the Eagle query framework compiles a
> >> user-provided SQL-like query into HBase native filter objects and
> >> executes them through an HBase coprocessor on the fly.
> >> Policy Manager: The Eagle policy manager provides a UI and a RESTful
> >> API for users to define policies with just a few clicks. It includes a
> >> site management UI, a policy editor, sensitivity metadata import, HDFS
> >> or Hive sensitive resource browsing, alert dashboards etc.
> >>
> >> Background
> >> Data is one of the most important assets for today's businesses, which
> >> makes data security one of the top priorities of today's enterprises.
> >> Hadoop is widely used across different verticals as a big data
> >> repository to store this data in most modern enterprises.
> >> At eBay we use the Hadoop platform extensively for our data processing
> >> needs. Our data in Hadoop is becoming bigger and bigger as our user
> >> base sees exponential growth. Today there is a variety of data sets
> >> available in the Hadoop cluster for our users to consume. eBay has
> >> around 120 PB of data stored in HDFS across 6 different clusters and
> >> 1800+ active Hadoop users consuming data through Hive, HBase and
> >> MapReduce jobs every day to build applications using this data. With
> >> this astronomical growth of data there are also challenges in securing
> >> sensitive data and monitoring access to it. Today in large
> >> organizations HDFS is the de facto standard for storing big data. Data
> >> sets including, but not limited to, consumer sentiment, social media
> >> data, customer segmentation, web clicks, sensor data, geo-location and
> >> transaction data get stored in Hadoop for day-to-day business needs.
> >> We at eBay want to make sure the sensitive data and data platforms are
> >> completely protected from security breaches, so we partnered very
> >> closely with our Information Security team to understand the
> >> requirements for Eagle to monitor sensitive data access on Hadoop:
> >> 1. Ability to identify and stop security threats in real time
> >> 2. Scale for big data (support PB scale and billions of events)
> >> 3. Ability to create data access policies
> >> 4. Support multiple data sources like HDFS, HBase, Hive
> >> 5. Visualize alerts in real time
> >> 6. Ability to block malicious access in real time
> >> We did not find any data access monitoring solution available today
> >> that provides the features and functionality we need to monitor data
> >> access in the Hadoop ecosystem at our scale. Hence, with an excellent
> >> team of world class developers and several users, we have been able to
> >> bring Eagle into production as well as open source it.
> >>
> >> Rationale
> >> In today's world, data is an important asset for any company. Businesses
> >> are using data extensively to create amazing experiences for users. Data
> >> has to be protected, and access to data should be secured against
> >> security breaches. Today Hadoop is used to store not only logs but also
> >> financial data, sensitive data sets, geographical data, user click
> >> stream data sets etc., which makes it all the more important to protect
> >> it from security breaches. To secure a data platform, multiple things
> >> need to happen. One is having a strong access control mechanism, which
> >> today is provided by Apache Ranger and Apache Sentry. These tools
> >> provide fine-grained access control for data sets on Hadoop. But there
> >> is a big gap in terms of monitoring all the data access events and
> >> activities in order to secure the Hadoop data platform. With strong
> >> access control, perimeter security and data access monitoring together
> >> in place, data in Hadoop clusters can be secured against breaches. We
> >> looked around and found the following:
> >> Existing data activity monitoring products are designed for traditional
> >> databases and data warehouses. Existing monitoring platforms cannot
> >> scale out to support fast growing data and petabyte scale. The few
> >> products in the industry are still very early in terms of supporting
> >> HDFS, Hive and HBase data access monitoring.
> >> As mentioned in the background, the business requirement and the
> >> urgency to secure the data from users with malicious intent drove eBay
> >> to invest in building a real time data access monitoring solution from
> >> scratch to offer real time alerts and remediation features for
> >> malicious data access.
> >> With the power of open source distributed systems like Hadoop, Kafka and
> >> many more, we were able to develop a data activity monitoring system
> >> that can scale, and identify and stop malicious access in real time.
> >> Eagle allows admins to create standard access policies and rules for
> >> monitoring HDFS, Hive and HBase data. Eagle also provides out-of-the-box
> >> machine learning models for modeling user profiles based on user access
> >> behaviour and uses the models to alert on anomalies.
> >>
> >> Current Status
> >>
> >> Meritocracy
> >> Eagle has been deployed in production at eBay for monitoring billions of
> >> events per day from HDFS and Hive operations. From the start, the
> >> product has been built with a focus on high scalability and application
> >> extensibility, and Eagle has demonstrated great performance in
> >> responding to suspicious events instantly and great flexibility in
> >> defining policy.
> >>
> >> Community
> >> Eagle seeks to develop the developer and user communities during
> >> incubation.
> >>
> >> Core Developers
> >> Eagle is currently being designed and developed by engineers from eBay
> >> Inc.: Edward Zhang, Hao Chen, Chaitali Gupta, Libin Sun, Jilin Jiang,
> >> Qingwen Zhao, Senthil Kumar, Hemanth Dendukuri, Arun Manoharan. All of
> >> these core developers have deep expertise in developing monitoring
> >> products for the Hadoop ecosystem.
> >>
> >> Alignment
> >> The ASF is a natural host for Eagle given that it is already the home of
> >> Hadoop, HBase, Hive, Storm, Kafka, Spark and other emerging big data
> >> projects. Eagle leverages a lot of Apache open-source products. Eagle
> >> was designed to offer real time insights into sensitive data access by
> >> actively monitoring data access on various data sets in Hadoop, and to
> >> provide an extensible alerting framework with a powerful policy engine.
> >> Eagle complements the existing Hadoop platform area by providing a
> >> comprehensive monitoring and alerting solution for detecting sensitive
> >> data access threats based on preset policies and machine learning
> >> models for user behaviour analysis.
> >>
> >> Known Risks
> >>
> >> Orphaned Products
> >> The core developers on the Eagle team work full time on this project.
> >> There is no risk of Eagle getting orphaned since eBay is using it
> >> extensively in its production Hadoop clusters and has plans to go
> >> beyond Hadoop. For example, currently there are 7 Hadoop clusters and 2
> >> of them are being monitored using Hadoop Eagle in production. We have
> >> plans to extend it to all Hadoop clusters and eventually to other data
> >> platforms. There are tens of policies onboarded and actively monitored,
> >> with plans to onboard more use cases. We are very confident that every
> >> Hadoop cluster in the world will be monitored using Eagle for securing
> >> the Hadoop ecosystem by actively monitoring for data access on
> >> sensitive data. We plan to extend and diversify this community further
> >> through Apache. We presented Eagle at the Hadoop Summit in China and
> >> garnered interest from different companies that use Hadoop extensively.
> >>
> >> Inexperience with Open Source
> >> The core developers are all active users and followers of open source.
> >> They are already committers and contributors to the Eagle GitHub
> >> project. All have been involved with the source code that has been
> >> released under an open source license, and several of them also have
> >> experience developing code in an open source environment. Though the
> >> core set of developers do not have Apache open source experience, there
> >> are plans to onboard individuals with Apache open source experience
> >> onto the project. Apache Kylin PMC members are also in the same eBay
> >> organization. We work very closely with Apache Ranger committers and
> >> are looking forward to finding meaningful integrations to improve the
> >> security of the Hadoop platform.
> >>
> >> Homogenous Developers
> >> The core developers are from eBay. Today the problem of monitoring data
> >> activities to find and stop threats is a universal problem faced by all
> >> businesses. The Apache Incubation process encourages an open and
> >> diverse meritocratic community. Eagle intends to make every possible
> >> effort to build a diverse, vibrant and involved community and has
> >> already received substantial interest from various organizations.
> >>
> >> Reliance on Salaried Developers
> >> eBay invested in Eagle as the monitoring solution for its Hadoop
> >> clusters, and some of its key engineers are working full time on the
> >> project. In addition, since there is a growing need for securing
> >> sensitive data access and for a data activity monitoring solution for
> >> Hadoop, we look forward to other Apache developers and researchers
> >> contributing to the project. Additional contributors, including Apache
> >> committers, have plans to join this effort shortly. Also key to
> >> addressing the risk associated with relying on salaried developers from
> >> a single entity is increasing the diversity of the contributors and
> >> actively lobbying for domain experts in the security space to
> >> contribute. Eagle intends to do this.
> >>
> >> Relationships with Other Apache Products
> >> Eagle has a strong relationship with and dependency on Apache Hadoop,
> >> HBase, Spark, Kafka and Storm. Being part of Apache's Incubation
> >> community could help with closer collaboration among these projects as
> >> well as others.
> >>
> >> An Excessive Fascination with the Apache Brand
> >> Eagle is proposing to enter incubation at Apache in order to help
> >> efforts to diversify the committer base, not so much to capitalize on
> >> the Apache brand. The Eagle project is in production use already inside
> >> eBay, but is not expected to be an eBay product for external customers.
> >> As such, the Eagle project is not seeking to use the Apache brand as a
> >> marketing tool.
> >>
> >> Documentation
> >> Information about Eagle can be found at https://github.com/eBay/Eagle.
> >> The following link provides more information about Eagle:
> >> http://goeagle.io
> >>
> >> Initial Source
> >> Eagle has been under development since 2014 by a team of engineers at
> >> eBay Inc. It is currently hosted on GitHub under the Apache License 2.0
> >> at https://github.com/eBay/Eagle. Once in incubation we will move the
> >> code base to an Apache git repository.
> >>
> >> External Dependencies
> >> Eagle has the following external dependencies.
> >> Basic
> >> • JDK 1.7+
> >> • Scala 2.10.4
> >> • Apache Maven
> >> • JUnit
> >> • Log4j
> >> • Slf4j
> >> • Apache Commons
> >> • Apache Commons Math3
> >> • Jackson
> >> • Siddhi CEP engine
> >>
> >> Hadoop
> >> • Apache Hadoop
> >> • Apache HBase
> >> • Apache Hive
> >> • Apache Zookeeper
> >> • Apache Curator
> >>
> >> Apache Spark
> >> • Spark Core Library
> >>
> >> REST Service
> >> • Jersey
> >>
> >> Query
> >> • Antlr
> >>
> >> Stream processing
> >> • Apache Storm
> >> • Apache Kafka
> >>
> >> Web
> >> • AngularJS
> >> • jQuery
> >> • Bootstrap V3
> >> • Moment JS
> >> • Admin LTE
> >> • html5shiv
> >> • respond
> >> • Fastclick
> >> • Date Range Picker
> >> • Flot JS
> >>
> >> Cryptography
> >> Eagle will eventually support encryption on the wire. This is not one of
> >> the initial goals, and we do not expect Eagle to be a controlled export
> >> item due to the use of encryption. Eagle supports but does not require
> >> the Kerberos authentication mechanism to access secured Hadoop
> >> services.
> >>
> >> Required Resources
> >>
> >> Mailing List
> >> • eagle-private for private PMC discussions
> >> • eagle-dev for developers
> >> • eagle-commits for all commits
> >> • eagle-users for all Eagle users
> >>
> >> Subversion Directory
> >> • Git is the preferred source control system.
> >>
> >> Issue Tracking
> >> • JIRA Eagle (Eagle)
> >>
> >> Other Resources
> >> The existing code already has unit tests so we will make use of existing
> >> Apache continuous testing infrastructure. The resulting load should not
> >> be very large.
> >>
> >> Initial Committers
> >> • Seshu Adunuthula <sadunuthula at ebay dot com>
> >> • Arun Manoharan <armanoharan at ebay dot com>
> >> • Edward Zhang <yonzhang at ebay dot com>
> >> • Hao Chen <hchen9 at ebay dot com>
> >> • Chaitali Gupta <cgupta at ebay dot com>
> >> • Libin Sun <libsun at ebay dot com>
> >> • Jilin Jiang <jiljiang at ebay dot com>
> >> • Qingwen Zhao <qingwzhao at ebay dot com>
> >> • Hemanth Dendukuri <hdendukuri at ebay dot com>
> >> • Senthil Kumar <senthilkumar at ebay dot com>
> >> • Tan Chen <tanchen at ebay dot com>
> >>
> >> Affiliations
> >> The initial committers are employees of eBay Inc.
> >>
> >> Sponsors
> >>
> >> Champion
> >> • Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
> >>
> >> Nominated Mentors
> >> • Owen O'Malley <omalley at apache dot org> - Apache IPMC member,
> >>   Hortonworks
> >> • Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
> >> • Julian Hyde <jhyde at hortonworks dot com> - Apache IPMC member,
> >>   Hortonworks
> >>
> >> Sponsoring Entity
> >> We are requesting the Incubator to sponsor this project.
> >>
> >>
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>
>


-- 
Best Regards,
-- Alex
