incubator-general mailing list archives

From 黄向东 <saint...@gmail.com>
Subject Re: [Result][Vote] vote for IoTDB incubation proposal
Date Thu, 15 Nov 2018 14:57:45 GMT
> - When you say "open source" repo, do you mean private repo vs public
> repo?

Yes.

> 
> - I believe Craig as Secretary will say an SGA never hurts but isn't
> everything already licensed ASLv2?  It's been a few weeks and a few
> proposals reviewed so it could be my memory.

Currently, the licenses of IoTDB's dependency libraries include Apache 2.0, BSD (antlr3),
EPL 1.0 (logback), and EPL 2.0 (junit).
We are checking all the licenses once more to avoid mistakes.

Regards,
Xiangdong Huang


> On Nov 15, 2018, at 10:43 PM, Kevin A. McGrail <kmcgrail@apache.org> wrote:
> 
> Well, first, let's ask some questions:
> 
> - When you say "open source" repo, do you mean private repo vs public
> repo?
> 
> - I believe Craig as Secretary will say an SGA never hurts but isn't
> everything already licensed ASLv2?  It's been a few weeks and a few
> proposals reviewed so it could be my memory.
> 
> Regards,
> KAM
> 
> --
> Kevin A. McGrail
> VP Fundraising, Apache Software Foundation
> Chair Emeritus Apache SpamAssassin Project
> https://www.linkedin.com/in/kmcgrail - 703.798.0171
> 
> 
> On Thu, Nov 15, 2018 at 7:27 AM hxd <hxdreg@qq.com> wrote:
> 
>> Currently, there are 6 repositories in total (IoTDB, IoTDB-JDBC, TsFile,
>> Spark-Connector, Hive-Connector, and Grafana-Connector), and we will
>> merge them all into one repository.
>> 
>> Only the first one is private.
>> 
>> We actually lack experience with how to open source a project.
>> 
>> Should we open all the source now, or after all the Apache legal documents
>> are done?
>> 
>> Best,
>> 
>> Xiangdong Huang
>> 
>>> On Nov 15, 2018, at 5:06 PM, Willem Jiang <willem.jiang@gmail.com> wrote:
>>> 
>>> Here is a question for the source code repository
>>> 
>>> The main source git repo[1] is still a private repo.  I think we need
>>> to open source the repo before sending the SGA?
>>> 
>>> 
>>> [1]https://github.com/thulab/iotdb
>>> 
>>> Willem Jiang
>>> 
>>> Twitter: willemjiang
>>> Weibo: 姜宁willem
>>> On Thu, Nov 15, 2018 at 4:08 PM hxd <hxdreg@qq.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> During the proposal discussion, we got 3 mentors: Justin Mclean,
>> Christofer Dutz, and Willem Ning Jiang.
>>>> 
>>>> During the vote, we got a new mentor, Joe Witt.
>>>> 
>>>> In total, there are one Champion and four mentors:
>>>> 
>>>> Kevin A. McGrail (the Champion),
>>>> Justin Mclean,
>>>> Christofer Dutz,
>>>> Willem Ning Jiang, and
>>>> Joe Witt
>>>> 
>>>> I have checked their names against
>> http://people.apache.org/committer-index.html, and they are accurate now.
>>>> The name list in the proposal (
>> https://wiki.apache.org/incubator/IoTDBProposal) is also correct.
>>>> 
>>>> Regards,
>>>> Xiangdong Huang
>>>> 
>>>> 
>>>> 
>>>> On Nov 15, 2018, at 12:51 AM, Kevin A. McGrail <kmcgrail@apache.org> wrote:
>>>> 
>>>> Congratulations!  As champion, I think the next steps are:
>>>> 
>>>> 1 - Xiangdong, Can you confirm the list of mentors on the proposal is
>> accurate?
>>>> 
>>>> 2 - Also Xiangdong, Is there anyone else that stepped forward as a
>> mentor during the voting process that the project wants the IPMC to approve?
>>>> 
>>>> 3 - Justin, I think you have to request the creation of the podling and
>> then I as champion work on things like the meta data file from this page,
>>>> https://incubator.apache.org/policy/incubation.html, correct?
>>>> 
>>>> Regards,
>>>> KAM
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Kevin A. McGrail
>>>> VP Fundraising, Apache Software Foundation
>>>> Chair Emeritus Apache SpamAssassin Project
>>>> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>>>> 
>>>> 
>>>> On Wed, Nov 14, 2018 at 6:29 AM hxd <hxdreg@qq.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> With 8 binding +1 votes, 2 non-binding +1 votes, and no +/-0 or -1
>> votes, this VOTE passes.
>>>>> 
>>>>> Thanks to everyone who voted!
>>>>> 
>>>>> Below is the voting tally:
>>>>> 
>>>>> Binding
>>>>> Von Gosling
>>>>> Christofer Dutz
>>>>> Kevin A. McGrail
>>>>> Felix Cheung
>>>>> Matt Sticker
>>>>> Joe Witt
>>>>> Justin Mclean
>>>>> Willem Jiang
>>>>> 
>>>>> 
>>>>> Non-binding
>>>>> Sheng Wu
>>>>> Yang Bo
>>>>> 
>>>>> The vote thread:
>> https://lists.apache.org/thread.html/077f029ab2b52a2b19fc8d41c07438f660a8e93dd87b3895d262263c@%3Cgeneral.incubator.apache.org%3E
>>>>> The proposal: https://wiki.apache.org/incubator/IoTDBProposal
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Xiangdong Huang
>>>>> 
>>>>> 
>>>>>> On Nov 7, 2018, at 3:46 PM, hxd <hxdreg@qq.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Sorry for the badly formatted previous mail.
>>>>>> I'd like to call a VOTE to accept the IoTDB project, a database for
>> managing large amounts of time series data from IoT sensors in industrial
>> applications, into the Apache Incubator.
>>>>>> The full proposal is available on the wiki:
>> https://wiki.apache.org/incubator/IoTDBProposal
>>>>>> and it is also attached below for your convenience.
>>>>>> 
>>>>>> Please cast your vote:
>>>>>> 
>>>>>> [ ] +1, bring IoTDB into Incubator
>>>>>> [ ] +0, I don't care either way,
>>>>>> [ ] -1, do not bring IoTDB into Incubator, because...
>>>>>> 
>>>>>> The vote will be open for at least 72 hours.
>>>>>> 
>>>>>> Thanks,
>>>>>> Xiangdong Huang.
>>>>>> 
>>>>>> 
>>>>>> = IoTDB Proposal  =
>>>>>> v0.1.1
>>>>>> 
>>>>>> 
>>>>>> == Abstract ==
>>>>>> IoTDB is a data store for managing large amounts of time series data
>> such as timestamped data from IoT sensors in industrial applications.
>>>>>> 
>>>>>> == Proposal ==
>>>>>> IoTDB is a database for managing large amounts of time series data
>> with columnar storage, data encoding, pre-computation, and index
>> techniques. It has a SQL-like interface, can write millions of data points per
>> second per node, and is optimized to return query results in a few seconds over
>> trillions of data points. It can also be easily integrated with Apache
>> Hadoop MapReduce and Apache Spark for analytics.
>>>>>> 
>>>>>> == Background ==
>>>>>> 
>>>>>> A new class of data management system requirements is becoming
>> increasingly important with the rise of the Internet of Things. There are
>> some database systems and technologies aimed at time series data
>> management. For example, Gorilla and InfluxDB are mainly built for
>> data centers and monitoring application metrics. Other systems, for
>> example, OpenTSDB and KairosDB, are built on Apache HBase and Apache
>> Cassandra, respectively.
>>>>>> 
>>>>>> However, many applications for time series data management have more
>> requirements especially in industrial applications as follows:
>>>>>> 
>>>>>> * Supporting time series data with a high data frequency. For
>> example, a turbine engine may generate 1000 points per second (i.e.,
>> 1000Hz), while each CPU reports only 1 data point every 5 seconds in a data
>> center monitoring application.
>>>>>> 
>>>>>> * Supporting scanning data at multiple resolutions. For example,
>> aggregation operations are important for time series data.
>>>>>> 
>>>>>> * Supporting special queries for time series, such as pattern
>> matching, time series segmentation, time-frequency transformation and
>> frequency query.
>>>>>> 
>>>>>> * Supporting a large number of monitoring targets (i.e. time series).
>> An excavator may report more than 1000 time series, for example, revolving
>> speed of the motor-engine, the speed of the excavator, the accelerated
>> speed, the temperature of the water tank and so on, while a CPU or an
>> application monitor has much fewer time series.
>>>>>> 
>>>>>> * Optimization for out-of-order data points. In the industrial
>> sector, it is common that equipment sends data using the UDP protocol
>> rather than the TCP protocol. Sometimes, the network connection is unstable
>> and parts of the data will be buffered for later sending.
>>>>>> 
>>>>>> * Supporting long-term storage. Historical data is precious for
>> equipment manufacturers. Therefore, removing or unloading historical data
>> is undesirable for most industrial applications. The database system
>> must not only support fast retrieval of historical data, but should also
>> guarantee that the historical data does not impact the processing speed for
>> “hot” or current data.
>>>>>> 
>>>>>> * Supporting online transaction processing (OLTP) as well as complex
>> analytics. Analyzing the data files directly with Apache Spark/Apache
>> Hadoop MapReduce is better than transforming them to another file format
>> for Big Data analytics.
>>>>>> 
>>>>>> * Flexible deployment either on premise or in the cloud. IoTDB is
>> simple and can be deployed on a Raspberry Pi handling hundreds of time
>> series. Meanwhile, the system can also be deployed in the cloud so that it
>> supports tens of millions of ingestions per second, OLTP queries in
>> milliseconds, and analytics using Apache Spark/Apache Hadoop MapReduce.
>>>>>> 
>>>>>> * * (1) If users deploy IoTDB on a device, such as a Raspberry Pi, a
>> wind turbine, or a meteorological station, the deployment is designed to
>> be simple. A device may have hundreds of time
>> series (but fewer than a thousand time series) and the database needs to
>> handle them.
>>>>>> * * (2) When deploying IoTDB in a data center, the computational
>> resources (i.e., the hardware configuration of servers) are not a constraint
>> when compared to a Raspberry Pi. In this deployment, IoTDB can use more
>> computational resources and has the ability to handle more time series
>> (e.g., millions of time series).
>>>>>> 
>>>>>> Based on these requirements, we developed IoTDB, a new data store
>> system for managing time series data.
>>>>>> 
>>>>>> IoTDB started as a Tsinghua University research project. IoTDB's
>> developer community has since grown to include additional institutions, for
>> example, universities (e.g., Fudan University), research labs (e.g., NEL-BDS
>> lab), and corporations (e.g., K2Data, Tencent). Funding has been provided
>> by various institutions including the National Natural Science Foundation
>> of China, and industry sponsors, such as Lenovo and K2Data.
>>>>>> 
>>>>>> == Rationale ==
>>>>>> Because no existing open source time series database
>> covers all the above requirements, we developed IoTDB. As the system
>> matures, we are seeking a long-term home for the project. We believe the
>> Apache Software Foundation would be an ideal fit. Joining Apache will also
>> help coordinate and improve the development effort of the growing number of
>> organizations which contribute to IoTDB, improving the diversity of our
>> community.
>>>>>> 
>>>>>> IoTDB contains multiple modules, which are classified into categories:
>>>>>> 
>>>>>> * '''TsFile Format''': TsFile is a new columnar file format.
>>>>>> * '''Adaptor for Analytics and Visualization''': Integrating TsFile
>> with Apache Hadoop HDFS, Apache Hadoop MapReduce and Apache Spark. Examples
>> of integrating IoTDB with Apache Kafka, Apache Storm and Grafana are also
>> provided.
>>>>>> * '''IoTDB Engine''': An engine which consists of a SQL parser, query
>> plan generator, memtable, authentication and authorization, write-ahead log
>> (WAL), crash recovery, out-of-order data handler, and indexes for aggregation
>> and pattern matching. The engine stores system data in TsFile format.
>>>>>> * '''IoTDB JDBC''': An implementation of Java Database Connectivity
>> (JDBC) for clients to connect to IoTDB using Java.
>>>>>> 
>>>>>> === TsFile Format ===
>>>>>> 
>>>>>> The TsFile format is a columnar store, similar to Apache
>> Parquet and Apache CarbonData. It has the concepts of Chunk Group, Column
>> Chunk, Page and Footer. Compared with Apache Parquet and Apache
>> CarbonData, it is designed and optimized for time series:
>>>>>> 
>>>>>> ==== Time Series Friendly Encoding ====
>>>>>> IoTDB currently supports run length encoding (RLE), delta-of-delta
>> encoding, and Facebook's Gorilla encoding.
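
To illustrate why delta-of-delta encoding suits regularly sampled timestamps, here is a minimal Python sketch. It is not IoTDB's encoder: the function names are hypothetical, and the bit-packing step that a real encoder applies to the resulting small integers is omitted.

```python
def delta_of_delta_encode(timestamps):
    """Encode integer timestamps as (first, first_delta, delta-of-deltas).

    Hypothetical helper for illustration only; no bit-packing is done.
    """
    if not timestamps:
        return None, None, []
    if len(timestamps) == 1:
        return timestamps[0], None, []
    first = timestamps[0]
    first_delta = timestamps[1] - timestamps[0]
    dods = []
    prev_delta = first_delta
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        # For a steady sampling rate this difference is 0 or close to it.
        dods.append(delta - prev_delta)
        prev_delta = delta
    return first, first_delta, dods


def delta_of_delta_decode(first, first_delta, dods):
    """Inverse of delta_of_delta_encode."""
    if first is None:
        return []
    out = [first]
    if first_delta is None:
        return out
    delta = first_delta
    out.append(first + delta)
    for dod in dods:
        delta += dod
        out.append(out[-1] + delta)
    return out
```

With a sensor that reports once per second and occasionally jitters by a millisecond, almost every encoded value is 0, 1 or -1, which is what makes the subsequent compression effective.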
>>>>>> 
>>>>>> Lossy encoding methods (e.g., Piecewise Linear Approximation (PLA)
>> and time-frequency transformation) are works-in-progress.
>>>>>> 
>>>>>> 
>>>>>> ==== Chunk Group ====
>>>>>> The data part of a TsFile consists of many Chunk Groups. Each Chunk
>> Group stores the data of one device over a time interval. A Chunk Group is
>> similar to the row group in Apache Parquet, but with some
>> constraints on the time dimension: for each device, the time intervals of
>> different Chunk Groups do not overlap, and a later Chunk Group always
>> has larger timestamps.
>>>>>> 
>>>>>> Given a TsFile and a query with a time range filter, the query
>> process can terminate scanning data once it reads data points whose
>> timestamp reaches the time limit of the filter. We call the feature
>> ''fast-return'' and it makes the time range query in a TsFile very
>> efficient.
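
The ''fast-return'' idea can be sketched in a few lines of Python. This is an illustration under the assumption that the points of one series are already time-ordered, as the Chunk Group constraints guarantee. The function name and the `scanned` counter are hypothetical and only serve to show how much of the data is touched.

```python
def scan_time_range(points, t_start, t_end):
    """Return the points with t_start <= timestamp <= t_end.

    `points` must be sorted by timestamp. The scan stops at the first
    timestamp beyond t_end instead of reading the rest of the data.
    """
    result = []
    scanned = 0
    for ts, value in points:
        scanned += 1
        if ts > t_end:
            break  # fast-return: later points can only be larger
        if ts >= t_start:
            result.append((ts, value))
    return result, scanned
```

For ten points with timestamps 1 to 10 and the filter [3, 5], the scan reads six points and then stops, rather than reading all ten.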
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> ==== Different Column Chunk Format (No Repetition (R)
>> and Definition (D) Fields Needed) ====
>>>>>> 
>>>>>> While Apache Parquet and Apache CarbonData support complex data
>> types, e.g., nested data and sparse columns, TsFile is exclusively designed
>> for time series whose data model is <device_id, series_id, timestamp,
>> value>.
>>>>>> 
>>>>>> In a `Chunk Group`, each time series is a `Column Chunk`. Even though
>> these time series belong to the same device, the data points in different
>> time series are not aligned in the time dimension originally.
>>>>>> 
>>>>>> For example, if you have a device with 2 sensors with the same data
>> collection frequency, sensor 1 may collect data at time 1521622662000
>> while the other one collects data at time 1521622662001 (delta=1ms).
>> Therefore, each Column Chunk has its timestamps and values, which is quite
>> different from Apache Parquet and Apache CarbonData.  Because we store the
>> time column along with each value column instead of making different chunks
>> share the same time column for the sake of diverse data frequency for
>> different time series, we do not store any null value on disk to align
>> across time series. Besides, we do not need to attach  `repetition` (R) and
>> `definition` (D) fields on each value. Therefore, the disk space is saved
>> and the query latency is reduced (because we do not align data by
>> calculating R and D fields).
>>>>>> 
>>>>>> 
>>>>>> ==== Domain Specific Information in Each Page ====
>>>>>> Similar to Apache Parquet and Apache CarbonData, a `Column Chunk`
>> consists of several `Pages`, and each `Page` has a `Page header`. The `Page
>> header` is a summary of the data in the page.
>>>>>> 
>>>>>> Because TsFile is optimized for time series, the page header contains
>> more domain specific information, such as the minimal and maximal value,
>> the minimal and the maximal timestamp, the frequency and so on. TsFile can
>> even store the histogram of values in the page header.
>>>>>> 
>>>>>> This header information helps IoTDB in speeding up queries by
>> skipping unnecessary pages.
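
A rough Python sketch of page skipping based on header statistics follows. The `PageHeader` fields mirror the statistics named above (minimal/maximal timestamp and value), but the class, the field names, and the `value_gt` predicate are assumptions made for illustration, not the actual TsFile layout.

```python
from dataclasses import dataclass


@dataclass
class PageHeader:
    # Hypothetical summary of one page; real TsFile headers hold more.
    min_time: int
    max_time: int
    min_value: float
    max_value: float


def pages_to_read(headers, t_start, t_end, value_gt=None):
    """Return indexes of pages that may contain matching points.

    A page is skipped when its time range misses [t_start, t_end], or
    when its maximal value cannot satisfy the filter `value > value_gt`.
    """
    selected = []
    for i, h in enumerate(headers):
        if h.max_time < t_start or h.min_time > t_end:
            continue  # no timestamp in this page can match
        if value_gt is not None and h.max_value <= value_gt:
            continue  # no value in this page can match
        selected.append(i)
    return selected
```

Only the selected pages need to be decoded; the others are ruled out by reading their headers alone.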
>>>>>> 
>>>>>> 
>>>>>> === Adaptor for Analytics ===
>>>>>> The TsFile provides:
>>>>>> 
>>>>>> * InputFormat/OutputFormat interfaces for Reading/Writing data.
>>>>>> * Deep integration with Apache Spark/Hadoop MapReduce including
>> predicate push-down, column pruning, aggregation push down, etc. So users
>> can use Apache Spark SQL/HiveQL to connect and query TsFiles.
>>>>>> 
>>>>>> 
>>>>>> === IoTDB Engine ===
>>>>>> The IoTDB engine is a database engine, which uses TsFile as its
>> storage file format. The IoTDB Engine supports SQL-like query plus many
>> useful functions:
>>>>>> 
>>>>>> * Tree-based time series schema
>>>>>> * Log-Structured Merge (LSM)-based storage
>>>>>> * Overflow file for out-of-order data
>>>>>> * Scalable index framework
>>>>>> * Special queries for time series
>>>>>> 
>>>>>> ==== Tree-based Time Series Schema ====
>>>>>> IoTDB manages all the time series definitions using a tree structure.
>> A path from the root of the tree to a leaf node represents a time series.
>> Therefore, the unique id of a time series is a path, e.g.,
>> `root.China.beijing.windFarm1.windTurbine1.speed`.
>>>>>> 
>>>>>> This kind of schema can express `group by` naturally. For example,
>> `root.China.beijing.windFarm1.*.speed` represents the speed of all the wind
>> turbines in wind farm 1 in Beijing, China.
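
The path-based grouping can be illustrated with a small Python sketch. It assumes that `*` matches exactly one level of the tree, which is a simplification chosen here; the proposal does not spell out the wildcard semantics, and the function names are hypothetical.

```python
def match_path(pattern, path):
    """Check a dot-separated series path against a pattern.

    Assumption for this sketch: `*` matches exactly one tree level.
    """
    p_nodes = pattern.split(".")
    nodes = path.split(".")
    if len(p_nodes) != len(nodes):
        return False
    return all(p == "*" or p == n for p, n in zip(p_nodes, nodes))


def select_series(pattern, all_paths):
    """Return every registered series path that matches the pattern."""
    return [p for p in all_paths if match_path(pattern, p)]
```

For example, `select_series("root.China.beijing.windFarm1.*.speed", paths)` picks the `speed` series of every turbine under `windFarm1` and leaves out other sensors and other farms.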
>>>>>> 
>>>>>> ==== Log-Structured Merge (LSM)-based Storage ====
>>>>>> In a time series, the data points should be ordered by their
>> timestamps. In IoTDB, we use a Log-Structured Merge (LSM) based mechanism.
>> Therefore, a part of the data is first stored in memory, in what is called
>> the `memtable`. At this stage, if data points arrive out of order, we re-sort
>> them in memory. When this part of the data exceeds the configured memory limit,
>> we flush it to disk as a `Chunk Group` in an unclosed TsFile. As a result, a
>> TsFile may contain several Chunk Groups, which reduces the number of small
>> data files, the I/O load on the storage system,
>> and the execution time of a file merge in LSM. Notice that the data
>> is time-ordered within one Chunk Group on disk, and this layout helps
>> fast filtering within one Chunk Group for a query.
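
The memtable behaviour described above might look roughly like the following Python sketch. It is heavily simplified: it handles a single series, uses a point count (`max_points`) as a stand-in for the configured memory limit, and keeps the flushed Chunk Groups in a list instead of writing them to a TsFile. All names are hypothetical.

```python
class MemTable:
    """Buffers points in memory and flushes them as sorted chunk groups."""

    def __init__(self, max_points):
        self.max_points = max_points    # stand-in for the memory limit
        self.points = []                # unsorted in-memory buffer
        self.flushed_chunk_groups = []  # stand-in for the unclosed TsFile

    def insert(self, timestamp, value):
        self.points.append((timestamp, value))
        if len(self.points) >= self.max_points:
            self.flush()

    def flush(self):
        if not self.points:
            return
        # Points that arrived out of order are re-sorted before the flush,
        # so every chunk group is time-ordered on "disk".
        chunk_group = sorted(self.points, key=lambda p: p[0])
        self.flushed_chunk_groups.append(chunk_group)
        self.points = []
```

Inserting the timestamps 30, 10, 20 into `MemTable(max_points=3)` produces one chunk group ordered as 10, 20, 30.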
>>>>>> 
>>>>>> Rule 1: In a TsFile, the Chunk Groups of one device are ordered by
>> timestamp, which helps fast filtering among Chunk Groups
>> for a query.
>>>>>> 
>>>>>> Rule 2: When the size of the unclosed TsFile reaches the threshold
>> defined in the configuration file, we close the file and generate a new one
>> to store newly arriving data spanning the entire data set. Like many systems
>> which use LSM-based storage, we never modify a TsFile which has been closed
>> except in the file-merge process.
>>>>>> 
>>>>>> Rule 3: To reduce the number of TsFiles involved in a query process,
>> we guarantee that the data points in different TsFiles do not overlap
>> in the time dimension after file merging.
>>>>>> 
>>>>>> ==== Overflow File for Out-of-order Data ====
>>>>>> When a part of the data has been flushed to disk (forming a `Chunk Group`
>> in a TsFile), newly arriving data points whose timestamps are smaller
>> than the largest timestamp in the TsFile are `out-of-order`.
>>>>>> 
>>>>>> To store the out-of-order data, we organize all the troublesome
>> `out-of-order` data point insertions into a special TsFile, named
>> `UnSequenceTsFile`. In an UnSequenceTsFile, the Chunk Groups of one device
>> may overlap in the time dimension, which violates Rule 1 and
>> costs additional time for query filtering compared to a normal TsFile.
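
The routing decision described above can be sketched as follows in Python. The function name and the string labels are hypothetical. The comparison follows the wording of the text (strictly smaller than the largest flushed timestamp), and how an equal timestamp is treated is an assumption of this sketch.

```python
def route_point(timestamp, largest_flushed_timestamp):
    """Decide which kind of file a newly arriving point belongs to.

    `largest_flushed_timestamp` is the largest timestamp already flushed
    to the TsFile for this device, or None if nothing was flushed yet.
    """
    if (largest_flushed_timestamp is not None
            and timestamp < largest_flushed_timestamp):
        return "UnSequenceTsFile"  # out-of-order: older than flushed data
    return "TsFile"                # in order: goes through the memtable


def route_all(timestamps, largest_flushed_timestamp):
    """Route a batch of timestamps and return the label for each one."""
    return [route_point(t, largest_flushed_timestamp) for t in timestamps]
```

A point that is late but still newer than everything on disk is not out-of-order in this sense, because the memtable can re-sort it before the next flush.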
>>>>>> 
>>>>>> There is another special operation: updating all the data points in a
>> time range, e.g., `update all the speed values of device1 as 0 where the
>> data time is in [1521622000000, 1521622662000]`. The operation is used
>> when (1) a sensor malfunctions and the database receives wrong data for a
>> period, or (2) we want to reset all the records. Many NoSQL time series
>> databases do not support such an operation. To support the operation in
>> IoTDB, we use a tree-based structure, Treap, to hold these
>> operations and store them as `Overflow` files.
>>>>>> 
>>>>>> Therefore, there are 3 kinds of data files: TsFiles,
>> UnSequenceTsFiles and Overflow files. TsFiles should store most of the
>> data. The volume of UnSequenceTsFiles depends on the workload: if there are
>> many out-of-order data points and they span a long time range, the volume
>> will be large. Overflow files handle the fewest data operations, and their
>> volume depends on how often the special operations are used.
>>>>>> 
>>>>>> ==== LSM-tree ====
>>>>>> Normally, LSM-based storage engines merge data files level by level
>> so that it looks like a tree structure. In this way, data is well
>> organized. The disadvantage is that data will be read and written several
>> times. If the tree has 4 levels, each data point will be rewritten at least
>> 4 times.
>>>>>> 
>>>>>> Currently, we do not merge all the TsFiles into one because (1) the
>> number of TsFiles is kept lower than many LSM storage engines because a
>> memtable is mapped to several Chunk Groups rather than a file; (2)
>> different TsFiles are not overlapping with each other in the time dimension
>> (because of Rule 3).
>>>>>> 
>>>>>> As mentioned before, TsFile supports ''fast-return'' to accelerate
>> queries. However, UnSequenceTsFiles and Overflow files do not support this
>> feature. The time spans of UnSequenceTsFiles, Overflow files and TsFiles may
>> overlap, which leads to more files being involved in the query process. To
>> accelerate these queries, there is a merging process to reorganize files in
>> the background. All three kinds of files (TsFiles, UnSequenceTsFiles
>> and Overflow files) are involved in the merging process. The merging
>> process is multi-threaded, and each thread is
>> responsible for a series family.
>>>>>> After merging, only TsFiles are left. These files have
>> non-overlapping time spans and support the ''fast-return'' feature.
>>>>>> ==== Scalable Index Framework ====
>>>>>> We allow users to implement indexes for faster queries. We currently
>> support an index for pattern matching queries (KV-Match index, ICDE 2019).
>> Another index for fast aggregation (PISA index, CIKM 2016) is a
>> work-in-progress.
>>>>>> 
>>>>>> ==== Special Queries ====
>>>>>> We currently support `group by time interval` aggregation queries and
>> `Fill by` operations, which are similar to those of InfluxDB. Time series
>> segmentation operations and frequency queries are works-in-progress.
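
As an illustration of what a `group by time interval` aggregation with a fill step computes, here is a minimal Python sketch. It averages values per time bucket and can fill empty buckets with the previous bucket's result. The function name, the choice of the average as the aggregate, and the previous-value fill strategy are assumptions of this sketch, not IoTDB's query semantics.

```python
def group_by_time(points, start, end, interval, fill_previous=False):
    """Average `points` in buckets of width `interval` over [start, end).

    Returns a list of (bucket_start, average). Empty buckets yield None,
    or the previous bucket's average when fill_previous is True.
    """
    n_buckets = (end - start + interval - 1) // interval
    sums = [0.0] * n_buckets
    counts = [0] * n_buckets
    for ts, value in points:
        if start <= ts < end:
            i = (ts - start) // interval
            sums[i] += value
            counts[i] += 1
    result = []
    last = None
    for i in range(n_buckets):
        if counts[i]:
            avg = sums[i] / counts[i]
            last = avg
        else:
            avg = last if fill_previous else None
        result.append((start + i * interval, avg))
    return result
```

With points at timestamps 0, 5 and 25 and an interval of 10, the middle bucket has no data and is either reported as empty or filled from the first bucket.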
>>>>>> 
>>>>>> == Initial Goals ==
>>>>>> The initial goals are to be open sourced and to integrate with the
>> Apache development process. Furthermore, we plan for incremental
>> development and releases in line with the Apache guidelines.
>>>>>> 
>>>>>> == Current Status ==
>>>>>> We have developed the system for more than 2 years. There are
>> currently 13k lines of code, some of which are generated by Antlr3 and
>> Thrift.  There are 230 issues which have been solved and more than 1500
>> commits.
>>>>>> 
>>>>>> The system has been deployed in the staging environment of the State
>> Grid Corporation of China to handle ~3 million time series (i.e., ~30,000
>> power generation assemblies * ~100 sensors) and at an equipment service company
>> in China managing ~2 million time series (i.e., ~20k devices * 100 sensors).
>> The insertion speed reaches ~2 million points/second/node, which is faster
>> than InfluxDB, OpenTSDB and Apache Cassandra in our environment.
>>>>>> 
>>>>>> There are many new features in the works including those mentioned
>> herein. We will add more analytics functions, improve the data file merge
>> process, and finish the first released version of IoTDB.
>>>>>> 
>>>>>> == Meritocracy ==
>>>>>> The IoTDB project operates on meritocratic principles. Developers who
>> submit more code with higher quality earn more merit. We have used the `Issues`
>> and `Pull Requests` modules on Github for collecting users' suggestions and
>> patches. Users who submit issues, pull requests, and documents and who help with
>> community management are welcomed and encouraged to become committers.
>>>>>> 
>>>>>> == Community ==
>>>>>> 
>>>>>> The IoTDB project users communicate on Github (
>>>>>> https://github.com/thulab/tsfile). Developers
>> communicate on a website which is similar to JIRA (currently, only
>> registered users can apply for access to the project, url:
>> https://tower.im/projects/36de8571a0ff4833ae9d7f1c5c400c22/
>>>>>> ). We have also introduced IoTDB at many technical conferences. Next,
>> we will set up the mailing lists for more convenient, broader communication
>> and archived discussions.
>>>>>> 
>>>>>> If IoTDB is accepted for incubation at the Apache Software
>> Foundation, the primary goal is to build a larger community. We believe
>> that IoTDB will become a key project for time series data management, and
>> so, we will rely on a large community of users and developers.
>>>>>> 
>>>>>> TODO: IoTDB is currently on a private Github repository (
>>>>>> https://github.com/thulab/iotdb), while its subproject TsFile (a
>> file format for storing time series data) is open sourced on Github (
>> https://github.com/thulab/tsfile
>>>>>> ).
>>>>>> 
>>>>>> == Core Developers ==
>>>>>> IoTDB was initially developed by 2 dozen students and teachers at
>> Tsinghua University. Since then, more developers have joined from
>> other universities: Fudan University, Northwestern Polytechnical University
>> and Harbin Institute of Technology in China. Other developers come from
>> companies such as Lenovo and Microsoft. We will be working to
>> bring more developers into the project to contribute to
>> IoTDB.
>>>>>> 
>>>>>> == Relationships with Other Apache Products ==
>>>>>> IoTDB requires some Apache products (Apache Thrift, commons,
>> collections, httpclient).
>>>>>> 
>>>>>> IoTDB-Spark-connector and IoTDB-Hadoop-connector have been developed
>> for supporting analysing time series data by using Apache Spark and
>> MapReduce.
>>>>>> 
>>>>>> Overall, IoTDB is designed as an open architecture, and it can be
>> integrated with many other systems in the future.
>>>>>> 
>>>>>> As mentioned before, in the IoTDB project, we designed a new columnar
>> file format, called TsFile, which is similar to Apache Parquet. However,
>> the new file format is optimized for time series data.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> == Known Risks ==
>>>>>> 
>>>>>> === Orphaned Products ===
>>>>>> Given the current level of investment in IoTDB, the risk of the
>> project being abandoned is minimal. Time series data is more and more
>> important and there are several constituents who are highly motivated to
>> continue development. Tsinghua and NEL-BDS Lab rely on IoTDB as a
>> platform for a large number of long-term research projects. We have
>> deployed IoTDB in some companies' staging environments for future
>> applications.
>>>>>> 
>>>>>> === Inexperience with Open Source ===
>>>>>> Students and researchers at Tsinghua University have been developing
>> and using open source software for a long time. It is wonderful for students
>> to be guided through a formal open-source process. Some of our
>> committers have experience contributing to open source, for example:
>>>>>> 
>>>>>> * druid:
>>>>>> 
>> https://github.com/druid-io/druid/commit/f18cc5df97e5826c2dd8ffafba9fcb69d10a4d44
>>>>>> 
>>>>>> * druid:
>>>>>> 
>> https://github.com/druid-io/druid/commit/aa7aee53ce524b7887b218333166941654788794
>>>>>> 
>>>>>> * YCSB:
>>>>>> https://github.com/brianfrankcooper/YCSB/pull/776
>>>>>> 
>>>>>> 
>>>>>> Additionally, several ASF veterans and industry veterans have agreed
>> to mentor the project and are listed in this proposal. The project will
>> rely on their guidance and collective wisdom to quickly transition the
>> entire team of initial committers towards practicing the Apache Way.
>>>>>> 
>>>>>> 
>>>>>> === Reliance on Salaried Developers ===
>>>>>> Most of the current developers are students and researchers/professors in
>> universities, and their research focuses on big data management and
>> analytics. It is unlikely that they will change their research focus away
>> from big data management. We will work to ensure that the
>> project can continue to be stewarded and to move forward independently of
>> salaried developers.
>>>>>> 
>>>>>> === An Excessive Fascination with the Apache Brand ===
>>>>>> Most of the initial developers come from Tsinghua University with no
>> intent to use the Apache brand for profit. We have no plans to make use
>> of the Apache brand in press releases or to post billboards advertising
>> the acceptance of IoTDB into the Apache Incubator.
>>>>>> 
>>>>>> 
>>>>>> == Initial Source ==
>>>>>> IoTDB's github address and some required dependencies:
>>>>>> 
>>>>>> * The storage file format:
>>>>>> https://github.com/thulab/tsfile
>>>>>> 
>>>>>> * Adaptor for Apache Hadoop MapReduce:
>>>>>> https://github.com/thulab/tsfile-hadoop-connector
>>>>>> 
>>>>>> * Adaptor for Apache Spark:
>>>>>> https://github.com/thulab/tsfile-spark-connector
>>>>>> 
>>>>>> * Adaptor for Grafana:
>>>>>> https://github.com/thulab/iotdb-grafana
>>>>>> 
>>>>>> * The database engine:
>>>>>> https://github.com/thulab/iotdb
>>>>>> (private project up to now)
>>>>>> * The client driver:
>>>>>> https://github.com/thulab/iotdb-jdbc
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> === External Dependencies ===
>>>>>> To the best of our knowledge, all dependencies of IoTDB are
>> distributed under Apache compatible licenses. Upon acceptance to the
>> incubator, we would begin a thorough analysis of all transitive
>> dependencies to verify this fact and introduce license checking into the
>> build and release process.
>>>>>> 
>>>>>> == Documentation ==
>>>>>> * Documentation for TsFile:
>>>>>> https://github.com/thulab/tsfile/wiki
>>>>>> 
>>>>>> * Documentation for IoTDB and its JDBC:
>>>>>> http://tsfile.org/document
>>>>>> (Chinese only. An English version is in progress.)
>>>>>> 
>>>>>> == Required Resources ==
>>>>>> === Mailing Lists ===
>>>>>> *
>>>>>> private@iotdb.incubator.apache.org
>>>>>> 
>>>>>> *
>>>>>> dev@iotdb.incubator.apache.org
>>>>>> 
>>>>>> *
>>>>>> commits@iotdb.incubator.apache.org
>>>>>> 
>>>>>> 
>>>>>> === Git Repositories ===
>>>>>> *
>>>>>> https://git-wip-us.apache.org/repos/asf/incubator-iotdb.git
>>>>>> 
>>>>>> 
>>>>>> === Issue Tracking ===
>>>>>> *  JIRA IoTDB (We currently use the issue management provided by
>> Github to track issues.)
>>>>>> 
>>>>>> 
>>>>>> == Initial Committers ==
>>>>>> Tsinghua University, K2Data Company, Lenovo, Microsoft
>>>>>> 
>>>>>> Jianmin Wang (jimwang at tsinghua dot edu dot cn )
>>>>>> 
>>>>>> Xiangdong Huang (sainthxd at gmail dot com)
>>>>>> 
>>>>>> Jun Yuan (richard_yuan16 at 163 dot com)
>>>>>> 
>>>>>> Chen Wang ( wang_chen at tsinghua dot edu dot cn)
>>>>>> 
>>>>>> Jialin Qiao (qjl16 at mails dot tsinghua dot edu dot cn)
>>>>>> 
>>>>>> Jinrui Zhang (jinrzhan at microsoft dot com)
>>>>>> 
>>>>>> Rong Kang (kr11 at mails dot tsinghua dot edu dot cn)
>>>>>> 
>>>>>> Tian Jiang(jiangtia18 at mails dot tsinghua dot edu dot cn)
>>>>>> 
>>>>>> Shuo Zhang (zhangshuo at k2data dot com dot cn)
>>>>>> 
>>>>>> Lei Rui (rl18 at mails dot tsinghua dot edu dot cn)
>>>>>> 
>>>>>> Rui Liu (liur17 at mails dot tsinghua dot edu dot cn)
>>>>>> 
>>>>>> Kun Liu (liukun16 at mails dot tsinghua dot edu dot cn)
>>>>>> 
>>>>>> Gaofei Cao (cgf16 at mails dot tsinghua dot edu dot cn)
>>>>>> 
>>>>>> Xinyi Zhao (xyzhao16 at mails dot tsinghua dot edu dot cn)
>>>>>> 
>>>>>> Dongfang Mao (maodf17 at mails dot tsinghua dot edu dot cn)
>>>>>> 
>>>>>> Tianan Li(lta18 at mails dot tsinghua dot edu dot cn)
>>>>>> 
>>>>>> Yue Su (suy18 at mails dot tsinghua dot edu dot cn)
>>>>>> 
>>>>>> Hui Dai (daihui_iot at lenovo dot com, yuct_iot at lenovo dot com)
>>>>>> 
>>>>>> == Sponsors ==
>>>>>> === Champion ===
>>>>>> Kevin A. McGrail (
>>>>>> kmcgrail@apache.org
>>>>>> )
>>>>>> 
>>>>>> === Nominated Mentors ===
>>>>>> Justin Mclean (justin at classsoftware dot com)
>>>>>> 
>>>>>> Christofer Dutz (christofer.dutz at c-ware dot de)
>>>>>> 
>>>>>> Willem Jiang (willem.jiang at gmail dot com)
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>> For additional commands, e-mail: general-help@incubator.apache.org
>>> 
>> 
>> 
>> 
>> 

