incubator-general mailing list archives

From "Kevin A. McGrail" <kmcgr...@apache.org>
Subject Re: [Proposal]New storage project: HBlock
Date Wed, 25 Mar 2020 20:56:23 GMT
I have committed to champion and I think the points you make are good,
Ted.   Do you have the bandwidth to be a mentor?

I will work with them to set expectations about the process.  I have also
asked them to start some community building now.
--
Kevin A. McGrail
Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171


On Wed, Mar 25, 2020 at 12:00 PM Ted Dunning <ted.dunning@gmail.com> wrote:

> Three things are very clear to me:
>
> 1) having an open source iSCSI implementation from a mature and experienced
> storage stream is a very cool thing, especially if it can be targeted to
> non HDFS storage relatively easily. Building such a thing requires very
> high levels of experience and expertise that have generally been lacking in
> the open source world.
>
> 2) this team is very naive about the negative impacts that Apache processes
> will have on their development speed and will need lots of mentoring. Given
> their release schedule, I think that there are symmetrical risks, first
> that the team will be tempted to JFDI when getting features out the door
> rather than communicate and share designs and second that if they build a
> proper community overcoming language, timezone and large internal team
> dynamics that the internal political costs will be severe due to slower
> development.
>
> 3) this team is very enthusiastic about making open source work and that
> might be enough to allow them to succeed in spite of the difficulties.
>
> The path to success here is, in my opinion, to require strong and engaged
> mentorship and make it very clear before they come in that Apache may not
> be a good fit due to the pressures they face to deliver on a schedule. If
> incubation with a high risk of exit back to a non-Apache form is acceptable
> to the project team, then it should be fine for Apache.
>
>
>
> On Mon, Mar 9, 2020 at 7:45 PM Sheng Wu <wu.sheng.841108@gmail.com> wrote:
>
> > Hi
> >
> > Personally, I feel the team has misunderstood the meaning of the
> > incubator and the requirements of building a community.
> > As in the last discussion, I still think they will be under a lot of
> > pressure, as they would have to deal with basic feature development,
> > community building and following ASF incubator requirements at the same
> > time if they are accepted into the incubator. At the same time, the team
> > lacks experience with open source communities, in or out of the ASF.
> > I am not sure whether this is good for the project. It seems a little
> > hurried to join the incubator.
> > More comments inline.
> >
> > I am willing to listen to what other IPMC members think.
> >
> > On Tue, Mar 10, 2020 at 10:21 AM <zhangguochen@chinatelecom.cn> wrote:
> >
> > > Hi, All,
> > >
> > > We are China Telecom Corporation Limited Cloud Computing Branch
> > > Corporation.
> > > We hope to contribute one of our projects named 'HBlock' to Apache.
> > > Here is the proposal for the HBlock project. Please feel free to let me
> > > know your concerns and suggestions. Thank you so much.
> > >
> > > HBlock Proposal
> > >
> > > 1.Abstract
> > > The HBlock project will be an enterprise distributed block storage.
> > >
> > > 2.Proposal
> > > HBlock provides a distributed block storage with the following
> features:
> > > 2.1.User-space iSCSI target: HBlock will implement an iSCSI target that
> > is
> > > RFC-7143 (https://tools.ietf.org/html/rfc7143) compliant written in
> pure
> > > Java designed to run on top of any mainstream Operating System,
> including
> > > Windows and Linux, as a user-space process.
> > > 2.2.Enterprise level features: HBlock will implement comprehensive
> > > enterprise level features, such as
> > > Asymmetric Logical Unit Access (ALUA, Information technology - SCSI
> > > Primary Commands - 4 (SPC-4),
> > > https://www.t10.org/cgi-bin/ac.pl?t=f&f=spc4r37.pdf),
> > > Persistent Reservations (PR, Information technology - SCSI Primary
> > > Commands - 4 (SPC-4),
> > > https://www.t10.org/cgi-bin/ac.pl?t=f&f=spc4r37.pdf),
> > > VMware vSphere Storage APIs - Array Integration (VAAI,
> > > https://www.vmware.com/techpapers/2012/vmware-vsphere-storage-apis-array-integration-10337.html),
> > > Offloaded Data Transfer (ODX,
> > > https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-R2-and-2012/hh831628(v=ws.11)),
> > > so that it will support
> > > session-level fail-over,
> > > Oracle Real Application Cluster(Oracle RAC,
> > > https://www.oracle.com/database/technologies/rac.html) ,
> > > Cluster File System (CFS), VMware cluster and Windows cluster.
> > > 2.3.Low latency: HBlock will implement in-memory distributed cache to
> > > reduce
> > > write latency and improve Input / Output Operations Per Second (IOPS),
> > and
> > > it will leverage storage-class memory to achieve even higher durability
> > > without IOPS loss.
> > > 2.4.Smart Compaction and Garbage Collection (GC): HBlock will convert all
> > > write operations into sequential append operations to improve random
> > > write performance, and it will choose the best time to compact and
> > > collect garbage per Logical Unit (LU). Compared to the internal garbage
> > > collection of Solid State Drives (SSDs), such a global GC will reduce the
> > > need for the SSD's internal GC, which indirectly gives the SSD more
> > > usable space, and it allows a better GC strategy because it is closer to
> > > the application. In essence, flash writes data in block (32MB) order. To
> > > support random writes, an SSD reserves part of its space for GC.
> > > Therefore, the more random writes and deletes there are, the more space
> > > needs to be reserved. HDFS-based writes are sequential for the SSD, so
> > > the space reserved in the SSD is small. In short, as long as there is GC,
> > > there must be reserved space, either in the HBlock layer or in the
> > > controller layer inside the SSD. Because HBlock is closer to the LU, its
> > > GC can be more efficient. For example, an LU dedicated to video
> > > monitoring data basically writes video data in sequence, and starts
> > > writing again when the disk is full. This LU does not need any GC at all.
> > > If GC is done in the SSD layer, the SSD sees the data of various LUs, and
> > > unnecessary data movement is performed for the LU dedicated to video
> > > monitoring.
> > > 2.5.Hadoop Distributed File System (HDFS)-based: HBlock leverages HDFS as a
> > > persistent layer to avoid reinventing wheels. The iSCSI target will run
> > on
> > > the client side of HDFS and directly read or write data from or to Data
> > > Nodes.
> > > 2.6.Easy to deploy: HBlock will provide easy-to-use utilities to make
> the
> > > installation process extremely easy. Since HBlock does not rely on any
> > > Operating System, deployment is easy unlike other storage systems that
> > rely
> > > on in-kernel iSCSI module, such as Linux-IO (LIO), or SCST.
> > >
> >
> > I noticed there are a lot of `will`s in the Proposal section describing
> > the project's core features.
> > Is this a language issue, or are these features not available today?
> > Which parts have been implemented?
> >
> >
> > >
> > > 3.Background
> > > We think block storage is a very general technology.
> > > Block storage is the foundation of enterprise IT infrastructure. But
> > > unfortunately, there is not any open source and mature distributed
> block
> > > storage at this moment.
> > > Ceph is well known and widely adopted, but it is just a storage engine
> in
> > > the same level as HDFS. Ceph does not cover the need for iSCSI. If you
> > want
> > > to use Ceph as block storage, you must use solutions like LIO to handle
> > > iSCSI. Unfortunately, LIO lacks many features and thus cannot be
> directly
> > > used in an enterprise production environment. Additionally, LIO is a
> > > Linux kernel module and Ceph is a user-space process, which makes it
> > > difficult for LIO to talk to Ceph processes. Although TCM in User Space
> > > (TCMU) is being worked on
> > > (https://www.kernel.org/doc/Documentation/target/tcmu-design.txt),
> > > it looks ugly to make an in-kernel module call a user-space process.
> > > That is why we want to create HBlock, which will implement comprehensive
> > > enterprise-level features completely in user space, including High
> > > Availability (HA), distributed cache, VAAI, PR, ODX and so on.
> > > HBlock project is based on HDFS and will be an excellent addition to
> the
> > > Apache family of projects.
> > >
> > > 4.Rationale
> > > Block storage is the foundation of enterprise IT infrastructure. But
> > > unfortunately, there is not any open source and mature distributed
> block
> > > storage at this moment.
> > > Ceph is well known and widely adopted, but it is just a storage engine
> in
> > > the same level as HDFS. Ceph does not cover the need for iSCSI. If you
> > want
> > > to use Ceph as block storage, you must use solutions like LIO to handle
> > > iSCSI. Unfortunately, LIO lacks many features and thus cannot be
> directly
> > > used in an enterprise production environment. Additionally, LIO is a
> > > Linux kernel module and Ceph is a user-space process, which makes it
> > > difficult for LIO to talk to Ceph processes. Although TCM in User Space
> > > (TCMU) is being worked on
> > > (https://www.kernel.org/doc/Documentation/target/tcmu-design.txt),
> > > it looks ugly to make an in-kernel module call a user-space process.
> > > That is why we want to create HBlock, which will implement comprehensive
> > > enterprise-level features completely in user space, including High
> > > Availability (HA), distributed cache, VAAI, PR, ODX and so on.
> > > HBlock project is based on HDFS and will be an excellent addition to
> the
> > > Apache family of projects.
> > >
> > > 5.Initial Goals
> > > N/A.
> > >
> >
> > Why is this N/A?
> >
> >
> > >
> > > 6.Current Status
> > > At present, we have completed the development of HBlock in a stand-alone
> > > version. HBlock has been used in the online environments of many
> > > customers. This standalone version has implemented advanced SCSI
> > > functions including PR, VAAI, ODX, etc. Cross Network Address
> > > Translation (NAT) support is a key feature of HBlock, which allows
> > > clients in the LAN to access iSCSI targets located on the Internet.
> > > HBlock makes it possible to provide iSCSI as a Service. A version with
> > > high availability features is also under testing.
> > > 6.1 Meritocracy
> > > At present, this project is still an internal private project which is
> > > operated according to the internal project development technology of
> the
> > > enterprise, so it does not involve this issue. But we are willing to
> > follow
> > > the rules of the open source community. We will be tracking submissions
> > > from
> > > patches, accepting the intentional patches of HBlock and increasing the
> > > publicity of HBlock. We look to invite more people who show merit to
> join
> > > the project.
> > > 6.2 Community
> > > At present, the HBlock project is still an internal private project,
> > which
> > > is operated according to the internal project development technology of
> > the
> > > enterprise, so it does not involve this issue. But we are willing to
> > follow
> > > the rules of the open source community.
> > > There are several business customers using our HBlock, and we will
> invite
> > > them and their industry partners to join the community. We will
> > communicate
> > > with China Telecom Cloud Service customers through forums, e-mail,
> > instant
> > > messages and other ways, and update the product information in time, so
> > as
> > > to attract more developers to join the project.
> > > 6.3 Core Developers
> > > At present, the HBlock project has about 30 people: approximately 20
> > > internal developers and 10 test engineers, all very experienced.
> > >
> >
> > Are the test engineers internal too? I suppose.
> >
> >
> > > Here is a brief introduction to the key contributors.
> > > Dong Changkun is the development team leader, with rich Java development
> > > experience. As the architect of HBlock, he controls the overall design.
> > > Wu Zhimin is the R&D expert of the cloud storage product line in our
> > > company, with more than 12 years of storage development experience. In
> > > HBlock, he is mainly responsible for the architecture design of the
> > > protocol module, the implementation of the SCSI module, and the research
> > > of difficult points.
> > > Yu Erdong has rich Java development experience and distributed storage
> > > system development experience. He is mainly responsible for the design of
> > > HBlock back-end modules and management tool modules, as well as the
> > > development of the back-end cache and master-slave switching.
> > > 6.4 Alignment
> > > HBlock is the only product in the industry that develops block storage
> > > based
> > > on HDFS.
> > > With the increase in sizes of disk capacity, such as the emergence of
> > > Shingled Magnetic Recording (SMR) disk, more and more disks show the
> > > negative characteristics of sequential write. Flash memory also has the
> > > same
> > > characteristics. The underlying particles of flash memory are written
> > > sequentially in blocks (32MB), but the SSD disk will reserve 20% space
> > for
> > > merging so that the file system seems to support random writing.
> Because
> > > HBlock is based on HDFS, HBlock inherently supports sequential write.
> > > Combined with thread IO of random write to SSDs being very small,
> HBlock
> > > allows you to reduce 20% of the reserved space to only 5%.
> > > In addition, with the large adoption of HDFS, HBlock allows HDFS
> > facilities
> > > to become highly available, cloud-ready, block storage which is super
> > cool!
> > >
> > > 7.Known Risks
> > > The software is not stable and has bugs, which needs continuous
> > > improvement.
> > > More sophisticated strategies are needed to schedule and optimize the
> > time
> > > of data merging to avoid merging data during the business peak hours.
> > >
> > > 8.Project Name
> > > HBlock is named because Hadoop is a distributed project in the Apache
> > > community, and the database project based on this project is called
> > HBase.
> > > In order to follow this style as a distributed block storage project,
> we
> > > named it HBlock.
> > >
> > > 9.Orphaned products
> > > Storage is our core business and HBlock is our technical direction. We
> > > will continue to invest in it and see value in building a vibrant open source
> > > community to improve it. We believe that HBlock, a product based on
> HDFS,
> > > will have more vitality as an open source software project under the
> > Apache
> > > Software Foundation.
> > > 9.1 Inexperience with Open Source
> > > We don't have much experience in open source, but we hope to open
> source
> > > HBlock so that more people can use and develop this project. We are
> > willing
> > > to learn from Apache's experience in open source and apply it to the
> > HBlock
> > > project.
> > > Jiang Feng, who is the founder and team leader of HBlock project,
> > submitted
> > > code to Hadoop more than 10 years ago.
> > >
> >
> > Is he already a Hadoop committer or PMC member? Does he have experience
> > with the ASF process?
> >
> >
> > > 9.2 Length of Incubation
> > > It is expected that the HBlock project will take one year to complete
> the
> > > incubation process.
> > >
> >
> > One year is a short term for most incubator projects. IPMC, please correct
> > me if I am wrong.
> > How did you arrive at this expectation?
> >
> >
> > > While learning the Apache Way, we have an aggressive release calendar:
> > >
> >
> > Why do the following features have anything to do with the Apache Way?
> > These look like a feature roadmap only to me. These are development plans,
> > not community building.
> > This is confusing to me; could you explain?
> >
> >
> > > In April 2020, we will complete the version of HBlock with high
> > > availability.
> > > In June 2020, we will complete the development of the web portal and
> > > "green"
> > > installation that can be installed with existing applications and
> support
> > > x86 and ARM servers.
> > > In September 2020, we will complete advanced SCSI functions, including
> > PR,
> > > VAAI, ODX, etc.
> > > 9.3 Homogenous Developers
> > > At present, HBlock has approximately 20 developers, all of whom are
> very
> > > experienced engineers. They work in Beijing, Shanghai, Inner Mongolia
> and
> > > other regions, and they are experienced with working in a distributed
> > > environment for the same company.
> > > We will expand our existing team through campus recruitment and social
> > > recruitment, and attract more developers from the community to join the
> > > HBlock project. HDFS is a widely used project. We are confident that the
> > > block storage project based on HDFS will attract more volunteers.
> > > 9.4 Reliance on Salaried Developers
> > > HBlock is reliant on China Telecom's salaried developers. China Telecom
> > > will
> > > not easily change its market strategy. This is the first time for China
> > > Telecom to share the project with the open source community, so it will
> > pay
> > > attention to the investment in this project. At the same time, the
> > project
> > > will be widely used in China Telecom. With the support of resources of
> > > China
> > > Telecom and the verification of the actual project, the continuity and
> > > quality of the project will be guaranteed. We also have been developing
> > in
> > > the storage field for seven and a half years and will continue to work
> in
> > > this field. At the same time, block storage based on HDFS will
> definitely
> > > attract more volunteers to join. We will support volunteers being
> > involved
> > > and our developers are committed to doing so.
> > > 9.5 Relationships with Other Apache Products
> > > HBlock uses Apache HDFS, Apache commons-IO, commons-collections,
> > > commons-configuration, commons-email, commons-logging, Apache log4j,
> and
> > > Apache Hadoop-common.
> > > 9.6 An Excessive Fascination with the Apache Brand
> > > We have chosen the Apache Software Foundation as the home to open
> source
> > > HBlock because HBlock is based on HDFS.  We believe there is a very
> > natural
> > > synergy with Apache.
> > >
> > > 10.Documentation
> > > About the user guide, please refer to "China Telecom HBlock User
> > > Guide_20200121.docx". (There is only a doc version right now)
> > >
> > > 11.Initial Source
> > > HBlock has been developed since the second half of 2018. HBlock is
> based
> > on
> > > HDFS and the internal source code will be donated to the Foundation.
> > China
> > > Telecom is prepared to execute the paperwork required for the donation.
> > >
> > > 12.Source and Intellectual Property Submission Plan
> > > The HBlock specification and content on www.ctyun.cn are from China
> > > Telecom
> > > Co., Ltd. The HBlock library uses the Java language. There is no
> > complexity
> > > in the code base donation process and we are ready to move the
> > repositories
> > > over.
> > > 12.1 External Dependencies
> > > HBlock uses Apache commons-IO, commons-collections, commons-configuration,
> > > Apache log4j, commons-email, commons-logging, org.json, jline, pty4j,
> > > Apache hadoop-hdfs, hadoop-common, netty-all, and Apache zookeeper. These
> > > are all under Apache or BSD licenses.
> > >
> > > 12.2 Cryptography
> > > The HBlock project does not involve encryption code.
> > >
> > > 13.Required Resources
> > > 13.1 Mailing lists:
> > > private@hblock.incubator.apache.org
> > > dev@hblock.incubator.apache.org
> > > users@hblock.incubator.apache.org
> >
> >
> > A users mailing list is not recommended, as you don't have users today.
> > I recommend sharing the dev list instead.
> >
> > Sheng Wu 吴晟
> > Twitter, wusheng1108
> >
> >
> > >
> > > commits@hblock.incubator.apache.org
> > > 13.2 Subversion Directory
> > > https://svn.apache.org/repos/asf/incubator/hblock
> > > (According to Apache rules)
> > > 13.3 Git Repositories
> > > https://gitbox.apache.org/repos/asf/incubator-hblock.git
> > > (According to Apache rules)
> > > 13.4 Issue Tracking
> > > JIRA HBlock(HBLOCK)
> > > (According to Apache rules)
> > > 13.5 Other Resources
> > > N/A.
> > >
> > > 14.Initial Committers
> > > Yu Erdong (yued at chinatelecom dot cn)
> > > Wu Zhimin (wuzhimin at chinatelecom dot cn)
> > > Yang Chao (yangchao1 at chinatelecom dot cn)
> > > Dong Changkun (dongck at chinatelecom dot cn)
> > > Guo Yong (guoyong1 at chinatelecom dot cn)
> > > Zhao Wentao(zhaowt at chinatelecom dot cn)
> > > Cui Meng (cuimeng at chinatelecom dot cn)
> > > Wei Wei (weiwei2 at chinatelecom dot cn)
> > >
> > > 15.Sponsors
> > > 15.1 Champion
> > > Kevin A. McGrail
> > > 15.2 Nominated Mentors
> > > Kevin A. McGrail
> > > 15.3 Sponsoring Entity
> > > The Incubator
> > > (END)
> > >
> > > Best Wishes.
> > >
> > >
> >
> ----------------------------------------------------------------------------
> > > ------------------
> > > Zhang Guochen  Project Manager
> > > China Telecom Corporation Limited Cloud Computing Branch Corporation
> > > Mail: zhangguochen@chinatelecom.cn
> > > Phone: 86-17301021225
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > > For additional commands, e-mail: general-help@incubator.apache.org
> > >
> > >
> >
>
