phoenix-user mailing list archives

From Nick Dimiduk <ndimi...@gmail.com>
Subject Re: How to Manage Data Architecture & Modeling for HBase
Date Tue, 07 Apr 2015 18:55:26 GMT
On Mon, Apr 6, 2015 at 5:23 PM, Pamecha, Abhishek <apamecha@paypal.com.invalid> wrote:

> Re: hash tables:
> I have always viewed HBase as a large distributed hash table with
> key ranges mapped to data nodes. Partly this has come from my
> interpretation of HBase's own documentation and partly from other NoSQL
> datastores that revolve around the same concept. Is it not right that a key
> is "hashed" to determine which node serves it? And that all the data for
> that key is within that node in a blob? My intention was to state that such
> a hash table design is "very good" for key-based data lookups but NOT
> ideal for use cases where a certain subset of keys needs to be scanned and
> its values aggregated. That essentially means a FULL SCAN of the datastore,
> with a reliable and consistent but (for our use case) large response time.
> Again, other use cases, such as many reporting applications, can live with
> this latency.


Not quite. HBase is an ordered, range-partitioned map. Indeed, you have
random lookup based on key, but keys are not hashed; they are kept in strict
sorted order. A region is a contiguous key range, so all sequential keys
(like [a...f]) are stored together in a single region, and a row is served
by whichever region's range contains its key. Regions themselves are spread
across the cluster uniformly, so there is no guarantee that two sequential
regions will be hosted on the same region server. Thus, HBase is quite good
at sequential access (range scans) as well as random access.

The Data Model [0] and Schema Design [1] sections of our online manual
explain this in more detail.

[0]: http://hbase.apache.org/book.html#datamodel
[1]: http://hbase.apache.org/book.html#schema
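To make the "ordered, range-partitioned map" point concrete, here is a toy sketch in plain Python. It is not the HBase client API; the class name, the split keys, and the sample rows are hypothetical. Each region covers a half-open range of start keys, a row is located by binary search over those start keys (no hashing), and a range scan visits only the contiguous regions that overlap the requested range.

```python
import bisect


class ToyRangePartitionedTable:
    """Toy model, NOT the HBase client API: rows are kept in sorted key order
    and split into regions by start key, the way HBase range-partitions a table."""

    def __init__(self, split_keys):
        # Region i covers [starts[i], starts[i + 1]); the first region starts at b"".
        self.starts = [b""] + sorted(split_keys)
        self.regions = [{} for _ in self.starts]

    def region_for(self, row):
        # Locate a row by binary search over region start keys: no hashing involved.
        return bisect.bisect_right(self.starts, row) - 1

    def put(self, row, value):
        self.regions[self.region_for(row)][row] = value

    def get(self, row):
        return self.regions[self.region_for(row)].get(row)

    def scan(self, start, stop):
        # Return rows in [start, stop) in key order, visiting only the
        # contiguous regions whose key ranges overlap the requested range.
        out = []
        for i in range(self.region_for(start), len(self.starts)):
            if self.starts[i] >= stop:
                break
            region = self.regions[i]
            out.extend((row, region[row]) for row in sorted(region) if start <= row < stop)
        return out


table = ToyRangePartitionedTable(split_keys=[b"g", b"p"])
for key in [b"apple", b"banana", b"fig", b"grape", b"kiwi", b"plum"]:
    table.put(key, key.upper())

print(table.region_for(b"banana"), table.region_for(b"kiwi"), table.region_for(b"plum"))  # 0 1 2
print(table.scan(b"b", b"h"))  # [(b'banana', b'BANANA'), (b'fig', b'FIG'), (b'grape', b'GRAPE')]
```

The scan for `[b"b", b"h")` crosses from the first region into the second and stops as soon as a region's start key passes the stop key, which is why sequential access is cheap in an ordered store.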


> -----Original Message-----
> From: Michael Segel [mailto:michael_segel@hotmail.com]
> Sent: Monday, April 06, 2015 4:20 PM
> To: user@hbase.apache.org
> Cc: user@phoenix.apache.org
> Subject: Re: How to Manage Data Architecture & Modeling for HBase
>
> Ok…
>
> Need to clarify this…
>
> The term "real time" is a bit misleading. It's subjective real time.
>
> With respect to schema design, please see my longer post on design in this
> thread. Again, think hierarchically, which means that you get everything in
> a single get().
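A minimal sketch of the hierarchical idea, assuming a plain Python dict as a stand-in for an HBase table (not its client API) and a hypothetical `/`-separated column-qualifier convention: a customer and their orders are denormalized into one row, so one lookup by row key returns the whole record, which can then be rebuilt into a tree.

```python
# Toy in-memory stand-in for an HBase table: row key -> {column qualifier: value}.
# The "/"-separated qualifier convention is a hypothetical choice for this sketch.
table = {}


def put(row, qualifier, value):
    table.setdefault(row, {})[qualifier] = value


def get(row):
    # One lookup by row key returns every column of the record.
    return dict(table.get(row, {}))


def to_tree(record):
    """Rebuild the nested (hierarchical) record from 'a/b/c' qualifiers."""
    tree = {}
    for qualifier, value in sorted(record.items()):
        *parents, leaf = qualifier.split("/")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


# A customer and their orders live in one row instead of separate joined tables.
put("cust#42", "name", "Ada")
put("cust#42", "orders/1001/total", "25.00")
put("cust#42", "orders/1001/items/1", "widget")
put("cust#42", "orders/1002/total", "9.50")

print(to_tree(get("cust#42")))
```

The design choice being illustrated is that the record, not the relationship, is the unit of storage: reading customer 42 never requires a join.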
>
> And yes, you have to think about your use case. In some use cases, you
> use M/R to pull data and do calculations whose output is written into
> HBase, where another app will pull it from HBase, in subjective real time,
> for use.
>
> In my earlier post I talked about using HBase to join data from different
> data sets. This is one of the main use cases and arguments for Hadoop: you
> want to gain value by taking data from different data sets, where the
> combined data may yield insights that were not previously possible.
>
> I’m not sure what you are getting at with hash tables.
>
> I am not suggesting that HBase is right for all occasions, because it's
> not. But I am suggesting that a lot of effort and failed attempts can be
> avoided by understanding how best to use HBase and by not thinking in terms
> of relationships.
>
> HTH
>
> -Mike
>
>
>
> > On Apr 6, 2015, at 12:09 PM, Pamecha, Abhishek <apamecha@paypal.com.INVALID> wrote:
> >
> > I would stress that if you envision any joins or arbitrary slicing and
> > dicing at a later point in your application, you might want to either
> > redesign your schema "very carefully" or be ready for more time-consuming
> > (not near-real-time) answers. We had explored a possible solution along
> > similar lines, but a hash table approach (as expected) isn’t the best for
> > database joins OR slicing based on arbitrary columns across the whole
> > dataset. We had to switch back to a relational DB for our use case.
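The cost of slicing on an arbitrary column can be sketched with toy data. This is a plain-Python illustration, not HBase or Phoenix code; the sample rows and the `value|rowkey` index-key layout are hypothetical choices. Filtering on a non-key column has to examine every row, whereas a separately maintained index table whose row key starts with the column value turns the same query into a range scan over contiguous keys.

```python
import bisect

# Toy data table: (row key, {column: value}) pairs kept sorted by row key.
rows = sorted([
    ("txn#001", {"merchant": "acme", "amount": 10}),
    ("txn#002", {"merchant": "bolt", "amount": 7}),
    ("txn#003", {"merchant": "acme", "amount": 5}),
    ("txn#004", {"merchant": "cove", "amount": 12}),
])


def full_scan(column, value):
    # Without an index, every row must be examined.
    return [key for key, cols in rows if cols.get(column) == value]


# Index table: row key = "<merchant>|<data row key>", kept sorted like any HBase table.
index = sorted(f"{cols['merchant']}|{key}" for key, cols in rows)


def index_scan(value):
    # All index keys sharing the prefix "<value>|" are contiguous, so a
    # binary search plus a short forward scan finds them.
    prefix = value + "|"
    start = bisect.bisect_left(index, prefix)
    out = []
    for entry in index[start:]:
        if not entry.startswith(prefix):
            break
        out.append(entry[len(prefix):])
    return out


print(full_scan("merchant", "acme"))  # ['txn#001', 'txn#003']
print(index_scan("acme"))             # ['txn#001', 'txn#003']
```

The trade-off is that the index must be kept in sync on every write, which is the work a secondary-indexing layer takes on.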
> >
> > Thanks,
> > Abhishek
> >
> > -----Original Message-----
> > From: Michael Segel [mailto:michael_segel@hotmail.com]
> > Sent: Monday, April 06, 2015 9:55 AM
> > To: user@hbase.apache.org
> > Cc: user@phoenix.apache.org
> > Subject: Re: How to Manage Data Architecture & Modeling for HBase
> >
> > I should add that in terms of financial modeling…
> >
> > It's easier to store derivatives and synthetic instruments because you
> > aren’t really constrained by a relational model.
> > (Derivatives are nothing more than a contract.)
> >
> > HTH
> >
> > -Mike
> >
> >> On Apr 6, 2015, at 8:34 AM, Ben Liang <liangpc@hotmail.com> wrote:
> >>
> >> Thank you for your prompt reply.
> >>
> >> In my daily work, I mainly use Oracle DB to build data warehouses with
> >> star-schema data models for financial analysis and marketing analysis.
> >> Now I am trying to use HBase to do it.
> >>
> >> I have some questions:
> >> 1) Many tables from our ERP system must be loaded incrementally every day,
> >> including some inserts and some updates. Is this scenario appropriate
> >> for using HBase to build a data warehouse?
> >> 2) Are there any case studies of enterprise BI solutions built with HBase?
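For the incremental-load question, one relevant property is that an HBase Put does not distinguish insert from update: writing a cell that already exists adds a newer timestamped version, and a normal read returns the newest one. The sketch below models that in plain Python; it is not the HBase client API, the class and sample rows are hypothetical, and `max_versions` is a simplified stand-in for the per-column-family VERSIONS setting.

```python
from collections import defaultdict


class ToyVersionedTable:
    """Toy model of HBase cell versioning (not the client API)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # (row, column) -> list of (timestamp, value), newest first
        self.cells = defaultdict(list)

    def put(self, row, column, value, ts):
        # The same call serves as both insert and update: it just adds a version.
        versions = self.cells[(row, column)]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        # HBase drops excess versions lazily (on read and compaction);
        # this toy trims them eagerly for simplicity.
        del versions[self.max_versions:]

    def get(self, row, column):
        versions = self.cells.get((row, column))
        return versions[0][1] if versions else None

    def history(self, row, column):
        return [value for _, value in self.cells.get((row, column), [])]


t = ToyVersionedTable(max_versions=2)
t.put("order#7", "status", "NEW", ts=1)      # day 1 load: an "insert"
t.put("order#7", "status", "SHIPPED", ts=2)  # day 2 load: an "update", same call
t.put("order#7", "status", "PAID", ts=3)     # day 3 load
print(t.get("order#7", "status"))      # PAID
print(t.history("order#7", "status"))  # ['PAID', 'SHIPPED']
```

Whether this makes HBase a good fit for a warehouse is a separate question; the sketch only shows that daily upserts need no insert-versus-update branching on the write path.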
> >>
> >> thanks.
> >>
> >>
> >> Regards,
> >> Ben Liang
> >>
> >>> On Apr 6, 2015, at 20:27, Michael Segel <michael_segel@hotmail.com> wrote:
> >>>
> >>> Yeah. Jean-Marc is right.
> >>>
> >>> You have to think more in terms of a hierarchical model where you’re
> >>> modeling records, not relationships.
> >>>
> >>> Your model would look like a single ER box per record type.
> >>>
> >>> The HBase schema is very simple: tables, column families, and that’s it
> >>> for static structures. Even then, column families tend to get misused.
> >>>
> >>> If you’re looking at a relational model… Phoenix or Splice Machine
> >>> would allow you to do something… although Phoenix is still VERY primitive.
> >>> (Do they take advantage of cell versioning like Splice Machine yet?)
> >>>
> >>>
> >>> There are a couple of interesting things where you could create your
> >>> own modeling tool / syntax (relationships)…
> >>>
> >>> 1) HBase is more 3D, compared with the 2D RDBMS model, and similar to ORDBMSs.
> >>> 2) You can join entities on either a FK principle or on a weaker
> >>> relationship type.
> >>>
> >>> HBase stores CLOBs/BLOBs in each cell. It's all just byte arrays with a
> >>> finite, bounded length not to exceed the size of a region. So you could
> >>> store an entire record as a CLOB within a cell. It's in this sense, that a
> >>> cell can represent multiple attributes of your object/record, that you gain
> >>> an additional dimension, and why you only need to use a single data type.
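A minimal sketch of storing a whole record in one cell, assuming JSON as the serialization (any encoding works, since HBase only sees the value as bytes). The `MAX_CELL_BYTES` cap and the sample policy record are hypothetical; real limits are deployment settings.

```python
import json

# Hypothetical per-cell size cap for this sketch; real limits are deployment settings.
MAX_CELL_BYTES = 10 * 1024 * 1024


def encode_record(record):
    """Serialize a whole record into one cell value (just bytes to HBase)."""
    data = json.dumps(record, sort_keys=True).encode("utf-8")
    if len(data) > MAX_CELL_BYTES:
        raise ValueError("record too large for a single cell")
    return data


def decode_record(data):
    return json.loads(data.decode("utf-8"))


# Hypothetical record with nested attributes packed into a single cell value.
policy = {"holder": "J. Doe", "vehicles": [{"vin": "VIN-001", "year": 2012}], "claims": []}
cell_value = encode_record(policy)
print(type(cell_value).__name__, decode_record(cell_value) == policy)  # bytes True
```

The nested lists inside one value are the "additional dimension" described above: one cell, one byte-array type, many attributes.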
> >>>
> >>> HBase and Hadoop in general allow one to join orthogonal data sets
> >>> that have a weak relationship. So while you can still join sets against a
> >>> FK, which implies a relationship, you don’t have to do it.
> >>>
> >>> Imagine you wanted to find out the average cost of a front-end car
> >>> collision for college-aged drivers, by major.
> >>> You would be joining insurance records against registrations for all
> >>> of the universities in the US for those students between the ages of 17
> >>> and 25.
> >>>
> >>> How would you model this when in fact neither defining attribute is a
> >>> FK?
> >>> (This is why you need a good secondary indexing implementation and
> >>> not something brain dead that wasn’t alcohol induced. ;-)
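The insurance example is a join on attributes that neither data set declares as a foreign key. The toy sketch below uses hypothetical in-memory records and an assumed matching rule (driver name plus date of birth); it is a plain in-memory hash join chosen for brevity, not how any particular product or the author would run it at scale. It builds a lookup on the matching attributes of the registrations, probes it with the claims, filters by collision type and age, and averages the cost per major.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample data; neither set references the other by key.
claims = [
    {"driver": "Ann Lee", "dob": date(1995, 3, 4), "type": "front-end", "cost": 4200.0},
    {"driver": "Bo Chan", "dob": date(1993, 7, 9), "type": "front-end", "cost": 3100.0},
    {"driver": "Cy Diaz", "dob": date(1994, 1, 2), "type": "rear-end", "cost": 2500.0},
    {"driver": "Ann Lee", "dob": date(1995, 3, 4), "type": "front-end", "cost": 1800.0},
    {"driver": "Dee Fox", "dob": date(1980, 5, 5), "type": "front-end", "cost": 9000.0},
]
registrations = [
    {"name": "Ann Lee", "dob": date(1995, 3, 4), "major": "Physics"},
    {"name": "Bo Chan", "dob": date(1993, 7, 9), "major": "History"},
    {"name": "Cy Diaz", "dob": date(1994, 1, 2), "major": "Physics"},
    {"name": "Dee Fox", "dob": date(1980, 5, 5), "major": "History"},
]


def age_on(dob, day):
    # Whole years between dob and day.
    return day.year - dob.year - ((day.month, day.day) < (dob.month, dob.day))


def avg_front_end_cost_by_major(claims, registrations, as_of, min_age=17, max_age=25):
    # Build side: index registrations by the (weak) matching attributes.
    majors = {(r["name"], r["dob"]): r["major"] for r in registrations}
    totals = defaultdict(lambda: [0.0, 0])
    # Probe side: match each claim, filter by collision type and age, aggregate.
    for c in claims:
        major = majors.get((c["driver"], c["dob"]))
        if major is None or c["type"] != "front-end":
            continue
        if not min_age <= age_on(c["dob"], as_of) <= max_age:
            continue
        totals[major][0] += c["cost"]
        totals[major][1] += 1
    return {major: total / count for major, (total, count) in totals.items()}


print(avg_front_end_cost_by_major(claims, registrations, as_of=date(2015, 4, 6)))
```

With these inputs the rear-end claim and the 34-year-old driver are filtered out, leaving an average of 3000.0 for Physics and 3100.0 for History.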
> >>>
> >>> Does that make sense?
> >>>
> >>> Note: I don’t know if anyone like CCCis, Allstate, State Farm, or
> >>> Progressive Insurance is doing anything like this. But they could.
> >>>
> >>>> On Apr 5, 2015, at 7:54 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
> >>>>
> >>>> Not sure you ever want to do that... Designing an HBase application
> >>>> is far different from designing an RDBMS one. Not sure those tools
> >>>> fit well here.
> >>>>
> >>>> What's your goal? Designing your HBase schema somewhere and then
> >>>> letting the tool generate your HBase tables?
> >>>>
> >>>> 2015-04-05 18:26 GMT-04:00 Ben Liang <liangpc@hotmail.com>:
> >>>>
> >>>>> Hi all,
> >>>>>     Do you have any tools to manage Data Architecture & Modeling
> >>>>> for HBase (or Phoenix)? Can we use PowerDesigner or ERwin to do it?
> >>>>>
> >>>>>     Please give me some advice.
> >>>>>
> >>>>> Regards,
> >>>>> Ben Liang
> >>>>>
> >>>>>
> >>>
> >>> The opinions expressed here are mine, while they may reflect a
> >>> cognitive thought, that is purely accidental.
> >>> Use at your own risk.
> >>> Michael Segel
> >>> michael_segel (AT) hotmail.com
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >
>
