phoenix-user mailing list archives

From Domen Kren <dk.ne...@gmail.com>
Subject Re: TTL on a single column family in table
Date Tue, 04 Sep 2018 23:23:05 GMT
Hey,

let me describe our situation. We have a table with 3 column families, let us say a, b and
c. They are segregated by our access patterns and data usage. Column family a uses about 10%,
b around 25% and c around 65% of the space in every row. CF a has no TTL and its data will
never be deleted, CF b is up for debate (it can probably be deleted after 6 months), and CF c
is not needed after 4 weeks (1 month). By deleting the CFs that are no longer needed in every
row, we can free up to 80+% of our space in the table. The table has no additional indexes.
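
As a rough check of these figures, here is a minimal sketch of the arithmetic in Python. The 10/25/65 split and the retention periods come from the description above. The names (CF_SHARE, CF_TTL_SECONDS, freed_fraction) are made up for illustration, 6 months is approximated as 183 days, and the TTLs are expressed in seconds because that is the unit HBase uses for a column family TTL.

```python
# Approximate share of each row's size per column family (from the description above).
CF_SHARE = {"a": 0.10, "b": 0.25, "c": 0.65}

SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical retention settings in seconds. None means "keep forever".
CF_TTL_SECONDS = {
    "a": None,
    "b": 183 * SECONDS_PER_DAY,  # roughly 6 months, still up for debate
    "c": 28 * SECONDS_PER_DAY,   # 4 weeks
}


def freed_fraction(row_age_seconds):
    """Fraction of a row's size that has expired at the given row age.

    Simplifying assumption: all cells of a row were written at the same time.
    """
    return sum(
        share
        for cf, share in CF_SHARE.items()
        if CF_TTL_SECONDS[cf] is not None and row_age_seconds > CF_TTL_SECONDS[cf]
    )


# Print the expired fraction for a young, a middle-aged and an old row.
for days in (7, 60, 365):
    print(days, round(freed_fraction(days * SECONDS_PER_DAY), 2))
```

Under these assumptions a row older than six months has shed both b and c, while a row between four weeks and six months old has shed only c. The table-wide saving therefore depends on the age distribution of the rows.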

I have read about the empty/default CF, as it was part of the debate in the issue I linked,
but in our case that should not be a problem, as we define CF a first and there is no TTL on
that CF. We also did some preliminary tests and have had no problems. TTL on CF b and CF c
behaves the same as if we had deleted these columns manually.
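
To illustrate why the empty CF matters, here is a toy model in Python. It is not Phoenix or HBase code. It only mimics the behaviour described in the quoted reply below, where count(*) relies on the dummy value written to the empty/default column family. The function name, the data layout and the TTL numbers are all hypothetical.

```python
def visible_row_count(rows, ttl_by_cf, empty_cf, now):
    """Count rows whose marker cell in the empty column family is still alive.

    rows: list of dicts mapping column family name -> cell write time (seconds).
    ttl_by_cf: dict mapping column family name -> TTL in seconds, or None for no expiry.
    """
    def alive(cf, written_at):
        ttl = ttl_by_cf.get(cf)
        return ttl is None or now - written_at <= ttl

    return sum(
        1 for row in rows
        if empty_cf in row and alive(empty_cf, row[empty_cf])
    )


# Two rows, written at t=0 and t=50, each with a cell in every column family.
rows = [{"a": 0, "b": 0, "c": 0}, {"a": 50, "b": 50, "c": 50}]

# TTL only on b and c; the empty CF "a" never expires, so the count stays stable.
ttl_on_b_and_c = {"a": None, "b": 100, "c": 20}

# TTL on the empty CF itself; rows drop out of the count as their marker expires.
ttl_on_empty_cf = {"a": 60, "b": None, "c": None}

print(visible_row_count(rows, ttl_on_b_and_c, "a", now=100))
print(visible_row_count(rows, ttl_on_empty_cf, "a", now=100))
```

In this model the count only changes when the TTL applies to the empty CF, which matches our setup where CF a carries no TTL. It says nothing about other Phoenix code paths that may assume a single TTL.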

There are some other approaches we tried, like manually deleting all cells that are overdue,
and using multiple tables with their own TTLs, but they bring some overhead (a lot, in the
case of multiple tables), and we would like to use the HBase TTL mechanism, which is perfect
for our case.

Of course I understand that this usage goes against the Phoenix library design and would
bring overhead in checking updates and newly added patterns for conflicts, so either
way there is additional work, and we have not decided on the final solution. But from our
understanding this usage of TTL should mirror manual deletes and be a lot more efficient.

If you have any additional concerns, hints or even ideas for other approaches we would really
appreciate them.

Again, thank you and best regards,
Domen Kren

PS: The table is projected to be pretty large, 100+ TB in a few years (not counting the
columns that are no longer needed), so we are really focused on size and cleanup optimization.

On 2018/09/04 22:30:12, Chinmay Kulkarni <chinmayskulkarni@gmail.com> wrote: 
> Hi Domen,
> 
> After PHOENIX-1409, we don't allow specifying a TTL for a specific column
> family and all column families share the same TTL value. If you were to
> alter this using HBase APIs, this could lead to many inconsistencies at the
> Phoenix level where we assume all CFs to have the same TTL value. For
> example, if you were to alter the TTL value of the empty/default column
> family for a table, then a select count(*) query on the table would reflect
> a different value depending on whether the TTL for that column family has
> expired or not. Whereas, if you were to alter the TTL for any other column
> family, this would not affect the result of the select count(*) since we
> use the dummy value written to the empty/default column family to
> efficiently calculate count(*). There may also be other code paths that
> could give inconsistent results after this change.
> 
> In fact, PHOENIX-3955 aims to propagate the TTL, REPLICATION_SCOPE and
> KEEP_DELETED_CELLS properties to all column families of a table as well as
> its indexes, in order to keep data in sync between the base table and its
> indexes. What is the reason you wish to manually alter the TTL of a single
> column family?
> 
> On Tue, Sep 4, 2018 at 3:29 PM Thomas D'Silva <tdsilva@salesforce.com>
> wrote:
> 
> > If you set different TTLs for column families you can run into issues
> > with SELECT count(*) queries not working correctly (depending on which
> > column family is used to store the EMPTY_COLUMN_VALUE).
> >
> > On Tue, Sep 4, 2018 at 10:56 AM, Sergey Soldatov <
> > sergey.soldatov@gmail.com> wrote:
> >
> >> What is the use case to set TTL only for a single column family? I would
> >> say that making TTL table wide is a mostly technical decision because in
> >> relational databases we operate with rows and supporting TTL for only some
> >> columns sounds a bit strange.
> >>
> >> Thanks,
> >> Sergey
> >>
> >> On Fri, Aug 31, 2018 at 7:43 AM Domen Kren <dk.nexus@gmail.com> wrote:
> >>
> >>> Hello,
> >>>
> >>> we have a situation where we would like to set TTL on a single column
> >>> family in a table. After getting errors while trying to do that through a
> >>> Phoenix command I found this issue,
> >>> https://issues.apache.org/jira/browse/PHOENIX-1409, where it said "TTL
> >>> - James Taylor and I discussed offline and we decided that for now we will
> >>> only be supporting for all column families to have the same TTL as the
> >>> empty column family. This means we error out if a column family is
> >>> specified while setting TTL property - both at CREATE TABLE and ALTER TABLE
> >>> time. Also changes were made to make sure that any new column family added
> >>> gets the same TTL as the empty CF."
> >>>
> >>> If I understand correctly, this was a design decision and not a
> >>> technical one. So my question is, if I change this configuration through
> >>> the HBase API or console, could there be potential problems that arise in
> >>> Phoenix?
> >>>
> >>> Thank you and best regards,
> >>> Domen Kren
> >>>
> >>>
> >>>
> >
> 
> -- 
> Chinmay Kulkarni
> M.S. Computer Science,
> University of Illinois at Urbana-Champaign.
> B. Tech Computer Engineering,
> College of Engineering, Pune.
> 
