HBase Phoenix performance

Major compaction merges all of the store files in a region into one. Major compactions also remove deleted cells and expired versions. By default, major compactions run every 24 hours; after a major compaction runs, there is a single StoreFile for each store.

The underlying row key design is the single most important factor in Phoenix performance. Choose and order the columns of the primary key constraint to match your most common query patterns: make the most frequently queried column the leading column of the key, because filtering on the leading position is the most performant access path.
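For example, a minimal sketch of a composite key ordered for a hypothetical workload that always filters by customer and then by time (the table and column names are invented for illustration):

```sql
-- Hypothetical table: queries always filter on CUSTOMER_ID and usually
-- on EVENT_TIME, so CUSTOMER_ID takes the leading position in the key.
CREATE TABLE IF NOT EXISTS CUSTOMER_EVENTS (
    CUSTOMER_ID BIGINT NOT NULL,
    EVENT_TIME  DATE NOT NULL,
    EVENT_TYPE  VARCHAR,
    PAYLOAD     VARCHAR,
    CONSTRAINT PK PRIMARY KEY (CUSTOMER_ID, EVENT_TIME)
);

-- This query prunes to a single contiguous range of the rowkey:
SELECT EVENT_TYPE
FROM CUSTOMER_EVENTS
WHERE CUSTOMER_ID = 42
  AND EVENT_TIME > TO_DATE('2020-01-01 00:00:00');
```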

We benchmarked primary key performance using a dataset of about 27 million records loaded into an HBase table. For write-heavy workloads, if the primary key is monotonically increasing, create salt buckets to help avoid write hotspots, at the expense of overall read throughput due to the additional scans needed. Also, when using UPSERT to write a large number of records, turn off autocommit and commit the records in batches instead of writing them one by one.
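As a sketch, salting is declared at table-creation time; the bucket count of 4 here assumes a hypothetical 4-region-server cluster:

```sql
-- SALT_BUCKETS prepends a hashed salt byte to each rowkey, spreading
-- monotonically increasing keys across regions. As a rule of thumb,
-- match the bucket count to the number of region servers (assumed 4 here).
CREATE TABLE IF NOT EXISTS SENSOR_READINGS (
    READING_ID BIGINT NOT NULL,
    METRIC     DOUBLE,
    CONSTRAINT PK PRIMARY KEY (READING_ID)
) SALT_BUCKETS = 4;
```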

When deleting a large data set, turn on autoCommit before issuing the DELETE query so that the client does not need to buffer the row keys of all the rows as they are deleted. This lets Phoenix delete the rows directly on the region servers without the expense of returning them to the client. If your scenario favors write speed over data integrity, you can also consider disabling the write-ahead log (WAL).
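For example, a sketch of both options against the hypothetical table above (DISABLE_WAL trades durability for write speed, so use it only for data you can reload):

```sql
-- Disable the write-ahead log: writes that are only in the memstore
-- can be lost if a region server fails before flushing.
ALTER TABLE SENSOR_READINGS SET DISABLE_WAL = true;

-- With autoCommit enabled on the connection, this DELETE runs on the
-- region servers; matching rows are never returned to the client.
DELETE FROM SENSOR_READINGS WHERE METRIC < 0;
```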

For details on DISABLE_WAL and other table options, see the Phoenix Grammar documentation.

Phoenix Performance Best Practices

This article provides the fundamental techniques to consider when optimizing the performance of your Phoenix deployment on HDInsight.

Phoenix and HBase

One of the most important considerations when optimizing the performance of Phoenix boils down to making sure HBase is well optimized.

Primary Key Design

The primary key that you define on a table in Phoenix dictates how data is stored within the rowkey of the underlying HBase table. You could define a primary key based on an increasing sequence number, so that the rowkey holds successive sequence values and the remaining columns (such as address, phone, firstName, and lastName) are stored against each key.
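A minimal sketch of such a sequence-backed key, using Phoenix sequences (names and sample values invented for illustration):

```sql
-- A monotonically increasing key: every new row sorts to the end of the
-- table, so all writes land on one region (a write hotspot; see salting below).
CREATE SEQUENCE IF NOT EXISTS CONTACT_SEQ;

CREATE TABLE IF NOT EXISTS CONTACTS (
    ID        BIGINT NOT NULL,
    ADDRESS   VARCHAR,
    PHONE     VARCHAR,
    FIRSTNAME VARCHAR,
    LASTNAME  VARCHAR,
    CONSTRAINT PK PRIMARY KEY (ID)
);

UPSERT INTO CONTACTS
VALUES (NEXT VALUE FOR CONTACT_SEQ, 'San Gabriel Dr.', '555-0100', 'Jane', 'Doe');
```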

Phoenix can transparently salt the rowkey, prepending a byte derived from a hash of the key so that writes are spread across all of the table's regions. This eliminates hot-spotting of a single region server or a small set of them. Read more about salting in the Phoenix documentation.

[Chart: write performance with and without salting, for a table split into 4 regions on a 4-region-server cluster.]

Note: For optimal performance, the number of salt buckets should match the number of region servers.

[Chart: in-memory query time of running the Top-N query over 10M rows using Phoenix 1.]

Table schema design

When you create a table in Phoenix, that table is stored in an HBase table.

Column family design

If some columns are accessed more frequently than others, create multiple column families to separate the frequently accessed columns from the rarely accessed ones.
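A minimal sketch, assuming a hypothetical table where profile fields are hot and a large binary column is cold (in Phoenix, the column family is given as a prefix on the column name):

```sql
-- Columns prefixed A. live in column family A (frequently read);
-- the B. column lives in family B (rarely read). Scans that touch
-- only A.* columns never read B's store files.
CREATE TABLE IF NOT EXISTS USER_PROFILES (
    USER_ID    BIGINT NOT NULL,
    A.USERNAME VARCHAR,
    A.EMAIL    VARCHAR,
    B.AVATAR   VARBINARY,
    CONSTRAINT PK PRIMARY KEY (USER_ID)
);
```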

When processing queries, HBase materializes cells in full before sending them over to the client, and the client receives them in full before handing them off to the application code, so keep individual cells small. If you store structured objects, serialize them with a compact binary format; JSON isn't recommended, as it's larger. For anticipated queries, you can also create secondary indexes by specifying their columns.

Create secondary indexes

Secondary indexes can improve read performance by turning what would be a full table scan into a point lookup, at the cost of storage space and write speed. When designing your indexes:

- Only create the indexes you need.
- Limit the number of indexes on frequently updated tables; updates to a table translate into writes to both the main table and the index tables.
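For example, a sketch of a basic secondary index on the hypothetical CUSTOMER_EVENTS table from earlier:

```sql
-- Queries filtering on EVENT_TYPE can read this index instead of
-- scanning the full data table.
CREATE INDEX IF NOT EXISTS IDX_EVENT_TYPE
ON CUSTOMER_EVENTS (EVENT_TYPE);
```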

Use covered indexes

Covered indexes include a copy of data from the row in addition to the values that are indexed, so a query that touches only indexed and included columns can be answered from the index alone, without a trip back to the data table.
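A sketch using the INCLUDE clause, again with the hypothetical names from above:

```sql
-- PAYLOAD is copied into the index, so a query such as
--   SELECT PAYLOAD FROM CUSTOMER_EVENTS WHERE EVENT_TYPE = 'click'
-- is answered entirely from the index.
CREATE INDEX IF NOT EXISTS IDX_EVENT_TYPE_COVERED
ON CUSTOMER_EVENTS (EVENT_TYPE)
INCLUDE (PAYLOAD);
```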

Join efficiently

Generally, you want to avoid joins unless one side is small, especially on frequent queries.

Use the query plan

Generate the query plan by prefixing your query with EXPLAIN. Check that the plan:

- Uses your primary key when appropriate.
- Uses appropriate secondary indexes, rather than the data table.
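For example, against the hypothetical table from earlier:

```sql
-- The output should show a RANGE SCAN driven by the leading key column
-- (or a scan over an index), not a FULL SCAN of the data table.
EXPLAIN SELECT EVENT_TYPE
FROM CUSTOMER_EVENTS
WHERE CUSTOMER_ID = 42;
```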

Read-heavy workloads

For read-heavy use cases, make sure you're using indexes.

Write-heavy workloads

For write-heavy workloads where the primary key is monotonically increasing, create salt buckets as described earlier, trading some read throughput for even write distribution.

Bulk deletes

When deleting a large data set, turn on autoCommit before issuing the DELETE query, as described earlier, so that rows are deleted on the region servers rather than buffered by the client.


