HBase Interview Questions

What is HBase?

HBase is a column-oriented database management system that runs on top of HDFS. It is a sub-project of Hadoop. It is distributed and highly scalable (both linear and modular scaling), and it can process billions of rows quickly.

When should we use Apache HBase?

When a table has billions of rows, millions of columns and sparse data, HBase is a good choice for processing it. HBase can also handle unstructured data and, unlike plain HDFS files, it lets you update existing data.

 

What is the importance of a column family in HBase?

A column family is a logical division of the data, identified by a key (the family name), that holds multiple columns of related data. Column families are declared when the table is created, while the columns inside a family can be added dynamically as data is written. All column members of a column family share the same prefix. For example, if Vehicle is a column family, then Maruthi, Tata and Hero are columns within it.

Eg: hbase> put 'cars', 'price', 'Vehicle:Maruthi', '1,00,000' # The argument order is table, row, column family:column, value.

put 'cars', 'price', 'Vehicle:Tata', '2,00,000'

put 'cars', 'price', 'Vehicle:Hero', '3,00,000'

Here, cars is the table, price is the row key, Vehicle is the column family, Maruthi is the column and 1,00,000 is the value.
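To make the layout concrete, here is a minimal Python sketch of the logical model behind those three put commands. It is only a toy dictionary, not how HBase stores data on disk; the helper names new_table and put are hypothetical.

```python
from collections import defaultdict

# Toy model: table -> row key -> "family:column" -> value.
# It mirrors the logical layout only, not HBase's storage format.
def new_table(families):
    return {"families": set(families), "rows": defaultdict(dict)}

def put(table, row, column, value):
    # The part before the colon must be a family declared up front.
    family, _, _qualifier = column.partition(":")
    if family not in table["families"]:
        raise KeyError(f"unknown column family: {family}")
    table["rows"][row][column] = value

cars = new_table(["Vehicle"])
put(cars, "price", "Vehicle:Maruthi", "1,00,000")
put(cars, "price", "Vehicle:Tata", "2,00,000")
put(cars, "price", "Vehicle:Hero", "3,00,000")

# All three columns live in the same row and share the "Vehicle:" prefix.
print(sorted(cars["rows"]["price"]))
```

New columns can be added to the Vehicle family at any time, but writing to an undeclared family fails.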

What is the difference between column-oriented and row-oriented databases?

Why HBase instead of plain Hadoop?

HBase is suited to low-latency requests, whereas MapReduce is a high-latency batch system. HDFS does not support in-place updates, but HBase does. HDFS keeps metadata only about files, whereas HBase indexes the data itself by row key.

Which datatypes does HBase support?

The HBase Put and Result interfaces work with byte arrays: every value is converted to bytes and stored as a byte array. HBase can therefore hold any datatype, such as a string, a number, an image or anything else that can be rendered as bytes. Converting values to and from bytes is the client's job.
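A small Python sketch of the "everything is bytes" idea follows. The helper names are hypothetical; the 8-byte big-endian layout for integers is chosen to resemble what HBase's Java Bytes utility produces for a long, but treat that as an assumption and check the client you actually use.

```python
import struct

def encode_long(n):
    # 8-byte big-endian signed integer
    return struct.pack(">q", n)

def decode_long(b):
    return struct.unpack(">q", b)[0]

def encode_string(s):
    return s.encode("utf-8")

def decode_string(b):
    return b.decode("utf-8")

price = encode_long(100000)
name = encode_string("Maruthi")

# The store only ever sees bytes; the reader must know which decoder to apply.
print(len(price), decode_long(price), decode_string(name))
```

Because no type information is stored with the value, reader and writer have to agree on the encoding.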

What are the different types of block cache?

HBase provides two block cache implementations: LruBlockCache (the default), which is an on-heap cache, and BucketCache, which is usually configured as an off-heap cache. LruBlockCache lives on the Java heap, whereas BucketCache keeps its blocks outside the heap and can also be backed by a file.
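The least-recently-used policy behind the default cache can be sketched in a few lines of Python. This is only an illustration of LRU eviction with a hypothetical LruCache class and a capacity counted in blocks; the real LruBlockCache is more elaborate and is sized in bytes.

```python
from collections import OrderedDict

class LruCache:
    """Toy LRU cache: evicts the block that was used least recently."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of cached blocks
        self.blocks = OrderedDict()   # oldest entry first

    def get(self, key):
        if key not in self.blocks:
            return None               # cache miss
        self.blocks.move_to_end(key)  # mark as most recently used
        return self.blocks[key]

    def put(self, key, block):
        self.blocks[key] = block
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # drop the least recently used

cache = LruCache(capacity=2)
cache.put("block-a", b"...")
cache.put("block-b", b"...")
cache.get("block-a")           # block-a is now the most recently used
cache.put("block-c", b"...")   # evicts block-b
print(list(cache.blocks))
```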

List some important HBase shell commands

create 'table', 'columnFamily'

put 'table', 'row', 'columnFamily:column', 'value'

get 'table', 'row', 'columnFamily:column'

scan 'table'

list

disable 'table'

drop 'table'

describe 'table'

What is the MemStore?

The MemStore is a temporary in-memory repository in HBase that holds modifications to a Store. It buffers writes until it reaches its configured flush size (64 MB in older releases, set by hbase.hregion.memstore.flush.size), and then flushes the data to HDFS.

What does DFSClient do?

DFSClient handles all interactions with the remote HDFS servers. Any component that needs to communicate with the NameNode or the DataNodes does so through DFSClient. HBase persists its data in HDFS via the DFS client.

Which row key designs give high read performance, and which give high write performance?

Sequential keys, salted keys, promoted field keys and random keys are the four main types of row key.

Sequential read performance (best to worst): sequential keys > salted keys > promoted field keys > random keys
Write performance (best to worst): random keys > promoted field keys > salted keys > sequential keys

All of these are ways of building the row key, and each must still produce a unique key.
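As an illustration of a salted key, the sketch below prefixes each key with a bucket number derived from a hash of the key, so that consecutive keys spread across buckets instead of landing next to each other. The bucket count, the separator and the function name salted_key are all assumptions made for the example.

```python
import hashlib

NUM_BUCKETS = 4  # hypothetical; often chosen to match the number of region servers

def salted_key(key, buckets=NUM_BUCKETS):
    # Derive a stable bucket from the key itself so that a reader
    # can recompute the same prefix when looking the key up again.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}-{key}"

# Sequential keys (for example timestamps) get scattered over the buckets.
for ts in ("20240101-000001", "20240101-000002", "20240101-000003"):
    print(salted_key(ts))
```

The trade-off matches the ranking above: writes are spread out, but a sequential scan now has to visit every bucket.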

What is auto-sharding in HBase?

As a table receives a large amount of data, HBase automatically splits it into regions and distributes them across the region servers. This feature is called auto-sharding.

What are ulimit and nproc in HBase?

ulimit sets an upper bound on the resources a process may use, such as the number of open files.
nproc limits the maximum number of processes available to a particular user or application, which restricts how many processes it can start.

What are Bloom filters?

A Bloom filter lets HBase filter out blocks that cannot contain the data you are looking for. This saves disk reads and improves read latency.
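The idea can be shown with a minimal Bloom filter in Python. This is a generic sketch, not HBase's implementation; the class name, bit-array size and hash scheme are all assumptions. The key property is that a "no" answer is always correct, so the block can be skipped, while a "yes" may occasionally be a false positive.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive several bit positions from differently seeded hashes.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

bloom = BloomFilter()
for row in ("row-1", "row-2", "row-3"):
    bloom.add(row)

print(bloom.might_contain("row-2"))  # an added key is never reported absent
```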

What is the importance of MemStore and BlockCache?

Memory utilization and caching are very important in HBase. To achieve its goals, HBase maintains two in-memory structures, the MemStore and the BlockCache. The MemStore is a temporary in-memory buffer for writes. The BlockCache keeps data blocks in memory after they have been read.

What are the different types of blocks in HBase?

A block is the smallest unit of data. There are four varieties: Data, Meta, Index and Bloom. Data blocks store user data. Index and Bloom blocks serve to speed up the read path: an Index block provides an index over the Data blocks, and a Bloom block contains a Bloom filter that lets a read skip data that cannot match, so the desired data is found quickly. Meta blocks store information about the HFile.

What are the core components of HBase?

An HMaster coordinates one or more HRegionServers.
Each HRegionServer serves one or more Regions and, by default, keeps one HLog (write-ahead log) shared by them.
Each Region has multiple Stores, one per column family.
Each Store has one MemStore and multiple StoreFiles.
Each StoreFile has only one HFile.
Each HFile is divided into blocks, 64 KB each by default.

How is data written?

The client first sends the data to an HRegionServer. The data is recorded in the HLog (write-ahead log) file and then written to the MemStore, which holds it temporarily in memory. When the MemStore is full, it flushes the data to an HFile. The data is kept in sorted order in both the MemStore and the HFile. HFiles are persisted on HDFS via the DFS client.
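The write path described above can be sketched as a toy Python model. The class ToyStore and its fields are hypothetical, and the flush threshold is counted in entries here rather than in bytes, purely to keep the example small.

```python
class ToyStore:
    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold  # entries, not bytes (simplification)
        self.wal = []       # stand-in for the HLog / write-ahead log
        self.memstore = {}  # in-memory buffer of recent writes
        self.hfiles = []    # each flushed file is a sorted list of (key, value)

    def put(self, key, value):
        self.wal.append((key, value))  # 1. log the edit first
        self.memstore[key] = value     # 2. then buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()               # 3. flush when the buffer is full

    def flush(self):
        if self.memstore:
            # Flushed data is written in sorted key order.
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        # Newest data wins: check the memstore, then files from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:
                if k == key:
                    return v
        return None

store = ToyStore(flush_threshold=2)
store.put("row-b", "1")
store.put("row-a", "2")  # reaches the threshold and triggers a flush
store.put("row-c", "3")  # stays in the memstore for now
print(store.hfiles, store.memstore)
```

Logging before buffering is what allows unflushed edits to be replayed after a crash.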

What are the WAL and the HLog in HBase?

Why does HBase follow lexicographical order?
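HBase keeps rows sorted by the raw bytes of the row key, so rows with nearby keys are stored together and can be read with a range scan. A short Python sketch shows what byte-wise ordering means in practice and why numeric parts of a key are commonly zero-padded; the key names are made up for the example.

```python
# Byte-wise (lexicographic) ordering compares keys one byte at a time.
keys = [b"row1", b"row10", b"row2"]
print(sorted(keys))    # "row10" sorts before "row2" because "1" < "2"

# Zero-padding the numeric part makes byte order match numeric order.
padded = [f"row{n:03d}".encode("utf-8") for n in (1, 10, 2)]
print(sorted(padded))
```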

What should you do if you forget the syntax of a command?

Use help followed by the command name, for example: help 'scan'

What is the difference between scan and get?

When are CRUD operations not applicable?

While schema-level updates or alterations are being made, it is not possible to run CRUD operations. To alter the schema, first disable the table; this is mandatory.

quit

Useful links:

 http://netwovenblogs.com/2013/10/10/hbase-overview-of-architecture-and-data-model/

http://hbase.apache.org/book/quickstart.html

 http://learnhbase.wordpress.com/2013/03/02/hbase-shell-commands/

http://wiki.apache.org/hadoop/Hbase/Shell