
HBase Interview Questions

What is HBase?

HBase is a column-oriented database management system that runs on top of HDFS. It started as a sub-project of Hadoop. It is highly scalable (both linear and modular scaling), distributed, and can process billions of rows quickly.

When should we use Apache HBase?

When the database has billions of rows, millions of columns, and sparse datasets, HBase is the best choice to process such data. HBase can also handle unstructured data, and it makes it possible to update existing data stored in Hadoop.

 

What is the importance of a column family in HBase?

A column family is a logical division of a row's data, identified by a name. Column families are declared when the table is created, while the columns inside them can be added dynamically as data arrives. Each column family holds multiple columns of related data. All column members of a column family have the same prefix. For example, if Vehicle is a column family, then Maruthi, Tata, and Hero are columns within Vehicle.

Eg: hbase> put 'cars', 'price', 'Vehicle:Maruthi', '1,00,000' // The arguments are, in order: table, row key, column (family:qualifier), value.

put 'cars', 'price', 'Vehicle:Tata', '2,00,000'

put 'cars', 'price', 'Vehicle:Hero', '3,00,000'

Here, cars is the table, price is the row key, Vehicle is the column family, Maruthi is the column, and 1,00,000 is the value.

What is the difference between column-oriented and row-oriented databases?

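Why HBase instead of plain Hadoop?

HBase is suited to low-latency requests, whereas MapReduce is a high-latency batch system. Files in HDFS cannot be updated in place, but HBase supports updates. HDFS keeps only file-level metadata, whereas HBase indexes the data by row key, so individual rows can be looked up quickly.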

Which data types does HBase support?

HBase's Put and Result classes take and return values as byte arrays, so every value is stored as bytes. HBase can therefore hold any data type, such as a string, a number, an image, or anything else that can be rendered as bytes. Converting the bytes back to the original type is up to the client.
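To make the bytes-only storage concrete, here is a minimal Python sketch of turning typed values into byte arrays and back. It is not the HBase client API: the helper names to_bytes and from_bytes are hypothetical, and the 8-byte big-endian layout for numbers is an assumption chosen for illustration.

```python
import struct


def to_bytes(value):
    """Serialize a str, int, or float to bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, bool):
        # bool is a subclass of int in Python; keep the sketch explicit.
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, int):
        return struct.pack(">q", value)  # 8-byte big-endian signed integer
    if isinstance(value, float):
        return struct.pack(">d", value)  # 8-byte IEEE 754 double
    raise TypeError(f"unsupported type: {type(value).__name__}")


def from_bytes(raw, kind):
    """Decode bytes back into the type the caller says they hold."""
    if kind is str:
        return raw.decode("utf-8")
    if kind is int:
        return struct.unpack(">q", raw)[0]
    if kind is float:
        return struct.unpack(">d", raw)[0]
    raise TypeError(f"unsupported type: {kind.__name__}")


cell = to_bytes(100000)          # what would be stored in the cell
price = from_bytes(cell, int)    # the reader must know it was an int
```

The store itself keeps no type information, which is why the reader has to say which type to decode.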

What are the different types of block cache?

HBase provides two block cache implementations: LruBlockCache (the default), which is an on-heap cache, and BucketCache, which is typically used off-heap. LruBlockCache lives on the Java heap, whereas BucketCache can keep its blocks in off-heap memory or in a file.
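The idea behind an LRU (least recently used) block cache can be sketched in a few lines of Python. This is only an illustration of the eviction policy, not HBase's LruBlockCache: the class name is hypothetical, and the capacity is counted in blocks here as a simplifying assumption, whereas a real cache is sized in bytes.

```python
from collections import OrderedDict


class LruBlockCacheSketch:
    """Toy LRU cache: evicts the least recently used block when full."""

    def __init__(self, capacity):
        self.capacity = capacity      # assumed: maximum number of blocks
        self.blocks = OrderedDict()   # oldest entry first, newest last

    def get(self, key):
        if key not in self.blocks:
            return None               # cache miss: caller reads from disk
        self.blocks.move_to_end(key)  # mark as most recently used
        return self.blocks[key]

    def put(self, key, block):
        self.blocks[key] = block
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # drop the least recently used


cache = LruBlockCacheSketch(capacity=2)
cache.put("block-a", b"1")
cache.put("block-b", b"2")
cache.get("block-a")          # touch block-a so block-b becomes the oldest
cache.put("block-c", b"3")    # evicts block-b
```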


List the most important commands in the HBase shell.

create 'table', 'columnFamily'

put 'table', 'row', 'columnFamily:column', 'value'

get 'table', 'row', 'columnFamily:column'

scan 'table'

list

disable 'table'

drop 'table'

describe 'table'

What is the MemStore?

The MemStore is a temporary in-memory buffer in HBase that holds modifications to a Store before they are persisted. Once it reaches its configured flush size (64 MB in older releases, matching the HDFS block size of the time; 128 MB by default in later ones), it flushes the data to HDFS.

What are the functions of DFSClient?

DFSClient handles all interactions with the remote HDFS servers. Any communication with the NameNode or the DataNodes goes through DFSClient. HBase persists its data in HDFS via DFSClient.

Which row key designs give high read performance, and which give high write performance?

The four main types of row keys are sequential keys, salted keys, promoted-field keys, and random keys.

Sequential read performance: sequential keys > salted keys > promoted-field keys > random keys
Sequential write performance: random keys > promoted-field keys > salted keys > sequential keys

All of these schemes still produce a unique row key; they differ in how the keys are spread across regions.
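As an illustration of salting, the sketch below prefixes each row key with a small bucket number derived from a hash of the key, so that consecutive keys land in different parts of the key space. The function name salted_key, the bucket count, and the two-digit prefix format are all assumptions made for this example, not something HBase prescribes.

```python
import hashlib

NUM_BUCKETS = 4  # assumed number of salt buckets


def salted_key(row_key: str, buckets: int = NUM_BUCKETS) -> str:
    """Prefix row_key with a deterministic bucket number (hypothetical helper)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    salt = digest[0] % buckets
    # The salt is derived from the key itself, so a reader can recompute it.
    return f"{salt:02d}-{row_key}"


# Sequential keys such as timestamps or counters get spread over the buckets.
keys = [salted_key(f"{i:08d}") for i in range(1000)]
buckets_used = {key[:2] for key in keys}
```

The price is on the read side: a range scan over the original key order now has to visit every bucket, which is why salted keys rank below sequential keys for sequential reads.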

What is auto-sharding in HBase?

As a table grows, HBase automatically splits it into regions and distributes them across the region servers. This feature is called auto-sharding.
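A rough sketch of the idea in Python: a table starts as one region covering all row keys, and a region that grows past a threshold is split in two at its middle key. The class name and the row-count threshold are assumptions for illustration; real regions are split based on their size on disk, and this sketch ignores the assignment of regions to servers.

```python
import bisect


class TableSketch:
    """Toy table whose regions split once they hold too many rows."""

    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region  # assumed split threshold
        self.regions = [[]]                  # sorted key lists, in key order

    def _find_region(self, row_key):
        # The owning region is the last one whose first key is <= row_key.
        for region in reversed(self.regions):
            if region and region[0] <= row_key:
                return region
        return self.regions[0]

    def _split(self, region):
        mid = len(region) // 2
        index = next(i for i, r in enumerate(self.regions) if r is region)
        # Replace the full region with its lower and upper halves.
        self.regions[index:index + 1] = [region[:mid], region[mid:]]

    def put(self, row_key):
        region = self._find_region(row_key)
        if row_key not in region:
            bisect.insort(region, row_key)   # keep keys sorted
        if len(region) > self.max_rows:
            self._split(region)


table = TableSketch(max_rows_per_region=2)
for key in ["a", "b", "c", "d"]:
    table.put(key)
```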

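What are ulimit and nproc in HBase?

ulimit sets upper bounds on the resources a process may use, such as the number of open files.
nproc limits the maximum number of processes available to a particular user or application. HBase usually needs both limits raised above the operating system defaults.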

What are Bloom filters?

Bloom filters let HBase skip blocks that cannot contain the row you are looking for. This saves disk reads and improves read latency.
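For readers who have not met Bloom filters before, here is a small self-contained Python sketch of the data structure. It is a generic Bloom filter, not HBase's implementation: the class name, the bit-array size, the number of hash functions, and the use of SHA-256 are assumptions made for the example.

```python
import hashlib


class BloomFilterSketch:
    """Answers 'definitely absent' or 'possibly present' for a key."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key: bytes):
        # Derive several bit positions by hashing the key with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means the key was never added; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


row_filter = BloomFilterSketch()
for row in (b"row1", b"row2"):
    row_filter.add(row)
```

A reader would call might_contain before touching a file and skip the file whenever the answer is False. A True answer still requires a real lookup, because Bloom filters can report false positives but never false negatives.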

What is the importance of the MemStore and the BlockCache?

Memory utilization and caching structures are very important in HBase. To achieve its goals, HBase maintains two cache structures, called the MemStore and the BlockCache. The MemStore is a temporary in-memory buffer for writes. The BlockCache keeps data blocks in memory after they are read.

What are the different types of blocks in HBase?

A block is the smallest unit of data. There are four varieties: Data, Meta, Index, and Bloom. Data blocks store user data. Index and Bloom blocks serve to speed up the read path: an Index block provides an index over the Data blocks, and a Bloom block contains a Bloom filter that lets a read skip blocks that cannot hold the requested key. Meta blocks store information about the HFile.

What are the core components in HBase?

The HMaster manages one or more HRegionServers.
Each HRegionServer serves one or more Regions and keeps one HLog (write-ahead log) for them.
Each Region has multiple Stores, one per column family.
Each Store has one MemStore and multiple StoreFiles.
Each StoreFile wraps exactly one HFile.
Each HFile is made up of blocks, which are 64 KB by default.

How is data written?

First, the client sends the data to the HRegionServer. The data is first recorded in the HLog (the write-ahead log) and then written to the MemStore, which holds it temporarily in memory. When the MemStore is full, it flushes the data to an HFile. The data is kept sorted in both the MemStore and the HFile. The HFile is persisted on HDFS via the DFSClient.
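The write path described above can be sketched as a toy Python model. This is a simplification under stated assumptions, not how a region server is implemented: the class and attribute names are hypothetical, plain lists and dicts stand in for the HLog, MemStore, and HFiles, and the flush threshold counts entries rather than bytes.

```python
class RegionServerSketch:
    """Toy write path: log first, buffer in memory, flush sorted data."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # stands in for the HLog (write-ahead log)
        self.memstore = {}   # in-memory buffer of recent writes
        self.hfiles = []     # each flush appends one sorted, immutable list
        self.flush_threshold = flush_threshold  # assumed: entry count

    def put(self, row_key, value):
        self.wal.append((row_key, value))   # 1. record the edit in the log
        self.memstore[row_key] = value      # 2. apply it to the MemStore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                    # 3. MemStore full: write a file

    def flush(self):
        if not self.memstore:
            return
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore.clear()

    def get(self, row_key):
        # Newest data wins: check the MemStore, then files newest to oldest.
        if row_key in self.memstore:
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):
            for key, value in hfile:
                if key == row_key:
                    return value
        return None


server = RegionServerSketch(flush_threshold=2)
server.put("b", "1")
server.put("a", "2")   # second entry triggers a flush
server.put("c", "3")   # stays in the MemStore for now
```

Because the edit is logged before it is buffered, the log can be replayed to rebuild the in-memory data if the server fails before a flush.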

What are the WAL and the HLog in HBase?

Why does HBase follow lexicographical order?

If you forget the syntax, what should you do?

Use help followed by the command name, for example, help 'scan'

What is the difference between scan and get?

When are CRUD operations not applicable?

While schema-level changes (alterations) are being made, it is not possible to run CRUD operations. To alter the schema, you must first disable the table.

Use quit to leave the HBase shell.

Useful links:

 http://netwovenblogs.com/2013/10/10/hbase-overview-of-architecture-and-data-model/

http://hbase.apache.org/book/quickstart.html

 http://learnhbase.wordpress.com/2013/03/02/hbase-shell-commands/

http://wiki.apache.org/hadoop/Hbase/Shell

HBase

Why does HBase use Phoenix, and why doesn't NoSQL support SQL queries?
By default, NoSQL databases don't support SQL queries, but with the help of tools it is possible to run SQL commands on top of them. For example, Phoenix is a tool that runs on top of HBase. Phoenix is a SQL layer over HBase that uses a JDBC driver to convert user queries into operations HBase understands.

 

Big data:

CRUD is not supported directly on big data (plain Hadoop).

In-place updates are not possible.

Checking whether a particular keyword is present is not supported efficiently (no indexing).

……….

With a NoSQL store, query results come back within a fraction of a second, and CRUD operations are supported.

HBase and Cassandra were implemented as part of the big data ecosystem for this purpose.

StumbleUpon implemented HBase (its query language is not 100% SQL); with Phoenix, SQL is supported on top of it.

Cassandra: created at Facebook, influenced by Amazon's Dynamo, now Apache Cassandra with DataStax as the main backer. Its query language (CQL) is almost 100% SQL-like, but joins are not supported.

 

HBase handles scalability well because it runs on top of HDFS; other NoSQL databases do not run on top of HDFS.
Any application can use HBase. Most users work with it through code rather than shell commands. At the time of writing, HBase is still at version 0.x.

A column family is a collection of columns. There is no concept of a database in HBase; everything is a table.

A collection of column families is called a table.

HBase has only one data type: bytes.

Local (standalone) mode

Pseudo-distributed mode: all the daemons run on a single machine

Cluster mode -> internal or external ZooKeeper. For reliability, a cluster should use an external ZooKeeper.

ZooKeeper watches over the HBase processes, somewhat as Nagios watches hosts. If the system must be highly available, ZooKeeper is required. It is a Java service that keeps track of which processes (such as the master and the region servers) are alive. Ganglia and Nagios are infrastructure monitoring tools, whereas ZooKeeper works at the process level.

…………..

 

After starting Hadoop and HBase, how do you verify whether HBase is running in distributed mode?

hbase-site.xml: if hbase.cluster.distributed is true, you are in cluster (distributed) mode; if it is false, you are not.

hbase-env.sh

Last line: export HBASE_MANAGES_ZK=true. This means HBase uses its own internal ZooKeeper.

Start ZooKeeper first, then the master, then the region servers.

hbase shell: starts the HBase prompt

list: lists the tables.
hadoop fs -rmr /file: deletes a file (newer Hadoop releases use hadoop fs -rm -r).

create 'table name', 'column family' // you can add multiple column families, but each create command creates only one table

create 't', 'cf', 'cf1', 'cf2', 'cf3'

No semicolon is needed; pressing Enter ends the command.

isd;

Within a column family you can add millions of columns dynamically, which is not possible in an RDBMS.

column != column family
Wildcards (regular expressions) can also be used.

For example, list 'r.*' shows all tables whose names start with r.

Both list and list 'r.*' are valid forms of the command.

CRUD

create, get, scan, put, drop, delete (CRUD)

put 'test', 'row1', 'cf:a', 'value'

Here a is the column, and value is assigned to a.

HBase and Cassandra store data as a sparse matrix, while an RDBMS uses a dense matrix.

If a new value is assigned to an existing cell, it is stored as a new revision (version) instead of overwriting the old one.
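The two points above, sparse storage and versioned cells, can be illustrated with a small Python model. It is only a sketch: the class name is hypothetical, timestamps are passed in by hand, and the max_versions limit is an assumed setting rather than a statement about HBase's defaults.

```python
from collections import defaultdict


class VersionedTableSketch:
    """Toy sparse table: only written cells exist, each with versions."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions   # assumed retention limit
        # (row, column) -> list of (timestamp, value), newest first
        self.cells = defaultdict(list)

    def put(self, row, column, value, timestamp):
        versions = self.cells[(row, column)]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]   # drop versions beyond the limit

    def get(self, row, column, versions=1):
        # Cells that were never written simply do not exist.
        stored = self.cells.get((row, column), [])
        return [value for _, value in stored[:versions]]


table = VersionedTableSketch(max_versions=2)
table.put("row1", "cf:a", "v1", timestamp=1)
table.put("row1", "cf:a", "v2", timestamp=2)
table.put("row1", "cf:a", "v3", timestamp=3)
latest = table.get("row1", "cf:a")
history = table.get("row1", "cf:a", versions=5)
```

A row that never received a value for cf:b takes up no space for that column, which is the sense in which the storage is sparse.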

……………………

MR integration

Filter

Phoenix……..

MR integration: why go for MapReduce integration?

A single HBase client processes data sequentially, whereas MapReduce processes it in parallel. HBase is therefore sometimes integrated with MapReduce for better performance.