
Hadoop Admin Interview Questions

What is the use of SSH in Hadoop?

By default, SSH is not required by the Hadoop framework itself. However, the daemons are usually started and stopped with wrapper scripts such as start-dfs.sh and start-mapred.sh, and those scripts use SSH to log in to each node of the cluster and launch the daemons there, so passwordless SSH is needed for them to work. Without it, you would have to start each daemon manually on every node, for example:
hadoop-daemon.sh start namenode
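A minimal sketch of setting up the passwordless SSH the scripts rely on (the hadoop user and host names here are illustrative assumptions):
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa            # generate a key pair with an empty passphrase
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode1   # install the public key on each worker node
ssh hadoop@datanode1 hostname                       # verify login works without a password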

What is check pointing in Hadoop?

A checkpoint merges the NameNode's edit log into the fsimage to produce an up-to-date, persistent copy of the HDFS metadata. Checkpointing keeps the edit log from growing without bound, and it is what allows the NameNode to restart (or be recovered) quickly, since only the edits made after the last checkpoint have to be replayed.
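On a classic (non-HA) cluster you can also force a checkpoint by hand; note that -saveNamespace requires safe mode:
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace   # merge the edit log into a new fsimage
hdfs dfsadmin -safemode leave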

Can you explain few compatibility issues in Hadoop?

Java API compatibility: Hadoop's Java interfaces are annotated with InterfaceAudience and InterfaceStability; changing a Public/Stable API between releases breaks applications compiled against it.
Java binary compatibility: applications built against one release should run unmodified on a later one; this most often breaks for MapReduce and YARN applications.
Semantic compatibility: an API must keep the behaviour its tests and javadocs specify, not just its signature.
Wire compatibility: Hadoop processes exchange data over RPC, so client-to-server and server-to-server protocol changes cause incompatibilities; this problem most often occurs during an upgrade.
REST API compatibility: changes to the request and response formats of the REST endpoints are another common source of incompatibility.

Why is ECC RAM recommended for servers?
Error-Correcting Code (ECC) RAM automatically detects and corrects the most common internal memory errors, such as single-bit flips. Since a memory error can silently corrupt data before HDFS ever checksums it, most production Hadoop servers use ECC RAM.

Can you explain few FileSystem shell commands?

appendToFile: append one or more local files to a destination file in HDFS.
hdfs dfs -appendToFile local_src1 src2 src3 destination
cat: print a file's contents.
hdfs dfs -cat filepath_to_read
chgrp: change the group association of files. -R applies the change recursively.
hdfs dfs -chgrp -R group URI
chmod: change file permissions.
hdfs dfs -chmod -R mode URI
chown: change ownership.
hdfs dfs -chown -R owner:group URI
copyFromLocal: copy a local file into HDFS.
hdfs dfs -copyFromLocal localsrc URI
copyToLocal: copy an HDFS file to the local filesystem.
hdfs dfs -copyToLocal URI localdst
count: count the number of files, directories and bytes. -q adds quota information, -h prints human-readable sizes.
hdfs dfs -count -q -h /file
cp: copy files; -f forcibly overwrites the destination if it already exists.
hdfs dfs -cp -f URI destination
hdfs dfs -cp src1 src2 target // copy multiple sources into a target directory
du: display the size of files. -s shows an aggregate summary of file lengths, -h prints human-readable sizes.
hdfs dfs -du -s -h uri
dus: display a summary of file lengths (a deprecated alias for du -s).
hdfs dfs -dus uri
expunge: empty the trash.
hdfs dfs -expunge
get: copy an HDFS file to the local filesystem.
hdfs dfs -get hdfs_src local_destination
Other frequently used commands:
hdfs dfs -ls -R // list files, recursively with -R
hdfs dfs -mkdir /paths
hdfs dfs -moveFromLocal localfile1 localfile2 dest
hdfs dfs -moveToLocal source local_dest
hdfs dfs -mv source destination
hdfs dfs -put localsrc dest
hdfs dfs -rm -r uri
hdfs dfs -setrep 3 path // set the replication factor
hdfs dfs -stat uri // statistics such as size and modification time
hdfs dfs -tail uri // print the last kilobyte of the file
hdfs dfs -test -[ezd] uri // -e: does the file exist? -z: is it zero length? -d: is it a directory?
hdfs dfs -text src // output a source file (e.g. a SequenceFile) in text format
hdfs dfs -touchz uri // create an empty file

mapred historyserver // start the MapReduce JobHistory server daemon, which serves the history of completed jobs.

Other permission- and file-related commands: getfacl, getfattr, getmerge, setfacl, setfattr.
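A short worked session tying a few of these together (the local file report.txt and the /data directory are illustrative assumptions):
hdfs dfs -mkdir /data                 # create a directory in HDFS
hdfs dfs -put report.txt /data/       # upload a local file
hdfs dfs -ls -R /data                 # confirm it arrived
hdfs dfs -setrep 2 /data/report.txt   # change its replication factor
hdfs dfs -tail /data/report.txt       # read back the last kilobyte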

Can you explain few Hadoop admin commands?

hdfs version: check the version of Hadoop.
balancer: HDFS does not store data uniformly across datanodes, especially after commissioning or decommissioning nodes, so a few nodes end up carrying far more load than others. The balancer command fixes this by redistributing blocks until disk usage is even across the cluster.

hdfs balancer -threshold <percentage of disk capacity> -policy <datanode|blockpool>
eg: hdfs balancer -threshold 80 -policy datanode

Mover: similar to the balancer. It periodically scans the files in HDFS to check whether block placement satisfies the configured storage policy, and migrates blocks that violate it.

hdfs mover -p <files/dirs> | -f <local file listing the paths to migrate>

daemonlog:
Get or set the log level of a class within a running daemon.
hadoop daemonlog -getlevel <host:httpport> <classname>
hadoop daemonlog -setlevel <host:httpport> <classname> <level>

datanode, namenode and dfsadmin:
hdfs datanode -rollback // roll the datanode back to the previous version, among others
hdfs namenode -upgrade
hdfs secondarynamenode -checkpoint
hdfs dfsadmin -refreshNodes
hdfs dfsadmin -safemode leave

hadoop dfsadmin -report // report basic filesystem information and statistics
hadoop dfsadmin -safemode enter/leave/get/wait // enter or leave safe mode, query its state, or wait until it is off
hadoop dfsadmin -refreshNodes // force the namenode to reread its host include/exclude files

hadoop distcp /source /destination
// copy files recursively from one URL to another
hadoop fsck /path/to/check -move | -delete
// fsck is designed to report problems (corrupt or missing blocks) and check the file system's health; -move moves corrupted files to /lost+found, -delete deletes them


distcp: distributed copy tool, used to copy large amounts of data from one or more sources to a destination, within or between clusters.
eg/syntax: hadoop distcp hdfs://nn:8020/source/file1 /destinationfile
Use -update to copy only the files that are missing or differ at the destination.
Use -overwrite to unconditionally overwrite files that already exist at the destination.

eg: hadoop distcp -update /source1 /source2 /target

hdfs dfs:
Runs file system commands on the file systems Hadoop supports.
hdfs dfs -cat hdfs://nn:8020/file1

fetchdt:
Fetches a delegation token and stores it in a file on the local system, so that a non-secure client can later use the token to access a secure server.
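A sketch of typical usage (the NameNode address and the token file path are illustrative assumptions):
hdfs fetchdt --webservice http://nn:50070 /tmp/my.token    # fetch a delegation token over WebHDFS
HADOOP_TOKEN_FILE_LOCATION=/tmp/my.token hdfs dfs -ls /    # run a command using the stored token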

FSCK: used to report problems with files and check the health status of the cluster. fsck only reports; it does not repair anything itself, because the NameNode automatically corrects most recoverable failures (for example by re-replicating under-replicated blocks).
hdfs fsck files_path
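Extra flags add per-file detail, for example:
hdfs fsck /user/hadoop -files -blocks -locations // list each file with its blocks and their datanode locations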
jar: run a jar file on the cluster.
hadoop jar jarfile_path main_class args
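For example, to run the word-count job from the examples jar shipped with Hadoop (the jar's exact path and name vary by installation):
hadoop jar hadoop-mapreduce-examples.jar wordcount /input /output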

mapred job -submit job_file, -status job_id, -kill job_id, …

Pipes is used to run C++ MapReduce programs on Hadoop.
mapred pipes -input path, …
Queue: inspect job queue information.
mapred queue -list

mapred classpath // print the classpath needed to run MapReduce applications
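One common use is compiling a MapReduce job against the Hadoop libraries (WordCount.java is a hypothetical source file):
javac -cp $(mapred classpath) WordCount.java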