Apache Hadoop is everywhere. Everyone talks about it as the number one framework for storing data reliably, and I agree: it is fast, low cost, and fault tolerant. But is there any alternative to HDFS for storing data? Yes, there is: the Ignite File System (IGFS), the in-memory file system of Apache Ignite. In this article I give an overview of the Ignite File System.
What is the Ignite File System (IGFS)?
Apache Ignite is a unified system for storing and processing any type of data, and internally it uses the Ignite File System (IGFS) to store that data. In other words, it stores data like HDFS, centralizes data in memory like Alluxio, and processes everything in memory like Spark. It is a fourth-generation system.
HDFS works with two kinds of storage: memory and disk. By default, HDFS stores data on disk. Data is moved into memory while it is being processed, and once processing finishes the results are stored on disk again.
IGFS, by contrast, keeps data in two storage levels: on-heap memory and off-heap memory. By default it stores data in on-heap memory, and whatever does not fit there is stored in off-heap memory. This means that processing does not hit disk I/O, so the data is processed quickly. That is a huge plus for in-memory processing systems like Spark.
On-heap vs. off-heap memory
Put simply, data is held in memory temporarily while it is being processed. As an example, assume the heap size is 8 GB.
If you process 5 GB of data, it fits in the heap, so that data is called on-heap. After processing, the garbage collector cleans up that on-heap memory.
If the data is larger than the heap (8 GB), the remainder is stored in off-heap memory. For example, with an 8 GB heap and 10 GB of data to process, 8 GB is stored on-heap and the remaining 2 GB is stored off-heap. Off-heap memory lives outside the heap that the garbage collector manages, so the garbage collector does not clean it.
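The arithmetic in the 8 GB example above can be sketched in a few lines of Java. This is only an illustration of the split described in the text, not how Ignite sizes its memory; the class name `HeapSplit` and its methods are hypothetical.

```java
// Hypothetical helper that mirrors the example in the text:
// data fills the heap first, and the remainder goes off-heap.
class HeapSplit {
    // Amount of data (in GB) that fits on-heap.
    static long onHeapGb(long dataGb, long heapGb) {
        return Math.min(dataGb, heapGb);
    }

    // Amount of data (in GB) that overflows to off-heap memory.
    static long offHeapGb(long dataGb, long heapGb) {
        return Math.max(0L, dataGb - heapGb);
    }

    public static void main(String[] args) {
        long heapGb = 8; // heap size assumed in the article's example
        for (long dataGb : new long[] {5, 10}) {
            System.out.println(dataGb + " GB of data: "
                + onHeapGb(dataGb, heapGb) + " GB on-heap, "
                + offHeapGb(dataGb, heapGb) + " GB off-heap");
        }
    }
}
```

With 5 GB of data everything stays on-heap; with 10 GB, 8 GB stays on-heap and 2 GB overflows.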
On-heap memory is faster than off-heap memory, but off-heap memory is still much faster than disk. This is how Ignite stores data: everything is kept either on-heap or off-heap.
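To make the distinction concrete, here is a minimal sketch using the standard Java `ByteBuffer` class, which can allocate memory either inside the JVM heap or outside it. It illustrates the general on-heap and off-heap idea only; it does not show how Ignite manages its own memory. The class name `HeapVsOffHeap` and the buffer sizes are made up for the example.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class; names and sizes are illustrative only.
class HeapVsOffHeap {
    // Allocates a buffer backed by an array inside the JVM heap.
    static ByteBuffer onHeap(int bytes) {
        return ByteBuffer.allocate(bytes);
    }

    // Allocates a direct buffer whose contents live in native memory,
    // outside the garbage-collected heap.
    static ByteBuffer offHeap(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer heapBuf = onHeap(1024);
        ByteBuffer directBuf = offHeap(1024);

        // Both buffers are read and written through the same API.
        heapBuf.putInt(0, 42);
        directBuf.putInt(0, 42);

        System.out.println("on-heap buffer is direct:  " + heapBuf.isDirect());
        System.out.println("off-heap buffer is direct: " + directBuf.isDirect());
    }
}
```

Both buffers expose the same read and write methods; the difference is only where the bytes live.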
IGFS integration with other systems
Ignite integrates easily with distributed systems such as HDFS and with Hadoop distributions such as Cloudera and Hortonworks. Unlike HDFS, IGFS does not need a name node. It automatically determines file data locality using a hashing function.
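The idea of finding data by hashing instead of asking a name node can be sketched as follows. This is a deliberately simplified stand-in (hash modulo the number of nodes), not Ignite's actual function; the class `BlockLocator`, the node names, and the file path are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Simplified illustration of hash-based data locality.
// Not Ignite's real algorithm; names here are hypothetical.
class BlockLocator {
    // Maps a file block to a node using only the path, the block index,
    // and the node list. No central lookup table is consulted.
    static String nodeFor(String path, long blockIndex, List<String> nodes) {
        int hash = Objects.hash(path, blockIndex);
        // floorMod keeps the index non-negative even for negative hashes.
        return nodes.get(Math.floorMod(hash, nodes.size()));
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node-a", "node-b", "node-c");
        for (long block = 0; block < 4; block++) {
            System.out.println("block " + block + " of /data/file.txt -> "
                + nodeFor("/data/file.txt", block, nodes));
        }
    }
}
```

Because every node computes the same hash from the same inputs, any node can work out where a block lives without a central metadata server. One known weakness of plain modulo hashing is that adding or removing a node remaps most blocks, which is why real systems tend to use more stable schemes.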
If you use Ignite, you do not need Alluxio (formerly Tachyon), because both provide the same functionality. Ignite both stores and processes data, whereas Alluxio is simply an accelerator layer on top of HDFS and does no processing.
Please note that Ignite is a replacement for Alluxio, but not a replacement for HDFS or Spark. If you know Spark, you can run Spark directly or execute Ignite commands; Ignite supports both.