With the increasing advancements in technology and development, the internet has become widely used over the last years and has become an everyday part of our lives and those around us.
With the increasing trends of social media and online social networking, billions of people use these services at the same time around the globe. With such a massive number of people, these social networks bring in a lot of traffic and a flood of data that comes from many resources.
Whether it be social media websites, finance, telecommunication, media or any of the many online businesses, all the big data clusters need to be processed into simple programming models or in simple words; all the big data needs to be distributed and stored.
Hadoop is an open source framework, which allows all the big data to be distributed and stored. Hadoop is an efficient, reliable and easy to use and is an open source, i.e., it can be freely used in simple terms.
The Hadoop distributed file system can hold a very large amount of data and to store this vast amount of data, the storage is done across multiple machines.
It is also highly suitable for storage and processing of data. Another component of the Hadoop Architecture is the MapReduce Overview. It is a framework which is used to process a large amount of data in a distributed manner across multiple machines.
MapReduce can distribute an input of data into pairs, shuffle the pairs and also reduce them. YARN, which is another significant component of the Hadoop Architecture is responsible for data processing resources such as the CPU, ram delivered, memory, etc. needed to run an application successfully.
Important elements of YARN (Yet Another Resource Negotiator) are the resource manager and the node manager, working as the master and the slave.
The resource manager works as the master and knows where all the slaves are located and how many sources they have. The resource manager helps to assign resources.
The node manager is the slave when it starts to work it announces the resource manager or sends a signal. These were the important and the main components that the Hadoop architecture comprises of and are unique in their way with their own set of characteristics.
Some of the characteristics and features of Hadoop include it being fault tolerant, which means it can quickly look out for any failure and heal itself. It is also very cost effective and not expensive. Hadoop can store anything and has a vast storing capacity as well.
By focusing on the characteristics and the functions that can be performed by Hadoop, it can be concluded that Hadoop indeed is handy to store large amounts of data and can significantly help lessen the burden that any party has.
With its huge storing capacity and its availability for everyone, Hadoop already is being used by the many famous social networking sites. Renowned search engines like Yahoo and Amazon also use Hadoop.
Facebook, which is the largest social network uses Hadoop for log processing and also for storing data. It is used in almost every domain. Some of the famous renowned social websites that use Hadoop include Amazon, Google, IBM, New York Times, Twitter, Linked In etc.
The list goes on. There are also other several projects with Hadoop including ZooKeeper, Pig, Hive etc. These can further help, store and filter the data.
The final statement can be made about Hadoop that it has definitely provided companies that have huge amounts of data with effective solutions. Since it is an open source, it is widely used by companies.