Because the potential storage capacity of a Hadoop cluster is so large, you need some means of tracking both the data held on the cluster and the data feeds moving data into and out of it. You also need to consider where data might reside on the cluster—that is, in HDFS, Hive, HBase, or Impala. Knowing you should track your data only spawns more questions, however: What type of reporting might be required, and in what format? Is a dashboard needed to show the status of the data at any given moment? Are graphs or tables helpful for showing the state of a data source over a given time period, such as the days in a week? This article will help you sort out the answers to those questions by demonstrating how to build a range of simple reports using HDFS- and Hive-based data. Although you may end up using completely different tools, reporting methods, and data content to construct reports for your real-world data, the building blocks presented here apply to many other Hadoop scenarios as well.
We’ll demonstrate the full use of Hunk (the Hadoop version of Splunk) across a series of articles, starting with this one, which shows you how to source, install, and run the software. Subsequent articles will cover how to use Hunk and how to create reports. Some common errors and their solutions will be presented, along with some simple dashboards for monitoring the data.