Field Guide to Hadoop

4.11 - 1251 ratings

If your organization is about to enter the world of big data, you not only need to decide whether Apache Hadoop is the right platform to use, but also which of its many components are best suited to your task. This field guide makes the exercise manageable by breaking down the Hadoop ecosystem into short, digestible sections. You'll quickly understand how Hadoop's projects, subprojects, and related technologies work together.

Each chapter introduces a different topic, such as core technologies or data transfer, and explains why certain components may or may not be useful for particular needs. When it comes to data, Hadoop is a whole new ballgame, but with this handy reference, you'll have a good grasp of the playing field.

Topics include:
Core technologies: Hadoop Distributed File System (HDFS), MapReduce, YARN, and Spark
Database and data management: Cassandra, HBase, MongoDB, and Hive
Serialization: Avro, JSON, and Parquet
Management and monitoring: Puppet, Chef, ZooKeeper, and Oozie
Analytic helpers: Pig, Mahout, and MLlib
Data transfer: Sqoop, Flume, distcp, and Storm
Security, access control, auditing: Sentry, Kerberos, and Knox
Cloud computing and virtualization: Serengeti, Docker, and Whirr

... you may have a job that requires that two or three other jobs finish, and each of these requires that data is loaded into HDFS from some external source. And you may want to run this job on a periodic basis. Of course, you could orchestrate this ...

Title: Field Guide to Hadoop
Author: Kevin Sitto, Marshall Presser
Publisher: "O'Reilly Media, Inc." - 2015-03-02

