Treasure Data on The YARN - Hadoop Conference Japan 2014

Copyright ©2014 Treasure Data. All Rights Reserved.

Who am I?
• Ryu Kobayashi
• @ryu_kobayashi
• https://github.com/ryukobayashi
• Treasure Data, Inc.
  • Software Engineer
• Background
  • Hadoop, Cassandra, Machine Learning, ...
• I developed Huahin(Hadoop) Framework. http://huahinframework.org/

Our Service
• Data Collection
  • Sources: Web Log, App Log, Sensor, RDBMS, CRM, ERP
  • Streaming Upload: Open-Source Log Collector
  • Bulk Upload / Parallel Upload: Bulk Loader (CSV / TSV, MySQL, Postgres, Oracle, etc.)
• Data Warehouse
  • Columnar Storage + Hadoop MapReduce
  • schema-less
• Data Analysis
  • SQL (HiveQL) or Pig
  • TD command / Web Console, REST API, JDBC / ODBC
  • BI Tools: Tableau, QlikView, Pentaho, Excel, etc.
  • Result push: External Service/Storage (Custom App, RDBMS, FTP, etc.)

• MRv1
  • JobTracker
  • TaskTracker
• YARN
  • ResourceManager
  • NodeManager
  • ApplicationMaster
  • Job History Server *
* (The job log history cannot be viewed unless the Job History Server is installed)

Distribution
• CDH 5.0.2
  • Red Hat/CentOS/Oracle 5
  • Red Hat/CentOS/Oracle 6
  • Ubuntu/Debian
• HDP 2.1
  • Red Hat/CentOS/SLES (64-bit)
  • (Ubuntu 12 packages are already in the repository)
  • Windows Server 2008 & 2012

Other notes for configuration file
• hadoop-conf-pseudo does not work
  • It contains some mistakes, e.g. the value of yarn.nodemanager.aux-services: mapreduce.shuffle -> mapreduce_shuffle
• 2.2.0 and 2.4.0
  • There are some differences between the two versions
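The aux-services fix noted on this slide can be applied mechanically. The following is a minimal sketch, assuming the configuration is available as an XML string; the sample document and the helper name `fix_aux_services` are hypothetical and only illustrate the rename.

```python
import xml.etree.ElementTree as ET

# Hypothetical yarn-site.xml content that still uses the old service name.
YARN_SITE = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
</configuration>"""


def fix_aux_services(xml_text: str) -> str:
    """Rename mapreduce.shuffle to mapreduce_shuffle in yarn.nodemanager.aux-services."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") != "yarn.nodemanager.aux-services":
            continue
        value = prop.find("value")
        if value is None or not value.text:
            continue
        # The value may be a comma-separated list of auxiliary services.
        services = [s.strip() for s in value.text.split(",")]
        value.text = ",".join(
            "mapreduce_shuffle" if s == "mapreduce.shuffle" else s
            for s in services
        )
    return ET.tostring(root, encoding="unicode")


fixed = fix_aux_services(YARN_SITE)
print(fixed)
```

Other properties are left untouched, so the same function can be run over a full configuration file read from disk.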

MRv1
mapred-site.xml
• mapred.tasktracker.map.tasks.maximum
• mapred.tasktracker.reduce.tasks.maximum
scheduler.xml
• maxMaps, minMaps
• maxReduces, minReduces

YARN(MRv2)
yarn-site.xml
• yarn.nodemanager.resource.memory-mb
• (yarn.nodemanager.vmem-pmem-ratio)
• (yarn.scheduler.minimum-allocation-mb)
mapred-site.xml
• yarn.app.mapreduce.am.resource.mb
• mapreduce.map.memory.mb
• mapreduce.reduce.memory.mb
fair-scheduler.xml
• maxResources, minResources
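The slide does not explain the parenthesized vmem-pmem-ratio setting. As a rough sketch: the NodeManager uses this ratio to bound a container's virtual memory relative to its physical memory allocation. The default of 2.1 comes from yarn-default.xml; the function name and the 1024 MB container size are assumptions for illustration.

```python
def vmem_limit_mb(container_pmem_mb: int, vmem_pmem_ratio: float = 2.1) -> float:
    """Virtual memory ceiling for a container, given its physical memory allocation.

    vmem_pmem_ratio corresponds to yarn.nodemanager.vmem-pmem-ratio.
    """
    return container_pmem_mb * vmem_pmem_ratio


# Illustrative only: a container requested with 1024 MB of physical memory.
limit = vmem_limit_mb(1024)
print(f"virtual memory limit: {limit:.1f} MB")
```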

YARN Resource Management
yarn.nodemanager.resource.memory-mb => Memory that NodeManager uses
yarn.app.mapreduce.am.resource.mb => Memory that ApplicationMaster uses
mapreduce.map.memory.mb => Memory that Map uses
mapreduce.reduce.memory.mb => Memory that Reduce uses

YARN Resource Example
yarn.nodemanager.resource.memory-mb = 4096
yarn.app.mapreduce.am.resource.mb = 1024
mapreduce.map.memory.mb = 1024
mapreduce.reduce.memory.mb = 2048

MRv2 Application
ApplicationMaster => 1
Mapper => 3
Reducer => 1
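One way to read these numbers, sketched below, is that the ApplicationMaster takes 1024 MB first and the remaining memory bounds how many map or reduce containers fit on the node. This interpretation is an assumption: it considers a single node, memory only, and ignores scheduler rounding and vcores.

```python
# Values taken from the slide.
NODE_MEMORY_MB = 4096    # yarn.nodemanager.resource.memory-mb
AM_MEMORY_MB = 1024      # yarn.app.mapreduce.am.resource.mb
MAP_MEMORY_MB = 1024     # mapreduce.map.memory.mb
REDUCE_MEMORY_MB = 2048  # mapreduce.reduce.memory.mb


def max_containers(available_mb: int, container_mb: int) -> int:
    """Number of containers of a given size that fit in the available memory."""
    return available_mb // container_mb


# Memory left on the node once one ApplicationMaster container is running.
remaining_after_am = NODE_MEMORY_MB - AM_MEMORY_MB

max_mappers = max_containers(remaining_after_am, MAP_MEMORY_MB)
max_reducers = max_containers(remaining_after_am, REDUCE_MEMORY_MB)

print(f"remaining after AM: {remaining_after_am} MB")
print(f"mappers that fit:   {max_mappers}")
print(f"reducers that fit:  {max_reducers}")
```

Under these assumptions the result matches the slide: one ApplicationMaster, then up to 3 mappers or 1 reducer in the remaining 3072 MB.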

Changes: Fair Scheduler
In addition to this (e.g. Fair Scheduler):
pool -> queue
user.maxRunningJobs -> user.maxRunningApps
userMaxJobsDefault -> userMaxAppsDefault
etc…
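The renames listed on this slide can be sketched as a simple element-name mapping over an allocation file. The sample XML and the helper name `rename_elements` are hypothetical, and the sketch covers only the three renames named on the slide; settings such as maxMaps or minMaps need a real conversion to maxResources or minResources rather than a rename.

```python
import xml.etree.ElementTree as ET

# Element renames from the slide (MRv1 Fair Scheduler -> YARN Fair Scheduler).
RENAMES = {
    "pool": "queue",
    "maxRunningJobs": "maxRunningApps",
    "userMaxJobsDefault": "userMaxAppsDefault",
}

# Hypothetical MRv1-style allocation file.
MRV1_ALLOCATIONS = """<allocations>
  <pool name="sample_pool">
    <weight>2.0</weight>
  </pool>
  <user name="sample_user">
    <maxRunningJobs>5</maxRunningJobs>
  </user>
  <userMaxJobsDefault>3</userMaxJobsDefault>
</allocations>"""


def rename_elements(xml_text: str) -> str:
    """Apply the RENAMES mapping to every element tag in the document."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.tag = RENAMES.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


migrated = rename_elements(MRV1_ALLOCATIONS)
print(migrated)
```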

YARN Resource Management
e.g.
Use the hdp-configuration-utils.py script
http://goo.gl/L2hxyq

Use Ambari
http://ambari.apache.org/
(Ubuntu 12 is not supported yet; support is coming soon)

YARN Container Executor
DefaultContainerExecutor
• Launches containers as processes
• Same as the conventional behavior (MRv1)

LinuxContainerExecutor
• Linux only
• Some restrictions
• cgroups, etc…

YARN Directory Structure
MRv1
• Directories need to be set up initially

YARN
• Directories need to be set up initially
• There are changes from MRv1 (e.g. /tmp/hadoop-yarn/)

by Ryu Kobayashi

on Jul 09, 2014
