mirror of
https://github.com/clearlinux/clear-linux-documentation.git
synced 2026-04-29 11:38:23 +00:00
371 lines
21 KiB
HTML
<!DOCTYPE html>

<html lang="en" data-content_root="../">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

<title>Apache* Hadoop* — Documentation for Clear Linux* project</title>
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=fa44fd50" />
<link rel="stylesheet" type="text/css" href="../_static/bizstyle.css?v=5283bb3d" />
<link rel="stylesheet" type="text/css" href="../_static/copybutton.css?v=76b2166b" />

<script src="../_static/documentation_options.js?v=5929fcd5"></script>
<script src="../_static/doctools.js?v=9bcbadda"></script>
<script src="../_static/sphinx_highlight.js?v=dc90522c"></script>
<script src="../_static/clipboard.min.js?v=a7894cd8"></script>
<script src="../_static/copybutton.js?v=a56c686a"></script>
<script src="../_static/bizstyle.js"></script>
<link rel="canonical" href="https://clearlinux.github.io/clear-linux-documentation/tutorials/apache-hadoop.html" />
<link rel="icon" href="../_static/favicon.ico"/>
<link rel="author" title="About these documents" href="../about.html" />
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Broadcom* Drivers" href="broadcom.html" />
<link rel="prev" title="Tutorials" href="index.html" />
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
<!--[if lt IE 9]>
<script src="_static/css3-mediaqueries.js"></script>
<![endif]-->
</head><body>
<div class="related" role="navigation" aria-label="Related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="../genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="broadcom.html" title="Broadcom* Drivers"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="index.html" title="Tutorials"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="../index.html">Documentation for Clear Linux* project</a> »</li>
<li class="nav-item nav-item-1"><a href="index.html" accesskey="U">Tutorials</a> »</li>
<li class="nav-item nav-item-this"><a href="">Apache* Hadoop*</a></li>
</ul>
</div>

<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">

<section id="apache-hadoop">
<span id="hadoop"></span><h1>Apache* Hadoop*<a class="headerlink" href="#apache-hadoop" title="Link to this heading">¶</a></h1>
<p>This tutorial explains how to install, configure, and run Apache Hadoop
on Clear Linux* OS.</p>
<nav class="contents local" id="contents">
<ul class="simple">
<li><p><a class="reference internal" href="#description" id="id1">Description</a></p></li>
<li><p><a class="reference internal" href="#prerequisites" id="id2">Prerequisites</a></p></li>
<li><p><a class="reference internal" href="#install-apache-hadoop" id="id3">Install Apache Hadoop</a></p></li>
<li><p><a class="reference internal" href="#configure-apache-hadoop" id="id4">Configure Apache Hadoop</a></p></li>
<li><p><a class="reference internal" href="#configure-your-ssh-key" id="id5">Configure your SSH key</a></p></li>
<li><p><a class="reference internal" href="#run-the-hadoop-daemons" id="id6">Run the Hadoop daemons</a></p></li>
<li><p><a class="reference internal" href="#run-the-mapreduce-wordcount-example" id="id7">Run the MapReduce wordcount example</a></p></li>
</ul>
</nav>
<section id="description">
<h2><a class="toc-backref" href="#id1" role="doc-backlink">Description</a><a class="headerlink" href="#description" title="Link to this heading">¶</a></h2>
<p>For this tutorial, you will install Hadoop on a single machine that
runs both the master and slave daemons.</p>
<p>The Apache Hadoop software library is a framework for distributed processing
of large data sets across clusters of computers using simple programming
models. It is designed to scale up from single servers to thousands of
machines, with each machine offering local computation and storage.</p>
</section>
<section id="prerequisites">
<h2><a class="toc-backref" href="#id2" role="doc-backlink">Prerequisites</a><a class="headerlink" href="#prerequisites" title="Link to this heading">¶</a></h2>
<ul class="simple">
<li><p><a class="reference internal" href="../get-started/bare-metal-install-desktop.html#bare-metal-install-desktop"><span class="std std-ref">Install Clear Linux* OS from the live desktop</span></a></p></li>
<li><p>In Clear Linux OS, run <strong class="command">swupd update</strong></p></li>
</ul>
</section>
<section id="install-apache-hadoop">
<h2><a class="toc-backref" href="#id3" role="doc-backlink">Install Apache Hadoop</a><a class="headerlink" href="#install-apache-hadoop" title="Link to this heading">¶</a></h2>
<p>Apache Hadoop is included in the <strong class="command">big-data-basic</strong> bundle. To install
the framework, enter the following command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sudo<span class="w"> </span>swupd<span class="w"> </span>bundle-add<span class="w"> </span>big-data-basic
</pre></div>
</div>
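<p>To confirm that the bundle is present after the command finishes, you can
list the installed bundles and filter for its name. This is a minimal sketch,
assuming <strong class="command">grep</strong> is available on the system; no output means the bundle
was not found:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># List installed bundles and keep only the line naming big-data-basic.
sudo swupd bundle-list | grep big-data-basic
</pre></div>
</div>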
</section>
<section id="configure-apache-hadoop">
<h2><a class="toc-backref" href="#id4" role="doc-backlink">Configure Apache Hadoop</a><a class="headerlink" href="#configure-apache-hadoop" title="Link to this heading">¶</a></h2>
<ol class="arabic">
<li><p>To create the configuration directory, enter the following command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sudo<span class="w"> </span>mkdir<span class="w"> </span>/etc/hadoop
</pre></div>
</div>
</li>
<li><p>Copy the defaults from <code class="file docutils literal notranslate"><span class="pre">/usr/share/defaults/hadoop</span></code> to
<code class="file docutils literal notranslate"><span class="pre">/etc/hadoop</span></code> with the following command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sudo<span class="w"> </span>cp<span class="w"> </span>/usr/share/defaults/hadoop/*<span class="w"> </span>/etc/hadoop
</pre></div>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>Because Clear Linux OS is a stateless system, never modify the
files under the <code class="file docutils literal notranslate"><span class="pre">/usr/share/defaults</span></code> directory. The software
updater overwrites those files.</p>
</div>
<p>Once all the configuration files are in <code class="file docutils literal notranslate"><span class="pre">/etc/hadoop</span></code>, edit them to
fit your needs. The <cite>NameNode</cite> server is the master server that manages the
file system namespace and regulates client access to files.
The first file to edit, <code class="file docutils literal notranslate"><span class="pre">/etc/hadoop/core-site.xml</span></code>, tells the
Hadoop daemons where <cite>NameNode</cite> is running. In this tutorial, <cite>NameNode</cite> runs
on <cite>localhost</cite>.</p>
</li>
<li><p>Open the <code class="file docutils literal notranslate"><span class="pre">/etc/hadoop/core-site.xml</span></code> file using any editor and modify
the file as follows:</p>
<div class="highlight-xml notranslate"><div class="highlight"><pre><span></span><span class="cp">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</span>
<span class="cp">&lt;?xml-stylesheet type="text/xsl" href="configuration.xsl"?&gt;</span>
<span class="nt">&lt;configuration&gt;</span>
  <span class="nt">&lt;property&gt;</span>
    <span class="nt">&lt;name&gt;</span>fs.default.name<span class="nt">&lt;/name&gt;</span>
    <span class="nt">&lt;value&gt;</span>hdfs://localhost:9000<span class="nt">&lt;/value&gt;</span>
  <span class="nt">&lt;/property&gt;</span>
<span class="nt">&lt;/configuration&gt;</span>
</pre></div>
</div>
</li>
<li><p>Edit the <code class="file docutils literal notranslate"><span class="pre">/etc/hadoop/hdfs-site.xml</span></code> file. This file configures the
<abbr title="Hadoop Distributed File System">HDFS</abbr> daemons. This configuration
includes the list of permitted and excluded data nodes and the block size.
For this example, set the block replication factor to 1 instead of the
default of 3. The example also turns off HDFS permission checking:</p>
<div class="highlight-xml notranslate"><div class="highlight"><pre><span></span><span class="cp">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</span>
<span class="cp">&lt;?xml-stylesheet type="text/xsl" href="configuration.xsl"?&gt;</span>
<span class="nt">&lt;configuration&gt;</span>
  <span class="nt">&lt;property&gt;</span>
    <span class="nt">&lt;name&gt;</span>dfs.replication<span class="nt">&lt;/name&gt;</span>
<span class="hll">    <span class="nt">&lt;value&gt;</span>1<span class="nt">&lt;/value&gt;</span>
</span>  <span class="nt">&lt;/property&gt;</span>
  <span class="nt">&lt;property&gt;</span>
    <span class="nt">&lt;name&gt;</span>dfs.permissions.enabled<span class="nt">&lt;/name&gt;</span>
    <span class="nt">&lt;value&gt;</span>false<span class="nt">&lt;/value&gt;</span>
  <span class="nt">&lt;/property&gt;</span>
<span class="nt">&lt;/configuration&gt;</span>
</pre></div>
</div>
</li>
<li><p>Edit the <code class="file docutils literal notranslate"><span class="pre">/etc/hadoop/mapred-site.xml</span></code> file. This file configures
all daemons related to <cite>MapReduce</cite>: <cite>JobTracker</cite> and <cite>TaskTrackers</cite>. With
<cite>MapReduce</cite>, Hadoop can process large amounts of data across multiple systems. In
our example, we set <abbr title="Yet Another Resource Negotiator">YARN</abbr> as our
runtime framework for executing <cite>MapReduce</cite> jobs as follows:</p>
<div class="highlight-xml notranslate"><div class="highlight"><pre><span></span><span class="cp">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</span>
<span class="cp">&lt;?xml-stylesheet type="text/xsl" href="configuration.xsl"?&gt;</span>
<span class="nt">&lt;configuration&gt;</span>
  <span class="nt">&lt;property&gt;</span>
<span class="hll">    <span class="nt">&lt;name&gt;</span>mapreduce.framework.name<span class="nt">&lt;/name&gt;</span>
</span><span class="hll">    <span class="nt">&lt;value&gt;</span>yarn<span class="nt">&lt;/value&gt;</span>
</span>  <span class="nt">&lt;/property&gt;</span>
<span class="nt">&lt;/configuration&gt;</span>
</pre></div>
</div>
</li>
<li><p>Edit the <code class="file docutils literal notranslate"><span class="pre">/etc/hadoop/yarn-site.xml</span></code> file. This file configures all
daemons related to <cite>YARN</cite>: <cite>ResourceManager</cite> and <cite>NodeManager</cite>. In our
example, we enable the <cite>mapreduce_shuffle</cite> auxiliary service with its
default handler class as follows:</p>
<div class="highlight-xml notranslate"><div class="highlight"><pre><span></span><span class="cp">&lt;?xml version="1.0"?&gt;</span>
<span class="nt">&lt;configuration&gt;</span>
  <span class="nt">&lt;property&gt;</span>
<span class="hll">    <span class="nt">&lt;name&gt;</span>yarn.nodemanager.aux-services<span class="nt">&lt;/name&gt;</span>
</span><span class="hll">    <span class="nt">&lt;value&gt;</span>mapreduce_shuffle<span class="nt">&lt;/value&gt;</span>
</span>  <span class="nt">&lt;/property&gt;</span>
  <span class="nt">&lt;property&gt;</span>
<span class="hll">    <span class="nt">&lt;name&gt;</span>yarn.nodemanager.aux-services.mapreduce_shuffle.class<span class="nt">&lt;/name&gt;</span>
</span><span class="hll">    <span class="nt">&lt;value&gt;</span>org.apache.hadoop.mapred.ShuffleHandler<span class="nt">&lt;/value&gt;</span>
</span>  <span class="nt">&lt;/property&gt;</span>
<span class="nt">&lt;/configuration&gt;</span>
</pre></div>
</div>
</li>
</ol>
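<p>Before starting any daemons, you can check that Hadoop actually reads the
edited files. The following is a minimal sketch, assuming the <strong class="command">hdfs</strong>
command picks up its configuration from <code class="file docutils literal notranslate"><span class="pre">/etc/hadoop</span></code>. If it does, the
replication value and the NameNode host printed should match what you entered
in the steps above:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># Print the effective value of a single configuration key.
sudo hdfs getconf -confKey dfs.replication

# Print the host names of the configured NameNodes.
sudo hdfs getconf -namenodes
</pre></div>
</div>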
</section>
<section id="configure-your-ssh-key">
<h2><a class="toc-backref" href="#id5" role="doc-backlink">Configure your SSH key</a><a class="headerlink" href="#configure-your-ssh-key" title="Link to this heading">¶</a></h2>
<ol class="arabic">
<li><p>Create an SSH key. If you already have one, skip this step.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sudo<span class="w"> </span>ssh-keygen<span class="w"> </span>-t<span class="w"> </span>rsa
</pre></div>
</div>
</li>
<li><p>Copy the key to your authorized keys.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sudo<span class="w"> </span>cat<span class="w"> </span>/root/.ssh/id_rsa.pub<span class="w"> </span><span class="p">|</span><span class="w"> </span>sudo<span class="w"> </span>tee<span class="w"> </span>-a<span class="w"> </span>/root/.ssh/authorized_keys
</pre></div>
</div>
</li>
<li><p>Log in to localhost. If no password prompt appears, you are ready to
run the Hadoop daemons.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sudo<span class="w"> </span>ssh<span class="w"> </span>localhost
</pre></div>
</div>
</li>
</ol>
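<p>The login check can also be run without an interactive session. This is a
minimal sketch, assuming the key belongs to root as in the steps above and
that localhost is already in root's known hosts. The <cite>BatchMode</cite> option makes
<strong class="command">ssh</strong> fail instead of prompting for a password, so the exit status
shows whether key-based login works:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># Run a no-op command over SSH; BatchMode disables password prompts.
if sudo ssh -o BatchMode=yes localhost true; then
    echo "key-based login to localhost works"
else
    echo "key-based login to localhost failed"
fi
</pre></div>
</div>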
</section>
<section id="run-the-hadoop-daemons">
<h2><a class="toc-backref" href="#id6" role="doc-backlink">Run the Hadoop daemons</a><a class="headerlink" href="#run-the-hadoop-daemons" title="Link to this heading">¶</a></h2>
<p>With all the configuration files edited, you are ready to start the
daemons.</p>
<p>Formatting the <cite>NameNode</cite> server erases the metadata that tracks the data
stored on the data nodes. As a result, all existing information on the data
nodes is lost and the nodes can be reused for new data.</p>
<ol class="arabic">
<li><p>Format the <cite>NameNode</cite> server with the following command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sudo<span class="w"> </span>hdfs<span class="w"> </span>namenode<span class="w"> </span>-format
</pre></div>
</div>
</li>
<li><p>Start the HDFS daemons, <cite>NameNode</cite> and <cite>DataNodes</cite>, with the following command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sudo<span class="w"> </span>start-dfs.sh
</pre></div>
</div>
<p>The console output should be similar to:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="go">Starting namenodes on [localhost]</span>
<span class="go">The authenticity of host 'localhost (::1)' can't be established.</span>
<span class="go">ECDSA key fingerprint is</span>
<span class="go">SHA256:97e+7TnomsS9W7GjFPjzY75HGBp+f1y6sA+ZFcOPIPU.</span>
<span class="go">Are you sure you want to continue connecting (yes/no)?</span>
</pre></div>
</div>
</li>
<li><p>Enter <cite>yes</cite> to continue.</p></li>
<li><p>Start the <cite>YARN</cite> daemons <cite>ResourceManager</cite> and <cite>NodeManager</cite> with the
following command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sudo<span class="w"> </span>start-yarn.sh
</pre></div>
</div>
</li>
<li><p>Ensure everything is running as expected with the following command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sudo<span class="w"> </span>jps
</pre></div>
</div>
<p>The console output should be similar to:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="go">22674 DataNode</span>
<span class="go">26228 Jps</span>
<span class="go">22533 NameNode</span>
<span class="go">23046 ResourceManager</span>
<span class="go">22854 SecondaryNameNode</span>
<span class="go">23150 NodeManager</span>
</pre></div>
</div>
</li>
</ol>
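<p>Besides the process list, HDFS can report its own view of the cluster. The
following is a minimal sketch, assuming the daemons above started without
errors. On this single-machine setup the report would be expected to show one
live data node, and the listing shows the root of the new file system, which
may still be empty:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># Summarize capacity and the data nodes known to the NameNode.
sudo hdfs dfsadmin -report

# List the root directory of HDFS.
sudo hdfs dfs -ls /
</pre></div>
</div>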
</section>
<section id="run-the-mapreduce-wordcount-example">
<h2><a class="toc-backref" href="#id7" role="doc-backlink">Run the MapReduce wordcount example</a><a class="headerlink" href="#run-the-mapreduce-wordcount-example" title="Link to this heading">¶</a></h2>
<ol class="arabic">
<li><p>Create the input directory.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sudo<span class="w"> </span>hdfs<span class="w"> </span>dfs<span class="w"> </span>-mkdir<span class="w"> </span>-p<span class="w"> </span>/user/root/input
</pre></div>
</div>
</li>
<li><p>Copy a file from the local file system to HDFS. Replace
<code class="file docutils literal notranslate"><span class="pre">local-file</span></code> with the path to a text file of your choice.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sudo<span class="w"> </span>hdfs<span class="w"> </span>dfs<span class="w"> </span>-copyFromLocal<span class="w"> </span>local-file<span class="w"> </span>/user/root/input
</pre></div>
</div>
</li>
<li><p>Run the <cite>wordcount</cite> example. Because the command runs as root, the
relative paths <cite>input</cite> and <cite>output</cite> resolve to
<code class="file docutils literal notranslate"><span class="pre">/user/root/input</span></code> and <code class="file docutils literal notranslate"><span class="pre">/user/root/output</span></code> in HDFS.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sudo<span class="w"> </span>hadoop<span class="w"> </span>jar<span class="w"> </span>/usr/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar<span class="w"> </span>wordcount<span class="w"> </span>input<span class="w"> </span>output
</pre></div>
</div>
</li>
<li><p>Read the output file “part-r-00000”. This file contains the number of times
each word appears in the input file.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sudo<span class="w"> </span>hdfs<span class="w"> </span>dfs<span class="w"> </span>-cat<span class="w"> </span>/user/root/output/part-r-00000
</pre></div>
</div>
</li>
</ol>
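<p>To sanity-check the result, the same kind of count can be reproduced
locally with standard tools. The sketch below is not part of Hadoop. It uses
two placeholder lines of text as input and splits them on whitespace, which
approximates how the example job tokenizes its input:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># Put each whitespace-separated word on its own line, drop empty lines,
# then sort and count how often each distinct word occurs.
printf 'hello world\nhello hadoop\n' | tr -s '[:space:]' '\n' | grep -v '^$' | sort | uniq -c
</pre></div>
</div>
<p>In this sample, <cite>hello</cite> is counted twice and the other words once. To
compare against a real run, feed the same local file you copied to HDFS into
the pipeline in place of the <strong class="command">printf</strong> command. The job output lists
the word first and the count second, so the columns appear in the opposite
order.</p>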
<p><strong>Congratulations!</strong></p>
<p>You have successfully installed and set up a single-node Hadoop cluster.
Additionally, you ran a simple wordcount example.</p>
<p>Your single-node Hadoop cluster is up and running!</p>
</section>
</section>


<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="Main">
<div class="sphinxsidebarwrapper">
<p class="logo"><a href="../index.html">
<img class="logo" src="../_static/clearlinux.png" alt="Logo of Clear Linux* Project Docs"/>
</a></p>
<div>
<h3><a href="../index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Apache* Hadoop*</a><ul>
<li><a class="reference internal" href="#description">Description</a></li>
<li><a class="reference internal" href="#prerequisites">Prerequisites</a></li>
<li><a class="reference internal" href="#install-apache-hadoop">Install Apache Hadoop</a></li>
<li><a class="reference internal" href="#configure-apache-hadoop">Configure Apache Hadoop</a></li>
<li><a class="reference internal" href="#configure-your-ssh-key">Configure your SSH key</a></li>
<li><a class="reference internal" href="#run-the-hadoop-daemons">Run the Hadoop daemons</a></li>
<li><a class="reference internal" href="#run-the-mapreduce-wordcount-example">Run the MapReduce wordcount example</a></li>
</ul>
</li>
</ul>

</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="index.html"
title="previous chapter">Tutorials</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="broadcom.html"
title="next chapter">Broadcom* Drivers</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="../_sources/tutorials/apache-hadoop.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<search id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="../search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
<input type="submit" value="Go" />
</form>
</div>
</search>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="Related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="../genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="broadcom.html" title="Broadcom* Drivers"
>next</a> |</li>
<li class="right" >
<a href="index.html" title="Tutorials"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="../index.html">Documentation for Clear Linux* project</a> »</li>
<li class="nav-item nav-item-1"><a href="index.html" >Tutorials</a> »</li>
<li class="nav-item nav-item-this"><a href="">Apache* Hadoop*</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
© Copyright 2022 Intel Corporation. All Rights Reserved.
Last updated on Nov 04, 2024.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 8.1.3.
</div>
</body>
</html>