<!DOCTYPE html>
|
|
|
|
<html lang="en" data-content_root="../">
|
|
<head>
|
|
<meta charset="utf-8" />
|
|
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
|
|
|
<title>Function Multi-Versioning — Documentation for Clear Linux* project</title>
|
|
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=fa44fd50" />
|
|
<link rel="stylesheet" type="text/css" href="../_static/bizstyle.css?v=5283bb3d" />
|
|
<link rel="stylesheet" type="text/css" href="../_static/copybutton.css?v=76b2166b" />
|
|
|
|
<script src="../_static/documentation_options.js?v=5929fcd5"></script>
|
|
<script src="../_static/doctools.js?v=9bcbadda"></script>
|
|
<script src="../_static/sphinx_highlight.js?v=dc90522c"></script>
|
|
<script src="../_static/clipboard.min.js?v=a7894cd8"></script>
|
|
<script src="../_static/copybutton.js?v=a56c686a"></script>
|
|
<script src="../_static/bizstyle.js"></script>
|
|
<link rel="canonical" href="https://clearlinux.github.io/clear-linux-documentation/tutorials/fmv.html" />
|
|
<link rel="icon" href="../_static/favicon.ico"/>
|
|
<link rel="author" title="About these documents" href="../about.html" />
|
|
<link rel="index" title="Index" href="../genindex.html" />
|
|
<link rel="search" title="Search" href="../search.html" />
|
|
<link rel="next" title="HPC Cluster" href="hpc.html" />
|
|
<link rel="prev" title="Flatpak*" href="flatpak.html" />
|
|
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
|
|
<!--[if lt IE 9]>
|
|
<script src="_static/css3-mediaqueries.js"></script>
|
|
<![endif]-->
|
|
</head><body>
|
|
<div class="related" role="navigation" aria-label="Related">
|
|
<h3>Navigation</h3>
|
|
<ul>
|
|
<li class="right" style="margin-right: 10px">
|
|
<a href="../genindex.html" title="General Index"
|
|
accesskey="I">index</a></li>
|
|
<li class="right" >
|
|
<a href="hpc.html" title="HPC Cluster"
|
|
accesskey="N">next</a> |</li>
|
|
<li class="right" >
|
|
<a href="flatpak.html" title="Flatpak*"
|
|
accesskey="P">previous</a> |</li>
|
|
<li class="nav-item nav-item-0"><a href="../index.html">Documentation for Clear Linux* project</a> »</li>
|
|
<li class="nav-item nav-item-1"><a href="index.html" accesskey="U">Tutorials</a> »</li>
|
|
<li class="nav-item nav-item-this"><a href="">Function Multi-Versioning</a></li>
|
|
</ul>
|
|
</div>
|
|
|
|
<div class="document">
|
|
<div class="documentwrapper">
|
|
<div class="bodywrapper">
|
|
<div class="body" role="main">
|
|
|
|
<section id="function-multi-versioning">
|
|
<span id="fmv"></span><h1>Function Multi-Versioning<a class="headerlink" href="#function-multi-versioning" title="Link to this heading">¶</a></h1>
|
|
<p>In this tutorial, we apply <abbr title="Function Multi-Versioning">FMV</abbr> to
general code and to <abbr title="Fast Fourier Transform">FFT</abbr> library code (FFTW).
After completing the tutorial, you will be able to use this technology in your own
code and use the libraries to deploy architecture-based optimizations in your
application code.</p>
|
|
<nav class="contents local" id="contents">
|
|
<ul class="simple">
|
|
<li><p><a class="reference internal" href="#description" id="id1">Description</a></p></li>
|
|
<li><p><a class="reference internal" href="#install-and-configure-a-cl-host-on-bare-metal" id="id2">Install and configure a Clear Linux OS host on bare metal</a></p></li>
|
|
<li><p><a class="reference internal" href="#detect-loop-vectorization-candidates" id="id3">Detect loop vectorization candidates</a></p></li>
|
|
<li><p><a class="reference internal" href="#generate-the-fmv-patch" id="id4">Generate the FMV patch</a></p></li>
|
|
<li><p><a class="reference internal" href="#fft-project-example-using-fftw" id="id5">FFT project example using FFTW</a></p></li>
|
|
</ul>
|
|
</nav>
|
|
<section id="description">
|
|
<h2><a class="toc-backref" href="#id1" role="doc-backlink">Description</a><a class="headerlink" href="#description" title="Link to this heading">¶</a></h2>
|
|
<p>CPU architectures often gain new instructions as they evolve, but
application developers find it difficult to take advantage of those
instructions. Reluctance to lose backward compatibility is one of the
main roadblocks that keep developers from using advancements in newer computing
architectures. FMV, which first appeared in <a class="reference external" href="https://gcc.gnu.org">GCC</a> 4.8, is a way to have
multiple implementations of a function, each using different
architecture-specific instruction-set extensions. The implementation that
matches the running CPU is selected at run time, so a single binary stays
compatible with older processors. GCC 6 introduced
changes to FMV that make it even easier to bring architecture-based
optimizations to application code.</p>
|
|
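<p>As a rough illustration of the idea, the sketch below marks one function for multi-versioning with the <cite>target_clones</cite> attribute that is used later in this tutorial. The function <cite>dot_product</cite> and the <cite>FMV_CLONES</cite> macro are hypothetical names chosen for this example, and the guard assumes GCC 6 or later on x86-64 with glibc; on other toolchains the macro expands to nothing and a single version is built.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: target_clones needs GCC 6 or later on x86-64 with glibc
   (for ifunc support). Elsewhere the macro expands to nothing and only
   one version of the function is built. */
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 6) && \
    defined(__x86_64__) && defined(__GLIBC__)
#define FMV_CLONES __attribute__((target_clones("avx2", "default")))
#else
#define FMV_CLONES
#endif

/* Hypothetical example function: where the attribute is active, GCC
   builds an AVX2 clone and a default clone of this one definition and
   selects between them at run time. */
FMV_CLONES
long dot_product(const int *x, const int *y, size_t n)
{
    long sum = 0;
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        sum += (long)x[i] * y[i];
    return sum;
}
```

<p>Callers use <cite>dot_product</cite> like any other function; they do not need to know which clone runs.</p>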
</section>
|
|
<section id="install-and-configure-a-cl-host-on-bare-metal">
|
|
<h2><a class="toc-backref" href="#id2" role="doc-backlink">Install and configure a Clear Linux OS host on bare metal</a><a class="headerlink" href="#install-and-configure-a-cl-host-on-bare-metal" title="Link to this heading">¶</a></h2>
|
|
<p>First, follow our guide to <a class="reference internal" href="../get-started/bare-metal-install-desktop.html#bare-metal-install-desktop"><span class="std std-ref">Install Clear Linux* OS from the live desktop</span></a>. Once the
bare-metal installation and initial configuration are complete, add the
<strong class="command">desktop-dev</strong> bundle to the system. <strong class="command">desktop-dev</strong> contains
the necessary development tools, such as GCC and Perl*.</p>
|
|
<ol class="arabic simple">
|
|
<li><p>To install the bundle, run the following command in the <code class="file docutils literal notranslate"><span class="pre">$HOME</span></code>
directory:</p></li>
|
|
</ol>
|
|
<blockquote>
|
|
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sudo<span class="w"> </span>swupd<span class="w"> </span>bundle-add<span class="w"> </span>desktop-dev
</pre></div>
|
|
</div>
|
|
</div></blockquote>
|
|
</section>
|
|
<section id="detect-loop-vectorization-candidates">
|
|
<h2><a class="toc-backref" href="#id3" role="doc-backlink">Detect loop vectorization candidates</a><a class="headerlink" href="#detect-loop-vectorization-candidates" title="Link to this heading">¶</a></h2>
|
|
<p>Now, we need to detect the loop vectorization candidates to be cloned for
|
|
multiple platforms with FMV. As an example, we will use the following
|
|
simple C code:</p>
|
|
<div class="highlight-c notranslate"><div class="highlight"><pre><span></span><span class="linenos"> 1</span><span class="w"> </span><span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdio.h></span>
<span class="linenos"> 2</span><span class="w"> </span><span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdlib.h></span>
<span class="linenos"> 3</span><span class="w"> </span><span class="cp">#include</span><span class="w"> </span><span class="cpf"><sys/time.h></span>
<span class="linenos"> 4</span><span class="w"> </span><span class="cp">#define MAX 1000000</span>
<span class="linenos"> 5</span>
<span class="linenos"> 6</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">a</span><span class="p">[</span><span class="mi">256</span><span class="p">],</span><span class="w"> </span><span class="n">b</span><span class="p">[</span><span class="mi">256</span><span class="p">],</span><span class="w"> </span><span class="n">c</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="linenos"> 7</span>
<span class="linenos"> 8</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">foo</span><span class="p">(){</span>
<span class="linenos"> 9</span><span class="w">     </span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="n">x</span><span class="p">;</span>
<span class="linenos">10</span><span class="w">     </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">x</span><span class="o"><</span><span class="n">MAX</span><span class="p">;</span><span class="w"> </span><span class="n">x</span><span class="o">++</span><span class="p">){</span>
<span class="linenos">11</span><span class="w">         </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o"><</span><span class="mi">256</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">){</span>
<span class="linenos">12</span><span class="w">             </span><span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">c</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="linenos">13</span><span class="w">         </span><span class="p">}</span>
<span class="linenos">14</span><span class="w">     </span><span class="p">}</span>
<span class="linenos">15</span><span class="w"> </span><span class="p">}</span>
<span class="linenos">16</span>
<span class="linenos">17</span>
<span class="linenos">18</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(){</span>
<span class="linenos">19</span><span class="w">     </span><span class="n">foo</span><span class="p">();</span>
<span class="linenos">20</span><span class="w">     </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="linenos">21</span><span class="w"> </span><span class="p">}</span>
</pre></div>
|
|
</div>
|
|
<p>Save the example code as <code class="file docutils literal notranslate"><span class="pre">example.c</span></code> in the current directory and
|
|
build with the following flags:</p>
|
|
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>gcc<span class="w"> </span>-O3<span class="w"> </span>-fopt-info-vec<span class="w"> </span>example.c<span class="w"> </span>-o<span class="w"> </span>example
</pre></div>
|
|
</div>
|
|
<p>The build generates the following output:</p>
|
|
<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="go">example.c:11:9: note: loop vectorized</span>
<span class="go">example.c:11:9: note: loop vectorized</span>
</pre></div>
|
|
</div>
|
|
<p>The output shows that GCC vectorized the loop at line 11, which makes it a good candidate for FMV:</p>
<div class="highlight-c notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o"><</span><span class="mi">256</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">){</span>
<span class="w">    </span><span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">c</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="p">}</span>
</pre></div>
|
|
</div>
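<p>To see what separates a candidate from a non-candidate, the following sketch contrasts two loops. The function names are hypothetical and not part of the tutorial's example. The first loop has independent iterations, like the inner loop of <cite>foo</cite>; the second carries a dependency from one iteration to the next, which usually prevents vectorization. Building with GCC's <cite>-fopt-info-vec-missed</cite> option instead of <cite>-fopt-info-vec</cite> reports the loops that were not vectorized; the exact messages depend on the GCC version.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations, like the inner loop of foo(): each element can
   be computed without looking at any other result. */
void add_elements(int *dst, const int *lhs, const int *rhs, size_t n)
{
    assert(dst != NULL && lhs != NULL && rhs != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = lhs[i] + rhs[i];
}

/* Loop-carried dependency: element i needs the freshly written element
   i - 1, which usually keeps the vectorizer from handling the loop. */
void running_sum(int *data, size_t n)
{
    assert(data != NULL);
    for (size_t i = 1; i < n; i++)
        data[i] += data[i - 1];
}
```

<p>Only functions that contain loops of the first kind are likely to benefit from being cloned for several instruction sets.</p>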
|
|
</section>
|
|
<section id="generate-the-fmv-patch">
|
|
<h2><a class="toc-backref" href="#id4" role="doc-backlink">Generate the FMV patch</a><a class="headerlink" href="#generate-the-fmv-patch" title="Link to this heading">¶</a></h2>
|
|
<ol class="arabic">
|
|
<li><p>To generate the FMV patch with the <a class="reference external" href="https://github.com/clearlinux/make-fmv-patch">make-fmv-patch</a> project,
clone the project and generate a build log that contains the loop
vectorization information:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/clearlinux/make-fmv-patch.git
gcc<span class="w"> </span>-O3<span class="w"> </span>-fopt-info-vec<span class="w"> </span>example.c<span class="w"> </span>-o<span class="w"> </span>example<span class="w"> </span><span class="p">&</span>><span class="w"> </span>log
</pre></div>
|
|
</div>
|
|
</li>
|
|
<li><p>To generate the patch files, execute:</p>
|
|
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>perl<span class="w"> </span>./make-fmv-patch/make-fmv-patch.pl<span class="w"> </span>log<span class="w"> </span>.
</pre></div>
|
|
</div>
|
|
</li>
|
|
<li><p>The <code class="file docutils literal notranslate"><span class="pre">make-fmv-patch.pl</span></code> script takes two arguments: <cite><buildlog></cite>,
the build log that contains the vectorization notes, and <cite><sourcecode></cite>,
the directory that contains the source code. In general, replace
<cite><buildlog></cite> and <cite><sourcecode></cite> with the proper values and execute:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>perl<span class="w"> </span>make-fmv-patch.pl<span class="w"> </span><buildlog><span class="w"> </span><sourcecode>
</pre></div>
</div>
<p>For the example above, the command generates the following <code class="file docutils literal notranslate"><span class="pre">example.c.patch</span></code> patch:</p>
|
|
<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="go">--- ./example.c 2017-09-27 16:05:42.279505430 +0000</span>
<span class="go">+++ ./example.c~ 2017-09-27 16:19:11.691544026 +0000</span>
<span class="go">@@ -5,6 +5,7 @@</span>

<span class="go"> int a[256], b[256], c[256];</span>

<span class="go">+__attribute__((target_clones("avx2","arch=atom","default")))</span>
<span class="go"> void foo(){</span>
<span class="go">     int i,x;</span>
<span class="go">     for (x=0; x<MAX; x++){</span>
</pre></div>
|
|
</div>
|
|
<p>We recommend using the <code class="file docutils literal notranslate"><span class="pre">make-fmv-patch</span></code> script to add the attribute
that generates the target clones for the function <cite>foo</cite>. After the patch
is applied, the code looks as follows:</p>
|
|
<div class="highlight-c notranslate"><div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdio.h></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdlib.h></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><sys/time.h></span>
<span class="cp">#define MAX 1000000</span>

<span class="kt">int</span><span class="w"> </span><span class="n">a</span><span class="p">[</span><span class="mi">256</span><span class="p">],</span><span class="w"> </span><span class="n">b</span><span class="p">[</span><span class="mi">256</span><span class="p">],</span><span class="w"> </span><span class="n">c</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>

<span class="n">__attribute__</span><span class="p">((</span><span class="n">target_clones</span><span class="p">(</span><span class="s">"avx2"</span><span class="p">,</span><span class="s">"arch=atom"</span><span class="p">,</span><span class="s">"default"</span><span class="p">)))</span>
<span class="kt">void</span><span class="w"> </span><span class="n">foo</span><span class="p">(){</span>
<span class="w">    </span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="n">x</span><span class="p">;</span>
<span class="w">    </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">x</span><span class="o"><</span><span class="n">MAX</span><span class="p">;</span><span class="w"> </span><span class="n">x</span><span class="o">++</span><span class="p">){</span>
<span class="w">        </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o"><</span><span class="mi">256</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">){</span>
<span class="w">            </span><span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">c</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="w">        </span><span class="p">}</span>
<span class="w">    </span><span class="p">}</span>
<span class="p">}</span>


<span class="kt">int</span><span class="w"> </span><span class="n">main</span><span class="p">(){</span>
<span class="w">    </span><span class="n">foo</span><span class="p">();</span>
<span class="w">    </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
|
|
</div>
|
|
</li>
|
|
<li><p>To change the target clones, either edit the attribute in the generated
patches before applying them, or change the value of the <cite>$avx2</cite> variable
in the <code class="file docutils literal notranslate"><span class="pre">make-fmv-patch.pl</span></code> script:</p>
<div class="highlight-perl notranslate"><div class="highlight"><pre><span></span><span class="k">my</span><span class="w"> </span><span class="nv">$avx2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">'__attribute__((target_clones("avx2","arch=atom","default")))'</span><span class="o">.</span><span class="s">"\n"</span><span class="p">;</span>
</pre></div>
|
|
</div>
|
|
</li>
|
|
<li><p>Compile the code again, now with the FMV attribute in place, and add the
<cite>-g</cite> option so that <cite>objdump -S</cite> can show the source alongside the
disassembly:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>gcc<span class="w"> </span>-O3<span class="w"> </span>example.c<span class="w"> </span>-o<span class="w"> </span>example<span class="w"> </span>-g
objdump<span class="w"> </span>-S<span class="w"> </span>example<span class="w"> </span><span class="p">|</span><span class="w"> </span>less
</pre></div>
|
|
</div>
|
|
<p>You can see the multiple clones of the <cite>foo</cite> function:</p>
|
|
<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="go">foo</span>
<span class="go">foo.avx2.0</span>
<span class="go">foo.arch_atom.1</span>
</pre></div>
|
|
</div>
|
|
</li>
|
|
<li><p>The cloned functions use AVX2 registers and vectorized instructions. To
verify, look for instructions such as the following in the <cite>objdump</cite> output:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">vpaddd</span> <span class="p">(</span><span class="o">%</span><span class="n">r8</span><span class="p">,</span><span class="o">%</span><span class="n">rax</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span><span class="o">%</span><span class="n">ymm0</span><span class="p">,</span><span class="o">%</span><span class="n">ymm0</span>
<span class="n">vmovdqu</span> <span class="o">%</span><span class="n">ymm0</span><span class="p">,(</span><span class="o">%</span><span class="n">rcx</span><span class="p">,</span><span class="o">%</span><span class="n">rax</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
</pre></div>
|
|
</div>
|
|
</li>
|
|
</ol>
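<p>Which clone runs depends on the CPU that executes the binary. As a hedged aid for checking a test machine, the sketch below asks the CPU whether it reports AVX2 support, using GCC's <cite>__builtin_cpu_init</cite> and <cite>__builtin_cpu_supports</cite> built-ins. The helper names <cite>cpu_has_avx2</cite> and <cite>describe_cpu</cite> are hypothetical, and the sketch assumes GCC or a compatible compiler on x86. It does not inspect the resolver that GCC generates; it only shows whether the AVX2 clone is eligible on this machine.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 when the running CPU reports AVX2 support, 0 otherwise.
   Assumption: GCC (or a compatible compiler) on x86; on other targets
   the sketch simply reports 0. */
int cpu_has_avx2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}

/* Hypothetical helper: writes a one-line summary into buf. */
void describe_cpu(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "AVX2 %s",
             cpu_has_avx2() ? "available" : "not available");
}
```

<p>On a machine where the summary says AVX2 is not available, the <cite>foo.avx2.0</cite> clone cannot be selected, so the AVX2 instructions shown above are present in the binary but never executed.</p>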
|
|
</section>
|
|
<section id="fft-project-example-using-fftw">
|
|
<h2><a class="toc-backref" href="#id5" role="doc-backlink">FFT project example using FFTW</a><a class="headerlink" href="#fft-project-example-using-fftw" title="Link to this heading">¶</a></h2>
|
|
<p>To follow the same approach with a package like FFTW, build it with the
<cite>-fopt-info-vec</cite> flag to get a build log file, then run the script on that log. The output is similar to:</p>
|
|
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>~/make-fmv-patch/make-fmv-patch.pl<span class="w"> </span>results/build.log<span class="w"> </span>fftw-3.3.6-pl2/

patching<span class="w"> </span>fftw-3.3.6-pl2/libbench2/verify-lib.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">36</span><span class="w"> </span><span class="m">114</span><span class="w"> </span><span class="m">151</span><span class="w"> </span><span class="m">162</span><span class="w"> </span><span class="m">173</span><span class="w"> </span><span class="m">195</span><span class="w"> </span><span class="m">215</span><span class="w"> </span><span class="m">284</span><span class="o">)</span>
patching<span class="w"> </span>fftw-3.3.6-pl2/tools/fftw-wisdom.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">150</span><span class="o">)</span>
patching<span class="w"> </span>fftw-3.3.6-pl2/libbench2/speed.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">26</span><span class="o">)</span>
patching<span class="w"> </span>fftw-3.3.6-pl2/tests/bench.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">27</span><span class="o">)</span>
patching<span class="w"> </span>fftw-3.3.6-pl2/libbench2/util.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">181</span><span class="o">)</span>
patching<span class="w"> </span>fftw-3.3.6-pl2/libbench2/problem.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">229</span><span class="o">)</span>
patching<span class="w"> </span>fftw-3.3.6-pl2/tests/fftw-bench.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">101</span><span class="w"> </span><span class="m">147</span><span class="w"> </span><span class="m">162</span><span class="w"> </span><span class="m">249</span><span class="o">)</span>
patching<span class="w"> </span>fftw-3.3.6-pl2/libbench2/mp.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">79</span><span class="w"> </span><span class="m">190</span><span class="w"> </span><span class="m">215</span><span class="o">)</span>
patching<span class="w"> </span>fftw-3.3.6-pl2/libbench2/caset.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">5</span><span class="o">)</span>
patching<span class="w"> </span>fftw-3.3.6-pl2/libbench2/verify-r2r.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">44</span><span class="w"> </span><span class="m">187</span><span class="w"> </span><span class="m">197</span><span class="w"> </span><span class="m">207</span><span class="w"> </span><span class="m">316</span><span class="w"> </span><span class="m">333</span><span class="w"> </span><span class="m">723</span><span class="o">)</span>
</pre></div>
|
|
</div>
|
|
<p>For example, the <code class="file docutils literal notranslate"><span class="pre">fftw-3.3.6-pl2/libbench2/verify-lib.c.patch</span></code> file
contains the following patches:</p>
|
|
<div class="highlight-diff notranslate"><div class="highlight"><pre><span></span>--- fftw-3.3.6-pl2/libbench2/verify-lib.c 2017-01-27 21:08:13.000000000 +0000
+++ fftw-3.3.6-pl2/libbench2/verify-lib.c~ 2017-09-27 17:49:21.913802006 +0000
@@ -33,6 +33,7 @@

 double dmax(double x, double y) { return (x > y) ? x : y; }

+__attribute__((target_clones("avx2","arch=atom","default")))
 static double aerror(C *a, C *b, int n)
 {
      if (n > 0) {
@@ -111,6 +112,7 @@
 }

 /* make array hermitian */
+__attribute__((target_clones("avx2","arch=atom","default")))
 void mkhermitian(C *A, int rank, const bench_iodim *dim, int stride)
 {
      if (rank == 0)
@@ -148,6 +150,7 @@
 }

 /* C = A + B */
+__attribute__((target_clones("avx2","arch=atom","default")))
 void aadd(C *c, C *a, C *b, int n)
 {
      int i;
@@ -159,6 +162,7 @@
 }

 /* C = A - B */
+__attribute__((target_clones("avx2","arch=atom","default")))
 void asub(C *c, C *a, C *b, int n)
 {
      int i;
@@ -170,6 +174,7 @@
 }

 /* B = rotate left A (complex) */
+__attribute__((target_clones("avx2","arch=atom","default")))
 void arol(C *b, C *a, int n, int nb, int na)
 {
      int i, ib, ia;
@@ -192,6 +197,7 @@
 }
 }
</pre></div>
|
|
</div>
|
|
<p>With these patches, we can select where to apply the FMV technology, which
|
|
makes it even easier to bring architecture-based optimizations to
|
|
application code.</p>
|
|
<p><strong>Congratulations!</strong></p>
|
|
<p>You have successfully installed an FMV development environment on Clear Linux OS.
Furthermore, you used cutting-edge compiler technology to improve the
performance of your application on Intel® architecture, guided by the
compiler's vectorization report for your code.</p>
|
|
<p><em>Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries.</em></p>
|
|
</section>
|
|
</section>
|
|
|
|
|
|
<div class="clearer"></div>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
<div class="sphinxsidebar" role="navigation" aria-label="Main">
|
|
<div class="sphinxsidebarwrapper">
|
|
<p class="logo"><a href="../index.html">
|
|
<img class="logo" src="../_static/clearlinux.png" alt="Logo of Clear Linux* Project Docs"/>
|
|
</a></p>
|
|
<div>
|
|
<h3><a href="../index.html">Table of Contents</a></h3>
|
|
<ul>
|
|
<li><a class="reference internal" href="#">Function Multi-Versioning</a><ul>
|
|
<li><a class="reference internal" href="#description">Description</a></li>
|
|
<li><a class="reference internal" href="#install-and-configure-a-cl-host-on-bare-metal">Install and configure a Clear Linux OS host on bare metal</a></li>
|
|
<li><a class="reference internal" href="#detect-loop-vectorization-candidates">Detect loop vectorization candidates</a></li>
|
|
<li><a class="reference internal" href="#generate-the-fmv-patch">Generate the FMV patch</a></li>
|
|
<li><a class="reference internal" href="#fft-project-example-using-fftw">FFT project example using FFTW</a></li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
|
|
</div>
|
|
<div>
|
|
<h4>Previous topic</h4>
|
|
<p class="topless"><a href="flatpak.html"
|
|
title="previous chapter">Flatpak*</a></p>
|
|
</div>
|
|
<div>
|
|
<h4>Next topic</h4>
|
|
<p class="topless"><a href="hpc.html"
|
|
title="next chapter">HPC Cluster</a></p>
|
|
</div>
|
|
<div role="note" aria-label="source link">
|
|
<h3>This Page</h3>
|
|
<ul class="this-page-menu">
|
|
<li><a href="../_sources/tutorials/fmv.rst.txt"
|
|
rel="nofollow">Show Source</a></li>
|
|
</ul>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
<div class="clearer"></div>
|
|
</div>
<div class="footer" role="contentinfo">
|
|
© Copyright 2022 Intel Corporation. All Rights Reserved.
|
|
Last updated on Nov 04, 2024.
|
|
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 8.1.3.
|
|
</div>
|
|
</body>
|
|
</html>