<!DOCTYPE html>
<html lang="en" data-content_root="../">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Function Multi-Versioning &#8212; Documentation for Clear Linux* project</title>
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=fa44fd50" />
<link rel="stylesheet" type="text/css" href="../_static/bizstyle.css?v=5283bb3d" />
<link rel="stylesheet" type="text/css" href="../_static/copybutton.css?v=76b2166b" />
<script src="../_static/documentation_options.js?v=5929fcd5"></script>
<script src="../_static/doctools.js?v=9bcbadda"></script>
<script src="../_static/sphinx_highlight.js?v=dc90522c"></script>
<script src="../_static/clipboard.min.js?v=a7894cd8"></script>
<script src="../_static/copybutton.js?v=a56c686a"></script>
<script src="../_static/bizstyle.js"></script>
<link rel="canonical" href="https://clearlinux.github.io/clear-linux-documentation/tutorials/fmv.html" />
<link rel="icon" href="../_static/favicon.ico"/>
<link rel="author" title="About these documents" href="../about.html" />
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="HPC Cluster" href="hpc.html" />
<link rel="prev" title="Flatpak*" href="flatpak.html" />
<meta name="viewport" content="width=device-width,initial-scale=1.0" />
<!--[if lt IE 9]>
<script src="_static/css3-mediaqueries.js"></script>
<![endif]-->
</head><body>
<div class="related" role="navigation" aria-label="Related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="../genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="hpc.html" title="HPC Cluster"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="flatpak.html" title="Flatpak*"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="../index.html">Documentation for Clear Linux* project</a> &#187;</li>
<li class="nav-item nav-item-1"><a href="index.html" accesskey="U">Tutorials</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Function Multi-Versioning</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<section id="function-multi-versioning">
<span id="fmv"></span><h1>Function Multi-Versioning<a class="headerlink" href="#function-multi-versioning" title="Link to this heading"></a></h1>
<p>In this tutorial, we use <abbr title="Function Multi-Versioning">FMV</abbr> on
general code and on <abbr title="Fast Fourier Transform">FFT</abbr> library code (FFTW).
After completing the tutorial, you will be able to apply FMV to your own
code and to the libraries it uses, so that your application can take
advantage of architecture-based optimizations.</p>
<nav class="contents local" id="contents">
<ul class="simple">
<li><p><a class="reference internal" href="#description" id="id1">Description</a></p></li>
<li><p><a class="reference internal" href="#install-and-configure-a-cl-host-on-bare-metal" id="id2">Install and configure a Clear Linux OS host on bare metal</a></p></li>
<li><p><a class="reference internal" href="#detect-loop-vectorization-candidates" id="id3">Detect loop vectorization candidates</a></p></li>
<li><p><a class="reference internal" href="#generate-the-fmv-patch" id="id4">Generate the FMV patch</a></p></li>
<li><p><a class="reference internal" href="#fft-project-example-using-fftw" id="id5">FFT project example using FFTW</a></p></li>
</ul>
</nav>
<section id="description">
<h2><a class="toc-backref" href="#id1" role="doc-backlink">Description</a><a class="headerlink" href="#description" title="Link to this heading"></a></h2>
<p>CPU architectures often gain useful new instructions as they evolve, but
application developers find it difficult to take advantage of those
instructions. Reluctance to lose backward compatibility is one of the
main roadblocks that keeps developers from using advancements in newer computing
architectures. FMV, which first appeared in <a class="reference external" href="https://gcc.gnu.org">GCC</a> 4.8, is a way to have
multiple implementations of a function, each using different
architecture-specialized instruction-set extensions. GCC 6 introduced
changes to FMV, notably the <cite>target_clones</cite> attribute used in this
tutorial, that make it even easier to bring architecture-based
optimizations to application code.</p>
</section>
<section id="install-and-configure-a-cl-host-on-bare-metal">
<h2><a class="toc-backref" href="#id2" role="doc-backlink">Install and configure a Clear Linux OS host on bare metal</a><a class="headerlink" href="#install-and-configure-a-cl-host-on-bare-metal" title="Link to this heading"></a></h2>
<p>First, follow our guide to <a class="reference internal" href="../get-started/bare-metal-install-desktop.html#bare-metal-install-desktop"><span class="std std-ref">Install Clear Linux* OS from the live desktop</span></a>. Once the bare
metal installation and initial configuration are complete, add the
<strong class="command">desktop-dev</strong> bundle to the system. <strong class="command">desktop-dev</strong> contains
the necessary development tools like GCC and Perl*.</p>
<ol class="arabic simple">
<li><p>To install the bundle, run the following command in the <code class="file docutils literal notranslate"><span class="pre">$HOME</span></code>
directory:</p></li>
</ol>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sudo<span class="w"> </span>swupd<span class="w"> </span>bundle-add<span class="w"> </span>desktop-dev
</pre></div>
</div>
</div></blockquote>
</section>
<section id="detect-loop-vectorization-candidates">
<h2><a class="toc-backref" href="#id3" role="doc-backlink">Detect loop vectorization candidates</a><a class="headerlink" href="#detect-loop-vectorization-candidates" title="Link to this heading"></a></h2>
<p>Now, we need to detect the loop vectorization candidates to be cloned for
multiple platforms with FMV. As an example, we will use the following
simple C code:</p>
<div class="highlight-c notranslate"><div class="highlight"><pre><span></span><span class="linenos"> 1</span><span class="w"> </span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;stdio.h&gt;</span>
<span class="linenos"> 2</span><span class="w"> </span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;stdlib.h&gt;</span>
<span class="linenos"> 3</span><span class="w"> </span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;sys/time.h&gt;</span>
<span class="linenos"> 4</span><span class="w"> </span><span class="cp">#define MAX 1000000</span>
<span class="linenos"> 5</span>
<span class="linenos"> 6</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">a</span><span class="p">[</span><span class="mi">256</span><span class="p">],</span><span class="w"> </span><span class="n">b</span><span class="p">[</span><span class="mi">256</span><span class="p">],</span><span class="w"> </span><span class="n">c</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="linenos"> 7</span>
<span class="linenos"> 8</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">foo</span><span class="p">(){</span>
<span class="linenos"> 9</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="n">x</span><span class="p">;</span>
<span class="linenos">10</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">x</span><span class="o">&lt;</span><span class="n">MAX</span><span class="p">;</span><span class="w"> </span><span class="n">x</span><span class="o">++</span><span class="p">){</span>
<span class="linenos">11</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">&lt;</span><span class="mi">256</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">){</span>
<span class="linenos">12</span><span class="w"> </span><span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">c</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="linenos">13</span><span class="w"> </span><span class="p">}</span>
<span class="linenos">14</span><span class="w"> </span><span class="p">}</span>
<span class="linenos">15</span><span class="w"> </span><span class="p">}</span>
<span class="linenos">16</span>
<span class="linenos">17</span>
<span class="linenos">18</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(){</span>
<span class="linenos">19</span><span class="w"> </span><span class="n">foo</span><span class="p">();</span>
<span class="linenos">20</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="linenos">21</span><span class="w"> </span><span class="p">}</span>
</pre></div>
</div>
<p>Save the example code as <code class="file docutils literal notranslate"><span class="pre">example.c</span></code> in the current directory and
build with the following flags:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>gcc<span class="w"> </span>-O3<span class="w"> </span>-fopt-info-vec<span class="w"> </span>example.c<span class="w"> </span>-o<span class="w"> </span>example
</pre></div>
</div>
<p>The build generates the following output:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="go">example.c:11:9: note: loop vectorized</span>
<span class="go">example.c:11:9: note: loop vectorized</span>
</pre></div>
</div>
<p>The output shows that GCC vectorized the loop that starts at line 11, so this loop is a good candidate for FMV:</p>
<div class="highlight-c notranslate"><div class="highlight"><pre><span></span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">&lt;</span><span class="mi">256</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">){</span>
<span class="w"> </span><span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">c</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="p">}</span>
</pre></div>
</div>
</section>
<section id="generate-the-fmv-patch">
<h2><a class="toc-backref" href="#id4" role="doc-backlink">Generate the FMV patch</a><a class="headerlink" href="#generate-the-fmv-patch" title="Link to this heading"></a></h2>
<ol class="arabic">
<li><p>To generate the FMV patch with the <a class="reference external" href="https://github.com/clearlinux/make-fmv-patch">make-fmv-patch</a> project,
clone the project and generate a log file with the loop vectorization
information:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/clearlinux/make-fmv-patch.git
gcc<span class="w"> </span>-O3<span class="w"> </span>-fopt-info-vec<span class="w"> </span>example.c<span class="w"> </span>-o<span class="w"> </span>example<span class="w"> </span><span class="p">&amp;</span>&gt;<span class="w"> </span>log
</pre></div>
</div>
</li>
<li><p>To generate the patch files, execute:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>perl<span class="w"> </span>./make-fmv-patch/make-fmv-patch.pl<span class="w"> </span>log<span class="w"> </span>.
</pre></div>
</div>
</li>
<li><p>The <code class="file docutils literal notranslate"><span class="pre">make-fmv-patch.pl</span></code> script takes two arguments: <cite>&lt;buildlog&gt;</cite>
and <cite>&lt;sourcecode&gt;</cite>. In the previous step, these were <cite>log</cite> and <cite>.</cite>
(the current directory). The general form of the command is:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>perl<span class="w"> </span>make-fmv-patch.pl<span class="w"> </span>&lt;buildlog&gt;<span class="w"> </span>&lt;sourcecode&gt;
</pre></div>
</div>
<p>For this example, the script generates the following <code class="file docutils literal notranslate"><span class="pre">example.c.patch</span></code> file:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="go">--- ./example.c 2017-09-27 16:05:42.279505430 +0000</span>
<span class="go">+++ ./example.c~ 2017-09-27 16:19:11.691544026 +0000</span>
<span class="go">@@ -5,6 +5,7 @@</span>
<span class="go"> int a[256], b[256], c[256];</span>
<span class="go">+__attribute__((target_clones(&quot;avx2&quot;,&quot;arch=atom&quot;,&quot;default&quot;)))</span>
<span class="go"> void foo(){</span>
<span class="go"> int i,x;</span>
<span class="go"> for (x=0; x&lt;MAX; x++){</span>
</pre></div>
</div>
<p>We recommend that you use the <code class="file docutils literal notranslate"><span class="pre">make-fmv-patch</span></code> script to add the attribute
that generates the target clones of the function <cite>foo</cite>. After the patch is
applied, the code looks like this:</p>
<div class="highlight-c notranslate"><div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;stdio.h&gt;</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;stdlib.h&gt;</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;sys/time.h&gt;</span>
<span class="cp">#define MAX 1000000</span>
<span class="kt">int</span><span class="w"> </span><span class="n">a</span><span class="p">[</span><span class="mi">256</span><span class="p">],</span><span class="w"> </span><span class="n">b</span><span class="p">[</span><span class="mi">256</span><span class="p">],</span><span class="w"> </span><span class="n">c</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="n">__attribute__</span><span class="p">((</span><span class="n">target_clones</span><span class="p">(</span><span class="s">&quot;avx2&quot;</span><span class="p">,</span><span class="s">&quot;arch=atom&quot;</span><span class="p">,</span><span class="s">&quot;default&quot;</span><span class="p">)))</span>
<span class="kt">void</span><span class="w"> </span><span class="n">foo</span><span class="p">(){</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="n">x</span><span class="p">;</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">x</span><span class="o">&lt;</span><span class="n">MAX</span><span class="p">;</span><span class="w"> </span><span class="n">x</span><span class="o">++</span><span class="p">){</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">&lt;</span><span class="mi">256</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">){</span>
<span class="w"> </span><span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">c</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kt">int</span><span class="w"> </span><span class="n">main</span><span class="p">(){</span>
<span class="w"> </span><span class="n">foo</span><span class="p">();</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
</div>
</li>
<li><p>To change the target clones, edit the attribute in the generated patches or
change the value of the <cite>$avx2</cite> variable in the <code class="file docutils literal notranslate"><span class="pre">make-fmv-patch.pl</span></code> script:</p>
<div class="highlight-perl notranslate"><div class="highlight"><pre><span></span><span class="k">my</span><span class="w"> </span><span class="nv">$avx2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&#39;__attribute__((target_clones(&quot;avx2&quot;,&quot;arch=atom&quot;,&quot;default&quot;)))&#39;</span><span class="o">.</span><span class="s">&quot;\n&quot;</span><span class="p">;</span>
</pre></div>
</div>
</li>
<li><p>Compile the patched code again, adding the <cite>-g</cite> option so that
<cite>objdump</cite> can show the source alongside the disassembly:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>gcc<span class="w"> </span>-O3<span class="w"> </span>example.c<span class="w"> </span>-o<span class="w"> </span>example<span class="w"> </span>-g
objdump<span class="w"> </span>-S<span class="w"> </span>example<span class="w"> </span><span class="p">|</span><span class="w"> </span>less
</pre></div>
</div>
<p>You can see the multiple clones of the <cite>foo</cite> function:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="go">foo</span>
<span class="go">foo.avx2.0</span>
<span class="go">foo.arch_atom.1</span>
</pre></div>
</div>
</li>
<li><p>The <cite>foo.avx2.0</cite> clone uses 256-bit <cite>ymm</cite> registers and AVX2 vector
instructions. To verify, look for instructions like the following in the
<cite>objdump</cite> output:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">vpaddd</span> <span class="p">(</span><span class="o">%</span><span class="n">r8</span><span class="p">,</span><span class="o">%</span><span class="n">rax</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span><span class="o">%</span><span class="n">ymm0</span><span class="p">,</span><span class="o">%</span><span class="n">ymm0</span>
<span class="n">vmovdqu</span> <span class="o">%</span><span class="n">ymm0</span><span class="p">,(</span><span class="o">%</span><span class="n">rcx</span><span class="p">,</span><span class="o">%</span><span class="n">rax</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
</pre></div>
</div>
</li>
</ol>
</section>
<section id="fft-project-example-using-fftw">
<h2><a class="toc-backref" href="#id5" role="doc-backlink">FFT project example using FFTW</a><a class="headerlink" href="#fft-project-example-using-fftw" title="Link to this heading"></a></h2>
<p>To follow the same approach with a package like FFTW, build it with the
<cite>-fopt-info-vec</cite> flag to get a build log file, then run the script on the
log and the source tree. The output is similar to:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>~/make-fmv-patch/make-fmv-patch.pl<span class="w"> </span>results/build.log<span class="w"> </span>fftw-3.3.6-pl2/
patching<span class="w"> </span>fftw-3.3.6-pl2/libbench2/verify-lib.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">36</span><span class="w"> </span><span class="m">114</span><span class="w"> </span><span class="m">151</span><span class="w"> </span><span class="m">162</span><span class="w"> </span><span class="m">173</span><span class="w"> </span><span class="m">195</span><span class="w"> </span><span class="m">215</span><span class="w"> </span><span class="m">284</span><span class="o">)</span>
patching<span class="w"> </span>fftw-3.3.6-pl2/tools/fftw-wisdom.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">150</span><span class="o">)</span>
patching<span class="w"> </span>fftw-3.3.6-pl2/libbench2/speed.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">26</span><span class="o">)</span>
patching<span class="w"> </span>fftw-3.3.6-pl2/tests/bench.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">27</span><span class="o">)</span>
patching<span class="w"> </span>fftw-3.3.6-pl2/libbench2/util.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">181</span><span class="o">)</span>
patching<span class="w"> </span>fftw-3.3.6-pl2/libbench2/problem.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">229</span><span class="o">)</span>
patching<span class="w"> </span>fftw-3.3.6-pl2/tests/fftw-bench.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">101</span><span class="w"> </span><span class="m">147</span><span class="w"> </span><span class="m">162</span><span class="w"> </span><span class="m">249</span><span class="o">)</span>
patching<span class="w"> </span>fftw-3.3.6-pl2/libbench2/mp.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">79</span><span class="w"> </span><span class="m">190</span><span class="w"> </span><span class="m">215</span><span class="o">)</span>
patching<span class="w"> </span>fftw-3.3.6-pl2/libbench2/caset.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">5</span><span class="o">)</span>
patching<span class="w"> </span>fftw-3.3.6-pl2/libbench2/verify-r2r.c<span class="w"> </span>@<span class="w"> </span>lines<span class="w"> </span><span class="o">(</span><span class="m">44</span><span class="w"> </span><span class="m">187</span><span class="w"> </span><span class="m">197</span><span class="w"> </span><span class="m">207</span><span class="w"> </span><span class="m">316</span><span class="w"> </span><span class="m">333</span><span class="w"> </span><span class="m">723</span><span class="o">)</span>
</pre></div>
</div>
<p>For example, the generated <code class="file docutils literal notranslate"><span class="pre">fftw-3.3.6-pl2/libbench2/verify-lib.c.patch</span></code> file
contains the following patches:</p>
<div class="highlight-diff notranslate"><div class="highlight"><pre><span></span>--- fftw-3.3.6-pl2/libbench2/verify-lib.c 2017-01-27 21:08:13.000000000 +0000
+++ fftw-3.3.6-pl2/libbench2/verify-lib.c~ 2017-09-27 17:49:21.913802006 +0000
@@ -33,6 +33,7 @@

 double dmax(double x, double y) { return (x &gt; y) ? x : y; }

+__attribute__((target_clones(&quot;avx2&quot;,&quot;arch=atom&quot;,&quot;default&quot;)))
 static double aerror(C *a, C *b, int n)
 {
 if (n &gt; 0) {
@@ -111,6 +112,7 @@
 }

 /* make array hermitian */
+__attribute__((target_clones(&quot;avx2&quot;,&quot;arch=atom&quot;,&quot;default&quot;)))
 void mkhermitian(C *A, int rank, const bench_iodim *dim, int stride)
 {
 if (rank == 0)
@@ -148,6 +150,7 @@
 }

 /* C = A + B */
+__attribute__((target_clones(&quot;avx2&quot;,&quot;arch=atom&quot;,&quot;default&quot;)))
 void aadd(C *c, C *a, C *b, int n)
 {
 int i;
@@ -159,6 +162,7 @@
 }

 /* C = A - B */
+__attribute__((target_clones(&quot;avx2&quot;,&quot;arch=atom&quot;,&quot;default&quot;)))
 void asub(C *c, C *a, C *b, int n)
 {
 int i;
@@ -170,6 +174,7 @@
 }

 /* B = rotate left A (complex) */
+__attribute__((target_clones(&quot;avx2&quot;,&quot;arch=atom&quot;,&quot;default&quot;)))
 void arol(C *b, C *a, int n, int nb, int na)
 {
 int i, ib, ia;
@@ -192,6 +197,7 @@
 }
 }
</pre></div>
</div>
<p>With these patches, we can select where to apply the FMV technology, which
makes it even easier to bring architecture-based optimizations to
application code.</p>
<p><strong>Congratulations!</strong></p>
<p>You have successfully set up an FMV development environment on Clear Linux OS.
Furthermore, you used cutting-edge compiler technology to improve the
performance of your application on Intel® architecture, guided by the
compiler's vectorization report for your code.</p>
<p><em>Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries.</em></p>
</section>
</section>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="Main">
<div class="sphinxsidebarwrapper">
<p class="logo"><a href="../index.html">
<img class="logo" src="../_static/clearlinux.png" alt="Logo of Clear Linux* Project Docs"/>
</a></p>
<div>
<h3><a href="../index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Function Multi-Versioning</a><ul>
<li><a class="reference internal" href="#description">Description</a></li>
<li><a class="reference internal" href="#install-and-configure-a-cl-host-on-bare-metal">Install and configure a Clear Linux OS host on bare metal</a></li>
<li><a class="reference internal" href="#detect-loop-vectorization-candidates">Detect loop vectorization candidates</a></li>
<li><a class="reference internal" href="#generate-the-fmv-patch">Generate the FMV patch</a></li>
<li><a class="reference internal" href="#fft-project-example-using-fftw">FFT project example using FFTW</a></li>
</ul>
</li>
</ul>
</div>
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="flatpak.html"
title="previous chapter">Flatpak*</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="hpc.html"
title="next chapter">HPC Cluster</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="../_sources/tutorials/fmv.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<search id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="../search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
<input type="submit" value="Go" />
</form>
</div>
</search>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="Related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="../genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="hpc.html" title="HPC Cluster"
>next</a> |</li>
<li class="right" >
<a href="flatpak.html" title="Flatpak*"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="../index.html">Documentation for Clear Linux* project</a> &#187;</li>
<li class="nav-item nav-item-1"><a href="index.html" >Tutorials</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Function Multi-Versioning</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2022 Intel Corporation. All Rights Reserved.
Last updated on Nov 04, 2024.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 8.1.3.
</div>
</body>
</html>