Kennethhttps://fediverse.blog/@/ksteimel@blog.ksteimel.duckdns.org/atom.xml2020-06-26T02:59:44.497202+00:00<![CDATA[Mti]]>https://blog.ksteimel.duckdns.org/~/NlpNotes/mti/2020-06-26T02:59:44.497202+00:00Kennethhttps://blog.ksteimel.duckdns.org/@/ksteimel/2020-06-26T02:59:44.497202+00:00<![CDATA[<h1>TLDR</h1>
<p>The parser can be found at <a href="https://parser.ksteimel.duckdns.org" rel="noopener noreferrer">https://parser.ksteimel.duckdns.org</a>.</p>
<h1>What's with the name?</h1>
<p>Mti ([m̩ti]) is the Swahili word for tree. This parser generates trees, so it seemed like a fitting name.</p>
<h1>Motivation</h1>
<p>My Linguistics department had previously used an online parser to illustrate the generative principle, which is fundamental to generative grammar: the sentences produced by a grammar should be all the sentences in a language, and only those sentences. By seeing which ungrammatical sentences a grammar generates, and which grammatical sentences it fails to parse, you can tune the grammar to conform to the generative principle.</p>
<p>However, the parser they had used was not libre software, and when the owner was no longer interested in maintaining it, the department lost its teaching tool.</p>
<p>They asked me if I could make a new implementation. I was excited to try: I had been wanting to write a context-free parser from scratch in Julia, and this was the perfect thing to hold my feet to the fire.</p>
<h1>Implementation</h1>
<p>Mti uses the Earley algorithm to parse context-free grammars and the Genie framework to handle web requests.</p>
<h2>Parser</h2>
<p>The parser used is a version of the <a href="https://en.wikipedia.org/wiki/Earley_parser" rel="noopener noreferrer">Earley algorithm</a> that I wrote in Julia.
The code can be found on <a rel="noopener noreferrer">my personal gitea server</a>.</p>
<p>For an explanation of how this algorithm works, I encourage you to look at the pseudocode section in the above wikipedia link.</p>
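<p>To make the predictor/scanner/completer steps from that pseudocode concrete, here is a toy recognizer sketch in Julia. This is my own illustration, not Mti's actual code: the names (<code>EarleyState</code>, <code>recognize</code>) are hypothetical, and it assumes a grammar with no empty right-hand sides, stored as a Dict from each nonterminal to its list of right-hand sides.</p>
<pre><code>struct EarleyState
    lhs::Symbol
    rhs::Vector{Symbol}
    dot::Int      # 1 means the dot sits before the first rhs symbol
    origin::Int   # chart column this state started in
end

# Value equality, so duplicate states can be detected with `in`.
Base.:(==)(a::EarleyState, b::EarleyState) =
    a.lhs == b.lhs && a.rhs == b.rhs && a.dot == b.dot && a.origin == b.origin

iscomplete(s::EarleyState) = s.dot > length(s.rhs)
nextsym(s::EarleyState) = iscomplete(s) ? nothing : s.rhs[s.dot]

function recognize(grammar::Dict{Symbol,Vector{Vector{Symbol}}},
                   words::Vector{Symbol}, start::Symbol)
    n = length(words)
    chart = [EarleyState[] for _ in 1:n+1]
    for rhs in grammar[start]
        push!(chart[1], EarleyState(start, rhs, 1, 1))
    end
    for k in 1:n+1
        i = 1
        while i <= length(chart[k])     # chart[k] may grow as we iterate
            s = chart[k][i]
            if iscomplete(s)            # completer: advance states waiting on s.lhs
                for t in chart[s.origin]
                    if nextsym(t) == s.lhs
                        adv = EarleyState(t.lhs, t.rhs, t.dot + 1, t.origin)
                        adv in chart[k] || push!(chart[k], adv)
                    end
                end
            elseif haskey(grammar, nextsym(s))    # predictor: expand a nonterminal
                for rhs in grammar[nextsym(s)]
                    pred = EarleyState(nextsym(s), rhs, 1, k)
                    pred in chart[k] || push!(chart[k], pred)
                end
            elseif k <= n && words[k] == nextsym(s)    # scanner: consume a terminal
                push!(chart[k + 1], EarleyState(s.lhs, s.rhs, s.dot + 1, s.origin))
            end
            i += 1
        end
    end
    any(s -> s.lhs == start && s.origin == 1 && iscomplete(s), chart[n + 1])
end

# A toy grammar: "the dog chased the cat" is accepted.
grammar = Dict(:S  => [[:NP, :VP]], :NP => [[:D, :N]], :VP => [[:V, :NP]],
               :D  => [[:the]], :N => [[:dog], [:cat]], :V => [[:chased]])
recognize(grammar, [:the, :dog, :chased, :the, :cat], :S)  # true
</code></pre>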
<p>There are some areas of the parser implementation that can be improved. At the moment, optionality is handled on the rule-generation side: syntactic elements surrounded by parentheses are optional.</p>
<p>It's easy to see how to handle a single optional element. For example, if you had the following rule:</p>
<blockquote>
<p>NP -> D (Adj) N</p>
</blockquote>
<p>Then you would generate one rule where the optional component is present (NP -> D Adj N) and one rule where it is missing (NP -> D N).</p>
<p>However, this becomes a problem when a rule has multiple optional pieces: every possible combination of optional elements has to be considered. For example, if we have this rule:</p>
<blockquote>
<p>NP -> D (Adj) N (PP)</p>
</blockquote>
<p>then there are four possibilities:</p>
<ul>
<li>NP -> D N</li>
<li>NP -> D Adj N</li>
<li>NP -> D N PP</li>
<li>NP -> D Adj N PP</li>
</ul>
<p>You'll notice that this means each rule expands into 2^n generated rules, where n is the number of optional elements.</p>
<p>To handle this, a bitmask over the optional elements is constructed. If we had three optional elements, we would count from 0 to 2^3 - 1 = 7 in binary (000 through 111), and each of these values would serve as a mask over the optional elements.</p>
<p>Where a mask bit is 1, the corresponding optional element is included. Once the optional elements to be included are determined, the non-optional elements are woven in.</p>
<h3>NP -> (D) (Adj) N (PP)</h3>
<blockquote>
<p>Optional elements: D Adj PP</p>
</blockquote>
<table><thead><tr><th> bitmask </th><th> introduced rule </th></tr></thead><tbody>
<tr><td> 000 </td><td> N </td></tr>
<tr><td> 001 </td><td> N PP </td></tr>
<tr><td> 010 </td><td> Adj N </td></tr>
<tr><td> 011 </td><td> Adj N PP </td></tr>
<tr><td> 100 </td><td> D N </td></tr>
<tr><td> 101 </td><td> D N PP </td></tr>
<tr><td> 110 </td><td> D Adj N </td></tr>
<tr><td> 111 </td><td> D Adj N PP </td></tr>
</tbody></table>
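<p>Here is a short sketch of this expansion in Julia (my own illustration, not necessarily Mti's exact code; <code>expand_optionals</code> is a hypothetical name). Each right-hand-side element is a (symbol, optional) pair, and the mask is read left to right, matching the table above.</p>
<pre><code>function expand_optionals(rhs::Vector{Tuple{Symbol,Bool}})
    opt_positions = [i for (i, (_, opt)) in enumerate(rhs) if opt]
    n = length(opt_positions)
    expanded = Vector{Vector{Symbol}}()
    for mask in 0:(2^n - 1)
        rule = Symbol[]
        for (i, (sym, opt)) in enumerate(rhs)
            if !opt
                push!(rule, sym)   # non-optional elements are always woven in
            else
                # which optional element this is, reading left to right
                bit = findfirst(==(i), opt_positions)
                # keep it if its mask bit (leftmost bit = first optional) is 1
                (mask >> (n - bit)) & 1 == 1 && push!(rule, sym)
            end
        end
        push!(expanded, rule)
    end
    return expanded
end

# NP -> (D) (Adj) N (PP) yields the 2^3 = 8 rules from the table above
expand_optionals([(:D, true), (:Adj, true), (:N, false), (:PP, true)])
</code></pre>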
<h3>Why is this problematic?</h3>
<p>The issue is that this approach has limitations once we move beyond optionality. For example, if we wanted to represent repetition, the analogous method is unsatisfying. Sure, we could add some arbitrary number of rules for each repeated element: if we had NP -> Adj* N, we could add NP -> Adj N, NP -> Adj Adj N, NP -> Adj Adj Adj N, and so on.</p>
<p>We would have to establish some arbitrary cutoff where we stop adding rules, though. Technically, repetition is supposed to be possible without bound (if you believe in infinite strings), but we have to set some stopping point, otherwise we would never get to the actual parsing :)</p>
<p>The solution I want to move to going forward is to have the syntactic elements stored in the Earley graph be Julia types that carry information about the optionality and repeatability of each element (in addition to feature information, for a future implementation of feature grammars). This would move the handling of optionality and repetition into the Earley algorithm itself: instead of generating more rules, the algorithm would simply have the option of skipping the next syntactic element (for optionality) or repeating the current syntactic element (for repetition).</p>
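<p>As a rough sketch of what such an element might look like (the <code>RuleElement</code> name and fields here are hypothetical, not Mti's current code):</p>
<pre><code># Each rule element carries its own flags, so the Earley algorithm can
# consult them directly instead of the grammar being expanded up front.
struct RuleElement
    symbol::Symbol     # e.g. :D, :Adj, :N
    optional::Bool     # the algorithm may skip past this element
    repeatable::Bool   # the algorithm may stay on this element (Kleene star)
    # feature information for feature grammars could live here later
end

# NP -> (D) Adj* N expressed directly, with no rule multiplication:
np_rule = [RuleElement(:D, true, false),
           RuleElement(:Adj, false, true),
           RuleElement(:N, false, false)]
</code></pre>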
<p>This has mostly been a discussion of the parser used in the web app, the pitfalls of the current implementation, and the directions I want to go instead.</p>
<p><strong>In a future post I will explain how the web framework works and how to install this webapp on your own server</strong></p>
]]><![CDATA[Building an AMD Deep Learning Machine (part 3)]]>https://blog.ksteimel.duckdns.org/~/DeepLearningBuilds/building-an-amd-deep-learning-machine-part-3/2020-02-24T19:50:01.498190+00:00Kennethhttps://blog.ksteimel.duckdns.org/@/ksteimel/2020-02-24T19:50:01.498190+00:00<![CDATA[<p>Since I wrote those last two posts (<a href="https://blog.steimel.info/%7E/DeepLearningBuilds/building-an-amd-deep-learning-machine-part-1" rel="noopener noreferrer">part 1</a> and <a href="https://blog.steimel.info/%7E/DeepLearningBuilds/building-an-amd-deep-learning-machine-part-2" rel="noopener noreferrer">part 2</a>), there have been some changes to the landscape of AMD deep learning and deep learning in general.</p>
<p>Those blog posts were originally written in mid-2018. Since then, dynamic computation graphs and imperative language design have become ubiquitous in neural network frameworks (and, in my opinion, a must-have). If you're familiar with toolkits like tensorflow 1.* or caffe and want to learn more about how this paradigm changes the code you write, I encourage you to check out <a href="https://www.tensorflow.org/guide/effective_tf2" rel="noopener noreferrer">the guidelines for tensorflow 2.0</a>, which compare static, declarative APIs with dynamic, imperative APIs.</p>
<h1>Deprecation of HCC</h1>
<p>AMD has deprecated <a href="https://github.com/RadeonOpenCompute/hcc" rel="noopener noreferrer">hcc</a>. Hcc was not discussed in my previous posts, but it is essentially the equivalent of nvcc for AMD GPUs: it acts as the compiler that deep learning toolkits are built on. On its face, this seems really bad: how can deep learning on AMD GPUs continue if the compiler that frameworks depend upon is no longer being developed?</p>
<p>In actuality, this may not be that bad. HIP is another compiler AMD developed; it allows the same code to be compiled for Nvidia GPUs (translating hip calls to cuda and then calling nvcc) and for AMD GPUs (via hcc). AMD's official statement indicates that they are shifting focus from HCC to HIP:</p>
<blockquote>
<p>AMD is deprecating HCC to put more focus on HIP development and on other languages supporting heterogeneous compute. We will no longer develop any new feature in HCC and we will stop maintaining HCC after its final release, which is planned for June 2019. If your application was developed with the hc C++ API, we would encourage you to transition it to other languages supported by AMD, such as HIP or OpenCL. HIP and hc language share the same compiler technology, so many hc kernel language features (including inline assembly) are also available through the HIP compilation path.</p>
</blockquote>
<p>By pushing HIP over HCC, AMD is encouraging cross-platform GPU computing: code written with HIP is compatible with both AMD and Nvidia GPUs, whereas HCC was specific to AMD GPUs.</p>
<p>It's also unclear how much this actually matters: the deep learning frameworks that were adapted to use AMD GPUs were already using HIP rather than HCC.</p>
<h1>Deep learning toolkit updates</h1>
<p>I'm now going to talk about some updates to the state of various deep learning toolkits using AMD GPUs.</p>
<h2>Tensorflow</h2>
<p>Tensorflow is still the best-supported solution for ROCm on AMD GPUs. In fact, ROCm support has been upstreamed into mainline Tensorflow, which is great for addressing concerns about the long-term maintenance of AMD deep learning solutions. The community currently supports builds of stable and nightly versions of tensorflow in the <a href="https://github.com/tensorflow/tensorflow#community-supported-builds" rel="noopener noreferrer">CI build system</a>.</p>
<p>Tensorflow 2.0 is supported on AMD GPUs, which means dynamic computation graphs and imperative styling are available. For example, instead of running:</p>
<pre><code>outputs = session.run(f(placeholder), feed_dict={placeholder: input})
</code></pre>
<p>you would run:</p>
<pre><code>outputs = f(input)
</code></pre>
<p>This allows more fine-grained control over the network design and greatly improves readability.</p>
<p>Personally, I have had great success running tensorflow 2.0 on an RX 580. Around August 2019, running tensorflow meant either running a lagging version or compiling from source (<em>I personally do not like using bazel, so compiling from source was not enjoyable</em>). Now that there are community builds, however, you can simply install the prebuilt wheel files. Installation is incredibly smooth.</p>
<h2>Pytorch</h2>
<p>Pytorch support is not nearly as good as tensorflow support. Though it is possible to use pytorch, a docker container seems to be the only supported way to do so. <a href="https://github.com/ROCmSoftwarePlatform/pytorch/issues/581" rel="noopener noreferrer">This github issue</a> discusses how to build pytorch for ROCm outside of a container with the latest versions of ROCm (3.0) and pytorch (1.3); however, there do seem to be some bugs specific to these particular versions. An approach that is <a href="https://github.com/ROCmSoftwarePlatform/pytorch/issues/565#issuecomment-574848879" rel="noopener noreferrer">encouraged</a> is to build the wheel files inside a container and then install them on the host system. I have not tried this personally, but I intend to investigate it soon on my RX 580 system.</p>
<h3>A note on pytorch documentation</h3>
<p>I was initially very confused, as the documentation on AMD's own <a href="https://rocm.github.io/pytorch.html" rel="noopener noreferrer">ROCm website</a> appears to be very out of date (it calls for ROCm 2.1 when 3.0 is the current release).</p>
<p>However, <a href="https://github.com/ROCmSoftwarePlatform/pytorch/wiki" rel="noopener noreferrer">the wiki</a> for the github repo provides a ton of useful information, including build instructions, unit test status and more.</p>
<p>I will hopefully have more information about usability and performance once I get a chance to sink my teeth into pytorch on ROCm. I will keep you posted.</p>
<h1>Concluding comments</h1>
<p>While AMD support for pytorch is still a bit behind, support in tensorflow is exceptionally good, and the installation process has been streamlined significantly.</p>
<p>In a follow-up blog post, I will talk about alternative methods of running deep learning on AMD GPUs, including non-python implementations like PlaidML's Tile language and Julia frameworks. I'll also touch on SYCL: a cross-GPU computation standard developed by the Khronos Group (the creators of Vulkan, OpenCL, and OpenGL).</p>
]]><![CDATA[Building an AMD Deep Learning Machine (part 1)]]>https://blog.ksteimel.duckdns.org/~/DeepLearningBuilds/building-an-amd-deep-learning-machine-part-1/2020-02-19T16:13:18.264087+00:00Kennethhttps://blog.ksteimel.duckdns.org/@/ksteimel/2020-02-19T16:13:18.264087+00:00<![CDATA[<p><strong>This is a blog post I wrote on my previous plume instance about a year ago. A forthcoming blog post will explain what I would do differently now if you're looking at building an AMD deep learning machine</strong></p>
<p>Deep learning has historically been dominated by NVIDIA GPUs. The Nvidia CUDA API is a proprietary standard for writing code that runs on graphics hardware. CUDA is tightly integrated into all the major deep learning toolkits and provides a relatively intuitive programming interface (in comparison to OpenCL). For a more in-depth discussion of the history of GPGPU programming and the potential for an interoperable, open-source GPU programming future, check out <a href="https://www.youtube.com/watch?v=ZTq8wKnVUZ8" rel="noopener noreferrer">this youtube video</a>.</p>
<p>However, CUDA is proprietary, only works on NVIDIA GPUs, and requires proprietary linux drivers. Many people, myself included, object to the monopolistic hold NVIDIA has established on the deep learning infrastructure market and to their non-open practices. In addition, using CUDA can be a flat-out pain on the administration side: in my experience, the CUDA utilities integrate poorly with package managers, and I have had a number of issues removing CUDA or replacing it with a new version, where installation added a large number of programs but removal only uninstalled a couple of them.</p>
<h1>Hardware considerations</h1>
<p>AMD HIP/ROCm is slightly pickier than CUDA about the hardware it will run on. RX 5x0 GPUs, RX 4x0 GPUs, and the R9 3x0 series cannot run on older CPUs that lack support for PCIe v3 atomics. Newer GPUs like the Vega 56, Vega 64, Vega Frontier Edition, and Radeon VII can run in a mode without PCIe v3 atomics support, at a performance penalty.</p>
<p>CPUs with PCIe v3 atomics support include all Ryzen CPUs as well as all Intel CPUs from Haswell onward (i.e. Core 4000-series processors and newer). For more information on supported hardware, check out <a href="https://rocm.github.io/hardware.html" rel="noopener noreferrer">this page</a>.</p>
<h1>Hardware used</h1>
<ul>
<li>32 GB (2 × 16 GB DIMMs) of 3000 MHz G.Skill Trident Z</li>
<li>Ryzen 7 1700</li>
<li>Vega 56 (ASRock blower)</li>
<li>Wraith Spire</li>
<li>B450 Aorus M</li>
<li>128 GB SSD</li>
<li>2 TB hard drive</li>
<li>750 W power supply</li>
<li>Rosewill SCM-01 case</li>
</ul>
<p>All the RGB was merely an accident of pricing; I just went with the best performance for the money. In addition, a blower-style Vega 56 was used instead of an open-air Vega 56 like those from PowerColor, as the case has relatively poor airflow: getting hot air out of the case was deemed much more important for prolonged workload performance.</p>
<p><img src="https://blog.ksteimel.duckdns.org/static/media/76346D17-BD88-4DF7-9D43-D70ABFE0D7C9.jpg" alt="ASROCK vega 56 gpu"> </p>
]]><![CDATA[Building an AMD Deep Learning Machine (part 2)]]>https://blog.ksteimel.duckdns.org/~/DeepLearningBuilds/building-an-amd-deep-learning-machine-part-2/2020-02-18T22:11:59.541079+00:00Kennethhttps://blog.ksteimel.duckdns.org/@/ksteimel/2020-02-18T22:11:59.541079+00:00<![CDATA[<p><strong>Once again, this is a blog post I made back in 2019. I will discuss how I would do things differently in 2020 in a forthcoming blog post</strong></p>
<h1>Operating System</h1>
<p>I used Ubuntu 19.04, partly because I wanted to try out the April release of Ubuntu and partly because I knew the newer kernels were more compatible with the Vega GPU (the amdgpu driver is in mainline kernels after 4.19, which reduces installation headaches) and the Ryzen CPU.</p>
<h2>A note about docker</h2>
<p>If you are not a fan of docker, for security or whatever other reason, I don't advise using Ubuntu 19.04. This release ships only python 3.7, and there are, at the moment, <a href="https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/issues/389#issuecomment-485057344" rel="noopener noreferrer">a few issues</a> with running rocm 2.3 under python 3.7. This doesn't seem to be a problem with python 3.5 or 3.6, or with older versions of ROCm. However, the performance improvement in the newer version of ROCm is substantial, so I would instead use a version of Ubuntu where you can downgrade your python version.</p>
<h1>Initial Software Stack</h1>
<p>First, the Debian repository has to be added:</p>
<pre><code>wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list
</code></pre>
<p>Then, the appropriate packages are installed:</p>
<pre><code>sudo apt update
sudo apt install rocm-libs miopen-hip cxlactivitylogger
sudo apt install rocm-dev
</code></pre>
<p>Because we are using the amdgpu driver in the 5.0 kernel that ships with Ubuntu 19.04, we need to add the following udev rule:</p>
<pre><code>echo 'SUBSYSTEM=="kfd", KERNEL=="kfd", TAG+="uaccess", GROUP="video"' | sudo tee /etc/udev/rules.d/70-kfd.rules
</code></pre>
<h1>Docker install</h1>
<p>I added the following alias to my <code>~/.bashrc</code> file to allow for quick launching of the container:</p>
<pre><code>alias drun='sudo docker run -it --network=host \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
--cap-add=SYS_PTRACE \
--security-opt \
seccomp=unconfined \
-v $HOME/dockerx:/dockerx'
</code></pre>
<p>To launch the container, you then simply run <code>drun rocm/tensorflow</code> to drop into your container. The first time you run this, it will pull the images from dockerhub. After that, it will use the cached image.</p>
<p><img src="https://blog.ksteimel.duckdns.org/static/media/C0CF3637-FB2D-B8FC-7582-85258941C92C.jpg" alt="Complete amd computer build, the word RADEON is lit up in red on the side of the gpu and the ram is a vibrant rainbow"></p>
]]>