{"componentChunkName":"component---src-templates-blog-list-template-js","path":"/116","result":{"data":{"allMarkdownRemark":{"edges":[{"node":{"excerpt":"What is STL? STL stands for Standard Template Library. If you've used C++ even in small projects, you've likely already used STL - which is…","fields":{"slug":"/engineering/cpp-stl-containers/"},"html":"<h2 id=\"what-is-stl\" style=\"position:relative;\"><a href=\"#what-is-stl\" aria-label=\"what is stl permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>What is STL?</h2>\n<p>STL stands for Standard Template Library. If you've used C++ even in small projects, you've likely already used STL - which is a great thing! Using STL in C++ makes your code more expressive, simple, and easy to understand. 
This post will give you an overview of how STL works, some examples, and the basic knowledge you need to get started!</p>\n<h2 id=\"stl-components\" style=\"position:relative;\"><a href=\"#stl-components\" aria-label=\"stl components permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>STL Components</h2>\n<p>Loosely speaking, the first components you'd typically understand and use of STL are Containers &#x26; Algorithms. Then there's also Iterators &#x26; Functors, but you should try to take them on one by one, in that order perhaps. This blog will expand a bit on STL Containers.</p>\n<h3 id=\"containers\" style=\"position:relative;\"><a href=\"#containers\" aria-label=\"containers permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Containers</h3>\n<p>Suppose you are at your favorite cinema hall and it's the launch of a big movie. Since it's the launch, there are likely many people waiting in line to buy their tickets. Naturally, you join the queue at the back and wait for your turn. 
In the computing world, we have a queue too! It is a popular data structure. If you've taken a Data Structures class before, you are most likely familiar with some other data structures as well. Because they are used so often, the STL provides ready-made implementations of many of these data structures, known as containers.</p>\n<p>Take arrays, for example. An array is a collection of elements of the same type, stored in a contiguous block of memory. In C++, you can use arrays as you would in C, like this:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"cpp\" data-index=\"0\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk10\">std</span><span class=\"mtk1\">::string </span><span class=\"mtk12\">students</span><span class=\"mtk1\">[</span><span class=\"mtk7\">10</span><span class=\"mtk1\">];</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk12\">students</span><span class=\"mtk1\">[</span><span class=\"mtk7\">0</span><span class=\"mtk1\">] = </span><span class=\"mtk8\">&quot;Adam&quot;</span><span class=\"mtk1\">;</span></span></code></pre>\n<p>But wait, STL provides a container for arrays too. It's available in the header <code>&#x3C;array></code>. 
Example usage looks like this:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"cpp\" data-index=\"1\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk15\">#include</span><span class=\"mtk4\"> </span><span class=\"mtk8\">&lt;array&gt;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">#include</span><span class=\"mtk4\"> </span><span class=\"mtk8\">&lt;iostream&gt;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">#include</span><span class=\"mtk4\"> </span><span class=\"mtk8\">&lt;string&gt;</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">int</span><span class=\"mtk1\"> </span><span class=\"mtk11\">main</span><span class=\"mtk1\">() {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk10\">std</span><span class=\"mtk1\">::array&lt;</span><span class=\"mtk10\">std</span><span class=\"mtk1\">::string, </span><span class=\"mtk7\">10</span><span class=\"mtk1\">&gt; students;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk12\">students</span><span class=\"mtk1\">[</span><span class=\"mtk7\">0</span><span class=\"mtk1\">] = </span><span class=\"mtk8\">&quot;Adam&#39;s Friend&quot;</span><span class=\"mtk1\">;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk10\">std</span><span class=\"mtk1\">::cout &lt;&lt; </span><span class=\"mtk12\">students</span><span class=\"mtk1\">[</span><span class=\"mtk7\">0</span><span class=\"mtk1\">] &lt;&lt; </span><span class=\"mtk10\">std</span><span class=\"mtk1\">::endl;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">}</span></span></code></pre>\n<p>So, what's the difference? Why should you use the STL container class? The performance difference is negligible, but what makes <code>std::array</code> better is that it is a class: a thin wrapper around a normal array that offers advantages such as being passed and returned by value, knowing its own size, and bounds-checked access through <code>at()</code>. 
As a side note, you'd normally want to use an <code>std::vector</code> instead - which is a dynamically resizable array.</p>\n<h4 id=\"other-examples\" style=\"position:relative;\"><a href=\"#other-examples\" aria-label=\"other examples permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Other examples</h4>\n<p>Other containers in C++ include:</p>\n<ul>\n<li>\n<p>Sequence containers: (elements can be accessed sequentially)</p>\n<ul>\n<li><code>std::array</code></li>\n<li>\n<p><code>std::vector</code>: Dynamically resizable arrays.\nArrays have a fixed size, while <code>std::vector</code> doesn't. You can keep adding elements, and they stay in one contiguous block of memory. 
For example:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"cpp\" data-index=\"2\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk10\">std</span><span class=\"mtk1\">::vector&lt;</span><span class=\"mtk10\">std</span><span class=\"mtk1\">::string&gt; students;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk12\">students</span><span class=\"mtk1\">.</span><span class=\"mtk11\">push_back</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;Paul&quot;</span><span class=\"mtk1\">);</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk12\">students</span><span class=\"mtk1\">.</span><span class=\"mtk11\">push_back</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;Jack&quot;</span><span class=\"mtk1\">);</span></span></code></pre>\n<p>Paul &#x26; Jack are stored in contiguous memory blocks, even though we didn't provide a specific limit at the start - as we do with arrays.</p>\n</li>\n<li>\n<p><code>std::forward_list</code>: Singly linked-list.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"\" data-index=\"3\"><code class=\"grvsc-code\"><span class=\"grvsc-line\">A -&gt; B -&gt; C -&gt; D</span></code></pre>\n<p>This is a simple example of linked-lists. Once you're at a point, let's say \"C\" - you can only go forwards.</p>\n</li>\n<li>\n<p><code>std::list</code>: Doubly linked-list.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"\" data-index=\"4\"><code class=\"grvsc-code\"><span class=\"grvsc-line\">A &lt;-&gt; B &lt;-&gt; C &lt;-&gt; D</span></code></pre>\n<p>Above is a simple example of a doubly linked-list. Unlike Singly linked-lists, if you're at \"C\", you can go either way. 
To \"B\" or \"D\".</p>\n</li>\n<li><code>std::deque</code>: Double-ended queues - insertion and removal possible at both ends.</li>\n</ul>\n</li>\n<li>\n<p>Container adaptors: (wrap an underlying container and expose a restricted interface to it)</p>\n<ul>\n<li><code>std::queue</code>: A standard queue, where removals are done from the front, and insertions are done at the end. The queue is a FIFO structure (First In, First Out).</li>\n<li><code>std::priority_queue</code>: A queue in which elements can have varying levels of importance. The ones with the highest importance are at the front, and thus processed first.\nFor example, let's say you're at a barbershop. A person arrives after you. If that person had made a prior appointment with the shop, they would automatically be placed ahead of you with more \"importance\". Priority queues work similarly.</li>\n<li><code>std::stack</code>:\nSuppose you have a pile of 10 books. If you need a book from the middle - of course, like most people, you will pull it right from the middle in one go. Let's think about how a computer would approach this problem. A computer needs step-by-step instructions, so the first thing it would do is remove the book from the top, and continue to do that until it reaches the book it needs. To restore the pile, it would place the books back one by one at the top.\nThus, insertions and removals in a stack are only done at the \"top\". The stack is a LIFO structure (Last In, First Out).</li>\n</ul>\n</li>\n<li>\n<p>Associative containers: (sorted in a specific order, these containers boast search speeds of O(log N))</p>\n<ul>\n<li><code>std::set</code>: Each element inserted into a set is its own identifier, meaning that only unique elements are stored. Each element acts as its own \"key\" - which uniquely identifies it. For example, suppose you are a volunteer entering information for new students joining this semester. They are uniquely identified by their roll numbers. 
If you enter the same student's roll number (the ID) twice into a set, it will be inserted just once - because a \"set\" can only have unique values.</li>\n<li><code>std::multiset</code>: Like a set, but duplicate elements are allowed. Entering the roll number from before twice into an <code>std::multiset</code> will result in it being added twice.</li>\n<li><code>std::map</code>: Take a literal map, for example. When you point to a location on the map, it tells you the details. The \"point\" is the key, and the details of that location are the \"value\". The keys are unique, and values can be anything. An <code>std::map</code> works the same way. Perhaps an example will help you understand it better:</li>\n</ul>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"cpp\" data-index=\"5\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk3\">// This maps a student name to his marks. But is this correct?</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk10\">std</span><span class=\"mtk1\">::map&lt;</span><span class=\"mtk10\">std</span><span class=\"mtk1\">::string, </span><span class=\"mtk4\">int</span><span class=\"mtk1\">&gt; mapOfStudentNameToMarks;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\">// The answer is no. 
A student&#39;s name isn&#39;t necessarily unique, multiple students with the same name will have a clash this way.</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\">// Thus, the correct way would be:</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk10\">std</span><span class=\"mtk1\">::map&lt;</span><span class=\"mtk4\">int</span><span class=\"mtk1\">, </span><span class=\"mtk4\">int</span><span class=\"mtk1\">&gt; mapOfStudentIDToMarks;</span><span class=\"mtk3\"> // This is correct, because StudentIDs are unique!</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk12\">mapOfStudentIDToMarks</span><span class=\"mtk1\">[</span><span class=\"mtk7\">1</span><span class=\"mtk1\">] = </span><span class=\"mtk7\">100</span><span class=\"mtk1\">;</span><span class=\"mtk3\">   // StudentID 1 -&gt; 100 Marks.</span></span></code></pre>\n<ul>\n<li><code>std::multimap</code>: Can you guess what this might do, based on <code>std::multiset</code>?\nMultimap allows multiple elements to have the same key. So, for example, 10 -> 100 &#x26; 10 -> 150 are both valid for a multimap.</li>\n</ul>\n</li>\n<li>\n<p>Unordered Associative containers: (like associative containers, but implemented as hash-tables. 
They offer average-case O(1), i.e., constant-time, lookups.)</p>\n<ul>\n<li><code>std::unordered_set</code></li>\n<li><code>std::unordered_multiset</code></li>\n<li><code>std::unordered_map</code></li>\n<li><code>std::unordered_multimap</code></li>\n</ul>\n</li>\n</ul>\n<h3 id=\"stl---what-next\" style=\"position:relative;\"><a href=\"#stl---what-next\" aria-label=\"stl   what next permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>STL - What next?</h3>\n<p>Containers aren't the only part of STL. It is huge! You are encouraged to read up more on STL, as it can make your life easier in many ways! You don't need to reinvent the wheel. Next, you could try looking up the various algorithms contained in the STL. They're present in the header file <code>&#x3C;algorithm></code>. 
Good luck on your journey!</p>\n<style class=\"grvsc-styles\">\n  .grvsc-container {\n    overflow: auto;\n    -webkit-overflow-scrolling: touch;\n    padding-top: 1rem;\n    padding-top: var(--grvsc-padding-top, var(--grvsc-padding-v, 1rem));\n    padding-bottom: 1rem;\n    padding-bottom: var(--grvsc-padding-bottom, var(--grvsc-padding-v, 1rem));\n    border-radius: 8px;\n    border-radius: var(--grvsc-border-radius, 8px);\n    font-feature-settings: normal;\n  }\n  \n  .grvsc-code {\n    display: inline-block;\n    min-width: 100%;\n  }\n  \n  .grvsc-line {\n    display: inline-block;\n    box-sizing: border-box;\n    width: 100%;\n    padding-left: 1.5rem;\n    padding-left: var(--grvsc-padding-left, var(--grvsc-padding-h, 1.5rem));\n    padding-right: 1.5rem;\n    padding-right: var(--grvsc-padding-right, var(--grvsc-padding-h, 1.5rem));\n  }\n  \n  .grvsc-line-highlighted {\n    background-color: var(--grvsc-line-highlighted-background-color, transparent);\n    box-shadow: inset var(--grvsc-line-highlighted-border-width, 4px) 0 0 0 var(--grvsc-line-highlighted-border-color, transparent);\n  }\n  \n  .dark-default-dark {\n    background-color: #1E1E1E;\n    color: #D4D4D4;\n  }\n  .dark-default-dark .mtk10 { color: #4EC9B0; }\n  .dark-default-dark .mtk1 { color: #D4D4D4; }\n  .dark-default-dark .mtk12 { color: #9CDCFE; }\n  .dark-default-dark .mtk7 { color: #B5CEA8; }\n  .dark-default-dark .mtk8 { color: #CE9178; }\n  .dark-default-dark .mtk15 { color: #C586C0; }\n  .dark-default-dark .mtk4 { color: #569CD6; }\n  .dark-default-dark .mtk11 { color: #DCDCAA; }\n  .dark-default-dark .mtk3 { color: #6A9955; }\n</style>","frontmatter":{"date":"October 13, 2020","updated_date":null,"description":"Learn how Standard Template Library works in C++ with interactive examples and what you need to get started","title":"STL Containers & Data Structures in 
C++","tags":["C++","STL"],"pinned":null,"coverImage":{"childImageSharp":{"fluid":{"aspectRatio":1.5037593984962405,"src":"/static/827c613a0a1ac4b2c67db9a2e91101b6/14b42/cover.jpg","srcSet":"/static/827c613a0a1ac4b2c67db9a2e91101b6/f836f/cover.jpg 200w,\n/static/827c613a0a1ac4b2c67db9a2e91101b6/2244e/cover.jpg 400w,\n/static/827c613a0a1ac4b2c67db9a2e91101b6/14b42/cover.jpg 800w,\n/static/827c613a0a1ac4b2c67db9a2e91101b6/47498/cover.jpg 1200w","sizes":"(max-width: 800px) 100vw, 800px"}}},"author":{"id":"Aryan Rawlani","github":"aryanrawlani28","avatar":null}}}},{"node":{"excerpt":"Before we get into details of finding out optimal clusters, let's first see what the KMeans clustering algorithm is and some basics about it…","fields":{"slug":"/engineering/optimal-clusters-kmeans/"},"html":"<p>Before we get into details of finding out optimal clusters, let's first see what the KMeans clustering algorithm is and some basics about it.</p>\n<h2 id=\"what-is-clustering\" style=\"position:relative;\"><a href=\"#what-is-clustering\" aria-label=\"what is clustering permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>What is Clustering?</h2>\n<p>Clustering is an unsupervised ML technique wherein we cluster the data to get insights from it. Clustering the data is quite essential for some business models and problems. It gives us conclusions on what is a cluster, i.e. 
data which is similar and in the form of clusters or groups.</p>\n<blockquote>\n<p>Clustering is the process of dividing the entire data into groups (also known as clusters) based on the patterns in the data.</p>\n</blockquote>\n<h2 id=\"what-is-the-kmeans-clustering-algorithm\" style=\"position:relative;\"><a href=\"#what-is-the-kmeans-clustering-algorithm\" aria-label=\"what is the kmeans clustering algorithm permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>What is the KMeans clustering algorithm?</h2>\n<p>KMeans is a clustering algorithm that partitions the data into K clusters, assigning each point to the cluster whose centroid (mean) is nearest. 
We will be discussing this method with code in the further sections.</p>\n<h2 id=\"initial-imports-\" style=\"position:relative;\"><a href=\"#initial-imports-\" aria-label=\"initial imports  permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Initial Imports :</h2>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"python\" data-index=\"0\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk15\">import</span><span class=\"mtk1\"> pandas </span><span class=\"mtk15\">as</span><span class=\"mtk1\"> pd</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">import</span><span class=\"mtk1\"> numpy </span><span class=\"mtk15\">as</span><span class=\"mtk1\"> np</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">import</span><span class=\"mtk1\"> matplotlib.pyplot </span><span class=\"mtk15\">as</span><span class=\"mtk1\"> plt</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">from</span><span class=\"mtk1\"> sklearn.cluster </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> KMeans</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">%matplotlib inline</span></span></code></pre>\n<h2 id=\"method-\" style=\"position:relative;\"><a href=\"#method-\" aria-label=\"method  permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 
1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Method :</h2>\n<p>Now let's discuss the method behind finding out the right number of clusters on a K-Means clustering algorithm.\nSo we will learn how to decide what number of clusters to input into your K-Means algorithm.\nHere we've got a data science problem.\nWe've got only two variables, x and y coordinates.</p>\n<p>Now, if we run the K means clustering algorithm on this dataset with three clusters or with K pre-determine the clusters to be three, then the result will look something like this.</p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 700px; \"\n    >\n      <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 56.30769230769231%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAALCAYAAAB/Ca1DAAAACXBIWXMAAA7DAAAOwwHHb6hkAAABl0lEQVQoz21TCZKcMAyc/38uqTwgO7U7FJcB4xNfdCTPMgy1odz4kNxyS/bNbgkM991bz4gnjvWXPcKHRMg/bYTbDmCnX0il9ozjO2w7TvDnt4jVeJSyX2yMW6ZFxrA4RCblDeME34/INCcTDh8mCDHjqxXohIQm0pjyxed2TBa11RDmq4FuWuiPT+RlBQot572Cx4s0EIvBr/uIbpghJgWtHazZqs9JaBO2cYZlsmHCpgx267H7UANVQmpSWnSjhFQO92ZEoLwdQbl/EUodEMSCKBXM7z/IlKfdbUgdSV8UCs/zM4ue1hUFTKFcyPZKyNo58uphHy0c56/tEVaNwqdMb5uobd5BrTPkImhsn5UoePmckk1EnCQskVkxI1L+quS3HDKE6ND0f9F0HzRuSTIVZttQqDgXycolFCIo04xCZKkf3u7OeWeG/oGu/0Tb3RH4hDkjR05HuRI+Bg1Jp9SEVZIsl6FMIMRvBLIlqrCiogiMpMZQIec10PXa/yc51Ffi6AU4Svbz9eQf8GTzsdSefbWLF8J/VKBZvYLdxVoAAAAASUVORK5CYII='); background-size: cover; display: block;\"\n  ></span>\n  <img\n        class=\"gatsby-resp-image-image\"\n        alt=\"initial\"\n        title=\"initial\"\n        
src=\"/static/316f3eaecabb5af0df66b7d9b11d838c/8c557/initial.png\"\n        srcset=\"/static/316f3eaecabb5af0df66b7d9b11d838c/a6d36/initial.png 650w,\n/static/316f3eaecabb5af0df66b7d9b11d838c/8c557/initial.png 700w\"\n        sizes=\"(max-width: 700px) 100vw, 700px\"\n        style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n        loading=\"lazy\"\n      />\n    </span></p>\n<p>We need a specific metric, we need a way to understand or evaluate how a certain number of clusters performs compared to a different number of clusters, and preferably, that metric should be quantifiable.</p>\n<p>So what kind of metric can we impose upon our clustering algorithm that will tell us something about the final result?\nThere is such a metric called the within-cluster sum of squares. (WCSS)</p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 700px; \"\n    >\n      <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 56.30769230769231%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAALCAYAAAB/Ca1DAAAACXBIWXMAAA7DAAAOwwHHb6hkAAABDElEQVQoz6WS227EIAxE8//fWnWTcEkwNlMbQi5spa7UhwnGhsNgMvktY1TYGb/lP9FUADykn5wZb/lhzahenFgKuqTCBMu81lFY8yyq8pBwq11jQTk0GeSEamyLvr9mhECwE0TBJV8bWiw1vqBtDkEDdqi5NkducXCeLsiHOoH9uj5EEGVEH9RhRiZd8R9gUhgRY50X7PrSBrUCWz+7eFCWAXjA7lcOq68LVkfNpRWO3vzp0EYu16MY0B9Ac7YoNEZ1G7UV0drA1RXfJLnB7MDmUCdtbHHcdqREtZ/bTtrbpEATYfUJr2XDrHp1zRsocXNY7BrytF1M96tYWn/24Ly6dxBqm2v93Cs19wNLOGPEhMaFfAAAAABJRU5ErkJggg=='); background-size: cover; display: block;\"\n  ></span>\n  <img\n        class=\"gatsby-resp-image-image\"\n        alt=\"Wcss\"\n        title=\"Wcss\"\n        src=\"/static/8751d08cd81fd9d1fb12a196b73d561a/8c557/Wcss.png\"\n        srcset=\"/static/8751d08cd81fd9d1fb12a196b73d561a/a6d36/Wcss.png 
650w,\n/static/8751d08cd81fd9d1fb12a196b73d561a/8c557/Wcss.png 700w\"\n        sizes=\"(max-width: 700px) 100vw, 700px\"\n        style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n        loading=\"lazy\"\n      />\n    </span></p>\n<p>So you can see here that the WCSS drops from 8000 down to 3000 as we go from one cluster to two; that's a massive change of 5000. Let's just call them units: 5000 units. Then, as we increase the number of clusters from 2 to 3, it drops from 3000 to 1000.</p>\n<p>That is again quite a large drop. From three to four clusters, however, it only goes from 1000 to maybe 800, then from 800 to 600, 600 to 500, and so on. So, as you can see, the first two changes, from one cluster to two and from two to three, created considerable drops in the WCSS; going forward, the WCSS no longer drops substantially. This is our hint for selecting the optimal number of clusters. The method we're going to use is the elbow method, and it is very visual: all you have to do is look at your chart for the bend that looks like an elbow.</p>\n<p>Look for that elbow in your chart, where the drop goes from being quite substantial to being not as significant; that point in your chart gives the optimal number of clusters.</p>\n<p>In this case, it is indeed three clusters.</p>\n<p>That is the optimal number. And as you can imagine, this method is fairly subjective.\nSometimes the elbow is not as evident as in this case, and therefore somebody might pick one number of clusters. 
Someone else might come along and select a different number.</p>\n<h2 id=\"code-\" style=\"position:relative;\"><a href=\"#code-\" aria-label=\"code  permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>CODE :</h2>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"python\" data-index=\"1\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk15\">from</span><span class=\"mtk1\"> sklearn.cluster </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> KMeans</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">wcss = []</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">for</span><span class=\"mtk1\"> i </span><span class=\"mtk4\">in</span><span class=\"mtk1\"> </span><span class=\"mtk11\">range</span><span class=\"mtk1\">(</span><span class=\"mtk7\">1</span><span class=\"mtk1\">, </span><span class=\"mtk7\">11</span><span class=\"mtk1\">):</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  kmeans = KMeans(</span><span class=\"mtk12\">n_clusters</span><span class=\"mtk1\"> = i, </span><span class=\"mtk12\">init</span><span class=\"mtk1\"> = </span><span class=\"mtk8\">&#39;k-means++&#39;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">random_state</span><span class=\"mtk1\"> = </span><span class=\"mtk7\">42</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  
kmeans.fit(X)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  </span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  wcss.append(kmeans.inertia_)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">plt.plot(</span><span class=\"mtk11\">range</span><span class=\"mtk1\">(</span><span class=\"mtk7\">1</span><span class=\"mtk1\">, </span><span class=\"mtk7\">11</span><span class=\"mtk1\">), wcss)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">plt.title(</span><span class=\"mtk8\">&#39;The Elbow Method&#39;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">plt.xlabel(</span><span class=\"mtk8\">&#39;Number of clusters&#39;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">plt.ylabel(</span><span class=\"mtk8\">&#39;WCSS&#39;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">plt.show()</span></span></code></pre>\n<style class=\"grvsc-styles\">\n  .grvsc-container {\n    overflow: auto;\n    -webkit-overflow-scrolling: touch;\n    padding-top: 1rem;\n    padding-top: var(--grvsc-padding-top, var(--grvsc-padding-v, 1rem));\n    padding-bottom: 1rem;\n    padding-bottom: var(--grvsc-padding-bottom, var(--grvsc-padding-v, 1rem));\n    border-radius: 8px;\n    border-radius: var(--grvsc-border-radius, 8px);\n    font-feature-settings: normal;\n  }\n  \n  .grvsc-code {\n    display: inline-block;\n    min-width: 100%;\n  }\n  \n  .grvsc-line {\n    display: inline-block;\n    box-sizing: border-box;\n    width: 100%;\n    padding-left: 1.5rem;\n    padding-left: var(--grvsc-padding-left, var(--grvsc-padding-h, 1.5rem));\n    padding-right: 1.5rem;\n    padding-right: var(--grvsc-padding-right, var(--grvsc-padding-h, 
1.5rem));\n  }\n  \n  .grvsc-line-highlighted {\n    background-color: var(--grvsc-line-highlighted-background-color, transparent);\n    box-shadow: inset var(--grvsc-line-highlighted-border-width, 4px) 0 0 0 var(--grvsc-line-highlighted-border-color, transparent);\n  }\n  \n  .dark-default-dark {\n    background-color: #1E1E1E;\n    color: #D4D4D4;\n  }\n  .dark-default-dark .mtk15 { color: #C586C0; }\n  .dark-default-dark .mtk1 { color: #D4D4D4; }\n  .dark-default-dark .mtk4 { color: #569CD6; }\n  .dark-default-dark .mtk11 { color: #DCDCAA; }\n  .dark-default-dark .mtk7 { color: #B5CEA8; }\n  .dark-default-dark .mtk12 { color: #9CDCFE; }\n  .dark-default-dark .mtk8 { color: #CE9178; }\n</style>","frontmatter":{"date":"October 12, 2020","updated_date":null,"description":null,"title":"Optimal clusters for KMeans Algorithm","tags":["Machine Learning"],"pinned":null,"coverImage":{"childImageSharp":{"fluid":{"aspectRatio":1.5037593984962405,"src":"/static/68c5b281737075fa7f3065e67a5906d6/14b42/cover.jpg","srcSet":"/static/68c5b281737075fa7f3065e67a5906d6/f836f/cover.jpg 200w,\n/static/68c5b281737075fa7f3065e67a5906d6/2244e/cover.jpg 400w,\n/static/68c5b281737075fa7f3065e67a5906d6/14b42/cover.jpg 800w,\n/static/68c5b281737075fa7f3065e67a5906d6/8c2d7/cover.jpg 1192w","sizes":"(max-width: 800px) 100vw, 800px"}}},"author":{"id":"Neeraj Ap","github":"NEERAJAP2001","avatar":null}}}},{"node":{"excerpt":"Introduction When building APIs, the need to upload files is expected, which can be images, text documents, scripts, pdfs, among others. 
In…","fields":{"slug":"/engineering/upload-files-with-node-and-multer/"},"html":"<h1 id=\"introduction\" style=\"position:relative;\"><a href=\"#introduction\" aria-label=\"introduction permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Introduction</h1>\n<p>When building APIs, you will often need to accept file uploads such as images, text documents, scripts, and PDFs. Implementing this functionality raises several concerns, such as the number of files, the valid file types, and the size of each file. To handle these concerns, we have the <a href=\"https://github.com/expressjs/multer\">Multer</a> library. 
Multer is a Node.js middleware for handling <code>multipart/form-data</code>, the encoding used to send files in forms.</p>\n<h1 id=\"first-steps\" style=\"position:relative;\"><a href=\"#first-steps\" aria-label=\"first steps permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>First steps</h1>\n<p>The first step is to create a Node.js project on your computer.</p>\n<h1 id=\"adding-express\" style=\"position:relative;\"><a href=\"#adding-express\" aria-label=\"adding express permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Adding Express</h1>\n<p>In your terminal, type the following command:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"jsx\" data-index=\"0\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk12\">yarn</span><span class=\"mtk1\"> </span><span class=\"mtk12\">add</span><span class=\"mtk1\"> </span><span class=\"mtk12\">express</span></span></code></pre>\n<p>* <em>You can also use npm for installation</em></p>\n<p>Create a file named <code>app.js</code> inside the 
<code>src/</code> folder. The next step is to start our Express server in <code>app.js</code>:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"js\" data-index=\"1\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">express</span><span class=\"mtk1\"> = </span><span class=\"mtk11\">require</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;express&quot;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">app</span><span class=\"mtk1\"> = </span><span class=\"mtk11\">express</span><span class=\"mtk1\">()</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk12\">app</span><span class=\"mtk1\">.</span><span class=\"mtk11\">listen</span><span class=\"mtk1\">(</span><span class=\"mtk12\">process</span><span class=\"mtk1\">.</span><span class=\"mtk12\">env</span><span class=\"mtk1\">.</span><span class=\"mtk12\">PORT</span><span class=\"mtk1\"> || </span><span class=\"mtk7\">3000</span><span class=\"mtk1\">, () </span><span class=\"mtk4\">=&gt;</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  </span><span class=\"mtk10\">console</span><span class=\"mtk1\">.</span><span class=\"mtk11\">log</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;Server on...&quot;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">})</span></span></code></pre>\n<h1 id=\"adding-multer\" style=\"position:relative;\"><a href=\"#adding-multer\" aria-label=\"adding multer permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 
3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Adding Multer</h1>\n<p>With the project created and configured and Express installed, we will add Multer to our project.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"js\" data-index=\"2\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk12\">yarn</span><span class=\"mtk1\"> </span><span class=\"mtk12\">add</span><span class=\"mtk1\"> </span><span class=\"mtk12\">multer</span></span></code></pre>\n<p>The next step is to import Multer into our <code>app.js</code> file.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"jsx\" data-index=\"3\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">multer</span><span class=\"mtk1\"> = </span><span class=\"mtk11\">require</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;multer&quot;</span><span class=\"mtk1\">)</span></span></code></pre>\n<p>We are almost there. 
Now create a folder called <code>uploads</code> where we will store the uploaded files.</p>\n<h1 id=\"configuring-and-validating-the-upload\" style=\"position:relative;\"><a href=\"#configuring-and-validating-the-upload\" aria-label=\"configuring and validating the upload permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Configuring and validating the upload</h1>\n<p>Now we are at a very important stage, which is the configuration of <code>diskStorage</code>. <code>diskStorage</code> is a storage engine made available by multer in which we configure the destination of the file and the name of the file. Validations on the type of the file can also be added, through multer's separate <code>fileFilter</code> option. These settings depend on the needs of your project. 
Below I will leave an elementary example of the configuration.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"js\" data-index=\"4\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">storage</span><span class=\"mtk1\"> = </span><span class=\"mtk12\">multer</span><span class=\"mtk1\">.</span><span class=\"mtk11\">diskStorage</span><span class=\"mtk1\">({</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  </span><span class=\"mtk11\">destination</span><span class=\"mtk12\">:</span><span class=\"mtk1\"> (</span><span class=\"mtk12\">req</span><span class=\"mtk1\">, </span><span class=\"mtk12\">file</span><span class=\"mtk1\">, </span><span class=\"mtk12\">cb</span><span class=\"mtk1\">) </span><span class=\"mtk4\">=&gt;</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk11\">cb</span><span class=\"mtk1\">(</span><span class=\"mtk4\">null</span><span class=\"mtk1\">, </span><span class=\"mtk8\">&quot;uploads/&quot;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  },</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  </span><span class=\"mtk11\">filename</span><span class=\"mtk12\">:</span><span class=\"mtk1\"> (</span><span class=\"mtk12\">req</span><span class=\"mtk1\">, </span><span class=\"mtk12\">file</span><span class=\"mtk1\">, </span><span class=\"mtk12\">cb</span><span class=\"mtk1\">) </span><span class=\"mtk4\">=&gt;</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk11\">cb</span><span class=\"mtk1\">(</span><span class=\"mtk4\">null</span><span class=\"mtk1\">, </span><span class=\"mtk10\">Date</span><span class=\"mtk1\">.</span><span class=\"mtk11\">now</span><span class=\"mtk1\">() + </span><span class=\"mtk8\">&quot;-&quot;</span><span 
class=\"mtk1\"> + </span><span class=\"mtk12\">file</span><span class=\"mtk1\">.</span><span class=\"mtk12\">originalname</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  },</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">})</span></span></code></pre>\n<p>In the configuration above, we set the destination for the uploaded files and prefix each file name with the current timestamp, which makes name collisions between uploads less likely.</p>\n<h1 id=\"providing-an-upload-route\" style=\"position:relative;\"><a href=\"#providing-an-upload-route\" aria-label=\"providing an upload route permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Providing an upload route</h1>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"js\" data-index=\"5\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">uploadStorage</span><span class=\"mtk1\"> = </span><span class=\"mtk11\">multer</span><span class=\"mtk1\">({ </span><span class=\"mtk12\">storage:</span><span class=\"mtk1\"> </span><span class=\"mtk12\">storage</span><span class=\"mtk1\"> })</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\">// Single file</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk12\">app</span><span class=\"mtk1\">.</span><span class=\"mtk11\">post</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;/upload/single&quot;</span><span class=\"mtk1\">, </span><span 
class=\"mtk12\">uploadStorage</span><span class=\"mtk1\">.</span><span class=\"mtk11\">single</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;file&quot;</span><span class=\"mtk1\">), (</span><span class=\"mtk12\">req</span><span class=\"mtk1\">, </span><span class=\"mtk12\">res</span><span class=\"mtk1\">) </span><span class=\"mtk4\">=&gt;</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  </span><span class=\"mtk10\">console</span><span class=\"mtk1\">.</span><span class=\"mtk11\">log</span><span class=\"mtk1\">(</span><span class=\"mtk12\">req</span><span class=\"mtk1\">.</span><span class=\"mtk12\">file</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  </span><span class=\"mtk15\">return</span><span class=\"mtk1\"> </span><span class=\"mtk12\">res</span><span class=\"mtk1\">.</span><span class=\"mtk11\">send</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;Single file&quot;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">})</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\">//Multiple files</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk12\">app</span><span class=\"mtk1\">.</span><span class=\"mtk11\">post</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;/upload/multiple&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">uploadStorage</span><span class=\"mtk1\">.</span><span class=\"mtk11\">array</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;file&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk7\">10</span><span class=\"mtk1\">), (</span><span class=\"mtk12\">req</span><span class=\"mtk1\">, </span><span class=\"mtk12\">res</span><span class=\"mtk1\">) </span><span class=\"mtk4\">=&gt;</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  </span><span 
class=\"mtk10\">console</span><span class=\"mtk1\">.</span><span class=\"mtk11\">log</span><span class=\"mtk1\">(</span><span class=\"mtk12\">req</span><span class=\"mtk1\">.</span><span class=\"mtk12\">files</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  </span><span class=\"mtk15\">return</span><span class=\"mtk1\"> </span><span class=\"mtk12\">res</span><span class=\"mtk1\">.</span><span class=\"mtk11\">send</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;Multiple files&quot;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">})</span></span></code></pre>\n<p>In the code snippet above, we created two POST routes for sending files. The first route, <code>/upload/single</code>, receives only a single file. Note that the <code>uploadStorage</code> variable receives our <code>diskStorage</code> settings and, as a middleware in the route, calls the <code>single</code> method for uploading a single file. The <code>/upload/multiple</code> route receives several files, with a maximum limit of 10. Here the <code>uploadStorage</code> variable calls the <code>array</code> method for uploading multiple files.</p>\n<h1 id=\"the-end\" style=\"position:relative;\"><a href=\"#the-end\" aria-label=\"the end permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>The end</h1>\n<p>With all the settings done, our little API is already able to store the files sent.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"js\" 
data-index=\"6\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">express</span><span class=\"mtk1\"> = </span><span class=\"mtk11\">require</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;express&quot;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">multer</span><span class=\"mtk1\"> = </span><span class=\"mtk11\">require</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;multer&quot;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">app</span><span class=\"mtk1\"> = </span><span class=\"mtk11\">express</span><span class=\"mtk1\">()</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">storage</span><span class=\"mtk1\"> = </span><span class=\"mtk12\">multer</span><span class=\"mtk1\">.</span><span class=\"mtk11\">diskStorage</span><span class=\"mtk1\">({</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  </span><span class=\"mtk11\">destination</span><span class=\"mtk12\">:</span><span class=\"mtk1\"> (</span><span class=\"mtk12\">req</span><span class=\"mtk1\">, </span><span class=\"mtk12\">file</span><span class=\"mtk1\">, </span><span class=\"mtk12\">cb</span><span class=\"mtk1\">) </span><span class=\"mtk4\">=&gt;</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk11\">cb</span><span class=\"mtk1\">(</span><span class=\"mtk4\">null</span><span class=\"mtk1\">, </span><span class=\"mtk8\">&quot;uploads/&quot;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  
},</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  </span><span class=\"mtk11\">filename</span><span class=\"mtk12\">:</span><span class=\"mtk1\"> (</span><span class=\"mtk12\">req</span><span class=\"mtk1\">, </span><span class=\"mtk12\">file</span><span class=\"mtk1\">, </span><span class=\"mtk12\">cb</span><span class=\"mtk1\">) </span><span class=\"mtk4\">=&gt;</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk11\">cb</span><span class=\"mtk1\">(</span><span class=\"mtk4\">null</span><span class=\"mtk1\">, </span><span class=\"mtk10\">Date</span><span class=\"mtk1\">.</span><span class=\"mtk11\">now</span><span class=\"mtk1\">() + </span><span class=\"mtk8\">&quot;-&quot;</span><span class=\"mtk1\"> + </span><span class=\"mtk12\">file</span><span class=\"mtk1\">.</span><span class=\"mtk12\">originalname</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  },</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">})</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">uploadStorage</span><span class=\"mtk1\"> = </span><span class=\"mtk11\">multer</span><span class=\"mtk1\">({ </span><span class=\"mtk12\">storage:</span><span class=\"mtk1\"> </span><span class=\"mtk12\">storage</span><span class=\"mtk1\"> })</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\">// Single file</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk12\">app</span><span class=\"mtk1\">.</span><span class=\"mtk11\">post</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;/upload/single&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">uploadStorage</span><span class=\"mtk1\">.</span><span class=\"mtk11\">single</span><span class=\"mtk1\">(</span><span 
class=\"mtk8\">&quot;file&quot;</span><span class=\"mtk1\">), (</span><span class=\"mtk12\">req</span><span class=\"mtk1\">, </span><span class=\"mtk12\">res</span><span class=\"mtk1\">) </span><span class=\"mtk4\">=&gt;</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  </span><span class=\"mtk10\">console</span><span class=\"mtk1\">.</span><span class=\"mtk11\">log</span><span class=\"mtk1\">(</span><span class=\"mtk12\">req</span><span class=\"mtk1\">.</span><span class=\"mtk12\">file</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  </span><span class=\"mtk15\">return</span><span class=\"mtk1\"> </span><span class=\"mtk12\">res</span><span class=\"mtk1\">.</span><span class=\"mtk11\">send</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;Single file&quot;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">})</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\">//Multiple files</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk12\">app</span><span class=\"mtk1\">.</span><span class=\"mtk11\">post</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;/upload/multiple&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">uploadStorage</span><span class=\"mtk1\">.</span><span class=\"mtk11\">array</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;file&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk7\">10</span><span class=\"mtk1\">), (</span><span class=\"mtk12\">req</span><span class=\"mtk1\">, </span><span class=\"mtk12\">res</span><span class=\"mtk1\">) </span><span class=\"mtk4\">=&gt;</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  </span><span class=\"mtk10\">console</span><span class=\"mtk1\">.</span><span class=\"mtk11\">log</span><span class=\"mtk1\">(</span><span class=\"mtk12\">req</span><span 
class=\"mtk1\">.</span><span class=\"mtk12\">files</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  </span><span class=\"mtk15\">return</span><span class=\"mtk1\"> </span><span class=\"mtk12\">res</span><span class=\"mtk1\">.</span><span class=\"mtk11\">send</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;Multiple files&quot;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">})</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk12\">app</span><span class=\"mtk1\">.</span><span class=\"mtk11\">listen</span><span class=\"mtk1\">(</span><span class=\"mtk12\">process</span><span class=\"mtk1\">.</span><span class=\"mtk12\">env</span><span class=\"mtk1\">.</span><span class=\"mtk12\">PORT</span><span class=\"mtk1\"> || </span><span class=\"mtk7\">3000</span><span class=\"mtk1\">, () </span><span class=\"mtk4\">=&gt;</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  </span><span class=\"mtk10\">console</span><span class=\"mtk1\">.</span><span class=\"mtk11\">log</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;Server on...&quot;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">})</span></span></code></pre>\n<p>Now it's up to you!</p>\n<style class=\"grvsc-styles\">\n  .grvsc-container {\n    overflow: auto;\n    -webkit-overflow-scrolling: touch;\n    padding-top: 1rem;\n    padding-top: var(--grvsc-padding-top, var(--grvsc-padding-v, 1rem));\n    padding-bottom: 1rem;\n    padding-bottom: var(--grvsc-padding-bottom, var(--grvsc-padding-v, 1rem));\n    border-radius: 8px;\n    border-radius: var(--grvsc-border-radius, 8px);\n    font-feature-settings: normal;\n  }\n  \n  .grvsc-code {\n    display: inline-block;\n    min-width: 100%;\n  }\n  \n  .grvsc-line {\n    display: inline-block;\n    box-sizing: border-box;\n    width: 
100%;\n    padding-left: 1.5rem;\n    padding-left: var(--grvsc-padding-left, var(--grvsc-padding-h, 1.5rem));\n    padding-right: 1.5rem;\n    padding-right: var(--grvsc-padding-right, var(--grvsc-padding-h, 1.5rem));\n  }\n  \n  .grvsc-line-highlighted {\n    background-color: var(--grvsc-line-highlighted-background-color, transparent);\n    box-shadow: inset var(--grvsc-line-highlighted-border-width, 4px) 0 0 0 var(--grvsc-line-highlighted-border-color, transparent);\n  }\n  \n  .dark-default-dark {\n    background-color: #1E1E1E;\n    color: #D4D4D4;\n  }\n  .dark-default-dark .mtk12 { color: #9CDCFE; }\n  .dark-default-dark .mtk1 { color: #D4D4D4; }\n  .dark-default-dark .mtk4 { color: #569CD6; }\n  .dark-default-dark .mtk11 { color: #DCDCAA; }\n  .dark-default-dark .mtk8 { color: #CE9178; }\n  .dark-default-dark .mtk7 { color: #B5CEA8; }\n  .dark-default-dark .mtk10 { color: #4EC9B0; }\n  .dark-default-dark .mtk3 { color: #6A9955; }\n  .dark-default-dark .mtk15 { color: #C586C0; }\n</style>","frontmatter":{"date":"October 12, 2020","updated_date":null,"description":"Learn how to upload files in a NodeJS application using Multer, Multer is a middleware for handling multipart/form-data that is used to send files in forms.","title":"Upload files using NodeJS + Multer","tags":["NodeJs","Express","Multer"],"pinned":null,"coverImage":{"childImageSharp":{"fluid":{"aspectRatio":1.5037593984962405,"src":"/static/49a3115a8c11e7fd9aca612e846c5936/ee604/node-multer-upload.png","srcSet":"/static/49a3115a8c11e7fd9aca612e846c5936/69585/node-multer-upload.png 200w,\n/static/49a3115a8c11e7fd9aca612e846c5936/497c6/node-multer-upload.png 400w,\n/static/49a3115a8c11e7fd9aca612e846c5936/ee604/node-multer-upload.png 800w,\n/static/49a3115a8c11e7fd9aca612e846c5936/f3583/node-multer-upload.png 1200w","sizes":"(max-width: 800px) 100vw, 800px"}}},"author":{"id":"Gabriel Rabelo","github":"gabrielrab","avatar":null}}}},{"node":{"excerpt":"Introduction Learning Deep Features for 
Discriminative Localization: Machine learning and Deep learning are gaining traction in today’s…","fields":{"slug":"/engineering/class-activation-mapping/"},"html":"<h3 id=\"introduction\" style=\"position:relative;\"><a href=\"#introduction\" aria-label=\"introduction permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Introduction</h3>\n<h4 id=\"learning-deep-features-for-discriminative-localization\" style=\"position:relative;\"><a href=\"#learning-deep-features-for-discriminative-localization\" aria-label=\"learning deep features for discriminative localization permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Learning Deep Features for Discriminative Localization:</h4>\n<p>Machine learning and Deep learning are gaining traction in today’s world and are making significant and unimaginable progress in almost every industry. 
However, as these algorithms grow in complexity and accuracy, their interpretability is at stake, especially for deep learning models, which can have more than a million parameters. Class Activation Mapping (CAM) is one technique that helps improve the interpretability of such complex models.</p>\n<h3 id=\"class-activation-mapping-cams\" style=\"position:relative;\"><a href=\"#class-activation-mapping-cams\" aria-label=\"class activation mapping cams permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Class Activation Mapping (CAMs)</h3>\n<p>For a particular class (or category), class activation mapping indicates the discriminative region of the image that influenced the deep learning model's decision. The architecture is very similar to a convolutional neural network. It comprises several convolution layers, with the layer just before the final output performing Global Average Pooling. The resulting features are fed into a fully connected layer with a softmax activation function, which outputs the required class probabilities. The importance of each image region for a category can be found by projecting the weights of this output layer back onto the last convolution layer’s feature maps. 
</p>\n<h3 id=\"global-average-pooling-gap-vs-global-max-pooling-gmp\" style=\"position:relative;\"><a href=\"#global-average-pooling-gap-vs-global-max-pooling-gmp\" aria-label=\"global average pooling gap vs global max pooling gmp permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Global Average Pooling (GAP) vs Global Max Pooling (GMP)</h3>\n<p>Global Average Pooling (GAP) is preferred over Global Max Pooling (GMP) because GAP helps identify the full extent of the object, whereas a GMP layer identifies only one discriminative part. In Global Average Pooling, each activation map is averaged over all of its spatial locations, so every discriminative region contributes to the result. In contrast, Global Max Pooling considers only the most discriminative region. 
Hence, Global Average Pooling showed better results than Global Max Pooling.</p>\n<h3 id=\"mathematical-equations-governing-cams\" style=\"position:relative;\"><a href=\"#mathematical-equations-governing-cams\" aria-label=\"mathematical equations governing cams permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Mathematical equations governing CAMs</h3>\n<p>Let <img src=\"https://latex.codecogs.com/png.latex?f_%7Bk%7D%28x%2Cy%29\" alt=\"Equation 1\"> be the activation of unit <img src=\"https://latex.codecogs.com/png.latex?k\" alt=\"Equation 2\"> in the last convolutional layer at spatial location <img src=\"https://latex.codecogs.com/png.latex?%28x%2Cy%29\" alt=\"Equation 3\">.</p>\n<p><em>The result of GAP is represented as:</em></p>\n<p><img src=\"https://latex.codecogs.com/png.latex?F_%7Bk%7D%3D%20%5Csum_%7Bx%2Cy%7Df_%7Bk%7D%28x%2Cy%29\" alt=\"Equation 4\"></p>\n<p><em>For a class c, the input to the softmax will be:</em></p>\n<p><img src=\"https://latex.codecogs.com/png.latex?S_%7Bc%7D%3D%20%5Csum_%7Bk%7Dw%5E%7Bc%7D_%7Bk%7DF_%7Bk%7D\" alt=\"Equation 5\"></p>\n<p><em>Output of the softmax layer:</em></p>\n<p><img src=\"https://latex.codecogs.com/png.latex?P_c%3D%20%5Cfrac%7Be%5E%7BS_c%7D%7D%7B%5Csum_ce%5E%7BS_c%7D%7D\" alt=\"Equation 6\"></p>\n<p>Substituting the GAP result into the softmax input and swapping the order of the two sums expresses the class score as a sum over spatial locations. Thus, <strong>the final equation</strong> for an activation map of class c would be:</p>\n<p><img src=\"https://latex.codecogs.com/png.latex?M_%7Bc%7D%28x%2Cy%29%3D%5Csum_%7Bk%7Dw%5E%7Bc%7D_%7Bk%7Df_%7Bk%7D%28x%2Cy%29\" alt=\"Equation 7\">  </p>\n<h3 
id=\"weakly-supervised-object-localization\" style=\"position:relative;\"><a href=\"#weakly-supervised-object-localization\" aria-label=\"weakly supervised object localization permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Weakly-supervised Object Localization</h3>\n<p>The localization ability of the CAM method was put to the test on the ILSVRC 2014 benchmark dataset. The CAM technique was applied to popular CNN models like AlexNet, VGGNet and GoogLeNet by modifying their architectures and fitting a GAP layer (as in the CAM architecture) towards the end. 
These modified models gave far better discriminative localization with the GAP layer than their original architectures.</p>\n<h3 id=\"deep-features-for-generic-localization\" style=\"position:relative;\"><a href=\"#deep-features-for-generic-localization\" aria-label=\"deep features for generic localization permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Deep Features for Generic Localization</h3>\n<p>After applying the CAM architecture to fine-grained recognition and pattern discovery (such as discovering informative objects in scenes, concept localization in weakly labelled images, weakly supervised text detection and interpreting visual question answering), we can infer that feature capture and localization were far more accurate in the CAM-based GAP layer architecture, as the complete extent of the features was captured.</p>\n<p><em>Visualizing class-specific units:-</em></p>\n<p>When we use the GAP layer and the ranked softmax weights, we can directly visualize the units that are the most discriminative for a particular class. Thus, the CNN effectively learns a bag of words, where each word is a discriminative class-specific unit. 
A combination of these class-specific units helps to guide CNNs in classifying each image.</p>\n<h3 id=\"conclusion\" style=\"position:relative;\"><a href=\"#conclusion\" aria-label=\"conclusion permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Conclusion:</h3>\n<p>CAMs are a great technique to interpret the information from the CNN models. However, the disadvantage of CAMs is that they can be noisy, and there might be some loss of spatial information. Hence, the Grad-CAM architecture and the Score-CAM architecture were built upon the CAM architecture to improve the accuracy, feature capturing and retain precise spatial information.</p>\n<style class=\"grvsc-styles\">\n  .grvsc-container {\n    overflow: auto;\n    -webkit-overflow-scrolling: touch;\n    padding-top: 1rem;\n    padding-top: var(--grvsc-padding-top, var(--grvsc-padding-v, 1rem));\n    padding-bottom: 1rem;\n    padding-bottom: var(--grvsc-padding-bottom, var(--grvsc-padding-v, 1rem));\n    border-radius: 8px;\n    border-radius: var(--grvsc-border-radius, 8px);\n    font-feature-settings: normal;\n  }\n  \n  .grvsc-code {\n    display: inline-block;\n    min-width: 100%;\n  }\n  \n  .grvsc-line {\n    display: inline-block;\n    box-sizing: border-box;\n    width: 100%;\n    padding-left: 1.5rem;\n    padding-left: var(--grvsc-padding-left, var(--grvsc-padding-h, 1.5rem));\n    padding-right: 1.5rem;\n    padding-right: var(--grvsc-padding-right, var(--grvsc-padding-h, 1.5rem));\n  }\n  \n  .grvsc-line-highlighted {\n    
background-color: var(--grvsc-line-highlighted-background-color, transparent);\n    box-shadow: inset var(--grvsc-line-highlighted-border-width, 4px) 0 0 0 var(--grvsc-line-highlighted-border-color, transparent);\n  }\n  \n</style>","frontmatter":{"date":"October 10, 2020","updated_date":null,"description":"Learn about the importance of the explainability of deep learning models and Class Activation Map Technique","title":"Class Activation Mapping in Deep Learning","tags":["Explainable AI","Deep Learning","CNN","Machine Learning"],"pinned":null,"coverImage":{"childImageSharp":{"fluid":{"aspectRatio":1.5037593984962405,"src":"/static/899d533ed44a50ce8e5dbf8103a0d717/ee604/Cover.png","srcSet":"/static/899d533ed44a50ce8e5dbf8103a0d717/69585/Cover.png 200w,\n/static/899d533ed44a50ce8e5dbf8103a0d717/497c6/Cover.png 400w,\n/static/899d533ed44a50ce8e5dbf8103a0d717/ee604/Cover.png 800w,\n/static/899d533ed44a50ce8e5dbf8103a0d717/f3583/Cover.png 1200w","sizes":"(max-width: 800px) 100vw, 800px"}}},"author":{"id":"Ankit Choraria","github":"Ankit810","avatar":null}}}},{"node":{"excerpt":"What is data enrichment? 
and its importance Data enrichment is the process of combining first-party data from internal sources with…","fields":{"slug":"/engineering/full-data-science-pipeline-implementation/"},"html":"<h2 id=\"what-is-data-enrichment-and-its-importance\" style=\"position:relative;\"><a href=\"#what-is-data-enrichment-and-its-importance\" aria-label=\"what is data enrichment and its importance permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>What is data enrichment and why is it important?</h2>\n<p>Data enrichment is the process of combining first-party data from internal sources with disparate data from other internal systems or third-party data from external sources.</p>\n<p>Usually, the data available from clients or stakeholders is not enough to solve the given problem statement. For example, if a client in the mutual fund industry wants a recommendation engine, the data they usually have is old purchase data. That alone is not enough, because client behaviour changes with time and is affected by present market conditions, oil prices, etc., 
which need to be incorporated in the model to make it effective.</p>\n<p>The code for this tutorial is available <a href=\"https://github.com/LoginRadius/engineering-blog-samples/tree/master/Data_Science/Full_DataScience_Pipeline_Implementation\">here</a>.</p>\n<p><strong>The whole process is divided into four phases:</strong></p>\n<p>I have implemented a full data science pipeline, from scraping data from the web to implementing ML and NLP classification.</p>\n<ul>\n<li>Phase I:</li>\n</ul>\n<p>Here I scraped data from the IMDB website (imdb.py).</p>\n<ul>\n<li>Phase II:</li>\n</ul>\n<p>I implemented simple ML regression on the data (ml_imdb.py).</p>\n<ul>\n<li>Phase III:</li>\n</ul>\n<p>I prepared the data for NLP classification (multilabel_prep.py).</p>\n<ul>\n<li>Phase IV:</li>\n</ul>\n<p>I implemented a multilabel NLP classifier using various techniques such as classifier chains (multilabel_nlp_classifier.ipynb).</p>\n<h2 id=\"what-is-web-scraping\" style=\"position:relative;\"><a href=\"#what-is-web-scraping\" aria-label=\"what is web scraping permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>What is web scraping?</h2>\n<p>Web scraping is the process of extracting and parsing raw data from the web. It helps data scientists enrich their data and is an efficient technique for data collection.</p>\n<p>This world is full of data, but unfortunately, most of it is not in a usable form. 
Data is like crude oil: it comes in a raw, unstructured form. For a data scientist or engineer, the first challenge is to make the data ready for model consumption, which takes the majority of the time; this whole process is collectively known as data preprocessing.</p>\n<p>HTML is the primary markup language and the base framework of almost all websites, so it is necessary to know it to perform web scraping.</p>\n<p>Here we will start by requesting the web page using the Python package requests.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"python\" data-index=\"0\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">  </span><span class=\"mtk15\">from</span><span class=\"mtk1\"> requests </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> get</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  url = </span><span class=\"mtk8\">&#39;http://www.imdb.com/search/title?release_date=2017&sort=num_votes,desc&page=1&#39;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  response = get(url)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  </span><span class=\"mtk11\">print</span><span class=\"mtk1\">(</span><span class=\"mtk11\">len</span><span class=\"mtk1\">(response.text))</span></span></code></pre>\n<p>The whole web page is now stored in the variable response.\nThen we parse the web page using the beautifulsoup package.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"python\" data-index=\"1\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">  </span><span class=\"mtk15\">from</span><span class=\"mtk1\"> bs4 </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> BeautifulSoup</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  html_soup = BeautifulSoup(response.text, </span><span class=\"mtk8\">&#39;html.parser&#39;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span 
class=\"mtk1\">  </span><span class=\"mtk10\">type</span><span class=\"mtk1\">(html_soup)</span></span></code></pre>\n<p>Then I will store all the div with the class named lister-item mode-advanced in variable movie_containers.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"python\" data-index=\"2\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">movie_containers = html_soup.find_all(</span><span class=\"mtk8\">&#39;div&#39;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">class_</span><span class=\"mtk1\"> = </span><span class=\"mtk8\">&#39;lister-item mode-advanced&#39;</span><span class=\"mtk1\">)</span></span></code></pre>\n<p>Then I iterate through this object and store the information in lists to make my final DataFrame, using simple for loops.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"python\" data-index=\"3\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk3\"># Lists to store the scraped data in</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">names = []</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">years = []</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">imdb_ratings = []</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">metascores = []</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">votes = []</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\">#gross=[] #many movies have no record</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">movie_description=[]</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">movie_duration=[]</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">movie_genre=[]</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># Extract data from individual movie container</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">for</span><span class=\"mtk1\"> container </span><span 
class=\"mtk4\">in</span><span class=\"mtk1\"> movie_containers:</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># If the movie has Metascore, then extract:</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk15\">if</span><span class=\"mtk1\"> container.find(</span><span class=\"mtk8\">&#39;div&#39;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">class_</span><span class=\"mtk1\"> = </span><span class=\"mtk8\">&#39;ratings-metascore&#39;</span><span class=\"mtk1\">) </span><span class=\"mtk4\">is</span><span class=\"mtk1\"> </span><span class=\"mtk4\">not</span><span class=\"mtk1\"> </span><span class=\"mtk4\">None</span><span class=\"mtk1\">:</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># The name</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        name = container.h3.a.text</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        names.append(name)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># The year</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        year = container.h3.find(</span><span class=\"mtk8\">&#39;span&#39;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">class_</span><span class=\"mtk1\"> = </span><span class=\"mtk8\">&#39;lister-item-year&#39;</span><span class=\"mtk1\">).text</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        years.append(year)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># The IMDB rating</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        imdb = </span><span class=\"mtk10\">float</span><span class=\"mtk1\">(container.strong.text)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        imdb_ratings.append(imdb)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># The Metascore</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        m_score = container.find(</span><span 
class=\"mtk8\">&#39;span&#39;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">class_</span><span class=\"mtk1\"> = </span><span class=\"mtk8\">&#39;metascore&#39;</span><span class=\"mtk1\">).text</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        metascores.append(</span><span class=\"mtk10\">int</span><span class=\"mtk1\">(m_score))</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># The number of votes</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        vote = container.find(</span><span class=\"mtk8\">&#39;span&#39;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">attrs</span><span class=\"mtk1\"> = {</span><span class=\"mtk8\">&#39;name&#39;</span><span class=\"mtk1\">:</span><span class=\"mtk8\">&#39;nv&#39;</span><span class=\"mtk1\">})[</span><span class=\"mtk8\">&#39;data-value&#39;</span><span class=\"mtk1\">]</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        votes.append(</span><span class=\"mtk10\">int</span><span class=\"mtk1\">(vote))</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># Gross income of movie</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk3\">#gross_inc =container.find_all(&#39;span&#39;, attrs = {&#39;name&#39;:&#39;nv&#39;})[1][&#39;data-value&#39;]</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk3\">#gross.append(gross_inc)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># movie description</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        movie_desc=container.find_all(</span><span class=\"mtk8\">&#39;p&#39;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">class_</span><span class=\"mtk1\"> = </span><span class=\"mtk8\">&#39;text-muted&#39;</span><span class=\"mtk1\">)[</span><span class=\"mtk7\">1</span><span class=\"mtk1\">].text</span></span>\n<span 
class=\"grvsc-line\"><span class=\"mtk1\">        movie_description.append(movie_desc)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        movie_det=container.find_all(</span><span class=\"mtk8\">&#39;p&#39;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">class_</span><span class=\"mtk1\"> = </span><span class=\"mtk8\">&#39;text-muted&#39;</span><span class=\"mtk1\">)[</span><span class=\"mtk7\">0</span><span class=\"mtk1\">]</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># Movie duration</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        movie_dur=movie_det.find(</span><span class=\"mtk8\">&#39;span&#39;</span><span class=\"mtk1\">,</span><span class=\"mtk12\">class_</span><span class=\"mtk1\">=</span><span class=\"mtk8\">&#39;runtime&#39;</span><span class=\"mtk1\">).text</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        movie_duration.append(movie_dur)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># Movie genre</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        movie_gen=movie_det.find(</span><span class=\"mtk8\">&#39;span&#39;</span><span class=\"mtk1\">,</span><span class=\"mtk12\">class_</span><span class=\"mtk1\">=</span><span class=\"mtk8\">&#39;genre&#39;</span><span class=\"mtk1\">).text</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        movie_genre.append(movie_gen)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">import</span><span class=\"mtk1\"> pandas </span><span class=\"mtk15\">as</span><span class=\"mtk1\"> pd</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">one_df = pd.DataFrame({</span><span class=\"mtk8\">&#39;movie&#39;</span><span class=\"mtk1\">: names,</span></span>\n<span class=\"grvsc-line\"><span 
class=\"mtk8\">&#39;year&#39;</span><span class=\"mtk1\">: years,</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk8\">&#39;imdb&#39;</span><span class=\"mtk1\">: imdb_ratings,</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk8\">&#39;metascore&#39;</span><span class=\"mtk1\">: metascores,</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk8\">&#39;votes&#39;</span><span class=\"mtk1\">: votes,</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\">#&#39;gross&#39;:gross,</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk8\">&#39;movie decription&#39;</span><span class=\"mtk1\">:movie_description,</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk8\">&#39;movie duration&#39;</span><span class=\"mtk1\">:movie_duration,</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk8\">&#39;movie genre&#39;</span><span class=\"mtk1\">:movie_genre</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">})</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk11\">print</span><span class=\"mtk1\">(one_df.info())</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">one_df.to_csv(</span><span class=\"mtk8\">&#39;50_movie_details.csv&#39;</span><span class=\"mtk1\">)</span></span></code></pre>\n<p>But this covers only one page, which has data for just 50 movies; that is not enough to build a model.</p>\n<p>Please refer to my code to see how I use simple for loops to iterate through all the movies and download data for approximately 20 years.</p>\n<h2 id=\"implementing-simple-linear-algorithms-in-numerical-data-we-just-scrapped\" style=\"position:relative;\"><a href=\"#implementing-simple-linear-algorithms-in-numerical-data-we-just-scrapped\" aria-label=\"implementing simple linear algorithms in numerical data we just scrapped permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" 
d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Implementing simple linear algorithms on the numerical data we just scraped</h2>\n<p>What is linear regression?</p>\n<p>It is one of the most popular and widely used statistical techniques:</p>\n<ul>\n<li>It is used to understand the relationship between variables</li>\n<li>It can also be used to predict a value of interest for new observations</li>\n<li>The aim is to predict the value of a continuous numeric variable of interest (known as the response, dependent or target variable)</li>\n<li>The values of one or more predictor (or independent) variables are used to make the prediction</li>\n<li>One predictor = simple regression</li>\n<li>More than one predictor = multiple regression</li>\n</ul>\n<p>Here I first used only the metascore of each movie to predict its IMDB rating, and then tried to improve on that by using both metascore and votes as predictors. 
</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"python\" data-index=\"4\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk3\">## ML model</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">X = data.loc[:, </span><span class=\"mtk8\">&#39;metascore&#39;</span><span class=\"mtk1\">].values</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">y = data.loc[:, </span><span class=\"mtk8\">&#39;imdb&#39;</span><span class=\"mtk1\">].values</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># Splitting the dataset into the Training set and Test set</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># train_test_split lives in sklearn.model_selection (sklearn.cross_validation was removed in scikit-learn 0.20)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">from</span><span class=\"mtk1\"> sklearn.model_selection </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> train_test_split</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">X_train, X_test, y_train, y_test = train_test_split(X, y, </span><span class=\"mtk12\">test_size</span><span class=\"mtk1\"> = </span><span class=\"mtk7\">0.33</span><span class=\"mtk1\">, </span><span class=\"mtk12\">random_state</span><span class=\"mtk1\"> = </span><span class=\"mtk7\">0</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">from</span><span class=\"mtk1\"> sklearn.linear_model </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> LinearRegression</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">regressor = LinearRegression()</span><span class=\"mtk3\"># create the linear regression model</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">regressor.fit(X_train.reshape(-</span><span class=\"mtk7\">1</span><span class=\"mtk1\">,</span><span class=\"mtk7\">1</span><span class=\"mtk1\">), y_train.reshape(-</span><span class=\"mtk7\">1</span><span 
class=\"mtk1\">,</span><span class=\"mtk7\">1</span><span class=\"mtk1\">))</span><span class=\"mtk3\"># fit the regressor to our training data</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\">#predict the test results</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">y_pred =regressor.predict(X_test.reshape(-</span><span class=\"mtk7\">1</span><span class=\"mtk1\">,</span><span class=\"mtk7\">1</span><span class=\"mtk1\">))</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">from</span><span class=\"mtk1\"> sklearn.metrics </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> mean_squared_error</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">mean_squared_error(y_test, y_pred)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># 0.18041462828221905</span></span></code></pre>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"python\" data-index=\"5\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk3\">## Let&#39;s try with metascore and votes</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">X1 = data.loc[:, [</span><span class=\"mtk8\">&#39;metascore&#39;</span><span class=\"mtk1\">,</span><span class=\"mtk8\">&#39;votes&#39;</span><span class=\"mtk1\">]].values</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">y1 = data.loc[:, </span><span class=\"mtk8\">&#39;imdb&#39;</span><span class=\"mtk1\">].values</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># Splitting the dataset into the Training set and Test set</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">from</span><span class=\"mtk1\"> sklearn.model_selection </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> train_test_split</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">X_train, 
X_test, y_train, y_test = train_test_split(X1, y1, </span><span class=\"mtk12\">test_size</span><span class=\"mtk1\"> = </span><span class=\"mtk7\">0.33</span><span class=\"mtk1\">, </span><span class=\"mtk12\">random_state</span><span class=\"mtk1\"> = </span><span class=\"mtk7\">0</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">from</span><span class=\"mtk1\"> sklearn.linear_model </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> LinearRegression</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">regressor = LinearRegression()</span><span class=\"mtk3\"># create the linear regression model</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">regressor.fit(X_train, y_train)</span><span class=\"mtk3\"># fit the regressor to our training data</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\">#predict the test results</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">y_pred =regressor.predict(X_test)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">from</span><span class=\"mtk1\"> sklearn.metrics </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> mean_squared_error</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">mean_squared_error(y_test, y_pred)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># 0.15729132122310804 (lower MSE than the single-feature model above)</span></span></code></pre>\n<p>I scraped data from the IMDB site and then applied ML regression techniques to it. Later I found that the movies listed are multi-label: for example, Logan belongs to Action, Drama and Sci-Fi. This led me to think about how to implement a classifier model on multilabel data. 
Most real-world data is multi-labelled: chatbot data, for example, can carry many intents, just as these movies belong to several genres.</p>\n<p>Here we will first see how to prepare our data for multilabel classification.</p>\n<p>All the tags are in one single column, which is not usable for classification, so we have to make a separate column for each label; a row is filled with 1 if it belongs to that category and 0 otherwise.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"python\" data-index=\"6\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk15\">import</span><span class=\"mtk1\"> os</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">os.chdir(</span><span class=\"mtk8\">&#39;Desktop/web_scraping/imdb scrapper_ml/&#39;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">import</span><span class=\"mtk1\"> pandas </span><span class=\"mtk15\">as</span><span class=\"mtk1\"> pd</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">data=pd.read_csv(</span><span class=\"mtk8\">&#39;multilabel_nlp_classification.csv&#39;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">movie_list=[x </span><span class=\"mtk15\">for</span><span class=\"mtk1\"> x </span><span class=\"mtk4\">in</span><span class=\"mtk1\"> data[</span><span class=\"mtk8\">&#39;movie genre&#39;</span><span class=\"mtk1\">]]</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">movie_list1=</span><span class=\"mtk8\">&#39;&#39;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">for</span><span class=\"mtk1\"> x </span><span class=\"mtk4\">in</span><span class=\"mtk1\"> data[</span><span class=\"mtk8\">&#39;movie genre&#39;</span><span class=\"mtk1\">]:</span></span>\n<span class=\"grvsc-line\"><span 
class=\"mtk1\">    movie_list1+=</span><span class=\"mtk8\">&#39;,&#39;</span><span class=\"mtk1\">+x</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">li_m=movie_list1.split(</span><span class=\"mtk8\">&#39;,&#39;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">li=[x.strip() </span><span class=\"mtk15\">for</span><span class=\"mtk1\"> x </span><span class=\"mtk4\">in</span><span class=\"mtk1\"> li_m]</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">list_s=</span><span class=\"mtk10\">list</span><span class=\"mtk1\">(</span><span class=\"mtk10\">set</span><span class=\"mtk1\">(li))</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">for</span><span class=\"mtk1\"> x </span><span class=\"mtk4\">in</span><span class=\"mtk1\"> list_s:</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    data[x]=</span><span class=\"mtk7\">0</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">data[</span><span class=\"mtk8\">&#39;movie_genre&#39;</span><span class=\"mtk1\">]=[x.strip().split(</span><span class=\"mtk8\">&#39;,&#39;</span><span class=\"mtk1\">) </span><span class=\"mtk15\">for</span><span class=\"mtk1\"> x </span><span class=\"mtk4\">in</span><span class=\"mtk1\"> data[</span><span class=\"mtk8\">&#39;movie genre&#39;</span><span class=\"mtk1\">]]</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">de=data.copy()</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\">#data.loc[0,&#39;Action&#39;]=1</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">de[</span><span class=\"mtk8\">&#39;id&#39;</span><span class=\"mtk1\">]=</span><span class=\"mtk11\">range</span><span class=\"mtk1\">(</span><span class=\"mtk7\">0</span><span class=\"mtk1\">,</span><span class=\"mtk7\">6116</span><span 
class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\">#print(de.loc[de[&#39;id&#39;]==0,&#39;Action&#39;])</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">for</span><span class=\"mtk1\"> i </span><span class=\"mtk4\">in</span><span class=\"mtk1\"> </span><span class=\"mtk11\">range</span><span class=\"mtk1\">(</span><span class=\"mtk7\">0</span><span class=\"mtk1\">,</span><span class=\"mtk7\">6116</span><span class=\"mtk1\">):</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk15\">for</span><span class=\"mtk1\"> x </span><span class=\"mtk4\">in</span><span class=\"mtk1\"> de.loc[de[</span><span class=\"mtk8\">&#39;id&#39;</span><span class=\"mtk1\">]==i,</span><span class=\"mtk8\">&#39;movie_genre&#39;</span><span class=\"mtk1\">]:</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk15\">for</span><span class=\"mtk1\"> y </span><span class=\"mtk4\">in</span><span class=\"mtk1\"> x:</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            y=y.strip()</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            de.loc[de[</span><span class=\"mtk8\">&#39;id&#39;</span><span class=\"mtk1\">]==i,y]=</span><span class=\"mtk7\">1</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">de.to_csv(</span><span class=\"mtk8\">&#39;multilabel_nlp_classification.csv&#39;</span><span class=\"mtk1\">)</span></span></code></pre>\n<p>Now, as our data is ready, we can start with NLP implementation.</p>\n<p>For multilabel classification, I used techniques like classifier chain, label powerset, etc.</p>\n<p>Here the problem statement is that using the movie description our model has to guess which genre the movie belongs to. It is a popular use case. 
Take ecommerce product descriptions as an example: instead of assigning labels manually, we can use a model to find the relevant labels or genres and tag the content with the types it belongs to.</p>\n<p>I start with exploratory data analysis and then data cleaning, which is the most crucial step: if every description shares the same 30-50 very common words, they simply make the data heavy and the model slow and inefficient.</p>\n<p>Next we make the data model-ready. ML models don't understand raw text, so we have to feed them numbers. For that purpose, we use TfidfVectorizer.</p>\n<h3 id=\"what-is-tfidfvectorizer\" style=\"position:relative;\"><a href=\"#what-is-tfidfvectorizer\" aria-label=\"what is tfidfvectorizer permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>What is TfidfVectorizer?</h3>\n<p>TfidfVectorizer transforms text into TF-IDF feature vectors that can be used as input to an estimator.</p>\n<p>Then we simply divide the data into train and test sets. 
</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"python\" data-index=\"7\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">x_train = vectorizer.transform(train_text)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">y_train = train.drop(</span><span class=\"mtk12\">labels</span><span class=\"mtk1\"> = [</span><span class=\"mtk8\">&#39;id&#39;</span><span class=\"mtk1\">,</span><span class=\"mtk8\">&#39;movie decription&#39;</span><span class=\"mtk1\">], </span><span class=\"mtk12\">axis</span><span class=\"mtk1\">=</span><span class=\"mtk7\">1</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">x_test = vectorizer.transform(test_text)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">y_test = test.drop(</span><span class=\"mtk12\">labels</span><span class=\"mtk1\"> = [</span><span class=\"mtk8\">&#39;id&#39;</span><span class=\"mtk1\">,</span><span class=\"mtk8\">&#39;movie decription&#39;</span><span class=\"mtk1\">], </span><span class=\"mtk12\">axis</span><span class=\"mtk1\">=</span><span class=\"mtk7\">1</span><span class=\"mtk1\">)</span></span></code></pre>\n<p>I tried first with applying logistic regression and one vs rest classifier.</p>\n<h3 id=\"what-is-onevsrestclassifier\" style=\"position:relative;\"><a href=\"#what-is-onevsrestclassifier\" aria-label=\"what is onevsrestclassifier permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 
6z\"></path></svg></a>What is OneVsRestClassifier?</h3>\n<p>The OneVsRestClassifier strategy splits a multi-class classification into one binary classification problem per class.\nWe use it when we want to do multi-class or multilabel classification: it fits one classifier per class, and each classifier is fitted against all the other classes. </p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"python\" data-index=\"8\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk3\"># Using pipeline for applying logistic regression and one vs rest classifier</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">LogReg_pipeline = Pipeline([</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                (</span><span class=\"mtk8\">&#39;clf&#39;</span><span class=\"mtk1\">, OneVsRestClassifier(LogisticRegression(</span><span class=\"mtk12\">solver</span><span class=\"mtk1\">=</span><span class=\"mtk8\">&#39;sag&#39;</span><span class=\"mtk1\">), </span><span class=\"mtk12\">n_jobs</span><span class=\"mtk1\">=-</span><span class=\"mtk7\">1</span><span class=\"mtk1\">)),</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            ])</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">for</span><span class=\"mtk1\"> category </span><span class=\"mtk4\">in</span><span class=\"mtk1\"> categories:</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    printmd(</span><span class=\"mtk8\">&#39;**Processing </span><span class=\"mtk4\">{}</span><span class=\"mtk8\"> comments...**&#39;</span><span class=\"mtk1\">.format(category))</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk3\"># Training logistic regression model on train data</span></span>\n<span class=\"grvsc-line\"><span 
class=\"mtk1\">    LogReg_pipeline.fit(x_train, train[category])</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk3\"># calculating test accuracy</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    prediction = LogReg_pipeline.predict(x_test)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk11\">print</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&#39;Test accuracy is </span><span class=\"mtk4\">{}</span><span class=\"mtk8\">&#39;</span><span class=\"mtk1\">.format(accuracy_score(test[category], prediction)))</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk11\">print</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;</span><span class=\"mtk6\">\\n</span><span class=\"mtk8\">&quot;</span><span class=\"mtk1\">)</span></span></code></pre>\n<p>Next, I tried with BinaryRelevance</p>\n<h3 id=\"what-is-binaryrelevance\" style=\"position:relative;\"><a href=\"#what-is-binaryrelevance\" aria-label=\"what is binaryrelevance permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>What is BinaryRelevance?</h3>\n<p>It is a simple technique which treats each label as a separate single class classification problem.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"python\" data-index=\"9\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk3\"># 
using binary relevance</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">from</span><span class=\"mtk1\"> skmultilearn.problem_transform </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> BinaryRelevance</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">from</span><span class=\"mtk1\"> sklearn.naive_bayes </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> GaussianNB</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># initialize binary relevance multi-label classifier</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># with a gaussian naive bayes base classifier</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">classifier = BinaryRelevance(GaussianNB())</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># train</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">classifier.fit(x_train, y_train)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># predict</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">predictions = classifier.predict(x_test)</span></span></code></pre>\n<p>Next, I tried using ClassifierChain.</p>\n<h3 id=\"what-is-classifierchain\" style=\"position:relative;\"><a href=\"#what-is-classifierchain\" aria-label=\"what is classifierchain permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>What is ClassifierChain?</h3>\n<p>It is almost 
similar to BinaryRelevance, here the first classifier is trained just on the input data, and then each next classifier is trained on the input space and all the previous classifiers in the chain.  </p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"python\" data-index=\"10\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk15\">from</span><span class=\"mtk1\"> skmultilearn.problem_transform </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> ClassifierChain</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">from</span><span class=\"mtk1\"> sklearn.linear_model </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> LogisticRegression</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># initialize classifier chains multi-label classifier</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">classifier = ClassifierChain(LogisticRegression())</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># Training logistic regression model on train data</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">classifier.fit(x_train, y_train)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># predict</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">predictions = classifier.predict(x_test)</span></span></code></pre>\n<p>Next, I tried using Label Powerset.</p>\n<h3 id=\"what-is-labelpowerset\" style=\"position:relative;\"><a href=\"#what-is-labelpowerset\" aria-label=\"what is labelpowerset permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 
4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>What is LabelPowerset?</h3>\n<p>Here we transform the problem into a multi-class problem, in which one multi-class classifier is trained on all unique label combinations found in the training data.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"python\" data-index=\"11\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk15\">from</span><span class=\"mtk1\"> skmultilearn.problem_transform </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> LabelPowerset</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># initialize label powerset multi-label classifier</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">classifier = LabelPowerset(LogisticRegression())</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># train</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">classifier.fit(x_train, y_train)</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\"># predict</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">predictions = classifier.predict(x_test)</span></span></code></pre>\n<p>Please refer to my notebook multilabel_nlp_classifier.ipynb in my repo for more details.</p>\n<h2 id=\"improvement\" style=\"position:relative;\"><a href=\"#improvement\" aria-label=\"improvement permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 
0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Improvement:</h2>\n<ol>\n<li>More feature engineering and more data would help avoid overfitting and make the pipeline more efficient.</li>\n<li>If we collect more data, deep learning and state-of-the-art models like BERT can help us improve the efficiency of the model.</li>\n</ol>\n<h2 id=\"summary\" style=\"position:relative;\"><a href=\"#summary\" aria-label=\"summary permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Summary:</h2>\n<ul>\n<li>We learnt how to collect data by web scraping and which tools to use for it.</li>\n<li>We applied modelling techniques to numerical data.</li>\n<li>We prepared the label data so that it is ready to be fed to a model.</li>\n<li>We learnt how different ML techniques can be applied to text data to build a multilabel classifier.</li>\n</ul>\n<style class=\"grvsc-styles\">\n  .grvsc-container {\n    overflow: auto;\n    -webkit-overflow-scrolling: touch;\n    padding-top: 1rem;\n    padding-top: var(--grvsc-padding-top, var(--grvsc-padding-v, 1rem));\n    padding-bottom: 1rem;\n    padding-bottom: var(--grvsc-padding-bottom, var(--grvsc-padding-v, 1rem));\n    border-radius: 8px;\n    border-radius: var(--grvsc-border-radius, 8px);\n    font-feature-settings: normal;\n  }\n  \n  .grvsc-code {\n    display: inline-block;\n    min-width: 100%;\n  }\n  \n  .grvsc-line {\n    display: inline-block;\n    box-sizing: border-box;\n    
width: 100%;\n    padding-left: 1.5rem;\n    padding-left: var(--grvsc-padding-left, var(--grvsc-padding-h, 1.5rem));\n    padding-right: 1.5rem;\n    padding-right: var(--grvsc-padding-right, var(--grvsc-padding-h, 1.5rem));\n  }\n  \n  .grvsc-line-highlighted {\n    background-color: var(--grvsc-line-highlighted-background-color, transparent);\n    box-shadow: inset var(--grvsc-line-highlighted-border-width, 4px) 0 0 0 var(--grvsc-line-highlighted-border-color, transparent);\n  }\n  \n  .dark-default-dark {\n    background-color: #1E1E1E;\n    color: #D4D4D4;\n  }\n  .dark-default-dark .mtk1 { color: #D4D4D4; }\n  .dark-default-dark .mtk15 { color: #C586C0; }\n  .dark-default-dark .mtk8 { color: #CE9178; }\n  .dark-default-dark .mtk11 { color: #DCDCAA; }\n  .dark-default-dark .mtk10 { color: #4EC9B0; }\n  .dark-default-dark .mtk12 { color: #9CDCFE; }\n  .dark-default-dark .mtk3 { color: #6A9955; }\n  .dark-default-dark .mtk4 { color: #569CD6; }\n  .dark-default-dark .mtk7 { color: #B5CEA8; }\n  .dark-default-dark .mtk6 { color: #D7BA7D; }\n</style>","frontmatter":{"date":"October 09, 2020","updated_date":null,"description":"Learn how to implement the full data science pipeline right from collecting the data to implementing ML algorithms.","title":" Full data science pipeline implementation","tags":["DataScience","Python","Web scraping","NLP","Machine learning"],"pinned":null,"coverImage":{"childImageSharp":{"fluid":{"aspectRatio":1.5037593984962405,"src":"/static/ba7dbfec4d0d37cb83ec0cf3ba35fbe6/14b42/ds.jpg","srcSet":"/static/ba7dbfec4d0d37cb83ec0cf3ba35fbe6/f836f/ds.jpg 200w,\n/static/ba7dbfec4d0d37cb83ec0cf3ba35fbe6/2244e/ds.jpg 400w,\n/static/ba7dbfec4d0d37cb83ec0cf3ba35fbe6/14b42/ds.jpg 800w,\n/static/ba7dbfec4d0d37cb83ec0cf3ba35fbe6/7811e/ds.jpg 1125w","sizes":"(max-width: 800px) 100vw, 800px"}}},"author":{"id":"Rinki Nag","github":"eaglewarrior","avatar":null}}}},{"node":{"excerpt":"The two types of email you can send and receive are plain text emails (any 
email that contains just plain old text with no formatting) and…","fields":{"slug":"/engineering/html-email-concept/"},"html":"<p>The two types of email you can send and receive are plain text emails (any email that contains just plain old text with no formatting) and HTML emails, which are formatted and styled using HTML and inline CSS.\nHTML email is the use of HTML to provide formatting and semantic markup capabilities in an email that are not available in plain text.</p>\n<p>An HTML email is designed just like a website, with the help of graphics, table columns, colors and links. A non-programmer can also create one, since email marketing services provide flexible campaign builders. Email client vendors have been slower than web browser vendors to adopt new standards. </p>\n<p><strong>Definition</strong></p>\n<p>Emails that are formatted using Hypertext Markup Language (HTML), as opposed to plain text emails.</p>\n<p><strong>How to Create an HTML Email</strong></p>\n<p>Many tools that create and send email offer pre-formatted, ready-made HTML templates that let you design emails without knowing or touching any of the back-end code.</p>\n<p>The best way to understand any process is to do it yourself, from level zero. When you make changes in the email editor, those changes are automatically coded into the final result. Such an email building tool is the best option if you don't have an email designer but still want to send professional marketing emails.</p>\n<p>If you want more control over the code of your emails and you are comfortable with basic HTML, most email tools will allow you to import HTML files directly to use as custom email templates. 
They have a wide variety of free HTML email templates available on the internet, and if you are familiar with HTML, it is a straightforward process to use that template in the email building tool of your own choice.</p>\n<p>To create an HTML email from scratch, you will need to have advanced knowledge of HTML. Because creating an HTML email from scratch can be quite tricky, we recommend you to work with a developer for this process, or you may go with a template for an easy process.</p>\n<p><strong>If you choose to code your HTML email by hand, these are the necessary steps you need to use while creating HTML email:</strong></p>\n<ol>\n<li>The perfect email template size should have 600-700 max-width.</li>\n<li>If the design has animation, then use .gif animated file because interactive elements like Flash, JavaScript, or HTML forms won't work in most email inboxes.</li>\n<li>Try to use HTML tables (HTML tables present tabular data in a semantic and structurally appropriate manner) for your presentation.</li>\n<li>To improve the presentation of Web, use inline CSS within your HTML email.</li>\n<li>CSS style should be either in a separate CSS file or below the body tag and not under the head tag.</li>\n<li>To save yourself from trouble, avoid the use of CSS shorthand code.</li>\n<li>The most genuine way of coding background colors is to use six-digit hexadecimal code for color (like #000000, i.e. 
for black).</li>\n<li>Always use \"display: block;\" on your image tags (in either inline or embedded CSS), because this takes the baseline out of the equation and keeps everything neatly arranged.</li>\n<li>If you want padding on columns, spacer DIVs between the columns (or between rows) may be more cross-client compatible than CSS padding.</li>\n<li>You need to use absolute paths for your images.</li>\n<li>Try setting a line-height and font-size of 1 (or the desired size) on \"&#x3C;TD&#x3E;\".</li>\n<li>Inline styles on &#x3C;TD&#x3E; and tables are the right way to go for HTML email.</li>\n<li>In an HTML table, you can set the cell padding and cell spacing to zero to eliminate the unwanted spacing in your layout.</li>\n</ol>\n<p><strong>How to send HTML emails through Outlook?</strong></p>\n<ol>\n<li>Select more commands to customize your quick access toolbar (suggestion).</li>\n<li>Choose the \"attach\" function and then \"add\" it to the toolbar.</li>\n<li>Open the \"attach a file\" window from the quick access toolbar.</li>\n<li>Select the HTML file you need to import, BUT do not click INSERT yet.</li>\n<li>Switch the \"insert\" button to the \"insert as a text\" button and click it.</li>\n<li>Now, you can send it to your audience.</li>\n</ol>\n<p><strong>You can check out HTML email templates here:</strong></p>\n<p><a href=\"https://github.com/designmodo/html-email-templates\">HTML Email Templates</a></p>\n<style class=\"grvsc-styles\">\n  .grvsc-container {\n    overflow: auto;\n    -webkit-overflow-scrolling: touch;\n    padding-top: 1rem;\n    padding-top: var(--grvsc-padding-top, var(--grvsc-padding-v, 1rem));\n    padding-bottom: 1rem;\n    padding-bottom: var(--grvsc-padding-bottom, var(--grvsc-padding-v, 1rem));\n    border-radius: 8px;\n    border-radius: var(--grvsc-border-radius, 8px);\n    font-feature-settings: normal;\n  }\n  \n  .grvsc-code {\n    display: inline-block;\n    min-width: 100%;\n  }\n  \n  .grvsc-line {\n    display: 
inline-block;\n    box-sizing: border-box;\n    width: 100%;\n    padding-left: 1.5rem;\n    padding-left: var(--grvsc-padding-left, var(--grvsc-padding-h, 1.5rem));\n    padding-right: 1.5rem;\n    padding-right: var(--grvsc-padding-right, var(--grvsc-padding-h, 1.5rem));\n  }\n  \n  .grvsc-line-highlighted {\n    background-color: var(--grvsc-line-highlighted-background-color, transparent);\n    box-shadow: inset var(--grvsc-line-highlighted-border-width, 4px) 0 0 0 var(--grvsc-line-highlighted-border-color, transparent);\n  }\n  \n</style>","frontmatter":{"date":"October 09, 2020","updated_date":null,"description":"If you choose to code your HTML email by hand, there are many different things you need to use while creating HTML email.","title":"HTML Email Concept","tags":["Html","Email"],"pinned":null,"coverImage":{"childImageSharp":{"fluid":{"aspectRatio":1.7699115044247788,"src":"/static/68cacf6be60ee3e735f59ee70958946b/ee604/email_picture.png","srcSet":"/static/68cacf6be60ee3e735f59ee70958946b/69585/email_picture.png 200w,\n/static/68cacf6be60ee3e735f59ee70958946b/497c6/email_picture.png 400w,\n/static/68cacf6be60ee3e735f59ee70958946b/ee604/email_picture.png 800w,\n/static/68cacf6be60ee3e735f59ee70958946b/05d05/email_picture.png 1080w","sizes":"(max-width: 800px) 100vw, 800px"}}},"author":{"id":"Nivedita Singh","github":"Nivedita967","avatar":null}}}}]},"markdownRemark":{"excerpt":"Identity is evolving, and developers are at the forefront of this transformation. Every day brings a new learning—adapting to new standards…","fields":{"slug":"/identity/developer-first-identity-provider-loginradius/"},"html":"<p>Identity is evolving, and developers are at the forefront of this transformation. Every day brings a new learning—adapting to new standards and refining approaches to building secure, seamless experiences.</p>\n<p>We’re here to support developers on that journey. 
We know how important simplicity, efficiency, and well-structured documentation are when working with identity and access management solutions. That’s why we’ve redesigned the <a href=\"https://www.loginradius.com/\">LoginRadius website</a> to be faster, more intuitive, and developer-first in every way.</p>\n<p>The goal? Helping developers spend less time searching and more time building.</p>\n<h2 id=\"whats-new-and-improved-on-the-loginradius-website\" style=\"position:relative;\"><a href=\"#whats-new-and-improved-on-the-loginradius-website\" aria-label=\"whats new and improved on the loginradius website permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>What’s New and Improved on the LoginRadius Website?</h2>\n<p>LoginRadius’ vision is to give developers a product that simplifies identity management so they can focus on building, deploying, and scaling their applications. 
To enhance this experience, we’ve spent the last few months redesigning our interface, making navigation more intuitive and ensuring that essential resources are easily accessible.</p>\n<p>Here’s a closer look at what’s new and why it’s important:</p>\n<h3 id=\"a-developer-friendly-dark-theme\" style=\"position:relative;\"><a href=\"#a-developer-friendly-dark-theme\" aria-label=\"a developer friendly dark theme permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>A Developer-Friendly Dark Theme</h3>\n<p><img src=\"/f46881583c7518a93bb24e94c32320de/a-developer-friendly-dark-theme.webp\" alt=\"This image shows how LoginRadius offers several authentication methods like traditional login, social login, passwordless login, passkeys and more in a dark mode.\">    </p>\n<p>Developers spend long hours working in dark-themed IDEs and terminals, so we’ve designed the LoginRadius experience to be developer-friendly and align with that preference.</p>\n<p>The new dark mode reduces eye strain, enhances readability, and provides a seamless transition between a coding environment and our platform. Our new design features a clean, modern aesthetic with a consistent color scheme and Barlow typography, ensuring better readability. 
High-quality graphics and icons are thoughtfully placed to enhance the content without adding visual clutter.</p>\n<p>So, whether you’re navigating our API docs or configuring authentication into your system, our improved interface will make those extended development hours more comfortable and efficient.</p>\n<h3 id=\"clear-categorization-for-loginradius-capabilities\" style=\"position:relative;\"><a href=\"#clear-categorization-for-loginradius-capabilities\" aria-label=\"clear categorization for loginradius capabilities permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Clear Categorization for LoginRadius Capabilities</h3>\n<p><img src=\"/e5358b82be414940f3fb146013845933/capabilities.webp\" alt=\"This image shows a breakdown of all the LoginRadius CIAM capabilities, including authentication, security, UX, scalability and multi-brand management.\"></p>\n<p>We’ve restructured our website to provide a straightforward breakdown of our customer identity and access management platform capabilities, helping you quickly find what you need:</p>\n<ul>\n<li>Authentication: Easily understand <a href=\"https://www.loginradius.com/blog/identity/authentication-option-for-your-product/\">how to choose the right login method</a>, from traditional passwords and OTPs to social login, federated SSO, and passkeys with few lines of code.</li>\n<li>Security: Implement no-code security features like bot detection, IP throttling, breached password alerts, DDoS protection, and adaptive MFA to safeguard user 
accounts.</li>\n<li>User Experience: Leverage AI builder, hosted pages, and drag-and-drop workflows to create smooth, branded sign-up and login experiences.</li>\n<li>High Performance &#x26; Scalability: Confidently scale with sub-100ms API response times, 100% uptime, 240K+ RPS, and 28+ global data center regions.</li>\n<li>Multi-Brand Management: Efficiently manage multiple identity apps, choosing isolated or shared data stores based on your brand’s unique needs.</li>\n</ul>\n<p>This structured layout ensures you can quickly understand each capability and how it integrates into your identity ecosystem.</p>\n<h3 id=\"developer-first-navigation\" style=\"position:relative;\"><a href=\"#developer-first-navigation\" aria-label=\"developer first navigation permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Developer-First Navigation</h3>\n<p><img src=\"/a8c155c2b6faf3d5f4b4de4e2b14d763/developers-menu.webp\" alt=\"This image shows the LoginRadius menu bar, highlighting the developer dropdown.\">   </p>\n<p>We’ve been analyzing developer workflows to identify how you access key resources. That’s why we redesigned our navigation with one goal in mind: to reduce clicks and make essential resources readily available.</p>\n<p>The new LoginRadius structure puts APIs, SDKs, and integration guides right at the menu bar under the Developers dropdown so you can get started faster. 
Our Products, Solutions, and Customer Services are also clearly categorized, helping development teams quickly find the right tools and make informed decisions.</p>\n<h3 id=\"quick-understanding-of-integration-benefits\" style=\"position:relative;\"><a href=\"#quick-understanding-of-integration-benefits\" aria-label=\"quick understanding of integration benefits permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Quick Understanding of Integration Benefits</h3>\n<p><img src=\"/b2f9a964a2da0ea83e2f8596b833bba7/we-support-your-tech-stack.webp\" alt=\"This image shows a list of popular programming languages and frameworks offered by LoginRadius.\"></p>\n<p>Developers now have a clear view of the tech stack available with LoginRadius, designed to support diverse business needs.</p>\n<p>Our platform offers pre-built SDKs for Node.js, Python, Java, and more, making CIAM integration seamless across popular programming languages and frameworks.</p>\n<h2 id=\"over-to-you-now\" style=\"position:relative;\"><a href=\"#over-to-you-now\" aria-label=\"over to you now permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 
1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Over to You Now!</h2>\n<p>Check out our <a href=\"https://www.loginradius.com/\">revamped LoginRadius website</a> and see how the improved experience makes it easier to build, scale, and secure your applications.</p>\n<p>Do not forget to explore the improved navigation and API documentation, and get started with our free trial today. We’re excited to see what you’ll build with LoginRadius!</p>\n<style class=\"grvsc-styles\">\n  .grvsc-container {\n    overflow: auto;\n    -webkit-overflow-scrolling: touch;\n    padding-top: 1rem;\n    padding-top: var(--grvsc-padding-top, var(--grvsc-padding-v, 1rem));\n    padding-bottom: 1rem;\n    padding-bottom: var(--grvsc-padding-bottom, var(--grvsc-padding-v, 1rem));\n    border-radius: 8px;\n    border-radius: var(--grvsc-border-radius, 8px);\n    font-feature-settings: normal;\n  }\n  \n  .grvsc-code {\n    display: inline-block;\n    min-width: 100%;\n  }\n  \n  .grvsc-line {\n    display: inline-block;\n    box-sizing: border-box;\n    width: 100%;\n    padding-left: 1.5rem;\n    padding-left: var(--grvsc-padding-left, var(--grvsc-padding-h, 1.5rem));\n    padding-right: 1.5rem;\n    padding-right: var(--grvsc-padding-right, var(--grvsc-padding-h, 1.5rem));\n  }\n  \n  .grvsc-line-highlighted {\n    background-color: var(--grvsc-line-highlighted-background-color, transparent);\n    box-shadow: inset var(--grvsc-line-highlighted-border-width, 4px) 0 0 0 var(--grvsc-line-highlighted-border-color, transparent);\n  }\n  \n</style>","frontmatter":{"date":"February 21, 2025","updated_date":null,"description":"LoginRadius’ vision is to give developers a product that simplifies identity management so they can focus on building, deploying, and scaling their applications. 
To enhance this experience, we’ve redesigned our website interface, making navigation more intuitive and ensuring that essential resources are easily accessible.","title":"Revamped & Ready: Introducing the New Developer-First LoginRadius Website","tags":["Developer tools","API","Identity Management","User Authentication"],"pinned":true,"coverImage":{"childImageSharp":{"fluid":{"aspectRatio":1.7857142857142858,"src":"/static/80b4e4fbe176a10a327d273504607f32/58556/hero-section.webp","srcSet":"/static/80b4e4fbe176a10a327d273504607f32/61e93/hero-section.webp 200w,\n/static/80b4e4fbe176a10a327d273504607f32/1f5c5/hero-section.webp 400w,\n/static/80b4e4fbe176a10a327d273504607f32/58556/hero-section.webp 800w,\n/static/80b4e4fbe176a10a327d273504607f32/99238/hero-section.webp 1200w,\n/static/80b4e4fbe176a10a327d273504607f32/7c22d/hero-section.webp 1600w,\n/static/80b4e4fbe176a10a327d273504607f32/1258b/hero-section.webp 2732w","sizes":"(max-width: 800px) 100vw, 800px"}}},"author":{"id":"Rakesh Soni","github":"oyesoni","avatar":"rakesh-soni.jpg"}}}},"pageContext":{"limit":6,"skip":690,"currentPage":116,"type":"///","numPages":161,"pinned":"ee8a4479-3471-53b1-bf62-d0d8dc3faaeb"}},"staticQueryHashes":["1171199041","1384082988","2100481360","23180105","528864852"]}