{"componentChunkName":"component---src-templates-tag-js","path":"/tags/streaming/","result":{"data":{"site":{"siteMetadata":{"title":"LoginRadius Blog"}},"allMarkdownRemark":{"totalCount":2,"edges":[{"node":{"fields":{"slug":"/engineering/guest-post/http-streaming-with-nodejs-and-fetch-api/"},"html":"<p>When your webapp has a large amount of data to visualize, you don't want your users to wait 10 seconds before seeing something.</p>\n<p>One technique that is often overlooked is HTTP streaming. It's broadly supported, works well, and doesn't require fancy libraries.</p>\n<p>We're going to go through how we can use HTTP streaming in our applications and what to consider when we do so.</p>\n<h2 id=\"introduction\" style=\"position:relative;\"><a href=\"#introduction\" aria-label=\"introduction permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Introduction</h2>\n<p>When building web applications, we typically have a REST API with GET endpoints that do something like this:</p>\n<ol>\n<li>Parse the request (URL, query params, etc.)</li>\n<li>Query data from a database</li>\n<li>Convert the database results into JSON</li>\n<li>Send the JSON response back</li>\n</ol>\n<p>The API will typically wait for each step to complete before going on to the next one, so by step 4, the database result and the JSON objects are all held in memory until the response has been sent and everything can be cleaned up.</p>\n<p>This works, and there is nothing wrong with it (KISS, right?) 
as long as your database query results are small and quickly available.</p>\n<p>But let's say you want to render a chart with 10k data points. Querying will not be as smooth anymore.\nThe simple way will work, but ideally, you don't want to accumulate all the data in memory before sending the response.</p>\n<p>With HTTP streaming, you can start rendering the chart even before your query is complete.</p>\n<p>To make it happen:</p>\n<ol>\n<li>Your API should use HTTP streaming to send its response.</li>\n<li>Your webapp should use the <a href=\"https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API\">Fetch API</a> to make the request so that it can process the streaming response.</li>\n</ol>\n<h2 id=\"create-a-streaming-api\" style=\"position:relative;\"><a href=\"#create-a-streaming-api\" aria-label=\"create a streaming api permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Create a Streaming API</h2>\n<p>In this example, we use <a href=\"https://koajs.com/\">Koa</a> for the API, but you can use other libraries like Express or plain Node.js. 
Most of them support streaming.</p>\n<p>Let's create an API:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"javascript\" data-index=\"0\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">fs</span><span class=\"mtk1\"> = </span><span class=\"mtk11\">require</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&#39;fs&#39;</span><span class=\"mtk1\">);</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">http</span><span class=\"mtk1\"> = </span><span class=\"mtk11\">require</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&#39;http&#39;</span><span class=\"mtk1\">);</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">Koa</span><span class=\"mtk1\"> = </span><span class=\"mtk11\">require</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&#39;koa&#39;</span><span class=\"mtk1\">);</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">app</span><span class=\"mtk1\"> = </span><span class=\"mtk4\">new</span><span class=\"mtk1\"> </span><span class=\"mtk10\">Koa</span><span class=\"mtk1\">();</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk12\">app</span><span class=\"mtk1\">.</span><span class=\"mtk11\">use</span><span class=\"mtk1\">(</span><span class=\"mtk4\">async</span><span class=\"mtk1\"> (</span><span class=\"mtk12\">ctx</span><span class=\"mtk1\">) </span><span class=\"mtk4\">=&gt;</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk15\">if</span><span class=\"mtk1\"> (</span><span class=\"mtk12\">ctx</span><span class=\"mtk1\">.</span><span class=\"mtk12\">request</span><span class=\"mtk1\">.</span><span class=\"mtk12\">url</span><span class=\"mtk1\"> === </span><span class=\"mtk8\">&#39;/measurements.json&#39;</span><span class=\"mtk1\">) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk12\">ctx</span><span class=\"mtk1\">.</span><span class=\"mtk12\">response</span><span class=\"mtk1\">.</span><span class=\"mtk11\">set</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&#39;content-type&#39;</span><span class=\"mtk1\">, </span><span class=\"mtk8\">&#39;application/json&#39;</span><span class=\"mtk1\">);</span></span>\n<span 
class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk3\">// This is where the magic happens: set a stream as the response body</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk12\">ctx</span><span class=\"mtk1\">.</span><span class=\"mtk12\">body</span><span class=\"mtk1\"> = </span><span class=\"mtk12\">fs</span><span class=\"mtk1\">.</span><span class=\"mtk11\">createReadStream</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&#39;./measurements.json&#39;</span><span class=\"mtk1\">);</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    }</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">});</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk12\">http</span><span class=\"mtk1\">.</span><span class=\"mtk11\">createServer</span><span class=\"mtk1\">(</span><span class=\"mtk12\">app</span><span class=\"mtk1\">.</span><span class=\"mtk11\">callback</span><span class=\"mtk1\">()).</span><span class=\"mtk11\">listen</span><span class=\"mtk1\">(</span><span class=\"mtk7\">3000</span><span class=\"mtk1\">);</span></span></code></pre>\n<p>This code creates an API with one endpoint, <code>GET /measurements.json</code>, that responds with the contents of a JSON file containing 100,000 measurements.</p>\n<p>The file looks like this (abridged):</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"json\" data-index=\"1\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">[</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  { </span><span class=\"mtk12\">&quot;id&quot;</span><span class=\"mtk1\">: </span><span class=\"mtk8\">&quot;1&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;timestamp&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk8\">&quot;2022-01-19T10:39:00.000Z&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;value&quot;</span><span 
class=\"mtk1\">:  </span><span class=\"mtk7\">239.34</span><span class=\"mtk1\"> },</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  { </span><span class=\"mtk12\">&quot;id&quot;</span><span class=\"mtk1\">: </span><span class=\"mtk8\">&quot;2&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;timestamp&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk8\">&quot;2022-01-19T10:40:00.000Z&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;value&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk7\">820.14</span><span class=\"mtk1\"> },</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  { </span><span class=\"mtk12\">&quot;id&quot;</span><span class=\"mtk1\">: </span><span class=\"mtk8\">&quot;3&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;timestamp&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk8\">&quot;2022-01-19T10:41:00.000Z&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;value&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk7\">926.03</span><span class=\"mtk1\"> },</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  { </span><span class=\"mtk12\">&quot;id&quot;</span><span class=\"mtk1\">: </span><span class=\"mtk8\">&quot;4&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;timestamp&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk8\">&quot;2022-01-19T10:42:00.000Z&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;value&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk7\">513.01</span><span class=\"mtk1\"> },</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  </span><span class=\"mtk3\">// ...</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  { </span><span class=\"mtk12\">&quot;id&quot;</span><span class=\"mtk1\">: </span><span class=\"mtk8\">&quot;99998&quot;</span><span class=\"mtk1\">, </span><span 
class=\"mtk12\">&quot;timestamp&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk8\">&quot;2022-03-29T21:16:00.000Z&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;value&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk7\">13.81</span><span class=\"mtk1\"> },</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  { </span><span class=\"mtk12\">&quot;id&quot;</span><span class=\"mtk1\">: </span><span class=\"mtk8\">&quot;99999&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;timestamp&quot;</span><span class=\"mtk1\">: </span><span class=\"mtk8\">&quot;2022-03-29T21:17:00.000Z&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;value&quot;</span><span class=\"mtk1\">: </span><span class=\"mtk7\">465.28</span><span class=\"mtk1\"> },</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  { </span><span class=\"mtk12\">&quot;id&quot;</span><span class=\"mtk1\">: </span><span class=\"mtk8\">&quot;100000&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;timestamp&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk8\">&quot;2022-03-29T21:18:00.000Z&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;value&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk7\">71.95</span><span class=\"mtk1\"> }</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">]</span></span></code></pre>\n<p>We're creating an HTTP/1.1 server with Node's <code>http</code> module. 
By setting <code>ctx.body</code> to a stream, data is sent to the requester as soon as it is loaded, using a mechanism called <a href=\"https://en.wikipedia.org/wiki/Chunked_transfer_encoding\">chunked transfer encoding</a>.</p>\n<p>This saves your API both time and memory because it doesn't have to accumulate the whole result in memory before sending the response.</p>\n<p>In a real application, you're probably using a database instead of a pre-created JSON file.\nIf you're using MongoDB, you can create a stream with the <a href=\"https://docs.mongodb.com/drivers/node/current/fundamentals/crud/read-operations/cursor/#stream-api\">cursor's <code>stream()</code> method</a>.</p>\n<h2 id=\"consume-a-streaming-api-from-a-webapp\" style=\"position:relative;\"><a href=\"#consume-a-streaming-api-from-a-webapp\" aria-label=\"consume a streaming api from a webapp permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Consume a Streaming API From a Webapp</h2>\n<p>The <code>GET /measurements.json</code> endpoint we created can be consumed with any HTTP client, but in a browser you have to use the <a href=\"https://developer.mozilla.org/en-US/docs/Web/API/Streams_API/Using_readable_streams#consuming_a_fetch_as_a_stream\">Fetch API</a> to take advantage of streaming.</p>\n<p><code>fetch()</code> exposes <code>response.body</code> as a <a href=\"https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream\">ReadableStream</a>, so you can read the response while it is still arriving.</p>\n<p>Here's how you can read a streaming response:</p>\n<pre 
class=\"grvsc-container dark-default-dark\" data-language=\"javascript\" data-index=\"2\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk11\">fetch</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&#39;http://localhost:3000/measurements.json&#39;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    .</span><span class=\"mtk11\">then</span><span class=\"mtk1\">(</span><span class=\"mtk4\">async</span><span class=\"mtk1\"> (</span><span class=\"mtk12\">response</span><span class=\"mtk1\">) </span><span class=\"mtk4\">=&gt;</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk3\">// response.body is a ReadableStream</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">reader</span><span class=\"mtk1\"> = </span><span class=\"mtk12\">response</span><span class=\"mtk1\">.</span><span class=\"mtk12\">body</span><span class=\"mtk1\">.</span><span class=\"mtk11\">getReader</span><span class=\"mtk1\">();</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk15\">for</span><span class=\"mtk1\"> </span><span class=\"mtk15\">await</span><span class=\"mtk1\"> (</span><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">chunk</span><span class=\"mtk1\"> </span><span class=\"mtk4\">of</span><span class=\"mtk1\"> </span><span class=\"mtk11\">readChunks</span><span class=\"mtk1\">(</span><span class=\"mtk12\">reader</span><span class=\"mtk1\">)) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            </span><span class=\"mtk10\">console</span><span class=\"mtk1\">.</span><span class=\"mtk11\">log</span><span class=\"mtk1\">(</span><span class=\"mtk8\">`received chunk of size </span><span class=\"mtk4\">${</span><span 
class=\"mtk12\">chunk</span><span class=\"mtk1\">.</span><span class=\"mtk12\">length</span><span class=\"mtk4\">}</span><span class=\"mtk8\">`</span><span class=\"mtk1\">);</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        }</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    });</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\">// readChunks() reads from the provided reader and yields the results into an async iterable</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">function</span><span class=\"mtk1\"> </span><span class=\"mtk11\">readChunks</span><span class=\"mtk1\">(</span><span class=\"mtk12\">reader</span><span class=\"mtk1\">) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk15\">return</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        async* [</span><span class=\"mtk10\">Symbol</span><span class=\"mtk1\">.</span><span class=\"mtk12\">asyncIterator</span><span class=\"mtk1\">]() {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            </span><span class=\"mtk4\">let</span><span class=\"mtk1\"> </span><span class=\"mtk12\">readResult</span><span class=\"mtk1\"> = </span><span class=\"mtk15\">await</span><span class=\"mtk1\"> </span><span class=\"mtk12\">reader</span><span class=\"mtk1\">.</span><span class=\"mtk11\">read</span><span class=\"mtk1\">();</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            </span><span class=\"mtk15\">while</span><span class=\"mtk1\"> (!</span><span class=\"mtk12\">readResult</span><span class=\"mtk1\">.</span><span class=\"mtk12\">done</span><span class=\"mtk1\">) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                </span><span class=\"mtk15\">yield</span><span class=\"mtk1\"> </span><span class=\"mtk12\">readResult</span><span class=\"mtk1\">.</span><span 
class=\"mtk12\">value</span><span class=\"mtk1\">;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                </span><span class=\"mtk12\">readResult</span><span class=\"mtk1\"> = </span><span class=\"mtk15\">await</span><span class=\"mtk1\"> </span><span class=\"mtk12\">reader</span><span class=\"mtk1\">.</span><span class=\"mtk11\">read</span><span class=\"mtk1\">();</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            }</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        },</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    };</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">}</span></span></code></pre>\n<p>If you run the API that we created and then run the code above in a web browser, you'll see <code>received chunk of size x</code> several times in the console with varying x:</p>\n<p><img src=\"/75cf5afcae41ee6a1ea963da104389ff/console-received-chunk-of-size-x.gif\" alt=\"The console showing &#x22;received chunk of size x&#x22;\"></p>\n<p>To see it more clearly, open up Developer Tools (F12) and set network throttling to 3G. It will take longer for the file to download, so you can see that the chunks are being processed gradually.</p>\n<p>That's all very nice, but the chunks themselves are not very useful. You want to process the measurements that are in these chunks. Ideally, you would have an async iterable that gradually yields the JSON objects as they come in so that you can use <code>for await</code> to iterate over the measurements instead of the chunks.</p>\n<p>JavaScript doesn't have a JSON parser that can deal with streams. Let's assume we have a function that can do this called <code>parseJsonStream(readableStream)</code>. 
More details and a simple implementation can be found in \"Streaming Considerations\" below.</p>\n<p>We would then be able to do this:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"javascript\" data-index=\"3\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk11\">fetch</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&#39;http://localhost:3000/measurements.json&#39;</span><span class=\"mtk1\">)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    .</span><span class=\"mtk11\">then</span><span class=\"mtk1\">(</span><span class=\"mtk4\">async</span><span class=\"mtk1\"> (</span><span class=\"mtk12\">response</span><span class=\"mtk1\">) </span><span class=\"mtk4\">=&gt;</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk4\">let</span><span class=\"mtk1\"> </span><span class=\"mtk12\">measurementsReceived</span><span class=\"mtk1\"> = </span><span class=\"mtk7\">0</span><span class=\"mtk1\">;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk15\">for</span><span class=\"mtk1\"> </span><span class=\"mtk15\">await</span><span class=\"mtk1\"> (</span><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">measurement</span><span class=\"mtk1\"> </span><span class=\"mtk4\">of</span><span class=\"mtk1\"> </span><span class=\"mtk11\">parseJsonStream</span><span class=\"mtk1\">(</span><span class=\"mtk12\">response</span><span class=\"mtk1\">.</span><span class=\"mtk12\">body</span><span class=\"mtk1\">)) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            </span><span class=\"mtk12\">measurementsReceived</span><span class=\"mtk1\">++;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            </span><span class=\"mtk3\">// To prevent the console from flooding we only show 1 in every 100 measurements</span></span>\n<span 
class=\"grvsc-line\"><span class=\"mtk1\">            </span><span class=\"mtk15\">if</span><span class=\"mtk1\"> (</span><span class=\"mtk12\">measurementsReceived</span><span class=\"mtk1\"> % </span><span class=\"mtk7\">100</span><span class=\"mtk1\"> === </span><span class=\"mtk7\">0</span><span class=\"mtk1\">) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                </span><span class=\"mtk10\">console</span><span class=\"mtk1\">.</span><span class=\"mtk11\">log</span><span class=\"mtk1\">(</span><span class=\"mtk8\">`measurement with id {</span><span class=\"mtk4\">${</span><span class=\"mtk12\">measurement</span><span class=\"mtk1\">.</span><span class=\"mtk12\">id</span><span class=\"mtk4\">}</span><span class=\"mtk8\">} at time </span><span class=\"mtk4\">${</span><span class=\"mtk12\">measurement</span><span class=\"mtk1\">.</span><span class=\"mtk12\">timestamp</span><span class=\"mtk4\">}</span><span class=\"mtk8\"> has value [</span><span class=\"mtk4\">${</span><span class=\"mtk12\">measurement</span><span class=\"mtk1\">.</span><span class=\"mtk12\">value</span><span class=\"mtk4\">}</span><span class=\"mtk8\">]`</span><span class=\"mtk1\">);</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            }</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        }</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    });</span></span></code></pre>\n<p>This is the result:</p>\n<p><img src=\"/e2d673ca68b3af5f9fa335ce755032ed/console-measurements.gif\" alt=\"The console showing &#x22;measurement with id ... has value ...&#x22;\"></p>\n<p>This is a very powerful mechanism. 
Instead of <code>console.log()</code> statements, imagine a line chart where the measurements gradually become visible as more data comes in.</p>\n<p>This is an example created with <a href=\"https://echarts.apache.org/\">Apache ECharts</a>:</p>\n<p><img src=\"/76f92b76f2a2c36236b5610800abb88c/chart-loading-gradually.gif\" alt=\"A line chart gradually becoming visible as data comes in\"></p>\n<p>Before the Fetch API, this was impractical in the browser because the alternative, <a href=\"https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest\"><code>XMLHttpRequest</code></a>, keeps the whole response in memory instead of exposing it to your code as a stream.</p>\n<p>Modern applications don't use <code>XMLHttpRequest</code> directly, but a lot of libraries like <a href=\"https://axios-http.com/\">Axios</a> or Angular's <a href=\"https://angular.io/api/common/http/HttpClient\">HttpClient</a> rely on it to make requests.</p>\n<p>To stream data, people instead had to rely on more involved technologies like <a href=\"https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API\">WebSockets</a>.</p>\n<h2 id=\"http2-for-streaming\" style=\"position:relative;\"><a href=\"#http2-for-streaming\" aria-label=\"http2 for streaming permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>HTTP/2 for Streaming</h2>\n<p>You might have heard that HTTP/2 has a more efficient mechanism for streaming, and you would be right. 
So let's see if we can use HTTP/2.</p>\n<p>From the side of your webapp, the browser will automatically determine if it can communicate with the server over HTTP/2 and use it if it's available. The Fetch API will do this transparently, so you do not need to make any changes to your webapp.</p>\n<p>What you need to do is make your API available over HTTP/2:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"javascript\" data-index=\"4\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">fs</span><span class=\"mtk1\"> = </span><span class=\"mtk11\">require</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&#39;fs&#39;</span><span class=\"mtk1\">);</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">http2</span><span class=\"mtk1\"> = </span><span class=\"mtk11\">require</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&#39;http2&#39;</span><span class=\"mtk1\">);</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">path</span><span class=\"mtk1\"> = </span><span class=\"mtk11\">require</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&#39;path&#39;</span><span class=\"mtk1\">);</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">Koa</span><span class=\"mtk1\"> = </span><span class=\"mtk11\">require</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&#39;koa&#39;</span><span class=\"mtk1\">);</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">app</span><span class=\"mtk1\"> = </span><span class=\"mtk4\">new</span><span class=\"mtk1\"> </span><span class=\"mtk10\">Koa</span><span class=\"mtk1\">();</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk12\">app</span><span class=\"mtk1\">.</span><span class=\"mtk11\">use</span><span class=\"mtk1\">(</span><span class=\"mtk4\">async</span><span class=\"mtk1\"> (</span><span class=\"mtk12\">ctx</span><span class=\"mtk1\">) </span><span class=\"mtk4\">=&gt;</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk15\">if</span><span class=\"mtk1\"> (</span><span class=\"mtk12\">ctx</span><span class=\"mtk1\">.</span><span class=\"mtk12\">request</span><span class=\"mtk1\">.</span><span class=\"mtk12\">url</span><span class=\"mtk1\"> === </span><span class=\"mtk8\">&#39;/measurements.json&#39;</span><span class=\"mtk1\">) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk12\">ctx</span><span class=\"mtk1\">.</span><span 
class=\"mtk12\">response</span><span class=\"mtk1\">.</span><span class=\"mtk11\">set</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&#39;content-type&#39;</span><span class=\"mtk1\">, </span><span class=\"mtk8\">&#39;application/json&#39;</span><span class=\"mtk1\">);</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk12\">ctx</span><span class=\"mtk1\">.</span><span class=\"mtk12\">body</span><span class=\"mtk1\"> = </span><span class=\"mtk12\">fs</span><span class=\"mtk1\">.</span><span class=\"mtk11\">createReadStream</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&#39;./measurements.json&#39;</span><span class=\"mtk1\">);</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    }</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">});</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk12\">http2</span><span class=\"mtk1\">.</span><span class=\"mtk11\">createSecureServer</span><span class=\"mtk1\">(</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk12\">key:</span><span class=\"mtk1\"> </span><span class=\"mtk12\">fs</span><span class=\"mtk1\">.</span><span class=\"mtk11\">readFileSync</span><span class=\"mtk1\">(</span><span class=\"mtk12\">path</span><span class=\"mtk1\">.</span><span class=\"mtk11\">resolve</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&#39;path/to/localhost-key.pem&#39;</span><span class=\"mtk1\">)),</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk12\">cert:</span><span class=\"mtk1\"> </span><span class=\"mtk12\">fs</span><span class=\"mtk1\">.</span><span class=\"mtk11\">readFileSync</span><span class=\"mtk1\">(</span><span class=\"mtk12\">path</span><span class=\"mtk1\">.</span><span class=\"mtk11\">resolve</span><span 
class=\"mtk1\">(</span><span class=\"mtk8\">&#39;path/to/localhost.pem&#39;</span><span class=\"mtk1\">)),</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    },</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk12\">app</span><span class=\"mtk1\">.</span><span class=\"mtk11\">callback</span><span class=\"mtk1\">(),</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">).</span><span class=\"mtk11\">listen</span><span class=\"mtk1\">(</span><span class=\"mtk7\">3000</span><span class=\"mtk1\">);</span></span></code></pre>\n<p>This code is mostly the same as for HTTP/1.1, with two notable differences:</p>\n<ul>\n<li>We're using Node's <code>http2</code> module instead of <code>http</code>.</li>\n<li>We're using <code>HTTPS</code> instead of plain <code>HTTP</code> because browsers only support HTTP/2 over TLS. You can set up HTTPS and get the <code>cert</code> and <code>key</code> files using <a href=\"https://github.com/FiloSottile/mkcert\">mkcert</a>. 
Or, use one of the other mechanisms described in this article: <a href=\"https://web.dev/how-to-use-local-https/\">Use HTTPS for Local Development</a></li>\n</ul>\n<p>That's it!</p>\n<p>If you restart the API and check the network tab in Developer Tools, you'll see that your application will now stream over HTTP/2 (don't forget to update the URL in your webapp, start with <code>https://</code> instead of <code>http://</code>).</p>\n<h2 id=\"http-streaming-considerations\" style=\"position:relative;\"><a href=\"#http-streaming-considerations\" aria-label=\"http streaming considerations permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>HTTP Streaming Considerations</h2>\n<p>In the code above, we assumed that there was a function <code>parseJsonStream(readableStream)</code> that would parse a <code>ReadableStream</code> containing JSON into an async iterable of objects.</p>\n<p>The difficulty is that reading from a <code>ReadableStream</code> will give you chunks of data that don't necessarily correspond to anything meaningful. 
To illustrate, let's take a look at this example:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"json\" data-index=\"5\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">[</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  { </span><span class=\"mtk12\">&quot;id&quot;</span><span class=\"mtk1\">: </span><span class=\"mtk8\">&quot;1&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;timestamp&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk8\">&quot;2022-01-19T10:39:00.000Z&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;value&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk7\">239.34</span><span class=\"mtk1\"> },</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  { </span><span class=\"mtk12\">&quot;id&quot;</span><span class=\"mtk1\">: </span><span class=\"mtk8\">&quot;2&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;timestamp&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk8\">&quot;2022-01-19T10:40:00.000Z&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;value&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk7\">820.14</span><span class=\"mtk1\"> },</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  { </span><span class=\"mtk12\">&quot;id&quot;</span><span class=\"mtk1\">: </span><span class=\"mtk8\">&quot;3&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;timestamp&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk8\">&quot;2022-01-19T10:41:00.000Z&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;value&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk7\">926.03</span><span class=\"mtk1\"> },</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  { </span><span class=\"mtk12\">&quot;id&quot;</span><span class=\"mtk1\">: </span><span 
class=\"mtk8\">&quot;4&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;timestamp&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk8\">&quot;2022-01-19T10:42:00.000Z&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;value&quot;</span><span class=\"mtk1\">:  </span><span class=\"mtk7\">513.01</span><span class=\"mtk1\"> }</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">]</span></span></code></pre>\n<p>We could receive this JSON in chunks like this:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"\" data-index=\"6\"><code class=\"grvsc-code\"><span class=\"grvsc-line\">[</span>\n<span class=\"grvsc-line\">  { &quot;id&quot;: &quot;1&quot;, &quot;timestamp&quot;:  &quot;2022-01-19T10:39:00.000Z&quot;, &quot;value&quot;:  239.34 },</span>\n<span class=\"grvsc-line\">  { &quot;id&quot;: &quot;2&quot;, &quot;tim</span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"> estamp&quot;:  &quot;2022-01-19T10:40:00.000Z&quot;, &quot;value&quot;:  820.14 },</span>\n<span class=\"grvsc-line\">  { &quot;id&quot;: &quot;3&quot;, &quot;timestamp&quot;:  &quot;2022-01-19T10:41:00.000Z&quot;, &quot;value&quot;:  926.03 },</span>\n<span class=\"grvsc-line\">  { &quot;id&quot;: &quot;4&quot;, &quot;timestamp&quot;:  &quot;2022-01-19T10:42</span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\">:00.000Z&quot;, &quot;value&quot;:  513.01 }</span>\n<span class=\"grvsc-line\">]</span></code></pre>\n<p>You need some way to determine where one measurement object starts and ends. To simplify this, we created a JSON file where each line contains precisely one object. 
Parsing the stream becomes manageable when we can make this assumption.</p>\n<p>Here is the implementation of <code>parseJsonStream(readableStream)</code></p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"javascript\" data-index=\"7\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk4\">async</span><span class=\"mtk1\"> </span><span class=\"mtk4\">function</span><span class=\"mtk1\"> </span><span class=\"mtk4\">*</span><span class=\"mtk11\">parseJsonStream</span><span class=\"mtk1\">(</span><span class=\"mtk12\">readableStream</span><span class=\"mtk1\">) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk15\">for</span><span class=\"mtk1\"> </span><span class=\"mtk15\">await</span><span class=\"mtk1\"> (</span><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">line</span><span class=\"mtk1\"> </span><span class=\"mtk4\">of</span><span class=\"mtk1\"> </span><span class=\"mtk11\">readLines</span><span class=\"mtk1\">(</span><span class=\"mtk12\">readableStream</span><span class=\"mtk1\">.</span><span class=\"mtk11\">getReader</span><span class=\"mtk1\">())) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">trimmedLine</span><span class=\"mtk1\"> = </span><span class=\"mtk12\">line</span><span class=\"mtk1\">.</span><span class=\"mtk11\">trim</span><span class=\"mtk1\">().</span><span class=\"mtk11\">replace</span><span class=\"mtk1\">(</span><span class=\"mtk5\">/,</span><span class=\"mtk11\">$</span><span class=\"mtk5\">/</span><span class=\"mtk1\">, </span><span class=\"mtk8\">&#39;&#39;</span><span class=\"mtk1\">);</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk15\">if</span><span class=\"mtk1\"> (</span><span class=\"mtk12\">trimmedLine</span><span 
class=\"mtk1\"> !== </span><span class=\"mtk8\">&#39;[&#39;</span><span class=\"mtk1\"> && </span><span class=\"mtk12\">trimmedLine</span><span class=\"mtk1\"> !== </span><span class=\"mtk8\">&#39;]&#39;</span><span class=\"mtk1\">) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            </span><span class=\"mtk15\">yield</span><span class=\"mtk1\"> </span><span class=\"mtk10\">JSON</span><span class=\"mtk1\">.</span><span class=\"mtk11\">parse</span><span class=\"mtk1\">(</span><span class=\"mtk12\">trimmedLine</span><span class=\"mtk1\">);</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        }</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    }</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">}</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">async</span><span class=\"mtk1\"> </span><span class=\"mtk4\">function</span><span class=\"mtk1\"> </span><span class=\"mtk4\">*</span><span class=\"mtk11\">readLines</span><span class=\"mtk1\">(</span><span class=\"mtk12\">reader</span><span class=\"mtk1\">) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">textDecoder</span><span class=\"mtk1\"> = </span><span class=\"mtk4\">new</span><span class=\"mtk1\"> </span><span class=\"mtk10\">TextDecoder</span><span class=\"mtk1\">();</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">let</span><span class=\"mtk1\"> </span><span class=\"mtk12\">partOfLine</span><span class=\"mtk1\"> = </span><span class=\"mtk8\">&#39;&#39;</span><span class=\"mtk1\">;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk15\">for</span><span class=\"mtk1\"> </span><span class=\"mtk15\">await</span><span class=\"mtk1\"> (</span><span class=\"mtk4\">const</span><span class=\"mtk1\"> 
</span><span class=\"mtk12\">chunk</span><span class=\"mtk1\"> </span><span class=\"mtk4\">of</span><span class=\"mtk1\"> </span><span class=\"mtk11\">readChunks</span><span class=\"mtk1\">(</span><span class=\"mtk12\">reader</span><span class=\"mtk1\">)) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk3\">// stream: true keeps multi-byte characters that span two chunks intact</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">chunkText</span><span class=\"mtk1\"> = </span><span class=\"mtk12\">textDecoder</span><span class=\"mtk1\">.</span><span class=\"mtk11\">decode</span><span class=\"mtk1\">(</span><span class=\"mtk12\">chunk</span><span class=\"mtk1\">, { </span><span class=\"mtk12\">stream:</span><span class=\"mtk1\"> </span><span class=\"mtk4\">true</span><span class=\"mtk1\"> });</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk4\">const</span><span class=\"mtk1\"> </span><span class=\"mtk12\">chunkLines</span><span class=\"mtk1\"> = </span><span class=\"mtk12\">chunkText</span><span class=\"mtk1\">.</span><span class=\"mtk11\">split</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&#39;</span><span class=\"mtk6\">\\n</span><span class=\"mtk8\">&#39;</span><span class=\"mtk1\">);</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk15\">if</span><span class=\"mtk1\"> (</span><span class=\"mtk12\">chunkLines</span><span class=\"mtk1\">.</span><span class=\"mtk12\">length</span><span class=\"mtk1\"> === </span><span class=\"mtk7\">1</span><span class=\"mtk1\">) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            </span><span class=\"mtk12\">partOfLine</span><span class=\"mtk1\"> += </span><span class=\"mtk12\">chunkLines</span><span class=\"mtk1\">[</span><span class=\"mtk7\">0</span><span class=\"mtk1\">];</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        } </span><span class=\"mtk15\">else</span><span class=\"mtk1\"> </span><span class=\"mtk15\">if</span><span class=\"mtk1\"> (</span><span class=\"mtk12\">chunkLines</span><span class=\"mtk1\">.</span><span 
class=\"mtk12\">length</span><span class=\"mtk1\"> &gt; </span><span class=\"mtk7\">1</span><span class=\"mtk1\">) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            </span><span class=\"mtk15\">yield</span><span class=\"mtk1\"> </span><span class=\"mtk12\">partOfLine</span><span class=\"mtk1\"> + </span><span class=\"mtk12\">chunkLines</span><span class=\"mtk1\">[</span><span class=\"mtk7\">0</span><span class=\"mtk1\">];</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            </span><span class=\"mtk15\">for</span><span class=\"mtk1\"> (</span><span class=\"mtk4\">let</span><span class=\"mtk1\"> </span><span class=\"mtk12\">i</span><span class=\"mtk1\">=</span><span class=\"mtk7\">1</span><span class=\"mtk1\">; </span><span class=\"mtk12\">i</span><span class=\"mtk1\"> &lt; </span><span class=\"mtk12\">chunkLines</span><span class=\"mtk1\">.</span><span class=\"mtk12\">length</span><span class=\"mtk1\"> - </span><span class=\"mtk7\">1</span><span class=\"mtk1\">; </span><span class=\"mtk12\">i</span><span class=\"mtk1\">++) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                </span><span class=\"mtk15\">yield</span><span class=\"mtk1\"> </span><span class=\"mtk12\">chunkLines</span><span class=\"mtk1\">[</span><span class=\"mtk12\">i</span><span class=\"mtk1\">];</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            }</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            </span><span class=\"mtk12\">partOfLine</span><span class=\"mtk1\"> = </span><span class=\"mtk12\">chunkLines</span><span class=\"mtk1\">[</span><span class=\"mtk12\">chunkLines</span><span class=\"mtk1\">.</span><span class=\"mtk12\">length</span><span class=\"mtk1\"> - </span><span class=\"mtk7\">1</span><span class=\"mtk1\">];</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        }</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    
}</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk3\">// Emit the final line when the stream does not end with a newline</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk15\">if</span><span class=\"mtk1\"> (</span><span class=\"mtk12\">partOfLine</span><span class=\"mtk1\">) </span><span class=\"mtk15\">yield</span><span class=\"mtk1\"> </span><span class=\"mtk12\">partOfLine</span><span class=\"mtk1\">;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">}</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">function</span><span class=\"mtk1\"> </span><span class=\"mtk11\">readChunks</span><span class=\"mtk1\">(</span><span class=\"mtk12\">reader</span><span class=\"mtk1\">) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk15\">return</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        async* [</span><span class=\"mtk10\">Symbol</span><span class=\"mtk1\">.</span><span class=\"mtk12\">asyncIterator</span><span class=\"mtk1\">]() {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            </span><span class=\"mtk4\">let</span><span class=\"mtk1\"> </span><span class=\"mtk12\">readResult</span><span class=\"mtk1\"> = </span><span class=\"mtk15\">await</span><span class=\"mtk1\"> </span><span class=\"mtk12\">reader</span><span class=\"mtk1\">.</span><span class=\"mtk11\">read</span><span class=\"mtk1\">();</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            </span><span class=\"mtk15\">while</span><span class=\"mtk1\"> (!</span><span class=\"mtk12\">readResult</span><span class=\"mtk1\">.</span><span class=\"mtk12\">done</span><span class=\"mtk1\">) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                </span><span class=\"mtk15\">yield</span><span class=\"mtk1\"> </span><span class=\"mtk12\">readResult</span><span class=\"mtk1\">.</span><span class=\"mtk12\">value</span><span class=\"mtk1\">;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                </span><span class=\"mtk12\">readResult</span><span class=\"mtk1\"> = </span><span class=\"mtk15\">await</span><span class=\"mtk1\"> </span><span class=\"mtk12\">reader</span><span class=\"mtk1\">.</span><span class=\"mtk11\">read</span><span class=\"mtk1\">();</span></span>\n<span 
class=\"grvsc-line\"><span class=\"mtk1\">            }</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        },</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    };</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">}</span></span></code></pre>\n<p>If you have control over the API you're calling, you can use the \"1 object per line\" formatting as part of the contract, but know that it could be prone to breaking. For robust JSON support, we need a real streaming parser.</p>\n<p>Other options include using a format with one object per line by default, like CSV, or a more advanced format with built-in support for streaming like <a href=\"https://arrow.apache.org/docs/js/\">Apache Arrow</a>.</p>\n<h2 id=\"advantages\" style=\"position:relative;\"><a href=\"#advantages\" aria-label=\"advantages permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Advantages</h2>\n<ol>\n<li><strong>Snappy User Experience:</strong> You can start showing data as soon as it's available.</li>\n<li><strong>Scalable API:</strong> No memory usage spikes from accumulating results in memory.</li>\n<li>Uses plain HTTP and a standard JavaScript API. 
There are no connections to manage or complicated frameworks that might become obsolete in a few years.</li>\n</ol>\n<h2 id=\"disadvantages\" style=\"position:relative;\"><a href=\"#disadvantages\" aria-label=\"disadvantages permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Disadvantages</h2>\n<ol>\n<li>Implementation is slightly more involved than using regular API calls.</li>\n<li>\n<p>Error handling becomes more difficult because HTTP status code 200 will be sent as soon as streaming starts. What do we do when something goes wrong in the middle of the stream?</p>\n<p>When something goes wrong, your API should close the stream. Your webapp can then determine whether the stream was complete and show a fitting message to the user if it was not. For example, when using a JSON response, as we discussed, you can check that the last line contains only \"]\".</p>\n</li>\n<li>JavaScript has no built-in streaming JSON parser. 
Needs formatting assumptions as part of the contract or a more unconventional format.</li>\n</ol>\n<h2 id=\"conclusion\" style=\"position:relative;\"><a href=\"#conclusion\" aria-label=\"conclusion permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Conclusion</h2>\n<p>You have learned how to stream HTTP data efficiently using Node.js and HTTP without congesting or burdening the memory. You have also understood the advantages and disadvantages of HTTP streaming.</p>\n<style class=\"grvsc-styles\">\n  .grvsc-container {\n    overflow: auto;\n    -webkit-overflow-scrolling: touch;\n    padding-top: 1rem;\n    padding-top: var(--grvsc-padding-top, var(--grvsc-padding-v, 1rem));\n    padding-bottom: 1rem;\n    padding-bottom: var(--grvsc-padding-bottom, var(--grvsc-padding-v, 1rem));\n    border-radius: 8px;\n    border-radius: var(--grvsc-border-radius, 8px);\n    font-feature-settings: normal;\n  }\n  \n  .grvsc-code {\n    display: inline-block;\n    min-width: 100%;\n  }\n  \n  .grvsc-line {\n    display: inline-block;\n    box-sizing: border-box;\n    width: 100%;\n    padding-left: 1.5rem;\n    padding-left: var(--grvsc-padding-left, var(--grvsc-padding-h, 1.5rem));\n    padding-right: 1.5rem;\n    padding-right: var(--grvsc-padding-right, var(--grvsc-padding-h, 1.5rem));\n  }\n  \n  .grvsc-line-highlighted {\n    background-color: var(--grvsc-line-highlighted-background-color, transparent);\n    box-shadow: inset var(--grvsc-line-highlighted-border-width, 4px) 0 0 0 
var(--grvsc-line-highlighted-border-color, transparent);\n  }\n  \n  .dark-default-dark {\n    background-color: #1E1E1E;\n    color: #D4D4D4;\n  }\n  .dark-default-dark .mtk4 { color: #569CD6; }\n  .dark-default-dark .mtk1 { color: #D4D4D4; }\n  .dark-default-dark .mtk12 { color: #9CDCFE; }\n  .dark-default-dark .mtk11 { color: #DCDCAA; }\n  .dark-default-dark .mtk8 { color: #CE9178; }\n  .dark-default-dark .mtk10 { color: #4EC9B0; }\n  .dark-default-dark .mtk15 { color: #C586C0; }\n  .dark-default-dark .mtk3 { color: #6A9955; }\n  .dark-default-dark .mtk7 { color: #B5CEA8; }\n  .dark-default-dark .mtk5 { color: #D16969; }\n  .dark-default-dark .mtk6 { color: #D7BA7D; }\n</style>","frontmatter":{"date":"April 27, 2022","updated_date":null,"title":"Implement HTTP Streaming with Node.js and Fetch API","tags":["Node.js","Fetch API","Streaming"],"coverImage":{"childImageSharp":{"fluid":{"aspectRatio":1.5037593984962405,"src":"/static/15f56fac896fb3f3370431c08e4ffbac/ee604/http-streaming-with-nodejs-and-fetch-api.png","srcSet":"/static/15f56fac896fb3f3370431c08e4ffbac/69585/http-streaming-with-nodejs-and-fetch-api.png 200w,\n/static/15f56fac896fb3f3370431c08e4ffbac/497c6/http-streaming-with-nodejs-and-fetch-api.png 400w,\n/static/15f56fac896fb3f3370431c08e4ffbac/ee604/http-streaming-with-nodejs-and-fetch-api.png 800w,\n/static/15f56fac896fb3f3370431c08e4ffbac/f3583/http-streaming-with-nodejs-and-fetch-api.png 1200w","sizes":"(max-width: 800px) 100vw, 800px"}}},"author":{"id":"Nick Van Nieuwenhuyse","github":"Nivani","avatar":null}}}},{"node":{"fields":{"slug":"/engineering/apache-beam/"},"html":"<p>We'll talk about Apache Beam in this guide and discuss its fundamental concepts. We will begin by showing the features and advantages of using Apache Beam, and then we will cover basic concepts and terminologies.</p>\n<p>Ever since the concept of big data was introduced to the programming world, many different technologies and frameworks have emerged. 
The processing of data can be categorized into two different paradigms. One is Batch Processing, and the other is Stream Processing. </p>\n<p>Different technologies came into existence for different paradigms, solving various big data world problems, for, e.g., Apache Spark, Apache Flink, Apache Storm, etc. </p>\n<p>As a developer or a business, it's always challenging to maintain different tech stacks and technologies. Hence, Apache Beam to the rescue!</p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 768px; \"\n    >\n      <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 49.84615384615385%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAKCAYAAAC0VX7mAAAACXBIWXMAAAsTAAALEwEAmpwYAAACHklEQVQoz2WSW2/TQBCF/VeReOYnIPHEMxL0pVJBkaCiSCBVqSiI0obSplUhbZM0F5qEksS5NY4vseO1vfbHOqagiJFWe5kzc3b3HC12Z4jrE8J+jWjcJhw2SYAkSUjimDTSvZSSWO3TcxknyxnSvJ9hjHcE+w/Q4sAjGDQJVbPotktk9FH4ZZM0HNNgYU2zolj+Icmypl7j+v0XWnuX9PKb6M8fobES0b+l6xJPhpTXcuiFwgrqvCsJUrRep5IrU3nVoLrZovL6Ek0sbBoXH7gq7VIsPuG6U+H8/Ai9VsQ6/kxUvVg2kbNb/FIR7BkPt4Y825/ysdKi16sj5z/xzRae2USzzTGlozd8/7pFu12nPxtyXN/jpLWP4Y1xwwRbqJuHAcbGU+Zrj7k6LbN1afLisE9v6jAY+1Tb4ZJYsy2T8WjAwvNWnlU92+bsIMdxYRPPzz5ehpJBp4lo7qgvaTDs99jJ53m7vceNnmG0KIoUuSCKQiVGpurCc9G7LSajXwT+/C+JUOnclUnvpkLn9CW1coF8YRdh1RRBE0PvpqLc6XlnhQQhFpizKYEQGcHCZTLWcWyTSEHmc5OjTxsc7K5zcbiO+Haf+ek9+qX/VF6NO6ogCLDMmWqsLCZ8jOkto5GOaVkkUihRfiCdhrJkT/nQmyBdHakEkN6I2DeyISziwFnOaRGhkxoRGQl8Z4zn2oSewoULktAniQKFn/MbmpDqVNye7XEAAAAASUVORK5CYII='); background-size: cover; display: block;\"\n  ></span>\n  <img\n        class=\"gatsby-resp-image-image\"\n        alt=\"Timeline of Big Data Frameworks\"\n        title=\"Timeline of Big Data Frameworks\"\n        src=\"/static/1e7f65aaa920379cd8b429236ecf5cb7/e5715/timeline-bigdata-frameworks.png\"\n        
srcset=\"/static/1e7f65aaa920379cd8b429236ecf5cb7/a6d36/timeline-bigdata-frameworks.png 650w,\n/static/1e7f65aaa920379cd8b429236ecf5cb7/e5715/timeline-bigdata-frameworks.png 768w,\n/static/1e7f65aaa920379cd8b429236ecf5cb7/07a9c/timeline-bigdata-frameworks.png 1440w\"\n        sizes=\"(max-width: 768px) 100vw, 768px\"\n        style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n        loading=\"lazy\"\n      />\n    </span></p>\n<h2 id=\"what-is-apache-beam\" style=\"position:relative;\"><a href=\"#what-is-apache-beam\" aria-label=\"what is apache beam permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>What is Apache Beam?</h2>\n<p>Apache Beam is an open-source, unified model for defining parallel-processing pipelines for both batch and streaming data. The Apache Beam programming model simplifies the mechanics of large-scale data processing. </p>\n<p>The Apache Beam model offers helpful abstractions that insulate you from the low-level details of distributed processing, such as coordinating individual workers and sharding datasets. These low-level details are handled entirely by the runner, for example Dataflow. 
</p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 768px; \"\n    >\n      <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 76.46153846153847%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAPCAYAAADkmO9VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAEHElEQVQ4y02UC0xTVxjHv0IpYKaQyEOyMWhLawx0Mpgxm2yyKO/yKM4lsoTNMEbcSwLUIkUID43bkBCJzrBBykBgmDhlXQimKE8FlrmFxwUKLVVMnDQUS3m0vbf322lBspP88v9yT/LL951zcoHRy4BdkIFVmw64kQWIZ+BeyxEgi0PwJngSeASvbTzXS4WALRLAzkjArijAhnBYuyCEhYJQAAcRoiHTbY1K9UD8nIv4LcnT7rmn+N6iYF5A9H6vgJMf+vglv7s7MJzv+fr+N3n+eFHsjn2HuPgohov6WC7eedvjRbHArT/nDQBGJyNkyIl4bnM2fcQylTrm0Mmm2ScfNeOQpI4dCJ9c7TnwiPAQSc0ORdRhRViz5YJwerVUMGYqEYxslAlnsVpUgRdFWx0S4Y+sXoYbM+n4ciINWX0GMoaTE9YHkgH7gwg0qg+4cNbWfkk/VorHzUo+mpRh+FwhQDIuEmE7XhJvCQnBtC7jMD7LjFrVnZcaJ/LicT6Sb78NIXQXSKxDH8TYCLSaK7H/ASH4BfCt11MSN1XZaVgjjsLL4sNMhYi/WS7cEQIznwz4LA9wEGDhBYbNGejITUR/RNylW8FQnQVDnfUa+TZvwahxxGC88hZgLbmcSwJgKkVgLQ8jQtcZyjiMUeVBI8LG+kosso4VB8usOMzTL9FqNhLRMrLMMqLDyDqYZQLtoO13T5CXgM2pbsy1BHe2SsTBKhG4Fo/HA5vN5ma322HNYo5jHUSBDqR73sf1x2o0/LuMBr0OtVotmkwm1rlL0/RDEuBkaWkJRkdHt2RUqwambmo4RsrAdW5azOYklmVdQjvVjfTSItoZFu02G9qsNmRoMgfrEv7pfKt3vq51uyo96ybfl8Ap8o97JewFqukeTHQPw+jy/K6Zudk4al6bNm20yaaeLn42SU3KJ2emz80+0Z3WLuozSMq0Bl10dcQp+CHpS8jfEwvyoAQo/J/Qj2rRCMd/6zv4uKsvpruxU3C3sF6slteE93yviqkXfuqTCfDar/k1MaqsUsnPKYXiX5S1op/OVB29Iv0msmDvMaE8KDGg0P/4jrCWJE619TJU+33854Y6f0DR9HxQ2WwfKFHhcFnLrbbEcw0dqSXYIT1vu5WqNNcczD5bFJSwVhwixaLAeCQjN5KErXFbNZUk16ZaNfNUW+/i39d/z+tXNFGDJaqnwyXNppHymw0diYrvOqVKU0dysaEzVblQE5mdV+B3XFsUEK8nMjOR1ZHc6dCLpC9JH9Kh79jV255/lbXvbsyp2Jt79EToJ++lBCaFvOOfKToiUBz6OECdWbnnsiTLs3BfvI8iONmHyHyJzHurw9bebTQuqLb7sNg6BM051bC9vF/9ZbaTU3/sK7gRnQsF5MycXRUR5ETmFP4HPQyC2KrLXHkAAAAASUVORK5CYII='); background-size: cover; display: block;\"\n  ></span>\n  <img\n        class=\"gatsby-resp-image-image\"\n        
alt=\"Beam-Model\"\n        title=\"Beam-Model\"\n        src=\"/static/bcda8196d6c879a95ccdd397242e01fb/e5715/beam_architecture.png\"\n        srcset=\"/static/bcda8196d6c879a95ccdd397242e01fb/a6d36/beam_architecture.png 650w,\n/static/bcda8196d6c879a95ccdd397242e01fb/e5715/beam_architecture.png 768w,\n/static/bcda8196d6c879a95ccdd397242e01fb/1cfc2/beam_architecture.png 900w\"\n        sizes=\"(max-width: 768px) 100vw, 768px\"\n        style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n        loading=\"lazy\"\n      />\n    </span></p>\n<h2 id=\"features-of-apache-beam\" style=\"position:relative;\"><a href=\"#features-of-apache-beam\" aria-label=\"features of apache beam permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Features of Apache Beam</h2>\n<p>The unique features of Apache  beam are as follows:</p>\n<ol>\n<li>Unified - Use a single programming model for both batch and streaming use cases.</li>\n<li>Portable - Execute pipelines in multiple execution environments. Here, execution environments mean different runners. Ex. 
Spark Runner, Dataflow Runner, etc.</li>\n<li>Extensible - Write custom SDKs, IO connectors, and transformation libraries.</li>\n</ol>\n<h2 id=\"apache-beam-sdks-and-runners\" style=\"position:relative;\"><a href=\"#apache-beam-sdks-and-runners\" aria-label=\"apache beam sdks and runners permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Apache Beam SDKs and Runners</h2>\n<p>As of today, there are three Apache Beam SDKs:</p>\n<ol>\n<li>Java</li>\n<li>Python</li>\n<li>Go</li>\n</ol>\n<p>Beam runners translate the Beam pipeline into the API compatible with the distributed processing backend of your choice. 
Beam currently supports runners that work with the following backends:</p>\n<ol>\n<li>Apache Spark</li>\n<li>Apache Flink</li>\n<li>Apache Samza</li>\n<li>Google Cloud Dataflow</li>\n<li>Hazelcast Jet</li>\n<li>Twister2</li>\n</ol>\n<p>There is also the Direct Runner, which runs the pipeline on the host machine and is used for testing purposes.</p>\n<h2 id=\"basic-concepts-in-apache-beam\" style=\"position:relative;\"><a href=\"#basic-concepts-in-apache-beam\" aria-label=\"basic concepts in apache beam permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Basic Concepts in Apache Beam</h2>\n<p>Apache Beam has three main abstractions. 
They are</p>\n<ol>\n<li>Pipeline</li>\n<li>PCollection</li>\n<li>PTransform</li>\n</ol>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 768px; \"\n    >\n      <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 13.999999999999998%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAADCAYAAACTWi8uAAAACXBIWXMAAAsTAAALEwEAmpwYAAAA/klEQVQI1wHzAAz/APXx65U1ZahDOWmr/0Fvr/osX6b/Y4e5ef///z8mWaBRNGWq/zdnq/ktYKf/dJO/bv///zsfVJ5bPWut/0BurvkvYaf/SXSwPvbz7Z3k5ebHAOPg28VLd7VKOmqu/0t2tP8zZKr/cJC8k///73coV5paOGir/0NxsP8xY6n/g57Cjf//73EfUphjPmyu/0l0sv82Zqr/Y4e6TNrX09PS0tP/AHlsWRhcc5MrJU6HZyNSlFUwYqlUDEeaFpWsywAuY60YK16mUyZapE0tX6ZUADWTD5OqygAxZa0cK16lVChcpU0vYadTCEacDf/v1wvf4eMUMeqAEPko5kUAAAAASUVORK5CYII='); background-size: cover; display: block;\"\n  ></span>\n  <img\n        class=\"gatsby-resp-image-image\"\n        alt=\"Beam-Pipeline\"\n        title=\"Beam-Pipeline\"\n        src=\"/static/439968fcb84e700e21b6475c5aa49214/e5715/pipeline-design.png\"\n        srcset=\"/static/439968fcb84e700e21b6475c5aa49214/a6d36/pipeline-design.png 650w,\n/static/439968fcb84e700e21b6475c5aa49214/e5715/pipeline-design.png 768w,\n/static/439968fcb84e700e21b6475c5aa49214/1132d/pipeline-design.png 1158w\"\n        sizes=\"(max-width: 768px) 100vw, 768px\"\n        style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n        loading=\"lazy\"\n      />\n    </span></p>\n<h3 id=\"pipeline\" style=\"position:relative;\"><a href=\"#pipeline\" aria-label=\"pipeline permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 
5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Pipeline:</h3>\n<p>A pipeline is the first abstraction to be created. It holds the complete data processing job from start to finish, including reading data, manipulating data, and writing data to a sink. Every pipeline takes in options/parameters that indicate where and how it should run. </p>\n<h3 id=\"pcollection\" style=\"position:relative;\"><a href=\"#pcollection\" aria-label=\"pcollection permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>PCollection:</h3>\n<p>A PCollection is an abstraction of a distributed dataset. A PCollection can be bounded (finite data) or unbounded (infinite data). The initial PCollection is created by reading data from the source. 
From then on, PCollections are the input and output of every step in the pipeline.</p>\n<h3 id=\"transform\" style=\"position:relative;\"><a href=\"#transform\" aria-label=\"transform permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Transform:</h3>\n<p>A transform is a data processing operation. A transform is applied to one or more PCollections. Complex transforms have other transforms nested within them. Every transform has a generic <code>apply</code> method that holds the logic of the transform.</p>\n<h2 id=\"example-of-pipeline\" style=\"position:relative;\"><a href=\"#example-of-pipeline\" aria-label=\"example of pipeline permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Example of Pipeline</h2>\n<p>Here, let's write a pipeline to output all the JSON records whose name starts with a vowel.</p>\n<p>Let's take a sample input. 
Name the file as <code>input.json</code></p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"json\" data-index=\"0\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">{</span><span class=\"mtk12\">&quot;name&quot;</span><span class=\"mtk1\">:</span><span class=\"mtk8\">&quot;abhi&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;score&quot;</span><span class=\"mtk1\">:</span><span class=\"mtk7\">12</span><span class=\"mtk1\">}</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">{</span><span class=\"mtk12\">&quot;name&quot;</span><span class=\"mtk1\">:</span><span class=\"mtk8\">&quot;virat&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;score&quot;</span><span class=\"mtk1\">:</span><span class=\"mtk7\">23</span><span class=\"mtk1\">}</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">{</span><span class=\"mtk12\">&quot;name&quot;</span><span class=\"mtk1\">:</span><span class=\"mtk8\">&quot;dhoni&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;score&quot;</span><span class=\"mtk1\">:</span><span class=\"mtk7\">45</span><span class=\"mtk1\">}</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">{</span><span class=\"mtk12\">&quot;name&quot;</span><span class=\"mtk1\">:</span><span class=\"mtk8\">&quot;rahul&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;score&quot;</span><span class=\"mtk1\">: </span><span class=\"mtk7\">156</span><span class=\"mtk1\">}</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">{</span><span class=\"mtk12\">&quot;name&quot;</span><span class=\"mtk1\">: </span><span class=\"mtk8\">&quot;Edmund&quot;</span><span class=\"mtk1\">}</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">{</span><span class=\"mtk12\">&quot;name&quot;</span><span class=\"mtk1\">: </span><span class=\"mtk8\">&quot;Ojha&quot;</span><span 
class=\"mtk1\">}</span></span></code></pre>\n<p>The input should be newline-delimited JSON, with one record per line.</p>\n<p>Include the following dependencies in your <code>pom.xml</code>:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"\" data-index=\"1\"><code class=\"grvsc-code\"><span class=\"grvsc-line\">&lt;dependency&gt;</span>\n<span class=\"grvsc-line\">    &lt;groupId&gt;org.apache.beam&lt;/groupId&gt;</span>\n<span class=\"grvsc-line\">    &lt;artifactId&gt;beam-sdks-java-core&lt;/artifactId&gt;</span>\n<span class=\"grvsc-line\">    &lt;version&gt;2.24.0&lt;/version&gt;</span>\n<span class=\"grvsc-line\">&lt;/dependency&gt;</span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\">&lt;dependency&gt;</span>\n<span class=\"grvsc-line\">    &lt;groupId&gt;org.apache.beam&lt;/groupId&gt;</span>\n<span class=\"grvsc-line\">    &lt;artifactId&gt;beam-runners-direct-java&lt;/artifactId&gt;</span>\n<span class=\"grvsc-line\">    &lt;version&gt;2.24.0&lt;/version&gt;</span>\n<span class=\"grvsc-line\">&lt;/dependency&gt;</span></code></pre>\n<p>Let's code the Beam pipeline. Follow these steps:</p>\n<ol>\n<li>\n<p>Create a pipeline.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"java\" data-index=\"2\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk10\">Pipeline</span><span class=\"mtk1\"> </span><span class=\"mtk12\">pipeLine</span><span class=\"mtk1\"> = </span><span class=\"mtk12\">Pipeline</span><span class=\"mtk1\">.</span><span class=\"mtk11\">create</span><span class=\"mtk1\">();</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\">// OR </span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\">// Pipeline pipeLine = Pipeline.create(options);</span></span></code></pre>\n<p>Create a pipeline that binds all the pcollections and transforms. 
Optionally you can pass the PipelineOptions <code>options</code> if needed.</p>\n</li>\n<li>\n<p>Read the input file</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"java\" data-index=\"3\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk10\">PCollection</span><span class=\"mtk1\">&lt;</span><span class=\"mtk10\">String</span><span class=\"mtk1\">&gt; </span><span class=\"mtk12\">inputCollection</span><span class=\"mtk1\"> = </span><span class=\"mtk12\">pipeLine</span><span class=\"mtk1\">.</span><span class=\"mtk11\">apply</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;Read My File&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">TextIO</span><span class=\"mtk1\">.</span><span class=\"mtk11\">read</span><span class=\"mtk1\">().</span><span class=\"mtk11\">from</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;input.json&quot;</span><span class=\"mtk1\">));</span></span></code></pre>\n<p>Use the <code>TextIO</code> transform to read the input files. 
Each line is a separate JSON record.</p>\n</li>\n<li>\n<p>Apply a transform to keep only the records whose name starts with a vowel</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"java\" data-index=\"4\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk10\">PCollection</span><span class=\"mtk1\">&lt;</span><span class=\"mtk10\">String</span><span class=\"mtk1\">&gt; </span><span class=\"mtk12\">filteredCollection</span><span class=\"mtk1\"> = </span><span class=\"mtk12\">inputCollection</span><span class=\"mtk1\">.</span><span class=\"mtk11\">apply</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;Filter names starting with vowels&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">Filter</span><span class=\"mtk1\">.</span><span class=\"mtk11\">by</span><span class=\"mtk1\">(</span><span class=\"mtk15\">new</span><span class=\"mtk1\"> </span><span class=\"mtk10\">SerializableFunction</span><span class=\"mtk1\">&lt;</span><span class=\"mtk10\">String</span><span class=\"mtk1\">, </span><span class=\"mtk10\">Boolean</span><span class=\"mtk1\">&gt;() {</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk4\">public</span><span class=\"mtk1\"> </span><span class=\"mtk10\">Boolean</span><span class=\"mtk1\"> </span><span class=\"mtk11\">apply</span><span class=\"mtk1\">(</span><span class=\"mtk10\">String</span><span class=\"mtk1\"> </span><span class=\"mtk12\">input</span><span class=\"mtk1\">) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            </span><span class=\"mtk10\">ObjectMapper</span><span class=\"mtk1\"> </span><span class=\"mtk12\">jacksonObjMapper</span><span class=\"mtk1\"> = </span><span class=\"mtk15\">new</span><span class=\"mtk1\"> </span><span class=\"mtk11\">ObjectMapper</span><span class=\"mtk1\">();</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            </span><span class=\"mtk15\">try</span><span class=\"mtk1\"> {</span></span>\n<span 
class=\"grvsc-line\"><span class=\"mtk1\">                </span><span class=\"mtk10\">JsonNode</span><span class=\"mtk1\"> </span><span class=\"mtk12\">jsonNode</span><span class=\"mtk1\"> = </span><span class=\"mtk12\">jacksonObjMapper</span><span class=\"mtk1\">.</span><span class=\"mtk11\">readTree</span><span class=\"mtk1\">(input);</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                </span><span class=\"mtk10\">String</span><span class=\"mtk1\"> </span><span class=\"mtk12\">name</span><span class=\"mtk1\"> = </span><span class=\"mtk12\">jsonNode</span><span class=\"mtk1\">.</span><span class=\"mtk11\">get</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;name&quot;</span><span class=\"mtk1\">).</span><span class=\"mtk11\">textValue</span><span class=\"mtk1\">();</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                </span><span class=\"mtk15\">return</span><span class=\"mtk1\"> </span><span class=\"mtk12\">vowels</span><span class=\"mtk1\">.</span><span class=\"mtk11\">contains</span><span class=\"mtk1\">(</span><span class=\"mtk12\">name</span><span class=\"mtk1\">.</span><span class=\"mtk11\">substring</span><span class=\"mtk1\">(</span><span class=\"mtk7\">0</span><span class=\"mtk1\">,</span><span class=\"mtk7\">1</span><span class=\"mtk1\">).</span><span class=\"mtk11\">toLowerCase</span><span class=\"mtk1\">());</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            } </span><span class=\"mtk15\">catch</span><span class=\"mtk1\"> (</span><span class=\"mtk10\">JsonProcessingException</span><span class=\"mtk1\"> </span><span class=\"mtk12\">e</span><span class=\"mtk1\">) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                </span><span class=\"mtk12\">e</span><span class=\"mtk1\">.</span><span class=\"mtk11\">printStackTrace</span><span class=\"mtk1\">();</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            
}</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            </span><span class=\"mtk15\">return</span><span class=\"mtk1\"> </span><span class=\"mtk4\">false</span><span class=\"mtk1\">;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        }</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    }));</span></span></code></pre>\n<p>The <code>Filter</code> transform takes a <code>SerializableFunction</code> object whose <code>apply</code> method is overridden. Each JSON string is parsed into a JSON node, and the first character of <code>name</code> is checked to see whether it is a vowel. The transform is applied to each input record. Based on the boolean value returned, the record is retained or discarded.</p>\n</li>\n<li>\n<p>Write the results to a file</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"java\" data-index=\"5\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk12\">filteredCollection</span><span class=\"mtk1\">.</span><span class=\"mtk11\">apply</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;write to file&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">TextIO</span><span class=\"mtk1\">.</span><span class=\"mtk11\">write</span><span class=\"mtk1\">().</span><span class=\"mtk11\">to</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;result&quot;</span><span class=\"mtk1\">).</span><span class=\"mtk11\">withSuffix</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;.txt&quot;</span><span class=\"mtk1\">).</span><span class=\"mtk11\">withoutSharding</span><span class=\"mtk1\">());</span></span></code></pre>\n<p>The results of the <code>Filter</code> transform are stored in a text file using the <code>write</code> method of the <code>TextIO</code> transform. Because PCollections are distributed across machines, the results are written to multiple files (shards) by default. 
To avoid this, we use <code>withoutSharding</code> where all the output is written to a single file.</p>\n</li>\n</ol>\n<p>Output:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"json\" data-index=\"6\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">{</span><span class=\"mtk12\">&quot;name&quot;</span><span class=\"mtk1\">: </span><span class=\"mtk8\">&quot;Edmund&quot;</span><span class=\"mtk1\">}</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">{</span><span class=\"mtk12\">&quot;name&quot;</span><span class=\"mtk1\">: </span><span class=\"mtk8\">&quot;Ojha&quot;</span><span class=\"mtk1\">}</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">{</span><span class=\"mtk12\">&quot;name&quot;</span><span class=\"mtk1\">:</span><span class=\"mtk8\">&quot;abhi&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">&quot;score&quot;</span><span class=\"mtk1\">:</span><span class=\"mtk7\">12</span><span class=\"mtk1\">}</span></span></code></pre>\n<hr>\n<p>Complete Code:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"java\" data-index=\"7\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk10\">Pipeline</span><span class=\"mtk1\"> </span><span class=\"mtk12\">pipeLine</span><span class=\"mtk1\"> = </span><span class=\"mtk12\">Pipeline</span><span class=\"mtk1\">.</span><span class=\"mtk11\">create</span><span class=\"mtk1\">();</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">final</span><span class=\"mtk1\"> </span><span class=\"mtk10\">Set</span><span class=\"mtk1\">&lt;</span><span class=\"mtk10\">String</span><span class=\"mtk1\">&gt; </span><span class=\"mtk12\">vowels</span><span class=\"mtk1\"> = </span><span class=\"mtk15\">new</span><span class=\"mtk1\"> </span><span class=\"mtk10\">HashSet</span><span class=\"mtk1\">&lt;</span><span class=\"mtk10\">String</span><span 
class=\"mtk1\">&gt;(</span><span class=\"mtk12\">Arrays</span><span class=\"mtk1\">.</span><span class=\"mtk11\">asList</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;a&quot;</span><span class=\"mtk1\">,</span><span class=\"mtk8\">&quot;e&quot;</span><span class=\"mtk1\">,</span><span class=\"mtk8\">&quot;i&quot;</span><span class=\"mtk1\">,</span><span class=\"mtk8\">&quot;o&quot;</span><span class=\"mtk1\">,</span><span class=\"mtk8\">&quot;u&quot;</span><span class=\"mtk1\">));</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk12\">pipeLine</span><span class=\"mtk1\">.</span><span class=\"mtk11\">apply</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;Read My File&quot;</span><span class=\"mtk1\">,</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                </span><span class=\"mtk12\">TextIO</span><span class=\"mtk1\">.</span><span class=\"mtk11\">read</span><span class=\"mtk1\">().</span><span class=\"mtk11\">from</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;input.json&quot;</span><span class=\"mtk1\">))</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        .</span><span class=\"mtk11\">apply</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;Filter names starting with vowels&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">Filter</span><span class=\"mtk1\">.</span><span class=\"mtk11\">by</span><span class=\"mtk1\">(</span><span class=\"mtk15\">new</span><span class=\"mtk1\"> </span><span class=\"mtk10\">SerializableFunction</span><span class=\"mtk1\">&lt;</span><span class=\"mtk10\">String</span><span class=\"mtk1\">, </span><span class=\"mtk10\">Boolean</span><span class=\"mtk1\">&gt;() {</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            </span><span class=\"mtk4\">public</span><span class=\"mtk1\"> </span><span class=\"mtk10\">Boolean</span><span 
class=\"mtk1\"> </span><span class=\"mtk11\">apply</span><span class=\"mtk1\">(</span><span class=\"mtk10\">String</span><span class=\"mtk1\"> </span><span class=\"mtk12\">input</span><span class=\"mtk1\">) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                </span><span class=\"mtk10\">ObjectMapper</span><span class=\"mtk1\"> </span><span class=\"mtk12\">jacksonObjMapper</span><span class=\"mtk1\"> = </span><span class=\"mtk15\">new</span><span class=\"mtk1\"> </span><span class=\"mtk11\">ObjectMapper</span><span class=\"mtk1\">();</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                </span><span class=\"mtk15\">try</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                    </span><span class=\"mtk10\">JsonNode</span><span class=\"mtk1\"> </span><span class=\"mtk12\">jsonNode</span><span class=\"mtk1\"> = </span><span class=\"mtk12\">jacksonObjMapper</span><span class=\"mtk1\">.</span><span class=\"mtk11\">readTree</span><span class=\"mtk1\">(input);</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                    </span><span class=\"mtk10\">String</span><span class=\"mtk1\"> </span><span class=\"mtk12\">name</span><span class=\"mtk1\"> = </span><span class=\"mtk12\">jsonNode</span><span class=\"mtk1\">.</span><span class=\"mtk11\">get</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;name&quot;</span><span class=\"mtk1\">).</span><span class=\"mtk11\">textValue</span><span class=\"mtk1\">();</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                    </span><span class=\"mtk15\">return</span><span class=\"mtk1\"> </span><span class=\"mtk12\">vowels</span><span class=\"mtk1\">.</span><span class=\"mtk11\">contains</span><span class=\"mtk1\">(</span><span class=\"mtk12\">name</span><span class=\"mtk1\">.</span><span class=\"mtk11\">substring</span><span class=\"mtk1\">(</span><span 
class=\"mtk7\">0</span><span class=\"mtk1\">,</span><span class=\"mtk7\">1</span><span class=\"mtk1\">).</span><span class=\"mtk11\">toLowerCase</span><span class=\"mtk1\">());</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                } </span><span class=\"mtk15\">catch</span><span class=\"mtk1\"> (</span><span class=\"mtk10\">JsonProcessingException</span><span class=\"mtk1\"> </span><span class=\"mtk12\">e</span><span class=\"mtk1\">) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                    </span><span class=\"mtk12\">e</span><span class=\"mtk1\">.</span><span class=\"mtk11\">printStackTrace</span><span class=\"mtk1\">();</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                }</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">                </span><span class=\"mtk15\">return</span><span class=\"mtk1\"> </span><span class=\"mtk4\">false</span><span class=\"mtk1\">;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            }</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        }))</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        .</span><span class=\"mtk11\">apply</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;write to file&quot;</span><span class=\"mtk1\">, </span><span class=\"mtk12\">TextIO</span><span class=\"mtk1\">.</span><span class=\"mtk11\">write</span><span class=\"mtk1\">().</span><span class=\"mtk11\">to</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;result&quot;</span><span class=\"mtk1\">).</span><span class=\"mtk11\">withSuffix</span><span class=\"mtk1\">(</span><span class=\"mtk8\">&quot;.txt&quot;</span><span class=\"mtk1\">).</span><span class=\"mtk11\">withoutSharding</span><span class=\"mtk1\">());</span></span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"><span class=\"mtk12\">pipeLine</span><span class=\"mtk1\">.</span><span 
class=\"mtk11\">run</span><span class=\"mtk1\">().</span><span class=\"mtk11\">waitUntilFinish</span><span class=\"mtk1\">();</span></span></code></pre>\n<p>For more advanced concepts, refer to the official site - beam.apache.org</p>\n<style class=\"grvsc-styles\">\n  .grvsc-container {\n    overflow: auto;\n    -webkit-overflow-scrolling: touch;\n    padding-top: 1rem;\n    padding-top: var(--grvsc-padding-top, var(--grvsc-padding-v, 1rem));\n    padding-bottom: 1rem;\n    padding-bottom: var(--grvsc-padding-bottom, var(--grvsc-padding-v, 1rem));\n    border-radius: 8px;\n    border-radius: var(--grvsc-border-radius, 8px);\n    font-feature-settings: normal;\n  }\n  \n  .grvsc-code {\n    display: inline-block;\n    min-width: 100%;\n  }\n  \n  .grvsc-line {\n    display: inline-block;\n    box-sizing: border-box;\n    width: 100%;\n    padding-left: 1.5rem;\n    padding-left: var(--grvsc-padding-left, var(--grvsc-padding-h, 1.5rem));\n    padding-right: 1.5rem;\n    padding-right: var(--grvsc-padding-right, var(--grvsc-padding-h, 1.5rem));\n  }\n  \n  .grvsc-line-highlighted {\n    background-color: var(--grvsc-line-highlighted-background-color, transparent);\n    box-shadow: inset var(--grvsc-line-highlighted-border-width, 4px) 0 0 0 var(--grvsc-line-highlighted-border-color, transparent);\n  }\n  \n  .dark-default-dark {\n    background-color: #1E1E1E;\n    color: #D4D4D4;\n  }\n  .dark-default-dark .mtk1 { color: #D4D4D4; }\n  .dark-default-dark .mtk12 { color: #9CDCFE; }\n  .dark-default-dark .mtk8 { color: #CE9178; }\n  .dark-default-dark .mtk7 { color: #B5CEA8; }\n  .dark-default-dark .mtk10 { color: #4EC9B0; }\n  .dark-default-dark .mtk11 { color: #DCDCAA; }\n  .dark-default-dark .mtk3 { color: #6A9955; }\n  .dark-default-dark .mtk15 { color: #C586C0; }\n  .dark-default-dark .mtk4 { color: #569CD6; }\n</style>","frontmatter":{"date":"October 16, 2020","updated_date":null,"title":"Apache Beam: A Basic Guide","tags":["Engineering","Big 
Data","Streaming","Apache Beam","Java"],"coverImage":{"childImageSharp":{"fluid":{"aspectRatio":1.5037593984962405,"src":"/static/17c96c6367455f79bdb854c4608ebaba/ee604/main.png","srcSet":"/static/17c96c6367455f79bdb854c4608ebaba/69585/main.png 200w,\n/static/17c96c6367455f79bdb854c4608ebaba/497c6/main.png 400w,\n/static/17c96c6367455f79bdb854c4608ebaba/ee604/main.png 800w,\n/static/17c96c6367455f79bdb854c4608ebaba/f3583/main.png 1200w","sizes":"(max-width: 800px) 100vw, 800px"}}},"author":{"id":"Abhilash K R","github":"Better-Boy","avatar":null}}}}]}},"pageContext":{"tag":"Streaming"}},"staticQueryHashes":["1171199041","1384082988","2100481360","23180105","528864852"]}