eLitmus BlogJekyll2023-12-22T08:53:58+05:30/eLitmus.com/site-admin@elitmus.com/technology/mastering-multi-tenant-setup-with-rails-part-12023-12-17 14:17:45 +0530T00:00:00-00:002023-12-17T00:00:00+05:30eLitmus.comsite-admin@elitmus.com
<p>Multi-tenancy is a software design where a single instance of a software application serves multiple customers or tenants (individual users or organizations). In a multi-tenant architecture, each tenant’s data and configuration are logically isolated from one another, providing a sense of individuality and privacy while sharing the same underlying infrastructure, codebase, and application instance.</p>
<h4 id="single-tenant-application"><strong>Single Tenant application</strong></h4>
<p>In a single-tenant application, each hosted instance has its dedicated database. Upon addition of a new organization that requires segregated data, a new application is hosted with a different database.</p>
<div style=" text-align:center;">
<img src="/blog/images/multi-tenant/single-tenant.png" style="height:600px" />
</div>
<h4 id="multi-tenant-application-types"><strong>Multi Tenant Application types</strong></h4>
<ol>
<li><strong>Single Database shared rows</strong>
<ul>
<li>Each table in the database contains an additional column, <code>tenant_id</code>.</li>
<li>Whenever data is stored in or retrieved from a table, this column is used to scope the query.</li>
<li>
<p>Only the data that belongs to a specific customer/tenant will be fetched.</p>
<div style=" text-align:center;">
<img src="/blog/images/multi-tenant/single-db-shared-rows.png" style="height:600px;" />
</div>
</li>
</ul>
</li>
<li><strong>Single Database shared schema</strong>
<ul>
<li>A separate table is maintained for each tenant in the same database.</li>
<li>
<p>Data is segregated table-wise.</p>
<div style=" text-align:center;">
<img src="/blog/images/multi-tenant/single-db-separate-tables.png" style="height:600px;" />
</div>
</li>
</ul>
</li>
<li><strong>Dedicated Database for Each Tenant</strong>
<ul>
<li>
<p>A separate database is maintained for each tenant. Each such database can be termed a shard.</p>
<div style=" text-align:center;">
<img src="/blog/images/multi-tenant/multi-tenant.png" style="height:600px;" />
</div>
</li>
</ul>
</li>
</ol>
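<p>To make the first approach concrete, here is a minimal plain-Ruby sketch of scoping reads by <code>tenant_id</code>. The <code>Article</code> struct, the sample rows and the <code>articles_for</code> helper are hypothetical stand-ins for a shared table and are not part of the application built later.</p>

```ruby
# Hypothetical in-memory stand-in for a shared `articles` table.
Article = Struct.new(:id, :tenant_id, :title)

ARTICLES = [
  Article.new(1, "app1", "Hello from app1"),
  Article.new(2, "app2", "Hello from app2"),
  Article.new(3, "app1", "Second app1 post")
].freeze

# Every read is filtered by tenant_id, so a tenant only sees its own rows.
def articles_for(tenant_id, rows = ARTICLES)
  rows.select { |row| row.tenant_id == tenant_id }
end

puts articles_for("app1").map(&:title).inspect
```

<p>The weakness of this approach is visible in the sketch: isolation depends on every query remembering the filter.</p>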
<p>In this blog post, we’ll take an in-depth look at the third approach, where we opt to manage separate databases for each tenant. To demonstrate this, we’ll walk through the process of creating a basic Rails blog application from the ground up.</p>
<h4 id="goal"><strong>Goal</strong></h4>
<ol>
<li>Set up a multi-tenant application in development mode.</li>
<li>Dynamically switch databases according to the requesting host name.</li>
</ol>
<h5 id="what-features-rails-6-brings-in"><strong>What features Rails 6 brings in</strong></h5>
<p>Rails 6 introduced multiple-database support with the following features:</p>
<ol>
<li>Multiple writer databases and a replica for each.</li>
<li>Automatic connection switching for the model you’re working with.</li>
<li>Automatic swapping between the writer and replica depending on the HTTP verb and recent writes.</li>
<li>Rails tasks for creating, dropping, migrating, and interacting with the multiple databases.</li>
</ol>
<h4 id="setup"><strong>Setup</strong></h4>
<p><strong>Create new rails app</strong></p>
<ul>
<li><code>rails new multi_db_blog</code></li>
<li>Update the <code>Gemfile</code> to use <code>mysql2</code> instead of <code>sqlite3</code>.</li>
</ul>
<p><strong>Setup databases</strong></p>
<ol>
<li>In the <code>database.yml</code> file, define one named database configuration per tenant.</li>
</ol>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span><span class="ss">development</span><span class="p">:</span>
  <span class="ss">app1</span><span class="p">:</span>
    <span class="ss">adapter</span><span class="p">:</span> <span class="n">mysql2</span>
    <span class="ss">encoding</span><span class="p">:</span> <span class="n">utf8</span>
    <span class="ss">reconnect</span><span class="p">:</span> <span class="kp">false</span>
    <span class="ss">database</span><span class="p">:</span> <span class="n">app1_development</span>
    <span class="ss">pool</span><span class="p">:</span> <span class="mi">5</span>
    <span class="ss">username</span><span class="p">:</span>
    <span class="ss">password</span><span class="p">:</span>
    <span class="ss">socket</span><span class="p">:</span> <span class="sr">/tmp/m</span><span class="n">ysql</span><span class="o">.</span><span class="n">sock</span>
    <span class="ss">host</span><span class="p">:</span> <span class="mi">127</span><span class="o">.</span><span class="mi">0</span><span class="o">.</span><span class="mi">0</span><span class="o">.</span><span class="mi">1</span>
  <span class="ss">app2</span><span class="p">:</span>
    <span class="ss">adapter</span><span class="p">:</span> <span class="n">mysql2</span>
    <span class="ss">encoding</span><span class="p">:</span> <span class="n">utf8</span>
    <span class="ss">reconnect</span><span class="p">:</span> <span class="kp">false</span>
    <span class="ss">database</span><span class="p">:</span> <span class="n">app2_development</span>
    <span class="ss">pool</span><span class="p">:</span> <span class="mi">5</span>
    <span class="ss">username</span><span class="p">:</span>
    <span class="ss">password</span><span class="p">:</span>
    <span class="ss">socket</span><span class="p">:</span> <span class="sr">/tmp/m</span><span class="n">ysql</span><span class="o">.</span><span class="n">sock</span>
    <span class="ss">host</span><span class="p">:</span> <span class="mi">127</span><span class="o">.</span><span class="mi">0</span><span class="o">.</span><span class="mi">0</span><span class="o">.</span><span class="mi">1</span></code></pre></figure>
<ol start="2">
<li><code>bin/rake db:create</code> <em>creates the databases for both tenants.</em></li>
<li>You have the option to execute specific rake commands for each database. For instance, you can create the <code>app1</code> database using the command: <code>bin/rake db:create:app1</code></li>
</ol>
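<p>As a quick sanity check of the three-tier layout (environment, then configuration name, then settings), the sketch below parses a trimmed copy of the file with Ruby's standard YAML library and lists the database behind each name. The <code>DATABASE_YML</code> string and the <code>database_names</code> helper are illustrative assumptions, not something Rails provides.</p>

```ruby
require "yaml"

# Trimmed copy of the three-tier layout: environment -> name -> settings.
DATABASE_YML = <<~YAML
  development:
    app1:
      adapter: mysql2
      database: app1_development
    app2:
      adapter: mysql2
      database: app2_development
YAML

# Map each configuration name under the given environment to its database.
def database_names(config, env)
  config.fetch(env).transform_values { |settings| settings.fetch("database") }
end

config = YAML.safe_load(DATABASE_YML)
puts database_names(config, "development").inspect

# With no `primary` key present, Rails treats the first entry as the default.
puts config.fetch("development").keys.first
```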
<p><strong>Generate Models and Controller</strong></p>
<ol>
<li>
<p>Model</p>
<p><code>bin/rails generate model Article title:string body:text</code></p>
</li>
<li>
<p>Run migrations</p>
<p><code>bin/rake db:migrate</code></p>
</li>
<li>
<p>Controller</p>
<p><code>bin/rails generate controller Articles index --skip-routes</code></p>
</li>
<li>
<p>Update the <code>routes.rb</code> file.</p>
</li>
</ol>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span><span class="n">root</span> <span class="s2">"articles#index"</span>
<span class="n">resources</span> <span class="ss">:articles</span>
</code></pre></figure>
<p>Complete the <code>Articles</code> Controller, Model and respective views by following <a href="https://guides.rubyonrails.org/getting_started.html" target="_blank" style="color: blue;">This Guide</a></p>
<p><strong>Start App</strong></p>
<ol>
<li>Run <code>bin/rails s</code> to start the server.</li>
<li>By default, Rails will now connect to <code>app1</code>, the first database listed in <code>database.yml</code>.</li>
<li>This will act as a default database for the current application.</li>
</ol>
<p><strong>Serving both databases simultaneously</strong></p>
<p>Install nginx and paste the following configuration into the <code>nginx.conf</code> file.</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>http <span class="o">{</span>
server <span class="o">{</span>
listen <span class="m">3000</span><span class="p">;</span>
server_name localhost<span class="p">;</span>
location / <span class="o">{</span>
proxy_pass http://127.0.0.1:3000<span class="p">;</span> <span class="c1"># Rails app running on port 3000</span>
proxy_set_header Host <span class="nv">$host</span>:<span class="nv">$server_port</span><span class="p">;</span>
proxy_set_header X-Real-IP <span class="nv">$remote_addr</span><span class="p">;</span>
proxy_set_header X-Forwarded-For <span class="nv">$proxy_add_x_forwarded_for</span><span class="p">;</span>
<span class="o">}</span>
<span class="o">}</span>
server <span class="o">{</span>
listen <span class="m">4000</span><span class="p">;</span>
server_name localhost<span class="p">;</span> <span class="c1"># Change this to your actual domain if needed</span>
location / <span class="o">{</span>
proxy_pass http://127.0.0.1:3000<span class="p">;</span> <span class="c1"># Rails app running on port 3000</span>
proxy_set_header Host <span class="nv">$host</span>:<span class="nv">$server_port</span><span class="p">;</span>
proxy_set_header X-Real-IP <span class="nv">$remote_addr</span><span class="p">;</span>
proxy_set_header X-Forwarded-For <span class="nv">$proxy_add_x_forwarded_for</span><span class="p">;</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
events <span class="o">{</span> <span class="o">}</span></code></pre></figure>
<p>The nginx configuration above listens on ports 3000 and 4000 and proxies requests to the Rails application running on port 3000, passing the original host and port along in the <code>Host</code> header.</p>
<p><strong>Additional Rails changes</strong></p>
<p>Since we are using Rails 7, we can use the automatic shard swapping feature provided by Rails. If you are on Rails 6 or 6.1, a middleware can be introduced to switch tenants automatically for each request. See the middleware section later in this post for details.</p>
<p>List the tenants in a <code>.yml</code> file. You can maintain these records in a separate database as well. For now, I will create a <code>settings.yml</code> file under <code>config/</code>, which is where <code>config_for</code> looks.</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span><span class="ss">development</span><span class="p">:</span>
  <span class="ss">tenants</span><span class="p">:</span>
    <span class="ss">app1</span><span class="p">:</span>
      <span class="ss">hosts</span><span class="p">:</span>
        <span class="o">-</span> <span class="ss">localhost</span><span class="p">:</span><span class="mi">3000</span>
    <span class="ss">app2</span><span class="p">:</span>
      <span class="ss">hosts</span><span class="p">:</span>
        <span class="o">-</span> <span class="ss">localhost</span><span class="p">:</span><span class="mi">4000</span></code></pre></figure>
<p>Update <code>application.rb</code> with the following configuration.</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span><span class="no">Rails</span><span class="o">.</span><span class="n">application</span><span class="o">.</span><span class="n">configure</span> <span class="k">do</span>
<span class="n">config</span><span class="o">.</span><span class="n">active_record</span><span class="o">.</span><span class="n">shard_selector</span> <span class="o">=</span> <span class="p">{</span> <span class="ss">lock</span><span class="p">:</span> <span class="kp">true</span> <span class="p">}</span>
<span class="n">tenants</span> <span class="o">=</span> <span class="no">Rails</span><span class="o">.</span><span class="n">application</span><span class="o">.</span><span class="n">config_for</span><span class="p">(</span><span class="ss">:settings</span><span class="p">)</span><span class="o">[</span><span class="ss">:tenants</span><span class="o">]</span> <span class="c1"># maintaining list of tenants with host</span>
<span class="n">config</span><span class="o">.</span><span class="n">active_record</span><span class="o">.</span><span class="n">shard_resolver</span> <span class="o">=</span> <span class="o">-></span><span class="p">(</span><span class="n">request</span><span class="p">)</span> <span class="p">{</span>
<span class="n">tenants</span><span class="o">.</span><span class="n">keys</span><span class="o">.</span><span class="n">find</span> <span class="p">{</span> <span class="o">|</span><span class="n">key</span><span class="o">|</span> <span class="n">tenants</span><span class="o">[</span><span class="n">key</span><span class="o">][</span><span class="ss">:hosts</span><span class="o">].</span><span class="n">include?</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">env</span><span class="o">[</span><span class="s1">'HTTP_HOST'</span><span class="o">]</span><span class="p">)</span> <span class="p">}</span> <span class="o">||</span> <span class="ss">:app1</span>
<span class="p">}</span>
<span class="k">end</span></code></pre></figure>
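<p>The lookup inside the resolver can be tried out in plain Ruby, without booting Rails. The sketch below assumes the tenants hash has the same shape that <code>config_for(:settings)[:tenants]</code> returns (symbol keys). <code>TENANT_SETTINGS</code>, <code>DEFAULT_SHARD</code> and <code>resolve_shard</code> are hypothetical names used only for illustration.</p>

```ruby
# Same shape as the tenants section of settings.yml, with symbol keys.
TENANT_SETTINGS = {
  app1: { hosts: ["localhost:3000"] },
  app2: { hosts: ["localhost:4000"] }
}.freeze

DEFAULT_SHARD = :app1

# Return the shard whose host list contains the request's Host header,
# or the default shard when no tenant claims that host.
def resolve_shard(http_host, tenants = TENANT_SETTINGS)
  tenants.keys.find { |key| tenants[key][:hosts].include?(http_host) } || DEFAULT_SHARD
end

puts resolve_shard("localhost:4000")
puts resolve_shard("unknown.example:80")
```

<p>Note that the match is on the full <code>host:port</code> string, which is why the nginx configuration forwards <code>$host:$server_port</code> in the <code>Host</code> header.</p>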
<p>Update <code>application_record.rb</code>.</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="c1"># connects_to shards: {</span>
<span class="c1"># app1: { writing: :app1 },</span>
<span class="c1"># app2: { writing: :app2 }</span>
<span class="c1"># }</span>
<span class="c1"># OR</span>
<span class="no">TENANTS</span> <span class="o">=</span> <span class="no">Rails</span><span class="o">.</span><span class="n">application</span><span class="o">.</span><span class="n">config_for</span><span class="p">(</span><span class="ss">:settings</span><span class="p">)</span><span class="o">[</span><span class="ss">:tenants</span><span class="o">]</span>
<span class="n">connects_to</span> <span class="ss">shards</span><span class="p">:</span> <span class="no">TENANTS</span><span class="o">.</span><span class="n">keys</span><span class="o">.</span><span class="n">map</span> <span class="p">{</span> <span class="o">|</span><span class="n">shard</span><span class="o">|</span> <span class="o">[</span><span class="n">shard</span><span class="p">,</span> <span class="p">{</span> <span class="ss">writing</span><span class="p">:</span> <span class="n">shard</span> <span class="p">}</span><span class="o">]</span> <span class="p">}</span><span class="o">.</span><span class="n">to_h</span></code></pre></figure>
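<p>To see what the dynamic form hands to <code>connects_to</code> under <code>shards:</code>, the mapping can be evaluated on its own. This is a small sketch in which <code>shard_map</code> is a hypothetical helper and the tenant names are taken from the settings file shown earlier.</p>

```ruby
# Build the hash expected under `shards:` from a list of tenant names:
# each shard gets a single writing connection named after the tenant.
def shard_map(tenant_names)
  tenant_names.to_h { |shard| [shard, { writing: shard }] }
end

puts shard_map(%i[app1 app2]).inspect
```

<p>The result has the same structure as the commented-out literal hash, so adding a tenant to <code>settings.yml</code> extends the shard list without touching the model.</p>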
<h5 id="creating-middleware-for-automatic-shard-switchingignore-if-using-rails-7-or-above"><strong>Creating middleware for automatic shard switching (ignore if using Rails 7 or above)</strong></h5>
<ol>
<li>Create a middleware file named <code>middleware/tenant_selector.rb</code>.</li>
<li>Add the following code.</li>
</ol>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="k">module</span> <span class="nn">Middleware</span>
<span class="k">class</span> <span class="nc">TenantSelector</span>
<span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">app</span><span class="p">,</span> <span class="n">tenants</span><span class="p">)</span>
<span class="vi">@app</span> <span class="o">=</span> <span class="n">app</span>
<span class="vi">@tenants</span> <span class="o">=</span> <span class="n">tenants</span>
<span class="k">end</span>
<span class="kp">attr_reader</span> <span class="ss">:tenants</span>
<span class="k">def</span> <span class="nf">call</span><span class="p">(</span><span class="n">env</span><span class="p">)</span>
<span class="n">request</span> <span class="o">=</span> <span class="no">ActionDispatch</span><span class="o">::</span><span class="no">Request</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="n">env</span><span class="p">)</span>
<span class="n">tenant</span> <span class="o">=</span> <span class="n">selected_tenant</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
<span class="n">set_tenant</span><span class="p">(</span><span class="n">tenant</span><span class="p">)</span> <span class="k">do</span>
<span class="vi">@app</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">env</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="kp">private</span>
<span class="k">def</span> <span class="nf">selected_tenant</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
<span class="n">tenants</span><span class="o">.</span><span class="n">keys</span><span class="o">.</span><span class="n">find</span> <span class="p">{</span> <span class="o">|</span><span class="n">key</span><span class="o">|</span> <span class="n">tenants</span><span class="o">[</span><span class="n">key</span><span class="o">][</span><span class="ss">:hosts</span><span class="o">].</span><span class="n">include?</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">env</span><span class="o">[</span><span class="s1">'HTTP_HOST'</span><span class="o">]</span><span class="p">)</span> <span class="p">}</span> <span class="o">||</span> <span class="ss">:app1</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">set_tenant</span><span class="p">(</span><span class="n">tenant</span><span class="p">,</span> <span class="o">&</span><span class="n">block</span><span class="p">)</span>
<span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span><span class="o">.</span><span class="n">connected_to</span><span class="p">(</span><span class="ss">shard</span><span class="p">:</span> <span class="n">tenant</span><span class="o">.</span><span class="n">to_sym</span><span class="p">,</span> <span class="ss">role</span><span class="p">:</span> <span class="ss">:writing</span><span class="p">)</span> <span class="k">do</span>
<span class="k">yield</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></figure>
<ol start="3">
<li>Update the <code>application.rb</code> file with the following changes.</li>
</ol>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="n">tenants</span> <span class="o">=</span> <span class="no">Rails</span><span class="o">.</span><span class="n">application</span><span class="o">.</span><span class="n">config_for</span><span class="p">(</span><span class="ss">:settings</span><span class="p">)</span><span class="o">[</span><span class="ss">:tenants</span><span class="o">]</span>
<span class="n">config</span><span class="o">.</span><span class="n">app_middleware</span><span class="o">.</span><span class="n">use</span> <span class="no">Middleware</span><span class="o">::</span><span class="no">TenantSelector</span><span class="p">,</span> <span class="n">tenants</span></code></pre></figure>
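<p>The wrapping behaviour of such a middleware can be exercised without Rails by injecting the database switch as a callable. In the sketch below, <code>TenantSelectorSketch</code> and <code>switcher</code> are hypothetical: <code>switcher</code> stands in for <code>ActiveRecord::Base.connected_to(shard: ..., role: :writing)</code>, and the downstream app simply reports the shard it observed.</p>

```ruby
# Rack-style middleware with the shard switch injected, so it runs
# without Rails. `switcher` is a stand-in for connected_to.
class TenantSelectorSketch
  def initialize(app, tenants, switcher)
    @app = app
    @tenants = tenants
    @switcher = switcher
  end

  def call(env)
    tenant = @tenants.keys.find { |key| @tenants[key][:hosts].include?(env["HTTP_HOST"]) } || :app1
    # The downstream app only runs inside the switcher's block.
    @switcher.call(tenant) { @app.call(env) }
  end
end

tenants = { app1: { hosts: ["localhost:3000"] }, app2: { hosts: ["localhost:4000"] } }
current = nil # the shard that is "active" while a request is handled

# Fake switcher: activates the shard for the block, then restores the old one.
switcher = lambda do |shard, &block|
  previous = current
  current = shard
  begin
    block.call
  ensure
    current = previous
  end
end

# Downstream Rack app: responds with the shard it saw.
app = ->(_env) { [200, { "content-type" => "text/plain" }, [current.to_s]] }

stack = TenantSelectorSketch.new(app, tenants, switcher)
status, _headers, body = stack.call("HTTP_HOST" => "localhost:4000")
puts "#{status} #{body.first}"
```

<p>Because the switch is scoped to a block, the shard selection cannot leak from one request into the next on the same thread.</p>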
<p><strong>Final Steps</strong></p>
<p>Follow these final steps to confirm your multi-tenant Rails application is up and running smoothly:</p>
<ol>
<li>Run <code>bin/rails s</code></li>
<li>Access <code>localhost:3000</code> to connect to the <code>app1</code> database</li>
<li>Access <code>localhost:4000</code> to connect to the <code>app2</code> database</li>
<li>If you wish to add more databases, simply update the <code>database.yml</code> and <code>settings.yml</code> files</li>
</ol>
<p><strong>What Next?</strong></p>
<p>In the upcoming series of blog posts, we will delve into the following topics:</p>
<ol>
<li>Maintaining Background Jobs.</li>
<li>Running Rake Tasks with Cron Jobs for Multiple Databases.</li>
<li>ActiveStorage Data Management with Different Storage Types for Each Tenant.</li>
<li>Caching.</li>
</ol>
<h4 id="summary"><strong>Summary</strong></h4>
<p>In this blog post, we covered creating a multi-tenant application from scratch and setting it up in the development environment. We were able to switch databases automatically according to the requesting host name.</p>
<h4 id="references"><strong>References</strong></h4>
<ol>
<li><a href="https://github.com/nikhilbhatt/rails-multi-db-tutorial/releases/tag/0.0.0" target="_blank" style="color: blue;">Github Code</a></li>
<li><a href="https://guides.rubyonrails.org/active_record_multiple_databases.html" target="_blank" style="color: blue;">Rails Multi Db introduction</a></li>
</ol>
<p><a href="/technology/mastering-multi-tenant-setup-with-rails-part-1/">Mastering Multi Tenant setup with rails part 1</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on December 17, 2023.</p>
/technology/an-in-depth-look-at-database-indexing 2023-12-10 15:49:39 +0530 eLitmus.com site-admin@elitmus.com
<p>In this article, we will explore database indexing. We will begin by installing Docker and running a Postgres container on it. Then, to execute queries and understand how the database uses various indexing strategies, we will insert a million rows into a Postgres table.</p>
<p>Following that, we will explore different tools for gaining insight into the SQL query planner and optimizer. After that, we will delve into database indexing itself, examine how various types of indexing work with examples, and compare the different database scan strategies.</p>
<p>Finally, we will demystify how database indexes operate for a WHERE clause with the AND and OR operators.</p>
<h4 id="prerequisite"><strong>Prerequisite</strong></h4>
<ol style="margin-left: 2rem">
<li>
<b>Installing Docker:</b>
<ol style="margin-left: 2rem">
<li>
Install Docker by following the instructions provided in the <a href="https://www.docker.com/get-started/" target="_blank" style="color: blue;">getting started</a> guide on the official Docker website.
</li>
<li>
Verify that Docker is installed by running the command <code>docker --version</code>
</li>
</ol>
</li>
<li>
<b>
Running a PostgreSQL Container:
</b>
<ol style="margin-left: 2rem">
<li>
Spin up the Docker container by using the official Postgres <a href="https://hub.docker.com/_/postgres" target="_blank" style="color: blue;">image</a>.
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>docker run -d -e <span class="nv">POSTGRES_PASSWORD</span><span class="o">=</span>secret --name pg postgres</code></pre></figure>
</li>
<li>
Start the Postgres command shell:
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>docker <span class="nb">exec</span> -it pg psql -U postgres</code></pre></figure>
</li>
</ol>
</li>
<li>
<b>
Inserting a Million Rows into a Postgres Table:
</b>
<ol style="margin-left: 2rem">
<li>
Create a table named <code>employees</code>:
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="k">create</span> <span class="k">table</span> <span class="n">employees</span><span class="p">(</span><span class="n">id</span> <span class="nb">serial</span> <span class="k">primary</span> <span class="k">key</span><span class="p">,</span> <span class="n">employeeid</span> <span class="nb">integer</span><span class="p">,</span> <span class="n">name</span> <span class="nb">TEXT</span><span class="p">);</span></code></pre></figure>
</li>
<li>
Insert into the <code>employees</code> table using the <a href="https://www.postgresql.org/docs/current/functions-srf.html" target="_blank" style="color: blue;">generate_series</a> function:
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="k">create</span> <span class="k">or</span> <span class="k">replace</span> <span class="k">function</span> <span class="n">gen_random_string</span><span class="p">(</span><span class="k">length</span> <span class="nb">integer</span><span class="p">)</span>
<span class="k">RETURNS</span> <span class="nb">VARCHAR</span> <span class="k">as</span>
<span class="err">$$</span>
<span class="k">DECLARE</span>
<span class="k">result</span> <span class="nb">VARCHAR</span> <span class="p">:</span><span class="o">=</span> <span class="s1">''</span><span class="p">;</span>
<span class="k">BEGIN</span>
<span class="k">FOR</span> <span class="n">i</span> <span class="k">IN</span> <span class="mi">1</span><span class="p">..</span><span class="k">length</span> <span class="n">LOOP</span>
<span class="k">result</span> <span class="p">:</span><span class="o">=</span> <span class="k">result</span> <span class="o">||</span> <span class="n">chr</span><span class="p">((</span><span class="n">floor</span><span class="p">(</span><span class="n">random</span><span class="p">()</span> <span class="o">*</span> <span class="mi">26</span><span class="p">)</span> <span class="o">+</span> <span class="mi">65</span><span class="p">)::</span><span class="nb">integer</span><span class="p">);</span>
<span class="k">END</span> <span class="n">LOOP</span><span class="p">;</span>
<span class="k">RETURN</span> <span class="k">result</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="err">$$</span> <span class="k">language</span> <span class="n">plpgsql</span><span class="p">;</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">EMPLOYEES</span><span class="p">(</span><span class="n">employeeid</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span>
<span class="k">SELECT</span> <span class="o">*</span><span class="p">,</span> <span class="n">gen_random_string</span><span class="p">((</span><span class="n">random</span><span class="p">()</span> <span class="o">*</span> <span class="mi">10</span> <span class="o">+</span> <span class="mi">5</span><span class="p">)::</span><span class="nb">integer</span><span class="p">)</span>
<span class="k">FROM</span> <span class="n">generate_series</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1000000</span><span class="p">);</span></code></pre></figure>
</li>
<li>
Confirm the result by executing the <code>count</code> query:
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="k">select</span> <span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">from</span> <span class="n">employees</span><span class="p">;</span>
<span class="k">count</span>
<span class="c1">--------</span>
<span class="mi">1000001</span>
<span class="p">(</span><span class="mi">1</span> <span class="k">row</span><span class="p">)</span></code></pre></figure>
</li>
</ol>
This sequence of steps creates a table named <code>employees</code> and inserts 1,000,001 rows into it, because <code>generate_series(0, 1000000)</code> includes both endpoints. The <code>employeeid</code> column receives the sequential series values, and the <code>name</code> column receives random uppercase strings. The final count query verifies that all rows were inserted.
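<p>For readers more at home in Ruby than PL/pgSQL, the sketch below mirrors what the SQL helper does. It is an illustrative analogue only: the function names are reused for readability, the <code>rng</code> parameter is an assumption added for testability, and nothing here runs inside Postgres.</p>

```ruby
# Ruby analogue of the SQL helper: `length` random letters from A to Z,
# built the same way as chr(floor(random() * 26) + 65).
def gen_random_string(length, rng = Random.new)
  Array.new(length) { (65 + rng.rand(26)).chr }.join
end

# Mirrors (random() * 10 + 5)::integer, giving a length between 5 and 15.
def random_name(rng = Random.new)
  gen_random_string((rng.rand * 10 + 5).round, rng)
end

puts random_name

# generate_series(0, 1000000) is inclusive at both ends.
puts (0..1_000_000).size
```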
</li>
<li>
<b>The SQL Query Planner and Optimizer:</b>
<ul style="margin-left: 2rem">
<li>
<b>Explanation:</b>
The <code>explain</code> command displays the execution plan generated by the PostgreSQL planner for the provided statement. This plan shows how the table(s) referenced in the statement will be scanned, whether through plain sequential scans, index scans, etc.
</li>
<li>
<b>Examples:</b>
<ol style="margin-left: 2rem">
<li>
<b>Select All Query:</b>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">explain</span> <span class="k">select</span> <span class="o">*</span> <span class="k">from</span> <span class="n">employees</span><span class="p">;</span>
<span class="n">QUERY</span> <span class="n">PLAN</span>
<span class="c1">-------------------------------------------------------------</span>
<span class="n">Seq</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">employees</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">16139</span><span class="p">.</span><span class="mi">01</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1000001</span> <span class="n">width</span><span class="o">=</span><span class="mi">19</span><span class="p">)</span></code></pre></figure>
<ul style="margin-left: 2rem">
<li>
<code>Seq Scan</code>: Goes directly to the heap and reads every row, similar to a Full Table Scan in other databases. When Postgres splits the scan across multiple worker processes, it is shown as <code>Parallel Seq Scan</code>.
</li>
<li>
<code>cost=0.00..16139.01</code>: The first number is the estimated startup cost, the work done before the first row can be returned (e.g., sorting or aggregating). The second number is the estimated total cost of returning all rows. Both are expressed in the planner's arbitrary cost units, not in milliseconds.
</li>
<li>
<code>rows=1000001</code>: The planner's estimate of the number of rows returned.
</li>
<li>
<code>width=19</code>: The estimated average size of a returned row, in bytes.
</li>
</ul>
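<p>When comparing many plans, it can help to pull these four estimates out of a plan line programmatically. The following is a minimal sketch that assumes the text format shown above. <code>PLAN_ESTIMATES</code> and <code>parse_plan_estimates</code> are hypothetical names, and the pattern only covers plain <code>explain</code> output (not <code>explain analyze</code>).</p>

```ruby
# Matches the "(cost=S..T rows=R width=W)" part of a plan line.
PLAN_ESTIMATES = /\(cost=(?<startup>\d+\.\d+)\.\.(?<total>\d+\.\d+) rows=(?<rows>\d+) width=(?<width>\d+)\)/

# Return the planner estimates as a hash, or nil if the line has none.
def parse_plan_estimates(line)
  match = PLAN_ESTIMATES.match(line)
  return nil unless match

  {
    startup_cost: match[:startup].to_f,
    total_cost: match[:total].to_f,
    rows: match[:rows].to_i,
    width: match[:width].to_i
  }
end

line = "Seq Scan on employees  (cost=0.00..16139.01 rows=1000001 width=19)"
puts parse_plan_estimates(line).inspect
```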
</li>
<br />
<li>
<b>Select All Query with Order By (Indexed Column):</b>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">create</span> <span class="k">index</span> <span class="n">employees_employeeid_idx</span> <span class="k">ON</span> <span class="n">employees</span><span class="p">(</span><span class="n">employeeid</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">INDEX</span>
<span class="n">postgres</span><span class="o">=#</span> <span class="k">explain</span> <span class="k">select</span> <span class="o">*</span> <span class="k">from</span> <span class="n">employees</span> <span class="k">order</span> <span class="k">by</span> <span class="n">employeeid</span><span class="p">;</span>
<span class="n">QUERY</span> <span class="n">PLAN</span>
<span class="c1">-----------------------------------------------------------------</span>
<span class="k">Index</span> <span class="n">Scan</span> <span class="k">using</span> <span class="n">employees_employeeid_idx</span> <span class="k">on</span> <span class="n">employees</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">42</span><span class="p">..</span><span class="mi">32122</span><span class="p">.</span><span class="mi">44</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1000001</span> <span class="n">width</span><span class="o">=</span><span class="mi">19</span><span class="p">)</span></code></pre></figure>
<ul style="margin-left: 2rem">
<li>
<code>cost=0.42</code>: The startup cost stays small because the index on <code>employeeid</code> already stores its entries in sorted order, so Postgres can return ordered rows through an <code>Index Scan</code> without a separate sort step.
</li>
</ul>
</li>
<br />
<li>
<b>Select All Query with Order By (Non-Indexed Column):</b>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">explain</span> <span class="k">select</span> <span class="o">*</span> <span class="k">from</span> <span class="n">employees</span> <span class="k">order</span> <span class="k">by</span> <span class="n">name</span><span class="p">;</span>
<span class="n">QUERY</span> <span class="n">PLAN</span>
<span class="c1">--------------------------------------------------------------</span>
<span class="n">Sort</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">136306</span><span class="p">.</span><span class="mi">96</span><span class="p">..</span><span class="mi">138806</span><span class="p">.</span><span class="mi">96</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1000001</span> <span class="n">width</span><span class="o">=</span><span class="mi">19</span><span class="p">)</span>
<span class="n">Sort</span> <span class="k">Key</span><span class="p">:</span> <span class="n">name</span>
<span class="o">-></span> <span class="n">Seq</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">employees</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">16139</span><span class="p">.</span><span class="mi">01</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1000001</span> <span class="n">width</span><span class="o">=</span><span class="mi">19</span><span class="p">)</span></code></pre></figure>
<ul style="margin-left: 2rem">
<li>
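<code>Seq Scan &amp; Sort</code>: Postgres performs a Seq Scan on the table and then sorts the result. The startup cost of the Sort node (136306.96) dominates the plan, because every row has to be read and sorted before the first one can be returned.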
</li>
</ul>
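<p>The difference in startup cost between the two <code>order by</code> plans can be illustrated with a toy model that counts how many keys must be read before the first ordered row is available. This is a simplified sketch with hypothetical helper names. It does not reproduce how Postgres computes costs.</p>

```ruby
# Toy model: count the keys read before the first row, in sorted
# order, can be returned.

# Without an index the input is unordered, so every key is read and
# sorted before the smallest one is known.
def first_row_via_sort(unordered_keys)
  reads = 0
  buffered = unordered_keys.map { |key| reads += 1; key }
  [buffered.sort.first, reads]
end

# With an index the keys are already stored in order, so the scan can
# stop after reading the first entry.
def first_row_via_index(ordered_keys)
  reads = 0
  first = nil
  ordered_keys.each do |key|
    reads += 1
    first = key
    break
  end
  [first, reads]
end

names = %w[MONA ZED ADA KIM]
p first_row_via_sort(names)
p first_row_via_index(names.sort)
```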
</li>
<br />
<li>
<b>Select Only ID:</b>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">explain</span> <span class="k">select</span> <span class="n">id</span> <span class="k">from</span> <span class="n">employees</span><span class="p">;</span>
<span class="n">QUERY</span> <span class="n">PLAN</span>
<span class="c1">---------------------------------------------------</span>
<span class="n">Seq</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">employees</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">16139</span><span class="p">.</span><span class="mi">01</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1000001</span> <span class="n">width</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span></code></pre></figure>
<ul style="margin-left: 2rem">
<li>
<code>width=4</code>: Fetching only <code>id</code>, resulting in a smaller <code>width</code> of 4 bytes (integer).
</li>
</ul>
</li>
<br />
<li>
<b>Select All Query for a Particular ID:</b>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">explain</span> <span class="k">select</span> <span class="o">*</span> <span class="k">from</span> <span class="n">employees</span> <span class="k">where</span> <span class="n">id</span> <span class="o">=</span> <span class="mi">10</span><span class="p">;</span>
<span class="n">QUERY</span> <span class="n">PLAN</span>
<span class="c1">-------------------------------------------------------------------</span>
<span class="k">Index</span> <span class="n">Scan</span> <span class="k">using</span> <span class="n">employees_pkey</span> <span class="k">on</span> <span class="n">employees</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">42</span><span class="p">..</span><span class="mi">8</span><span class="p">.</span><span class="mi">44</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">width</span><span class="o">=</span><span class="mi">19</span><span class="p">)</span>
<span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">id</span> <span class="o">=</span> <span class="mi">10</span><span class="p">)</span></code></pre></figure>
<ul style="margin-left: 2rem">
<li>
<code>rows=1</code>: Fetching only 1 record using the primary key index.
</li>
</ul>
</li>
</ol>
</li>
</ul>
</li>
</ol>
<h4 id="what-is-database-indexing"><strong>What is Database indexing?</strong></h4>
<p>An index is a data structure that speeds up data retrieval by avoiding a scan of every row in the table. An index improves lookup performance but slows down writes, because every time a row is created (or an indexed value changes), the indexes must be updated as well.</p>
<p>Indexes are typically stored on disk. Conceptually, an index is a small table with two columns: a key (a primary or candidate key, made from one or more columns) and the address of the matching row.</p>
<p>Indexes are most commonly stored as B+ trees. In its simplest form, an index is a sorted table of key-value pairs, which allows a search to complete in <code>O(log n)</code> time, much like a binary search on sorted data.</p>
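<p>The "sorted table of key-value pairs" idea can be sketched in a few lines of Python. This is only an illustration, not how Postgres implements its indexes: the <code>heap</code> dictionary, the sample rows, and the <code>lookup</code> helper are all made up for the example, and a sorted list stands in for the tree.</p>

```python
from bisect import bisect_left

# Hypothetical heap: maps a row address to a row, in no particular key order.
heap = {
    0: {"id": 42, "name": "Carol"},
    1: {"id": 7, "name": "Alice"},
    2: {"id": 19, "name": "Bob"},
    3: {"id": 88, "name": "Dave"},
}

# The index: (key, address) pairs kept sorted by key.
index = sorted((row["id"], address) for address, row in heap.items())
keys = [key for key, _ in index]


def lookup(key):
    """Binary-search the sorted keys, then follow the address to the row."""
    position = bisect_left(keys, key)
    if position == len(keys) or keys[position] != key:
        return None  # key is not in the index
    _, address = index[position]
    return heap[address]


print(lookup(19))  # {'id': 19, 'name': 'Bob'}
print(lookup(20))  # None
```

<p>The search touches only a logarithmic number of keys, whereas finding <code>id = 19</code> without the index would mean checking every row in <code>heap</code>.</p>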
<p>Types of Indexes:</p>
<ul style="margin-left: 2rem">
<li>
Clustered Index
<ul style="margin-left: 2rem">
<li>Index and data reside together and are ordered by the key. A clustered index is essentially a tree-organized table: instead of storing the records in an unsorted heap, the clustered index is a B+ tree whose leaf nodes, ordered by the clustering key column value, store the actual table records, as illustrated by the following diagram.</li>
</ul>
<img src="/blog/images/database-indexing/clustered-index.png" width="425" />
</li>
<li>
Nonclustered Index
<ul style="margin-left: 2rem">
<li>A nonclustered index contains the key values, and each key entry has a pointer to the data row that contains that key value. Since the clustered index is usually built on the primary key column, speeding up queries that filter on some other column requires adding a secondary, nonclustered index.
The secondary index stores the primary key value in its leaf nodes, as illustrated by the following diagram.
</li>
</ul>
<img src="/blog/images/database-indexing/nonclustered-index.png" width="425" />
</li>
</ul>
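<p>The difference between the two index types can be sketched as follows. This is a simplified model under stated assumptions: plain Python dictionaries stand in for the B+ trees, and the table contents and helper names (<code>find_by_id</code>, <code>find_by_city</code>) are hypothetical.</p>

```python
# Clustered index: the entries, keyed by primary key, hold the rows themselves.
clustered = {
    1: {"id": 1, "name": "Alice", "city": "Pune"},
    2: {"id": 2, "name": "Bob", "city": "Delhi"},
    3: {"id": 3, "name": "Carol", "city": "Pune"},
}

# Secondary (nonclustered) index on `city`: each key maps to primary keys, not rows.
secondary_city = {}
for primary_key, row in clustered.items():
    secondary_city.setdefault(row["city"], []).append(primary_key)


def find_by_id(primary_key):
    """One lookup: the clustered index already contains the row."""
    return clustered[primary_key]


def find_by_city(city):
    """Two lookups: secondary index gives primary keys, clustered index gives rows."""
    return [clustered[pk] for pk in secondary_city.get(city, [])]


print(find_by_id(2)["name"])                       # Bob
print([r["name"] for r in find_by_city("Pune")])   # ['Alice', 'Carol']
```

<p>A query on <code>city</code> pays for an extra hop through the primary key, which is the cost of keeping the rows themselves in only one place.</p>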
<h4 id="how-database-indexes-works-under-the-hood"><strong>How do database indexes work under the hood?</strong></h4>
<p>We have already created a database index on the <code>employeeid</code> column in our employees table using the <code>CREATE INDEX</code> statement. Behind the scenes, Postgres creates a new pseudo-table in the database with two columns: a value for <code>employeeid</code> and a pointer to the corresponding record in the <code>employees</code> table. This pseudo-table is organized and stored as a B-tree ordered by the <code>employeeid</code> values. Consequently, a lookup takes <code>O(log n)</code> time and typically executes in well under a second.</p>
<p><img src="/blog/images/database-indexing/indexing-structure.png" width="425" />
Let’s delve into two scenarios:</p>
<ol style="margin-left: 2rem">
<li>
<b><code>SELECT * FROM employees WHERE employeeid = 4;</code></b>
<br />
Here, with an index on the <code>employeeid</code> column, the query initiates an <code>Index Scan</code>. The process begins by accessing the index, retrieving the reference to the page number, and obtaining the row number for the specific record on that page. It then navigates to the corresponding page in the heap and fetches the entire row. This method is known as an <code>Index Scan</code>.
</li>
<li>
<b><code>SELECT employeeid FROM employees WHERE employeeid = 4;</code></b><br />
In this instance, there is no need to access the heap to retrieve the complete record. Since the required <code>employeeid</code> value is already present in the index, Postgres performs an <code>Index Only Scan</code> and returns the value directly from the index. This improves performance whenever the index contains all the columns the query needs, because less data has to be read.
</li>
</ol>
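<p>The two scenarios above can be sketched as follows. This is a toy model, not Postgres internals: the page layout, the sample rows, and the function names are assumptions made for illustration, and the visibility checks a real database performs are ignored. Each function returns its result together with the number of heap fetches it needed.</p>

```python
# Hypothetical heap: a list of pages, each page a list of rows.
heap_pages = [
    [{"employeeid": 3, "name": "Carol"}, {"employeeid": 1, "name": "Alice"}],
    [{"employeeid": 4, "name": "Dave"}, {"employeeid": 2, "name": "Bob"}],
]

# Index on employeeid: each key maps to (page number, row number) in the heap.
index = {
    row["employeeid"]: (page_no, row_no)
    for page_no, page in enumerate(heap_pages)
    for row_no, row in enumerate(page)
}


def index_scan(employeeid):
    """SELECT * ...: find the pointer in the index, then read the row from the heap."""
    page_no, row_no = index[employeeid]
    row = heap_pages[page_no][row_no]  # one heap fetch
    return row, 1


def index_only_scan(employeeid):
    """SELECT employeeid ...: the index already holds the value, so skip the heap."""
    value = employeeid if employeeid in index else None
    return value, 0


print(index_scan(4))       # ({'employeeid': 4, 'name': 'Dave'}, 1)
print(index_only_scan(4))  # (4, 0)
```

<p>The second element of each result mirrors the <code>Heap Fetches</code> figure that <code>EXPLAIN ANALYZE</code> reports for an Index Only Scan.</p>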
<h4 id="what-are-different-type-of-database-scan-strategies"><strong>What are the different types of database scan strategies?</strong></h4>
<ol style="margin-left: 2rem">
<li>
<b>Index Only Scan</b>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">EXPLAIN</span> <span class="k">ANALYZE</span> <span class="k">select</span> <span class="n">id</span> <span class="k">from</span> <span class="n">employees</span> <span class="k">where</span> <span class="n">id</span> <span class="o">=</span> <span class="mi">100</span><span class="p">;</span>
<span class="n">QUERY</span> <span class="n">PLAN</span>
<span class="c1">-----------------------------------------------------------------------------</span>
<span class="k">Index</span> <span class="k">Only</span> <span class="n">Scan</span> <span class="k">using</span> <span class="n">employees_pkey</span> <span class="k">on</span> <span class="n">employees</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">42</span><span class="p">..</span><span class="mi">4</span><span class="p">.</span><span class="mi">44</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">width</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">2</span><span class="p">.</span><span class="mi">529</span><span class="p">..</span><span class="mi">2</span><span class="p">.</span><span class="mi">542</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">id</span> <span class="o">=</span> <span class="mi">100</span><span class="p">)</span>
<span class="n">Heap</span> <span class="n">Fetches</span><span class="p">:</span> <span class="mi">0</span>
<span class="n">Planning</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">510</span> <span class="n">ms</span>
<span class="n">Execution</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">2</span><span class="p">.</span><span class="mi">708</span> <span class="n">ms</span></code></pre></figure>
If we examine the given query, we retrieve the ID using a filter on the ID column, which serves as the primary key and has an index on it.
<img src="/blog/images/database-indexing/index-only-scan.png" width="425" />
Let's break down the query output:
<ol style="margin-left: 2rem">
<li>
<b><code>Index Only Scan:</code></b> In the case of an <code>Index Only Scan</code>, Postgres scans the index table, resulting in faster performance as the Index table is significantly smaller than the actual table. With <code>Index Only Scan</code>, results are directly fetched from the Index table when querying columns for which indexes have been created.
</li>
<li>
<b><code>Heap Fetches: 0:</code></b> This indicates that the queried ID value did not necessitate accessing the heap table to retrieve information. The information was obtained inline, and this is referred to as an Inline query.
</li>
<li>
<b><code>Planning Time: 0.510 ms:</code></b> This represents the time the planner took to generate and optimize the query plan, for example deciding whether to use the index or perform a full table scan.
</li>
<li>
<b><code>Execution Time: 2.708 ms:</code></b> This is the time taken by Postgres to actually fetch the records from the table.
</li>
</ol>
</li>
<li>
<b>Index Scan</b>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">EXPLAIN</span> <span class="k">ANALYZE</span> <span class="k">select</span> <span class="n">name</span> <span class="k">from</span> <span class="n">employees</span> <span class="k">where</span> <span class="n">id</span> <span class="o">=</span> <span class="mi">1000</span><span class="p">;</span>
<span class="n">QUERY</span> <span class="n">PLAN</span>
<span class="c1">----------------------------------------------------------------------------</span>
<span class="k">Index</span> <span class="n">Scan</span> <span class="k">using</span> <span class="n">employees_pkey</span> <span class="k">on</span> <span class="n">employees</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">42</span><span class="p">..</span><span class="mi">8</span><span class="p">.</span><span class="mi">44</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">width</span><span class="o">=</span><span class="mi">11</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">1</span><span class="p">.</span><span class="mi">250</span><span class="p">..</span><span class="mi">1</span><span class="p">.</span><span class="mi">260</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">id</span> <span class="o">=</span> <span class="mi">1000</span><span class="p">)</span>
<span class="n">Planning</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">703</span> <span class="n">ms</span>
<span class="n">Execution</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">655</span> <span class="n">ms</span></code></pre></figure>
If we examine the given query, we are retrieving the <code>name</code> using a filter on the ID column, which serves as the primary key and has an index on it.
In this case, the process begins with an index scan on the Index table to retrieve information about the Page number and row number on the Heap. Since the <code>name</code> is not available in the Index table, we must go to the heap to fetch the <code>name</code>. This type of scan is referred to as an <code>Index Scan</code>.
<img src="/blog/images/database-indexing/index-scan.png" width="425" />
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">EXPLAIN</span> <span class="k">ANALYZE</span> <span class="k">select</span> <span class="n">name</span> <span class="k">from</span> <span class="n">employees</span> <span class="k">where</span> <span class="n">id</span> <span class="o"><</span> <span class="mi">1000</span><span class="p">;</span>
<span class="n">QUERY</span> <span class="n">PLAN</span>
<span class="c1">----------------------------------------------------------------------------------------------------------</span>
<span class="k">Index</span> <span class="n">Scan</span> <span class="k">using</span> <span class="n">employees_pkey</span> <span class="k">on</span> <span class="n">employees</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">42</span><span class="p">..</span><span class="mi">40</span><span class="p">.</span><span class="mi">75</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1047</span> <span class="n">width</span><span class="o">=</span><span class="mi">32</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">062</span><span class="p">..</span><span class="mi">1</span><span class="p">.</span><span class="mi">139</span> <span class="k">rows</span><span class="o">=</span><span class="mi">999</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">id</span> <span class="o"><</span> <span class="mi">1000</span><span class="p">)</span>
<span class="n">Planning</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">4</span><span class="p">.</span><span class="mi">948</span> <span class="n">ms</span>
<span class="n">Execution</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">215</span> <span class="n">ms</span>
<span class="p">(</span><span class="mi">4</span> <span class="k">rows</span><span class="p">)</span></code></pre></figure>
In this case, we filter on the id with the '<' operator, selecting records whose id is less than 1000. The process begins with a scan of the index, followed by fetching the matching rows from the heap, just as when fetching a single id.
<img src="/blog/images/database-indexing/index-scan.png" width="425" />
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">EXPLAIN</span> <span class="k">ANALYZE</span> <span class="k">select</span> <span class="n">name</span> <span class="k">from</span> <span class="n">employees</span> <span class="k">where</span> <span class="n">id</span> <span class="o">></span> <span class="mi">1000</span><span class="p">;</span>
<span class="n">QUERY</span> <span class="n">PLAN</span>
<span class="c1">---------------------------------------------------------------------------</span>
<span class="n">Seq</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">employees</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">18639</span><span class="p">.</span><span class="mi">01</span> <span class="k">rows</span><span class="o">=</span><span class="mi">998953</span> <span class="n">width</span><span class="o">=</span><span class="mi">32</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">104</span><span class="p">..</span><span class="mi">168</span><span class="p">.</span><span class="mi">884</span> <span class="k">rows</span><span class="o">=</span><span class="mi">999001</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">Filter</span><span class="p">:</span> <span class="p">(</span><span class="n">id</span> <span class="o">></span> <span class="mi">1000</span><span class="p">)</span>
<span class="k">Rows</span> <span class="n">Removed</span> <span class="k">by</span> <span class="n">Filter</span><span class="p">:</span> <span class="mi">1000</span>
<span class="n">Planning</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">158</span> <span class="n">ms</span>
<span class="n">Execution</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">198</span><span class="p">.</span><span class="mi">259</span> <span class="n">ms</span>
<span class="p">(</span><span class="mi">5</span> <span class="k">rows</span><span class="p">)</span></code></pre></figure>
In this case, we filter on the id with the '>' operator, selecting records whose id is greater than 1000. Since Postgres knows it has to fetch roughly 99.9% of the rows anyway, it prefers a Seq Scan on the heap table rather than going to the index to find the matching entries and then visiting the heap again for each of them.
<img src="/blog/images/database-indexing/seq-scan.png" width="425" />
</li>
<li>
<b>Parallel Seq Scan</b>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">EXPLAIN</span> <span class="k">ANALYZE</span> <span class="k">select</span> <span class="n">id</span> <span class="k">from</span> <span class="n">employees</span> <span class="k">where</span> <span class="n">name</span> <span class="o">=</span> <span class="s1">'WABOY'</span><span class="p">;</span>
<span class="n">QUERY</span> <span class="n">PLAN</span>
<span class="c1">----------------------------------------------------------------------------</span>
<span class="n">Gather</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">1000</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">12347</span><span class="p">.</span><span class="mi">44</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">width</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">3</span><span class="p">.</span><span class="mi">970</span><span class="p">..</span><span class="mi">120</span><span class="p">.</span><span class="mi">383</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">Workers</span> <span class="n">Planned</span><span class="p">:</span> <span class="mi">2</span>
<span class="n">Workers</span> <span class="n">Launched</span><span class="p">:</span> <span class="mi">2</span>
<span class="o">-></span> <span class="n">Parallel</span> <span class="n">Seq</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">employees</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">11347</span><span class="p">.</span><span class="mi">34</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">width</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">64</span><span class="p">.</span><span class="mi">894</span><span class="p">..</span><span class="mi">102</span><span class="p">.</span><span class="mi">448</span> <span class="k">rows</span><span class="o">=</span><span class="mi">0</span> <span class="n">loops</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="n">Filter</span><span class="p">:</span> <span class="p">((</span><span class="n">name</span><span class="p">)::</span><span class="nb">text</span> <span class="o">=</span> <span class="s1">'WABOY'</span><span class="p">::</span><span class="nb">text</span><span class="p">)</span>
<span class="k">Rows</span> <span class="n">Removed</span> <span class="k">by</span> <span class="n">Filter</span><span class="p">:</span> <span class="mi">333333</span>
<span class="n">Planning</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">898</span> <span class="n">ms</span>
<span class="n">Execution</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">120</span><span class="p">.</span><span class="mi">850</span> <span class="n">ms</span>
<span class="p">(</span><span class="mi">8</span> <span class="k">rows</span><span class="p">)</span></code></pre></figure>
If we examine the given query, we are retrieving the <code>id</code> using a filter on the name column, which doesn't have an index on it.
As we don't have an index on the name column, Postgres has to check every row for the name <code>WABOY</code>, that is, perform a sequential scan on the employees table. Postgres speeds this up by launching multiple worker processes (two in this plan) and conducting a parallel sequential scan.
<img src="/blog/images/database-indexing/parallel-seq-scan.png" width="425" />
</li>
<li>
<b>Bitmap Scan</b>
<br />
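Let's create an index on the name column to get started. Note that this is a regular B-tree index; Postgres has no separate bitmap index type and builds the bitmap in memory at query time.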
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">CREATE</span> <span class="k">INDEX</span> <span class="n">employees_name_idx</span> <span class="k">ON</span> <span class="n">employees</span><span class="p">(</span><span class="n">name</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">INDEX</span></code></pre></figure>
Let's explore how Bitmap Scan works in PostgreSQL.
<br /><br />
Heap pages are stored on disk, and loading a page into memory can be expensive. When using an Index Scan, if the query yields a large number of rows, the query's performance may suffer because each row's retrieval involves loading a page into memory.
<br />
In contrast, with a Bitmap Scan, instead of loading rows into memory, we set a bit to 1 in an array of bits corresponding to heap page numbers. The operation then works on top of this bitmap.
<br />
<img src="/blog/images/database-indexing/bitmap-or-example.png" width="425" style="margin-top: 2rem; margin-bottom: 2rem" />
Here's a simplified breakdown of above image:
<ul style="margin-left: 2rem">
<li>
In a bitmap index scan, rows are not loaded into memory. PostgreSQL sets the bit to 1 for page number 1 when the name is 'CD' and 0 for other pages.
</li>
<li>
When the name is 'BC', page number 2 is set to 1, and others are set to 0.
</li>
<li>
Subsequently, a new bitmap is created by performing an OR operation on both bitmaps.
</li>
<li>
Finally, PostgreSQL executes a Bitmap Heap Scan where it fully scans each heap page and rechecks the conditions.
</li>
</ul>
<br />
This approach minimizes the need to load entire pages into memory for individual rows, improving the efficiency of the query. If the query results in a lot of rows located in only a limited number of heap pages then this strategy will be very efficient.
<br />
<br />
Now let's select <code>id</code> and <code>name</code>, filtering by <code>name</code>:
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">EXPLAIN</span> <span class="k">ANALYZE</span> <span class="k">select</span> <span class="n">id</span><span class="p">,</span> <span class="n">name</span> <span class="k">from</span> <span class="n">employees</span> <span class="k">where</span> <span class="n">name</span> <span class="o">=</span> <span class="s1">'WABOY'</span><span class="p">;</span>
<span class="n">QUERY</span> <span class="n">PLAN</span>
<span class="c1">--------------------------------------------------------------------------------</span>
<span class="n">Bitmap</span> <span class="n">Heap</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">employees</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">111</span><span class="p">.</span><span class="mi">17</span><span class="p">..</span><span class="mi">6277</span><span class="p">.</span><span class="mi">29</span> <span class="k">rows</span><span class="o">=</span><span class="mi">5000</span> <span class="n">width</span><span class="o">=</span><span class="mi">36</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">348</span><span class="p">..</span><span class="mi">0</span><span class="p">.</span><span class="mi">369</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Recheck</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">name</span> <span class="o">=</span> <span class="s1">'WABOY'</span><span class="p">::</span><span class="nb">text</span><span class="p">)</span>
<span class="n">Heap</span> <span class="n">Blocks</span><span class="p">:</span> <span class="n">exact</span><span class="o">=</span><span class="mi">1</span>
<span class="o">-></span> <span class="n">Bitmap</span> <span class="k">Index</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">employees_name_idx</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">109</span><span class="p">.</span><span class="mi">92</span> <span class="k">rows</span><span class="o">=</span><span class="mi">5000</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">274</span><span class="p">..</span><span class="mi">0</span><span class="p">.</span><span class="mi">274</span> <span class="k">rows</span><span class="o">=</span><span class="mi">1</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">name</span> <span class="o">=</span> <span class="s1">'WABOY'</span><span class="p">::</span><span class="nb">text</span><span class="p">)</span>
<span class="n">Planning</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">905</span> <span class="n">ms</span>
<span class="n">Execution</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">734</span> <span class="n">ms</span></code></pre></figure>
Upon analyzing the provided query, we extract the <code>id</code> and <code>name</code> by applying a filter on the <code>name</code> column, which has an index.
<img src="/blog/images/database-indexing/bitmap-scan.png" width="425" />
Let's clarify the process:
<ol style="margin-left: 2rem">
<li>
<code>Bitmap Index Scan</code>: This step involves scanning the index table for the <code>name</code> column since an index exists on it. It retrieves the page number and row number to obtain references to the corresponding records in the heap.
</li>
<li>
<code>Bitmap Heap Scan</code>: Since we are selecting both <code>id</code> and <code>name</code>, this step visits the heap to retrieve the values of both attributes for the matching record. The reference to the record is obtained from the preceding <code>Bitmap Index Scan</code>.
</li>
</ol>
</li>
</ol>
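<p>The bitmap scan steps described above (build one bitmap per condition, OR them, then scan the marked heap pages and recheck) can be sketched as follows. This is a simplified model with made-up data: the <code>name_index</code> contents, the four-page heap, and the function names are assumptions for illustration, and pages are numbered from 0 here.</p>

```python
# Hypothetical index on name: each value maps to the heap pages holding matching rows.
name_index = {
    "AB": [0, 3],
    "BC": [2],
    "CD": [1, 3],
}
PAGE_COUNT = 4

# Hypothetical heap pages (only the name column is shown), used for the recheck.
heap_pages = [
    ["AB", "XY"],
    ["CD", "ZZ"],
    ["BC", "QQ"],
    ["AB", "CD"],
]


def bitmap_for(name):
    """Bitmap Index Scan: one bit per heap page, set when the page has a match."""
    bits = [0] * PAGE_COUNT
    for page_no in name_index.get(name, []):
        bits[page_no] = 1
    return bits


def bitmap_or(left, right):
    """BitmapOr: combine two page bitmaps bit by bit."""
    return [a | b for a, b in zip(left, right)]


def bitmap_heap_scan(bits, wanted):
    """Bitmap Heap Scan: read each marked page once and recheck every row on it."""
    return [
        (page_no, name)
        for page_no, bit in enumerate(bits)
        if bit
        for name in heap_pages[page_no]
        if name in wanted
    ]


combined = bitmap_or(bitmap_for("CD"), bitmap_for("BC"))
print(combined)  # [0, 1, 1, 1]
print(bitmap_heap_scan(combined, {"CD", "BC"}))
# [(1, 'CD'), (2, 'BC'), (3, 'CD')]
```

<p>Page 0 is never read because its bit stays 0, and page 3 is read only once even though it is relevant to the combined condition, which is where the savings over row-by-row index lookups come from.</p>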
<h4 id="combining-database-indexes"><strong>Combining Database Indexes</strong></h4>
<ul style="margin-left: 2rem">
<li>
Prerequisite: Let's create a table to learn how to combine indexes.
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">NUMBERS</span><span class="p">(</span><span class="n">id</span> <span class="nb">serial</span> <span class="k">primary</span> <span class="k">key</span><span class="p">,</span> <span class="n">a</span> <span class="nb">integer</span><span class="p">,</span> <span class="n">b</span> <span class="nb">integer</span><span class="p">,</span> <span class="k">c</span> <span class="nb">integer</span><span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">NUMBERS</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="k">c</span><span class="p">)</span> <span class="k">select</span> <span class="p">(</span><span class="n">random</span><span class="p">()</span> <span class="o">*</span> <span class="mi">100</span><span class="p">)::</span><span class="nb">integer</span><span class="p">,</span> <span class="p">(</span><span class="n">random</span><span class="p">()</span> <span class="o">*</span> <span class="mi">1000</span><span class="p">)::</span><span class="nb">integer</span><span class="p">,</span> <span class="p">(</span><span class="n">random</span><span class="p">()</span> <span class="o">*</span> <span class="mi">2000</span><span class="p">)::</span><span class="nb">integer</span> <span class="k">from</span> <span class="n">generate_series</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">10000000</span><span class="p">);</span></code></pre></figure>
</li>
<li>
Now let's create a separate index on each of the columns <code>a</code> and <code>b</code>.
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="k">CREATE</span> <span class="k">INDEX</span> <span class="n">numbers_a_idx</span> <span class="k">on</span> <span class="n">numbers</span><span class="p">(</span><span class="n">a</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">INDEX</span> <span class="n">numbers_b_idx</span> <span class="k">on</span> <span class="n">numbers</span><span class="p">(</span><span class="n">b</span><span class="p">);</span></code></pre></figure>
</li>
</ul>
<ol style="margin-left: 2rem">
<li>
<b>Select column <code>c</code> for a particular value of column <code>a</code></b>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">EXPLAIN</span> <span class="k">ANALYZE</span> <span class="k">SELECT</span> <span class="k">c</span> <span class="k">FROM</span> <span class="n">numbers</span> <span class="k">WHERE</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">88</span><span class="p">;</span>
<span class="n">QUERY</span> <span class="n">PLAN</span>
<span class="c1">---------------------------------------------------------------------------------</span>
<span class="n">Bitmap</span> <span class="n">Heap</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">numbers</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">1101</span><span class="p">.</span><span class="mi">09</span><span class="p">..</span><span class="mi">57496</span><span class="p">.</span><span class="mi">05</span> <span class="k">rows</span><span class="o">=</span><span class="mi">98665</span> <span class="n">width</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">41</span><span class="p">.</span><span class="mi">110</span><span class="p">..</span><span class="mi">683</span><span class="p">.</span><span class="mi">631</span> <span class="k">rows</span><span class="o">=</span><span class="mi">99888</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Recheck</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">a</span> <span class="o">=</span> <span class="mi">88</span><span class="p">)</span>
<span class="n">Heap</span> <span class="n">Blocks</span><span class="p">:</span> <span class="n">exact</span><span class="o">=</span><span class="mi">45619</span>
<span class="o">-></span> <span class="n">Bitmap</span> <span class="k">Index</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">numbers_a_idx</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">1076</span><span class="p">.</span><span class="mi">42</span> <span class="k">rows</span><span class="o">=</span><span class="mi">98665</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">29</span><span class="p">.</span><span class="mi">403</span><span class="p">..</span><span class="mi">29</span><span class="p">.</span><span class="mi">403</span> <span class="k">rows</span><span class="o">=</span><span class="mi">99888</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">a</span> <span class="o">=</span> <span class="mi">88</span><span class="p">)</span>
<span class="n">Planning</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">569</span> <span class="n">ms</span>
<span class="n">Execution</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">687</span><span class="p">.</span><span class="mi">152</span> <span class="n">ms</span></code></pre></figure>
Here, we can see that PostgreSQL performs a bitmap index scan on <code>numbers_a_idx</code>, the index on column <code>a</code>, because the filter is on <code>a</code>. Column <code>c</code> is not stored in that index, so PostgreSQL then visits the heap with a bitmap heap scan to retrieve it.
</li>
<br />
<li>
<b>Select column <code>c</code>, filtering on both <code>a</code> and <code>b</code> with an <code>AND</code> condition</b>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">EXPLAIN</span> <span class="k">ANALYZE</span> <span class="k">SELECT</span> <span class="k">c</span> <span class="k">FROM</span> <span class="n">numbers</span> <span class="k">WHERE</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">90</span> <span class="k">AND</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">500</span><span class="p">;</span>
<span class="n">QUERY</span> <span class="n">PLAN</span>
<span class="c1">-----------------------------------------------------------------------------</span>
<span class="n">Bitmap</span> <span class="n">Heap</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">numbers</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">1320</span><span class="p">.</span><span class="mi">12</span><span class="p">..</span><span class="mi">1746</span><span class="p">.</span><span class="mi">88</span> <span class="k">rows</span><span class="o">=</span><span class="mi">110</span> <span class="n">width</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">32</span><span class="p">.</span><span class="mi">300</span><span class="p">..</span><span class="mi">38</span><span class="p">.</span><span class="mi">262</span> <span class="k">rows</span><span class="o">=</span><span class="mi">107</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Recheck</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">((</span><span class="n">b</span> <span class="o">=</span> <span class="mi">500</span><span class="p">)</span> <span class="k">AND</span> <span class="p">(</span><span class="n">a</span> <span class="o">=</span> <span class="mi">90</span><span class="p">))</span>
<span class="n">Heap</span> <span class="n">Blocks</span><span class="p">:</span> <span class="n">exact</span><span class="o">=</span><span class="mi">107</span>
<span class="o">-></span> <span class="n">BitmapAnd</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">1320</span><span class="p">.</span><span class="mi">12</span><span class="p">..</span><span class="mi">1320</span><span class="p">.</span><span class="mi">12</span> <span class="k">rows</span><span class="o">=</span><span class="mi">110</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">32</span><span class="p">.</span><span class="mi">079</span><span class="p">..</span><span class="mi">32</span><span class="p">.</span><span class="mi">081</span> <span class="k">rows</span><span class="o">=</span><span class="mi">0</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="o">-></span> <span class="n">Bitmap</span> <span class="k">Index</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">numbers_b_idx</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">110</span><span class="p">.</span><span class="mi">88</span> <span class="k">rows</span><span class="o">=</span><span class="mi">9926</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">4</span><span class="p">.</span><span class="mi">494</span><span class="p">..</span><span class="mi">4</span><span class="p">.</span><span class="mi">494</span> <span class="k">rows</span><span class="o">=</span><span class="mi">9974</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">b</span> <span class="o">=</span> <span class="mi">500</span><span class="p">)</span>
<span class="o">-></span> <span class="n">Bitmap</span> <span class="k">Index</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">numbers_a_idx</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">1208</span><span class="p">.</span><span class="mi">93</span> <span class="k">rows</span><span class="o">=</span><span class="mi">111000</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">26</span><span class="p">.</span><span class="mi">799</span><span class="p">..</span><span class="mi">26</span><span class="p">.</span><span class="mi">800</span> <span class="k">rows</span><span class="o">=</span><span class="mi">99868</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">a</span> <span class="o">=</span> <span class="mi">90</span><span class="p">)</span>
<span class="n">Planning</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">3</span><span class="p">.</span><span class="mi">362</span> <span class="n">ms</span>
<span class="n">Execution</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">38</span><span class="p">.</span><span class="mi">604</span> <span class="n">ms</span></code></pre></figure>
Here, we can observe the following:
<ol style="margin-left: 2rem">
<li>PostgreSQL ran a bitmap index scan on <code>numbers_b_idx</code> for the condition <code>b = 500</code>.</li>
<li>It ran a second bitmap index scan on <code>numbers_a_idx</code> for the condition <code>a = 90</code>.</li>
<li>A <code>BitmapAnd</code> operation then combined the two bitmaps, keeping only the row locations present in both.</li>
<li>With the references to the matching rows in hand, PostgreSQL performed a bitmap heap scan to fetch them and read column <code>c</code>.</li>
</ol>
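As a rough mental model of the <code>BitmapAnd</code> step (an analogy only, not what the executor literally runs), it behaves like intersecting the sets of rows matched by each index. The sketch below expresses that idea with <code>INTERSECT</code> on the primary key <code>id</code>, assuming the <code>numbers</code> table created above.

```sql
-- Analogy only: rows matching a = 90, intersected with rows matching b = 500.
-- BitmapAnd does this on in-memory bitmaps of row locations, not on id values.
SELECT id FROM numbers WHERE a = 90
INTERSECT
SELECT id FROM numbers WHERE b = 500;

-- The single-query form returns the same set of ids (row order is unspecified).
SELECT id FROM numbers WHERE a = 90 AND b = 500;
```

Because <code>id</code> is unique, the duplicate removal that <code>INTERSECT</code> performs does not change the result here.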
</li>
<br />
<li>
<b>Select column <code>c</code>, filtering on both <code>a</code> and <code>b</code> with an <code>OR</code> condition</b>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">EXPLAIN</span> <span class="k">ANALYZE</span> <span class="k">SELECT</span> <span class="k">c</span> <span class="k">FROM</span> <span class="n">numbers</span> <span class="k">WHERE</span> <span class="n">A</span> <span class="o">=</span> <span class="mi">50</span> <span class="k">OR</span> <span class="n">B</span> <span class="o">=</span> <span class="mi">500</span><span class="p">;</span>
<span class="n">QUERY</span> <span class="n">PLAN</span>
<span class="c1">-------------------------------------------------------------------------------</span>
<span class="n">Bitmap</span> <span class="n">Heap</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">numbers</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">1164</span><span class="p">.</span><span class="mi">23</span><span class="p">..</span><span class="mi">57490</span><span class="p">.</span><span class="mi">68</span> <span class="k">rows</span><span class="o">=</span><span class="mi">101835</span> <span class="n">width</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">37</span><span class="p">.</span><span class="mi">957</span><span class="p">..</span><span class="mi">600</span><span class="p">.</span><span class="mi">439</span> <span class="k">rows</span><span class="o">=</span><span class="mi">109466</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Recheck</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">((</span><span class="n">a</span> <span class="o">=</span> <span class="mi">50</span><span class="p">)</span> <span class="k">OR</span> <span class="p">(</span><span class="n">b</span> <span class="o">=</span> <span class="mi">500</span><span class="p">))</span>
<span class="n">Heap</span> <span class="n">Blocks</span><span class="p">:</span> <span class="n">exact</span><span class="o">=</span><span class="mi">46998</span>
<span class="o">-></span> <span class="n">BitmapOr</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">1164</span><span class="p">.</span><span class="mi">23</span><span class="p">..</span><span class="mi">1164</span><span class="p">.</span><span class="mi">23</span> <span class="k">rows</span><span class="o">=</span><span class="mi">101926</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">25</span><span class="p">.</span><span class="mi">625</span><span class="p">..</span><span class="mi">25</span><span class="p">.</span><span class="mi">626</span> <span class="k">rows</span><span class="o">=</span><span class="mi">0</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="o">-></span> <span class="n">Bitmap</span> <span class="k">Index</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">numbers_a_idx</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">1002</span><span class="p">.</span><span class="mi">43</span> <span class="k">rows</span><span class="o">=</span><span class="mi">92000</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">24</span><span class="p">.</span><span class="mi">309</span><span class="p">..</span><span class="mi">24</span><span class="p">.</span><span class="mi">309</span> <span class="k">rows</span><span class="o">=</span><span class="mi">99602</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">a</span> <span class="o">=</span> <span class="mi">50</span><span class="p">)</span>
<span class="o">-></span> <span class="n">Bitmap</span> <span class="k">Index</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">numbers_b_idx</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">110</span><span class="p">.</span><span class="mi">88</span> <span class="k">rows</span><span class="o">=</span><span class="mi">9926</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">1</span><span class="p">.</span><span class="mi">313</span><span class="p">..</span><span class="mi">1</span><span class="p">.</span><span class="mi">314</span> <span class="k">rows</span><span class="o">=</span><span class="mi">9974</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">b</span> <span class="o">=</span> <span class="mi">500</span><span class="p">)</span>
<span class="n">Planning</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">135</span> <span class="n">ms</span>
<span class="n">Execution</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">604</span><span class="p">.</span><span class="mi">165</span> <span class="n">ms</span></code></pre></figure>
Here, we can observe the following:
<ol style="margin-left: 2rem">
<li>PostgreSQL ran a bitmap index scan on <code>numbers_a_idx</code> for the condition <code>a = 50</code>.</li>
<li>It ran a second bitmap index scan on <code>numbers_b_idx</code> for the condition <code>b = 500</code>.</li>
<li>A <code>BitmapOr</code> operation then combined the two bitmaps, keeping every row location present in either one.</li>
<li>With the references to the matching rows in hand, PostgreSQL performed a bitmap heap scan to fetch them and read column <code>c</code>.</li>
</ol>
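As a rough mental model of the <code>BitmapOr</code> step (again an analogy, not the executor's actual mechanism), it behaves like taking the union of the rows matched by each index, with each row counted once. A minimal sketch using <code>UNION</code> on the primary key <code>id</code>, assuming the <code>numbers</code> table created above:

```sql
-- Analogy only: UNION merges the two row sets and removes duplicates,
-- so a row with a = 50 AND b = 500 appears once, just as it would
-- occupy a single position in the combined bitmap.
SELECT id FROM numbers WHERE a = 50
UNION
SELECT id FROM numbers WHERE b = 500;

-- The single-query form returns the same set of ids (row order is unspecified).
SELECT id FROM numbers WHERE a = 50 OR b = 500;
```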
</li>
</ol>
<ul style="margin-left: 2rem">
<li><b>Composite Index</b></li>
<li style="margin-left: 2rem">
First, we need to drop the separate indexes on columns <code>a</code> and <code>b</code>, and then create a composite index on columns <code>a</code> and <code>b</code>.
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">DROP</span> <span class="k">INDEX</span> <span class="n">numbers_a_idx</span><span class="p">,</span> <span class="n">numbers_b_idx</span><span class="p">;</span>
<span class="k">DROP</span> <span class="k">INDEX</span>
<span class="n">postgres</span><span class="o">=#</span> <span class="k">CREATE</span> <span class="k">INDEX</span> <span class="n">numbers_a_b_idx</span> <span class="k">on</span> <span class="n">numbers</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">INDEX</span></code></pre></figure>
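Before re-running the queries, it can help to confirm which indexes actually exist on the table, since a leftover single-column index would change the plans that follow. A minimal sketch using the built-in <code>pg_indexes</code> system view; the table name is lowercase because PostgreSQL folds the unquoted <code>NUMBERS</code> identifier to <code>numbers</code>.

```sql
-- List the indexes currently defined on the table.
-- pg_indexes is a built-in PostgreSQL system view.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'numbers';
```

At this point one would expect to see only the primary-key index and <code>numbers_a_b_idx</code>.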
</li>
<ol style="margin-left: 2rem">
<li>
<b>Select column <code>c</code> for a particular value of column <code>a</code></b>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">EXPLAIN</span> <span class="k">ANALYZE</span> <span class="k">SELECT</span> <span class="k">c</span> <span class="k">FROM</span> <span class="n">numbers</span> <span class="k">WHERE</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">70</span><span class="p">;</span>
<span class="n">QUERY</span> <span class="n">PLAN</span>
<span class="c1">-----------------------------------------------------------------------</span>
<span class="n">Bitmap</span> <span class="n">Heap</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">numbers</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">1189</span><span class="p">.</span><span class="mi">93</span><span class="p">..</span><span class="mi">56830</span><span class="p">.</span><span class="mi">03</span> <span class="k">rows</span><span class="o">=</span><span class="mi">106000</span> <span class="n">width</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">38</span><span class="p">.</span><span class="mi">779</span><span class="p">..</span><span class="mi">610</span><span class="p">.</span><span class="mi">173</span> <span class="k">rows</span><span class="o">=</span><span class="mi">99789</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Recheck</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">a</span> <span class="o">=</span> <span class="mi">70</span><span class="p">)</span>
<span class="n">Heap</span> <span class="n">Blocks</span><span class="p">:</span> <span class="n">exact</span><span class="o">=</span><span class="mi">45549</span>
<span class="o">-></span> <span class="n">Bitmap</span> <span class="k">Index</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">numbers_a_b_idx</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">1163</span><span class="p">.</span><span class="mi">43</span> <span class="k">rows</span><span class="o">=</span><span class="mi">106000</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">27</span><span class="p">.</span><span class="mi">796</span><span class="p">..</span><span class="mi">27</span><span class="p">.</span><span class="mi">797</span> <span class="k">rows</span><span class="o">=</span><span class="mi">99789</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">(</span><span class="n">a</span> <span class="o">=</span> <span class="mi">70</span><span class="p">)</span>
<span class="n">Planning</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">5</span><span class="p">.</span><span class="mi">188</span> <span class="n">ms</span>
<span class="n">Execution</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">613</span><span class="p">.</span><span class="mi">305</span> <span class="n">ms</span>
<span class="p">(</span><span class="mi">7</span> <span class="k">rows</span><span class="p">)</span></code></pre></figure>
Here, we can observe the following:
<ol style="margin-left: 2rem">
<li>This time, PostgreSQL used the composite index <code>numbers_a_b_idx</code>. It can do so for a filter on <code>a</code> alone because <code>a</code> is the leading column of the index.</li>
<li>It then performed a bitmap heap scan to fetch the rows that the composite index identified.</li>
</ol>
</li>
<br />
<li>
<b>Select column <code>c</code> for a particular value of column <code>b</code></b>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">EXPLAIN</span> <span class="k">ANALYZE</span> <span class="k">SELECT</span> <span class="k">c</span> <span class="k">FROM</span> <span class="n">numbers</span> <span class="k">WHERE</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">900</span><span class="p">;</span>
<span class="n">QUERY</span> <span class="n">PLAN</span>
<span class="c1">-----------------------------------------------------------------------</span>
<span class="n">Gather</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">1000</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">108130</span><span class="p">.</span><span class="mi">94</span> <span class="k">rows</span><span class="o">=</span><span class="mi">9926</span> <span class="n">width</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">24</span><span class="p">.</span><span class="mi">402</span><span class="p">..</span><span class="mi">395</span><span class="p">.</span><span class="mi">326</span> <span class="k">rows</span><span class="o">=</span><span class="mi">10027</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">Workers</span> <span class="n">Planned</span><span class="p">:</span> <span class="mi">2</span>
<span class="n">Workers</span> <span class="n">Launched</span><span class="p">:</span> <span class="mi">2</span>
<span class="o">-></span> <span class="n">Parallel</span> <span class="n">Seq</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">numbers</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">106138</span><span class="p">.</span><span class="mi">34</span> <span class="k">rows</span><span class="o">=</span><span class="mi">4136</span> <span class="n">width</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">9</span><span class="p">.</span><span class="mi">913</span><span class="p">..</span><span class="mi">317</span><span class="p">.</span><span class="mi">809</span> <span class="k">rows</span><span class="o">=</span><span class="mi">3342</span> <span class="n">loops</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="n">Filter</span><span class="p">:</span> <span class="p">(</span><span class="n">b</span> <span class="o">=</span> <span class="mi">900</span><span class="p">)</span>
<span class="k">Rows</span> <span class="n">Removed</span> <span class="k">by</span> <span class="n">Filter</span><span class="p">:</span> <span class="mi">3329991</span>
<span class="n">Planning</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">574</span> <span class="n">ms</span>
<span class="n">JIT</span><span class="p">:</span>
<span class="n">Functions</span><span class="p">:</span> <span class="mi">12</span>
<span class="k">Options</span><span class="p">:</span> <span class="n">Inlining</span> <span class="k">false</span><span class="p">,</span> <span class="n">Optimization</span> <span class="k">false</span><span class="p">,</span> <span class="n">Expressions</span> <span class="k">true</span><span class="p">,</span> <span class="n">Deforming</span> <span class="k">true</span>
<span class="n">Timing</span><span class="p">:</span> <span class="n">Generation</span> <span class="mi">4</span><span class="p">.</span><span class="mi">899</span> <span class="n">ms</span><span class="p">,</span> <span class="n">Inlining</span> <span class="mi">0</span><span class="p">.</span><span class="mi">000</span> <span class="n">ms</span><span class="p">,</span> <span class="n">Optimization</span> <span class="mi">3</span><span class="p">.</span><span class="mi">039</span> <span class="n">ms</span><span class="p">,</span> <span class="n">Emission</span> <span class="mi">25</span><span class="p">.</span><span class="mi">030</span> <span class="n">ms</span><span class="p">,</span> <span class="n">Total</span> <span class="mi">32</span><span class="p">.</span><span class="mi">968</span> <span class="n">ms</span>
<span class="n">Execution</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">398</span><span class="p">.</span><span class="mi">820</span> <span class="n">ms</span>
<span class="p">(</span><span class="mi">12</span> <span class="k">rows</span><span class="p">)</span></code></pre></figure>
Here, we can observe the following:
<ol style="margin-left: 2rem">
<li>
This time, PostgreSQL did not use the index <code>numbers_a_b_idx</code>, even though it is a composite index on both columns <code>a</code> and <code>b</code>; it fell back to a parallel sequential scan instead. Why? The filter condition is on column <code>b</code> alone, and a composite index is ordered by its leading column <code>a</code> first, so the entries with <code>b = 900</code> are scattered throughout the index. The index serves conditions on both <code>a</code> and <code>b</code>, or on <code>a</code> alone, but it is not an efficient access path for conditions on <code>b</code> alone. Therefore, with a composite index on <code>(a, b)</code>, a query that filters only on <code>b</code> will generally not use the index.
</li>
</ol>
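If queries that filter on <code>b</code> alone matter, one option is an additional index whose leading column is <code>b</code>. The following is a minimal sketch, assuming the <code>numbers</code> table from above; the index name <code>numbers_b_idx</code> simply reuses the earlier name and is a free choice, and whether the planner actually picks the index depends on the table's statistics.

```sql
-- An index with b as its leading (here, only) column can serve b-only filters.
CREATE INDEX numbers_b_idx ON numbers(b);

-- Re-check the plan and compare it with the Parallel Seq Scan above.
EXPLAIN ANALYZE SELECT c FROM numbers WHERE b = 900;
```

Keep in mind that every extra index takes storage and adds work to each insert and update, so it is worth adding only for query patterns that are actually used.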
</li>
<br />
<li>
<b>Select column <code>c</code>, filtering on both <code>a</code> and <code>b</code> with an <code>AND</code> condition</b>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">EXPLAIN</span> <span class="k">ANALYZE</span> <span class="k">SELECT</span> <span class="k">C</span> <span class="k">FROM</span> <span class="n">numbers</span> <span class="k">WHERE</span> <span class="n">A</span> <span class="o">=</span> <span class="mi">60</span> <span class="k">AND</span> <span class="n">B</span> <span class="o">=</span> <span class="mi">600</span><span class="p">;</span>
<span class="n">QUERY</span> <span class="n">PLAN</span>
<span class="c1">-------------------------------------------------------------------------</span>
<span class="n">Bitmap</span> <span class="n">Heap</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">numbers</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">5</span><span class="p">.</span><span class="mi">44</span><span class="p">..</span><span class="mi">386</span><span class="p">.</span><span class="mi">39</span> <span class="k">rows</span><span class="o">=</span><span class="mi">98</span> <span class="n">width</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">732</span><span class="p">..</span><span class="mi">6</span><span class="p">.</span><span class="mi">281</span> <span class="k">rows</span><span class="o">=</span><span class="mi">102</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Recheck</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">((</span><span class="n">a</span> <span class="o">=</span> <span class="mi">60</span><span class="p">)</span> <span class="k">AND</span> <span class="p">(</span><span class="n">b</span> <span class="o">=</span> <span class="mi">600</span><span class="p">))</span>
<span class="n">Heap</span> <span class="n">Blocks</span><span class="p">:</span> <span class="n">exact</span><span class="o">=</span><span class="mi">101</span>
<span class="o">-></span> <span class="n">Bitmap</span> <span class="k">Index</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">numbers_a_b_idx</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">5</span><span class="p">.</span><span class="mi">42</span> <span class="k">rows</span><span class="o">=</span><span class="mi">98</span> <span class="n">width</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">513</span><span class="p">..</span><span class="mi">0</span><span class="p">.</span><span class="mi">513</span> <span class="k">rows</span><span class="o">=</span><span class="mi">102</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">Index</span> <span class="n">Cond</span><span class="p">:</span> <span class="p">((</span><span class="n">a</span> <span class="o">=</span> <span class="mi">60</span><span class="p">)</span> <span class="k">AND</span> <span class="p">(</span><span class="n">b</span> <span class="o">=</span> <span class="mi">600</span><span class="p">))</span>
<span class="n">Planning</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">0</span><span class="p">.</span><span class="mi">756</span> <span class="n">ms</span>
<span class="n">Execution</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">6</span><span class="p">.</span><span class="mi">659</span> <span class="n">ms</span></code></pre></figure>
Here, the situation remains the same as earlier when we had an index on both columns A and B.
</li>
<br />
<li>
<b>Select column c but we are going to query on both <code>A</code> and <code>B</code> with <code>OR</code> operation</b>
<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="n">postgres</span><span class="o">=#</span> <span class="k">EXPLAIN</span> <span class="k">ANALYZE</span> <span class="k">SELECT</span> <span class="k">C</span> <span class="k">FROM</span> <span class="n">numbers</span> <span class="k">WHERE</span> <span class="n">A</span> <span class="o">=</span> <span class="mi">60</span> <span class="k">or</span> <span class="n">B</span> <span class="o">=</span> <span class="mi">80</span><span class="p">;</span>
<span class="n">QUERY</span> <span class="n">PLAN</span>
<span class="c1">------------------------------------------------------------------------</span>
<span class="n">Gather</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">1000</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">128404</span><span class="p">.</span><span class="mi">51</span> <span class="k">rows</span><span class="o">=</span><span class="mi">108495</span> <span class="n">width</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">20</span><span class="p">.</span><span class="mi">721</span><span class="p">..</span><span class="mi">388</span><span class="p">.</span><span class="mi">512</span> <span class="k">rows</span><span class="o">=</span><span class="mi">109443</span> <span class="n">loops</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">Workers</span> <span class="n">Planned</span><span class="p">:</span> <span class="mi">2</span>
<span class="n">Workers</span> <span class="n">Launched</span><span class="p">:</span> <span class="mi">2</span>
<span class="o">-></span> <span class="n">Parallel</span> <span class="n">Seq</span> <span class="n">Scan</span> <span class="k">on</span> <span class="n">numbers</span> <span class="p">(</span><span class="n">cost</span><span class="o">=</span><span class="mi">0</span><span class="p">.</span><span class="mi">00</span><span class="p">..</span><span class="mi">116555</span><span class="p">.</span><span class="mi">01</span> <span class="k">rows</span><span class="o">=</span><span class="mi">45206</span> <span class="n">width</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span> <span class="p">(</span><span class="n">actual</span> <span class="k">time</span><span class="o">=</span><span class="mi">8</span><span class="p">.</span><span class="mi">325</span><span class="p">..</span><span class="mi">304</span><span class="p">.</span><span class="mi">804</span> <span class="k">rows</span><span class="o">=</span><span class="mi">36481</span> <span class="n">loops</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="n">Filter</span><span class="p">:</span> <span class="p">((</span><span class="n">a</span> <span class="o">=</span> <span class="mi">60</span><span class="p">)</span> <span class="k">OR</span> <span class="p">(</span><span class="n">b</span> <span class="o">=</span> <span class="mi">80</span><span class="p">))</span>
<span class="k">Rows</span> <span class="n">Removed</span> <span class="k">by</span> <span class="n">Filter</span><span class="p">:</span> <span class="mi">3296853</span>
<span class="n">Planning</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">1</span><span class="p">.</span><span class="mi">009</span> <span class="n">ms</span>
<span class="n">JIT</span><span class="p">:</span>
<span class="n">Functions</span><span class="p">:</span> <span class="mi">12</span>
<span class="k">Options</span><span class="p">:</span> <span class="n">Inlining</span> <span class="k">false</span><span class="p">,</span> <span class="n">Optimization</span> <span class="k">false</span><span class="p">,</span> <span class="n">Expressions</span> <span class="k">true</span><span class="p">,</span> <span class="n">Deforming</span> <span class="k">true</span>
<span class="n">Timing</span><span class="p">:</span> <span class="n">Generation</span> <span class="mi">5</span><span class="p">.</span><span class="mi">795</span> <span class="n">ms</span><span class="p">,</span> <span class="n">Inlining</span> <span class="mi">0</span><span class="p">.</span><span class="mi">000</span> <span class="n">ms</span><span class="p">,</span> <span class="n">Optimization</span> <span class="mi">2</span><span class="p">.</span><span class="mi">561</span> <span class="n">ms</span><span class="p">,</span> <span class="n">Emission</span> <span class="mi">21</span><span class="p">.</span><span class="mi">798</span> <span class="n">ms</span><span class="p">,</span> <span class="n">Total</span> <span class="mi">30</span><span class="p">.</span><span class="mi">154</span> <span class="n">ms</span>
<span class="n">Execution</span> <span class="k">Time</span><span class="p">:</span> <span class="mi">397</span><span class="p">.</span><span class="mi">675</span> <span class="n">ms</span></code></pre></figure>
Here, we can analyze the situation as follows:
<ul style="margin-left: 2rem">
<li>
As observed earlier, the composite index on <code>(A, B)</code> cannot be used efficiently to look up column <code>B</code> on its own; it only helps when the query filters on column <code>A</code> alone or on both <code>A</code> and <code>B</code>. Because the <code>OR</code> condition must also find rows that match <code>B = 80</code> regardless of <code>A</code>, Postgres opts for a Parallel Sequential Scan in this scenario.
</li>
</ul>
</li>
</ol>
</ul>
<p><a href="/technology/an-in-depth-look-at-database-indexing/">An in-depth look at Database Indexing</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on December 10, 2023.</p>
<p>Puma is a popular Ruby web server known for its speed and scalability. It has undergone significant changes in recent versions (starting with 5.0.0). One of the most notable is the removal of the daemonization feature. But what does that mean?</p>
<p>Daemonization, in the context of web servers, is a process that allows a program to run in the background as a system service. In older versions, Puma made it simple for users to daemonize their processes with a straightforward configuration snippet:</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span><span class="c1">#config/puma.rb</span>
daemonize</code></pre></figure>
<p>However, in recent versions, attempting to use the <code>daemonize</code> option will result in an error, as this functionality has been removed from the codebase.</p>
<h4 id="why-daemonization-should-not-be-part-of-gem"><strong>Why should daemonization not be part of a gem?</strong></h4>
<p>Incorporating daemonization directly within a gem can lead to undesirable consequences, as explained by Mike Perham in a <a href="https://www.mikeperham.com/2014/09/22/dont-daemonize-your-daemons/" target="_blank" style="color: blue;">blog post</a>. Here are some key points to consider:</p>
<ol>
<li><strong>Complexity</strong>: Adding daemonization features to a gem makes its code more complex and harder to understand.</li>
<li><strong>Maintenance</strong>: The responsibility of maintaining daemonization, automatic restart, and similar core features becomes an additional burden.</li>
<li><strong>Efficiency</strong>: System processes are better equipped to manage tasks like daemonization. Delegating this function to the system ensures more efficient and reliable execution, rather than embedding it within the gem.</li>
</ol>
<p>As a result of these considerations, Puma decided to remove the daemonization feature from the gem.</p>
<p>This decision led us to make some changes in our setup to ensure the smooth running of our applications.</p>
<h4 id="using-systemd"><strong>Using Systemd</strong></h4>
<p>We had previously moved <a href="https://www.elitmus.com/blog/technology/sidekiq-process-in-production-with-systemd-and-monit" target="_blank" style="color: blue;">Sidekiq</a> to systemd, and its requirements were similar to Puma’s, although some minor adjustments were required for Puma. Here are the steps to achieve daemonization through systemctl:</p>
<ol>
<li>Remove the <code>daemonize</code> line from the <code>config/puma.rb</code> file.</li>
<li>
Create the file <code>/lib/systemd/system/puma.service</code>. Below is a sample systemd service configuration; modify it according to your needs.
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> <span class="o">[</span>Unit<span class="o">]</span>
<span class="nv">Description</span><span class="o">=</span>Puma HTTP Server
<span class="nv">After</span><span class="o">=</span>network.target
<span class="o">[</span>Service<span class="o">]</span>
<span class="nv">Type</span><span class="o">=</span>notify
<span class="nv">User</span><span class="o">=</span>username
<span class="nv">WorkingDirectory</span><span class="o">=</span>/dir/path
<span class="nv">ExecStart</span><span class="o">=</span>/bin/pumactl start -F /path/puma_config --environment env
<span class="nv">ExecStartPost</span><span class="o">=</span>/bin/sh -c <span class="s1">'/bin/echo $MAINPID > /usr/myapp/shared/pids/puma.pid'</span>
<span class="nv">ExecStop</span><span class="o">=</span>/bin/kill -TSTP <span class="nv">$MAINPID</span>
<span class="nv">RestartSec</span><span class="o">=</span><span class="m">10</span>
<span class="nv">Restart</span><span class="o">=</span>on-failure
<span class="o">[</span>Install<span class="o">]</span>
<span class="nv">WantedBy</span><span class="o">=</span>multi-user.target
</code></pre></figure>
</li>
<li>
Two prominent Puma restart strategies are Phased and Hot restarts. <strong>Phased restarts are slower but ensure that all workers finish their existing requests before restarting the server, while Hot restarts are faster but come with increased latency during the restart.</strong> <br />
To trigger a phased restart, you can pass the <code>phased-restart</code> command to <code>pumactl</code>. Choosing between the two strategies lets you adapt Puma's restart behavior to your needs. Read more about Puma restarts <a href="https://github.com/puma/puma/blob/master/docs/restart.md" target="_blank" style="color: blue;">here</a>.
</li>
<li>
<strong>Monit configurations</strong><br />
Monit is a utility for managing and monitoring processes, programs, files, directories and filesystems on a Unix system (<a href="https://mmonit.com/monit/" target="_blank" style="color: blue;">Monit Docs</a>). <br />
Updated <code>monitrc</code> file:
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> check process puma with pidfile <span class="s2">"/usr/myapp/shared/pids/puma.pid"</span>
start <span class="nv">program</span> <span class="o">=</span> <span class="s2">"/bin/bash -l -c 'sudo systemctl start puma'"</span> with timeout <span class="m">20</span> seconds
stop <span class="nv">program</span> <span class="o">=</span> <span class="s2">"/bin/bash -l -c 'sudo systemctl stop puma'"</span> with timeout <span class="m">20</span> seconds
<span class="k">if</span> totalmem is greater than <span class="m">800</span> MB <span class="k">for</span> <span class="m">3</span> cycles <span class="k">then</span> restart
<span class="k">if</span> cpu is greater than <span class="m">65</span>% <span class="k">for</span> <span class="m">2</span> cycles <span class="k">then</span> <span class="nb">exec</span> <span class="s2">"/etc/monit/slack_notifier.sh"</span> <span class="k">else</span> <span class="k">if</span> succeeded <span class="k">then</span> <span class="nb">exec</span> <span class="s2">"/etc/monit/slack_notifier.sh"</span>
</code></pre></figure>
</li>
<li>
To check that Puma is running correctly, run the following commands:
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> ps aux <span class="p">|</span> grep puma
sudo monit summary
</code></pre></figure>
</li>
</ol>
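<p>As a rough illustration of the two restart strategies described above, the sketch below maps each one to the signal that Puma's signal documentation lists for it (<code>USR1</code> for a phased restart, <code>USR2</code> for a hot restart) and sends it to the process recorded in the pidfile. The constant and method names are hypothetical and not part of Puma, and phased restarts assume Puma is running in cluster mode.</p>

```ruby
# Sketch only: RESTART_SIGNALS, restart_signal and restart_puma are
# hypothetical helpers, not part of Puma.

# Signals as listed in Puma's signal documentation.
RESTART_SIGNALS = {
  phased: "USR1", # rolling restart, workers are replaced in phases
  hot: "USR2"     # restart of the whole server
}.freeze

# Returns the signal name for a restart strategy (:phased or :hot).
def restart_signal(strategy)
  RESTART_SIGNALS.fetch(strategy) do
    raise ArgumentError, "unknown restart strategy: #{strategy.inspect}"
  end
end

# Reads the master process id from the pidfile and sends the signal.
def restart_puma(pidfile, strategy)
  pid = Integer(File.read(pidfile).strip)
  Process.kill(restart_signal(strategy), pid)
end
```

<p>For example, <code>restart_puma("/usr/myapp/shared/pids/puma.pid", :phased)</code> would request a rolling restart of the Puma process tracked by the service file above, provided the calling user is allowed to signal that process.</p>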
<h4 id="exploring-other-alternatives"><strong>Exploring Other Alternatives</strong></h4>
<p>As an alternative, we considered using the <a href="https://github.com/kigster/puma-daemon/" target="_blank" style="color: blue;">puma-daemon</a> gem, which essentially replicates the removed code and maintains it in a separate gem. However, after careful consideration, we chose not to adopt this alternative for the following reasons:</p>
<ol>
<li>It goes against the convention that process management belongs at the system level.</li>
<li>It adds another gem and the maintenance burden that comes with it.</li>
</ol>
<h4 id="summary"><strong>Summary</strong></h4>
<p>While the removal of daemonization from Puma may require some adjustments, it aligns with the best practices of modern web server management. Managing processes at the system level, using tools like systemd and Monit, is considered a more efficient and maintainable approach. Daemonizing processes within application code is discouraged, as that task belongs at the system level. Ultimately, the shift towards system-level process management helps keep web applications stable and efficient.</p>
<p><a href="/technology/puma-from-daemonization-to-process-control-with-systemctl-and-monit/">Puma: From Daemonization to Process Control with Systemctl and Monit</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on October 21, 2023.</p>
<p>System tests were introduced in Rails 5.1 as a new type of test that simulates a user interacting with a web application. These tests drive a browser (optionally headless), typically through Capybara and a WebDriver, to mimic a user’s actions like clicking buttons, filling in forms, and navigating through the application.</p>
<h3 id="why-do-we-need-system-tests"><strong>Why do we need System Tests?</strong></h3>
<ul>
<li><a href="https://guides.rubyonrails.org/testing.html#system-testing" target="_blank" style="color: blue;">System tests</a> let you test applications in the browser. Because system tests use a real browser experience, you can test all of your JavaScript easily from your test suite.</li>
<li>Typically used for:
<ul style="margin-left: 2rem">
<li><strong>Acceptance testing:</strong> verify that the app has implemented a specific feature</li>
<li><strong>Smoke testing:</strong> verify that the app is functional on a fundamental level and doesn't have code issues.</li>
<li><strong>Characterization testing:</strong> examine and document the behavior of an existing system or application without making any modifications to its code.</li>
</ul>
</li>
</ul>
<div style="margin-top: 2rem"></div>
<h3 id="how-we-can-run-system-test"><strong>How can we run System Tests?</strong></h3>
<ul>
<li>System tests interact with your app through an actual browser.</li>
<li>From a technical perspective, system tests aren’t necessarily required to interact with a real browser; they can be set up to use the <a href="https://github.com/rack/rack-test" target="_blank" style="color: blue;">rack test</a> backend, which emulates HTTP requests and processes the HTML responses. While system tests based on rack_test run faster and are more dependable than front-end tests involving an actual browser, they have notable limitations in mimicking a genuine user experience because they cannot execute JavaScript.</li>
</ul>
<div style="margin-top: 2rem"></div>
<h3 id="the-anatomy-of-a-system-test"><strong>The Anatomy of a System Test</strong></h3>
<div style="margin-bottom: 2rem"></div>
<p><img src="/blog/images/system-test/flow-chart.png" width="425" /></p>
<div style="margin-bottom: 2rem"></div>
<ul>
<li><strong>Minitest</strong>
<ul style="margin-left: 2rem">
<li><a href="https://github.com/seattlerb/minitest" target="_blank" style="color: blue;">Minitest</a> is a small and incredibly fast unit testing framework.</li>
<li>It provides the base classes for test cases.
For Rails System Tests, Rails provides an ApplicationSystemTestCase base class which is in turn based on <i>ActionDispatch::SystemTestCase:</i></li>
</ul>
</li>
</ul>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="nb">require</span> <span class="s2">"test_helper"</span>
<span class="k">class</span> <span class="nc">ApplicationSystemTestCase</span> <span class="o"><</span> <span class="no">ActionDispatch</span><span class="o">::</span><span class="no">SystemTestCase</span>
<span class="n">driven_by</span> <span class="ss">:selenium</span><span class="p">,</span> <span class="ss">using</span><span class="p">:</span> <span class="ss">:chrome</span><span class="p">,</span> <span class="ss">screen_size</span><span class="p">:</span> <span class="o">[</span><span class="mi">1400</span><span class="p">,</span> <span class="mi">1400</span><span class="o">]</span>
<span class="k">end</span>
</code></pre></figure>
<ul style="margin-left: 2rem">
<li>In <code>ActionDispatch::SystemTestCase</code> we require the <code>capybara/minitest</code> library.</li>
<li>Minitest provides basic assertions like <strong>assert_equal, assert_nil, assert_same, assert_raises, assert_includes</strong>.</li>
<li>It also provides a runner that runs the tests and reports their successes and failures.</li>
</ul>
<ul>
<li>
<p><strong>Capybara</strong></p>
<ul style="margin-left: 2rem">
<li><a href="https://github.com/teamcapybara/capybara" target="_blank" style="color: blue;">Capybara</a> starts your app on a separate thread before running the tests. This ensures that the tests run against a live server serving the current version of your app.</li>
<li>Capybara provides a high-level API that makes it easy to write tests in a natural way. For example, you can write a test that says <code>"click the button"</code> instead of having to write code to find the button and click it.</li>
<li>Here is an example of a test written with Capybara's DSL (Domain Specific Language):</li>
</ul>
</li>
</ul>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="n">visit</span><span class="p">(</span><span class="s1">'/login'</span><span class="p">)</span>
<span class="n">fill_in</span><span class="p">(</span><span class="s1">'email'</span><span class="p">,</span> <span class="ss">with</span><span class="p">:</span> <span class="s1">'user@example.com'</span><span class="p">)</span>
<span class="n">fill_in</span><span class="p">(</span><span class="s1">'password'</span><span class="p">,</span> <span class="ss">with</span><span class="p">:</span> <span class="s1">'password'</span><span class="p">)</span>
<span class="n">click_button</span><span class="p">(</span><span class="s1">'Login'</span><span class="p">)</span>
</code></pre></figure>
<ul>
<li>
<p><strong>Selenium-Webdriver</strong></p>
<ul style="margin-left: 2rem">
<li>Capybara uses the <a href="https://rubygems.org/gems/selenium-webdriver/versions/4.11.0" target="_blank" style="color: blue;">Selenium Webdriver</a> library to interact with real browsers. Selenium WebDriver is a cross-platform library that provides a way to control web browsers from code. Capybara uses Selenium WebDriver to translate its high-level DSL (Domain Specific Language) into low-level commands that the browser can understand.</li>
</ul>
</li>
</ul>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="nb">require</span> <span class="s2">"selenium-webdriver"</span>
<span class="n">driver</span> <span class="o">=</span> <span class="no">Selenium</span><span class="o">::</span><span class="no">WebDriver</span><span class="o">.</span><span class="n">for</span> <span class="ss">:firefox</span>
<span class="n">driver</span><span class="o">.</span><span class="n">navigate</span><span class="o">.</span><span class="n">to</span> <span class="s2">"http://google.com"</span>
<span class="n">element</span> <span class="o">=</span> <span class="n">driver</span><span class="o">.</span><span class="n">find_element</span><span class="p">(</span><span class="nb">name</span><span class="p">:</span> <span class="s1">'q'</span><span class="p">)</span>
<span class="n">element</span><span class="o">.</span><span class="n">send_keys</span> <span class="s2">"Hello WebDriver!"</span>
<span class="n">element</span><span class="o">.</span><span class="n">submit</span>
<span class="nb">puts</span> <span class="n">driver</span><span class="o">.</span><span class="n">title</span>
<span class="n">driver</span><span class="o">.</span><span class="n">quit</span>
</code></pre></figure>
<ul style="margin-left: 2rem">
<li>You can see how it’s a bit lower-level than the Capybara example further up. The selenium-webdriver library translates these calls into WebDriver Protocol, which it speaks to a webdriver executable.</li>
</ul>
<ul>
<li>
<p><strong>Webdriver Protocol</strong></p>
<ul style="margin-left: 2rem">
<li>The Selenium WebDriver library translates its calls into the <a href="https://www.w3.org/TR/webdriver2/" target="_blank" style="color: blue;">WebDriver Protocol</a>. The WebDriver Protocol is an HTTP-based wire protocol used for communication between the Selenium WebDriver library and the web browser.</li>
<li>In order to open a Firefox browser window and navigate to google.com, we first need to start geckodriver (listening on port 9515 in the commands below).</li>
<li>We send it a <strong>“new session”</strong> command with an HTTP POST request:</li>
</ul>
</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> curl -X POST <span class="s1">'http://127.0.0.1:9515/session'</span> -d <span class="s1">'{"capabilities":{"firstMatch":[{"browserName":"firefox"}]}}'</span>
</code></pre></figure>
<ul style="margin-left: 2rem">
<li>This returns a session id along with other data:</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> <span class="o">{</span> ... <span class="s2">"sessionId"</span>:<span class="s2">"f1776ba558e28309299dc5f62864e977"</span> ... <span class="o">}</span>
</code></pre></figure>
<ul style="margin-left: 2rem">
<li>Then we make another POST request, with the session id in the path and the URL to open in the request body:</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> curl -X POST <span class="s1">'http://127.0.0.1:9515/session/f1776ba558e28309299dc5f62864e977/url'</span> -d <span class="s1">'{"url": "https://google.com"}'</span>
</code></pre></figure>
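<p>The two curl calls above can also be sketched in Ruby with the standard library, which is roughly what a WebDriver client does on your behalf. This is a minimal sketch, not how selenium-webdriver is implemented: the constant and method names are hypothetical, and the port is taken from the curl commands above, so it assumes the driver was started on that port.</p>

```ruby
require "json"
require "net/http"
require "uri"

# Base URL of the running driver (assumption: same host and port as the
# curl commands above).
DRIVER_URL = "http://127.0.0.1:9515"

# Builds (without sending) the "new session" request for a browser name.
def new_session_request(browser_name)
  request = Net::HTTP::Post.new(URI("#{DRIVER_URL}/session"))
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(
    { capabilities: { firstMatch: [{ browserName: browser_name }] } }
  )
  request
end

# Builds (without sending) the request that navigates a session to a URL.
def navigate_request(session_id, url)
  request = Net::HTTP::Post.new(URI("#{DRIVER_URL}/session/#{session_id}/url"))
  request["Content-Type"] = "application/json"
  request.body = JSON.generate({ url: url })
  request
end

# Sends a request to the driver and parses the JSON response body.
# Needs a driver listening at DRIVER_URL.
def send_request(request)
  uri = request.uri
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  JSON.parse(response.body)
end
```

<p>With a driver running, <code>send_request(new_session_request("firefox"))</code> would return the parsed response; drivers that follow the W3C specification place the session id under <code>"value"</code> and then <code>"sessionId"</code>, and that id can be passed to <code>navigate_request</code>.</p>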
<ul>
<li><strong>Webdriver</strong>
<ul style="margin-left: 2rem">
<li>Webdriver is a tool that speaks <strong>“Webdriver protocol”</strong> and controls the browser.</li>
<li>For every major browser there is an associated webdriver tool. Chrome has <a href="https://sites.google.com/a/chromium.org/chromedriver/home" target="_blank" style="color: blue;">chromedriver</a>. Firefox has <a href="https://github.com/mozilla/geckodriver" target="_blank" style="color: blue;">geckodriver</a>. MS Edge has <a href="https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/" target="_blank" style="color: blue;">edgedriver</a>. Safari has <a href="https://developer.apple.com/documentation/webkit/testing_with_webdriver_in_safari" target="_blank" style="color: blue;">safaridriver</a>.</li>
<li>WebDriver tools act as servers: when you execute them, they start a persistent process that listens for HTTP requests until it is terminated.</li>
</ul>
</li>
</ul>
<div style="margin-bottom: 1rem"></div>
<ul>
<li><strong>Webdrivers gem</strong>
<ul style="margin-left: 2rem">
<li>Before selenium-webdriver 4.11, the <a href="https://github.com/titusfortner/webdrivers" target="_blank" style="color: blue;">webdrivers</a> gem automatically determined which WebDriver executable needed to be downloaded for your platform and selected browser, downloaded it, and arranged for that executable to be used by selenium-webdriver.</li>
<li>From version 4.11 onwards, this functionality is built into the selenium-webdriver gem through <a href="https://www.selenium.dev/blog/2023/whats-new-in-selenium-manager-with-selenium-4.11.0/" target="_blank" style="color: blue;">selenium-manager</a>.</li>
</ul>
</li>
</ul>
<p><img src="/blog/images/system-test/webdriver.png" width="425" /></p>
<div style="margin-top: 2rem"></div>
<h3 id="running-rails-7-system-tests-with-docker-and-gitlab-runner-on-arm64-and-amd64-linux-machines"><strong>Running Rails 7 System Tests with Docker and GitLab Runner on Arm64 and Amd64 Linux machines</strong></h3>
<div style="margin-top: 2rem"></div>
<p><strong>Step 1: Prepare the Rails 7 application for testing</strong></p>
<ul>
<li>Run the command below to generate a very basic Ruby on Rails 7 app:</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>rails new minitest-rails-app</code></pre></figure>
<ul>
<li>Go ahead and open up the project in your favourite editor and proceed to the Gemfile, specifically to the test block:</li>
</ul>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="n">group</span> <span class="ss">:test</span> <span class="k">do</span>
<span class="c1"># Use system testing [https://guides.rubyonrails.org/testing.html#system-testing]</span>
<span class="n">gem</span> <span class="s2">"capybara"</span>
<span class="n">gem</span> <span class="s2">"selenium-webdriver"</span>
<span class="n">gem</span> <span class="s2">"webdrivers"</span>
<span class="k">end</span>
</code></pre></figure>
<ul>
<li>Next, let’s do a quick scaffold generation to have something to work with:</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> rails generate scaffold Blog title:string body:text
</code></pre></figure>
<ul>
<li>Usually, generating a scaffold also creates <code>application_system_test_case.rb</code> and everything else you need for the system tests.</li>
</ul>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="c1"># application_system_test_case.rb (default)</span>
<span class="nb">require</span> <span class="s2">"test_helper"</span>
<span class="k">class</span> <span class="nc">ApplicationSystemTestCase</span> <span class="o"><</span> <span class="no">ActionDispatch</span><span class="o">::</span><span class="no">SystemTestCase</span>
<span class="n">driven_by</span> <span class="ss">:selenium</span><span class="p">,</span> <span class="ss">using</span><span class="p">:</span> <span class="ss">:chrome</span><span class="p">,</span> <span class="ss">screen_size</span><span class="p">:</span> <span class="o">[</span><span class="mi">1400</span><span class="p">,</span> <span class="mi">1400</span><span class="o">]</span>
<span class="k">end</span>
</code></pre></figure>
<ul>
<li>Run the database commands</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> rails db:setup
rails db:migrate
</code></pre></figure>
<ul>
<li>Run the system tests for the first time</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> rails test:system
</code></pre></figure>
<p><strong>Step 2: Exclude the gem webdrivers from the list of dependencies</strong></p>
<ul>
<li>Before selenium-webdriver 4.11, the webdrivers gem automatically downloaded the webdriver executable.</li>
<li>From version 4.11 onwards, this functionality is built into the selenium-webdriver gem through selenium-manager.</li>
<li>We can therefore comment out the webdrivers line in the Gemfile.</li>
<li>After the change, the <code>Gemfile</code> looks like this:</li>
</ul>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="n">group</span> <span class="ss">:test</span> <span class="k">do</span>
<span class="c1"># Use system testing [https://guides.rubyonrails.org/testing.html#system-testing]</span>
<span class="n">gem</span> <span class="s2">"capybara"</span>
<span class="n">gem</span> <span class="s2">"selenium-webdriver"</span><span class="p">,</span> <span class="s2">"~> 4.11"</span>
<span class="c1">#gem "webdrivers"</span>
<span class="k">end</span>
</code></pre></figure>
<p><strong>Step 3: Point selenium-webdriver to the Firefox browser</strong></p>
<ul>
<li>Chrome has not released a binary compatible with <code>linux/arm64</code> machines, so the tests failed on the arm64 Linux machine. I tried multiple approaches to make it work with headless_chrome, but none of them worked; I have commented on the problem in detail in this <a href="https://github.com/titusfortner/webdrivers/issues/213#issuecomment-1686094017" target="_blank" style="color: blue;">issue tracker</a>.</li>
<li>We therefore need to change the browser to Firefox.</li>
</ul>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="c1">#application_system_test_case.rb (change driver to Firefox)</span>
<span class="nb">require</span> <span class="s2">"test_helper"</span>
<span class="k">class</span> <span class="nc">ApplicationSystemTestCase</span> <span class="o"><</span> <span class="no">ActionDispatch</span><span class="o">::</span><span class="no">SystemTestCase</span>
<span class="n">driven_by</span> <span class="ss">:selenium</span><span class="p">,</span> <span class="ss">using</span><span class="p">:</span> <span class="ss">:firefox</span><span class="p">,</span> <span class="ss">screen_size</span><span class="p">:</span> <span class="o">[</span><span class="mi">1400</span><span class="p">,</span> <span class="mi">1400</span><span class="o">]</span>
<span class="k">end</span>
</code></pre></figure>
<p><strong>Step 4: Prepare the docker image</strong></p>
<ul>
<li>Create <code>Dockerfile</code></li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> FROM ruby:3.1.2-slim-buster
RUN apt-get update
RUN apt-get -y install gnupg curl wget xvfb unzip
ENV NODE_VERSION <span class="m">19</span>
RUN curl -fsSL https://deb.nodesource.com/setup_<span class="si">${</span><span class="nv">NODE_VERSION</span><span class="si">}</span>.x <span class="p">|</span> bash - <span class="o">&&</span> <span class="se">\</span>
apt-get install --yes nodejs <span class="o">&&</span> <span class="se">\</span>
apt-get install --yes libxss1 libappindicator1 libindicator7 python2
RUN apt-get update <span class="o">&&</span> <span class="se">\</span>
apt-get install --yes software-properties-common build-essential libssl-dev sqlite3 libsqlite3-dev pkg-config ca-certificates firefox-esr
RUN apt-get install -y git-all
RUN npm install yarn -g
ADD . /data
</code></pre></figure>
<ul>
<li>
<p>This Dockerfile sets up an image with Ruby 3.1.2 and Node.js 19 installed. It also installs system dependencies such as Git, Yarn, the SQLite libraries, and Firefox ESR.</p>
</li>
<li>
<p>Build Docker image</p>
</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> docker buildx build -t dockermanishelitmus/systemtest-rails-app:latest1.0 . --platform linux/amd64,linux/arm64 --push
</code></pre></figure>
<ul>
<li>This command builds a Docker image using the buildx extension, targeting two platforms (Intel/AMD 64-bit and ARM 64-bit), tags the image as <code>latest1.0</code>, and pushes the resulting image to a container registry.</li>
</ul>
<p><strong>Step 5: Prepare the gitlab-runner</strong></p>
<ul>
<li>In the project root directory, create a file named <code>.gitlab-ci.yml</code> with the following content:</li>
</ul>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span></span><span class="nt">image</span><span class="p">:</span> <span class="s">"dockermanishelitmus/systemtest-rails-app:latest1.0"</span>
<span class="nt">services</span><span class="p">:</span>
<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">redis:latest</span>
<span class="nt">variables</span><span class="p">:</span>
<span class="nt">RAILS_ENV</span><span class="p">:</span> <span class="s">"test"</span>
<span class="nt">cache</span><span class="p">:</span>
<span class="nt">paths</span><span class="p">:</span>
<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">vendor/ruby</span>
<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">node_modules/</span>
<span class="nt">before_script</span><span class="p">:</span>
<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">gem install bundler --no-document</span>
<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">bundle config set force_ruby_platform true</span>
<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">bundle install</span>
<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">bin/rake db:drop</span>
<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">bin/rake db:setup</span>
<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">bin/rake db:migrate</span>
<span class="nt">stages</span><span class="p">:</span>
<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">tests</span>
<span class="nt">SystemTests</span><span class="p">:</span>
<span class="nt">stage</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">tests</span>
<span class="nt">script</span><span class="p">:</span>
<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">yarn install</span>
<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">bin/rake assets:precompile</span>
<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">bin/rails test:system</span>
<span class="nt">artifacts</span><span class="p">:</span>
<span class="nt">when</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">on_failure</span>
<span class="nt">name</span><span class="p">:</span> <span class="s">"$CI_JOB_NAME-$CI_COMMIT_REF_NAME"</span>
<span class="nt">paths</span><span class="p">:</span>
<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">coverage/</span>
<span class="nt">expire_in</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">1 day</span></code></pre></figure>
<ul>
<li>Finally run your test suite</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>gitlab-runner <span class="nb">exec</span> docker SystemTests</code></pre></figure>
<ul>
<li>Output</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> $ bin/rails test:system
Running <span class="m">4</span> tests <span class="k">in</span> a single process <span class="o">(</span>parallelization threshold is <span class="m">50</span><span class="o">)</span>
Run options: --seed <span class="m">13031</span>
<span class="c1"># Running:</span>
Capybara starting Puma...
* Version <span class="m">5</span>.6.7 , codename: Birdie<span class="err">'</span>s Version
* Min threads: <span class="m">0</span>, max threads: <span class="m">4</span>
* Listening on http://127.0.0.1:33385
....
Finished <span class="k">in</span> <span class="m">7</span>.865541s, <span class="m">0</span>.5085 runs/s, <span class="m">0</span>.5085 assertions/s.
<span class="m">4</span> runs, <span class="m">4</span> assertions, <span class="m">0</span> failures, <span class="m">0</span> errors, <span class="m">0</span> skips
Saving cache <span class="k">for</span> successful job
Creating cache SystemTests/main...
WARNING: vendor/ruby: no matching files. Ensure that the artifact path is relative to the working directory
node_modules/: found <span class="m">2</span> matching files and directories
No URL provided, cache will not be uploaded to shared cache server. Cache will be stored only locally.
Created cache
Job succeeded</code></pre></figure>
<h3 id="conclusion"><strong>Conclusion</strong></h3>
<p>We now have a setup that lets us run system tests on both arm64 and amd64 Linux machines, with room for any further customizations we may want to add. These tips should help you get your first system tests up and running in a CI pipeline.</p>
<p><a href="/technology/demystifying-rails-7-system-tests-configuring-ci-pipeline/">Demystifying Rails 7 System Tests: Configuring CI Pipeline</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on August 28, 2023.</p>/technology/building-a-frontend-scoring-engine-automating-frontend-evaluation2023-10-05 20:17:46 +0530T00:00:00-00:002023-07-21T00:00:00+05:30eLitmus.comsite-admin@elitmus.com
<p>The frontend scoring engine is a powerful tool designed to assess the frontend skills of candidates based on code quality, responsiveness, and functionality. It aims to streamline the evaluation process for frontend development by automating the assessment of code quality, best practices, and functionality.</p>
<h2 id="what-youll-learn-from-this-blog"><strong>What you’ll learn from this blog</strong></h2>
<p>In this blog, we will dive into the technical aspects of building a frontend scoring engine.</p>
<ul>
<li>The need for a frontend scoring engine in today’s technology landscape.</li>
<li>The requirements gathering and research phase.</li>
<li>Generating test scripts for test automation using Puppeteer.</li>
<li>Dockerizing the application.</li>
<li>Features of the application and the process of building it.</li>
</ul>
<h2 id="need-for-the-frontend-scoring-engine"><strong>Need for the Frontend Scoring Engine</strong></h2>
<p>In today’s technology-driven world, the demand for skilled frontend developers is at an all-time high. With the rapid evolution of web applications and user interfaces, companies are constantly seeking talented individuals who can create visually appealing, intuitive, and responsive frontend experiences. However, evaluating frontend development skills can be a complex and time-consuming task. This is where a frontend scoring engine comes into play: it automates the evaluation process, measures code quality, and checks mobile responsiveness. By allowing users to input HTML, CSS, and JavaScript code, and generating scores based on predefined test cases, the scoring engine provides a comprehensive evaluation of candidates’ frontend skills.</p>
<h2 id="research-work"><strong>Research Work</strong></h2>
<p>Before starting the implementation of the frontend scoring engine project, extensive research was conducted to understand the need for such a system, evaluate existing systems, explore testing tools, and plan the evaluation process. This research phase played a crucial role in shaping the project and ensuring its successful execution. Let’s take a brief look at the key areas of research conducted during the project’s inception.</p>
<ol>
<li><strong>Evaluating Existing Systems</strong> :
To gain insights into the existing solutions available in the market, a comprehensive evaluation of similar systems was conducted. Various frontend scoring engines and online code editors were explored to understand their features, functionalities, strengths, and weaknesses. This evaluation provided valuable insights that influenced the design decisions and feature set of the new scoring engine. <br />
Some similar existing systems:
<ul>
<li><a href="https://codier.io/" target="_blank" style="color: blue;">Codier.io</a></li>
<li><a href="https://www.frontendmentor.io/" target="_blank" style="color: blue;">Frontend Mentor</a></li>
<li><a href="https://cssbattle.dev" target="_blank" style="color: blue;">CSS Battle</a></li>
<li><a href="https://www.algoexpert.io/frontend/product" target="_blank" style="color: blue;">Algoexpert.io Frontend</a><br /><br /></li>
</ul>
</li>
<li><strong>Testing Tools and Technologies</strong> :
During our research, we explored various testing tools and technologies to find the right fit for executing test cases, assessing code quality, and evaluating frontend functionality. The evaluation revolved around factors like capabilities, ease of use, and compatibility with our project requirements. Tools such as Selenium, Cypress, Jest, CSSLint, and ESLint were taken into consideration.<br />
Read more about the tools:
<ul>
<li><a href="https://www.selenium.dev/documentation/" target="_blank" style="color: blue;">Selenium</a></li>
<li><a href="https://docs.cypress.io/guides/overview/why-cypress" target="_blank" style="color: blue;">Cypress</a></li>
<li><a href="https://jestjs.io/docs/getting-started" target="_blank" style="color: blue;">Jest</a><br /><br /></li>
</ul>
</li>
<li><strong>Puppeteer</strong> :
Puppeteer was chosen over Selenium primarily due to its compatibility with Docker and its ability to control headless Chrome or Chromium instances. Docker provides an efficient and scalable environment for running tests, and Puppeteer seamlessly integrates with Docker containers. Additionally, Puppeteer offers a more modern and concise API, making it easier to write test scripts and perform browser automation tasks.
<ul>
<li><a href="https://oxylabs.io/blog/puppeteer-vs-selenium" target="_blank" style="color: blue;">Puppeteer vs Selenium</a></li>
<li><a href="https://pptr.dev/" target="_blank" style="color: blue;">Puppeteer Docs</a><br /><br /></li>
</ul>
</li>
<li><strong>Docker Integration</strong> :
We explored the benefits of Docker, a widely-used containerization platform, and discovered how it could greatly enhance our project. Docker allows us to create lightweight, portable, and isolated containers, which provide a consistent and reproducible environment. Leveraging Docker, we encapsulated and ran our scoring engine, testing tools, and other dependencies, ensuring seamless integration and efficient execution. <br />
We pulled various Docker images from Docker Hub, enabling us to set up the required tools effortlessly.
<ul>
<li><a href="https://hub.docker.com/r/eeacms/csslint" target="_blank" style="color: blue;">csslint</a></li>
<li><a href="https://hub.docker.com/r/cytopia/eslint" target="_blank" style="color: blue;">eslint</a></li>
<li><a href="https://hub.docker.com/r/cfreak/jest" target="_blank" style="color: blue;">jest</a><br /><br /></li>
</ul>
</li>
<li><strong>Real-Time Code Editor</strong> :
To provide a user-friendly, real-time code editing experience, we searched for frontend code editors and existing projects available on GitHub. Various code editor projects were evaluated, and their source code was studied to understand the implementation details. This research helped in selecting the most suitable code editor framework and implementing it within our frontend scoring engine. <br />
<ul>
<li><a href="https://codepen.io/" target="_blank" style="color: blue;">Codepen</a></li>
<li><a href="https://www.fronteditor.dev/" target="_blank" style="color: blue;">Fronteditor</a></li>
<li><a href="https://github.com/Prince-Codemon/Code-G-The-Coding-Playground-" target="_blank" style="color: blue;">CodeG</a><br /><br /></li>
</ul>
</li>
<li>
<p><strong>Problem Statement and Test Case Creation</strong> :
The goal was to design problem statements that accurately reflect real-world frontend development challenges and create test cases that thoroughly evaluate candidates’ code. Puppeteer test scripts were written to simulate user interactions, perform assertions, and capture screenshots for image comparison using the PixelMatch JavaScript library.<br /></p>
</li>
<li><strong>Cloud Deployment and Infrastructure</strong> :
For final deployment and integration, Amazon Web Services (AWS) was chosen. The research covered various AWS services, including EC2 instances for hosting the scoring engine, S3 for storage, and other relevant services for infrastructure setup. The deployment process, security considerations, and scaling options were thoroughly explored to ensure a robust and scalable deployment architecture.<br /></li>
</ol>
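<p>To make the screenshot comparison step more concrete, here is a minimal sketch of a pixel-by-pixel diff over raw RGBA buffers. It is a simplified stand-in and not the algorithm PixelMatch uses; the function name <code>countDifferentPixels</code>, the <code>tolerance</code> parameter, and the tiny 2x1 sample images are assumptions made for illustration only.</p>

```javascript
// Simplified stand-in for an image-diff step; NOT the PixelMatch algorithm.
// Both images are flat RGBA byte arrays with identical dimensions.
function countDifferentPixels(expected, actual, width, height, tolerance = 0) {
  const byteLength = width * height * 4;
  if (expected.length !== byteLength || actual.length !== byteLength) {
    throw new Error('Both images must be width * height * 4 bytes long');
  }
  let different = 0;
  for (let i = 0; i < byteLength; i += 4) {
    // Largest per-channel difference for this pixel (R, G, B, A).
    let maxDelta = 0;
    for (let c = 0; c < 4; c += 1) {
      maxDelta = Math.max(maxDelta, Math.abs(expected[i + c] - actual[i + c]));
    }
    if (maxDelta > tolerance) different += 1;
  }
  return different;
}

// Hypothetical 2x1 images: the second pixel differs slightly in the red channel.
const reference = Uint8Array.from([255, 0, 0, 255, 0, 0, 0, 255]);
const submission = Uint8Array.from([255, 0, 0, 255, 10, 0, 0, 255]);
const mismatched = countDifferentPixels(reference, submission, 2, 1);
const similarity = 1 - mismatched / (2 * 1);
console.log({ mismatched, similarity });
```

<p>In a real pipeline the buffers would come from decoded screenshots, and the share of mismatched pixels could feed into the visual part of the score.</p>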
<h2 id="test-script-generation"><strong>Test Script Generation</strong></h2>
<p>In the frontend scoring engine, user-submitted HTML, CSS, and JavaScript code is tested against predefined test cases. These tests assess code quality, functionality, and adherence to best practices, giving an overall assessment of a candidate’s frontend development skills. This section gives an overview of the types of tests performed and explains what each one evaluates.</p>
<ul>
<li>
<p><strong>Heading/Element Testing</strong>
This test checks the presence and correctness of specific HTML elements within the user’s code. Test cases are designed to check whether required elements, such as h1, h2, p, or specific elements identified by ID or class, are present. The purpose of this test is to assess the structure and semantic correctness of the user’s HTML code.</p>
</li>
<li>
<p><strong>CSS Properties Testing</strong>
This test aims to verify the correct usage of CSS properties in the user’s code. It includes checking for the presence of essential CSS properties, such as margin, padding, font-size, or specific properties required for a particular problem statement. This test ensures that the user’s code adheres to the defined CSS requirements and best practices.</p>
</li>
<li>
<p><strong>Form Validation Testing</strong>
Form validation testing focuses on assessing the user’s code for proper form validation techniques. Test cases can include checking for required fields, validating email formats, enforcing password complexity, or implementing custom validation logic. This test ensures that the user’s code handles form validation correctly and provides appropriate error messages.</p>
</li>
<li>
<p><strong>Function Testing</strong>
This test evaluates the functionality and correctness of JavaScript functions implemented by the user. Test cases are designed to cover different scenarios and edge cases to ensure that the functions perform as expected. This test assesses the user’s ability to write functional and efficient JavaScript code.</p>
</li>
<li>
<p><strong>API Testing</strong>
API testing involves verifying the integration of API calls in the user’s code. Test cases may include checking if an API request is made, handling the API response correctly, and displaying the data from the API on the page. This test ensures that the user’s code effectively interacts with external APIs.</p>
</li>
<li>
<p><strong>Button Testing</strong>
Button testing focuses on evaluating the behavior and interactivity of buttons implemented by the user. Test cases may include checking if a button triggers a specific action, updates the UI, or performs a navigation action. This test ensures the proper functionality of user-defined buttons.</p>
</li>
<li>
<p><strong>Redirection Testing</strong>
This test aims to assess the behavior of navigation and redirection implemented by the user’s code. Test cases may include checking if clicking a link or a button redirects the user to the correct page or if the page refreshes as intended. This test ensures that the user’s code correctly handles navigation and redirection scenarios.</p>
</li>
</ul>
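<p>As an illustration of the function testing idea described above, the following sketch runs a function against a table of cases and records a pass or fail per case. The runner <code>runFunctionTests</code>, the sample <code>isValidEmail</code> function (standing in for user-submitted code), and the case format are all hypothetical; the actual engine drives its tests through Puppeteer.</p>

```javascript
// `isValidEmail` stands in for a function submitted by a candidate (hypothetical).
function isValidEmail(value) {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value);
}

// Runs `fn` against each case and reports whether the result matched.
function runFunctionTests(fn, cases) {
  return cases.map(({ name, input, expected }) => {
    try {
      const actual = fn(...input);
      return { name, passed: Object.is(actual, expected) };
    } catch (error) {
      // A thrown error counts as a failed case instead of aborting the run.
      return { name, passed: false, error: String(error) };
    }
  });
}

const emailCases = [
  { name: 'accepts a simple address', input: ['dev@example.com'], expected: true },
  { name: 'rejects a missing domain', input: ['dev@'], expected: false },
  { name: 'rejects embedded spaces', input: ['dev @example.com'], expected: false },
];

const emailResults = runFunctionTests(isValidEmail, emailCases);
console.log(emailResults);
```

<p>Catching exceptions per case keeps one broken scenario from hiding the results of the others, which matters when edge cases are part of the evaluation.</p>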
<h2 id="dockerizing-the-puppeteer-with-chrome-browser-support"><strong>Dockerizing the Puppeteer with Chrome Browser Support</strong></h2>
<h4 id="dockerfile">Dockerfile:</h4>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span><span class="c1"># Use the node:slim base image</span>
FROM node:slim
<span class="c1"># Set an environment variable to skip Puppeteer Chromium download during installation</span>
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD <span class="nb">true</span>
RUN apt-get update <span class="o">&&</span> apt-get install gnupg wget -y <span class="o">&&</span> <span class="se">\</span>
wget --quiet --output-document<span class="o">=</span>- https://dl-ssl.google.com/linux/linux_signing_key.pub <span class="p">|</span> gpg --dearmor > /etc/apt/trusted.gpg.d/google-archive.gpg <span class="o">&&</span> <span class="se">\</span>
sh -c <span class="s1">'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list'</span> <span class="o">&&</span> <span class="se">\</span>
apt-get update <span class="o">&&</span> <span class="se">\</span>
apt-get install google-chrome-stable -y --no-install-recommends <span class="o">&&</span> <span class="se">\</span>
rm -rf /var/lib/apt/lists/*
<span class="c1"># Set the working directory inside the container</span>
WORKDIR /usr/src/app
<span class="c1"># Copy the package.json file to the working directory</span>
COPY package.json ./
<span class="c1"># Install project dependencies using npm</span>
RUN npm install
<span class="c1"># Expose port 3000 to allow access to the app outside the container</span>
EXPOSE <span class="m">3000</span>
<span class="c1"># Run the app using the "npm test" command when the container starts</span>
CMD <span class="o">[</span><span class="s2">"npm"</span>, <span class="s2">"test"</span><span class="o">]</span></code></pre></figure>
<h4 id="build-command">Build command:</h4>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>docker build -t bhushan21z/puppchrome .</code></pre></figure>
<h4 id="publish-it-to-docker-hub">Publish it to Docker Hub:</h4>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>docker push bhushan21z/puppchrome:tagname</code></pre></figure>
<h4 id="pull-commnd">Pull command:</h4>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>docker pull bhushan21z/puppchrome</code></pre></figure>
<h4 id="run-command">Run command:</h4>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>docker run -it --rm -v <span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/files:/usr/src/app/files bhushan21z/puppchrome</code></pre></figure>
<h2 id="features-and-architecture"><strong>Features and Architecture</strong></h2>
<p><img src="/blog/images/frontend-scoring-engine/frontend_scoring_engine_architecture.png" alt="Application Architecture" />
<br /></p>
<h4 id="scoring-engine"><strong>Scoring Engine:</strong></h4>
<ol>
<li>Inputs: The scoring engine takes HTML, CSS, and JavaScript files created by users on the client side, as well as the test cases file generated on the backend.</li>
<li>Code Quality Assessment: The engine assesses code quality using ESLint, CSSLint, and similar tools.</li>
<li>Scoring: The engine generates a score based on code quality, along with the results of the test cases executed on the client-side code.</li>
<li>Modular Architecture: The scoring engine is a separate entity, independent of the frontend and backend code.</li>
<li>Technology Stack: Python Flask framework is used to implement the scoring engine.</li>
<li>Working: Flask issues <code>docker run</code> commands to execute the test scripts.</li>
</ol>
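<p>The post does not spell out how the final score is calculated, so the following is only a sketch of one plausible approach: a weighted share of passed test cases minus a capped penalty for lint warnings. The function <code>computeScore</code>, the <code>weight</code> field, and the penalty values are assumptions, not the engine’s actual formula.</p>

```javascript
// Hypothetical scoring formula: weighted share of passed tests (0-100)
// minus a capped penalty for lint warnings. Not the engine's real formula.
function computeScore(testResults, lintWarnings, options = {}) {
  const { penaltyPerWarning = 1, maxPenalty = 20 } = options;
  const totalWeight = testResults.reduce((sum, t) => sum + t.weight, 0);
  const earnedWeight = testResults
    .filter((t) => t.passed)
    .reduce((sum, t) => sum + t.weight, 0);
  // Avoid dividing by zero when no test cases are configured.
  const testScore = totalWeight === 0 ? 0 : (earnedWeight / totalWeight) * 100;
  const penalty = Math.min(lintWarnings * penaltyPerWarning, maxPenalty);
  return Math.max(0, Math.round(testScore - penalty));
}

// Illustrative results for one submission.
const sampleResults = [
  { name: 'has h1 heading', weight: 2, passed: true },
  { name: 'button updates UI', weight: 3, passed: true },
  { name: 'form rejects bad email', weight: 5, passed: false },
];

const sampleScore = computeScore(sampleResults, 4);
console.log(sampleScore);
```

<p>Capping the lint penalty keeps style issues from outweighing functional correctness, which is one way to balance the two inputs the engine combines.</p>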
<h4 id="backend"><strong>Backend:</strong></h4>
<ol>
<li>MySQL Database: The schema contains tables such as users, questions, testcases, and submissions.</li>
<li>Node.js: The Express framework is used to implement the REST APIs.</li>
<li>User auth: Contains the user registration and login APIs.</li>
<li>Questions: APIs to create and fetch questions.</li>
<li>Test Cases: APIs to create and fetch test cases, which are linked to the questions table with the question id as a foreign key.</li>
<li>Scoring Engine: A POST endpoint receives the user’s submission, forwards it to the scoring engine, and returns the scoring engine’s response to the frontend.</li>
<li>Submissions: APIs to create and fetch user submissions, which are linked to the users table and the questions table.</li>
</ol>
<h4 id="frontend-admin-side"><strong>Frontend (Admin Side):</strong></h4>
<ol>
<li>Problem Creation: Admins can create problem statements, describing the problem to be solved.</li>
<li>Problem Settings: Problems can include various settings such as score weightage, best practices to check, and mobile responsiveness evaluation.</li>
<li>Test Cases: Admins can add multiple test cases related to each problem statement.</li>
<li>Test Case Visibility: Some test case outputs will be visible to users, while others will be hidden, showing only whether the test passed or failed.</li>
<li>User-Friendly Test Case Creation: Adding test cases is straightforward, even for users with limited programming knowledge.</li>
</ol>
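<p>The test case visibility rule above could be applied when preparing results for the candidate, roughly as sketched below. The field names (<code>hidden</code>, <code>output</code>) and the function <code>toCandidateView</code> are hypothetical and only illustrate the idea of exposing pass/fail status without the details of hidden cases.</p>

```javascript
// Strips the name and output of hidden test cases before results are shown
// to the candidate. Field names are assumptions made for this sketch.
function toCandidateView(testResults) {
  return testResults.map((result) =>
    result.hidden
      ? { name: 'Hidden test', passed: result.passed }
      : { name: result.name, passed: result.passed, output: result.output }
  );
}

const rawResults = [
  { name: 'has h1 heading', hidden: false, passed: true, output: 'Found <h1>' },
  { name: 'rejects empty form', hidden: true, passed: false, output: 'No error shown' },
];

const candidateView = toCandidateView(rawResults);
console.log(candidateView);
```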
<h4 id="frontend-client-side"><strong>Frontend (Client Side):</strong></h4>
<ol>
<li>Problem List: Users can view a list of problems on their screen.</li>
<li>Code Editor: Users can write HTML, CSS, and JavaScript code for each problem, similar to the CodePen editor.</li>
<li>Code Compilation: Users can compile their code and generate the output.</li>
<li>Score Display: Users can view the scores generated by the scoring engine based on the performed test cases.<br /></li>
</ol>
<h2 id="tools--technologies"><strong>Tools & Technologies</strong></h2>
<h4 id="frontend"><strong>Frontend:</strong></h4>
<ul>
<li>ReactJS is used to develop the frontend of the scoring engine.</li>
</ul>
<h4 id="backend-1"><strong>Backend:</strong></h4>
<ul>
<li>Node.js is employed for building the backend of the scoring engine.</li>
<li>MySQL is used as the database management system.</li>
</ul>
<h4 id="scoring-engine-1"><strong>Scoring Engine</strong></h4>
<ul>
<li>Puppeteer is used for implementing test cases and browser testing.</li>
<li>Docker containers are used for checking code quality and running test cases.</li>
<li>Flask is used to build the scoring engine server, which receives the submission data and interacts with Docker.</li>
</ul>
<h2 id="conclusion"><strong>Conclusion</strong></h2>
<p>By implementing a frontend scoring engine, we can automate frontend development evaluation, resulting in a streamlined and efficient assessment process. This blog has explored the goals, research, features, technical requirements, and tools and technologies involved in developing a frontend scoring engine. The automation of code assessment, real-time editing, and integration of testing tools have resulted in an efficient and comprehensive evaluation platform. The challenges we faced during development have strengthened our understanding of frontend development and inspired innovative solutions. As we move forward, we remain committed to enhancing the scoring engine to meet the evolving needs of the tech industry. <br />
If you have any questions, doubts, or suggestions, feel free to reach out to me on <a href="https://www.linkedin.com/in/bhushan-wanjari-952042213/" target="_blank" style="color: blue;">LinkedIn</a>.</p>
<p><a href="/technology/building-a-frontend-scoring-engine-automating-frontend-evaluation/">Building a Frontend Scoring Engine: Automating Frontend Evaluation</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on July 21, 2023.</p>/technology/revamping-elitmus-dot-com-stand-alone-front-end-module2023-07-20 04:10:15 +0530T00:00:00-00:002023-07-20T00:00:00+05:30eLitmus.comsite-admin@elitmus.com
<p>The current elitmus.com is a web application built with the Ruby on Rails framework, and the views are rendered and sent directly from the backend server on each request. This worked well in the past, but given the current state of web technologies, it falls short of some basic requirements, so an upgrade is needed.</p>
<p>More formally, the current <a href="http://elitmus.com">elitmus.com</a> has a monolithic structure, i.e. the front-end and the back-end are tightly coupled. As a result, the project’s logic and team cannot be divided between front-end and back-end. Only full-stack developers with knowledge of both domains can work on the project, which limits contributions from people who specialize in one of the domains.</p>
<p>Also, the present <a href="http://elitmus.com">elitmus.com</a> is not using the latest web technologies available. This greatly impacts the user experience.</p>
<p>So, what’s the solution?</p>
<p><img src="/blog/images/revamping-elitmus-dot-com-stand-alone-front-end-module/monolithic-distributed.png" alt="Monolithic and Distributed Systems" /></p>
<p>Well, we can separate the front-end and back-end. This addresses the major issues described above for developers who work, or want to work, on this project.</p>
<p>Now we can have a distributed system, with the views (front-end) in one place and the models and controllers in another. The front-end can be built using the latest, efficient web technologies, which also helps improve the user experience.</p>
<h2 id="what-benefits-">What Benefits ?</h2>
<hr />
<ul>
<li><strong>Developer Experience</strong>
<ul>
<li>Team Separation → We can have dedicated teams for front-end and back-end, each specializing in its own domain</li>
<li>Logic Separation → We can keep the front-end logic separate from the back-end logic</li>
<li>Easy to Manage</li>
<li>Easy to Scale</li>
</ul>
</li>
<li><strong>User Experience</strong>
<ul>
<li>Latest Web Tech like React can be used to Build Views</li>
<li>Improved Speed</li>
<li>Improved Performance</li>
<li>Consistency in design</li>
</ul>
</li>
</ul>
<p><br /></p>
<h2 id="how-do-we-do-it">How do we do it?</h2>
<hr />
<p>Well, now that we know what we have to do, we are halfway there already (just kidding). Let’s discuss some of the things we can use to make the front-end efficient and reliable.</p>
<ul>
<li><strong>React JS</strong>
<ul>
<li>Its component architecture helps us build a consistent design across the site.</li>
<li>It’s fast and performant.</li>
</ul>
</li>
<li><strong>Tailwind CSS</strong>
<ul>
<li>This is a light-weight CSS framework which is highly reliable and easy to use.</li>
<li>It has a good community, so we can often reuse existing UI components rather than building them from scratch.</li>
</ul>
</li>
<li><strong>Redux Toolkit</strong>
<ul>
<li>Redux Toolkit is the official toolset for Redux. It abstracts away a lot of boilerplate code and provides easy-to-use APIs for managing state.</li>
</ul>
</li>
<li><strong>Jest</strong>
<ul>
<li>Jest is one of the most popular libraries for writing tests in a React application. In fact, Create React App supports it out of the box when we start a new React project.</li>
</ul>
</li>
</ul>
<p>So, those are the core technologies we can use to build an efficient and reliable front-end. But we can improve it even further by following certain practices, which will pay off in the long run.</p>
<p><br /></p>
<h2 id="what-else-can-we-improve-">What else can we Improve ?</h2>
<hr />
<p>Following are some of the best practices that we can use to further improve the frontend application.</p>
<ul>
<li><strong>ES Lint</strong>
<ul>
<li>Enforcing a code style guide is important to maintain the source code of the application. It helps to maintain consistency across the application.</li>
<li>More specifically, we can use the Airbnb style guide, one of the most popular style guides for React applications.</li>
<li>We can add rules as per our need and requirements in the .eslintrc.js</li>
</ul>
</li>
<li><strong>Nested Routes</strong>
<ul>
<li>This is a feature of React Router. We can nest routes under other routes so that the URL structure stays intuitive.</li>
</ul>
</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> <Route <span class="nv">path</span><span class="o">=</span><span class="s2">"/jobs"</span> <span class="nv">element</span><span class="o">={</span><JobsAndInterviews /><span class="o">}</span>>
<Route index <span class="nv">element</span><span class="o">={</span><AllJobs /><span class="o">}</span> />
<Route <span class="nv">path</span><span class="o">=</span><span class="s2">"my_jobs"</span> <span class="nv">element</span><span class="o">={</span><MyJobs /><span class="o">}</span>>
<Route index <span class="nv">element</span><span class="o">={</span><ActiveJobs /><span class="o">}</span> />
<Route <span class="nv">path</span><span class="o">=</span><span class="s2">"active"</span> <span class="nv">element</span><span class="o">={</span><ActiveJobs /><span class="o">}</span> />
<Route <span class="nv">path</span><span class="o">=</span><span class="s2">"inactive"</span> <span class="nv">element</span><span class="o">={</span><InActiveJobs /><span class="o">}</span> />
<Route <span class="nv">path</span><span class="o">=</span><span class="s2">"interviews"</span> <span class="nv">element</span><span class="o">={</span><Interviews /><span class="o">}</span> />
</Route>
<Route <span class="nv">path</span><span class="o">=</span><span class="s2">"all_jobs"</span> <span class="nv">element</span><span class="o">={</span><AllJobs /><span class="o">}</span> />
</Route></code></pre></figure>
<p>In this example snippet, we have a parent route for jobs; under it is my_jobs, and inside that we have active, inactive, and interviews.</p>
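<p>To see how the nesting in this snippet translates into full URLs, here is a small sketch that walks a plain-object version of the same route tree and lists the resulting paths. It only illustrates how parent and child segments combine; it is not how React Router resolves routes internally, and <code>jobsRouteTree</code> and <code>listFullPaths</code> are names made up for this example.</p>

```javascript
// Plain-object stand-in for the <Route> tree shown in the snippet.
// This is an illustration, NOT React Router's implementation.
const jobsRouteTree = {
  path: '/jobs',
  children: [
    { index: true },
    {
      path: 'my_jobs',
      children: [
        { index: true },
        { path: 'active' },
        { path: 'inactive' },
        { path: 'interviews' },
      ],
    },
    { path: 'all_jobs' },
  ],
};

function listFullPaths(route, parentPath = '') {
  // Index routes render at the parent's own URL, so they add no segment.
  const fullPath = route.index
    ? parentPath
    : `${parentPath.replace(/\/$/, '')}/${route.path.replace(/^\//, '')}`;
  const childPaths = (route.children || []).flatMap((child) =>
    listFullPaths(child, fullPath)
  );
  // A Set removes the duplicates produced by index routes.
  return [...new Set([fullPath, ...childPaths])];
}

const allJobPaths = listFullPaths(jobsRouteTree);
console.log(allJobPaths);
```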
<ul>
<li>
<p>/jobs/my_jobs/active → this route path by itself tells us a lot about the page it points to.</p>
</li>
<li>
<p><strong>Dynamic Routes</strong></p>
<ul>
<li>This relies on React’s own <code>lazy</code> and <code>Suspense</code> APIs. It allows us to load only the pages that the user requests, rather than all of them.</li>
<li>Just imagine that our site has hundreds of pages. When the user wants to visit the homepage, we would be sending them all of those pages. That doesn’t make any sense, right?</li>
</ul>
</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> // Jobs Page Routes
<span class="nb">export</span> const <span class="nv">JobsAndInterviews</span> <span class="o">=</span> lazy<span class="o">(()</span> <span class="o">=</span>> import<span class="o">(</span><span class="s1">'../pages/Jobs'</span><span class="o">))</span><span class="p">;</span>
<span class="nb">export</span> const <span class="nv">AllJobs</span> <span class="o">=</span> lazy<span class="o">(()</span> <span class="o">=</span>> import<span class="o">(</span><span class="s1">'../pages/Jobs/AllJobs'</span><span class="o">))</span><span class="p">;</span>
<span class="nb">export</span> const <span class="nv">ApplyJob</span> <span class="o">=</span> lazy<span class="o">(()</span> <span class="o">=</span>> import<span class="o">(</span><span class="s1">'../pages/Jobs/ApplyJob'</span><span class="o">))</span><span class="p">;</span>
<span class="nb">export</span> const <span class="nv">JobDetails</span> <span class="o">=</span> lazy<span class="o">(()</span> <span class="o">=</span>> import<span class="o">(</span><span class="s1">'../pages/Jobs/JobDetails'</span><span class="o">))</span><span class="p">;</span>
<span class="nb">export</span> const <span class="nv">MyJobs</span> <span class="o">=</span> lazy<span class="o">(()</span> <span class="o">=</span>> import<span class="o">(</span><span class="s1">'../pages/Jobs/MyJobs'</span><span class="o">))</span><span class="p">;</span>
<span class="nb">export</span> const <span class="nv">ActiveJobs</span> <span class="o">=</span> lazy<span class="o">(()</span> <span class="o">=</span>> import<span class="o">(</span><span class="s1">'../pages/Jobs/MyJobs/Active'</span><span class="o">))</span><span class="p">;</span>
<span class="nb">export</span> const <span class="nv">InActiveJobs</span> <span class="o">=</span> lazy<span class="o">(()</span> <span class="o">=</span>> import<span class="o">(</span><span class="s1">'../pages/Jobs/MyJobs/Inactive'</span><span class="o">))</span><span class="p">;</span>
<span class="nb">export</span> const <span class="nv">Interviews</span> <span class="o">=</span> lazy<span class="o">(()</span> <span class="o">=</span>> import<span class="o">(</span><span class="s1">'../pages/Jobs/MyJobs/Interviews'</span><span class="o">))</span><span class="p">;</span></code></pre></figure>
<p>The snippet above shows how to import the components dynamically with <code>lazy</code>. For this to work, we need to wrap the Routes in a Suspense component, which takes a <code>fallback</code> prop.
The element passed as the fallback is rendered while a lazily loaded component is still being fetched, so we can put our page loader here. Below is a snippet showing how to do it.</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> import jobsRoutes from <span class="s1">'./routes/jobsRoutes'</span><span class="p">;</span>
 const <span class="nv">App</span> <span class="o">=</span> <span class="o">()</span> <span class="o">=</span>> <span class="o">(</span>
   <Router>
     <Provider <span class="nv">store</span><span class="o">={</span>store<span class="o">}</span>>
       <Layout>
         <Suspense <span class="nv">fallback</span><span class="o">={</span><Loader /><span class="o">}</span>>
           <Routes>
             <span class="o">{</span>jobsRoutes<span class="o">}</span>
           </Routes>
         </Suspense>
       </Layout>
     </Provider>
   </Router>
 <span class="o">)</span><span class="p">;</span></code></pre></figure>
<p>Because each page’s code is now downloaded only when its route is first visited, the initial bundle is smaller and the first page loads faster than before.</p>
<ul>
<li><strong>Intuitive File and Folder Organization</strong> → Organizing the files and folders properly is very important because it lowers the learning curve for new developers joining the project.</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> /src
   /__tests__
     /categoryA
       /page1.test.js
       /page2.test.js
     /categoryB
       /page1.test.js
       /page2.test.js
   /assets
   /components
     /customElements
     /Layout
   /features
     /redux_slices.js
   /pages
     /categoryA
       /page1.jsx
       /page2.jsx
     /categoryB
       /page1.jsx
       /page2.jsx
   /routes
   /store
     /redux_store.js
   /styles</code></pre></figure>
<p>That’s how we can improve our codebase even more.</p>
<p>Next, we have to make sure our application runs the same on every device, OS, and system configuration. For that, we can dockerize the React app.</p>
<p><br /></p>
<h2 id="dockerization-and-deployment">Dockerization and Deployment</h2>
<hr />
<p>Dockerizing the React app gives us the following benefits:</p>
<ol>
<li><strong>Consistency:</strong> Docker ensures the app runs consistently across different environments.</li>
<li><strong>Dependency Management:</strong> Docker encapsulates app dependencies, preventing conflicts.</li>
<li><strong>Easy Deployment:</strong> Docker simplifies deployment to various environments.</li>
<li><strong>Scalability:</strong> Docker facilitates easy scaling to handle increased traffic.</li>
<li><strong>Versioning and Rollbacks:</strong> Docker images can be versioned, enabling controlled updates and rollbacks.</li>
<li><strong>Development and Testing:</strong> Docker streamlines development and testing in a consistent environment.</li>
<li><strong>Infrastructure Agnostic:</strong> Docker allows running the app on various infrastructures.</li>
<li><strong>Resource Efficiency:</strong> Docker containers are lightweight and efficient in resource utilization.</li>
<li><strong>Easy Collaboration:</strong> Docker promotes seamless collaboration among developers and teams.</li>
<li><strong>Security:</strong> Docker provides isolation, adding an extra layer of security to the app.</li>
</ol>
<p>We can dockerize the React app by adding two Docker files:</p>
<ul>
<li><strong>Dockerfile</strong> → contains environment and installation instructions for the app.</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> FROM node:18 as builder
WORKDIR /app
COPY package.json .
RUN npm install
COPY . .
RUN npm run build
FROM nginx
EXPOSE <span class="m">80</span>
COPY --from<span class="o">=</span>builder /app/build /usr/share/nginx/html</code></pre></figure>
<ul>
<li><strong>docker-compose.yml</strong> → contains the configuration to build and run our Docker container.</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> version: <span class="s1">'3'</span>
 services:
   web:
     build:
       context: .
       dockerfile: Dockerfile
     ports:
       - <span class="s1">'80:80'</span></code></pre></figure>
<p>Now, we have successfully containerized our React application. Finally, we need to deploy it to a cloud service such as AWS.</p>
<ul>
<li>We can first push our Docker image to Docker Hub</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> docker push iamsmruti/elitmus-frontend</code></pre></figure>
<ul>
<li>Then we can log in to the EC2 instance and pull the Docker image</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> docker pull iamsmruti/elitmus-frontend</code></pre></figure>
<ul>
<li>Finally, we can run the docker image</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> docker run -d -p <span class="m">80</span>:80 iamsmruti/elitmus-frontend</code></pre></figure>
<p>That wraps up our frontend application, which can now go live. It is fully capable of consuming the APIs from the backend. Because the business logic now lives in the backend, it puts little load on the frontend, which keeps the frontend performant and reliable.</p>
<p>If you have any questions or doubts, you can ping me at <code>smrutiranjanbadatya2@gmail.com</code>.</p>
<p>I would definitely get back to you.</p>
<p>I hope this was a helpful and insightful guide to building a better frontend application, with the good practices needed to keep the project sustainable.</p>
<p>See Ya 👋🏻 … Peace ✌🏻</p>
<p><strong>References</strong></p>
<ol>
<li>
<p>React Docs - <a href="https://react.dev/" target="_blank" style="color: blue;">Here</a></p>
</li>
<li>
<p>Tailwind Docs - <a href="https://tailwindcss.com/docs/installation" target="_blank" style="color: blue;">Here</a></p>
</li>
<li>
<p>Redux Toolkit Docs - <a href="https://redux-toolkit.js.org/" target="_blank" style="color: blue;">Here</a></p>
</li>
<li>
<p>Jest Docs - <a href="https://jestjs.io/docs/getting-started" target="_blank" style="color: blue;">Here</a></p>
</li>
<li>
<p>ES Lint Docs - <a href="https://eslint.org/docs/latest/" target="_blank" style="color: blue;">Here</a></p>
</li>
<li>
<p>Docker Docs - <a href="https://docs.docker.com" target="_blank" style="color: blue;">Here</a></p>
</li>
</ol>
<p><a href="/technology/revamping-elitmus-dot-com-stand-alone-front-end-module/">Revamping eLitmus.com | Stand-Alone Front-end Module</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on July 20, 2023.</p>/technology/my-experience-as-a-summer-intern-at-elitmus-building-a-telegram-bot2023-07-14 15:18:53 +0530T00:00:00-00:002023-07-19T00:00:00+05:30eLitmus.comsite-admin@elitmus.com
<p>As a summer intern at eLitmus, I had the opportunity to work on an exciting project that involved building a Telegram Bot. In today’s digital era, effective communication channels play a crucial role in connecting businesses with their stakeholders. eLitmus, a talent-tech platform, identified the need for a two-way communication channel between the platform and candidates. To achieve this, Telegram bots were chosen as the ideal starting point. This blog post will delve into the Telegram Bot Integration project.</p>
<h2 id="how-it-began"><strong>How it Began</strong>:</h2>
<p>The project started with the idea of leveraging the Telegram platform as a communication channel between eLitmus and its candidates. The goal was to create a two-way communication channel, enabling candidates to access information, receive updates, and engage in various activities through Telegram bots. This opened up possibilities for automating communication, collecting data, running quizzes, and providing valuable services to candidates.</p>
<h2 id="design"><strong>Design</strong>:</h2>
<p>Before diving into the development phase, thorough planning and design are crucial. I began by defining the core functionalities of the Telegram bots. I discovered that creating a bot through BotFather (Telegram’s official bot) was the standard approach. As I was tasked with implementing the project using Ruby on Rails, I focused on two key deliverables: developing the Telegram bot and designing the Admin panel.
Designing such an application involves three key aspects: architecture design, database design, and UI/UX design. Let’s dive into each of these parts in more detail:</p>
<ul>
<li><strong>Architecture</strong> : The Telegram bots interact with users through messages and commands. Users can access FAQs, participate in quizzes, and receive responses based on their interactions with the bots. The bots handle user inputs, validate quiz answers, and provide feedback and results accordingly. An intuitive admin panel is developed using Ruby on Rails to facilitate easy management of the bot’s functionalities. The admin panel allows administrators to add, update, and delete FAQs, quizzes, and other content. It also provides insights and analytics related to user engagement and bot usage.
<img src="/blog/images/telegram-bot/architecture.png" width="425" /></li>
<li>
<p><strong>Database</strong>: The project utilizes a MySQL database to store and manage data related to users, FAQs, quizzes, quiz attempts, analytics, and other relevant information. The database schema is designed to efficiently store and retrieve data, ensuring optimal performance.</p>
</li>
<li><strong>UI/UX</strong>: To ensure a visually appealing and user-friendly Telegram bot interface, I delved into various UI options and explored the best ways to present information and interact with users. This research helped me identify the most effective strategies for creating an engaging and intuitive bot interface. And for the Admin panel, I took the initiative to design the entire interface using Figma. By visualizing the layout, components, and functionalities, I was able to ensure a cohesive and user-friendly experience for administrators managing the bot’s functionalities. Figma provided a powerful toolset for creating wireframes, mock-ups, and interactive prototypes, allowing me to iterate and refine the design before implementation.</li>
</ul>
<h2 id="development"><strong>Development</strong>:</h2>
<p>Before starting this project, I had experience developing mobile applications, and most of them followed the Model-View-Template (MVT) pattern for backend, such as Django. However, for this project, I needed to learn and work with Ruby on Rails, which follows the Model-View-Controller (MVC) architectural pattern. Fortunately, my previous experience with backend development made it easier for me to understand Rails, and within the first two weeks, I was able to develop the basic functionalities of both the FAQ and Quiz bots.</p>
<p>Integrating the telegram bot consists of 3 steps:</p>
<ul>
<li>
<p>Creating a bot using BotFather (Telegram’s official bot for creating bots) and getting the token that it generates.</p>
<p><img src="/blog/images/telegram-bot/botfather.png" width="425" /></p>
</li>
<li>
<p>Initializing the bot in a Ruby file and declaring a listener function that receives every message sent to the bot.</p>
</li>
<li>
<p>Writing message-specific functions, each of which is called only when a specified message is received by the bot.</p>
</li>
</ul>
<p>I used Ruby on Rails for both the front end and the back end of the admin panel. For the database I used MySQL, and for hosting I used AWS: EC2 to host the admin panel and the Telegram bot using Docker, and RDS for the database.</p>
<p>Using Docker to host the bot and admin panel was another part of the development. It gave me an idea of how Docker is used by most companies, and since learning Docker was a personal goal of mine for the year, this project let me achieve it. Using Docker wasn’t the difficult part; I only had to learn how to write a Dockerfile and a Docker Compose file.</p>
<h2 id="features-developed"><strong>Features Developed</strong></h2>
<ul>
<li><strong>User Flow</strong></li>
</ul>
<p>I focused on refining the functionalities and user flow of the bots, particularly in the context of the Telegram channels. The FAQ bot is connected to the Telegram channel, and when a user posts a question in the channel’s comment section, it gets stored in the database. The admin can then view and answer the question, which is sent back to the user personally through Telegram. Additionally, users can access the FAQ bot to view existing FAQs and request the addition of new ones.</p>
<h6>
<strong>
FAQ bot flow
</strong>
</h6>
<table>
<tr>
<td>
<img src="/blog/images/telegram-bot/faqbot2.png" width="425" />
</td>
<td>
<img src="/blog/images/telegram-bot/faqbot3.png" width="425" />
</td>
</tr>
</table>
<h6>
<strong>
Quiz bot flow
</strong>
</h6>
<table>
<tr>
<td>
<img src="/blog/images/telegram-bot/quizbot1.png" width="425" />
</td>
<td>
<img src="/blog/images/telegram-bot/quizbot2.png" width="425" />
</td>
</tr>
</table>
<ul>
<li>
<p><strong>Admin Panel</strong></p>
<p>On the other hand, the admin panel allows the admin to create quizzes and questions. These quizzes are then posted in the Telegram channel, with a button redirecting users to the Quiz bot. Users can access multiple quizzes and attempt them through the bot.</p>
<p>By developing these functionalities, I was able to establish a seamless flow for users, ensuring they can interact with the bots and access relevant information easily. The admin panel provides the necessary tools for managing FAQs, quizzes, and user interactions, allowing for efficient administration and engagement with the users.</p>
<p>In the Admin panel, I implemented the design that I had previously created using Figma. The Admin panel offers various functionalities to enhance the administration and management of the Telegram bots. Here are some key features of the Admin panel:</p>
<ul>
<li>
<p><strong>User Management</strong>: The Admin panel allows the admin to view active users and access individual user data. This includes information about the user’s activities, quiz attempts, and questions asked through the bot.</p>
</li>
<li>
<p><strong>FAQ Management</strong>: The Admin can view and manage the FAQs. They have the ability to add, edit, or remove FAQs as needed. Additionally, the Admin can track the number of reads by users, providing insights into the popularity and relevance of different FAQs.</p>
</li>
<li>
<p><strong>Quiz Management</strong>: The Admin can create quizzes and manage them within the Admin panel. They can add questions, set multiple options, and define correct answers. The Admin also has access to the responses of the quizzes, allowing them to analyze individual question analytics and gain insights into user performance. This can also be used to host surveys on telegram channels.</p>
</li>
<li>
<p><strong>Analytics</strong>: The Admin panel provides analytics on user activities related to both the FAQ and Quiz bots. The Admin can view data such as the number of attempts per day, week, month, or year, as well as the number of FAQ reads per day, week, month, or year. These analytics help the Admin understand user engagement and make data-driven decisions.</p>
</li>
<li>
<p><strong>Post Management</strong>: The Admin can use the post section of the Admin panel to create and publish posts directly in the Telegram channel. To make this effective, I split the process into two phases, creating and publishing, so that a post gets reviewed before it is published. This feature streamlines the process of sharing content and updates with users in the channel.</p>
</li>
</ul>
<p><strong>Admin Panel</strong></p>
<p><img src="/blog/images/telegram-bot/admin-panel1.png" /></p>
<p>By incorporating these functionalities into the Admin panel, I ensured that the administrative tasks associated with managing the Telegram bots were streamlined and efficient. The panel provides comprehensive control and insights, empowering the admin to effectively manage user interactions, content, and analytics.</p>
</li>
</ul>
<h2 id="challenges-faced"><strong>Challenges Faced</strong>:</h2>
<p>Working with third-party APIs is one of the most challenging tasks, and in this project that meant the Telegram Bot API. The API exposes only a minimal amount of user data; for example, I couldn’t get a user’s contact details. I overcame this by finding a Telegram feature that uses permissions to request that the user share their mobile number and location, although I couldn’t get the location from the web or laptop clients. The biggest challenge I faced was setting up and displaying analytics using charts and graphs. Initially, I tried using gems like Chartkick and FusionCharts, but faced issues with rendering the graphs correctly. Despite spending considerable time troubleshooting, the graphs weren’t displaying as expected. Eventually, I opted for Chart.js, which proved to be a more suitable solution for my needs. With Chart.js, I could create visually appealing and interactive charts to showcase the data collected through the admin panel. The transition to Chart.js was smooth, and it enabled me to present data insights effectively, providing a valuable user experience.</p>
<h2 id="conclusion"><strong>Conclusion</strong>:</h2>
<p>In summary, working on this project presented its fair share of challenges. However, with perseverance and problem-solving skills, I was able to overcome these obstacles and achieve success. I was able to develop the Telegram bots and the Admin panel effectively. I am thrilled to share that my hard work did not go unnoticed, and my project was selected for use by the company. This recognition is truly gratifying, as it demonstrates the value my work brings to the organization and the impact it can have on the company operations. Overall, this project was a rewarding journey that expanded my knowledge and skills in web development.</p>
<p><a href="/technology/my-experience-as-a-summer-intern-at-elitmus-building-a-telegram-bot/">My Experience as a Summer Intern at eLitmus: Building a Telegram Bot</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on July 19, 2023.</p>/technology/resume-parsing-insights-and-steps-to-create-your-own-parser2023-06-20 13:40:00 +0530T00:00:00-00:002023-06-20T00:00:00+05:30eLitmus.comsite-admin@elitmus.com
<p>Resume parsing is the automated process of extracting relevant information from resumes or CVs.
It analyzes the unstructured text of a resume and extracts specific details like contact information, work experience, education, skills, and achievements.
The extracted data is then converted into a structured format, allowing for easy analysis and integration into recruitment systems.</p>
<h2 id="benefits-of-resume-parsing">Benefits of Resume Parsing</h2>
<ul>
<li>It is a time-saving automation</li>
<li>It increases efficiency in candidate screening</li>
<li>It improves accuracy in data extraction</li>
<li>It standardizes the data extraction and formatting</li>
</ul>
<h2 id="what-youll-learn-from-this-blog">What you’ll learn from this blog:</h2>
<ol>
<li>Resume parsing techniques for different file formats.</li>
<li>Extracting specific details from resumes.</li>
<li>Leveraging NLP techniques for parsing.</li>
<li>Handling multicolumn resumes.</li>
<li>Dockerizing the Application: Simplifying Deployment and Scalability</li>
<li>Hosting it on AWS EC2.</li>
</ol>
<p><strong>Let’s get Started 🎉</strong></p>
<p>We’ll utilize Python and its Flask framework to create a resume parsing server.</p>
<h2 id="application-flow-chart">Application Flow Chart:</h2>
<p><img src="/blog/images/resume-parsing-insights-and-steps-to-create-your-own-parser/file-flow.jpg" alt="Application Flow Chart Image" /></p>
<p>We will be primarily working on 3 categories of file formats:</p>
<ol>
<li>PDF</li>
<li>DOCX</li>
<li>Images (.png, .jpg, etc.)</li>
</ol>
<h3 id="data-that-we-will-be-extracting">Data that we will be extracting</h3>
<ol>
<li>Embedded links in PDF</li>
<li>Personal data: <br />
2.1. Name: First name and last name <br />
2.2. Email <br />
2.3. Phone Number <br />
2.4. Address: City, Country, and Zip code <br />
2.5. Links: Social and Coding Platform links <br /></li>
<li>Education <br />
3.1. Institute name <br />
3.2. Duration: Start date and End date <br />
3.3. Grade/CGPA <br />
3.4. Degree <br /></li>
<li>Experience<br />
4.1. Company name<br />
4.2. Role<br />
4.3. Durations: Start date and End date<br />
4.4. Skills<br /></li>
<li>Certification: <br />
5.1. Description <br />
5.2. Duration <br />
5.3. Skill <br /></li>
<li>Project: <br />
6.1. Project name <br />
6.2. Skills <br />
6.3. Description <br /></li>
<li>Skills</li>
<li>Achievements</li>
<li>Exam scores<br />
9.1. Exam name<br />
9.2 Score<br /></li>
<li>All other sections present in resume</li>
</ol>
<h2 id="dateduration-extraction">Date/Duration Extraction</h2>
<p>To extract dates from text, we will use the <code>datefinder</code> module, and a regexp to extract standalone years.
Then we will combine the two results and sort the dates to get the start and end date of our duration.</p>
<pre><code class="language-python">import re
from datetime import date
import datefinder

def get_date(input_string):
    '''Get date from text'''
    matches = list(datefinder.find_dates(input_string))
    res = []
    for i in matches:
        date_str = str(i).split(' ')
        extracted_date = date_str[0]
        res.append(extracted_date)
    return res

def get_years(txt):
    '''Get years from text'''
    pattern = r'[0-9]{4}'
    lst = re.findall(pattern, txt)
    current_date = date.today()
    current_year = current_date.year
    res = []
    for i in lst:
        year = int(i)
        if 1900 <= year <= (current_year + 10):
            res.append(i + "-01-01")
    return res

def get_duration(input_text):
    '''Get duration from text'''
    dates = get_date(input_text)
    years = get_years(input_text)
    for i in years:
        dates.append(i)
    dates.sort()
    duration = {
        "start_date": "",
        "end_date": ""
    }
    if len(dates) > 1:
        duration["start_date"] = dates[0]
        duration["end_date"] = dates[len(dates) - 1]
    return duration
</code></pre>
<h2 id="extracting-links-from-pdf">Extracting links from PDF:</h2>
<p>To extract links from the PDF, we will use the python module <code>PDFx</code>.</p>
<pre><code class="language-python">import os
import pdfx

def get_urls_from_pdf(file_path):
    '''extract urls from pdf file'''
    url_list = []
    # for invalid file path
    if os.path.exists(file_path) is False:
        return url_list
    pdf = pdfx.PDFx(file_path)
    # get urls
    pdf_url_dict = pdf.get_references_as_dict()
    if "url" not in pdf_url_dict.keys():
        return url_list
    url_list = pdf_url_dict["url"]
    return url_list
</code></pre>
<h2 id="pdf-to-text">PDF to Text</h2>
<pre><code class="language-python">import os
import pdfx

def get_text_from_pdf(file_path):
    '''extract complete text from pdf'''
    # for invalid file path
    if os.path.exists(file_path) is False:
        return ""
    pdf = pdfx.PDFx(file_path)
    pdf_text = pdf.get_text()
    return pdf_text
</code></pre>
<h2 id="extracting-personal-details">Extracting Personal Details:</h2>
<p>We will extract text from the PDF and move ahead with further extractions.</p>
<h3 id="name">Name</h3>
<p>Extracting the name from the text is one of the challenging tasks.</p>
<p>For this, we will be using <code>NLP: Named Entity Recognition</code> to extract name from the text.</p>
<h4 id="nlp-function">NLP function:</h4>
<pre><code class="language-python">import nltk

def get_name_via_nltk(input_text):
    '''extract name from text via nltk functions'''
    names = []
    for sent in nltk.sent_tokenize(input_text):
        for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
            # only named-entity chunks (subtrees) carry a label
            if hasattr(chunk, 'label'):
                name = ' '.join(c[0] for c in chunk.leaves())
                names.append(name)
    return names
</code></pre>
<ul>
<li>The text is tokenized into sentences using nltk.sent_tokenize().</li>
<li>Each sentence is further tokenized into words using nltk.word_tokenize().</li>
<li>The part-of-speech tags are assigned to each word using nltk.pos_tag().</li>
<li>The named entities are identified by applying the named entity recognition (NER) using nltk.ne_chunk().</li>
<li>For each identified named entity chunk, if it has a ‘label’, indicating it is a named entity, the individual words are concatenated to form a name.</li>
<li>The extracted names are appended to the names list.</li>
</ul>
<h3 id="phone-number">Phone Number</h3>
<p>To extract the phone number, we use the <code>phonenumbers</code> module. We first extract the user’s country from the text and then use it to find the relevant phone numbers.</p>
<pre><code class="language-python">import geotext
from phonenumbers import PhoneNumberMatcher

def get_phone(input_text):
    '''extract phone number from text'''
    phone_numbers = []
    countries_dict = geotext.GeoText(input_text).country_mentions
    country_code = "IN"
    for i in countries_dict.items():
        country_code = i[0]
        break
    search_result = PhoneNumberMatcher(input_text, country_code)
    phone_number_list = []
    for i in search_result:
        i = str(i).split(' ')
        match = i[2:]
        phone_number = ''.join(match)
        phone_number_list.append(phone_number)
    for i in phone_number_list:
        if i not in phone_numbers:
            phone_numbers.append(i)
    return phone_numbers
</code></pre>
<h3 id="email">Email</h3>
<p>To extract the Email, we use the following regexp: <code>[^\s]+@[^\s]+[.][^\s]+</code></p>
<pre><code class="language-python">import re

def get_email(input_text):
    '''extract email from text'''
    # raw string so that \s reaches the regex engine unchanged
    email_pattern = r'[^\s]+@[^\s]+[.][^\s]+'
    emails = []
    emails = re.findall(email_pattern, input_text)
    # pick only unique emails
    emails = set(emails)
    emails = list(emails)
    return emails
</code></pre>
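<p>The pattern above also captures punctuation that touches the address, such as a trailing comma. As a minimal sketch, assuming resume emails use only common characters, a stricter pattern could look like the following. The name <code>get_email_strict</code> and the pattern itself are our own assumptions for illustration, and this is not a full RFC 5322 matcher.</p>
<pre><code class="language-python">import re

# Assumed pattern: common local-part characters, a domain, and a TLD of 2+ letters
EMAIL_PATTERN = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'

def get_email_strict(input_text):
    '''extract unique emails, ignoring surrounding punctuation'''
    found = re.findall(EMAIL_PATTERN, input_text)
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(found))
</code></pre>
<p>Unlike the <code>set</code> based version, this keeps the emails in the order in which they appear in the resume.</p>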
<h3 id="address">Address</h3>
<p>To extract the address, we use the <code>geotext</code> module for the city and country, and a regexp for the zip code.</p>
<pre><code class="language-python">import re
import geotext

def get_address(input_arr):
    '''get address information from input array'''
    input_text = " \n ".join(input_arr)
    res = {}
    # getting all countries
    countries_dict = geotext.GeoText(input_text).country_mentions
    res["country"] = []
    for i in countries_dict:
        res["country"].append(i)
    # getting all cities
    res["city"] = geotext.GeoText(input_text).cities
    # zip code (raw string, otherwise "\b" is read as a backspace character)
    pattern = r"\b([1-9]{1}[0-9]{5}|[1-9]{1}[0-9]{2}\s[0-9]{3})\b"
    res["zipcode"] = re.findall(pattern, input_text)
    return res
</code></pre>
<h3 id="links">Links</h3>
<p>As we already have a URL list from the first operation, we will match those links against a list of our own, which can be saved in any database or hard-coded, and categorize them into <code>social</code> or <code>coding</code> sections.</p>
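<p>A minimal sketch of this categorization could look like the following. The domain lists, the extra <code>other</code> bucket, and the function name <code>categorize_links</code> are assumptions made for illustration; the real list would come from wherever we store it.</p>
<pre><code class="language-python">from urllib.parse import urlparse

# Hypothetical domain lists, hard-coded here for illustration only
SOCIAL_DOMAINS = {"linkedin.com", "twitter.com", "facebook.com"}
CODING_DOMAINS = {"github.com", "leetcode.com", "hackerrank.com"}

def categorize_links(url_list):
    '''split urls into social, coding and other buckets by domain'''
    res = {"social": [], "coding": [], "other": []}
    for url in url_list:
        # urlparse only fills netloc when the url carries a scheme
        parsed = urlparse(url if "://" in url else "https://" + url)
        domain = parsed.netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        if domain in SOCIAL_DOMAINS:
            res["social"].append(url)
        elif domain in CODING_DOMAINS:
            res["coding"].append(url)
        else:
            res["other"].append(url)
    return res
</code></pre>
<p>Matching on the parsed domain rather than on a substring of the whole URL avoids false hits when a platform name only appears in the path of an unrelated link.</p>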
<h2 id="other-sections">Other Sections</h2>
<p>There can be many sections in a resume that we cannot always account for.
To extract them, we will create a list of possible section headings and match them against each line of the resume text that we have extracted.</p>
<p>The code is as follows:</p>
<pre><code class="language-python">
from utils import dynamo_db
RESUME_SECTIONS = dynamo_db.get_item_db("RESUME_SECTIONS")

def extract_resume_sections(text):
    '''Extract section based on resume heading keywords'''
    text_split = [i.strip() for i in text.split('\n')]
    entities = {}
    entities["extra"] = []
    key = False
    for phrase in text_split:
        if len(phrase.split(' ')) > 10:
            if key is not False:
                entities[key].append(phrase)
            else:
                entities["extra"].append(phrase)
            continue
        if len(phrase) == 1:
            p_key = phrase
        else:
            p_key = set(phrase.lower().split()) & set(RESUME_SECTIONS)
        try:
            p_key = list(p_key)[0]
        except IndexError:
            pass
        if p_key in RESUME_SECTIONS and (p_key not in entities.keys()):
            entities[p_key] = []
            key = p_key
        elif key and phrase.strip():
            entities[key].append(phrase)
        else:
            if len(phrase.strip()) < 1:
                continue
            entities["extra"].append(phrase)
    return entities
</code></pre>
<h2 id="education">Education</h2>
<p>To extract education, we need to identify the line in our education section that represents the school/institute name, and the line that represents the degree. After that, we can search for the CGPA or percentage using regexps.
For name recognition, we will make use of a list of keywords that can be present in the name.</p>
<p>Below is the code to get the school name; the degree can be extracted in a similar way.</p>
<pre><code class="language-python">import re
from utils import helper, dynamo_db
SCHOOL_KEYWORDS = dynamo_db.get_item_db("SCHOOL_KEYWORDS")

def get_school_name(input_text):
    '''Extract list of school names from text'''
    text_split = [i.strip() for i in input_text.split('\n')]
    school_names = []
    for phrase in text_split:
        p_key = set(phrase.lower().split(' ')) & set(SCHOOL_KEYWORDS)
        if (len(p_key) == 0):
            continue
        school_names.append(phrase)
    return school_names
</code></pre>
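<p>Following the same idea, a sketch for the degree could look like this. The keyword list <code>DEGREE_KEYWORDS</code> and the function name <code>get_degree</code> are hypothetical; in practice the keywords would be stored alongside the school keywords.</p>
<pre><code class="language-python">import re

# Hypothetical keyword list, hard-coded here for illustration only
DEGREE_KEYWORDS = {"btech", "mtech", "bsc", "msc", "bachelor", "master", "phd", "mba"}

def get_degree(input_text):
    '''Extract list of lines that mention a degree keyword'''
    degrees = []
    for phrase in input_text.split('\n'):
        phrase = phrase.strip()
        # drop punctuation and digits so that "B.Tech" normalises to "btech"
        words = set(re.sub(r'[^a-z\s]', '', phrase.lower()).split())
        if words.intersection(DEGREE_KEYWORDS):
            degrees.append(phrase)
    return degrees
</code></pre>
<p>Stripping punctuation before matching is a design choice here, since degrees are often written with dots, as in B.Tech or M.Sc.</p>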
<p>Code to extract CGPA/GPA or Percentage grade</p>
<pre><code class="language-python">def get_percentage(txt):
    '''Extract percentage from text'''
    pattern = r'((\d+\.)?\d+%)'
    lst = re.findall(pattern, txt)
    lst = [i[0] for i in lst]
    return lst

def get_gpa(txt):
    '''Extract cgpa or gpa from text in format x.x/x'''
    pattern = r'((\d+\.)?\d+\/\d+)'
    lst = re.findall(pattern, txt)
    lst = [i[0] for i in lst]
    return lst

def get_grades(input_text):
    '''Extract grades from text'''
    input_text = input_text.lower()
    # gpa
    gpa = get_gpa(input_text)
    if (len(gpa) != 0):
        return gpa
    # percentage
    percentage = get_percentage(input_text)
    if (len(percentage) != 0):
        return percentage
    return []
</code></pre>
<h2 id="skills">Skills</h2>
<p>In order to extract skills from the text, a master list of commonly used skills can be created and stored in a database, such as AWS DynamoDB. Each skill from the list can be matched against the text to identify relevant skills. By doing so, a comprehensive master skill list can be generated, which can be utilized for more specific skill extraction in subsequent sections.</p>
<pre><code class="language-python">
from utils import dynamo_db
skills = dynamo_db.get_item_db("ALL_SKILLS")

def get_skill_tags(input_text):
    '''Extract skill tags from text'''
    user_skills = []
    for skill in skills:
        if skill in input_text.lower():
            user_skills.append(skill.upper())
    return user_skills
</code></pre>
<h2 id="experience">Experience</h2>
<p>To extract company names and roles, a similar strategy can be employed as we used for finding school names and degrees. By applying appropriate techniques, such as named entity recognition or pattern matching, we can identify company names and associated job roles from the text. Additionally, for skill extraction, we can match the text against our previously calculated list of skills to identify and extract the relevant skills mentioned in the text.</p>
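<p>As a rough sketch of the pattern-matching route, the following collects role lines and skills from the lines of an experience section. The keyword lists and the function name <code>get_experience</code> are assumptions for illustration, and company names are left out because they usually need named entity recognition or a curated list.</p>
<pre><code class="language-python">import re

# Hypothetical keyword lists, hard-coded here for illustration only
ROLE_KEYWORDS = ["engineer", "developer", "intern", "analyst", "manager"]
KNOWN_SKILLS = ["python", "flask", "docker", "aws"]

ROLE_PATTERN = re.compile(r'\b(' + '|'.join(ROLE_KEYWORDS) + r')\b', re.IGNORECASE)

def get_experience(section_lines):
    '''Collect role lines and skills from an experience section'''
    res = {"roles": [], "skills": []}
    for line in section_lines:
        line = line.strip()
        if not line:
            continue
        # a line containing a role keyword is treated as a job title line
        if ROLE_PATTERN.search(line):
            res["roles"].append(line)
        # same substring matching as the skill extraction above
        for skill in KNOWN_SKILLS:
            if skill in line.lower() and skill.upper() not in res["skills"]:
                res["skills"].append(skill.upper())
    return res
</code></pre>
<p>The duration of each role can then be filled in by passing the matched line, or its neighbours, to the duration extraction shown earlier.</p>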
<h2 id="achievements-and-certifications">Achievements and Certifications</h2>
<p>We reuse the section text that we extracted previously: for each line of the section, we search for a duration and for skills.</p>
<pre><code class="language-python">
from utils import helper, skill_tags


def get_certifications(input_array):
    '''Function to extract certificate information'''
    res = {
        "description": input_array,
        "details": []
    }
    try:
        # each line of the section is treated as one certification entry
        for cert in input_array:
            elem_dict = {
                "institute_name": str(cert),
                "skills": skill_tags.get_skill_tags(cert),
                "duration": helper.get_duration(cert)
            }
            res["details"].append(elem_dict)
    except Exception as function_exception:
        helper.logger.error(function_exception)
    return res
</code></pre>
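<p>The <code>helper.get_duration</code> function is not shown in this post. As a hedged stand-in, the sketch below finds the first date range of the form "Jan 2021 - Mar 2021" or "2019 to present" with a regular expression. The name <code>get_duration_sketch</code> and the accepted formats are assumptions; the real helper may behave differently.</p>
<pre><code class="language-python">import re

# Optional month name followed by a four-digit year, e.g. "Jan 2021" or "2021"
MONTH = r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*"
DATE = r"(?:\b" + MONTH + r"\s+)?\d{4}"
# Two dates (or a date and "present") joined by a hyphen, an en dash or "to"
RANGE = re.compile(
    DATE + r"\s*(?:-|–|to)\s*(?:" + DATE + r"|present)", re.IGNORECASE)


def get_duration_sketch(text):
    '''Return the first date range found in the text, or an empty string.'''
    match = RANGE.search(text)
    return match.group(0) if match else ""


duration = get_duration_sketch("AWS Cloud Practitioner, Jan 2021 - Mar 2021")
</code></pre>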
<h2 id="projects">Projects</h2>
<p>When it comes to extracting project titles, it can be challenging due to the variations in how individuals choose to title their projects. However, we can make an assumption that project titles are often written in a larger font size compared to the rest of the text. Leveraging this assumption, we can analyze the font sizes of each line in the text and sort them in descending order. By selecting the lines with the largest font sizes from the top, we can identify potential project titles. This approach allows us to further segment the project section and extract additional details such as skills utilized and project durations.</p>
<p>Link: <a href="https://stackoverflow.com/questions/68097779/how-to-find-the-font-size-of-every-paragraph-of-pdf-file-using-python-code">How to find the Font Size of every paragraph of PDF file using python code?</a></p>
<pre><code class="language-python">import fitz
def scrape(keyword, filePath):
    '''Collect (text, font size, font name) for every text span in the PDF.

    keyword is kept from the original signature but is not used.
    '''
    results = []  # list of tuples that store the information as (text, font size, font name)
    pdf = fitz.open(filePath)  # filePath is a string that contains the path to the pdf
    for page in pdf:
        page_dict = page.get_text("dict")  # renamed to avoid shadowing the built-in dict
        for block in page_dict["blocks"]:
            if "lines" in block:  # image blocks have no "lines" key
                for line in block["lines"]:
                    for span in line["spans"]:
                        results.append((span["text"], span["size"], span["font"]))
    pdf.close()
    return results
</code></pre>
<p>Using this we find our project titles:</p>
<pre><code class="language-python">from utils import helper, skill_tags
from difflib import SequenceMatcher


def similar(string_a, string_b):
    '''Find similarity between two string'''
    return SequenceMatcher(None, string_a, string_b).ratio()


def extract_project_titles(input_array, text_font_size):
    '''Return the section lines that are rendered in the largest font size'''
    ls = []
    for line_tuple in text_font_size:
        line = line_tuple[0]
        for s in input_array:
            # fuzzy match, since PDF spans rarely equal the extracted lines exactly
            if similar(line, s) > 0.85:
                ls.append([line_tuple[1], s])
    # largest font size first
    ls.sort(reverse=True)
    title_font_size = ls[0][0] if (len(ls) > 0) else 0
    project_title = []
    for i in ls:
        if i[0] == title_font_size:
            project_title.append(i[1])
    return project_title


def get_projects(input_array, text_font_size):
    '''extract project details from text'''
    res = {
        "description": input_array,
        "details": []
    }
    txt = ' \n '.join(input_array)
    # extract_titles_via_font_size is assumed to be the helper-module
    # counterpart of extract_project_titles above
    project_titles = helper.extract_titles_via_font_size(
        input_array, text_font_size)
    project_sections = helper.extract_sections(txt, project_titles)
    try:
        for i in project_sections.items():
            key = i[0]
            txt = '\n'.join(project_sections[key])
            elem_dict = {
                "project_name": key,
                "skills": skill_tags.get_skill_tags(txt),
                "duration": helper.get_duration(txt)
            }
            res["details"].append(elem_dict)
    except Exception as function_exception:
        helper.logger.error(function_exception)
    return res
</code></pre>
<h2 id="handling-multicolumn-resumes">Handling multicolumn resumes</h2>
<p>Up until now, we have explored techniques that handle single-column resumes successfully.
For two-column or multicolumn resumes, however, direct text extraction is not sufficient: our previous approach scans the text from left to right and top to bottom rather than column by column, so text from different columns gets merged together.</p>
<p>To overcome this issue, let’s delve into how we can solve this problem and effectively handle multicolumn resumes.</p>
<h3 id="drawing-textboxes">Drawing textboxes</h3>
<p><code>Optical Character Recognition (OCR)</code> comes to the rescue by identifying textboxes and providing their coordinates within the document. By utilizing OCR, we can pinpoint the location of these textboxes, which serve as a starting point for further analysis.</p>
<p>To tackle the challenge of multicolumn resumes, a line sweep algorithm is implemented. This algorithm systematically scans along the X-axis and determines how many textboxes intersect each point. By analyzing this distribution, potential column divide lines can be inferred. These lines act as reference markers, indicating the boundaries between columns.</p>
<p>Once the column lines are established, the text can be extracted from the identified textboxes in a column-wise manner. Following the order of the column lines, the text can be retrieved and processed accordingly.</p>
<p>By leveraging OCR, the line sweep algorithm, and the concept of column lines, we can effectively handle multicolumn resumes and extract the necessary information in an organized and structured manner.</p>
<p>Code:</p>
<pre><code class="language-python">import cv2
import fitz
from fitz import Document, Page, Rect
import pytesseract
import functools


def textbox_recognition(file_path):
    '''Extract text_boxes from image'''
    img = cv2.imread(file_path, cv2.IMREAD_GRAYSCALE)
    ret, thresh1 = cv2.threshold(
        img, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)
    # kernel
    kernel_size = 10
    rect_kernel = cv2.getStructuringElement(
        cv2.MORPH_RECT, (kernel_size, kernel_size))
    # Applying dilation on the threshold image
    dilation = cv2.dilate(thresh1, rect_kernel, iterations=1)
    # Finding contours
    contours, hierarchy = cv2.findContours(
        dilation, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    segments = []
    text_boxes = []
    # Looping through the identified contours
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
        # horizontal extent of the box, used later by the line sweep
        segments.append([x, x+w])
        text_boxes.append((x, y, w, h))
    return (segments, text_boxes)


def detect_column_lines(segments):
    '''Detect column lines from segments'''
    mx = max(i[1] for i in segments)
    # difference array: +1 where a box starts, -1 where it ends
    line_sweep_arr = [0 for _ in range(mx+10)]
    for i in segments:
        line_sweep_arr[i[0] + 1] += 1
        line_sweep_arr[i[1]] -= 1
    # prefix sum turns the difference array into a per-x box count
    for i in range(1, mx+10):
        line_sweep_arr[i] += line_sweep_arr[i-1]
    line_mean = sum(line_sweep_arr)/len(line_sweep_arr)
    # x positions crossed by far fewer boxes than average are candidate gaps
    potential_points = []
    for i in range(1, mx+10):
        if line_sweep_arr[i] < int(line_mean/2.5):
            potential_points.append(i)
    # collapse each run of consecutive x values into its last point
    line_points = []
    for i in potential_points:
        if len(line_points) == 0:
            line_points.append(i)
            continue
        prev = line_points[len(line_points) - 1]
        if i == prev + 1:
            line_points[len(line_points) - 1] = i
        else:
            line_points.append(i)
    return line_points


def get_text(img, box_data):
    '''Extract text from given box data'''
    (x, y, w, h) = box_data
    cropped_image = img[y:y+h, x:x+w]
    # run OCR on the cropped region only
    txt = pytesseract.image_to_string(cropped_image)
    return txt


def box_coverage_percentage(x, w, line):
    '''Fraction of the box width that lies to the left of the line'''
    covered_width = line - x
    cover_percentage = covered_width / w
    return cover_percentage


def clean_text(txt):
    '''Clean text'''
    txt = txt.strip()
    txt = txt.replace("•", '')
    return txt


Y_LIMIT = 10


def custom_sort(a, b):
    '''custom sort logic'''
    # boxes within Y_LIMIT of each other vertically count as the same row
    # and are ordered left to right; otherwise order top to bottom
    if a[1] - Y_LIMIT <= b[1] <= a[1] + Y_LIMIT:
        return -1 if (a[0] <= b[0]) else 1
    return -1 if (a[1] <= b[1]) else 1


def get_boxes_for_line(text_boxes, line, ordered_text_box, prev_line):
    '''get boxes with line constraints'''
    temp_boxes = [i for i in text_boxes]
    temp_boxes.sort(key=functools.cmp_to_key(custom_sort))
    res = []
    # check if 90% of box is before line
    for box in temp_boxes:
        if box in ordered_text_box:
            continue
        (x, y, w, h) = box
        if (x >= prev_line - Y_LIMIT and x < line and box_coverage_percentage(x, w, line) >= 0.9):
            res.append(box)
    # within a column, read top to bottom
    res.sort(key=lambda x: x[1])
    return res


def map_size(x, org, new):
    '''map box co-ordinates from image to pdf'''
    return (x*new)/org


def get_text_from_pdf(box, img_shape, pdf_shape, page):
    '''extract text from pdf box'''
    (x, y, w, h) = box
    (height, width) = img_shape
    (W, H) = pdf_shape
    x = map_size(x, width, W)
    w = map_size(w, width, W)
    y = map_size(y, height, H)
    # heights scale with the page height, not the page width
    h = map_size(h, height, H)
    rect = Rect(x, y, x+w, y+h)
    text = page.get_textbox(rect)
    return text


def image_to_text(file_path, pdf_file_path=""):
    '''extract text from image'''
    segments, text_boxes = textbox_recognition(file_path)
    column_lines = detect_column_lines(segments)
    # if single column
    if len(column_lines) < 3:
        return ""
    # align text boxes by column
    # text boxes within columns
    ordered_text_box = []
    for i in range(len(column_lines)):
        prev_line = column_lines[i-1] if ((i-1) >= 0) else 0
        boxes = get_boxes_for_line(
            text_boxes, column_lines[i], ordered_text_box, prev_line)
        for b in boxes:
            ordered_text_box.append(b)
    # text boxes not in any column
    non_selected_boxes = []
    for i in text_boxes:
        if i not in ordered_text_box:
            non_selected_boxes.append(i)
    for i in non_selected_boxes:
        y = i[1]
        # boxes above the first ordered box go first; the emptiness check
        # avoids an IndexError when no box was assigned to a column
        if ordered_text_box and y <= ordered_text_box[0][1]:
            ordered_text_box.insert(0, i)
        else:
            ordered_text_box.append(i)
    img = cv2.imread(file_path, cv2.IMREAD_GRAYSCALE)
    ret, thresh = cv2.threshold(img, 225, 255, 0)
    img_shape = img.shape
    pdf_shape = (0, 0)
    page = None
    if pdf_file_path != "":
        doc = fitz.open(pdf_file_path)
        page = doc[0]
        pdf_shape = (page.rect.width, page.rect.height)
    resume_text = ""
    for i in ordered_text_box:
        if pdf_file_path != "":
            # read the text layer of the PDF when it is available
            txt = clean_text(get_text_from_pdf(i, img_shape, pdf_shape, page))
        else:
            # otherwise fall back to OCR on the image
            txt = clean_text(get_text(thresh, i))
        resume_text += txt + "\n"
    # clean text: drop empty lines
    txt = resume_text.split("\n")
    res = []
    for line in txt:
        if len(line) == 0:
            continue
        res.append(line)
    resume_text = ' \n '.join(res)
    return resume_text
</code></pre>
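<p>To make the line sweep idea easier to follow in isolation, here is a minimal sketch on toy data. It is a simplified variant, not the exact logic above: it treats each segment as a half-open range and reports positions covered by no box at all, whereas the parser uses a threshold relative to the mean. The function names and the sample segments are made up for this example.</p>
<pre><code class="language-python">def coverage_profile(segments, width):
    '''Count how many [start, end) segments cover each x position.'''
    diff = [0] * (width + 1)
    for start, end in segments:
        diff[start] += 1   # a box begins here
        diff[end] -= 1     # and stops covering from here on
    profile = []
    running = 0
    for x in range(width):
        running += diff[x]  # prefix sum gives the live box count at x
        profile.append(running)
    return profile


def empty_runs(profile):
    '''Return (first_x, last_x) for each run of uncovered positions.'''
    runs = []
    start = None
    for x, count in enumerate(profile):
        if count == 0 and start is None:
            start = x
        elif count != 0 and start is not None:
            runs.append((start, x - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs


# Two overlapping boxes on the left, one box on the right
toy_segments = [(1, 4), (2, 5), (8, 11)]
toy_profile = coverage_profile(toy_segments, 12)
toy_gaps = empty_runs(toy_profile)
</code></pre>
<p>In this toy layout the run of uncovered positions from 5 to 7 is the candidate column divide; the uncovered positions at the two edges correspond to page margins.</p>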
<h2 id="dockerizing-the-application">Dockerizing the Application</h2>
<p>To make the application easy to deploy, we will package it as a Docker image.</p>
<p>Dockerfile</p>
<pre><code># syntax=docker/dockerfile:1
FROM python:3.9-buster
WORKDIR /resume-parser-docker
RUN mkdir input_files
RUN pip3 install --upgrade pip
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
# download nltk required
RUN python -m nltk.downloader punkt
RUN python -m nltk.downloader averaged_perceptron_tagger
RUN python -m nltk.downloader maxent_ne_chunker
RUN python -m nltk.downloader words
RUN apt-get update \
&& apt-get -y install tesseract-ocr
RUN apt-get update && apt-get install ffmpeg libsm6 libxext6 -y
COPY . .
EXPOSE 5000/tcp
CMD [ "python3", "-u" , "main.py"]
</code></pre>
<p>Then run the following commands to build the image and run it.</p>
<ul>
<li>Build Image
<pre><code>docker build --tag jhamadhav/resume-parser-docker .
</code></pre>
</li>
<li>Run Image at port 5000
<pre><code>docker run -d -p 5000:5000 jhamadhav/resume-parser-docker
</code></pre>
</li>
<li>Check running containers
<pre><code>docker ps
</code></pre>
</li>
<li>Stop the container once done, using the container ID (or name) shown by <code>docker ps</code>
<pre><code>docker stop CONTAINER_ID
</code></pre>
</li>
</ul>
<h2 id="hosting-on-aws">Hosting on AWS</h2>
<p>Now that we have a Docker image of our application, we can publish it to Docker Hub:</p>
<pre><code>docker push jhamadhav/resume-parser-docker
</code></pre>
<p>Then login to your EC2 instance and pull the image:</p>
<pre><code>docker pull jhamadhav/resume-parser-docker
</code></pre>
<p>Run the image:</p>
<pre><code>docker run -d -p 5000:5000 jhamadhav/resume-parser-docker
</code></pre>
<blockquote>
<p>🎉🎉🎉 We have a fully functional Resume parser ready.</p>
</blockquote>
<h2 id="future-work">Future Work</h2>
<p>We can use <code>Large Language Models (LLMs)</code>, fine-tuned on suitable datasets, to make extraction of the following fields more accurate:</p>
<ol>
<li>School/Institute name</li>
<li>Degree</li>
<li>Company name</li>
<li>Role in a job</li>
</ol>
<h2 id="conclusion">Conclusion</h2>
<ul>
<li>In conclusion, resume parsing using NLP techniques offers a streamlined approach to extract crucial information from resumes, enhancing the efficiency and accuracy of candidate screening.</li>
<li>By leveraging OCR, named entity recognition, and line sweep algorithms, we can handle various resume formats, including multicolumn layouts.</li>
<li>The power of NLP automates the parsing process, empowering recruiters to efficiently process resumes and make informed hiring decisions.</li>
<li>Embracing resume parsing techniques ensures fair and objective evaluation of applicants, leading to successful recruitment outcomes.</li>
<li>With this skillset, you can revolutionize resume processing and contribute to more efficient hiring practices.</li>
</ul>
<p>If you have any questions, doubts, or just want to say hi, feel free to reach out to me at <code>contact@jhamadhav.com</code> ! I’m always ready to chat about this cool project and help you out. Don’t be shy, drop me a line and let’s geek out together!</p>
<p><a href="/technology/resume-parsing-insights-and-steps-to-create-your-own-parser/">Resume Parsing: Insights and Steps to Create Your Own Parser</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on June 20, 2023.</p>/technology/debugging-and-fixing-mysql-deadlock-issue2023-06-12 18:31:00 +0530T00:00:00-00:002023-06-12T00:00:00+05:30eLitmus.comsite-admin@elitmus.com
<p>Recently, during one of our tests, we encountered a deadlock issue that was reported by Sentry. The deadlock occurred while attempting to insert scores into a table after completing a candidate’s test. We were initially unsure about the cause of this deadlock. Upon investigation, we discovered that it was due to the interplay of various locks in our MySQL database. In this blog post, we will deep dive into the nature of these locks, understand their impact on transactions, and present the solutions we implemented to mitigate deadlock occurrences.</p>
<h4 id="understanding-deadlocks"><strong>Understanding deadlocks</strong></h4>
<p>To understand the deadlock situation, let’s familiarize ourselves with the different types of locks involved, as defined by the official MySQL documentation:</p>
<p><strong>GAP Lock:</strong></p>
<p>A gap lock is a lock on a gap between index records, or a lock on the gap before the first or after the last index record. A gap might span a single index value, multiple index values, or even be empty.</p>
<p><em>If id is not indexed or has a nonunique index, the statement does lock the preceding gap.</em></p>
<p><strong>Next Key Lock:</strong></p>
<p>A next-key lock is a combination of a record lock on the index record and a gap lock on the gap before the index record. In simple words, if one session has a shared or exclusive lock on record R in an index, another session cannot insert a new index record in the gap immediately before R in the index order.</p>
<p><strong>Insert Intention Lock:</strong></p>
<p>An insert intention lock is a type of gap lock set by INSERT operations prior to row insertion. This lock signals the intent to insert in such a way that multiple transactions inserting into the same index gap need not wait for each other if they are not inserting at the same position within the gap.</p>
<h4 id="problem-scenario"><strong>Problem Scenario</strong></h4>
<p>In our case, we have two tables, table1 and table2, where table1 has_many table2 records. All operations are performed on table2, which has an index on table1_id, its foreign key to table1.</p>
<p><strong>Transaction A</strong></p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>BEGIN<span class="p">;</span>
DELETE FROM table2 WHERE table2.table1_id<span class="o">=</span><span class="m">127</span><span class="p">;</span>
Query OK, <span class="m">1</span> row affected <span class="o">(</span><span class="m">0</span>.00 sec<span class="o">)</span></code></pre></figure>
<p>Resulting data locks</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>mysql> SELECT INDEX_NAME, LOCK_TYPE,LOCK_DATA,LOCK_MODE,LOCK_STATUS, EVENT_ID FROM performance_schema.data_locks<span class="p">;</span>
+-----------------------------------------+-----------+-----------+---------------+-------------+----------+
<span class="p">|</span> INDEX_NAME <span class="p">|</span> LOCK_TYPE <span class="p">|</span> LOCK_DATA <span class="p">|</span> LOCK_MODE <span class="p">|</span> LOCK_STATUS <span class="p">|</span> EVENT_ID <span class="p">|</span>
+-----------------------------------------+-----------+-----------+---------------+-------------+----------+
<span class="p">|</span> NULL <span class="p">|</span> TABLE <span class="p">|</span> NULL <span class="p">|</span> IX <span class="p">|</span> GRANTED <span class="p">|</span> <span class="m">408</span> <span class="p">|</span>
<span class="p">|</span> index_table2_on_table1_id <span class="p">|</span> RECORD <span class="p">|</span> <span class="m">127</span>, <span class="m">92</span> <span class="p">|</span> X <span class="p">|</span> GRANTED <span class="p">|</span> <span class="m">408</span> <span class="p">|</span>
<span class="p">|</span> PRIMARY <span class="p">|</span> RECORD <span class="p">|</span> <span class="m">92</span> <span class="p">|</span> X,REC_NOT_GAP <span class="p">|</span> GRANTED <span class="p">|</span> <span class="m">408</span> <span class="p">|</span>
<span class="p">|</span> index_table2_on_table1_id <span class="p">|</span> RECORD <span class="p">|</span> <span class="m">128</span>, <span class="m">93</span> <span class="p">|</span> X,GAP <span class="p">|</span> GRANTED <span class="p">|</span> <span class="m">408</span> <span class="p">|</span>
+-----------------------------------------+-----------+-----------+---------------+-------------+----------+
<span class="m">4</span> rows <span class="k">in</span> <span class="nb">set</span> <span class="o">(</span><span class="m">0</span>.00 sec<span class="o">)</span></code></pre></figure>
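<p>As the output shows, the DELETE acquires a next-key lock (X) on the index record (127, 92), a record lock (X,REC_NOT_GAP) on primary key 92, and a gap lock (X,GAP) on the gap before the index record (128, 93).</p>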
<p><strong>Transaction B</strong></p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>BEGIN<span class="p">;</span>
INSERT INTO table2<span class="o">(</span>table1_id<span class="o">)</span> VALUES<span class="o">(</span><span class="m">126</span><span class="o">)</span><span class="p">;</span>
ERROR <span class="m">1205</span> <span class="o">(</span>HY000<span class="o">)</span>: Lock <span class="nb">wait</span> timeout exceeded<span class="p">;</span> try restarting transaction</code></pre></figure>
<p>Resulting data locks</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>mysql> SELECT INDEX_NAME,LOCK_TYPE,LOCK_DATA,LOCK_MODE,LOCK_STATUS, EVENT_ID FROM performance_schema.data_locks<span class="p">;</span>
+-----------------------------------------+-----------+-----------+------------------------+-------------+----------+
<span class="p">|</span> INDEX_NAME <span class="p">|</span> LOCK_TYPE <span class="p">|</span> LOCK_DATA <span class="p">|</span> LOCK_MODE <span class="p">|</span> LOCK_STATUS <span class="p">|</span> EVENT_ID <span class="p">|</span>
+-----------------------------------------+-----------+-----------+------------------------+-------------+----------+
<span class="p">|</span> NULL <span class="p">|</span> TABLE <span class="p">|</span> NULL <span class="p">|</span> IX <span class="p">|</span> GRANTED <span class="p">|</span> <span class="m">351</span> <span class="p">|</span>
<span class="p">|</span> index_table2_on_table1_id <span class="p">|</span> RECORD <span class="p">|</span> <span class="m">127</span>, <span class="m">92</span> <span class="p">|</span> X,GAP,INSERT_INTENTION <span class="p">|</span> WAITING <span class="p">|</span> <span class="m">351</span> <span class="p">|</span>
<span class="p">|</span> NULL <span class="p">|</span> TABLE <span class="p">|</span> NULL <span class="p">|</span> IX <span class="p">|</span> GRANTED <span class="p">|</span> <span class="m">408</span> <span class="p">|</span>
<span class="p">|</span> index_table2_on_table1_id <span class="p">|</span> RECORD <span class="p">|</span> <span class="m">127</span>, <span class="m">92</span> <span class="p">|</span> X <span class="p">|</span> GRANTED <span class="p">|</span> <span class="m">408</span> <span class="p">|</span>
<span class="p">|</span> PRIMARY <span class="p">|</span> RECORD <span class="p">|</span> <span class="m">92</span> <span class="p">|</span> X,REC_NOT_GAP <span class="p">|</span> GRANTED <span class="p">|</span> <span class="m">408</span> <span class="p">|</span>
<span class="p">|</span> index_table2_on_table1_id <span class="p">|</span> RECORD <span class="p">|</span> <span class="m">128</span>, <span class="m">93</span> <span class="p">|</span> X,GAP <span class="p">|</span> GRANTED <span class="p">|</span> <span class="m">408</span> <span class="p">|</span>
+-----------------------------------------+-----------+-----------+------------------------+-------------+----------+
<span class="m">6</span> rows <span class="k">in</span> <span class="nb">set</span> <span class="o">(</span><span class="m">0</span>.01 sec<span class="o">)</span></code></pre></figure>
<p>The value 126 falls in the gap just before the index record (127, 92), which is covered by Transaction A’s next-key lock, so Transaction B’s insert intention lock has to wait. It eventually times out, resulting in a lock wait timeout error.</p>
<p>To create a deadlock, one must perform a delete query in Transaction B. Then, when attempting to insert a record in Transaction A, a deadlock error is thrown, with Transaction B becoming the victim. <strong>This deadlock situation arises due to the conflicts in the next-key lock, preventing Transaction B from inserting the record.</strong></p>
<h4 id="in-a-nutshell"><strong>In a nutshell</strong></h4>
<p>In a nutshell, the following sequence of queries creates a deadlock:</p>
<ul>
<li>Transaction A -> BEGIN;</li>
<li>Transaction A -> DELETE records on table2 with table1_id=x.</li>
<li>Transaction B -> BEGIN;</li>
<li>Transaction B -> DELETE record on table2 with table1_id=y;</li>
<li>Transaction B -> INSERT a record on table2 and table1_id is x-1.</li>
<li>Transaction A -> INSERT a record on table2 and table1_id is y-1.</li>
<li>A deadlock occurs, with Transaction A being the victim.</li>
</ul>
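<p>The sequence above can be pictured as a wait-for graph: after Transaction B’s insert, B waits for A, and after Transaction A’s insert, A waits for B, which closes a cycle. The sketch below is a toy model of that idea only; the function name and the dictionary representation are assumptions for illustration and do not reflect InnoDB’s internal implementation.</p>
<pre><code class="language-python">def find_deadlock(waits_for):
    '''Return the transactions that form a wait cycle, or [] if there is none.

    waits_for maps a transaction to the single transaction it is blocked on.
    '''
    for start in waits_for:
        path = [start]
        current = waits_for.get(start)
        # follow the chain of waits until it ends or revisits a transaction
        while current is not None and current not in path:
            path.append(current)
            current = waits_for.get(current)
        if current is not None:
            # the chain came back to a transaction already on the path
            return path[path.index(current):]
    return []


# After step 5 only B is waiting; after step 6 both wait on each other
after_step_5 = find_deadlock({"B": "A"})
after_step_6 = find_deadlock({"B": "A", "A": "B"})
</code></pre>
<p>Once a cycle exists, neither transaction can proceed on its own, so the database has to roll one of them back.</p>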
<h4 id="practical-example-of-gap-lock--next-key-lock"><strong>Practical example of GAP lock & Next Key Lock.</strong></h4>
<p>A gap lock covers a range of index values, and it is also acquired on a range when we try to delete a record that does not exist.</p>
<p><strong>table1</strong></p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>+----+
<span class="p">|</span> id <span class="p">|</span>
+----+
<span class="p">|</span> <span class="m">73</span> <span class="p">|</span>
<span class="p">|</span> <span class="m">74</span> <span class="p">|</span>
<span class="p">|</span> <span class="m">81</span> <span class="p">|</span>
<span class="p">|</span> <span class="m">82</span> <span class="p">|</span>
+----+</code></pre></figure>
<p><strong>table2</strong></p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>+-----+-----------+
<span class="p">|</span> id <span class="p">|</span> table1_id <span class="p">|</span>
+-----+-----------+
<span class="p">|</span> <span class="m">1</span> <span class="p">|</span> <span class="m">73</span> <span class="p">|</span>
<span class="p">|</span> <span class="m">2</span> <span class="p">|</span> <span class="m">82</span> <span class="p">|</span>
+-----+-----------+</code></pre></figure>
<p><strong>Transaction A</strong></p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>BEGIN<span class="p">;</span>
DELETE from table2 where <span class="nv">table1_id</span><span class="o">=</span><span class="m">75</span><span class="p">;</span>
Query OK, <span class="m">0</span> rows affected <span class="o">(</span><span class="m">0</span>.00 sec<span class="o">)</span></code></pre></figure>
<p>This transaction will acquire a gap lock on the range 74-80.
This means that if another session tries to insert rows into table2 with a table1_id in the range 74-80, it will wait until the delete transaction commits or rolls back.</p>
<h4 id="other-issues"><strong>Other issues</strong></h4>
<p>In addition to addressing the deadlock issues caused by gap locks, we also encountered problems related to AASM records. We were using the AASM gem, a library that manages state transitions. In our case, this library was responsible for changing the state of the test to “completed” and executing several callback functions. These operations were performed as part of a single transaction, which sometimes resulted in prolonged transaction durations and increased the likelihood of deadlocks.</p>
<p><strong>Model dummy code</strong></p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>aasm <span class="k">do</span>
state :active, initial: <span class="nb">true</span>
state :complete
event :complete, after: <span class="o">[</span>:method1, :method2, :method3<span class="o">]</span> <span class="k">do</span>
transitions from: :active, to: :complete
end
end</code></pre></figure>
<p>When the test is marked as complete and the state changes, all the MySQL-related queries are executed as part of a single transaction.</p>
<p>Because all of these methods ran within a single transaction, the transaction sometimes took a considerable amount of time to complete. These prolonged transactions increased the risk of deadlocks and also caused lock wait timeouts.</p>
<h4 id="fix"><strong>FIX</strong></h4>
<ol>
<li>Separate transaction for inserts: We moved the insertion of records out of the AASM state change and into its own transaction.</li>
<li>Optimized transaction size: We optimized the other badly written queries in the transaction.</li>
<li>Reduced transaction duration: Only a limited number of queries remain part of the state change transaction, which keeps it short.</li>
<li>Fewer gap locks: We avoid issuing delete queries when no records with the corresponding ID are present in the table.</li>
</ol>
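<p>The last fix can be sketched as a "check before delete" helper. The example below uses Python with an in-memory SQLite database purely so that it is runnable; SQLite has no gap locks, and our application is a Rails app on MySQL, so treat the function name and setup as illustrative assumptions. Note that the check and the delete are two separate statements, so a row inserted in between would be missed until the next run.</p>
<pre><code class="language-python">import sqlite3


def delete_if_present(conn, table1_id):
    '''Delete rows for table1_id only if at least one exists; return rows deleted.'''
    row = conn.execute(
        "SELECT 1 FROM table2 WHERE table1_id = ? LIMIT 1", (table1_id,)
    ).fetchone()
    if row is None:
        return 0  # nothing to delete, so no DELETE statement is issued
    cursor = conn.execute(
        "DELETE FROM table2 WHERE table1_id = ?", (table1_id,))
    return cursor.rowcount


# Same sample data as the table2 example above
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, table1_id INTEGER)")
conn.executemany("INSERT INTO table2 (table1_id) VALUES (?)", [(73,), (82,)])

missing_deleted = delete_if_present(conn, 75)   # no such rows, DELETE skipped
existing_deleted = delete_if_present(conn, 73)  # one row removed
</code></pre>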
<h4 id="references"><strong>References</strong></h4>
<ol>
<li><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html#innodb-gap-locks" target="_blank" style="color: blue;">Innodb Gap Lock</a></li>
<li><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html#innodb-next-key-locks" target="_blank" style="color: blue;">Innodb Next Key Lock</a></li>
<li><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html#innodb-insert-intention-locks" target="_blank" style="color: blue;">Innodb Insert Intention Lock</a></li>
<li><a href="https://medium.com/@tanishiking/avoid-deadlock-caused-by-a-conflict-of-transactions-that-accidentally-acquire-gap-lock-in-innodb-a114e975fd72" target="_blank" style="color: blue;">Gap lock with example medium article</a></li>
<li><a href="https://www.percona.com/blog/innodbs-gap-lock" target="_blank" style="color: blue;">Gap lock article by percona</a></li>
</ol>
<p><a href="/technology/debugging-and-fixing-mysql-deadlock-issue/">Debugging & Fixing mysql deadlock issue</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on June 12, 2023.</p>/technology/website-monitor-using-google-app-script2022-12-30 02:04:27 +0530T00:00:00-00:002022-12-30T00:00:00+05:30eLitmus.comsite-admin@elitmus.com
<p>Recently, I was looking for a solution to notify me when a website is down and when it is back up. I found a few solutions, but they all had a learning curve. So I thought of an alternative solution using Google App Script, which I had recently learned about.</p>
<p><strong>Requirements</strong></p>
<ul>
<li>Can run every 5 minutes.</li>
<li>Can send emails when the website is down.</li>
<li>Trustworthy.</li>
</ul>
<p>I wasn’t sure if the first requirement was possible with Google App Script, but the other two were. After reading the documentation, I found that it was possible to create a time-based trigger for a script.</p>
<p><strong>Steps to follow:</strong></p>
<ul>
<li>Create a new Google App Script project.</li>
<li>Create a function to track the website. Here is an example:</li>
</ul>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span><span class="k">function</span> myFunction<span class="o">()</span> <span class="o">{</span>
const <span class="nv">initialUrls</span> <span class="o">=</span> <span class="o">[</span>
<span class="o">{</span> uri: <span class="s1">'https://mock.codes/200'</span>, status: <span class="s1">''</span><span class="o">}</span>,
<span class="o">{</span> uri: <span class="s1">'https://mock.codes/500'</span>, status: <span class="s1">''</span><span class="o">}</span>
<span class="o">]</span><span class="p">;</span>
const <span class="nv">properties</span> <span class="o">=</span> PropertiesService.getScriptProperties<span class="o">()</span><span class="p">;</span>
<span class="nb">let</span> <span class="nv">urls</span> <span class="o">=</span> JSON.parse<span class="o">(</span>properties.getProperty<span class="o">(</span><span class="s1">'URL_LIST'</span><span class="o">))</span> <span class="o">||</span> initialUrls<span class="p">;</span>
const <span class="nv">errorResponseCodes</span> <span class="o">=</span> <span class="o">[</span><span class="m">500</span>, <span class="m">502</span>, <span class="m">503</span>, <span class="m">504</span><span class="o">]</span><span class="p">;</span>
const <span class="nv">alertEmail</span> <span class="o">=</span> <span class="s1">'alertmail@gmail.com'</span><span class="p">;</span>
const <span class="nv">options</span> <span class="o">=</span> <span class="o">{</span> muteHttpExceptions: <span class="nb">true</span> <span class="o">}</span><span class="p">;</span>
urls.forEach<span class="o">((</span>url<span class="o">)</span> <span class="o">=</span>> <span class="o">{</span>
<span class="nb">let</span> <span class="nv">responseCode</span> <span class="o">=</span> UrlFetchApp.fetch<span class="o">(</span>url.uri, options<span class="o">)</span>.getResponseCode<span class="o">()</span><span class="p">;</span>
const <span class="nv">isErrorResponse</span> <span class="o">=</span> errorResponseCodes.includes<span class="o">(</span>responseCode<span class="o">)</span><span class="p">;</span>
const <span class="nv">wasPreviouslyDown</span> <span class="o">=</span> url.status <span class="o">===</span> <span class="s1">'down'</span><span class="p">;</span>
<span class="k">if</span> <span class="o">(</span>isErrorResponse <span class="o">&&</span> !wasPreviouslyDown<span class="o">)</span> <span class="o">{</span>
// Site is now down <span class="k">for</span> the first <span class="nb">time</span>
const <span class="nv">subject</span> <span class="o">=</span> <span class="sb">`</span>Alert: Your site <span class="si">${</span><span class="nv">url</span><span class="p">.uri</span><span class="si">}</span> is currently down<span class="sb">`</span><span class="p">;</span>
const <span class="nv">body</span> <span class="o">=</span> <span class="sb">`</span><span class="si">${</span><span class="nv">url</span><span class="p">.uri</span><span class="si">}</span> has encountered an error with status code <span class="si">${</span><span class="nv">responseCode</span><span class="si">}</span><span class="sb">`</span><span class="p">;</span>
MailApp.sendEmail<span class="o">(</span>alertEmail, subject, body<span class="o">)</span><span class="p">;</span>
url.status <span class="o">=</span> <span class="s1">'down'</span><span class="p">;</span>
<span class="o">}</span> <span class="k">else</span> <span class="k">if</span> <span class="o">(</span>!isErrorResponse <span class="o">&&</span> wasPreviouslyDown<span class="o">)</span> <span class="o">{</span>
// Site was previously down, but is now back up
const <span class="nv">subject</span> <span class="o">=</span> <span class="sb">`</span>Your site <span class="si">${</span><span class="nv">url</span><span class="p">.uri</span><span class="si">}</span> is now back up<span class="sb">`</span><span class="p">;</span>
const <span class="nv">body</span> <span class="o">=</span> <span class="sb">`</span><span class="si">${</span><span class="nv">url</span><span class="p">.uri</span><span class="si">}</span> has recovered and is now back up<span class="sb">`</span><span class="p">;</span>
MailApp.sendEmail<span class="o">(</span>alertEmail, subject, body<span class="o">)</span><span class="p">;</span>
url.status <span class="o">=</span> <span class="s1">''</span><span class="p">;</span>
<span class="o">}</span>
<span class="o">})</span><span class="p">;</span>
properties.setProperty<span class="o">(</span><span class="s1">'URL_LIST'</span>, JSON.stringify<span class="o">(</span>urls<span class="o">))</span><span class="p">;</span>
<span class="o">}</span></code></pre></figure>
<ul>
<li>Go to the “Triggers” menu in the left sidebar of the Google Apps Script project.</li>
<li>Click the “Add Trigger” button and select the function to run.</li>
<li>Choose the options to run the trigger every 5 minutes and click “Save”.</li>
</ul>
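<p>The same trigger can also be created from code. The following is a minimal sketch, assuming the monitor function is named <code>checkWebsites</code> (a placeholder; use the name of your own function). The <code>scriptApp</code> parameter is only there so the helper can be exercised outside Apps Script; inside a project it defaults to the built-in <code>ScriptApp</code> service.</p>

```javascript
// Sketch: create the 5-minute trigger from code instead of the Triggers menu.
// handlerName is the name of the function the trigger should run
// ('checkWebsites' in the usage note is a placeholder, not the original name).
function installMonitorTrigger(handlerName, scriptApp = ScriptApp) {
  return scriptApp
    .newTrigger(handlerName) // function the trigger will call
    .timeBased()             // clock-driven trigger
    .everyMinutes(5)         // Apps Script accepts 1, 5, 10, 15 or 30 here
    .create();
}
```

<p>Run <code>installMonitorTrigger('checkWebsites')</code> once from the editor; running it again creates a second trigger.</p>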
<p><strong>Explanation</strong></p>
<p>The code above uses the UrlFetchApp service to make an HTTP request to each website and check its response code. It stores the last known status of each site in the <code>URL_LIST</code> property, so that when a site that was down comes back up, it can send a recovery email.</p>
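<p>The alerting decision itself can be isolated into a small pure function, which makes it easy to check outside Apps Script. This is only a sketch: <code>nextAction</code> and <code>ERROR_CODES</code> are hypothetical names, and the list of error codes is an assumption (the script defines its own <code>errorResponseCodes</code>).</p>

```javascript
// Hypothetical helper, not part of the original script: given the previous
// status and the latest response code, decide which alert (if any) to send.
const ERROR_CODES = [500, 502, 503, 504]; // assumed list, for illustration only

function nextAction(previousStatus, responseCode, errorCodes = ERROR_CODES) {
  const isError = errorCodes.includes(responseCode);
  const wasDown = previousStatus === 'down';
  if (isError && !wasDown) {
    return { notify: 'down', status: 'down' }; // first failure: send the alert
  }
  if (!isError && wasDown) {
    return { notify: 'up', status: '' };       // recovered: send the recovery mail
  }
  return { notify: null, status: previousStatus }; // no change: stay quiet
}
```

<p>Because an email is sent only when the status changes, a site that stays down for an hour produces one alert rather than one every 5 minutes.</p>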
<blockquote>
<p>You can also check the logs for each trigger execution in the “Execution” menu on the left side of the project.</p>
</blockquote>
<p><img src="/blog/images/website-monitor/email.jpg" alt="Email Sample" /></p>
<p><strong>Conclusion</strong></p>
<p>In conclusion, Google Apps Script is a useful tool for creating a customized website tracker that can notify the user when a website is down. The process of setting up the tracker is straightforward and the logs can be easily accessed to track the execution of the function. This basic functionality can be extended to record the status in a CSV file, and that data can then be used to build graphs and charts.</p>
<p><strong>Additional investigations</strong></p>
<p><a href="https://github.com/upptime/upptime" target="_blank" style="color: blue;">Upptime</a> is a good open-source alternative for monitoring a website. It uses GitHub Actions to check that the website is up and creates an issue if the website is down. It also logs information about the website’s response time.</p>
<p><a href="/technology/website-monitor-using-google-app-script/">Website Monitor Using Google App Script</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on December 30, 2022.</p>
<p>/technology/the-revamp-of-a-video-proctoring-solution-a-behind-the-scenes-look | 2022-12-27 12:33:45 +0530 | eLitmus.com</p>
<p><strong><em>The story of how we took a good platform and made it even better</em></strong></p>
<p><img src="/blog/images/revamp-of-proctoring-solution/proctoring-dashboard.png" alt="Proctoring Dashboard" /></p>
<p><strong>Background</strong></p>
<p>Over the past few months, the number of test takers and clients at eLitmus increased significantly. Conducting all of these tests remotely poses a significant challenge in terms of preventing cheating. To address this issue, eLitmus had developed an in-house solution using the open-source <a href="https://github.com/Kurento/kurento-media-server">Kurento media server</a>. While this solution was effective in terms of recording videos, it was not horizontally scalable.</p>
<p>In search of a more effective solution, eLitmus turned to <a href="https://aws.amazon.com/kinesis/">Amazon Kinesis</a> and worked with the AWS team to conduct a proof-of-concept. While this approach allowed for live proctoring, it was not possible to record the exams.</p>
<p><strong>How did it begin?</strong></p>
<p>As I was learning about <a href="https://webrtc.org/">WebRTC</a> and Amazon Kinesis during this time, I had the opportunity to attend a session by a company called <a href="https://www.100ms.live/">100ms</a>. This company is focused on solving problems related to live conferencing, and I was eager to learn more about their approach.</p>
<p>After connecting with the co-founder of <a href="https://www.100ms.live/">100ms</a>, I received a message from their salesperson to schedule a demo call. During the call, we determined that 100ms could be a potential solution for eLitmus’ scalability problem. However, we needed to weigh the cost of the engineering time and effort needed to maintain the in-house solution against the opportunity cost of using that time to build a new product, along with overall server and bandwidth costs.</p>
<p>Based on this analysis, we decided to proceed with a proof-of-concept for live remote proctoring. I spent the next week working on the proof-of-concept and was able to complete it successfully. From there, we saw potential synergies between 100ms and eLitmus and decided to make the product (<a href="https://github.com/elitmus/knights-watch">Knights Watch</a>) an open-source platform.</p>
<p><strong>Designing & Developing</strong></p>
<p>I created a document outlining the requirements for the video proctoring solution, including features such as a proctoring dashboard, candidate test screen, cheating analysis and verification dashboard, admin dashboard, and auto proctoring. For the first version (v0.1), we planned to roll out the proctoring dashboard with multiple streams visible to the proctor, storage of the video stream on S3, retrieval of the video stream in the cheating analysis and verification dashboard, and admin configuration.</p>
<p>After outlining the requirements for the video proctoring solution, I designed the <a href="https://docs.google.com/presentation/d/1_CebvXEStUtx8m4Hw9DLQPK6AD8gxBKU/edit?usp=sharing&ouid=100590295233713204603&rtpof=true&sd=true">architecture</a> for the solution, diagrammatically representing how all of the components would be connected. The main components of the app were the 100ms server API, the eLitmus server, and the candidate or proctor’s browser.</p>
<p>Next, I created a <a href="https://github.com/elitmus/knights-watch/milestone/1">milestone</a> on GitHub and listed out the issues that needed to be addressed, including the integration of the proctor dashboard, candidate test screen, algorithm for assigning candidates and proctors to rooms, and storage of videos in the eLitmus-prescribed directory structure on S3.</p>
<p><img src="/blog/images/revamp-of-proctoring-solution/milestone.png" alt="Milestone" /></p>
<p>I began working on these issues and was able to roll out v0.1 of the proctoring solution within a few weeks. During this time, our team encountered various challenges and suggested various features to 100ms.</p>
<p><strong>Challenges Faced</strong></p>
<p>As we worked on storing videos on AWS S3 in our prescribed directory structure, we encountered a challenge with the 100ms API. The webhook provided by 100ms was only for the composite recording of the room, not for individual recordings. However, we needed webhooks to notify us of the success of each individual recording. In addition, 100ms supported only a single webhook per account, but we needed to support multiple environments with multiple applications within a single account. We requested this feature from 100ms.</p>
<p>While working on an algorithm to assign candidates and proctors to rooms, I faced the challenge of storing authentication tokens in the user’s browser and in Redis storage in production. I wrote an algorithm to handle the expiration of tokens from both ends and to handle multiple events.</p>
<p>As we configured 100ms for various environments including staging, production, and edge, we encountered several issues and suggested various features to 100ms. These included the ability to delete apps and templates from the 100ms dashboard from the front-end, team management options in the dashboard, and handling of access keys and secrets for multiple environments.</p>
<p><strong>Testing live remote proctoring solution</strong></p>
<p>After completing the first version (v0.1) of the video proctoring solution, we were ready to test it in production. eLitmus was conducting an internal hiring event at the time, and we used the live video proctoring feature for this event with around 400 candidates. The event went smoothly, with minor issues. The proctor was able to hear the voices of the candidates and all of the videos were recorded throughout the session.</p>
<p>This success gave us confidence in the solution, and we made some minor tweaks. However, our main concern from the start had been scalability, and we wanted to test the solution at a larger scale. We had an in-person test at IITK with over 600 candidates, and decided to conduct the event with live proctoring. The event went smoothly, but the next day we conducted data analysis and discovered that 14 out of 600+ videos had some data loss or were not recorded.</p>
<p>We had a meeting with 100ms to discuss this issue, and after working with their engineering team, we determined that the issue was caused by network connectivity problems. We fixed the issue and the proctoring solution became more stable, with 97% of the videos being recorded.</p>
<p><strong>Open-sourcing video proctoring solution</strong></p>
<p>After this event, we had discussions with 100ms about pricing and suggested various features, including pricing on the 100ms dashboard itself and the option to opt-in or opt-out of composite recording and browser-based recording.</p>
<p>After making the video proctoring solution an open-source project, I focused on documenting the project so that it could be used by others in the community and more developers could contribute to it. I wrote several documents, including a readme file, information on the architecture and prerequisites, installation guidelines, development guidelines, deployment guidelines, a code of conduct, and guidelines for contributing and welcoming new contributors.</p>
<p><strong>Conclusion</strong></p>
<p>In conclusion, the development of the video proctoring solution at eLitmus was a challenging but rewarding process. By identifying the need to solve our scalability problem, we were able to explore various solutions and ultimately choose 100ms as a partner to help us build a scalable and effective video proctoring platform. Through the development process, we encountered various challenges and were able to work closely with the 100ms team to find solutions and improve the stability of the platform. We are proud to have made the video proctoring solution an open-source project and to have contributed to the community by documenting the project and welcoming new contributors. We hope that others will find this project useful and will be able to build upon it to create even better solutions in the future.</p>
<p><a href="/technology/the-revamp-of-a-video-proctoring-solution-a-behind-the-scenes-look/">The revamp of a Video Proctoring Solution: A Behind-the-Scenes Look</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on December 27, 2022.</p>
<p>/technology/fixing-capybara-flaky-tests | 2022-12-20 00:21:23 +0530 | eLitmus.com</p>
<p>When writing system tests for a user interface, it is common to encounter test cases that fail randomly. One common failure occurs when the JavaScript on a page takes time to load, so the page is not yet ready when the test interacts with it.</p>
<p>For example, imagine a test case that clicks a button on a page and then checks for the presence of certain content after the click.</p>
<p><strong>Demo Code</strong></p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>visit submit_page
click_on <span class="s1">'Submit'</span>
assert page.has_content? <span class="s1">'Some content after clicking on submit'</span></code></pre></figure>
<p>In most cases, this test will run without any issues. However, occasionally the test may fail on the third line with the error “Expected false to be truthy”. This error can occur when the page is visited and the JavaScript on the page takes a few seconds to load. During this time, the submit button may be clicked, but because there is no JavaScript associated with the button yet, the button click does not do anything. As a result, the test is still on the submit page when it tries to assert that the expected content is present, causing the test to fail.</p>
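<p>The race can be reproduced without a browser. The sketch below is plain JavaScript, not Capybara code: <code>button</code> and <code>pageScriptLoads</code> are hypothetical stand-ins for the Submit button and for the page script that attaches its click handler.</p>

```javascript
// Minimal simulation of the race (runs in Node.js 15+ or in a browser console).
// `button` stands in for the real Submit button; no DOM or Capybara involved.
const button = new EventTarget();
let submitted = false;

// Simulates the page's JavaScript, which attaches the handler once it has loaded.
function pageScriptLoads() {
  button.addEventListener('click', () => { submitted = true; });
}

// The test clicks too early: no handler is attached yet, so nothing happens.
button.dispatchEvent(new Event('click'));
const afterEarlyClick = submitted;

pageScriptLoads();

// The same click after the script has loaded triggers the handler.
button.dispatchEvent(new Event('click'));
const afterLateClick = submitted;

console.log({ afterEarlyClick, afterLateClick }); // { afterEarlyClick: false, afterLateClick: true }
```

<p>The early click is silently lost, which is why the test is still on the submit page when the assertion runs.</p>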
<p><strong>Solution</strong></p>
<p>One solution to this problem is to increase the <code>wait_time</code> setting in Capybara (<code>Capybara.default_max_wait_time</code>). However, this approach has several limitations. First, the setting is global and applies to all test cases, so a high value can increase the overall execution time of the test suite. Additionally, the wait time is only an upper bound on how long Capybara keeps retrying a finder or assertion; it does not check whether the page’s JavaScript has finished loading. This means that if the page takes longer to load than the wait_time, the test can still fail.</p>
<p>The other solution is to use the <code>execute_script</code> method provided by Capybara to click the button instead of the <code>click_on</code> method. The execute_script method allows you to execute JavaScript code within the context of the current page. By using this method to click the button, the click action is added to the end of the browser’s call stack. This means that the click action will be executed after any existing JavaScript code on the page has finished running, ensuring that the button is fully initialized and ready to be interacted with before the test tries to click it.</p>
<p>To use the execute_script method to click the button, you can use the following code:</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>page.find_button<span class="o">(</span><span class="s1">'Submit'</span><span class="o">)</span>.execute_script<span class="o">(</span><span class="s1">'this.click()'</span><span class="o">)</span></code></pre></figure>
<p>This way, the click runs only after the page’s JavaScript has fully loaded.</p>
<p><strong>Browser Call Stack</strong></p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span> <span class="p">|</span> <span class="p">|</span>
<span class="p">|</span> <span class="p">|</span>
<span class="p">|</span> JavaScript <span class="p">|</span> <-- existing code on the page<span class="o">(</span><span class="m">1</span><span class="o">)</span>
<span class="p">|</span>_______________<span class="p">|</span>
<span class="p">|</span> <span class="p">|</span>
<span class="p">|</span> JavaScript <span class="p">|</span> <-- existing code on the page<span class="o">(</span><span class="m">2</span><span class="o">)</span>
<span class="p">|</span>_______________<span class="p">|</span>
<span class="p">|</span> <span class="p">|</span>
<span class="p">|</span> click action <span class="p">|</span> <-- added by execute_script method<span class="o">(</span><span class="m">3</span><span class="o">)</span>
<span class="p">|</span>_______________<span class="p">|</span></code></pre></figure>
<p><a href="/technology/fixing-capybara-flaky-tests/">Fixing Capybara Flaky Tests</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on December 20, 2022.</p>
<p>/technology/sidekiq-process-in-production-with-systemd-and-monit | 2022-06-29 08:10:37 +0530 | eLitmus.com</p>
<p>Recently, we upgraded our Sidekiq version from 5.2 to 6.5. Before Sidekiq 6.0 we managed the Sidekiq process directly using Monit. With the release of Sidekiq 6, the team removed the <strong>daemonization, logfile, and pidfile command line arguments and sidekiqctl binary</strong>. Managing services manually is error-prone, so it is better to let the operating system do it for us.
We had three options to choose from: systemd, upstart, and foreman. We decided to go with systemd.</p>
<h4 id="systemd"><strong>Systemd</strong></h4>
<p><a href="https://wiki.debian.org/systemd#systemd_-_system_and_service_manager">Systemd</a> is a system and service manager for Linux. Systemd tasks are organized as units. The most common units are services (.service), mount points (.mount), devices (.device), sockets (.socket), and timers (.timer).</p>
<h4 id="systemctl"><strong>Systemctl</strong></h4>
<p>The systemctl command is a utility which is responsible for examining and controlling the systemd system and service manager.</p>
<h4 id="sidekiq"><strong>Sidekiq</strong></h4>
<p>Simple, efficient background processing for Ruby.</p>
<h3 id="sidekiq-running-as-systemd-service"><strong>Sidekiq running as Systemd Service</strong></h3>
<ol>
<li>
To manage Sidekiq we need to create a service file for Sidekiq which can be used to start, stop or restart the Sidekiq process.
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>sudo nano /lib/systemd/system/sidekiq.service</code></pre></figure>
</li>
<li>
Contents of <code>sidekiq.service</code>. Sidekiq provides a template for the service file here: <a href="https://github.com/mperham/sidekiq/blob/main/examples/systemd/sidekiq.service">sidekiq.service</a>. We modified it for our use case:
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span><span class="o">[</span>Unit<span class="o">]</span>
<span class="nv">Description</span><span class="o">=</span>sidekiq
<span class="nv">After</span><span class="o">=</span>syslog.target network.target
<span class="o">[</span>Service<span class="o">]</span>
<span class="nv">Type</span><span class="o">=</span>simple
<span class="c1"># If your Sidekiq process locks up, systemd's watchdog will restart it within seconds.</span>
<span class="c1">#WatchdogSec=10</span>
<span class="nv">WorkingDirectory</span><span class="o">=</span>/opt/myapp/current
<span class="nv">ExecStart</span><span class="o">=</span>/usr/local/bin/bundle <span class="nb">exec</span> sidekiq -C /opt/myapp/shared/config/sidekiq.yml -e production
<span class="nv">ExecStop</span><span class="o">=</span>/bin/kill -TSTP <span class="nv">$MAINPID</span>
<span class="nv">ExecStartPost</span><span class="o">=</span>/bin/sh -c <span class="s1">'/bin/echo $MAINPID > /opt/myapp/shared/pids/sidekiq.pid'</span>
<span class="nv">ExecStopPost</span><span class="o">=</span>/bin/sh -c <span class="s1">'rm /opt/myapp/shared/pids/sidekiq.pid'</span>
<span class="nv">User</span><span class="o">=</span>deploy
<span class="nv">Group</span><span class="o">=</span>deploy
<span class="nv">UMask</span><span class="o">=</span><span class="m">0002</span>
<span class="c1"># Greatly reduce Ruby memory fragmentation and heap usage</span>
<span class="c1"># https://www.mikeperham.com/2018/04/25/taming-rails-memory-bloat/</span>
<span class="nv">Environment</span><span class="o">=</span><span class="nv">MALLOC_ARENA_MAX</span><span class="o">=</span><span class="m">2</span>
<span class="c1"># if we crash, restart</span>
<span class="nv">RestartSec</span><span class="o">=</span><span class="m">10</span>
<span class="nv">Restart</span><span class="o">=</span>on-failure
<span class="c1"># output goes to /var/log/syslog (Ubuntu) or /var/log/messages (CentOS)</span>
<span class="nv">StandardOutput</span><span class="o">=</span>syslog
<span class="nv">StandardError</span><span class="o">=</span>syslog
<span class="c1"># This will default to "bundler" if we don't specify it</span>
<span class="nv">SyslogIdentifier</span><span class="o">=</span>sidekiq
<span class="o">[</span>Install<span class="o">]</span>
<span class="nv">WantedBy</span><span class="o">=</span>multi-user.target</code></pre></figure>
</li>
<li>
Our Modified Configurations:
<ul>
<li>
We were using the system Ruby and running Sidekiq with a custom configuration file, so we start Sidekiq with:
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span><span class="nv">ExecStart</span><span class="o">=</span>/usr/local/bin/bundle <span class="nb">exec</span> sidekiq -C /opt/myapp/shared/config/sidekiq.yml -e production</code></pre></figure>
</li>
<li>
To stop Sidekiq we first send a TSTP signal, which tells Sidekiq to stop picking up new jobs so that the busy jobs can finish before the process is terminated.
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span><span class="nv">ExecStop</span><span class="o">=</span>/bin/kill -TSTP <span class="nv">$MAINPID</span></code></pre></figure>
</li>
<li>
To manage the process with Monit we need its process ID, so we write the PID file after the service starts and remove it after the service stops.
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span><span class="nv">ExecStartPost</span><span class="o">=</span>/bin/sh -c <span class="s1">'/bin/echo $MAINPID > /opt/myapp/shared/pids/sidekiq.pid'</span>
<span class="nv">ExecStopPost</span><span class="o">=</span>/bin/sh -c <span class="s1">'rm /opt/myapp/shared/pids/sidekiq.pid'</span></code></pre></figure>
</li>
<li>
We want the service to run as our app user.
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span><span class="nv">User</span><span class="o">=</span>deploy
<span class="nv">Group</span><span class="o">=</span>deploy
<span class="nv">UMask</span><span class="o">=</span><span class="m">0002</span></code></pre></figure>
</li>
<li>
We want systemd to restart Sidekiq only when it fails.
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span><span class="c1"># if we crash, restart</span>
<span class="nv">RestartSec</span><span class="o">=</span><span class="m">10</span>
<span class="nv">Restart</span><span class="o">=</span>on-failure</code></pre></figure>
</li>
</ul>
</li>
<li>
Reload the systemd daemon so that it picks up the newly created service:
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>sudo systemctl daemon-reload</code></pre></figure>
</li>
<li>
Now we can start, stop, or restart the Sidekiq service:
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>sudo systemctl start<span class="p">|</span>stop<span class="p">|</span>restart sidekiq</code></pre></figure>
</li>
</ol>
<h3 id="monitor-sidekiq-process-with-monit"><strong>Monitor Sidekiq process with Monit</strong></h3>
<p>We now have systemd to start, stop, and restart the Sidekiq process. Next, we will look at how to monitor the Sidekiq process with Monit.</p>
<h4 id="monit"><strong>Monit</strong></h4>
<p>Monit is a utility for managing and monitoring processes, programs, files, directories and filesystems on a Unix system.</p>
<ol>
<li>
Modified monitrc
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>check process sidekiq with pidfile <span class="s2">"/opt/myapp/shared/pids/sidekiq.pid"</span>
start <span class="nv">program</span> <span class="o">=</span> <span class="s2">"/bin/bash -l -c 'sudo systemctl start sidekiq'"</span> as uid deploy and gid deploy
with timeout <span class="m">20</span> seconds
stop <span class="nv">program</span> <span class="o">=</span> <span class="s2">"/bin/bash -l -c 'sudo systemctl stop sidekiq'"</span> as uid deploy and gid deploy
with timeout <span class="m">20</span> seconds
<span class="k">if</span> totalmem is greater than <span class="m">800</span> MB <span class="k">for</span> <span class="m">3</span> cycles <span class="k">then</span> restart
<span class="k">if</span> changed pid <span class="k">then</span> <span class="nb">exec</span> <span class="s2">"/etc/monit/slack_notifier.sh"</span>
<span class="k">if</span> cpu is greater than <span class="m">65</span>% <span class="k">for</span> <span class="m">2</span> cycles <span class="k">then</span> <span class="nb">exec</span> <span class="s2">"/etc/monit/slack_notifier.sh"</span> <span class="k">else</span> <span class="k">if</span> succeeded <span class="k">then</span> <span class="nb">exec</span> <span class="s2">"/etc/monit/slack_notifier.sh"</span></code></pre></figure>
</li>
<li>
We can check if sidekiq is up and running:
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>sudo monit summary sidekiq</code></pre></figure>
</li>
</ol>
<p>Monit will check the Sidekiq process and automatically start it again if the process is killed unexpectedly.</p>
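<p>The <code>for 3 cycles</code> condition in the monitrc above is worth a closer look. The sketch below models it in plain JavaScript under the assumption that Monit fires the action only when the condition holds on consecutive checks. <code>makeCycleRule</code> is a hypothetical name, the memory readings are made up, and resetting the counter after the action is a simplification rather than documented Monit behaviour.</p>

```javascript
// Sketch of a "for N cycles" rule: act only after N consecutive checks
// in which the measured value exceeds the limit.
function makeCycleRule(limitMb, cycles) {
  let streak = 0; // consecutive checks above the limit
  return function check(totalMemMb) {
    streak = totalMemMb > limitMb ? streak + 1 : 0;
    if (streak >= cycles) {
      streak = 0; // assumption: start counting again after the action fires
      return 'restart';
    }
    return 'ok';
  };
}

// Made-up memory readings (MB), one per Monit cycle.
const check = makeCycleRule(800, 3);
const results = [750, 820, 830, 790, 810, 815, 900].map((mb) => check(mb));
console.log(results); // [ 'ok', 'ok', 'ok', 'ok', 'ok', 'ok', 'restart' ]
```

<p>A short spike above 800 MB (the second and third readings) does not trigger a restart; only three readings in a row above the limit do.</p>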
<p>We have successfully completed the Sidekiq process monitoring with the help of Monit and Systemd.</p>
<p><a href="/technology/sidekiq-process-in-production-with-systemd-and-monit/">Sidekiq process in production with Systemd and Monit</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on June 29, 2022.</p>
<p>/the-other-side/outliers-the-story-of-success-book-review | 2022-05-02 17:59:16 +0530 | eLitmus.com</p>
<p>For the last two months, I have been reading the book “<strong>Outliers</strong>” by <strong>Malcolm Gladwell</strong>. “<strong>Outliers - The story of success</strong>” has two parts: <strong>Opportunity</strong> and <strong>Legacy</strong>.</p>
<p>“In Outliers, the author surveys the ingredients of success. He writes about the reasons behind the success of people like Bill Gates, Bill Joy, Joseph Flom, and the Beatles, and about how Chris Langan and Oppenheimer ended up with different stories. He also shows how culture, family, and friends play a role in determining individual success.”</p>
<p><ins><strong>Key Factors of Success:</strong></ins></p>
<p><strong>1. Opportunity</strong></p>
<p>We all have equal opportunities, and some people recognize and take advantage of them. But according to the author, successful people in reality benefit from hidden and extraordinary opportunities: cultural advantages, where, when, and in which family they grew up, and the values they received from their ancestors.</p>
<p><strong>2. Effect of environment</strong></p>
<p>The values of the world we live in and the people around us have a profound effect on who we are. The place we live and the people we spend time with are critical factors for success. The author explains the Roseto Mystery: why the people of Roseto enjoyed good health and rarely had heart attacks, and how the benefits of community played a dominant role in it.</p>
<p><strong>3. Hard work</strong></p>
<p>According to the author, it takes about 10,000 hours of practice to go from beginner to world-class expert in any field. 10,000 hours is a lot, so starting at a young age is a big advantage.</p>
<p>The author explains how Bill Joy started with computers at the age of 16. From the day he was introduced to the computer at university, the place became his life, and he programmed whenever he got time. He went on to rewrite UNIX and to write much of the TCP/IP code that allows us to connect to the Internet. In those early days he put in his 10,000 hours with passion and ability.</p>
<p>Similarly, the Beatles were invited to play in a club where they had to play for many hours, sometimes the whole night. By the time the Beatles reached success, they had performed live roughly 1,200 times and for more than 10,000 hours.</p>
<p><strong>4. Legacy often drives our behavior</strong></p>
<p>Gladwell points out how values are passed from generation to generation. He describes the cultural legacy of Asian countries where rice is the dominant crop: the success of a rice paddy depends on the amount of hard work and diligence put into it. To have a successful paddy, farmers wake up at dawn and work all day, which in turn creates a cultural legacy of hard work.</p>
<p><a href="/the-other-side/outliers-the-story-of-success-book-review/">Outliers: The story of Success - Book Review</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on May 02, 2022.</p>
<p>/technology/creating-an-npm-package-from-my-react-component | 2021-05-25 22:14:23 +0530 | eLitmus.com</p>
<p>So, you have created a useful, customisable, modular component in React, and now you want to share it with everyone by turning it into a package that anyone can install? That is exactly where I was: I had built such a component and wanted to create an npm package and publish it. This is how I did it.</p>
<h2 id="prerequisites">Prerequisites</h2>
<p>Since I was going to create an npm package, I needed to have Node and npm installed on my system.</p>
<p>I also needed an npm account. I didn’t have one, so I had to create one before I got started. You can create one <a href="https://www.npmjs.com/">here</a>.</p>
<h2 id="getting-started">Getting Started</h2>
<p>The first order of business was to select a unique name for my package. I settled on <code>react-rails-pagination</code> as the name for my package.</p>
<p>To confirm that no package with the same name existed I had to use the following command.</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>npm search react-rails-pagination</code></pre></figure>
<p>You can use</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>npm search <your-package-name></code></pre></figure>
<p>And if no existing package is found with the same name, then you are good to go.</p>
<p>After I selected a package name, I had to run the following command in my terminal to initialise the package.</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>npx create-react-library react-rails-pagination</code></pre></figure>
<p>I was then prompted to answer a few questions about my package.</p>
<p><img src="/blog/images/creating-an-npm-package-from-my-react-component/package-info.png" alt="npm package basic info" /></p>
<p>After entering all the information, it will automatically set up the project. This process might take a little time.</p>
<p>The advantage of using <code>create-react-library</code> is that it will initialise your project to be published along with an example where you can test your package. It will also initialise it as a local git repository which you can simply push to GitHub after adding the URL for your remote repository.</p>
<p>After <code>create-react-library</code> finishes, the folder structure looks like this:</p>
<p><img src="/blog/images/creating-an-npm-package-from-my-react-component/file-structure.png" alt="npm package file structure" /></p>
<p>I had to run the following commands in two different terminal tabs to start the development environment:</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span><span class="nb">cd</span> react-rails-pagination <span class="o">&&</span> npm start</code></pre></figure>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span><span class="nb">cd</span> react-rails-pagination/example <span class="o">&&</span> npm start</code></pre></figure>
<p>The first command watches the <code>src/</code> folder and recompiles it into the <code>dist/</code> folder when you make changes.</p>
<p>The second command runs the example app that links to your package.</p>
<h2 id="adding-my-react-component">Adding my React Component</h2>
<p>Now, I had a look inside the <code>src/</code> folder in my project. There was an <code>index.js</code> file which held an <code>ExampleComponent</code> that was being used in the example app.</p>
<p>To add my own React component, I placed my <code>Pagination.jsx</code> file, which holds my Pagination component, inside the <code>src/</code> folder. Since my component requires a CSS file too, I placed my CSS file <code>index.css</code> inside the same folder and imported it inside my Pagination component.</p>
<p>I don’t use a separate CSS module in my component, so I deleted the generated <code>styles.module.css</code> file inside the <code>src/</code> directory.</p>
<p>After I had done these changes, my src directory looked something like this</p>
<p><img src="/blog/images/creating-an-npm-package-from-my-react-component/src-structure.png" alt="npm package src folder structure" /></p>
<p>Now I need to make sure that my component is exported from this package, so that any project that uses my package can use my component as well.</p>
<p>For this I have to make some changes to the <code>index.js</code> file.</p>
<figure class="highlight"><pre><code class="language-js" data-lang="js"><span></span><span class="k">import</span> <span class="nx">Pagination</span> <span class="nx">from</span> <span class="s1">'./Pagination'</span><span class="p">;</span>
<span class="k">export</span> <span class="k">default</span> <span class="nx">Pagination</span><span class="p">;</span></code></pre></figure>
<p>This imports my component into the <code>index.js</code> file and sets it as the default export from the package. I do this because the source file or the entrypoint of my package is the <code>src/index.js</code> file.</p>
<p>If you don’t want to use the <code>index.js</code> file or want to create a new entrypoint then open the <code>package.json</code> file in the root of the project and change the value of the <code>source</code> key in that file.</p>
<p>This completes the process of adding my component to the package.</p>
<h2 id="checking-if-my-package-is-working-as-expected">Checking if my package is working as expected</h2>
<p>To check if my package is working or not, I have to go to the <code>example/</code> folder.</p>
<p>In that folder, I edited the <code>App.js</code> file, which originally imported the <code>ExampleComponent</code>, so that it imports and renders my <code>Pagination</code> component instead.</p>
<figure class="highlight"><pre><code class="language-js" data-lang="js"><span></span><span class="k">import</span> <span class="nx">React</span><span class="p">,</span> <span class="p">{</span> <span class="nx">useState</span> <span class="p">}</span> <span class="nx">from</span> <span class="s1">'react'</span>
<span class="k">import</span> <span class="nx">Pagination</span> <span class="nx">from</span> <span class="s1">'react-rails-pagination'</span>
<span class="k">import</span> <span class="s1">'react-rails-pagination/dist/index.css'</span>
<span class="kd">const</span> <span class="nx">App</span> <span class="o">=</span> <span class="p">()</span> <span class="p">=></span> <span class="p">{</span>
<span class="c1">// Keep the current page in state so that changing it re-renders the component</span>
<span class="kd">const</span> <span class="p">[</span><span class="nx">page</span><span class="p">,</span> <span class="nx">setPage</span><span class="p">]</span> <span class="o">=</span> <span class="nx">useState</span><span class="p">(</span><span class="mf">1</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">totalPages</span> <span class="o">=</span> <span class="mf">5</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">handleChangePage</span> <span class="o">=</span> <span class="p">(</span><span class="nx">currentPage</span><span class="p">)</span> <span class="p">=></span> <span class="p">{</span>
<span class="nx">setPage</span><span class="p">(</span><span class="nx">currentPage</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="o"><</span><span class="nx">Pagination</span> <span class="nx">page</span><span class="o">=</span><span class="p">{</span><span class="nx">page</span><span class="p">}</span> <span class="nx">pages</span><span class="o">=</span><span class="p">{</span><span class="nx">totalPages</span><span class="p">}</span> <span class="nx">handleChangePage</span><span class="o">=</span><span class="p">{</span><span class="nx">handleChangePage</span><span class="p">}</span> <span class="o">/></span>
<span class="p">}</span>
<span class="k">export</span> <span class="k">default</span> <span class="nx">App</span><span class="p">;</span></code></pre></figure>
<p>These changes allow me to import my package into the example application and check whether it is working.</p>
<p>Now, if I open the address of the local development server in my browser, I can see that my component is loaded and functioning.</p>
<p><img src="/blog/images/creating-an-npm-package-from-my-react-component/pagination-1.png" alt="pagination component 1" />
<img src="/blog/images/creating-an-npm-package-from-my-react-component/pagination-2.png" alt="pagination component 2" /></p>
<h2 id="publishing-my-package">Publishing my package</h2>
<p>I need to add a few things to get this package ready for publishing.</p>
<p>First, I added a <code>.npmignore</code> file to keep a few things out of my published package and reduce its size. It works the same way as a <code>.gitignore</code> file, but for npm.</p>
<p>The <code>.npmignore</code> looks like this in my project</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span><span class="c1">## the src folder</span>
src
.babelrc
rollup.config.js
<span class="c1">## node modules folder</span>
node_modules
<span class="c1">## git repository related files</span>
.git
.gitignore
CVS
.svn
.hg
.lock-wscript
.wafpickle-N
.DS_Store
npm-debug.log
.npmrc
<span class="c1">#others</span>
config.gypi
package-lock.json</code></pre></figure>
<p>Next I opened the <code>package.json</code> and added a few things in there as well.</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span></span><span class="p">{</span>
<span class="nt">"name"</span><span class="p">:</span> <span class="s2">"react-rails-pagination"</span><span class="p">,</span>
<span class="nt">"version"</span><span class="p">:</span> <span class="s2">"1.0.0"</span><span class="p">,</span>
<span class="nt">"description"</span><span class="p">:</span> <span class="s2">"React Pagination Component for Rails and other MVC Frameworks"</span><span class="p">,</span>
<span class="nt">"license"</span><span class="p">:</span> <span class="s2">"MIT"</span><span class="p">,</span>
<span class="nt">"repository"</span><span class="p">:</span> <span class="s2">"piyushswain/react-rails-pagination"</span><span class="p">,</span>
<span class="nt">"main"</span><span class="p">:</span> <span class="s2">"dist/index.js"</span><span class="p">,</span>
<span class="nt">"module"</span><span class="p">:</span> <span class="s2">"dist/index.modern.js"</span><span class="p">,</span>
<span class="nt">"source"</span><span class="p">:</span> <span class="s2">"src/index.js"</span><span class="p">,</span>
<span class="nt">"engines"</span><span class="p">:</span> <span class="p">{</span>
<span class="nt">"node"</span><span class="p">:</span> <span class="s2">">=10"</span>
<span class="p">},</span>
<span class="nt">"keywords"</span><span class="p">:</span> <span class="p">[</span>
<span class="s2">"react"</span><span class="p">,</span>
<span class="s2">"rails"</span><span class="p">,</span>
<span class="s2">"mvc"</span><span class="p">,</span>
<span class="s2">"react-component"</span><span class="p">,</span>
<span class="s2">"pagination"</span>
<span class="p">],</span>
<span class="nt">"author"</span><span class="p">:</span> <span class="p">{</span>
<span class="nt">"name"</span><span class="p">:</span> <span class="s2">"Piyush Swain"</span><span class="p">,</span>
<span class="nt">"email"</span><span class="p">:</span> <span class="s2">"piyush.swain3@gmail.com"</span>
<span class="p">},</span>
<span class="nt">"homepage"</span><span class="p">:</span> <span class="s2">"https://github.com/piyushswain/react-rails-pagination"</span><span class="p">,</span>
<span class="err">.</span>
<span class="err">.</span>
<span class="err">.</span>
<span class="err">.</span>
<span class="p">}</span></code></pre></figure>
<p>I updated the <code>author</code> field to add my email.</p>
<p>Next, I added the keys <code>homepage</code> and <code>keywords</code>.</p>
<p><code>homepage</code> can be used to add a website link to your project. I used my GitHub repository link for now, but I will change it later when I add a demo to this project. If you have a working demo, you can add its link instead.</p>
<p>The <code>keywords</code> key gives the npm search index keywords to associate with your project, so that people searching on npm can find it more easily. Its value is an array of strings.</p>
<p>Finally, I updated the <code>README.md</code> file in the root directory to add a description and instructions for anyone using my package. You will have to update your <code>README.md</code> to suit your package as well.</p>
<p>I reviewed all the changes and then pushed my code to my GitHub repository.</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>git remote add origin https://github.com/piyushswain/react-rails-pagination.git <span class="c1"># Sets the new remote for the local repo</span>
git add .
git commit -m <span class="s1">'Initial Commit'</span>
git push -u origin main <span class="c1"># Pushes the changes to the remote repository</span></code></pre></figure>
<p>Now, my package is ready to be published. I run the following commands to start the process of publishing my package to npm.</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>npm login</code></pre></figure>
<p>The login command asks for the username and password of your npm account. Enter them correctly and it will log you in to npm. If you are already logged in to npm, you can skip this step.</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>npm run build</code></pre></figure>
<p>This creates an optimized production build of your package. I recommend running it every time before you issue a publish command.</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>npm publish</code></pre></figure>
<p>Finally, running this command will upload your package to npm. You can check it in your npm profile where you can find all your uploaded packages.</p>
<p>If you wish to publish again after making some changes, open your <code>package.json</code> file and update the <code>version</code> key first, since npm will not let you publish over an existing version. Remember to build your package before publishing, as this creates an optimized production build of your package.</p>
<p><strong>TIP:</strong> If for some reason you cannot get the <code>css</code> to work, a small hack is to update the <code>dist/index.css</code> file directly, as this is the file that is published and used by anyone importing your package.</p>
<blockquote>
<p>You can find this article on the author’s blog <a href="https://piyushswain.github.io/blog">piyushswain.github.io</a> as well.</p>
</blockquote>
<p><a href="/technology/creating-an-npm-package-from-my-react-component/">Creating an npm package from my REACT Component</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on May 25, 2021.</p>/technology/migration-from-paperclip-to-activestorage2021-05-21 15:07:15 +0530T00:00:00-00:002021-05-21T00:00:00+05:30eLitmus.comsite-admin@elitmus.com
<p><em>How we migrated hundreds of thousands of attachments from Paperclip to ActiveStorage without downtime.</em></p>
<p>At <a href="https://www.elitmus.com">eLitmus</a>, we recently migrated around two million attachment records from <a href="https://github.com/thoughtbot/paperclip">Paperclip</a> to Rails-owned <a href="https://guides.rubyonrails.org/active_storage_overview.html">ActiveStorage</a>. Paperclip and ActiveStorage solve similar problems: uploading files to cloud storage such as Amazon S3, Google Cloud Storage, or Microsoft Azure Storage, and then attaching those files to Active Record objects. In our case, we upload files to Amazon S3. So migrating from one to the other is straightforward data rewriting.</p>
<h3 id="why-do-we-migrate-from-paperclip-to-active-storage"><strong>Why do we migrate from paperclip to active storage?</strong></h3>
<p>ActiveStorage was introduced in Rails 5.2. At the time of the migration we were on Rails 6.0, so we were already behind in keeping things up to date. ActiveStorage is a highly recommended tool for uploading files. For a long time before ActiveStorage, this functionality was provided by outside gems, including Paperclip. After the release of ActiveStorage, Paperclip was deprecated, and by the time we migrated it had been deprecated for some time. We wanted to move forward with ActiveStorage knowing that it is not as mature as Paperclip, but that it has the Rails community behind it. So we were happy with that.</p>
<h3 id="how-do-we-migrate-from-paperclip-to-active-storage"><strong>How do we migrate from paperclip to active storage?</strong></h3>
<p>After reading articles on the web and the migration guide provided by Paperclip, the process seemed pretty straightforward. We had around 2 million records belonging to 16 different Active Record models. In our case, we needed a migration that was fast and caused no downtime. With records in the millions, we could not afford to wait days for migrations to run. We decided to do it in small steps, migrating all attachments of one Active Record model at a time. For each model we split the work into two steps, each deployed as its own Merge Request, because we didn’t want any attachments to be unavailable during the process. So a total of 32 Merge Requests were merged into production during this time.</p>
<p>Both steps revolve around Paperclip and ActiveStorage, so let us refresh our understanding of how each of them works. Paperclip works by attaching file data to the model. To do so, it changes the schema of the model by adding four columns to the model’s table. It also provides Rails validations based on the size and presence of the file data, if required.</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="n">create_table</span> <span class="s2">"users"</span><span class="p">,</span> <span class="ss">force</span><span class="p">:</span> <span class="ss">:cascade</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
<span class="n">t</span><span class="o">.</span><span class="n">string</span> <span class="s2">"image_file_name"</span>
<span class="n">t</span><span class="o">.</span><span class="n">string</span> <span class="s2">"image_content_type"</span>
<span class="n">t</span><span class="o">.</span><span class="n">integer</span> <span class="s2">"image_file_size"</span>
<span class="n">t</span><span class="o">.</span><span class="n">datetime</span> <span class="s2">"image_updated_at"</span>
<span class="k">end</span></code></pre></figure>
<p>Here is what a <code>User</code> model with an <code>image</code> attachment looks like in Paperclip:</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="k">class</span> <span class="nc">User</span> <span class="o"><</span> <span class="no">ApplicationRecord</span>
<span class="n">has_attached_file</span> <span class="ss">:image</span>
<span class="n">validates_attachment</span> <span class="ss">:image</span><span class="p">,</span> <span class="ss">presence</span><span class="p">:</span> <span class="kp">true</span><span class="p">,</span>
<span class="ss">content_type</span><span class="p">:</span> <span class="s2">"image/jpeg"</span><span class="p">,</span>
<span class="ss">size</span><span class="p">:</span> <span class="p">{</span> <span class="k">in</span><span class="p">:</span> <span class="mi">0</span><span class="o">..</span><span class="mi">10</span><span class="o">.</span><span class="n">kilobytes</span> <span class="p">}</span>
<span class="k">end</span></code></pre></figure>
<p>On the other side, we start by installing ActiveStorage. Rails 6.1 already ships with it, so all we need to do is run:</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span></span>rails active_storage:install</code></pre></figure>
<p>ActiveStorage creates three database tables: the ActiveStorageBlobs table, which stores attachment metadata; the ActiveStorageAttachments table, a polymorphic join table between the blobs table and the Rails models; and the ActiveStorageVariantRecords table, which tracks whether a variant is present in the database. ActiveStorage doesn’t come with validations, but we found some outside gems, including <a href="https://github.com/igorkasyanchuk/active_storage_validations">active_storage_validations</a>, which works for us.</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="n">create_table</span> <span class="ss">:active_storage_blobs</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
<span class="n">t</span><span class="o">.</span><span class="n">string</span> <span class="ss">:key</span><span class="p">,</span> <span class="ss">null</span><span class="p">:</span> <span class="kp">false</span>
<span class="n">t</span><span class="o">.</span><span class="n">string</span> <span class="ss">:filename</span><span class="p">,</span> <span class="ss">null</span><span class="p">:</span> <span class="kp">false</span>
<span class="n">t</span><span class="o">.</span><span class="n">string</span> <span class="ss">:content_type</span>
<span class="n">t</span><span class="o">.</span><span class="n">text</span> <span class="ss">:metadata</span>
<span class="n">t</span><span class="o">.</span><span class="n">string</span> <span class="ss">:service_name</span><span class="p">,</span> <span class="ss">null</span><span class="p">:</span> <span class="kp">false</span>
<span class="n">t</span><span class="o">.</span><span class="n">bigint</span> <span class="ss">:byte_size</span><span class="p">,</span> <span class="ss">null</span><span class="p">:</span> <span class="kp">false</span>
<span class="n">t</span><span class="o">.</span><span class="n">string</span> <span class="ss">:checksum</span><span class="p">,</span> <span class="ss">null</span><span class="p">:</span> <span class="kp">false</span>
<span class="n">t</span><span class="o">.</span><span class="n">datetime</span> <span class="ss">:created_at</span><span class="p">,</span> <span class="ss">null</span><span class="p">:</span> <span class="kp">false</span>
<span class="n">t</span><span class="o">.</span><span class="n">index</span> <span class="o">[</span> <span class="ss">:key</span> <span class="o">]</span><span class="p">,</span> <span class="ss">unique</span><span class="p">:</span> <span class="kp">true</span>
<span class="k">end</span>
<span class="n">create_table</span> <span class="ss">:active_storage_attachments</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
<span class="n">t</span><span class="o">.</span><span class="n">string</span> <span class="ss">:name</span><span class="p">,</span> <span class="ss">null</span><span class="p">:</span> <span class="kp">false</span>
<span class="n">t</span><span class="o">.</span><span class="n">references</span> <span class="ss">:record</span><span class="p">,</span> <span class="ss">null</span><span class="p">:</span> <span class="kp">false</span><span class="p">,</span> <span class="ss">polymorphic</span><span class="p">:</span> <span class="kp">true</span><span class="p">,</span> <span class="ss">index</span><span class="p">:</span> <span class="kp">false</span>
<span class="n">t</span><span class="o">.</span><span class="n">references</span> <span class="ss">:blob</span><span class="p">,</span> <span class="ss">null</span><span class="p">:</span> <span class="kp">false</span>
<span class="n">t</span><span class="o">.</span><span class="n">datetime</span> <span class="ss">:created_at</span><span class="p">,</span> <span class="ss">null</span><span class="p">:</span> <span class="kp">false</span>
<span class="n">t</span><span class="o">.</span><span class="n">index</span> <span class="o">[</span> <span class="ss">:record_type</span><span class="p">,</span> <span class="ss">:record_id</span><span class="p">,</span> <span class="ss">:name</span><span class="p">,</span> <span class="ss">:blob_id</span> <span class="o">]</span><span class="p">,</span> <span class="nb">name</span><span class="p">:</span> <span class="s2">"index_active_storage_attachments_uniqueness"</span><span class="p">,</span> <span class="ss">unique</span><span class="p">:</span> <span class="kp">true</span>
<span class="n">t</span><span class="o">.</span><span class="n">foreign_key</span> <span class="ss">:active_storage_blobs</span><span class="p">,</span> <span class="ss">column</span><span class="p">:</span> <span class="ss">:blob_id</span>
<span class="k">end</span>
<span class="n">create_table</span> <span class="ss">:active_storage_variant_records</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
<span class="n">t</span><span class="o">.</span><span class="n">belongs_to</span> <span class="ss">:blob</span><span class="p">,</span> <span class="ss">null</span><span class="p">:</span> <span class="kp">false</span><span class="p">,</span> <span class="ss">index</span><span class="p">:</span> <span class="kp">false</span>
<span class="n">t</span><span class="o">.</span><span class="n">string</span> <span class="ss">:variation_digest</span><span class="p">,</span> <span class="ss">null</span><span class="p">:</span> <span class="kp">false</span>
<span class="n">t</span><span class="o">.</span><span class="n">index</span> <span class="o">%</span><span class="n">i</span><span class="o">[</span> <span class="n">blob_id</span> <span class="n">variation_digest</span> <span class="o">]</span><span class="p">,</span> <span class="nb">name</span><span class="p">:</span> <span class="s2">"index_active_storage_variant_records_uniqueness"</span><span class="p">,</span> <span class="ss">unique</span><span class="p">:</span> <span class="kp">true</span>
<span class="n">t</span><span class="o">.</span><span class="n">foreign_key</span> <span class="ss">:active_storage_blobs</span><span class="p">,</span> <span class="ss">column</span><span class="p">:</span> <span class="ss">:blob_id</span>
<span class="k">end</span></code></pre></figure>
<p>Here is what the same <code>User</code> model with an <code>image</code> attachment looks like in ActiveStorage:</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="k">class</span> <span class="nc">User</span> <span class="o"><</span> <span class="no">ApplicationRecord</span>
<span class="n">has_one_attached</span> <span class="ss">:image</span>
<span class="n">validates</span> <span class="ss">:image</span><span class="p">,</span> <span class="ss">attached</span><span class="p">:</span> <span class="kp">true</span><span class="p">,</span>
<span class="ss">content_type</span><span class="p">:</span> <span class="s1">'image/png'</span><span class="p">,</span>
<span class="ss">size</span><span class="p">:</span> <span class="p">{</span> <span class="k">in</span><span class="p">:</span> <span class="mi">0</span><span class="o">..</span><span class="mi">10</span><span class="o">.</span><span class="n">kilobytes</span> <span class="p">}</span>
<span class="k">end</span></code></pre></figure>
<p>Let’s dive deep into the two steps we adopted: <strong>Migrated Paperclip Data</strong> and <strong>Adopted ActiveStorage</strong>.</p>
<h4 id="migrated-paperclip-data"><strong>Migrated Paperclip Data</strong></h4>
<p>In this step, we did the most crucial part of the process: running a rake task to migrate the Paperclip data to the ActiveStorage tables. We kept everything from Paperclip as it was and also added support for ActiveStorage, so both were in use at the same time. While the attachments of a model were being migrated from Paperclip to ActiveStorage, any user upload still went through the Paperclip implementation. In the background, after all the transactions related to Paperclip had committed successfully, we duplicated the same attachment to ActiveStorage using the Active Record callback <code>after_commit</code>.</p>
<h4 id="what-does-our-rake-task-flow-look-like"><strong>What does our rake task flow look like?</strong></h4>
<p>In this step, we created a rake task that copies all the data produced by Paperclip to the new ActiveStorage format.</p>
<ul>
<li>Firstly, we pushed every column name matching a regex for <code>file_name</code> into an array. For example, we have a UserSignature model with a column <code>image_file_name</code>.</li>
<li>Secondly, for each instance of the model, we created an ActiveStorage record only if ActiveStorage didn’t already contain a record for that instance. This way, if we cancelled the rake task or it crashed for some reason, we could restart it from where it left off.</li>
<li>For each instance, we first constructed the direct URL of the attachment, that is, the Amazon S3 URL from which the attachment can be downloaded. We then passed the file at this URL to the <code>ActiveStorage::Blob.create_and_upload!</code> method, which meant downloading it and re-uploading it to the S3 bucket. Finally, we created the associated polymorphic ActiveStorage record.</li>
</ul>
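<p>The control flow of such a resumable task can be sketched in plain Ruby. This is only an illustration, not our actual rake task: <code>PaperclipRow</code>, <code>attachment_names</code> and <code>migrate_pending</code> are hypothetical names, and the set of already migrated ids stands in for a lookup against the ActiveStorage tables.</p>

```ruby
require "set"

# Hypothetical stand-in for a model row that carries Paperclip columns.
PaperclipRow = Struct.new(:id, :image_file_name, keyword_init: true)

# Derive attachment names from column names such as "image_file_name".
def attachment_names(column_names)
  column_names.grep(/_file_name\z/).map { |col| col.delete_suffix("_file_name") }
end

# Yields every row that still needs migrating and returns the ids handled in
# this run. `already_migrated` plays the role of "ActiveStorage already has a
# record for this instance", which is what makes a restart safe.
def migrate_pending(rows, already_migrated)
  migrated_now = []
  rows.each do |row|
    next if row.image_file_name.nil?          # nothing was ever uploaded
    next if already_migrated.include?(row.id) # handled by an earlier run

    yield row                                 # copy the file, create the records
    already_migrated << row.id
    migrated_now << row.id
  end
  migrated_now
end
```

<p>Because rows that were already handled are skipped, running the task a second time after a crash only touches the rows that were left over.</p>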
<h4 id="what-challenges-did-we-face-running-rake-tasks"><strong>What challenges did we face running rake tasks?</strong></h4>
<p>At eLitmus, models configured with the CDN bucket have fewer than 20 thousand records each. For models with such a limited number of records, the above approach worked well and looked quite straightforward. As soon as we started migrating the default bucket, where each model has more than 50,000 records, problems started to arise. We migrated models in increasing order of their record count, so for the default bucket we started with a model of 56,000 records, following the approach mentioned above. It took more than 4 hours to migrate those 56,000 records in a staging environment. We couldn’t afford to wait hours to migrate 56,000 attachments, so we had to come up with a different approach, and this is where things became interesting.</p>
<p>After investigating, we found that in the above approach we opened a URI to download each attachment from Amazon S3 and re-uploaded it to the S3 bucket inside the transaction, which prolonged the database connection time. So we redesigned our rake task: instead of hitting S3 for every record, it works like a database migration that copies the data generated by Paperclip into the format required by ActiveStorage. Paperclip adds attachment columns such as <code>image_file_name</code>, <code>image_content_type</code>, <code>image_updated_at</code>, and <code>image_file_size</code> directly to the model’s table. ActiveStorage stores this information in two dedicated tables, the ActiveStorageBlobs table and the ActiveStorageAttachments table.</p>
<p>We looped through the records of the model, and then through each attachment definition within the model. If the model record didn’t have an uploaded attachment, we skipped to the next record. Otherwise, we converted the Paperclip data into ActiveStorage records, setting the values of the new ActiveStorageBlobs row from the data in Paperclip’s columns.</p>
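<p>As a minimal sketch of that conversion, the column mapping might look like the following in plain Ruby. The method name <code>active_storage_rows</code> is hypothetical, and <code>key</code>, <code>checksum</code> and <code>service_name</code> are passed in by the caller because how they were derived is not covered here; treat them as assumptions.</p>

```ruby
# Builds the blob and attachment rows for one Paperclip attachment, or returns
# nil when the record has no uploaded file. `record` is a hash of column values
# keyed by column name.
def active_storage_rows(record, record_type:, name:, key:, checksum:, service_name:)
  file_name = record["#{name}_file_name"]
  return nil if file_name.nil? || file_name.empty?

  uploaded_at = record["#{name}_updated_at"]

  blob = {
    key: key,                                     # assumption: supplied by the caller
    filename: file_name,
    content_type: record["#{name}_content_type"],
    byte_size: record["#{name}_file_size"],
    checksum: checksum,                           # assumption: supplied by the caller
    service_name: service_name,                   # column required from Rails 6.1
    created_at: uploaded_at
  }

  # blob_id is filled in once the blob row has been inserted.
  attachment = {
    name: name,
    record_type: record_type,
    record_id: record["id"],
    created_at: uploaded_at
  }

  { blob: blob, attachment: attachment }
end
```

<p>Nothing in this mapping touches S3, which is why this kind of conversion is so much faster than downloading and re-uploading each file.</p>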
<p>For models with fewer than 100,000 records, this approach worked well for us: it took only 8 minutes to migrate 96,000 records. Our next target was to migrate around 450,000 records. We started with the same approach we had used for the 96,000, but things did not go as smoothly. Most of these 450,000 records had a missing file size in our Paperclip data. As <code>byte_size</code> is a required field in the ActiveStorageBlobs table, we had to hit the S3 API to fetch the file size, and the migration took around 4 hours in staging. While optimizing the rake task, we came up with another approach: instead of reading data from the Paperclip columns and writing it to ActiveStorageBlobs at the same time, we first read all the data from Paperclip and only then wrote it to ActiveStorage. First, we read all the data from the Paperclip model columns and stored it in a CSV file in the format required by ActiveStorage. Then we wrote the data from the CSV file to the ActiveStorage tables. It took us 2 hours to migrate the 450,000 records in production.
With the same approach we next migrated around 1,400,000 records, which took 45 minutes in staging and 18 hours in production.</p>
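<p>The read-first, write-later idea can be illustrated with Ruby’s standard <code>csv</code> library. This is a simplified sketch rather than the task we ran: <code>BLOB_COLUMNS</code>, <code>export_blobs_csv</code> and <code>import_blobs_csv</code> are hypothetical names, and the block passed to the import step stands in for a bulk insert into the ActiveStorage tables.</p>

```ruby
require "csv"

# Columns written to the intermediate CSV, in ActiveStorageBlobs order.
BLOB_COLUMNS = %w[key filename content_type byte_size checksum created_at].freeze

# Phase 1: read every prepared row and serialize it as CSV text.
def export_blobs_csv(rows)
  CSV.generate do |csv|
    csv << BLOB_COLUMNS
    rows.each { |row| csv << BLOB_COLUMNS.map { |col| row[col] } }
  end
end

# Phase 2: parse the CSV and hand batches of hashes to the caller, which would
# perform the actual inserts. Note that every value comes back as a string.
def import_blobs_csv(csv_text, batch_size: 1000)
  CSV.parse(csv_text, headers: true).each_slice(batch_size) do |batch|
    yield batch.map(&:to_h)
  end
end
```

<p>Separating the two phases means the slow part (collecting metadata, including any missing file sizes) finishes before any write to the ActiveStorage tables begins, and the write phase can work in batches.</p>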
<h4 id="adopted-active-storage"><strong>Adopted Active Storage</strong></h4>
<p>After the job finished, we removed everything related to Paperclip and replaced its usage with ActiveStorage. We updated the config files, added the Amazon S3 storage definitions to <code>storage.yml</code>, and removed the Paperclip configuration for the model’s attachments. We also updated the models, views, and controllers related to the Active Record model. The red-green-refactor approach helped us gain confidence that our code was working as expected.</p>
<h4 id="what-challenges-did-we-face-during-migration"><strong>What challenges did we face during migration?</strong></h4>
<ul>
<li>Paperclip provides us several validators to validate our attachments. Out of the box, ActiveStorage doesn’t come with validations. We need to write custom validations in ActiveStorage, to add simple validations for attachments to validate presence, content type, attachment size. After some research, we found some outside gems, including <a href="https://github.com/igorkasyanchuk/active_storage_validations">active_storage_validations</a>, provide us validators as Paperclip. As ActiveStorage is evolving day by day, validations are on the to-do list of the rails community. As soon as it is released, we will be ready to get the outside gem replaced.</li>
<li>At eLitmus, we were using two Amazon s3 buckets - default bucket and CDN bucket, to store our attachments. Paperclip provides us functionality to store attachments on different buckets by giving an option bucket name while uploading attachment data. We started migrating from Paperclip to ActiveStorage with our application rails version 6.0. In Rails 6.0, there was no such tool to categorize the bucket name while uploading an attachment. Almost half of the models in our application are using CDN bucket, and the rest are using default bucket. The Rails community is behind the ActiveStorage in the rails version 6.1 service column was introduced in the ActiveStorageBlobs table for categorizing the bucket name while uploading an attachment. So we migrated the first CDN bucket attachment with rails version 6.0. Then we upgraded our rails version to 6.1 and migrated the other half records to the default bucket.</li>
<li>After the migration of 14,00,000 records after a week, we encountered a bug in production around 500, records key were missing from the amazon s3 bucket. After few hours of debugging, we found that between the time, 1st and 2nd MR’s merge in production. During, this period we kept everything from the paperclip as it is we, also added support for Active Storage. We were using both functionalities at the same time. During the time attachment for the model were migrated from paperclip to active storage, if a user decides to upload any attachments, the user still uses the paperclip implementation, but in the background after the successful commit of all transaction related to paperclip. We were duplicating the same attachment to active storage by using Active Record Callback after_commit. We produce the bug when the user uploads the attachment with the same filename as in our database before the migration process. We accidentally deleted the record’s key from amazon s3. After specs and debugging we, came up with a solution to recover these deleted files from amazon s3. We created a new rake task for recovering the deleted files from s3 by deleting the latest delete markers version for the key from s3. And all files were successfully recovered and working fine now on production.</li>
<li>After three weeks, we encountered another problem in production: some of our users reported that they were having trouble uploading a resume. After investigation and analysis, we figured out that around one thousand resume records each had two ActiveStorage attachments in the ActiveStorage tables, whereas ActiveStorage works on the principle that, for a <code>has_one_attached</code> relationship, one ActiveRecord object has exactly one ActiveStorage attachment. During the investigation, one more problem surfaced: our database held around three thousand ActiveStorage attachments whose resume ActiveRecord objects were missing. After deep-diving into the codebase, we traced this to our daily cron job, which deletes all inactive users from our database. For the past three weeks, this job had been deleting the ActiveRecord objects but not their ActiveStorage attachments. As a fix, we first stopped inactive users from uploading attachments before activating their accounts, and updated the cron job to delete all the ActiveStorage attachments associated with an ActiveRecord object whenever it is deleted. Then, to bring the number of ActiveStorage attachments for resumes back in line with the number of ActiveRecord objects, we created three rake tasks. The first removes all attachments except the latest one from the ActiveStorage tables for any ActiveRecord object with more than one attachment. The second finds all ActiveStorage attachments of type resume that have no matching record in the resume table, and saves the ActiveStorage attachment ID and resume ID to a CSV. The third processes the CSV generated by the second task and deletes the associated records from the ActiveStorage tables. It took around 15-20 minutes to run all three rake tasks. As a result, the ActiveRecord and ActiveStorage counts matched. It is now running fine in production, and we have not received any further queries.</li>
</ul>
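<p>The three cleanup rake tasks described in the last item reduce to simple collection logic. The following is a minimal sketch in plain Ruby, not the actual rake tasks: <code>Attachment</code> is a hypothetical stand-in for a row of the ActiveStorage attachments table, and in-memory arrays stand in for the database tables and the CSV file.</p>

```ruby
# Hypothetical stand-in for a row in the ActiveStorage attachments table.
Attachment = Struct.new(:id, :record_id, :created_at)

# Task 1: for records with several attachments, collect every attachment
# except the newest one (these are the ones to delete).
def stale_duplicates(attachments)
  attachments.group_by(&:record_id).flat_map do |_record_id, group|
    group.sort_by(&:created_at)[0...-1]
  end
end

# Task 2: list [attachment_id, record_id] rows for attachments whose
# record no longer exists (the rows that would be written to the CSV).
def orphan_rows(attachments, existing_record_ids)
  attachments
    .reject { |a| existing_record_ids.include?(a.record_id) }
    .map { |a| [a.id, a.record_id] }
end

# Task 3: drop every attachment whose id appears in the given rows.
def delete_listed(attachments, rows)
  ids = rows.map(&:first)
  attachments.reject { |a| ids.include?(a.id) }
end

attachments = [
  Attachment.new(1, 10, 1), # older duplicate for resume 10
  Attachment.new(2, 10, 2), # latest attachment for resume 10
  Attachment.new(3, 20, 1), # resume 20 was removed by the cron job
  Attachment.new(4, 30, 1)
]
resume_ids = [10, 30]

duplicates = stale_duplicates(attachments)
attachments -= duplicates
rows = orphan_rows(attachments, resume_ids)
attachments = delete_listed(attachments, rows)

remaining_ids = attachments.map(&:id)
puts remaining_ids.inspect
```

<p>Keeping the second and third steps separate, with an intermediate list in between, leaves room to review what is about to be deleted before anything is removed.</p>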
<h3 id="conclusion"><strong>Conclusion</strong></h3>
<p>ActiveStorage has now been in production for over a week, and it’s been seamless. It provided us with everything we needed, though there are certainly more things that need to evolve, such as validations for attachments and support for a directory structure in the ActiveStorage blob key. We look forward to seeing ActiveStorage evolve. This concludes our journey of migrating from Paperclip to ActiveStorage.</p>
<p><a href="/technology/migration-from-paperclip-to-activestorage/">Migration from Paperclip to ActiveStorage</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on May 21, 2021.</p>/technology/revamp-of-our-coding-platform2021-05-17 20:30:00 +0530T00:00:00-00:002021-05-17T00:00:00+05:30eLitmus.comsite-admin@elitmus.com
<p><strong><em>The story of how we took a good platform and made it even better</em></strong></p>
<p>Before I start telling you this story, I want to just make this clear that this is not filled with technical details of our implementation but rather with the thought process and the journey of redeveloping our coding platform. I will definitely share my learnings and some technical details of this whole endeavour in later blog posts.</p>
<p>So, this story starts in October 2019, with me looking at a web application that I had inherited from previous developers at the organisation I had joined 6 months earlier, and thinking to myself, “Here we have a perfectly functional web application that does its job pretty well, so why does it feel so underwhelming and out of place on the modern web?”</p>
<h3 id="realization">Realization</h3>
<p>What I realized after 2 days of pondering this topic was that, with the way web applications and their popularity have been growing in our times, the UX had become as important as the function of the web application.</p>
<p>If that was not clear, then let me explain further. In broad terms, we can break down the components of a web application into 2 areas -</p>
<ul>
<li>Back End or Server Side components</li>
<li>Front End or User Side components</li>
</ul>
<p>The <strong>Back End</strong> controls the functions of the web application, what it can do and how efficiently it can do that task.</p>
<p>While the <strong>Front End</strong> dictates the interaction between the user and the application.</p>
<p>Now, both of these components need to be as good as the other one to ensure that your web application provides a seamless experience. In the case of our coding platform, this was not true, as we had a great <strong>Back End</strong> implementation but the <strong>Front End</strong> felt like it was still stuck in the early 2010s.</p>
<p><img src="/blog/images/revamp-of-our-coding-platform/codelitmus_old_dashboard.png" alt="Codelitmus Old Dashboard" />
<img src="/blog/images/revamp-of-our-coding-platform/codelitmus_old_pl.png" alt="Codelitmus Old Dashboard" />
<img src="/blog/images/revamp-of-our-coding-platform/codelitmus_old_editor.png" alt="Codelitmus Old Dashboard" /></p>
<h3 id="identify-the-issue">Identify the Issue</h3>
<p>I knew I wanted to change this platform, but it was important to focus on a few specifics instead of getting bogged down by all the things I wanted to improve. So, I sat down with my colleague <a href="https://www.shubhampandey.in">Shubham Pandey</a> (Please do check out his blog and website. He has some amazing stuff on there) and we tried to categorise the problems in the platform under a few broad umbrellas.</p>
<p><strong>Experience</strong> - We used this category for all the problems that inconvenienced the user while using our platform.
Some of the problems we put under this category were the user not being able to see the list of problems while coding, the user’s event time starting before they could see the editor, not being able to see the result and the problem statement at the same time, and a few more similar problems.</p>
<p><strong>Interface</strong> - We brought all the issues regarding design, layouts and colours on the platform under this category.
These were problems like the text being too small in some places, buttons not being of a standard size, the event timer not being an eye-catching feature of the design, and again a few more similar problems.</p>
<p>The actual list was a lot longer than what is mentioned here but, importantly, all of the problems came under these two broad categories.</p>
<h3 id="setting-objectives">Setting Objectives</h3>
<p>Now that we had our problems well defined, we could move on to coming up with a plan of action to solve them. To arrive at a set of objectives, we started thinking like a user with minimal technical background.</p>
<p>One of the biggest issues we noticed was the number of clicks that a user had to go through to reach the problem and start coding. On the old platform, a user had to go through the following steps to start their event:</p>
<p><code>=> Login</code></p>
<p><code>=> Find event on dashboard</code></p>
<p><code>=> Click on "Load Challenge"</code></p>
<p><code>=> Find/Select a problem from the list</code></p>
<p><code>=> Click on "Start"</code></p>
<p><code>=> Start Coding</code></p>
<p>This was a lot of clicks to start an event on a platform dedicated to hosting coding events, and we needed to reduce this as every click meant a complete page reload.</p>
<blockquote>
<p><strong>Objective 1:</strong> Reduce the number of clicks a user needs to reach the Coding Test</p>
</blockquote>
<blockquote>
<p><strong>Objective 2:</strong> Minimise Page Reloads</p>
</blockquote>
<p>Another issue was the dated look and feel of the UI. It did not feel slick or intuitive. This might have been a very good UI by 2012 standards but for 2019 it was not up to the mark.</p>
<blockquote>
<p><strong>Objective 3:</strong> Modernise the UI</p>
</blockquote>
<p>We found another issue with the editor we were using on the platform. The older platform used Codemirror which, although a good editor, had a few problems holding it back: the library was huge, we had to load multiple script tags to access the full set of features, a few editing options were missing, and so on.</p>
<p><strong>P.S.:</strong> The recent Codemirror 6 updates solved some of these problems, but at that time there was no confirmation that this would be the case.</p>
<blockquote>
<p><strong>Objective 4:</strong> Use a featureful coding editor with long term support</p>
</blockquote>
<p>So, these were the 4 objectives that we set out to achieve in the first version of our new coding platform. Even though this was technically an overhaul of an existing project, we started calling it “new” so that we would think about solutions from scratch instead of just updating a few things and complicating the code base and the project even more.</p>
<h3 id="plan-of-action-and-execution">Plan of Action and Execution</h3>
<p>To achieve our 4 objectives, we selected the following libraries and plugins. I will briefly explain why we opted for each of them:</p>
<ol>
<li>REACT</li>
<li>Bootstrap</li>
<li>Monaco Editor</li>
</ol>
<p>To achieve <strong>Objectives 1 & 2</strong>, we decided that we had to change the flow of the user journey on the platform.
This was the only way that we could reduce the number of clicks on the platform. For minimising page reloads, REACT came to our rescue.</p>
<p>REACT allowed us to develop what is called a SPA (Single Page Application) quite easily and without much hassle.
I will explain the specific use cases and advantages of a SPA in a future blog post.
An added benefit was that REACT integrated quite simply with our existing application, which is a Ruby on Rails based web application. We integrated REACT into our Rails 5.2 application using the webpacker gem.
Since Rails 6, the webpacker gem comes as standard with Rails, so using REACT as the front end for a Rails application has become easier.</p>
<p>Bootstrap is a very popular library that makes developing beautiful UIs very simple with its plethora of classes and functionality that it offers. So, that was a very obvious choice to achieve <strong>Objective 3</strong>.</p>
<p>And lastly, Monaco Editor is also a very popular and well-supported coding editor. It is officially maintained by Microsoft and contains a lot of the features that the Visual Studio Code editor provides on a desktop. That made it an obvious choice when we were deciding on an editor for our platform to achieve <strong>Objective 4</strong>.</p>
<p>Now you can check out the redeveloped platform and see how we executed our plan.</p>
<p><img src="/blog/images/revamp-of-our-coding-platform/codelitmus_new_dashboard.png" alt="Codelitmus New Dashboard" />
<img src="/blog/images/revamp-of-our-coding-platform/codelitmus_new_editor_pl.png" alt="Codelitmus New Dashboard" />
<img src="/blog/images/revamp-of-our-coding-platform/codelitmus_new_editor_pd.png" alt="Codelitmus New Dashboard" /></p>
<p>Remember the number of clicks the older platform required to get to the actual coding? On the new platform, that has been reduced to the following:</p>
<p><code>=> Login</code></p>
<p><code>=> Find event on dashboard</code></p>
<p><code>=> Click on "Load Challenge"</code></p>
<p><code>=> Start Event</code></p>
<p>That’s it. Everything was compressed into a single page to provide a more intuitive and easy to use coding platform that would allow the candidate to focus on coding more than worrying about other things. We tried to make everything else like time, problem list, result etc. available at a glance whenever the candidate needs it.</p>
<p>And if you are wondering, we did add a “Dark Mode” as well, which has become quite the rage in modern web design. Notice the sun and moon icons on the right edge of the top bar, which denote the Light and Dark Modes respectively.</p>
<p><img src="/blog/images/revamp-of-our-coding-platform/codelitmus_new_dashboard_dark.png" alt="Codelitmus New Dashboard Dark" />
<img src="/blog/images/revamp-of-our-coding-platform/codelitmus_new_editor_pl_dark.png" alt="Codelitmus New Dashboard Dark" />
<img src="/blog/images/revamp-of-our-coding-platform/codelitmus_new_editor_pd_dark.png" alt="Codelitmus New Dashboard Dark" /></p>
<p>So, that was the story of how we did a complete overhaul of our coding platform to make it fit for the modern web.
It took us about 2 months to complete this project, from coming up with the concept, finalising the technical specifications, development, testing and finally deployment.</p>
<p>The process that we followed is what I still use whenever I have to come up with a solution to any problem. That is probably the biggest learning that I took from this project along with learning REACT and developing Single Page Applications that I use quite a lot now.</p>
<blockquote>
<p>You can find this article on the author’s blog <a href="https://piyushswain.github.io/blog">piyushswain.github.io</a> as well.</p>
</blockquote>
<p><a href="/technology/revamp-of-our-coding-platform/">Revamp of our Coding Platform</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on May 17, 2021.</p>/technology/migrating-from-state-machine-to-aasm-in-rails2018-06-29 16:29:40 +0530T00:00:00-00:002018-06-29T00:00:00+05:30eLitmus.comsite-admin@elitmus.com
<p><strong><em>First things first. State machines are awesome, be it any part of technology you use them in.</em></strong></p>
<p>Recently at work, we went through many pipelines while migrating a very large Rails app from Rails 4 to Rails 5. One of the major parts of this change was shifting from <code>state_machine</code> to <code>aasm</code> for our state transitions. We rely heavily on state machines for how our instances shift states. Many of the tasks associated with our models are also integrated with the after/before actions of our state machines.</p>
<p><img src="/blog/images/aasm_migration/state_machine_diagram.png" alt="Generated using https://github.com/Katee/aasm-diagram" /></p>
<h3 id="need-for-transition">Need for transition:</h3>
<p>One and only one reason: <a href="https://github.com/pluginaweek/state_machine"><code>state_machine</code></a> has been dead for quite some time. We shifted from Rails 3.2 to Rails 4.2 last year, and since that was a really, really painful migration, we fixed our focus on the changed syntax and <code>ActiveJob</code>, found the much-famed <a href="https://github.com/pluginaweek/state_machine/issues/334#issuecomment-68168119">monkeypatch</a> for Rails 4.2 and stayed happy with state_machine for the time being. Though there is <a href="https://github.com/state-machines/state_machines-activerecord">state_machines-activerecord</a>, we wanted to move to a more reliable and tested library, and as we already use <a href="https://github.com/aasm/aasm">acts_as_state_machine</a>, or <code>aasm</code>, in one of our other projects, we gave it a shot when we began our Rails 5 voyage, for which, of course, state_machine and its patch neither worked nor were recommended.</p>
<h3 id="what-changed">What changed:</h3>
<p>As it turned out, the process was not too messy. After a short study of the way both state_machine and aasm handle state transitions, one can easily find the analogy between them. Here are a few things that are usually part of a state_machine-laden project, and how they should be modified to work with aasm.</p>
<p><strong>1. The gem itself</strong></p>
<p>It goes without saying: remove this from your <code>Gemfile/gems.rb</code>:</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="n">gem</span> <span class="s1">'state_machine'</span></code></pre></figure>
<p>and add :</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="n">gem</span> <span class="s1">'aasm'</span></code></pre></figure>
<p><strong>2. Get rid of the state_machine monkey-patch if present</strong></p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="k">module</span> <span class="nn">StateMachine</span>
<span class="k">module</span> <span class="nn">Integrations</span>
<span class="k">module</span> <span class="nn">ActiveModel</span>
<span class="kp">public</span> <span class="ss">:around_validation</span>
<span class="k">end</span>
<span class="k">module</span> <span class="nn">ActiveRecord</span>
<span class="kp">public</span> <span class="ss">:around_save</span>
<span class="k">def</span> <span class="nf">define_state_initializer</span>
<span class="n">define_helper</span> <span class="ss">:instance</span><span class="p">,</span> <span class="o"><<-</span><span class="dl">end_eval</span><span class="p">,</span> <span class="bp">__FILE__</span><span class="p">,</span> <span class="bp">__LINE__</span> <span class="o">+</span> <span class="mi">1</span>
<span class="sh"> def initialize(*)</span>
<span class="sh"> super do |*args|</span>
<span class="sh"> self.class.state_machines.initialize_states self</span>
<span class="sh"> yield(*args) if block_given?</span>
<span class="sh"> end</span>
<span class="sh"> end</span>
<span class="dl"> end_eval</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span></code></pre></figure>
<p>Yes, get rid of this if you have it, most probably in one of your <code>config/initializers</code>.</p>
<p><strong>3. Transitioning the transitions:</strong></p>
<p>This is the major part of the change and yet the easiest to implement. This includes code change in models. Take a look at the documentation over at aasm and start changing the code. Here are a few pointers.</p>
<p>Add <code>include AASM</code> to your model:</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="k">class</span> <span class="nc">Question</span> <span class="o"><</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span>
<span class="kp">include</span> <span class="no">AASM</span>
<span class="o">...</span>
<span class="k">end</span></code></pre></figure>
<p>Specify the name of the column on which you are observing state transitions. For example, if the column name is <code>state</code>:</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="k">class</span> <span class="nc">Question</span> <span class="o"><</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span>
<span class="kp">include</span> <span class="no">AASM</span>
<span class="o">...</span>
<span class="n">aasm</span><span class="o">.</span><span class="n">attribute_name</span> <span class="ss">:state</span>
<span class="o">...</span>
<span class="k">end</span></code></pre></figure>
<p>Start your state machine block by listing out all your states. The common way is to use one line to specify your initial state, and a second line to list all your non-initial states:</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="k">class</span> <span class="nc">Question</span> <span class="o"><</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span>
<span class="kp">include</span> <span class="no">AASM</span>
<span class="o">...</span>
<span class="n">aasm</span><span class="o">.</span><span class="n">attribute_name</span> <span class="ss">:state</span>
<span class="n">aasm</span> <span class="k">do</span>
<span class="n">state</span> <span class="ss">:authored</span><span class="p">,</span> <span class="ss">initial</span><span class="p">:</span> <span class="kp">true</span>
<span class="n">state</span> <span class="ss">:piloted</span><span class="p">,</span> <span class="ss">:non_active</span><span class="p">,</span> <span class="ss">:active</span><span class="p">,</span> <span class="ss">:removed</span>
<span class="o">...</span>
<span class="k">end</span>
<span class="o">...</span>
<span class="k">end</span></code></pre></figure>
<p>Convert your events. All event blocks of the form <code>transition :a => :b</code> will be replaced by <code>transitions from: :a, to: :b</code>:</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="k">class</span> <span class="nc">Question</span> <span class="o"><</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span>
<span class="kp">include</span> <span class="no">AASM</span>
<span class="o">...</span>
<span class="c1"># State machine code</span>
<span class="n">state_machine</span> <span class="ss">:state</span><span class="p">,</span> <span class="ss">initial</span><span class="p">:</span> <span class="ss">:authored</span> <span class="k">do</span>
<span class="n">event</span> <span class="ss">:pilot</span> <span class="k">do</span>
<span class="n">transition</span> <span class="ss">:authored</span> <span class="o">=></span> <span class="ss">:piloted</span>
<span class="k">end</span>
<span class="n">event</span> <span class="ss">:activate</span> <span class="k">do</span>
<span class="n">transition</span> <span class="o">[</span><span class="ss">:piloted</span><span class="p">,</span> <span class="ss">:non_active</span><span class="o">]</span> <span class="o">=></span> <span class="ss">:active</span>
<span class="k">end</span>
<span class="o">..</span>
<span class="k">end</span>
<span class="c1"># AASM code</span>
<span class="n">aasm</span><span class="o">.</span><span class="n">attribute_name</span> <span class="ss">:state</span>
<span class="n">aasm</span> <span class="k">do</span>
<span class="n">state</span> <span class="ss">:authored</span><span class="p">,</span> <span class="ss">initial</span><span class="p">:</span> <span class="kp">true</span>
<span class="n">state</span> <span class="ss">:piloted</span><span class="p">,</span> <span class="ss">:non_active</span><span class="p">,</span> <span class="ss">:active</span><span class="p">,</span> <span class="ss">:removed</span>
<span class="n">event</span> <span class="ss">:pilot</span> <span class="k">do</span>
<span class="n">transitions</span> <span class="ss">from</span><span class="p">:</span> <span class="ss">:authored</span><span class="p">,</span> <span class="ss">to</span><span class="p">:</span> <span class="ss">:piloted</span>
<span class="k">end</span>
<span class="n">event</span> <span class="ss">:activate</span> <span class="k">do</span>
<span class="n">transitions</span> <span class="ss">from</span><span class="p">:</span> <span class="o">[</span><span class="ss">:piloted</span><span class="p">,</span> <span class="ss">:non_active</span><span class="o">]</span><span class="p">,</span> <span class="ss">to</span><span class="p">:</span> <span class="ss">:active</span>
<span class="k">end</span>
<span class="o">...</span>
<span class="k">end</span>
<span class="o">...</span>
<span class="k">end</span></code></pre></figure>
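<p>The rewrite rule is mechanical enough to express in a few lines. The helper below is purely illustrative and belongs to neither gem: <code>aasm_transition_lines</code> is a hypothetical name, and it only prints the AASM line that corresponds to each old-style <code>from => to</code> pair.</p>

```ruby
# Hypothetical helper (not part of state_machine or aasm): turn an
# old-style mapping of `from => to`, where `from` may be an array,
# into the text of the matching AASM `transitions` lines.
def aasm_transition_lines(mapping)
  mapping.map do |from, to|
    "transitions from: #{from.inspect}, to: #{to.inspect}"
  end
end

lines = aasm_transition_lines({
  :authored => :piloted,
  [:piloted, :non_active] => :active
})
puts lines
```

<p>Run as is, this prints one <code>transitions</code> line per pair, matching the two AASM events shown above.</p>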
<p>Callbacks like <code>before_transition</code> and <code>after_transition</code> from state_machine can be converted like this:</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="k">class</span> <span class="nc">Question</span> <span class="o"><</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span>
<span class="kp">include</span> <span class="no">AASM</span>
<span class="o">...</span>
<span class="c1"># State machine code</span>
<span class="n">state_machine</span> <span class="ss">:state</span><span class="p">,</span> <span class="ss">initial</span><span class="p">:</span> <span class="ss">:authored</span> <span class="k">do</span>
<span class="n">before_transition</span> <span class="ss">:authored</span> <span class="o">=></span> <span class="ss">:piloted</span><span class="p">,</span> <span class="ss">:do</span> <span class="o">=></span> <span class="ss">:prepare_cockpit</span>
<span class="n">after_transition</span> <span class="ss">:authored</span> <span class="o">=></span> <span class="ss">:piloted</span><span class="p">,</span> <span class="ss">:do</span> <span class="o">=></span> <span class="ss">:fly_the_plane</span>
<span class="n">event</span> <span class="ss">:pilot</span> <span class="k">do</span>
<span class="n">transition</span> <span class="ss">:authored</span> <span class="o">=></span> <span class="ss">:piloted</span>
<span class="k">end</span>
<span class="o">...</span>
<span class="k">end</span>
<span class="c1"># AASM code</span>
<span class="n">aasm</span><span class="o">.</span><span class="n">attribute_name</span> <span class="ss">:state</span>
<span class="n">aasm</span> <span class="k">do</span>
<span class="n">state</span> <span class="ss">:authored</span><span class="p">,</span> <span class="ss">initial</span><span class="p">:</span> <span class="kp">true</span>
<span class="n">state</span> <span class="ss">:piloted</span><span class="p">,</span> <span class="ss">:non_active</span><span class="p">,</span> <span class="ss">:active</span><span class="p">,</span> <span class="ss">:removed</span>
<span class="n">event</span> <span class="ss">:pilot</span> <span class="k">do</span>
<span class="n">before</span> <span class="k">do</span>
<span class="n">prepare_cockpit</span>
<span class="k">end</span>
<span class="n">transitions</span> <span class="ss">from</span><span class="p">:</span> <span class="ss">:authored</span><span class="p">,</span> <span class="ss">to</span><span class="p">:</span> <span class="ss">:piloted</span><span class="p">,</span> <span class="ss">after</span><span class="p">:</span> <span class="ss">:fly_the_plane</span>
<span class="k">end</span>
<span class="o">...</span>
<span class="k">end</span>
<span class="o">...</span>
<span class="k">def</span> <span class="nf">prepare_cockpit</span>
<span class="o">...</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">fly_the_plane</span>
<span class="o">...</span>
<span class="k">end</span>
<span class="k">end</span></code></pre></figure>
<p>However, if a callback applies to only some of the transitions defined inside an event, those transitions need to be defined separately:</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="k">class</span> <span class="nc">Question</span> <span class="o"><</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span>
<span class="kp">include</span> <span class="no">AASM</span>
<span class="o">...</span>
<span class="c1"># State machine code</span>
<span class="n">state_machine</span> <span class="ss">:state</span><span class="p">,</span> <span class="ss">initial</span><span class="p">:</span> <span class="ss">:authored</span> <span class="k">do</span>
<span class="n">after_transition</span> <span class="ss">:authored</span> <span class="o">=></span> <span class="ss">:piloted</span><span class="p">,</span> <span class="ss">:do</span> <span class="o">=></span> <span class="ss">:fly</span>
<span class="n">event</span> <span class="ss">:pilot</span> <span class="k">do</span>
<span class="n">transition</span> <span class="o">[</span><span class="ss">:non_active</span><span class="p">,</span> <span class="ss">:authored</span><span class="o">]</span> <span class="o">=></span> <span class="ss">:piloted</span>
<span class="k">end</span>
<span class="o">...</span>
<span class="k">end</span>
<span class="c1"># AASM code</span>
<span class="n">aasm</span><span class="o">.</span><span class="n">attribute_name</span> <span class="ss">:state</span>
<span class="n">aasm</span> <span class="k">do</span>
<span class="n">state</span> <span class="ss">:authored</span><span class="p">,</span> <span class="ss">initial</span><span class="p">:</span> <span class="kp">true</span>
<span class="n">state</span> <span class="ss">:piloted</span><span class="p">,</span> <span class="ss">:non_active</span><span class="p">,</span> <span class="ss">:active</span><span class="p">,</span> <span class="ss">:removed</span>
<span class="n">event</span> <span class="ss">:pilot</span> <span class="k">do</span>
<span class="n">transitions</span> <span class="ss">from</span><span class="p">:</span> <span class="ss">:authored</span><span class="p">,</span> <span class="ss">to</span><span class="p">:</span> <span class="ss">:piloted</span><span class="p">,</span> <span class="ss">after</span><span class="p">:</span> <span class="ss">:fly</span>
<span class="n">transitions</span> <span class="ss">from</span><span class="p">:</span> <span class="ss">:non_active</span><span class="p">,</span> <span class="ss">to</span><span class="p">:</span> <span class="ss">:piloted</span>
<span class="k">end</span>
<span class="o">...</span>
<span class="k">end</span>
<span class="o">...</span>
<span class="k">def</span> <span class="nf">fly</span>
<span class="o">...</span>
<span class="k">end</span>
<span class="k">end</span></code></pre></figure>
<p><code>if</code> and <code>unless</code> conditions on transitions work the same way as in state_machine, and can also be replaced with a <code>guard</code> clause. Guards and callbacks can take arguments and can be given as a <code>lambda</code> or a <code>Proc</code>, just like state_machine guards:</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="k">class</span> <span class="nc">Question</span> <span class="o"><</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span>
<span class="kp">include</span> <span class="no">AASM</span>
<span class="o">...</span>
<span class="c1"># State machine code</span>
<span class="n">state_machine</span> <span class="ss">:state</span><span class="p">,</span> <span class="ss">initial</span><span class="p">:</span> <span class="ss">:authored</span> <span class="k">do</span>
<span class="n">event</span> <span class="ss">:pilot</span> <span class="k">do</span>
<span class="n">transition</span> <span class="ss">:authored</span> <span class="o">=></span> <span class="ss">:piloted</span><span class="p">,</span> <span class="k">if</span><span class="p">:</span> <span class="ss">:can_fly?</span>
<span class="k">end</span>
<span class="o">...</span>
<span class="k">end</span>
<span class="c1"># AASM code</span>
<span class="n">aasm</span><span class="o">.</span><span class="n">attribute_name</span> <span class="ss">:state</span>
<span class="n">aasm</span> <span class="k">do</span>
<span class="n">state</span> <span class="ss">:authored</span><span class="p">,</span> <span class="ss">initial</span><span class="p">:</span> <span class="kp">true</span>
<span class="n">state</span> <span class="ss">:piloted</span><span class="p">,</span> <span class="ss">:non_active</span><span class="p">,</span> <span class="ss">:active</span><span class="p">,</span> <span class="ss">:removed</span>
<span class="n">event</span> <span class="ss">:pilot</span> <span class="k">do</span>
<span class="n">transitions</span> <span class="ss">from</span><span class="p">:</span> <span class="ss">:authored</span><span class="p">,</span> <span class="ss">to</span><span class="p">:</span> <span class="ss">:piloted</span><span class="p">,</span> <span class="ss">guard</span><span class="p">:</span> <span class="ss">:can_fly?</span>
<span class="k">end</span>
<span class="o">...</span>
<span class="k">end</span>
<span class="o">...</span>
<span class="k">def</span> <span class="nf">can_fly?</span>
<span class="o">...</span>
<span class="k">end</span>
<span class="k">end</span></code></pre></figure>
<p>Yes, that’s it for the models. You can take a detailed look at the docs if you have more complex needs.</p>
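<p>To make the behaviour of guards and per-transition callbacks concrete, here is a toy model in plain Ruby. It is not AASM and does not use the gem: <code>ToyQuestion</code> and <code>Transition</code> are hypothetical names, and the sketch assumes only one rule, that an event fires the first transition whose <code>from</code> state matches and whose guard passes. Unlike AASM, which can raise an error on an invalid transition, this toy simply returns <code>false</code>, and it ignores AASM’s richer callback ordering.</p>

```ruby
# Toy model only: NOT AASM's implementation.
Transition = Struct.new(:from, :to, :guard, :after)

class ToyQuestion
  attr_reader :state, :log

  # Transitions of a single `pilot` event, tried in order.
  PILOT = [
    Transition.new(:authored, :piloted, :can_fly?, :fly),
    Transition.new(:non_active, :piloted, nil, nil)
  ].freeze

  def initialize(state, can_fly: true)
    @state = state
    @can_fly = can_fly
    @log = []
  end

  # Fire the first transition that matches the current state and
  # whose guard (if any) returns true.
  def pilot
    match = PILOT.find do |t|
      t.from == state && (t.guard.nil? || send(t.guard))
    end
    return false unless match

    send(match.after) if match.after
    @state = match.to
    true
  end

  private

  def can_fly?
    @can_fly
  end

  def fly
    @log << :fly
  end
end

a = ToyQuestion.new(:authored)
a.pilot # first transition fires and runs its `after` callback

b = ToyQuestion.new(:non_active)
b.pilot # second transition fires, no callback attached

c = ToyQuestion.new(:authored, can_fly: false)
blocked = c.pilot # guard fails, state is unchanged
```

<p>The point of the sketch is that the callback belongs to one transition, not to the event, so it only runs when that specific transition is the one taken.</p>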
<p><strong>4. The helpers:</strong></p>
<p>One plus point for <code>state_machine</code>: it has (or had) a variety of useful helpers for working with states and events in views and controllers. <code>aasm</code>, though lagging a little in this area, still has a good pool of helpers, at both the <code>class</code> and the <code>instance</code> level. Here are some pointers.</p>
<ul>
<li><code>Question.aasm.states</code> will give you an object list of all states available for the <code>Question</code> model</li>
<li><code>Question.aasm.events</code> will give you an object list of all events available for the <code>Question</code> model</li>
<li><code>Question.first.aasm.states(permitted: true)</code> will give an object list of the states that a <code>Question</code> object, in this case the first one, can transition to from its current state. Without the <code>permitted: true</code> option, the instance-level call returns all states.</li>
<li><code>Question.first.aasm.events</code> will give an object list of all events that can be applied on the current state of the <code>Question</code> object, i.e. the first one.</li>
<li>All of the above helpers return a list of objects that each respond to <code>name</code>, so appending <code>.map(&:name)</code> gives an array of symbols, which comes in handy for drop-downs. E.g.:</li>
</ul>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="n">pry</span><span class="p">(</span><span class="n">main</span><span class="p">)</span><span class="o">></span> <span class="no">Question</span><span class="o">.</span><span class="n">last</span><span class="o">.</span><span class="n">aasm</span><span class="o">.</span><span class="n">events</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="o">&</span><span class="ss">:name</span><span class="p">)</span>
<span class="o">=></span> <span class="o">[</span><span class="ss">:pilot</span><span class="p">,</span> <span class="ss">:deactivate</span><span class="o">]</span></code></pre></figure>
<p>Another great point in favor of <code>state_machine</code> is its <code>state_event</code> attribute on the instance. For example:</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="n">pry</span><span class="p">(</span><span class="n">main</span><span class="p">)</span><span class="o">></span> <span class="n">question</span> <span class="o">=</span> <span class="no">Question</span><span class="o">.</span><span class="n">first</span>
<span class="n">pry</span><span class="p">(</span><span class="n">main</span><span class="p">)</span><span class="o">></span> <span class="n">question</span><span class="o">.</span><span class="n">state_event</span> <span class="o">=</span> <span class="ss">:deactivate</span>
<span class="n">pry</span><span class="p">(</span><span class="n">main</span><span class="p">)</span><span class="o">></span> <span class="n">question</span><span class="o">.</span><span class="n">save</span></code></pre></figure>
<p>The code above will end up saving the question after calling the <code>deactivate</code> event on it. This attribute is highly useful in Rails forms, where one can simply pass the name of the event to fire, and the transition happens on save without extra hassle. Unfortunately, there’s no equivalent attribute-cum-method in <code>aasm</code>. But one can always write a common <code>ActiveRecord::Base</code> helper for the same.</p>
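<p>As a rough idea of what such a helper could look like, here is a minimal sketch. The module name <code>StateEventSupport</code> and the method <code>fire_state_event</code> are made up for illustration and are not part of <code>aasm</code>; the <code>FakeQuestion</code> class is a plain-Ruby stand-in for an aasm-backed model so that the snippet runs on its own.</p>

```ruby
# Hypothetical helper, not part of aasm: mimics state_machine's `state_event=`.
module StateEventSupport
  attr_reader :state_event

  # Accepts a symbol or a string (e.g. a form param); a blank value clears it.
  def state_event=(name)
    @state_event = name.to_s.empty? ? nil : name.to_sym
  end

  # Fires the pending event, if any, and clears it.
  # Returns false when the event is not available from the current state.
  def fire_state_event
    return true if state_event.nil?

    event = state_event
    @state_event = nil
    # aasm.events lists the events that apply to the current state
    return false unless aasm.events.map(&:name).include?(event)

    public_send(event) # the non-bang aasm event method; it does not save
    true
  end
end

# Plain-Ruby stand-in for an aasm-backed model (assumption, for illustration).
class FakeQuestion
  include StateEventSupport

  Event = Struct.new(:name)
  attr_reader :state

  def initialize
    @state = :active
  end

  # Stand-in for the aasm proxy: only `events` is needed by the helper.
  def aasm
    Struct.new(:events).new([Event.new(:deactivate)])
  end

  def deactivate
    @state = :non_active
  end
end

question = FakeQuestion.new
question.state_event = 'deactivate'
question.fire_state_event
puts question.state # prints "non_active"
```

<p>In a real model, one option would be to include the module and register <code>fire_state_event</code> as a <code>before_validation</code> callback, so that assigning <code>state_event</code> from a form and then calling <code>save</code> behaves much like it did with <code>state_machine</code>.</p>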
<p>On another note, the not-so-good-looking <code>with_state</code> / <code>with_states</code> scope methods of <code>state_machine</code> can be replaced by the enum-like scopes that <code>aasm</code> defines for each state. For example:</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="no">Question</span><span class="o">.</span><span class="n">with_state</span><span class="p">(</span><span class="ss">:active</span><span class="p">)</span> <span class="c1"># state_machine</span></code></pre></figure>
<p>gets replaced by the much cleaner:</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span> <span class="no">Question</span><span class="o">.</span><span class="n">active</span></code></pre></figure>
<p>So yes, with a couple of tweaks here and there, and a good pool of existing test cases that run green, you are done and production ready. This will get you started, but do back yourself up with the aasm docs.</p>
<p><a href="/technology/migrating-from-state-machine-to-aasm-in-rails/">Migrating from state_machine to aasm in Rails</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on June 29, 2018.</p>/technology/android-versioning-using-docker-and-git-like-a-pro2018-06-10 12:25:27 +0530T00:00:00-00:002018-06-10T00:00:00+05:30eLitmus.comsite-admin@elitmus.com
<p>Unlike the web, Android still lacks easy version deployments, especially when you don’t want to use the Play Store.</p>
<h3 id="introduction">Introduction</h3>
<p>There will be five stages:</p>
<ol>
<li>Signing the application.</li>
<li>Versioning the application. For that we are going to use the git revision and a Major.Minor.Patch naming convention.</li>
<li>Building the application inside Docker, so that the build environment doesn’t change.</li>
<li>Pushing the new release to S3, while keeping the previous versions.</li>
<li>Pushing a new tag to git with the new version, so that we have a tag for each version.</li>
</ol>
<p>Basically, we are going to use docker, git, and some simple hacks to put things to work. At the end, I’ve shared a sample application.</p>
<h3 id="stage-1-signing-our-application"><em>Stage 1</em>: Signing Our Application</h3>
<p>It’s better to start thinking about security right from the big bang.
From Android Studio, you can generate a new keystore, a jks file. <a href="https://developer.android.com/studio/publish/app-signing">Help?</a>
Copy the keystore details into a <em>config.yaml</em> file like the one below:</p>
<figure class="highlight"><pre><code class="language-yaml" data-lang="yaml"><span></span><span class="nt">key_store</span><span class="p">:</span>
<span class="nt">key</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">/xyz/xyz.jks</span>
<span class="nt">alias</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">key0</span>
<span class="nt">store_password</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">wuhoo</span>
<span class="nt">key_password</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">nibataunga</span></code></pre></figure>
<p>Studio will take care of signing, but to generate a signed apk from the command line, you’ll need to make some changes in your build.gradle. The credentials we put in the yaml file above will be passed as command line args to gradle (see the Build Stage, Stage 3).</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><span></span><span class="n">android</span> <span class="o">{</span>
<span class="o">...</span>
<span class="n">signingConfigs</span> <span class="o">{</span>
<span class="n">release</span> <span class="o">{</span>
<span class="k">if</span> <span class="o">(</span><span class="n">project</span><span class="o">.</span><span class="na">hasProperty</span><span class="o">(</span><span class="s1">'APP_RELEASE_STORE_FILE'</span><span class="o">))</span> <span class="o">{</span>
<span class="n">storeFile</span> <span class="nf">file</span><span class="o">(</span><span class="s2">"$APP_RELEASE_STORE_FILE"</span><span class="o">)</span>
<span class="n">storePassword</span> <span class="s2">"$APP_RELEASE_STORE_PASSWORD"</span>
<span class="n">keyAlias</span> <span class="s2">"$APP_RELEASE_KEY_ALIAS"</span>
<span class="n">keyPassword</span> <span class="s2">"$APP_RELEASE_KEY_PASSWORD"</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="n">buildTypes</span> <span class="o">{</span>
<span class="n">release</span> <span class="o">{</span>
<span class="o">...</span>
<span class="k">if</span> <span class="o">(</span><span class="n">project</span><span class="o">.</span><span class="na">hasProperty</span><span class="o">(</span><span class="s1">'APP_RELEASE_STORE_FILE'</span><span class="o">))</span> <span class="o">{</span>
<span class="n">signingConfig</span> <span class="n">signingConfigs</span><span class="o">.</span><span class="na">release</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<h3 id="stage-2-release-versioning-digging-git"><em>Stage 2</em>: Release Versioning, Digging Git</h3>
<p>I am using a scheme based on <a href="https://semver.org/">semantic versioning</a> here:</p>
<p>Major.Minor.<em>GitRevision</em>.Patch</p>
<p>Let’s dig into GitRevision.
It counts the commits on the first-parent history of master, so you get an incrementing value every time you release a new version. GitRevision makes versioning easy and consistent.</p>
<p>We’ll put the code below in build.gradle[app]:</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><span></span><span class="kt">def</span> <span class="n">getGitRevision</span> <span class="o">=</span> <span class="o">{</span> <span class="o">-></span>
<span class="k">try</span> <span class="o">{</span>
<span class="kt">def</span> <span class="n">stdout</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ByteArrayOutputStream</span><span class="o">()</span>
<span class="n">exec</span> <span class="o">{</span>
<span class="n">standardOutput</span> <span class="o">=</span> <span class="n">stdout</span>
<span class="n">commandLine</span> <span class="s1">'git'</span><span class="o">,</span> <span class="s1">'rev-list'</span><span class="o">,</span> <span class="s1">'--first-parent'</span><span class="o">,</span> <span class="s1">'--count'</span><span class="o">,</span> <span class="s1">'master'</span>
<span class="o">}</span>
<span class="n">logger</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s2">"Building revision #"</span><span class="o">+</span><span class="n">stdout</span><span class="o">)</span>
<span class="k">return</span> <span class="n">stdout</span><span class="o">.</span><span class="na">toString</span><span class="o">(</span><span class="s2">"ASCII"</span><span class="o">).</span><span class="na">trim</span><span class="o">().</span><span class="na">toInteger</span><span class="o">()</span>
<span class="o">}</span>
<span class="k">catch</span> <span class="o">(</span><span class="n">Exception</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
<span class="n">e</span><span class="o">.</span><span class="na">printStackTrace</span><span class="o">();</span>
<span class="k">return</span> <span class="mi">0</span><span class="o">;</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
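<p>The release scripts later in this post are written in Ruby, so the same count may be handy there too. Below is a minimal sketch that mirrors the Gradle closure above, including its fallback to 0; the method name <code>git_revision</code> is only an illustration and is not taken from the sample application.</p>

```ruby
require 'open3'

# Counts first-parent commits on a branch, like the Gradle closure above.
# Returns 0 when git is not installed, the directory is not a repository,
# or the branch does not exist.
def git_revision(branch = 'master')
  stdout, _stderr, status = Open3.capture3(
    'git', 'rev-list', '--first-parent', '--count', branch
  )
  status.success? ? stdout.strip.to_i : 0
rescue SystemCallError
  # Raised, for example, when the git binary cannot be found
  0
end

puts "Building revision ##{git_revision}"
```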
<p>Then, in the <code>defaultConfig</code> block of build.gradle[app]:</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><span></span> <span class="n">defaultConfig</span> <span class="o">{</span>
<span class="o">...</span>
<span class="n">versionCode</span> <span class="o">=</span> <span class="mi">10000000</span><span class="o">*</span><span class="n">majorVersion</span><span class="o">+</span><span class="mi">10000</span><span class="o">*</span><span class="n">minorVersion</span> <span class="o">+</span> <span class="mi">10</span><span class="o">*</span><span class="n">revision</span>
<span class="n">versionName</span> <span class="o">=</span> <span class="s1">'v'</span> <span class="o">+</span> <span class="n">majorVersion</span> <span class="o">+</span> <span class="s1">'.'</span> <span class="o">+</span> <span class="n">minorVersion</span> <span class="o">+</span> <span class="s1">'.'</span> <span class="o">+</span> <span class="n">revision</span> <span class="o">+</span> <span class="n">patch</span>
<span class="o">}</span></code></pre></figure>
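<p>To get a feel for the numbers this produces, here is the same arithmetic as a small Ruby sketch. The sample values are made up, and <code>patch</code> is assumed to be a string suffix, since the snippet above appends it directly after the revision.</p>

```ruby
# Mirrors the versionCode formula from defaultConfig above.
def version_code(major, minor, revision)
  10_000_000 * major + 10_000 * minor + 10 * revision
end

# Mirrors the versionName concatenation; `patch` is assumed to be a
# string suffix because it is appended without a separator.
def version_name(major, minor, revision, patch)
  "v#{major}.#{minor}.#{revision}#{patch}"
end

puts version_code(1, 2, 345)       # 10_000_000 + 20_000 + 3_450 = 10023450
puts version_name(1, 2, 345, '.0') # v1.2.345.0

# 10 * revision reaches the minor slot once the revision hits 1000:
puts version_code(1, 2, 1000) == version_code(1, 3, 0) # true
```

<p>The last line shows a limit of this layout: with these multipliers, two different versions can share a <code>versionCode</code> once the commit count reaches 1000, so a long-lived repository may want larger multipliers.</p>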
<h3 id="docker-image-savage">Docker Image, Savage</h3>
<p>We first need to build a docker image with minimum libraries and dependencies required.</p>
<figure class="highlight"><pre><code class="language-docker" data-lang="docker"><span></span><span class="k">FROM</span> <span class="s">openjdk:8</span>
<span class="k">RUN</span> apt-get update
<span class="c"># WORKDIR persists across layers; a bare "RUN cd" only affects its own layer</span>
<span class="k">WORKDIR</span> /opt
<span class="k">RUN</span> wget -nc https://dl.google.com/android/repository/sdk-tools-linux-4333796.zip
<span class="k">ENV</span> ANDROID_HOME /opt/android-sdk-linux
<span class="k">RUN</span> mkdir -p <span class="si">${</span><span class="nv">ANDROID_HOME</span><span class="si">}</span>
<span class="k">RUN</span> unzip -n -d <span class="si">${</span><span class="nv">ANDROID_HOME</span><span class="si">}</span> sdk-tools-linux-4333796.zip
<span class="k">ENV</span> PATH <span class="si">${</span><span class="nv">PATH</span><span class="si">}</span>:<span class="si">${</span><span class="nv">ANDROID_HOME</span><span class="si">}</span>/tools:<span class="si">${</span><span class="nv">ANDROID_HOME</span><span class="si">}</span>/tools/bin:<span class="si">${</span><span class="nv">ANDROID_HOME</span><span class="si">}</span>/platform-tools
<span class="k">RUN</span> yes <span class="p">|</span> sdkmanager --licenses
<span class="k">RUN</span> yes <span class="p">|</span> sdkmanager <span class="se">\</span>
<span class="s2">"platform-tools"</span> <span class="se">\</span>
<span class="s2">"build-tools;27.0.3"</span> <span class="se">\</span>
<span class="s2">"platforms;android-27"</span>
<span class="k">RUN</span> apt-get -y install ruby
<span class="k">RUN</span> gem install trollop</code></pre></figure>
<p>Trollop is a command line option parser for Ruby. It helps our scripts handle the otherwise boring command line args.</p>
<p>We use openjdk as the base image for the Java environment and install the Android SDK platform and build tools for version 27. You can change that as needed.</p>
<h4 id="building-the-image">Building the image:</h4>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>docker build -t <span class="si">${</span><span class="nv">docker_image</span><span class="si">}</span> -f ./scripts/Dockerfile .</code></pre></figure>
<p>Or you can directly pull my latest base image.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>docker pull mukarramali98/androidbase</code></pre></figure>
<h3 id="docker-container-on-the-way">Docker container on the way</h3>
<p>To automate the process, let’s dig into a small script:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span><span class="ch">#!/usr/bin/env bash</span>
<span class="nb">set</span> -xeuo pipefail
<span class="nv">app_name</span><span class="o">=</span>xyz
<span class="nv">container_name</span><span class="o">=</span>androidcontainer
<span class="k">if</span> <span class="o">[</span> ! <span class="s2">"</span><span class="k">$(</span>docker ps -q -f <span class="nv">name</span><span class="o">=</span><span class="si">${</span><span class="nv">container_name</span><span class="si">}</span><span class="k">)</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span>
<span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="k">$(</span>docker ps -aq -f <span class="nv">status</span><span class="o">=</span>exited -f <span class="nv">name</span><span class="o">=</span><span class="si">${</span><span class="nv">container_name</span><span class="si">}</span><span class="k">)</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span>
<span class="c1"># cleanup</span>
docker rm <span class="nv">$container_name</span>
<span class="k">fi</span>
<span class="c1"># run your container</span>
docker run -v <span class="si">${</span><span class="nv">PWD</span><span class="si">}</span>:/<span class="si">${</span><span class="nv">app_name</span><span class="si">}</span>/ --name <span class="si">${</span><span class="nv">container_name</span><span class="si">}</span> -w /<span class="si">${</span><span class="nv">app_name</span><span class="si">}</span> -d -i -t mukarramali98/androidbase
<span class="k">fi</span>
docker <span class="nb">exec</span> <span class="si">${</span><span class="nv">container_name</span><span class="si">}</span> ruby /<span class="si">${</span><span class="nv">app_name</span><span class="si">}</span>/scripts/compile.rb -k /<span class="si">${</span><span class="nv">app_name</span><span class="si">}</span>/config.yaml</code></pre></figure>
<p>Here we first check whether the container is already running. If it isn’t, we remove any exited container with the same name and start a new one.
While creating the container, we <em>mount</em> our current project directory, so the next time we run this script, our updated project is already there in the container.</p>
<h3 id="stage-3-running-container-build-stage"><em>Stage 3</em>: Running container, <em>Build Stage</em></h3>
<p>We run the container with our compile script, passing it the signing config file we created earlier.</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span><span class="nb">require</span> <span class="s1">'yaml'</span>
<span class="n">config</span> <span class="o">=</span> <span class="no">YAML</span><span class="o">.</span><span class="n">load_file</span><span class="p">(</span><span class="n">key_config_file</span><span class="p">)</span>
<span class="n">key_store</span> <span class="o">=</span> <span class="n">config</span><span class="o">[</span><span class="s1">'key_store'</span><span class="o">]</span>
<span class="n">output_file</span> <span class="o">=</span> <span class="s1">'app/build/outputs/apk/release/app-release.apk'</span>
<span class="c1"># Remove any stale apk left over from a previous build</span>
<span class="no">File</span><span class="o">.</span><span class="n">delete</span><span class="p">(</span><span class="n">output_file</span><span class="p">)</span> <span class="k">if</span> <span class="no">File</span><span class="o">.</span><span class="n">exist?</span><span class="p">(</span><span class="n">output_file</span><span class="p">)</span>
<span class="nb">puts</span> <span class="sb">`</span><span class="si">#{</span><span class="no">File</span><span class="o">.</span><span class="n">dirname</span><span class="p">(</span><span class="bp">__FILE__</span><span class="p">)</span><span class="si">}</span><span class="sb">/../gradlew assembleRelease --stacktrace \</span>
<span class="sb"> -PAPP_RELEASE_STORE_FILE=</span><span class="si">#{</span><span class="n">key_store</span><span class="o">[</span><span class="s1">'key'</span><span class="o">]</span><span class="si">}</span><span class="sb"> \</span>
<span class="sb"> -PAPP_RELEASE_KEY_ALIAS=</span><span class="si">#{</span><span class="n">key_store</span><span class="o">[</span><span class="s1">'alias'</span><span class="o">]</span><span class="si">}</span><span class="sb"> \</span>
<span class="sb"> -PAPP_RELEASE_STORE_PASSWORD='</span><span class="si">#{</span><span class="n">key_store</span><span class="o">[</span><span class="s1">'store_password'</span><span class="o">]</span><span class="si">}</span><span class="sb">' \</span>
<span class="sb"> -PAPP_RELEASE_KEY_PASSWORD='</span><span class="si">#{</span><span class="n">key_store</span><span class="o">[</span><span class="s1">'key_password'</span><span class="o">]</span><span class="si">}</span><span class="sb">'`</span></code></pre></figure>
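<p>The snippet above takes <code>key_config_file</code> as given. Here is a minimal sketch of how the <code>-k</code> option could be read and how the gradle call could be assembled as an argument list, so that passwords with quotes or spaces need no shell escaping. It uses Ruby’s standard <code>OptionParser</code> in place of Trollop to stay dependency-free; the long option name <code>--key-config</code> and both method names are assumptions made for illustration.</p>

```ruby
require 'optparse'

# Reads the -k option that the run script passes to compile.rb.
# The long form --key-config is a hypothetical name.
def parse_options(argv)
  options = {}
  OptionParser.new do |opts|
    opts.on('-k', '--key-config FILE', 'Path to the signing config yaml') do |file|
      options[:key_config_file] = file
    end
  end.parse(argv)
  options
end

# Builds the gradle call as an argument list (no shell involved),
# using the same -P properties as the script above.
def gradle_command(gradlew, key_store)
  [
    gradlew, 'assembleRelease', '--stacktrace',
    "-PAPP_RELEASE_STORE_FILE=#{key_store['key']}",
    "-PAPP_RELEASE_KEY_ALIAS=#{key_store['alias']}",
    "-PAPP_RELEASE_STORE_PASSWORD=#{key_store['store_password']}",
    "-PAPP_RELEASE_KEY_PASSWORD=#{key_store['key_password']}"
  ]
end

options = parse_options(['-k', '/xyz/config.yaml'])
puts options[:key_config_file]

# Sample values taken from the config.yaml shown in Stage 1
key_store = {
  'key' => '/xyz/xyz.jks', 'alias' => 'key0',
  'store_password' => 'wuhoo', 'key_password' => 'nibataunga'
}
command = gradle_command('./gradlew', key_store)
puts command.first(3).join(' ')
# system(*command) would run the build and return false if it fails
```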
<h3 id="stage-4-pushing-to-s3"><em>Stage 4</em>: Pushing to S3</h3>
<p>So, now we have built a signed apk from a docker container. It’s time to push it.
Connect to your S3 bucket, generate a <em>$HOME/.s3cfg</em> file, and pass it to the Ruby script below:</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span><span class="k">if</span> <span class="no">File</span><span class="o">.</span><span class="n">file?</span><span class="p">(</span><span class="n">s3_config</span><span class="p">)</span>
<span class="c1"># Push the generated apk file with the app and version name</span>
<span class="sb">`s3cmd put app/build/outputs/apk/release/app-release.apk s3://</span><span class="si">#{</span><span class="n">bucket</span><span class="si">}</span><span class="sb">/</span><span class="si">#{</span><span class="n">app_name</span><span class="si">}</span><span class="sb">-</span><span class="si">#{</span><span class="n">version_name</span><span class="si">}</span><span class="sb">.apk -m application/vnd.android.package-archive -f -P -c </span><span class="si">#{</span><span class="n">s3_config</span><span class="si">}</span><span class="sb">`</span>
<span class="c1"># application/vnd.android.package-archive is an apk file format descriptor</span>
<span class="c1"># Replace the previous production file</span>
<span class="sb">`s3cmd put app/build/outputs/apk/release/app-release.apk s3://</span><span class="si">#{</span><span class="n">bucket</span><span class="si">}</span><span class="sb">/</span><span class="si">#{</span><span class="n">app_name</span><span class="si">}</span><span class="sb">.apk -m application/vnd.android.package-archive -f -P -c </span><span class="si">#{</span><span class="n">s3_config</span><span class="si">}</span><span class="sb">`</span>
<span class="c1"># To keep the track of latest release</span>
<span class="sb">`echo </span><span class="si">#{</span><span class="n">version_code</span><span class="si">}</span><span class="sb"> > latest_version.txt`</span>
<span class="sb">`s3cmd put latest_version.txt s3://</span><span class="si">#{</span><span class="n">bucket</span><span class="si">}</span><span class="sb">/latest_version.txt -f -P -c </span><span class="si">#{</span><span class="n">s3_config</span><span class="si">}</span><span class="sb">`</span>
<span class="sb">`rm latest_version.txt`</span>
<span class="nb">puts</span> <span class="s2">"Successfully released new app version."</span>
<span class="k">end</span></code></pre></figure>
<p><code>application/vnd.android.package-archive</code> is the apk file type descriptor.</p>
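<p>Backticks do not stop the script when an upload fails, so the success message can be printed even if nothing reached the bucket. The sketch below shows one way to build the same <code>s3cmd put</code> calls as argument lists and stop at the first failure. The method names, the bucket name and the paths are placeholders; only the flags already used above appear.</p>

```ruby
# Builds an `s3cmd put` argument list with the flags from the script above.
def s3_put_command(local_file, bucket, key, s3_config, mime_type: nil)
  command = ['s3cmd', 'put', local_file, "s3://#{bucket}/#{key}"]
  command += ['-m', mime_type] if mime_type
  command + ['-f', '-P', '-c', s3_config]
end

# Runs the commands in order and stops at the first one that fails.
# `runner` is injectable so the sketch can be tried without s3cmd.
def run_all(commands, runner: ->(cmd) { system(*cmd) })
  commands.all? { |cmd| runner.call(cmd) }
end

apk = 'app/build/outputs/apk/release/app-release.apk'
mime = 'application/vnd.android.package-archive'

# 'my-bucket', the key names and the config path are placeholders
commands = [
  s3_put_command(apk, 'my-bucket', 'xyz-v1.2.345.0.apk', '/root/.s3cfg', mime_type: mime),
  s3_put_command(apk, 'my-bucket', 'xyz.apk', '/root/.s3cfg', mime_type: mime)
]

# Dry run: print each command instead of executing it
released = run_all(commands, runner: ->(cmd) { puts cmd.join(' '); true })
puts 'Successfully released new app version.' if released
```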
<h3 id="stage-5-finally-git-tagging-the-new-release-version-hashtag"><em>Stage 5</em>: Finally, Git Tagging The New Release Version, <em>#hashtag</em></h3>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span></span><span class="k">def</span> <span class="nf">push_new_tag</span> <span class="n">version_name</span>
<span class="sb">`git tag </span><span class="si">#{</span><span class="n">version_name</span><span class="si">}</span><span class="sb">`</span>
<span class="sb">`git push origin </span><span class="si">#{</span><span class="n">version_name</span><span class="si">}</span><span class="sb">`</span>
<span class="nb">puts</span> <span class="s2">"New tag pushed to repo."</span>
<span class="k">end</span></code></pre></figure>
<p><a href="https://github.com/mukarramali/android_deployment_example.git">Demo Application</a></p>
<p><a href="/technology/android-versioning-using-docker-and-git-like-a-pro/">Android Versioning Using Docker & Git Like A Pro</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on June 10, 2018.</p>/career/how-tough-is-it-to-score-well-in-board-exams2018-02-22 10:49:14 +0530T00:00:00-00:002018-02-22T00:00:00+05:30eLitmus.comsite-admin@elitmus.com
<p><em>I completed my entire schooling (Classes I through XII) at one of Kolkata’s favoured Catholic schools. In those days, discipline and academic excellence were the primary parameters that mattered and my school checked both these boxes rather well.</em>
<!--more--></p>
<p>Trouble started brewing once I graduated to Class XI and started thinking about higher education, specifically opportunities at the national level. That’s when I truly realized the impact of the education board. In my case, the impact was limited to a couple of aspects, viz. a) subjects / topics not covered in the Bengal board syllabus, and b) the frugality in awarding marks.</p>
<p>Fortunately, some extra tuitions covered up for the former, while the latter did not come into play at all in any of the options I signed up for, or in the higher education option I finally opted for.</p>
<p>Times have changed….</p>
<p>For a few years, till 2016, a 40% weightage was accorded to an applicant’s Class XII board marks in calculating her All India Rank in the JEE (entrance tests for admission to India’s flagship IITs, and a few other engineering schools) exams. However, from 2017, the rules were changed to treat the Class XII marks as a qualifying criterion: a minimum of 75% marks, or a rank within the top 20 percentile of the board.</p>
<p>The 75% cut-off may appear inconsequential to folks intimately familiar with the CBSE or ISC boards, but not all students find it amusing. The JEE implementation committee publishes the 80th percentile cut-off marks for every higher secondary educational board in the country to level the playing field. Finally, we have access to data that clearly shows the disparity in awarding marks across boards in India.</p>
<p>According to data for the 2016 Class XII exams, the 5 most liberal boards are (percentages indicate the 80th percentile cut-off score):</p>
<ol>
<li>Telangana Board of Secondary Education (95%)</li>
<li>Andhra Pradesh Board of Intermediate Education (94%)</li>
<li>Council for the Indian School Certificate Examinations (88.6%)</li>
<li>Banasthali Vidyapeeth, Rajasthan (87.4%)</li>
<li>Tamil Nadu Board of Higher Secondary Education (87.2%)</li>
</ol>
<p>While the 5 most frugal boards are:</p>
<ol>
<li>Tripura Board of Secondary Education (59.8%)</li>
<li>Jharkhand Academic Council (60.6%)</li>
<li>Meghalaya Board of Secondary Education (61.6%)</li>
<li>Odisha Council of Higher Secondary Education (62%)</li>
<li>Bihar Intermediate Education Council (63%)</li>
</ol>
<p>What this essentially means is that a student scoring 95% in the Telangana board exams is academically comparable to a student scoring 60.6% in the Jharkhand board exams, despite a whopping gap of 34.4 percentage points in scores!</p>
<p>The data clearly shows how a single mark-based cut-off or a mark-based weightage criterion can result in gross injustice to students from boards that are frugal in awarding marks! Thankfully, the JEE implementation committee, in its infinite wisdom, has taken steps to normalize this inherent disparity.</p>
<p>Time will tell whether the practice of allotting an explicit or implicit weightage to board exam performance will become the norm, not only in JEE but in other national level entrance tests as well. But for now, this is definitely something for parents to consider while looking for a school for their children.</p>
<p>But what about employment? It is common practice among potential employers to set mark-based cut-offs for board exams (while hiring entry-level talent), among others. And in almost all cases, the cut-off is a single number applicable across the board (pun intended!).</p>
<p>Let’s say company X sets a Class XII marks cut-off at 75%. Referring to the 10 boards listed above, company X will end up considering a population far larger than the top quintile from the 5 most liberal boards, and a population far smaller than the top quintile from the 5 most frugal boards. The playing field is not so level anymore…</p>
<p><a href="/career/how-tough-is-it-to-score-well-in-board-exams/">How Tough is it to Score Well in Board Exams?</a> was originally published by eLitmus.com at <a href="">eLitmus Blog</a> on February 22, 2018.</p>