<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.2">Jekyll</generator><link href="https://blog.jonlu.ca/feed.xml" rel="self" type="application/atom+xml" /><link href="https://blog.jonlu.ca/" rel="alternate" type="text/html" /><updated>2024-01-18T17:37:28-05:00</updated><id>https://blog.jonlu.ca/feed.xml</id><title type="html">JonLuca’s Blog</title><subtitle>JonLuca&apos;s Blog - A blog about tech, programming, and finance</subtitle><author><name>{&quot;twitter&quot;=&gt;&quot;JonLuca&quot;}</name></author><entry><title type="html">Noticing when an app is only hosted in us-east-1</title><link href="https://blog.jonlu.ca/posts/us-east-1-latency" rel="alternate" type="text/html" title="Noticing when an app is only hosted in us-east-1" /><published>2023-06-26T12:26:53-04:00</published><updated>2023-06-26T12:26:53-04:00</updated><id>https://blog.jonlu.ca/posts/us-east-1-latency</id><content type="html" xml:base="https://blog.jonlu.ca/posts/us-east-1-latency"><![CDATA[<p>Every time I leave New York and land back in Europe or in Asia, I can immediately tell which apps have a global presence and which apps only deploy to a single US region. Everything just immediately feels a little slower. The pull to refresh feels a bit sluggish, the preview images take a little longer to load, and even native apps just feel less responsive.</p>

<h2 id="floored-latency">Floored Latency</h2>

<p>The speed of the experience you can offer your users is floored by a few variables, most notably the physical distance from the origin server to where the user is actually sitting.</p>

<p>Us-east-1 (appropriately located right by “Centreville, Virginia”) is one of the main data centers run by AWS. If you’re a startup (or even quite a few mature companies), you are likely to deploy your application here. If you manage a stateful service, or your architecture doesn’t support distributed compute, you are likely to <em>only</em> deploy here. If you have users in Sydney, Australia, any network request you  make will need to travel 15,677km to make it there, as the crow flies.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/va-aus-50-33c5fd7c6.webp 50w, https://blog.jonlu.ca/images/generated/va-aus-100-33c5fd7c6.webp 100w, https://blog.jonlu.ca/images/generated/va-aus-200-33c5fd7c6.webp 200w, https://blog.jonlu.ca/images/generated/va-aus-400-33c5fd7c6.webp 400w, https://blog.jonlu.ca/images/generated/va-aus-800-33c5fd7c6.webp 800w, https://blog.jonlu.ca/images/generated/va-aus-1200-33c5fd7c6.webp 1200w, https://blog.jonlu.ca/images/generated/va-aus-1600-33c5fd7c6.webp 1600w, https://blog.jonlu.ca/images/generated/va-aus-2140-33c5fd7c6.webp 2140w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/va-aus-50-49883ae13.png 50w, https://blog.jonlu.ca/images/generated/va-aus-100-49883ae13.png 100w, https://blog.jonlu.ca/images/generated/va-aus-200-49883ae13.png 200w, https://blog.jonlu.ca/images/generated/va-aus-400-49883ae13.png 400w, https://blog.jonlu.ca/images/generated/va-aus-800-49883ae13.png 800w, https://blog.jonlu.ca/images/generated/va-aus-1200-49883ae13.png 1200w, https://blog.jonlu.ca/images/generated/va-aus-1600-49883ae13.png 1600w, https://blog.jonlu.ca/images/generated/va-aus-2140-49883ae13.png 2140w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/va-aus-800-49883ae13.png" alt="Distance from Centreville, Virginia to Sydney, Australia" width="2140" height="1438" /></picture>

<p class="footnote">Distance from Centreville, Virginia to Sydney, Australia</p>

<p>If you’re traveling at:</p>

<p>1) The speed of light</p>

<p>2) as the crow flies</p>

<p>3) with no overhead</p>

<p>then that means that you are <em>floored</em> at 104ms of latency for your request.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/va-aus-light-50-5fce2cfd0.webp 50w, https://blog.jonlu.ca/images/generated/va-aus-light-100-5fce2cfd0.webp 100w, https://blog.jonlu.ca/images/generated/va-aus-light-200-5fce2cfd0.webp 200w, https://blog.jonlu.ca/images/generated/va-aus-light-400-5fce2cfd0.webp 400w, https://blog.jonlu.ca/images/generated/va-aus-light-800-5fce2cfd0.webp 800w, https://blog.jonlu.ca/images/generated/va-aus-light-1200-5fce2cfd0.webp 1200w, https://blog.jonlu.ca/images/generated/va-aus-light-1600-5fce2cfd0.webp 1600w, https://blog.jonlu.ca/images/generated/va-aus-light-1630-5fce2cfd0.webp 1630w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/va-aus-light-50-e5651063e.png 50w, https://blog.jonlu.ca/images/generated/va-aus-light-100-e5651063e.png 100w, https://blog.jonlu.ca/images/generated/va-aus-light-200-e5651063e.png 200w, https://blog.jonlu.ca/images/generated/va-aus-light-400-e5651063e.png 400w, https://blog.jonlu.ca/images/generated/va-aus-light-800-e5651063e.png 800w, https://blog.jonlu.ca/images/generated/va-aus-light-1200-e5651063e.png 1200w, https://blog.jonlu.ca/images/generated/va-aus-light-1600-e5651063e.png 1600w, https://blog.jonlu.ca/images/generated/va-aus-light-1630-e5651063e.png 1630w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/va-aus-light-800-e5651063e.png" alt="Time in takes for the speed of light to travel from us-east-1 to Sydney" width="1630" height="1532" /></picture>

<p class="footnote">Time in takes for the speed of light to travel from us-east-1 to Sydney</p>

<p>And this is assuming no interference, other traffic, or time spent handling the request.</p>

<p>In reality, the ping you’ll experience will be worse, at around 215ms (which is a pretty amazing feat in and of itself - all those factors above only double the time it takes to get from Sydney to the eastern US).</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/ping-50-b6a5a1bc9.webp 50w, https://blog.jonlu.ca/images/generated/ping-100-b6a5a1bc9.webp 100w, https://blog.jonlu.ca/images/generated/ping-200-b6a5a1bc9.webp 200w, https://blog.jonlu.ca/images/generated/ping-400-b6a5a1bc9.webp 400w, https://blog.jonlu.ca/images/generated/ping-800-b6a5a1bc9.webp 800w, https://blog.jonlu.ca/images/generated/ping-1200-b6a5a1bc9.webp 1200w, https://blog.jonlu.ca/images/generated/ping-1600-b6a5a1bc9.webp 1600w, https://blog.jonlu.ca/images/generated/ping-2000-b6a5a1bc9.webp 2000w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/ping-50-077fdf770.png 50w, https://blog.jonlu.ca/images/generated/ping-100-077fdf770.png 100w, https://blog.jonlu.ca/images/generated/ping-200-077fdf770.png 200w, https://blog.jonlu.ca/images/generated/ping-400-077fdf770.png 400w, https://blog.jonlu.ca/images/generated/ping-800-077fdf770.png 800w, https://blog.jonlu.ca/images/generated/ping-1200-077fdf770.png 1200w, https://blog.jonlu.ca/images/generated/ping-1600-077fdf770.png 1600w, https://blog.jonlu.ca/images/generated/ping-2000-077fdf770.png 2000w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/ping-800-077fdf770.png" alt="screenshot shwing ping to various cities around the globe" width="2000" height="1064" /></picture>

<p class="footnote">Ping to various cities around the globe</p>

<p>And this is just what is added on top of everything else that happens on a request - TLS termination and DNS lookup. This isn’t a one-time hit to performance - every connection the client opens needs to go through TLS termination, or be queued up on the same connection and be executed serially. For simple sites that are basic html and css this isn’t an issue, but for many sites you’ll have dozens of requests to different domains, and each one will need to be established and terminated.</p>

<h2 id="compounding-latency">Compounding latency</h2>

<p>The full TLS 1.2 handshake requires 2 round-trips to complete, and when combined with TCP’s SYN and SYN-ACK negotiation it extends to 3 full round-trips. While, TLS 1.3 reduces that to two round-trips when under TCP, it still adds considerable latency to every connection. <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>

<p>The problem gets compounded when you realize many requests are chained, or dependent on each other. Images in CSS files can only be requested once the CSS has been downloaded, and the CSS can only be downloaded once the browser has parsed the HTML. HTTP/2 and its multiplexing improves this but doesn’t completely solve this - each of these could be hosted on a different domain, or saturate the max concurrent connections from your browser.</p>

<p>Protocol improvements also don’t fix fundamental issues with how the site is architected - if you have an SPA that hasn’t been optimized properly, your browser needs to first download all the content, then execute the javascript, and only once the JS has executed will it begin to make the API requests and fetch the assets to populate the content of the page.</p>

<p>It’s cascading latency hell. In aggregate this can add thousands of milliseconds for a simple site.</p>

<h2 id="realized-latency">Realized latency</h2>

<p>Having spent so much time trying to optimize web pages and API responses for performance, I’ve gotten a pretty good internal model for latency. I can’t quite tell the difference between a us-east-1 server when I’m in New York versus San Francisco, but I can definitely tell if you’ve got an instance deployed in <code class="language-plaintext highlighter-rouge">eu-central-1</code> or not when I’m in Italy, or <code class="language-plaintext highlighter-rouge">ap-east-1</code> when I’m in Sydney. It’s much easier to tell with native apps, where the UI is much more responsive and the only variable is the duration of the API request. You’ll pull to refresh on a list and it’ll hang for just a little longer than you’re used to.</p>

<p>Using a global CDN can help get your assets to your users quicker, and most companies by this point are using something like Cloudflare or Vercel, but many still only serve static or cached content this way. Very frequently the origin server will still be a centralized monolith deployed in only one location, or there will only be a single database cluster.</p>

<p>As soon as you land back in the United States and turn of Airplane mode on your phone everything just starts feeling… snappier? A little more fluid? As much as T-Mobile and Verizon would like to take credit for that I don’t think theres much more to it than the physical location of the servers and where you are at that moment.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://www.gnutls.org/manual/html_node/Reducing-round_002dtrips.html">https://www.gnutls.org/manual/html_node/Reducing-round_002dtrips.html</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>{&quot;twitter&quot;=&gt;&quot;JonLuca&quot;}</name></author><summary type="html"><![CDATA[Every time I leave New York and land back in Europe or in Asia, I can immediately tell which apps have a global presence and which apps only deploy to a single US region. Everything just immediately feels a little slower. The pull to refresh feels a bit sluggish, the preview images take a little longer to load, and even native apps just feel less responsive.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.jonlu.ca/images/ping.png" /><media:content medium="image" url="https://blog.jonlu.ca/images/ping.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Exploring how Magic Link works</title><link href="https://blog.jonlu.ca/posts/magic-link" rel="alternate" type="text/html" title="Exploring how Magic Link works" /><published>2023-05-30T20:53:42-04:00</published><updated>2023-05-30T20:53:42-04:00</updated><id>https://blog.jonlu.ca/posts/magic-link</id><content type="html" xml:base="https://blog.jonlu.ca/posts/magic-link"><![CDATA[<p>This blog is co-writte by <a href="https://twitter.com/_ricky_mo">Ricky Moezinia</a></p>

<p><a href="https://magic.link/">Magic Link</a> is a web3 wallet-as-a-service. They provide an SDK that enables users to have a crypto wallet linked to just their email address, instead of having to install a chrome extension or local wallet.</p>

<p>I wanted to explore how it worked, and what it was actually doing under the hood.</p>

<h2 id="web3-wallets">Web3 Wallets</h2>

<p>There are broadly two options for storing your crypto assets - custodial and non-custodial.</p>

<p>A custodial solution is something like coinbase - typically a single company, which manages the private key to your wallet for you, and which is the centralized gatekeeper to your funds.</p>

<p>A non custodial solution is one in which you store the private key yourself - the wallet simply provides the software that you run locally that generates and stores the key. This looks like Metamask or Phantom - you will typically install a chrome extension, go through an onboarding flow, save your recovery pass phrase, and only then can you begin to use it on various dapps or to move tokens around.</p>

<p>Generally speaking, the user experience with custodial services is better, and allows for easier user onboarding. However, it comes with some fairly major downsides, including centralization and greater likelihood of regulatory scrutiny.</p>

<p>Magic is somewhere in between - they say they are non custodial, but offer the UX of a custodial solution. No need for a user to install a separater chrome extension - they just need an email address and their browser.</p>

<h2 id="under-the-hood">Under the hood</h2>

<p>Magic is relies on AWS for their product. Their sign in and user logic uses AWS Cognito, and their key storage uses KMS.</p>

<p>AWS Cognito is the identity solution offered by AWS - they will handle authenticating your users for you, and offer a wide variety of sign on methods, including email/password, phone number, 3rd party, and magic email.</p>

<p>When setting up a Magic wallet, you start with your email address. This will talk to AWS, which will send a link to your email address containing a unique token.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/magic-onboarding-1-50-df8ec358d.webp 50w, https://blog.jonlu.ca/images/generated/magic-onboarding-1-100-df8ec358d.webp 100w, https://blog.jonlu.ca/images/generated/magic-onboarding-1-200-df8ec358d.webp 200w, https://blog.jonlu.ca/images/generated/magic-onboarding-1-400-df8ec358d.webp 400w, https://blog.jonlu.ca/images/generated/magic-onboarding-1-766-df8ec358d.webp 766w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/magic-onboarding-1-50-79fd63d9a.png 50w, https://blog.jonlu.ca/images/generated/magic-onboarding-1-100-79fd63d9a.png 100w, https://blog.jonlu.ca/images/generated/magic-onboarding-1-200-79fd63d9a.png 200w, https://blog.jonlu.ca/images/generated/magic-onboarding-1-400-79fd63d9a.png 400w, https://blog.jonlu.ca/images/generated/magic-onboarding-1-766-79fd63d9a.png 766w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/magic-onboarding-1-766-79fd63d9a.png" alt="magic link form" width="766" height="576" /></picture>

<p class="footnote">Embedded Magic Link form (in this case, ImmutableX)</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/magic-onboarding-2-50-f05299b1a.webp 50w, https://blog.jonlu.ca/images/generated/magic-onboarding-2-100-f05299b1a.webp 100w, https://blog.jonlu.ca/images/generated/magic-onboarding-2-200-f05299b1a.webp 200w, https://blog.jonlu.ca/images/generated/magic-onboarding-2-400-f05299b1a.webp 400w, https://blog.jonlu.ca/images/generated/magic-onboarding-2-744-f05299b1a.webp 744w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/magic-onboarding-2-50-8baf70ace.png 50w, https://blog.jonlu.ca/images/generated/magic-onboarding-2-100-8baf70ace.png 100w, https://blog.jonlu.ca/images/generated/magic-onboarding-2-200-8baf70ace.png 200w, https://blog.jonlu.ca/images/generated/magic-onboarding-2-400-8baf70ace.png 400w, https://blog.jonlu.ca/images/generated/magic-onboarding-2-744-8baf70ace.png 744w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/magic-onboarding-2-744-8baf70ace.png" alt="magic link form 2" width="744" height="948" /></picture>

<p class="footnote">Confirmation code for auth sign in</p>

<p>Once you’re on the verification page, your browser will talk to AWS to authenticate you, and Magic will start creating your key.</p>

<p>They will first generate the private key for the chain you’re using in the browser, using javascript, and keep that in memory. This is the root key material, which is used to sign transactions for whichver chain you’re using - this can be imported into metamask, a hardware wallet, etc.</p>

<p>This key, which we’ll call user key (UK), is never shown to the user. If you intercept the network request for initial encryption you’ll see it, base64 encoded.</p>

<p>It will then talk to KMS, using the user account that was just authenticated for you, and create a new key within KMS. This is the key that lives inside of Amazons data centers, and which will be used to encrypt and decrypt the UK.</p>

<p>Your browser will send the UK in plain text to KMS directly, never speaking to Magic links servers. It will retrieve the encrypted material, and store that both within the browser cache and with magic link.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/keymaterialresponse-50-3cce1e430.webp 50w, https://blog.jonlu.ca/images/generated/keymaterialresponse-100-3cce1e430.webp 100w, https://blog.jonlu.ca/images/generated/keymaterialresponse-200-3cce1e430.webp 200w, https://blog.jonlu.ca/images/generated/keymaterialresponse-400-3cce1e430.webp 400w, https://blog.jonlu.ca/images/generated/keymaterialresponse-800-3cce1e430.webp 800w, https://blog.jonlu.ca/images/generated/keymaterialresponse-1200-3cce1e430.webp 1200w, https://blog.jonlu.ca/images/generated/keymaterialresponse-1600-3cce1e430.webp 1600w, https://blog.jonlu.ca/images/generated/keymaterialresponse-2400-3cce1e430.webp 2400w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/keymaterialresponse-50-0a6e0b991.png 50w, https://blog.jonlu.ca/images/generated/keymaterialresponse-100-0a6e0b991.png 100w, https://blog.jonlu.ca/images/generated/keymaterialresponse-200-0a6e0b991.png 200w, https://blog.jonlu.ca/images/generated/keymaterialresponse-400-0a6e0b991.png 400w, https://blog.jonlu.ca/images/generated/keymaterialresponse-800-0a6e0b991.png 800w, https://blog.jonlu.ca/images/generated/keymaterialresponse-1200-0a6e0b991.png 1200w, https://blog.jonlu.ca/images/generated/keymaterialresponse-1600-0a6e0b991.png 1600w, https://blog.jonlu.ca/images/generated/keymaterialresponse-2400-0a6e0b991.png 2400w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/keymaterialresponse-800-0a6e0b991.png" alt="raw private key" width="2822" height="870" /></picture>

<p class="footnote">Decrypted private key coming back from KMS</p>

<p>If you copy and paste the “Plaintext” from above into Metamask, you can use your Magic.link private key outside of the Magic ecosystem.</p>

<p>Once they have the encrypted key contents, the decrypted key in memory, and the public key, magic will create an account for you on their centralized servers. They will save the encrypted key contents, and will use this for subsequent logins on new devices.</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"data"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"auth_user_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"my-magic-link-id"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"auth_user_mfa_active"</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="p">,</span><span class="w">
    </span><span class="nl">"auth_user_wallet_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"my-wallet-id="</span><span class="p">,</span><span class="w">
    </span><span class="nl">"challenge_message"</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span><span class="w">
    </span><span class="nl">"client_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"my-client-id"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"consent"</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span><span class="w">
    </span><span class="nl">"delegated_wallet_info"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"delegated_access_token"</span><span class="p">:</span><span class="w"> </span><span class="s2">"{</span><span class="se">\"</span><span class="s2">ciphertext</span><span class="se">\"</span><span class="s2">: </span><span class="se">\"</span><span class="s2">cipher text</span><span class="se">\"</span><span class="s2">}"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"delegated_identity_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"us-west-2:aws-key-id"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"delegated_key_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"delegated-key-id"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"delegated_pool_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"us-west-2:delegated-pool-id"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"should_create_delegated_wallet"</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="nl">"encrypted_private_address"</span><span class="p">:</span><span class="w"> </span><span class="s2">"long base64 encoded encrypted private key"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"encrypted_seed_phrase"</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span><span class="w">
    </span><span class="nl">"hd_path"</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span><span class="w">
    </span><span class="nl">"login"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"identifiers"</span><span class="p">:</span><span class="w"> </span><span class="p">[],</span><span class="w">
      </span><span class="nl">"oauth2"</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"email_link"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"webauthn"</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="nl">"public_address"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0xmypublickey"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"recovery_factors"</span><span class="p">:</span><span class="w"> </span><span class="p">[],</span><span class="w">
    </span><span class="nl">"utc_timestamp_ms"</span><span class="p">:</span><span class="w"> </span><span class="mi">1687790408203</span><span class="w">
  </span><span class="p">},</span><span class="w">
  </span><span class="nl">"error_code"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w">
  </span><span class="nl">"message"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w">
  </span><span class="nl">"status"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ok"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>Any subsequent logins follow this same setup, with the only difference being that once you’ve authenticated with AWS, your browser will first check with Magic to see if you already have an encrypted key stored with them, and if so, will retrieve the encrypted key and decrypt it using AWS KMS.</p>

<h2 id="is-this-safe">Is this safe?</h2>

<p>The common refrain amongst crypto purists is “not your keys not your tokens” - this is in reference to custodial services, like Coinbase and FTX.</p>

<p>Magic is being a bit disingenous when they say they say they are non custodial - while it’s true that in their current set up they don’t directly have access to your raw private keys, they are still the admins of their AWS account. They are limited only by the policies they themselves have put in place - they can just go into their AWS console and change the key policy, and decrypt the encrypted private keys they have for every account.</p>

<p>Additionally, every time you authenticate on a new device, the raw key is transmitted over HTTPS to your machine. If your machine is being man in the middle’d, an attacker will be able to see your <em>raw key materials</em> if you sign in. Additionally, because this is in-browser, Magic is not using any form of certificate pinning for AWS.</p>

<p>This isn’t ideal. This isn’t MPC or any sort of advanced cryptography - it’s just using AWS as your trust layer, and some clever engineering to make the user experience good. I’m a bit skeptical of their claims that this isn’t custodial, since Magic technically has the ability to just recover everyone’s private keys. This might be good enough, though, and the users magic is targeting (web3 gamers and casual users) are’t likely to care about the security implications.</p>]]></content><author><name>{&quot;twitter&quot;=&gt;&quot;JonLuca&quot;}</name></author><summary type="html"><![CDATA[This blog is co-writte by Ricky Moezinia]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.jonlu.ca/images/magic-onboarding-1.png" /><media:content medium="image" url="https://blog.jonlu.ca/images/magic-onboarding-1.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Semantic search in iMessage, iMessage Wrapped, and AI conversations</title><link href="https://blog.jonlu.ca/posts/mimessage" rel="alternate" type="text/html" title="Semantic search in iMessage, iMessage Wrapped, and AI conversations" /><published>2023-04-14T00:31:20-04:00</published><updated>2023-04-14T00:31:20-04:00</updated><id>https://blog.jonlu.ca/posts/mimessage</id><content type="html" xml:base="https://blog.jonlu.ca/posts/mimessage"><![CDATA[<p>TLDR - You can <a href="https://github.com/jonluca/mimessage/releases/latest">download Mimessage on GitHub</a></p>

<p>I’ve always been surprised at how slow and and clunky it is to search in iMessage. It feels like it doesn’t search through your whole history, the UI for the results is too small given the prominence of the feature, and there exist quite literally no ways to filter or refine your search.</p>

<p>I realized that iMessage just stores its database locally as a sqlite file, so I went about building an alternate UI for searching, and adding in a few features that I thought would be interesting. These include:</p>

<ul>
  <li><strong>Semantic Search</strong> - I wanted to create embeddings for every message/conversation and then add proper semantic search on top of it</li>
  <li><strong>Wrapped</strong> - I really like seeing stats about my life, and I really enjoy what Spotify has done with Wrapped, so I set out to do the same for iMessage</li>
  <li><strong>AI Conversations</strong> - Talk with your friends, AI-ifiied; once you have all the context around a conversation, you can understand how that person texts and what their tone it. Then it’s as simple as plugging it into ChatGPT</li>
  <li><strong>Export</strong> - Export your conversations in JSON or plain text</li>
</ul>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/wrapped-50-c7fff5937.webp 50w, https://blog.jonlu.ca/images/generated/wrapped-100-c7fff5937.webp 100w, https://blog.jonlu.ca/images/generated/wrapped-200-c7fff5937.webp 200w, https://blog.jonlu.ca/images/generated/wrapped-400-c7fff5937.webp 400w, https://blog.jonlu.ca/images/generated/wrapped-800-c7fff5937.webp 800w, https://blog.jonlu.ca/images/generated/wrapped-1200-c7fff5937.webp 1200w, https://blog.jonlu.ca/images/generated/wrapped-1600-c7fff5937.webp 1600w, https://blog.jonlu.ca/images/generated/wrapped-2400-c7fff5937.webp 2400w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/wrapped-50-c1a221da4.png 50w, https://blog.jonlu.ca/images/generated/wrapped-100-c1a221da4.png 100w, https://blog.jonlu.ca/images/generated/wrapped-200-c1a221da4.png 200w, https://blog.jonlu.ca/images/generated/wrapped-400-c1a221da4.png 400w, https://blog.jonlu.ca/images/generated/wrapped-800-c1a221da4.png 800w, https://blog.jonlu.ca/images/generated/wrapped-1200-c1a221da4.png 1200w, https://blog.jonlu.ca/images/generated/wrapped-1600-c1a221da4.png 1600w, https://blog.jonlu.ca/images/generated/wrapped-2400-c1a221da4.png 2400w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/wrapped-800-c1a221da4.png" alt="imessage wrapped" width="4336" height="2622" /></picture>

<p class="footnote">Your iMessage Wrapped</p>

<h2 id="reverse-engineering-the-database-schema">Reverse Engineering the Database Schema</h2>

<p>The database sits at <code class="language-plaintext highlighter-rouge">/Library/Messages/chat.db</code>, while all the attachments live in <code class="language-plaintext highlighter-rouge">~/Library/Messages/Attachments</code>. I made a copy of it and then opened it up and datagrip and then was ready to go. The database schema is relatively straightforward to understand - there where <code class="language-plaintext highlighter-rouge">message</code>, <code class="language-plaintext highlighter-rouge">chat</code>, <code class="language-plaintext highlighter-rouge">handle</code>, and <code class="language-plaintext highlighter-rouge">attachment</code> tables, each containing the data you’d expect. A rust library named <a href="https://github.com/ReagentX/imessage-exporter">imessage-exporter</a> was particularly useful for understanding how the tables were joined together and what queries to make.</p>

<p>A few gotchas were:</p>

<ul>
  <li>
    <p>iMessage does not receive raw text as its messages - it actually receives a <code class="language-plaintext highlighter-rouge">typedstream</code>, and then later on post processes it and backfills in the <code class="language-plaintext highlighter-rouge">text</code> column. There’s a fairly obscure library for parsing these in typescript (with a bug fixed) <a href="https://github.com/jonluca/node-typedstream">here</a></p>
  </li>
  <li>
    <p>When using Messages in iCloud, your messages app won’t have all your conversations and attachments. This will make your old conversations look much more sparse than they really are. You can force iMessage to download all your conversations by toggling it on and off in iMessage settings.</p>
  </li>
</ul>

<h2 id="wrapped">Wrapped</h2>

<p>Wrapped gives you a breakdown of your iMessage habits, broken down by year and by conversation/person.</p>

<p>I actually used ChatGPT to generate the product features included in iMessage Wrapped - as soon as it can do figma mocks I’ll redesign the UI using that, as well.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/chatgpt-50-643735100.webp 50w, https://blog.jonlu.ca/images/generated/chatgpt-100-643735100.webp 100w, https://blog.jonlu.ca/images/generated/chatgpt-200-643735100.webp 200w, https://blog.jonlu.ca/images/generated/chatgpt-400-643735100.webp 400w, https://blog.jonlu.ca/images/generated/chatgpt-800-643735100.webp 800w, https://blog.jonlu.ca/images/generated/chatgpt-1200-643735100.webp 1200w, https://blog.jonlu.ca/images/generated/chatgpt-1600-643735100.webp 1600w, https://blog.jonlu.ca/images/generated/chatgpt-1788-643735100.webp 1788w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/chatgpt-50-955727fac.png 50w, https://blog.jonlu.ca/images/generated/chatgpt-100-955727fac.png 100w, https://blog.jonlu.ca/images/generated/chatgpt-200-955727fac.png 200w, https://blog.jonlu.ca/images/generated/chatgpt-400-955727fac.png 400w, https://blog.jonlu.ca/images/generated/chatgpt-800-955727fac.png 800w, https://blog.jonlu.ca/images/generated/chatgpt-1200-955727fac.png 1200w, https://blog.jonlu.ca/images/generated/chatgpt-1600-955727fac.png 1600w, https://blog.jonlu.ca/images/generated/chatgpt-1788-955727fac.png 1788w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/chatgpt-800-955727fac.png" alt="chatgpt product suggestions" width="1788" height="1908" /></picture>

<p class="footnote">ChatGPT's suggestions for the features in wrapped</p>

<h2 id="ai-conversations">AI Conversations</h2>

<p>I’ve always thought the amount of information stored in iMessage wasn’t being used appropriately - I thought that it could learn more about you, or the person you’re talking to, and suggest more things or be smarter about your interactions.</p>

<p>One of the first ideas that came to mind when I was building this was to use GPT4 to just continue any conversation, right where you left off.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/ai-chat-50-0560ca546.webp 50w, https://blog.jonlu.ca/images/generated/ai-chat-100-0560ca546.webp 100w, https://blog.jonlu.ca/images/generated/ai-chat-200-0560ca546.webp 200w, https://blog.jonlu.ca/images/generated/ai-chat-400-0560ca546.webp 400w, https://blog.jonlu.ca/images/generated/ai-chat-800-0560ca546.webp 800w, https://blog.jonlu.ca/images/generated/ai-chat-1200-0560ca546.webp 1200w, https://blog.jonlu.ca/images/generated/ai-chat-1600-0560ca546.webp 1600w, https://blog.jonlu.ca/images/generated/ai-chat-2400-0560ca546.webp 2400w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/ai-chat-50-72e31d31a.png 50w, https://blog.jonlu.ca/images/generated/ai-chat-100-72e31d31a.png 100w, https://blog.jonlu.ca/images/generated/ai-chat-200-72e31d31a.png 200w, https://blog.jonlu.ca/images/generated/ai-chat-400-72e31d31a.png 400w, https://blog.jonlu.ca/images/generated/ai-chat-800-72e31d31a.png 800w, https://blog.jonlu.ca/images/generated/ai-chat-1200-72e31d31a.png 1200w, https://blog.jonlu.ca/images/generated/ai-chat-1600-72e31d31a.png 1600w, https://blog.jonlu.ca/images/generated/ai-chat-2400-72e31d31a.png 2400w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/ai-chat-800-72e31d31a.png" alt="Chatting with someone" width="2710" height="1082" /></picture>

<p class="footnote">AI generated chat that really can't make it to dinner</p>

<p>I started building Mimessage on April 1st, and just a few days ago <a href="https://www.izzy.co/blogs/robo-boys.html">I saw someone on hackernews had had the same idea to clone their friends</a>. I think that training a model is actually a much smarter way of accomplishing this, and seems to lead to better results than naively continuing the conversation with GPT4. I’ll try and get it running locally with LLaMA soon.</p>

<h2 id="better-search">Better Search</h2>

<p>I added in global search with filters, as well as a chat specific search that allows you to do either fuzzy matching using <a href="https://fusejs.io/">fuse.js</a> or just writing raw regex in the query itself.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/global-filter-50-a9cf04d70.webp 50w, https://blog.jonlu.ca/images/generated/global-filter-100-a9cf04d70.webp 100w, https://blog.jonlu.ca/images/generated/global-filter-200-a9cf04d70.webp 200w, https://blog.jonlu.ca/images/generated/global-filter-400-a9cf04d70.webp 400w, https://blog.jonlu.ca/images/generated/global-filter-800-a9cf04d70.webp 800w, https://blog.jonlu.ca/images/generated/global-filter-1200-a9cf04d70.webp 1200w, https://blog.jonlu.ca/images/generated/global-filter-1366-a9cf04d70.webp 1366w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/global-filter-50-703a7b0f4.png 50w, https://blog.jonlu.ca/images/generated/global-filter-100-703a7b0f4.png 100w, https://blog.jonlu.ca/images/generated/global-filter-200-703a7b0f4.png 200w, https://blog.jonlu.ca/images/generated/global-filter-400-703a7b0f4.png 400w, https://blog.jonlu.ca/images/generated/global-filter-800-703a7b0f4.png 800w, https://blog.jonlu.ca/images/generated/global-filter-1200-703a7b0f4.png 1200w, https://blog.jonlu.ca/images/generated/global-filter-1366-703a7b0f4.png 1366w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/global-filter-800-703a7b0f4.png" alt="global search filter" width="1366" height="368" /></picture>

<p class="footnote">Global search with filters</p>

<p>I also created a virtual FTS5 table in the sqlite messages copy, which does <code class="language-plaintext highlighter-rouge">MATCh</code> searches ordered by <code class="language-plaintext highlighter-rouge">rank</code>, and raw <code class="language-plaintext highlighter-rouge">LIKE %QUERY%</code> queries as well. This is really really fast, and a way better text searching experience than iMessage.</p>

<h3 id="semantic-search">Semantic Search</h3>

<p>I also wanted to add semantic search, as those results will often blow pure text searches out of the water. I used OpenAI and ChromaDB to create and store the embeddings for each text message, respectively.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/semantic-search-50-9f6b5418a.webp 50w, https://blog.jonlu.ca/images/generated/semantic-search-100-9f6b5418a.webp 100w, https://blog.jonlu.ca/images/generated/semantic-search-200-9f6b5418a.webp 200w, https://blog.jonlu.ca/images/generated/semantic-search-400-9f6b5418a.webp 400w, https://blog.jonlu.ca/images/generated/semantic-search-800-9f6b5418a.webp 800w, https://blog.jonlu.ca/images/generated/semantic-search-1200-9f6b5418a.webp 1200w, https://blog.jonlu.ca/images/generated/semantic-search-1600-9f6b5418a.webp 1600w, https://blog.jonlu.ca/images/generated/semantic-search-2280-9f6b5418a.webp 2280w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/semantic-search-50-a3f3a4e76.png 50w, https://blog.jonlu.ca/images/generated/semantic-search-100-a3f3a4e76.png 100w, https://blog.jonlu.ca/images/generated/semantic-search-200-a3f3a4e76.png 200w, https://blog.jonlu.ca/images/generated/semantic-search-400-a3f3a4e76.png 400w, https://blog.jonlu.ca/images/generated/semantic-search-800-a3f3a4e76.png 800w, https://blog.jonlu.ca/images/generated/semantic-search-1200-a3f3a4e76.png 1200w, https://blog.jonlu.ca/images/generated/semantic-search-1600-a3f3a4e76.png 1600w, https://blog.jonlu.ca/images/generated/semantic-search-2280-a3f3a4e76.png 2280w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/semantic-search-800-a3f3a4e76.png" alt="semantic search" width="2280" height="2608" /></picture>

<p class="footnote">Enabling semantic search on top of imessage</p>

<p>This takes quite a while and costs money, as well as requiring you to send your data off to OpenAI, so I’m planning on migrating to an entirely on device version soon.</p>

<h2 id="yak-shaving-the-actually-cool-part">Yak shaving the actually cool part</h2>

<p>The coolest part of this project was actually in using a tool I created for this project called <code class="language-plaintext highlighter-rouge">repo-refactor</code> - it’s a library that will convert a github repo from one language to another https://github.com/jonluca/repo-refactor. The rust library I mentioned above had already done some of the heavy lifting in creating data models and structures representing the schema, as well as writing some of the base queries. This was really useful for creating and understanding types early on, and for actually reading the rust code (admittedly not my forte).</p>

<p>The tool is very much in its infancy, and works successfully somewhere between 40 and 80 percent of the time. It doesn’t do great on long files, and some clear failure modes (going from weakly typed -&gt; strongly typed is pretty bad), but I think there are some cheap techniques that can be implemented to make it way more accurate. It’s still pretty incredible that it works this well with very little prompt engineering or manual cleanup.</p>

<h2 id="next-steps">Next Steps</h2>

<p>The whole project was built in open source <a href="https://github.com/jonluca/mimessage">here</a> and you can grab a copy of the latest working code from the <a href="https://github.com/jonluca/mimessage/releases/latest">releases page</a>. I want to get the inference running entirely locally, probably with alpaca or with vicuna weights, so that it’s free and privacy preserving.</p>

<p>I also want to add more stats to the wrapped page, and make it more of an experience like Spotify’s is - if there’s any designer reading this that wants to collaborate on it, <a href="mailto:hi@jonlu.ca">shoot me an email</a></p>]]></content><author><name>{&quot;twitter&quot;=&gt;&quot;JonLuca&quot;}</name></author><summary type="html"><![CDATA[TLDR - You can download Mimessage on GitHub]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.jonlu.ca/images/mini-profile.png" /><media:content medium="image" url="https://blog.jonlu.ca/images/mini-profile.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Getting a vanity phone number with 4 repeating digits</title><link href="https://blog.jonlu.ca/posts/verizon-rare-numbers" rel="alternate" type="text/html" title="Getting a vanity phone number with 4 repeating digits" /><published>2022-05-28T11:06:28-04:00</published><updated>2022-05-28T11:06:28-04:00</updated><id>https://blog.jonlu.ca/posts/verizon-rare-numbers</id><content type="html" xml:base="https://blog.jonlu.ca/posts/verizon-rare-numbers"><![CDATA[<p>I find that it’s pretty useful to have access to multiple phone numbers. Any time a site offers a discount if you give them their phone number, or when you want to be more anonymous online and a service needs a number, it’s nice to have a number that’s more than a burner but not your main number. I had heard of the Verizon My Numbers service and thought it would be cool to have a few extra numbers for cases like these. I also wanted a number with multiple consecutive digits, both because it was easier to remember and for the cool factor.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/verizon-search-50-cf57c94e5.webp 50w, https://blog.jonlu.ca/images/generated/verizon-search-100-cf57c94e5.webp 100w, https://blog.jonlu.ca/images/generated/verizon-search-200-cf57c94e5.webp 200w, https://blog.jonlu.ca/images/generated/verizon-search-400-cf57c94e5.webp 400w, https://blog.jonlu.ca/images/generated/verizon-search-800-cf57c94e5.webp 800w, https://blog.jonlu.ca/images/generated/verizon-search-1183-cf57c94e5.webp 1183w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/verizon-search-50-8d4034cfe.png 50w, https://blog.jonlu.ca/images/generated/verizon-search-100-8d4034cfe.png 100w, https://blog.jonlu.ca/images/generated/verizon-search-200-8d4034cfe.png 200w, https://blog.jonlu.ca/images/generated/verizon-search-400-8d4034cfe.png 400w, https://blog.jonlu.ca/images/generated/verizon-search-800-8d4034cfe.png 800w, https://blog.jonlu.ca/images/generated/verizon-search-1183-8d4034cfe.png 1183w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/verizon-search-800-8d4034cfe.png" alt="Verizons search UI" width="1183" height="1424" /></picture>

<p>Their UI was pretty bad, and it was quite slow. It also wouldn’t let you search without an area code, and the majority of area codes that I tried said that they didn’t have any numbers associated with them. I figured it would be faster to do this programatically.</p>

<h2 id="finding-the-endpoints">Finding the endpoints</h2>

<p>I used burpsuite to find out which API endpoint Verizon was hitting to fetch the available phone numbers.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/verizon-search-endpoint-50-8c0a25e81.webp 50w, https://blog.jonlu.ca/images/generated/verizon-search-endpoint-100-8c0a25e81.webp 100w, https://blog.jonlu.ca/images/generated/verizon-search-endpoint-200-8c0a25e81.webp 200w, https://blog.jonlu.ca/images/generated/verizon-search-endpoint-400-8c0a25e81.webp 400w, https://blog.jonlu.ca/images/generated/verizon-search-endpoint-800-8c0a25e81.webp 800w, https://blog.jonlu.ca/images/generated/verizon-search-endpoint-1200-8c0a25e81.webp 1200w, https://blog.jonlu.ca/images/generated/verizon-search-endpoint-1600-8c0a25e81.webp 1600w, https://blog.jonlu.ca/images/generated/verizon-search-endpoint-1854-8c0a25e81.webp 1854w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/verizon-search-endpoint-50-398fb8158.png 50w, https://blog.jonlu.ca/images/generated/verizon-search-endpoint-100-398fb8158.png 100w, https://blog.jonlu.ca/images/generated/verizon-search-endpoint-200-398fb8158.png 200w, https://blog.jonlu.ca/images/generated/verizon-search-endpoint-400-398fb8158.png 400w, https://blog.jonlu.ca/images/generated/verizon-search-endpoint-800-398fb8158.png 800w, https://blog.jonlu.ca/images/generated/verizon-search-endpoint-1200-398fb8158.png 1200w, https://blog.jonlu.ca/images/generated/verizon-search-endpoint-1600-398fb8158.png 1600w, https://blog.jonlu.ca/images/generated/verizon-search-endpoint-1854-398fb8158.png 1854w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/verizon-search-endpoint-800-398fb8158.png" alt="Verizons search endpoint" width="1854" height="1306" /></picture>

<p>From here I could just export it to code - my preferred method is to copy the request as curl from within BurpSuite and then use <a href="https://curlconverter.com/">a tool like CurlConverter</a> to turn it into the language of choice, although BurpSuite does have plugins that do this natively as well.</p>

<h2 id="calling-the-api-programatically">Calling the API programatically</h2>

<p>Curl converter gave me a nice little code snippet that I could use. I put it into a jupyter notebook and was happy to see it all worked. Their endpoint didn’t seem to use any signing or on-device auth that would’ve made this difficult, besides a static basic auth token as an HTTP header.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">requests</span>

<span class="n">headers</span> <span class="o">=</span> <span class="p">{</span>
    <span class="sh">'</span><span class="s">Host</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">api-v.vzmessages.com</span><span class="sh">'</span><span class="p">,</span>
    <span class="sh">'</span><span class="s">Cache-Control</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">no-cache</span><span class="sh">'</span><span class="p">,</span>
    <span class="sh">'</span><span class="s">Connection</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">close</span><span class="sh">'</span><span class="p">,</span>
    <span class="sh">'</span><span class="s">Accept</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">*/*</span><span class="sh">'</span><span class="p">,</span>
    <span class="sh">'</span><span class="s">User-Agent</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">Verizon%20My%20Numbers/1 CFNetwork/1333.0.4 Darwin/21.5.0</span><span class="sh">'</span><span class="p">,</span>
    <span class="sh">'</span><span class="s">Accept-Language</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">en-US,en;q=0.9</span><span class="sh">'</span><span class="p">,</span>
    <span class="sh">'</span><span class="s">Authorization</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">Basic &lt;Auth-Token&gt;</span><span class="sh">'</span><span class="p">,</span>
<span class="p">}</span>

<span class="n">params</span> <span class="o">=</span> <span class="p">{</span>
    <span class="sh">'</span><span class="s">state</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">CA</span><span class="sh">'</span><span class="p">,</span>
    <span class="sh">'</span><span class="s">acode</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">213</span><span class="sh">'</span><span class="p">,</span>
<span class="p">}</span>

<span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">'</span><span class="s">https://api-v.vzmessages.com/VirtualNumber/listOfMVNs/US</span><span class="sh">'</span><span class="p">,</span> <span class="n">params</span><span class="o">=</span><span class="n">params</span><span class="p">,</span> <span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">,</span> <span class="n">verify</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</code></pre></div></div>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/verizon-resp-50-f4bc7ed90.webp 50w, https://blog.jonlu.ca/images/generated/verizon-resp-100-f4bc7ed90.webp 100w, https://blog.jonlu.ca/images/generated/verizon-resp-200-f4bc7ed90.webp 200w, https://blog.jonlu.ca/images/generated/verizon-resp-400-f4bc7ed90.webp 400w, https://blog.jonlu.ca/images/generated/verizon-resp-576-f4bc7ed90.webp 576w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/verizon-resp-50-aa2d202b4.png 50w, https://blog.jonlu.ca/images/generated/verizon-resp-100-aa2d202b4.png 100w, https://blog.jonlu.ca/images/generated/verizon-resp-200-aa2d202b4.png 200w, https://blog.jonlu.ca/images/generated/verizon-resp-400-aa2d202b4.png 400w, https://blog.jonlu.ca/images/generated/verizon-resp-576-aa2d202b4.png 576w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/verizon-resp-576-aa2d202b4.png" alt="Verizons search endpoint response" width="576" height="158" /></picture>

<p>It seemed like multiple calls to this endpoint with the same params gave different replies, which meant it was non deterministic.</p>

<p>It also seemed like if you removed the params, it would remove the filter and return back many more numbers - I didn’t <em>really</em> care what the area code was, so removing the params should make it easier to find a number I liked.</p>

<p>I wrote a quick script to repeatedly call the API and try and find a number with at least 3 consecutive numbers.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">contains_cons</span><span class="p">(</span><span class="n">num</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">num</span><span class="p">)</span> <span class="o">-</span> <span class="mi">2</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">num</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">==</span> <span class="n">num</span><span class="p">[</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span> <span class="ow">and</span> <span class="n">num</span><span class="p">[</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span> <span class="o">==</span> <span class="n">num</span><span class="p">[</span><span class="n">i</span> <span class="o">+</span> <span class="mi">2</span><span class="p">]:</span>
            <span class="k">return</span> <span class="bp">True</span>
    <span class="k">return</span> <span class="bp">False</span>
</code></pre></div></div>

<p>and then just kept calling that infinitely.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">seen</span> <span class="o">=</span> <span class="nf">set</span><span class="p">()</span>
<span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">'</span><span class="s">https://api-v.vzmessages.com/VirtualNumber/listOfMVNs/US</span><span class="sh">'</span><span class="p">,</span> <span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">)</span>
    <span class="n">nums</span> <span class="o">=</span> <span class="n">response</span><span class="p">.</span><span class="nf">json</span><span class="p">()</span>
    <span class="k">for</span> <span class="n">num</span> <span class="ow">in</span> <span class="n">nums</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">num</span> <span class="ow">in</span> <span class="n">seen</span><span class="p">:</span>
            <span class="k">continue</span>
        <span class="n">seen</span><span class="p">.</span><span class="nf">add</span><span class="p">(</span><span class="n">num</span><span class="p">)</span>
        <span class="n">is_valid</span> <span class="o">=</span> <span class="nf">contains_cons</span><span class="p">(</span><span class="n">num</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">is_valid</span><span class="p">:</span>
            <span class="nf">print</span><span class="p">(</span><span class="n">num</span><span class="p">)</span>
            <span class="k">break</span>
</code></pre></div></div>

<h2 id="finding-valid-numbers">Finding valid numbers</h2>

<p>I let this run for a few minutes and checked back and found that there were quite a few good candidates.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/verizon-success-50-4866e1f28.webp 50w, https://blog.jonlu.ca/images/generated/verizon-success-100-4866e1f28.webp 100w, https://blog.jonlu.ca/images/generated/verizon-success-200-4866e1f28.webp 200w, https://blog.jonlu.ca/images/generated/verizon-success-400-4866e1f28.webp 400w, https://blog.jonlu.ca/images/generated/verizon-success-800-4866e1f28.webp 800w, https://blog.jonlu.ca/images/generated/verizon-success-1200-4866e1f28.webp 1200w, https://blog.jonlu.ca/images/generated/verizon-success-1600-4866e1f28.webp 1600w, https://blog.jonlu.ca/images/generated/verizon-success-2262-4866e1f28.webp 2262w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/verizon-success-50-6995dc65a.png 50w, https://blog.jonlu.ca/images/generated/verizon-success-100-6995dc65a.png 100w, https://blog.jonlu.ca/images/generated/verizon-success-200-6995dc65a.png 200w, https://blog.jonlu.ca/images/generated/verizon-success-400-6995dc65a.png 400w, https://blog.jonlu.ca/images/generated/verizon-success-800-6995dc65a.png 800w, https://blog.jonlu.ca/images/generated/verizon-success-1200-6995dc65a.png 1200w, https://blog.jonlu.ca/images/generated/verizon-success-1600-6995dc65a.png 1600w, https://blog.jonlu.ca/images/generated/verizon-success-2262-6995dc65a.png 2262w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/verizon-success-800-6995dc65a.png" alt="Verizons numbers with consecutive digits" width="2262" height="1168" /></picture>

<p>These are the numbers that had at least three consecutive digits (along with the number I actually ended up using, blacked out).</p>

<h2 id="purchasing-the-number">Purchasing the number</h2>

<p>I went through and bought a dummy number to figure out how to purchase the number programatically, and then did the same method as above to purchase it. Luckily there weren’t any unique IDs or anything, I could just swap out the number and it would work.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">requests</span>

<span class="n">headers</span> <span class="o">=</span> <span class="p">{</span>
    <span class="sh">'</span><span class="s">Host</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">api-v.vzmessages.com</span><span class="sh">'</span><span class="p">,</span>
    <span class="sh">'</span><span class="s">User-Agent</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">Verizon%20My%20Numbers/1 CFNetwork/1333.0.4 Darwin/21.5.0</span><span class="sh">'</span><span class="p">,</span>
    <span class="sh">'</span><span class="s">Connection</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">close</span><span class="sh">'</span><span class="p">,</span>
    <span class="sh">'</span><span class="s">Accept</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">*/*</span><span class="sh">'</span><span class="p">,</span>
    <span class="sh">'</span><span class="s">Accept-Language</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">en-US,en;q=0.9</span><span class="sh">'</span><span class="p">,</span>
    <span class="sh">'</span><span class="s">Cache-Control</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">no-cache</span><span class="sh">'</span><span class="p">,</span>
<span class="p">}</span>

<span class="n">json_data</span> <span class="o">=</span> <span class="p">{</span>
    <span class="sh">'</span><span class="s">countryCode</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">US</span><span class="sh">'</span><span class="p">,</span>
    <span class="sh">'</span><span class="s">zipcode</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">&lt;your billing zipcode</span><span class="sh">'</span><span class="p">,</span>
    <span class="sh">'</span><span class="s">deviceId</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">&lt;your device id&gt;</span><span class="sh">'</span><span class="p">,</span>
    <span class="sh">'</span><span class="s">contentId</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">Content2</span><span class="sh">'</span><span class="p">,</span>
    <span class="sh">'</span><span class="s">price</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">15</span><span class="sh">'</span><span class="p">,</span>
    <span class="sh">'</span><span class="s">mvn</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">&lt;number you want&gt;</span><span class="sh">'</span><span class="p">,</span>
    <span class="sh">'</span><span class="s">mdn</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">&lt;your hard line number&gt;</span><span class="sh">'</span><span class="p">,</span>
<span class="p">}</span>

<span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="sh">'</span><span class="s">https://api-v.vzmessages.com/VirtualNumber/purchase</span><span class="sh">'</span><span class="p">,</span> <span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">,</span> <span class="n">json</span><span class="o">=</span><span class="n">json_data</span><span class="p">)</span>
</code></pre></div></div>

<h2 id="success">Success</h2>

<p>Using this I was able to purchase the number, and now it shows up in the app. One drawback is that it’s not really integrated fully in your phone - for instance, iMessages to that number won’t work, and the calls need to be made through the app. It’s still nice to have, and a nice number to give out when you don’t want to give out your real one.</p>]]></content><author><name>{&quot;twitter&quot;=&gt;&quot;JonLuca&quot;}</name></author><summary type="html"><![CDATA[I find that it’s pretty useful to have access to multiple phone numbers. Any time a site offers a discount if you give them their phone number, or when you want to be more anonymous online and a service needs a number, it’s nice to have a number that’s more than a burner but not your main number. I had heard of the Verizon My Numbers service and thought it would be cool to have a few extra numbers for cases like these. I also wanted a number with multiple consecutive digits, both because it was easier to remember and for the cool factor.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.jonlu.ca/images/verizon-search.png" /><media:content medium="image" url="https://blog.jonlu.ca/images/verizon-search.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Web3, Free Candy, and exploits galore</title><link href="https://blog.jonlu.ca/posts/candy-machine" rel="alternate" type="text/html" title="Web3, Free Candy, and exploits galore" /><published>2022-02-26T18:59:43-05:00</published><updated>2022-02-26T18:59:43-05:00</updated><id>https://blog.jonlu.ca/posts/candy-machine</id><content type="html" xml:base="https://blog.jonlu.ca/posts/candy-machine"><![CDATA[<p>On 1/4/22, nearly 4000 Solana NFT projects were drained of their funds due to a reinitialization bug present in the Candy Machine v1 smart contract on Solana. The account, <a href="https://solscan.io/account/cHfYkrVAwfEoe3Mr2GbvzpNQJboDL6AiBoFZDsf8dxj">cHfYkrVAwfEoe3Mr2GbvzpNQJboDL6AiBoFZDsf8dxj</a>, converted 1,027 SOL into 155k USDC using Raydium, and then transferred the USDC into their FTX account. The vulnerability was patched while the attack was actively going on, at 6:20am on 1/4/22.</p>

<p>This investigation uncovered similar vulnerabilities in NFT exchanges, yet to be publicized.</p>

<h2 id="background">Background</h2>

<p>Metaplex’s <a href="https://docs.metaplex.com/candy-machine-v2/introduction">Candy Machine</a>, a Solana program which handles the logistics of NFT issuance, just launched last September. You instantiate it with their CLI, feed it your images, and it handles the rest. It will deal with all the technically complex parts of putting the images on chain and creating the smart contracts to mint them to the buyers.</p>

<p>It’s extremely simple to launch an NFT sale with Metaplex; you choose the price you want to set, the timing of the collection drop and any other configs - it handles the rest and mints right to recipients wallets.</p>

<p>This simplicity greatly lowered the barrier to entry - you didn’t need to have any Rust knowledge or Solana API experience to use it. When it first came out it led to a huge increase in NFT collections.</p>

<p>Since its inception, over 14,800 candy machines have been created, each corresponding to an NFT collection.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/cm-program-solscan-50-e693d3e86.webp 50w, https://blog.jonlu.ca/images/generated/cm-program-solscan-100-e693d3e86.webp 100w, https://blog.jonlu.ca/images/generated/cm-program-solscan-200-e693d3e86.webp 200w, https://blog.jonlu.ca/images/generated/cm-program-solscan-400-e693d3e86.webp 400w, https://blog.jonlu.ca/images/generated/cm-program-solscan-800-e693d3e86.webp 800w, https://blog.jonlu.ca/images/generated/cm-program-solscan-1200-e693d3e86.webp 1200w, https://blog.jonlu.ca/images/generated/cm-program-solscan-1600-e693d3e86.webp 1600w, https://blog.jonlu.ca/images/generated/cm-program-solscan-2400-e693d3e86.webp 2400w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/cm-program-solscan-50-e344b0d0e.png 50w, https://blog.jonlu.ca/images/generated/cm-program-solscan-100-e344b0d0e.png 100w, https://blog.jonlu.ca/images/generated/cm-program-solscan-200-e344b0d0e.png 200w, https://blog.jonlu.ca/images/generated/cm-program-solscan-400-e344b0d0e.png 400w, https://blog.jonlu.ca/images/generated/cm-program-solscan-800-e344b0d0e.png 800w, https://blog.jonlu.ca/images/generated/cm-program-solscan-1200-e344b0d0e.png 1200w, https://blog.jonlu.ca/images/generated/cm-program-solscan-1600-e344b0d0e.png 1600w, https://blog.jonlu.ca/images/generated/cm-program-solscan-2400-e344b0d0e.png 2400w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/cm-program-solscan-800-e344b0d0e.png" alt="Candy Machine program" width="2592" height="2026" /></picture>

<h2 id="impact">Impact</h2>

<p>The goal of this research was to identify how the attacker exploited the vulnerability, trace the funds and their total dollar denominated value, and then to determine which projects were impacted.</p>

<p>The attacker targeted 4,410 of the 14,800 candy machines that were created at the time. I’m guessing they didn’t target every vulnerable program because they had trouble pulling the historical candy machine creation records.</p>

<p>They fired off withdrawal transactions that took advantage of the reinitialization bug over the period of an hour.</p>

<p>The withdrawal transactions lasted between <a href="https://solscan.io/tx/coSeMNsGKebMGP1vqPZcEbu6rYiF4BbCRrtBRNLFi4TbMo3Psd7KZyvDTPv6KyeqZNDyMVU3o6D3rgQPG1aV94J">5:57am</a> and <a href="https://solscan.io/tx/3zhZDtCV2vr5fSG2TxEjXXTdMmfk8rfnM4mNAavKdZM1Cy6627hN8vDnu7gaUk6oPmzLLcacJpTopK1bsscX9MbB">6:49am</a> EST on January 4th 2022. At <a href="https://solscan.io/tx/3zhZDtCV2vr5fSG2TxEjXXTdMmfk8rfnM4mNAavKdZM1Cy6627hN8vDnu7gaUk6oPmzLLcacJpTopK1bsscX9MbB">6:20am</a>, the patched contract was deployed, causing every subsequent transaction by the attacker to fail.</p>

<p>Of the 4,410 candy machines targeted, 3,470 were completely drained. The vulnerability didn’t give the attacker permanent control of the candy machines - only for the duration of that transaction, which means that the candy machines that were impacted are not currently vulnerable.</p>

<p>Some of the notable projects impacted by this vulnerability are SolSteads, Contrastive, and Degen Ape Society, with a full list below.</p>

<h2 id="vulnerability">Vulnerability</h2>

<p>The bug was subtle - the attacker was injecting pre-initialized accounts and the program was not checking if the account had already been initialized, meaning an attacker could populate their own address as the authority of the contract.</p>

<p><a href="https://github.com/metaplex-foundation/metaplex/commit/4ddc13ea29070172f358e054baa9d4c47687a26b">The fix itself was fairly straightforward.</a></p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/candy-machine-fix-50-dcefebdfb.webp 50w, https://blog.jonlu.ca/images/generated/candy-machine-fix-100-dcefebdfb.webp 100w, https://blog.jonlu.ca/images/generated/candy-machine-fix-200-dcefebdfb.webp 200w, https://blog.jonlu.ca/images/generated/candy-machine-fix-400-dcefebdfb.webp 400w, https://blog.jonlu.ca/images/generated/candy-machine-fix-800-dcefebdfb.webp 800w, https://blog.jonlu.ca/images/generated/candy-machine-fix-1200-dcefebdfb.webp 1200w, https://blog.jonlu.ca/images/generated/candy-machine-fix-1498-dcefebdfb.webp 1498w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/candy-machine-fix-50-0de749254.png 50w, https://blog.jonlu.ca/images/generated/candy-machine-fix-100-0de749254.png 100w, https://blog.jonlu.ca/images/generated/candy-machine-fix-200-0de749254.png 200w, https://blog.jonlu.ca/images/generated/candy-machine-fix-400-0de749254.png 400w, https://blog.jonlu.ca/images/generated/candy-machine-fix-800-0de749254.png 800w, https://blog.jonlu.ca/images/generated/candy-machine-fix-1200-0de749254.png 1200w, https://blog.jonlu.ca/images/generated/candy-machine-fix-1498-0de749254.png 1498w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/candy-machine-fix-800-0de749254.png" alt="Fix for the vulnerability" width="1498" height="590" /></picture>

<p>The hack seems fairly unsophisticated - the damage this vulnerability could do was pretty high, as the bug effectively allowed any account to control the Candy Machine. The attacker submitted the transactions slowly, and would probably have been able to capture the entirety of the vulnerable set of candy machines had they submitted the transactions through their own RPC pool without rate limits.</p>

<p>What’s also interesting about the fix is that it was <a href="https://github.com/metaplex-foundation/metaplex/commit/e9ef376443c3c8fd2f5b151dd0b09f757b1bf35c">actually fixed in code on December 31st for Candy Machine v2</a>, but the CMv1 contract wasn’t redeployed until it was actively being exploited.</p>

<h2 id="fund-extraction">Fund extraction</h2>

<p>The attacker used Serum DEX and RaydiumSwapV2 to convert the SOL to USDC, then sent the USDC to a FTX address. It should be fairly easy to reverse their idea from FTXs end if they’ve KYC’d properly.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/candy-machine-withdrawal-50-9e188d381.webp 50w, https://blog.jonlu.ca/images/generated/candy-machine-withdrawal-100-9e188d381.webp 100w, https://blog.jonlu.ca/images/generated/candy-machine-withdrawal-200-9e188d381.webp 200w, https://blog.jonlu.ca/images/generated/candy-machine-withdrawal-400-9e188d381.webp 400w, https://blog.jonlu.ca/images/generated/candy-machine-withdrawal-800-9e188d381.webp 800w, https://blog.jonlu.ca/images/generated/candy-machine-withdrawal-1200-9e188d381.webp 1200w, https://blog.jonlu.ca/images/generated/candy-machine-withdrawal-1600-9e188d381.webp 1600w, https://blog.jonlu.ca/images/generated/candy-machine-withdrawal-2400-9e188d381.webp 2400w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/candy-machine-withdrawal-50-d592a356c.png 50w, https://blog.jonlu.ca/images/generated/candy-machine-withdrawal-100-d592a356c.png 100w, https://blog.jonlu.ca/images/generated/candy-machine-withdrawal-200-d592a356c.png 200w, https://blog.jonlu.ca/images/generated/candy-machine-withdrawal-400-d592a356c.png 400w, https://blog.jonlu.ca/images/generated/candy-machine-withdrawal-800-d592a356c.png 800w, https://blog.jonlu.ca/images/generated/candy-machine-withdrawal-1200-d592a356c.png 1200w, https://blog.jonlu.ca/images/generated/candy-machine-withdrawal-1600-d592a356c.png 1600w, https://blog.jonlu.ca/images/generated/candy-machine-withdrawal-2400-d592a356c.png 2400w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/candy-machine-withdrawal-800-d592a356c.png" alt="Withdrawal transaction" width="2592" height="1406" /></picture>

<h2 id="candy-machine">Candy Machine</h2>

<p>Candy Machine v1 is now deprecated, and any new candy machines created should be v2s. <a href="https://docs.metaplex.com/candy-machine-v2/introduction">From their docs</a>:</p>

<blockquote>
  <p>The second iteration of the well-known Candy Machine, a fully on-chain generative NFT distribution program, provides many improvements over its predecessor. The new version also allows you to create a whole new set of distribution scenarios and offers protection from bot attacks, while providing the same easy-to-use experience.</p>
</blockquote>

<h2 id="research">Research</h2>

<p>Querying for historical data on chain in Solana is a time consuming process. I tried doing research in jupyter notebook at first, but the volume of data made it hard to parse and query.</p>

<p>I ended up cloning the historical transactions into a local database, and indexing that for faster queries.</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">export</span> <span class="kd">class</span> <span class="nc">MongoClient</span> <span class="p">{</span>
  <span class="nx">init</span> <span class="o">=</span> <span class="k">async </span><span class="p">()</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="nx">log</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="dl">"</span><span class="s2">Connecting...</span><span class="dl">"</span><span class="p">);</span>
    <span class="k">await</span> <span class="nf">connect</span><span class="p">(</span><span class="dl">"</span><span class="s2">mongodb://127.0.0.1:27017</span><span class="dl">"</span><span class="p">,</span> <span class="p">{</span>
      <span class="na">keepAlive</span><span class="p">:</span> <span class="kc">true</span><span class="p">,</span>
      <span class="na">keepAliveInitialDelay</span><span class="p">:</span> <span class="mi">300000</span><span class="p">,</span>
      <span class="na">dbName</span><span class="p">:</span> <span class="dl">"</span><span class="s2">candymachine</span><span class="dl">"</span><span class="p">,</span>
      <span class="na">minPoolSize</span><span class="p">:</span> <span class="mi">50</span><span class="p">,</span>
      <span class="na">maxPoolSize</span><span class="p">:</span> <span class="mi">500</span><span class="p">,</span>
    <span class="p">});</span>
    <span class="nx">log</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="dl">"</span><span class="s2">Connected to mongo db</span><span class="dl">"</span><span class="p">);</span>
  <span class="p">};</span>

  <span class="nx">saveHashes</span> <span class="o">=</span> <span class="k">async </span><span class="p">(</span><span class="nx">hashes</span><span class="p">:</span> <span class="nx">object</span><span class="p">[])</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="nx">log</span><span class="p">.</span><span class="nf">debug</span><span class="p">(</span><span class="dl">"</span><span class="s2">Saving batch...</span><span class="dl">"</span><span class="p">);</span>
    <span class="k">try</span> <span class="p">{</span>
      <span class="k">await</span> <span class="nx">Txhashes</span><span class="p">.</span><span class="nf">insertMany</span><span class="p">(</span><span class="nx">hashes</span><span class="p">,</span> <span class="p">{</span> <span class="na">ordered</span><span class="p">:</span> <span class="kc">false</span> <span class="p">});</span>
    <span class="p">}</span> <span class="k">catch </span><span class="p">(</span><span class="na">e</span><span class="p">:</span> <span class="nx">any</span><span class="p">)</span> <span class="p">{</span>
      <span class="c1">// ignore dup key errors</span>
      <span class="k">if </span><span class="p">(</span><span class="o">!</span><span class="nx">e</span><span class="p">.</span><span class="nx">message</span><span class="p">.</span><span class="nf">includes</span><span class="p">(</span><span class="dl">"</span><span class="s2">E11000</span><span class="dl">"</span><span class="p">))</span> <span class="p">{</span>
        <span class="nx">log</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="nx">e</span><span class="p">);</span>
      <span class="p">}</span>
    <span class="p">}</span>
    <span class="kd">const</span> <span class="nx">count</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">Txhashes</span><span class="p">.</span><span class="nf">count</span><span class="p">();</span>
    <span class="nx">log</span><span class="p">.</span><span class="nf">debug</span><span class="p">(</span><span class="s2">`Saved batch - </span><span class="p">${</span><span class="nx">count</span><span class="p">}</span><span class="s2"> total documents`</span><span class="p">);</span>
  <span class="p">};</span>

  <span class="nx">getHashes</span> <span class="o">=</span> <span class="k">async </span><span class="p">(</span><span class="nx">filter</span><span class="p">:</span> <span class="nx">FilterQuery</span><span class="o">&lt;</span><span class="k">typeof</span> <span class="nx">Txhashes</span><span class="o">&gt;</span> <span class="o">=</span> <span class="p">{})</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="kd">const</span> <span class="nx">docs</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">Txhashes</span><span class="p">.</span><span class="nf">find</span><span class="p">(</span><span class="nx">filter</span><span class="p">).</span><span class="nf">limit</span><span class="p">(</span><span class="mi">25000</span><span class="p">);</span>
    <span class="k">return</span> <span class="nx">docs</span><span class="p">;</span>
  <span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I first cloned all the transaction hashes into Mongo - I set up a connection pool of various RPCs to accomplish this, as there’s no way of getting it from the Solana mainnet-beta RPC in a reasonable amount of time.</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">history</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">con</span><span class="p">.</span><span class="nf">getSignaturesForAddress</span><span class="p">(</span>
  <span class="k">new</span> <span class="nc">PublicKey</span><span class="p">(</span><span class="nx">publicKey</span><span class="p">),</span>
  <span class="nx">options</span>
<span class="p">);</span>

<span class="k">for </span><span class="p">(</span><span class="kd">const</span> <span class="nx">c</span> <span class="k">of</span> <span class="nf">chunk</span><span class="p">(</span><span class="nx">history</span><span class="p">,</span> <span class="mi">100000</span><span class="p">))</span> <span class="p">{</span>
  <span class="k">await</span> <span class="nx">mc</span><span class="p">.</span><span class="nf">saveHashes</span><span class="p">(</span><span class="nx">c</span><span class="p">);</span>
  <span class="nx">log</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="dl">"</span><span class="s2">Completed chunk</span><span class="dl">"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Then, after fetching all the hashes, I would clone the parsed transaction details into Mongo</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">run</span> <span class="o">=</span> <span class="k">async </span><span class="p">()</span> <span class="o">=&gt;</span> <span class="p">{</span>
  <span class="k">await</span> <span class="nx">mc</span><span class="p">.</span><span class="nf">init</span><span class="p">();</span>
  <span class="nx">log</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="dl">"</span><span class="s2">Fetching hashes</span><span class="dl">"</span><span class="p">);</span>
  <span class="kd">let</span> <span class="nx">hashesToFetch</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">mc</span><span class="p">.</span><span class="nf">getHashes</span><span class="p">({</span> <span class="na">tx</span><span class="p">:</span> <span class="kc">null</span> <span class="p">});</span>
  <span class="nx">log</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="dl">"</span><span class="s2">Fetched hashes</span><span class="dl">"</span><span class="p">);</span>
  <span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
  <span class="k">while </span><span class="p">(</span><span class="nx">hashesToFetch</span><span class="p">.</span><span class="nx">length</span><span class="p">)</span> <span class="p">{</span>
    <span class="kd">let</span> <span class="nx">isNearEnd</span> <span class="o">=</span> <span class="nx">hashesToFetch</span><span class="p">.</span><span class="nx">length</span> <span class="o">&lt;</span> <span class="mi">10000</span><span class="p">;</span>
    <span class="kd">const</span> <span class="nx">chunkSize</span> <span class="o">=</span> <span class="nx">isNearEnd</span> <span class="p">?</span> <span class="mi">10</span> <span class="p">:</span> <span class="mi">200</span><span class="p">;</span>
    <span class="k">if </span><span class="p">(</span><span class="nx">isNearEnd</span><span class="p">)</span> <span class="p">{</span>
      <span class="nf">shuffle</span><span class="p">(</span><span class="nx">hashesToFetch</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="kd">const</span> <span class="nx">chunkedHistory</span> <span class="o">=</span> <span class="nf">chunk</span><span class="p">(</span><span class="nx">hashesToFetch</span><span class="p">,</span> <span class="nx">chunkSize</span><span class="p">)</span> <span class="k">as</span> <span class="nx">string</span><span class="p">[][];</span>
    <span class="kd">const</span> <span class="nx">processChunk</span> <span class="o">=</span> <span class="k">async </span><span class="p">(</span><span class="na">hashes</span><span class="p">:</span> <span class="nx">any</span><span class="p">[])</span> <span class="o">=&gt;</span> <span class="p">{</span>
      <span class="kd">let</span> <span class="nx">message</span> <span class="o">=</span> <span class="s2">`Fetched txs </span><span class="p">${</span><span class="nx">i</span><span class="p">}</span><span class="s2">`</span><span class="p">;</span>
      <span class="kd">let</span> <span class="nx">savedTxMessage</span> <span class="o">=</span> <span class="s2">`Saved txs </span><span class="p">${</span><span class="nx">i</span><span class="p">}</span><span class="s2">`</span><span class="p">;</span>
      <span class="nx">i</span><span class="o">++</span><span class="p">;</span>

      <span class="nx">console</span><span class="p">.</span><span class="nf">time</span><span class="p">(</span><span class="nx">message</span><span class="p">);</span>
      <span class="kd">const</span> <span class="nx">hashMap</span> <span class="o">=</span> <span class="p">{};</span>
      <span class="nx">hashes</span><span class="p">.</span><span class="nf">forEach</span><span class="p">((</span><span class="nx">hash</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span>
        <span class="k">if </span><span class="p">(</span><span class="nx">hash</span><span class="p">.</span><span class="nx">signature</span><span class="p">)</span> <span class="p">{</span>
          <span class="nx">hashMap</span><span class="p">[</span><span class="nx">hash</span><span class="p">.</span><span class="nx">signature</span><span class="p">]</span> <span class="o">=</span> <span class="nx">hash</span><span class="p">;</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
          <span class="nx">console</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="dl">"</span><span class="s2">what</span><span class="dl">"</span><span class="p">);</span>
        <span class="p">}</span>
      <span class="p">});</span>

      <span class="k">try</span> <span class="p">{</span>
        <span class="kd">const</span> <span class="p">{</span> <span class="nx">c</span><span class="p">,</span> <span class="na">tx</span><span class="p">:</span> <span class="nx">txs</span> <span class="p">}</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">fetchTxsWithFallbackWithConnection</span><span class="p">(</span>
          <span class="nx">hashes</span><span class="p">.</span><span class="nf">map</span><span class="p">((</span><span class="nx">p</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="nx">p</span><span class="p">.</span><span class="nx">signature</span><span class="p">)</span>
        <span class="p">);</span>

        <span class="nx">console</span><span class="p">.</span><span class="nf">timeLog</span><span class="p">(</span><span class="nx">message</span><span class="p">,</span> <span class="nx">c</span><span class="p">?.</span><span class="nx">_rpcEndpoint</span><span class="p">);</span>
        <span class="nx">console</span><span class="p">.</span><span class="nf">timeEnd</span><span class="p">(</span><span class="nx">message</span><span class="p">);</span>
        <span class="kd">let</span> <span class="nx">failureCount</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
        <span class="nx">txs</span><span class="p">.</span><span class="nf">forEach</span><span class="p">((</span><span class="nx">tx</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span>
          <span class="k">if </span><span class="p">(</span><span class="o">!</span><span class="nx">tx</span><span class="p">)</span> <span class="p">{</span>
            <span class="nx">failureCount</span><span class="o">++</span><span class="p">;</span>

            <span class="k">return</span><span class="p">;</span>
          <span class="p">}</span>
          <span class="nx">tx</span><span class="p">.</span><span class="nx">transaction</span><span class="p">.</span><span class="nx">signatures</span><span class="p">.</span><span class="nf">forEach</span><span class="p">((</span><span class="nx">s</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span>
            <span class="k">if </span><span class="p">(</span><span class="nx">hashMap</span><span class="p">[</span><span class="nx">s</span><span class="p">])</span> <span class="p">{</span>
              <span class="nx">hashMap</span><span class="p">[</span><span class="nx">s</span><span class="p">].</span><span class="nx">tx</span> <span class="o">=</span> <span class="nx">tx</span><span class="p">;</span>
            <span class="p">}</span>
          <span class="p">});</span>
        <span class="p">});</span>
        <span class="k">if </span><span class="p">(</span><span class="nx">failureCount</span><span class="p">)</span> <span class="p">{</span>
          <span class="nx">console</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="s2">`Invalid txs: </span><span class="p">${</span><span class="nx">failureCount</span><span class="p">}</span><span class="s2">`</span><span class="p">);</span>
        <span class="p">}</span>
        <span class="nx">console</span><span class="p">.</span><span class="nf">time</span><span class="p">(</span><span class="nx">savedTxMessage</span><span class="p">);</span>
        <span class="k">await</span> <span class="nx">Txhashes</span><span class="p">.</span><span class="nf">bulkSave</span><span class="p">(</span><span class="nx">hashes</span><span class="p">);</span>
        <span class="nx">console</span><span class="p">.</span><span class="nf">timeEnd</span><span class="p">(</span><span class="nx">savedTxMessage</span><span class="p">);</span>
      <span class="p">}</span> <span class="k">catch </span><span class="p">(</span><span class="nx">e</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span><span class="p">;</span>
      <span class="p">}</span>
    <span class="p">};</span>
    <span class="k">try</span> <span class="p">{</span>
      <span class="kd">const</span> <span class="nx">promises</span> <span class="o">=</span> <span class="nx">chunkedHistory</span><span class="p">.</span><span class="nf">map</span><span class="p">((</span><span class="nx">h</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="nf">limit</span><span class="p">(()</span> <span class="o">=&gt;</span> <span class="nf">processChunk</span><span class="p">(</span><span class="nx">h</span><span class="p">)));</span>
      <span class="k">await</span> <span class="nb">Promise</span><span class="p">.</span><span class="nf">all</span><span class="p">(</span><span class="nx">promises</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">catch </span><span class="p">(</span><span class="nx">e</span><span class="p">)</span> <span class="p">{</span>
      <span class="nx">console</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="nx">e</span><span class="p">);</span>
      <span class="nx">console</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span>
        <span class="s2">`Chunk failed with total history length of </span><span class="p">${</span><span class="nx">hashesToFetch</span><span class="p">.</span><span class="nx">length</span><span class="p">}</span><span class="s2">`</span>
      <span class="p">);</span>
    <span class="p">}</span>
    <span class="nx">console</span><span class="p">.</span><span class="nf">log</span><span class="p">(</span><span class="dl">"</span><span class="s2">Finished chunk</span><span class="dl">"</span><span class="p">);</span>
    <span class="nx">log</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="dl">"</span><span class="s2">Fetching hashes</span><span class="dl">"</span><span class="p">);</span>
    <span class="nx">hashesToFetch</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">mc</span><span class="p">.</span><span class="nf">getHashes</span><span class="p">({</span> <span class="na">tx</span><span class="p">:</span> <span class="kc">null</span> <span class="p">});</span>
    <span class="nx">log</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="dl">"</span><span class="s2">Fetched hashes</span><span class="dl">"</span><span class="p">);</span>
  <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>

<p>I also pulled every transaction (legitimate and the attackers) that called the withdraw function on the candy machines.</p>

<p>Of the 14,800 candy machines, 11,848 have had the withdraw function executed on them. The top accounts associated with these functions are below.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/cm-top-withdrawers-50-82c537280.webp 50w, https://blog.jonlu.ca/images/generated/cm-top-withdrawers-100-82c537280.webp 100w, https://blog.jonlu.ca/images/generated/cm-top-withdrawers-200-82c537280.webp 200w, https://blog.jonlu.ca/images/generated/cm-top-withdrawers-400-82c537280.webp 400w, https://blog.jonlu.ca/images/generated/cm-top-withdrawers-800-82c537280.webp 800w, https://blog.jonlu.ca/images/generated/cm-top-withdrawers-1068-82c537280.webp 1068w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/cm-top-withdrawers-50-c833c8b40.png 50w, https://blog.jonlu.ca/images/generated/cm-top-withdrawers-100-c833c8b40.png 100w, https://blog.jonlu.ca/images/generated/cm-top-withdrawers-200-c833c8b40.png 200w, https://blog.jonlu.ca/images/generated/cm-top-withdrawers-400-c833c8b40.png 400w, https://blog.jonlu.ca/images/generated/cm-top-withdrawers-800-c833c8b40.png 800w, https://blog.jonlu.ca/images/generated/cm-top-withdrawers-1068-c833c8b40.png 1068w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/cm-top-withdrawers-800-c833c8b40.png" alt="Top withdrawer callers" width="1068" height="598" /></picture>

<p>Only <code class="language-plaintext highlighter-rouge">cHfYkrVAwfEoe3Mr2GbvzpNQJboDL6AiBoFZDsf8dxj</code> seems to be doing this maliciously - the other accounts are all calling legitimate withdraw functions.</p>

<p><code class="language-plaintext highlighter-rouge">F9fER1Cb8hmjapWGZDukzcEYshAUDbSFpbXkj9QuBaQj</code> actually seems to have created over 2,000 candy machines, and then attempted to call withdraw on them, single handedly creating ~14% of all candy machines on Solana.</p>

<h2 id="redacted-pending-vulnerability-disclosure">[Redacted Pending Vulnerability Disclosure]</h2>

<p>[Redacted pending vulnerability disclosure of Solana exchange]</p>

<h3 id="redacted-pending-vulnerability-disclosure-1">[Redacted Pending Vulnerability Disclosure]</h3>

<p>[Redacted pending vulnerability disclosure of Solana exchange]</p>

<h2 id="identified-projects">Identified Projects</h2>

<p>I went through and fetched the public keys to all known affected projects and tried to map them back to the hacked machines. I identified 334 unique projects that were actively listed on Magic Eden, Solanart, DigitalEyez, and Solsea as being affected.</p>

<p><span style="max-height: 400px; white-space: pre-line; overflow: auto; display: flex; border: 1px solid black; padding: 4px;">012Funksaicy
0xDRIP
100Radials
169 Pixel Gang
3D Flowers
A Pixel Art
Abstergo
Aeterna Civitas
Afrobubble
AI: Baby Bots
Aircrafts
AIRDOGS
Alphabet Originals
Angomon
Angry BaboonS
Angry Bunny Club
Angry Citizens
AngryWorms
Anti Artist Club
ApexDucks Halloween
Arcade ‘88
Art by NRG – Pop Sushi
Artificial Irrelevants
Autistic Reindeer Herd
Baby Ape Social Club
Baby Bait
Baby Frogs
Baby Goblin
Bad Bromatoes
BADBOYS
BalisariNFT
Bannurs
Bare Bones Society: The Kingdom Of Secrets
Beat Drops V1
Boat Boys
BoldFrames
Boopie Gen 1
Bored Ape Social Club
Boss Babes
Broken Robot Burger Bar
bugbearz
Bunny Warriors
CASSETS Audio
CatPunk
CATPUNK OG PASS
Cats Club NFT
Chickenz
ChihuahuaSol
Classic Art Mashups
Coherence
Combat Women NFT
Contrastive
Crazy Pickles
Creepy Girls
Crypto Greeks
Crypto Idolz - Faces
CryptoCream
CryptoCubs
CryptoCubs Mutants
Cryptone™
Cryptonic Creations
CryptoRock
CryptoTeds
CryptZero Season 2 - Ghosts
Cult of Meerkats
CyberKeys
Danger Valley
Danuki Dojo
Deadass
Deeps
Degen Ape Society
DegenEggs
Degeneggs - Gen 2
Desolates Metaverse
Dessert Girls
DigiLife
Dinos Zone
Dippy Dragons NFT
Doll Society
Dragon Eggs NFT
Dragon Slayerz
Dreamland Monkeys
Element Art NFT 2D
Enigma Expanses
enviro
Enviro
Epoch Labs
Ethereans
Eyeballz
Eyes WTF
Fallen Traveler
Fast Food Thugbirdz
Finefolk
Floppy Disk Nft
Flutter
Fractures
G.O.A.T. Collection
Galactic Goose
GAMEKIDZ
GANder g0
Gem Heroes
Ghostface
Ghostly Sols
Ghoulie Gang
Gremlins NFT
Hallow Birds
Happy Pups
Hellish Party Boollons
Hello World! NFT
Hemp
Heroes and Villains: PASS
High Roller Hippo Clique
HOAG play
Honorary Space Bums
Hot Bunnies NFT
Houses Of pixel
Iceland - 0xDRIP
Icy Bearz NFT
Idle Planets - Autumn 2021 Moon
Idle Planets - Holiday 2021 Moon
Infamous Apes
InnerMind
Intersolar
iTrading_Bot
Jelly Beasts
Jingle Monkeys
Joeian’s Collection
JOSEPHTAYLOR.ART: CRYSTAL BEAMS
Jungle Cats
Just Blocks
Kitten Coup
Knightdom
KoreanPunkz
Krunk Roach NFT
Kyoudai Academy: Solana Arcade Games
Labyrinth
Lanabots
Lazy Heroes
Little Noots
Los Cactus Hermanos
LuchaLucha NFT
Lucky Kittens
LuxAI
Mad Vikings Arms Collection
Magic Solana Shits
Make your own NFT
Mark McKenna’s Heroes &amp; Villains: Origins
Megumi
Meta Homes
Metadroids
Metakatz
metaSpheres
MetaSpheres
Mickey Degods
Millionaire Apes Club
Mindfolk
Mini Royale: Nations - Season 1 (Premium)
Mob Monkettez
MogulWars
Monkey Ball
MonkeyBall Gen Zero
My name Is Sol
My Name is Sol
Myopa
Mystic Potion
Neopets Metaverse
NFT Poetry
NFTrees Solana
NGMIPandas
Nifty Nanas
NON-FUNGIBLE “BEES”
Nyan Heroes
OGbottles
Oink Club NFT
OinkClub NFT
PEEPS
Pengu Love
Personify
Pesky Ice Cube
Phantom
Pilgrim Society
PIMP MY THUG
Pirates of Sol Bay - Bottles
Pirates of Sol Bay - Treasures
pitcrew
Pix World NFT
Pixel Island NFT
PixelWorms
PixWorldNFT
Platypusol Family
Playground Waves
Playground: Waves
PopsicleNFT
Posh Dolphs
Powder Heroes
Prickly Pete’s Platoon - OG Cactoon Series
PSY Network | PlanetZ
Pudgy Pigeons
Rabbit Punks
Realm Kings
Red paperclip
RowdyRex
Rug Toadz
Ruled by Randomness: The Genesis
SantaClaus
Savage Dray by Squeak Brigade
SavagesTotsys
SawBunny
Secret Duck Society
SGF United
Shadowy Super Coder
Shadowy Super Coder DAO
SharkBros
shatteredmarble
ShroomZ
Slimeballz
sLoot
Smileys
Smolpenguins
Snek Gang
soAlien
SocksOnSolana
Sol Diamond Hands
Sol Lions
SOL NFL PLAYER’S
SOL NFL Players
SOL Parasites
Sol Slugs
Sol Tamagotchi
Sol Tapes
SOLadies
Solagon
SolAlbums
Solamids
Solana Baby Monkey Business
Solana Bananas
Solana Birbs
Solana Bros
Solana Cat Gang
Solana Fan The Game of Squid
Solana Feline Business
Solana Havana Cigar Club
Solana Locks
Solana Mystery Box Items
Solana Mystery Items
Solana Pickles
Solana Reversed Monkey Business
Solana Robot Business
Solana Samurai Journey
Solana Slugs
SOLANA SUPERCAR CLUB
Solana Surfers
Solana Tactical RPG STACC
solanabets | The Clique NFT
SolArc NFT
Solarnauts: Mission Bravo
SolBlocks
Solbusters
Solccoons
SolCrocos
Soldalas
SolDice
SOLDIER RABBITS
SolEmoji
Solez
SolFoxes
SolGalaxy
SolGangsta
Solloons
Sollyfish
Solmon
Solmoverse: Collection 0
Solmushies
SOLNANA
SolNauts
SolOrbs
Solryx
SolSlimes
SolSneakers
Solsteads Surreal Estate
SolStoners
SolTowers
Solutions
Solvaders
SolWatchers
Soul Dogs
Soulofox
Space Bums
Space Bums: Galaxy Mint Pass
Spiderverse
Spirits of Solana
Squareheadz
Squid Society
Squirrelz
Stash
StratosNFT
Structs
Superballz
Surging Bulls
Synesthesia, by Labyrinth
Terrarium Tanks
Test Guys
Test Guys Item Outpost
The Assembly
The Baby Boogles
The Beverly Hills Car Club
The Collectoooooor
The Elementies
The Exiled Apes
The Nasty Boys
The Rock
theBULL by metaCOLLECTIVE
TheDragonClub
Thirsty Cactus Garden Party
Thoughtful Folk NFT
ThugDragonz
Tiny Tigers
Titanz
ToneBox
Undead Sols
Vale Unleashed Rel
Vampires of SOL
Vampires Of SOL
WallStreetPunkS
Wicked Pigeon Posse
Wieners
Wieners Club
Wildfire Native
Winter Tiny Tigers
Wolves On Wallstreet
WOOFers
World of Deities NFT
WUKONGSOL
Xperiment
</span></p>

<h2 id="bug-bounty">Bug Bounty</h2>

<p>In conjunction with this vulnerability research, Metaplex has launched a <a href="https://www.metaplex.com/posts/bug-bounty-blog">bug bounty program.</a></p>

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Earlier this week the <a href="https://twitter.com/metaplex?ref_src=twsrc%5Etfw">@Metaplex</a> Foundation announced a Bug Bounty program—our commitment to white-hat developers we’ve been spinning up for months.<br /><br />Our first contributor, <a href="https://twitter.com/jonluca?ref_src=twsrc%5Etfw">@jonluca</a>, uncovered a vulnerability in CMv1 back in January. More below. 👇 <a href="https://t.co/sq0cjtLtTj">https://t.co/sq0cjtLtTj</a></p>&mdash; Metaplex (@metaplex) <a href="https://twitter.com/metaplex/status/1504846982954762290?ref_src=twsrc%5Etfw">March 18, 2022</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

<h2 id="timeline">Timeline</h2>

<ul>
  <li>Dec 31st - <a href="https://github.com/metaplex-foundation/metaplex/commit/e9ef376443c3c8fd2f5b151dd0b09f757b1bf35c">Fix for CMv2 is landed</a></li>
  <li>Tue Jan 04 2022 05:57:11 GMT-0500 - <a href="https://solscan.io/tx/coSeMNsGKebMGP1vqPZcEbu6rYiF4BbCRrtBRNLFi4TbMo3Psd7KZyvDTPv6KyeqZNDyMVU3o6D3rgQPG1aV94J">First attacker transaction is executed</a></li>
  <li>Tue Jan 04 2022 06:20:29 GMT-0500 - <a href="https://solscan.io/tx/3zhZDtCV2vr5fSG2TxEjXXTdMmfk8rfnM4mNAavKdZM1Cy6627hN8vDnu7gaUk6oPmzLLcacJpTopK1bsscX9MbB">First transaction that tries to interact with the newly updated contracted is executed</a></li>
  <li>Tue Jan 04 2022 06:49:27 GMT-0500 - <a href="https://solscan.io/tx/3zhZDtCV2vr5fSG2TxEjXXTdMmfk8rfnM4mNAavKdZM1Cy6627hN8vDnu7gaUk6oPmzLLcacJpTopK1bsscX9MbB">Last transaction that tries to interact with the newly updated contracted is executed</a></li>
  <li>Tue Jan 06, 2022, 18:25 GMT-0500 - <a href="https://github.com/metaplex-foundation/metaplex/commit/4ddc13ea29070172f358e054baa9d4c47687a26b">Fix for CMv1 is landed</a></li>
  <li>Tue Jan 15, 2022, 21:15 GMT-0500 - Metaplex is alerted to this specific vulnerability.</li>
  <li>Fri Mar 11, 2022 - <a href="https://www.metaplex.com/posts/bug-bounty-blog">Metaplex bug bounty program is launched in conjunction with this post</a></li>
  <li>Fri Mar 18, 2022 - <a href="https://twitter.com/metaplex/status/1504846982954762290">Metaplex bug bounty for CMv1 is announced</a></li>
</ul>

<h2 id="appendix">Appendix</h2>

<p>This vulnerability was discussed in Discord’s and on Twitter but was not widely analyzed.</p>

<p>All the code for this research will be made public pending final vulnerability disclosures on various exchanges.</p>]]></content><author><name>{&quot;twitter&quot;=&gt;&quot;JonLuca&quot;}</name></author><summary type="html"><![CDATA[On 1/4/22, nearly 4000 Solana NFT projects were drained of their funds due to a reinitialization bug present in the Candy Machine v1 smart contract on Solana. The account, cHfYkrVAwfEoe3Mr2GbvzpNQJboDL6AiBoFZDsf8dxj, converted 1,027 SOL into 155k USDC using Raydium, and then transferred the USDC into their FTX account. The vulnerability was patched while the attack was actively going on, at 6:20am on 1/4/22.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.jonlu.ca/images/candy-machine-withdrawal.png" /><media:content medium="image" url="https://blog.jonlu.ca/images/candy-machine-withdrawal.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Snapshotting memory to scrape encrypted network requests</title><link href="https://blog.jonlu.ca/posts/heap-snapshot-scraping" rel="alternate" type="text/html" title="Snapshotting memory to scrape encrypted network requests" /><published>2021-08-08T10:06:09-04:00</published><updated>2021-08-08T10:06:09-04:00</updated><id>https://blog.jonlu.ca/posts/heap-snapshot-scraping</id><content type="html" xml:base="https://blog.jonlu.ca/posts/heap-snapshot-scraping"><![CDATA[<p>Most web reverse engineering focuses on two attack surfaces - either DOM scraping through something like puppeteer or beatifulsoup, or on MITM attacks to reverse network calls.</p>

<p>The former is what most traditional scraping looks like - you manually inspect a web page, you determine the right xpath/css selectors to follow, and then instruct a scraper to statically request the page and scrape it.</p>

<p>The rise of SPAs has made this approach a bit less impractical - you now have to actually render the content, which is is significantly slower, to have the content you want to scrape show up in the DOM.</p>

<p>At the same time, it made web scraping as a whole much easier - sites that used to be server rendered now have nice, machine readable routes that serve JSON. This has lead to tools like Burp Suite and Charles Proxy being coopted from their original use of finding security vulnerabilities to being primarily used for web scraping.</p>

<p>In this article I want to introduce a third, more niche attack surface for scraping web pages - scraping through memory snapshots and their respective dominator graph. It’s particularly applicable when the client is encrypting its network requests and responses.</p>

<h2 id="chrome-devtools">Chrome Devtools</h2>

<p>Devtools has a nice feature to detect memory leaks - the Memory tab.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/memory-50-b9ef5f4ee.webp 50w, https://blog.jonlu.ca/images/generated/memory-100-b9ef5f4ee.webp 100w, https://blog.jonlu.ca/images/generated/memory-200-b9ef5f4ee.webp 200w, https://blog.jonlu.ca/images/generated/memory-400-b9ef5f4ee.webp 400w, https://blog.jonlu.ca/images/generated/memory-794-b9ef5f4ee.webp 794w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/memory-50-c4f2c4dbc.png 50w, https://blog.jonlu.ca/images/generated/memory-100-c4f2c4dbc.png 100w, https://blog.jonlu.ca/images/generated/memory-200-c4f2c4dbc.png 200w, https://blog.jonlu.ca/images/generated/memory-400-c4f2c4dbc.png 400w, https://blog.jonlu.ca/images/generated/memory-794-c4f2c4dbc.png 794w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/memory-794-c4f2c4dbc.png" alt="Memory tab" width="794" height="234" /></picture>

<p>It will take a snapshot of the current tabs memory, and built a graph out of it to inspect.</p>

<p>The memory tab is mostly used to compare different memory snapshots - you take a snapshot, perform some actions on the page, take another snapshot, and look at everything that got allocated to see if there are any memory leaks.</p>

<p>Since it’s pretty much a full memory snapshot, it stores all strings and objects - which, for something like React or Vue, will include all props and (typically) network request responses.</p>

<h2 id="memory-snapshots">Memory snapshots</h2>

<p>Inspecting memory isn’t new in the reverse engineering space - for native apps, it’s one of the most common ways of finding out what an application is doing and how its internal logic is structured.</p>

<p>Within the web world, though, it’s hardly ever used. Most of the literature online is regarding finding memory leaks. All the tools and software around it are therefore built around that - they don’t care about the <em>contents</em> of the memory, they care about the origination source/allocator and the references/dominators to it.</p>

<h2 id="when-does-this-make-sense">When does this make sense?</h2>

<p>Using memory snapshots makes sense in two cases - 1), when the client encrypts the network response, and you don’t have the time or energy to reverse the bundle to understand how to decrypt it programatically (like in <a href="'/posts/decrypting-blind?ref=wsenr'">Blind’s case, which I did here</a>) and 2) when the client requests are machine readable, but there are multiple requests that get stitched together in memory to create the desired objects.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/encrypted-memory-50-046c30a31.webp 50w, https://blog.jonlu.ca/images/generated/encrypted-memory-100-046c30a31.webp 100w, https://blog.jonlu.ca/images/generated/encrypted-memory-200-046c30a31.webp 200w, https://blog.jonlu.ca/images/generated/encrypted-memory-400-046c30a31.webp 400w, https://blog.jonlu.ca/images/generated/encrypted-memory-800-046c30a31.webp 800w, https://blog.jonlu.ca/images/generated/encrypted-memory-1200-046c30a31.webp 1200w, https://blog.jonlu.ca/images/generated/encrypted-memory-1600-046c30a31.webp 1600w, https://blog.jonlu.ca/images/generated/encrypted-memory-2048-046c30a31.webp 2048w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/encrypted-memory-50-4279b5d2d.png 50w, https://blog.jonlu.ca/images/generated/encrypted-memory-100-4279b5d2d.png 100w, https://blog.jonlu.ca/images/generated/encrypted-memory-200-4279b5d2d.png 200w, https://blog.jonlu.ca/images/generated/encrypted-memory-400-4279b5d2d.png 400w, https://blog.jonlu.ca/images/generated/encrypted-memory-800-4279b5d2d.png 800w, https://blog.jonlu.ca/images/generated/encrypted-memory-1200-4279b5d2d.png 1200w, https://blog.jonlu.ca/images/generated/encrypted-memory-1600-4279b5d2d.png 1600w, https://blog.jonlu.ca/images/generated/encrypted-memory-2048-4279b5d2d.png 2048w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/encrypted-memory-800-4279b5d2d.png" alt="An encrypted memory response" width="2048" height="1206" /></picture>

<h2 id="reversing-the-spec">Reversing the spec</h2>

<p>Saving a memory snapshot generates a <code class="language-plaintext highlighter-rouge">heapsnapshot</code> file - this is a JSON file with a given format, that is not document anywhere as far as I can tell. These snapshots are the same in Node and in Chrome, and all documentation for both just tell you to upload it to the memory tab in a new devtools instance to view it.</p>

<p>Luckily the code for devtools is public, <a href="https://github.com/jonluca/javascript-heap-inspector">so I managed to reverse it and turn it into a CLI tool</a> - this accepts heapsnapshots as inputs, and can parse them into the graph.</p>

<h2 id="extracting-json">Extracting JSON</h2>

<p>At this point, you can just go through and try and parse every string in memory as JSON, and see what sticks around. This will be an easy way to find in memory JSON strings - however, these aren’t as common as you might think. The v8 compiler is pretty efficient in knowing what to keep in memory and what to discard, so it’ll typically just keep the parsed objects.</p>

<h2 id="reconstructing-objects">Reconstructing objects</h2>

<p>You can reconstruct objects in memory by traversing through the graph and stitching the nodes together - everything ends up being a primitive, and graph nodes can just contain properties, so they’re proxies for objects.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/blind-memory-obj-50-d259a1dac.webp 50w, https://blog.jonlu.ca/images/generated/blind-memory-obj-100-d259a1dac.webp 100w, https://blog.jonlu.ca/images/generated/blind-memory-obj-200-d259a1dac.webp 200w, https://blog.jonlu.ca/images/generated/blind-memory-obj-400-d259a1dac.webp 400w, https://blog.jonlu.ca/images/generated/blind-memory-obj-800-d259a1dac.webp 800w, https://blog.jonlu.ca/images/generated/blind-memory-obj-1200-d259a1dac.webp 1200w, https://blog.jonlu.ca/images/generated/blind-memory-obj-1600-d259a1dac.webp 1600w, https://blog.jonlu.ca/images/generated/blind-memory-obj-2400-d259a1dac.webp 2400w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/blind-memory-obj-50-58eb9dedd.png 50w, https://blog.jonlu.ca/images/generated/blind-memory-obj-100-58eb9dedd.png 100w, https://blog.jonlu.ca/images/generated/blind-memory-obj-200-58eb9dedd.png 200w, https://blog.jonlu.ca/images/generated/blind-memory-obj-400-58eb9dedd.png 400w, https://blog.jonlu.ca/images/generated/blind-memory-obj-800-58eb9dedd.png 800w, https://blog.jonlu.ca/images/generated/blind-memory-obj-1200-58eb9dedd.png 1200w, https://blog.jonlu.ca/images/generated/blind-memory-obj-1600-58eb9dedd.png 1600w, https://blog.jonlu.ca/images/generated/blind-memory-obj-2400-58eb9dedd.png 2400w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/blind-memory-obj-800-58eb9dedd.png" alt="Blind's object containing interesting information" width="3008" height="3206" /></picture>

<p>It’s useful to sort by retained size (which the code in the repo has a function for), and look at the heaviest objects - these are typically going to be your most interesting allocations, especailly if you perform an allocation recording.</p>

<h2 id="future-plans">Future plans</h2>

<p><a href="https://github.com/jonluca/javascript-heap-inspector">The code for the project can be found here, which will parse any chrome or node heapsnapshot.</a>. In part 2 I’m going to add the ability to fully reconstruct objects from this graph.</p>]]></content><author><name>{&quot;twitter&quot;=&gt;&quot;JonLuca&quot;}</name></author><summary type="html"><![CDATA[Most web reverse engineering focuses on two attack surfaces - either DOM scraping through something like puppeteer or beatifulsoup, or on MITM attacks to reverse network calls.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.jonlu.ca/images/blind-memory-obj.png" /><media:content medium="image" url="https://blog.jonlu.ca/images/blind-memory-obj.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">JavaScript gotchas</title><link href="https://blog.jonlu.ca/posts/javascript-facts" rel="alternate" type="text/html" title="JavaScript gotchas" /><published>2021-07-18T09:18:39-04:00</published><updated>2021-07-18T09:18:39-04:00</updated><id>https://blog.jonlu.ca/posts/javascript-facts</id><content type="html" xml:base="https://blog.jonlu.ca/posts/javascript-facts"><![CDATA[<p>Here are some interesting JavaScript facts that I’ve encountered over the last few years.</p>

<h1 id="functionlength">Function.length</h1>

<p>Call <code class="language-plaintext highlighter-rouge">Function.length</code> will return the number of arguments a function is expecting.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/javascript-funclength-50-4965e4fea.webp 50w, https://blog.jonlu.ca/images/generated/javascript-funclength-100-4965e4fea.webp 100w, https://blog.jonlu.ca/images/generated/javascript-funclength-200-4965e4fea.webp 200w, https://blog.jonlu.ca/images/generated/javascript-funclength-400-4965e4fea.webp 400w, https://blog.jonlu.ca/images/generated/javascript-funclength-800-4965e4fea.webp 800w, https://blog.jonlu.ca/images/generated/javascript-funclength-1200-4965e4fea.webp 1200w, https://blog.jonlu.ca/images/generated/javascript-funclength-1228-4965e4fea.webp 1228w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/javascript-funclength-50-a162bbdd6.png 50w, https://blog.jonlu.ca/images/generated/javascript-funclength-100-a162bbdd6.png 100w, https://blog.jonlu.ca/images/generated/javascript-funclength-200-a162bbdd6.png 200w, https://blog.jonlu.ca/images/generated/javascript-funclength-400-a162bbdd6.png 400w, https://blog.jonlu.ca/images/generated/javascript-funclength-800-a162bbdd6.png 800w, https://blog.jonlu.ca/images/generated/javascript-funclength-1200-a162bbdd6.png 1200w, https://blog.jonlu.ca/images/generated/javascript-funclength-1228-a162bbdd6.png 1228w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/javascript-funclength-800-a162bbdd6.png" alt="Javascript function length" width="1228" height="480" /></picture>

<p>The spread operator will not be included in the count.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/javascript-spreadlength-50-b70e3b3fd.webp 50w, https://blog.jonlu.ca/images/generated/javascript-spreadlength-100-b70e3b3fd.webp 100w, https://blog.jonlu.ca/images/generated/javascript-spreadlength-200-b70e3b3fd.webp 200w, https://blog.jonlu.ca/images/generated/javascript-spreadlength-400-b70e3b3fd.webp 400w, https://blog.jonlu.ca/images/generated/javascript-spreadlength-800-b70e3b3fd.webp 800w, https://blog.jonlu.ca/images/generated/javascript-spreadlength-1200-b70e3b3fd.webp 1200w, https://blog.jonlu.ca/images/generated/javascript-spreadlength-1294-b70e3b3fd.webp 1294w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/javascript-spreadlength-50-584aa1b95.png 50w, https://blog.jonlu.ca/images/generated/javascript-spreadlength-100-584aa1b95.png 100w, https://blog.jonlu.ca/images/generated/javascript-spreadlength-200-584aa1b95.png 200w, https://blog.jonlu.ca/images/generated/javascript-spreadlength-400-584aa1b95.png 400w, https://blog.jonlu.ca/images/generated/javascript-spreadlength-800-584aa1b95.png 800w, https://blog.jonlu.ca/images/generated/javascript-spreadlength-1200-584aa1b95.png 1200w, https://blog.jonlu.ca/images/generated/javascript-spreadlength-1294-584aa1b95.png 1294w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/javascript-spreadlength-800-584aa1b95.png" alt="Javascript function length" width="1294" height="486" /></picture>

<h1 id="arraymapfunc">Array.map(func)</h1>

<p>when you call <code class="language-plaintext highlighter-rouge">Array.map(func)</code>, the mapped function gets called with 3 arguments, not just the value.</p>

<p>So for:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="dl">"</span><span class="s2">10</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">11</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">12</span><span class="dl">"</span><span class="p">].</span><span class="nf">map</span><span class="p">(</span><span class="nb">parseInt</span><span class="p">);</span>
</code></pre></div></div>

<p>You’d <em>expect</em> to get</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">11</span><span class="p">,</span> <span class="mi">12</span><span class="p">];</span>
</code></pre></div></div>

<p>but in reality get</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="kc">NaN</span><span class="p">,</span> <span class="mi">1</span><span class="p">];</span>
</code></pre></div></div>

<p>This is because <code class="language-plaintext highlighter-rouge">.map(parseInt)</code> calls the function with three arguments - <code class="language-plaintext highlighter-rouge">(currentValue, index, array)</code>. This is normally not an issue, but becomes an issue when the mapped function takes additional arguments that do not correspond to the ones being passed in.</p>

<p><code class="language-plaintext highlighter-rouge">parseInt</code> takes in two arguments - <code class="language-plaintext highlighter-rouge">value, [, radix]</code>, and thus tries to parse <code class="language-plaintext highlighter-rouge">11</code> with radix <code class="language-plaintext highlighter-rouge">1</code>, which is <code class="language-plaintext highlighter-rouge">NaN</code></p>

<h1 id="values-are-truth-y-by-default">Values are truth-y by default</h1>

<p>The only falsey values are:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="nx">n</span><span class="p">,</span> <span class="dl">""</span><span class="p">,</span> <span class="dl">""</span><span class="p">,</span> <span class="kc">null</span><span class="p">,</span> <span class="kc">undefined</span><span class="p">,</span> <span class="kc">NaN</span><span class="p">,</span> <span class="kc">false</span><span class="p">];</span>
</code></pre></div></div>

<p><em>Everything</em> else is truthy - including <code class="language-plaintext highlighter-rouge">[]</code>, an empty Set(), and an empty object.</p>

<h1 id="null-comparisons-to-0">Null comparisons to 0</h1>

<p>I ran into a nasty bug once where a value I thought was guaranteed to be a number was actually explicitly set to null. I was doing a comparison with <code class="language-plaintext highlighter-rouge">0</code> and ran into this weird behavior:</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/javascript-null-50-c3e2f91bf.webp 50w, https://blog.jonlu.ca/images/generated/javascript-null-100-c3e2f91bf.webp 100w, https://blog.jonlu.ca/images/generated/javascript-null-200-c3e2f91bf.webp 200w, https://blog.jonlu.ca/images/generated/javascript-null-400-c3e2f91bf.webp 400w, https://blog.jonlu.ca/images/generated/javascript-null-486-c3e2f91bf.webp 486w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/javascript-null-50-0260a95f4.png 50w, https://blog.jonlu.ca/images/generated/javascript-null-100-0260a95f4.png 100w, https://blog.jonlu.ca/images/generated/javascript-null-200-0260a95f4.png 200w, https://blog.jonlu.ca/images/generated/javascript-null-400-0260a95f4.png 400w, https://blog.jonlu.ca/images/generated/javascript-null-486-0260a95f4.png 486w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/javascript-null-486-0260a95f4.png" alt="Javascript null comparison" width="486" height="392" /></picture>

<h1 id="arraysort-sorts-by-string-sequence-code">Array.sort sorts by string sequence code</h1>

<p>Call <code class="language-plaintext highlighter-rouge">.sort()</code> on an array of numbers will not sort them numerically. Which is perplexing</p>

<p>Null is not equal to zero, and is not greater than zero, but is greater than or equal to zero.</p>

<h1 id="stringreplace-only-replaces-the-first-instance-of-a-match">String.replace() only replaces the first instance of a match</h1>

<p>It’s almost a rite of passage for javascript developers to be bewildered that <code class="language-plaintext highlighter-rouge">String.replace</code> only replaces the first instance of a match in a string.</p>

<p>Thankfully in ES2021 we now have <code class="language-plaintext highlighter-rouge">String.replaceAll</code>, which behaves as you’d expect.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/javascript-replace-50-d66690e8e.webp 50w, https://blog.jonlu.ca/images/generated/javascript-replace-100-d66690e8e.webp 100w, https://blog.jonlu.ca/images/generated/javascript-replace-200-d66690e8e.webp 200w, https://blog.jonlu.ca/images/generated/javascript-replace-400-d66690e8e.webp 400w, https://blog.jonlu.ca/images/generated/javascript-replace-718-d66690e8e.webp 718w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/javascript-replace-50-212c5700b.png 50w, https://blog.jonlu.ca/images/generated/javascript-replace-100-212c5700b.png 100w, https://blog.jonlu.ca/images/generated/javascript-replace-200-212c5700b.png 200w, https://blog.jonlu.ca/images/generated/javascript-replace-400-212c5700b.png 400w, https://blog.jonlu.ca/images/generated/javascript-replace-718-212c5700b.png 718w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/javascript-replace-718-212c5700b.png" alt="Javascript replace" width="718" height="400" /></picture>

<h1 id="wtfjs">WTFJS</h1>

<p>Someone also pointed out <a href="https://wtfjs.com/">WTFJS.com</a> which is a site that has a lot more javascript oddities and examples.</p>]]></content><author><name>{&quot;twitter&quot;=&gt;&quot;JonLuca&quot;}</name></author><summary type="html"><![CDATA[Here are some interesting JavaScript facts that I’ve encountered over the last few years.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.jonlu.ca/images/javascript-funclength.png" /><media:content medium="image" url="https://blog.jonlu.ca/images/javascript-funclength.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Writing fast async HTTP requests in Python</title><link href="https://blog.jonlu.ca/posts/async-python-http" rel="alternate" type="text/html" title="Writing fast async HTTP requests in Python" /><published>2021-06-14T23:02:29-04:00</published><updated>2021-06-14T23:02:29-04:00</updated><id>https://blog.jonlu.ca/posts/async-python-http</id><content type="html" xml:base="https://blog.jonlu.ca/posts/async-python-http"><![CDATA[<p>I do a lot of web scraping in my spare time, and have been chasing down different formats and code snippets to make a large amount of network requests locally, with controls for rate limiting and error handling.</p>

<p>I’ve gone through a few generations - I’ll use this post to catalogue where I started and what I’m doing now. If you want to skip the post and just see the final code, <a href="https://gist.github.com/jonluca/14fe99be6204f34cbd61c950b0faf3b1">it can be found here.</a></p>

<h2 id="gen-1">Gen 1</h2>

<p>Generation one was trusty old <code class="language-plaintext highlighter-rouge">requests</code>. Need to make 10 requests? Wrap it in a for loop and make them iteratively.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">requests</span>

<span class="n">results</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span>
    <span class="n">resp</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">'</span><span class="s">https://jsonplaceholder.typicode.com/todos/1</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">results</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">resp</span><span class="p">.</span><span class="nf">json</span><span class="p">()</span>
</code></pre></div></div>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/python-gen1-50-a2035196e.webp 50w, https://blog.jonlu.ca/images/generated/python-gen1-100-a2035196e.webp 100w, https://blog.jonlu.ca/images/generated/python-gen1-200-a2035196e.webp 200w, https://blog.jonlu.ca/images/generated/python-gen1-400-a2035196e.webp 400w, https://blog.jonlu.ca/images/generated/python-gen1-800-a2035196e.webp 800w, https://blog.jonlu.ca/images/generated/python-gen1-1200-a2035196e.webp 1200w, https://blog.jonlu.ca/images/generated/python-gen1-1600-a2035196e.webp 1600w, https://blog.jonlu.ca/images/generated/python-gen1-2252-a2035196e.webp 2252w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/python-gen1-50-1c7d6a956.png 50w, https://blog.jonlu.ca/images/generated/python-gen1-100-1c7d6a956.png 100w, https://blog.jonlu.ca/images/generated/python-gen1-200-1c7d6a956.png 200w, https://blog.jonlu.ca/images/generated/python-gen1-400-1c7d6a956.png 400w, https://blog.jonlu.ca/images/generated/python-gen1-800-1c7d6a956.png 800w, https://blog.jonlu.ca/images/generated/python-gen1-1200-1c7d6a956.png 1200w, https://blog.jonlu.ca/images/generated/python-gen1-1600-1c7d6a956.png 1600w, https://blog.jonlu.ca/images/generated/python-gen1-2252-1c7d6a956.png 2252w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/python-gen1-800-1c7d6a956.png" alt="Python Gen 1" width="2252" height="1530" /></picture>

<p>This isn’t bad - 40 requests in 2.8s, or 1 req/70ms. Not an issue at all. This is fine when you need to bruteforce a 3 digit passcode - you can get 1000 requests done in 70s. Not great, but fast enough, and no need for external libraries or any research.</p>

<p>As soon as you get to something in the 4 character range, though, this becomes unwieldy.</p>

<h2 id="gen-2">Gen 2</h2>

<p>The next step here was to find ways to make these requests using Threads. Spin off a native thread for each request, and let them run behind the scenes.</p>

<p>Set up a queue and pool to pull URLs from, and we’re good. The queue and Worker threads are defined pretty simply below:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">queue</span> <span class="kn">import</span> <span class="n">Queue</span>

<span class="kn">from</span> <span class="n">threading</span> <span class="kn">import</span> <span class="n">Thread</span>


<span class="k">class</span> <span class="nc">Worker</span><span class="p">(</span><span class="n">Thread</span><span class="p">):</span>
  <span class="sh">"""</span><span class="s"> Thread executing tasks from a given tasks queue </span><span class="sh">"""</span>

  <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">tasks</span><span class="p">):</span>
    <span class="n">Thread</span><span class="p">.</span><span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">)</span>
    <span class="n">self</span><span class="p">.</span><span class="n">tasks</span> <span class="o">=</span> <span class="n">tasks</span>
    <span class="n">self</span><span class="p">.</span><span class="n">daemon</span> <span class="o">=</span> <span class="bp">True</span>
    <span class="n">self</span><span class="p">.</span><span class="nf">start</span><span class="p">()</span>

  <span class="k">def</span> <span class="nf">run</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
      <span class="n">func</span><span class="p">,</span> <span class="n">args</span><span class="p">,</span> <span class="n">kargs</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="n">tasks</span><span class="p">.</span><span class="nf">get</span><span class="p">()</span>
      <span class="k">try</span><span class="p">:</span>
        <span class="nf">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kargs</span><span class="p">)</span>
      <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
        <span class="c1"># An exception happened in this thread
</span>        <span class="nf">print</span><span class="p">(</span><span class="n">e</span><span class="p">)</span>
      <span class="k">finally</span><span class="p">:</span>
        <span class="c1"># Mark this task as done, whether an exception happened or not
</span>        <span class="n">self</span><span class="p">.</span><span class="n">tasks</span><span class="p">.</span><span class="nf">task_done</span><span class="p">()</span>


<span class="k">class</span> <span class="nc">ThreadPool</span><span class="p">:</span>
  <span class="sh">"""</span><span class="s"> Pool of threads consuming tasks from a queue </span><span class="sh">"""</span>

  <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">num_threads</span><span class="p">):</span>
    <span class="n">self</span><span class="p">.</span><span class="n">tasks</span> <span class="o">=</span> <span class="nc">Queue</span><span class="p">(</span><span class="n">num_threads</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">num_threads</span><span class="p">):</span>
      <span class="nc">Worker</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">tasks</span><span class="p">)</span>

  <span class="k">def</span> <span class="nf">add_task</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">func</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kargs</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s"> Add a task to the queue </span><span class="sh">"""</span>
    <span class="n">self</span><span class="p">.</span><span class="n">tasks</span><span class="p">.</span><span class="nf">put</span><span class="p">((</span><span class="n">func</span><span class="p">,</span> <span class="n">args</span><span class="p">,</span> <span class="n">kargs</span><span class="p">))</span>

  <span class="k">def</span> <span class="nf">map</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">func</span><span class="p">,</span> <span class="n">args_list</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s"> Add a list of tasks to the queue </span><span class="sh">"""</span>
    <span class="k">for</span> <span class="n">args</span> <span class="ow">in</span> <span class="n">args_list</span><span class="p">:</span>
      <span class="n">self</span><span class="p">.</span><span class="nf">add_task</span><span class="p">(</span><span class="n">func</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span>

  <span class="k">def</span> <span class="nf">wait_completion</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s"> Wait for completion of all the tasks in the queue </span><span class="sh">"""</span>
    <span class="n">self</span><span class="p">.</span><span class="n">tasks</span><span class="p">.</span><span class="nf">join</span><span class="p">()</span>
</code></pre></div></div>

<p>And the actual query code is fairly straightforward - just define a function that’ll populate a global variable using some unique ID, and have it make the request off in its own thread.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">urls</span> <span class="o">=</span> <span class="p">[</span><span class="sa">f</span><span class="sh">"</span><span class="s">https://jsonplaceholder.typicode.com/todos/</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="sh">"</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">40</span><span class="p">)]</span>
<span class="n">pool</span> <span class="o">=</span> <span class="nc">ThreadPool</span><span class="p">(</span><span class="mi">40</span><span class="p">)</span>

<span class="n">r</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="nf">session</span><span class="p">()</span>


<span class="k">def</span> <span class="nf">get</span><span class="p">(</span><span class="n">url</span><span class="p">):</span>
    <span class="n">resp</span> <span class="o">=</span> <span class="n">r</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
    <span class="n">results</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">resp</span><span class="p">.</span><span class="nf">json</span><span class="p">()</span>


<span class="n">pool</span><span class="p">.</span><span class="nf">map</span><span class="p">(</span><span class="n">get</span><span class="p">,</span> <span class="n">urls</span><span class="p">)</span>
<span class="n">pool</span><span class="p">.</span><span class="nf">wait_completion</span><span class="p">()</span>
</code></pre></div></div>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/python-gen2-50-5f6670d38.webp 50w, https://blog.jonlu.ca/images/generated/python-gen2-100-5f6670d38.webp 100w, https://blog.jonlu.ca/images/generated/python-gen2-200-5f6670d38.webp 200w, https://blog.jonlu.ca/images/generated/python-gen2-400-5f6670d38.webp 400w, https://blog.jonlu.ca/images/generated/python-gen2-800-5f6670d38.webp 800w, https://blog.jonlu.ca/images/generated/python-gen2-1200-5f6670d38.webp 1200w, https://blog.jonlu.ca/images/generated/python-gen2-1600-5f6670d38.webp 1600w, https://blog.jonlu.ca/images/generated/python-gen2-2222-5f6670d38.webp 2222w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/python-gen2-50-387b51a76.png 50w, https://blog.jonlu.ca/images/generated/python-gen2-100-387b51a76.png 100w, https://blog.jonlu.ca/images/generated/python-gen2-200-387b51a76.png 200w, https://blog.jonlu.ca/images/generated/python-gen2-400-387b51a76.png 400w, https://blog.jonlu.ca/images/generated/python-gen2-800-387b51a76.png 800w, https://blog.jonlu.ca/images/generated/python-gen2-1200-387b51a76.png 1200w, https://blog.jonlu.ca/images/generated/python-gen2-1600-387b51a76.png 1600w, https://blog.jonlu.ca/images/generated/python-gen2-2222-387b51a76.png 2222w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/python-gen2-800-387b51a76.png" alt="Python Gen 2" width="2222" height="2314" /></picture>

<p>Now we’re getting somewhere - 40 requests in 365ms, or 9.125ms per request. The same 1000 requests that would’ve taken 1m10s earlier now finishes in a little over nine seconds, or just about a 7x speed up. Not bad for a pretty naive implementation of threading. Can we get it even faster, though?</p>

<h2 id="gen-3">Gen 3</h2>

<p>A few years back I was introduced to the library <code class="language-plaintext highlighter-rouge">aiohttp</code> - which is <a href="https://docs.aiohttp.org/en/stable/">Asynchronous HTTP Client/Server for asyncio and Python.</a> This leverages the new (at the time) async capabilities of python and lose the actual overhead of <code class="language-plaintext highlighter-rouge">Thread</code>s.</p>

<p>There is a bit of monkey patching I’ve had to do to make it work with all the various request types - specifically around its conformance to the cookie spec and to get it to work properly in jupyter notebook, where I like to play around with a lot of network requests.</p>

<p>Once we start looking at a pool of thousands of requests, we also want to be able to throttle ourselves - our laptops can only open so many TCP connections at once, and fire off so many bits in a given second. I defined a helper called <code class="language-plaintext highlighter-rouge">gather_with_concurrency</code> - it’s a way of using <code class="language-plaintext highlighter-rouge">asyncio</code>s gather with a semaphore, to limit the amount of tasks we work on in any given second.</p>

<p>This slows us down a bit due to the semaphore overhead, but if you’re doing anything in the alphanumeric 4 digit space you’re looking at 36^4, or 1.7m requests, which should probably be pooled and throttled a bit, and the overhead from the semaphore is worth it.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">gather_with_concurrency</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="o">*</span><span class="n">tasks</span><span class="p">):</span>
    <span class="n">semaphore</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="nc">Semaphore</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>

    <span class="k">async</span> <span class="k">def</span> <span class="nf">sem_task</span><span class="p">(</span><span class="n">task</span><span class="p">):</span>
        <span class="k">async</span> <span class="k">with</span> <span class="n">semaphore</span><span class="p">:</span>
            <span class="k">return</span> <span class="k">await</span> <span class="n">task</span>

    <span class="k">return</span> <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="nf">gather</span><span class="p">(</span><span class="o">*</span><span class="p">(</span><span class="nf">sem_task</span><span class="p">(</span><span class="n">task</span><span class="p">)</span> <span class="k">for</span> <span class="n">task</span> <span class="ow">in</span> <span class="n">tasks</span><span class="p">))</span>
</code></pre></div></div>

<p>Next we set up the connector and the custom session</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">conn</span> <span class="o">=</span> <span class="n">aiohttp</span><span class="p">.</span><span class="nc">TCPConnector</span><span class="p">(</span><span class="n">limit</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">ttl_dns_cache</span><span class="o">=</span><span class="mi">300</span><span class="p">)</span>
<span class="n">session</span> <span class="o">=</span> <span class="n">aiohttp</span><span class="p">.</span><span class="nc">ClientSession</span><span class="p">(</span><span class="n">connector</span><span class="o">=</span><span class="n">conn</span><span class="p">)</span>
</code></pre></div></div>

<p>Here we limit each host to none - it won’t throttle itself internally, since we want to control that externally. We also bump up the dns cache TTL. For the purposes of this blog post this won’t matter, but by default it’s 10s, which saves us from the occasional DNS query.</p>

<p>Finally we define our actual <code class="language-plaintext highlighter-rouge">async</code> function, which should look pretty familiar if you’re already used to <code class="language-plaintext highlighter-rouge">requests</code>. We also disable SSL verification for that slight speed boost as well.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">get</span><span class="p">(</span><span class="n">url</span><span class="p">):</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">session</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">ssl</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> <span class="k">as</span> <span class="n">response</span><span class="p">:</span>
        <span class="n">obj</span> <span class="o">=</span> <span class="k">await</span> <span class="n">response</span><span class="p">.</span><span class="nf">read</span><span class="p">()</span>
        <span class="n">all_offers</span><span class="p">[</span><span class="n">url</span><span class="p">]</span> <span class="o">=</span> <span class="n">obj</span>
</code></pre></div></div>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/python-gen3-50-12157efc9.webp 50w, https://blog.jonlu.ca/images/generated/python-gen3-100-12157efc9.webp 100w, https://blog.jonlu.ca/images/generated/python-gen3-200-12157efc9.webp 200w, https://blog.jonlu.ca/images/generated/python-gen3-400-12157efc9.webp 400w, https://blog.jonlu.ca/images/generated/python-gen3-800-12157efc9.webp 800w, https://blog.jonlu.ca/images/generated/python-gen3-1200-12157efc9.webp 1200w, https://blog.jonlu.ca/images/generated/python-gen3-1600-12157efc9.webp 1600w, https://blog.jonlu.ca/images/generated/python-gen3-2256-12157efc9.webp 2256w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/python-gen3-50-f896bc132.png 50w, https://blog.jonlu.ca/images/generated/python-gen3-100-f896bc132.png 100w, https://blog.jonlu.ca/images/generated/python-gen3-200-f896bc132.png 200w, https://blog.jonlu.ca/images/generated/python-gen3-400-f896bc132.png 400w, https://blog.jonlu.ca/images/generated/python-gen3-800-f896bc132.png 800w, https://blog.jonlu.ca/images/generated/python-gen3-1200-f896bc132.png 1200w, https://blog.jonlu.ca/images/generated/python-gen3-1600-f896bc132.png 1600w, https://blog.jonlu.ca/images/generated/python-gen3-2256-f896bc132.png 2256w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/python-gen3-800-f896bc132.png" alt="Python Gen 3" width="2256" height="1232" /></picture>

<p>Now we’re really going! 40 requests in 100ms, or 4ms per requests. We can do about 250 requests per second - however, at this speed, the overhead of the initial function set up and jupyter notebook is actually a significant portion of the overall cost.</p>

<p>If we bump it up to 4000 requests, we see that we actually get closer to a 1.574s execution time, or about 56% of the time it took us to make 10 requests iteratively.</p>

<p>We can make one request every 0.393ms, or 393 microseconds. We can blast through the entire alphanumeric space for a 4 character permutation in about 660 million microseconds, or 11 minutes, all from my MacBook pro.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/python-4k-optimal-50-69a67d2a9.webp 50w, https://blog.jonlu.ca/images/generated/python-4k-optimal-100-69a67d2a9.webp 100w, https://blog.jonlu.ca/images/generated/python-4k-optimal-200-69a67d2a9.webp 200w, https://blog.jonlu.ca/images/generated/python-4k-optimal-400-69a67d2a9.webp 400w, https://blog.jonlu.ca/images/generated/python-4k-optimal-800-69a67d2a9.webp 800w, https://blog.jonlu.ca/images/generated/python-4k-optimal-1200-69a67d2a9.webp 1200w, https://blog.jonlu.ca/images/generated/python-4k-optimal-1600-69a67d2a9.webp 1600w, https://blog.jonlu.ca/images/generated/python-4k-optimal-1604-69a67d2a9.webp 1604w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/python-4k-optimal-50-f1d25b382.png 50w, https://blog.jonlu.ca/images/generated/python-4k-optimal-100-f1d25b382.png 100w, https://blog.jonlu.ca/images/generated/python-4k-optimal-200-f1d25b382.png 200w, https://blog.jonlu.ca/images/generated/python-4k-optimal-400-f1d25b382.png 400w, https://blog.jonlu.ca/images/generated/python-4k-optimal-800-f1d25b382.png 800w, https://blog.jonlu.ca/images/generated/python-4k-optimal-1200-f1d25b382.png 1200w, https://blog.jonlu.ca/images/generated/python-4k-optimal-1600-f1d25b382.png 1600w, https://blog.jonlu.ca/images/generated/python-4k-optimal-1604-f1d25b382.png 1604w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/python-4k-optimal-800-f1d25b382.png" alt="Python Gen 3 w/ 4000 requests" width="1604" height="1322" /></picture>

<p>Our Threading implementation also benefits from the increase in pool - giving it a 100 threads (the same as the semaphore in the asyncio version) gives us a time to completion of 8s for 4000 urls, or a little over 2ms per URL. Nowhere near our aiohttp implementation but not terrible either.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/python-gen3-4k-threads-50-aff3ae3f0.webp 50w, https://blog.jonlu.ca/images/generated/python-gen3-4k-threads-100-aff3ae3f0.webp 100w, https://blog.jonlu.ca/images/generated/python-gen3-4k-threads-200-aff3ae3f0.webp 200w, https://blog.jonlu.ca/images/generated/python-gen3-4k-threads-400-aff3ae3f0.webp 400w, https://blog.jonlu.ca/images/generated/python-gen3-4k-threads-800-aff3ae3f0.webp 800w, https://blog.jonlu.ca/images/generated/python-gen3-4k-threads-1200-aff3ae3f0.webp 1200w, https://blog.jonlu.ca/images/generated/python-gen3-4k-threads-1600-aff3ae3f0.webp 1600w, https://blog.jonlu.ca/images/generated/python-gen3-4k-threads-1687-aff3ae3f0.webp 1687w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/python-gen3-4k-threads-50-d4a0286a4.png 50w, https://blog.jonlu.ca/images/generated/python-gen3-4k-threads-100-d4a0286a4.png 100w, https://blog.jonlu.ca/images/generated/python-gen3-4k-threads-200-d4a0286a4.png 200w, https://blog.jonlu.ca/images/generated/python-gen3-4k-threads-400-d4a0286a4.png 400w, https://blog.jonlu.ca/images/generated/python-gen3-4k-threads-800-d4a0286a4.png 800w, https://blog.jonlu.ca/images/generated/python-gen3-4k-threads-1200-d4a0286a4.png 1200w, https://blog.jonlu.ca/images/generated/python-gen3-4k-threads-1600-d4a0286a4.png 1600w, https://blog.jonlu.ca/images/generated/python-gen3-4k-threads-1687-d4a0286a4.png 1687w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/python-gen3-4k-threads-800-d4a0286a4.png" alt="ThreadPool at 4000 requests w/ 100 threads" width="1687" height="682" /></picture>

<p>The same 36^4 requests using the ThreadPool would take 48 minutes, though.</p>

<p>We can clean up the code and optimize it slightly as well:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">sys</span>
<span class="kn">import</span> <span class="n">os</span>
<span class="kn">import</span> <span class="n">json</span>
<span class="kn">import</span> <span class="n">asyncio</span>
<span class="kn">import</span> <span class="n">aiohttp</span>


<span class="c1"># Initialize connection pool
</span><span class="n">conn</span> <span class="o">=</span> <span class="n">aiohttp</span><span class="p">.</span><span class="nc">TCPConnector</span><span class="p">(</span><span class="n">limit_per_host</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">limit</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">ttl_dns_cache</span><span class="o">=</span><span class="mi">300</span><span class="p">)</span>
<span class="n">PARALLEL_REQUESTS</span> <span class="o">=</span> <span class="mi">100</span>
<span class="n">results</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">urls</span> <span class="o">=</span> <span class="p">[</span><span class="sh">'</span><span class="s">https://jsonplaceholder.typicode.com/todos/1</span><span class="sh">'</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">4000</span><span class="p">)]</span> <span class="c1">#array of urls
</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">gather_with_concurrency</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
    <span class="n">semaphore</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="nc">Semaphore</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
    <span class="n">session</span> <span class="o">=</span> <span class="n">aiohttp</span><span class="p">.</span><span class="nc">ClientSession</span><span class="p">(</span><span class="n">connector</span><span class="o">=</span><span class="n">conn</span><span class="p">)</span>

    <span class="c1"># heres the logic for the generator
</span>    <span class="k">async</span> <span class="k">def</span> <span class="nf">get</span><span class="p">(</span><span class="n">url</span><span class="p">):</span>
        <span class="k">async</span> <span class="k">with</span> <span class="n">semaphore</span><span class="p">:</span>
            <span class="k">async</span> <span class="k">with</span> <span class="n">session</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">ssl</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> <span class="k">as</span> <span class="n">response</span><span class="p">:</span>
                <span class="n">obj</span> <span class="o">=</span> <span class="n">json</span><span class="p">.</span><span class="nf">loads</span><span class="p">(</span><span class="k">await</span> <span class="n">response</span><span class="p">.</span><span class="nf">read</span><span class="p">())</span>
                <span class="n">results</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span>
    <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="nf">gather</span><span class="p">(</span><span class="o">*</span><span class="p">(</span><span class="nf">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> <span class="k">for</span> <span class="n">url</span> <span class="ow">in</span> <span class="n">urls</span><span class="p">))</span>
    <span class="k">await</span> <span class="n">session</span><span class="p">.</span><span class="nf">close</span><span class="p">()</span>

<span class="n">loop</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="nf">get_event_loop</span><span class="p">()</span>
<span class="n">loop</span><span class="p">.</span><span class="nf">run_until_complete</span><span class="p">(</span><span class="nf">gather_with_concurrency</span><span class="p">(</span><span class="n">PARALLEL_REQUESTS</span><span class="p">))</span>
<span class="n">conn</span><span class="p">.</span><span class="nf">close</span><span class="p">()</span>

<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Completed </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">urls</span><span class="p">)</span><span class="si">}</span><span class="s"> requests with </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">results</span><span class="p">)</span><span class="si">}</span><span class="s"> results</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<h2 id="optimal-semaphore-size">Optimal semaphore size?</h2>

<p>If we bump our concurrent requests to 4k we see a drastic loss in performance.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/python-4k-semaphore-50-ccd3f58fa.webp 50w, https://blog.jonlu.ca/images/generated/python-4k-semaphore-100-ccd3f58fa.webp 100w, https://blog.jonlu.ca/images/generated/python-4k-semaphore-200-ccd3f58fa.webp 200w, https://blog.jonlu.ca/images/generated/python-4k-semaphore-400-ccd3f58fa.webp 400w, https://blog.jonlu.ca/images/generated/python-4k-semaphore-800-ccd3f58fa.webp 800w, https://blog.jonlu.ca/images/generated/python-4k-semaphore-1200-ccd3f58fa.webp 1200w, https://blog.jonlu.ca/images/generated/python-4k-semaphore-1600-ccd3f58fa.webp 1600w, https://blog.jonlu.ca/images/generated/python-4k-semaphore-2256-ccd3f58fa.webp 2256w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/python-4k-semaphore-50-2539f6349.png 50w, https://blog.jonlu.ca/images/generated/python-4k-semaphore-100-2539f6349.png 100w, https://blog.jonlu.ca/images/generated/python-4k-semaphore-200-2539f6349.png 200w, https://blog.jonlu.ca/images/generated/python-4k-semaphore-400-2539f6349.png 400w, https://blog.jonlu.ca/images/generated/python-4k-semaphore-800-2539f6349.png 800w, https://blog.jonlu.ca/images/generated/python-4k-semaphore-1200-2539f6349.png 1200w, https://blog.jonlu.ca/images/generated/python-4k-semaphore-1600-2539f6349.png 1600w, https://blog.jonlu.ca/images/generated/python-4k-semaphore-2256-2539f6349.png 2256w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/python-4k-semaphore-800-2539f6349.png" alt="Python Gen 3 semaphore w/ 4000 concurrent requests" width="2256" height="334" /></picture>

<p>This is nearly a 3x slow down due to resource contention issues locally.</p>

<p>The optimal number will depend on your host - beefier set ups will have higher concurrency limits, and if you’re running this on a remote host on something like digital ocean you can crank this number up quite a bit.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/python-semaphore-optimal-50-54b2e7ae8.webp 50w, https://blog.jonlu.ca/images/generated/python-semaphore-optimal-100-54b2e7ae8.webp 100w, https://blog.jonlu.ca/images/generated/python-semaphore-optimal-200-54b2e7ae8.webp 200w, https://blog.jonlu.ca/images/generated/python-semaphore-optimal-400-54b2e7ae8.webp 400w, https://blog.jonlu.ca/images/generated/python-semaphore-optimal-800-54b2e7ae8.webp 800w, https://blog.jonlu.ca/images/generated/python-semaphore-optimal-1200-54b2e7ae8.webp 1200w, https://blog.jonlu.ca/images/generated/python-semaphore-optimal-1600-54b2e7ae8.webp 1600w, https://blog.jonlu.ca/images/generated/python-semaphore-optimal-1660-54b2e7ae8.webp 1660w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/python-semaphore-optimal-50-6125f332b.png 50w, https://blog.jonlu.ca/images/generated/python-semaphore-optimal-100-6125f332b.png 100w, https://blog.jonlu.ca/images/generated/python-semaphore-optimal-200-6125f332b.png 200w, https://blog.jonlu.ca/images/generated/python-semaphore-optimal-400-6125f332b.png 400w, https://blog.jonlu.ca/images/generated/python-semaphore-optimal-800-6125f332b.png 800w, https://blog.jonlu.ca/images/generated/python-semaphore-optimal-1200-6125f332b.png 1200w, https://blog.jonlu.ca/images/generated/python-semaphore-optimal-1600-6125f332b.png 1600w, https://blog.jonlu.ca/images/generated/python-semaphore-optimal-1660-6125f332b.png 1660w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/python-semaphore-optimal-800-6125f332b.png" alt="Finding the optimal semaphore value" width="1660" height="648" /></picture>

<p>Interestingly enough the optimal semaphore value was right around 60.</p>

<p>I mostly do this locally at home, though, for my side projects - introducing parallelization and multiple hosts can get you numbers that are an order of magnituded better than this, but the purpose of this excercise is seeing what we can hit on a local machine with jupyter notebook.</p>

<h2 id="gen-4">Gen 4</h2>

<p>Know of a faster implementation of an HTTP library that is stateful and works locally? <a href="https://twitter.com/jonluca">Let me know!</a></p>

<h2 id="httpx">HTTPX</h2>

<p><strong>Update - 6/17/21</strong></p>

<p>A <a href="https://lobste.rs/s/fxpne4/writing_fast_async_http_requests_python#c_pr0lfr">poster on lobste.rs</a> said that I should try out httpx. HTTPX is a modern implementation of a python web client.</p>

<p>Unfortunately, in my testing, it was strictly slower than aiohttp. I used their async library with the same sempahore restricting the number of processes ran, but it was still slower.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/python-httpx-50-cf0ad06b2.webp 50w, https://blog.jonlu.ca/images/generated/python-httpx-100-cf0ad06b2.webp 100w, https://blog.jonlu.ca/images/generated/python-httpx-200-cf0ad06b2.webp 200w, https://blog.jonlu.ca/images/generated/python-httpx-400-cf0ad06b2.webp 400w, https://blog.jonlu.ca/images/generated/python-httpx-800-cf0ad06b2.webp 800w, https://blog.jonlu.ca/images/generated/python-httpx-1200-cf0ad06b2.webp 1200w, https://blog.jonlu.ca/images/generated/python-httpx-1600-cf0ad06b2.webp 1600w, https://blog.jonlu.ca/images/generated/python-httpx-2236-cf0ad06b2.webp 2236w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/python-httpx-50-79e95b27e.png 50w, https://blog.jonlu.ca/images/generated/python-httpx-100-79e95b27e.png 100w, https://blog.jonlu.ca/images/generated/python-httpx-200-79e95b27e.png 200w, https://blog.jonlu.ca/images/generated/python-httpx-400-79e95b27e.png 400w, https://blog.jonlu.ca/images/generated/python-httpx-800-79e95b27e.png 800w, https://blog.jonlu.ca/images/generated/python-httpx-1200-79e95b27e.png 1200w, https://blog.jonlu.ca/images/generated/python-httpx-1600-79e95b27e.png 1600w, https://blog.jonlu.ca/images/generated/python-httpx-2236-79e95b27e.png 2236w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/python-httpx-800-79e95b27e.png" alt="HTTPX speeds" width="2236" height="1100" /></picture>

<p>I also tried a native gather, with punting the concurrency down to the library - this did not help either.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/python-httpx-gather-50-75eb366ce.webp 50w, https://blog.jonlu.ca/images/generated/python-httpx-gather-100-75eb366ce.webp 100w, https://blog.jonlu.ca/images/generated/python-httpx-gather-200-75eb366ce.webp 200w, https://blog.jonlu.ca/images/generated/python-httpx-gather-400-75eb366ce.webp 400w, https://blog.jonlu.ca/images/generated/python-httpx-gather-800-75eb366ce.webp 800w, https://blog.jonlu.ca/images/generated/python-httpx-gather-1200-75eb366ce.webp 1200w, https://blog.jonlu.ca/images/generated/python-httpx-gather-1600-75eb366ce.webp 1600w, https://blog.jonlu.ca/images/generated/python-httpx-gather-2220-75eb366ce.webp 2220w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/python-httpx-gather-50-02efd2f32.png 50w, https://blog.jonlu.ca/images/generated/python-httpx-gather-100-02efd2f32.png 100w, https://blog.jonlu.ca/images/generated/python-httpx-gather-200-02efd2f32.png 200w, https://blog.jonlu.ca/images/generated/python-httpx-gather-400-02efd2f32.png 400w, https://blog.jonlu.ca/images/generated/python-httpx-gather-800-02efd2f32.png 800w, https://blog.jonlu.ca/images/generated/python-httpx-gather-1200-02efd2f32.png 1200w, https://blog.jonlu.ca/images/generated/python-httpx-gather-1600-02efd2f32.png 1600w, https://blog.jonlu.ca/images/generated/python-httpx-gather-2220-02efd2f32.png 2220w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/python-httpx-gather-800-02efd2f32.png" alt="HTTPX without a semaphore" width="2220" height="422" /></picture>

<h2 id="pycurl">PyCurl</h2>

<p>Someone <a href="https://lobste.rs/s/fxpne4/writing_fast_async_http_requests_python">on that same lobste.er’s thread</a> suggested <a href="http://pycurl.io/">pycurl</a>.</p>

<p>PyCurl is different in that it feels like a pretty raw wrapper to <code class="language-plaintext highlighter-rouge">curl</code>. Writing the code felt more like dispatching actions and opening sockets than dealing with a nice http library.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/pycurl-50-7a46152cc.webp 50w, https://blog.jonlu.ca/images/generated/pycurl-100-7a46152cc.webp 100w, https://blog.jonlu.ca/images/generated/pycurl-200-7a46152cc.webp 200w, https://blog.jonlu.ca/images/generated/pycurl-400-7a46152cc.webp 400w, https://blog.jonlu.ca/images/generated/pycurl-800-7a46152cc.webp 800w, https://blog.jonlu.ca/images/generated/pycurl-1200-7a46152cc.webp 1200w, https://blog.jonlu.ca/images/generated/pycurl-1600-7a46152cc.webp 1600w, https://blog.jonlu.ca/images/generated/pycurl-2040-7a46152cc.webp 2040w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/pycurl-50-2a512e395.png 50w, https://blog.jonlu.ca/images/generated/pycurl-100-2a512e395.png 100w, https://blog.jonlu.ca/images/generated/pycurl-200-2a512e395.png 200w, https://blog.jonlu.ca/images/generated/pycurl-400-2a512e395.png 400w, https://blog.jonlu.ca/images/generated/pycurl-800-2a512e395.png 800w, https://blog.jonlu.ca/images/generated/pycurl-1200-2a512e395.png 1200w, https://blog.jonlu.ca/images/generated/pycurl-1600-2a512e395.png 1600w, https://blog.jonlu.ca/images/generated/pycurl-2040-2a512e395.png 2040w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/pycurl-800-2a512e395.png" alt="PyCurl implementation" width="2040" height="2446" /></picture>

<p>The results were impressive, but the aiohttp library was <em>still faster</em>. This was my first time writing a pycurl implementation, though, based on <a href="https://www.programmersought.com/article/26191793705/">this template</a> - introducing native threading might be able to speed it up, but I still haven’t seen anything faster than the 393 microseconds approach.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/pycurl-results-50-316373ddf.webp 50w, https://blog.jonlu.ca/images/generated/pycurl-results-100-316373ddf.webp 100w, https://blog.jonlu.ca/images/generated/pycurl-results-200-316373ddf.webp 200w, https://blog.jonlu.ca/images/generated/pycurl-results-400-316373ddf.webp 400w, https://blog.jonlu.ca/images/generated/pycurl-results-800-316373ddf.webp 800w, https://blog.jonlu.ca/images/generated/pycurl-results-1200-316373ddf.webp 1200w, https://blog.jonlu.ca/images/generated/pycurl-results-1600-316373ddf.webp 1600w, https://blog.jonlu.ca/images/generated/pycurl-results-2234-316373ddf.webp 2234w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/pycurl-results-50-f769397b6.png 50w, https://blog.jonlu.ca/images/generated/pycurl-results-100-f769397b6.png 100w, https://blog.jonlu.ca/images/generated/pycurl-results-200-f769397b6.png 200w, https://blog.jonlu.ca/images/generated/pycurl-results-400-f769397b6.png 400w, https://blog.jonlu.ca/images/generated/pycurl-results-800-f769397b6.png 800w, https://blog.jonlu.ca/images/generated/pycurl-results-1200-f769397b6.png 1200w, https://blog.jonlu.ca/images/generated/pycurl-results-1600-f769397b6.png 1600w, https://blog.jonlu.ca/images/generated/pycurl-results-2234-f769397b6.png 2234w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/pycurl-results-800-f769397b6.png" alt="PyCurl results" width="2234" height="614" /></picture>

<p>If you know how to set up HTTPX or PyCurl in a way that’s faster let me know!</p>

<h2 id="uvloop">UVLoop</h2>

<p>Addendum: 8/27/21 - I received an email from Steve telling me about uvloop, a faster, drop in replacement for asyncio’s event loop.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/uvloop-email-50-2f2029978.webp 50w, https://blog.jonlu.ca/images/generated/uvloop-email-100-2f2029978.webp 100w, https://blog.jonlu.ca/images/generated/uvloop-email-200-2f2029978.webp 200w, https://blog.jonlu.ca/images/generated/uvloop-email-400-2f2029978.webp 400w, https://blog.jonlu.ca/images/generated/uvloop-email-800-2f2029978.webp 800w, https://blog.jonlu.ca/images/generated/uvloop-email-1200-2f2029978.webp 1200w, https://blog.jonlu.ca/images/generated/uvloop-email-1404-2f2029978.webp 1404w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/uvloop-email-50-a6b2381be.png 50w, https://blog.jonlu.ca/images/generated/uvloop-email-100-a6b2381be.png 100w, https://blog.jonlu.ca/images/generated/uvloop-email-200-a6b2381be.png 200w, https://blog.jonlu.ca/images/generated/uvloop-email-400-a6b2381be.png 400w, https://blog.jonlu.ca/images/generated/uvloop-email-800-a6b2381be.png 800w, https://blog.jonlu.ca/images/generated/uvloop-email-1200-a6b2381be.png 1200w, https://blog.jonlu.ca/images/generated/uvloop-email-1404-a6b2381be.png 1404w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/uvloop-email-800-a6b2381be.png" alt="uvloop email" width="1404" height="592" /></picture>

<p>It doesn’t seem to have impacted the performance all that much - it did have lower variance, though. Across multiple runs of a regular asyncio event loop, I would get as high as 3s for the same 4000 requests; with uvloop, it never broke 2.1s.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/uvloop-50-421b9ea0a.webp 50w, https://blog.jonlu.ca/images/generated/uvloop-100-421b9ea0a.webp 100w, https://blog.jonlu.ca/images/generated/uvloop-200-421b9ea0a.webp 200w, https://blog.jonlu.ca/images/generated/uvloop-400-421b9ea0a.webp 400w, https://blog.jonlu.ca/images/generated/uvloop-800-421b9ea0a.webp 800w, https://blog.jonlu.ca/images/generated/uvloop-1200-421b9ea0a.webp 1200w, https://blog.jonlu.ca/images/generated/uvloop-1600-421b9ea0a.webp 1600w, https://blog.jonlu.ca/images/generated/uvloop-1662-421b9ea0a.webp 1662w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/uvloop-50-ddc9040a2.png 50w, https://blog.jonlu.ca/images/generated/uvloop-100-ddc9040a2.png 100w, https://blog.jonlu.ca/images/generated/uvloop-200-ddc9040a2.png 200w, https://blog.jonlu.ca/images/generated/uvloop-400-ddc9040a2.png 400w, https://blog.jonlu.ca/images/generated/uvloop-800-ddc9040a2.png 800w, https://blog.jonlu.ca/images/generated/uvloop-1200-ddc9040a2.png 1200w, https://blog.jonlu.ca/images/generated/uvloop-1600-ddc9040a2.png 1600w, https://blog.jonlu.ca/images/generated/uvloop-1662-ddc9040a2.png 1662w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/uvloop-800-ddc9040a2.png" alt="speed results for uvloop" width="1662" height="798" /></picture>

<p>For a drop in replacement it seems pretty great - I don’t think it’ll help much at this stage, though, because most of the timing is due to the actual network call at this point. The threading and event loop implementation isn’t adding that much overhead, I’m guessing.</p>]]></content><author><name>{&quot;twitter&quot;=&gt;&quot;JonLuca&quot;}</name></author><summary type="html"><![CDATA[I do a lot of web scraping in my spare time, and have been chasing down different formats and code snippets to make a large amount of network requests locally, with controls for rate limiting and error handling.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.jonlu.ca/images/python-semaphore-optimal.png" /><media:content medium="image" url="https://blog.jonlu.ca/images/python-semaphore-optimal.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">A boilerplate for SSR’d Vite, React 17, and TypeScript 4.3</title><link href="https://blog.jonlu.ca/posts/vite" rel="alternate" type="text/html" title="A boilerplate for SSR’d Vite, React 17, and TypeScript 4.3" /><published>2021-05-16T12:49:29-04:00</published><updated>2021-05-16T12:49:29-04:00</updated><id>https://blog.jonlu.ca/posts/vite</id><content type="html" xml:base="https://blog.jonlu.ca/posts/vite"><![CDATA[<p>Introducing a barebones, slightly-opinionated boilerplate for working with a modern web stack written for 2021. This takes the additional jump of allowing you to run your own server, for applications that are more complex or need more flexibility than Netlify or Nextjs can provide.</p>

<p><strong><a href="https://github.com/jonluca/vite-typescript-ssr-react">Repo</a></strong></p>

<ul>
  <li><a href="https://reactjs.org/blog/2020/10/20/react-v17.html">React 17</a></li>
  <li><a href="https://devblogs.microsoft.com/typescript/announcing-typescript-4-3-rc/">Typescript 4.3</a></li>
  <li><a href="https://vitejs.dev/guide/ssr.html">Vite with Vite SSR</a></li>
  <li><a href="https://github.com/features/actions">GitHub Actions</a></li>
  <li><a href="https://tailwindui.com/">Tailwind CSS</a></li>
  <li><a href="https://prettier.io/">Prettier</a> &amp; <a href="https://eslint.org/">ESLint</a></li>
</ul>

<p>The SSR is accomplished through an Express server.</p>

<video class="centered-image" controls="" autoplay="" loop="">
    <source src="/images/vite.mp4" type="video/mp4" />
    Your browser does not support the video tag.
</video>
<p class="footnote">Vite HMR</p>

<h2 id="vite">Vite</h2>

<p>Vite HMR is baked into the server, so we get the blazingly fast code changes reflected on the client seen in the video above. The server rendering logic currently does not have any type of prop injection or initial state.</p>

<h2 id="opinions">Opinions</h2>

<p>This boilerplate is slightly opinionated, specifically around linting and code format. The TypeScript config is set up such as to allow for the greatest flexibility</p>

<p>When running in development mode, the application uses <code class="language-plaintext highlighter-rouge">ts-node</code> to natively run the Express server and load in the Vite HMR. In development, the rendering is still done client side. In production, all requests are server rendered. Read more on SSR on <a href="https://vitejs.dev/guide/ssr.html">the Vite docs</a>.</p>

<p>There is no selected front end state management library or back end database. There is a simple React Context for basic state.</p>

<h2 id="hosting">Hosting</h2>

<p>This assumes a traditional hosting infrastructure. The goal of the project is to provide as little lock-in as possible. You’re required to set up the deployment infrastructure, as well as containerizing the application.</p>

<p><a href="https://github.com/jonluca/vite-typescript-ssr-react">You can get this boilerplate here.</a></p>]]></content><author><name>{&quot;twitter&quot;=&gt;&quot;JonLuca&quot;}</name></author><summary type="html"><![CDATA[Introducing a barebones, slightly-opinionated boilerplate for working with a modern web stack written for 2021. This takes the additional jump of allowing you to run your own server, for applications that are more complex or need more flexibility than Netlify or Nextjs can provide.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.jonlu.ca/images/vite.png" /><media:content medium="image" url="https://blog.jonlu.ca/images/vite.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">How to redeem $2000 of HNS for being a FOSS developer</title><link href="https://blog.jonlu.ca/posts/hns-airdrop" rel="alternate" type="text/html" title="How to redeem $2000 of HNS for being a FOSS developer" /><published>2021-05-09T17:22:33-04:00</published><updated>2021-05-09T17:22:33-04:00</updated><id>https://blog.jonlu.ca/posts/hns-airdrop</id><content type="html" xml:base="https://blog.jonlu.ca/posts/hns-airdrop"><![CDATA[<p>If you were an active FOSS developer in 2019, you were gifted ~4,246 HNS, which as of May 2021 is worth 0.0359087 BTC, or $2k USD. <a href="https://handshake.org/claim/">Handshake wanted to reward FOSS developers by gifting handshake tokens (HNS).</a> You can redeem these pretty easily, and either use them to support open source projects, or redeem them as bitcoin and cash them out.</p>

<h2 id="prerequisites">Prerequisites</h2>

<p>Roughly 250,000 GitHub users qualified. You must have had at least <strong>15 followers</strong> during the week of <strong>2019-02-04</strong>. If you didn’t have a GitHub account at the time, or didn’t have 15 followers, or didn’t have a valid public SSH key uploaded, you were not eligible for the airdrop.</p>

<h2 id="first-steps">First steps</h2>

<p>Download hs-airdrop and register for an account on https://namebase.io. Namebase will create a free HNS wallet for you, which you can easily transfer to BTC.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/handshake-org/hs-airdrop.git &amp;&amp; cd hs-airdrop &amp;&amp; npm install
</code></pre></div></div>

<p>Next go to https://www.namebase.io/ and find your wallets address.</p>

<p>To redeem your HNS you’ll need:</p>

<ul>
  <li>The <code class="language-plaintext highlighter-rouge">hs-airdrop</code> binary</li>
  <li>The path to your private key</li>
  <li>Your wallet address</li>
</ul>

<p>Namebase also has similar instructions on redeeming these coins at https://www.namebase.io/airdrop.</p>

<h2 id="ssh">SSH</h2>

<p>If you want to use SSH, first confirm that the key you want to use is registered on GitHub, and was associated with your account on February 4th, 2019.</p>

<p>List all your keys with</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ls ~/.ssh/
</code></pre></div></div>

<p>Most likely it will be named <code class="language-plaintext highlighter-rouge">id_rsa</code>. Then run the hs-airdrop binary with the path to your private key.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./bin/hs-airdrop --bare ~/.ssh/id_rsa hs1...&lt;your address from namebase&gt; 0.1
</code></pre></div></div>

<p>It will ask you to decrypt your private key. Note that the private key is <em>never sent anywhere</em> - you can verify this yourself from the source code of hs-airdrop, here: https://github.com/handshake-org/hs-airdrop. This will then proceed to submit your confirmation and check if it is in the airdrop tree. If it is, you’ll get a base64 confirmation, which you’ll submit below.</p>

<h2 id="pgp">PGP</h2>

<p>PGP is a little more tricky - the documentation on the repo is a bit lacking.</p>

<p>On MacOS, using PGP Suite, you can do the following.</p>

<p>First, get all your available keys:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gpg --list-keys
/Users/jonlucadecaro/.gnupg/pubring.kbx
---------------------------------------
pub   rsa4096 2017-08-24 [SC] [expires: 2033-08-20]
      849E61D17094A964866AE510E6DC4811DD593AC7
uid           [ultimate] JonLuca De Caro &lt;jonlucadecaro96@gmail.com&gt;
sub   rsa4096 2017-08-24 [E] [expires: 2033-08-20]
</code></pre></div></div>

<p>The key ID is the value below the <code class="language-plaintext highlighter-rouge">pub</code> key (in the case of the above, <code class="language-plaintext highlighter-rouge">849E61D17094A964866AE510E6DC4811DD593AC7</code>). You now want to export the private key, replacing my public key ID with yours.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gpg --armor --export-secret-keys 849E61D17094A964866AE510E6DC4811DD593AC7 &gt; ~/Desktop/sec.asc
</code></pre></div></div>

<p>It will ask you for the keys password.</p>

<p>to check if your key is in the airdrop tree, you next run, again replacing my key with yours:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./bin/hs-airdrop --bare ~/Desktop/sec.asc 849E61D17094A964866AE510E6DC4811DD593AC7 hs1...&lt;your address&gt; 0.1
</code></pre></div></div>

<p>This will tell you if your key is in the airdrop tree, and if it is, it will print out a base64 string with your airdrop redemption.</p>

<h2 id="submitting-your-confirmation">Submitting your confirmation</h2>

<p>If your SSH or public key was in the airdrop tree, it will print a base64 string. Go to https://www.namebase.io/airdrop and paste it into the 5th box.</p>

<p>If it was valid, you’ll receive your HNS in roughly 16 hours.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/namebase-balance-50-a2df97989.webp 50w, https://blog.jonlu.ca/images/generated/namebase-balance-100-a2df97989.webp 100w, https://blog.jonlu.ca/images/generated/namebase-balance-200-a2df97989.webp 200w, https://blog.jonlu.ca/images/generated/namebase-balance-400-a2df97989.webp 400w, https://blog.jonlu.ca/images/generated/namebase-balance-800-a2df97989.webp 800w, https://blog.jonlu.ca/images/generated/namebase-balance-1200-a2df97989.webp 1200w, https://blog.jonlu.ca/images/generated/namebase-balance-1600-a2df97989.webp 1600w, https://blog.jonlu.ca/images/generated/namebase-balance-2062-a2df97989.webp 2062w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/namebase-balance-50-a69ae5085.png 50w, https://blog.jonlu.ca/images/generated/namebase-balance-100-a69ae5085.png 100w, https://blog.jonlu.ca/images/generated/namebase-balance-200-a69ae5085.png 200w, https://blog.jonlu.ca/images/generated/namebase-balance-400-a69ae5085.png 400w, https://blog.jonlu.ca/images/generated/namebase-balance-800-a69ae5085.png 800w, https://blog.jonlu.ca/images/generated/namebase-balance-1200-a69ae5085.png 1200w, https://blog.jonlu.ca/images/generated/namebase-balance-1600-a69ae5085.png 1600w, https://blog.jonlu.ca/images/generated/namebase-balance-2062-a69ae5085.png 2062w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/namebase-balance-800-a69ae5085.png" alt="Namebase Balance" width="2062" height="1168" /></picture>

<p>You can then proceed to sell it on namebase.</p>

<picture data-downloadable="true"><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/namebase-50-65f153c50.webp 50w, https://blog.jonlu.ca/images/generated/namebase-100-65f153c50.webp 100w, https://blog.jonlu.ca/images/generated/namebase-200-65f153c50.webp 200w, https://blog.jonlu.ca/images/generated/namebase-400-65f153c50.webp 400w, https://blog.jonlu.ca/images/generated/namebase-800-65f153c50.webp 800w, https://blog.jonlu.ca/images/generated/namebase-1200-65f153c50.webp 1200w, https://blog.jonlu.ca/images/generated/namebase-1450-65f153c50.webp 1450w" type="image/webp" /><source sizes="(max-width: 480px) calc(100vw - 16px), (max-width: 768) 80vw, 800px" srcset="https://blog.jonlu.ca/images/generated/namebase-50-622a90ab3.png 50w, https://blog.jonlu.ca/images/generated/namebase-100-622a90ab3.png 100w, https://blog.jonlu.ca/images/generated/namebase-200-622a90ab3.png 200w, https://blog.jonlu.ca/images/generated/namebase-400-622a90ab3.png 400w, https://blog.jonlu.ca/images/generated/namebase-800-622a90ab3.png 800w, https://blog.jonlu.ca/images/generated/namebase-1200-622a90ab3.png 1200w, https://blog.jonlu.ca/images/generated/namebase-1450-622a90ab3.png 1450w" type="image/png" /><img class="centered-image" src="https://blog.jonlu.ca/images/generated/namebase-800-622a90ab3.png" alt="Namebase Sell" width="1450" height="542" /></picture>

<p>Just paste in whatever BTC balance you want to send it to, and it’ll show up in a few minutes to hours. You can also use the coins to bid on names in namespace, or to support additional FOSS projects!</p>

<h3 id="multiple-keys">Multiple keys</h3>

<p>Note that if you have multiple SSH keys, it will only allow you to redeem for one. It will create valid nonces for each key, but once you redeem one in the chain, it invalidates the rest. If you submit the base64 for it it will look like it was accepted, but it will be rejected by the miners</p>

<h2 id="other-airdrops">Other Airdrops</h2>

<ul>
  <li>Keybase did an XLM drop worth about $1000 USD as of May 2021 - check your https://keybase.io account</li>
  <li>Uniswap did a 400 ETH drop worth about $16,000 USD as of May 2021 - https://airdrops.io/uniswap/</li>
</ul>]]></content><author><name>{&quot;twitter&quot;=&gt;&quot;JonLuca&quot;}</name></author><summary type="html"><![CDATA[If you were an active FOSS developer in 2019, you were gifted ~4,246 HNS, which as of May 2021 is worth 0.0359087 BTC, or $2k USD. Handshake wanted to reward FOSS developers by gifting handshake tokens (HNS). You can redeem these pretty easily, and either use them to support open source projects, or redeem them as bitcoin and cash them out.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://blog.jonlu.ca/images/namebase-balance.png" /><media:content medium="image" url="https://blog.jonlu.ca/images/namebase-balance.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>