<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://bogo.wtf/feed.xml" rel="self" type="application/atom+xml" /><link href="https://bogo.wtf/" rel="alternate" type="text/html" /><updated>2025-01-26T07:21:30+00:00</updated><id>https://bogo.wtf/feed.xml</id><title type="html">Hello, I’m Bogo</title><subtitle>Bogo is a research engineer working at OpenAI in San Francisco, working on realtime audio intelligence.</subtitle><author><name>Bogo Giertler</name></author><entry><title type="html">Hacking native ARM64 binaries to run on the iOS Simulator - the dynamic framework edition</title><link href="https://bogo.wtf/arm64-to-sim-dylibs.html" rel="alternate" type="text/html" title="Hacking native ARM64 binaries to run on the iOS Simulator - the dynamic framework edition" /><published>2021-02-17T10:00:00+00:00</published><updated>2021-02-17T10:00:00+00:00</updated><id>https://bogo.wtf/arm64-to-sim-dylibs</id><content type="html" xml:base="https://bogo.wtf/arm64-to-sim-dylibs.html"><![CDATA[<blockquote>
  <p><strong>NOTE:(bogo)</strong> This article focuses on <em>dynamic</em> libraries. This is a follow up to a <a href="/arm64-to-sim.html">separate article</a> explaining how to run ARM64 <em>static</em> libraries on the iOS Simulator.</p>
</blockquote>

<p>Since I published <a href="/arm64-to-sim.html">the original ARM64 hacking article</a>, a couple of folks reached out asking whether a similar technique can be applied to dynamic frameworks, such as <a href="https://pspdfkit.com">PSPDFKit</a> or <a href="https://developers.google.com/interactive-media-ads/docs/sdks/ios/client-side">Google’s Interactive Media Ads SDK</a>.</p>

<p>My original project did not account for the existence of dynamic frameworks, so hacking them seemed like a great learning opportunity. In the process, I found out that making a typical ARM64 dylib run in the iOS Simulator is actually pretty straightforward - and requires significantly less Mach-O acrobatics than a static library!</p>

<h2 id="-static-vs-dynamic">🧑‍🔬 Static vs Dynamic</h2>
<p>Just like the static libraries we were dealing with last week, dynamic libraries are used as a form of code sharing. The core difference lies in the linking process. Static libraries are linked in (by <code class="language-plaintext highlighter-rouge">ld</code>) at build time and become an integral part of the application’s binary. Dynamic libraries are linked in (by <code class="language-plaintext highlighter-rouge">dyld3</code>) at runtime and (theoretically) can be swapped out at any time - even after the application launches.</p>

<p>On Apple platforms, this is particularly useful for applications that provide multiple extensions. The developer, instead of shipping the same code and assets in each target, can bundle the shared content into a single framework and then link any binary in the application bundle against it.</p>

<p>In the case of a large application, the space savings can easily run into hundreds of megabytes. An online tool called Emerge provides <a href="https://www.emergetools.com/apps/dropbox">an interesting visualization for the Dropbox app</a> - with eight distinct <code class="language-plaintext highlighter-rouge">appex</code>es, the Dropbox app bundle remains reasonably sized thanks to the (rather chonky) <code class="language-plaintext highlighter-rouge">DropboxExtensions</code> dynamic framework:</p>

<p><img src="/assets/images/2021-02-16-arm64-to-sim-dylibs/DropboxExtensions-savings.png" alt="space savings at Dropbox when using a dylib" /></p>

<p>While dynamic libraries can seem like a silver bullet for modularizing a complex application, their loading incurs a significant cost at launch time and <a href="https://developer.apple.com/documentation/xcode/improving_your_app_s_performance/reducing_your_app_s_launch_time">Apple advises</a> developers to keep the total number of dylibs to a bare minimum of “a few”.</p>

<p>Under the hood, dynamic libraries are bona fide Mach-O fat <strong>binaries</strong>. Unlike the fat <strong>archives</strong> we worked with last week, our dynamic library begins with <code class="language-plaintext highlighter-rouge">CAFE BABE</code> - the magic number of a Mach-O fat binary - and not with <code class="language-plaintext highlighter-rouge">!&lt;arch&gt;</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ od -N 4 -t x1 Example.framework/Example
0000000    ca  fe  ba  be
</code></pre></div></div>

<p>All of this means we might be equipped to handle the dynamic libraries with a few simple modifications to our <code class="language-plaintext highlighter-rouge">arm64-to-sim</code> transmogrifier!</p>

<h2 id="-my-god-its-full-of-0s">✨ My God, it’s full of 0s!</h2>
<p>We start by <code class="language-plaintext highlighter-rouge">lipo</code>-ing the framework down to ARM64:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>lipo <span class="nt">-thin</span> arm64 Example.framework/Example <span class="nt">-output</span> Example.framework/Example.arm64
<span class="nv">$ </span><span class="nb">od</span> <span class="nt">-N</span> 4 <span class="nt">-t</span> x1 Example.framework/Example.arm64
0000000    cf  fa  ed  fe
</code></pre></div></div>
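<p>As a rough sketch, the two <code class="language-plaintext highlighter-rouge">od</code> checks above can be folded into one small shell helper that classifies a file by its first four bytes. The function name <code class="language-plaintext highlighter-rouge">macho_kind</code> is made up for illustration, and it only knows the three patterns mentioned in this article:</p>

```shell
# Classify a file by its first four bytes (hypothetical helper).
# Only the patterns discussed in the article are recognized.
macho_kind() {
  # Read 4 bytes as hex and strip all whitespace, e.g. "cafebabe".
  magic=$(od -An -N4 -tx1 "$1" | tr -d '[:space:]')
  case "$magic" in
    cafebabe) echo "fat binary" ;;          # universal, multi-slice Mach-O
    cffaedfe) echo "thin 64-bit Mach-O" ;;  # a single 64-bit slice, as produced by lipo -thin
    213c6172) echo "static archive" ;;      # first four bytes of the ar archive magic
    *)        echo "unknown" ;;
  esac
}

# Example (paths as used in the article):
#   macho_kind Example.framework/Example        prints "fat binary"
#   macho_kind Example.framework/Example.arm64  prints "thin 64-bit Mach-O"
```

<p>Stripping whitespace from the <code class="language-plaintext highlighter-rouge">od</code> output keeps the comparison independent of how a given <code class="language-plaintext highlighter-rouge">od</code> pads its columns.</p>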

<p>Let’s read this ARM64 slice with <code class="language-plaintext highlighter-rouge">otool -fahl</code>. We can immediately notice that, unlike in a typical static binary, <code class="language-plaintext highlighter-rouge">LC_SEGMENT_64</code>s seem to partition the entire file. We can confirm it by taking a look at the <code class="language-plaintext highlighter-rouge">filesize</code> and <code class="language-plaintext highlighter-rouge">fileoff</code> parameters, since they are present only in the <code class="language-plaintext highlighter-rouge">LC_SEGMENT_64</code>s:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">stat</span> <span class="nt">-f</span>%z Example/Example.arm64
6645432
<span class="nv">$ </span>otool <span class="nt">-l</span> Example/Example.arm64 | <span class="nb">grep</span> <span class="nt">-E</span> <span class="s2">"filesize | fileoff "</span>
  fileoff 0
 filesize 851968
  fileoff 851968
 filesize 196608
  fileoff 1048576
 filesize 4816896
  fileoff 5865472
 filesize 779960
</code></pre></div></div>

<p>If we add all the <code class="language-plaintext highlighter-rouge">filesize</code> fields, our suspicion is confirmed:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">stat</span> <span class="nt">-f</span>%z Example/Example.arm64
6645432
<span class="nv">$ </span>otool <span class="nt">-l</span> Example/Example.arm64 | <span class="nb">grep </span>filesize | <span class="nb">grep</span> <span class="nt">-Eo</span> <span class="s2">"(</span><span class="se">\d</span><span class="s2">+)"</span> | <span class="nb">paste</span> <span class="nt">-sd</span>+ - | bc
6645432
</code></pre></div></div>
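<p>Matching totals alone do not prove that the segments sit back to back, so here is a minimal sketch that also checks each <code class="language-plaintext highlighter-rouge">fileoff</code> against the end of the previous segment. It reads <code class="language-plaintext highlighter-rouge">otool -l</code> output on stdin; the name <code class="language-plaintext highlighter-rouge">segments_end</code> is hypothetical, and it assumes <code class="language-plaintext highlighter-rouge">fileoff</code> is always printed before its <code class="language-plaintext highlighter-rouge">filesize</code>, as in the listing above:</p>

```shell
# Reads `otool -l` output on stdin and checks that the segments tile the file:
# every segment must start exactly where the previous one ended.
# Prints the end offset of the last segment, which should equal the file size.
segments_end() {
  awk '
    BEGIN { expected = 0 }
    $1 == "fileoff"  { off = $2 + 0 }
    $1 == "filesize" {
      if (off != expected) gap = 1      # hole or overlap between segments
      expected = off + $2
    }
    END {
      if (gap) { print "segments are not contiguous"; exit 1 }
      print expected
    }
  '
}

# Example: otool -l Example/Example.arm64 | segments_end
```

<p>If the printed number equals the <code class="language-plaintext highlighter-rouge">stat</code> output, the segments cover the file from the first byte to the last with no gaps.</p>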

<p>This means that our approach of offsetting the <code class="language-plaintext highlighter-rouge">LC_SEGMENT_64</code> might not work at all - the load commands are likely accounted for in the first segment already. Let’s zoom out a bit and check which sections the first <code class="language-plaintext highlighter-rouge">LC_SEGMENT_64</code> points at:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>otool <span class="nt">-l</span> Example/Example.arm64
Load <span class="nb">command </span>0
      cmd LC_SEGMENT_64
  cmdsize 952
  segname __TEXT
   vmaddr 0x0000000000000000
   vmsize 0x00000000000d0000
  fileoff 0
 filesize 851968
  maxprot 0x00000005
 initprot 0x00000005
   nsects 11
    flags 0x0
Section
  sectname __text
   segname __TEXT
      addr 0x0000000000007210
      size 0x00000000000aa300
    offset 29200
     align 2^2 <span class="o">(</span>4<span class="o">)</span>
    reloff 0
    nreloc 0
     flags 0x80000400
 reserved1 0
 reserved2 0
...
</code></pre></div></div>

<p>Interesting! The first section of the first segment (<code class="language-plaintext highlighter-rouge">__text</code>) seems to be <em>very</em> far into the file, at 0x7210 (== 29,200) bytes. That seems odd - we know from last week that load commands are usually much shorter. Let’s check if our gut is right by summing up the length of all <code class="language-plaintext highlighter-rouge">load_command</code>s:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>otool <span class="nt">-l</span> Example/Example.arm64 | <span class="nb">grep</span> <span class="nt">-E</span> <span class="s2">"cmdsize "</span> | <span class="nb">grep</span> <span class="nt">-Eo</span> <span class="s2">"(</span><span class="se">\d</span><span class="s2">+)"</span> | <span class="nb">paste</span> <span class="nt">-sd</span>+ - | bc
4736
</code></pre></div></div>

<p>Wowza! This is indeed way shorter than the offset of <code class="language-plaintext highlighter-rouge">__text</code>. Let’s investigate this further and use <code class="language-plaintext highlighter-rouge">xxd</code> to look at what our dynamic library is hiding at 4,736 bytes…</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>xxd <span class="nt">-s</span> 4736 <span class="nt">-l</span> 256 Example/Example.arm64
00001288: 2600 0000 1000 0000 98f3 5900 800d 0000  &amp;.........Y.....
00001298: 2900 0000 1000 0000 1801 5a00 0000 0000  <span class="o">)</span>.........Z.....
000012a8: 1d00 0000 1000 0000 c066 6500 5012 0100  .........fe.P...
000012b8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000012c8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000012d8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000012e8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000012f8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001308: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001318: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001328: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001338: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001348: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001358: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001368: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001378: 0000 0000 0000 0000 0000 0000 0000 0000  ................
</code></pre></div></div>

<p>A massive amount of completely empty load command padding! This is excellent news. Since this pre-existing padding <strong>must</strong> be already accounted for in all the load commands, we can use it to store the “excess” bytes caused by replacing <code class="language-plaintext highlighter-rouge">LC_VERSION_MIN_IPHONEOS</code> with <code class="language-plaintext highlighter-rouge">LC_BUILD_VERSION</code>.</p>
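<p>To put a number on that padding, here is a back-of-the-envelope sketch using the figures from above: the 29,200-byte offset of <code class="language-plaintext highlighter-rouge">__text</code> and the 4,736 bytes of load commands. It assumes the 32-byte size of a 64-bit Mach-O header, a 16-byte <code class="language-plaintext highlighter-rouge">LC_VERSION_MIN_IPHONEOS</code>, and a 24-byte <code class="language-plaintext highlighter-rouge">LC_BUILD_VERSION</code> without tool entries; the helper name is made up:</p>

```shell
# Header padding = offset of the first section - Mach-O header - load commands.
# 32 is the size of a 64-bit Mach-O header (assumption stated in the lead-in).
header_padding() {
  text_offset=$1   # `offset` of __text, e.g. 29200
  cmds_size=$2     # sum of all `cmdsize` fields, e.g. 4736
  echo $(( text_offset - 32 - cmds_size ))
}

# Extra bytes needed when a 16-byte LC_VERSION_MIN_IPHONEOS is swapped
# for a 24-byte LC_BUILD_VERSION (one with no tool entries).
extra_bytes=$(( 24 - 16 ))

padding=$(header_padding 29200 4736)
if [ "$padding" -ge "$extra_bytes" ]; then
  echo "padding of $padding bytes fits the $extra_bytes extra bytes"
else
  echo "not enough padding"
fi
```

<p>For the example framework this leaves far more room than the substitution needs; a binary linked without header padding would fail the check instead.</p>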

<p>To that end, let’s edit our previous week’s sources to “chomp” off the 8 bytes after the load commands. Since the zeros are not a part of the load commands, our existing code reads them into the <code class="language-plaintext highlighter-rouge">programData</code> variable using the <code class="language-plaintext highlighter-rouge">readToEnd()</code> function on the <code class="language-plaintext highlighter-rouge">FileHandle</code> object. To effectively overwrite the 8 bytes we need to fit the new command in, we simply seek ahead before that final read:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// discard the empty 8 bytes that we will use for our longer load command</span>
<span class="k">let</span> <span class="nv">bytesToDiscard</span> <span class="o">=</span> <span class="nf">abs</span><span class="p">(</span><span class="kt">MemoryLayout</span><span class="o">&lt;</span><span class="n">build_version_command</span><span class="o">&gt;.</span><span class="n">stride</span> <span class="o">-</span> <span class="kt">MemoryLayout</span><span class="o">&lt;</span><span class="n">version_min_command</span><span class="o">&gt;.</span><span class="n">stride</span><span class="p">)</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">handle</span><span class="o">.</span><span class="nf">readData</span><span class="p">(</span><span class="nv">ofLength</span><span class="p">:</span> <span class="n">bytesToDiscard</span><span class="p">)</span>
</code></pre></div></div>

<p>Since we do not have to reconstruct any offsets in the binary, we also don’t need to handle any other load command updates. We can safely remove them from our <code class="language-plaintext highlighter-rouge">map()</code>, obviously except for the <code class="language-plaintext highlighter-rouge">build_version_command</code> substitution.</p>

<h2 id="-the-vtool-way">🪄 The <code class="language-plaintext highlighter-rouge">vtool</code> Way</h2>

<p>Turns out there is a second solution that is both significantly easier and does away with most of the shell-fu above - <code class="language-plaintext highlighter-rouge">vtool</code>!</p>

<p><code class="language-plaintext highlighter-rouge">vtool</code> was shipped by Apple as a part of the Xcode 11’s command line tools. While it appears to be <a href="https://developer.apple.com/forums/thread/659964">meant to help in notarizing pre-macOS 10.9 frameworks</a>, we can use its ability to edit load commands for our purposes as well.</p>

<p><code class="language-plaintext highlighter-rouge">vtool</code> does not require us to perform any <code class="language-plaintext highlighter-rouge">lipo</code>, as it operates directly on the specified platform slice. Let’s first use it to check what the relevant load commands in the file are:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>xcrun vtool <span class="nt">-arch</span> arm64 <span class="se">\</span>
              <span class="nt">-show</span> <span class="se">\</span>
              Example.framework/Example
Example.framework/Example <span class="o">(</span>architecture arm64<span class="o">)</span>:
Load <span class="nb">command </span>9
      cmd LC_VERSION_MIN_IPHONEOS
  cmdsize 16
  version 8.0
      sdk 14.0
Load <span class="nb">command </span>10
      cmd LC_SOURCE_VERSION
  cmdsize 16
  version 0.0
</code></pre></div></div>

<p>Great, we can see our old friend, <code class="language-plaintext highlighter-rouge">LC_VERSION_MIN_IPHONEOS</code>. Since we know that <code class="language-plaintext highlighter-rouge">LC_BUILD_VERSION</code> is 8 bytes longer, let’s see if the binary has enough padding space to accommodate it:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>xcrun vtool <span class="nt">-arch</span> arm64 <span class="se">\</span>
              <span class="nt">-show-space</span> <span class="se">\</span>
              Example.framework/Example
Example.framework/Example <span class="o">(</span>architecture arm64<span class="o">)</span>:
  Mach header size:     32
  Load <span class="nb">command </span>size:  4736
  Available space:   24432
  Total:             29200
</code></pre></div></div>

<p>Great! We seem to have 24,432 bytes of available space - all the 0s we saw using <code class="language-plaintext highlighter-rouge">xxd</code>. That’s plenty for our substitution.</p>
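<p>If you want to script this check before touching a binary, a minimal guard could look like the sketch below. It reads <code class="language-plaintext highlighter-rouge">vtool -show-space</code> output on stdin and assumes the <code class="language-plaintext highlighter-rouge">Available space:</code> label shown above; the function name <code class="language-plaintext highlighter-rouge">has_header_room</code> is hypothetical:</p>

```shell
# Reads `vtool -show-space` output on stdin. Succeeds only if the reported
# "Available space" is at least the number of bytes passed as the first argument.
has_header_room() {
  awk -v need="$1" '
    /Available space:/ { avail = $NF; found = 1 }   # last field is the byte count
    END { if (found && avail + 0 >= need + 0) exit 0; exit 1 }
  '
}

# Example:
#   xcrun vtool -arch arm64 -show-space Example.framework/Example | has_header_room 8
```

<p>The guard fails both when the space is too small and when the label is missing, so an unexpected output format does not pass silently.</p>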

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>xcrun vtool <span class="nt">-arch</span> arm64 <span class="se">\</span>
              <span class="nt">-set-build-version</span> 7 13.0 13.0 <span class="se">\</span>
              <span class="nt">-replace</span> <span class="se">\</span>
              <span class="nt">-output</span> Example.framework/Example.reworked <span class="se">\</span>
              Example.framework/Example
</code></pre></div></div>

<p>Let’s break this command up a little.</p>

<p>With <code class="language-plaintext highlighter-rouge">-set-build-version</code> and <code class="language-plaintext highlighter-rouge">-replace</code> parameters, we are asking <code class="language-plaintext highlighter-rouge">vtool</code> to set us up with a new <code class="language-plaintext highlighter-rouge">LC_BUILD_VERSION</code> <strong>and</strong> replace the previous <code class="language-plaintext highlighter-rouge">LC_VERSION_MIN_IPHONEOS</code> entry. Should we not specify <code class="language-plaintext highlighter-rouge">-replace</code>, we will actually end up with both load commands present in the Mach-O header and compile- and run-time hilarity will ensue.</p>

<p>The <code class="language-plaintext highlighter-rouge">build-version</code> is specified as a <code class="language-plaintext highlighter-rouge">&lt;platform&gt; &lt;minos&gt; &lt;sdk&gt;</code> tuple. Note that <code class="language-plaintext highlighter-rouge">vtool</code> does not take a string for the <code class="language-plaintext highlighter-rouge">platform</code> value - the seemingly random number <code class="language-plaintext highlighter-rouge">7</code> in our invocation actually represents the <a href="https://github.com/apple/darwin-xnu/blob/8f02f2a044b9bb1ad951987ef5bab20ec9486310/EXTERNAL_HEADERS/mach-o/loader.h#L1282"><code class="language-plaintext highlighter-rouge">IOSSIMULATOR</code> entry in XNU’s Mach-O loader</a>.</p>
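<p>For reference, here is a small lookup sketch for those platform numbers. The values are listed as I recall them from the <code class="language-plaintext highlighter-rouge">PLATFORM_*</code> constants in <code class="language-plaintext highlighter-rouge">loader.h</code>, so treat the table as an assumption and check it against the linked header; the function name is made up:</p>

```shell
# Map a Mach-O platform number to its name (hypothetical helper).
# Values are assumed to match the PLATFORM_* constants in mach-o/loader.h.
platform_name() {
  case "$1" in
    1)  echo MACOS ;;
    2)  echo IOS ;;
    3)  echo TVOS ;;
    4)  echo WATCHOS ;;
    5)  echo BRIDGEOS ;;
    6)  echo MACCATALYST ;;
    7)  echo IOSSIMULATOR ;;
    8)  echo TVOSSIMULATOR ;;
    9)  echo WATCHOSSIMULATOR ;;
    10) echo DRIVERKIT ;;
    *)  echo UNKNOWN; return 1 ;;
  esac
}

platform_name 7
```

<p>Under that assumption, the same <code class="language-plaintext highlighter-rouge">vtool</code> invocation with <code class="language-plaintext highlighter-rouge">8</code> or <code class="language-plaintext highlighter-rouge">9</code> would target the tvOS or watchOS simulators instead.</p>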

<p>Let’s wrap this up by confirming that <code class="language-plaintext highlighter-rouge">vtool</code> modified the binary correctly:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>xcrun vtool <span class="nt">-arch</span> arm64 <span class="se">\</span>
              <span class="nt">-show</span> <span class="se">\</span>
              Example.framework/Example.reworked
Example.framework/Example.reworked <span class="o">(</span>architecture arm64<span class="o">)</span>:
Load <span class="nb">command </span>9
      cmd LC_SOURCE_VERSION
  cmdsize 16
  version 0.0
Load <span class="nb">command </span>34
      cmd LC_BUILD_VERSION
  cmdsize 24
 platform IOSSIMULATOR
    minos 13.0
      sdk 13.0
   ntools 0
</code></pre></div></div>
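<p>The same confirmation can be scripted. The sketch below reads <code class="language-plaintext highlighter-rouge">vtool -show</code> output on stdin and assumes the field layout printed above; <code class="language-plaintext highlighter-rouge">is_simulator_slice</code> is a hypothetical name:</p>

```shell
# Reads `vtool -show` output on stdin. Succeeds only if the slice carries an
# LC_BUILD_VERSION for IOSSIMULATOR and no leftover LC_VERSION_MIN_IPHONEOS.
is_simulator_slice() {
  awk '
    $1 == "cmd" { cmd = $2 }                         # remember the current load command
    cmd == "LC_VERSION_MIN_IPHONEOS" { stale = 1 }   # old command was not replaced
    cmd == "LC_BUILD_VERSION" && $1 == "platform" && $2 == "IOSSIMULATOR" { sim = 1 }
    END { if (sim && !stale) exit 0; exit 1 }
  '
}

# Example:
#   xcrun vtool -arch arm64 -show Example.framework/Example.reworked | is_simulator_slice
```

<p>Rejecting a leftover <code class="language-plaintext highlighter-rouge">LC_VERSION_MIN_IPHONEOS</code> catches the case where <code class="language-plaintext highlighter-rouge">-replace</code> was forgotten and both load commands ended up in the header.</p>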

<h2 id="-keys-to-the-kingdom">🔑 Keys to the Kingdom</h2>

<p>After we run our modified transmogrifier, we will need to follow the familiar steps of <code class="language-plaintext highlighter-rouge">lipo</code>-ing the resulting ARM64 dylib with an x86 slice and assembling the library into an XCFramework. Should we use the <code class="language-plaintext highlighter-rouge">vtool</code> approach, we won’t even have to <code class="language-plaintext highlighter-rouge">lipo</code> the library back (although thinning it to just the relevant platforms is a reasonable move).</p>

<p>In either case, following a straightforward framework substitution in the original Xcode project, the build should succeed!</p>

<p>Unfortunately, it’s not time to celebrate just yet - with linking happening at runtime, our app will almost certainly crash immediately after the Simulator boots. Xcode’s debugger console should contain a cryptic message similar to the one below:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dyld: Library not loaded: @rpath/Example.framework/Example
  Referenced from: &lt;snip&gt;/Example.app/Example
  Reason: no suitable image found.  Did find:
	&lt;snip&gt;/Example.framework/Example: code signature in (&lt;snip&gt;/Example.framework/Example) not valid for use in process using Library Validation: Trying to load an unsigned library
</code></pre></div></div>

<p>It appears that M1 Macs have a stricter policy on dynamic library validation compared to the Intel ones - and they won’t load an unsigned ARM64 library into memory, even if we mark it in Xcode as <em>Embed &amp; Sign</em>. Luckily, we can fix that pretty easily:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ xcrun codesign --sign - Example.xcframework/ios-arm64_x86_64-simulator/Example.framework/Example
</code></pre></div></div>

<p>Let’s try to run the app again…</p>

<p><img src="/assets/images/2021-02-16-arm64-to-sim-dylibs/simulator-with-dylib.png" alt="iOS Simulator running with a hacked dylib" /></p>

<p>Voilà, a native dynamic framework running on the iOS Simulator! 🎉</p>

<h2 id="-learnings-and-dead-ends">🤓 Learnings and Dead Ends</h2>
<p>It took me a total of an hour to come up with this solution. Along the way I understood:</p>

<ul>
  <li>that header padding is a typical practice in dylibs - every third party dynamic framework I looked at had at least a few hundred bytes of ✨ 0s ✨ immediately following the load commands;</li>
  <li>that <code class="language-plaintext highlighter-rouge">vtool</code> is exactly the tool I needed for my ARM64 hacking - but it also requires binaries built with load command padding <code class="language-plaintext highlighter-rouge">¯\_(ツ)_/¯</code>;</li>
  <li>that <a href="https://lief.quarkslab.com/">LIEF</a> and <a href="https://github.com/Homebrew/ruby-macho">ruby-macho</a> were meant for handling dylibs - they both required padding after load commands to be able to edit the binary;</li>
  <li>that some dynamic libraries contain <code class="language-plaintext highlighter-rouge">LC_ENCRYPTION_INFO</code> load command; fortunately, the encryption is not sensitive to changes to the padding, and our hacks continue to work.</li>
</ul>

<h2 id="-future-improvements">🤓 Future Improvements</h2>
<p>A couple of potential improvements are left as an exercise to the reader. These could be:</p>
<ul>
  <li>exploring the impact of the <code class="language-plaintext highlighter-rouge">LC_CODE_SIGNATURE</code> load command, used by a few third party frameworks (e.g. PSPDFKit) - I have not yet tested whether it impacts binary edits like the one described in this article;</li>
  <li>a heuristic for detecting the header padding - it would be a neat feature to include in <code class="language-plaintext highlighter-rouge">arm64-to-sim</code>, making it a one-stop shop for ARM64 transmogrification.</li>
</ul>

<h2 id="-references">🙌 References</h2>
<p>A couple of people and projects deserve special thanks:</p>

<ul>
  <li><a href="https://twitter.com/myeyesareblnd/status/1362005189385875461">myeyesareblind on Twitter</a> suggested using <code class="language-plaintext highlighter-rouge">vtool</code> to edit load commands - and it turned out to be just the right tool for the job, as long as padding is available!</li>
  <li><a href="https://pewpewthespells.com/blog/static_and_dynamic_libraries.html">Samantha Demi</a> wrote a great explainer on the differences between static and dynamic frameworks and I’d recommend it to anyone looking to make sense of the topic;</li>
  <li>the <a href="https://github.com/JuliaLang/julia/issues/36617">Julia project</a> seems to have run into the code signing issue on M1s as well, which confirmed the use of <code class="language-plaintext highlighter-rouge">codesign</code> to solve the issue;</li>
  <li><a href="https://blog.allegro.tech/2018/05/Static-linking-vs-dyld3.html">Allegro’s Kamil Borzym</a> ran a series of benchmarks in 2018 to confirm that excessive loading of dylibs indeed carries a performance penalty at launch time;</li>
  <li><a href="https://twitter.com/lgerbarg">Louis Gerbarg</a> gave a <a href="https://developer.apple.com/videos/play/wwdc2017/413/">fantastic WWDC17 presentation on <code class="language-plaintext highlighter-rouge">dyld3</code></a> and it’s still a worthy watch, 4 years on.</li>
</ul>]]></content><author><name>Bogo Giertler</name></author><category term="swift" /><category term="arm64" /><summary type="html"><![CDATA[Turns out that ARM64 dynamic libraries can also run on M1 Macs - and it's even easier than the static lib hack!]]></summary></entry><entry><title type="html">Hacking native ARM64 binaries to run on the iOS Simulator</title><link href="https://bogo.wtf/arm64-to-sim.html" rel="alternate" type="text/html" title="Hacking native ARM64 binaries to run on the iOS Simulator" /><published>2021-02-10T23:37:00+00:00</published><updated>2021-02-10T23:37:00+00:00</updated><id>https://bogo.wtf/arm64-to-sim</id><content type="html" xml:base="https://bogo.wtf/arm64-to-sim.html"><![CDATA[<blockquote>
  <p><strong>NOTE:(bogo)</strong> This article focuses on <em>static</em> libraries. I wrote a <a href="/arm64-to-sim-dylibs.html">separate article</a> explaining how to use this technique to get ARM64 <em>dynamic</em> libraries running on the iOS Simulator.</p>
</blockquote>

<p><img src="/assets/images/2021-02-10-arm64-to-sim/Full Screenshot.png" alt="M1 Simulator + ARM64" /></p>

<p>The screenshot above looks perfectly normal - until you realize that the sample app running on this M1 MacBook is actually a legacy Spotify SDK demo from 2017. Its proprietary binary framework has never been rebuilt to support M1 Macs and cannot run on Apple’s newest computers, unless Xcode is launched through <a href="https://developer.apple.com/documentation/apple_silicon/about_the_rosetta_translation_environment">Rosetta 2</a>.</p>

<p>If you have an M1 Mac, you probably already encountered this issue. A couple of seconds after hitting Run on your favorite project (and going <em>wow, those M1 Macs sure are fast!</em>), you were likely greeted with this:</p>

<p><img src="/assets/images/2021-02-10-arm64-to-sim/Xcode Linker Error.png" alt="Xcode Linker Error" /></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ld: in ../../SpotifyiOS.framework/SpotifyiOS(MPMessagePackReader.o), building for iOS Simulator, but linking in object file built for iOS, file '../../SpotifyiOS.framework/SpotifyiOS' for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
</code></pre></div></div>

<p>In plain English, the proprietary binary framework you’ve been using in your project has not been updated to support iOS Simulator running on M1 Macs. Apple’s advice in this situation is to reach out to the vendor and ask them to release an updated version of the framework - both by migrating it to an XCFramework format, and by rebuilding it to support M1 simulators.</p>

<p>There are a lot of reasons why you might not get your hands on that updated framework anytime soon - or even at all. Commonly, the third-party vendor is slow to react, or you are pinning to a previous major version of the framework for compatibility reasons. Since you likely do not have sources for the original library, you cannot rebuild it yourself either. This means no Simulator builds and no local unit and UI tests. You seemingly hit a dead end and development on an M1 Mac will be very difficult for the time being. Or did you?</p>

<p>Last week, I ran into this issue with <a href="https://github.com/spotify/ios-sdk">Spotify’s iOS SDK</a>. With the binary release not updated for over a year, I had to find a way to hack the native ARM64 binary to run in the Simulator. On the way, I learnt a lot about frameworks, binaries, and loaders. You can find the complete sources for <a href="https://github.com/bogo/arm64-to-sim">arm64-to-sim</a> on GitHub. What follows is a detailed explanation of the ARM64 transmogrification.</p>

<h2 id="-an-idea-takes-root">💡 An Idea Takes Root</h2>
<p>Let’s take a look at the error message again. The error we receive isn’t actually a compiler error - it’s a linker error. <code class="language-plaintext highlighter-rouge">ld</code> complains that we are attempting to link in a binary that was compiled for <em>native</em> ARM64 to a binary that is being built for <em>iOS Simulator</em> ARM64.</p>

<p>Historically, the ARM/x86 bifurcation in the Apple product line meant that one could safely assume that code built for <code class="language-plaintext highlighter-rouge">i386</code> and <code class="language-plaintext highlighter-rouge">x86_64</code> was meant for the Simulator, and code built for <code class="language-plaintext highlighter-rouge">armv7</code> and <code class="language-plaintext highlighter-rouge">arm64</code> was meant for native devices. This found reflection in <a href="https://en.wikipedia.org/wiki/Fat_binary">fat (universal) binaries</a> being a widely used hack for distributing frameworks for Apple platforms that could be used both for devices and simulators.</p>

<p>With the release of M1 Macs, this assumption no longer holds true - an ARM64 slice can now be meant for either. Under the guise of supporting macOS, iOS, watchOS, and tvOS in a single framework, in 2019 Apple released a new bundle framework format, <a href="https://developer.apple.com/videos/play/wwdc2019/416/">XCFramework</a>.</p>

<p>This should give us an idea: since, as indicated by the <code class="language-plaintext highlighter-rouge">ld</code> error, we already have a native ARM64 slice in our library, maybe we can repackage it as an iOS Simulator-supporting XCFramework. There is no technical reason why it <em>shouldn’t</em> work - a compiled binary links against symbols of other frameworks and binaries. Since iOS devices and M1 Macs use the same ARM64 instruction set, if the symbols of native and Simulator libraries are sufficiently similar, the library should simply work. We will just need to apply a lot of elbow grease.</p>

<h2 id="-the-anatomy-of-a-xcframework">🫀 The Anatomy of a (XC)Framework</h2>
<p>XCFramework is a pretty straightforward format that is meant to be a drop-in replacement for the original Cocoa Frameworks. Essentially, each XCFramework is a directory containing a property list telling the linker where to find architecture- and platform-specific copies of each framework.</p>

<p>An example XCFramework looks as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Example.xcframework/
|-- Info.plist
|-- ios-arm64/
|   +-- Example.framework/
+-- ios-arm64_x86_64-simulator/
    +-- Example.framework/
</code></pre></div></div>
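<p>The two directory names in the tree above suggest a naming pattern: platform, then the architectures joined with underscores, then an optional variant. The sketch below encodes just that guess; the pattern is inferred from these two examples only, the function name is made up, and Xcode’s real naming rules may well cover more cases:</p>

```shell
# Build an XCFramework-style directory name (hypothetical helper).
# Usage: library_identifier PLATFORM VARIANT ARCH...
# Pass an empty string as VARIANT for device builds.
library_identifier() {
  platform=$1
  variant=$2
  shift 2
  # Join the remaining arguments (the architectures) with underscores.
  archs=$(printf '%s_' "$@")
  archs=${archs%_}
  if [ -n "$variant" ]; then
    echo "$platform-$archs-$variant"
  else
    echo "$platform-$archs"
  fi
}

library_identifier ios "" arm64
library_identifier ios simulator arm64 x86_64
```

<p>The two calls at the end reproduce the directory names from the example tree.</p>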

<p>The actual mapping of individual frameworks to platforms is done in the <code class="language-plaintext highlighter-rouge">Info.plist</code> file. Notice the <code class="language-plaintext highlighter-rouge">SupportedArchitectures</code>, <code class="language-plaintext highlighter-rouge">SupportedPlatform</code>, and <code class="language-plaintext highlighter-rouge">SupportedPlatformVariant</code> properties.</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</span>
<span class="cp">&lt;!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"&gt;</span>
<span class="nt">&lt;plist</span> <span class="na">version=</span><span class="s">"1.0"</span><span class="nt">&gt;</span>
<span class="nt">&lt;dict&gt;</span>
	<span class="nt">&lt;key&gt;</span>AvailableLibraries<span class="nt">&lt;/key&gt;</span>
	<span class="nt">&lt;array&gt;</span>
		<span class="nt">&lt;dict&gt;</span>
			<span class="nt">&lt;key&gt;</span>LibraryIdentifier<span class="nt">&lt;/key&gt;</span>
			<span class="nt">&lt;string&gt;</span>ios-arm64<span class="nt">&lt;/string&gt;</span>
			<span class="nt">&lt;key&gt;</span>LibraryPath<span class="nt">&lt;/key&gt;</span>
			<span class="nt">&lt;string&gt;</span>Example.framework<span class="nt">&lt;/string&gt;</span>
			<span class="nt">&lt;key&gt;</span>SupportedArchitectures<span class="nt">&lt;/key&gt;</span>
			<span class="nt">&lt;array&gt;</span>
				<span class="nt">&lt;string&gt;</span>arm64<span class="nt">&lt;/string&gt;</span>
			<span class="nt">&lt;/array&gt;</span>
			<span class="nt">&lt;key&gt;</span>SupportedPlatform<span class="nt">&lt;/key&gt;</span>
			<span class="nt">&lt;string&gt;</span>ios<span class="nt">&lt;/string&gt;</span>
		<span class="nt">&lt;/dict&gt;</span>
		<span class="nt">&lt;dict&gt;</span>
			<span class="nt">&lt;key&gt;</span>LibraryIdentifier<span class="nt">&lt;/key&gt;</span>
			<span class="nt">&lt;string&gt;</span>ios-arm64_x86_64-simulator<span class="nt">&lt;/string&gt;</span>
			<span class="nt">&lt;key&gt;</span>LibraryPath<span class="nt">&lt;/key&gt;</span>
			<span class="nt">&lt;string&gt;</span>Example.framework<span class="nt">&lt;/string&gt;</span>
			<span class="nt">&lt;key&gt;</span>SupportedArchitectures<span class="nt">&lt;/key&gt;</span>
			<span class="nt">&lt;array&gt;</span>
				<span class="nt">&lt;string&gt;</span>arm64<span class="nt">&lt;/string&gt;</span>
				<span class="nt">&lt;string&gt;</span>x86_64<span class="nt">&lt;/string&gt;</span>
			<span class="nt">&lt;/array&gt;</span>
			<span class="nt">&lt;key&gt;</span>SupportedPlatform<span class="nt">&lt;/key&gt;</span>
			<span class="nt">&lt;string&gt;</span>ios<span class="nt">&lt;/string&gt;</span>
			<span class="nt">&lt;key&gt;</span>SupportedPlatformVariant<span class="nt">&lt;/key&gt;</span>
			<span class="nt">&lt;string&gt;</span>simulator<span class="nt">&lt;/string&gt;</span>
		<span class="nt">&lt;/dict&gt;</span>
	<span class="nt">&lt;/array&gt;</span>
	<span class="nt">&lt;key&gt;</span>CFBundlePackageType<span class="nt">&lt;/key&gt;</span>
	<span class="nt">&lt;string&gt;</span>XFWK<span class="nt">&lt;/string&gt;</span>
	<span class="nt">&lt;key&gt;</span>XCFrameworkFormatVersion<span class="nt">&lt;/key&gt;</span>
	<span class="nt">&lt;string&gt;</span>1.0<span class="nt">&lt;/string&gt;</span>
<span class="nt">&lt;/dict&gt;</span>
<span class="nt">&lt;/plist&gt;</span>
</code></pre></div></div>
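
<p>To make the mapping concrete, here is a minimal Swift sketch of how a tool could read this property list and pick the right slice. It decodes only the keys shown above. The <code class="language-plaintext highlighter-rouge">XCFrameworkInfo</code> type and its <code class="language-plaintext highlighter-rouge">library(platform:variant:architecture:)</code> helper are hypothetical names made up for illustration, and the matching logic is an assumption - not a description of what Xcode does internally.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import Foundation

// Hypothetical model of an XCFramework Info.plist; unknown keys are simply ignored.
struct XCFrameworkInfo: Decodable {
    struct Library: Decodable {
        let identifier: String
        let path: String
        let architectures: [String]
        let platform: String
        let variant: String?  // absent for device slices

        enum CodingKeys: String, CodingKey {
            case identifier = "LibraryIdentifier"
            case path = "LibraryPath"
            case architectures = "SupportedArchitectures"
            case platform = "SupportedPlatform"
            case variant = "SupportedPlatformVariant"
        }
    }

    let libraries: [Library]

    enum CodingKeys: String, CodingKey {
        case libraries = "AvailableLibraries"
    }

    // Returns the first entry matching all three properties (assumed matching rule).
    func library(platform: String, variant: String?, architecture: String) -&gt; Library? {
        libraries.first {
            $0.platform == platform &amp;&amp; $0.variant == variant &amp;&amp; $0.architectures.contains(architecture)
        }
    }
}

// Decodes the raw contents of an Info.plist file.
func loadXCFrameworkInfo(from data: Data) throws -&gt; XCFrameworkInfo {
    try PropertyListDecoder().decode(XCFrameworkInfo.self, from: data)
}
</code></pre></div></div>

<p>With the example property list above, asking for <code class="language-plaintext highlighter-rouge">ios</code>, <code class="language-plaintext highlighter-rouge">simulator</code>, and <code class="language-plaintext highlighter-rouge">arm64</code> would resolve to the <code class="language-plaintext highlighter-rouge">ios-arm64_x86_64-simulator</code> entry.</p>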

<p>After creating the relevant folder structure and dropping in an <code class="language-plaintext highlighter-rouge">Info.plist</code> alongside our legacy <code class="language-plaintext highlighter-rouge">.framework</code>, we should now have a real <code class="language-plaintext highlighter-rouge">.xcframework</code> on our hands. Let’s replace the original <code class="language-plaintext highlighter-rouge">.framework</code> in Xcode with it and try to build. Of course, it would be too easy if it worked - instead, we get the following:</p>

<p><img src="/assets/images/2021-02-10-arm64-to-sim/Xcode Linker Error with XCFramework.png" alt="Xcode Linker Error - with XCFramework" /></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ld: in /Users/bogo/Library/Developer/Xcode/DerivedData/NowPlayingView-aeukgqexpeqlsrdzslkpeehveixs/Build/Products/Debug-iphonesimulator/SpotifyiOS.framework/SpotifyiOS(MPMessagePackReader.o), building for iOS Simulator, but linking in object file built for iOS, file '/Users/bogo/Library/Developer/Xcode/DerivedData/NowPlayingView-aeukgqexpeqlsrdzslkpeehveixs/Build/Products/Debug-iphonesimulator/SpotifyiOS.framework/SpotifyiOS' for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
</code></pre></div></div>

<p>Since the Cocoa Framework is coming from <code class="language-plaintext highlighter-rouge">DerivedData</code>, we can be sure we assembled our XCFramework correctly. Still, we are back to square one - despite our naïve wrapping, the linker can still tell that we are bringing in a native library. Here’s our new objective: to find a way to convince <code class="language-plaintext highlighter-rouge">ld</code> that the library is actually a Simulator library.</p>

<h2 id="️-into-the-binary">🕵️ Into the Binary</h2>

<p>Let’s take a look inside our framework and see what files could be informing it about the platform.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Example.framework/
|-- Info.plist
|-- Example
|-- Headers/
|   |-- A.h
|   |-- B.h
|   +-- C.h
+-- Modules/
    +-- module.modulemap
</code></pre></div></div>

<p>Cursory browsing of the human-readable contents of the framework does not yield any hints, so the linker must be using the contents of the binary file itself to infer the Simulator information. Since we don’t really know what to look for, let’s dig into the binaries of other XCFrameworks out there first.</p>

<p><a href="https://github.com/firebase/firebase-ios-sdk/blob/master/Package.swift#L244">FirebaseAnalytics.xcframework</a> is a particularly good XCFramework to investigate - it contains both native and Simulator binaries. The obvious first idea is to search for Simulator references in the human-readable strings of the binary:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># in the FirebaseAnalytics.xcframework directory</span>
<span class="nv">$ </span>strings ios-arm64_i386_x86_64-simulator/FirebaseAnalytics.framework/FirebaseAnalytics | <span class="nb">grep</span> <span class="nt">-i</span> sim
</code></pre></div></div>

<p>The result is a bunch of rather uninteresting strings, none of them mentioning the Simulator. We can make an educated guess that the Simulator information is thus encoded in the machine-readable segment of the binary. To extract it, we can use <code class="language-plaintext highlighter-rouge">otool</code> - a tool meant to explore the executable files produced by LLVM. The <code class="language-plaintext highlighter-rouge">-fahl</code> parameter prints the relevant fat, archive, and Mach-O headers, as well as the load commands.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># in the FirebaseAnalytics.xcframework directory</span>
<span class="nv">$ </span>otool <span class="nt">-fahl</span> ios-arm64_i386_x86_64-simulator/FirebaseAnalytics.framework/FirebaseAnalytics
<span class="o">(</span>...<span class="o">)</span>
Load <span class="nb">command </span>2
      cmd LC_LINKER_OPTIMIZATION_HINT
  cmdsize 16
  dataoff 12464
 datasize 760
Load <span class="nb">command </span>3
     cmd LC_SYMTAB
 cmdsize 24
  symoff 13224
   nsyms 201
  stroff 16440
 strsize 5064
<span class="o">(</span>...<span class="o">)</span>
</code></pre></div></div>

<p>Whoops, that’s a lot of data! The offsets and addresses and sizes are doing us no good and are likely to be different between platforms. Let’s constrain our search to load commands, save the results, and compare them:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># in the FirebaseAnalytics.xcframework directory</span>
<span class="nv">$ </span>otool <span class="nt">-fahl</span> ios-arm64_i386_x86_64-simulator/FirebaseAnalytics.framework/FirebaseAnalytics | <span class="nb">grep</span> <span class="nt">-E</span> <span class="s1">'cmd |\.o'</span> <span class="o">&gt;</span> simulator_cmds

<span class="nv">$ </span>otool <span class="nt">-fahl</span> ios-arm64_armv7/FirebaseAnalytics.framework/FirebaseAnalytics | <span class="nb">grep</span> <span class="nt">-E</span> <span class="s1">'cmd |\.o'</span> <span class="o">&gt;</span> native_cmds

<span class="nv">$ </span>diff <span class="nt">-u</span> native_cmds simulator_cmds
<span class="nt">-ios-arm64_armv7</span>/FirebaseAnalytics.framework/FirebaseAnalytics<span class="o">(</span>FirebaseAnalytics_vers.o<span class="o">)</span>:
+ios-arm64_i386_x86_64-simulator/FirebaseAnalytics.framework/FirebaseAnalytics<span class="o">(</span>FirebaseAnalytics_vers.o<span class="o">)</span>:
       cmd LC_SEGMENT_64
-      cmd LC_VERSION_MIN_IPHONEOS
+      cmd LC_BUILD_VERSION
      cmd LC_SYMTAB
<span class="o">(</span>...<span class="o">)</span>
</code></pre></div></div>

<p>Alright, we got a match! Seems that the Simulator binary contains an <code class="language-plaintext highlighter-rouge">LC_BUILD_VERSION</code> load command, while the native binary contains an <code class="language-plaintext highlighter-rouge">LC_VERSION_MIN_IPHONEOS</code> load command in the same place. A pass with <code class="language-plaintext highlighter-rouge">otool</code> on our unsupported, native-only <code class="language-plaintext highlighter-rouge">.framework</code> confirms this theory. A bit of Googling reveals that <a href="https://reviews.llvm.org/D85358">this specific difference</a> is used by LLDB to distinguish Simulator and native binaries. We are on the right track then - looks like substituting <code class="language-plaintext highlighter-rouge">LC_VERSION_MIN_IPHONEOS</code> with <code class="language-plaintext highlighter-rouge">LC_BUILD_VERSION</code> might be just enough to fool <code class="language-plaintext highlighter-rouge">ld</code>.</p>
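
<p>The same check can be sketched in a few lines of Swift. This is a rough illustration that assumes a single little-endian, 64-bit Mach-O object (not a fat binary or an archive) and hard-codes the constants from <code class="language-plaintext highlighter-rouge">loader.h</code> so that it only needs Foundation. The <code class="language-plaintext highlighter-rouge">PlatformMarker</code> type and the <code class="language-plaintext highlighter-rouge">platformMarker(inMachO:)</code> function are hypothetical names, not part of any Apple API.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import Foundation

// Load command IDs copied from loader.h, so the sketch does not depend on the MachO module.
let lcVersionMinIPhoneOS: UInt32 = 0x25
let lcBuildVersion: UInt32 = 0x32

// Hypothetical result type, made up for this example.
enum PlatformMarker: Equatable {
    case versionMinIPhoneOS
    case buildVersion(platform: UInt32)  // e.g. 2 = PLATFORM_IOS, 7 = PLATFORM_IOSSIMULATOR
    case missing
}

// Reads a little-endian UInt32 byte by byte, so memory alignment does not matter.
func readLE32(_ data: Data, at offset: Int) -&gt; UInt32 {
    let start = data.startIndex + offset
    return data[start..&lt;start + 4].reversed().reduce(UInt32(0)) { ($0 &lt;&lt; 8) | UInt32($1) }
}

// Walks the load commands and reports the first platform-related one it finds.
func platformMarker(inMachO data: Data) -&gt; PlatformMarker {
    let headerSize = 32  // size of mach_header_64
    guard data.count &gt;= headerSize, readLE32(data, at: 0) == 0xFEEDFACF else { return .missing }

    let commandCount = Int(readLE32(data, at: 16))  // ncmds
    var cursor = headerSize
    for _ in 0..&lt;commandCount {
        guard cursor + 8 &lt;= data.count else { break }
        let cmd = readLE32(data, at: cursor)
        let cmdsize = Int(readLE32(data, at: cursor + 4))

        if cmd == lcVersionMinIPhoneOS { return .versionMinIPhoneOS }
        if cmd == lcBuildVersion {
            guard cursor + 12 &lt;= data.count else { return .missing }
            // the platform field directly follows cmd and cmdsize
            return .buildVersion(platform: readLE32(data, at: cursor + 8))
        }

        guard cmdsize &gt;= 8 else { break }  // malformed command, stop walking
        cursor += cmdsize
    }
    return .missing
}
</code></pre></div></div>

<p>Running it over an object file extracted from the native framework should report <code class="language-plaintext highlighter-rouge">.versionMinIPhoneOS</code>, while a Simulator object should report <code class="language-plaintext highlighter-rouge">.buildVersion</code> with the Simulator platform.</p>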

<h2 id="-meet-the-librarian">📚 Meet the Librarian</h2>
<p>So far, we’ve been playing with a fat binary, containing multiple architecture-specific slices. We can see the architectures available in a binary using the <code class="language-plaintext highlighter-rouge">file</code> command:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>file Example.framework/Example
Example.framework/Example Mach-O universal binary with 4 architectures: <span class="o">[</span>i386:current ar archive] <span class="o">[</span>arm_v7] <span class="o">[</span>x86_64] <span class="o">[</span>arm64]
Example.framework/Example <span class="o">(</span><span class="k">for </span>architecture i386<span class="o">)</span>:        current ar archive
Example.framework/Example <span class="o">(</span><span class="k">for </span>architecture armv7<span class="o">)</span>:       current ar archive
Example.framework/Example <span class="o">(</span><span class="k">for </span>architecture x86_64<span class="o">)</span>:      current ar archive
Example.framework/Example <span class="o">(</span><span class="k">for </span>architecture arm64<span class="o">)</span>:       current ar archive
</code></pre></div></div>
<p>Obviously, for our purposes we don’t particularly care about x86 or ARMv7 slices. So let’s grab just the <code class="language-plaintext highlighter-rouge">arm64</code> one:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>lipo <span class="nt">-thin</span> arm64 Example.framework/Example <span class="nt">-output</span> Example.arm64
</code></pre></div></div>
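
<p>Conceptually, thinning is little more than a lookup in the fat header. The sketch below is a rough Swift approximation, assuming the classic 32-bit fat header (<code class="language-plaintext highlighter-rouge">CAFE BABE</code> magic, big-endian fields). The <code class="language-plaintext highlighter-rouge">thinSlice(of:cpuType:)</code> function is a hypothetical name, and the real <code class="language-plaintext highlighter-rouge">lipo</code> handles far more cases than this.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import Foundation

// CPU_TYPE_ARM | CPU_ARCH_ABI64, copied from machine.h
let cpuTypeARM64: UInt32 = 0x0100000C

// Fat headers are always stored big-endian, regardless of the slices inside.
func readBE32(_ data: Data, at offset: Int) -&gt; UInt32 {
    let start = data.startIndex + offset
    return data[start..&lt;start + 4].reduce(UInt32(0)) { ($0 &lt;&lt; 8) | UInt32($1) }
}

// Returns the bytes of the slice built for `cpuType`, or nil if there is none.
func thinSlice(of data: Data, cpuType: UInt32) -&gt; Data? {
    guard data.count &gt;= 8, readBE32(data, at: 0) == 0xCAFEBABE else { return nil }

    let sliceCount = Int(readBE32(data, at: 4))  // nfat_arch
    for index in 0..&lt;sliceCount {
        // fat_header is 8 bytes long, each fat_arch entry is 20 bytes long
        let entry = 8 + index * 20
        guard entry + 20 &lt;= data.count else { return nil }
        guard readBE32(data, at: entry) == cpuType else { continue }

        let offset = Int(readBE32(data, at: entry + 8))
        let size = Int(readBE32(data, at: entry + 12))
        guard offset + size &lt;= data.count else { return nil }

        let start = data.startIndex + offset
        return data.subdata(in: start..&lt;start + size)
    }
    return nil
}
</code></pre></div></div>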

<p>If we open the resulting <code class="language-plaintext highlighter-rouge">Example.arm64</code> file in a hex editor (such as <a href="https://github.com/HexFiend/HexFiend">Hex Fiend</a>), we should notice that the <a href="https://en.wikipedia.org/wiki/File_format#Magic_number">magic number (file format identification pattern)</a> of the file spells <code class="language-plaintext highlighter-rouge">!&lt;arch&gt;</code> in ASCII - this means we are not working with an individual <em>binary</em>, but a UNIX archive of binaries - a <em>library</em>. We can unpack it quite easily:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>ar x Example.arm64
</code></pre></div></div>

<p>The result should be a number of familiar-looking <code class="language-plaintext highlighter-rouge">.o</code> files. If we open any of them in a hex editor, we should see <code class="language-plaintext highlighter-rouge">CFFA EDFE</code> as the initial 4 bytes - a little-endian encoded <code class="language-plaintext highlighter-rouge">FEED FACE + 1</code>, the magic number of 64-bit Mach-O binaries (with the original <code class="language-plaintext highlighter-rouge">FEED FACE</code> being the magic number of 32-bit ones).</p>
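
<p>Instead of reaching for a hex editor every time, the magic numbers can also be checked programmatically. Here is a small sketch with a hypothetical <code class="language-plaintext highlighter-rouge">containerKind(of:)</code> helper. It only looks at the first few bytes as they appear on disk and assumes little-endian Mach-O files, so treat it as a quick heuristic rather than a validator.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import Foundation

// Hypothetical classification of the layers we peeled off so far.
enum ContainerKind {
    case fatBinary, archive, machO64, machO32, unknown
}

func containerKind(of data: Data) -&gt; ContainerKind {
    let bytes = [UInt8](data.prefix(8))

    // UNIX archives start with the 8-byte ASCII signature "!&lt;arch&gt;\n"
    if bytes == Array("!&lt;arch&gt;\n".utf8) { return .archive }

    let magic = Array(bytes.prefix(4))
    // fat headers are big-endian, so CAFE BABE appears on disk as-is
    // (Java class files share this magic, hence "heuristic")
    if magic == [0xCA, 0xFE, 0xBA, 0xBE] { return .fatBinary }
    // FEED FACF stored little-endian: a 64-bit Mach-O file
    if magic == [0xCF, 0xFA, 0xED, 0xFE] { return .machO64 }
    // FEED FACE stored little-endian: a 32-bit Mach-O file
    if magic == [0xCE, 0xFA, 0xED, 0xFE] { return .machO32 }
    return .unknown
}
</code></pre></div></div>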

<p>At this point, we have gone from an XCFramework, to a Cocoa Framework, to a UNIX library, to individual Mach-O binary objects that we can finally edit - truly, a matryoshka of abstraction layers <a href="https://en.wikipedia.org/wiki/Mach_(kernel)">spanning nearly 40 years of computing history</a>.</p>

<h2 id="️-dissecting-a-mach-o-binary">✂️ Dissecting a Mach-O Binary</h2>
<p>While trying to read and edit the object file in a hex editor is possible, it will quickly prove to be a fool’s errand. A look at the publicly available XNU sources for <a href="https://github.com/apple/darwin-xnu/blob/8f02f2a044b9bb1ad951987ef5bab20ec9486310/EXTERNAL_HEADERS/mach-o/loader.h#L1241-L1268">the Mach-O loader</a> shows that <code class="language-plaintext highlighter-rouge">LC_VERSION_MIN_IPHONEOS</code> and <code class="language-plaintext highlighter-rouge">LC_BUILD_VERSION</code> are different in size. The <code class="language-plaintext highlighter-rouge">LC_BUILD_VERSION</code> struct has 2 extra fields of <code class="language-plaintext highlighter-rouge">UInt32</code> size, meaning it is exactly 8 bytes longer than <code class="language-plaintext highlighter-rouge">LC_VERSION_MIN_IPHONEOS</code>.</p>

<p>While the machine code segment generated by LLVM is <a href="https://developer.apple.com/library/archive/documentation/DeveloperTools/Conceptual/MachOTopics/1-Articles/dynamic_code.html">position independent</a>, the headers and load commands are not - after all they are <em>navigation instructions</em> for the loader. To place a longer load command in the binary, we need to both re-create the binary and adjust all the references expressed as absolute distances from the beginning of the binary file (typically referred to as <em>offsets</em>).</p>

<p>To do this, we need to first understand what makes up a Mach-O binary. According to the <a href="https://github.com/aidansteele/osx-abi-macho-file-format-reference">official Mach-O ABI docs</a> and the <code class="language-plaintext highlighter-rouge">loader.h</code> specs, a Mach-O binary can be separated into 4 basic components:</p>

<ul>
  <li>a <strong>Mach-O header</strong>, which contains the magic number (such as <code class="language-plaintext highlighter-rouge">FEED FACE</code> or <code class="language-plaintext highlighter-rouge">FEED FACF</code>), supported CPU type, and the number and total size of load commands;</li>
  <li>a <strong>load command table</strong>, which is arbitrarily long and informs the linker about the binary and where to find segments of interest in the raw content;</li>
  <li>an <strong>optional padding</strong>, which can be added by a developer to simplify subsequent edits to the binary;</li>
  <li>the <strong>raw content</strong>, which contains the executable code, strings, and everything else required for the actual execution - all at offsets described in the load command table.</li>
</ul>

<p>All these components are described as byte-aligned C structs in the <code class="language-plaintext highlighter-rouge">MachO</code> framework. Armed with this knowledge, we can start implementing a simple command line tool to read and transmogrify any Mach-O binary from a native one to a Simulator one.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">import</span> <span class="kt">Foundation</span>
<span class="kd">import</span> <span class="kt">MachO</span>

<span class="kd">extension</span> <span class="kt">Data</span> <span class="p">{</span>
    <span class="kd">func</span> <span class="n">asStruct</span><span class="o">&lt;</span><span class="kt">T</span><span class="o">&gt;</span><span class="p">(</span><span class="n">fromByteOffset</span> <span class="nv">offset</span><span class="p">:</span> <span class="kt">Int</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">T</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">withUnsafeBytes</span> <span class="p">{</span> <span class="nv">$0</span><span class="o">.</span><span class="nf">load</span><span class="p">(</span><span class="nv">fromByteOffset</span><span class="p">:</span> <span class="n">offset</span><span class="p">,</span> <span class="nv">as</span><span class="p">:</span> <span class="kt">T</span><span class="o">.</span><span class="k">self</span><span class="p">)</span> <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">let</span> <span class="nv">path</span> <span class="o">=</span> <span class="kt">CommandLine</span><span class="o">.</span><span class="n">arguments</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="k">guard</span> <span class="k">let</span> <span class="nv">handle</span> <span class="o">=</span> <span class="kt">FileHandle</span><span class="p">(</span><span class="nv">forReadingAtPath</span><span class="p">:</span> <span class="n">path</span><span class="p">)</span> <span class="k">else</span> <span class="p">{</span>
    <span class="nf">fatalError</span><span class="p">()</span>
<span class="p">}</span>

<span class="k">let</span> <span class="nv">headerData</span> <span class="o">=</span> <span class="k">try!</span> <span class="n">handle</span><span class="o">.</span><span class="nf">read</span><span class="p">(</span><span class="nv">upToCount</span><span class="p">:</span> <span class="kt">MemoryLayout</span><span class="o">&lt;</span><span class="n">mach_header_64</span><span class="o">&gt;.</span><span class="n">stride</span><span class="p">)</span><span class="o">!</span>
<span class="c1">// `var`, since `&amp;header` is passed as an in-out pointer when the binary is written back to disk</span>
<span class="k">var</span> <span class="nv">header</span><span class="p">:</span> <span class="n">mach_header_64</span> <span class="o">=</span> <span class="n">headerData</span><span class="o">.</span><span class="nf">asStruct</span><span class="p">()</span>
<span class="k">if</span> <span class="n">header</span><span class="o">.</span><span class="n">magic</span> <span class="o">!=</span> <span class="kt">MH_MAGIC_64</span> <span class="o">||</span> <span class="n">header</span><span class="o">.</span><span class="n">cputype</span> <span class="o">!=</span> <span class="kt">CPU_TYPE_ARM64</span> <span class="p">{</span>
    <span class="nf">fatalError</span><span class="p">()</span>
<span class="p">}</span>
</code></pre></div></div>

<p>If we fed the transmogrifier a valid Mach-O binary, the <code class="language-plaintext highlighter-rouge">header</code> struct will now inform us of the number of load commands and their total size. To read individual load commands following the Mach-O header, we need to understand two things: that C structs representing Mach-O commands are not explicitly polymorphic - they all implicitly begin with the same fields as the <code class="language-plaintext highlighter-rouge">load_command</code> struct - and that individual load commands are not fixed in size.</p>

<p>In other words, every load command <em>begins</em> with a <code class="language-plaintext highlighter-rouge">load_command</code> struct. This struct contains exactly two 32-bit integers - describing the command <em>type</em> and the command <em>size</em>. To read the commands correctly, we need to support peeking into our <code class="language-plaintext highlighter-rouge">FileHandle</code> and checking the command size and type straight from raw <code class="language-plaintext highlighter-rouge">Data</code> objects:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">extension</span> <span class="kt">Data</span> <span class="p">{</span>
    <span class="k">var</span> <span class="nv">loadCommand</span><span class="p">:</span> <span class="kt">UInt32</span> <span class="p">{</span>
        <span class="k">let</span> <span class="nv">lc</span><span class="p">:</span> <span class="n">load_command</span> <span class="o">=</span> <span class="n">withUnsafeBytes</span> <span class="p">{</span> <span class="nv">$0</span><span class="o">.</span><span class="nf">load</span><span class="p">(</span><span class="nv">as</span><span class="p">:</span> <span class="n">load_command</span><span class="o">.</span><span class="k">self</span><span class="p">)</span> <span class="p">}</span>
        <span class="k">return</span> <span class="n">lc</span><span class="o">.</span><span class="n">cmd</span>
    <span class="p">}</span>

    <span class="k">var</span> <span class="nv">commandSize</span><span class="p">:</span> <span class="kt">UInt32</span> <span class="p">{</span>
        <span class="k">let</span> <span class="nv">lc</span><span class="p">:</span> <span class="n">load_command</span> <span class="o">=</span> <span class="n">withUnsafeBytes</span> <span class="p">{</span> <span class="nv">$0</span><span class="o">.</span><span class="nf">load</span><span class="p">(</span><span class="nv">as</span><span class="p">:</span> <span class="n">load_command</span><span class="o">.</span><span class="k">self</span><span class="p">)</span> <span class="p">}</span>
        <span class="k">return</span> <span class="n">lc</span><span class="o">.</span><span class="n">cmdsize</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kd">extension</span> <span class="kt">FileHandle</span> <span class="p">{</span>
    <span class="kd">func</span> <span class="nf">peek</span><span class="p">(</span><span class="n">upToCount</span> <span class="nv">count</span><span class="p">:</span> <span class="kt">Int</span><span class="p">)</span> <span class="k">throws</span> <span class="o">-&gt;</span> <span class="kt">Data</span><span class="p">?</span> <span class="p">{</span>
        <span class="k">let</span> <span class="nv">originalOffset</span> <span class="o">=</span> <span class="n">offsetInFile</span>
        <span class="k">let</span> <span class="nv">data</span> <span class="o">=</span> <span class="k">try</span> <span class="nf">read</span><span class="p">(</span><span class="nv">upToCount</span><span class="p">:</span> <span class="n">count</span><span class="p">)</span>
        <span class="k">try</span> <span class="nf">seek</span><span class="p">(</span><span class="nv">toOffset</span><span class="p">:</span> <span class="n">originalOffset</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">data</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Armed with these functions, we can now start extracting individual commands into Data blobs:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="nv">loadCommandsData</span><span class="p">:</span> <span class="p">[</span><span class="kt">Data</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="mi">0</span><span class="o">..&lt;</span><span class="n">header</span><span class="o">.</span><span class="n">ncmds</span><span class="p">)</span><span class="o">.</span><span class="n">map</span> <span class="p">{</span> <span class="n">_</span> <span class="k">in</span>
    <span class="k">let</span> <span class="nv">loadCommandPeekData</span> <span class="o">=</span> <span class="k">try!</span> <span class="n">handle</span><span class="o">.</span><span class="nf">peek</span><span class="p">(</span><span class="nv">upToCount</span><span class="p">:</span> <span class="kt">MemoryLayout</span><span class="o">&lt;</span><span class="n">load_command</span><span class="o">&gt;.</span><span class="n">stride</span><span class="p">)</span>
    <span class="k">return</span> <span class="k">try!</span> <span class="n">handle</span><span class="o">.</span><span class="nf">read</span><span class="p">(</span><span class="nv">upToCount</span><span class="p">:</span> <span class="kt">Int</span><span class="p">(</span><span class="n">loadCommandPeekData</span><span class="o">!.</span><span class="n">commandSize</span><span class="p">))</span><span class="o">!</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The most difficult individual load command to work with is <code class="language-plaintext highlighter-rouge">LC_SEGMENT_64</code>. Similarly to the Mach-O header, <code class="language-plaintext highlighter-rouge">LC_SEGMENT_64</code> is actually composed of a <code class="language-plaintext highlighter-rouge">segment_command_64</code> struct, followed by an arbitrary number of <code class="language-plaintext highlighter-rouge">section_64</code>s:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">var</span> <span class="nv">segment</span><span class="p">:</span> <span class="n">segment_command_64</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="nf">asStruct</span><span class="p">()</span>

<span class="k">let</span> <span class="nv">sections</span><span class="p">:</span> <span class="p">[</span><span class="n">section_64</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="mi">0</span><span class="o">..&lt;</span><span class="kt">Int</span><span class="p">(</span><span class="n">segment</span><span class="o">.</span><span class="n">nsects</span><span class="p">))</span><span class="o">.</span><span class="n">map</span> <span class="p">{</span> <span class="n">index</span> <span class="k">in</span>
    <span class="k">let</span> <span class="nv">sectionOffset</span> <span class="o">=</span> <span class="kt">MemoryLayout</span><span class="o">&lt;</span><span class="n">segment_command_64</span><span class="o">&gt;.</span><span class="n">stride</span> <span class="o">+</span> <span class="n">index</span> <span class="o">*</span> <span class="kt">MemoryLayout</span><span class="o">&lt;</span><span class="n">section_64</span><span class="o">&gt;.</span><span class="n">stride</span>
    <span class="k">return</span> <span class="n">data</span><span class="o">.</span><span class="nf">asStruct</span><span class="p">(</span><span class="nv">fromByteOffset</span><span class="p">:</span> <span class="n">sectionOffset</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>With all the commands in memory, we can now finish reading the binary by saving the remainder of it for later handling:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="nv">programData</span> <span class="o">=</span> <span class="k">try!</span> <span class="n">handle</span><span class="o">.</span><span class="nf">readToEnd</span><span class="p">()</span><span class="o">!</span>
<span class="k">try!</span> <span class="n">handle</span><span class="o">.</span><span class="nf">close</span><span class="p">()</span>
</code></pre></div></div>

<p>Lastly, we need to persist the entire thing back to disk. And since we are dealing with a lot of <code class="language-plaintext highlighter-rouge">Data</code> arrays, let’s simplify their handling too:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">extension</span> <span class="kt">Array</span> <span class="k">where</span> <span class="kt">Element</span> <span class="o">==</span> <span class="kt">Data</span> <span class="p">{</span>
    <span class="k">var</span> <span class="nv">flattened</span><span class="p">:</span> <span class="kt">Data</span> <span class="p">{</span> <span class="nf">reduce</span><span class="p">(</span><span class="nv">into</span><span class="p">:</span> <span class="kt">Data</span><span class="p">())</span> <span class="p">{</span> <span class="nv">$0</span><span class="o">.</span><span class="nf">append</span><span class="p">(</span><span class="nv">$1</span><span class="p">)</span> <span class="p">}</span> <span class="p">}</span>
<span class="p">}</span>

<span class="k">try!</span> <span class="p">[</span>
    <span class="kt">Data</span><span class="p">(</span><span class="nv">bytes</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">header</span><span class="p">,</span> <span class="nv">count</span><span class="p">:</span> <span class="kt">MemoryLayout</span><span class="o">&lt;</span><span class="n">mach_header_64</span><span class="o">&gt;.</span><span class="n">stride</span><span class="p">),</span>
    <span class="n">loadCommandsData</span><span class="o">.</span><span class="n">flattened</span><span class="p">,</span>
    <span class="n">programData</span><span class="p">,</span>
<span class="p">]</span><span class="o">.</span><span class="n">flattened</span><span class="o">.</span><span class="nf">write</span><span class="p">(</span><span class="nv">to</span><span class="p">:</span> <span class="kt">URL</span><span class="p">(</span><span class="nv">fileURLWithPath</span><span class="p">:</span> <span class="s">"</span><span class="se">\(</span><span class="n">path</span><span class="se">)</span><span class="s">.reworked.o"</span><span class="p">))</span>
</code></pre></div></div>

<p>The resulting file should be exactly the same as the input file. We can confirm this using <code class="language-plaintext highlighter-rouge">cmp</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>cmp <span class="nt">-s</span> input.o input.o.reworked.o <span class="o">||</span> <span class="nb">echo</span> <span class="s2">"Files are different!"</span>
</code></pre></div></div>

<p>If we see no errors, the reassembly worked as expected. We can now safely edit the binary’s load commands.</p>

<h2 id="-raison-dêtre">🚀 Raison d’Être</h2>
<p>At this point, we broke up the binary with surgical precision and are ready to edit its individual components. The largest change is, of course, getting rid of the <code class="language-plaintext highlighter-rouge">LC_VERSION_MIN_IPHONEOS</code> command and replacing it with an instance of <code class="language-plaintext highlighter-rouge">LC_BUILD_VERSION</code>. Once this is done, we need to reconstruct the offsets in the following load commands:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">LC_SEGMENT_64</code> - where we need to update <code class="language-plaintext highlighter-rouge">offset</code> and <code class="language-plaintext highlighter-rouge">reloff</code> properties of individual sections, as well as the <code class="language-plaintext highlighter-rouge">fileoff</code>, <code class="language-plaintext highlighter-rouge">filesize</code>, and <code class="language-plaintext highlighter-rouge">vmsize</code> properties for the entire binary;</li>
  <li><code class="language-plaintext highlighter-rouge">LC_DATA_IN_CODE</code> and <code class="language-plaintext highlighter-rouge">LC_LINKER_OPTIMIZATION_HINT</code> - which are represented using the same C struct, both requiring an update to the <code class="language-plaintext highlighter-rouge">dataoff</code> property;</li>
  <li><code class="language-plaintext highlighter-rouge">LC_SYMTAB</code> - where we need to change the <code class="language-plaintext highlighter-rouge">stroff</code> and <code class="language-plaintext highlighter-rouge">symoff</code> properties, for, respectively, the string and symbol table offsets.</li>
</ul>
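
<p>To illustrate what “updating an offset” boils down to, here is a minimal sketch for the simplest of these commands, <code class="language-plaintext highlighter-rouge">LC_SYMTAB</code>. It works directly on the raw bytes of the command and assumes the little-endian <code class="language-plaintext highlighter-rouge">symtab_command</code> layout from <code class="language-plaintext highlighter-rouge">loader.h</code> (six <code class="language-plaintext highlighter-rouge">UInt32</code> fields, with <code class="language-plaintext highlighter-rouge">symoff</code> at byte 8 and <code class="language-plaintext highlighter-rouge">stroff</code> at byte 16). The <code class="language-plaintext highlighter-rouge">shiftedSymtabCommand(_:by:)</code> name is hypothetical and this is not the article’s final helper.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import Foundation

// Reads a little-endian UInt32 byte by byte, so memory alignment does not matter.
func readLE32(_ data: Data, at offset: Int) -&gt; UInt32 {
    let start = data.startIndex + offset
    return data[start..&lt;start + 4].reversed().reduce(UInt32(0)) { ($0 &lt;&lt; 8) | UInt32($1) }
}

// Writes a little-endian UInt32 in place.
func writeLE32(_ value: UInt32, into data: inout Data, at offset: Int) {
    let start = data.startIndex + offset
    for i in 0..&lt;4 {
        data[start + i] = UInt8((value &gt;&gt; UInt32(8 * i)) &amp; 0xFF)
    }
}

// Returns a copy of an LC_SYMTAB command with both file offsets moved by `delta` bytes.
func shiftedSymtabCommand(_ command: Data, by delta: UInt32) -&gt; Data {
    precondition(command.count &gt;= 24, "symtab_command is 24 bytes long")
    var edited = command
    for fieldOffset in [8, 16] {  // symoff, stroff
        let current = readLE32(edited, at: fieldOffset)
        writeLE32(current + delta, into: &amp;edited, at: fieldOffset)
    }
    return edited
}
</code></pre></div></div>

<p>For the <code class="language-plaintext highlighter-rouge">LC_SYMTAB</code> command printed by <code class="language-plaintext highlighter-rouge">otool</code> earlier (<code class="language-plaintext highlighter-rouge">symoff 13224</code>, <code class="language-plaintext highlighter-rouge">stroff 16440</code>), a shift of 8 bytes would yield 13232 and 16448, while <code class="language-plaintext highlighter-rouge">nsyms</code> and <code class="language-plaintext highlighter-rouge">strsize</code> stay untouched.</p>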

<p>The most straightforward way to perform all the updates to the load commands is to simply use Swift’s <code class="language-plaintext highlighter-rouge">map</code> and handle the updates in helper functions:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="nv">offset</span> <span class="o">=</span> <span class="kt">UInt32</span><span class="p">(</span><span class="nf">abs</span><span class="p">(</span><span class="kt">MemoryLayout</span><span class="o">&lt;</span><span class="n">build_version_command</span><span class="o">&gt;.</span><span class="n">stride</span> <span class="o">-</span> <span class="kt">MemoryLayout</span><span class="o">&lt;</span><span class="n">version_min_command</span><span class="o">&gt;.</span><span class="n">stride</span><span class="p">))</span>
<span class="k">let</span> <span class="nv">editedCommandsData</span> <span class="o">=</span> <span class="n">loadCommandsData</span>
    <span class="o">.</span><span class="n">map</span> <span class="p">{</span> <span class="p">(</span><span class="n">lc</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">Data</span> <span class="k">in</span>
        <span class="k">switch</span> <span class="n">lc</span><span class="o">.</span><span class="n">loadCommand</span> <span class="p">{</span>
        <span class="k">case</span> <span class="kt">LC_SEGMENT_64</span><span class="p">:</span>
            <span class="k">return</span> <span class="nf">updateSegment64</span><span class="p">(</span><span class="n">lc</span><span class="p">,</span> <span class="n">offset</span><span class="p">)</span>
        <span class="k">case</span> <span class="kt">LC_VERSION_MIN_IPHONEOS</span><span class="p">:</span>
            <span class="k">return</span> <span class="nf">updateVersionMin</span><span class="p">(</span><span class="n">lc</span><span class="p">,</span> <span class="n">offset</span><span class="p">)</span>
        <span class="k">case</span> <span class="kt">LC_DATA_IN_CODE</span><span class="p">,</span> <span class="kt">LC_LINKER_OPTIMIZATION_HINT</span><span class="p">:</span>
            <span class="k">return</span> <span class="nf">updateDataInCode</span><span class="p">(</span><span class="n">lc</span><span class="p">,</span> <span class="n">offset</span><span class="p">)</span>
        <span class="k">case</span> <span class="kt">LC_SYMTAB</span><span class="p">:</span>
            <span class="k">return</span> <span class="nf">updateSymTab</span><span class="p">(</span><span class="n">lc</span><span class="p">,</span> <span class="n">offset</span><span class="p">)</span>
        <span class="k">case</span> <span class="kt">LC_BUILD_VERSION</span><span class="p">:</span>
            <span class="nf">fatalError</span><span class="p">()</span>
        <span class="k">default</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">lc</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="o">.</span><span class="nf">merge</span><span class="p">()</span>
</code></pre></div></div>

<p>To handle <code class="language-plaintext highlighter-rouge">LC_VERSION_MIN_IPHONEOS</code>, our helper function needs to return a <code class="language-plaintext highlighter-rouge">Data</code> blob containing a new instance of the <code class="language-plaintext highlighter-rouge">build_version_command</code> struct. For the other load commands, we simply update the C structs and return them as <code class="language-plaintext highlighter-rouge">Data</code> objects. The individual implementations of all the <code class="language-plaintext highlighter-rouge">load_command</code> changes are available in the <a href="https://github.com/bogo/arm64-to-sim">GitHub repository for the project</a>.</p>
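
<p>For illustration only, building the replacement command could look roughly like the sketch below. The names <code class="language-plaintext highlighter-rouge">packVersion</code> and <code class="language-plaintext highlighter-rouge">makeSimulatorBuildVersion</code> are hypothetical, the 13.0 defaults are an arbitrary assumption, and the literal <code class="language-plaintext highlighter-rouge">7</code> stands in for <code class="language-plaintext highlighter-rouge">PLATFORM_IOSSIMULATOR</code> from <code class="language-plaintext highlighter-rouge">&lt;mach-o/loader.h&gt;</code>. The implementation in the repository may differ.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import Foundation
import MachO

// Hypothetical helper: packs a version into the X.Y.Z layout used by
// Mach-O version fields (16 bits major, 8 bits minor, 8 bits patch).
func packVersion(major: UInt32, minor: UInt32, patch: UInt32 = 0) -&gt; UInt32 {
    (major &lt;&lt; 16) | (minor &lt;&lt; 8) | patch
}

// Hypothetical helper: builds an LC_BUILD_VERSION command that targets
// the iOS Simulator. The 13.0 defaults are an assumption for this sketch.
func makeSimulatorBuildVersion(minos: UInt32 = packVersion(major: 13, minor: 0),
                               sdk: UInt32 = packVersion(major: 13, minor: 0)) -&gt; Data {
    var command = build_version_command(
        cmd: UInt32(LC_BUILD_VERSION),
        cmdsize: UInt32(MemoryLayout&lt;build_version_command&gt;.size),
        platform: 7,  // PLATFORM_IOSSIMULATOR in &lt;mach-o/loader.h&gt;
        minos: minos,
        sdk: sdk,
        ntools: 0     // no build_tool_version entries follow the command
    )
    return Data(bytes: &amp;command, count: MemoryLayout&lt;build_version_command&gt;.size)
}
</code></pre></div></div>

<p>The resulting blob is 24 bytes long, 8 bytes more than the 16-byte <code class="language-plaintext highlighter-rouge">version_min_command</code> it replaces, which is exactly the <code class="language-plaintext highlighter-rouge">offset</code> every other load command has to account for.</p>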

<p>Last, but not least, we need to update the <code class="language-plaintext highlighter-rouge">sizeofcmds</code> property in the Mach-O header before the binary is written back to disk:</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">header</span><span class="o">.</span><span class="n">sizeofcmds</span> <span class="o">=</span> <span class="kt">UInt32</span><span class="p">(</span><span class="n">editedCommandsData</span><span class="o">.</span><span class="n">count</span><span class="p">)</span>
</code></pre></div></div>
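
<p>Writing the binary back then comes down to concatenating three pieces. The sketch below assumes the file was previously split into a <code class="language-plaintext highlighter-rouge">mach_header_64</code>, the load commands, and the remaining payload; the function name <code class="language-plaintext highlighter-rouge">reassemble</code> and its parameters are hypothetical.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import Foundation
import MachO

// Hypothetical helper: glues the updated header, the edited load commands,
// and the untouched remainder of the file back together.
// Assumes `payload` is everything that followed the original load commands.
func reassemble(header: mach_header_64, commands: Data, payload: Data) -&gt; Data {
    var header = header
    // Keep the header in sync with the new size of the load commands.
    header.sizeofcmds = UInt32(commands.count)

    var output = Data(bytes: &amp;header, count: MemoryLayout&lt;mach_header_64&gt;.size)
    output.append(commands)
    output.append(payload)
    return output
}
</code></pre></div></div>

<p>Because one command is swapped for another, <code class="language-plaintext highlighter-rouge">ncmds</code> stays the same in this sketch; only <code class="language-plaintext highlighter-rouge">sizeofcmds</code> grows.</p>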

<p>At this point, running our transmogrifier should yield a valid ARM64 Simulator file. Of course, updating a single binary only gets us so far - we still need to perform a couple more tasks:</p>

<ul>
  <li>use the transmogrifier on every object file;</li>
  <li>archive the objects back into a library;</li>
  <li>merge the library with the original x86_64 slice to form a Simulator-friendly fat binary;</li>
  <li>substitute the original library file in the Cocoa Framework inside the XCFramework.</li>
</ul>

<p>We can knock off the first two pretty easily:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="k">for </span>file <span class="k">in</span> <span class="k">*</span>.o<span class="p">;</span> <span class="k">do </span>arm64-to-sim <span class="s2">"</span><span class="nv">$file</span><span class="s2">"</span><span class="p">;</span> <span class="k">done</span><span class="p">;</span>
<span class="nv">$ </span>ar crv ../Example.arm64-reworked <span class="k">*</span>.reworked.o
</code></pre></div></div>

<p>As a part of assembling the library, <code class="language-plaintext highlighter-rouge">ar</code> attempts to construct an index from the provided binaries. The process requires performing extensive checks to confirm each object is a valid executable, and, thankfully, yields detailed errors. If we made any mistakes or omissions in our offset reconstructions, <code class="language-plaintext highlighter-rouge">ar</code> will tell us which section is faulty and what it overlaps with. From here, we just need to keep hammering on the edits in our code. Once <code class="language-plaintext highlighter-rouge">ar</code> is happy, we can merge our transmogrified ARM64 slice with the Intel one.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>lipo <span class="nt">-create</span> <span class="nt">-output</span> Example Example.x86_64 Example.arm64-reworked
</code></pre></div></div>

<p>Finally! We now have a fat binary containing ARM64 and x86-64 Simulator slices. After substituting the original library file in the framework with our hacked one, it’s time for a ⌘+R and a moment of truth…</p>

<p><img src="/assets/images/2021-02-10-arm64-to-sim/Simulator with M1 Hacks.png" alt="Simulator finally running an M1 build" /></p>

<p>Boom! A native ARM64 framework hacked to run as a Simulator framework on M1 Macs! 🎉</p>

<h2 id="-learnings-and-dead-ends">🤓 Learnings and Dead Ends</h2>
<p>It took me a total of 15 hours to come up with this solution. Along the way I understood:</p>

<ul>
  <li>how Cocoa frameworks, UNIX libraries, and Mach-O binaries are constructed;</li>
  <li>that ARM64 code is more universal than I thought and Apple is remarkably consistent about it across platforms;</li>
  <li>that binary files are just more advanced <a href="https://en.wikipedia.org/wiki/Finite-state_machine">finite automatons</a>.</li>
</ul>

<p>This wouldn’t be an exciting project without a few dead ends. My futile attempts included:</p>

<ul>
  <li>using open source security research tools - neither <a href="https://lief.quarkslab.com/doc/latest/tutorials/11_macho_modification.html">LIEF</a> nor <a href="https://github.com/Homebrew/ruby-macho">ruby-macho</a> can reliably edit the offsets required to insert an additional command into the Mach-O header; instead, they require binaries to be built with extra padding;</li>
  <li>freeing up the “missing” 8 bytes by removing the 80-byte-long <code class="language-plaintext highlighter-rouge">__cmdline</code> section from <code class="language-plaintext highlighter-rouge">LC_SEGMENT_64</code> – I was initially afraid to edit the binary offsets of load commands, but this still caused <code class="language-plaintext highlighter-rouge">ar</code> to complain about offsets;</li>
  <li>getting more “space” in the binary by stripping Bitcode (<code class="language-plaintext highlighter-rouge">xcrun bitcode_strip $input -m -o $output</code>) - the resulting binaries did not contain any additional padding.</li>
</ul>

<h2 id="-future-improvements">👍 Future Improvements</h2>
<p>A couple of potential improvements to the tool are left as an exercise to the reader. These could be:</p>

<ul>
  <li>making the offset handling more dynamic by doing two passes on the load commands and passing a data/offset pair (<code class="language-plaintext highlighter-rouge">DOP</code>) around instead;</li>
  <li>persisting the minimum iOS and SDK values from the original <code class="language-plaintext highlighter-rouge">version_min_command</code> in the new <code class="language-plaintext highlighter-rouge">build_version_command</code>;</li>
  <li>extending binary hacking to support other Apple platforms (tvOS, watchOS, etc) natively and in Simulator;</li>
  <li>turning the legacy frameworks into bona fide Clang modules by exposing their umbrella headers through <code class="language-plaintext highlighter-rouge">module.modulemap</code> (to avoid Swift bridging header headaches).</li>
</ul>
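
<p>As a starting point for the second item, a minimal sketch of carrying the version fields over is shown below. The function name <code class="language-plaintext highlighter-rouge">convertToSimulatorBuildVersion</code> is hypothetical, and the literal <code class="language-plaintext highlighter-rouge">7</code> stands in for <code class="language-plaintext highlighter-rouge">PLATFORM_IOSSIMULATOR</code>; the sketch relies on both structs packing versions in the same X.Y.Z layout.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import Foundation
import MachO

// Hypothetical sketch: derives the new LC_BUILD_VERSION command from the
// original LC_VERSION_MIN_IPHONEOS command instead of hard-coding versions.
func convertToSimulatorBuildVersion(_ original: version_min_command) -&gt; build_version_command {
    build_version_command(
        cmd: UInt32(LC_BUILD_VERSION),
        cmdsize: UInt32(MemoryLayout&lt;build_version_command&gt;.size),
        platform: 7,              // PLATFORM_IOSSIMULATOR in &lt;mach-o/loader.h&gt;
        minos: original.version,  // minimum OS version, copied as-is
        sdk: original.sdk,        // SDK version, copied as-is
        ntools: 0                 // no build_tool_version entries follow
    )
}
</code></pre></div></div>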

<h2 id="-references-and-contributions">🙌 References and Contributions</h2>
<p>A couple of people and projects deserve special thanks for making this effort easier:</p>

<ul>
  <li><a href="https://zacwe.st">Zac West</a> - thank you for pointing me in the direction of load commands and <code class="language-plaintext highlighter-rouge">LC_BUILD_VERSION</code> in particular;</li>
  <li><a href="https://github.com/HexFiend/HexFiend">Hex Fiend</a> and <a href="https://github.com/dcsch/macho-browser">Mach-O Browser</a> - two solid tools for, respectively, exploring and comparing raw binaries, and for investigating the load commands of Mach-O binaries;</li>
  <li><a href="https://lief.quarkslab.com/">LIEF</a> and <a href="https://github.com/Homebrew/ruby-macho">ruby-macho</a> - two great open source libraries for platform-independent reading of Mach-O binaries (writing was a bit <code class="language-plaintext highlighter-rouge">¯\_(ツ)_/¯</code>);</li>
  <li><a href="https://github.com/steventroughtonsmith/marzipanify">marzipanify</a> - Steven Troughton-Smith attempted to solve a similar problem in 2018, albeit in the other direction;</li>
  <li><a href="https://yossarian.net/res/pub/macho-internals/macho-internals.pdf">Mach-O Internals presentation</a> - William Woodruff prepared an excellent explanation of how the Mach-O binaries are organized.</li>
</ul>]]></content><author><name>Bogo Giertler</name></author><category term="swift" /><category term="arm64" /><summary type="html"><![CDATA[A primer on how to launch native ARM64 binaries directly in the iOS Simulator, using otool, Mach-O, and a lot of elbow grease.]]></summary></entry></feed>