MacOS Universal Binaries

Apple made a daring move a few years ago: they switched their laptops and desktops from Intel x86_64 processors to ARM processors. Processor architecture changes are difficult to make, because every program that has to run on the computer must be recompiled for the new architecture. That is plenty of work for the first-party programs alone, and third-party developers are notoriously slow, so it can take a long time before third-party programs work on the new architecture. And users hate it when their programs no longer work after a simple computer upgrade.

But Apple has made this move before. Apple used to equip their computers with PowerPC processors, but from 2005 to 2006 they switched to Intel processors. Because of this, Apple has a few tricks up their sleeve this time. One of them is "Universal Binaries": binaries that can run on more than one architecture. Most binaries can only run on one architecture, so these are impressive. Let's see how they work.

The most interesting fact about Apple's Universal Binaries is that they are so simple. They are nothing more than multiple Mach-O files packed into the same file behind a small header. Mach-O is the normal executable file format that Apple's operating systems have used for years. A Universal Binary just packs the executable files for the individual architectures together, and the operating system chooses the right one when the Universal Binary is executed.

All Universal Binaries, also known as "Fat Binaries", begin with a simple header. The first four bytes are a magic number that identifies the file format. They are either the hexadecimal values CA FE BA BE, which identify a 32-bit Universal Binary, or CA FE BA BF, which identify a 64-bit Universal Binary. Here "32-bit" and "64-bit" refer to the width of the offset and size fields in the structures that follow, not to the architectures stored in the file. The magic number is followed by a four-byte big-endian unsigned integer that gives the number of architectures.

// from fat.h in the macOS SDK
struct fat_header {
    uint32_t    magic;      /* FAT_MAGIC or FAT_MAGIC_64 */
    uint32_t    nfat_arch;  /* number of structs that follow */
};
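
To make the header layout concrete, here is a minimal C sketch that decodes the two header fields from a raw byte buffer. The helper names read_be32 and parse_fat_header are invented for this example and are not part of any Apple API; only the two magic values come from the description above. The bytes are assembled by hand so the code gives the same result on little-endian and big-endian hosts.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC    0xcafebabeu  /* fat_arch entries follow (32-bit offsets) */
#define FAT_MAGIC_64 0xcafebabfu  /* fat_arch_64 entries follow (64-bit offsets) */

/* Hypothetical helper: assemble a big-endian 32-bit value from four bytes. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Hypothetical helper: returns 0 if buf does not start with a fat header,
 * otherwise 32 or 64 depending on the magic number. On success,
 * *nfat_arch receives the number of architecture entries.
 */
static int parse_fat_header(const unsigned char *buf, size_t len,
                            uint32_t *nfat_arch)
{
    if (len < 8)                       /* header is two 4-byte fields */
        return 0;

    uint32_t magic = read_be32(buf);
    if (magic != FAT_MAGIC && magic != FAT_MAGIC_64)
        return 0;

    *nfat_arch = read_be32(buf + 4);
    return magic == FAT_MAGIC ? 32 : 64;
}
```

A caller would read the first eight bytes of a file into a buffer and pass them to parse_fat_header; a return value of 0 means the file is something other than a Universal Binary, for example a plain Mach-O file.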

Directly after the header come nfat_arch fat_arch or fat_arch_64 structures, which describe the individual architectures in the binary. The size of some members of these structures depends on whether the binary is 32-bit or 64-bit, but the order of the members is the same. All integers are big-endian.

First comes cputype, a 32-bit signed number that identifies the architecture. The possible values can be found in the file mach/machine.h. Next is cpusubtype, another 32-bit signed number, which identifies the exact version of the architecture. For example, a cputype of 0x00000007, or CPU_TYPE_X86, and a cpusubtype of 0x00000005, or CPU_SUBTYPE_PENT, mean that this part of the binary should run on an Intel Pentium processor.
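
As a small illustration of how cputype values are put together, the sketch below maps a few of them to readable names. The constants repeat values from mach/machine.h so that the example builds on any system; the 64-bit variants are the 32-bit values with the CPU_ARCH_ABI64 flag set. The function cpu_type_name and the strings it returns are made up for this example.

```c
#include <stdint.h>

/* Values as defined in <mach/machine.h>, repeated so this builds anywhere. */
#define CPU_ARCH_ABI64   0x01000000
#define CPU_TYPE_X86     7
#define CPU_TYPE_X86_64  (CPU_TYPE_X86 | CPU_ARCH_ABI64)
#define CPU_TYPE_ARM     12
#define CPU_TYPE_ARM64   (CPU_TYPE_ARM | CPU_ARCH_ABI64)
#define CPU_TYPE_POWERPC 18

/* Hypothetical helper: map a cputype to an informal, human-readable name. */
static const char *cpu_type_name(int32_t cputype)
{
    switch (cputype) {
    case CPU_TYPE_X86:     return "x86";
    case CPU_TYPE_X86_64:  return "x86_64";
    case CPU_TYPE_ARM:     return "arm";
    case CPU_TYPE_ARM64:   return "arm64";
    case CPU_TYPE_POWERPC: return "ppc";
    default:               return "unknown";
    }
}
```

A Universal Binary built for both Intel and Apple Silicon Macs would typically carry one entry with CPU_TYPE_X86_64 and one with CPU_TYPE_ARM64.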

The third member of the fat_arch or fat_arch_64 structure is the unsigned offset into the binary where the original Mach-O object file for this architecture can be found. The next member is the unsigned size of that object. Together, these two members allow the original object file for this architecture to be extracted from the binary. They are either 32-bit or 64-bit, depending on the type of Universal Binary.

The last member of the fat_arch or fat_arch_64 structure is align. It is an unsigned exponent rather than a byte count: the object file must be aligned to 2^align bytes when it's loaded into memory. The fat_arch_64 structure also has another 32-bit unsigned number named reserved, which is reserved for future use.
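
Because align holds an exponent, a reader of the file has to shift to get the alignment in bytes. The following sketch checks whether an offset respects a given align value; the function name slice_is_aligned is hypothetical, and rejecting exponents of 64 or more is a choice made for this example, not something the file format prescribes.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if offset is a multiple of 2^align bytes,
 * 0 otherwise. Exponents of 64 or more are rejected, because shifting a
 * 64-bit value that far is undefined behaviour in C.
 */
static int slice_is_aligned(uint64_t offset, uint32_t align)
{
    if (align >= 64)
        return 0;

    uint64_t alignment = (uint64_t)1 << align;  /* align = 14 gives 16384 */
    return (offset & (alignment - 1)) == 0;     /* low bits must all be zero */
}
```

For example, an entry with align set to 14 asks for a 16384-byte boundary, so an offset of 16384 passes the check and an offset of 4096 does not.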

// also from fat.h
struct fat_arch {
    int32_t    cputype; /* cpu specifier (int) */
    int32_t    cpusubtype; /* machine specifier (int) */
    uint32_t   offset;  /* file offset to this object file */
    uint32_t   size;  /* size of this object file */
    uint32_t   align;  /* alignment as a power of 2 */
};

struct fat_arch_64 {
    int32_t    cputype; /* cpu specifier (int) */
    int32_t    cpusubtype; /* machine specifier (int) */
    uint64_t   offset;  /* file offset to this object file */
    uint64_t   size;  /* size of this object file */
    uint32_t   align;  /* alignment as a power of 2 */
    uint32_t   reserved; /* reserved */
};
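
Putting the pieces together, the sketch below walks the fat_arch table of a 32-bit Universal Binary that has been read into memory and returns the offset and size of the entry for a requested cputype. It is an illustration under stated assumptions, not how macOS itself selects an architecture: the names load_be32, struct slice and find_slice are invented here, cpusubtype is ignored, and the fat_arch_64 variant is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: where one architecture's Mach-O file lives. */
struct slice {
    uint32_t offset;
    uint32_t size;
};

/* Hypothetical helper: assemble a big-endian 32-bit value from four bytes. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Hypothetical helper: search a 32-bit fat binary (magic CA FE BA BE) for
 * wanted_cputype. Returns 1 and fills *out on success, 0 if the type is
 * absent or the table points outside the buffer.
 */
static int find_slice(const unsigned char *buf, size_t len,
                      int32_t wanted_cputype, struct slice *out)
{
    if (len < 8 || load_be32(buf) != 0xcafebabeu)
        return 0;

    uint32_t nfat_arch = load_be32(buf + 4);
    for (uint32_t i = 0; i < nfat_arch; i++) {
        size_t entry = 8 + (size_t)i * 20;      /* each fat_arch is 20 bytes */
        if (len < 20 || entry > len - 20)       /* entry must fit in buffer */
            return 0;

        const unsigned char *p = buf + entry;
        if (load_be32(p) != (uint32_t)wanted_cputype)
            continue;                           /* cputype is the first field */

        uint32_t offset = load_be32(p + 8);     /* skip cputype, cpusubtype */
        uint32_t size   = load_be32(p + 12);
        if (offset > len || size > len - offset)
            return 0;                           /* slice must fit in buffer */

        out->offset = offset;
        out->size   = size;
        return 1;
    }
    return 0;
}
```

After a successful call, the bytes from buf + out->offset up to out->size bytes later are the unchanged Mach-O file for that architecture.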

With the information contained in these structures, you can finally find the correct Mach-O file inside the Universal Binary. The Mach-O file itself remains completely unchanged. There is no sharing of data between the different architectures and no data compression. As I researched these files, I was actually disappointed to find that they are so simple. But the Mach-O files inside are much more interesting. As someone coming from the Linux world and familiar with its ELF files (Executable and Linkable Format), I was very surprised by the differences between the Mach-O and ELF file formats. My next post will describe the details of the Mach-O format.
