Skip to content

Macho

Understanding Mach-O Files and Patching#

Mach-O (Mach Object) is the native binary file format used by Apple operating systems, including macOS, iOS, iPadOS, tvOS, and watchOS. It defines the structure for executables, dynamic libraries (.dylib), frameworks, bundles, and object files (.o). Understanding its structure is key to manipulating dependencies and runtime behavior, often necessary for application packaging and relocation.

Mach-O File Structure#

A Mach-O file is organized into three main regions:

  • Mach Header: Located at the very beginning (offset 0). It identifies the file as Mach-O, specifies the target architecture (e.g., x86_64, arm64), the file type (executable, dylib, etc.), and most importantly, the number and total size of the Load Commands that follow.
  • Load Commands: A list of variable-length commands immediately following the header. These commands act as instructions for the operating system's dynamic linker (dyld) and the kernel, dictating how to map the file into memory, what libraries are needed, where the main execution thread starts, symbol table locations, code signature details, and more.
  • Data: The bulk of the file, containing the actual code and data, organized into segments and sections as specified by the Load Commands.

Visual Layout

+---------------------+
| Mach Header         | (File Offset 0)
| (mach_header_64)    |
+---------------------+
| Load Command 1      | (Immediately follows Header)
+---------------------+
| Load Command 2      |
+---------------------+
| ...                 |
+---------------------+
| Load Command N      | (Total size = sizeofcmds from Header)
+---------------------+
|                     |
| Data Region 1       | (e.g., __TEXT Segment: code sections)
| (Segments/Sections) |
+---------------------+
|                     |
| Data Region 2       | (e.g., __DATA Segment: data sections)
|                     |
+---------------------+
| ...                 | (e.g., __LINKEDIT: symbol/string tables)
+---------------------+

Load Commands: The Core Instructions#

The Load Commands region is central to Mach-O's functionality. dyld parses this list to understand how to prepare the binary for execution. Each command has a type (cmd) and size (cmdsize). Key types include:

  • LC_SEGMENT_64 / LC_SEGMENT: Defines a segment (e.g., __TEXT, __DATA, __LINKEDIT) and its properties: file offset/size, virtual memory address/size, and permissions (read/write/execute). It also contains descriptions of the sections (like __TEXT.__text, __DATA.__data) within that segment.
  • LC_ID_DYLIB: Specifies the "install name" for a dynamic library. This is the canonical path identifying the library, used by other binaries when linking against it.
  • LC_LOAD_DYLIB: Defines a dependency on an external dynamic library, specifying the library's install name (the path to find it).
  • LC_LOAD_WEAK_DYLIB: Defines an optional library dependency.
  • LC_REEXPORT_DYLIB: Links against another library and re-exports its symbols.
  • LC_RPATH: Adds a path to the runtime search path list, used for resolving @rpath dependencies.
  • LC_MAIN: Specifies the entry point (start address) for executable files.
  • LC_CODE_SIGNATURE: Points to the code signature data.
  • LC_SYMTAB: Points to the symbol table and string table (used by linker/debugger).
  • LC_DYSYMTAB: Points to dynamic linking symbol information.
  • LC_DYLD_INFO_ONLY: Points to optimized dynamic linking info used by dyld (rebasing, binding).

Path Commands and Resolution#

How dyld finds dependent libraries (LC_LOAD_DYLIB) is crucial and often involves special path prefixes:

  • @executable_path: Resolves to the absolute path of the directory containing the main executable of the running process. Useful for finding libraries bundled relative to the main application binary.
  • @loader_path: Resolves to the absolute path of the directory containing the specific Mach-O file (executable or library) that contains the LC_LOAD_DYLIB command currently being processed. Useful for libraries finding other libraries located relative to themselves.
  • @rpath: A placeholder indicating that dyld should search for the library using a list of runtime search paths. This search list is constructed in order: - Paths specified by LC_RPATH load commands within the Mach-O file containing the @rpath dependency itself. - Paths specified by LC_RPATH load commands within the main executable (if the dependency is not in the main executable). - Paths specified by LC_RPATH load commands within the main executable (if the dependency is not in the main executable). - Paths specified in the DYLD_LIBRARY_PATH environment variable (though its use is often restricted for security reasons, especially with System Integrity Protection). - Paths specified in the DYLD_FALLBACK_LIBRARY_PATH environment variable (if DYLD_LIBRARY_PATH is not set or doesn't find the library). - Standard system fallback locations (e.g., /usr/local/lib, /usr/lib).

The LC_RPATH load command simply contains a path string to be added to this search list.

How Patching Works with arwen#

Patching Mach-O files with a tool like arwen typically involves modifying the Load Commands or the data they reference (often strings within the commands themselves or in the __LINKEDIT segment).

Some common patching operations include:

  • Modifying runtime dependencies. This involves adding LC_RPATH or removing them. For example, adding LC_RPATH command with the path @loader_path/../Frameworks to make a binary look inside a sibling Frameworks directory for its @rpath dependencies.
  • Changing Dependencies: To make a binary look for a library in a different location, you modify the path string stored within an LC_LOAD_DYLIB or LC_LOAD_WEAK_DYLIB command. For example, changing /usr/local/lib/libfoo.dylib to @rpath/libfoo.dylib often requires ensuring an appropriate LC_RPATH exists.
  • Changing a Library's Install Name: To change the canonical path by which other binaries refer to a library (essential when relocating or bundling libraries/frameworks), you modify the path string stored within the library's own LC_ID_DYLIB command. For instance, changing /Users/dev/project/build/lib/libbar.dylib to @rpath/libbar.dylib.
  • Adding or Modifying Runtime Search Paths (RPATH): To tell dyld where to look when resolving @rpath dependencies, you add a new LC_RPATH command or modify the path string within an existing one. You might add an LC_RPATH command with the path @loader_path/../Frameworks to make a binary look inside a sibling Frameworks directory for its @rpath dependencies.

Challenges and Considerations:

  • Space Constraints: The mach_header specifies the total size (sizeofcmds) allocated for all load commands. If you need to add a new command or make a path string significantly longer, there might not be enough space. Simple tools might fail. More sophisticated tools like arwen might attempt to use existing padding or might need to rewrite parts of the file, which is complex. Changing a path to another path of the same or shorter length is generally safest and easiest.
  • Code Signing: Modifying any part of a signed Mach-O binary (executable or library) invalidates its code signature. On modern macOS and iOS, unsigned or improperly signed code may fail to run due to security policies (Gatekeeper, System Integrity Protection). After patching a signed binary, you must re-sign it using the codesign command-line tool with an appropriate certificate for it to be runnable in many contexts.