Universal Procedure Pointers
When Apple announced they’d be switching from PowerPC to Intel CPUs in 2005, many existing Mac developers were looking at the prospect more calmly than newer arrivals to the platform. After all, Apple had done something similar before, successfully, in the mid-nineties: The switch from the 68000 CPU to the PowerPC CPU.
One of the differences from the 2005 switch, however, was that Apple permitted mixing PowerPC and 68000 code within the same application. To achieve that, a new “Mixed Mode Manager” was introduced that took care of switching between executing raw PowerPC code and emulating 68000 CPU instructions. The linchpin of this manager was the Universal Procedure Pointer, or UPP for short (sometimes also called a Routine Descriptor).
Universal Procedure Pointers
A UPP was a simple data structure that described the calling conventions and RAM location of a PowerPC function, and that started with a 68000 instruction. This data structure could be handed to any system function that expected a callback, and could be executed by 68000 code just like a function pointer.
The instruction at the start of the UPP amounted to a jump (think function call, or goto; technically it was a special trap instruction that the 68000 emulator intercepted) to a function that recorded the address of the UPP and stopped/started the 68000 emulator.
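For the curious, a routine descriptor looked roughly like the sketch below. This paraphrases the declarations in Apple’s MixedMode.h from memory, with several flag and reserved fields omitted, so treat it as an illustration rather than the exact layout:

    /* Simplified sketch of a routine descriptor, after MixedMode.h. */
    typedef unsigned long ProcInfoType;  /* encodes the calling conventions */
    typedef signed char   ISAType;       /* kM68kISA or kPowerPCISA */
    typedef long (*ProcPtr)();           /* plain function pointer */

    typedef struct RoutineRecord {
        ProcInfoType procInfo;        /* parameter/result sizes and order */
        ISAType      ISA;             /* which CPU this entry point is for */
        ProcPtr      procDescriptor;  /* where the actual code lives in RAM */
        /* ...flag and reserved fields omitted... */
    } RoutineRecord;

    typedef struct RoutineDescriptor {
        unsigned short goMixedModeTrap;   /* 0xAAFE, the _MixedModeMagic trap:
                                             the 68000 instruction at the start */
        signed char    version;
        /* ...flag and reserved fields omitted... */
        unsigned short routineCount;      /* more than one record = "fat" UPP */
        RoutineRecord  routineRecords[1]; /* one record per architecture */
    } RoutineDescriptor;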
This meant that, for the initial roll-out, Apple only had to port the very foundations of the operating system to PowerPC. Applications like the Finder, as well as window and control definition functions (WDEFs and CDEFs, little code modules that took care of drawing the frame around a window, or drawing custom views), could remain 68000 code and be ported selectively as needed.
This also meant that plug-ins written for a 68000 application could be loaded and launched by a PowerPC application. All the application had to do was call the CallUniversalProc function instead of calling the plug-in’s main function directly (which would crash, because the plug-in contained 68000 instructions, not PowerPC instructions).
The CallUniversalProc function would look at the start of the given function pointer. A PowerPC plug-in effectively contained a UPP at its start, so the Mixed Mode Manager could see from the descriptor that the code was already PowerPC, and would simply jump over the UPP to where the PowerPC code lay, without having to load and run the 68000 emulator. The emulator only got chosen if the function didn’t begin with a UPP.
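In code, a host calling a plug-in of unknown architecture looked roughly like the sketch below. CallUniversalProc and the procInfo-building macros are the real MixedMode.h API; the plug-in’s signature and the uppPluginMainProcInfo constant are made up for illustration:

    #include <MixedMode.h>

    /* Hypothetical plug-in entry point: OSErr main(long refCon). The
       procInfo value tells the Mixed Mode Manager how parameters and the
       result are passed, so it can move them between architectures. */
    enum {
        uppPluginMainProcInfo = kCStackBased
            | RESULT_SIZE(SIZE_CODE(sizeof(OSErr)))
            | STACK_ROUTINE_PARAMETER(1, SIZE_CODE(sizeof(long)))
    };

    OSErr CallPlugin(UniversalProcPtr pluginEntry, long refCon)
    {
        /* Works whether pluginEntry points at raw 68000 code or at a UPP
           in front of PowerPC code; the Mixed Mode Manager decides. */
        return (OSErr)CallUniversalProc(pluginEntry, uppPluginMainProcInfo,
                                        refCon);
    }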
Conversely, a 68000 application running in emulation on a PowerPC Mac was able to load a PowerPC plug-in. It would simply try to execute the UPP, whose first bytes were the jump instruction telling the Mixed Mode Manager to switch back to PowerPC and run the code directly.
Fat binaries – universal apps before universal apps
It was trivial to create an application where the same file ran both on old and new Macs: 68000 executables contained their code in ‘CODE’ resources in their resource fork. PowerPC applications had their code in the data fork of the application file, plus a ‘cfrg’ (“code fragment”) resource with some information about the code (e.g. an offset, so you could have other data in the data fork besides code, which games on 68000 especially liked to do). So a 68000 Mac would simply ignore the data fork and the ‘cfrg’ resource, while a PowerPC Mac would look for the ‘cfrg’ first, and only start the emulator and run the 68000 code if it failed to find one.
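The launch-time decision boiled down to something like this sketch (Get1Resource is the real Resource Manager call; the actual Process Manager logic was of course more involved):

    #include <Resources.h>

    /* Does the application contain native PowerPC code? (Assumes its
       resource fork is the current resource file.) */
    static Boolean HasNativeCode(void)
    {
        Handle cfrg = Get1Resource('cfrg', 0);  /* code fragment info */
        if (cfrg != NULL)
            return true;   /* PowerPC: run the fragment from the data fork */
        return false;      /* no 'cfrg': emulate the 'CODE' resources */
    }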
All of this meant that, in those days, compilers simply built both a 68000 and a PowerPC version of the application, then copied the ‘CODE’ resources from the 68000 application into the PowerPC application. Presto! Fat binary for both architectures!
But that wasn’t all: It was also possible to create UPPs (and thus plug-ins) that were “fat”: They contained both PowerPC and 68000 code. Depending on what architecture you were running under, the Mixed Mode Manager would simply jump to the right offset in your plug-in resource, which contained both versions of your code.
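MixedMode.h even provided a call for building such a fat descriptor. A sketch, with illustrative function names and procInfo:

    #include <MixedMode.h>

    /* Hypothetical: the same routine built twice, once per architecture.
       In a real fat plug-in both code versions sat in the same resource. */
    extern pascal OSErr MyDrawProc68k(long refCon);   /* 68000 build */
    extern pascal OSErr MyDrawProcPPC(long refCon);   /* PowerPC build */

    enum {   /* illustrative calling-convention description */
        uppMyDrawProcInfo = kPascalStackBased
            | RESULT_SIZE(SIZE_CODE(sizeof(OSErr)))
            | STACK_ROUTINE_PARAMETER(1, SIZE_CODE(sizeof(long)))
    };

    UniversalProcPtr MakeFatDrawUPP(void)
    {
        /* One descriptor, two routine records; the Mixed Mode Manager
           picks the entry point matching the current architecture. */
        return NewFatRoutineDescriptor((ProcPtr)MyDrawProc68k,
                                       (ProcPtr)MyDrawProcPPC,
                                       uppMyDrawProcInfo);
    }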
Of course, all this mucking about with UPPs meant that you had to allocate/free memory for a UPP for each function you wanted to pass to a system API. And you had to keep that memory around as long as that system call needed it.
For plug-ins, this involved some additional management, as plug-ins would often be loaded and unloaded dynamically during the life of an application. For functions in your application, you usually just stashed the result of NewRoutineDescriptor in a global variable and never bothered calling DisposeRoutineDescriptor.
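The typical pattern looked something like the sketch below. NewRoutineDescriptor, GetCurrentISA, and DisposeRoutineDescriptor are the real calls; the callback and its procInfo are made up for illustration:

    #include <MixedMode.h>

    /* Hypothetical callback we want to hand to some system API. */
    static pascal void MyIdleProc(long refCon)
    {
        /* ... */
    }

    enum {   /* illustrative calling-convention description */
        uppMyIdleProcInfo = kPascalStackBased
            | STACK_ROUTINE_PARAMETER(1, SIZE_CODE(sizeof(long)))
    };

    /* Created once and stashed in a global. */
    static UniversalProcPtr gMyIdleUPP = NULL;

    void InstallIdleCallback(void)
    {
        if (gMyIdleUPP == NULL)
            gMyIdleUPP = NewRoutineDescriptor((ProcPtr)MyIdleProc,
                                              uppMyIdleProcInfo,
                                              GetCurrentISA());
        /* ...hand gMyIdleUPP to the system API here; the descriptor must
           stay allocated for as long as the system might call it... */
    }

    void RemoveIdleCallback(void)
    {
        /* The tidy thing to do; in practice often skipped. */
        if (gMyIdleUPP != NULL) {
            DisposeRoutineDescriptor(gMyIdleUPP);
            gMyIdleUPP = NULL;
        }
    }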
Why no UPPs for Intel?
So why didn’t Apple choose to do UPPs again for the Intel switch? Well, apart from political reasons (back then, many application vendors, not just Apple, had dragged their feet porting their Mac applications to PowerPC, meaning Macs spent most of their cycles emulating old code instead of overtaking the competition), PowerPC and Intel CPUs differed in the way they stored numbers in memory.
The PowerPC CPU actually supported running both big endian, like the 68000, and little endian, like Intel CPUs. This came in handy when switching from the 68000, because the PowerPC CPU was simply told to run big endian, and PowerPC and 68000 code then stored their data the same way.
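To see the problem, consider how the same 32-bit value sits in memory on each kind of CPU. This little plain-C program prints the bytes in whatever order the machine it runs on uses:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t value = 0x11223344;
        const uint8_t *bytes = (const uint8_t *)&value;

        /* Big endian (68000, and PowerPC as used in Macs): 11 22 33 44
           Little endian (Intel):                           44 33 22 11 */
        printf("%02x %02x %02x %02x\n",
               bytes[0], bytes[1], bytes[2], bytes[3]);
        return 0;
    }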
But of course the Intel CPU didn’t have that switch. And since an emulator only knows about raw bytes, the PowerPC emulator (“Rosetta”) on Intel Macs could not transparently convert the stored data: It had no way of knowing which bytes were numbers that needed swapping and which weren’t. So it was decided not to allow mixing of PowerPC and Intel code at all. There would only be a tiny bit of translation at the point where a PowerPC application called into the system.
If an application had plug-ins that were still PowerPC code, it could not load them. You had to run a PowerPC version of the application to host your PowerPC plug-ins, or the Intel version to host your Intel plug-ins (of course, there were universal binaries that packaged those two versions up in the same file).
The QuickTime media playback library did find a nice workaround for this during another switch, though, the one from 32-bit to 64-bit: It simply launches a separate, hidden background process that is 32-bit. That process can load any old legacy plug-ins; QuickTime running in a 64-bit application can pipe the data to be en-/decoded over to that process, and that process sends it back when it’s done. This is not optimal, but it can be surprisingly fast because it uses Unix shared memory and Mach messages.
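As a rough sketch of the general technique (using POSIX shared memory for illustration; QuickTime’s actual implementation used Mach APIs, and error handling is omitted here):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    enum { kBufferSize = 1024 * 1024 };

    /* 64-bit host: create a named shared buffer for decoded frames. */
    void *CreateSharedBuffer(const char *name)
    {
        int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
        ftruncate(fd, kBufferSize);
        return mmap(NULL, kBufferSize, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
    }

    /* 32-bit helper: open the same buffer by name and decode into it. */
    void *OpenSharedBuffer(const char *name)
    {
        int fd = shm_open(name, O_RDWR, 0600);
        return mmap(NULL, kBufferSize, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
    }

    /* Both processes now see the same physical pages; a message between
       them only needs to say "frame ready", not carry the frame itself. */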
Update: As Leon kindly pointed out, this description of QuickTime X’s separate 32-bit encoding process is a bit simplistic: What it actually did was upload the graphics directly to the graphics card (i.e. into an IOSurface), so the data didn’t even need to be piped back to the 64-bit process for simple playback.