In the interest of correcting the inaccurate information posted in this thread, here is a rundown of how PAE actually works.
Physical Address Extensions (PAE)
Every IA32 processor since the Pentium Pro has had PAE36, allowing 64GB of memory to be physically addressed. Most AMD64/EM64T processors have PAE40, allowing 1TB to be physically addressed, although I believe the early 64 bit P4s didn’t.
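The 64GB and 1TB figures follow directly from the number of physical address bits. A quick sketch of the arithmetic (the function name is just for illustration):

```python
def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with the given number of address bits (byte addressed)."""
    return 2 ** address_bits

GB = 2 ** 30  # "GB" in the binary sense used throughout this post

for label, bits in [("plain 32 bit", 32), ("PAE36", 36), ("PAE40", 40)]:
    print(f"{label}: {addressable_bytes(bits) // GB} GB")
```

1024GB is the 1TB quoted for PAE40.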
Cacheable Area.
On many processors the address space cacheable by the processor, and therefore practically usable, is smaller than the physical address space. For example, the old Slot 1 Celerons could only cache 512MB and PII era Slot 1 CPUs could only cache 4GB. A 64GB cacheable area first appeared in the Xeon line and migrated down from there.
PAE OS Support
A poster further up was correct that a 32 bit OS cannot address more than 4GB. Address is the key word here: it means that each process cannot have more than 4GB of virtual address space. Nothing says that virtual address space has to map onto physical RAM below 4GB. However, the original poster was correct that device memory must be below 4GB.
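To make the virtual versus physical distinction concrete, here is a toy page table in Python. It is only a sketch: real PAE page tables are multi-level structures walked by the CPU, and the names and the mapping below are made up for illustration.

```python
PAGE_SHIFT = 12              # 4KB pages
PAGE_SIZE = 1 << PAGE_SHIFT

def translate(page_table: dict, virtual_address: int) -> int:
    """Translate a 32 bit virtual address using a toy one-level page table."""
    if not 0 <= virtual_address < 2 ** 32:
        raise ValueError("virtual address does not fit in 32 bits")
    page_number = virtual_address >> PAGE_SHIFT
    offset = virtual_address & (PAGE_SIZE - 1)
    frame = page_table[page_number]  # a KeyError here stands in for a page fault
    return (frame << PAGE_SHIFT) | offset

# Hypothetical mapping: virtual page 0x10 is backed by the physical frame at 5GB.
frame_at_5gb = (5 * 2 ** 30) >> PAGE_SHIFT
page_table = {0x10: frame_at_5gb}

physical = translate(page_table, 0x10ABC)
print(hex(physical))
```

The virtual address fits comfortably in 32 bits, yet the physical address it resolves to lies above 4GB and still within the 36 bits PAE provides.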
Microsoft artificially limited desktop versions of Windows to 4GB of physical address space, probably because of issues with device driver quality. Some server versions can address more: Microsoft can very tightly control which device drivers run on systems running these versions of Windows, and can therefore allow the PAE support code to be switched on.
Windows 2000 Advanced Server - 8GB Physical
Windows 2000 Datacenter Server - 32GB Physical
Windows Server 2003 Enterprise - 32GB Physical
Windows Server 2003 Datacenter - 64GB Physical
All other versions of Windows, including Windows Vista, do not have the PAE code enabled and cannot access memory that has been relocated above 4GB to make room for device memory.
Every version of Linux since the early 2.6 kernels has supported 64GB, assuming the kernel was configured for it.
Chipset Support.
Even if the OS and the CPU support PAE, it still might not be possible to use the physical RAM displaced by device memory. To be usable, that RAM has to be remapped above 4GB. Chipset support is required to make this magic happen and, especially on laptop and “value” segment chipsets, this magic is not present. This is the reason the MacBook Pro cannot support more than 3GB of physical RAM but the Mac Pro (based on Xeon processors and chipsets) can.
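A rough way to see how the three pieces interact is the simplified model below. It assumes a single device memory hole sitting directly below 4GB and treats chipset remapping as all or nothing; the function name and the 1GB hole size are illustrative assumptions, not figures for any particular machine.

```python
GB = 2 ** 30

def usable_ram(installed: int, device_hole: int,
               chipset_remaps: bool, os_uses_pae: bool) -> int:
    """Estimate the RAM the OS can actually use, in bytes (simplified model)."""
    below_hole = 4 * GB - device_hole
    if installed <= below_hole:
        return installed   # nothing collides with the device memory hole
    if chipset_remaps and os_uses_pae:
        return installed   # displaced RAM is remapped above 4GB and reachable
    return below_hole      # displaced RAM is lost

# 4GB installed with an assumed 1GB device memory hole:
print(usable_ram(4 * GB, 1 * GB, chipset_remaps=False, os_uses_pae=True) // GB)
print(usable_ram(4 * GB, 1 * GB, chipset_remaps=True, os_uses_pae=False) // GB)
print(usable_ram(4 * GB, 1 * GB, chipset_remaps=True, os_uses_pae=True) // GB)
```

Only the last combination, with both the chipset and the OS cooperating, gets the full 4GB back.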
Address Windowing Extensions
This is a Windows API that allows a 32 bit process to address memory outside its 4GB virtual address space. It works by remapping that memory into a window below the 4GB barrier. It’s a horribly slow hack and never really caught on. MS did it because they needed it internally for things like the 32 bit versions of SQL Server and Exchange.
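The windowing idea can be shown with a toy model. This is not the Windows API (there the remap step is done with calls such as MapUserPhysicalPages); the class and method names below are invented purely to show why every access to out-of-window memory costs a remap first.

```python
class WindowedMemory:
    """Toy model of window remapping: many physical pages, few visible slots."""

    def __init__(self, physical_pages: int, window_slots: int):
        self.physical = [bytearray(4096) for _ in range(physical_pages)]
        self.window = [None] * window_slots  # slot -> physical page index

    def map(self, slot: int, page: int) -> None:
        """Point a window slot at a physical page (the remapping step)."""
        if not 0 <= page < len(self.physical):
            raise IndexError("no such physical page")
        self.window[slot] = page

    def _page(self, slot: int) -> int:
        page = self.window[slot]
        if page is None:
            raise RuntimeError("window slot is not mapped")
        return page

    def write(self, slot: int, offset: int, value: int) -> None:
        self.physical[self._page(slot)][offset] = value

    def read(self, slot: int, offset: int) -> int:
        return self.physical[self._page(slot)][offset]

mem = WindowedMemory(physical_pages=8, window_slots=2)
mem.map(0, 7)
mem.write(0, 0, 42)
mem.map(0, 3)                 # slot 0 now shows a different physical page
first_view = mem.read(0, 0)   # page 3 was never written
mem.map(0, 7)                 # remap to get the original data back
second_view = mem.read(0, 0)
```

The program only ever sees two slots at once, so reaching any other page means paying for another map call, which is where the slowness comes from.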
AMD64/EM64T
As noted above, processors implementing a 64 bit x86 instruction set can address 40 bits, or 1TB, of RAM. A 32 bit program running under a 64 bit OS can still only use 4GB of virtual address space, whereas a 64 bit program can use a 64 bit virtual address space (current implementations expose 48 bits of it, or 256TB). The machine is therefore capable of physically addressing less memory than the programs running on it can virtually address.
Both MacOS and Linux support running both 32 and 64 bit binaries on a 64 bit kernel with no slow down.
Windows cannot do this and requires an emulated 32 bit subsystem called WOW (Windows on Windows), which incurs a slowdown. Exactly the same was done in the transition from 16 to 32 bit Windows, in that case with an emulated 16 bit subsystem on 32 bit Windows.
Real Big Iron processors such as POWER and Alpha generally have 48 or 50 odd bits of physical address space. In fact, when the Alpha was being designed a customer complained that the processor couldn’t do 64 bit physical addressing. DEC replied that if the customer was willing to order enough RAM from DEC to require the physical address space, DEC would redesign the processor to support it.
To sum up
A good idea (PAE) was hamstrung in the consumer market by lacklustre OS support from Microsoft, which led to hardware not being designed to be compatible with it and to broken driver support.
PAE did extremely well in the server market due to rapid OS support via Linux and the ability for third parties to fix broken drivers in this OS. As it was in constant use, supporting chipsets were designed and qualified to support it. Microsoft also eventually supported it in the server market but attempted to charge a premium for it.