Several people asked me to re-explain virtual linear page tables after
my SLUG talk. So here it goes!
First, let's review the standard translation mechanism. We have a large
address space, which we divide up into pages. We map pages to frames
of physical memory.
The processor has a translation look-aside buffer or TLB which stores
a cache of virtual page to physical frame mappings. It is small, but
fast. Like any cache, it runs out and needs to be filled. The operating
system does this from the page tables, which keep a virtual to
physical mapping.
On Linux, we use a multi-level hierarchical page table, illustrated
above. First you take some of the top bits of the virtual address and
use them as an index into the page global directory (PGD). The PGD
entry points to a page middle directory (PMD); you take some more bits
as an index into that page, and follow the PMD entry to a page of page
table entries. Finally, yet more bits index into that page to find the
PTE entry, which points to the physical frame the given virtual
address is mapped to.
So far, so good. But this walking process is slow: it requires several
dependent memory references (and hence cache traffic), and it has to
be performed on every TLB miss.
It would be much better if we could dispense with the upper levels, as
in the diagram above: just keep the PTE entries (which point directly
to allocated physical pages) in one long array, and use the virtual
page number (VPN) as an index into it. This would require far less
effort per lookup.
The problem is that we cannot find that many contiguous (next to each
other) physical pages. Even if we could, the amount of physical memory
we would have to allocate to page tables would be huge! So this scheme
won't work.
But we do have a large virtual address space. So why not dedicate the
top portion of it to a linear array of PTE entries, and use the TLB to
map the virtual pages full of PTEs onto the actual physical pages that
hold them?
So when we ask for a virtual page and it is not mapped by the TLB, we
can use the virtual page number as an index into our virtually linear
page table (above we talk about hashing the virtual page number; for a
linear table this works out the same as simply using it as an index).
We can now go directly to the PTE entry, and as long as the virtual
page that the PTE lives in has a TLB mapping to the underlying
physical page full of PTE entries, there is no need to walk the tree:
the "virtual" PTE now points at the real thing! From there, it is a
simple matter to read the PTE and find the physical frame for the
virtual page we are looking for.
Of course, as with any engineering effort, there are trade-offs.
Firstly, we have to sacrifice some of our virtual address space to the
virtual linear table. We have plenty of address space, so this is not
too severe. The other trade-off is that mapping the table itself now
consumes TLB entries; as we mentioned, the TLB is small, so taking up
more entries can have a penalising effect. However, if you are walking
linearly through pages, as a program usually does, you avoid a full
page table walk for a whole page's worth of PTE entries, and it only
costs you one TLB entry: a significant win.
Itanium attempts to find the hashed PTE address in hardware; it calls
this the virtually hashed page table (VHPT) walker. If the walker
cannot resolve the translation, it returns control to the operating
system, which then has to do a full walk of the page tables. There are
a few more details, but those are the broad strokes.
I hope that clears things up for a few people ...
... however if it does not, please feel free to modify the figures to
make them clearer!