Understanding PLT and GOT
The Procedure Linkage Table and Global Offset Table are crucial, allowing ELF binaries to link dynamically. But what are they, and how do they work?
What is a PLT?
PLT stands for Procedure Linkage Table, one of the structures ELF binaries use to facilitate dynamic linking. To speed up program startup, ELF binaries use lazy binding of procedure addresses: a procedure's address isn't resolved until the first time it's called. Every dynamically bound program has a PLT containing entries for the nonlocal routines it calls.
When a program is compiled and linked, calls to routines that will be dynamically linked are directed to the routine's PLT entry.
The first call through a PLT entry invokes the runtime linker, which resolves the routine's actual address and then jumps to it. Subsequent calls to the now-resolved routine need only a single indirect jump, a constant-time O(1) operation.
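The lazy-binding idea can be modeled in plain C with a function pointer standing in for a GOT slot. This is a minimal sketch, not how a real dynamic linker is implemented: the names (`got_square`, `square_plt`, `resolve_square`, `real_square`) are hypothetical, the "resolver" just assigns a known function instead of searching a symbol table, and it calls the target rather than jumping to it.

```c
#include <assert.h>

typedef int (*square_fn)(int);

/* Counts how often the stand-in "runtime linker" is entered. */
static int resolver_calls = 0;

/* The routine we pretend lives in a shared library. */
static int real_square(int x) { return x * x; }

static int resolve_square(int x);

/* Hypothetical GOT slot: starts out pointing at the resolver stub. */
static square_fn got_square = resolve_square;

/* Stand-in for the runtime linker: "find" the real address, patch the
 * slot so later calls bypass the resolver, then forward the call. */
static int resolve_square(int x) {
    assert(got_square == resolve_square); /* only runs while unresolved */
    resolver_calls++;
    got_square = real_square;
    return got_square(x);
}

/* Stand-in for the PLT entry: always one indirect call through the slot. */
static int square_plt(int x) { return got_square(x); }
```

Calling `square_plt(3)` returns 9 and enters the resolver once; a second call such as `square_plt(4)` goes straight to `real_square`, so `resolver_calls` stays at 1.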
The runtime (dynamic) linker knows the memory location of each segment and can therefore compute the absolute address. The PLT translates function calls to absolute locations.
With dynamic linking, absolute addresses can be a little confusing. Conventionally, absolute addresses are determined at compile or link time and point to a fixed physical or logical location in memory. In the context of dynamic linking, they are determined at runtime: the dynamic linker knows where each segment was loaded and fixes a shared library at what I call an "absolute relative" location. I use that phrase because the address is relative to some base, but it doesn't change once it has been calculated. That is (as outlined below), once a relocation's value has been calculated and written to its GOT entry, the entry stays fixed. If position independence is disabled, the address isn't absolute relative but simply absolute, because the program's base is itself fixed at link time. In contrast, a position-independent or relative address is expressed relative to a register (such as the instruction pointer) or some other base, so the location it refers to depends on that base's value when the instruction executes.
What is a GOT?
GOT stands for Global Offset Table, a section in program memory that facilitates PIC/PIE by holding the absolute addresses of symbols; its entries are filled in through relocations. In a previous post, "Linker Scripts," I explained the sections of an executable (defined in a linker script) and how they allow developers to structure a program's memory and the data contained in each section.
The absolute address is the runtime memory address, which isn't known until the program starts. The dynamic loader (or dynamic linker) updates the GOT by applying the relocations, that is, by writing each symbol's absolute runtime address into its GOT entry.
[Figure: The GOT]
Why is this necessary?
As mentioned in my note on Position Independent Code/Executables, the addresses of objects in object files used to be absolute, starting at a known location, in most cases zero. If multiple programs and libraries assumed the same absolute addresses, memory conflicts would arise, preventing them from being loaded together. PIC and PIE allow objects to be relocated, either by the static linker or at runtime. The linker or loader not only resolves each symbol's actual address (symbol resolution) but also adjusts (relocates) references so they point to absolute runtime addresses that don't conflict with other objects.
With static linking, the linker produces a single object file (or executable) that contains the required objects, reorganized and linked together in a way that makes sense. With dynamic linking, ELF binaries use the PLT and GOT: when a dynamically linked routine is called for the first time at runtime, the PLT invokes the dynamic linker, which resolves the address and records it in the GOT, and execution then jumps to that absolute (runtime) address.
How are they used to facilitate dynamic linking?
The first entry in the PLT (PLT0) is special code that calls the dynamic linker. At load time, the dynamic linker places two values in the GOT, at GOT+4 and GOT+8 (on 32-bit architectures). These are the second and third words of the GOT, not the first and second. The second word (GOT+4) holds a code that identifies the library, and the third word (GOT+8) holds the address of the dynamic linker's symbol resolution routine. The remaining PLT entries (PLTn) each begin with an indirect jump through their own GOT entry, one per dynamically linked routine.
When a dynamically linked routine is called for the first time, control eventually reaches PLT0, which pushes the library identifier stored at GOT+4 onto the stack and then jumps to the resolution routine whose address is stored at GOT+8.
On 64-bit architectures, addresses are 64 bits (8 bytes), so the second word of the GOT is at GOT+8 (GOT+0x8) and the third is at GOT+16 (GOT+0x10). The first word (GOT+0) holds the address of _DYNAMIC, which points the dynamic linker to the module's dynamic structure.
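The reserved-word arithmetic above fits in a few lines of C. This is a sketch: the enum names are hypothetical labels for the reserved words (GOT[0] for _DYNAMIC, GOT[1] for the library identifier, GOT[2] for the resolver), and it assumes the first routine slot directly follows them, as in the x86 .got.plt layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the reserved GOT words (indices, not bytes). */
enum {
    GOT_DYNAMIC    = 0, /* address of _DYNAMIC                 */
    GOT_LIB_ID     = 1, /* library identifier code             */
    GOT_RESOLVER   = 2, /* dynamic linker's resolution routine */
    GOT_FIRST_SLOT = 3  /* first routine slot, used by PLT1    */
};

/* Byte offset of GOT word `index` for 4-byte (32-bit) or
 * 8-byte (64-bit) words. */
static size_t got_offset(size_t index, size_t word_size) {
    assert(word_size == 4 || word_size == 8);
    return index * word_size;
}
```

`got_offset(GOT_LIB_ID, 4)` gives 4 and `got_offset(GOT_RESOLVER, 4)` gives 8 (the 32-bit GOT+4 and GOT+8), while the 64-bit equivalents are 8 and 16 (GOT+0x8 and GOT+0x10).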
Several relocation types are defined in the System V ABI. My example below includes JUMP_SLOT and GLOB_DAT. For more on relocation and its types, see my relocation note! The dynamic linker handles these two types differently: GLOB_DAT entries are resolved when the program is loaded, while JUMP_SLOT entries can be bound lazily on the first call. From the System V ABI documentation, the following definitions are useful for understanding JUMP_SLOT and GLOB_DAT on x86_64:
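A represents the addend used to compute the value of the relocatable field.
B represents the base address at which a shared object has been loaded into memory during execution.
P represents the place (section offset or address) of the storage unit being relocated, computed from the relocation entry's offset.
S represents the value of the symbol whose index resides in the relocation entry.
For both GLOB_DAT and JUMP_SLOT, the ABI specifies the calculation as S: the value written at P is the symbol's runtime address.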
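The calculations for a few x86-64 relocation types can be sketched as a small C function. The type numbers and formulas (R_X86_64_64 = 1: S + A; R_X86_64_GLOB_DAT = 6: S; R_X86_64_JUMP_SLOT = 7: S; R_X86_64_RELATIVE = 8: B + A) come from the ABI's relocation table; the `RELOC_*` names, `reloc_value`, and the sample addresses in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 relocation type numbers from the System V ABI. The RELOC_*
 * names are hypothetical; <elf.h> on Linux calls them R_X86_64_*. */
enum {
    RELOC_64        = 1, /* R_X86_64_64:        S + A */
    RELOC_GLOB_DAT  = 6, /* R_X86_64_GLOB_DAT:  S     */
    RELOC_JUMP_SLOT = 7, /* R_X86_64_JUMP_SLOT: S     */
    RELOC_RELATIVE  = 8  /* R_X86_64_RELATIVE:  B + A */
};

/* Value to store at the relocated place P.
 * S: symbol value, A: addend, B: load base of the module. */
static uint64_t reloc_value(int type, uint64_t S, int64_t A, uint64_t B) {
    switch (type) {
    case RELOC_64:
        return S + (uint64_t)A;
    case RELOC_GLOB_DAT:
    case RELOC_JUMP_SLOT:
        return S;
    case RELOC_RELATIVE:
        return B + (uint64_t)A;
    default:
        assert(0 && "relocation type not modeled in this sketch");
        return 0;
    }
}
```

With a hypothetical load base B = 0x555555554000, `reloc_value(RELOC_RELATIVE, 0, 0x1140, B)` yields 0x555555555140, while a JUMP_SLOT relocation simply returns whatever S the dynamic linker looked up and ignores B.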
It's worth noting that the dynamic linker is itself position-independent and must find its own dynamic structure before any of its relocations have been processed. That's why the first word in the GOT (GOT+0) holds the address of the dynamic structure, referenced by the symbol _DYNAMIC.
The remaining PLT entries are labeled PLTn, where n is an integer identifying the entry. Each PLTn entry has a corresponding GOT entry that initially points back to the push instruction following the entry's first jump. That push instruction pushes a relocation offset: the location, in the file's relocation table, of a special entry of type JUMP_SLOT (on x86-64, this is an index into .rela.plt).
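To make the PLTn layout concrete, here is a sketch that encodes one 16-byte entry in the classic x86-64 lazy-binding format (without the Intel CET/IBT variants some toolchains now emit): a 6-byte `jmp QWORD PTR [rip+disp32]` (opcode FF 25) through the entry's GOT slot, a 5-byte `push imm32` (68) of the relocation index, and a 5-byte `jmp rel32` (E9) back to PLT0. The function and parameter names are hypothetical; the sample values in the usage note come from the real example below (puts@plt at 0x1030, its GOT slot at 0x4000, PLT0 at 0x1020), plus an assumed relocation index of 0.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value little-endian, as x86 encodes immediates. */
static void put_le32(uint8_t *buf, uint32_t v) {
    buf[0] = (uint8_t)v;
    buf[1] = (uint8_t)(v >> 8);
    buf[2] = (uint8_t)(v >> 16);
    buf[3] = (uint8_t)(v >> 24);
}

/* Signed 32-bit displacement from `from` to `to`; x86-64 can only
 * encode targets within +/-2GB this way. */
static uint32_t disp32(uint64_t from, uint64_t to) {
    int64_t d = (int64_t)to - (int64_t)from;
    assert(d >= INT32_MIN && d <= INT32_MAX);
    return (uint32_t)(int32_t)d;
}

/* Encode a hypothetical 16-byte lazy-binding PLTn entry at plt_addr. */
static void encode_plt_entry(uint8_t out[16], uint64_t plt_addr,
                             uint64_t got_slot, uint32_t reloc_index,
                             uint64_t plt0_addr) {
    /* jmp QWORD PTR [rip+disp32]; RIP = address of the next insn */
    out[0] = 0xff;
    out[1] = 0x25;
    put_le32(out + 2, disp32(plt_addr + 6, got_slot));
    /* push imm32: the relocation index for the resolver */
    out[6] = 0x68;
    put_le32(out + 7, reloc_index);
    /* jmp rel32 to PLT0, relative to the end of this entry */
    out[11] = 0xe9;
    put_le32(out + 12, disp32(plt_addr + 16, plt0_addr));
}
```

`encode_plt_entry(e, 0x1030, 0x4000, 0, 0x1020)` produces FF 25 CA 2F 00 00 for the first jump (0x4000 - 0x1036 = 0x2fca), matching objdump's `jmp QWORD PTR [rip+0x2fca]`, and places the push at 0x1036, six bytes into the entry.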
Real Example
Let's examine the process step by step. The example is a very simple main function that calls printf(). You'll notice that my compiler emitted a call to puts() instead: when printf() is given a constant format string that contains no conversion specifiers and ends in a newline, compilers such as GCC replace it with the equivalent, cheaper puts() call.
Here is the relevant disassembled code
In the .text section, there is a call in the main function at memory address 0x1147 to address 0x1030.
call 1030 <puts@plt>
Notice that the static disassembler I used, objdump, labeled the PLT entry puts@plt. There are two PLT labels to pay attention to: puts@plt at memory address 0x1030 and puts@plt-0x10 at memory address 0x1020 (which is PLT0). Let's examine the call in main to 0x1030, puts@plt.
The first instruction, jmp QWORD PTR [rip+0x2fca], is an indirect jump through the GOT entry at offset 0x4000: it loads the address stored in that entry and jumps to it. The operand is relative to the %rip register, which holds the address of the next instruction (0x1036), so the entry is at 0x1036 + 0x2fca = 0x4000, matching the offset in the relocation table (see below). Because puts() hasn't been resolved yet, that GOT entry still points back to the push instruction at memory address 0x1036. This always happens on the initial call to an unresolved routine. The push instruction pushes the relocation index that the linker will use later onto the stack.
This relocation index was determined during the static link process. If you revisit the linker script provided in "Linker Scripts," you'll notice a section named .rela.dyn, which holds dynamic relocations (the PLT's JUMP_SLOT relocations go in the companion .rela.plt section). Although these references can't be resolved at link time, the linker records that they must be resolved at runtime.
At link time, the linker detects routines such as puts() that will be dynamically linked and creates corresponding entries in the PLT and GOT. It generates relocation entries and stores them in the .rela.plt and .rela.dyn sections of the ELF file. This information is all the dynamic linker needs to resolve the references at runtime. Running readelf -r <elf binary> lists these relocation entries, including the one for puts.
The important information includes the Type (JUMP_SLOT), the symbol name (puts@GLIBC_2.2.5), and the Offset (0x4000), the GOT entry where the resolved address will be written. Remember, until puts() is resolved, the GOT entry at 0x4000 points to the push instruction right after the jump, so the initial jump simply lands on that push.
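readelf's Info column packs the symbol index and the relocation type into one 64-bit r_info field. The sketch below builds an entry like the puts relocation, using the Elf64_Rela type, the STN_UNDEF constant, and the ELF64_R_INFO, ELF64_R_SYM, and ELF64_R_TYPE macros from glibc's <elf.h> (so it assumes Linux). The helper name is hypothetical, and the symbol index 2 in the usage note is an assumed value, not one read from my binary.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

/* Build a .rela.plt-style JUMP_SLOT entry (hypothetical helper). */
static Elf64_Rela make_jump_slot(Elf64_Addr got_slot_offset,
                                 uint32_t dynsym_index) {
    Elf64_Rela r;
    assert(dynsym_index != STN_UNDEF); /* a JUMP_SLOT needs a real symbol */
    r.r_offset = got_slot_offset;      /* readelf "Offset": where to write */
    r.r_info = ELF64_R_INFO(dynsym_index, R_X86_64_JUMP_SLOT);
    r.r_addend = 0;                    /* no addend in this sketch */
    return r;
}
```

For the assumed symbol index 2, `make_jump_slot(0x4000, 2)` gives r_info = 0x200000007: ELF64_R_SYM recovers 2 and ELF64_R_TYPE recovers 7 (R_X86_64_JUMP_SLOT), which readelf would print as 000200000007.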
Next, jmp 1020 jumps to the second PLT label, puts@plt-0x10, which is PLT0.
Recall that the second word in the GOT, at GOT+8 (GOT+4 on 32-bit architectures), holds a code that identifies the shared library. The instruction push QWORD PTR [rip+0x2fca] pushes this code onto the stack. Next, jmp QWORD PTR [rip+0x2fcc] jumps through the third word in the GOT, at GOT+16 (GOT+0x10), to the linker's resolution routine. At this point, control is handed over to the linker, which takes these values off the stack to resolve the reference. Just before the resolver runs, the top of the stack holds the library identifier (pushed by PLT0), then the relocation index (pushed by puts@plt), then the return address into main (pushed by the original call).
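The RIP-relative operands in this walkthrough can be checked with a little arithmetic: RIP holds the address of the next instruction, so the referenced address is the instruction's address plus its length plus the displacement. This sketch assumes the standard 6-byte encodings of `jmp QWORD PTR [rip+disp32]` and `push QWORD PTR [rip+disp32]`, and assumes PLT0's jmp sits at 0x1026, directly after the 6-byte push at 0x1020 (the text doesn't give that address). `rip_relative_target` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Address referenced by a RIP-relative operand. RIP already points past
 * the current instruction when the operand is evaluated. */
static uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                                    int32_t disp) {
    assert(insn_len > 0);
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

Under those assumptions, the puts@plt jump at 0x1030 reads 0x1036 + 0x2fca = 0x4000 (the puts GOT slot), PLT0's push at 0x1020 reads 0x1026 + 0x2fca = 0x3ff0, and its jmp at 0x1026 reads 0x102c + 0x2fcc = 0x3ff8. If those are GOT+8 and GOT+16, the GOT would start at 0x3fe8, making the puts slot at 0x4000 GOT+0x18, the first slot after the three reserved words.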
So, how does the linker resolve the reference? In the ABI's notation, P is the place being relocated: here, the GOT entry at offset 0x4000 (at runtime, the module's load base plus that offset). For a JUMP_SLOT relocation, the ABI's calculation is simply S, so the dynamic linker looks up puts, finds its runtime address in libc, and writes that value to P. In the readelf output, the Sym. Value for puts is 0 because its address isn't known before runtime; the addend A is also 0, so computing S + A would give the same result in this example.
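Writing the resolved value can be modeled with a byte buffer standing in for the loaded module. In this sketch (hypothetical names, not glibc's code), index 0 of the buffer plays the role of the load base B, so the place P is `image + r_offset`; for JUMP_SLOT, the value stored there is just S.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Apply one JUMP_SLOT relocation to a mock module image.
 * image[0] stands for the load base B, so P = image + r_offset. */
static void apply_jump_slot(uint8_t *image, size_t image_len,
                            uint64_t r_offset, uint64_t S) {
    /* the 8-byte slot must lie entirely inside the image */
    assert(r_offset <= image_len && image_len - r_offset >= sizeof S);
    memcpy(image + r_offset, &S, sizeof S); /* value at P = S */
}
```

For example, with a 0x4008-byte mock image, `apply_jump_slot(img, sizeof img, 0x4000, S)` overwrites the eight bytes at offset 0x4000, the slot puts@plt jumps through, with the resolved address S.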
Once the dynamic linker has written the address, it removes its arguments from the stack and jumps directly to puts(); it does not pass back through puts@plt. When puts() finishes, it returns to main using the return address pushed by the original call.
Crucially, subsequent calls to puts@plt transfer directly to puts() through the resolved address in the GOT entry (0x4000 in this example) without being forwarded to the push instruction. Why? The first instruction of puts@plt always jumps through that GOT entry. On the initial call, the entry holds no real address yet because the linker hasn't resolved it, so it points to the push. Once resolved, the entry holds the address of puts(), and the same jump transfers immediately to the target routine.
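A caveat about the GOT and PLT in larger programs: according to the x86-64 System V ABI, jumps and RIP-relative references use 32-bit signed displacements, so in the default (small) code model the PLT and GOT must lie within 2GB of the code that references them. GCC's -fpic and -fPIC flags both generate position-independent code; -fpic can produce more compact code but may limit the size of the GOT on some architectures (it makes no difference on x86-64), while -fPIC avoids that limit. Programs too large for the 2GB range need a larger code model (GCC's -mcmodel=medium or -mcmodel=large), so explore your compiler settings.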
How can I circumvent protections such as ASLR?
ASLR stands for Address Space Layout Randomization, a protection mechanism that randomizes the location of key areas in a process's address space, such as the stack, heap, and shared libraries, to make memory exploitation harder.
Consider a typical buffer overflow, where vulnerable code fails to enforce bounds checking on user input. If that input exceeds the size of the buffer allocated for it, it can overwrite adjacent stack data, such as saved register values and the saved return address, or leak data. For example, an attacker can use specially crafted input to determine the exact offset of the saved instruction pointer (the return address) and overwrite it with an address that redirects execution to malicious shellcode placed on the stack.
ASLR is one mechanism that complicates this otherwise straightforward buffer overflow. Additionally, execution from the stack can be disabled entirely (a non-executable stack). How can these defenses be circumvented to exploit a binary?
Return to Libc or Return 2 Libc (ret2libc) is an attack that circumvents stack protections by calling useful functions that already exist in the shared C library, libc, rather than injecting new code. Libc is the standard library for the C programming language and includes many useful functions, including system(), which runs a shell command. Calling system("/bin/bash") would give an attacker shell access to the vulnerable computer.
How does it work?
This note is long enough already. To see this exploit in action, check out my walkthrough.
References
Levine, J. R. (2010). Linkers and Loaders. Morgan Kaufmann.