Computer System Chapter 7

Static linking. The linker combines relocatable object files to form an executable object file p.

. Relocatable object file. Contains binary code and data in a form that can be combined with other relocatable object files at compile time to create an executable object file.
. Executable object file. Contains binary code and data in a form that can be copied directly into memory and executed.
. Shared object file. A special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time.

In the context of a linker, there are three different kinds of symbols:
. Global symbols that are defined by module m and that can be referenced by other modules. Global linker symbols correspond to nonstatic C functions and global variables that are defined without the C static attribute.
. Global symbols that are referenced by module m but defined by some other module. Such symbols are called externals and correspond to C functions and variables that are defined in other modules.
. Local symbols that are defined and referenced exclusively by module m. Some local linker symbols correspond to C functions and global variables that are defined with the static attribute. These symbols are visible anywhere within module m, but cannot be referenced by other modules. The sections in an object file and the name of the source file that corresponds to module m also get local symbols.

Any global variable or function declared with the static attribute is private to that module.
Similarly, any global variable or function declared without the static attribute is public and can be accessed by any other module.
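
As a small illustration of the three kinds of symbols, here is a sketch of a single module. The names counter, helper, and bump are hypothetical and chosen only for this example; strlen stands in for any symbol that is defined in another module.

```c
#include <string.h>   /* declares strlen; its definition lives in another module (libc) */

int counter = 0;             /* global symbol: defined here, visible to other modules */

static int helper(int x)     /* local symbol: static, so private to this module */
{
    return x + 1;
}

int bump(const char *s)      /* global symbol: a nonstatic function */
{
    /* strlen is an external: referenced here, defined elsewhere */
    counter = helper(counter) + (int)strlen(s);
    return counter;
}
```

Another module could call bump or read counter, but any attempt to reference helper from outside this file would fail at link time.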

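Functions and initialized global variables get strong symbols. Uninitialized global variables get weak symbols. Given this notion of strong and weak symbols, Unix linkers use the following rules for dealing with multiply defined symbols: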
. Rule 1: Multiple strong symbols are not allowed.
. Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol.
. Rule 3: Given multiple weak symbols, choose any of the weak symbols.
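
The three rules can be modeled in a few lines of C. This is only a toy model, not how a real linker is written: the enum, the function name pick_definition, and the choice of the first weak symbol under Rule 3 are all assumptions made for illustration.

```c
#include <stddef.h>

enum strength { WEAK, STRONG };

/* Returns the index of the definition that would be chosen, or -1 if
 * the set violates Rule 1 (more than one strong symbol) or is empty. */
int pick_definition(const enum strength *defs, size_t n)
{
    int chosen = -1;
    int strong_seen = 0;

    for (size_t i = 0; i < n; i++) {
        if (defs[i] == STRONG) {
            if (strong_seen)
                return -1;       /* Rule 1: multiple strong symbols */
            strong_seen = 1;
            chosen = (int)i;     /* Rule 2: the strong symbol wins */
        } else if (chosen < 0) {
            chosen = (int)i;     /* Rule 3: any weak symbol; this sketch takes the first */
        }
    }
    return chosen;
}
```

Note that Rule 3 makes the result depend on an arbitrary choice, which is exactly why silently resolved duplicate definitions can cause subtle bugs.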

When in doubt, invoke the linker with a flag such as the gcc -fno-common flag, which triggers an error if it encounters multiply defined global symbols. (GCC 10 and later enable -fno-common by default.)

During the symbol resolution phase, the linker scans the relocatable object files and archives left to right in the same sequential order that they appear on the compiler driver’s command line. The general rule for libraries is to place them at the end of the command line. If the libraries are not independent, then they must be ordered so that for each symbol s that is referenced externally by a member of an archive, at least one definition of s follows a reference to s on the command line.
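
To see why the order matters, the left-to-right scan can be sketched as a toy model. Everything here is a simplifying assumption: symbols are bits in a mask, each archive is treated as having a single member, and the names item and scan_command_line are hypothetical.

```c
#include <stddef.h>

/* One item on the command line in this toy model. */
struct item {
    int is_archive;       /* 1 for an archive, 0 for a relocatable object file */
    unsigned defines;     /* bitmask of symbols this item defines */
    unsigned references;  /* bitmask of symbols this item references */
};

/* Scans the items left to right and returns the mask of symbols that are
 * still undefined at the end. A result of 0 means the link would succeed. */
unsigned scan_command_line(const struct item *items, size_t n)
{
    unsigned defined = 0;    /* symbols defined so far */
    unsigned undefined = 0;  /* symbols referenced but not yet defined */

    for (size_t i = 0; i < n; i++) {
        const struct item *it = &items[i];

        /* An archive member is pulled in only if it resolves a pending reference. */
        if (it->is_archive && (it->defines & undefined) == 0)
            continue;

        defined |= it->defines;
        undefined = (undefined | it->references) & ~defined;
    }
    return undefined;
}
```

In this model, an object file that references a symbol followed by an archive that defines it leaves nothing undefined, while the reverse order skips the archive and leaves the reference unresolved.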

Once symbol resolution is complete, the linker begins the relocation step, in which it merges the input modules and assigns run-time addresses to each symbol. Relocation consists of two steps:
. Relocating sections and symbol definitions. In this step, the linker merges all sections of the same type into a new aggregate section of the same type. For example, the .data sections from the input modules are all merged into one section that will become the .data section for the output executable object file. The linker then assigns run-time memory addresses to the new aggregate sections, to each section defined by the input modules, and to each symbol defined by the input modules. When this step is complete, every instruction and global variable in the program has a unique run-time memory address.
. Relocating symbol references within sections. In this step, the linker modifies every symbol reference in the bodies of the code and data sections so that they point to the correct run-time addresses. To perform this step, the linker relies on data structures in the relocatable object modules known as relocation entries, which we describe next.
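
The first step can be sketched as simple address arithmetic. This is a minimal model under stated assumptions: each input module contributes one section and one symbol, sections are laid out back to back with no alignment padding, and the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct input_section {
    uint32_t size;        /* size in bytes of this module's section */
    uint32_t sym_offset;  /* offset of one symbol within that section */
    uint32_t sec_addr;    /* out: run-time address assigned to the section */
    uint32_t sym_addr;    /* out: run-time address assigned to the symbol */
};

/* Lays the input sections out back to back starting at base and returns
 * the total size of the aggregate section. */
uint32_t merge_sections(struct input_section *in, size_t n, uint32_t base)
{
    uint32_t next = base;

    for (size_t i = 0; i < n; i++) {
        in[i].sec_addr = next;
        in[i].sym_addr = next + in[i].sym_offset;
        next += in[i].size;
    }
    return next - base;
}
```

After the call, every section and symbol in the array has a unique run-time address, which is the precondition for the second step.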

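When an assembler generates an object module, it does not know where the code and data will ultimately be stored in memory. So whenever the assembler encounters a reference to an object whose ultimate location is unknown, it generates a relocation entry that tells the linker how to modify the reference when it merges the object file into an executable. Relocation entries for code are placed in .rel.text. Relocation entries for initialized data are placed in .rel.data.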
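
Here is a rough sketch of how a linker might apply one relocation entry to a 32-bit reference. It is modeled loosely on the two basic IA32 relocation types (absolute and PC-relative) and simplifies heavily: a real entry names a symbol by its index in the symbol table, whereas this hypothetical rel_entry carries the symbol's run-time address directly.

```c
#include <stdint.h>
#include <string.h>

enum rel_type { REL_ABS32, REL_PC32 };  /* absolute and PC-relative references */

struct rel_entry {
    uint32_t offset;       /* offset of the reference within the section */
    uint32_t symbol_addr;  /* run-time address of the referenced symbol */
    enum rel_type type;
};

/* Patches the 32-bit reference at section[r->offset]. section_addr is the
 * run-time address that was assigned to the section. */
void relocate(uint8_t *section, uint32_t section_addr, const struct rel_entry *r)
{
    uint32_t addend;
    memcpy(&addend, section + r->offset, sizeof addend);  /* value already stored at the reference */

    uint32_t value = r->symbol_addr + addend;
    if (r->type == REL_PC32)
        value -= section_addr + r->offset;  /* relative to the reference's own address */

    memcpy(section + r->offset, &value, sizeof value);
}
```

For example, with a section at 0x80483b4, a reference at offset 0x7 holding -4, and a target symbol at 0x80483c8, the PC-relative case stores 0x9.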

The ELF header describes the overall format of the file. It also includes the program’s entry point, which is the address of the first instruction to execute when the program runs. The .init section defines a small function, called _init, that will be called by the program’s initialization code. Since the executable is fully linked (relocated), it needs no .rel sections.
ELF executables are designed to be easy to load into memory, with contiguous chunks of the executable file mapped to contiguous memory segments. This mapping is described by the segment header table.
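
As an illustration of the entry point stored in the ELF header, the sketch below pulls it out of a header that has already been read into memory. It assumes a little-endian file and reads the bytes by hand rather than through any system header; the function name elf_entry_point is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads the entry point from an in-memory ELF header. Returns 0 on success,
 * or -1 if buf is not a little-endian ELF header this sketch understands. */
int elf_entry_point(const uint8_t *buf, size_t len, uint64_t *entry)
{
    static const uint8_t magic[4] = { 0x7f, 'E', 'L', 'F' };
    int width;

    if (len < 32 || memcmp(buf, magic, 4) != 0)
        return -1;
    if (buf[5] != 1)          /* e_ident[EI_DATA]: 1 means little-endian */
        return -1;

    if (buf[4] == 1)          /* e_ident[EI_CLASS]: 1 means 32-bit */
        width = 4;
    else if (buf[4] == 2)     /* 2 means 64-bit */
        width = 8;
    else
        return -1;

    /* e_entry starts at byte offset 24 in both the 32-bit and 64-bit layouts. */
    uint64_t value = 0;
    for (int i = width - 1; i >= 0; i--)
        value = (value << 8) | buf[24 + i];

    *entry = value;
    return 0;
}
```

A caller would read the first bytes of an executable into a buffer and pass it in; on a real system, readelf -h reports the same field.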

On 32-bit Linux systems, the code segment starts at address 0x08048000. The data segment follows at the next 4 KB aligned address. The run-time heap follows on the first 4 KB aligned address past the read/write segment and grows up via calls to the malloc library. (We will describe malloc and the heap in detail in Section 9.9.) There is also a segment that is reserved for shared libraries. The user stack always starts at the largest legal user address and grows down (toward lower memory addresses). The segment starting above the stack is reserved for the code and data in the memory-resident part of the operating system known as the kernel.
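
The "next 4 KB aligned address" in this layout is plain rounding arithmetic. A minimal sketch, where the macro and function names are hypothetical:

```c
#include <stdint.h>

#define ALIGN_4K 4096u   /* 4 KB alignment, as in the layout described above */

/* Rounds addr up to a multiple of 4 KB (addr itself if it is already aligned). */
uint32_t page_align_up(uint32_t addr)
{
    return (addr + ALIGN_4K - 1) & ~(ALIGN_4K - 1);
}
```

If, say, the code segment started at 0x08048000 and were 0x448 bytes long (a made-up size), the next aligned address would be 0x08049000.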

Shared libraries are modern innovations that address the disadvantages of static libraries. A shared library is an object module that, at run time, can be loaded at an arbitrary memory address and linked with a program in memory. This process is known as dynamic linking and is performed by a program called a dynamic linker.

Shared libraries are “shared” in two different ways. First, in any given file system, there is exactly one .so file for a particular library. Second, a single copy of the .text section of a shared library in memory can be shared by different running processes.

unix> gcc -shared -fPIC -o libvector.so addvec.c multvec.c

The -fPIC flag directs the compiler to generate position-independent code (more on this in the next section). The -shared flag directs the linker to create a shared object file.

unix> gcc -o p2 main2.c ./libvector.so

This creates an executable object file p2 in a form that can be linked with libvector.so at run time. The basic idea is to do some of the linking statically when the executable file is created, and then complete the linking process dynamically when the program is loaded.

The dynamic linker then finishes the linking task by performing the following relocations:
. Relocating the text and data of libc.so into some memory segment.
. Relocating the text and data of libvector.so into another memory segment.
. Relocating any references in p2 to symbols defined by libc.so and libvector.so.

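A key purpose of shared libraries is to allow multiple running processes to share the same library code in memory and thus save precious memory resources. One approach would be to reserve a dedicated chunk of the address space for each shared library ahead of time, but that uses the address space inefficiently and is hard to manage. A better approach is to compile library code so that it can be loaded and executed at any address without being modified by the linker. Such code is known as position-independent code (PIC).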

Linking can be performed at compile time by static linkers, and at load time and run time by dynamic linkers. Linkers manipulate binary files called object files, which come in three different forms: relocatable, executable, and shared.
Relocatable object files are combined by static linkers into an executable object file that can be loaded into memory and executed. Shared object files (shared libraries) are linked and loaded by dynamic linkers at run time, either implicitly when the calling program is loaded and begins executing, or on demand, when the program calls functions from the dlopen library. 

The two main tasks of linkers are symbol resolution, where each global symbol in an object file is bound to a unique definition, and relocation, where the ultimate memory address for each symbol is determined and where references to those objects are modified.
Static linkers are invoked by compiler drivers such as gcc. They combine multiple relocatable object files into a single executable object file. Multiple object files can define the same symbol, and the rules that linkers use for silently resolving these multiple definitions can introduce subtle bugs in user programs.
Multiple object files can be concatenated in a single static library. Linkers use libraries to resolve symbol references in other object modules. The left-to-right sequential scan that many linkers use to resolve symbol references is another source of confusing link-time errors. Loaders map the contents of executable files into memory and run the program. Linkers can also produce partially linked executable object files with unresolved references to the routines and data defined in a shared library. At load time, the loader maps the partially linked executable into memory and then calls a dynamic linker, which completes the linking task by loading the shared library and relocating the references in the program.
Shared libraries that are compiled as position-independent code can be loaded anywhere and shared at run time by multiple processes. Applications can also use the dynamic linker at run time in order to load, link, and access the functions and data in shared libraries.
