original in es Emiliano Ariel Lesende
es to en Gonzalo Garcia Agullo
Welcome to the first in a series of articles about the secrets of the Linux kernel. You have probably looked at the kernel sources at some point. If so, you will have noticed that the initial pair of 100 KB compressed files has grown into more than 300 files containing over 2 million lines of source code and taking up as much as 9 megabytes of compressed storage.
This series is intended for advanced programmers rather than beginners. You are of course free to read it anyway, and the author will do his best to answer any questions you send by e-mail.
New bugs are discovered and new patches are published almost every day. Nowadays it is almost impossible to understand the source code as a whole. It is written by many different programmers who try to keep a homogeneous coding style, but in practice the style differs from one author to another.
Linux is a freely distributable operating system for the PC architecture and others. It is compatible with the POSIX 1003.1 standard and includes a large number of features from Unix System V and BSD 4.3. Many substantial parts of the Linux kernel covered in this series were written by Linus Torvalds, a Finnish computer science student. The first kernel was released in November 1991.
Linux supports true multitasking. All processes are independent, and no process has to release the processor voluntarily for another process to run.
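The independence of processes can be observed from user space. The following is a minimal sketch, not kernel code: it assumes a POSIX system, and the function name parent_value_after_child_write is made up for this illustration. A child process created with fork() changes its own copy of a variable, and the parent's copy stays untouched.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the parent's copy of `value` after a
 * child process has modified its own copy. Returns -1 if fork() or
 * waitpid() fails. */
int parent_value_after_child_write(int value)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        value += 100;   /* changes only the child's address space */
        _exit(0);
    }

    /* Parent: wait for the child, then report its own unchanged copy. */
    if (waitpid(pid, NULL, 0) < 0)
        return -1;
    return value;
}
```

Calling parent_value_after_child_write(5) returns 5 in the parent, because the child's write never reaches the parent's memory.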
Linux is not only a multiuser operating system but also offers multiuser access: it can share the same system resources among users connected through different terminals attached to the host.
Only the parts of a program that are actually needed are loaded into memory for execution.
If system memory is exhausted, Linux searches for 4K-sized memory pages that can be released from memory and stored on the hard disk. If any of these pages is needed again, Linux restores it from disk into its original memory location. In old Unix systems and some current platforms, including Microsoft Windows, memory is swapped to disk instead: all memory pages belonging to a task are saved to disk when memory runs short, which is less efficient.
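To make the 4K page granularity concrete, here is a small arithmetic sketch. It is not taken from the kernel sources: the macro DEMO_PAGE_SIZE and the three function names are assumptions chosen for this example, and only the 4K page size comes from the text above.

```c
/* Page size assumed from the article's "4K" pages (hypothetical name). */
#define DEMO_PAGE_SIZE 4096UL

/* Index of the page that contains the given address. */
unsigned long page_number(unsigned long addr)
{
    return addr / DEMO_PAGE_SIZE;
}

/* Position of the address inside its page. */
unsigned long page_offset(unsigned long addr)
{
    return addr % DEMO_PAGE_SIZE;
}

/* Number of whole pages needed to hold `bytes` bytes (rounded up). */
unsigned long pages_needed(unsigned long bytes)
{
    return (bytes + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}
```

With these definitions, the address 0x100000 (1 Mbyte) falls at the start of page 256, and a buffer of 4097 bytes needs two pages, so paging it out moves two 4K units rather than the whole task.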
MS-DOS users are used to working with SmartDrive, a program that reserves a fixed area of system memory for disk caching. Linux instead has a much more dynamic disk caching system: the memory reserved for the cache grows while memory is unused and shrinks as needed when system or user processes demand more memory.
Libraries are sets of routines used by programs to process data. A number of standard libraries are used by more than one process at the same time. In old systems these libraries are included in every executable file and loaded redundantly into memory every time a new process using the same library is executed, which wastes memory. In modern systems like Linux, shared code is loaded just once and shared among all processes that use it.
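The saving can be estimated with simple arithmetic. The sketch below is only a rough model under stated assumptions: every process runs a different program of the same size, all of them use one common library, and the function names are invented for this example.

```c
/* Total memory (in KB) when every executable carries its own copy of
 * the library, as in the old statically linked scheme. */
unsigned long static_total_kb(unsigned long nproc,
                              unsigned long prog_kb,
                              unsigned long lib_kb)
{
    return nproc * (prog_kb + lib_kb);
}

/* Total memory (in KB) when the library code is loaded once and
 * shared by all processes that use it. */
unsigned long shared_total_kb(unsigned long nproc,
                              unsigned long prog_kb,
                              unsigned long lib_kb)
{
    if (nproc == 0)
        return 0;           /* nothing running, nothing loaded */
    return nproc * prog_kb + lib_kb;
}
```

With made-up figures of 10 processes, 50 KB of own code each and a 300 KB library, the first scheme needs 3500 KB and the shared scheme 800 KB.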
POSIX 1003.1 defines a standard interface for Unix operating systems. This interface is described as a set of C routines and is currently supported by all modern operating systems. Microsoft Windows NT has support for POSIX 1003.1. Linux 1.2 is 100% compliant with POSIX. Additionally, some System V and BSD interfaces are supported or being implemented for further compatibility.
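As an illustration of what "a set of C routines" means in practice, the sketch below uses only POSIX calls (pipe(), write(), read() and close()) to pass a short message through a pipe. The helper name pipe_roundtrip and the DEMO_MAX_MSG limit are assumptions made for this example and are not part of the standard.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical limit that keeps the message well inside the pipe
 * buffer, so the single write() below cannot block. */
#define DEMO_MAX_MSG 512

/* Sends `msg` through a pipe and reads it back into `out`.
 * Returns the number of bytes read, or -1 on error. */
long pipe_roundtrip(const char *msg, char *out, size_t out_size)
{
    int fds[2];
    size_t len = strlen(msg);
    ssize_t written, got = -1;

    if (len > DEMO_MAX_MSG || len >= out_size)
        return -1;                  /* too long, or no room for '\0' */
    if (pipe(fds) < 0)
        return -1;

    written = write(fds[1], msg, len);   /* fds[1] is the write end */
    close(fds[1]);
    if (written == (ssize_t)len)
        got = read(fds[0], out, out_size - 1);  /* fds[0] is the read end */
    close(fds[0]);

    if (got < 0)
        return -1;
    out[got] = '\0';
    return (long)got;
}
```

Because only POSIX routines are used, the same source should compile unchanged on any system that implements the interface, which is the point of the standard.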
Who would not like to run DOS, Windows 95, FreeBSD or OS/2 applications under Linux? For that reason, DOS, Windows and Windows 95 emulators are under development. Linux can also run binaries from other Intel-based Unix platforms that comply with the iBCS2 (Intel Binary Compatibility) standard.
Linux supports a large number of file system formats. The most commonly used format nowadays is the Second Extended File System (Ext2). Another supported format is the File Allocation Table (FAT) used by DOS-based systems, but FAT is not suitable for security or multiuser access because of its design restrictions.
Linux can be integrated into any local area network. Any Unix service is supported, including the Network File System (NFS), remote login (telnet, rlogin), dial-up SLIP and PPP, and so on. Integration as a server or client for other networks is also supported, including file sharing and printing with Macintosh, Netware and Windows.
Linux uses System V IPC to provide inter-process message queuing, semaphores and shared memory.
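Of the three mechanisms, shared memory is the easiest to show in a few lines. The following user-space sketch assumes a system with System V IPC enabled; the function name shared_value_after_child_write is invented for this example. A child process writes into a segment obtained with shmget() and shmat(), and the parent sees the change.

```c
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: stores `value` in a System V shared memory
 * segment, lets a child add 100 to it, and returns what the parent
 * reads afterwards. Returns -1 on any failure. */
int shared_value_after_child_write(int value)
{
    int result = -1;
    int *shared;
    pid_t pid;
    int id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);

    if (id < 0)
        return -1;

    shared = shmat(id, NULL, 0);
    if (shared == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }

    *shared = value;
    pid = fork();               /* the child inherits the attachment */
    if (pid == 0) {
        *shared += 100;         /* visible to the parent as well */
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, NULL, 0) == pid)
        result = *shared;

    shmdt(shared);              /* detach and remove the segment */
    shmctl(id, IPC_RMID, NULL);
    return result;
}
```

Calling shared_value_after_child_write(5) returns 105: unlike ordinary process memory, the segment is common to both processes.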
Let's take a look at the kernel source code before studying the kernel itself.
Source tree structure: Linux kernel sources are commonly located under the /usr/src/linux directory, so we will give directories relative to this location. As a result of the port to non-Intel architectures, the kernel tree was changed after version 1.0. Architecture-dependent code is located under the arch/ hierarchy. Code for Intel 386, 486, Pentium and Pentium Pro processors is under arch/i386. The arch/mips directory is for MIPS-based systems, arch/sparc for Sun SPARC-based platforms, arch/ppc for PowerPC/Power Macintosh systems, and so on. We will concentrate on the Intel architecture, as it is the one most widely used with Linux.
The Linux kernel is just a standard C program, with only two important differences. First, the starting point for programs written in C is the main(int argc, char **argv) routine, whereas the Linux kernel uses start_kernel(void). Second, the program environment does not exist yet when the system is starting up and the kernel is about to be loaded, so a few things have to be done before the first C routine is called. The assembler code that performs this task is located under the arch/i386/boot/ directory.
The appropriate assembler routine loads the kernel at the absolute memory address 0x100000 (1 Mbyte), then installs the interrupt service routines, global descriptor tables and interrupt descriptor tables that are used only during the initialization process. At this point the processor is switched into protected mode. The init/ directory contains everything needed to initialize the kernel. It holds the start_kernel() routine, which initializes the kernel properly, taking into account all the boot parameters passed to it. The first process is created without using system calls (the system itself is not fully loaded yet). This is the famous idle process, the one that consumes processor time when no other process needs it.
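The role of the idle process can be pictured with a toy selection routine. This is only an illustration of the idea described above and not the kernel's scheduler: the function name pick_next_task, the array layout and the convention that slot 0 holds the idle task are all assumptions made for this sketch.

```c
/* Toy model: runnable[i] is nonzero when task i wants the processor.
 * Slot 0 is reserved for the idle task, which is chosen only when no
 * other task is runnable. Returns the index of the task to run. */
int pick_next_task(const int runnable[], int ntasks)
{
    int i;

    for (i = 1; i < ntasks; i++) {
        if (runnable[i])
            return i;       /* a real task needs the processor */
    }
    return 0;               /* nothing else to do: run the idle task */
}
```

For a table such as {1, 0, 1} the routine returns 2, and for {1, 0, 0} it falls back to 0, the idle task.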
The kernel/ and arch/i386/kernel/ directories contain, as their path names suggest, the main parts of the kernel. This is where the main system calls are located. Other tasks are also implemented here, including the time handler, the scheduler, the DMA manager, the interrupt handler and the signal controller.
The code that handles system memory is located in mm/ and arch/i386/mm/. This area is devoted to allocating and releasing memory for processes. Memory paging is also implemented here.
The Virtual File System (VFS) is under the fs/ directory. Each supported file system format is located in its own subdirectory. The most important file systems are Ext2 and Proc. We will take a detailed look at them later.
All operating systems require a set of drivers for hardware components. In the Linux kernel, these are located under drivers/.
Under ipc/ you will find the Linux implementation of the System V IPC.
Source code to implement several network protocols, sockets and internet domains is stored under net/.
Some standard C routines are implemented in lib/, so that kernel code can follow the usual C programming habits.
Loadable modules generated during kernel compilation are saved in modules/, which stays empty until the first kernel compilation is done.
Probably the most important directory for programmers is include/. Here you will find all the C header files used specifically by the kernel. Kernel header files specific to Intel platforms are under include/asm-i386/.
Compiling: A new kernel is basically generated in just three steps:
In the next articles we will go into detail about the background of these scripts and how to modify them to introduce new configuration options.
I hope you enjoyed this article. You are free to e-mail your comments, suggestions and criticisms to elesende@nextwork.net.