jmglov's Linux Kernel Hacking FAQ

Please note that this FAQ is based on my own kernel hacking, which has been exclusively on the Intel i386 (also known as 80x86) architecture. Thus, the FAQ may be less helpful on other architectures. In any case, as the Linux kernel evolves quickly, following my instructions verbatim is probably a bad idea. Use this FAQ as a guide to understanding how to hack on the kernel. Do *not* copy my code, paste it into the kernel, and expect good results. Just in case, this FAQ is provided with no warranty whatsoever, and no guarantees that the code and instructions contained herein will not cause your hardware to spontaneously combust. Proceed at your own risk.

Doesn't kernel hacking take a huge brain? How do I get started?

The short answer is no: anyone with a decent knowledge of C should be able to write code that actually *works* in the kernel. You do not need a degree in Computer Science / Computer Engineering / Rocket Science, or a membership in MENSA. To get started, grab yourself a kernel and read the section on necessary tools.

How do I create a new system call?

Creating a new system call requires three steps. First, you must add a #define to include/asm-i386/unistd.h (if your kernel will need to run on a different architecture, or you want your system call to work on non-Intel architectures, you will also need to check out the other include/asm-* directories). Open this file in your favourite text editor, locate the long list of statements of the form

#define __NR_foo 25

and add after the last of them:

#define __NR_<syscall> <num>

where <syscall> is the name of your system call and <num> is its number, which must be the next number in sequence after the last existing system call.

The second step is to add a symbol definition to the syscall table in arch/i386/kernel/entry.S (again, if you care about architectures other than the i386, you will need to deal with them as well). Open this file in a text editor and locate the lines:

.data
ENTRY(sys_call_table)

The lines of assembly code that follow set up the system call table. You will need to add a line to the end of the table. Look for the lines:
.rept NR_syscalls-190
.long SYMBOL_NAME(sys_ni_syscall)
.endr
Right before these lines, insert:
.long SYMBOL_NAME(sys_<syscall>)

Again, <syscall> is the name of your syscall; note that the Linux kernel convention is to implement the foobar() syscall in a function called sys_foobar().

The third and final step to implementing a new system call is to actually write the function. The easiest thing to do is to add your function to a source file that is already included in a standard Linux build. It makes sense to put the function in a source file that contains similar functions. For example, if you are implementing a syscall that reads the entire contents of a file at once, stick it in fs/read_write.c, where similar syscalls reside. Now that you have chosen a file for your syscall implementation, you must resist the urge to slap an ordinary C function into it. Why? It will not work. Syscalls are magical: they run in kernel mode, they are entered through the assembly glue in entry.S rather than called directly from C (hence the asmlinkage marker, which tells gcc to expect all of the arguments on the stack), and any pointer they receive points into user space. On most architectures, the only way to get from user mode to kernel (also called supervisor or privileged) mode is to toss an interrupt. On the i386 architecture, Linux handles syscalls like this:

Warning! The following expects you to know a bit about the Intel i386 architecture and assembler. If you do not, you might want to remain in the dark as to how syscalls really work and skip this explanation.
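The dispatch step can be pictured with a small user-space model. In 2.x kernels on the i386, a program loads the syscall number into %eax and executes int $0x80; the entry code in entry.S compares that number against NR_syscalls (answering -ENOSYS if it is out of range) and otherwise calls through sys_call_table, where every unused slot points at sys_ni_syscall, which also returns -ENOSYS. The sketch below mimics only that table lookup. The names model_call_table, model_ni_syscall, model_add and model_dispatch are hypothetical, and the fixed three-argument signature is a simplification of how the real entry code hands argument registers to an asmlinkage function.

```c
#include <errno.h>      /* ENOSYS */
#include <stddef.h>

/* Every "system call" in this model takes three long arguments */
typedef long (*syscall_fn)(long, long, long);

/* Stand-in for sys_ni_syscall(): unused slots point here */
static long model_ni_syscall(long a, long b, long c)
{
    (void)a; (void)b; (void)c;
    return -ENOSYS;
}

/* A toy "system call" living in slot 1 */
static long model_add(long a, long b, long c)
{
    (void)c;
    return a + b;
}

/* Stand-in for sys_call_table: indexed by syscall number */
static const syscall_fn model_call_table[] = {
    model_ni_syscall,   /* 0: unused */
    model_add,          /* 1: our toy call */
    model_ni_syscall,   /* 2: padding, like the .rept block */
    model_ni_syscall,   /* 3: padding */
};

#define MODEL_NR_SYSCALLS \
    ((long)(sizeof(model_call_table) / sizeof(model_call_table[0])))

/* Roughly what the entry code does with the number found in %eax:
 * bounds-check it, then call through the table. */
long model_dispatch(long nr, long arg0, long arg1, long arg2)
{
    if (nr < 0 || nr >= MODEL_NR_SYSCALLS)
        return -ENOSYS;
    return model_call_table[nr](arg0, arg1, arg2);
}
```

For example, model_dispatch(1, 2, 3, 0) returns 5, while an unused or out-of-range number yields -ENOSYS. Real syscalls follow the same negative-errno convention: the C library turns a return value between -4095 and -1 into -1 and sets errno.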
Right, so now that you know how a syscall works, write the function and put it in the appropriate source file. If you skipped the above description of syscall voodoo (presumably because you wanted to avoid the specifics of Intel i386 architectural organisation and the unfortunate side-effect: a splitting headache), here is what you need to know:
retval = syscall( num, arg0, arg1, ... argN );

There are other ways to invoke your new syscall, but I have not the energy left to describe them. (Watch this space... :)

Example: We want to add a system call named foobar() to a kernel running on the Intel i386 architecture. We open include/asm-i386/unistd.h in our favourite editor (XEmacs) and find the last syscall definition:

#define __NR_vfork 190

We add our syscall definition on the next line:

#define __NR_foobar 191

Now, we open the arch/i386/kernel/entry.S file and, right before
.rept NR_syscalls-190
.long SYMBOL_NAME(sys_ni_syscall)
.endr
add the line:
.long SYMBOL_NAME(sys_foobar)

Since our foobar() syscall is going to fill out a trivial struct that we claim has some relevance to read()ing and write()ing, we will choose fs/read_write.c as the target for our function. We open the file and slap this code into it, right at the bottom:
/** trivial_t{}
 *
 * A pretty trivial data structure.
 */
struct trivial_t
{
    int number;     /* some trivial number */
    long bignum;    /* some big number */
};

static struct trivial_t trivial;    /* the kernel's copy of our struct */

/** sys_foobar()
 *
 * Implements the foobar() syscall: copies the kernel's trivial struct into
 * userland.
 *
 * Returns:
 *  retval  0 on success, -EFAULT if buf is not a valid userland pointer
 */
asmlinkage long sys_foobar( struct trivial_t *buf /** pointer to a trivial_t
                                                    * struct that lives in
                                                    * userland */
                          )
{
    /* Copy our struct to the userland one. copy_to_user() returns the
     * number of bytes it could NOT copy, so anything non-zero is a fault;
     * kernel code reports errors as negative errno values, not -1. */
    if( copy_to_user( buf, &trivial, sizeof( trivial ) ) != 0 )
        return -EFAULT;

    /* Victory! */
    return 0;
} /* sys_foobar() */
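Once the patched kernel is booted, user space can reach the new call through glibc's generic syscall() function. The sketch below wraps it in a hypothetical foobar() helper. It assumes the struct trivial_t layout is copied verbatim into the user program, and that __NR_foobar is visible, either from headers installed from the patched kernel or from your own #define __NR_foobar 191. Without that definition the wrapper fails with ENOSYS rather than guessing a number.

```c
#define _GNU_SOURCE         /* ask glibc to declare syscall() */
#include <errno.h>
#include <unistd.h>         /* syscall() */
#include <sys/syscall.h>    /* __NR_* numbers from the installed headers */

/* Must match the kernel's definition of struct trivial_t exactly */
struct trivial_t
{
    int number;     /* some trivial number */
    long bignum;    /* some big number */
};

/* Hypothetical user-space wrapper for the foobar() syscall.
 * Returns 0 on success, or -1 with errno set on failure. */
int foobar(struct trivial_t *buf)
{
#ifdef __NR_foobar
    /* syscall() returns -1 and sets errno if the kernel returns -errno */
    return (int)syscall(__NR_foobar, buf);
#else
    /* No number for foobar() is known here, so refuse rather than guess */
    (void)buf;
    errno = ENOSYS;
    return -1;
#endif
}
```

Because syscall() converts a negative kernel return value into -1 with errno set to its magnitude, a sys_foobar() that fails with -EFAULT shows up in the caller as foobar() returning -1 with errno equal to EFAULT.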
Here ends our example.

Given a filename, how do I grab the inode to which it refers?

Many syscalls in Linux take filenames that could be relative, could contain ".." or ".", and could be hard or soft (symbolic) links. Given this mess, it would be nice to have a way to take a filename and resolve it to the underlying inode. Luckily, there is just such a procedure. The function that you are after is path_walk(), which is defined (in Linux 2.4.5) at line 422 of fs/namei.c. It takes a valid pathname that has already been copied into kernel space, and fills in a nameidata struct whose dentry field points to the dentry to which the pathname refers (path_walk() itself only returns 0 or a negative error code). Use LXR to do an identifier search on path_walk and dentry. What you probably want to do is something like this:
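asmlinkage long sys_foo(const char * pathname)
{
    int error = 0;
    char * name;
    char buf[1024];     /* scratch space for d_path() */
    char * fqfn;        /* will point to your fully qualified filename */
    struct nameidata nd;
    name = getname(pathname);   /* copy the pathname in from userland */
    if (IS_ERR(name))
        return PTR_ERR(name);
    ...
    /* LOOKUP_FOLLOW resolves a trailing symlink to its target, instead of
     * stopping at the parent directory as LOOKUP_PARENT would */
    if (path_init(name, LOOKUP_FOLLOW, &nd))
        error = path_walk(name, &nd);
    putname(name);
    if (error)
        return error;
    ...
    fqfn = d_path(nd.dentry, nd.mnt, buf, sizeof(buf));
    ...
    path_release(&nd);  /* drop the dentry and vfsmount references */
    return 0;
}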
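User space has a rough counterpart to this kind of lookup: realpath(3) collapses "." and ".." components and follows symbolic links, returning a canonical absolute path. The sketch below is a hypothetical pair of helpers, resolve_path() and same_path(), built on it; it is handy for checking from a test program what your kernel code should be seeing, but it is not how path_walk() itself is implemented.

```c
#define _GNU_SOURCE         /* expose realpath() and PATH_MAX in glibc */
#include <limits.h>         /* PATH_MAX */
#include <stdlib.h>         /* realpath() */
#include <string.h>         /* strcmp() */

/* Hypothetical helper: resolve "." and ".." components and symbolic
 * links in `path`, writing the canonical absolute path into `out`
 * (which must hold PATH_MAX bytes). Returns 0 on success, or -1 with
 * errno set by realpath() (e.g. ENOENT) on failure. */
int resolve_path(const char *path, char out[PATH_MAX])
{
    return realpath(path, out) != NULL ? 0 : -1;
}

/* Hypothetical helper: non-zero if both names resolve to the same path.
 * Two hard links to one inode still resolve to different paths;
 * compare st_dev/st_ino from stat() if that case matters. */
int same_path(const char *a, const char *b)
{
    char ra[PATH_MAX], rb[PATH_MAX];

    if (resolve_path(a, ra) != 0 || resolve_path(b, rb) != 0)
        return 0;
    return strcmp(ra, rb) == 0;
}
```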
fqfn will now hold your full filename. Your inode will be nd.dentry->d_inode.

Can you explain the ext2 filesystem to me?

Maybe. Take a look at the slides (PDF) of a lecture I gave on the subject as part of my "Linux Kernel Internals" course at The College of William and Mary. If you feel like updating the slides for the 2.6 kernel (please?) or otherwise want the LaTeX sources, help yourself to a tarball.

Who is responsible for this FAQ, and how do I shower them with thanks?

The principal parties are:
I am actively maintaining this FAQ, so please do send questions, comments, and corrections to me. My email address is: "jmglov{SYMBOL_0}wmalumni.com{SYMBOL_1}{TLD}", where {SYMBOL_0} == '@', {SYMBOL_1} == '.', and {TLD} == "com". There, I hope *that* stops the bloody harvester bots.