Terminal, Shell, and Portability
Preamble
A shell process is a program under execution that allows the user to interact with the system. When you open a terminal, you are greeted with a prompt that expects user input. Before we begin exploring the shell in brief and talking about some portability factors, we need to understand the device that runs the shell process: the terminal. We'll slowly move into topics such as the terminal line discipline, character devices, the shell process, and things to consider when writing portable programs.
Terminal
We frequently use the terminal, yet most of us don't bother learning what sort of device it really is. For a program executed interactively through a shell, the terminal associated with the shell process (and the standard streams connected to it) is provided to the program under execution. On a UNIX-like system, when you browse the /dev directory, you'll notice a bunch of tty devices. Listing 1 shows the subset of "files" available in my /dev directory. The first column describes two attributes of each file: the file type and the permission bits.
Script started on Sat Sep 13 16:24:21 2025
bash-5.3$ ls -la /dev | grep tty
crw-rw-rw- 1 root tty 0xf000005 Sep 13 16:24 ptmx
crw-rw-rw- 1 root wheel 0x2000000 Sep 10 20:13 tty
crw-rw-rw- 1 root wheel 0x4000000 Aug 21 20:43 ttyp0
crw-rw-rw- 1 root wheel 0x4000001 Aug 21 20:43 ttyp1
crw-rw-rw- 1 root wheel 0x4000002 Aug 21 20:43 ttyp2
crw-rw-rw- 1 root wheel 0x4000003 Aug 21 20:43 ttyp3
crw-rw-rw- 1 root wheel 0x4000004 Aug 21 20:43 ttyp4
...
bash-5.3$ ^D
exit
Script done on Sat Sep 13 16:24:41 2025
Controlling a terminal device programmatically is a complex task. Historical systems such as System V and 4.3BSD had their own ioctl(2) commands to control the I/O of the terminal device. Various aspects of the device can be configured, along with terminal attributes, the controlling terminal, the line discipline, and much more. ioctl_tty(2) covers the GNU/Linux-specific control commands. The term tty stands for teletype. Refer to your device's manual for implementation-specific operations. For the gory details regarding tty devices, refer to: The TTY demystified.
The term driver needs to be defined first. A driver is a software component that allows the Operating System to interact with a device. A device could be anything including a hard drive, a keyboard, a mouse, a network interface, and so on.
The fifth column in Listing 1 seems cryptic. GNU/Linux provides a more readable output, but macOS does not. This information is not available for a regular file. What this column conveys is the device number, which is composed of two components: the major number and the minor number. The major number describes the device's type and thus the device driver used to interact with said device. The minor number describes the instance of a device of that type. Each driver in a system is assigned a unique major number (the terminal device driver on my machine is assigned the number 0x4), while the minor number is used by the respective driver to distinguish different instances of the device.
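As a short, hedged sketch: the same information can be pulled out programmatically with stat(2) and the widespread (but non-standard) major()/minor() macros, which on GNU/Linux live in <sys/sysmacros.h>:
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#ifdef __linux__
#include <sys/sysmacros.h>   /* major()/minor() on GNU/Linux */
#endif

int
main (void)
{
    struct stat st;

    if (stat("/dev/tty", &st) == -1)
        return (1);
    /* st_rdev holds the device number for character/block special files */
    printf("major: %u, minor: %u\n",
        (unsigned) major(st.st_rdev), (unsigned) minor(st.st_rdev));
    return (0);
}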
Pseudo-Terminal Device
I'll use the terms master and slave side below to discuss the ptmx file and its characteristics. Some texts and manuals refer to the master as primary and the slave as replica.
The file ptmx is especially useful when making an ssh-like program. I'll try to briefly explain how terminal acquisition works this way. The file ptmx refers to a pseudo-terminal device (hence the different driver used). There are two components in a pseudo-terminal device: the master side and the slave side. As the name suggests, this file appears to be a terminal device, but the device is not associated with actual hardware. System V and 4.3BSD had their own techniques to obtain such devices, but the Single UNIX Specification (SUS) (and POSIX) describes an interface similar to that of System V. For example, grantpt(3) is used to establish the ownership and permissions of the slave device, while unlockpt(3) is used to unlock the slave device.
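A minimal sketch of that acquisition sequence, with error handling abbreviated (on GNU/Linux the feature-test macro below may be needed before any include):
#define _XOPEN_SOURCE 600   /* for posix_openpt(3) and friends on GNU/Linux */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>

int
main (void)
{
    int master, slave;
    char *slave_name;

    if ((master = posix_openpt(O_RDWR)) == -1)
        return (1);
    if (grantpt(master) == -1 || unlockpt(master) == -1)
        return (1);
    if ((slave_name = ptsname(master)) == NULL)
        return (1);
    printf("slave device: %s\n", slave_name);
    if ((slave = open(slave_name, O_RDWR)) == -1)
        return (1);
    /* ... fork, make the slave the controlling terminal, exec a shell ... */
    return (0);
}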
After obtaining the master and slave devices, we need to understand how I/O using these two devices works. I'll assume that the slave device runs a shell process and discuss the interaction. When you want to send input to this slave device, you write it to the master device and the characters are sent to the slave device. If something is written on the slave device (say, via write(2)), the master device can read it and later write it to its own standard output (or error) stream for the end user to view. However, we aren't limited to I/O-specific operations. Programs such as vi(1) draw on the screen. Such programs also take into account the current window size.
Consider that the slave device (which is running the shell process) executes the vi(1) program. It draws the screen that will be presented on the master side. The terminal emulator manages window changes. Suppose we now change the window size. The terminal emulator will be notified of this change and issues a TIOCSWINSZ ioctl(2) command for the corresponding terminal device. The kernel, which keeps track of terminal sizes, receives this command and updates the window size for the pseudo-terminal. Consequently, any process associated with the slave device receives the SIGWINCH signal. Since the slave device is running vi(1), this process will receive the signal and redraw the screen appropriately.
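A minimal sketch of the other end of that exchange: a program querying its window size with TIOCGWINSZ (the read counterpart of TIOCSWINSZ) and catching SIGWINCH. This illustrates the mechanism, not how vi(1) is actually implemented:
#include <signal.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

static volatile sig_atomic_t resized;

static void
on_winch (int sig)
{
    (void) sig;
    resized = 1;    /* only set a flag; redraw from the main loop */
}

int
main (void)
{
    struct winsize ws;

    signal(SIGWINCH, on_winch);
    if (ioctl(STDIN_FILENO, TIOCGWINSZ, &ws) == 0)
        printf("%hu rows, %hu columns\n", ws.ws_row, ws.ws_col);
    /* a full-screen program would loop here, re-querying the size
     * with TIOCGWINSZ whenever 'resized' is set, then redrawing */
    return (0);
}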
File Type and File Permission Bits
In a UNIX-like system, file is a deliberately broad term: everything is a file. The file types are described below:
| File Type | Symbolic Representation |
|---|---|
| Regular file | - |
| Directory | d |
| Symbolic link | l |
| Character special | c |
| Block special | b |
| Named pipe | p |
| Socket | s |
The permission bits are shown as 9 characters, divided into 3 sets: user, group, and others. Each set contains 3 characters and describes the accessibility of the file to that class of user. Consider the example file below:
-rwxr-xr-- 1 pranavramjoshi staff 33K Sep 17 03:16 bar
From the first column, we can get the following information:
- The file is a regular file, as indicated by the - character at the beginning.
- The owner has the following permissions: read (r), write (w), and execute (x).
- The group has the following permissions: read (r) and execute (x), but not write.
- Others have the following permission: read (r), but not write or execute.
The owner of the file is given in the third column, pranavramjoshi. The file belongs to the group staff. If a user Trent is also present in the system and Trent is in the staff group, then Trent can read and execute the file but cannot make any modification. If Mallory is another user in the system who is not in the staff group, then Mallory can only read the file but cannot execute or write to it.
A standard program is provided on almost all UNIX-like systems: stat(1). It provides a more verbose description of a file, and there are multiple flags that can be passed to it.
To learn about yourself (your User Identification (UID), Group Identification (GID), and the groups you belong to), use the id(1) program. To list only the groups you are in, use the groups(1) program.
Character Special Device
From Listing 1, we are now aware that a terminal device is a character special device. Wikipedia defines a character device as:
Character special files or character devices provide unbuffered, direct access to the hardware device. They do not necessarily allow programs to read or write single characters at a time; that is up to the device in question.
Being unbuffered means that the device handles the data directly, without any explicit buffer being used by, say, the kernel. This also means that the system call lseek(2) can't be used on a file descriptor representing a terminal device, just as it can't on a pipe(2). When you interact with your own or another command-line program, the flow looks as shown in Listing 2. One might argue that the line is being buffered until the Enter or Return key is pressed, and they would be right. This is where the terminal line discipline comes into play.
+-------------------+
| shell |
+-------------------+
| ^
stdout, stderr | | stdin
| |
+-----------+-----------+---------------+
| | | |
| V | |
| +-------------------+ |
| | terminal line | |
| | discipline | |
| +-------------------+ |
| | ^ |
| | | | kernel
| | | |
| V | |
| +-------------------+ |
| | terminal | |
| | device driver | |
| +-------------------+ |
| | ^ |
| | | |
| | | |
+-----------+-----------+---------------+
| |
| |
V |
+-------------------+
| user at a |
| terminal |
+-------------------+
The terminal line discipline sits between the terminal device driver and the actual shell process. Consider a scenario we witness all the time. You type something into the terminal and, before the Enter or Return key is pressed, two things are done: the characters are echoed on your screen, and the characters are being "prepared" to pass to the process connected to the terminal. Recall that a terminal is a duplex device; there are two channels used for input and output. Since the terminal is by default in "cooked mode", the input is not provided to the shell process until the Enter or Return key is pressed. After it is pressed, the entire line is provided to the shell.
The terminal device driver is ultimately responsible for handling I/O for the user. There is a notion of Canonical mode for terminal devices. In essence, it describes how the input stream of the terminal device should be processed. When we say the terminal is in cooked mode, it means that the terminal will process input in units of lines, where the delimiting character is usually the newline character, along with the EOF or EOL character. Refer to the Canonical Mode Input Processing section of the termios(4) manual for more information.
We can also configure the driver to accept sequences of single characters rather than an entire line (Non-Canonical mode). We often refer to this as raw mode, which is useful when typing on a remote system. It does have its drawbacks; we can't "erase" a character that has already been transmitted. For differences between raw and cooked mode, check this StackExchange thread.
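A minimal sketch of switching to non-canonical mode with the termios(4) interface, assuming input on the standard input stream (and remembering to restore the original settings):
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int
main (void)
{
    struct termios orig, raw;

    if (tcgetattr(STDIN_FILENO, &orig) == -1)
        return (1);
    raw = orig;
    raw.c_lflag &= ~(ICANON | ECHO);    /* no line assembly, no echo */
    raw.c_cc[VMIN] = 1;                 /* read(2) returns after 1 byte */
    raw.c_cc[VTIME] = 0;                /* no inter-byte timeout */
    if (tcsetattr(STDIN_FILENO, TCSAFLUSH, &raw) == -1)
        return (1);

    char c;
    if (read(STDIN_FILENO, &c, 1) == 1) /* delivered without waiting for Enter */
        printf("got: %c\n", c);

    tcsetattr(STDIN_FILENO, TCSAFLUSH, &orig);  /* always restore! */
    return (0);
}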
Characteristics of a Terminal Line Discipline
A line discipline module provides several functionalities. Some of them are mentioned below:
- Echo the characters that are entered.
- Assemble the characters entered into lines, so that a process reading from the terminal receives complete lines.
- Edit the line that is input. For instance, the Delete or Backspace character is the usual Erase character. You can also kill the entire line and start over with a new one.
- Generate signals when certain terminal keys are entered. For example, Control-C generates the SIGINT signal.
- Process flow control characters. For example, entering the Control-S key stops the output in the terminal while Control-Q restarts it.
- Allow you to enter an end-of-file character.
- Do character conversions. For example, every time a process writes a newline character, the line discipline can convert it to a carriage return and a line feed. Also, tab characters that are output can be converted to spaces if the terminal doesn't handle tab characters.
Many UNIX-like systems provide the utility stty(1), which allows us to view the terminal characteristics and control characters:
$ stty -a
speed 38400 baud; 35 rows; 143 columns;
lflags: icanon isig -iexten echo echoe -echok echoke -echonl echoctl
-echoprt -altwerase -noflsh -tostop -flusho pendin -nokerninfo
-extproc
iflags: -istrip icrnl inlcr -igncr ixon -ixoff ixany imaxbel iutf8
-ignbrk brkint -inpck -ignpar -parmrk
oflags: opost onlcr oxtabs -onocr -onlret
cflags: cread cs8 -parenb -parodd hupcl -clocal -cstopb -crtscts -dsrflow
-dtrflow -mdmbuf
cchars: discard = ^O; dsusp = <undef>; eof = <undef>; eol = <undef>;
eol2 = <undef>; erase = ^?; intr = ^C; kill = ^U;
lnext = <undef>; min = 1; quit = ^\; reprint = <undef>;
start = ^Q; status = <undef>; stop = ^S; susp = ^Z; time = 0;
werase = <undef>;
Refer to the manual of stty(1) for more information about the various terminal characteristics.
Always ensure that the terminal configuration is set to cooked or sane mode. Speaking from experience, if you manage to set the terminal to non-canonical mode and then some error occurs mid-task, your best choice is to simply close the application, as it would be next to impossible to change the mode later on.
The terminal device is a complex topic; it deserves a dedicated blog of its own. For now, the reader should have some idea of what a terminal device does and how we can use it alongside the shell process.
Shell Process
A shell process is a command-line interpreter. There are various flavors of shell programs out there. Some of them are: Bourne Shell (sh), Bourne Again Shell (bash), C Shell (csh), Korn Shell (ksh), and many more. Normally, there are two ways to use the shell: enter commands interactively, or write a shell script that will be processed by the shell.
A shell, similar to a terminal device, is a complicated topic. The evolution of various shell programs brought many features we take for granted today: command history, tab completion, job control, and many others. Nowadays, we also have shell programs that support "plugins": external programs that extend the shell's behavior. For instance, we can override the default prompt of the shell and add extra features such as colors, rich text, and the like.
How a Shell Executes Program
When you enter a line into the shell, it is assumed to be a command. If the last character of the line (before the newline character) is a backslash (\), the shell assumes the command is not complete and allows the user to continue the command on the next line. The line is considered complete only when it does not end with a backslash.
There are two types of commands: builtin shell commands, and programs located in the directories specified by the PATH environment variable. An example of a builtin command is cd, which changes the working directory to the one given as its argument.
I will use the words command, program, and utility interchangeably. Although they might have their own distinct properties, I use them with a common understanding: they are sequences of instructions crafted to achieve a task. I will try my best to use appropriate terminology wherever necessary.
Additionally, I won't dive deeper into environment variables. For now, we can think of them as key-value pairs used by many programs to work in an, well... environment.
When you're working on a command-line interface, you will eventually encounter the many commands available on your system. Understanding what every single one of them does is almost impossible. This is where manuals turn out to be extremely helpful. To query a command, you can simply try:
$ man <command>
and it will display the manual page for <command>. Along with the man(1) utility, other utilities are also provided, such as whatis(1), apropos(1), and which(1). The which(1) utility is unusually helpful when you have the same program (but with possibly different versions) on your system and you aren't sure which one is actually being executed. For example, if you have a program foo that is located in two directories, /bin and /usr/bin, the which(1) program could give the following output:
$ which foo
/bin/foo
If you want to use the other program, you would need to reorder your PATH environment variable (not recommended) or explicitly specify the absolute path of the other program as:
$ /usr/bin/foo
...
When you run a program within your shell program, the shell does the following:
- The shell process forks itself. This means a new process is created that has some characteristics of the process that instantiated it.[0] The newly created process is known as the child process whereas the other one is the parent process.
- The child process is the one that will execute the program. While the child is executing the command, the parent process waits for the child to terminate.[1] The child also inherits the terminal.
- When the program is executed, the child process image (a copy of the shell) is replaced by the image of the new program to be run.
- While the parent is waiting for the child, the child (which is running the program) will use the terminal to interact with the user. The output of the program will be written to the terminal device (via the program's standard output and standard error streams, unless they are redirected) and the user can enter input in the terminal device, which will be echoed to the user as well as passed to the program under execution.
- Once the program is finished (either successfully or not), the parent process is notified of this state change and takes over the terminal again, writing the prompt and waiting for user input. A minimal sketch of this cycle is shown after the list.
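Here is that sketch, hard-coding ls as a stand-in for the parsed command (a real shell would also handle redirections, signals, and job control):
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int
main (void)
{
    pid_t pid;
    int status;
    char *argv[] = { "ls", NULL };

    if ((pid = fork()) == -1)
        return (1);
    if (pid == 0) {                 /* child: replace image with the program */
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if execvp(3) failed */
    }
    waitpid(pid, &status, 0);       /* parent: wait, then print the prompt again */
    return (0);
}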
This is where the distinction between a program and a command should be considered. I deliberately say "run a program" above. The behavior described applies when executing a program, and is not necessarily the behavior when the shell executes a builtin command. For instance, observe the following shell interaction:
$ y=foo
$ echo $y
foo
You will see that the assignment persisted afterwards. In this case, we're doing a shell variable assignment. This is what builtin commands are for: to modify the state of the parent shell. This is also why cd must be a shell builtin command instead of an external program.
Setting Environment Variables for a Program
Imagine that you made a program that uses an environment variable. Web developers would be familiar with this concept, as they have a dedicated .env file that stores such key-value pairs (some even make it publicly available!) to be used by the program eventually. A program can be instrumented to explicitly set an environment variable through a call to the library function setenv(3), but the shell also provides a feature to achieve this. We'll see a simple program in Listing 3, where we fetch the environment variable foo and display the value it contains. For the sake of brevity, I won't use any build system to build the source file shown in Listing 3. The steps to compile the program and the output are shown in Listing 4.
#include <stdio.h>
#include <stdlib.h>
#define FOO_ENV "foo"
int
main (void)
{
char *foo_val;
if ((foo_val = getenv(FOO_ENV)) == NULL) {
fprintf(stderr, "%s not an environment variable.\n", FOO_ENV);
} else {
fprintf(stderr, "%s environment variable has the value: %s\n", FOO_ENV, foo_val);
}
return (0);
}
Environment variables allow the programmer to instrument the program as per their needs. For example, we can create a function that connects to a server specified by the environment variable SERVER_URL. This allows the same source file to work with dynamic values such as URLs. Another use case is enabling a debugging mode: the debug logging can be wrapped inside an if statement, and the program will initially check for the environment variable DEBUG to decide whether debug logs should be displayed.
Script started on Sat Sep 13 21:24:38 2025
bash-5.3$ gcc -Wall -o foo foo_env.c
bash-5.3$ ./foo
foo not an environment variable.
bash-5.3$ foo=bar ./foo
foo environment variable has the value: bar
bash-5.3$ # the environment variable **bar** won't be used below
bash-5.3$ foo=bar bar=baz ./foo
foo environment variable has the value: bar
bash-5.3$ ^D
exit
Script done on Sat Sep 13 21:25:03 2025
The environment variable list is provided as space-separated assignments before the name of the program, allowing the user to provide multiple key-value pairs for the program to work with. This list is appended to the current environment, overriding the previous value if a variable of the same name already exists.
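As an aside, the DEBUG idea mentioned earlier can be as simple as the sketch below; the variable name and message are, of course, hypothetical:
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
    int debug = (getenv("DEBUG") != NULL);  /* presence alone enables it */

    if (debug)
        fprintf(stderr, "debug: starting up\n");
    /* ... the rest of the program ... */
    return (0);
}
Running it as DEBUG=1 ./prog enables the log line; running it bare keeps the output quiet.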
Shell Interpreter
I've mentioned before that a shell is a command-line interpreter. This means that the shell process isn't limited to running programs; it is capable of being used as a programming language. Since the concept of the shell was conceived over half a century ago and a lot has changed since, the syntax understood by the shell process is bewildering. For example, Listing 5 shows a simple shell script that is not written as a standalone script but entered into the shell process itself. Of course, shell scripting isn't intended to be done like that, but we'll explore (one kind of) loop and conditional statement that the shell understands. It is a trivial program that loops six times and checks whether a number is even or odd. We can craft this script another way to achieve the same result; for instance, we can use the test(1) utility to check whether the number is even or odd, as sketched after Listing 5.
Script started on Sun Sep 14 12:39:34 2025
bash-5.3$ for i in {0..5}
> do
> if (( i % 2 == 0 ))
> then
> echo "$i is even"
> else
> echo "$i is odd"
> fi
> done
0 is even
1 is odd
2 is even
3 is odd
4 is even
5 is odd
bash-5.3$ ^D
exit
Script done on Sun Sep 14 12:42:09 2025
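For completeness, the test(1) variant mentioned above could look like the sketch below, using POSIX arithmetic expansion instead of bash's (( )). Note that {0..5} is also a bashism; the explicit list keeps the loop POSIX-compatible:
for i in 0 1 2 3 4 5
do
    if test $((i % 2)) -eq 0
    then
        echo "$i is even"
    else
        echo "$i is odd"
    fi
done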
When we're working interactively with a shell process, it is sometimes necessary to fetch a previous command. The history builtin command is available in most shell programs today. It outputs the commands that the shell has processed. Of course, a shorthand for this is provided: the ! command is used for history expansion. Listing 6 shows some usage of the ! command. The !! command runs the immediately previous command. The !:<num> form extracts the <num>'th argument from the previous command. And lastly, !<num> runs a specific command; the <num> can be fetched from the history builtin command. If <num> is negative, the command that is executed is relative to the current position in the history list. For example, !-1 is equivalent to !!.
Script started on Sun Sep 14 13:13:57 2025
bash-5.3$ echo "first prompt"
first prompt
bash-5.3$ echo "second prompt\n"
second prompt\n
bash-5.3$ !!
echo "second prompt\n"
second prompt\n
bash-5.3$ printf !:1
printf "second prompt\n"
second prompt
bash-5.3$ history
...
208 echo "first prompt"
209 echo "second prompt\n"
210 echo "second prompt\n"
211 printf "second prompt\n"
212 history
bash-5.3$ !-2
printf "second prompt\n"
second prompt
bash-5.3$ !209
echo "second prompt\n"
second prompt\n
bash-5.3$ ^D
exit
Script done on Sun Sep 14 13:15:12 2025
Each shell program provides its own set of builtin(1) capabilities. Refer to the manual for more information. To dig further into this topic requires learning shell scripting, which is an entirely different topic.
Portability
Writing a portable program turns out to be more of a nightmare than imagined. C is notorious for being platform dependent, yet most portable programs are written in this language. We first need to observe how the transformation of source file(s) into an executable is done. Listing 7 shows the pipeline of this process. A C source file is identified by the .c extension. Historically, the executable is generated with the filename a.out unless another name is explicitly specified in the compiler toolchain's arguments. Along with source files, we also frequently use header files, typically having the extension .h. Header files can be used for various purposes: declaring functions, defining structures, #define symbols (or helpful macros), and much more. Some people take this to the next level and provide a header-only library with sufficient functionality to achieve a certain task, say, processing an image of a specific format.
+-----------------+ modified C code +-----------+
Source file------>| C preprocessor |------------------>| Compiler |-----+
+-----------------+ (.i file) +-----------+ |
| compiled assembly code
+-----------------+ object code +-------------+ | (.s file)
Executable<-------| Linker |<------------------| Assembler |<--+
+-----------------+ (.o file) +-------------+
The compilation topic is not easy, to say the least. If you take a look at the manual for gcc(1) or clang(1), you'll witness a myriad of compilation flags you can provide to instruct the compiler driver to create an output (which can be any of the files seen in Listing 7, or others, such as a library). For example, if you want to add an address sanitizer (commonly known as ASan) to your executable, enabling you to debug memory-address-related issues, you can use the -fsanitize=address flag during compilation. Likewise, -fsanitize=undefined adds the undefined-behavior sanitizer (commonly known as UBSan) to your executable, allowing you to get a report on any undefined behavior that might be present in your program.
Although compilers such as gcc(1) and clang(1) are available on many systems, they abstract one of the non-portable components: the linker. Similar to the compiler's manual, the ld(1) manual too is voluminous. The linker is responsible for arranging all the object files, resolving any missing symbols in the object files, and conforming to the system's Application Binary Interface (ABI) to make a binary or a library.
Let's take an example to see how some old linkers behaved. Say that we have three source files: foo.c, bar.c, and baz.c. For this hypothetical scenario, we'll assume that bar.c and baz.c will be made into library files. foo.c uses one of the symbols defined in bar.c, and that same symbol is contingent on another symbol defined in baz.c. The steps would be:
$ gcc -c bar.c -o bar.o
$ ar cr libbar.a bar.o
$ gcc -c baz.c -o baz.o
$ ar cr libbaz.a baz.o
$ # incorrect order
$ gcc foo.c -L. -lbaz -lbar
...
$ # correct order
$ gcc foo.c -L. -lbar -lbaz
Here, a traditional linker does a single pass, searching from left to right: it notes any unresolved symbols and searches each library in turn. Since libbaz.a (the lib prefix is trimmed when supplying the command-line argument) does not contain the symbol used in foo.c, the linker moves on. When it looks into libbar.a, the symbol is found, but it pulls in a new unresolved symbol that lives in libbaz.a. Unfortunately, given that the linker performs a single pass, it won't go back to resolve the symbol that is present in libbaz.a and just throws out an error.
The problem discussed above is explained in more detail in this Stack Overflow thread.
Preprocessor Directive
The word directive is synonymous with the word instruction. There are many directives available to the user, define and include being the most common. The C preprocessor recognizes a directive by its leading # symbol. One rule is that a directive is only processed if the first non-whitespace character of the line containing it is the # character, followed by the name of the directive. This allows us to write programs in a weird way that seems illegal, but is legal after the C preprocessor modifies the source file. Listing 8 shows one of the weird ways to use the include directive. This is mostly possible due to the fact that C does not have strict rules for whitespace within the source file.
$ cat foostr.txt
"bar"
$ cat include_dir.c
#include <stdio.h>
int
main (void)
{
char *str =
#include "foostr.txt"
;
printf("str: %s\n", str);
return (0);
}
$ gcc -Wall -o include_dir include_dir.c
$ ./include_dir
str: bar
The include directive, as the name suggests, includes a file within the file using the directive. Indeed, when using #include <stdio.h>, we're including a file stdio.h that must be available in a hosted environment. The preprocessor replaces this directive with the content of the file.
Beware of the difference between the #include <...> and #include "..." forms. The former locates a file in the standard search paths, such as /usr/local/include or /usr/include (on GNU/Linux). The latter specifies a file relative to the directory of the including source file.
Most compilers provide the -I flag to compile the file with an additional include search path. For instance, if a header is located in /usr/foo/include and your program depends on that header, you'll need to compile with the option -I/usr/foo/include to ensure the compiler can find the header file used by your program.
Of course, this kind of abuse of the C preprocessor isn't found in most places, and for good reason: it makes the code harder to read. Apart from the include directive, we mostly use conditional directives when writing portable programs. It should be clear by now that the file generated by the C preprocessor is the one that will be compiled. The available conditional directives are:
- #if / #elif / #else / #endif
- #ifdef
- #ifndef
- #elifdef (since C23)
- #elifndef (since C23)
- #if __has_include (since C23)[2]
- #if __has_c_attribute (since C23)[2]
Another kind of abuse of the conditional directives is the #if 0 directive. Listing 9 shows a source file that is able to compile itself. Apart from the odd usage of the conditional directive, we're also using it for one of its real purposes: to compile only the piece of code that is required for the target system. Let's first discuss the statements inside the #if 0 and #endif block. The condition #if 0 always evaluates to false, so the statements inside it won't be part of the preprocessed source file. Listing 10 shows the compilation steps. Since we marked the file as executable through chmod(1), we can execute the file. Any line beginning with # in shell scripting is treated as a comment. There's a catch, though. If the # character is followed by the ! character (collectively known as a shebang) and the file is marked as executable, then the rest of that line determines the interpreter used for the executable.
#if 0
#!/bin/sh
gcc -Wall -o self_compile self_compile.c
exit 0
#endif /* 0 */
#include <stdio.h>
int
main (void)
{
#ifdef _WIN32
printf("This program is running on a Windows system.\n");
#elif __APPLE__
printf("This program is running on a macOS system.\n");
#elif __linux__
printf("This program is running on a Linux system.\n");
#else
printf("This program is running on an unknown system.\n");
#endif
return 0;
}
The symbols _WIN32, __APPLE__, and __linux__ are predefined by the compiler for the target platform. If this program is being built on GNU/Linux, then the symbol __linux__ would be visible to the program, allowing us to write Linux-specific code within the block. This is one of the ways to write portable code. Since the C preprocessor takes over our source file, we can construct our source file to behave one way on GNU/Linux, and some other way on, say, an Apple-based system. Take a GNU/Linux-specific system call, for instance: epoll(2). This is an improvement over the poll(2) system call, which itself is an advancement over the select(2) system call.
Unfortunately, apart from build automation systems such as GNU Autotools, CMake, and such, we don't really have a concrete way to determine whether a header declares a specific function. For example, some systems provide the strlcpy(3) function while others do not. Without such tooling, there is no specific method to check for the existence of a function (and fall back to an alternative if it's not available); we only become aware when the compilation fails.
I'll try to describe more when discussing build automation systems. The general sequence (considering GNU Autotools) is: we tell Autotools that we want the xyz function. Autotools will create a simple shell script that builds a trivial C program containing the xyz function. The program is then compiled and linked to check that the symbol can be resolved. If there were no compilation issues, it means that the function is available. Autotools will then create a header file (usually config.h) that has the line #define HAVE_XYZ 1. Now, we can have our source file as:
...
#if HAVE_CONFIG_H
#include <config.h>
#endif /* HAVE_CONFIG_H */
#if HAVE_XYZ
xyz(...);
#else /* fallback to custom implementation */
/* maybe define a function called xyz */
#endif /* HAVE_XYZ */
...
$ ls
self_compile.c
$ chmod +x self_compile.c
$ ./self_compile.c
$ ls
self_compile self_compile.c
$ ./self_compile
This program is running on a macOS system.
Data Type
We are aware of the various data types that are available to us. The definition, according to Wikipedia, is:
A data type is a collection or grouping of data values, usually specified by a set of possible values, a set of allowed operations on these values, and/or a representation of these values as machine types.
A data value, or value, is a representation of some entity that can be manipulated by a program.
Consider a common data type we frequently use: the integer. This data type comes in multiple forms (as seen in the table below), and the width of each form is compile-time information. The table below shows the various integer types permitted by standard C:
| Data Type | Format | Range |
|---|---|---|
| signed short int | %hi or %hd | [SHRT_MIN, SHRT_MAX] |
| unsigned short int | %hu | [0, USHRT_MAX] |
| signed int | %i or %d | [INT_MIN, INT_MAX] |
| unsigned int | %u | [0, UINT_MAX] |
| signed long int | %li or %ld | [LONG_MIN, LONG_MAX] |
| unsigned long int | %lu | [0, ULONG_MAX] |
| signed long long int | %lli or %lld | [LLONG_MIN, LLONG_MAX] |
| unsigned long long int | %llu | [0, ULLONG_MAX] |
I haven't mentioned the other primitive data types available to the programmer. A character, declared as char, is a data type whose width is at least 1 byte. It should be noted that C has three distinct character types: unsigned char with range [0, UCHAR_MAX], signed char with range [SCHAR_MIN, SCHAR_MAX], and char with range [CHAR_MIN, CHAR_MAX]. The floating-point data types available are: float, double, and long double. The Boolean data type is declared as _Bool (bool since C23).
The fields under the Format column, where i, d, and u are used, specify formatting in decimal (base 10). These can be replaced with o to format in octal, and x or X to format in hexadecimal. Note that octal and hexadecimal print in unsigned format.
When writing code, one should also consider arithmetic conversions. For example, we should have some idea of what would happen with a code fragment such as:
...
int x = 100;
long y = x;
short z = y;
...
Covering all the rules for such conversions is not really required here. Instead, interested readers should look up how values are implicitly or explicitly converted between differently-typed variables.
Declaring a variable should be handled with great care. The "syntax" to declare a variable is:
declaration-specifiers declarators ;
where the various declaration-specifiers are:
- Storage classes - auto, static, extern, and register
- Type qualifiers - const, volatile, and restrict (since C99)
- Type specifiers - void, char, short, int, long, float, double, signed, unsigned, and such.
A declaration may have at most one storage class and zero or more type qualifiers. We can combine type specifiers for a declarator, and the ordering does not matter, i.e., int unsigned long is equivalent to unsigned long int.
For declarators that are functions, C99 has a fourth kind of declaration-specifier: the function-specifier. This category has only one member: the inline keyword.
Since C99, a header file is provided exclusively to deal with integers of various widths: <stdint.h>. The Fixed-Width integer types section on Wikipedia provides insight into the various integer data types provided to us.
One thing that might be confusing is the distinction between the fast and least integer data types. In essence, a least integer type is the smallest type that has at least the specified width. A fast integer type is a bit more complex: it has at least the specified width, and operations on it are meant to be quick. For instance, a system could provide an int_fast8_t that is typedef'd to a plain int if operations on int would result in fewer memory accesses or the like. This StackOverflow thread discusses the issue in brief.
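A minimal sketch that makes the distinction observable (the printed sizes are implementation-defined, so your output may differ):
#include <stdint.h>
#include <stdio.h>

int
main (void)
{
    /* exact width vs. smallest-with-at-least vs. fastest-with-at-least */
    printf("int8_t: %zu, int_least8_t: %zu, int_fast8_t: %zu\n",
        sizeof(int8_t), sizeof(int_least8_t), sizeof(int_fast8_t));
    return (0);
}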
We can see that the widths of data types vary. The width of a data type indicates the number of bits required to represent its values. Usually, a short variable occupies 2 bytes, an int occupies 4 bytes, and so on. The C standard only dictates the minimum and maximum values that must be representable by a data type; the width is left to the implementation. The lowest addressable unit is 1 byte, meaning that we cannot have an address that represents a storage location of less than 8 bits. Some texts use the term octet for a byte. Listing 11 shows a simple program that views the memory locations of a multi-byte variable (an integer) and prints out the value stored at each address (in hex). The output of the program is shown in Listing 12. This reveals another implementation-defined property: Endianness.
Various properties of data types have their niche usage. For example, if we have a program fragment such as:
...
#define SOME_BIG_NUM 0xFFFFFF
int k = 0;
for (int i = 0; i < SOME_BIG_NUM; i++) {
k += i;
}
...
and we assume that the loop will iterate SOME_BIG_NUM times, we're bound to be surprised by how compiler optimization works. The compiler is smart enough to learn that this particular fragment evaluates to some constant value and optimize it out, using as few CPU cycles as possible (Conditional Constant Propagation). Maybe it does this by keeping the value of k in a register until the final evaluation, or by computing the result ahead of time.
To ensure that the memory address of k is accessed at each iteration, we need to notify the compiler that the given variable is volatile. C has a keyword for that and, not surprisingly, it's called volatile. Only if we declare k as volatile int k will the loop run as per our expectation.
#include <stdio.h>
int
main (void)
{
int var;
char *ptr;
var = 0x10f2c;
ptr = (char *) &var;
/*
* We could also use 'size_t' to declare 'i' instead of 'int'.
* This is just for illustration purpose only.
*/
for (int i = 0; i < sizeof(int); i++) {
// printf("Address %p has the value: 0x%x\n", ptr + i, *(ptr + i));
printf("%02x ", *(ptr + i));
}
printf("\n");
return (0);
}
Endianness
Wikipedia has a rather verbose definition of endianness:
Endianness is the order in which bytes within a word data type are transmitted over a data communication medium or addressed in computer memory, counting only byte significance compared to earliness.
If you have a variable that stores a value in 16 bits (2 bytes/octets) and you want to store a value into that variable, there are two ways to lay out the data: use the lower address to store the least-significant byte (Little Endian), or use the higher address to store the least-significant byte (Big Endian). In our example, we have the value 0x10f2c, which can also be represented with leading zeros as 0x00010f2c. Here, 0x2c is the least-significant byte. Also note that in Listing 11 the address increases with each iteration, so the first output represents the value stored at the lowest address. Listing 12 proves that my machine is Little Endian.
Most general-purpose systems nowadays use the Little Endian convention to store values. Some systems, such as the IBM System/360, use Big Endian byte ordering. Apart from this, another place where endianness matters is the transmission of data over a network. By convention, the byte ordering of data over a network is Big Endian. Hence, you'll notice network programming macros such as htons (host-to-network short) that help convert a value from the host's byte order (usually Little Endian) to network order (Big Endian). It is the job of the implementation to ensure that macros like htons do nothing if the host byte order and the network byte order are identical.
Apart from some configuration here and there, we don't really deal with endianness too much during network programming. The kernel helps a ton by correcting the byte order before we receive the data in user space.
$ ls
endianness.c
$ gcc -Wall -o endianness endianness.c
$ ./endianness
2c 0f 01 00
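To see the byte-order macros from the previous paragraphs in action, here is a minimal sketch (the port number is an arbitrary example):
#include <stdio.h>
#include <arpa/inet.h>

int
main (void)
{
    unsigned short port = 8080;
    unsigned short wire = htons(port);  /* host order -> network (big-endian) order */

    printf("host: 0x%04x, network: 0x%04x\n", port, wire);
    /* on a big-endian host, htons() is a no-op and both values match */
    return (0);
}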
Bit-field, and union to the rescue?
A bit-field is not a data type; instead, it is a data structure that maps one or more adjacent bits which have been allocated for specific purposes, so that any single bit or group of bits within that structure can be set or inspected. Notice that we don't use the term address as a means of setting bits within the bit-field. This is because we cannot take the address of members of a bit-field; such an operation is illegal, and the compiler will throw an error if the & operator is used before a bit-field member. Listing 13 shows a program that gives us a way to view memory in a rather interesting way through the use of a union. Of course, the program is a hypothetical one, but it's still worth studying to understand how memory is laid out and how operations are performed. Listing 14 shows the output of the program, and we'll inspect it further to understand different aspects of memory.
Apart from addressing, the sizeof operator is also invalid for a member of a bit-field. For instance:
...
struct Flags {
unsigned int overflow : 1; /* 1 bit for this member */
unsigned int jump : 3; /* 3 bits for this member */
...
};
struct Flags foo_status;
/*
* Illegal operation
*/
char *ptr = &foo_status.overflow;
/*
* Illegal operation
*/
size_t jmp_size = sizeof(foo_status.jump);
...
Before we move on to the program, we need to understand why the union data type even exists. Unlike a struct, where each member of the structure has its own dedicated memory block, a union is only large enough to hold the largest member of the union. Without considering any padding of structure members, the size of the structure struct { char a; int b; }; would be 5 bytes, assuming the size of an integer on the machine is 4 bytes.
Structures are padded to ensure the alignment of members for the given system. If the memory is not aligned, or the structure is packed, then member access may be penalized due to misaligned memory. The padding is done because some computers require the addresses of data items to be multiples of a certain power of 2. We know that a char only takes one byte; assuming the address of member a is 0x1000, if there were no padding then the address of member b would be 0x1001.
Assuming the system uses 4 bytes for int, the address of b must be aligned to a 4-byte boundary. Through padding we satisfy this, and the compiler leaves a hole at the addresses 0x1001, 0x1002, and 0x1003. The address of b would be 0x1004.
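Before moving on to Listing 13, the padding claim above is easy to verify with the offsetof macro from <stddef.h>; a minimal sketch (the offsets assume a typical 4-byte int):
#include <stddef.h>
#include <stdio.h>

struct padded {
    char a;     /* offset 0 */
    int b;      /* typically offset 4: 3 bytes of padding precede it */
};

int
main (void)
{
    printf("offsetof(a): %zu, offsetof(b): %zu, sizeof: %zu\n",
        offsetof(struct padded, a), offsetof(struct padded, b),
        sizeof(struct padded));
    return (0);
}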
#include <stdio.h>
#include <stdint.h>
/*
* A bit-field. Notice the second member that is unnamed and is of width 0.
* A 0-length bit-field is a signal to the compiler to align the following
* bit-field at the beginning of a storage unit.
* An unnamed member can be declared, and it can have any width including 0.
* A named member must have non-zero width.
* Storage units are implementation defined and the size of a storage unit
* is typically 8 bits, 16 bits, and 32 bits.
*/
typedef struct bitfield {
unsigned long x : 5;
unsigned long : 0;
unsigned int foo : 7;
unsigned int bar : 3;
unsigned int baz : 22;
} unpacked_bitfield;
/*
* A structure that will be used as an addressable unit to check the value
* in the respective address.
*/
struct foo {
uint64_t first;
uint64_t second;
};
/*
* A union whose members are same size. This is not strictly needed for our
* illustration, but it's easy to view how memory is being handled.
*/
typedef union new_view {
unpacked_bitfield unpacked;
struct foo addressable;
} BitField;
/*
* Helper functions that print out the content of the bit-field structure
* as well as the union.
*/
void print_bitfield (unpacked_bitfield *addr);
void print_64bits (BitField *addr);
int
main (void)
{
BitField stackvar = { 0 };
stackvar.unpacked.x = 0xF;
stackvar.unpacked.foo = 0x1; /* 0b-0000-001 */
stackvar.unpacked.bar = 0x2; /* 0b-010 */
stackvar.unpacked.baz = 0x3456; /* 0b-00-0000-0011-0100-0101-0110 */
/*
* Addressing of bitfield is not allowed. We're only assuming.
* address of foo < address of bar < address of baz
* In Little Endian, least significant byte is stored in lower address.
* In the representation below, from left to right, higher address
* to lower address.
*
* 0b-0000-0000-1101-0001-0101-1001-0000-0001
* ^ ^
* +-------------------------+
* baz ^ ^
* +--+
* bar
* ^ ^
* +------+
* foo
*/
fprintf(stdout, "sizeof(unsigned long): %zu and sizeof(unsigned int): %zu\n", sizeof(unsigned long), sizeof(unsigned int));
fprintf(stdout, "The size of the union is: %zu\n", sizeof(BitField));
print_bitfield(&stackvar.unpacked);
print_64bits(&stackvar);
return (0);
}
void
print_bitfield (unpacked_bitfield *addr)
{
fprintf(stdout, "structure address: %p\n"
"sizeof(structure): %zu\n"
"x: %d\n"
"foo: %d\n"
"bar: %d\n"
"baz: %d\n", addr, sizeof(unpacked_bitfield), addr->x, addr->foo, addr->bar, addr->baz);
}
void
print_64bits (BitField *addr)
{
unsigned char *ptr;
fprintf(stdout, "The address we're viewing is: %p\n", (ptr = (unsigned char *) &addr->addressable.first));
for (int i = 0; i < sizeof(uint64_t); i++) {
fprintf(stdout, "%02x ", *(ptr + i));
}
fprintf(stdout, "\n");
fprintf(stdout, "The address we're viewing is: %p\n", (ptr = (unsigned char *) &addr->addressable.second));
for (int j = 0; j < sizeof(uint64_t); j++) {
fprintf(stdout, "%02x ", *(ptr + j));
}
fprintf(stdout, "\n");
}
The program shown in Listing 13 does not have any real-world purpose. This does not imply that unions are useless in any way. In fact, they are especially useful when doing low-level work. For example, we could have a few type definitions such as typedef unsigned char BYTE; and typedef unsigned short WORD;, and create a union as: union { struct { WORD ax, bx, cx, dx; } word; struct { BYTE al, ah, bl, bh, cl, ch, dl, dh; } byte; } regs; that represents the registers of an architecture (x86 in this case).
$ ls
bitfield.c
$ gcc -Wall -o bitfield bitfield.c
$ ./bitfield
sizeof(unsigned long): 8 and sizeof(unsigned int): 4
The size of the union is: 16
structure address: 0x16f1a2ed0
sizeof(structure): 16
x: 15
foo: 1
bar: 2
baz: 13398
The address we're viewing is: 0x16f1a2ed0
0f 00 00 00 00 00 00 00
The address we're viewing is: 0x16f1a2ed8
01 59 d1 00 00 00 00 00
What's happening? What does this output even say? The unpacked_bitfield structure consumes 16 bytes. We have two distinct groups of bit-fields: one of type unsigned long and the other of type unsigned int. A long has a width of 8 bytes on my machine while an int has a width of 4 bytes. Only 5 bits out of 64 (the 8 bytes of the long) are being used. All of the int bits are being used: 7 + 3 + 22 = 32. The compiler added extra padding to make sure the memory addresses of the members are aligned as needed.
Recall the byte ordering done by the system. Viewed through the union, unpacked_bitfield occupies two storage units: an unsigned long and an unsigned int. The first view of memory seems normal. The second one seems scattered all over the place. To make things a bit easier, I've commented the bit representation of the unsigned int fields next to their assignments. Given that my machine follows Little Endian byte ordering, it's safe to assume that foo has the lower address and baz has the higher address. I've added a comment in the source file in Listing 13 to make it easier to visualize how the bit-field is filled. The least-significant byte is the one with the lowest address, and in the comment's representation the address increases as we move to the left. This is how Little Endian ordering works. If you compare it with the output in Listing 14, you'll see that it matches.
Memory alignment is done not only to increase performance; it is also the default because some architectures (notably SPARC) prohibit unaligned memory access. Even if a system allows unaligned memory access, it sometimes triggers a runtime memory-access violation if the program was built with ASan. This mostly occurs when we explicitly cast a buffer (usually an array of chars) into a specific data type (like struct timeval) while the system assumes the memory is aligned. To fix this, one must declare a variable (say, a struct timeval for our example) and then use a function such as memcpy(3) to copy the content of the buffer instead of casting it and retrieving the value.
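A minimal sketch of that fix, assuming a hypothetical buffer buf that carries the bytes of a struct timeval:
#include <string.h>
#include <sys/time.h>

void
read_timeval (const char *buf, struct timeval *out)
{
    /* *(const struct timeval *) buf would assume alignment; this doesn't */
    memcpy(out, buf, sizeof(*out));
}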
When doing low-level programming, you'll come across the word word. It is the processor design's natural unit of data. Usually, the computer's registers are word-sized, and the largest datum that can be transferred to and from memory in a single operation is a word on most (not all) architectures. Historically, Intel used the term word to describe a data unit 16 bits wide. Likewise, double word described a 32-bit data unit, while quad word described a 64-bit data unit. Unfortunately, the naming convention only gets worse when we move to ARM-based architectures. In the ARM world, a word describes a 32-bit data unit, half word describes a 16-bit data unit, and xword is used for a 64-bit data unit.
Generally speaking, the size of the pointer is equivalent to the word size of the processor.
Before we wrap this up, I want to state one property of memory alignment:
If an address is aligned to a boundary of size 2^n, it is also aligned to any boundary of size 2^m, where m < n.
To check whether a memory address (say, a) is aligned to a power-of-two boundary n, we can use either of:
- (a mod n) == 0
- (a & (n - 1)) == 0
In the first case, the modulo operation is performed in C using the % operator. For the second one, we perform a bitwise-AND operation to check whether the address a is aligned to the boundary n.
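Both checks translate directly to C; a minimal sketch, using uintptr_t to treat the address as an integer:
#include <stdint.h>
#include <stdio.h>

#define IS_ALIGNED_MOD(a, n) (((uintptr_t)(a) % (n)) == 0)
#define IS_ALIGNED_AND(a, n) (((uintptr_t)(a) & ((n) - 1)) == 0)

int
main (void)
{
    int x;

    printf("&x 4-byte aligned? mod: %d, and: %d\n",
        IS_ALIGNED_MOD(&x, 4), IS_ALIGNED_AND(&x, 4));
    return (0);
}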
Writing Header Files
A header file usually contains the following components:
- Header guard. If a header file contains type definitions, recursive header inclusion results in a compilation error. The convention is to define a symbol such as #define FILENAME_H if the symbol wasn't previously #defined. If it was, nothing from the header file will be included.[3]
- Function signatures that allow us to call the functions. The program calling a function needs to be compiled along with the object file that contains the implementation of said function.
- If you intend to make a library, make sure to add versioning functionality. For example, have a dedicated header file that contains the library's version information, exposing symbols such as #define FOO_MAJOR_VER 1. This will be useful when you intend to add functionality to the library without conflicting with external source programs.
- Symbols. Say that you have a function (in the public header file) whose job is to return a character string which is a header template for an arbitrary protocol. One of the arguments the function takes is an integer that represents the protocol the function should work with. Instead of letting the user guess the integer value, #define all the possible symbols. The implementation can fail if the argument is invalid.
- Type definitions are useful if you want objects that have some properties, along with functions that work on those objects. One example of this is data structures. You can have a header file that declares an identifier stack_obj and methods that work with the object, such as stack_push and stack_pop. Many header files provide shorthand macros for verbose function names, something like #define fn_name large_function_name, where large_function_name is the identifier of a function declared in the header file. Make sure to implement shorthand macros within a conditional directive so the user can opt out if a name conflict is a possibility when building the translation unit.
and much more! One should know that writing something like int x; in a header file is technically defining it; this is often called a tentative definition. This is also why there's an explicit extern storage class as a declaration-specifier. Read more about this on: External and tentative definitions. We discuss the potential outcome of having tentative definitions of variables below, after a short sketch of a header that ties the list together.
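Every name in this sketch (foo.h, FOO_PROTO_*, foo_header_template) is hypothetical:
#ifndef FOO_H
#define FOO_H

/* versioning */
#define FOO_MAJOR_VER 1
#define FOO_MINOR_VER 0

/* symbols the caller passes instead of guessing integers */
#define FOO_PROTO_TCP 0
#define FOO_PROTO_UDP 1

/* function signature; the implementation lives in an object file */
char *foo_header_template(int proto);

#endif /* FOO_H */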
A global variable that is not initialized is kept in the block starting symbol segment, or the .bss section of the object file. In essence, items in the .data section are explicitly written into the executable file, but for the .bss section the only information the object file contains is the total memory size required to hold all the uninitialized global variables. It is the program loader that allocates enough memory during runtime, making the executable thinner. This StackOverflow thread discusses the need for the .bss segment.
This is very much a platform-specific problem. For example, when compiling a project that includes a header file containing a variable definition, the compilation can either succeed or fail.
On macOS, I noticed that there was no compilation error. I tried to understand why it didn't throw any multiple-definition error. I hypothesize that this is partly due to how the tentative variables are kept in a separate region: the __common section (nm(1) uses the C symbol to indicate this section). The __common section is not considered a real section; it won't have any virtual memory, but acts as a placeholder that tells the linker which variables will be put into the common section. In the final executable, if you use nm(1) to look up the symbol table of the binary, you'll see the uninitialized variables are in a small data section (nm(1) will use the S symbol). This is often referred to as the .sdata or .sbss section, and is a specialized area for storing small initialized or uninitialized global variables.
I actually found this error when compiling a project on GNU/Linux. On this platform, when you create an object file, all the global uninitialized variables are put into the .bss section. In gcc(1), the default behavior is that of the -fno-common flag. GNU's -fcommon documentation mentions this:
The default is
-fno-common, which specifies that the compiler places uninitialized global variables in the BSS section of the object file. This inhibits the merging of tentative definitions by the linker so you get a multiple-definition error if the same variable is accidentally defined in more than one compilation unit.
Using -fcommon instructs the compiler to place the uninitialized variables in a common block. It is the linker that resolves all the tentative definitions of the same variable in different compilation units to the same object, or to a non-tentative definition.
Most newer code bases probably do not suffer from this issue, and even the documentation page states:
It [
-fcommon] is mainly useful to enable legacy code to link without errors.
Build System
High-level programming languages like C abstract many low-level details. There is no notion of a data type in machine-level instructions, and what we call a function is usually referred to as a routine. There is, however, a convention per architecture. We usually call this the Application Binary Interface (ABI). For example, when a function has a return type specified and we call said function on x86_64, the rax register holds the return value. On AArch64, the x0 register holds the return value.
Another thing to consider is the difference between a stack variable and a global variable. All variables inside a function whose storage class is not static or extern are stored on the stack. Indeed, the variables inside the main function are stored in the stack frame for the main routine. This helps us understand the scope of a variable. It should also be known that the runtime stack has its own limitation: usually, the stack size cannot exceed 8 MB.
Why am I talking about low-level details in a build system section? Well, a build system helps us get information about the target system where the code will be compiled. make(1) is one of the earlier tools to build your project. If you're going to use make(1) as your only build tool, you need to realize that make(1) does some magic, and you need to be aware of it to capitalize on the power of make(1). Concepts such as the wildcard function, string substitution, automatic variables, and such should not be a nuisance to you. For example, you might see a Makefile (or makefile, or GNUmakefile) with content such as:
foo: bar.o baz.o
$(CC) $(CFLAGS) -o $@ $^
bar.o: bar.c bar.h
$(CC) $(CFLAGS) -c $< -o $@
In the given Makefile, $@, $^, and $< are automatic variables. Refer to the Automatic Variables section of the GNU make manual for more information.
Furthermore, a Makefile may contain various assignment flavors: recursive expansion, simple expansion, conditional assignment, and appending assignment, as shown below.
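The four flavors look like this:
CC       = cc              # recursive expansion: right side expanded at use
SRCS    := foo.c bar.c     # simple expansion: right side expanded immediately
PREFIX  ?= /usr/local      # conditional: assigned only if not already set
CFLAGS  += -Wall           # appending: adds to the variable's current value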
If you still want to use make(1) as your build system, you might want to take a look at the Linux source code; you'll gain a better sense of make(1)'s capabilities by browsing its Makefile. The GNU make manual covers all aspects of the make(1) program and of writing good Makefiles.
One use of a build system is to adjust compiler options according to the system's capabilities. Listing 15 shows a source file that checks whether the system is Apple-based or GNU/Linux-based and performs a distinct action in each case. If we plan to use make(1), we can write the Makefile as:
UNAME_SYS := $(shell uname -s)
ifeq ($(UNAME_SYS), Linux)
CFLAGS += -DIS_LINUX
endif # Linux
ifeq ($(UNAME_SYS), Darwin)
CFLAGS += -DIS_DARWIN
endif # Darwin
...
and depending on the platform where the project is compiled, the appropriate symbol will be defined. In our example above, if the system is Apple-based, IS_DARWIN will be visible to the source file; if the system is GNU/Linux-based, IS_LINUX will be.
#include <stdio.h>
int
main (void)
{
#if defined(IS_DARWIN)
printf("Project built in Apple-based system\n");
#endif /* IS_DARWIN */
#if defined(IS_LINUX)
printf("Project built in GNU/Linux-based system\n");
#endif /* IS_LINUX */
return (0);
}
The compiler option -D has the following syntax: -D<macroname>=<value>. This adds an implicit #define into the predefines buffer which is read before the source file is preprocessed.
Another option is available as well: -U. This has the syntax: -U<macroname>. This adds an implicit #undef into the predefines buffer which is read before the source file is preprocessed.
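As an illustration, Listing 15 could be built by hand, without the Makefile above, by passing the symbol directly; the file name listing15.c is a placeholder:
$ cc -DIS_LINUX -o demo listing15.c
$ ./demo
Project built in GNU/Linux-based system
$ cc -DIS_DARWIN -o demo listing15.c
$ ./demo
Project built in Apple-based system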
The ability to compile a set of statements inside conditional directives, based on the machine the program is built on, is quite handy. Unfortunately, it also means that the source file can become cluttered with conditional directives that achieve the same task on different platforms. Indeed, this is a large part of why portability is possible in a language like C.
GNU Autotools
By now, you must have a rough idea of what sort of things one should consider when building portable software. Some general recommendations are:
- If you're starting out, stick to one C standard and one POSIX specification. C99 is a great baseline, and the POSIX 2001 specification exposes sufficient system functionality to work with; POSIX 2008 is also a good specification to start with.
- When performing arithmetic, know the width of the data types involved in the operation; this avoids problems related to overflow and underflow.[4]
- If you use a system call to achieve a certain task (which you definitely will), write a wrapper function that checks which platform the program is running on and acts accordingly. For example, if we have multiple file descriptors to poll, we can use epoll(2) on a GNU/Linux-based system, or fall back to poll(2) or select(2) on others; see the sketch after this list. The wrapper function abstracts away the implementation-specific details and helps the user achieve the task.
- If you use a library function in your program (which you definitely will), make sure the function is available on the system. Most library functions can be implemented by hand (or found on the web), so you can check for their availability and keep those calls within conditional directives. If a function is not available, fall back to a custom implementation or copy someone else's (and credit the author, though!).
- If you want to make a library instead, separate the interface from the implementation. The interface should expose a public API that other programs can call, while the implementation needs to be crafted to handle every platform you intend to support. It is also good practice to handle API versioning to avoid conflicts in later releases. Instead of relying on primitive data types, provide structures, either transparent or opaque; the latter is preferred. The structure should contain all the members necessary for a specific operation. Many projects use this technique and call the structure a context.
- Know the compiler that will be used to create the program from your source file(s). In most cases, knowledge of gcc(1) and clang(1) will be sufficient, but each has its own specific compiler options. If you have a program that is not behaving as expected, try a debugger such as gdb(1) or lldb(1) and check whether the compiler has performed any optimization. It's not really practical to ask the reader to understand all the compiler flags, but knowledge of the compiler's capabilities can be helpful at times.
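Here is a minimal sketch of the wrapper idea from the list above. The name wait_readable is invented and error handling is kept to a minimum; it waits for a single descriptor to become readable, using epoll(7) on GNU/Linux and poll(2) elsewhere:
#include <stdio.h>

#if defined(__linux__)
#include <sys/epoll.h>
#include <unistd.h>

int
wait_readable (int fd, int timeout_ms)
{
    struct epoll_event ev, out;
    int epfd, n;

    if ((epfd = epoll_create1(0)) == -1)
        return (-1);
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        close(epfd);
        return (-1);
    }
    n = epoll_wait(epfd, &out, 1, timeout_ms);
    close(epfd);
    return (n);    /* 1: readable, 0: timeout, -1: error */
}
#else
#include <poll.h>

int
wait_readable (int fd, int timeout_ms)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;
    return (poll(&pfd, 1, timeout_ms));    /* same return convention */
}
#endif

int
main (void)
{
    if (wait_readable(0, 1000) == 1)    /* wait up to 1 s on stdin */
        printf("stdin is readable\n");
    return (0);
}
The caller only ever sees wait_readable; the platform-specific machinery stays behind the conditional directives.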
GNU Autotools is a set of programs primarily used to configure a project based on the capabilities of the system. Along with configuring the project, it also provides programs that create the Makefiles used to build it. The programs are:
- aclocal - Generate aclocal.m4 by scanning configure.ac.
- autoheader - Create a template header for configure.
- automake - Generate Makefile.in files for configure from Makefile.am.
- libtool - Provide generalized library-building support services.
- autoconf - Generate configuration scripts.
- autoreconf - Update generated configuration files.
The libtool program on Apple systems is not identical to GNU's libtool. You will probably need to install the GNU one separately through a package manager such as Homebrew. Even after that, if you try the which command, you'll see:
$ which libtool
/usr/bin/libtool
One way to check if it is GNU's libtool is to use the --version flag:
$ libtool --version
error: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: unknown option character `-' in: --version
...
As you can see, this is not GNU's libtool. I have it installed through Homebrew, which prefixes it as glibtool:
$ which glibtool
/opt/homebrew/bin/glibtool
$ glibtool --version
glibtool (GNU libtool) 2.5.4
Copyright (C) 2025 Free Software Foundation, Inc.
License GPLv2+: GNU GPL version 2 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Originally written by Gordon Matzigkeit, 1996
(See AUTHORS for complete contributor listing)
Most modern projects use the file configure.ac to describe the project's requirements and all other details; it should be noted that older projects used the file configure.in for the same purpose. As for the name, m4 stands for macro, abbreviated the same way Kubernetes becomes K8s: the first letter followed by a count of the remaining letters.
In the early stages of your project's development, you won't need to deal with all the files that the GNU Autotools deal with. The m4 macros used in your configure.ac are provided for you; there is no need to write them yourself. The only files you'll be writing are configure.ac, from which the configure shell script is created, and Makefile.am, which provides a simple syntax to describe how to build the source files in your project.
The file configure.ac is used by all of the programs mentioned above. The configure shell script itself is created by autoconf (or by autoreconf, which drives it).
Another program, autoscan, is also available to the user. It scans the project directory and suggests macros that might be useful to make the project more portable.
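A typical first run might look like this (the directory contents are illustrative); autoscan writes its suggestions to configure.scan, which you can review and rename:
$ autoscan
$ ls
autoscan.log  configure.scan  src/
$ mv configure.scan configure.ac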
In essence, a configure.ac file contains information such as the following (a minimal sketch appears after the list):
- Boilerplate, options, and programs - macros such as AC_PREREQ, AC_INIT, and AC_PROG_CC describe basic information about the project and the programs required to build it.
- Libraries and headers - macros such as AC_HEADER_STDC, AC_CHECK_HEADERS, and AC_CHECK_LIB ensure that the system contains all the necessary libraries and header files.
- Type definitions and structures - macros such as AC_C_CONST and AC_TYPE_PID_T check whether the compiler provides certain type qualifier keywords or whether the system exposes certain type definitions.
- Custom macros - one can define a macro in a .m4 file (or inside an m4 directory) and use it for a specific purpose. If an m4 directory exists in your project, you need to specify it (preferably in the boilerplate section) through the macro AC_CONFIG_MACRO_DIR.
- Functions - macros such as AC_CHECK_FUNCS and AC_REPLACE_FUNCS check for the existence of functions provided by the system. One can also request that the script link against a separate library (that includes the missing functions) if the system does not provide them.
- Output - specify the files that the configure script should produce.
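Putting the sections above together, a minimal configure.ac might look like the following sketch; the project name foo, the version, and the paths are placeholders:
dnl Boilerplate, options, and programs
AC_PREREQ([2.69])
AC_INIT([foo], [0.1], [bugs@example.org])
AC_CONFIG_SRCDIR([src/foo.c])
AC_CONFIG_AUX_DIR([config])
AC_CONFIG_MACRO_DIR([m4])
AC_CONFIG_HEADERS([config.h])
AM_INIT_AUTOMAKE([foreign])
AC_PROG_CC
dnl Libraries, headers, and functions
AC_CHECK_HEADERS([unistd.h])
AC_CHECK_FUNCS([strlcpy])
dnl Output
AC_CONFIG_FILES([Makefile src/Makefile])
AC_OUTPUT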
GNU Autotools creates a set of specialized scripts that allow the user to obtain the information necessary to configure and build the project. Most of the checks are cached so that subsequent runs of the tools execute faster.
Because the macros cache their results in shell variables, configure.ac can inspect those results later on. If you use the macro AC_CHECK_FUNC as AC_CHECK_FUNC([strlcpy]), the result is cached in the shell variable ac_cv_func_strlcpy, and we can adapt configure.ac to our needs:
...
AC_CHECK_FUNC([strlcpy])
if test "$ac_cv_func_strlcpy" = yes; then
AC_DEFINE([HAVE_STRLCPY], [1], [Define to 1 if you have the 'strlcpy' function.])
else
echo "[WARN] System does not have strlcpy function."
fi
...
Another file that you'll work with is Makefile.am. This is a high-level build description that is far more readable than a typical Makefile. Unfortunately, we still don't have much leeway when writing Makefile.am (for reference, check out The Uniform Naming Scheme section), but it is human-readable and direct. I would advise the reader to look into the article amhello's Makefile.am Setup Explained on GNU's website. The automake program takes Makefile.am as input and creates Makefile.in, which the configure script then uses to produce the final Makefile.
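As a taste of the syntax, a hypothetical src/Makefile.am for a program foo built from two source files can be as small as:
# src/Makefile.am -- sketch; names follow the uniform naming scheme
bin_PROGRAMS = foo
foo_SOURCES = foo.c bar.c foo.h
foo_CFLAGS = -Wall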
The entire process might seem convoluted and intimidating, but the good thing is that we have tons of documentation available on the web (or through GNU's FTP server). The GNU Project Build Tools is a fantastic reference for all the intricate details of the GNU Autotools ecosystem.
We aren't limited to creating executables with Makefile.am. For example, if you have a lib directory within your project, you can write ./lib/Makefile.am with the information needed to build a library out of the source files within lib. You can also specify the public headers that will be available to users of the library.
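Assuming LT_INIT appears in configure.ac, a hypothetical ./lib/Makefile.am that builds a libtool library and installs one public header might read:
# lib/Makefile.am -- sketch; libfoo and the file names are placeholders
lib_LTLIBRARIES = libfoo.la
libfoo_la_SOURCES = foo.c util.c
include_HEADERS = foo.h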
Similar to how we can adjust the compiler options in a Makefile, Makefile.am allows the user to add extra flags for specific files used in compilation. It can be hard to follow the ordering when multiple programs are used to build the project: for instance, configure may be set up with some default compilation flags that conflict with the explicit flags we added in Makefile.am. Flag Variables Ordering is an article that explains how compilation flags are ordered in your project.
Lastly, autoreconf is one of the programs that deserves appreciation. It handles all the details that would otherwise need to be spelled out explicitly if we used autoconf directly. In a simple case, the following sequence of commands would be run in a shell script, as seen in Listing 16. We'll use the following autoreconf flags: -f to force the run and treat all generated and standard files as obsolete, -v to get verbose output, and -i to copy missing standard auxiliary files.
$ ls -F
Makefile.am configure.ac
bootstrap* lib/
build/ src/
config/
$ # 'config/' will contain the auxiliary files,
$ # so that we don't have the project directory cluttered.
$ cat bootstrap
#!/bin/sh
autoreconf -fvi
$ ls -F src
Makefile.am foo.h
bar.c
foo.c
$ # Create the 'configure' script based on your 'configure.ac' and other files
$ ./bootstrap
...
...
...
$ # It is generally a good idea to build the project in a separate directory
$ cd build
$ # Run the configure script to build the 'Makefile's required for the project
$ ../configure
$ # Now you can run 'make' and build the project
$ make
...
$ # We are still inside 'build'. But 'configure'
$ # script--if properly configured--will create necessary files and directories
$ ./src/foo
Hello, World!
References
- [0] A process can "clone" itself and make a new process. The process that is cloning is said to be the parent process, whereas the newly created process is known as the child process. We need to request the kernel to do so; this is often termed a system call. Historically, the fork(2) system call was created to do this job. Nowadays, some kernels (notably GNU/Linux) provide other system calls such as clone(2) (and the posix_spawn(2) call) to achieve this task, with more fine-tuning of the child's characteristics. Refer to your system's manual for more information.
- [1] The system provides the execve(2) system call to execute a file. There exist multiple library functions, standardized by POSIX, that are essentially wrappers over the execve(2) call. For waiting, there are multiple system calls too, wait(2) being the simplest. In essence, the wait(2) system call blocks the calling process until a child process terminates or (if configured properly) is stopped or resumed by a signal.
- [2] Although the preprocessor operators __has_include and __has_c_attribute are not conditional directives per se, they are used to check whether the system exposes the provided header or whether the compiler supports the provided attribute, as defined by the C standard or by the compiler.
- [3] If, for some reason, you want to do something even when the symbol is already #defined, you can do so as:
#ifndef FOO_H
#define FOO_H
...
/*
 * one can also use an `#elif` here, but that's unusual,
 * and so is the trick below...
 */
#endif /* FOO_H */
/* Lines below the #endif are processed even when FOO_H was already defined */
#define STILL_DEFINED
- [4] It is not always possible to work with numbers whose width is limited to 64 bits or even 128 bits. Cryptography is one of those fields that heavily relies on operating on large numbers; the numbers can scale up to 2048 bits. It's fascinating to think that we can do such calculations with good precision. The GNU Multiple Precision Arithmetic Library is one of the projects that provides interfaces for working with numbers that cannot be represented by the word size of the system.