The Unix Model
Programs
I wrote the program fragments shown below long ago. They are included to introduce the reader to some of the interfaces available on most UNIX-like systems. Although these programs see limited use in modern software, several admonitions have recently been added to discuss the interfaces they use, along with implementation specifics on Darwin and Linux.
Hello World
- Program
- Output
$ ./hello_world
Hello, World!
Startup Function
- Program
- Output
$ ./startup
[LOG] The only argument provided is the function name, which is: ./startup
$ ./startup foo bar baz
The provided arguments are:
1. ./startup
2. foo
3. bar
4. baz
Environment List
- Program
- Output
$ ./environ
[LOG] No additional arguments provided to the main process.
[LOG] The environment variables are:
0. TERM_SESSION_ID=w2t0p0:645A4F1F-1A9F-4F7A-AA69-50A58C471244
...
7. SHELL=/bin/zsh
...
10. LC_CTYPE=UTF-8
...
28. HOMEBREW_PREFIX=/opt/homebrew
...
$ ./environ foo
[LOG] The provided arguments are:
1. ./environ
2. foo
[LOG] The environment variables are:
0. TERM_SESSION_ID=w2t0p0:645A4F1F-1A9F-4F7A-AA69-50A58C471244
...
$ FOO=BAR ./environ
[LOG] No additional arguments provided to the main process.
[LOG] The environment variables are:
0. TERM_SESSION_ID=w2t0p0:645A4F1F-1A9F-4F7A-AA69-50A58C471244
...
43. FOO=BAR
...
$ BAR=BAZ ./environ2 fox
[LOG] The provided arguments are:
1. ./environ2
2. fox
[LOG] The environment variables are:
0. TERM_SESSION_ID=w2t0p0:645A4F1F-1A9F-4F7A-AA69-50A58C471244
...
43. BAR=BAZ
...
Check Environment Variable
- Program
- Output
$ ./check_env
HOME = /Users/pranavramjoshi
[LOG] The current address stored in env_var_ptr is: 0x16fb8b9fc
TERM = xterm-256color
[LOG] The current address stored in env_var_ptr is: 0x16fb8b9e8
SHELL = /bin/zsh
[LOG] The current address stored in env_var_ptr is: 0x16fb8b686
Process IDs
The file process_id.c contains a comment mentioning the special processes with process IDs 0 and 1.
On XNU (the kernel of the Darwin operating system), the process with process ID 0 is known as kernel_task. Mach (the microkernel found in the hybrid XNU kernel) provides a routine called task_for_pid, which returns the task information for the given process ID. Some exploits were found that targeted process ID 0, and XNU's implementation of task_for_pid now strictly forbids obtaining information about process ID 0.
The process with process ID 1 on Darwin is launchd. The manual for launchd(8) mentions the following:
During boot launchd is invoked by the kernel to run as the first process on the system and to further bootstrap the rest of the system.
Finally, the system calls getpid(2) and getppid(2) always succeed, i.e., no error condition is defined for these calls.
- Program
- Output
$ ./process_id
The current process id is: 67471
The parent process id is: 64794
User Group IDs
The system calls getuid(2), geteuid(2), getgid(2), and getegid(2) always succeed; no error condition is defined for them.
- Program
- Output
$ ./user_group_id
The real user ID is: 502
The real group ID is: 20
The effective user ID is: 502
The effective effective group ID is: 20
# ./user_group_id
The real user ID is: 0
The real group ID is: 0
The effective user ID is: 0
The effective effective group ID is: 0
Passwd
The function getpwuid(3) is marked as thread-safe on Darwin, but on Linux the RETURN VALUE section of the getpwuid(3) manual states the following:
The return value may point to a static area, and may be overwritten by subsequent calls to getpwent(3), getpwnam(), or getpwuid(). (Do not pass the returned pointer to free(3).)
Furthermore, under the ATTRIBUTES section of getpwuid(3) on Linux (see attributes(7)), getpwnam(3) and getpwuid(3) are marked as:
getpwnam(): MT-Unsafe race:pwnam locale
getpwuid(): MT-Unsafe race:pwuid locale
Hence, this call is unsafe in a multi-threaded program, where a data race may occur. Since there is no explicit mention of thread-local storage (TLS) being used, we need to assume that global storage may be in use. For instance, a naive implementation would be:
struct passwd *
getpwuid (uid_t uid)
{
/*
* A variable inside a function with a storage class of 'static'
* does not use the stack but persistent storage, in either the
* DATA or the BSS section of the executable.
*/
static struct passwd res;
...
return (&res);
}
Here, concurrent calls to this function from different threads of the same program could cause a read or write race.
On Darwin, thread-specific storage is used and a pointer to it is returned. However, the following remark is made:
These [getpwnam(3), getpwuid(3), and getpwuuid(3)] routines are therefore unsuitable for use in libraries or frameworks, from where they may overwrite the per-thread data that the calling application expects to find as a result of its own calls to these routines. Library and framework code should use the alternative reentrant variants detailed below.
In general, if a library or framework invokes these routines alongside an application that also calls them, the application might retrieve information it did not expect. Consider the simple code fragment below:
...
struct passwd *my_passwd_entry;
my_passwd_entry = getpwuid(getuid());
...
/*
* Assume that 'libfoo_check_user_by_id()' is a library function
* which internally also calls 'getpwuid()'. For the sake of
* brevity, we'll assume that the hypothetical variable 'res'
* contains the information about the user that is filled by
* this function.
*/
libfoo_check_user_by_id(0, res);
/*
* If the return value of 'getuid()' is not zero, then the
* variable 'my_passwd_entry' will now point to a structure
* which contains passwd entry of user ID 0 and not the one
* whose user ID corresponds to 'getuid()'.
*/
...
To make matters more complicated, if an entry is not found, the value of errno is not well defined (see the ERRORS and NOTES sections of getpwuid(3) on Linux). The reentrant variants, which have the _r suffix, report a missing entry by storing NULL in the result argument (see the SYNOPSIS section of getpwuid(3)) while returning 0.
- Program
- Output
$ ./passwd
[LOG] The size of the struct passwd is: 72
[LOG] User Name: pranavramjoshi
[LOG] Encrypted Password: ********
[LOG] User UID: 502
[LOG] User User GID: 20
[LOG] User Password Change time: 0
[LOG] User User Access Class:
[LOG] User Honeywell Login Info: Pranav Ram Joshi
[LOG] User Home Directory: /Users/pranavramjoshi
[LOG] User Default Shell: /bin/zsh
[LOG] User Account Expiration: 0
[LOG] User UID: 0
# ./passwd
[LOG] The size of the struct passwd is: 72
[LOG] User Name: root
[LOG] Encrypted Password: *
[LOG] User UID: 0
[LOG] User User GID: 0
[LOG] User Password Change time: 0
[LOG] User User Access Class:
[LOG] User Honeywell Login Info: System Administrator
[LOG] User Home Directory: /var/root
[LOG] User Default Shell: /bin/sh
[LOG] User Account Expiration: 0
[LOG] User UID: 0
Group
The manual for getgrgid(3) on macOS states the following (under BUGS):
The functions getgrent(), getgrnam(), getgrgid(), getgruuid(), setgroupent() and setgrent() leave their results in an internal thread-specific memory and return a pointer to that object. Subsequent calls to the same function will modify the same object.
A reentrant variant of this call also exists. As with most reentrant alternatives of other functions, it carries the _r suffix: getgrgid_r(3).
- Program
- Output
$ ./group
[LOG] The information regarding the getgrgid for real group ID is:
[LOG] Group Name: staff
[LOG] Group Password: *
[LOG] Group ID: 20
[LOG] Group members:
[MEM LOG] 1. root
[MEM LOG] 2. pranavramjoshi
[LOG] The information regarding the getgrnam for group name staff is:
[LOG] Group Name: staff
[LOG] Group Password: *
[LOG] Group ID: 20
[LOG] Group members:
[MEM LOG] 1. root
[MEM LOG] 2. pranavramjoshi
# ./group
[LOG] The information regarding the getgrgid for real group ID is:
[LOG] Group Name: wheel
[LOG] Group Password: *
[LOG] Group ID: 0
[LOG] Group members:
[MEM LOG] 1. root
[LOG] The information regarding the getgrnam for group name staff is:
[LOG] Group Name: wheel
[LOG] Group Password: *
[LOG] Group ID: 0
[LOG] Group members:
[MEM LOG] 1. root
File
- Program
- Output
$ ./file
[SUCCESS] The stats of the file has been stored successfully.
[LOG] The size of the file (in bytes): 33585
[LOG] The number of blocks allocated for this program is: 72
[LOG] The block size is: 4096
[LOG] The device ID for the current file is: 0
[LOG] The last access of the file has been (in seconds): 1773759265
[LOG] The user defined flags for the file is: 0
$ ./typeof_file .
.: directory
$ ./typeof_file ./typeof_file
./typeof_file: regular
$ ./typeof_file /dev/null
/dev/null: character special
$ ./typeof_file /tmp/warp_service
/tmp/warp_service: socket
$ ./typeof_file ./typeof_file.c .. /dev/stdout
./typeof_file.c: regular
..: directory
/dev/stdout: character special
Process Group
- Program
- Output
$ ./process_group
getpgrp return value is: 70193
getpgid return value is: 70193
Time of Day
The manual for times(3) on macOS states the following:
This interface is obsoleted by getrusage(2) and gettimeofday(2).
- Program
- Output
$ ./time_of_day
The returned time in seconds is: 1773760215
[LOG] The value of times_ret is: 177376021512
User CPU time is: 0
System CPU time is: 0
Open File
A process may need to create files. A file on Unix has access permissions associated with it. There are nine permission bits on a file, which commands such as stat(1) display. The st_mode field of the structure filled in by the stat(2) family of functions holds these access permissions. The nine bits are divided into three categories (user, group, and other):
| st_mode mask | Meaning |
|---|---|
| S_IRUSR | user-read |
| S_IWUSR | user-write |
| S_IXUSR | user-execute |
| S_IRGRP | group-read |
| S_IWGRP | group-write |
| S_IXGRP | group-execute |
| S_IROTH | other-read |
| S_IWOTH | other-write |
| S_IXOTH | other-execute |
It should be noted that these symbols are defined as octal integer constants (integer constants that start with 0) and not the typical decimal numbers.
When a process creates a file, the kernel consults the process's file mode creation mask and sets the access permissions of the newly created file accordingly. Notice the following line from the umask(2) manual on macOS:
The default mask value is S_IWGRP | S_IWOTH (022, write access for the owner only). Child processes inherit the mask of the calling process.
It implies that if a process wants to create a file with group-write or other-write access, it must first clear these bits from the mask using umask(2) and only then call a function such as open(2) or creat(2). Do note that creat(2) on macOS is marked as obsolete. If a process attempts to create a file with group-write or other-write access without changing its mask, the file is still created, but the mask is applied and the resulting file has neither group-write nor other-write access.
- Program
- Output
$ ./open_file
[STAT LOG] The mode of the file is: 100644
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 128
[WRTIE LOG] Total bytes written is: 128
[READ LOG] Total bytes read is: 57
[WRTIE LOG] Total bytes written is: 57
[READ LOG] Total bytes read is: 0
[WRTIE LOG] Total bytes written is: 0
[LOG] The total bytes read and written is: 4409
$ ls
Makefile clone_file.c open_file open_file.c
Signal
Covering signals in their entirety here is not possible. Before POSIX, only one interface was available to install a signal handler: the signal(3) function. Even the manual for this function states the following:
This signal() facility is a simplified interface to the more general sigaction(2) facility.
In fact, the code below uses interfaces that have been superseded by newer ones.
| Old Interface | Purpose | New Interface |
|---|---|---|
| signal | Install a signal handler for a signal | sigaction |
| sigblock | Block a signal from being delivered | sigprocmask |
| sigpause | Mask a set of signal and wait for signal to arrive | sigsuspend |
| sigsetmask | Set the current signal mask | sigprocmask |
| pause | Pause (wait) until a signal is received | sigsuspend |
On historical systems such as System V and BSD, the signal(3) interface was implementation specific and had behaviors that would be deemed "unreliable". The HISTORY section of the signal(2) manual on Linux explains this in great detail. In short:
- Despite the call to signal(3), the signal handler would reset once the signal was delivered. This required calling signal(3) again inside the handler to re-install it. A window existed in which another instance of the signal could be delivered before the call to signal(3) inside the handler completed, causing the default action to be taken.
- The inability to block a signal made signals unreliable. Before the sigsetmask (and newer sigprocmask) interface was introduced, the only way to wait for a signal was through pause(3). Not much information could be conveyed through this call, and the signal could be delivered before the process reached pause(3), resulting in unexpected behavior.
NOTE: If the source code uses signal(3) interface instead of sigaction(2), it is wise to refer to system's manual so as to learn the behavior of the implementation-defined interface.
The interfaces sigblock(3) and sigsetmask(3) may appear confusing. Masking a signal means that the signal is blocked from being delivered. In essence, both perform the same operation: they block signals from being delivered until they are later unblocked. The difference is that sigblock(3) adds the given signals to the current signal mask, whereas sigsetmask(3) replaces the current signal mask with the given one. In both cases the mask was historically described as belonging to the process, although the signal mask is now a per-thread attribute.
If a signal is generated for a process that has it blocked (masked), then it is said to be in a pending state. The signal is only delivered once it is unblocked (unmasked). On Linux, standard signals (known as reliable signals) are not queued: if multiple instances of a standard signal are generated for a process that has it blocked, only one instance of the signal is delivered once the signal is unblocked. Conversely, real-time signals are queued. An implementation, according to POSIX, should allow at least _POSIX_SIGQUEUE_MAX (32) real-time signals to be queued to a process.
The function sigpending(2) can be used to check which signals are pending for a process. This function does not have any error condition on macOS but has one on Linux (EFAULT). On Linux, if a signal has the SIG_IGN handler (ignore the signal) and is blocked, then it is not added to the mask of pending signals when generated.
Observe that fprintf(3) is called inside my_intr function in signal.c. Do note that fprintf(3) is not considered async-signal-safe. Since fprintf(3) (and other printf(3) family of functions) uses internal buffers to hold data before flushing it, it is inherently not reentrant. The function is called in this file for testing purpose only. On macOS, sigaction(2) lists out various async-signal-safe interfaces (see signal-safety(7) for Linux). Do note that not all signals are asynchronous in nature. For example, the SIGSEGV (segmentation violation) signal is generated if an invalid address is dereferenced, which is synchronous in nature.
Finally, the signature of a signal handling function differs between ANSI C and POSIX. The POSIX form applies when the handler is installed through sigaction(2) with the SA_SIGINFO flag.
| Standard | Signature type definition |
|---|---|
| ANSI C | typedef void (*signal_handler_t) (int); |
| POSIX | typedef void (*posix_signal_handler_t) (int, siginfo_t *, void *); |
- Program
- Output
$ ./signal
[LOG] The previous signal bit mask is: 0
[LOG] The function sigprocmask has returned the value: 8196
The signal has not been handled.
The signal should have been handled by now...
[LOG] The mask inside the for loop is: 8198
Hello, World!
^C
The flag has been modified!
Encountered the SIGALRM signal interrupt.
$ ./signal2
[LOG] Encountered the SIGALRM signal, handled by handle_sigalrm. sig_id: 14
[LOG] The master flag may have been invoked...
Resetting the signals, now, the SIGINT flag can be handled.
Fork
The fork(2) system call has been one of the fundamental functions since the dawn of UNIX. The process hierarchy in a UNIX system has the 'init' process (process ID 1) as the first process, and every other process is a direct or indirect descendant of it.
fork(2) has several error conditions defined, which are listed in the ERRORS section of the manual. If the call to fork(2) is successful, a copy of the calling process is made. The return value from fork(2) differs between the parent and the child: in the parent it is the PID of the newly created child, whereas in the child it is 0. It is advised to read the fork(2) manual to understand some of the implementation-defined behavior of cloning a process. Linux allows fine-tuning of child process creation through the clone(2) system call.
Many implementations employ Copy-on-Write (CoW) for the fork(2) procedure. It means that the child process initially shares the parent's memory pages, and a separate copy of a page is made only when either process modifies it.
In the usual scenario, a fork(2) call is followed by an exec(3) call in the child process. POSIX has a distinct function to achieve this common functionality: posix_spawn(2).
When the parent process terminates before the child process, the new parent of the child (as reported by getppid(2)) will be the init process (process ID 1). If the child process exits and the parent process does not wait(2) for it, the child process is said to be a zombie process.
NOTE: The program below does not have well-defined behavior. It is uncertain which process will terminate first, since the parent process fails to wait(2) for the child process. It is for illustration purposes only.
- Program
- Output
$ ./fork
[PARENT] This is the parent process.
[PARENT] The process id is: 92163 and the child process id is: 92164
[CHILD] This is the child process.
[CHILD] The process id is: 92164 and the parent process id is: 92163
Exec
exec(3) family of functions
The entry function of a user program, main, usually has the following signature:
int
main (int argc, char **argv)
{
...
}
A fuller signature of the entry point, which is a common Unix extension rather than something ISO C requires, is:
int
main (int argc, char **argv, char **envp)
{
...
}
Notice that the environment variables are also passed to the program. Environment variables are useful when writing software that adapts to what the host system offers. For example, to support colored output of text, the program might want to check whether the underlying terminal emulator supports it.
The signature of main on Darwin systems allows an additional argument: apple. This has the same var=val form that is seen for environment variables. Chapter 7 of MacOS and iOS Internals, Volume 1 (specifically The apple[] argument vector section) explains the various variables and their usage.
Note that the names of the parameters are arbitrary, i.e., we could have a program as:
int
main (int foo, char **bar, char **baz)
{
...
}
argc, argv, and envp are used for readability and as a convention.
Getting back, the exec(3) family of functions replaces the current process image with a new process image.
Functions other than execle(3) and (Linux-specific) execvpe(3) take the environment for the new process image from the external variable environ (see environ2.c in [Environment List]).
The functions execlp(3), execvp(3), (Darwin-specific) execvP(3), and (Linux-specific) execvpe(3) first check their first argument; if it does not contain a slash character /, the PATH environment variable (from the caller's environment) is used to locate the file to execute. If PATH is not defined:
- On Darwin, the default path is set according to the _PATH_DEFPATH definition in <paths.h>, which is set to "/usr/bin:/bin". The function execvP(3) requires the caller to specify the search path.
- On Linux, the path list defaults to a list that includes the directories returned by confstr(_CS_PATH), which typically returns the value "/bin:/usr/bin".
It should be obvious that upon a successful call to exec(3) family of functions, there will be no return to the original caller. If this call returns, it implies some error has occurred. There may be unusual cases where the error occurs but no return value is observed. The manual for execve(2) (the actual system call whose front-ends are the functions described above) on Linux mentions the following under NOTES section:
However, in (rare) cases (typically caused by resource exhaustion), failure may occur past the point of no return: the original executable image has been torn down, but the new image could not be completely built. In such cases, the kernel kills the process with a SIGSEGV (SIGKILL until Linux 3.17) signal.
- Program
- Output
$ ./exec
[PARENT LOG] This is the parent process.
[PARENT LOG] Executing the ls command...
[CHILD LOG] Launching the dummy executable...
Hello, World!
total 176
drwxr-xr-x 8 pranavramjoshi staff 256 Mar 19 15:07 .
drwxr-xr-x 19 pranavramjoshi staff 608 Oct 17 2024 ..
-rw-r--r-- 1 pranavramjoshi staff 191 Sep 25 2024 Makefile
-rwxr-xr-x 1 pranavramjoshi staff 33426 Mar 19 15:07 dummy
-rw-r--r-- 1 pranavramjoshi staff 83 Sep 25 2024 dummy.c
-rwxr-xr-x 1 pranavramjoshi staff 33553 Mar 19 15:07 exec
-rw-r--r-- 1 pranavramjoshi staff 1918 Sep 27 2024 exec.c
-rw-r--r-- 1 pranavramjoshi staff 494 Oct 17 2024 usage.txt
Daemon
A daemon is a process that does not interact directly with a user. The program below shows an (older) implementation of the daemon(3) function: daemon_start. The implementation does the following:
- Ignore terminal generated signals.
- Fork, the parent process exits immediately while the child process continues.
- Disassociate from the controlling terminal. (implementation differentiates between System V and BSD)
- Close all the file descriptors.
- Reset errno.
- Change the current working directory of process to root directory.
- Clear out any inherited file mode creation mask.
- Install SIGCHLD (SIGCLD on System V) handler if specified by the caller.
A process is associated with a session. A new session is created using setsid(2). The daemon(3) implementation internally uses this function. The notion of a session is used for job control. A session is a collection of process groups. In a session, one of the process groups may be the foreground process group while the others are background process groups. The term "may" is used because a session is not required to have a foreground process group, but at most one process group can be the foreground process group.
For further reading, refer to [StackOverflow: Use and meaning of session and process group in Unix?] and [POSIX definitions]
Do note that System V's semantics of setpgrp(2) are described in Linux's setpgid(2) as:
setpgid() sets the PGID of the process specified by pid to pgid. If pid is zero, then the process ID of the calling process is used. If pgid is zero, then the PGID of the process specified by pid is made the same as its process ID.
The System V-style setpgrp(), which takes no arguments, is equivalent to setpgid(0, 0).
In essence, setpgrp(2) on System V was used to make the caller the process group leader of its process group. While the BSD variant explicitly opens /dev/tty with open(2) and issues a TIOCNOTTY ioctl(2) command, System V's setpgrp(2), while making the caller the process group leader, also disassociates it from its controlling terminal (if it has one). Section 2.6 of the text (under Disassociate from Control Terminal; p.78) mentions the following:
In addition to making the calling process a process group leader, as described above, if the process is not already a process group leader when it calls setpgrp, this call disassociates the process from its control terminal. Therefore we can use this system call to do two things--remove ourself from the inherited process group and disassociate from the inherited control terminal.
The program shown below is a trivial one. No work is done once the daemon is created, and it eventually exits. This program is mostly of educational value, since functions such as daemon(3) exist.
NOTE: On macOS, the canonical way to create a daemon process is not through a POSIX-defined interface but through launchd(8). Likewise, the New-Style Daemons section of daemon(7) on Linux provides detailed guidance on creating a daemon process.
- Program
Exercises
Question 2.1
If a process modifies an environment variable, by changing one of the strings pointed to by the environ pointer, what effect does this have on its parent process? What effect does this have on any child processes it invokes?
- Answer
- Program (a)
- Program (b)
When a child process modifies the environment variable, the effect is not propagated to the parent process. However, if a process modifies a value of an environment variable before it forks to clone the process, the child process inherits the modified value.
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
extern char **environ;
const char *home = "HOME=";
int main (int argc, char **argv)
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
extern char **environ;
const char *home = "HOME=";
int main (int argc, char **argv)
Question 2.2
What effect does the following have:
setuid(getuid());
(Hint: refer to BSD line printer spooler client in Section 13.3)
- Answer
From the man-page for getuid(2):
The getuid() function returns the real user ID of the calling process. The real user ID is that of the user who has invoked the program. The getuid() function is always successful, and no return value is reserved to indicate an error.
Similarly, the man-page for setuid(2) states that:
The setuid() function sets the real and effective user IDs and the saved set-user-ID of the current process to the specified value. The setuid() function is permitted if the effective user ID is that of the super user, or if the specified user ID is the same as the effective user ID. If not, but the specified user ID is the same as the real user ID, setuid() will set the effective user ID to the real user ID.
In Chapter 13, this expression is used after obtaining a reserved port through tcp_open.
To obtain a reserved port, the user who invoked the program must be root or the program be
set-user-ID root. For the latter scenario, after obtaining the socket using the reserved port,
we have no reason to keep the process under high privilege. We also do this to assure that
we can't read files as root that the user doesn't have normal access to.
Question 2.3
Both the functions getpwuid and getpwnam return a pointer to a structure that the function fills in. Where do you think this structure is stored? (Check the appropriate manual pages for your system.)
- Output
The DESCRIPTION section (of macOS) for these functions states that:
These functions obtain information from opendirectoryd(8), including records in /etc/master.passwd which is described in master.passwd(5). Each entry in the database is defined by the structure passwd found in the include file <pwd.h>:
struct passwd {
char *pw_name; /* user name */
char *pw_passwd; /* encrypted password */
uid_t pw_uid; /* user uid */
gid_t pw_gid; /* user gid */
time_t pw_change; /* password change time */
char *pw_class; /* user access class */
char *pw_gecos; /* Honeywell login info */
char *pw_dir; /* home directory */
char *pw_shell; /* default shell */
time_t pw_expire; /* account expiration */
int pw_fields; /* internal: fields filled in */
};
The functions getpwnam(), getpwuid(), and getpwuuid() search the password database for the given login name, user uid, or user uuid respectively, always returning the first one encountered.
Apple devices use the opendirectoryd(8) daemon to look up the passwd entry corresponding to the given name or user ID.
For Linux, there are two manual pages: the library function and the POSIX definition. Interested readers can look through the POSIX definition, but I'll talk about the library function itself.
The structure passwd is defined as:
struct passwd {
char *pw_name; /* username */
char *pw_passwd; /* user password */
uid_t pw_uid; /* user ID */
gid_t pw_gid; /* group ID */
char *pw_gecos; /* user information */
char *pw_dir; /* home directory */
char *pw_shell; /* shell program */
};
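As for the question, the manual states that these functions return a pointer to a structure containing the fields of the matching record in the password database (e.g., the local password file /etc/passwd, NIS, and LDAP). The structure itself may live in a static area that later calls overwrite, as the RETURN VALUE text quoted in the Passwd section notes; on Darwin it lives in thread-specific storage. Refer to passwd(5) for details regarding the password file.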
Question 2.4
Some network servers compare a user's encrypted password with the pw_passwd field in the passwd structure. What happens when the server is running on a system that has a shadow password file?
- Answer
The shadow file is a text file found in the /etc directory: /etc/shadow. This file contains one entry per line, where the fields of an entry are separated by a colon, :. The fields are as follows, in order:
- login name
- encrypted password
- date of last password change
- minimum password age
- maximum password age
- password warning period
- password inactivity period
- account expiration date
- reserved field
The manual for shadow(5) further refers to crypt(3), the function used to hash the password with a given salt; its traditional method is based on DES. Note that glibc's crypt(3) also supports other hashing methods, selected by a prefix in the salt. The shadow file also carries additional information about the account, such as password aging. This allows greater control over access to the system, so a server that consults the shadow file can also verify whether the user is still allowed to log on.
In essence, the user's identification is first consulted in the /etc/passwd file. On Linux, if the password field in this file is 'x', the actual hashed password is stored in the /etc/shadow file. The pw_passwd field then holds only that placeholder, so a server that compares a user's hashed password with pw_passwd never finds a match. It has to obtain the hash from the shadow file instead, for example with getspnam(3), which requires sufficient privileges.
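The failure can be illustrated with a small sketch. Both helper names are hypothetical, and the hash string in the usage note is a made-up placeholder rather than real crypt(3) output.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: nonzero if pw_passwd is the 'x' placeholder that
 * points at /etc/shadow. */
int password_is_shadowed(const char *pw_passwd)
{
    assert(pw_passwd != NULL);
    return strcmp(pw_passwd, "x") == 0;
}

/* What a naive server does: compare the hash it computed for the
 * supplied password with the pw_passwd field. */
int naive_match(const char *pw_passwd, const char *computed_hash)
{
    assert(pw_passwd != NULL && computed_hash != NULL);
    return strcmp(pw_passwd, computed_hash) == 0;
}
```

With a shadowed entry, naive_match("x", hash) is false for every realistic hash, so the server would reject even a correct password unless it first checks password_is_shadowed() and falls back to the shadow database.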
Question 2.5
Investigate the access system call (which we have not described here) on your Unix system. We'll use this system call in the remote shell server in Section 14.3. Write a similar function that uses the effective user ID and the effective group ID.
- Answer
- Program
According to access(2) manual page for Linux, this function checks whether the calling process can access the file pathname--the first argument to the function. If pathname is a symbolic link, it is dereferenced. mode (the second argument) is either the value F_OK, or a mask consisting of the bitwise OR of one or more of R_OK, W_OK, and X_OK. The check is done using the calling process's real UID and GID.
Attempting to clone a system call is not a trivial task. But it does not mean we won't be able to mimic the functionality of calls such as access(2). In our case, we'll depend on the stat(2) system call that does some things for us, such as:
- As required by the question, the pathname is located using the effective user ID and effective group ID. Realize that access(2) uses real UID and real GID.
- We don't explicitly dereference a symbolic link file. In fact, stat(2) resolves the symbolic link automatically, unlike lstat(2).
Apart from this, it's safe to say that this is a very basic program. But it gets the job done.
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>
#include <sys/types.h>
#include <stdlib.h>
#define MY_R_OK 0x4 /* read */
#define MY_W_OK 0x2 /* write */
#define MY_X_OK 0x1 /* execute */
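The fragment above stops at the definitions. One way the check itself could look is sketched below. The function name my_eaccess and the MY_F_OK constant are hypothetical, and the sketch deliberately ignores supplementary groups and the superuser's special rights, so it is an approximation rather than a faithful clone of access(2).

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define MY_F_OK 0x0 /* existence only (hypothetical, mirrors F_OK) */
#define MY_R_OK 0x4 /* read */
#define MY_W_OK 0x2 /* write */
#define MY_X_OK 0x1 /* execute */

/* Like access(2), but the permission class is chosen with the
 * effective user ID and effective group ID.
 * Returns 0 if every requested permission is granted, -1 otherwise. */
int my_eaccess(const char *pathname, int mode)
{
    struct stat sb;
    int granted;

    assert(pathname != NULL);

    if (stat(pathname, &sb) == -1) /* follows symbolic links, sets errno */
        return -1;

    if (sb.st_uid == geteuid())
        granted = (sb.st_mode >> 6) & 07; /* owner bits */
    else if (sb.st_gid == getegid())
        granted = (sb.st_mode >> 3) & 07; /* group bits */
    else
        granted = sb.st_mode & 07;        /* other bits */

    if ((mode & granted) != mode) {
        errno = EACCES;
        return -1;
    }
    return 0;
}
```

Because the permission bits for each class use the same 4/2/1 layout as the MY_*_OK constants, shifting st_mode is enough to compare the request with what the file grants.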
Question 2.6
If multiple processes are appending records to a file, is there any difference between having each process open the file with the O_APPEND flag, versus having each process issue an lseek to the end of the file before each write?
- Answer
On Linux, the manual for open(2) contains the following remarks on the O_APPEND flag:
O_APPEND
The file is opened in append mode. Before each write(2),
the file offset is positioned at the end of the file, as if
with lseek(2). The modification of the file offset and the
write operation are performed as a single atomic step.
O_APPEND may lead to corrupted files on NFS filesystems if
more than one process appends data to a file at once. This
is because NFS does not support appending to a file, so the
client kernel has to simulate it, which can't be done
without a race condition.
The NFS protocol cannot perform the three steps (read the current file size, seek to the end of the file, and write the new data) atomically, which is what causes the race condition. We won't consider the lseek scenario on NFS: without extra handling, it races for the same reason.
So, on a local filesystem, multiple processes can safely append to a file opened with the O_APPEND flag. When lseek(2) is called before write(2) instead, a context switch between the two calls can produce an unexpected outcome: the two system calls are not performed as one atomic step, so the kernel's scheduler may preempt the process before it reaches write(2). When threads share one descriptor, seeking with SEEK_END does not by itself lose data, because write(2) updates the shared offset, but there is still a race in the logical order of the appended data. Consider a simple scenario:
- The file's data we're expecting is "Pranav Joshi". We have two threads, A and B, each responsible for writing one word. Thread A writes "Pranav " and thread B writes "Joshi".
- Thread A calls lseek(2) with SEEK_END. Before it can write anything, it is preempted by the kernel.
- Thread B resumes. Suppose it performs both lseek(2) and write(2) within its time quantum. The lseek(2) makes little difference, since we're aiming for SEEK_END anyway. It then writes to the file, which updates the offset.
- Thread A resumes. It writes its word at the updated offset. This results in the content being "JoshiPranav ".
This scenario is worse for multiple processes that each have their own descriptor. If the context switch takes place before the first process can write its word, and the second process performs both lseek and write, the first process's offset is not updated, because the descriptors are independent. When the first process later writes its data, it overwrites what the second process wrote, since its offset was set before the context switch took place.
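The interleaving can be replayed deterministically inside a single process by opening the file twice, since each open(2) call yields a descriptor with its own offset. The sketch below is only an illustration under that assumption: the function name replay_interleaving is hypothetical, and the "preemption" is simulated by the order of the calls rather than by the scheduler.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Replays: A positions itself, B positions itself and writes "Joshi",
 * then A writes "Pranav ". With use_append set, both descriptors use
 * O_APPEND instead of lseek(2). The final file content is copied into
 * `out` (NUL-terminated). Returns 0 on success, -1 on error. */
int replay_interleaving(const char *path, int use_append, char *out, size_t outlen)
{
    int flags = O_WRONLY | (use_append ? O_APPEND : 0);
    int fd, fd_a, fd_b, rc = 0;
    ssize_t n;

    assert(path != NULL && out != NULL && outlen > 0);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600); /* start empty */
    if (fd == -1)
        return -1;
    close(fd);

    fd_a = open(path, flags); /* two open() calls: independent offsets */
    fd_b = open(path, flags);
    if (fd_a == -1 || fd_b == -1) {
        if (fd_a != -1) close(fd_a);
        if (fd_b != -1) close(fd_b);
        return -1;
    }

    if (!use_append) {
        lseek(fd_a, 0, SEEK_END); /* A seeks, then is "preempted" */
        lseek(fd_b, 0, SEEK_END); /* B seeks to the same end */
    }
    if (write(fd_b, "Joshi", 5) != 5)   /* B writes first */
        rc = -1;
    if (write(fd_a, "Pranav ", 7) != 7) /* A resumes with its old offset */
        rc = -1;
    close(fd_a);
    close(fd_b);
    if (rc == -1)
        return -1;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    n = read(fd, out, outlen - 1);
    close(fd);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return 0;
}
```

On a local filesystem, the lseek variant leaves only "Pranav " in the file, because A's stale offset of 0 overwrites B's data, while the O_APPEND variant keeps both words as "JoshiPranav ".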
Question 2.7
Implement the sleep function using the alarm system call. Be sure to handle the case of an alarm that is already set. Do you need reliable signals to do this correctly?
- Answer
- Program
In our context, a reliable signal is one that is guaranteed to be delivered to the process. On older Unix systems, signals were unreliable, which caused race conditions in which signals could get lost: an event could generate a signal, but the process would never be notified. Consider the program fragment shown in the text:
int flag = 0; /* global variable set when SIGINT occurs */
...
for (;;) {
while (flag == 0) {
pause(); /* wait for a signal to occur */
}
/* the signal has occurred, process it */
...
}
If a signal occurs after the test of the flag variable (in the while loop), but before the call to pause, the signal can be lost. A simple remedy is also shown:
int flag = 0; /* global variable set when SIGINT occurs */
for (;;) {
sigblock(sigmask(SIGINT));
while (flag == 0) {
sigpause(0); /* wait for a signal to occur */
}
/* the signal has occurred, process it */
...
}
Notice that calling sigblock before checking the flag variable ensures that SIGINT is blocked while the flag is tested. sigpause is then called with 0: it atomically sets the signal mask to 0, which unblocks every signal, and suspends the process until a signal arrives. The text compares the signal functions of BSD and System V; the fragment shown above is the BSD version. System V has similar ways to obtain reliable signals, but its functions work on one signal at a time rather than on a mask.
My implementation uses the POSIX signal functions. As the sleep(3) manual states, the function returns early if the calling process receives a signal, and the return value in that case is the unslept time. As such, we don't need the "reliable" signal as described above. What we want is simply to block the SIGALRM signal before we call alarm(3). Moreover, sigsuspend(3) is used instead of sigpause(3), whose behavior is described in the source file. For the sake of brevity, we only handle SIGINT and SIGQUIT as signals that might interrupt the process. If such a signal is received by the process, the call to my_sleep returns the unslept time. Note that other process-terminating signals will indeed terminate the process, in which case sigsuspend never returns.
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
void empty_handler (int signum); /* handler for SIGINT and SIGQUIT */
void sigalrm_handler (int signum);
unsigned int my_sleep (unsigned int seconds);
Question 2.8
Why doesn't fork return the process ID of the parent to the child and return zero to the parent?
- Answer
There are a couple of reasons for this design choice. I'll try to provide sufficient reasons below:
- The wait family of functions is used by the parent process to determine which child terminated, either gracefully or through a signal. If the parent process received 0 instead of the child's process ID, there would be no way for the parent to tell its children apart.
- The child process can determine its parent's process ID through getppid(2) and its own ID through getpid(2). There is no such system call that returns the process IDs of a process's children.
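Both points can be seen in a short sketch. The function name fork_and_reap is hypothetical; the child uses getppid(2) to confirm who its parent is, while the parent relies on the value returned by fork(2) to wait for that specific child.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks one child that exits with `code`. Returns the exit code the
 * parent collected with waitpid(), or -1 on failure. */
int fork_and_reap(int code)
{
    pid_t parent = getpid();
    pid_t child;
    int status;

    assert(code >= 0 && code < 255);

    child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        /* Child: fork() returned 0, yet the parent is still discoverable. */
        _exit(getppid() == parent ? code : 255);
    }

    /* Parent: the return value of fork() is its only handle on this child. */
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

If fork returned zero to the parent, the waitpid call above would have no process ID to name, and a parent with several children could not match an exit status to the child that produced it.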
Question 2.9
Implement the system function. (Hint: see Kernighan and Pike [1984].)
- Answer
- Program
The manual for system(3) on Linux states that:
The system() library function behaves as if it used fork(2) to create a child process that executed the shell command specified in command using execl(3) as follows:
execl("/bin/sh", "sh", "-c", command, (char *) NULL);
system() returns after the command has been completed.
It further mentions that, while the command is executing, SIGCHLD is blocked and SIGINT and SIGQUIT are ignored in the process that calls system(). If the command argument is NULL, system() instead reports whether a shell (/bin/sh) is available.
My implementation relies heavily on the manual, but it is not ideal. Signals need to be handled with care, and there is probably something I've missed. But it does work as described in the manual.
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
int my_system (const char *command);
Question 2.10
If a process that is run by a shell sets its file access creation mask to zero, does this affect other processes that are run after it by the shell?
- Answer
To answer this, we first need to understand how the shell executes a command. Whenever a command is entered in the shell, the shell process forks, and the parent (the shell itself) waits. It is the child process that is responsible for executing the command. The NOTES section of the umask(2) manual on Linux states that:
A child process created via fork(2) inherits its parent's umask. The umask is left unchanged by execve(2).
Consider a process whose core functionality is to modify the file access creation mask, as the question describes. Unlike descriptors, which point to the same "file table" entries after fork(2), the file access creation mask is copied into the child, so any change the child later makes to the mask is not reflected back in the parent process. Any file created by this [child] process uses the modified mask, but after the process terminates, the parent [shell] process resumes from waiting. The shell process keeps its own file access creation mask, and any process started afterward still uses the shell's mask.
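This can be checked with a small sketch in which the calling process stands in for the shell. The function name mask_after_child is hypothetical.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sets the caller's mask to `parent_mask`, lets a child set its own
 * mask to 0, and returns the caller's mask after the child has exited.
 * The caller's original mask is restored before returning. */
mode_t mask_after_child(mode_t parent_mask)
{
    mode_t saved, after;
    pid_t pid;

    assert(parent_mask <= 0777);

    saved = umask(parent_mask); /* umask() returns the previous mask */

    pid = fork();
    if (pid == 0) {
        umask(0); /* changes only the child's copy */
        _exit(0);
    }
    if (pid > 0)
        waitpid(pid, NULL, 0);

    after = umask(saved); /* read the current mask and restore the original */
    return after;
}
```

The mask read back in the parent is the one it set before the fork, which mirrors what happens to the shell: the child's umask(0) dies with the child.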