-- CSCI 4730 / 6730 Operating Systems: Project 1 : YOSH --

Project 1: YOSH: (Y)our (O)wn (S)imple S(H)ell

Assignment Day	August 17, 2010 (Tuesday)
Due Date	August 31, 2010 (Tuesday)

Collaboration Policy - Read Carefully

You must work on this project individually, but you may discuss this assignment with other students in the class and ask for and provide help in useful ways, preferably over our email list so we can all benefit from your great ideas. You may consult (but not copy) any outside resources, including books, papers, web sites and people (but no penguins or sea urchins).

If you use resources other than the class materials, indicate what you used along with your answer.

Objective:

The main objective of this assignment is, of course, for you to familiarize yourself with the plain and powerful language C and the UNIX environment. I hope you will gain a deeper knowledge of process creation and control, of what a shell does, and of signals and pipes.

Key Concepts:

C programming
Process creation and control
A simple shell

Tutorial / References

C programming/tutorials:

There are several C resources on the net. Here are some that may be useful:

C programming tutorial (Stephen Holmes):
http://www.strath.ac.uk/IT/Docs/Ccourse/

C programming tutorial by Brian W. Kernighan:
http://www.lysator.liu.se/c/bwk-tutor.html

GDB (GNU Debugger) tutorial:
http://www.cs.cmu.edu/~gilpin/tutorial/

C Frequently Asked Questions:
http://www.eskimo.com/~scs/C-faq/top.html

If you find other resources please post links on the mailing list for the class.

Background:

A shell is a command line interpreter; it is the program you interact with in a terminal or window. For example, when you log onto {atlas}, the shell gives you a prompt, typically "% " or "$ ", and then waits to interpret the commands that you type on the command line. Unix, of course, gives you a wide selection of shells to choose from: csh, bash, tcsh or ksh (my favorite). In ksh you can search your command line history and re-use (and edit) old commands.

For this assignment you will write one program, a shell, yosh, that needs to compile and run on a computer system on {atlas}.

A typical flow of commands in a shell is listed below. At a high level, a shell is an infinite loop, a big while loop, that waits forever for your next command. Imagine that?

print a prompt
read a line of input
parse the line to determine
1. the name and parameters of programs to execute
2. any pipes or input/output redirections
set up any pipes or input/output redirections
use the fork system call to create a child process:
1. the child process calls the exec system call to execute the specified program
2. the parent process (your shell) waits for the child process to complete (with the waitpid system call) or continues, if the child process is to be run in the background
repeat
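
To make the fork/exec/wait steps above concrete, here is a minimal sketch in C. The helper name run_foreground is made up for illustration and is not part of the provided skeleton; it assumes you already have a NULL-terminated argument array for the command.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with its arguments in a child process,
 * wait for it, and return its exit status (or -1 on failure). */
int run_foreground(char *const argv[])
{
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* child: replace this process image with the requested program */
        execvp(argv[0], argv);
        perror(argv[0]);        /* only reached if execvp failed */
        _exit(127);
    }

    /* parent (the shell): block until the child terminates */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A background command would follow the same path but skip the waitpid call in the parent.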

Since most of you don't have prior experience with parsing programs that convert text into data structures, a parser will be provided for you. Yay! Here is a link to a zip file containing a simple parser [shellWithParser.zip] and a rough compilable skeleton (the big while loop).

Most shells support built-in commands, such as cd and time, and also programming language features such as loops (e.g., for and while loops). Your shell, yosh, on the other hand, is a much simpler shell, and will only need to support one built-in feature - exit, which terminates your shell program.

Your shell will also support input/output redirection and pipes. Redirections allow us to specify that the input for a program should be read from a file, or that the output of a program should be written to a file. The following example shows using redirection to provide the sort command with input from a file called words.txt, and writing the output of sort to a file called sorted_words.txt.

sort < words.txt > sorted_words.txt

Pipes provide a mechanism to use the output of one program as the input to another program. Pipes encourage modularity; instead of writing one program that does everything, we can write many small programs that each do one thing well and enable them to cooperate by sharing data via pipes. For example, let's say I want to see how many .txt files (files whose names end with the characters .txt) are in my home directory:

{osisfun} find . -type f | grep '\.txt$' | wc -l
3

The grep and wc commands will take input from the terminal, from a file, or from another program -- in fact, they don't make any assumptions about where their inputs come from. (Some programs check to see what kinds of outputs they have -- for example, the gzip program won't write compressed data to your terminal unless you tell it to.) This state of affairs is far preferable to the alternative: a world in which we had to write one version of a program to read input from a terminal, one to read from a pipe, and one to read from a file!

How you're going to do it

The bare essentials

First, implement an extremely simple shell in C. You will follow the basic algorithm given above, and should use the parser routine that is provided, as well as the simple skeleton shell program. You need to use the fork, exec, wait, and waitpid system calls.

Your shell should display a prompt string (more details later). You will use readline to read a line from standard input, the parse routine to split this line into tokens, and then fork and exec.

If the user enters exit, your shell should terminate with the exit call.

A command line ending with an ampersand (&) indicates that the command is to run in the background. Recall that this means your shell will not waitpid for the child process. When a background process exits, your shell will receive the SIGCHLD signal; you should install a signal handler in order to process this notification. Here is the code you'll need to add to install a signal handler:

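#include <errno.h>
#include <signal.h>
#include <sys/wait.h>

void handle_sigchld( int s )
{
    int saved_errno = errno;   /* waitpid may overwrite errno */

    (void) s;                  /* signal number is not needed here */

    /* execute non-blocking waitpid, loop because we may only receive
     * a single signal if multiple processes exit around the same time.
     * A pid of -1 means "any child process".
     */
    while( waitpid(-1, NULL, WNOHANG) > 0) ;

    errno = saved_errno;
}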

int main() {
   ...

   /* register the handler */
   signal( SIGCHLD, handle_sigchld );

   ...
}

Real shells print information about exited background processes, but yours does not have to (unless the user types jobs). Your shell should still execute a blocking waitpid to wait on a process that is not to be executed in the background. When it does, it should ignore the error return code (indicating the process was already waited on) because it is possible that the signal handler waited on the child first.
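
One way to write that blocking wait is sketched below. The helper name wait_foreground is hypothetical; the sketch assumes that the "already waited on" case shows up as errno set to ECHILD, and that a signal arriving during the wait shows up as EINTR.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: block until the foreground child `pid` is gone.
 * Returns 0 if we reaped it here or if it had already been reaped
 * (for example by the SIGCHLD handler), and -1 on any other error. */
int wait_foreground(pid_t pid)
{
    int status;

    for (;;) {
        if (waitpid(pid, &status, 0) >= 0)
            return 0;           /* we reaped the child ourselves */
        if (errno == EINTR)
            continue;           /* interrupted by a signal: try again */
        if (errno == ECHILD)
            return 0;           /* someone else already waited on it */
        return -1;
    }
}
```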

Easy, right? Let's move on to pipes and redirection, then! (Ensure that your solution to this part of the project works properly before moving on.)

Interacting with processes and files

Next, you'll implement input/output redirection and optionally pipes.

Commands and redirections

Each command you will execute is the path to a program (e.g., /bin/mv), optionally followed by arguments (e.g., /bin/rm -i) and input/output redirections. Redirections may appear before, after, or in between arguments; the following three commands are all equivalent:

cmd arg1 arg2 < infile
cmd < infile arg1 arg2
cmd arg1 < infile arg2

A command cannot redirect input or output without specifying a file. (In the case of input redirection, the file must exist.) You should print the exact error messages shown below, and your code must not crash on the following examples.

prompt> /bin/ls <
Missing name for redirect.
prompt> /bin/ls >
Missing name for redirect.
prompt>

prompt> /bin/cat < bogus
bogus: No such file or directory
prompt>

If the output file does not exist you should create it:

prompt> /bin/ls newfile
/bin/ls: newfile: No such file or directory
prompt> /bin/ls > newfile
prompt> /bin/cat newfile
shell.c
prompt>

If the output file already exists you should not overwrite it:

prompt> /bin/ls > shell.c
shell.c: File exists
prompt>

To open (and optionally create) files you should use the open system call; you should read its man page.
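
As a rough sketch of how the open flags map onto the behaviour shown above: O_CREAT together with O_EXCL makes open fail when the output file already exists, and perror then prints the file name followed by the system's error text. The helper names open_output and open_input and the 0644 permission bits are assumptions made for this example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: open `path` as the target of "> path".
 * O_EXCL makes open fail (errno EEXIST) if the file already exists,
 * so an existing file is never overwritten. */
int open_output(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);

    if (fd < 0)
        perror(path);           /* e.g. "shell.c: File exists" */
    return fd;
}

/* Hypothetical helper: open `path` as the source of "< path". */
int open_input(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        perror(path);           /* e.g. "bogus: No such file or directory" */
    return fd;
}
```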

Input or output redirection should not be specified more than once for the same command:

prompt> /bin/cat < shell.c < shell.c
Ambiguous input redirect.
prompt>

You will need to use the dup2 system call and the STDOUT_FILENO and STDIN_FILENO constants to implement input and output redirection. For more information on these calls, consult the man pages and the Stevens book.
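
A minimal sketch of the dup2 step, assuming the file has already been opened and that this runs in the child process before exec. The helper name redirect is made up for this example.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: make `target` (STDIN_FILENO or STDOUT_FILENO)
 * refer to the already-open descriptor `fd`, then drop the extra copy. */
int redirect(int fd, int target)
{
    if (fd == target)
        return 0;               /* nothing to do */
    if (dup2(fd, target) < 0) {
        perror("dup2");
        return -1;
    }
    close(fd);                  /* `target` now refers to the same file */
    return 0;
}
```

In the child you would call something like redirect(infd, STDIN_FILENO) and redirect(outfd, STDOUT_FILENO) and then exec the program, which inherits the redirected descriptors.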

Pipes

Pipes are the oldest form of UNIX interprocess communication. They are half-duplex, meaning that data only flows in one direction. When using a shell, you will use the vertical bar operator (|, a "pipe") to indicate that two processes are connected by a pipe. The output of a command on the left side of a pipe becomes the input to the command on the right side of a pipe.

When you implement your shell, you will use the pipe system call to create a pipe. pipe takes an array of two integers and puts the file descriptor for reading from the pipe in the first element and the file descriptor for writing to the pipe in the second.

If a process forks a child (or children) after creating a pipe, then both processes have copies of the pipe file descriptors. The producing process closes the 1st file descriptor (the read end) and writes to the 2nd file descriptor. The consuming process closes the 2nd file descriptor (the write end) and reads from the 1st file descriptor.

A command line may consist of several commands linked together by pipes. Note that if a program is on the left side of a pipe, it may not redirect its output to a file (and vice versa); if a program is on the right side of a pipe, it may not redirect its input from a file.

This code illustrates how your shell could set up a pipe between the two processes in the command line ls | grep foo.
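
A minimal sketch of one way to do it, under the assumption that each side of the pipe is given as a NULL-terminated argument array. The helper name run_pipe is illustrative and is not part of the provided skeleton.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `left | right` and return the exit status
 * of `right` (or -1 on failure). */
int run_pipe(char *const left[], char *const right[])
{
    int fd[2];                  /* fd[0]: read end, fd[1]: write end */
    int status = 0;

    if (pipe(fd) < 0) {
        perror("pipe");
        return -1;
    }

    pid_t producer = fork();
    if (producer == 0) {
        close(fd[0]);                   /* producer never reads */
        dup2(fd[1], STDOUT_FILENO);     /* stdout now feeds the pipe */
        close(fd[1]);
        execvp(left[0], left);
        perror(left[0]);
        _exit(127);
    }

    pid_t consumer = (producer < 0) ? -1 : fork();
    if (consumer == 0) {
        close(fd[1]);                   /* consumer never writes */
        dup2(fd[0], STDIN_FILENO);      /* stdin now drains the pipe */
        close(fd[0]);
        execvp(right[0], right);
        perror(right[0]);
        _exit(127);
    }

    /* The shell must close both ends, or the consumer never sees EOF. */
    close(fd[0]);
    close(fd[1]);

    if (producer > 0)
        waitpid(producer, NULL, 0);
    if (consumer < 0) {
        perror("fork");
        return -1;
    }
    waitpid(consumer, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For ls | grep foo, the left array would hold "ls" followed by NULL, and the right array would hold "grep", "foo" and NULL.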

Summary of Required Elements

From experience (as a user) using a command shell, you should be able to write a simple shell. (Note: The pseudo code below uses the UNIX style fork/exec, not the Windows style.)

 
int main (int argc, char **argv)
{
    while( 1 )
    {
        int childPid;
        char * cmdLine;

        printPrompt();

        cmdLine = readCommandLine(); /* or GNU readline("") */

        cmd = parseCommand( cmdLine );

        /* record command in history list (GNU readline history ?) */

        if ( isBuiltInCommand( cmd ) )
        {
            executeBuiltInCommand( cmd );
        }
        else
        {
            childPid = fork();
            if( childPid == 0 )
            {
                executeCommand( cmd ); /* calls execvp */
            }
            else
            {
                if( isBackgroundJob( cmd ) )
                {
                    /* record in list of background jobs */
                }
                else
                {
                    waitpid( childPid, NULL, 0 );
                }
            }
        }
    }
}

Between this simple pseudo code and full featured shells, there are many optional features. Here are the features you should support in your shell (some of these were also described in the general description):

The prompt you print should indicate the current working directory. For example:
The directory: {/home/profs/maria}

It may also indicate other things like machine name or username or any other information you would like.

Try getcwd( char * buf, size_t size ) .
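
A minimal sketch of building such a prompt with getcwd. The helper name format_prompt and the fixed 4096-byte path buffer are illustrative choices, not requirements.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write a prompt of the form "{/current/dir} "
 * into `out`. Returns 0 on success, -1 if the directory cannot be
 * read or the prompt does not fit in `size` bytes. */
int format_prompt(char *out, size_t size)
{
    char cwd[4096];             /* assumed large enough for typical paths */

    if (getcwd(cwd, sizeof cwd) == NULL) {
        perror("getcwd");
        return -1;
    }

    int n = snprintf(out, size, "{%s} ", cwd);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```
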
You should allow the user to specify commands either by relative or absolute pathnames. To read in the command line, you may want to consider the readline function from the GNU readline library as it supports user editing of the command line.

Try execvp; it will search the path automatically for you. The first argument should be a pointer to the command string, and the second argument should be a pointer to an array that contains the command string as arg[0], the other arguments as arg[1] through arg[n], and a terminating NULL pointer.

You do not need to support setting of environment variables. However, you may find it useful to know about these variables especially PATH which is the list of directories in which to look for a specified executable. You may use execvp to have the system search the PATH inherited by your own shell from its parent.

You should be able to redirect STDIN and STDOUT for the new processes by using < and >. For example, foo < infile > outfile would create a new process to run the program foo and assign STDIN for the new process to infile and STDOUT for the new process to outfile. In many real shells it gets much more complicated than this (e.g., >> to append, > to overwrite, >& redirect STDERR and STDOUT, etc.)!

Note: one redirect in each direction is fine, not ls > foo < foo2

First open the file (use open or creat, open read only for infiles and creat writeable for outfiles ) and then use dup2. 0 is the file descriptor for STDIN and 1 is the file descriptor for STDOUT.

Examples:

dup2( fdFileYouOpened, fileno(stdin) )
dup2( fdFileYouOpened, fileno(stdout) )

You should be able to place commands in the background with an & at the end of the command line. You do not need to support moving processes between the foreground and the background (e.g., bg and fg). You also do not need to support putting built-in commands in the background.

Try waitpid( pid, status, options ).

You should maintain a history of commands previously issued. The number of previous commands recorded can be a compile time constant of at least 10. This is a FIFO list: you should start numbering commands with 1, and when you exhaust your buffer you should discard the lowest-numbered command *BUT* keep incrementing the number of the next item. For example, if storing 10 commands, when the 11th is issued, you would be recording commands 2 through 11.
A user should be able to repeat a previously issued command by typing !number where number indicates which command to repeat. !-1 would mean to repeat the last command. !1 would mean repeat the command numbered 1 in the list of commands returned by history.
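
The numbering scheme can be sketched with a small circular buffer. Everything named here (HISTORY_SIZE, history_add, history_get, history_print) is hypothetical; the sketch only assumes that negative numbers count back from the most recent command, as in the !-1 example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HISTORY_SIZE 10         /* compile time constant, at least 10 */

static char *history[HISTORY_SIZE];
static int history_count = 0;   /* total number of commands ever recorded */

/* Number of the oldest command still held in the buffer. */
static int history_oldest(void)
{
    return history_count > HISTORY_SIZE ? history_count - HISTORY_SIZE + 1 : 1;
}

/* Record a copy of `line` as command number history_count + 1,
 * overwriting the lowest-numbered entry once the buffer is full. */
void history_add(const char *line)
{
    size_t len = strlen(line) + 1;
    char *copy = malloc(len);

    if (copy == NULL)
        return;
    memcpy(copy, line, len);

    int slot = history_count % HISTORY_SIZE;
    free(history[slot]);
    history[slot] = copy;
    history_count++;
}

/* Look up a command for "!number". Positive numbers are absolute,
 * negative numbers count back from the end (-1 is the last command).
 * Returns NULL if that command is not (or no longer) in the buffer. */
const char *history_get(int number)
{
    if (number < 0)
        number = history_count + 1 + number;
    if (number < history_oldest() || number > history_count)
        return NULL;
    return history[(number - 1) % HISTORY_SIZE];
}

/* Print the stored commands with the numbers that ! accepts. */
void history_print(void)
{
    for (int n = history_oldest(); n <= history_count; n++)
        printf("%5d  %s\n", n, history[(n - 1) % HISTORY_SIZE]);
}
```

With a buffer of 10, after the 11th command is added history_get(1) returns NULL while history_get(2) through history_get(11) still return the stored lines.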

Note: You can probably think of better syntax for this, but I thought it was good to stay as close as possible to syntax used by real shells
A built-in command is one for which no new process is created but instead the functionality is built directly into the shell itself. You should support (implement) the following built-in commands: jobs, history, cd, exit and kill.
- jobs - provide a numbered list of processes currently executing in the background.
- cd - should change the working directory.
- history - should print the list of previously executed commands. The list of commands should be numbered such that the numbers can be used with ! to indicate a command to repeat.
- exit - should terminate your shell process.
- kill - kill %num should terminate the process numbered num in the list of background processes returned by jobs (by sending it a SIGKILL signal).
  Note: Usually kill num refers to the process with ProcessId num, while kill %num refers to the process in the jobs list with number num. Try kill (pid, SIGKILL) .
- help - lists the available built-in commands and their syntax. (If you don't follow the syntax expected, then a help function would let the graders proceed anyway.)
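
As a rough sketch of the kill num versus kill %num distinction, assuming a hypothetical fixed-size job table in which jobs[i] holds the pid of job number i+1 (or 0 if that slot is free). None of these names come from the provided skeleton.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

#define MAX_JOBS 32

static pid_t jobs[MAX_JOBS];    /* jobs[i] is the pid of job i+1, 0 if free */

/* Hypothetical helper: translate the argument of kill into a pid.
 * "%3" means job number 3 in the jobs list, "1234" means pid 1234.
 * Returns -1 if the argument is malformed or names no known job. */
pid_t kill_target(const char *arg)
{
    int is_job = (arg[0] == '%');
    const char *digits = arg + is_job;
    char *end;
    long n = strtol(digits, &end, 10);

    if (end == digits || *end != '\0' || n <= 0)
        return -1;
    if (!is_job)
        return (pid_t)n;
    if (n > MAX_JOBS || jobs[n - 1] == 0)
        return -1;
    return jobs[n - 1];
}

/* Hypothetical built-in: send SIGKILL to the selected process. */
int builtin_kill(const char *arg)
{
    pid_t pid = kill_target(arg);

    /* Never pass 0 or a negative pid to kill: those address whole
     * process groups, or every process you are allowed to signal. */
    if (pid <= 0) {
        fprintf(stderr, "kill: %s: no such job or process\n", arg);
        return -1;
    }
    return kill(pid, SIGKILL);
}
```
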
If the user chooses to exit while there are background processes, notify the user that these background processes exist, do not exit and return to the command prompt. The user must kill the background processes before exiting.
You may assume that each item in the command string is separated on either side by at least one space (e.g., prog > outfile rather than prog>outfile).
You could support | , a pipe, between two processes. For example, foo | bar would send the STDOUT of foo to the STDIN of bar using a pipe. You may want to start by supporting pipes only between two processes before considering longer chains. Longer chains will probably require something like handle process n and then recursively handle the other n-1.

Optional Features

If you are enjoying this project and would like to add more advanced features to your shell, here are some suggestions:

You could support optional parameters to some of the built-in commands. For example, history -s num could set the size of the history buffer and history num could return the last num commands. You could also support additional built-in commands like which, pushd/popd or alias. If you make modifications of this type, I would recommend that the help command return more detailed information on a single command.
You could implement more advanced I/O redirection as described above (>&, >!, etc.).
You could implement the built-in shell functions, fg and bg, to move processes between the background and the foreground.
You could support the editing of shell variables with built-in shell functions like printenv and setenv.
You could add support for shell programming features (e.g., if/then, while, for constructs) - note this will probably be a challenge.
Tab completion and command prompt editing. The GNU readline library makes this easy.
Up and down arrows to scroll through the history list. The GNU history library makes this easy.
You could relax the parsing constraints on the command line itself (e.g., correctly recognize the space free command prog>outfile).
You could also try porting it to yet another OS. (PalmOS?)
Repeat the execution of old commands from the shell history (! in csh).

Any advanced shell feature is likely to earn you some extra credit, but you should do it only if you've finished the required functions, are having fun and would like to learn more. In particular, we will *not* say how much extra credit each feature or sub feature may be worth.

Graduate Student Additional Requirement

You should choose to support 2 - 3 of the optional features listed above.

Other Requirements

It must run on atlas. You should develop it in your environment but as a last step make sure it runs on atlas.

Submitting:

You need to name the directory of your source code "project1/". You must include a README.txt file describing how to compile and run your program. You also need to include a Makefile; you can find an example Makefile in the shell helper files directory accessible from the schedule web page.

Create a directory project1
Put all the materials needed in the above directory (including your README.txt file)
Submit via the 'submit' command (while on atlas.cs.uga.edu)

{atlas:maria} submit project1 csx730

What you need to submit:

project1/
Makefile
shell.c
parser.c
. (x-tra files if needed, must be listed in README.txt)
.
README.txt how you run/compile the program

Optional:

Send a mascot for your new shell, as a JPG, to the class mailing list.