CSCI 4500/6500 Programming Languages

Project 4: Python Clarify

First Program in Python [or Ruby if you prefer]

 

Assigned: Friday March 24, 2006
Due: Friday April 07, 2006

Collaboration Policy - Read Carefully

You must work on this project individually, but you may discuss this assignment with other students in the class and ask for and provide help in useful ways, preferably over our email list so we can all benefit from your great ideas. You may consult any outside resources you wish, including books, papers, web sites, and people (but no penguins or sea urchins).

If you use resources other than the class materials, indicate what you used along with your answer.

Objective:

The main objective for this assignment is to familiarize yourself with a scripting language, in particular Python. You may use the Python that is already installed on atlas or you may install your own. I suggest that you install your own. I also suggest that you develop your Python programs in IDLE, the interactive development environment that was shown in class.

 

Python runs on Microsoft Windows, MacOS X (yay!), and UNIX.

Tutorials:

Some tutorials on Python:

"Dive into Python" book (available in pdf and html)

"Python 101" (nice quick introduction)

Official Python tutorial:

The quick Python reference:

 

Description:

Your assignment is to create a program, written in Python, called clarify. Clarify filters out successive identical lines from a file (or standard input) and writes the results to standard output. Here is the synopsis of the command that you need to implement:

clarify [ -c ] [ -d | -u ] [ -i ] [ -f fields ] [ -s char ] [ input_file ]

 

Order of switches (except the optional input_file) does not matter.

You need to implement the options below:

-c prefix lines by number of occurrences
-d only print duplicate (or more) lines
-u only print lines that are not repeated successively in the input
-f fields number of fields to skip over before checking for uniqueness
-s chars number of characters to skip over before checking for uniqueness. If you use both the field (-f) and character skipping (-s) options, fields are skipped over first.
-i ignore case

** A field is a series of non-whitespace characters.

Using -d and -u together should produce an error message.
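One possible way to handle the switches above is sketched below using the standard getopt module. This is only an illustration, not the required design: the function name parse_args, the option dictionary keys, and the choice to raise ValueError on a usage error are all assumptions made for this example. It is written for Python 3, so an older interpreter may need small syntax changes.

```python
import getopt


def parse_args(argv):
    """Parse clarify's switches from argv (program name excluded).

    Returns (options, input_file); input_file is None when the input
    should come from standard input. Raises ValueError on a usage error.
    All names here are hypothetical, chosen for this sketch.
    """
    options = {"count": False, "duplicates": False, "unique": False,
               "ignore_case": False, "fields": 0, "chars": 0}
    try:
        # "f:" and "s:" take an argument; the other switches are flags.
        pairs, rest = getopt.getopt(argv, "cduif:s:")
    except getopt.GetoptError as err:
        raise ValueError(str(err))

    flags = {"-c": "count", "-d": "duplicates",
             "-u": "unique", "-i": "ignore_case"}
    for switch, value in pairs:
        if switch in flags:
            options[flags[switch]] = True
        elif switch == "-f":
            options["fields"] = int(value)
        elif switch == "-s":
            options["chars"] = int(value)

    # The assignment asks for an error when -d and -u are combined.
    if options["duplicates"] and options["unique"]:
        raise ValueError("-d and -u cannot be used together")
    if len(rest) > 1:
        raise ValueError("at most one input file may be given")

    input_file = rest[0] if rest else None
    return options, input_file
```

Because getopt collects the switches in whatever order they appear and leaves the remaining arguments in rest, the optional input_file is simply whatever follows the last switch.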

Example session:

{saffron:ingrid:221} cat test.txt
hello this is a test.
hello this is a test.
hello this IS a test.
hello this is a test.
hello this is a test.
hello this is a test.
hello this not a test.
hi over there a test.
{saffron:ingrid:222} clarify test.txt
hello this is a test.
hello this IS a test.
hello this is a test.
hello this not a test.
hi over there a test.
{saffron:ingrid:223} clarify -c test.txt
2 hello this is a test.
1 hello this IS a test.
3 hello this is a test.
1 hello this not a test.
1 hi over there a test.
{saffron:ingrid:224} clarify -d test.txt
hello this is a test.
hello this is a test.
{saffron:ingrid:225} clarify -u test.txt
hello this IS a test.
hello this not a test.
hi over there a test.
{saffron:ingrid:226} clarify -f 3 test.txt
hello this is a test.
{saffron:ingrid:227} clarify -c -f 3 test.txt
8 hello this is a test.
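The behavior shown in the session above can be sketched with itertools.groupby, which groups successive items that share the same key. This is a rough sketch under stated assumptions, not the required implementation: the function name clarify_lines and its parameters are made up for this example, and the "count, space, line" layout for -c is taken from the session output.

```python
from itertools import groupby


def clarify_lines(lines, key=None, count=False, duplicates=False, unique=False):
    """Return the output lines for a list of input lines (no newlines).

    key is an optional function that maps a line to the text used for
    comparison (for example str.lower for -i). Hypothetical helper.
    """
    out = []
    for _, group in groupby(lines, key=key):
        run = list(group)          # one run of successive matching lines
        n = len(run)
        if duplicates and n < 2:   # -d: keep only runs that repeat
            continue
        if unique and n > 1:       # -u: keep only runs that do not repeat
            continue
        first = run[0]             # the first line of the run is printed
        out.append("%d %s" % (n, first) if count else first)
    return out


sample = (["hello this is a test."] * 2
          + ["hello this IS a test."]
          + ["hello this is a test."] * 3
          + ["hello this not a test.", "hi over there a test."])

for line in clarify_lines(sample, count=True):
    print(line)
```

Run on the eight lines of test.txt, the loop at the bottom prints the same five counted lines as the clarify -c example in the session.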


For option -f, consider a file called fruits.txt that consists of two lines:

orange banana apple orange.
grape pear apple orange.

{saffron:ingrid:225} clarify -f 2 fruits.txt

First line: clarify skips the first 2 fields, "orange" and "banana", so it considers only "apple orange." to determine uniqueness.

Second line: clarify skips the first 2 fields, "grape" and "pear", so it also considers only "apple orange." to determine uniqueness.

So in the example above, clarify deems the two lines duplicates. For the same file with -f 1,

clarify -f 1 fruits.txt

deems that the lines are different, since "banana apple orange." and "pear apple orange." do not match.
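The field skipping described above can be sketched with str.split, which splits on runs of whitespace when its separator is None. The helper name skip_fields is an assumption made for this example, and returning an empty string when a line has too few fields is one reasonable choice, not something the assignment specifies.

```python
def skip_fields(line, n):
    """Return the part of line that remains after skipping n fields.

    Hypothetical helper: a field is a run of non-whitespace characters.
    """
    if n <= 0:
        return line
    # Split off at most n fields; whatever is left is the final element.
    parts = line.split(None, n)
    return parts[n] if len(parts) > n else ""


first = "orange banana apple orange."
second = "grape pear apple orange."

print(skip_fields(first, 2) == skip_fields(second, 2))  # True: duplicates
print(skip_fields(first, 1) == skip_fields(second, 1))  # False: different
```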

For -s, consider the file maria.txt:

123maria
234maria

clarify -c -s 3 maria.txt
2 123maria

Above, the lines are considered duplicates because clarify skips over the first 3 characters of each line when determining uniqueness. The prefix of 2 is a consequence of the -c option and indicates the number of lines in which it occurs.
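Putting -f, -s, and -i together, one way to think about it is that each line is reduced to a comparison key, and two successive lines are duplicates when their keys are equal. The sketch below follows the order given in the option list (fields first, then characters); the function name comparison_key and its parameter names are assumptions made for this example.

```python
def comparison_key(line, fields=0, chars=0, ignore_case=False):
    """Return the text that is actually compared for a given line.

    Hypothetical helper combining -f (fields), -s (chars), and -i.
    """
    rest = line
    if fields > 0:
        # Skip whole fields first, as the option list requires.
        parts = rest.split(None, fields)
        rest = parts[fields] if len(parts) > fields else ""
    # Then skip characters; slicing past the end just gives "".
    rest = rest[chars:]
    if ignore_case:
        rest = rest.lower()
    return rest


print(comparison_key("123maria", chars=3) == comparison_key("234maria", chars=3))
```

The final line prints True, which is why clarify -c -s 3 maria.txt reports a count of 2. Note that the original line, not the key, is what gets written to the output.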

 

Requirements:

clarify should be an executable "she banged!" script (one whose first line starts with #!). It should run on atlas. You should develop it in your own environment and, as a last step, make sure it runs on atlas.

Since the version of Python on atlas is somewhat old, we will also use Python 2.4.2 on Windows XP to grade your project. Please indicate your preference in your README.txt file, i.e., whether you prefer that your program be tested on atlas or on Windows XP.

You may use the predefined built-in modules in Python, but you may not invoke UNIX commands (such as cat) from Python.
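Reading the input needs nothing beyond built-in facilities, so there is no reason to call out to cat. A minimal sketch follows, assuming a hypothetical helper named read_lines; the interpreter path on the first line is also an assumption and may need adjusting for atlas. It is written for Python 3, so an older interpreter may need small changes.

```python
#!/usr/bin/env python
# The line above is the "#!" line; the interpreter path is an assumption.
import sys


def read_lines(input_file=None):
    """Return the input lines without their trailing newline characters.

    Reads from input_file when one is given, otherwise from standard
    input. Hypothetical helper for this sketch.
    """
    if input_file is None:
        return [line.rstrip("\n") for line in sys.stdin]
    with open(input_file) as handle:
        return [line.rstrip("\n") for line in handle]
```

Stripping the newline up front keeps the comparison logic simple; the newline is added back when each line is written to standard output.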

 

Submitting:

  1. Create a directory x500_program5
  2. Put all the materials needed (all py files) in the above directory (including your README.txt file)
  3. Submit via the 'submit' command (while on atlas.cs.uga.edu)
{atlas:maria} submit x500_program5