------------------------------------------------------------------------
BASIC REGULAR EXPRESSIONS
------------------------------------------------------------------------
Anchors:
  Anchors are location markers and do not correspond to a character
  ^ ... beginning of line
  $ ... end of line
However, note that ^ and $ lose their special status when they
appear anywhere other than the beginning or the end of the
pattern, respectively; there they match a literal "^" or "$".

  ^A .. line beginning with A
  A$ .. line ending with A

  ^$ .. line containing no characters

  A^ .. line containing "A^" at any location
  $A .. line containing "$A" at any location
  ^^ .. line starting with "^"
  $$ .. line ending with "$"
  $^ .. line containing "$^" at any location
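
The anchor rules above can be checked with a small sketch. The sample
lines and variable names are made up for illustration, and plain grep is
assumed to treat these basic patterns the same way as the tools used in
these notes.

```shell
# Hypothetical sample input with four lines: "Apple", "banana A", "", "A^B".
sample=$(printf 'Apple\nbanana A\n\nA^B\n')

# grep -c counts the lines that match.
starts_with_A=$(printf '%s\n' "$sample" | grep -c '^A')   # "Apple" and "A^B"
ends_with_A=$(printf '%s\n' "$sample" | grep -c 'A$')     # "banana A"
empty_lines=$(printf '%s\n' "$sample" | grep -c '^$')     # the empty line
literal_caret=$(printf '%s\n' "$sample" | grep -c 'A^')   # "A^B": ^ is literal here

echo "$starts_with_A $ends_with_A $empty_lines $literal_caret"
```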


Character Set (search for one occurrence of a specified character):
   a .. character "a" and so on
   0 .. character "0" and so on

When characters are enclosed by square brackets then interesting
possibilities open up!

   Class: 
   [0a] .. match either 0 or a

   Negation:
   	[^a] ... any character but "a"
   	[^0-9] .. any character but a digit
   	[^>] .. any character but ">"

   Range:
   [0-9] ... match one of the ten digits
   [a-z] ... match a lower case English letter
   [A-Z] ... match an upper case English letter
   [a-zA-Z] ... match an English letter (regardless of case)
   [0-9a-zA-Z] ... match an alphanumeric character
   Note: The basis of the range ("-") is increasing ASCII value
   (and not something deeper). Strictly, this holds in the C/POSIX
   locale; in other locales the range follows the collation order.
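
A minimal sketch of classes, negation and ranges. The input string and
variable names are hypothetical. It assumes a grep that supports the -o
option (GNU and BSD grep do; it is not required by POSIX). LC_ALL=C pins
the ranges to ASCII order.

```shell
# Hypothetical input string used only for illustration.
text='user42 <Tag> x-9'

# grep -o prints each match on its own line; tr -d '\n' joins them again.
digits=$(printf '%s\n' "$text" | LC_ALL=C grep -o '[0-9]' | tr -d '\n')
uppercase=$(printf '%s\n' "$text" | LC_ALL=C grep -o '[A-Z]' | tr -d '\n')
# Negated class: anything that is not a digit, a letter or a space.
others=$(printf '%s\n' "$text" | LC_ALL=C grep -o '[^0-9a-zA-Z ]' | tr -d '\n')

echo "$digits"      # 429
echo "$uppercase"   # T
echo "$others"      # <>-
```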

In order to be independent of the ASCII value (especially
important given that modern computer languages allow for
a rich set of characters) POSIX character classes have been
defined. An incomplete listing can be found below:

	[:alnum:] alphanumeric characters [a-zA-Z0-9]
	[:alpha:] alphabetic characters [a-zA-Z]
	[:digit:] digits [0-9]
	[:print:] printable characters (includes space)
	[:punct:] punctuation
	[:blank:] space and tab
	[:cntrl:] control characters, [\x00-\x1F\x7F]

Confusingly, to use these classes you need to enclose them in an additional
set of square brackets, e.g.
$ grep '[[:digit:]]' infile  #identify lines with digits 
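
A few more uses of the bracketed classes, as a sketch. The sample line
and variable names are invented for illustration. Note that negation
goes inside the outer brackets, as in [^[:digit:]].

```shell
# Hypothetical sample line.
line='Room 101, floor 3!'

# Delete everything that is not a digit.
only_digits=$(printf '%s\n' "$line" | sed 's/[^[:digit:]]//g')
# Delete punctuation.
no_punct=$(printf '%s\n' "$line" | sed 's/[[:punct:]]//g')
# Replace each space or tab with an underscore.
underscored=$(printf '%s\n' "$line" | sed 's/[[:blank:]]/_/g')

echo "$only_digits"   # 1013
echo "$no_punct"      # Room 101 floor 3
echo "$underscored"   # Room_101,_floor_3!
```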

  
Special characters:
   *    zero or more of previous character
   .    any one character
   \  .. back-slash (escapes the next character, e.g. \. is a literal ".")
   \{ \} see below (special meaning)
   \( \) see below (special meaning)
   ^M .. carriage return, an EOL character in DOS/old Mac files (specific use)
   \n .. newline character (specific use)

   \{q\} .. exactly q occurrences of previous character
   \{q,\} .. at least q occurrences of previous character
   \{p,q\} .. p through q occurrences of previous character
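
The interval forms can be tried out as follows. This is only a sketch;
the five input lines (b, ab, aab, aaab, aaaab) and the variable names
are made up for illustration.

```shell
# Hypothetical input: zero to four "a" characters followed by "b".
lines=$(printf 'b\nab\naab\naaab\naaaab\n')

# grep -c counts matching lines; ^ pins the run of a's to the line start.
exactly_two=$(printf '%s\n' "$lines" | grep -c '^a\{2\}b')     # aab
at_least_two=$(printf '%s\n' "$lines" | grep -c '^a\{2,\}b')   # aab aaab aaaab
one_to_two=$(printf '%s\n' "$lines" | grep -c '^a\{1,2\}b')    # ab aab

echo "$exactly_two $at_least_two $one_to_two"
```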

Backreferencing:
   & ... the entire matched text (used in the replacement of a sed "s" command)
   \( \) captures a token
   \1    refers to the first captured token
   Note: Capturing tokens usually requires great specificity and
   will only come with some experience (or trial and error).
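
A short sketch of backreferencing. The input strings and variable names
are hypothetical. \2 is used by analogy with \1 for the second captured
token.

```shell
# & stands for the whole match: wrap the first run of digits in brackets.
wrapped=$(echo 'order 66 shipped' | sed 's/[0-9][0-9]*/[&]/')

# \( \) captures tokens; \1 and \2 replay them in the replacement.
swapped=$(echo 'key=value' | sed 's/\(.*\)=\(.*\)/\2=\1/')

# \1 inside the pattern matches a repeat of the captured character,
# so this counts lines containing a doubled letter (hello, book).
doubled=$(printf 'hello\nworld\nbook\n' | grep -c '\(.\)\1')

echo "$wrapped"   # order [66] shipped
echo "$swapped"   # value=key
echo "$doubled"   # 2
```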


Notes (specific to old-style or pre-GNU tools)
Control characters are entered by typing "control-V and then the character"
e.g.
EOL = 
Tab = control-V and then tab

------------------------------------------------------------------------
Regular Expressions Are Tricky: Homework 1
------------------------------------------------------------------------

Rule 1: The leftmost match wins, and among the matches that start at
that position the longest one is returned. The regular expression engine
does not stop as soon as a (shorter) match is found.

Rule 2: A null match is considered to be a valid match. [Remember this
rule when using "*"].

In order to appreciate these two rules study the following examples:

$echo "abc,def,ghi,jkl,mno" | gsed 's/,.*,/1/'
abc1mno

$echo "abc" | gsed 's/a*/1/'
1bc

$echo "abc" | gsed 's/b*/1/'          #null match at the start of the line
1abc

$echo "abc" | gsed 's/b*/1/g'
1a1c1

$echo "abc" | gsed -E 's/b+/1/'       #-E selects ERE (as in egrep) & not BRE
a1c

$echo "abc" | gsed -E 's/b+/1/g'      #caution: ERE
a1c
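
When the longest match is too much, a negated class is a common way to
narrow it. A sketch, assuming plain sed behaves like gsed for these
patterns; the variable names are made up for illustration.

```shell
line='abc,def,ghi,jkl,mno'

# ".*" runs from the first comma to the last one on the line.
greedy=$(echo "$line" | sed 's/,.*,/1/')
# "[^,]*" cannot cross a comma, so the match ends at the second comma.
narrow=$(echo "$line" | sed 's/,[^,]*,/1/')

echo "$greedy"   # abc1mno
echo "$narrow"   # abc1ghi,jkl,mno
```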
------------------------------------------------------------------------
Homework 2: square brackets 
------------------------------------------------------------------------
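Rule 3: In square brackets all meta characters lose their meaning.
The exceptions are "^" (negation, only when first), "-" (range, only
between two characters) and "]" (closes the set, unless it comes first).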

$echo "abc" | gsed 's/[$]//'
abc
$ echo "abc" | gsed 's/[*]//'
abc
$ echo "abc" | gsed 's/[\]/-/'
abc
$ echo "abc" | gsed 's/[^]//'       #Do you understand why this is not allowed?
gsed: -e expression #1, char 7: unterminated `s' command

$echo "abc]" | gsed -n '/[]xy]/p'   
abc]

$echo "abc]" | gsed -n '/[xy]]/p'   #No match. No output. Why?

$echo "abc-]" | gsed -n '/[-xy]/p'
abc-]

$echo "abc-]" | gsed -n '/[x-y]/p'  #No match. No output. Why?

$echo  "[abc-]" | gsed -n '/[[x-y]/p'
[abc-]

$echo "[abc-]" | gsed -n '/[x[-y]/p' 
[abc-]
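
The placement tricks behind these examples can be summarised in a small
sketch. The inputs and variable names are hypothetical, and plain sed is
assumed to behave like gsed here.

```shell
# ^ is literal when it is not the first character inside the brackets.
caret=$(echo 'a^b' | sed 's/[b^]/X/g')
# ] is literal when it comes first; - is literal when it comes last.
bracket_dash=$(echo 'a]b-c' | sed 's/[]-]/X/g')
# After a leading ^, a ] is still literal: "any character but ]".
not_bracket=$(echo 'a]b' | sed 's/[^]]/X/g')

echo "$caret"          # aXX
echo "$bracket_dash"   # aXbXc
echo "$not_bracket"    # X]X
```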