Day 15 — Strings III

In today's lecture we finally find out what String[] args means and we study regular expressions.

Comparators: equals

equals returns true if two string instances have the same state (represent the same sequence of text), and false otherwise.

String s = "ABC";
String t = s;
String u = "abc";

output.println("s same as t: " + s.equals(t));
output.println("s same as u: " + s.equals(u));

The code fragment prints:

s same as t: true
s same as u: false

Comparators: equalsIgnoreCase

If you want to check if two strings represent the same text and you do not care about the case of strings, use equalsIgnoreCase:

String s = "ABC";
String t = s;
String u = "abc";

output.println("s same as t ignoring case: " +
               s.equalsIgnoreCase(t));
output.println("s same as u ignoring case: " +
               s.equalsIgnoreCase(u));

The code fragment prints:

s same as t ignoring case: true
s same as u ignoring case: true

Comparators: compareTo

The compareTo method tends to be defined in classes that represent objects that can be put into some sort of order. Examples:

Boolean
Byte
Character
java.util.Date
Double
Float
Integer
Long
Short
String
type.lib.Fraction
type.lib.MixedNumber
type.lib.Money

Comparators: compareTo

compareTo compares two strings in lexicographical (dictionary) order, based on the Unicode value of each character, so upper-case English letters precede lower-case ones. Suppose you have two string variables named s and t:

s.compareTo(t) returns a value less than zero if the String s precedes the String t
s.compareTo(t) returns zero if s.equals(t) is true
s.compareTo(t) returns a value greater than zero if the String s follows the String t

The actual value returned by compareTo has a specific meaning; see the API if you are curious.
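
As a rough illustration of the three cases above, the following sketch reports only the sign of the value returned by compareTo. The helper describeOrder and the class name are hypothetical and were introduced just for this example.

```java
import java.io.PrintStream;

class CompareToDemo
{
   // Hypothetical helper (not part of the String API): reports the
   // sign of s.compareTo(t) as a word.
   static String describeOrder(String s, String t)
   {
      int result = s.compareTo(t);
      if (result < 0)
      {
         return "precedes";
      }
      else if (result == 0)
      {
         return "equals";
      }
      else
      {
         return "follows";
      }
   }

   public static void main(String[] args)
   {
      PrintStream output = System.out;

      output.println("apple " + describeOrder("apple", "banana") + " banana");
      output.println("pear " + describeOrder("pear", "pear") + " pear");
      // Upper-case English letters come before lower-case ones.
      output.println("zebra " + describeOrder("zebra", "Zebra") + " Zebra");
   }
}
```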

Example Using compareTo

Based on last year's midterm labtest: Repeatedly read in a person's name (first name followed by middle names followed by family name all separated by spaces); if the family name starts with a letter from A-M output "A-M", otherwise output "N-Z".

final String FAMILY_NAME_SEP = "N";

while (input.hasNext())
{
   String name = input.nextLine();

   int lastSpace = name.lastIndexOf(' ');
   String lastName = name.substring(lastSpace + 1);

   if (lastName.compareTo(FAMILY_NAME_SEP) < 0)
   {
      output.println("A-M");
   }
   else
   {
      output.println("N-Z");
   }
}

Note that this solution is case sensitive.
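
If the case of the family name should not matter, one possible variant uses compareToIgnoreCase from the String API. This is only a sketch: the helper bucket and the class name are hypothetical, and the input loop is left out so that the comparison can be tried on its own.

```java
import java.io.PrintStream;

class FamilyNameBucket
{
   static final String FAMILY_NAME_SEP = "N";

   // Hypothetical helper: classifies a full name by its family name,
   // ignoring the case of the letters.
   static String bucket(String name)
   {
      int lastSpace = name.lastIndexOf(' ');
      String lastName = name.substring(lastSpace + 1);

      if (lastName.compareToIgnoreCase(FAMILY_NAME_SEP) < 0)
      {
         return "A-M";
      }
      else
      {
         return "N-Z";
      }
   }

   public static void main(String[] args)
   {
      PrintStream output = System.out;

      // With the case sensitive compareTo, "lovelace" would follow "N".
      output.println(bucket("ada lovelace"));
      output.println(bucket("alan mathison turing"));
   }
}
```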

More Comparators

The String class has a large API with several other comparator methods. See the API for these other methods:

int     compareToIgnoreCase(String str)
boolean endsWith(String suffix)
int     lastIndexOf(int ch)
int     lastIndexOf(int ch, int fromIndex)
int     lastIndexOf(String str)
int     lastIndexOf(String str, int fromIndex)
boolean startsWith(String prefix)
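
A small sketch of how some of the methods listed above might be used together on a file name. The helpers isJavaSource and extension are hypothetical names chosen for this example.

```java
import java.io.PrintStream;

class MoreComparatorsDemo
{
   // Hypothetical helper: true if the name ends with ".java".
   static boolean isJavaSource(String fileName)
   {
      return fileName.endsWith(".java");
   }

   // Hypothetical helper: the text after the last '.', or the empty
   // string if the name contains no '.'.
   static String extension(String fileName)
   {
      int lastDot = fileName.lastIndexOf('.');
      if (lastDot < 0)
      {
         return "";
      }
      return fileName.substring(lastDot + 1);
   }

   public static void main(String[] args)
   {
      PrintStream output = System.out;

      output.println(isJavaSource("RegEx.java"));
      output.println(extension("archive.tar.gz"));
      output.println("CommandLineArg".startsWith("Command"));
   }
}
```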

Transformers

A transformer method returns a reference to a new string that is computed by transforming the characters of an existing string.

The original String remains unchanged (because of immutability).

Transformer: trim

String trim()

Returns a copy of the string, with leading and trailing whitespace omitted.

String s = "   hello   ";
String t = "hello   ";
String u = "hello";

String v = s.trim();
String w = t.trim();
String x = u.trim();

output.println(":" + s + ":" + " *" + v + "*");
output.println(":" + t + ":" + " *" + w + "*");
output.println(":" + u + ":" + " *" + x + "*");

The above code fragment prints:

:   hello   : *hello*
:hello   : *hello*
:hello: *hello*

Transformers: toUpperCase and toLowerCase

The methods toUpperCase and toLowerCase each return a copy of an existing string with the characters converted to upper and lower case.

String s = "aBcDeFgHiJ";

String upper = s.toUpperCase();
String lower = s.toLowerCase();

output.println(s);
output.println(upper);
output.println(lower);

The above code fragment prints:

aBcDeFgHiJ
ABCDEFGHIJ
abcdefghij
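
Because each transformer returns a new String, calls can be chained. The sketch below combines trim and toLowerCase to tidy up a piece of text and shows that the original String is unchanged. The helper normalize is a hypothetical name used only here.

```java
import java.io.PrintStream;

class NormalizeDemo
{
   // Hypothetical helper: removes surrounding whitespace, then
   // converts the result to lower case.
   static String normalize(String raw)
   {
      return raw.trim().toLowerCase();
   }

   public static void main(String[] args)
   {
      PrintStream output = System.out;

      String raw = "  Hello World  ";
      String clean = normalize(raw);

      output.println(":" + raw + ":");   // the original is unchanged
      output.println(":" + clean + ":");
   }
}
```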

Transformers: replace

String replace(char oldChar, char newChar)

Returns a new string resulting from replacing all occurrences of oldChar in this string with newChar.

String s = "sparring with a purple porpoise";

String t = s.replace('p', 't');
output.println(t);

The above code fragment prints:

starring with a turtle tortoise

Transformers: replace

String replace(CharSequence target, CharSequence replacement)

Returns a new string resulting from replacing all occurrences of the substring target in this string with replacement. The replacement proceeds from the beginning of the string to the end. Throws NullPointerException if target or replacement is null.

String s = "hiho, hiho, it's off to work we go";

String t = s.replace("hiho", "ohno");
output.println(s);
output.println(t);

The above code fragment prints:

hiho, hiho, it's off to work we go
ohno, ohno, it's off to work we go

Transformers: replace

It is important to remember that the replacement proceeds from the beginning of the string to the end.

String s = "aaaaaa";
      
String t = s.replace("aa", "b");
output.println(s);
output.println(t);

The above code fragment prints:

aaaaaa
bbb
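
The left-to-right rule is easier to see with an odd number of characters: pairs are consumed from the front and whatever is left over stays as it is. A minimal sketch, where collapsePairs is a hypothetical helper introduced for illustration:

```java
import java.io.PrintStream;

class ReplaceOrderDemo
{
   // Hypothetical helper: replaces each non-overlapping "aa",
   // scanning from the beginning of the string, with "b".
   static String collapsePairs(String s)
   {
      return s.replace("aa", "b");
   }

   public static void main(String[] args)
   {
      PrintStream output = System.out;

      output.println(collapsePairs("aaaaaa"));
      // Five a's: two pairs are replaced, the last a is left over.
      output.println(collapsePairs("aaaaa"));
   }
}
```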

main(String[] args)

An array is a container object that holds a fixed number of values of a single type.

Every Java main method has a parameter, conventionally named args, that is an array of String references. The String objects are created from the command-line arguments given when the program was run.

Command Line Arguments

Consider the following Java program

import java.io.PrintStream;

public class CommandLineArg
{
   public static void main(String[] args)
   {
      PrintStream output = System.out;

      output.print("Number of command line arguments: ");
      output.println(args.length);
   }
}

If you run the program by typing

java CommandLineArg one two three

it will print

Number of command line arguments: 3

Note that the attribute length is declared as public final int.

Indexing into an Array

You can retrieve a value from an array by specifying an index into the array:

String first  = args[0];               // 1st element
String second = args[1];               // 2nd element
String last   = args[args.length - 1]; // last element

An ArrayIndexOutOfBoundsException will occur if you use an index that is negative, or that is equal to or greater than the length of the array.
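
One way to avoid an out-of-range index is to check it against length before using it. The following is only a sketch; the helper argOrDefault and its fallback parameter are hypothetical and not part of any library.

```java
import java.io.PrintStream;

class SafeIndexDemo
{
   // Hypothetical helper: returns args[index] if the index is valid,
   // and the given fallback otherwise.
   static String argOrDefault(String[] args, int index, String fallback)
   {
      if (index >= 0 && index < args.length)
      {
         return args[index];
      }
      return fallback;
   }

   public static void main(String[] args)
   {
      PrintStream output = System.out;

      // Prints the first command line argument, or "(none)".
      output.println(argOrDefault(args, 0, "(none)"));
   }
}
```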

Looping over an Array

import java.io.PrintStream;

public class CommandLineArg
{
   public static void main(String[] args)
   {
      PrintStream output = System.out;

      output.print("Number of command line arguments: ");
      output.println(args.length);

      for (int i = 0; i < args.length; i++)
      {
         output.print(i + " : ");
         output.println(args[i]);
      }
   }
}
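
The same loop shape can compute a value from the array instead of printing every element. A minimal sketch, assuming a hypothetical helper named longest that returns the longest element (the first one if there is a tie):

```java
import java.io.PrintStream;

class LongestArg
{
   // Hypothetical helper: returns the longest string in values, or
   // the empty string if values has no elements.
   static String longest(String[] values)
   {
      String best = "";
      for (int i = 0; i < values.length; i++)
      {
         if (values[i].length() > best.length())
         {
            best = values[i];
         }
      }
      return best;
   }

   public static void main(String[] args)
   {
      PrintStream output = System.out;

      output.println("Longest argument: " + longest(args));
   }
}
```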

matches

boolean matches(String regex)

Tells whether or not this string matches the given regular expression.

The matches method lets you determine if a String matches a pattern of characters. The pattern is specified as a regular expression.

output.println(s.matches("[+-]?[0-9]+"));

prints true for any String referenced by s that is an integer value.

The String literal "[+-]?[0-9]+" is a regular expression for integer numbers (that might have a sign, and have no spaces or non-digit characters).
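
To try the pattern on a few values, it can be wrapped in a small method. The helper isInteger and the class name are hypothetical; the regular expression is the one shown above.

```java
import java.io.PrintStream;

class IntegerCheck
{
   // Hypothetical helper: true if s is an optional sign followed by
   // one or more digits, with nothing else in the string.
   static boolean isInteger(String s)
   {
      return s.matches("[+-]?[0-9]+");
   }

   public static void main(String[] args)
   {
      PrintStream output = System.out;

      output.println(isInteger("-42"));
      output.println(isInteger("4 2"));
      output.println(isInteger("3.14"));
   }
}
```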

Regular Expressions

In Java, a regular expression (or regex) is a String that describes a pattern of characters in a concise, unambiguous fashion. Regexes are typically used for pattern matching, for example determining if a string represents an integer.

The term regular expression means something else in formal language theory (where the term was invented) which you will learn about in CSE2001: Introduction to Theory of Computation.

A Regex Tester

import java.util.Scanner;
import java.io.PrintStream;

public class RegEx
{
   public static void main(String[] args)
   {
      PrintStream output = System.out;
      Scanner input = new Scanner(System.in);

      if (args.length > 0)
      {
         String regex = args[0];
         for (int i = 1; i < args.length; i++)
         {
            output.printf("\"%s\" matches \"%s\" : %b%n",
                          args[i],
                          regex,
                          args[i].matches(regex));
         }
      }
   }
}

See http://download.oracle.com/javase/tutorial/essential/regex/test_harness.html for a more general tester.

The Simplest Regular Expression

The simplest non-empty regular expression is a String containing a single character where the character is not one of ([{\^-$|]})?*+.

String regex = "k";

The only String matching regex is "k".

The characters ([{\^-$|]})?*+. are called metacharacters, and are used to construct more sophisticated regular expressions.

Simple Regular Expressions

A String with no metacharacters matches only itself.

String regex = "Java";
boolean match = s.matches(regex);

is effectively equivalent to

boolean match = s.equals("Java");
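
One point worth checking for yourself is that matches succeeds only if the entire String fits the pattern, not just part of it. A short sketch, with matchesJava as a hypothetical helper name:

```java
import java.io.PrintStream;

class WholeStringMatch
{
   // Hypothetical helper: true only if s is exactly "Java".
   static boolean matchesJava(String s)
   {
      return s.matches("Java");
   }

   public static void main(String[] args)
   {
      PrintStream output = System.out;

      output.println(matchesJava("Java"));
      // Contains "Java" but has extra characters, so no match.
      output.println(matchesJava("JavaScript"));
   }
}
```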

Simple Regular Expressions

For regular expressions, it may help to think of the regular expression "Java" as being

  1. a 'J'
  2. followed by an 'a'
  3. followed by a 'v'
  4. followed by an 'a'

Simple Character Classes

Characters inside a pair of square brackets define a character class (a set of characters).

String regex = "[abc]";

The only Strings matching regex are "a", "b", and "c".

You can think of this regular expression as being

  1. an "a" OR "b" OR "c"

Negation

You can define a character class based on non-matching characters.

String regex = "[^abc]";

The Strings matching regex are every single character String that is not "a", "b", or "c".

You can think of this regular expression as being

  1. any single character that is NOT an "a", "b", or "c"

Ranges

You can define a character class based on a contiguous range of characters.

String regex = "[a-m]";

The Strings matching regex are every single character String that is an "a" through "m", inclusive.

You can think of this regular expression as being

  1. an "a" OR "b" ... OR "m"

String regex = "[0-9]";

matches any String that is a single digit.

Unions

You can define a character class based on the union of two or more character classes.

String regex = "[a-z[A-Z]]";

The Strings matching regex are every single character String that is an English letter.


String regex = "[a-zA-Z0-9]";

matches any String that is a single English letter or a single digit.
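
The character classes seen so far (negation, ranges, and unions) can be tried out with a sketch like the following. The helper names are hypothetical; the patterns are taken from the slides above. Notice that each pattern matches exactly one character.

```java
import java.io.PrintStream;

class CharacterClassDemo
{
   // Hypothetical helpers, one per character class.
   static boolean isNotAbc(String s)
   {
      return s.matches("[^abc]");
   }

   static boolean isLowerAToM(String s)
   {
      return s.matches("[a-m]");
   }

   static boolean isEnglishLetter(String s)
   {
      return s.matches("[a-z[A-Z]]");
   }

   public static void main(String[] args)
   {
      PrintStream output = System.out;

      output.println(isNotAbc("d"));
      output.println(isLowerAToM("n"));
      output.println(isEnglishLetter("Q"));
   }
}
```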

Example 1

Give a regular expression that matches a signed integer digit.

  1. a plus or minus sign
  2. followed by a digit

String regex = "[+-][0-9]";

You can think of this regular expression as being

  1. a "+" OR "-"
  2. followed by a digit

Example 2

Give a regular expression that matches every 2-letter English word starting with i.

  1. an "i"
  2. followed by a "d" OR "f" OR "n" OR "s" OR "t"

String regex = "i[dfnst]";

Does this scale to 3-letter words?

Predefined Character Classes

Construct   Matches
.           Any character (may or may not match line terminators)
\d          A digit: [0-9]
\D          A non-digit: [^0-9]
\s          A whitespace character: [ \t\n\x0B\f\r]
\S          A non-whitespace character: [^\s]
\w          A word character: [a-zA-Z_0-9]
\W          A non-word character: [^\w]
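
When a predefined character class is written inside a Java String literal, the backslash must itself be escaped, so \d is typed as "\\d". The sketch below shows this for three of the classes; the helper names are hypothetical.

```java
import java.io.PrintStream;

class PredefinedClassDemo
{
   // Hypothetical helpers. Each pattern matches one character.
   static boolean isDigit(String s)
   {
      return s.matches("\\d");   // the regex \d
   }

   static boolean isWhitespace(String s)
   {
      return s.matches("\\s");   // the regex \s
   }

   static boolean isWordChar(String s)
   {
      return s.matches("\\w");   // the regex \w
   }

   public static void main(String[] args)
   {
      PrintStream output = System.out;

      output.println(isDigit("7"));
      output.println(isWhitespace("\t"));
      output.println(isWordChar("-"));
   }
}
```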

Quantifiers: Once or not at all

? after a character class means once or not at all.

String regex = "[+-]?\\d";

will match any signed or unsigned integer digit.

You can think of this regular expression as being

  1. a "+" OR "-" once or not at all
  2. followed by a digit
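
A quick way to convince yourself of what ? does is to test the pattern with and without a sign. This is a sketch only; isOptionallySignedDigit is a hypothetical helper name.

```java
import java.io.PrintStream;

class OptionalSignDemo
{
   // Hypothetical helper: an optional "+" or "-" followed by
   // exactly one digit.
   static boolean isOptionallySignedDigit(String s)
   {
      return s.matches("[+-]?\\d");
   }

   public static void main(String[] args)
   {
      PrintStream output = System.out;

      output.println(isOptionallySignedDigit("5"));
      output.println(isOptionallySignedDigit("-5"));
      // Two signs: the ? allows at most one.
      output.println(isOptionallySignedDigit("+-5"));
   }
}
```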

Quantifiers: Zero or more

* after a character class means zero or more.

String regex = "a*";

will match the empty String and every String made up only of the letter a.

You can think of this regular expression as being

  1. an "a" zero or more times


String regex = ".*";

will match any String (except possibly those containing line terminators; see java.util.regex.Pattern if you are interested in the details).
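
The following sketch tries both patterns from this slide. The helper names are hypothetical. With the default settings used by matches, the dot does not match a newline character, so a String containing one is not matched by ".*".

```java
import java.io.PrintStream;

class ZeroOrMoreDemo
{
   // Hypothetical helper: true for "" and for strings of only a's.
   static boolean isOnlyAs(String s)
   {
      return s.matches("a*");
   }

   // Hypothetical helper: true for any string that the pattern ".*"
   // matches in its entirety.
   static boolean matchesDotStar(String s)
   {
      return s.matches(".*");
   }

   public static void main(String[] args)
   {
      PrintStream output = System.out;

      output.println(isOnlyAs(""));
      output.println(isOnlyAs("aaab"));
      output.println(matchesDotStar("anything at all"));
      output.println(matchesDotStar("two\nlines"));
   }
}
```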


Quantifiers: One or more

+ after a character class means one or more.

String regex = "[+-]?[0-9]+";

will match any signed or unsigned integer number.

You can think of this regular expression as being

  1. a "+" OR "-" once or not at all
  2. followed by a digit one or more times
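
The difference between + and * shows up on the empty String: one or more digits requires at least one, while zero or more does not. A minimal sketch with hypothetical helper names:

```java
import java.io.PrintStream;

class OneOrMoreDemo
{
   // Hypothetical helper: at least one digit, and only digits.
   static boolean hasOneOrMoreDigits(String s)
   {
      return s.matches("[0-9]+");
   }

   // Hypothetical helper: only digits, possibly none at all.
   static boolean hasZeroOrMoreDigits(String s)
   {
      return s.matches("[0-9]*");
   }

   public static void main(String[] args)
   {
      PrintStream output = System.out;

      output.println(hasOneOrMoreDigits(""));
      output.println(hasZeroOrMoreDigits(""));
      output.println(hasOneOrMoreDigits("2024"));
   }
}
```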

Example 3

Write a regular expression that matches a valid Java local variable name and follows our style convention (this is how the style checker checks variable names).

  1. a lowercase letter
  2. followed by zero or more letters or digits

String regex = "[a-z][a-zA-Z0-9]*";
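
The pattern can be tried on a few candidate names with a sketch like this one. The helper isLocalVariableName is a hypothetical name, not the style checker's actual code. Note that the pattern checks only the shape of the name, so it also accepts a reserved word such as for.

```java
import java.io.PrintStream;

class VariableNameCheck
{
   // Hypothetical helper: a lowercase letter followed by zero or
   // more letters or digits.
   static boolean isLocalVariableName(String s)
   {
      return s.matches("[a-z][a-zA-Z0-9]*");
   }

   public static void main(String[] args)
   {
      PrintStream output = System.out;

      output.println(isLocalVariableName("lastSpace"));
      output.println(isLocalVariableName("LastSpace"));
      output.println(isLocalVariableName("last_space"));
   }
}
```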

Challenge

Write a regular expression that matches a valid final local variable name and follows our style convention (this is how the style checker checks names of constants).

You will need to use a capture group (see http://download.oracle.com/javase/tutorial/essential/regex/groups.html).