The Java™ Tutorial

Trail: Bonus
Lesson: Regular Expressions

String Literals

The most basic form of pattern matching supported by this API is the match of a string literal. For example, if the regular expression is foo and the input string is foo, the match will succeed because the two strings are identical. Try this out by creating a text file named regex.txt and putting the text foo (without quotation marks) on lines 1 and 2. The first line is the regular expression, and the second line is the input string. Save this file in the same directory as RegexTestHarness.java. When you compile and run the harness (type java RegexTestHarness on the command line), you should get the following output:
 
Current REGEX is: foo
Current INPUT is: foo
I found the text "foo" starting at index 0 and ending at index 3.
This match was a success. Note that while the input string is 3 characters long, the start index is 0 and the end index is 3. By convention, ranges are inclusive of the beginning index and exclusive of the end index, as shown in the following figure:

Each character in the string resides in its own cell, with the index positions pointing between each cell. The string "foo" starts at index 0 and ends at index 3, even though the characters themselves only occupy cells 0, 1, and 2.
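
The same index convention can be checked directly against java.util.regex, without the test harness. The following is a minimal sketch, not the harness itself; the class name LiteralMatchDemo and the helper firstMatchRange are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the tutorial's RegexTestHarness.
class LiteralMatchDemo {

    // Returns {start, end} of the first match of regex in input,
    // or null if the pattern is not found.
    static int[] firstMatchRange(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.find()) {
            return null;
        }
        // start() is inclusive; end() is exclusive (one past the last matched character).
        return new int[] { matcher.start(), matcher.end() };
    }

    public static void main(String[] args) {
        String input = "foo";
        int[] range = firstMatchRange("foo", input);
        System.out.println("start=" + range[0] + ", end=" + range[1]); // start=0, end=3
        // String.substring follows the same inclusive/exclusive convention.
        System.out.println(input.substring(range[0], range[1]));       // foo
    }
}
```

Because the end index is exclusive, end minus start gives the length of the matched text (3 here).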

With subsequent matches, you'll notice some overlap; the start index for the next match is the same as the end index of the previous match:

 
Current REGEX is: foo
Current INPUT is: foofoofoo
I found the text "foo" starting at index 0 and ending at index 3.
I found the text "foo" starting at index 3 and ending at index 6.
I found the text "foo" starting at index 6 and ending at index 9.
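
As a rough illustration of where these three index pairs come from, the sketch below calls Matcher.find() repeatedly and records each start and end. The class and method names are hypothetical; the harness may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the tutorial's RegexTestHarness.
class RepeatedMatchDemo {

    // Collects {start, end} for every successive match of regex in input.
    static List<int[]> allMatchRanges(String regex, String input) {
        List<int[]> ranges = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        // Each call to find() resumes searching where the previous match ended.
        while (matcher.find()) {
            ranges.add(new int[] { matcher.start(), matcher.end() });
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (int[] range : allMatchRanges("foo", "foofoofoo")) {
            System.out.println("start=" + range[0] + ", end=" + range[1]);
        }
    }
}
```

The matched text itself never overlaps: each end index is exclusive, so the next match can begin at that same position.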

Metacharacters

This API also supports a number of special characters that affect the way a pattern is matched. In your regex.txt file, change the regular expression to cat. (including the trailing period) and the input string to cats. Here's the result:
Current REGEX is: cat.
Current INPUT is: cats
I found the text "cats" starting at index 0 and ending at index 4.
The match still succeeds, even though the period (.) is not present in the input string. It succeeds because the period is a metacharacter, that is, a character with special meaning interpreted by the matcher. The metacharacter "." means "any character", which is why the match in our example succeeds.
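
A small sketch of the period's behavior, using Pattern.matches to test whether a whole input matches a pattern. The class name and helper are hypothetical. The last case goes slightly beyond the text above: by default the period does not match a line terminator such as \n.

```java
import java.util.regex.Pattern;

// Hypothetical demo class illustrating the "." metacharacter.
class DotMetacharDemo {

    // True if the entire input matches the regular expression.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("cat.", "cats"));  // true: "." matches 's'
        System.out.println(matchesWhole("cat.", "cat!"));  // true: "." matches '!'
        System.out.println(matchesWhole("cat.", "cat"));   // false: "." needs one more character
        System.out.println(matchesWhole("cat.", "cat\n")); // false: no line terminators by default
    }
}
```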

The metacharacters supported by this API are: ( [ { \ ^ $ | ) ? * + . (the final period is part of the list).


Note: In certain situations the special characters listed above will not be treated as metacharacters. You'll encounter this as you learn more about how regular expressions are constructed. You can, however, use this list to check whether or not a specific character will ever be considered a metacharacter. For example, the characters ! @ and # never carry a special meaning.

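There are two ways to force a metacharacter to be treated as an ordinary character:

precede the metacharacter with a backslash, or
enclose it within \Q (which starts the quote) and \E (which ends it).

When using the second technique, the \Q and \E can be placed at any location within the expression, provided that the \Q comes first.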
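
The sketch below illustrates both ways of treating the period as an ordinary character: a backslash escape, and quoting with \Q and \E. The class name and helper are hypothetical. Note that in Java source code each backslash in a regular expression must itself be doubled inside a string literal. Pattern.quote is an additional convenience in java.util.regex that is not mentioned in the text above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class illustrating how to quote metacharacters.
class QuoteMetacharDemo {

    // True if the entire input matches the regular expression.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Backslash escape: the regex is cat\. (written "cat\\." in Java source).
        System.out.println(matchesWhole("cat\\.", "cat."));       // true
        System.out.println(matchesWhole("cat\\.", "cats"));       // false

        // \Q...\E quoting: everything between the two markers is literal.
        System.out.println(matchesWhole("\\Qcat.\\E", "cat."));   // true
        System.out.println(matchesWhole("\\Qcat.\\E", "cats"));   // false

        // \Q may start anywhere in the expression, as long as it precedes \E.
        System.out.println(matchesWhole("c\\Qat.\\E", "cat."));   // true

        // Pattern.quote builds a literal pattern from an arbitrary string.
        System.out.println(matchesWhole(Pattern.quote("cat."), "cat.")); // true
    }
}
```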



Copyright 1995-2005 Sun Microsystems, Inc. All rights reserved.