Class java.text.Collation
java.lang.Object
|
+----java.text.Collation
- public class Collation
- extends Object
- implements Cloneable, Serializable
The Collation class is an abstract class which provides Unicode text
comparison services. Text collation supports language-sensitive
comparison of strings, allowing for text searching and alphabetical
sorting. The collation classes provide a choice of ordering
strength (for example, to ignore or not ignore case differences) and
handle ignored, expanding, and contracting characters.
Developers don't need to know anything about the collation rules for various
languages. Any features requiring collation can use the collation object
associated with the current default locale, or with a specific locale
(like France or Japan) if appropriate.
- Basic Collation: Correctly sorting strings is
tricky, even in English. The results of a sort must be consistent:
any differences in strings must always be sorted the same
way. The sort assigns relative priorities to different features of
the text, based on the characters themselves and on the current
ordering strength of the collation object. Correct comparison and
sorting of natural languages require the following:
- Ordering priorities: The first primary difference determines the
resultant order, no matter what the other characters are. For
example, "cat" < "dog". Some languages require primary, secondary,
and tertiary ordering. For example, in Czech, case differences are a
tertiary difference (A vs. a), accent differences are a secondary
difference (e vs. ê) and different base letters are a primary
difference (d vs. e).
- Group characters: In collating some languages, a sequence of
characters is treated as though it were a single letter of the
alphabet. For example, "cx" < "chx" < "dx".
- Expanding characters: In some languages, a single character
is treated as though it were a sequence of letters of the
alphabet. For example, "aex" < "æx" < "aexx".
- Ignored characters: Certain characters are ignored when
collating. That is, they are not significant unless there are
other differences in the remainder of the string. For example,
"blackbird" < "black-bird" < "blackbirds".
- Localizable Collation: Different collation objects
associated with various locales handle the differences required
when sorting text strings for different languages.
- Customization: You can produce a new collation by
adding to or changing an existing one.
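The ordering priorities described above can be tried out with a short sketch. It is only an illustration under stated assumptions: the Collation class documented here may not be available in your JDK, so the sketch substitutes java.text.Collator as a stand-in. The class name StrengthDemo and the helper compareAt are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthDemo {
    // Hypothetical helper: compares two strings at the given strength,
    // using java.text.Collator as a stand-in for the Collation class.
    static int compareAt(int strength, String source, String target) {
        Collator collator = Collator.getInstance(Locale.US);
        // Decompose accented characters so accents are weighed separately.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        collator.setStrength(strength);
        return collator.compare(source, target);
    }

    public static void main(String[] args) {
        // Different base letters are a primary difference.
        System.out.println(compareAt(Collator.PRIMARY, "cat", "dog") < 0);
        // An accent is ignored at PRIMARY but significant at SECONDARY.
        System.out.println(compareAt(Collator.PRIMARY, "e", "\u00EA") == 0);
        System.out.println(compareAt(Collator.SECONDARY, "e", "\u00EA") != 0);
        // Case is ignored at SECONDARY but significant at TERTIARY.
        System.out.println(compareAt(Collator.SECONDARY, "a", "A") == 0);
        System.out.println(compareAt(Collator.TERTIARY, "a", "A") != 0);
    }
}
```

Collator.compare returns an int rather than the byte constants LESS, EQUAL, and GREATER, so the sketch tests the sign of the result.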
Because compare()'s algorithm is complex, it is faster to sort long lists
of words by retrieving sort keys or collation keys with getSortKey() or
getCollationKey() respectively. You can then cache the sort keys or
collation keys and compare them using either SortKey.compareTo() or
CollationKey.compareTo(). Sort keys and collation keys differ as
follows:
- Sort Key: handles a limited number of ignorable and accent
characters; is bit-ordered (so you can do bit-wise comparison on sort
keys); must be compared with SortKey.compareTo; cannot be
concatenated; is faster than both collation keys and the direct
compare algorithm.
- Collation Key: handles an unlimited number of ignorable and accent
characters; is not bit-ordered (you cannot do bit-wise comparison on
collation keys); must be compared with CollationKey.compareTo; can be
concatenated; is faster than the compare algorithm but slower than
sort key comparison.
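The key-caching approach described above might look like the following sketch. It assumes java.text.Collator and java.text.CollationKey as stand-ins for the classes documented here and covers only the collation-key path. The class name KeySortDemo and the method sortWithKeys are hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class KeySortDemo {
    // Computes one CollationKey per word up front, then sorts the cached
    // keys instead of calling compare() on the raw strings repeatedly.
    static List<String> sortWithKeys(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<CollationKey> keys = new ArrayList<>();
        for (String word : words) {
            keys.add(collator.getCollationKey(word));
        }
        // CollationKey is Comparable, so the keys sort via compareTo().
        Collections.sort(keys);
        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "Apple", "banana");
        System.out.println(sortWithKeys(words, Locale.US));
    }
}
```

All keys in one sort come from the same collator, which matters because keys produced by different collation objects are not comparable.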
Collation subclasses implement different collation rules for different
languages and different applications (phone book, dictionary, etc.).
Use the collation strength parameters PRIMARY, SECONDARY, TERTIARY, and
IDENTICAL to specify the comparison level.
Each Unicode character is assigned an ordering priority: primary,
secondary, tertiary, or no difference.
Decomposition mode determines how composed characters are handled for
Unicode.
- No Decomposition: With no decomposition, accented characters will
not be sorted correctly; this should only be used if the source
text is guaranteed to have no accented characters.
- Canonical Decomposition : Characters that are canonical variants
according to Unicode 2.0 are decomposed in collation if canonical
decomposition mode is set. This is the default, and is required
for proper collation of accented characters.
- Full Decomposition : With full decomposition, both canonical
variants and compatibility variants are decomposed. This causes
not only accented characters to be sorted, but also characters that
have special formats to be sorted with their nominal form. For
example, the half-width and full-width ASCII and Katakana characters
are then sorted properly.
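The effect of the decomposition modes above can be observed with a small sketch. It assumes java.text.Collator as a stand-in for the Collation class, with Collator's constants of the same names. DecompositionDemo and compareWithMode are hypothetical names.

```java
import java.text.Collator;
import java.util.Locale;

public class DecompositionDemo {
    // Hypothetical helper: compares two strings under a decomposition mode.
    static int compareWithMode(int mode, String source, String target) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setDecomposition(mode);
        return collator.compare(source, target);
    }

    public static void main(String[] args) {
        String precomposed = "\u00E4"; // a-umlaut as a single character
        String decomposed = "a\u0308"; // 'a' followed by a combining diaeresis
        String fullWidth = "\uFF21";   // full-width 'A', a compatibility variant

        // Canonical variants are decomposed before comparison in this mode.
        System.out.println(compareWithMode(
                Collator.CANONICAL_DECOMPOSITION, precomposed, decomposed));
        // Without decomposition the two spellings are compared as written.
        System.out.println(compareWithMode(
                Collator.NO_DECOMPOSITION, precomposed, decomposed));
        // Full decomposition also maps compatibility variants.
        System.out.println(compareWithMode(
                Collator.FULL_DECOMPOSITION, fullWidth, "A"));
    }
}
```

Decomposition costs time on every comparison, which is why a mode without it is offered for text known to contain no accented characters.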
LESS, EQUAL, and GREATER identify the result
of comparing Unicode text strings.
Use the static method Collation.getDefault() to obtain an instance of
the class, optionally passing the desired locale as the argument.
Example of use:
// Compare two strings in the default locale
Collation myCollation = Collation.getDefault();
byte result = myCollation.compare("abc", "ABC");
Another example:
// compare two strings in French
Collation myCollation = Collation.getDefault(Locale.FRANCE);
byte result = myCollation.compare("abc", "ABC");
The following example demonstrates different ways of comparing two
strings:
String a = "abcdefgh", b = "ijklmnop";
// This comparison is not as fast as comparing sort keys
if (myCollation.compare(a, b) == Collation.LESS) {
// ...
}
// Limited accents, but faster than compare
SortKey aKey = myCollation.getSortKey(a);
SortKey bKey = myCollation.getSortKey(b);
if (aKey.compareTo(bKey) == Collation.LESS) {
// ...
}
// Unlimited accents, faster than compare
CollationKey aaKey = myCollation.getCollationKey(a);
CollationKey bbKey = myCollation.getCollationKey(b);
if (aaKey.compareTo(bbKey) == Collation.LESS) {
// ...
}
NOTE: Two sort keys from different collations cannot be
compared.
To combine collations from two locales:
// Create an en_US collation object
Collation en_USCollation =
Collation.getDefault(new Locale("EN", "US", ""));
// Create a da_DK collation object
Collation da_DKCollation =
Collation.getDefault(new Locale("DA", "DK", ""));
// Combine the two
// First, get the collation rules from en_USCollation
String en_USRules = en_USCollation.getRules();
// Second, get the collation rules from da_DKCollation
String da_DKRules = da_DKCollation.getRules();
TableCollation newCollation =
new TableCollation(en_USRules + da_DKRules);
// newCollation has the combined rules
Another, more interesting example is to modify an existing
table to create a new collation object. For example, add
"& C < ch, cH, Ch, CH" to the rules of the en_USCollation object to
create your own English collation object:
// Create a new collation object with additional rules
String addRules = "& C < ch, cH, Ch, CH";
TableCollation myCollation =
new TableCollation(en_USCollation.getRules() + addRules);
// myCollation contains the new rules
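As a runnable illustration of the rule customization above, the sketch below assumes java.text.RuleBasedCollator as a stand-in for TableCollation. It also assumes the JDK returns a RuleBasedCollator for Locale.US. CustomRulesDemo and withChContraction are hypothetical names.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

public class CustomRulesDemo {
    // Builds a collator from the en_US rules plus a rule that treats
    // "ch" as a single letter sorting after "c".
    static RuleBasedCollator withChContraction() {
        // Assumption: the collator for Locale.US is a RuleBasedCollator.
        RuleBasedCollator base =
                (RuleBasedCollator) Collator.getInstance(Locale.US);
        String addRules = "& C < ch, cH, Ch, CH";
        try {
            return new RuleBasedCollator(base.getRules() + addRules);
        } catch (ParseException e) {
            // The combined rule string could not be parsed.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator custom = withChContraction();
        // With the added rule, "ch" is one letter between "c" and "d".
        System.out.println(custom.compare("cz", "ch"));
        System.out.println(custom.compare("ch", "d"));
    }
}
```

The rule string is appended to the rules text, not to the collation object itself, because the constructor takes a String of rules.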
- See Also:
- TableCollation, SortKey, CollationKey, Locale
-
CANONICAL_DECOMPOSITION
- Characters that are canonical variants according to Unicode 2.0 will be
decomposed for sorting.
-
EQUAL
- EQUAL is returned if source string is compared to be equal to target
string in the compare() method.
-
FULL_DECOMPOSITION
- Both canonical variants and compatibility variants are decomposed for
sorting.
-
GREATER
- GREATER is returned if source string is compared to be greater than
target string in the compare() method.
-
IDENTICAL
- Two characters are considered "identical" when they are equivalent
Unicode spellings.
-
LESS
- LESS is returned if source string is compared to be less than target
string in the compare() method.
-
NO_DECOMPOSITION
- Accented characters will not be decomposed for sorting.
-
PRIMARY
- Base letter represents a primary difference.
-
SECONDARY
- Diacritical differences on the same base letter represent a secondary
difference.
-
TERTIARY
- Uppercase and lowercase versions of the same character represent a
tertiary difference.
-
Collation()
- Default constructor of the collation object.
-
clone()
- Overrides Cloneable
-
compare(String, int, int, String, int, int)
- This comparison function compares character data stored in the
specified regions of two different strings.
-
compare(String, String)
- The comparison function compares the character data stored in two
different strings.
-
equals(Object)
- Compares the equality of two collation objects.
-
equals(String, String)
- Convenience method for comparing the equality of two strings based on
the collation rules.
-
getAvailableLocales()
- Get the set of Locales for which Collations are installed.
-
getCollationKey(String)
- Transforms the string into a series of characters that can be compared
with java.text.CollationKey.compareTo().
-
getCollationKey(String, int, int)
- Transforms the string into a series of characters that can be compared
with java.text.CollationKey.compareTo().
-
getDecomposition()
- Get the decomposition mode of the collation object.
-
getDefault()
- Gets the table-based collation object for the current default locale.
-
getDefault(Locale)
- Gets the table-based collation object for the desired locale.
-
getDisplayName(Locale)
- Get name of the object for the desired Locale, in the language of the
default locale.
-
getDisplayName(Locale, Locale)
- Get name of the object for the desired Locale, in the desired language.
-
getSortKey(String)
- Transforms the string into a series of characters that can be compared
with SortKey.compareTo().
-
getSortKey(String, int, int)
- Transforms a specified region of the string into a series of chars
that can be compared with SortKey.compareTo.
-
getStrength()
- Determines the minimum strength that will be used in comparison or
transformation.
-
greater(String, String)
- Convenience method for comparing two strings based on the collation
rules.
-
greaterOrEqual(String, String)
- Convenience method for comparing two strings based on the collation
rules.
-
hashCode()
- Generates the hash code for the collation object.
-
setDecomposition(byte)
- Set the decomposition mode of the collation object.
-
setStrength(byte)
- Sets the minimum strength to be used in comparison or transformation.
PRIMARY
public final static byte PRIMARY
- Base letter represents a primary difference. Set comparison
level to PRIMARY to ignore secondary and tertiary differences.
Use this to set the strength of a collation object.
Example of primary difference, "abc" < "abd"
- See Also:
- setStrength, getStrength
SECONDARY
public final static byte SECONDARY
- Diacritical differences on the same base letter represent a secondary
difference. Set comparison level to SECONDARY to ignore tertiary
differences. Use this to set the strength of a collation object.
Example of secondary difference, "ä" >> "a".
- See Also:
- setStrength, getStrength
TERTIARY
public final static byte TERTIARY
- Uppercase and lowercase versions of the same character represent a
tertiary difference. Set comparison level to TERTIARY to include
all comparison differences. Use this to set the strength of a collation
object.
Example of tertiary difference, "abc" <<< "ABC".
- See Also:
- setStrength, getStrength
IDENTICAL
public final static byte IDENTICAL
- Two characters are considered "identical" when they are equivalent
Unicode spellings.
For example, "ä" == "a\u0308" (a followed by a combining diaeresis).
LESS
public final static byte LESS
- LESS is returned if source string is compared to be less than target
string in the compare() method.
- See Also:
- compare
EQUAL
public final static byte EQUAL
- EQUAL is returned if source string is compared to be equal to target
string in the compare() method.
- See Also:
- compare
GREATER
public final static byte GREATER
- GREATER is returned if source string is compared to be greater than
target string in the compare() method.
- See Also:
- compare
NO_DECOMPOSITION
public final static byte NO_DECOMPOSITION
- Accented characters will not be decomposed for sorting. Please see
class description for more details.
CANONICAL_DECOMPOSITION
public final static byte CANONICAL_DECOMPOSITION
- Characters that are canonical variants according to Unicode 2.0 will be
decomposed for sorting. Use this to set the decomposition mode in a
collation object. Please see class description for more details.
FULL_DECOMPOSITION
public final static byte FULL_DECOMPOSITION
- Both canonical variants and compatibility variants are decomposed for
sorting. Use this to set the decomposition mode in a
collation object. Please see class description for more details.
Collation
protected Collation()
- Default constructor of the collation object. This constructor is made
protected so subclasses can get access to it.
getDefault
public static synchronized Collation getDefault()
- Gets the table-based collation object for the current default locale.
The default locale is determined by java.util.Locale.getDefault.
- Returns:
- the collation object of the default locale (for example, EN_US).
- See Also:
- getDefault
getDefault
public static synchronized Collation getDefault(Locale desiredLocale)
- Gets the table-based collation object for the desired locale. The
resource of the desired locale will be loaded by
java.util.ResourceBundle. Locale.ENGLISH is the base collation table
and all other languages are built on top of it with additional
language-specific modifications.
- Parameters:
- desiredLocale - the desired locale to create the collation table
with.
- Returns:
- returns the created table-based collation object based on
the desired locale.
- See Also:
- Locale, ResourceBundle
compare
public abstract byte compare(String source,
String target)
- The comparison function compares the character data stored in two
different strings. Returns information about whether a string
is less than, greater than or equal to another string.
Example of use:
Collation myCollation = Collation.getDefault(Locale.US);
myCollation.setStrength(Collation.PRIMARY);
// result would be Collation.EQUAL ("abc" == "ABC")
// (no primary difference between "abc" and "ABC")
byte result = myCollation.compare("abc", "ABC");
myCollation.setStrength(Collation.TERTIARY);
// result would be Collation.LESS ("abc" <<< "ABC")
// (with tertiary difference between "abc" and "ABC")
result = myCollation.compare("abc", "ABC");
- Parameters:
- source - the source string to be compared with.
- target - the string that is to be compared with the source string.
- Returns:
- Returns a byte value. GREATER if source is greater
than target; EQUAL if source is equal to target; LESS if source is less
than target.
compare
public abstract byte compare(String source,
int start,
int end,
String target,
int targetStart,
int targetEnd)
- This comparison function compares character data stored in the
specified regions of two different strings. Returns information
about whether a string is less than, greater than or equal to another
string.
If the given starting offset is greater than the ending offset, the
specified region of a string is the substring between the two offsets.
This is consistent with the way java.lang.String behaves when it
encounters the same situation.
(Starting and ending offsets must be valid offsets of the string.)
- Parameters:
- source - the source string to be compared with.
- target - the string that is to be compared with the source string.
- start - the starting offset (inclusive) of the range of the
source string to be compared with.
- end - the ending offset (exclusive) of the range of the source
string to be compared with.
- targetStart - the starting offset (inclusive) of the range of the
target string to be compared with.
- targetEnd - the ending offset (exclusive) of the range of the
target string to be compared with.
- Returns:
- Returns a byte value. GREATER if source is greater
than target; EQUAL if source is equal to target; LESS if source is less
than target.
- Throws: StringIndexOutOfBoundsException
- If the starting offset or
the ending offset is out of the range of the string.
- See Also:
- String
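A region comparison like the one described above can be approximated by comparing substrings. This is a sketch under stated assumptions, not the method documented here: it uses java.text.Collator as a stand-in, the names RegionCompareDemo and compareRegions are hypothetical, and it does not handle a starting offset greater than the ending offset.

```java
import java.text.Collator;
import java.util.Locale;

public class RegionCompareDemo {
    // Hypothetical helper: compares source[start, end) with
    // target[targetStart, targetEnd) using the given collator.
    static int compareRegions(Collator collator,
                              String source, int start, int end,
                              String target, int targetStart, int targetEnd) {
        // String.substring throws StringIndexOutOfBoundsException when an
        // offset lies outside the string or start is greater than end.
        return collator.compare(source.substring(start, end),
                                target.substring(targetStart, targetEnd));
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(Collator.PRIMARY);
        // Compares "abc" (inside "xxabc") with "ABC" (inside "ABCyy").
        System.out.println(
                compareRegions(collator, "xxabc", 2, 5, "ABCyy", 0, 3));
    }
}
```

Start offsets are inclusive and end offsets exclusive, matching the parameter descriptions above.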
getSortKey
public abstract SortKey getSortKey(String source)
- Transforms the string into a series of characters that can be compared
with SortKey.compareTo(). It is not possible to restore the original
string from the chars in the sort key, and they should not be used as
text. The generated sort key handles only a limited number of ignorable
characters.
Use SortKey.equals() or SortKey.compareTo to compare the
generated sort key string.
Example of use:
Collation myCollation = Collation.getDefault(Locale.US);
myCollation.setStrength(Collation.PRIMARY);
SortKey sortKey1 = myCollation.getSortKey("abc");
SortKey sortKey2 = myCollation.getSortKey("ABC");
// Use SortKey.compareTo() to compare the sort keys
// result would be Collation.EQUAL (sortKey1 == sortKey2)
byte result = sortKey1.compareTo(sortKey2);
myCollation.setStrength(Collation.TERTIARY);
sortKey1 = myCollation.getSortKey("abc");
sortKey2 = myCollation.getSortKey("ABC");
// Use SortKey.compareTo() to compare the sort keys
// result would be Collation.LESS (sortKey1 < sortKey2)
result = sortKey1.compareTo(sortKey2);
If the source string is null, a null sort key will be returned.
- Parameters:
- source - the source string to be transformed into a sort key.
- Returns:
- the sort key of the string based on the collation rules.
- See Also:
- compareTo
getSortKey
public abstract SortKey getSortKey(String source,
int start,
int end)
- Transforms a specified region of the string into a series of chars
that can be compared with SortKey.compareTo. It is not possible to
restore the original string from the characters in the sort key, and
they should not be used as text. If an indefinite number of ignorable
characters can occur, use getCollationKey.
If the given starting offset is greater than the ending offset, the
specified region of a string is the substring between the two offsets.
This is consistent with the way java.lang.String behaves when it
encounters the same situation.
(Starting and ending offsets must be valid offsets of the string.)
- Parameters:
- source - the source string to be transformed into a sort key.
- start - the starting offset (inclusive) of the range of the source
string to be transformed with.
- end - the ending offset (exclusive) of the range of the source
string to be transformed with.
- Returns:
- the transformed string.
- Throws: StringIndexOutOfBoundsException
- If the starting offset or
the ending offset is out of the range of the string.
- See Also:
- SortKey, compareTo
getCollationKey
public abstract CollationKey getCollationKey(String source)
- Transforms the string into a series of characters that can be compared
with java.text.CollationKey.compareTo(). Handles an indefinite number
of ignorable characters. Collation keys must be compared with
CollationKey.compareTo().
Example of use:
Collation myCollation = Collation.getDefault(Locale.US);
CollationKey key1 = myCollation.getCollationKey("abc");
CollationKey key2 = myCollation.getCollationKey("ABC");
// Use java.text.CollationKey.compareTo() to compare the collation
// keys. Result will be Collation.LESS (key1 < key2)
byte result = key1.compareTo(key2);
If the source string is null, a null collation key will be
returned.
- Parameters:
- source - the source string to be used for creating the collation
key.
- Returns:
- the collation key.
- See Also:
- CollationKey, getSortKey, SortKey.compareTo, CollationKey.compareTo
getCollationKey
public abstract CollationKey getCollationKey(String source,
int start,
int end)
- Transforms the specified range of the source string into a series of
characters that can be compared with java.text.CollationKey.compareTo().
Handles an indefinite number of ignorable characters.
If the given starting offset is greater than the ending offset, the
specified region of a string is the substring between the two offsets.
This is consistent with the way java.lang.String behaves when it
encounters the same situation.
(Starting and ending offsets must be valid offsets of the string.)
- Parameters:
- source - the source string to be used for creating the collation
key
- start - the starting offset (inclusive) of the range of the source
string to be used to generate the collation key
- end - the ending offset (exclusive) of the range of the source
string to be used to generate the collation key
- Returns:
- the collation key
- Throws: StringIndexOutOfBoundsException
- If the starting offset or
the ending offset is out of the range of the string.
- See Also:
- CollationKey, getSortKey, SortKey.compareTo, CollationKey.compareTo
equals
public boolean equals(String source,
String target)
- Convenience method for comparing the equality of two strings based on
the collation rules.
- Parameters:
- source - the source string to be compared with.
- target - the target string to be compared with.
- Returns:
- true if the strings are equal according to the collation
rules. false, otherwise.
- See Also:
- compare
greater
public boolean greater(String source,
String target)
- Convenience method for comparing two strings based on the collation
rules.
- Parameters:
- source - the source string to be compared with.
- target - the target string to be compared with.
- Returns:
- true if source is greater than target according to the
collation rules. false, otherwise.
- See Also:
- compare
greaterOrEqual
public boolean greaterOrEqual(String source,
String target)
- Convenience method for comparing two strings based on the collation
rules.
- Parameters:
- source - the source string to be compared with.
- target - the target string to be compared with.
- Returns:
- true if source is greater than or equal to target
according to the collation rules. false, otherwise.
- See Also:
- compare
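The convenience methods above can be read as thin wrappers around compare(). The sketch below shows that reading with java.text.Collator as a stand-in. It is an assumption about how such wrappers could be written, not the implementation documented here, and ConvenienceDemo and its methods are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

public class ConvenienceDemo {
    // Each helper reduces a three-way comparison result to a boolean.
    static boolean equal(Collator c, String source, String target) {
        return c.compare(source, target) == 0;
    }

    static boolean greater(Collator c, String source, String target) {
        return c.compare(source, target) > 0;
    }

    static boolean greaterOrEqual(Collator c, String source, String target) {
        return c.compare(source, target) >= 0;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(Collator.PRIMARY);
        System.out.println(equal(collator, "abc", "ABC"));
        System.out.println(greater(collator, "dog", "cat"));
        System.out.println(greaterOrEqual(collator, "cat", "cat"));
    }
}
```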
getStrength
public synchronized byte getStrength()
- Determines the minimum strength that will be used in comparison or
transformation.
E.g. with strength == SECONDARY, tertiary differences are ignored.
E.g. with strength == PRIMARY, secondary and tertiary differences
are ignored.
- Returns:
- the current comparison level.
- See Also:
- setStrength
setStrength
public synchronized void setStrength(byte newStrength) throws IllegalArgumentException
- Sets the minimum strength to be used in comparison or transformation.
Example of use:
Collation myCollation = Collation.getDefault(Locale.US);
myCollation.setStrength(Collation.PRIMARY);
// result will be "abc" == "ABC"
// tertiary differences will be ignored
byte result = myCollation.compare("abc", "ABC");
- Parameters:
- newStrength - the new comparison level.
- Throws: IllegalArgumentException
- If the new strength is
not a valid strength value.
- See Also:
- getStrength
getDecomposition
public synchronized byte getDecomposition()
- Get the decomposition mode of the collation object.
- Returns:
- the decomposition mode
- See Also:
- setDecomposition
setDecomposition
public synchronized void setDecomposition(byte decompositionMode) throws IllegalArgumentException
- Set the decomposition mode of the collation object.
- Parameters:
- decompositionMode - the new decomposition mode
- Throws: IllegalArgumentException
- If the new decomposition
mode is incorrect.
- See Also:
- getDecomposition
getAvailableLocales
public static synchronized Locale[] getAvailableLocales()
- Get the set of Locales for which Collations are installed.
- Returns:
- the list of available locales for which collations are installed
getDisplayName
public static synchronized String getDisplayName(Locale objectLocale,
Locale displayLocale)
- Get name of the object for the desired Locale, in the desired language.
- Parameters:
- objectLocale - must be from getAvailableLocales
- displayLocale - specifies the desired locale for output
- Returns:
- display-able name of the object for the object locale in the
desired language
getDisplayName
public static synchronized String getDisplayName(Locale objectLocale)
- Get name of the object for the desired Locale, in the language of the
default locale.
- Parameters:
- objectLocale - must be from getAvailableLocales
- Returns:
- name of the object for the desired locale in the default
language
clone
public Object clone()
- Overrides Cloneable
- Overrides:
- clone in class Object
equals
public abstract boolean equals(Object what)
- Compares the equality of two collation objects.
- Parameters:
- what - the collation object to be compared with this.
- Returns:
- true if the current collation object is the same
as the collation object what; false otherwise.
- Overrides:
- equals in class Object
hashCode
public synchronized abstract int hashCode()
- Generates the hash code for the collation object
- Overrides:
- hashCode in class Object
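How clone(), equals(Object), and hashCode() relate can be checked with a brief sketch. It assumes java.text.Collator as a stand-in for the Collation class, and EqualityDemo and its methods are hypothetical names.

```java
import java.text.Collator;
import java.util.Locale;

public class EqualityDemo {
    // True when a clone equals its original and shares its hash code.
    static boolean cloneMatchesOriginal() {
        Collator original = Collator.getInstance(Locale.US);
        Collator copy = (Collator) original.clone();
        return original.equals(copy)
                && original.hashCode() == copy.hashCode();
    }

    // True when changing a clone's strength makes it unequal to the original.
    static boolean strengthAffectsEquality() {
        Collator original = Collator.getInstance(Locale.US);
        Collator copy = (Collator) original.clone();
        // Pick a strength that differs from the original's current one.
        int other = original.getStrength() == Collator.PRIMARY
                ? Collator.TERTIARY : Collator.PRIMARY;
        copy.setStrength(other);
        return !original.equals(copy);
    }

    public static void main(String[] args) {
        System.out.println(cloneMatchesOriginal());
        System.out.println(strengthAffectsEquality());
    }
}
```

Working on a clone leaves the original collation object's strength untouched, which is useful when the original is shared.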