An Internationalized Resource Identifier (IRI), like an IDN, may contain Unicode characters, whereas a Uniform Resource Identifier (URI) is limited to ASCII characters.
According to RFC 3987, IRIs are meant to replace URIs in identifying resources for protocols, formats, and software components that use a UCS-based character repertoire.
At first sight, you might expect this task to be solved by the same means as for IDN. That is not quite the case. Consider the structure of a resource identifier:
You may notice that it has several components.
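As a rough illustration of those components, the sketch below splits an identifier with the getters of java.net.URI. The class name UriComponents, the helper split(), and the sample address are made up for this example; only the java.net.URI calls come from the JDK.

```java
import java.net.URI;

public class UriComponents {

    // Returns the generic components in order:
    // scheme, authority, path, query, fragment.
    static String[] split(String text) {
        URI uri = URI.create(text);
        return new String[] {
            uri.getScheme(),
            uri.getAuthority(),
            uri.getPath(),
            uri.getQuery(),
            uri.getFragment()
        };
    }

    public static void main(String[] args) {
        String[] names = { "scheme", "authority", "path", "query", "fragment" };
        // Hypothetical sample identifier, chosen only for illustration.
        String[] parts = split("http://user@www.example.com:8080/docs/index.html?lang=en#intro");
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " = " + parts[i]);
        }
    }
}
```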
The authority component of a URI is parsed according to the following syntax:
[user-info@]host[:port]
When the host is a domain name, the IDN approach (that is, the mapping) can be applied.
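The following sketch shows one way this might look in code: the authority is taken apart with java.net.URI, and a Unicode host name is mapped to its ASCII form with java.net.IDN.toASCII(). The class name AuthorityDemo, its helper methods, and the sample host names are assumptions made for the example. The Unicode host is passed to IDN directly rather than through a URI object, because this sketch does not rely on how URI parses a non-ASCII host.

```java
import java.net.IDN;
import java.net.URI;

public class AuthorityDemo {

    // Splits the authority into user-info, host, and port (as text).
    static String[] authorityParts(String text) {
        URI uri = URI.create(text);
        return new String[] {
            uri.getUserInfo(),
            uri.getHost(),
            String.valueOf(uri.getPort())
        };
    }

    // Applies the IDN mapping to a host name that may contain Unicode.
    static String asciiHost(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    public static void main(String[] args) {
        // Hypothetical ASCII-only address: [user-info@]host[:port]
        String[] parts = authorityParts("http://user@www.example.com:8080/index.html");
        System.out.println("user-info = " + parts[0]);
        System.out.println("host      = " + parts[1]);
        System.out.println("port      = " + parts[2]);

        // "b\u00fccher.example" is "bücher.example" written with a Unicode escape.
        System.out.println(asciiHost("b\u00fccher.example"));
    }
}
```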
In general, though, the URI structure is more complicated. Applications can use URI-reference syntax to refer to a URI instead of always using the generic syntax above. A URI-reference is either a URI or a relative reference. If a URI-reference does not specify a scheme, it is a relative reference. Usually, a relative reference expresses a URI reference relative to the namespace of another URI.
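To make the idea of a relative reference concrete, here is a small sketch that resolves one against a base URI with java.net.URI.resolve(). The class name RelativeReferenceDemo, its helpers, and both sample identifiers are invented for the illustration.

```java
import java.net.URI;

public class RelativeReferenceDemo {

    // A URI-reference without a scheme is a relative reference.
    static boolean isRelative(String reference) {
        return !URI.create(reference).isAbsolute();
    }

    // Resolves a reference against the name space of a base URI.
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/a/b/c";   // hypothetical base URI
        String reference = "../d";                  // no scheme: relative reference

        System.out.println(isRelative(reference));      // true
        System.out.println(resolve(base, reference));   // http://example.com/a/d
    }
}
```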
Nevertheless, instances of the java.net.URI class can represent IRIs whenever they contain non-ASCII characters.
This class was enhanced by the following methods to perform the operations and conversions according to RFC 3987:
toASCIIString() - converts an IRI to a URI and returns its content as a US-ASCII string.
toString() - returns the content of this URI as a string in its original Unicode form.
toIRIString() - converts this URI to an IRI and returns its content as a string.
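The difference between the ASCII form and the original Unicode form can be sketched as follows. The example is limited to toASCIIString() and toString(); the class name IriConversionDemo, its helpers, and the sample address are assumptions made for illustration. Unicode escapes keep the source file itself ASCII-only.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class IriConversionDemo {

    // Percent-encodes the non-ASCII characters (as UTF-8) and returns ASCII text.
    static String toAscii(String iri) throws URISyntaxException {
        return new URI(iri).toASCIIString();
    }

    // Returns the content in the Unicode form it was created with.
    static String toUnicode(String iri) throws URISyntaxException {
        return new URI(iri).toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // Hypothetical IRI: the path is "/résumé", written with Unicode escapes.
        String iri = "http://example.com/r\u00e9sum\u00e9";

        System.out.println(toAscii(iri));    // http://example.com/r%C3%A9sum%C3%A9
        System.out.println(toUnicode(iri));  // the original string, unchanged
    }
}
```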