Class PercentEscaper (1.42.2)

public class PercentEscaper extends UnicodeEscaper

A UnicodeEscaper that escapes some set of Java characters using the URI percent encoding scheme. The set of safe characters (those which remain unescaped) is specified on construction.

For details on escaping URIs for use in web pages, see RFC 3986, section 2.4 and appendix A.

When encoding a String, the following rules apply:

  • The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
  • Any additionally specified safe characters remain the same.
  • If plusForSpace is true, the space character " " is converted into a plus sign "+".
  • All other characters are converted into one or more bytes using UTF-8 encoding. Each byte is then represented by the 3-character string "%XY", where "XY" is the two-digit, uppercase, hexadecimal representation of the byte value.
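The rules above can be sketched with the standard library alone. This is an illustrative stand-in, not the library's implementation: the class name `PercentEncodingSketch` and its `escape` helper are hypothetical, and unlike the real escaper the sketch does not validate surrogate pairs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the encoding rules listed above (not the library's code).
final class PercentEncodingSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String escape(String input, String extraSafeChars, boolean plusForSpace) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (isAsciiAlphanumeric(cp) || extraSafeChars.indexOf(cp) >= 0) {
                // Rules 1 and 2: alphanumerics and caller-specified safe characters stay as-is.
                out.appendCodePoint(cp);
            } else if (cp == ' ' && plusForSpace) {
                // Rule 3: optional plus-for-space handling.
                out.append('+');
            } else {
                // Rule 4: UTF-8 encode, then emit each byte as %XY in uppercase hex.
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append('%').append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
                }
            }
        }
        return out.toString();
    }

    private static boolean isAsciiAlphanumeric(int cp) {
        return (cp >= '0' && cp <= '9') || (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z');
    }

    public static void main(String[] args) {
        System.out.println(escape("a b/c", "-_.~", false)); // a%20b%2Fc
        System.out.println(escape("a b", "", true));        // a+b
        System.out.println(escape("\u00e9", "", false));    // %C3%A9
    }
}
```

A single non-ASCII character can expand to several escapes: U+00E9 becomes two UTF-8 bytes and therefore two `%XY` triplets.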

RFC 3986 defines the set of unreserved characters as the ASCII letters and digits plus "-", "_", "~", and ".". It goes on to state:

URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent: they identify the same resource. However, URI comparison implementations do not always perform normalization prior to comparison (see Section 6). For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers.
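The normalization this passage describes can be sketched as follows. This is a minimal illustration using only the standard library, not part of this class: `PercentNormalizeSketch` and `normalize` are hypothetical names. The sketch decodes percent-encoded unreserved octets and rewrites the remaining escapes with uppercase hexadecimal digits, which RFC 3986 also recommends.

```java
// Hypothetical normalizer sketch; PercentEscaper itself only produces escapes.
final class PercentNormalizeSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String normalize(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = hexValue(s.charAt(i + 1));
                int lo = hexValue(s.charAt(i + 2));
                if (hi >= 0 && lo >= 0) {
                    int octet = (hi << 4) | lo;
                    if (isUnreserved(octet)) {
                        out.append((char) octet); // decode ALPHA, DIGIT, "-", ".", "_", "~"
                    } else {
                        out.append('%').append(HEX[hi]).append(HEX[lo]); // keep, uppercase hex
                    }
                    i += 3;
                    continue;
                }
            }
            out.append(c); // ordinary character or malformed escape: copy unchanged
            i++;
        }
        return out.toString();
    }

    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    private static boolean isUnreserved(int o) {
        return (o >= 'a' && o <= 'z') || (o >= 'A' && o <= 'Z') || (o >= '0' && o <= '9')
                || o == '-' || o == '.' || o == '_' || o == '~';
    }

    public static void main(String[] args) {
        System.out.println(normalize("%7euser/%2f%41")); // ~user/%2FA
    }
}
```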

Note: This escaper produces uppercase hexadecimal sequences. From RFC 3986:
"URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings."

Inheritance

java.lang.Object > Escaper > UnicodeEscaper > PercentEscaper

Static Fields

SAFECHARS_URLENCODER

public static final String SAFECHARS_URLENCODER

A string of safe characters that mimics the behavior of java.net.URLEncoder.

Field Value
Type: String
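For reference, the behavior of java.net.URLEncoder that this constant mimics can be observed directly. The snippet below exercises only the JDK class and makes no claim about the exact contents of the constant; the class name `UrlEncoderBehavior` and the helper `formEncode` are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Shows which characters java.net.URLEncoder leaves unescaped.
final class UrlEncoderBehavior {
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("a-b_c.d*e")); // a-b_c.d*e
        System.out.println(formEncode("a b~c/d"));   // a+b%7Ec%2Fd
    }
}
```

Note that URLEncoder escapes "~" even though RFC 3986 lists it as unreserved, and that it turns a space into "+".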

SAFEPATHCHARS_URLENCODER

public static final String SAFEPATHCHARS_URLENCODER

A string of characters that do not need to be encoded when used in URI path segments, as specified in RFC 3986. Note that some of these characters do need to be escaped when used in other parts of the URI.

Field Value
Type: String

SAFEQUERYSTRINGCHARS_URLENCODER

public static final String SAFEQUERYSTRINGCHARS_URLENCODER

A string of characters that do not need to be encoded when used in URI query strings, as specified in RFC 3986. Note that some of these characters do need to be escaped when used in other parts of the URI.

Field Value
Type: String

SAFEUSERINFOCHARS_URLENCODER

public static final String SAFEUSERINFOCHARS_URLENCODER

A string of characters that do not need to be encoded when used in the URI user info part, as specified in RFC 3986. Note that some of these characters do need to be escaped when used in other parts of the URI.

Field Value
Type: String

SAFE_PLUS_RESERVED_CHARS_URLENCODER

public static final String SAFE_PLUS_RESERVED_CHARS_URLENCODER

Contains the safe characters plus all reserved characters. This happens to be the safe path characters plus those characters which are reserved for URI segments, namely '/' and '?'.

Field Value
Type: String

Constructors

PercentEscaper(String safeChars)

public PercentEscaper(String safeChars)

Constructs a URI escaper with the specified safe characters. The space character is escaped to %20 in accordance with the URI specification.

Parameter
Name: safeChars (String)
Description: a non-null string specifying additional safe characters for this escaper. The ranges 0..9, a..z and A..Z are always safe and should not be specified here.

PercentEscaper(String safeChars, boolean plusForSpace) (deprecated)

public PercentEscaper(String safeChars, boolean plusForSpace)

Deprecated. Use PercentEscaper(String safeChars) instead, which is the same as invoking this constructor with plusForSpace set to false. Escaping spaces as plus signs does not conform to the URI specification.

Constructs a URI escaper that converts all but the specified safe characters into hexadecimal percent escapes. Optionally, space characters can be converted into a plus sign (+) instead of %20.

Parameters
Name: safeChars (String)
Description: a non-null string specifying additional safe characters for this escaper. The ranges 0..9, a..z and A..Z are always safe and should not be specified here.
Name: plusForSpace (boolean)
Description: true if ASCII space should be escaped to + rather than %20
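The reason plus-for-space is discouraged can be seen with java.net.URI, which follows the generic URI syntax: a "%20" in a path decodes to a space, while a "+" is an ordinary character and stays a plus sign. The sketch below uses only the JDK; `PlusVersusPercent20` and `decodedPath` are hypothetical names.

```java
import java.net.URI;

// Compares how a generic URI parser decodes "%20" and "+" in a path.
final class PlusVersusPercent20 {
    static String decodedPath(String uri) {
        // getPath() returns the path with percent-escapes decoded.
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        System.out.println(decodedPath("http://example.com/a%20b")); // /a b
        System.out.println(decodedPath("http://example.com/a+b"));   // /a+b
    }
}
```

The "+" convention for spaces comes from form encoding (application/x-www-form-urlencoded) rather than from the URI syntax, so only consumers that expect form-encoded data will read it back as a space.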

Methods

escape(int cp)

protected char[] escape(int cp)

Escapes the given Unicode code point in UTF-8.

Parameter
Name: cp (int)
Returns
Type: char[]
Overrides
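As an illustration of what escaping a single code point in UTF-8 involves, the following sketch builds the UTF-8 octets by hand and renders each as a %XY triplet. It is an assumption-laden stand-in rather than the library's code: `CodePointEscapeSketch` and `escapeCodePoint` are hypothetical, and the sketch escapes every code point it is given without consulting any safe-character set.

```java
// Hypothetical sketch: percent-escape one Unicode code point via manual UTF-8 encoding.
final class CodePointEscapeSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static char[] escapeCodePoint(int cp) {
        if (!Character.isValidCodePoint(cp) || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("Not a Unicode scalar value: " + cp);
        }
        int[] octets;
        if (cp <= 0x7F) {                 // 1 byte:  0xxxxxxx
            octets = new int[] {cp};
        } else if (cp <= 0x7FF) {         // 2 bytes: 110xxxxx 10xxxxxx
            octets = new int[] {0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)};
        } else if (cp <= 0xFFFF) {        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            octets = new int[] {0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)};
        } else {                          // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            octets = new int[] {0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                                0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)};
        }
        char[] out = new char[octets.length * 3];
        for (int k = 0; k < octets.length; k++) {
            out[3 * k] = '%';
            out[3 * k + 1] = HEX[octets[k] >> 4];
            out[3 * k + 2] = HEX[octets[k] & 0xF];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(new String(escapeCodePoint(' ')));     // %20
        System.out.println(new String(escapeCodePoint(0x20AC)));  // %E2%82%AC
        System.out.println(new String(escapeCodePoint(0x1F600))); // %F0%9F%98%80
    }
}
```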

escape(String s)

public String escape(String s)

Returns the escaped form of a given literal string.

If you are escaping input in arbitrary successive chunks, then it is not generally safe to use this method. If an input string ends with an unmatched high surrogate character, then this method will throw IllegalArgumentException. You should ensure your input is valid UTF-16 before calling this method.

Parameter
Name: s (String)
Returns
Type: String
Overrides
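One way to follow the advice about validating input is to check for unpaired surrogates before escaping, for example when text has been split into chunks. The helper below is a sketch using only the JDK; `Utf16Check` and `isWellFormedUtf16` are hypothetical names and not part of this library.

```java
// Hypothetical pre-check: true only if every surrogate in the input is correctly paired.
final class Utf16Check {
    static boolean isWellFormedUtf16(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed immediately by a low surrogate.
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the low half of the pair
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate without a preceding high surrogate
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedUtf16("abc\uD83D\uDE00")); // true
        System.out.println(isWellFormedUtf16("abc\uD83D"));       // false
    }
}
```

A chunk that fails this check because it ends in a high surrogate can be held back until the next chunk supplies the matching low surrogate.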

nextEscapeIndex(CharSequence csq, int index, int end)

protected int nextEscapeIndex(CharSequence csq, int index, int end)

Scans a sub-sequence of characters from a given CharSequence, returning the index of the next character that requires escaping.

Note: When implementing an escaper, it is a good idea to override this method for efficiency. The base class implementation determines successive Unicode code points and invokes escape(int) for each of them. If the semantics of your escaper are such that code points in the supplementary range are either all escaped or all unescaped, this method can be implemented more efficiently using CharSequence.charAt(int).

Note however that if your escaper does not escape characters in the supplementary range, you should either continue to validate the correctness of any surrogate characters encountered or provide a clear warning to users that your escaper does not validate its input.

See PercentEscaper for an example.

Parameters
Name: csq (CharSequence)
Name: index (int)
Name: end (int)
Returns
Type: int
Overrides
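The charAt-based scan described in the note can be sketched as follows, assuming a percent-style escaper in which only certain ASCII characters are safe, so every supplementary code point is escaped. The class `NextEscapeIndexSketch` and its lookup table are hypothetical and only illustrate the technique; they are not this class's source.

```java
// Hypothetical sketch of a charAt-based scan for the next character needing escaping.
final class NextEscapeIndexSketch {
    // Lookup table indexed by ASCII code; true means the character is left unescaped.
    private final boolean[] safeOctets = new boolean[128];

    NextEscapeIndexSketch(String extraSafeChars) {
        for (char c = '0'; c <= '9'; c++) safeOctets[c] = true;
        for (char c = 'a'; c <= 'z'; c++) safeOctets[c] = true;
        for (char c = 'A'; c <= 'Z'; c++) safeOctets[c] = true;
        for (int i = 0; i < extraSafeChars.length(); i++) {
            char c = extraSafeChars.charAt(i);
            if (c < safeOctets.length) safeOctets[c] = true; // this sketch ignores non-ASCII entries
        }
    }

    int nextEscapeIndex(CharSequence csq, int index, int end) {
        for (; index < end; index++) {
            char c = csq.charAt(index);
            // Any non-ASCII char, including either half of a surrogate pair, stops the scan.
            if (c >= safeOctets.length || !safeOctets[c]) {
                break;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        NextEscapeIndexSketch scanner = new NextEscapeIndexSketch("-_.~");
        System.out.println(scanner.nextEscapeIndex("abc def", 0, 7)); // 3
        System.out.println(scanner.nextEscapeIndex("abcdef", 0, 6));  // 6
    }
}
```

Because the scan stops at the first surrogate it meets, surrogate validation is still left to the per-code-point escaping step.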