This one had me bewildered for a while. Every single WordPress page is made up of many parts (core, themes, plugins, customisations, etc.). This powerful and flexible architecture also brings complexity, which can make bug hunting a tedious process.
After resolving the previous comments issue, I needed to close out a related problem: a guest commenter was wrongly redirected to the login page after leaving a comment. The default behaviour is to redirect commenters to their new comment on the post page. Even though WordPress asked for authentication, the comment was actually published immediately.
Working backwards, the HTTP headers revealed that after posting the comment, the commenter was first redirected to the WordPress Dashboard and then, being a guest, redirected a second time to the login page. With that established, I just had to figure out why the commenter was sent to the Dashboard in the first place.
After disabling all plugins and a lot of code inspection in the active theme and WordPress core, I came across the wp_safe_redirect function in pluggable.php. This function has the following lines which include the fall-back redirect to the Dashboard (admin_url):
    /**
     * Filters the redirect fallback URL for when the provided redirect is not safe (local).
     *
     * @since 4.3.0
     *
     * @param string $fallback_url The fallback URL to use by default.
     * @param int    $status       The HTTP response status code to use.
     */
    $fallback_url = apply_filters( 'wp_safe_redirect_fallback', admin_url(), $status );
    $location     = wp_validate_redirect( $location, $fallback_url );
The next step was to determine what causes the Dashboard fall-back URL to be used instead of the post’s own location. This leads to the next function, wp_validate_redirect, which checks whether the URL is valid and, if not, returns the aforementioned admin_url fall-back. One of the checks is done by wp_sanitize_redirect and its helper function _wp_sanitize_utf8_in_redirect:
/**
 * Sanitizes a URL for use in a redirect.
 *
 * @since 2.3.0
 *
 * @param string $location The path to redirect to.
 * @return string Redirect-sanitized URL.
 */
function wp_sanitize_redirect( $location ) {
    // Encode spaces.
    $location = str_replace( ' ', '%20', $location );

    $regex    = '/
        (
            (?: [\xC2-\xDF][\x80-\xBF]        # double-byte sequences   110xxxxx 10xxxxxx
            |   \xE0[\xA0-\xBF][\x80-\xBF]    # triple-byte sequences   1110xxxx 10xxxxxx * 2
            |   [\xE1-\xEC][\x80-\xBF]{2}
            |   \xED[\x80-\x9F][\x80-\xBF]
            |   [\xEE-\xEF][\x80-\xBF]{2}
            |   \xF0[\x90-\xBF][\x80-\xBF]{2} # four-byte sequences   11110xxx 10xxxxxx * 3
            |   [\xF1-\xF3][\x80-\xBF]{3}
            |   \xF4[\x80-\x8F][\x80-\xBF]{2}
        ){1,40}                              # ...one or more times
        )/x';
    $location = preg_replace_callback( $regex, '_wp_sanitize_utf8_in_redirect', $location );
    $location = preg_replace( '|[^a-z0-9-~+_.?#=&;,/:%!*\[\]()@]|i', '', $location );
    $location = wp_kses_no_null( $location );

    // Remove %0D and %0A from location.
    $strip = array( '%0d', '%0a', '%0D', '%0A' );
    return _deep_replace( $strip, $location );
}

/**
 * URL encodes UTF-8 characters in a URL.
 *
 * @ignore
 * @since 4.2.0
 * @access private
 *
 * @see wp_sanitize_redirect()
 *
 * @param array $matches RegEx matches against the redirect location.
 * @return string URL-encoded version of the first RegEx match.
 */
function _wp_sanitize_utf8_in_redirect( $matches ) {
    return urlencode( $matches[0] );
}
The important part to notice is that UTF-8 characters in the URL are URL-encoded. For common ASCII domain names this is never an issue. However, this particular site uses the letter “ö” (Latin small letter o with diaeresis, U+00F6) in its IDN, schönbeck.dk. Once the UTF-8 letter is replaced with its URL-encoded equivalent (%C3%B6), the URL can no longer be validated by WordPress, presumably because its host no longer matches the site's own, and so the fall-back redirect to admin_url is triggered.
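The effect on the host name can be reproduced outside WordPress. The following is a minimal standalone sketch, not the core code: demo_encode_non_ascii is a hypothetical helper that uses a simplified pattern (any run of bytes at or above 0x80) in place of the full UTF-8 regex, and the comment URL is made up for illustration. It assumes the file is saved as UTF-8.

```php
<?php
// Simplified stand-in (assumption) for the UTF-8 encoding step in
// wp_sanitize_redirect(): URL-encode every run of non-ASCII bytes.
function demo_encode_non_ascii( string $location ): string {
    return preg_replace_callback(
        '/[\x80-\xFF]+/',
        static function ( array $matches ): string {
            return urlencode( $matches[0] );
        },
        $location
    );
}

$site_host = 'schönbeck.dk';
// Hypothetical redirect target after posting a comment.
$redirect  = 'https://schönbeck.dk/some-post/#comment-1';

$sanitized      = demo_encode_non_ascii( $redirect );
$sanitized_host = preg_replace( '~^https?://([^/]+).*$~', '$1', $sanitized );

echo $sanitized, PHP_EOL;      // https://sch%C3%B6nbeck.dk/some-post/#comment-1
echo $sanitized_host, PHP_EOL; // sch%C3%B6nbeck.dk

// The encoded host is no longer identical to the configured site host.
var_dump( $sanitized_host === $site_host ); // bool(false)
```

A strict string comparison between the two hosts fails, which is consistent with the redirect being treated as unsafe.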
The Settings > General page actually permits IDNs in both the Site Address and WordPress Address fields, so why does the rest of the code not take this into account? Obviously a bug. The solution is to exclude the domain part of the redirect location and apply the UTF-8 URL encoding only to the remainder of the URL.
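To illustrate the idea, here is a rough standalone sketch of encoding everything except the scheme and host. It is not a patch for WordPress core: demo_encode_path_only is a hypothetical function name, the byte-range pattern is a simplification of the core regex, and the example path is invented.

```php
<?php
// Hypothetical helper: URL-encode non-ASCII bytes in the path, query and
// fragment, but leave "scheme://host" untouched.
function demo_encode_path_only( string $location ): string {
    // Split the origin (scheme://host[:port]) from the rest of the URL.
    // Relative URLs have no origin, so the whole string is treated as "rest".
    if ( ! preg_match( '~^([a-z][a-z0-9+.-]*://[^/?#]*)(.*)$~is', $location, $parts ) ) {
        $parts = array( $location, '', $location );
    }

    $encoded_rest = preg_replace_callback(
        '/[\x80-\xFF]+/',
        static function ( array $matches ): string {
            return urlencode( $matches[0] );
        },
        $parts[2]
    );

    return $parts[1] . $encoded_rest;
}

echo demo_encode_path_only( 'https://schönbeck.dk/blåbær/#comment-1' ), PHP_EOL;
// https://schönbeck.dk/bl%C3%A5b%C3%A6r/#comment-1
```

With the host left as entered, it can still be compared against the Site Address, while non-ASCII characters in the path are encoded as before.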
Until this WordPress IDN bug is fixed, the workaround is to enter the IDN in the Site Address and WordPress Address fields in Punycode, avoiding UTF-8 altogether.
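If you need to work out the Punycode form of a domain, PHP can do the conversion with idn_to_ascii. This is a small sketch that assumes the intl extension is installed; demo_to_punycode_host is a hypothetical wrapper that simply returns the input unchanged when the extension is missing or the conversion fails.

```php
<?php
// Hypothetical wrapper around idn_to_ascii() from PHP's intl extension.
function demo_to_punycode_host( string $host ): string {
    if ( ! function_exists( 'idn_to_ascii' ) ) {
        // Assumption: without intl we cannot convert, so return the host as-is.
        return $host;
    }

    $ascii = idn_to_ascii( $host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46 );

    return false === $ascii ? $host : $ascii;
}

// Prints the ASCII (xn--...) form when intl is available.
echo demo_to_punycode_host( 'schönbeck.dk' ), PHP_EOL;
```

The printed value is what would go into the two address fields, prefixed with the usual scheme.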