The "optimization" of using arrays instead of regexps to implement to_utf8 and is_utf8? (and their brethren) is actually no faster. Go back to the logically-clearer implementation.