Problem
ConvertFromUnicodeJava does not convert Korean Unicode escape sequences.
Input like \uC548\uB155 is passed through as-is instead of being decoded to 안녕.
This happens because the function only handles the hardcoded range \u00A1 through \u00FF.
Korean characters fall in the \uAC00–\uD7A3 range, which is never reached.
In addition, even characters within the supported range were rendered as ? because of encoding corruption in the source file.
A secondary issue exists in the RegEx-based variant: it calls Replace() once per match, and each call rescans the entire string. When the same escape appears multiple times, the first call replaces every occurrence, and the remaining iterations call Replace() for an escape that no longer exists.
Fix
Replaced the hardcoded lookup table with a RegEx pattern \\u([0-9A-Fa-f]{4}) that matches any four-digit hex escape. The backslash is doubled so that the pattern matches a literal backslash followed by u.
Instead of calling Replace() per match, the function now walks through matches by position using FirstIndex, appending unmatched segments as-is and converting each matched escape with ChrW().
This covers every four-digit escape from \u0000 through \uFFFF in a single pass.
    Function ConvertFromUnicodeJava(ByVal TextStr As String) As String
        Dim regEx As Object
        Dim matches As Object
        Dim match As Object
        Dim result As String
        Dim cursor As Long
        Dim matchStart As Long

        ' Match a literal backslash, "u", and exactly four hex digits.
        Set regEx = CreateObject("VBScript.RegExp")
        With regEx
            .Global = True
            .Pattern = "\\u([0-9A-Fa-f]{4})"
        End With

        Set matches = regEx.Execute(TextStr)

        ' Nothing to decode: return the input unchanged.
        If matches.Count = 0 Then
            ConvertFromUnicodeJava = TextStr
            Exit Function
        End If

        result = ""
        cursor = 1   ' 1-based position of the next unprocessed character

        For Each match In matches
            ' FirstIndex is 0-based; Mid() is 1-based.
            matchStart = match.FirstIndex + 1

            ' Copy the literal text between the previous match and this one.
            If matchStart > cursor Then
                result = result & Mid(TextStr, cursor, matchStart - cursor)
            End If

            ' Decode the four hex digits into a single character.
            result = result & ChrW(CLng("&H" & match.SubMatches(0)))
            cursor = matchStart + match.Length
        Next match

        ' Copy any literal text after the last match.
        If cursor <= Len(TextStr) Then
            result = result & Mid(TextStr, cursor)
        End If

        ConvertFromUnicodeJava = result

        Set matches = Nothing
        Set regEx = Nothing
    End Function
Verified behavior
| Input | Before | After |
| --- | --- | --- |
| \uC548\uB155 | \uC548\uB155 (not converted) | 안녕 |
| Fran\u00E7ois | Fran?ois | François |
| Copyright \u00A9 | Copyright ? | Copyright © |
| \u00A9\u00A9\u00A9 | correct output, 3× Replace calls | correct output, single pass |
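
The rows above could be checked with a small test procedure along the following lines. This is only a sketch: the name TestConvertFromUnicodeJava is hypothetical, and it assumes ConvertFromUnicodeJava is reachable from the module the test lives in. Expected values are built with ChrW() rather than typed as literals, so the test does not depend on the encoding of the source file.

```vba
' Hypothetical test procedure (not part of this change).
' Debug.Assert pauses execution in the VBA IDE when a condition is False.
Sub TestConvertFromUnicodeJava()
    Dim copyrightSign As String
    copyrightSign = ChrW(&HA9&)

    ' Korean escapes outside the old \u00A1-\u00FF range
    Debug.Assert ConvertFromUnicodeJava("\uC548\uB155") = ChrW(&HC548&) & ChrW(&HB155&)

    ' Escape embedded in the middle of plain text
    Debug.Assert ConvertFromUnicodeJava("Fran\u00E7ois") = "Fran" & ChrW(&HE7&) & "ois"

    ' Escape at the end of the string
    Debug.Assert ConvertFromUnicodeJava("Copyright \u00A9") = "Copyright " & copyrightSign

    ' The same escape repeated back to back
    Debug.Assert ConvertFromUnicodeJava("\u00A9\u00A9\u00A9") = copyrightSign & copyrightSign & copyrightSign

    ' Input without any escape is returned unchanged
    Debug.Assert ConvertFromUnicodeJava("no escapes here") = "no escapes here"

    Debug.Print "All ConvertFromUnicodeJava checks passed"
End Sub
```

Running the procedure from the Immediate window should print the final message if every assertion holds.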