Garbled Chinese Text When Reading Files in Java

JeffChang
Mar 23, 2018


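This happens because FileReader decodes the bytes in a file into chars using the platform's default charset, and before Java 11 it has no constructor that lets you choose a different one.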
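The sketch below illustrates the difference. It is a hypothetical demo (the class name, temp file, and the sample text "中文" written as `\u4e2d\u6587` are assumptions): it writes Big5 bytes to a file, then reads them back once with `FileReader` (platform default charset) and once with an `InputStreamReader` given an explicit charset. Only the second is guaranteed to match, unless your default charset happens to be Big5. Since Java 11, `FileReader(File, Charset)` also exists, and since Java 18 the default charset is UTF-8 on all platforms.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class DefaultCharsetDemo {
    // FileReader(File) always decodes with the platform default charset.
    static String readWithFileReader(Path file) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader(file.toFile()))) {
            return br.readLine();
        }
    }

    // InputStreamReader lets the caller state the charset explicitly.
    static String readWithCharset(Path file, Charset charset) throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file.toFile()), charset))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Charset big5 = Charset.forName("Big5");
        Path file = Files.createTempFile("big5-demo", ".txt");
        // Simulate a file produced on a Big5 system (characters U+4E2D U+6587).
        Files.write(file, "\u4e2d\u6587".getBytes(big5));

        System.out.println("default charset:    " + Charset.defaultCharset());
        System.out.println("FileReader matches: " + "\u4e2d\u6587".equals(readWithFileReader(file)));
        System.out.println("Big5 matches:       " + "\u4e2d\u6587".equals(readWithCharset(file, big5)));
        Files.delete(file);
    }
}
```

Printing booleans instead of the decoded text keeps the demo independent of the console's own encoding.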

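If the file was originally written in a different encoding (e.g. Big5-HKSCS, the Hong Kong Supplementary Character Set), the String that FileReader returns may already be corrupted, and no amount of later processing can restore the original data.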
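Why can't the data be repaired afterwards? The hypothetical demo below assumes the wrong charset is UTF-8. Big5 bytes are mostly invalid UTF-8, so the decoder replaces them with U+FFFD (the replacement character), and the original bytes are simply gone; re-encoding the garbled String cannot bring them back. (Whether damage is permanent depends on the wrong charset: a single-byte charset such as ISO-8859-1 happens to map every byte, so that particular mistake is reversible.)

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LossyDecodeDemo {
    static final Charset BIG5 = Charset.forName("Big5");

    // Simulates reading Big5 bytes with the wrong charset (UTF-8),
    // then trying to "repair" the result by encoding it back to bytes.
    static byte[] wrongDecodeThenReencode(byte[] big5Bytes) {
        String garbled = new String(big5Bytes, StandardCharsets.UTF_8); // invalid UTF-8 becomes U+FFFD
        return garbled.getBytes(StandardCharsets.UTF_8);               // U+FFFD encodes as EF BF BD
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";          // two CJK characters, U+4E2D U+6587
        byte[] big5Bytes = original.getBytes(BIG5);

        String garbled = new String(big5Bytes, StandardCharsets.UTF_8);
        System.out.println("contains U+FFFD: " + (garbled.indexOf('\uFFFD') >= 0)); // true
        System.out.println("bytes recovered: "
                + Arrays.equals(big5Bytes, wrongDecodeThenReencode(big5Bytes)));     // false
        System.out.println("decoded as Big5: " + new String(big5Bytes, BIG5).equals(original)); // true
    }
}
```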

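The simplest principle: use the correct encoding the first time the byte sequence is converted into a char sequence, so that the resulting string is intact. Only after that, convert it to another encoding if you need to.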
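A minimal in-memory sketch of that principle, assuming Big5 input and UTF-8 as the target (both choices, and the `transcode` helper, are illustrative): decode once with the charset the bytes were really written in, then encode the correct String into whatever charset you need.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TranscodeDemo {
    // Step 1: decode the raw bytes with the charset they were actually written in.
    // Step 2: encode the resulting (correct) String into the target charset.
    static byte[] transcode(byte[] input, Charset sourceCharset, Charset targetCharset) {
        String text = new String(input, sourceCharset);
        return text.getBytes(targetCharset);
    }

    public static void main(String[] args) {
        Charset big5 = Charset.forName("Big5");
        byte[] big5Bytes = "\u4e2d\u6587".getBytes(big5);   // 2 bytes per character in Big5
        byte[] utf8Bytes = transcode(big5Bytes, big5, StandardCharsets.UTF_8);
        System.out.println(big5Bytes.length + " -> " + utf8Bytes.length); // 4 -> 6
        System.out.println(new String(utf8Bytes, StandardCharsets.UTF_8).equals("\u4e2d\u6587")); // true
    }
}
```

Note that `new String(byte[], Charset)` silently substitutes U+FFFD for malformed input; if you would rather fail loudly on a wrong guess, use a `CharsetDecoder` configured with `CodingErrorAction.REPORT`.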

Chaining FileInputStream >>> InputStreamReader >>> BufferedReader (the decorator pattern) lets you control which encoding is used:

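```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

// try-with-resources closes the whole chain, including the FileInputStream
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream(path), "Big5"))) {
    String oneLine;
    while ((oneLine = br.readLine()) != null) {
        // Do something with oneLine
    }
}
```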
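The same chain can also be built by `java.nio.file.Files`. The sketch below (class and method names are hypothetical) uses `Files.newBufferedReader(Path, Charset)`, which takes a `Charset` object instead of a charset name, so it cannot throw `UnsupportedEncodingException` the way `InputStreamReader(InputStream, String)` can. Per its documentation, the returned reader throws an `IOException` on malformed or unmappable input rather than silently substituting characters, which surfaces a wrong charset guess early.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class NioReadDemo {
    // Equivalent of FileInputStream -> InputStreamReader -> BufferedReader,
    // with the chain assembled by Files.newBufferedReader.
    static List<String> readLines(Path path, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(path, charset)) {
            String oneLine;
            while ((oneLine = br.readLine()) != null) {
                lines.add(oneLine);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Charset big5 = Charset.forName("Big5");
        Path file = Files.createTempFile("nio-demo", ".txt");
        Files.write(file, "line 1\nline 2\n".getBytes(big5));
        System.out.println(readLines(file, big5)); // [line 1, line 2]
        Files.delete(file);
    }
}
```

When you just need all lines, `Files.readAllLines(path, charset)` does the same in one call.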

Update
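On turning the contents of a BufferedReader into a String: when I wrote this post I didn't yet know about handy utilities like IOUtils (from Apache Commons IO), so reading an entire Reader into a String can be rewritten as: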

```java
import org.apache.commons.io.IOUtils;

String result = IOUtils.toString(br);
```

That gives you the whole content as a String.
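If adding Commons IO as a dependency isn't an option, a JDK-only stand-in is short. The sketch below is not the Commons IO implementation; `readAll` is a hypothetical helper that copies the Reader's chars into a `StringBuilder` until end of stream. It works on Java 8 and, like the reader chain above, does no decoding of its own: the charset must already be set on the `InputStreamReader`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

class ReaderToString {
    // Reads everything remaining in the Reader into a String.
    // Does not close the Reader; the caller owns it.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Charset big5 = Charset.forName("Big5");
        byte[] data = "line 1\nline 2\n".getBytes(big5);
        // Same decorator chain as above, but over in-memory bytes instead of a file
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), big5))) {
            System.out.print(readAll(br));
        }
    }
}
```

On newer JDKs there are built-in shortcuts as well: `Reader.transferTo(Writer)` (Java 10+) with a `StringWriter`, or `Files.readString(path, charset)` (Java 11+) when reading a whole file.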

Written by JeffChang

Java Backend Engineer In DDIM.
