Solved: Wrong encoding detected when opening Chinese UTF-8 text files

richiezhong
Posts: 3
Joined: 12 May 2022 04:30

Post by richiezhong »

Dear author,
I am using RJ TextEd to open a text file that was created in Notepad++ and saved as UTF-8,
but it is wrongly detected as 简体中文 GB2312 (Simplified Chinese GB2312), which makes it unreadable.
Not all files are affected, only some of them.
I can manually change the encoding when opening the file, but that is not convenient.
Notepad++ detects and reads the same file correctly.

RJ TextEd version: 15.50
OS: Windows 10 Professional
Thanks and best wishes!
Last edited by richiezhong on 20 May 2022 10:56, edited 1 time in total.
pjj
Posts: 2109
Joined: 13 Oct 2009 13:48
Location: Kraków, Poland

Post by pjj »

Certainly, there is something off with encoding detection; for some reason when opening a file with the code below

Code:

<?php
ptb_vardump('$customXML');
ptb_vardump('$custom'); // string
echo '<ul>', "\n";
foreach ($customXML->modules->item as $item) {
    if (!isset($item['href'])) {
        echo '<li>', $item['name'];
        echo '<ul>', "\n";
        foreach ($item->item as $subitem) {
            echo '<li><a href="/admin', $subitem['href'], '">', $subitem['title'], '</a></li>';
        }
        echo '</ul></li>', "\n";
    }
    else {
        echo '<li><a href="/admin', $item['href'], '">', $item['title'], '</a></li>';
    }
}
echo '</ul>', "\n";
RJ TE shows this dialog window:
[Attachment: autodetect1.png]
Interestingly, the same code without the first two lines, i.e. without

Code:

ptb_vardump('$customXML');
ptb_vardump('$custom'); // string
confuses the encoding detection module even more:
[Attachment: autodetect2.png]
Yes, this code is pure ASCII. Środkowoeuropejski (Central European) is Windows-1250, while Zachodnioeuropejski (Western European) is Windows-1252.
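To illustrate why a pure-ASCII file gives a detector nothing to choose between, here is a minimal Python sketch. It is not RJ TextEd's code; the helper names `is_pure_ascii` and `decodes_identically` are made up for this example, and the sample line is taken from the PHP snippet above.

```python
# Bytes below 0x80 map to the same characters in ASCII, UTF-8,
# Windows-1250, Windows-1252 and GB2312, so every candidate "fits".
sample = b"echo '<ul>', \"\\n\";\n"


def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range."""
    return all(byte < 0x80 for byte in data)


def decodes_identically(data: bytes, encodings: list[str]) -> bool:
    """Return True if all listed encodings decode the data to the same text."""
    decoded = {data.decode(encoding) for encoding in encodings}
    return len(decoded) == 1


candidates = ["ascii", "utf-8", "cp1250", "cp1252", "gb2312"]
print(is_pure_ascii(sample), decodes_identically(sample, candidates))
```

Since all candidates produce identical text for such a file, whichever code page the detector reports is harmless for display, but it also shows that the reported name is essentially a guess.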
Rickard Johansson
Site Admin
Posts: 6577
Joined: 19 Jul 2006 14:29

Post by Rickard Johansson »

The option "Detect All (encoding and code page)" is not always accurate. It uses the IMultiLanguage2 interface to detect the code page.

It is recommended to turn this off, unless you really need it. Setting a default encoding in options may help a little...
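For comparison, a conservative detection order can be sketched in Python as follows. This is only an illustration under stated assumptions, not how RJ TextEd or IMultiLanguage2 works: `guess_encoding` is a hypothetical helper that checks for a UTF-8 BOM, then tries strict UTF-8 decoding, and only then falls back to a configured default code page.

```python
import codecs


def guess_encoding(data: bytes, default: str = "cp1252") -> str:
    """Hypothetical helper: BOM first, then strict UTF-8, then the default."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        # Strict decoding rejects any byte sequence that is not well-formed UTF-8.
        data.decode("utf-8", errors="strict")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so trust the user's configured default encoding.
        return default


utf8_text = "简体中文".encode("utf-8")
gb_text = "简体中文".encode("gb2312")

print(guess_encoding(utf8_text, default="gb2312"))
print(guess_encoding(gb_text, default="gb2312"))
```

The design choice here is that UTF-8 validity is a deterministic check, whereas choosing among legacy code pages is statistical. Text in a legacy code page rarely happens to be well-formed UTF-8, so testing for UTF-8 first avoids most misdetections of the kind reported above.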

Post by pjj »

Rickard Johansson wrote: 15 May 2022 14:38 It is recommended to turn this off, unless you really need it. Setting a default encoding in options may help a little...
I turned it off and it helped; thank you! (I had default encoding set, btw.)

Post by richiezhong »

Rickard Johansson wrote: 15 May 2022 14:38 The option "Detect All (encoding and code page)" is not always accurate. It uses the IMultiLanguage2 interface to detect the code page.

It is recommended to turn this off, unless you really need it. Setting a default encoding in options may help a little...
Turning on "Detect All (encoding and code page)" (“检测所有(编码和字码页)” in Chinese) solved this problem. Thank you very much!