Saturday, November 22, 2008

Use GREP or ACK to Search for Chinese Characters

I have been developing a solution for a Chinese customer. As a result, I need to search for Chinese characters in text files. I use GREP and ACK. There is some subtlety when searching for Chinese characters.

To input Chinese characters, the Windows command line's code page needs to be set to 936, the code page for GBK encoding. Running chcp 936 does this.

There are two files, a.txt and b.txt, in d:/test. Both files contain the following text:

中国
中国abc

The encoding for a.txt is GBK. The encoding for b.txt is UTF-8. The command grep 中国 d:/test/*.txt finds 中国 only in a.txt.

The conclusion is that GREP can find multi-byte characters such as Chinese only if the command line console and the file being searched use the same encoding. This conclusion also applies to ACK.
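
The mismatch is easy to see at the byte level. The following Python sketch is only an illustration, not how GREP is implemented: it assumes the search compares raw bytes, and the helper name grep_bytes and the in-memory stand-ins for a.txt and b.txt are made up for the example.

```python
def grep_bytes(file_bytes, pattern, console_encoding):
    """Hypothetical helper: search for the pattern as the console would encode it."""
    needle = pattern.encode(console_encoding)
    return needle in file_bytes


text = "中国\n中国abc\n"
a_txt = text.encode("gbk")    # stands in for a.txt (GBK)
b_txt = text.encode("utf-8")  # stands in for b.txt (UTF-8)

# The same two characters are different byte sequences in each encoding.
print("中国".encode("gbk").hex())    # d6d0b9fa
print("中国".encode("utf-8").hex())  # e4b8ade59bbd

# A console on code page 936 sends the GBK bytes.
print(grep_bytes(a_txt, "中国", "gbk"))    # True
print(grep_bytes(b_txt, "中国", "gbk"))    # False
print(grep_bytes(b_txt, "中国", "utf-8"))  # True
```

The GBK bytes typed at the console never occur in the UTF-8 file, so the search finds nothing in b.txt.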

Use PAR to format XML comments

PAR is fantastic for formatting text files. It can be used to give XML comments a pretty layout. For the following XML comment:

<!-- You can recognize truth by its beauty and -->
<!-- simplicity. When you get it right, it is obvious that it is right. -->

par 50 produces

<!-- You can recognize truth by its beauty -->
<!-- and simplicity. When you get it right, -->
<!-- it is obvious that it is right. -->

But for the following text

<!-- You can recognize truth by its beauty and simplicity. When you get it right, it is obvious that it is right. -->

par 50 produces

<!-- You can recognize truth by its beauty and
simplicity. When you get it right, it is obvious
that it is right. -->

This is not what we want. Instead, par 50 -p5 -s5 can be used to produce

<!-- You can recognize truth by its beauty -->
<!-- and simplicity. When you get it right, -->
<!-- it is obvious that it is right. -->

For details on using PAR, refer to the Par documentation.
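
The prefix and suffix idea can be sketched in Python as well. This is not PAR's algorithm: it is a rough approximation using the standard textwrap module, the function name reflow_xml_comment is made up, and because textwrap fills lines greedily the line breaks may differ from PAR's output.

```python
import textwrap

PREFIX = "<!-- "  # 5 characters kept at the start of every line
SUFFIX = " -->"   # kept at the end of every line


def reflow_xml_comment(lines, width=50):
    """Hypothetical helper: strip comment markers, rewrap the text, add them back."""
    words = []
    for line in lines:
        body = line.strip()
        if body.startswith("<!--"):
            body = body[len("<!--"):]
        if body.endswith("-->"):
            body = body[:-len("-->")]
        words.extend(body.split())
    # Leave room for the markers so that no output line exceeds the width.
    body_width = width - len(PREFIX) - len(SUFFIX)
    wrapped = textwrap.wrap(" ".join(words), width=body_width)
    return [PREFIX + chunk + SUFFIX for chunk in wrapped]


comment = ["<!-- You can recognize truth by its beauty and simplicity. "
           "When you get it right, it is obvious that it is right. -->"]
for out_line in reflow_xml_comment(comment):
    print(out_line)
```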

Monday, November 17, 2008

Paste Multiple Lines of Code into Clisp Console

There is a small problem when pasting multiple lines of code into the CLISP console. If TAB is used for code indentation, the following message appears when the code is pasted into the console.

You are in the top-level Read-Eval-Print loop.

Using spaces for code indentation will fix this problem.
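
If the code is already indented with TABs, it can be converted before pasting. The following Python sketch uses the standard str.expandtabs method; the function name tabs_to_spaces, the tab width of 2, and the sample Lisp snippet are assumptions for the example. Note that it also expands any TAB inside a string literal.

```python
def tabs_to_spaces(source, tab_width=2):
    """Hypothetical helper: replace every TAB with spaces up to the next tab stop."""
    return source.expandtabs(tab_width)


lisp = "(defun square (x)\n\t(* x x))\n"
cleaned = tabs_to_spaces(lisp)
print(cleaned)
```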

Wednesday, October 15, 2008

Display Chinese characters in Eclipse console

Recently, I have been working on a project for a Chinese company. As a result, I want to display Chinese characters in the Eclipse console. I am using Eclipse 3.3. I don't want to change the Regional and Language settings for Windows XP since that would affect other programs. After searching with Google, I found that the following solution works for me. First, we need to configure the JRE we are using in Eclipse.

  1. In Preferences->Java->Installed JREs, choose the JRE you are using.

  2. Click Edit to show the Edit JRE dialog.

  3. Enter -Dfile.encoding=UTF-8 in Default VM Arguments.


Second, we need to configure all the Run (Debug) configurations.

  1. In the Common tab, select UTF-8 as the Console Encoding.


Eclipse 3.4 does not have this problem. It can display Chinese correctly by default.

Monday, September 22, 2008

Notebook and Ballpoint Pen

My favorite notebook and ballpoint pen

Wednesday, August 27, 2008

Swap CAPSLOCK and ESC key in Windows

I decided to swap these keys because I use VIM a lot. Save the following text into a file such as swap.reg. Double-click the file to import the registry values.

REGEDIT4
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Keyboard Layout]
"Scancode Map"=hex:00,00,00,00,00,00,00,00,03,00,00,00,3a,00,01,00,01,00,3a,00,00,00,00,00
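
The hex value above can be rebuilt step by step. The following Python sketch assumes the usual Scancode Map layout (a 4-byte version, 4-byte flags, a 4-byte entry count that includes the terminator, one 4-byte entry per remapped key, and a 4-byte zero terminator, all little-endian). The function names are made up for the example.

```python
import struct

ESC = 0x01        # scancode of the Esc key
CAPS_LOCK = 0x3A  # scancode of the Caps Lock key


def build_scancode_map(mappings):
    """Hypothetical helper. mappings is a list of (physical_key, sent_as) pairs."""
    header = struct.pack("<II", 0, 0)             # version and flags, both zero
    count = struct.pack("<I", len(mappings) + 1)  # entries plus the terminator
    entries = b"".join(
        struct.pack("<HH", sent_as, physical)     # new scancode first, then the key
        for physical, sent_as in mappings
    )
    terminator = struct.pack("<I", 0)
    return header + count + entries + terminator


def to_reg_hex(data):
    """Format bytes the way a .reg file writes a hex value."""
    return ",".join(f"{byte:02x}" for byte in data)


swap = build_scancode_map([(ESC, CAPS_LOCK), (CAPS_LOCK, ESC)])
print(to_reg_hex(swap))
```

The printed string matches the value in swap.reg: the Esc key sends the Caps Lock scancode and the Caps Lock key sends the Esc scancode.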

Saturday, August 16, 2008

Build Ruby on Windows

Get the Ruby 1.8.7 source code from ftp://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.7-p71.tar.gz. Extract the source code to a directory such as D:/Ruby 1.8.7-p71. Open a Visual Studio .NET 2003 Command Prompt. Run the following commands to compile Ruby:
  • cd /d D:/Ruby 1.8.7-p71
  • win32\configure.bat
  • nmake
If the following error occurs, it means that the Cygwin find is being invoked from the makefile. Make sure that the Windows find precedes the Cygwin find in the PATH environment variable.


Creating Makefile
find: `=': No such file or directory
NMAKE : fatal error U1077: 'cl' : return code '0x1'
Stop.
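
To check which find comes first, the PATH directories can be scanned in order. The following Python sketch is one way to do that; the function name executables_on_path and the list of extensions tried are assumptions for the example, not part of the Ruby build.

```python
import os


def executables_on_path(name, path=None, extensions=(".exe", ".com", ".bat", "")):
    """Hypothetical helper: list every match for name, in PATH order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                hits.append(candidate)
                break  # one match per directory is enough
    return hits


# The first entry printed is the one the command prompt will run.
for position, hit in enumerate(executables_on_path("find")):
    print(position, hit)
```

If a Cygwin directory is listed before the Windows system directory, reorder PATH before running nmake again.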