Saturday, November 22, 2008

Use GREP or ACK to Search for Chinese Characters

I have been developing a solution for a Chinese customer. As a result, I need to search for Chinese characters in text files. I use GREP and ACK. There is some subtlety when searching for Chinese characters.

In order to input Chinese characters, the Windows command line's code page needs to be set to 936, the code page for GBK encoding. Issuing chcp 936 does this.

There are two files, a.txt and b.txt, in d:/test. Both files contain the following text:

中国
中国abc

The encoding of a.txt is GBK. The encoding of b.txt is UTF-8. The command grep 中国 d:/test/*.txt finds 中国 only in a.txt.

The conclusion is that GREP can find double-byte characters such as Chinese only if the command line console and the file being searched use the same encoding. This conclusion also applies to ACK.
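The mismatch is easier to see at the byte level. The following is a minimal sketch in Java (the class and method names are illustrative, not part of GREP or ACK) that prints the bytes of 中国 in both encodings. It assumes the JDK in use ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytes {
    // Render a byte array as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 中国 written with Unicode escapes so the source file encoding does not matter.
        String text = "\u4e2d\u56fd";
        System.out.println("GBK:   " + hex(text.getBytes(Charset.forName("GBK"))));
        System.out.println("UTF-8: " + hex(text.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The two byte sequences share nothing, so a pattern typed in a GBK console cannot match the bytes stored in a UTF-8 file.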

Use PAR to format XML comment

PAR is fantastic for formatting text files. It can be used to give XML comments a tidy layout. For the following XML comment:

<!-- You can recognize truth by its beauty and -->
<!-- simplicity. When you get it right, it is obvious that it is right. -->

par 50 produces

<!-- You can recognize truth by its beauty -->
<!-- and simplicity. When you get it right, -->
<!-- it is obvious that it is right. -->

But for the following text

<!-- You can recognize truth by its beauty and simplicity. When you get it right, it is obvious that it is right. -->

par 50 produces

<!-- You can recognize truth by its beauty and
simplicity. When you get it right, it is obvious
that it is right. -->

This is not what we want. Instead, par 50 -p5 -s5 can be used to produce

<!-- You can recognize truth by its beauty -->
<!-- and simplicity. When you get it right, -->
<!-- it is obvious that it is right. -->

For the details of using PAR, you can refer to Par.

Monday, November 17, 2008

Paste Multiple Lines of Code into Clisp Console

There is a small problem when pasting multiple lines of code into the CLISP console. If TAB is used for code indentation, pasting the code produces the following error.

You are in the top-level Read-Eval-Print loop.

Using spaces for code indentation will fix this problem.

Wednesday, October 15, 2008

Display Chinese characters in Eclipse console

Recently, I have been working on a project for a Chinese company. As a result, I want to display Chinese characters in the Eclipse console. I am using Eclipse 3.3. I don't want to change the Regional and Language settings of Windows XP, since that would affect other programs. After searching with Google, I found that the following solution works for me. First, we need to configure the JRE used in Eclipse.

  1. In Preferences->Java->Installed JREs, choose the JRE you are using.

  2. Click Edit to open the Edit JRE dialog.

  3. Enter -Dfile.encoding=UTF-8 in Default VM Arguments.


Second, we need to configure all the Run (Debug) configurations.

  1. In the Common tab, select UTF-8 as the Console Encoding.


Eclipse 3.4 does not have this problem. It displays Chinese correctly by default.
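A quick way to check the setup is a small test program. The sketch below is only an illustration (the class and helper names are made up): it prints the file.encoding property the JVM was started with, then writes 中国 through a PrintStream that always encodes as UTF-8, so the program side no longer depends on the default encoding.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ConsoleUtf8 {
    // Wrap a stream so that text is always encoded as UTF-8,
    // whatever the JVM's default encoding happens to be.
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        utf8(System.out).println("\u4e2d\u56fd"); // 中国
    }
}
```

If the console encoding in the Run configuration is not UTF-8, the second line will still show up garbled, which makes the program handy for verifying both settings.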

Monday, September 22, 2008

Notebook and Ballpoint Pen

My favorite notebook and ballpoint pen

Wednesday, August 27, 2008

Swap CAPSLOCK and ESC key in Windows

I decided to do the swap because I use VIM a lot. Save the following text into a file such as swap.reg, then double-click the file to import the registry values.

REGEDIT4
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Keyboard Layout]
"Scancode Map"=hex:00,00,00,00,00,00,00,00,03,00,00,00,3a,00,01,00,01,00,3a,00,00,00,00,00

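As a reading aid for a Scancode Map value, here is a minimal decoder sketch in Java (the class is illustrative). It assumes the usual layout of the value: two 4-byte header fields that are zero, a 4-byte little-endian entry count that includes the terminator, and then 4-byte entries made of the scancode to send followed by the scancode of the physical key.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

public class ScancodeMapDecoder {
    // Decode comma-separated hex bytes into "physical key -> scancode sent" pairs.
    static List<String> decode(String hexBytes) {
        String[] parts = hexBytes.split(",");
        ByteBuffer buf = ByteBuffer.allocate(parts.length).order(ByteOrder.LITTLE_ENDIAN);
        for (String p : parts) {
            buf.put((byte) Integer.parseInt(p.trim(), 16));
        }
        buf.flip();
        buf.getInt();             // header: version (0)
        buf.getInt();             // header: flags (0)
        int count = buf.getInt(); // number of entries, including the null terminator
        List<String> mappings = new ArrayList<>();
        for (int i = 0; i < count - 1; i++) {
            int sent = buf.getShort() & 0xFFFF;     // scancode the key will send
            int physical = buf.getShort() & 0xFFFF; // scancode of the key being pressed
            mappings.add(String.format("0x%02x -> 0x%02x", physical, sent));
        }
        return mappings;
    }

    public static void main(String[] args) {
        String value = "00,00,00,00,00,00,00,00,03,00,00,00,"
                + "3a,00,01,00,01,00,3a,00,00,00,00,00";
        // 0x01 is the Esc scancode and 0x3a is the CapsLock scancode.
        System.out.println(decode(value));
    }
}
```

Under those assumptions the value holds two mappings: the Esc key sends CapsLock and the CapsLock key sends Esc.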
Saturday, August 16, 2008

Build Ruby on Windows

Get the Ruby 1.8.7 source code from ftp://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.7-p71.tar.gz. Extract the source code to a directory such as D:/Ruby 1.8.7-p71 . Open a Visual Studio .NET 2003 Command Prompt. Run the following commands to compile ruby.
  • cd /d D:/Ruby 1.8.7-p71
  • win32\configure.bat
  • nmake
If the following error occurs, it means that the Cygwin find is being invoked from the makefile. Make sure that the Windows find precedes the Cygwin find in the PATH environment variable.


Creating Makefile
find: `=': No such file or directory
NMAKE : fatal error U1077: 'cl' : return code '0x1'
Stop.

Saturday, August 2, 2008

HTTP Status 404 when running tomcat inside Eclipse

HTTP Status 404 shows up when I run Tomcat inside Eclipse, even though my resource exists. The error disappears after I restart Tomcat a few times, but the number of restarts needed is random. The dialogs shown by the Run on Server wizard are also random.

Sometimes, I get this error when using welcome-file-list. For some reason, the welcome file cannot be found.

I have used Tomcat in Eclipse for some time. My overall experience is bad. Maybe some improvements should be made to WTP. For now, I use ANT to deal with Tomcat, and it works well.

Classpath subtlety when using tomcat inside Eclipse

First, an Eclipse dynamic web project only recognizes classes produced by the source folders configured in Java Build Path->Source and classes in the libraries under the WebContent/WEB-INF/lib folder. The locations of the source folders and class output folders do not matter as long as they are inside the project, and neither does the number of source folders and output folders. By default, the class output folder is build/classes. The project does not recognize JAR libraries that are outside WebContent/WEB-INF/lib, nor class folders configured in Java Build Path->Libraries. Compilation still succeeds, but a ClassNotFoundException is thrown once you use Run on Server.

Wednesday, July 2, 2008

Running the Eclipse 3.4 formatter application

D:\gnu\eclipse\eclipse.exe -application org.eclipse.jdt.core.JavaCodeFormatter -config D:\gnu\jee\workspace\Concurrency\.settings\org.eclipse.jdt.core.prefs d:\Foo.java

Thursday, June 12, 2008

Variable type checking in C

Put both of the following files in a directory and run gcc *.c -o main.exe. The compilation will succeed, because C does not check that the declaration of an external variable in one source file is consistent with its definition in another.

1. one.c

int abc = 1;

2. main.c

#include <stdio.h>
extern float abc; /* declared as float here, but defined as int in one.c */
int main(void) {
    printf( "%f", abc );
    return 0;
}
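What the mismatched declaration amounts to is reading the bits of the int 1 as if they were a float. The C program's behavior is not defined by the language, but on a typical platform where int and float are both 32 bits the effect can be imitated in Java with Float.intBitsToFloat. This is a sketch with illustrative names, not a translation of the C program.

```java
import java.util.Locale;

public class BitReinterpretation {
    // Interpret the 32 bits of an int as an IEEE 754 single-precision float.
    static float reinterpret(int bits) {
        return Float.intBitsToFloat(bits);
    }

    // Format a float the way a "%f" conversion does (six decimal places).
    static String asPercentF(float value) {
        return String.format(Locale.ROOT, "%f", value);
    }

    public static void main(String[] args) {
        float f = reinterpret(1);
        System.out.println(f == Float.MIN_VALUE); // true: the smallest positive float
        System.out.println(asPercentF(f));        // 0.000000
    }
}
```

So the program compiles and links, yet the value it prints has nothing to do with 1.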

Function type checking in C

Put all of the following 3 files in a directory and run gcc *.c. The compilation will succeed, and running the resulting executable will print I am here, man!. Run g++ *.c and the build will fail. The reason is that, during linking, C only checks function names, whereas C++ also checks function types.
1. caller.c

#include "callee.h"
int main(void) {
    foo();
}

2. callee.h

int foo(void);

3. callee.c

#include <stdio.h>
/* Defined with a different signature than the one declared in callee.h. */
void foo(int v) {
    printf( "I am here, man!" );
}

Wednesday, May 14, 2008

Index overflow in Java for loop

The following little code snippet shows the overflow of for loop index in Java.

public class Out {
    public static void main(String[] args) {
        for (int i = 0; i <= 2147483647; i += 100000000) {
            System.out.println( i + "\n" );
        }
    }
}

You may expect the loop to run 22 times, but it does not. In the 22nd iteration i is 2100000000. Then 100000000 is added to i again. The result, 2200000000, is bigger than 2147483647 (the largest int value), so int overflow happens and i becomes a negative integer. The condition i <= 2147483647 is true for every int, so the loop never terminates.

For details of loop index overflow, you can refer to Programming Language Pragmatics.
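One possible way to get the expected 22 iterations is to use a long index, which cannot overflow in this range. The sketch below (class and method names are illustrative) counts the iterations and also shows the value the int index wraps around to.

```java
public class LoopOverflow {
    // Count iterations with a long index, so that adding the step never overflows.
    static int countWithLongIndex() {
        int count = 0;
        for (long i = 0; i <= Integer.MAX_VALUE; i += 100000000L) {
            count++;
        }
        return count;
    }

    // The value an int index takes after one more step from 2100000000.
    static int wrappedIndex() {
        int i = 2100000000;
        i += 100000000; // silently wraps around: 2200000000 - 2^32
        return i;
    }

    public static void main(String[] args) {
        System.out.println(countWithLongIndex()); // 22
        System.out.println(wrappedIndex());       // -2094967296
    }
}
```

Math.addExact is another option when the index must stay an int: it throws an ArithmeticException instead of wrapping.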

Friday, May 2, 2008

Generated methods values and valueOf for Enum

I have recently tried Java Enum when doing software development. The following code is an example:

public enum FooType {
    ABC, DEF;
}

I found two methods, valueOf and values, in the Javadoc for my enum. I did not write these 2 methods, and its superclass java.lang.Enum<FooType> does not have these 2 methods either. Then I checked JLS 3.0: section 8.9 Enums says that these 2 methods are automatically generated. It is a little magic: a Java class can have methods that do not exist in the source code. Too much magic without a consistent rationale can make a language hard to learn and use.
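A short sketch of the generated methods in use (the wrapper class is illustrative). For comparison it also calls the static valueOf that java.lang.Enum itself declares, which takes the enum class as an extra argument.

```java
import java.util.Arrays;

public class EnumGeneratedMethods {
    enum FooType { ABC, DEF }

    public static void main(String[] args) {
        // values() and valueOf(String) are generated by the compiler for FooType.
        System.out.println(Arrays.toString(FooType.values()));     // [ABC, DEF]
        System.out.println(FooType.valueOf("DEF") == FooType.DEF); // true

        // This two-argument method is the one declared in java.lang.Enum.
        System.out.println(Enum.valueOf(FooType.class, "ABC"));    // ABC
    }
}
```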

Saturday, April 26, 2008

Modula-2

I tried a little Modula-2 programming today. I used XDS 2.5. I built the following little program successfully in XDS IDE.

MODULE hello;
FROM InOut IMPORT WriteString, WriteLn;
BEGIN
WriteString("Hello, world!");
WriteLn;
END hello.

And I could use xc to compile it. The following error showed up when I used xlink to link it.

XDS Link Version 2.6 Copyright (c) 1995-2001 Excelsior
Fatal error (13): No program entry point


I googled for answers and read the XDS documentation. Unfortunately, I couldn't find one. Again, it shows that a quick start guide with some simple examples is extremely important for a beginner learning a new tool.

After wrestling with xlink for some time, I worked out the following solution.

xlink hello.obj C:\free\XDS\LIB\x86\libxds.lib C:\free\XDS\LIB\x86\import32.lib C:\free\XDS\LIB\x86\xstart.lib

Tuesday, April 15, 2008

Reading of Programming Language Pragmatics

Programming Language Syntax


The following grammar shows the dangling else problem of Pascal.

  • stmt -> if condition then_clause else_clause | other_stmt

  • then_clause -> then stmt

  • else_clause -> else stmt | ε
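The same ambiguity can be illustrated in Java, whose if statement has a similar shape. Java settles it by attaching an else to the nearest if. In the sketch below (names are illustrative) the indentation deliberately suggests the other reading.

```java
public class DanglingElse {
    static String classify(boolean a, boolean b) {
        if (a)
            if (b)
                return "a and b";
        else // indented as if it belonged to "if (a)", but it binds to "if (b)"
            return "a, not b";
        return "not a";
    }

    public static void main(String[] args) {
        System.out.println(classify(true, true));
        System.out.println(classify(true, false));
        System.out.println(classify(false, true));
    }
}
```

A reader who trusts the indentation would expect classify(true, false) to fall through to "not a", yet it returns "a, not b".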


Sometimes we have a hard time using a language. It is not always because we are not smart enough; sometimes it is because of flaws in the programming language.

Saturday, April 12, 2008

Hiding and overriding in Java

For the technical terms in this post, please refer to JLS 3.0. The discussion is not very precise: I looked for explicit rules governing this topic in the JLS but could not find them, so I am just trying to summarize my personal understanding. Any feedback is welcome.

In the following discussion, A and B are used. Type A extends type B (type means class or interface).

Field


Only hiding applies to Java fields. Hiding happens when both A and B declare a field named var.
Java allows the two fields to be of different kinds (a static field in one type and an instance field in the other). According to JLS 8.3, hiding still happens in that case, because field hiding draws no distinction between static and instance fields.

Method


Consider a method m1 in A and a method m2 in B.

  1. m1 is a subsignature of m2

  2. m1's visibility is no less than m2's

  3. Both m1 and m2 are instance methods, or both m1 and m2 are static methods. Otherwise, there will be a compile error.


Overriding


Overriding applies to instance methods.

Hiding


Hiding applies to static methods.
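The difference shows up when a reference of type B points at an instance of A. The following is a minimal sketch; the member names are made up for illustration.

```java
public class HidingVsOverriding {
    static class B {
        String var = "B.var";
        static String describe() { return "B.describe"; }
        String name() { return "B.name"; }
    }

    static class A extends B {
        String var = "A.var";                             // hides B.var
        static String describe() { return "A.describe"; } // hides B.describe
        @Override
        String name() { return "A.name"; }                // overrides B.name
    }

    public static void main(String[] args) {
        B ref = new A();
        System.out.println(ref.var);         // B.var: fields are selected by the static type
        System.out.println(((A) ref).var);   // A.var
        System.out.println(ref.name());      // A.name: instance methods are dispatched at run time
        System.out.println(B.describe());    // B.describe
        System.out.println(A.describe());    // A.describe
    }
}
```

Hidden members are chosen at compile time from the static type of the expression, while an overridden method is chosen at run time from the class of the object.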

Wednesday, April 9, 2008

blank in shell script

Try to avoid blanks in pathnames. They will drive you crazy under some circumstances. Take the following script as an example.

#!/bin/sh
opt="-path \"*/a bc\" -prune"
find $opt -o -print0
bash -c "find $opt -o -print0"

find $opt -o -print0 does not work, but bash -c "find $opt -o -print0" does. I have tried very hard to find a way to make the former work, but I failed. I will be happy to hear from anybody who has a solution.

Little hacking with bash shell

I have recently tried locate from findutils in Cygwin. I wanted to use updatedb to create a file name database, and I wanted to exclude all the files and directories under .svn directories. So I tried updatedb --localpaths='D:/gnu/ws/ctre/space' --findoptions='-path "*/.svn" -prune -o'. But I couldn't exclude the files in .svn directories with this command. I am not very experienced with bash, but I wanted to solve this problem, so I read the updatedb script and did some debugging on it. The problem is that bash does pathname expansion on "*/.svn". My first fix was to add set -f to the updatedb script. It works.

Then I wanted to open a bug for this script. To do that, I needed to investigate further, so I learned more about bash and tried more scripts. Finally, I found that I can use updatedb --localpaths='D:/gnu/ws/ctre/space' --findoptions='-path */.svn -prune -o' or updatedb --localpaths='D:/gnu/ws/ctre/space' --findoptions='-path '*/.svn' -prune -o'. Bash preserves the literal value of every character within single quotes, but it does expansion on characters enclosed in double quotes.

As a result, I did not open an invalid bug, and I have learned more. It tells me that contributing to the community helps me learn.

Sometimes I was frustrated during this hacking, but the overall experience was good.

Saturday, April 5, 2008

Solution to one Design Puzzle in Head First Design Patterns

I am reading Head First Design Patterns. It is a great book.

Generally, I don't like exercises in a book which don't have answers provided. I have run into the design puzzle on page 468. At first sight, I knew that I should use state pattern. I have never applied state pattern in real software work before. So I took this exercise as an opportunity to practice state pattern. Here is my solution.

State.java

package headfirst.proxy.virtualproxy;
import java.net.*;
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;

public interface State {
    public int getIconWidth();
    public int getIconHeight();
    public void paintIcon( Component c, Graphics g, int x, int y );
}

NullState.java

package headfirst.proxy.virtualproxy;
import java.net.*;
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;

public class NullState implements State {

    private ImageProxy proxy;
    private Thread retrievalThread;
    private URL imageURL;
    boolean retrieving = false;

    public NullState( ImageProxy p, URL url) {
        proxy = p;
        imageURL = url;
    }

    // Placeholder size used until the real image has been loaded.
    public int getIconWidth() {
        return 800;
    }

    public int getIconHeight() {
        return 600;
    }

    public void paintIcon( final Component c, Graphics g, int x, int y ) {
        g.drawString("Loading CD cover, please wait...", x+300, y+190);
        if (!retrieving) {
            retrieving = true;
            // Load the image in the background, then switch the proxy to ImageState.
            retrievalThread = new Thread(new Runnable() {
                public void run() {
                    try {
                        ImageIcon imageIcon = new ImageIcon(imageURL, "CD Cover");
                        proxy.setState( new ImageState( imageIcon ) );
                        c.repaint();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
            retrievalThread.start();
        }
    }
}

ImageState.java

package headfirst.proxy.virtualproxy;
import java.net.*;
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;

public class ImageState implements State {

    private ImageIcon imageIcon;

    public ImageState( ImageIcon i ) {
        imageIcon = i;
    }

    public int getIconWidth() {
        return imageIcon.getIconWidth();
    }

    public int getIconHeight() {
        return imageIcon.getIconHeight();
    }

    public void paintIcon( final Component c, Graphics g, int x, int y ) {
        imageIcon.paintIcon(c, g, x, y);
    }
}

ImageProxy.java

package headfirst.proxy.virtualproxy;

import java.net.*;
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;

public class ImageProxy implements Icon {

    private NullState ns;
    private State state;

    public ImageProxy(URL url) {
        ns = new NullState( this, url );
        state = ns;
    }

    // All Icon methods delegate to the current state.
    public int getIconWidth() {
        return state.getIconWidth();
    }

    public int getIconHeight() {
        return state.getIconHeight();
    }

    public void paintIcon(final Component c, Graphics g, int x, int y) {
        state.paintIcon( c, g, x, y);
    }

    public void setState( State s ) {
        state = s;
    }
}

ClassNotFoundException problem with rmiregistry

It is quite likely that you will get a ClassNotFoundException when you try the Getting Started tutorial for RMI in the JDK documentation. There is an easy way to fix it: start rmiregistry and the server implementation from the same command line console on Windows (assuming that the current directory is the root directory of the class file tree for the example).
  • start rmiregistry

  • java example.hello.Server

If you want to run them from different command line consoles, the CLASSPATH for both rmiregistry and java example.hello.Server must contain the path of the class file tree for the example.

I have run into this problem twice, and the JDK documentation does not state this explicitly. I searched the web, and Remote Method Invocation (RMI) - RMI server ClassNotFoundException solved my problem.

So documentation is vital to software quality. Good documentation can save developers a lot of precious time.

Monday, March 24, 2008

Working Directory

Deciding the working directory for a process is difficult. I found an entry in Wikipedia for working directory, and I added some rules for deciding the working directory to that entry, Working directory. You can refer to it for details.

For this reason, we should always try to use pathnames relative to the classpath instead of the working directory in Java.
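A minimal sketch of the two lookups side by side. The class, the helper names and the file name config.properties are all hypothetical and only there for illustration.

```java
import java.io.File;
import java.net.URL;

public class ResourceLookup {
    // Resolve a name against the working directory of the process.
    // The result changes with the directory the JVM was started from.
    static File relativeToWorkingDir(String name) {
        return new File(System.getProperty("user.dir"), name);
    }

    // Resolve a name against the classpath. A leading '/' means the classpath
    // root; otherwise the name is relative to the package of the anchor class.
    // Returns null when the resource cannot be found.
    static URL relativeToClasspath(Class<?> anchor, String name) {
        return anchor.getResource(name);
    }

    public static void main(String[] args) {
        System.out.println(relativeToWorkingDir("config.properties"));
        System.out.println(relativeToClasspath(ResourceLookup.class, "/config.properties"));
    }
}
```

The classpath lookup gives the same answer no matter which directory the program is launched from, as long as the classpath itself is unchanged.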

Wednesday, March 19, 2008

Unicode & Java

Endianness


In computing, endianness is the byte (and sometimes bit) ordering used to represent some kind of data.
Most modern computer processors agree on bit ordering "inside" individual bytes (this was not always the case). This means that any single-byte value will be read the same on almost any computer one may send it to.
Integers are usually stored as sequences of bytes, so that the encoded value can be obtained by simple concatenation. The two most common of them are:

  1. increasing numeric significance with increasing memory addresses, known as little-endian

  2. its opposite, most-significant byte first, called big-endian.


Intel x86 uses little-endian. The JVM uses big-endian. (The above content is from the Wikipedia entry on endianness.)
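The two byte orders can be compared directly with java.nio.ByteBuffer, whose default order is big-endian. A minimal sketch with illustrative names:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    // Serialize an int in the given byte order and render the bytes as hex.
    static String bytesOf(int value, ByteOrder order) {
        byte[] bytes = ByteBuffer.allocate(4).order(order).putInt(value).array();
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int value = 0x0A0B0C0D;
        System.out.println("big-endian:    " + bytesOf(value, ByteOrder.BIG_ENDIAN));    // 0A 0B 0C 0D
        System.out.println("little-endian: " + bytesOf(value, ByteOrder.LITTLE_ENDIAN)); // 0D 0C 0B 0A
        System.out.println("native order:  " + ByteOrder.nativeOrder());
    }
}
```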

Unicode


Code Point. Any value in the Unicode codespace; that is, the range of integers from 0 to 10FFFF (hexadecimal).
Mapping of Unicode character planes is a good explanation of Unicode planes and code points.

UTF-16


To represent the complete range of characters using only 16-bit units, the Unicode standard defines an encoding called UTF-16. In this encoding, supplementary characters are represented as pairs of 16-bit code units, the first from the high-surrogates range, (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). For characters in the range U+0000 to U+FFFF, the values of code points and UTF-16 code units are the same.

Java


In the Java SE API documentation, Unicode code point is used for character values in the range between U+0000 and U+10FFFF, and Unicode code unit is used for 16-bit char values that are code units of the UTF-16 encoding.
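The distinction between code points and code units is visible in the String API. The sketch below uses U+1D11E (MUSICAL SYMBOL G CLEF) as an example of a supplementary character; the class and helper names are illustrative.

```java
public class SurrogatePairs {
    // Render the UTF-16 code units of a code point as space-separated hex.
    static String utf16Units(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf16Units(0x4E2D));  // 4E2D: a BMP character is one code unit
        System.out.println(utf16Units(0x1D11E)); // D834 DD1E: a high and a low surrogate

        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(clef.length());                          // 2 code units
        System.out.println(clef.codePointCount(0, clef.length()));  // 1 code point
    }
}
```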

Sunday, February 24, 2008

It is not true that there is simply no way to extend an instantiable class and add an aspect while preserving the equals contract.

In Effective Java, Item 7 is "Obey the general contract when overriding equals". It contains the sentence "There is simply no way to extend an instantiable class and add an aspect while preserving the equals contract." Josh uses the following classes to illustrate transitivity. The following text is excerpted from the book.
The Point class.

public class Point {
    private final int x;
    private final int y;
    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
    public boolean equals(Object o) {
        if (!(o instanceof Point))
            return false;
        Point p = (Point)o;
        return p.x == x && p.y == y;
    }
    ... // Remainder omitted
}

The ColorPoint class.

public class ColorPoint extends Point {
    private Color color;
    public ColorPoint(int x, int y, Color color) {
        super(x, y);
        this.color = color;
    }

    //Broken - violates transitivity.
    public boolean equals(Object o) {
        if (!(o instanceof Point))
            return false;
        // If o is a normal Point, do a color-blind
        // comparison
        if (!(o instanceof ColorPoint))
            return o.equals(this);
        // o is a ColorPoint; do a full comparison
        ColorPoint cp = (ColorPoint)o;
        return super.equals(o) && cp.color == color;
    }

    ... // Remainder omitted
}


This approach does provide symmetry, but at the expense of transitivity:

ColorPoint p1 = new ColorPoint(1, 2, Color.RED);
Point p2 = new Point(1, 2);
ColorPoint p3 = new ColorPoint(1, 2, Color.BLUE);

At this point, p1.equals(p2) and p2.equals(p3) return true, while p1.equals(p3) returns false, a clear violation of transitivity. The first two comparisons are “color-blind,” while the third takes color into account.

So what's the solution? It turns out that this is a fundamental problem of equivalence relations in object-oriented languages. There is simply no way to extend an instantiable class and add an aspect while preserving the equals contract.

Then Josh gives a workaround in which ColorPoint does not extend Point. I think that we can allow ColorPoint to extend Point and still preserve the equals contract. Here is my solution.
The Point class.

public class Point {
    private final int x;
    private final int y;

    public Point( int x, int y ) {
        this.x = x;
        this.y = y;
    }

    public boolean equals( Object o ) {
        if( this == o )
            return true;
        // Reject null before calling o.getClass(), which would otherwise throw.
        if( o == null )
            return false;
        if( !(this.getClass() == o.getClass()) )
            return false;
        Point p = (Point) o;
        return p.x == x && p.y == y;
    }

    ... // Remainder omitted

}

The ColorPoint class.

public class ColorPoint extends Point {
    private Color color;
    public ColorPoint(int x, int y, Color color) {
        super(x, y);
        this.color = color;
    }

    public boolean equals( Object o ) {
        if( this == o )
            return true;
        if( o == null )
            return false;
        if( !(this.getClass() == o.getClass()) )
            return false;
        ColorPoint cp = (ColorPoint) o;
        return super.equals( o ) && cp.color == color;
    }
    ... // Remainder omitted
}

First, the equals method uses the == operator to check whether the argument is a reference to this object. Second, it checks whether the argument and this object are of the same class. I have tested this code and it works, but there may be some situations I have not taken into account.
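The checks can be exercised with a small self-contained program. This is a sketch, not the book's code: the classes are nested in one file, a small enum stands in for java.awt.Color, and hashCode is added only to keep the classes consistent with equals.

```java
public class EqualsContractCheck {
    enum Color { RED, BLUE } // stand-in for java.awt.Color (assumption)

    static class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Point p = (Point) o;
            return p.x == x && p.y == y;
        }

        @Override
        public int hashCode() {
            return 31 * x + y;
        }
    }

    static class ColorPoint extends Point {
        private final Color color;

        ColorPoint(int x, int y, Color color) {
            super(x, y);
            this.color = color;
        }

        @Override
        public boolean equals(Object o) {
            // super.equals has already verified that o is a non-null ColorPoint.
            return super.equals(o) && ((ColorPoint) o).color == color;
        }

        @Override
        public int hashCode() {
            return 31 * super.hashCode() + color.hashCode();
        }
    }

    public static void main(String[] args) {
        ColorPoint p1 = new ColorPoint(1, 2, Color.RED);
        Point p2 = new Point(1, 2);
        ColorPoint p3 = new ColorPoint(1, 2, Color.BLUE);
        ColorPoint p4 = new ColorPoint(1, 2, Color.RED);
        System.out.println(p1.equals(p2)); // false
        System.out.println(p2.equals(p1)); // false, so symmetry holds
        System.out.println(p1.equals(p3)); // false
        System.out.println(p1.equals(p4)); // true
    }
}
```

With the class comparison, a Point and a ColorPoint are never equal to each other, so the mixed comparisons that broke transitivity in the book's example all return false.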

Any comments are welcome.