Monday, March 24, 2008

Working Directory

Determining the working directory of a process is difficult. I found an entry for "working directory" on Wikipedia and added some rules for determining the working directory to it. You can refer to the Wikipedia entry "Working directory" for details.

For this reason, in Java we should always try to use pathnames relative to the classpath rather than to the working directory.
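As a minimal sketch of the difference, the class below reads the working directory from the `user.dir` system property and looks up a resource through the class loader. The class name `ResourceLookup` and the resource name `config/app.properties` are made up for illustration; they are not from any real project.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ResourceLookup {
    // The directory the JVM was started from. It changes with how the process is launched.
    static Path workingDirectory() {
        return Paths.get(System.getProperty("user.dir")).toAbsolutePath();
    }

    // Opens a resource relative to the classpath root. Returns null if it is not found.
    static InputStream openFromClasspath(String name) {
        return ResourceLookup.class.getClassLoader().getResourceAsStream(name);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Working directory: " + workingDirectory());
        // "config/app.properties" is a hypothetical resource name.
        try (InputStream in = openFromClasspath("config/app.properties")) {
            System.out.println(in == null ? "resource not found on classpath" : "resource found");
        }
    }
}
```

The classpath lookup gives the same result wherever the program is started from, as long as the classpath itself is the same. A relative `java.io.File` path, in contrast, is resolved against whatever `user.dir` happens to be.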

Wednesday, March 19, 2008

Unicode & Java

Endianness


In computing, endianness is the byte (and sometimes bit) ordering used to represent some kind of data.
Most modern computer processors agree on bit ordering "inside" individual bytes (this was not always the case). This means that any single-byte value will be read the same on almost any computer one may send it to.
Integers are usually stored as sequences of bytes, so that the encoded value can be obtained by simple concatenation. The two most common byte orderings are:

  1. increasing numeric significance with increasing memory addresses, known as little-endian

  2. its opposite, most-significant byte first, called big-endian.


Intel x86 uses little-endian. The JVM uses big-endian (for example, in the class file format and in java.io.DataOutputStream). (The above content is from the Wikipedia entry on endianness.)
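The two byte orders can be seen in Java with `java.nio.ByteBuffer`, which lets you choose the order explicitly. This is only a small sketch; the class name `EndianDemo` and its helper methods are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    // Encodes a 32-bit int into 4 bytes using the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Formats bytes as space-separated two-digit hex values, in memory order.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        int value = 0x0A0B0C0D;
        System.out.println(hex(encode(value, ByteOrder.BIG_ENDIAN)));    // 0A 0B 0C 0D
        System.out.println(hex(encode(value, ByteOrder.LITTLE_ENDIAN))); // 0D 0C 0B 0A
        // The byte order of the underlying hardware, which depends on the machine.
        System.out.println(ByteOrder.nativeOrder());
    }
}
```

A new `ByteBuffer` is big-endian by default, regardless of the hardware, so the explicit `order(...)` call only matters when little-endian data is needed.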

Unicode


Code Point. Any value in the Unicode codespace; that is, the range of integers from 0 to 10FFFF (hexadecimal).
The Wikipedia article "Mapping of Unicode character planes" is a good explanation of Unicode planes and code points.
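As a rough illustration of how code points map to planes: each plane holds 65,536 (hex 10000) code points, so the plane number is the code point shifted right by 16 bits. The class name `CodePointDemo` and the method `plane` are made up for this sketch.

```java
public class CodePointDemo {
    // Returns the plane (0 to 16) that a code point belongs to.
    static int plane(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a Unicode code point: " + codePoint);
        }
        return codePoint >>> 16; // same as dividing by 0x10000
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(Character.MAX_CODE_POINT)); // 10ffff
        System.out.println(plane(0x0041));   // 0  (Basic Multilingual Plane)
        System.out.println(plane(0x1D11E));  // 1  (a supplementary plane)
        System.out.println(plane(0x10FFFF)); // 16 (the last plane)
    }
}
```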

UTF-16


To represent the complete range of characters using only 16-bit units, the Unicode standard defines an encoding called UTF-16. In this encoding, supplementary characters are represented as pairs of 16-bit code units, the first from the high-surrogates range (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). For characters in the range U+0000 to U+FFFF, the values of code points and UTF-16 code units are the same.
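The surrogate pair for a supplementary character can be computed by hand: subtract 10000 (hex) from the code point, then put the top 10 bits of the result in the high surrogate and the bottom 10 bits in the low surrogate. The sketch below does this and compares the result with `Character.toChars`. The class name `SurrogateDemo` and the method `toSurrogates` are made up for this example.

```java
import java.util.Arrays;

public class SurrogateDemo {
    // Splits a supplementary code point (U+10000 to U+10FFFF) into its UTF-16 surrogate pair.
    static char[] toSurrogates(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;               // a 20-bit value
        char high = (char) (0xD800 + (offset >>> 10));  // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int gClef = 0x1D11E; // U+1D11E MUSICAL SYMBOL G CLEF
        char[] pair = toSurrogates(gClef);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // D834 DD1E
        // The standard library produces the same pair.
        System.out.println(Arrays.equals(pair, Character.toChars(gClef))); // true
    }
}
```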

Java


In the Java SE API documentation, Unicode code point is used for character values in the range between U+0000 and U+10FFFF, and Unicode code unit is used for 16-bit char values that are code units of the UTF-16 encoding.
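The distinction matters in practice because `String.length()` counts code units, not code points. Below is a small sketch of the difference; the class name `CodeUnitDemo` and its helper methods are made up for this example.

```java
public class CodeUnitDemo {
    // Number of 16-bit char values (UTF-16 code units) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "A" followed by U+1D11E, which needs a surrogate pair in UTF-16.
        String s = "A" + new String(Character.toChars(0x1D11E));
        System.out.println(codeUnits(s));  // 3
        System.out.println(codePoints(s)); // 2

        // Iterate by code point, advancing one or two chars at a time.
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            System.out.printf("U+%04X%n", cp);
            i += Character.charCount(cp);
        }
    }
}
```

Iterating with `charAt` would instead visit the two surrogates of U+1D11E separately, which is rarely what is wanted when processing supplementary characters.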