
Object Serialization


Overview

The capability to store and retrieve Java objects is essential to building all but the most transient applications. The key to storing and retrieving objects is representing the state of objects in a serialized form sufficient to reconstruct the object(s). For Java objects, the serialized form must be able to identify and verify the Java classes from which the fields were saved and to restore those fields to instances of the same classes. The serialized form does not need to include the complete class definition but requires that the class is available when needed.

Objects to be stored and retrieved frequently refer to other objects. Those other objects must be stored and retrieved at the same time to maintain the relationships between the objects. When an object is stored all of the objects that are reachable from that object are stored as well.

The goals for serializing Java objects are to:

Writing Objects to a Stream

Writing objects and primitives to a stream is a straightforward process. For example:


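	// Serialize a string and today's date to a file.
	// Uses java.io.FileOutputStream, java.io.ObjectOutput,
	// java.io.ObjectOutputStream, and java.util.Date.
	FileOutputStream f = new FileOutputStream("tmp");
	ObjectOutput s = new ObjectOutputStream(f);
	s.writeObject("Today");
	s.writeObject(new Date());
	s.flush();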


First an OutputStream, in this case a FileOutputStream, is needed to receive the bytes. Then an ObjectOutputStream is created that writes to the OutputStream. Next, the string "Today" and a Date object are written to the stream. More generally, objects are written with the writeObject method and primitives are written to the stream with the methods of DataOutputStream.
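The mix of primitives and objects described above can be sketched as follows. This is a minimal example, assuming the current java.io API, where ObjectOutputStream and ObjectInputStream expose the primitive methods (writeInt, readInt, and so on) directly. The class name PrimitiveRoundTrip and its methods are hypothetical, and an in-memory byte array stands in for the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical helper: writes two primitives and one object, then reads them back.
class PrimitiveRoundTrip {
    // Serializes the values in a fixed order and returns the resulting bytes.
    static byte[] write(int count, double ratio, String label) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(count);     // primitive
            out.writeDouble(ratio);  // primitive
            out.writeObject(label);  // object
        }
        return bytes.toByteArray();
    }

    // Reads the values back in the same order they were written.
    static String read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            double ratio = in.readDouble();
            String label = (String) in.readObject();
            return count + " " + ratio + " " + label;
        }
    }
}
```

Values must be read in the order in which they were written; the stream does not label primitives with names.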

The writeObject method serializes the specified object and recursively traverses its references to other objects in the object graph to create a complete serialized representation of the graph. Within a stream, the first reference to any object results in the object being serialized and a handle being assigned to it. Subsequent references to that object are encoded as the handle only, which yields a very compact representation. Using object handles also preserves the sharing and circular references that occur naturally in object graphs.
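The effect of handles on sharing and cycles can be illustrated with a small round trip. This is a sketch under the assumption of the current java.io API, in which a class opts in by implementing java.io.Serializable. GraphNode and SharedGraphDemo are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical node type used to build a two-element cycle.
class GraphNode implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    GraphNode next;

    GraphNode(String name) {
        this.name = name;
    }
}

class SharedGraphDemo {
    // Writes a -> b -> a, then reads both nodes back from the same stream.
    static GraphNode[] roundTrip() throws IOException, ClassNotFoundException {
        GraphNode a = new GraphNode("a");
        GraphNode b = new GraphNode("b");
        a.next = b;
        b.next = a; // circular reference

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(a); // serializes a, and b because it is reachable from a
            out.writeObject(b); // b was already written, so only its handle is encoded
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            GraphNode a2 = (GraphNode) in.readObject();
            GraphNode b2 = (GraphNode) in.readObject();
            return new GraphNode[] { a2, b2 };
        }
    }
}
```

After the round trip, the copy of a refers to the same copy of b that the second readObject call returns, and the cycle back to a is intact.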

The serialized encoding of an object consists of the object's class followed by the fields of each class starting with the highest superclass and ending with the actual class.

For an object to handle its own serialization, it must implement the writeObject method. To maintain the integrity of the class, this method is private to the class and can be called only by the serialization runtime. This method is invoked when the fields of its class are to be written; it should write the information needed to reinitialize the object when it is deserialized.
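A class-specific writeObject method, paired with the readObject method that consumes its output, might look like the following. This is a sketch that assumes the method signatures of the current java.io API. The Celsius class, its tenths-of-a-degree encoding, and the CelsiusDemo helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class that chooses its own stream representation.
class Celsius implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient double degrees; // skipped by the default mechanism

    Celsius(double degrees) {
        this.degrees = degrees;
    }

    double degrees() {
        return degrees;
    }

    // Invoked by the serialization runtime, never by application code.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();                     // writes any default fields (none here)
        out.writeInt((int) Math.round(degrees * 10)); // store tenths of a degree
    }

    // Reads exactly what writeObject wrote, in the same order.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        degrees = in.readInt() / 10.0;
    }
}

class CelsiusDemo {
    // Serializes and deserializes a value through an in-memory buffer.
    static Celsius copy(Celsius original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Celsius) in.readObject();
        }
    }
}
```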

The default mechanism writes each non-static and non-transient field to the stream. Each field is written appropriately depending on its type. The fields are put in a canonical order so as to be insensitive to the order of declaration.

Objects of class Class are serialized as the name of the class and the fingerprint or hash of the interfaces, methods, and fields of the class. The name allows the class to be identified during deserialization and the hash of the class allows it to be verified against the class of the serialized object. All other normal Java objects are serialized by writing the encoding of their Class followed by their fields.

ObjectOutput streams can be extended to customize the information in the stream about classes or to replace objects to be serialized. Refer to the annotateClass and replaceObject method descriptions for details.
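Object replacement can be sketched with a subclass of ObjectOutputStream. The example assumes the current java.io API, where replaceObject is consulted only after enableReplaceObject(true) has been called. The class names and the replacement policy (writing a String in place of a StringBuilder) are hypothetical, and annotateClass is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;

// Hypothetical stream that substitutes a String for every StringBuilder it is asked to write.
class StringifyingOutputStream extends ObjectOutputStream {
    StringifyingOutputStream(OutputStream out) throws IOException {
        super(out);
        enableReplaceObject(true); // replaceObject is ignored unless enabled
    }

    @Override
    protected Object replaceObject(Object obj) throws IOException {
        if (obj instanceof StringBuilder) {
            return obj.toString(); // the replacement is what reaches the stream
        }
        return obj;
    }
}

class ReplaceDemo {
    // Writes with the customized stream and reads with a plain ObjectInputStream.
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new StringifyingOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```

The reader needs no special support here because the stream already contains the replacement object.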

Reading Objects from a Stream

Reading an object from a stream is equally straightforward:


	// Deserialize a string and date from a file.
	FileInputStream in = new FileInputStream("tmp");
	ObjectInputStream s = new ObjectInputStream(in);
	String today = (String) s.readObject();
	Date date = (Date) s.readObject();


First an InputStream, in this case a FileInputStream, is needed as the source stream. Then an ObjectInputStream is created that reads from the InputStream. Next, the string "Today" and a Date object are read from the stream. More generally, objects are read with the readObject method and primitives are read from the stream with the methods of DataInputStream.

The readObject method deserializes the next object in the stream and recursively traverses its references to other objects to recreate the complete graph of objects that was serialized. Objects read from the stream are type checked as they are assigned.

Reading an object consists of the decoding of the object's class and the fields of each class starting with the highest superclass and ending with the actual class.

For an object to handle its own deserialization, it must implement the readObject method. To maintain the integrity of the class, this method is private to the class and can be called only by the serialization runtime. This method is invoked when the fields of its class are to be read; it should read the information written by writeObject and make appropriate assignments to the object's fields. If the state of the object cannot be completely restored at the time the object is being read, a validation callback can be requested by calling the registerValidation method.
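A validation callback might be requested as in the following sketch. It assumes the current java.io API, where registerValidation takes an ObjectInputValidation object and a priority, and the callback runs after the whole graph has been read. The Range class, its invariant, and the RangeDemo helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectInputValidation;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class whose invariant (low <= high) is rechecked after deserialization.
class Range implements Serializable, ObjectInputValidation {
    private static final long serialVersionUID = 1L;
    private int low;
    private int high;

    Range(int low, int high) { // deliberately unchecked so a bad stream can be produced
        this.low = low;
        this.high = high;
    }

    int low() { return low; }
    int high() { return high; }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.registerValidation(this, 0); // request a callback once the graph is complete
        in.defaultReadObject();
    }

    @Override
    public void validateObject() throws InvalidObjectException {
        if (low > high) {
            throw new InvalidObjectException("low > high");
        }
    }
}

class RangeDemo {
    // Serializes and deserializes a value through an in-memory buffer.
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```

A stream carrying an inverted range is rejected with an InvalidObjectException instead of yielding an object that violates the invariant.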

The default mechanism reads each non-static and non-transient field from the stream. Each field is read appropriately depending on its type. The fields are read in the same canonical order as when written so as to be insensitive to the order of declaration.

Objects of class Class are deserialized from the name of the class and its fingerprint. A fingerprint is a hash of the interfaces, methods, and fields of the class. The resolveClass method is called to find the class by name and return its Class object. The hash is computed for the returned class and compared with the hash of the class serialized. Deserialization proceeds only if the hashes match. This ensures that the structure of the stream matches the structure of the class. All other normal Java objects are deserialized by reading the encoding of their Class followed by their fields.

ObjectInput streams can be extended to utilize customized information in the stream about classes or to replace objects that have been deserialized. Refer to the resolveClass and resolveObject method descriptions for details.
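One use of a resolveClass override is to restrict which classes a stream may name. The sketch below assumes the current java.io signature, resolveClass(ObjectStreamClass). The allow-list policy and the class names AllowListInputStream and AllowListDemo are hypothetical, and resolveObject is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.util.Set;

// Hypothetical stream that resolves only classes whose names are on an allow list.
class AllowListInputStream extends ObjectInputStream {
    private final Set<String> allowed;

    AllowListInputStream(InputStream in, Set<String> allowed) throws IOException {
        super(in);
        this.allowed = allowed;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        if (!allowed.contains(desc.getName())) {
            throw new InvalidClassException(desc.getName(), "class not permitted");
        }
        return super.resolveClass(desc); // default lookup by name
    }
}

class AllowListDemo {
    // Serializes a value with a plain ObjectOutputStream.
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Deserializes with the restricted stream.
    static Object read(byte[] data, Set<String> allowed)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new AllowListInputStream(new ByteArrayInputStream(data), allowed)) {
            return in.readObject();
        }
    }
}
```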

Protecting Sensitive Information

Warning: The current implementation does not protect the private fields of objects, so object serialization can be used to reveal private information that must be kept secret. In JDK 1.1, the implementation will require programmers to explicitly declare which classes can be serialized.

When developing a class that provides controlled access to resources, care must be taken to protect sensitive information and functions. During deserialization (by default) the private state of the object is restored. For example, a file descriptor contains a handle that provides access to an operating system resource. Because state is restored from a stream, a forged stream could fabricate a file descriptor and thereby allow some forms of illegal access. Therefore, the serializing runtime must take the conservative approach and not trust the stream to contain only valid representations of objects. To avoid compromising a class, the sensitive state of an object must not be restored from the stream or it must be reverified by the class. Several techniques are available to protect sensitive data in classes.

The easiest technique is to mark fields that contain sensitive data as "private transient". Transient and static fields are not serialized or deserialized. Simply marking the field prevents its state from appearing in the stream and from being restored during deserialization. Since writing and reading (of private fields) cannot be superseded outside of the class, the class's transient fields are safe.
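The effect of a private transient field can be seen in a short round trip. This is a sketch assuming the current java.io API. The Account class, its fields, and the AccountDemo helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class with one ordinary field and one sensitive field.
class Account implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String user;
    private transient String password; // never written to the stream

    Account(String user, String password) {
        this.user = user;
        this.password = password;
    }

    String user() { return user; }
    String password() { return password; }
}

class AccountDemo {
    // Serializes and deserializes an account through an in-memory buffer.
    static Account copy(Account original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Account) in.readObject();
        }
    }
}
```

The deserialized copy keeps the user name, while the transient password comes back as null, the default value for an object reference.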

Particularly sensitive classes should not be serialized at all. To accomplish this, writeObject and readObject methods should be implemented to throw the NoAccessException. Throwing an exception will abort the entire serialization or deserialization process before any state from the class can be serialized or deserialized.
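A class that refuses serialization outright might be sketched as follows. The current java.io API has no NoAccessException, so this example substitutes java.io.NotSerializableException as an assumption. The SealedSecret class and the SealedSecretDemo helper are hypothetical. The technique matters mostly when a class inherits the ability to be serialized from a superclass.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sensitive class that aborts any attempt to write or read it.
class SealedSecret implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String key;

    SealedSecret(String key) {
        this.key = key;
    }

    // Assumption: NotSerializableException stands in for the NoAccessException named in the text.
    private void writeObject(ObjectOutputStream out) throws IOException {
        throw new NotSerializableException(getClass().getName());
    }

    private void readObject(ObjectInputStream in) throws IOException {
        throw new NotSerializableException(getClass().getName());
    }
}

class SealedSecretDemo {
    // Attempts to serialize a value; the exception from writeObject propagates to the caller.
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }
}
```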

Some classes may find it beneficial to allow writing and reading but specifically handle and revalidate the state as it is deserialized. The class should implement writeObject and readObject methods to save and restore only the appropriate state. If access should be denied, throwing a NoAccessException will prevent further access.

Fingerprints of Classes

Within an object stream, classes are represented by name and fingerprint. This fingerprint is used to verify that the class used to deserialize the object is the same as the class of the object serialized. The shallow signature or fingerprint of a class is computed by hashing the class name and access flags, the interfaces supported by the class, the field names, access flags and signatures, and the method names, access flags and signatures. Each set of interfaces, fields, and methods is put in a canonical order prior to hashing so that the order of declaration does not affect the hash. The shallow fingerprints of the class and all superclasses are rehashed to define the fingerprint of the class that is used in the stream.
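The idea of an order-insensitive shallow fingerprint can be illustrated with reflection and a SHA-1 digest. This is only a sketch of the approach, not the algorithm or byte layout used by the implementation: the string encoding of each member is an assumption, superclasses are not rehashed, and the ShallowFingerprint class is hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: hashes a class's name, interfaces, fields, and methods.
class ShallowFingerprint {
    static byte[] of(Class<?> cls) throws NoSuchAlgorithmException {
        MessageDigest sha = MessageDigest.getInstance("SHA-1");
        update(sha, cls.getName() + ":" + cls.getModifiers());

        // TreeSet sorts each group, so declaration order cannot affect the hash.
        Set<String> interfaces = new TreeSet<>();
        for (Class<?> i : cls.getInterfaces()) {
            interfaces.add(i.getName());
        }
        Set<String> fields = new TreeSet<>();
        for (Field f : cls.getDeclaredFields()) {
            fields.add(f.getName() + ":" + f.getModifiers() + ":" + f.getType().getName());
        }
        Set<String> methods = new TreeSet<>();
        for (Method m : cls.getDeclaredMethods()) {
            methods.add(m.toString()); // modifiers, return type, name, and parameter types
        }

        for (String s : interfaces) update(sha, s);
        for (String s : fields) update(sha, s);
        for (String s : methods) update(sha, s);
        return sha.digest(); // 20 bytes for SHA-1
    }

    // Feeds one string into the digest, followed by a separator byte.
    private static void update(MessageDigest sha, String s) {
        sha.update(s.getBytes(StandardCharsets.UTF_8));
        sha.update((byte) 0);
    }
}
```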

The FingerPrintClass implementation also provides a total fingerprint that includes the fingerprints of each class referred to by the class as a parameter or return value.

The values and strings included in the hash are those of the Java Virtual Machine Specification that define classes, methods, and fields.

Secure Hash Stream

The SHAOutputStream provides an implementation of the National Institute of Standards and Technology (NIST) Secure Hash Algorithm (SHA). Its output is a 160-bit (20-byte) secure hash of the bytes written.
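Hashing the bytes written to a stream can be sketched with classes from java.security. SHAOutputStream itself is not used here. The example assumes java.security.DigestOutputStream with a SHA-1 MessageDigest as a stand-in, and the ShaStreamDemo class is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: computes the SHA-1 hash of everything written to a stream.
class ShaStreamDemo {
    static String sha1Hex(byte[] data) throws IOException, NoSuchAlgorithmException {
        MessageDigest sha = MessageDigest.getInstance("SHA-1");
        // Every byte written passes through the digest on its way to the sink.
        try (DigestOutputStream out = new DigestOutputStream(new ByteArrayOutputStream(), sha)) {
            out.write(data);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sha.digest()) { // 20 bytes, that is, 160 bits
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

For the three ASCII bytes of "abc", the result is the standard SHA-1 test vector a9993e364706816aba3e25717850c26c9cd0d89d.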




rmi-comments@jse.East.Sun.COM
Copyright © 1996, Sun Microsystems, Inc. All rights reserved.