  #1 (permalink)  
Old 12-12-09, 06:49
mike_bike_kite
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
string compression

I have an applet which requires a moderately large hashmap (about 0.5 MB). The hashmap has a key that's a small string and returns an integer. The string looks something like "k23k32|m15", "m1|k32", "m3|m7" etc., while the number could be anything.

I was thinking of putting the whole hashmap into a string and then compressing the string down, putting that in the applet, and then rebuilding the hashmap when the applet starts up. I'm pretty sure the strings can be compressed right down, but I wasn't keen on writing the compression algorithm myself. Is there a simple method I can use to do this? Any idea what level of compression it will give? Is there anything else I could do?
  #2 (permalink)  
Old 12-13-09, 17:22
sco08y
Registered User
 
Join Date: Oct 2002
Location: Baghdad, Iraq
Posts: 697
Now, you can get a binary form of a hashmap by serializing it. See the java.io.Serializable interface. Serializing will take the whole shebang and turn it into binary form.

You need to create a java.io.ObjectOutputStream and then pass the hashmap to its writeObject method.

One way your applet can retrieve this file is if it downloads it separately.

If that's the case, you'll want to pipe the ObjectOutputStream into a java.util.zip.GZIPOutputStream. (Regular zip files (and jar files, which are the same thing) have a little extra baggage for archiving multiple files. A gzip file is designed to handle just one file.) That's not the best compression possible, but it works and is ubiquitous. Maybe something like LZMA would get you better compression, but you'd have to add more class files to your code.

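You layer streams on top of each other by passing one stream to the constructor of the next. (java.io.FilterOutputStream is the base class for several of these wrapping streams, GZIPOutputStream included, but you don't need to use it directly.) And you'll need a final layer to hook it into file output: java.io.FileOutputStream, rather than one of the writer classes, since those are for text and this is binary data. It's been a while since I've done Java I/O. It's confusing at first, but once you figure out what does what, it all fits together quite nicely and is very flexible.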
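To make the layering concrete, here is a minimal sketch of the write side, assuming the map is a Map<String, Integer> as described in the first post. The class name MapWriter, the method name writeCompressed, and the sample values are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPOutputStream;

class MapWriter {
    /** Serializes the map and gzips it into the given sink (a file, a byte array, ...). */
    static void writeCompressed(Map<String, Integer> map, OutputStream sink) throws IOException {
        // Each stream wraps the one beneath it: objects -> gzip -> sink.
        try (ObjectOutputStream oos = new ObjectOutputStream(new GZIPOutputStream(sink))) {
            oos.writeObject(map);
        } // closing the outermost stream flushes and closes the inner ones
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> map = new HashMap<>();
        map.put("k23k32|m15", 7); // sample entries, values invented
        map.put("m1|k32", 42);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeCompressed(map, bytes); // pass a FileOutputStream here to write a file instead
        System.out.println("compressed size: " + bytes.size() + " bytes");
    }
}
```

Writing to an OutputStream parameter rather than a hard-coded file keeps the same code usable for files and for in-memory tests.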

Reading it in is just a matter of using input where you used output. And you'll be using an HTTP connection to read in the file.
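A rough sketch of the read side, mirroring the output layers with their input counterparts. MapReader and readCompressed are hypothetical names, and the round trip here runs in memory; for the HTTP case the InputStream would come from something like URL.openStream() instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class MapReader {
    /** Reads back a map that was written as ObjectOutputStream over GZIPOutputStream. */
    @SuppressWarnings("unchecked")
    static Map<String, Integer> readCompressed(InputStream source)
            throws IOException, ClassNotFoundException {
        // Same layers as on output, with Input in place of Output.
        try (ObjectInputStream ois = new ObjectInputStream(new GZIPInputStream(source))) {
            return (Map<String, Integer>) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Integer> original = new HashMap<>();
        original.put("m1|k32", 42); // sample entry, value invented

        // Write the map to memory first so there is something to read back.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            oos.writeObject(original);
        }

        Map<String, Integer> copy = readCompressed(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(copy.get("m1|k32")); // prints 42
    }
}
```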

UNLESS... you decide to bundle it in your applet's jar file. Then it's already compressed by the jar (really pkzip) compression. This is a little confusing, but basically your applet's Class object has a method "getResourceAsStream". You call that with the resource name and, since your applet's jar has already downloaded, it immediately returns the InputStream with your data. In that case, you don't need to compress the data, just package it with your class files.

Pros and cons: loading a separate resource is certainly less reliable. You'll have to handle the case where the connection fails. OTOH, your applet can't do anything until the entire jar file has loaded, so the user might *think* it has failed, which is arguably just as bad, because you can't provide a progress bar.

One last thought regarding compression:

A hashed structure randomizes the order of the elements, which introduces entropy that somewhat defeats compression. As a rule of thumb, sorted structures compress better.

The additional entropy might not make a big difference, but it's worth testing whether reading the elements of the hash into an ArrayList, sorting them, and serializing that gives you better compression. If the gain isn't substantial, the extra time you incur repackaging the data when you load it may not be worth it.
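One way to run that test is to gzip both forms in memory and compare the byte counts. The sketch below swaps in a TreeMap as the sorted structure instead of a sorted ArrayList, so that both sides serialize the same key/value pairs and only the order differs. The keys are synthetic stand-ins, and CompressionComparison and gzippedSize are made-up names; real data may behave differently, so no particular result is claimed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.GZIPOutputStream;

class CompressionComparison {
    /** Returns how many bytes the object occupies once serialized and gzipped. */
    static int gzippedSize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            oos.writeObject(obj);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        // Synthetic keys that only loosely resemble the ones in the question.
        HashMap<String, Integer> hashed = new HashMap<>();
        for (int i = 0; i < 10000; i++) {
            hashed.put("k" + (i % 40) + "|m" + i, i);
        }
        // Same entries, but iterated (and therefore serialized) in sorted key order.
        TreeMap<String, Integer> sorted = new TreeMap<>(hashed);

        System.out.println("hashed: " + gzippedSize(hashed) + " bytes");
        System.out.println("sorted: " + gzippedSize(sorted) + " bytes");
    }
}
```

If the sorted form wins by a useful margin, the applet can copy it back into a HashMap after loading, or simply use the TreeMap as it is.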
  #3 (permalink)  
Old 12-14-09, 04:49
mike_bike_kite
Thanks for your in-depth post sco08y - I'll have to read through that very slowly and just look up all the technical words in the manual. I was hoping for something like String my_string.compress() but I guess that isn't going to happen. Shame.
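For what it's worth, there is no built-in String compress method, but java.util.zip gets fairly close. A minimal sketch, assuming UTF-8 text and gzip; StringCompressor, compress and decompress are hypothetical helper names, not library methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class StringCompressor {
    /** Gzips the UTF-8 bytes of the given text. */
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    /** Reverses compress(): gunzips the bytes and decodes them as UTF-8. */
    static String decompress(byte[] data) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "k23k32|m15,m1|k32,m3|m7"; // key strings taken from the first post
        byte[] packed = compress(text);
        System.out.println(decompress(packed).equals(text)); // prints true
    }
}
```

Note that gzip adds a fixed header and trailer, so very short strings come out larger than they went in; the savings only show up on bigger inputs.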
  #4 (permalink)  
Old 12-15-09, 15:08
scooby_at_work
Registered User
 
Join Date: Sep 2009
Posts: 44
Well, the simplest case is pretty simple: you don't compress the data yourself at all, you just stash it in your JAR, which is by far the best deployment strategy.

To create the file, you need to write a short app that builds the map in memory and writes it out:

Code:
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

public class YourClass {
    public static void main(String[] args) throws IOException {
        Map<String, Integer> map = new HashMap<String, Integer>();
        // ... populate map ...
        String path = "foo.map"; // or whatever extension. The applet will expect the .map file to be in
        // the same directory as the .class file that loads it. See the docs if you want it elsewhere.
        FileOutputStream fos = new FileOutputStream(path);
        ObjectOutputStream oos = new ObjectOutputStream(fos);
        oos.writeObject(map); // writeObject (not write) serializes the whole map
        oos.close(); // also flushes and closes the underlying FileOutputStream
    }
}
So, then you run that and put the new file in with your class files, wherever you want it.

You can also keep it with your .class files uncompressed while testing. For your applet to read it:

Code:
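import java.applet.Applet;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.util.Map;

public class AppletClass extends Applet {
    private Map<String, Integer> fooMap;

    @Override
    @SuppressWarnings("unchecked")
    public void init() { // the applet initialization method is init()
        // The name is resolved relative to this class's package, inside the JAR.
        try (InputStream is = getClass().getResourceAsStream("foo.map");
             ObjectInputStream ois = new ObjectInputStream(is)) {
            fooMap = (Map<String, Integer>) ois.readObject(); // cast the deserialized Object to Map
        } catch (Exception e) { // IOException, ClassNotFoundException, or a missing resource
            throw new RuntimeException("Could not load foo.map", e);
        }
    }
}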
Since your JAR file is compressed, the compression is handled by the 'jar' utility. Decompression is handled by Java.

One note on doing things the Java way: the field you're setting is of type Map, not HashMap. Unless you specifically need some feature of HashMap beyond what the Map interface provides, this allows you to swap out one map for another. Later, if you decide you want a TreeMap (since that's sorted), anything that was designed only to expect a Map will Just Work if it's passed a TreeMap.
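A small sketch of that point, with made-up names (MapSwap, total) and sample values: the method below is written against Map, so it accepts a HashMap or a TreeMap without any change.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class MapSwap {
    /** Depends only on the Map interface, so any implementation can be passed in. */
    static int total(Map<String, Integer> map) {
        int sum = 0;
        for (int value : map.values()) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> hashed = new HashMap<>();
        hashed.put("m3|m7", 1); // sample values, invented
        hashed.put("m1|k32", 2);

        Map<String, Integer> sorted = new TreeMap<>(hashed); // same entries, sorted by key

        System.out.println(total(hashed)); // prints 3
        System.out.println(total(sorted)); // prints 3
    }
}
```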

Last edited by scooby_at_work; 12-15-09 at 15:12.