  1. #1
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527

    Unanswered: string compression

    I have an applet which requires a moderately large hashmap (about 1/2 MB). The hashmap's key is a small string and its value is an integer. The strings look something like "k23k32|m15", "m1|k32", "m3|m7" etc., while the number could be anything.

    I was thinking of putting the whole hashmap into a string, compressing that string, putting it in the applet and then rebuilding the hashmap when the applet starts up. I'm pretty sure the strings can be compressed right down, but I wasn't keen on writing the compression algorithm myself. Is there a simple method I can use to do this? Any idea what level of compression it will give? Is there anything else I could do?

  2. #2
    Join Date
    Oct 2002
    Location
    Baghdad, Iraq
    Posts
    697
    Now, you can get a binary form of a hashmap by serializing it. See the java.io.Serializable interface. Serializing will take the whole shebang and turn it into binary form.

    You need to create a java.io.ObjectOutputStream and pass the hashmap to its writeObject method.

    One way your applet can retrieve this file is if it downloads it separately.

    If that's the case, you'll want to pipe the ObjectOutputStream into a java.util.zip.GZIPOutputStream. (Regular zip files, and jar files, which are the same format, carry a little extra baggage for archiving multiple files. A gzip file is designed to hold just one.) That's not the best compression possible, but it works and is ubiquitous. Something like LZMA might get you better compression, but it isn't in the standard library, so you'd have to add more class files to your code.
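
    To make the layering concrete, here is a minimal sketch of the serialize-then-gzip round trip. It uses an in-memory byte buffer instead of a file or an HTTP connection, and the class name GzipMapDemo and the pack/unpack helpers are made up for illustration. Swapping the buffer for a FileOutputStream should give you the file your applet would download.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; the names here are not from the thread.
class GzipMapDemo {
    // Layering: object stream -> gzip stream -> byte buffer.
    static byte[] pack(Map<String, Integer> map) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(buffer))) {
            out.writeObject(map);
        } // closing the outermost stream flushes it and finishes the gzip data
        return buffer.toByteArray();
    }

    // Same layers in reverse: byte buffer -> gunzip -> object stream.
    @SuppressWarnings("unchecked")
    static Map<String, Integer> unpack(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return (Map<String, Integer>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Integer> map = new HashMap<>();
        map.put("k23k32|m15", 42);
        map.put("m1|k32", 7);
        map.put("m3|m7", 1999);
        byte[] packed = pack(map);
        System.out.println(packed.length + " bytes, round trip ok: "
                + unpack(packed).equals(map));
    }
}
```

    With only a handful of entries the gzip header overhead dominates, so judge the compression ratio on your real half-megabyte map rather than on this toy data.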

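    I think java.io.FilterOutputStream is the base class for streams that layer on top of other streams (GZIPOutputStream is one of them), but you don't use it directly: you just pass one stream to the constructor of the next. And you'll need a final layer to hook it into file output, java.io.FileOutputStream. (The writer classes are for character data, so they aren't what you want for binary output.) It's been a while since I've done Java I/O. It's confusing at first, but once you figure out what does what, it all fits together quite nicely and is very flexible.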

    Reading it in is just a matter of using input where you used output. And you'll be using an HTTP connection to read in the file.

    UNLESS... you decide to bundle it in your applet's jar file. Then it's already compressed by the jar (really pkzip) compression. This is a little confusing, but basically your applet's Class object has a method "getResourceAsStream". You call that with the resource name and, since your applet's jar has already downloaded, it immediately returns the InputStream with your data. In that case, you don't need to compress the data, just package it with your class files.

    Pros and cons: loading a separate resource is certainly less reliable. You'll have to handle the case where the connection fails. OTOH, your applet can't do anything until the entire jar file has loaded, so the user might *think* it has failed, which is arguably just as bad, because you can't provide a progress bar.

    One last thought regarding compression:

    A hashed structure randomizes the elements which introduces entropy that somewhat defeats compression. As a rule of thumb, sorted structures compress better.

    The additional entropy might not make a big difference, but it's worth testing whether reading the elements of the hash into an ArrayList, sorting them and serializing that gives you better compression. If the gain isn't substantial, the extra time you incur repackaging the data when you load it may not be worth it.
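
    A rough way to run that test, as a sketch only: flatten the entries into "key=value" strings, serialize and gzip the list once in hash order and once sorted, and compare the sizes. The class name, the helper names and the randomly generated sample keys are all assumptions, so try it on your real data before drawing conclusions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

// Hypothetical experiment class; nothing here comes from the original posts.
class SortedVsHashed {
    // Serialize and gzip a value in memory, returning the compressed size in bytes.
    static int gzippedSize(Serializable value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(buffer))) {
            out.writeObject(value);
        }
        return buffer.size();
    }

    // Made-up sample data shaped loosely like the keys in the question.
    // Only 100 * 100 distinct keys exist, so keep entries at or below 10000.
    static HashMap<String, Integer> sampleMap(int entries) {
        Random random = new Random(1); // fixed seed so runs are repeatable
        HashMap<String, Integer> map = new HashMap<>();
        while (map.size() < entries) {
            String key = "k" + random.nextInt(100) + "|m" + random.nextInt(100);
            map.put(key, random.nextInt(1_000_000));
        }
        return map;
    }

    // Flatten entries to "key=value" strings, in hash iteration order or sorted.
    static ArrayList<String> lines(Map<String, Integer> map, boolean sorted) {
        ArrayList<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            lines.add(entry.getKey() + "=" + entry.getValue());
        }
        if (sorted) {
            Collections.sort(lines);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        HashMap<String, Integer> map = sampleMap(5000);
        System.out.println("hash order: " + gzippedSize(lines(map, false)) + " bytes");
        System.out.println("sorted:     " + gzippedSize(lines(map, true)) + " bytes");
    }
}
```

    Both lists hold exactly the same strings, so any difference in the two numbers comes from ordering alone.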

  3. #3
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Thanks for your in depth post sco08y - I'll have to read through that very slowly and just look up all the technical words in the manual. I was hoping for something like String my_string.compress() but I guess that isn't going to happen. Shame

  4. #4
    Join Date
    Sep 2009
    Posts
    44
    Well, the simplest case is pretty simple. That's where you don't actually compress it any more than stashing it in your JAR, which is by far the best deployment strategy.

    To create the file, you need to write a short app to make it in memory and write it out:

    Code:
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.util.HashMap;
    import java.util.Map;

    public class YourClass {
        public static void main(String[] argv) throws IOException {
            Map<String, Integer> map = new HashMap<String, Integer>();
            // ... populate map ...
            // Or whatever extension. A relative path is resolved against the
            // directory you run this from, not the .class file's directory.
            String path = "foo.map";
            // try-with-resources closes the ObjectOutputStream, which flushes it
            // and closes the FileOutputStream underneath, so nothing is left to
            // garbage collection.
            try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(path))) {
                oos.writeObject(map);
            }
        }
    }
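    So, then you run that and put your new file in with your class files. The applet code below expects foo.map in the same directory (package) as the .class file of the class that loads it. See the docs for getResourceAsStream if you want it elsewhere.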

    You can also keep it with your .class files uncompressed while testing. For your applet to read it:

    Code:
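    import java.applet.Applet;
    import java.io.InputStream;
    import java.io.ObjectInputStream;
    import java.util.Map;

    public class AppletClass extends Applet {
        private Map<String, Integer> fooMap;

        @Override
        @SuppressWarnings("unchecked")
        public void init() { // the applet initialization method
            // getResourceAsStream returns null if foo.map isn't found next to this class
            try (InputStream is = getClass().getResourceAsStream("foo.map");
                 ObjectInputStream ois = new ObjectInputStream(is)) {
                fooMap = (Map<String, Integer>) ois.readObject(); // readObject returns Object, so cast
            } catch (Exception e) {
                throw new IllegalStateException("Could not load foo.map", e);
            }
        }
    }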
    Since your JAR file is compressed, the compression is handled by the 'jar' utility. Decompression is handled by Java.

    One note on doing things the Java way: the field you're setting is of type Map, not HashMap. Unless you specifically need some feature of HashMap beyond what the Map interface provides, this allows you to swap out one map for another. Later, if you decide you want a TreeMap (since that's sorted), anything that was designed only to expect a Map will Just Work if it's passed a TreeMap.
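
    A tiny illustration of that point, with a made-up class and helper method (MapSwapDemo and lookup are not part of the applet code above): the helper only knows about Map, so it accepts either implementation unchanged.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example class for illustration only.
class MapSwapDemo {
    // Written against the Map interface, so any implementation can be passed in.
    static int lookup(Map<String, Integer> table, String key) {
        Integer value = table.get(key);
        return value == null ? -1 : value; // -1 is an arbitrary "not found" marker for this sketch
    }

    public static void main(String[] args) {
        Map<String, Integer> hashed = new HashMap<>();
        hashed.put("m3|m7", 12);
        hashed.put("m1|k32", 7);

        // Same entries, but iteration now follows sorted key order.
        Map<String, Integer> sorted = new TreeMap<>(hashed);

        System.out.println(lookup(hashed, "m1|k32")); // prints 7
        System.out.println(lookup(sorted, "m1|k32")); // prints 7
        System.out.println(sorted.keySet());          // prints [m1|k32, m3|m7]
    }
}
```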
    Last edited by scooby_at_work; 12-15-09 at 16:12.
