Archives on non-seekable streams

In general, handling archives on non-seekable streams is done in the same way as for seekable streams, with a few caveats.

The main limitation is that accessing entries randomly using OpenEntry() is not possible. Entries can only be accessed sequentially, in the order they are stored within the archive.

For each archive type, there will also be other limitations which will depend on the order the entries' meta-data is stored within the archive. These are not too difficult to deal with, and are outlined below.

PutNextEntry and the entry size

When writing archives, some archive formats store the entry size before the entry's data (tar has this limitation, zip doesn't). In this case the entry's size must be passed to PutNextEntry() or an error occurs.

This is only an issue on non-seekable streams, since otherwise the archive output stream can seek back and fix up the header once the size of the entry is known.

For generic programming, one way to handle this is to supply the size whenever it is known, and rely on the error message from the output stream when the operation is not supported.
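The difference between the seekable and non-seekable cases can be sketched with a small model in standard C++. Everything here is hypothetical and for illustration only: ToySink, ToySizeFirstWriter, kUnknownSize and the fixed-width header layout are assumptions, not wxWidgets classes or any real archive format. The model stores the size before the data, so it refuses an entry of unknown size on a non-seekable sink but patches the header afterwards on a seekable one.

```cpp
#include <cstddef>
#include <string>

// Hypothetical marker for "size not supplied".
const std::size_t kUnknownSize = static_cast<std::size_t>(-1);
// Hypothetical fixed width of the size field in the header.
const std::size_t kHeaderWidth = 8;

// Hypothetical output sink; 'seekable' says whether it can be patched.
struct ToySink {
    bool seekable;
    std::string bytes;
};

// Hypothetical writer for a format that stores the size before the data.
class ToySizeFirstWriter {
public:
    explicit ToySizeFirstWriter(ToySink& sink)
        : sink_(sink), open_(false), declared_(kUnknownSize),
          written_(0), headerPos_(0) {}

    // Fails if the size is unknown and the header cannot be fixed up later.
    bool PutNextEntry(const std::string& name,
                      std::size_t size = kUnknownSize) {
        if (!CloseEntry())
            return false;
        if (size == kUnknownSize && !sink_.seekable)
            return false;
        sink_.bytes += name + ':';
        headerPos_ = sink_.bytes.size();
        sink_.bytes += Pad(size == kUnknownSize ? 0 : size);
        declared_ = size;
        written_ = 0;
        open_ = true;
        return true;
    }

    void Write(const std::string& data) {
        if (!open_)
            return;
        sink_.bytes += data;
        written_ += data.size();
    }

    // With a declared size, checks it; otherwise seeks back and patches.
    bool CloseEntry() {
        if (!open_)
            return true;
        open_ = false;
        if (declared_ != kUnknownSize)
            return declared_ == written_;
        sink_.bytes.replace(headerPos_, kHeaderWidth, Pad(written_));
        return true;
    }

private:
    // Zero-pads n to the header width (longer numbers are left as-is).
    static std::string Pad(std::size_t n) {
        std::string digits = std::to_string(n);
        if (digits.size() >= kHeaderWidth)
            return digits;
        return std::string(kHeaderWidth - digits.size(), '0') + digits;
    }

    ToySink& sink_;
    bool open_;
    std::size_t declared_;
    std::size_t written_;
    std::size_t headerPos_;
};
```

In this model, a caller that always passes the size when it is known produces the same bytes on both kinds of sink, and gets a clear failure from PutNextEntry() in the one case that cannot be supported.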

GetNextEntry and the weak reference mechanism

Some archive formats do not store all of an entry's meta-data before the entry's data (zip is an example). In this case, when reading from a non-seekable stream, GetNextEntry() can only return a partially populated wxArchiveEntry object, in which not all the fields are set.

The input stream then keeps a weak reference to the entry object and updates it when more meta-data becomes available. A weak reference is one that does not prevent you from deleting the wxArchiveEntry object: the input stream only attempts to update the entry if it still exists.
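The idea of a weak reference can be illustrated with standard C++ smart pointers. The following is only an analogy under stated assumptions: ToyEntry and ToyInputStream are hypothetical names, not wxWidgets classes, and the real streams hand out plain wxArchiveEntry pointers rather than std::shared_ptr. The sketch shows a stream that fills in a late field if the entry is still alive and silently skips the update if it has been deleted.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical entry type; -1 means "size not known yet".
struct ToyEntry {
    std::string name;
    long size = -1;
};

// Hypothetical input stream that learns an entry's size only after
// the entry's data has been read.
class ToyInputStream {
public:
    explicit ToyInputStream(std::string data) : data_(std::move(data)) {}

    // Returns a partially populated entry and remembers it weakly,
    // so the caller remains free to destroy it at any time.
    std::shared_ptr<ToyEntry> GetNextEntry(const std::string& name) {
        std::shared_ptr<ToyEntry> entry = std::make_shared<ToyEntry>();
        entry->name = name;
        current_ = entry;
        return entry;
    }

    // Simulates reading the entry to its end. Returns true if the
    // entry was still alive and has been updated, false otherwise.
    bool ReadToEnd() {
        if (std::shared_ptr<ToyEntry> entry = current_.lock()) {
            entry->size = static_cast<long>(data_.size());
            return true;
        }
        return false;
    }

private:
    std::string data_;
    std::weak_ptr<ToyEntry> current_;
};
```

Here std::weak_ptr plays the role of the weak reference: locking it fails once the last owner has released the entry, so the stream never touches a deleted object.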

The documentation for each archive entry type gives the details of what meta-data becomes available and when. For generic programming, when the worst case must be assumed, you can rely on all the fields of wxArchiveEntry being fully populated when GetNextEntry() returns, with the following exceptions:

GetSize()      Guaranteed to be available after the entry has been read to Eof(), or CloseEntry() has been called.
IsReadOnly()   Guaranteed to be available after the end of the archive has been reached, i.e. after GetNextEntry() returns NULL and Eof() is true.

This mechanism allows CopyEntry() to always fully preserve entries' meta-data. No matter what order the meta-data occurs in within the archive, the input stream will always have read it before the output stream must write it.

wxArchiveNotifier

Notifier objects can be used to get a notification whenever an input stream updates a wxArchiveEntry object's data via the weak reference mechanism.

Consider the following code, which renames an entry in an archive. This is the usual way to modify an entry's meta-data: simply set the required field before writing the entry with CopyEntry():

    wxArchiveInputStreamPtr  arc(factory->NewStream(in));
    wxArchiveOutputStreamPtr outarc(factory->NewStream(out));
    wxArchiveEntryPtr        entry;

    outarc->CopyArchiveMetaData(*arc);

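    while (entry.reset(arc->GetNextEntry()), entry.get() != NULL) {
        // rename the matching entry before it is written out
        if (entry->GetName() == from)
            entry->SetName(to);
        // CopyEntry() takes ownership of the entry, hence release()
        if (!outarc->CopyEntry(entry.release(), *arc))
            break;
    }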

    bool success = arc->Eof() && outarc->Close();

However, for non-seekable streams, this technique cannot be used for fields such as IsReadOnly(), which are not necessarily set when GetNextEntry() returns. In this case a wxArchiveNotifier can be used:

class MyNotifier : public wxArchiveNotifier
{
public:
    void OnEntryUpdated(wxArchiveEntry& entry) { entry.SetIsReadOnly(false); }
};

The meta-data changes are done in your notifier's OnEntryUpdated() method, then SetNotifier() is called before CopyEntry():

    wxArchiveInputStreamPtr  arc(factory->NewStream(in));
    wxArchiveOutputStreamPtr outarc(factory->NewStream(out));
    wxArchiveEntryPtr        entry;
    MyNotifier               notifier;

    outarc->CopyArchiveMetaData(*arc);

    while (entry.reset(arc->GetNextEntry()), entry.get() != NULL) {
        // attach the notifier before the entry is handed to CopyEntry()
        entry->SetNotifier(notifier);
        if (!outarc->CopyEntry(entry.release(), *arc))
            break;
    }

    bool success = arc->Eof() && outarc->Close();

SetNotifier() calls OnEntryUpdated() immediately, then the input stream calls it again whenever it sets more fields in the entry. Since OnEntryUpdated() will be called at least once, this technique always works, even when it is not strictly necessary to use it. For example, changing the entry name can be done this way too, and it works on seekable streams as well as non-seekable ones.
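The calling sequence described above (once immediately, then once per later update) can be modelled in a few lines of standard C++. This is a sketch only: NotifiedEntry, ToyNotifier, ClearReadOnly and UpdateReadOnly() are hypothetical stand-ins, not the wxWidgets classes, and the model assumes the notifier runs after the stream has stored the new value.

```cpp
#include <string>

struct NotifiedEntry;

// Hypothetical counterpart of a notifier interface.
struct ToyNotifier {
    virtual ~ToyNotifier() {}
    virtual void OnEntryUpdated(NotifiedEntry& entry) = 0;
};

// Hypothetical entry that forwards updates to an attached notifier.
struct NotifiedEntry {
    std::string name;
    bool readOnly = false;
    ToyNotifier* notifier = nullptr;

    // Attaches the notifier and calls it once straight away.
    void SetNotifier(ToyNotifier& n) {
        notifier = &n;
        n.OnEntryUpdated(*this);
    }

    // Stands in for the input stream setting a late field.
    void UpdateReadOnly(bool value) {
        readOnly = value;
        if (notifier)
            notifier->OnEntryUpdated(*this);
    }
};

// Notifier that forces the read-only flag off and counts its calls.
struct ClearReadOnly : ToyNotifier {
    int calls = 0;
    void OnEntryUpdated(NotifiedEntry& entry) override {
        ++calls;
        entry.readOnly = false;
    }
};
```

Because the notifier runs both at attachment time and after each late update, the change it makes holds whether the field was already set or arrives later.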

ymasuda, 19 November 2005