Word Documents
Matt Lawrence
matt.lawrence at virgin.net
Fri Dec 9 13:41:52 GMT 2005
Paul Makepeace wrote:
> Sam Smith wrote:
>
>> On Wed, 7 Dec 2005, Steve Mynott wrote:
>>
>>> On Tue, Dec 06, 2005 at 10:49:57PM +0000, Sam Smith typed:
>>>
>>>> Does anyone know if there's a way to tell, from perl (on
>>>> Unix) whether a word document has track changes turned on?
>>>
>>>
>>> Why don't you save a document without track changes and then with
>>> track changes on and try a binary compare to work out the difference?
>>>
>>> (Although admittedly modern versions of Word documents always seem to
>>> think they have been changed after opening and you may find several
>>> binary changes).
>>
>>
>> I tried that, it didn't help.
>>
>> I was hoping that it would be something like read byte X and
>> jump to the offset stored in it. It isn't. Which is no
>> surprise.
>
>
> The reason it's unlikely to work is that Word's binary "format" is
> essentially a serialized blob of the in-memory representation of the
> document. (This, IIRC, led to some interesting side-effects like users
> having access to the undo history of other people's documents.)
>
> Depending how much time you have you could spelunk the sources or ask on
> the developer lists of OpenOffice, Abiword, or Antiword.
OLE::Storage and OLE::Storage_Lite can help you access these .doc files.
The file format is described in a zipped html file from here:
http://wvware.sourceforge.net/word97.zip
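
As a rough starting point, something like the sketch below lists the streams inside a .doc file with OLE::Storage_Lite. It assumes the module is installed from CPAN; the describe_tree helper is a made-up name for illustration, and nothing here locates the track-changes setting itself. It only shows how to reach the raw streams that the format description talks about.

```perl
use strict;
use warnings;
use OLE::Storage_Lite;

# Hypothetical helper: return one indented line per storage/stream
# found at or below the given PPS (directory entry) node.
sub describe_tree {
    my ($pps, $depth) = @_;
    $depth ||= 0;
    # Entry names are stored as UCS-2; convert them for display.
    my $name = OLE::Storage_Lite::Ucs2Asc($pps->{Name});
    my $size = defined $pps->{Data} ? length $pps->{Data} : 0;
    my @lines = sprintf '%s%s (%d bytes)', '  ' x $depth, $name, $size;
    push @lines, describe_tree($_, $depth + 1) for @{ $pps->{Child} || [] };
    return @lines;
}

# Only does anything when a filename is given on the command line.
if (my $file = shift @ARGV) {
    # Passing a true value asks getPpsTree to load each stream's data.
    my $root = OLE::Storage_Lite->new($file)->getPpsTree(1)
        or die "$file does not look like an OLE compound file\n";
    print "$_\n" for describe_tree($root);
}
```

Run against a Word 97-era document, I'd expect to see a WordDocument stream and a table stream (0Table or 1Table) among the entries, but treat that as something to check against your own files rather than a promise.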
I've had some limited success getting data out of this format in the
past, although I haven't (yet) managed to extract that particular data.
I started working on a module to access data in the Word format, but
it's fallen onto the back burner and is far from ready for public
consumption. I'd be happy to share what I have so far if you think it'll
help.
Matt