So here is the new blog section. I hope it will be a little different, perhaps more flexible, allowing wider-ranging subjects (and ramblings) than the more tightly focused posts found in the articles and news updates section of the LetTheLightIn website.

This follows on from my previous post/update about working on a small utility library for use with the static 32bit zlib library. KetilO's RadASM v3.0 source code includes an updater plugin that downloads a zip file, extracts it and creates the folder paths along the way. I took this code as a starting point and began writing my first function, ZL_ExtractAll.

 

ZL_ExtractAll looped through the zip file with the unzGoToFirstFile & unzGoToNextFile functions. Each internal file was read (uncompressed) into a GlobalAlloc'd memory area and then written out to a file. A long time ago I had written a function called CreateDirectoryPath that creates all the folders in a given path. I used it, and then appended the zip entry's filename, to build the final output name that the internal data file (uncompressed in memory) was written out to.
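To illustrate the "create every folder in the path" idea, here is a minimal C sketch. It is not the CreateDirectoryPath code from the library (which is MASM and not shown here). The name create_directory_path, the 1024 byte buffer and the choice to treat every path component as a folder are all assumptions made for the example.

```c
#include <errno.h>
#include <string.h>
#ifdef _WIN32
#include <direct.h>
#define MAKE_DIR(p) _mkdir(p)
#else
#include <sys/stat.h>
#define MAKE_DIR(p) mkdir((p), 0777)
#endif

/* Hypothetical helper: create each folder named in `path`, left to right.
   Every component is treated as a folder (no filename on the end).
   Returns 0 on success, -1 on failure. */
int create_directory_path(const char *path)
{
    char buf[1024];
    size_t len = strlen(path);
    size_t i;

    if (len == 0 || len >= sizeof buf)
        return -1;

    /* Work on a copy, normalising '\' to '/' so one separator test is enough. */
    for (i = 0; i <= len; i++)
        buf[i] = (path[i] == '\\') ? '/' : path[i];

    for (i = 1; i <= len; i++) {
        if (buf[i] == '/' || buf[i] == '\0') {
            char saved = buf[i];
            buf[i] = '\0';                      /* cut the path at this component */
            /* Skip empty components ("a//b") and drive specs ("C:"). */
            if (buf[i - 1] != '/' && buf[i - 1] != ':') {
                if (MAKE_DIR(buf) != 0 && errno != EEXIST)
                    return -1;                  /* "already exists" is not an error */
            }
            buf[i] = saved;                     /* restore and carry on */
        }
    }
    return 0;
}
```

Calling create_directory_path("out/docs/images") would create out, then out/docs, then out/docs/images, and calling it again on the same path simply succeeds.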

After a few days, I slowly added in other functionality that I had decided I needed/wanted in the library, and also began toying with the idea of adding a ZL_ExtractFile function. I added some code to check whether each zip entry is a folder or a file entry, to handle files that are 0 bytes long, and to create empty folders.
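The post does not show how folder entries are detected, so the following C sketch rests on an assumption: that a folder entry is one whose stored name ends in a slash, which is the usual zip convention. The enum and the function name classify_entry are made up for the example.

```c
#include <string.h>

/* Hypothetical classification of a zip entry. */
enum entry_kind { ENTRY_FOLDER, ENTRY_EMPTY_FILE, ENTRY_FILE };

/* Assumption: folder entries are stored with a trailing '/' (some tools
   write '\' instead, so both are accepted here). */
enum entry_kind classify_entry(const char *entry_name,
                               unsigned long uncompressed_size)
{
    size_t len = strlen(entry_name);

    if (len > 0 &&
        (entry_name[len - 1] == '/' || entry_name[len - 1] == '\\'))
        return ENTRY_FOLDER;        /* create the folder, nothing to write */

    if (uncompressed_size == 0)
        return ENTRY_EMPTY_FILE;    /* create a 0 byte file, skip the read */

    return ENTRY_FILE;              /* normal case: read, then write out */
}
```

Splitting the cases up front like this keeps the extraction loop from trying to read data for entries that have none.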

Additionally I began building up the logic for handling *.* and passing this off to ZL_ExtractAll, and began thinking more about the problem of handling wildcards when comparing filenames. Sure, I could do it with the FindFirstFile & FindNextFile API if the files were on disk, but as I was enumerating the zip file archive's own internal listing it would need a custom function.

So before I ramble on more about what I did to solve this wildcard issue, here is a convenient point to show some pseudo code that roughly outlines the ZL_ExtractAll function.

Invoke unzOpen, lpszZipFile                                 ; open zip file
mov hZipFile, eax                                           ; save zip file handle
Invoke unzGoToFirstFile, hZipFile                           ; goto first file in zip archive
.WHILE !eax                                                 ; loop while last call returned UNZ_OK (0)
    Invoke unzGetCurrentFileInfo, hZipFile, ...               ; Get info about internal filename etc
    Invoke ZL_CreateDirectoryPath, Addr lpszFullPathFileName  ; create each folder in path
    Invoke unzOpenCurrentFile, hZipFile                       ; open internal file for reading
    Invoke unzReadCurrentFile, hZipFile, ...                  ; uncompress data to memory
    Invoke CreateFile, ...                                    ; create output file
    mov hFile, eax                                            ; save output file handle
    Invoke WriteFile, hFile, ...                              ; write uncompressed data from memory to output file
    Invoke CloseHandle, hFile                                 ; close output file
    Invoke unzCloseCurrentFile, hZipFile                      ; close internal file
    Invoke unzGoToNextFile, hZipFile                          ; goto next internal file in zip file
.ENDW
Invoke unzClose, hZipFile                                   ; close zip file

I decided to roll up the sleeves and handle the wildcard file match/comparison. I ended up taking most of the code in ZL_ExtractAll over to the ZL_ExtractFile function before realising I could be using just one function to do it all - but more on that later.

The ZL_MatchFile function takes two params: the filespec passed into the call (a filename or wildcards: ????.TXT, *.DOC, ?EST.EXE, IMAGE.*, or *.* for example) and the internal filename as stored in the zip file.

First things first: uppercase both of them and copy them into separate local variables, strip the path from the internal filename if one exists (zips can store relative paths, and for ease of comparison I just wanted the filename only), then start the actual comparison loop.
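The preparation step described above can be sketched in C roughly as follows. The function name upper_base_name and its signature are invented for the example, and the library's own MASM code may well do this differently.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy the filename part of `entry_name` into `out`,
   uppercased, dropping any leading path ("docs/readme.txt" -> "README.TXT").
   The copy is truncated if `out` is too small. */
void upper_base_name(const char *entry_name, char *out, size_t out_size)
{
    const char *base = entry_name;
    const char *p;
    size_t i = 0;

    if (out_size == 0)
        return;

    /* Remember the character after the last path separator seen. */
    for (p = entry_name; *p != '\0'; p++) {
        if (*p == '/' || *p == '\\')
            base = p + 1;
    }

    /* Copy and uppercase, always leaving room for the terminating null. */
    while (base[i] != '\0' && i + 1 < out_size) {
        out[i] = (char)toupper((unsigned char)base[i]);
        i++;
    }
    out[i] = '\0';
}
```

The filespec can go through the same routine, so both sides of the comparison end up in the same case.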

The loop compares one byte of each string. If they match, all well and good: increment the string pointers and loop again. If they don't match, and the byte in the filespec isn't a ?, * or null, we exit and return false. If we reach the end of both strings and everything matched, we return true. With the ? wildcard we just ignore the byte, increment the pointers and loop again to check the next bytes. For the * I figured that what I needed was another loop to find the byte that follows the * in the filespec. If it is found (say with *.txt and tester.txt as an example, the next matching char would be the '.') we continue in the main loop, comparing bytes as normal for each string. If the next byte after the * is null then we exit with true, as the filespec will realistically be a '*.*' or 'somefilename.*' or whatever. It took me a little while to figure out the logic for the loops, but I got there in the end. Here is the heart of the code loop from the ZL_MatchFile function:

mov matchflag, FALSE
mov position, 0
mov eax, 0
.WHILE eax <= lenInternalFileName
    movzx eax, byte ptr [esi] ; contains a byte of the filename string
    movzx ebx, byte ptr [edi] ; contains a byte of the filespec string
    .IF bl == al ; both match, so ok, set flag, loop to check next chars
        inc position
        inc esi
        inc edi
        mov matchflag, TRUE
    .ELSE ; don't match, so check in case a wildcard is being used in the filespec, byte in bl
        .IF bl == '?' ; yep found a ? wildcard, so ok, set flag, loop to check next chars
            inc position
            inc esi
            inc edi
            mov matchflag, TRUE
        .ELSEIF bl == '*' ; found a * wildcard, so we enter another loop to find the next byte in the filespec string, in the filename string - hopefully!
            inc position
            inc esi
            inc edi
            movzx edx, byte ptr [edi] ; next character after asterisk to match up to
            mov matchchar, dl ; save this in a variable for ease of use
            .IF matchchar == 0 ; null byte at end of filespec, so we assume the rest of the filename is matched and return true straight away
                mov eax, TRUE ; the result is returned in eax, and at this point eax still holds the filename byte, so set it explicitly
                ret
            .ENDIF
            ; else we check for next char to match up to and start the loop again to check from there.
            mov matchflag, FALSE
            mov eax, position
            .WHILE eax <= lenInternalFileName
                movzx eax, byte ptr [esi]
                .IF al == matchchar ; did we find our next char in the filename string, yes? ok then set flag, and break out to main loop to check next chars
                    mov matchflag, TRUE
                    .BREAK
                .ENDIF
                inc position
                inc esi
                mov eax, position
            .ENDW
        .ELSEIF al == 0h || bl == 0h ; just in case we hit nulls somewhere - don't want unlimited looping to compare bytes that are past our strings
            mov matchflag, FALSE
            .BREAK
        .ELSE ; no match on chars above, and not a ? or * wildcard (or null), so we just exit, as filename doesn't match filespec
            mov matchflag, FALSE
            .BREAK
        .ENDIF
    .ENDIF
    mov eax, position ; used to verify we haven't moved past length of filename string, and to loop again if we haven't
.ENDW
mov eax, matchflag
ret
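
For comparison, here is a rough C sketch of the same kind of matcher. It is not a translation of the loop above. As far as can be told from the excerpt, the assembly version stops at the first occurrence of the character following the * and never goes back, so a filespec like *.TXT would seem to reject A.B.TXT. The sketch below remembers where the last * was and retries from there (backtracking). The name wildcard_match is made up, and both strings are assumed to be uppercased already, with the path stripped.

```c
#include <stddef.h>

/* Hypothetical matcher: returns 1 if `name` matches `spec`, else 0.
   '?' matches exactly one character, '*' matches any run of characters
   (including none). The comparison itself is case-sensitive. */
int wildcard_match(const char *spec, const char *name)
{
    const char *star_spec = NULL;   /* spec position just after the last '*' */
    const char *star_name = NULL;   /* name position that '*' currently ends at */

    while (*name != '\0') {
        if (*spec == '*') {
            star_spec = ++spec;     /* remember where to resume the spec */
            star_name = name;       /* start by letting '*' match nothing */
        } else if (*spec == '?' || *spec == *name) {
            spec++;                 /* single character matched */
            name++;
        } else if (star_spec != NULL) {
            spec = star_spec;       /* mismatch: let the last '*' swallow */
            name = ++star_name;     /* one more character and try again */
        } else {
            return 0;               /* mismatch and no '*' to fall back on */
        }
    }

    while (*spec == '*')            /* trailing '*' can match the empty rest */
        spec++;
    return *spec == '\0';
}
```

With these rules wildcard_match("*.TXT", "A.B.TXT") succeeds, while wildcard_match("*.*", "README") fails because the name has no dot in it, which is one reason to treat *.* as a special "match everything" case before the loop is ever reached.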

It's at this point I realised that I could handle the '*.*' at the start of the ZL_MatchFile function (just return true and exit), so I didn't need to branch off and call ZL_ExtractAll at all earlier on in the ZL_ExtractFile function. So the ZL_ExtractAll function was retired. Over the next day or so of testing I also thought about the old pkzip/pkunzip programs and the command line switches used with them. How about adding an lpszExcludeFileSpec param to ZL_ExtractFile, and what about adding an lpszOptions param to handle listing the files instead of extracting? I could also implement a few other features if I used the options param to allow extracting with freshen files (-f) or newer files only (-n) type sub commands.

So I've delayed the library for the moment until I have thought through implementing these new features. Additionally I want a callback function that other users can set to be called as the extraction takes place, so that information like bytes extracted, total bytes to extract and percentage extracted can be passed back to the user's callback function. I remember seeing the cool looking spinners made up of ascii chars |/-\- and % numbers updating on the console years ago with various archive programs, and figured that with a callback function that would be possible, if someone wanted to add that in. So I think I will build up a small command line program as an example of usage of the library. Also I've been toying with the idea of renaming the ZL_ExtractFile function to ZL_UnZip. So the library name might change as well.
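
As a rough idea of what such a callback could look like, here is a small C sketch. None of it is the library's API, which has not been designed yet at this point: the progress_cb type, percent_done, spinner_char and console_progress are all hypothetical names, and the four frame spinner is just one way to do it.

```c
#include <stdio.h>

/* Hypothetical callback type: called by the extractor as data is written. */
typedef void (*progress_cb)(unsigned long bytes_done,
                            unsigned long bytes_total,
                            int percent);

/* Percentage extracted so far, clamped to 0..100. An empty job counts as done. */
int percent_done(unsigned long bytes_done, unsigned long bytes_total)
{
    if (bytes_total == 0 || bytes_done >= bytes_total)
        return 100;
    /* Widen before multiplying so large byte counts cannot overflow. */
    return (int)((unsigned long long)bytes_done * 100 / bytes_total);
}

/* Next frame of a console spinner, cycling | / - \ */
char spinner_char(unsigned int step)
{
    static const char frames[] = "|/-\\";
    return frames[step % 4];
}

/* Example callback a command line tool might register: redraws one line. */
void console_progress(unsigned long bytes_done,
                      unsigned long bytes_total,
                      int percent)
{
    static unsigned int step = 0;
    printf("\r%c %3d%% (%lu of %lu bytes)",
           spinner_char(step++), percent, bytes_done, bytes_total);
    fflush(stdout);
}
```

An extractor would then call something like cb(done, total, percent_done(done, total)) after each chunk it writes, where cb is whatever progress_cb the caller passed in (or skip the call if none was set).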

That's all for the moment folks, stay tuned for updates etc. Thanks for reading this far ;)

Update: I decided to release the existing version for the moment. ZlibExtract v1.0 is available from its own project page here

Update 2: First version uploaded had a bug in it. I've hopefully fixed it and uploaded the newer version now. If you downloaded it already, please download it again. Thanks.

