
Combining multiple arrays...round 2


Cravin

Solved by kylomas


Replacing the regexp above with _StringToHex() (see next post) will make this more universal (not only for file names) and eliminates all chance of mistaken identity. Including StringLower() adds a little overhead, but it is required for case-insensitive matching. Remove it and matching becomes case sensitive, which isn't what you want in this circumstance, at least I don't think so.

;

$tname = _StringToHex(StringLower($array[$1]))

;

There is still a chance that a global variable with the same name might already exist, although it is unlikely that a name will contain only an even number of hex characters (this depends on how you name your variables). I don't currently have a solution for this. If you don't use any globals, or always include at least one non-hexadecimal character in the name (for example an underscore, $_ABCD), the problem will never occur.

Another problem that could occur (depending on circumstance) is hitting a limit for variable name length. Here it shouldn't be an issue, because the maximum length of a file name is 255 characters, and that is less than half of any variable name length limit. Converting to hex is a reliable method, but it also doubles the string length.
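
A minimal sketch of the length doubling mentioned above, assuming the standard String.au3 UDF is available and the name is plain ASCII. The sample file name is made up for illustration.

```autoit
#include <String.au3>

; Hypothetical sample name, not taken from the thread
Local $sName = "Report.TXT"

; Lowercase first so that names differing only by case produce the same key
Local $sKey = _StringToHex(StringLower($sName))

ConsoleWrite($sKey & @CRLF) ; 7265706F72742E747874
ConsoleWrite(StringLen($sName) & " -> " & StringLen($sKey) & @CRLF) ; 10 -> 20
```

Every ASCII character becomes two hex digits, so the key is twice as long as the name.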


Aaargh - Problems with unicode in file names.


This is getting complicated.

Edited by czardas

Final attempt at modifying kylomas' code using StringToBinary(): Now it should work fine with unicode file names. Please read the above post. In theory everything mentioned there applies here also (some of the numbers will be different).

;

func _process_array($array)

    local static $aFinal_idx = 0
    local $tname, $tFiles = $array[0]
    $ttFiles += $array[0]

    redim $aFinal[ ubound($aFinal) + $tFiles ]

    for $1 = 1 to ubound($array) - 1
        ;$tname = stringregexpreplace($array[$1],'[\.\:\\ ]','_')

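        $tname = StringToBinary(StringLower($array[$1]), 2) ; This replaces the line above
        ; IsDeclared() takes the variable name as a string, so Eval() is not needed here
        if isdeclared('s' & $tname) = 0 then
            assign('s' & $tname,1,2) ; flag 2 forces creation in the global scope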
            $aFinal[$aFinal_idx] = $array[$1]
            $aFinal_idx += 1
            $tUniqueFiles += 1
        Else
            $tDups += 1
        endif
    next

endfunc

;
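
For readers who want to try the idea in isolation, here is a stand-alone sketch of the same technique. It is not the code from the thread: _IsNewName() is a hypothetical helper and the sample names are invented. It assumes nothing beyond the built-in functions StringToBinary(), StringLower(), IsDeclared() and Assign().

```autoit
; Stand-alone sketch; _IsNewName() is a hypothetical helper, not code from the thread.
Global $aNames[4] = ["Report.TXT", "report.txt", ChrW(0x7972) & ".txt", "notes.txt"]
Global $iUnique = 0

For $i = 0 To UBound($aNames) - 1
    If _IsNewName($aNames[$i]) Then $iUnique += 1
Next

; The first two names differ only by case, so 3 unique names are counted
ConsoleWrite("Unique names: " & $iUnique & @CRLF)

Func _IsNewName($sName)
    ; Lowercase for case-insensitive matching, then take the UTF-16 LE bytes (flag 2).
    ; Concatenating the binary with a string gives text such as "s0x7200...".
    Local $sKey = 's' & StringToBinary(StringLower($sName), 2)
    If IsDeclared($sKey) Then Return False ; a variable with this name exists: seen before
    Assign($sKey, 1, 2) ; flag 2 = force global, so the marker outlives this function call
    Return True
EndFunc
```

The 's' prefix is only there so that the generated name cannot collide with ordinary variables in a small script like this one.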

I learned something new. :)

Edited by czardas

Wow, this is some fantastic work, fellas.  Thanks for coming together and helping me solve this problem.  This has GREATLY reduced the amount of time it takes to remove/exclude duplicates from my arrays.  I have all I need to move forward at this point.

 

I just tested it on some large arrays and it's amazing. :)

I spent most of yesterday on my example. Oh well!

 

Additionally, thanks to you Czardas for spending as much time on this as you did :).  This community is great!


It has been a learning experience for me too, so I'm glad I made the effort. Did you try replacing the code above? It will be slightly slower but prevents the problems mentioned. There is always a big chance that some of your file names already contain an underscore, not to mention Unicode. Perhaps kylomas has further suggestions.


Sorry to jump in, but I can't run code right now and am having a hard time catching up on the thread details. What is the issue with Unicode filenames exactly?

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


I wasn't aware that variable names can be declared which use unicode characters in AutoIt. The code posted by kylomas assigns a variable with the name of the file.

Edited by czardas

@czardas

The only other option that I considered last night was to replace each of the invalid chars with its own signature, in case I needed to retain file name integrity to translate the name back to its original. Like this...

$tname = stringregexpreplace($array[$1],'\.','_-1-_')
$tname = stringregexpreplace($tname,'\:','_-2-_')
$tname = stringregexpreplace($tname,'\\','_-3-_')

Still not foolproof. I think I like your solution better, as I did not even consider non-English chars. (English-centric hubris, again).
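
A short sketch of one way the signature scheme can still collide, assuming a file name that already contains the literal signature text. Both sample names are invented for illustration.

```autoit
; Hypothetical sample names: one contains dots, the other already contains the signature text
Local $sDotted = StringRegExpReplace("a.b.txt", '\.', '_-1-_')
Local $sLiteral = StringRegExpReplace("a_-1-_b.txt", '\.', '_-1-_')

ConsoleWrite($sDotted & @CRLF)  ; a_-1-_b_-1-_txt
ConsoleWrite($sLiteral & @CRLF) ; a_-1-_b_-1-_txt

; == is the case-sensitive string comparison; two different names give the same result
ConsoleWrite(($sDotted == $sLiteral) & @CRLF) ; True
```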

Dr.'s appointment in an hour so no time to work on this now...

kylomas

edit:  File names with underscores should not be a problem, unless I'm missing something.

Edited by kylomas


"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill


Interestingly I ran this with the code you changed vs the original that kylomas wrote and what I discovered is that your method uses more memory, but it actually seems to have reduced the time it takes to process by about 15~20% on a final array with about 170k elements.


Well it's all very interesting to me. It's hard to pin down exactly the best method because results change with varying numbers of duplicates. The conversion to binary idea produces longer duplicate variable names. I expected it to run a bit slower because of the conversion. I'm not too familiar with some of the methods.

@kylomas

MsgBox(0, "False Positive", _
StringRegExpReplace("123_456.789",'[\.\:\\ ]','_') = _
StringRegExpReplace("123.456.789",'[\.\:\\ ]','_'))
Edited by czardas

kylomas, I just assumed a file name can be anything that I can create on my own computer, such as: s_-1-_.txt

I just tested assigning non-word character variable names and they definitely do not work. Unicode does not work without conversion. Underscore is a problem.

Edited by czardas

Why would converting the Unicode string into its hex representation of UTF-<something like 8 or 16> not work and cover all bases? This would be easy to convert back and would alleviate any kind of limitation (or I'm misunderstanding something big.)



That's exactly what I'm suggesting. Maybe there's a better way to do it than using StringToBinary(). First it has to be converted to a single case, lowercase or uppercase. Then it's converted to a binary representation.

;

$tname = StringToBinary(StringLower($array[$1]), 2) ; This replaces the line above

;

Where are you jchd, that you can't run any code? I have visions of you halfway up a mountain or something like that. :blink:

Edited by czardas

AutoIt strings are already UTF16


Okay but try assigning a variable called $_祲礳祳.txt. That doesn't work, so it needs converting to something else. For example:

$_0x797279337973002E007400780074 . . . Big Endian OR ==>

$_0x7279337973792E00740078007400 . . . Little Endian

oops ran the wrong bit of code - fixed. That's what I get using BinaryToString. I tested that the variables exist and can be assigned a value which can also be read.
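
The two names above can be reproduced with a sketch along these lines. It builds the file name from its code points with ChrW() so the script does not depend on the source file encoding, and it assumes flag 3 for UTF-16 big endian and flag 2 for UTF-16 little endian in StringToBinary().

```autoit
; The three CJK characters from the example, followed by ".txt"
Local $sName = ChrW(0x7972) & ChrW(0x7933) & ChrW(0x7973) & ".txt"

; Concatenating with "_" turns the binary into its "0x..." text form
Local $sBig = "_" & StringToBinary($sName, 3)    ; UTF-16 big endian
Local $sLittle = "_" & StringToBinary($sName, 2) ; UTF-16 little endian

ConsoleWrite($sBig & @CRLF)    ; _0x797279337973002E007400780074
ConsoleWrite($sLittle & @CRLF) ; _0x7279337973792E00740078007400

; The converted name is a valid variable name: assign it, then read it back
Assign($sLittle, "seen", 2)
ConsoleWrite(IsDeclared($sLittle) & " " & Eval($sLittle) & @CRLF) ; 1 seen
```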

Edited by czardas

What I'm saying is that what you do is "take binary representation of the string as UTF16". In this case, no other conversion takes place than string --> binary.
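
A minimal sketch of that point, including the conversion back to text. The sample name is invented, and the round trip relies on Binary() accepting a string in "0x..." hexadecimal notation, which is worth checking against the help file for the AutoIt version in use.

```autoit
; Hypothetical sample name
Local $sName = "Report.TXT"

; No re-encoding takes place: flag 2 just exposes the UTF-16 LE bytes of the string
Local $sHex = "" & StringToBinary(StringLower($sName), 2)

; Turn the "0x..." text back into binary, then back into a string
Local $sBack = BinaryToString(Binary($sHex), 2)

ConsoleWrite($sHex & @CRLF)
ConsoleWrite($sBack & @CRLF) ; report.txt
```

Note that the round trip returns the lowercased name, not the original, because StringLower() was applied before the conversion.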

(was on friend's tiny tablet)


That's the right way to do it.

I was just nitpicking about the "Then it's converted to UTF-16." part.  I know you know, but other readers might get confused by the wording.


I'm the culprit for jumping into the thread without taking the trouble to read it in full!

(Back to bed for me now)
