ah21 Posted June 28, 2012

Hi,

My first post here on the forum, so first of all hello!

I'm trying to build a script that will basically take text and wrap HTML around it. The reason is that I get lots of Word documents handed to me to build into web pages. Most of the content is simply headings, paragraphs of text and unordered lists. So I figured that if I could build a script to do this tedious task for me, or at least most of it so I can check the result myself afterwards, it would save time and effort.

This is my first script, so I've given it a go, but it doesn't appear to work. The idea is that I copy the text from MS Word to my clipboard myself, and then running the program converts symbols such as £ to &pound;, as well as wrapping <p></p> tags around paragraphs and <ul></ul> around unordered lists. Once the script has done this, it puts the result back on the clipboard so I can just Ctrl + V into my text editor. I'm not sure how to do the unordered list part though.

Below is my attempt; any help would be appreciated.

    #cs
    gets clipboard data
    #ce
    $clipboard = ClipGet()

    #cs
    checks if data is already on clipboard
    #ce
    If FileExists($clipboard) Then
        $text = FileRead($clipboard)
    Else
        $text = $clipboard
    EndIf

    #cs
    cuts two line breaks down to one
    add paragraph tags at start and end of paragraph
    replace symbols with html codes x 5
    #ce
    $html = StringRegExpReplace($text, "(\r\n){2,}", "\1")
    $html = StringRegExpReplace($text, "(.+)(\r\n|\z)", "<p>\1</p>\2")
    $html = StringReplace($text, "&", "&amp;")
    $html = StringReplace($text, "£", "&pound;")
    $html = StringReplace($text, "€", "&euro;")
    $html = StringReplace($text, "“", "&ldquo;")
    $html = StringReplace($text, "”", "&rdquo;")

    #cs
    writes text to clipboard
    displays a message
    #ce
    ClipPut ($html)
    MsgBox(0, "Text to HTML converter", "Text converted to HTML and copied to clipboard")
    Exit

Airwolf Posted June 28, 2012

Not sure why, but this line is the culprit:

    $html = StringReplace($text, "£", "&pound;")

AutoIt does not like that symbol. It is breaking the conversion process.

BrewManNH Posted June 28, 2012

Doesn't Word already have the ability to save a document as an HTML document?

jdelaney Posted June 28, 2012 (edited)

    $html = StringReplace($text, chr(163), "&pound;")

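The same idea can be extended to all five symbols from the first post. This is only a sketch under stated assumptions: that the failure comes from how the literal characters in the script file are encoded (not confirmed anywhere in this thread), and that the clipboard text is Unicode, which is why ChrW() code points are used here instead of Chr(). The variable names $sText and $sHtml are illustrative. The "&" replacement goes first so that the ampersands in the entities added afterwards are not escaped a second time.

```autoit
; Sketch: entity replacement using character codes instead of literal symbols.
; Assumption: the clipboard holds Unicode text, so ChrW() code points apply.
Local $sText = ClipGet()

; Replace "&" first; doing it last would also escape the "&" of the new entities.
Local $sHtml = StringReplace($sText, "&", "&amp;")

; Each call takes the previous result ($sHtml) as its input.
$sHtml = StringReplace($sHtml, ChrW(163), "&pound;")  ; pound sign, U+00A3
$sHtml = StringReplace($sHtml, ChrW(8364), "&euro;")  ; euro sign, U+20AC
$sHtml = StringReplace($sHtml, ChrW(8220), "&ldquo;") ; left double quote, U+201C
$sHtml = StringReplace($sHtml, ChrW(8221), "&rdquo;") ; right double quote, U+201D

ClipPut($sHtml)
```
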
ah21 Posted June 28, 2012

Thanks for the help, guys. I've tried amending this part, but my code doesn't actually appear to be doing anything. Am I passing through the correct variables? Have any of you actually run the code and got it working?

BrewManNH - yes, Word does have the option to save the document as an HTML document, however it puts a ridiculous amount of inline styles in the code. I just want basic, simple HTML.

ah21 Posted June 29, 2012

Figured it out, it appeared to be this line which was stopping the code from running:

    $html = StringRegExpReplace($text, "(\r\n){2,}", "\1")

Does anyone know how to get bulleted lists to have HTML wrapped around them? I made a start but I'm not sure how to do this. I figured I could match the bullet point dot, but then didn't know how to write the code to say "whatever text comes after the bullet point". Any ideas? The line below will kind of make the <li> tags, but I don't know how to make the <ul> tags recognise the start and end of the unordered list.

    $html = StringRegExpReplace($text, "(• +)( )(\r\n|\z)", "<li>\1</li>\2")

Any help would really be appreciated.

jdelaney Posted June 29, 2012 (edited)

try this:

    $Text = "• test"
    $html = StringRegExpReplace($Text, "(?:•[\s]{0,})(.*)(?:\r\n|\z)", "<li>\1</li>")

oh, edited to include the '+':

    $html = StringRegExpReplace($Text, "(?:[•|+][\s]{0,})(.*)(?:\r\n|\z)", "<li>\1</li>")

last one, i think:

    $html = StringRegExpReplace($sText, "(?:[•|+][\s]{0,})(.*)(\r\n.*)", "<li>\1</li>\2")

ah21 Posted June 29, 2012

Thanks jdelaney!

The third bit of code you provided appears to wrap the <li> tags only around every odd-numbered list item. So on the 1st, 3rd, 5th etc. it wrapped the tags, but on the even-numbered items it did not wrap the HTML around them.

The second bit of code works well, however it puts them all on one line, so it just needs a carriage return after each closing </li>. I tried the following, but it didn't work. Any ideas what to try?

    $html = StringRegExpReplace($Text, "(?:[•|+][\s]{0,})(.*)(?:\r\n|\z)", "<li>\1</li>\r")

I also spotted that when I use more than one StringRegExpReplace in my code, only the last one seems to work. So for example, when I added the code from the last post, my script appeared to ignore adding <p> tags to my text when run. Does anyone know how to fix this?

jdelaney Posted June 29, 2012

whoops, yeah, that was the wrong code... try this one to global replace:

    $html = StringRegExpReplace($sText, "(?:[•|+][\s]{0,})(.*)(\r\n|\z)", "<li>\1</li>\2", 0)

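One possible way to get the surrounding <ul> tags as well is a second pass that wraps each run of consecutive <li> lines. This is a sketch, not a tested solution from the thread, and it assumes that lines are separated by CRLF, that a list item is a line starting with a bullet (ChrW(8226)) or a plus sign, and that adjacent <li> lines belong to one list. The sample text and variable names are made up for illustration. Inside a character class "|" is a literal pipe, not an alternation, so the class here is written without it.

```autoit
; Hypothetical sample input standing in for the clipboard text.
Local $sBullet = ChrW(8226) ; the bullet character, built from its code point
Local $sText = $sBullet & " First item" & @CRLF & $sBullet & " Second item" & @CRLF & "A normal paragraph."

; Step 1: wrap every line that starts with a bullet or "+" in <li> tags.
; (^|\r\n) ties the match to the start of the text or the start of a line,
; and [^\r\n]* captures the rest of that line without the line break.
Local $sHtml = StringRegExpReplace($sText, "(^|\r\n)[" & $sBullet & "+][ \t]*([^\r\n]*)", "$1<li>$2</li>")

; Step 2: wrap each run of consecutive <li> lines in a single <ul> block.
$sHtml = StringRegExpReplace($sHtml, "(<li>[^\r\n]*</li>(?:\r\n<li>[^\r\n]*</li>)*)", "<ul>" & @CRLF & "$1" & @CRLF & "</ul>")

; Show the result in the console for checking.
ConsoleWrite($sHtml & @CRLF)
```

Because the line breaks are left in place by step 1, each <li> stays on its own line and the paragraph after the list is not touched by step 2.
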
ah21 Posted June 29, 2012

Thanks! This now works for wrapping <li> tags around the list items.

I'm not quite sure why only one of my StringRegExpReplace calls works. My code runs through the following and does all of them except the second to last line, which adds the <p> tags. If I swap the last line of code with the second to last line, then the other bit of code works instead. So basically the StringRegExpReplace which is last in my code works, but the one before it does not. No idea why this is happening!

Is it because they need to be in a switch statement? I don't know if you can do this in AutoIt, however in PHP for example you can do:

    switch (n)
    {
    case label1:
        code to be executed if n=label1;
        break;
    case label2:
        code to be executed if n=label2;
        break;
    default:
        code to be executed if n is different from both label1 and label2;
    }

My code is below:

    $html = StringReplace($text, "&", "&amp;")
    $html = StringReplace($text, "“", "&ldquo;")
    $html = StringReplace($text, "”", "&rdquo;")
    $html = StringReplace($text, "€", "&euro;")
    $html = StringReplace($text, "£", "&pound;")
    $html = StringReplace($text, " ", " ")
    $html = StringRegExpReplace($text, "(.+)(\r\n|\z)", "<p>\1</p>\2")
    $html = StringRegExpReplace($text, "(?:[•|+][\s]{0,})(.*)(\r\n|\z)", "<li>\1</li>\2", 0)

jdelaney Posted June 29, 2012

what is the second to last stringregexp trying to get to? so there is some char(), followed by a + immediately followed by a CRLF/end? (that's how it's written, currently)

please explain it

ah21 Posted June 30, 2012

It is finding a full stop followed by a carriage return, and if found it puts a </p> straight after the full stop and an opening <p> at the start of the sentence. If that makes sense?

It works when run as the only StringRegExpReplace in the code, but I'm guessing there must be an error somewhere in my code that causes the two StringRegExpReplace calls to conflict. Not sure why; I assume you should be able to have more than one StringRegExpReplace in your code? I can show you all the code I have if that would make it easier to see what is happening.

Thanks for all the help on this, really appreciate it.

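A likely explanation, going only by the code posted in this thread: every call reads from $text and assigns to $html, so each line throws away the result of the one before it and only the last call's output survives. The calls do not conflict and no switch statement is needed; each call just has to take the previous result as its input. Note also that in a pattern ".+" means "one or more of any character", not a literal full stop. The sketch below chains the steps on one variable. It assumes CRLF line endings, and the ordering (entities, then list items, then paragraphs) and the lookahead that skips <li> lines are design choices of this sketch, not something confirmed by the posters.

```autoit
; Sketch: chain every step on the same variable so no result is lost.
Local $sHtml = ClipGet()

; Escape "&" first so the entities added later are not escaped again.
$sHtml = StringReplace($sHtml, "&", "&amp;")
$sHtml = StringReplace($sHtml, ChrW(163), "&pound;")
; ...further symbol replacements would follow here in the same style.

; Collapse runs of two or more line breaks into a single one.
$sHtml = StringRegExpReplace($sHtml, "(?:\r\n){2,}", @CRLF)

; List items: lines starting with a bullet (ChrW(8226)) or "+".
$sHtml = StringRegExpReplace($sHtml, "(^|\r\n)[" & ChrW(8226) & "+][ \t]*([^\r\n]*)", "$1<li>$2</li>")

; Paragraphs: every remaining non-empty line that does not already start with <li>.
$sHtml = StringRegExpReplace($sHtml, "(^|\r\n)(?!<li>)([^\r\n]+)", "$1<p>$2</p>")

ClipPut($sHtml)
MsgBox(0, "Text to HTML converter", "Text converted to HTML and copied to clipboard")
```

Running the list step before the paragraph step matters here: if the paragraph pattern ran first, the bulleted lines would be wrapped in <p> tags before they could be recognised as list items.
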