myspacee Posted February 26, 2010

Hello to all,

I'm scripting a PDF-to-text conversion (ebook, epub, flash related). I take some PDFs, extract the text, clean it up a bit, and use it for another task. I'm looking for help removing some text from the resulting txt file. Here are some sample rows:

#<font 10 ""> 3d hdtv: ready for primetime? #<font 7 ""> Tablets 2.0 Why 2010 Could (Finally) Be Their Year nexus oneHan #<font 16 ""> www.storemags.com & www.fantamag.com #<font 8 ""> MARCH 2010 MARCH 2010 vol. 29 no. 3 44CovER SToRY Though #<font 47 ""> www.storemags.com & www.fantamag.com #<font 26 "">#<font 24 ""> PC Magazine Digital Edition, #<font 8 ""> #<font 24 ""> www.storemags.com & www.fantamag.com #<font 21 ""> The iPad: A Must-Have? #<font 17 ""> The New York Times

I have 2 tasks to accomplish:
- 1st: remove the text inside a ""> R4Nd0M #<font marker pair
- 2nd: remove the text inside a #<font R4Nd0M ""> marker

Here R4Nd0M stands for random text (it can be numbers or letters), and I can't find a solution or function that can help me.

Any hint is appreciated, thank you.
m.
myspacee Posted February 28, 2010 (Author)

Bump. I'm trying to understand how to use StringRegExpReplace but am having some problems... Can anyone help a bit?

Thank you,
m.
Melba23 (Moderators) Posted February 28, 2010

myspacee,

I am confused about what you want to do. Can you show us how you want the lines to look after the 2 deletions?

If you want to get rid of the 2 font tags, then this pattern should work:

StringRegExpReplace($sText, "(?U)(#<.*>)", "")

(?U) = Invert greediness, look for the shortest match (otherwise you lose the text between the tags as well!)
(#<.*>) = Look for #<, followed by any number of characters, followed by >

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

My UDFs:
- ArrayMultiColSort: Sort arrays on multiple columns
- ChooseFileFolder: Single and multiple selections from specified path treeview listing
- Date_Time_Convert: Easily convert date/time formats, including the language used
- ExtMsgBox: A highly customisable replacement for MsgBox
- GUIExtender: Extend and retract multiple sections within a GUI
- GUIFrame: Subdivide GUIs into many adjustable frames
- GUIListViewEx: Insert, delete, move, drag, sort, edit and colour ListView items
- GUITreeViewEx: Check/clear parent and child checkboxes in a TreeView
- Marquee: Scrolling tickertape GUIs
- NoFocusLines: Remove the dotted focus lines from buttons, sliders, radios and checkboxes
- Notify: Small notifications on the edge of the display
- Scrollbars: Automatically sized scrollbars with a single command
- StringSize: Automatically size controls to fit text
- Toast: Small GUIs which pop out of the notification area
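The greediness point in the post above can be checked with a small experiment. The following is a minimal sketch, not code from the thread: the variable names ($sLine, $sGreedy, $sUngreedy) are made up for illustration, and the sample line is the start of the first row from the original post.

```autoit
; Sample text copied from the start of the first post; $sLine is an illustrative name.
Local $sLine = '#<font 10 ""> 3d hdtv: ready for primetime? #<font 7 ""> Tablets 2.0'

; Default (greedy) matching: .* runs on to the LAST ">" in the line,
; so the text between the two tags is removed together with the tags.
Local $sGreedy = StringRegExpReplace($sLine, '#<.*>', '')

; With (?U) the quantifier is ungreedy: .* stops at the FIRST ">",
; so each tag is matched and removed on its own.
Local $sUngreedy = StringRegExpReplace($sLine, '(?U)#<.*>', '')

; Brackets make leading and trailing spaces visible in the console output.
ConsoleWrite('Greedy:   [' & $sGreedy & ']' & @CRLF)
ConsoleWrite('Ungreedy: [' & $sUngreedy & ']' & @CRLF)
```

Run as written, the greedy call should keep only the text after the last tag, while the ungreedy call should keep both runs of text and drop just the two tags.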
myspacee Posted February 28, 2010 (Author)

Thank you for the reply.

I have:

#<font 10 ""> 3d hdtv: ready for primetime? #<font 7 ""> Tablets 2.0 Why 2010 Could (Finally) Be Their Year nexus oneHan #<font 16 ""> www.storemags.com & www.fantamag.com #<font 8 ""> MARCH 2010 MARCH 2010 vol. 29 no. 3 44CovER SToRY Though #<font 47 ""> www.storemags.com & www.fantamag.com #<font 26 "">#<font 24 ""> PC Magazine Digital Edition, #<font 8 ""> #<font 24 ""> www.storemags.com & www.fantamag.com #<font 21 ""> The iPad: A Must-Have? #<font 17 ""> The New York Times

I want to obtain:

Tablets 2.0 Why 2010 Could (Finally) Be Their Year nexus oneHan MARCH 2010 MARCH 2010 vol. 29 no. 3 44CovER SToRY Though PC Magazine Digital Edition, The iPad: A Must-Have? The New York Times

I tried a few StringRegExpReplace combinations, but I'm not as smart as I think :]

m.
Melba23 (Moderators) Posted February 28, 2010

myspacee,

I can do it in 2 passes.

First pass - get rid of the initial #<font tag> text #<font tag>:

StringRegExpReplace($sText, "(?U)(?m:^)(#<.+>.+#<.+>)", "")

(?U) = Invert greediness, look for the shortest match
(?m:^) = Start at the beginning of a line
(#<.+>.+#<.+>) = Match the first 2 tags and any text between them

Then a second pass to get rid of the remaining #<font tag>:

StringRegExpReplace($sText, "(?U)(#<.+>)", "")

(?U) = Invert greediness
(#<.+>) = Match the remaining tags

I am sure an SRE guru will come along in a minute and laugh until his sides hurt - but that should get you started!

M23
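The two passes described above can be chained on a small multi-line sample. This is only a sketch under assumptions: the forum software flattened the original sample, so the line breaks used here (one "tag, unwanted text, tag, wanted text" group per line, joined with @LF) are a guess, and $sText simply reuses the variable name from the post.

```autoit
; ASSUMPTION: each input line has the shape "<tag> unwanted text <tag> wanted text".
; The line breaks are guessed; the text itself is copied from the first post.
Local $sText = '#<font 10 ""> 3d hdtv: ready for primetime? #<font 7 ""> Tablets 2.0 Why 2010 Could (Finally) Be Their Year nexus oneHan' & @LF & _
        '#<font 16 ""> www.storemags.com & www.fantamag.com #<font 8 ""> MARCH 2010 MARCH 2010 vol. 29 no. 3 44CovER SToRY Though'

; Pass 1: at the start of each line, remove the first tag, the text that
; follows it, and the second tag (pattern taken from the post above).
$sText = StringRegExpReplace($sText, '(?U)(?m:^)(#<.+>.+#<.+>)', '')

; Pass 2: remove any tags that are still left (pattern taken from the post above).
$sText = StringRegExpReplace($sText, '(?U)(#<.+>)', '')

ConsoleWrite($sText & @CRLF)
```

With input shaped like this, only the text after the second tag on each line should remain, each run keeping its leading space. Lines that do not follow the assumed shape (for example a line holding a single tag) are only touched by the second pass.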
myspacee Posted February 28, 2010 (Author) (edited)

I was thinking of using 2 steps too, but not discarding #<font tag> text #<font tag>:

$chars = StringRegExpReplace($chars, "(?U)(> .*#<)", "")
$chars = StringRegExpReplace($chars, "(?U)(#<.*>)", "")

Is this a bad idea? [EDIT: after testing, it is a very bad idea!]

m. (now testing yours...)

Edited February 28, 2010 by myspacee