myspacee

Delete text in a txt file

Hello all,

I'm scripting a PDF-to-txt conversion (ebook, epub, flash related).

I have some PDFs: I extract the text, clean it up a bit, and use it for other tasks.

I'm looking for help removing some text from the txt file. Here are a few sample rows:

#<font 10 ""> 3d hdtv: ready for primetime? #<font 7 ""> Tablets 2.0 Why 2010 Could (Finally) Be Their Year nexus oneHan
#<font 16 ""> www.storemags.com & www.fantamag.com #<font 8 ""> MARCH 2010 MARCH 2010 vol. 29 no. 3 44CovER SToRY Though
#<font 47 ""> www.storemags.com & www.fantamag.com #<font 26 "">#<font 24 ""> PC Magazine Digital Edition, #<font 8 ""> 
#<font 24 ""> www.storemags.com & www.fantamag.com #<font 21 ""> The iPad: A Must-Have? #<font 17 ""> The New York Times

I have 2 tasks to accomplish:

- 1st: remove the text found between a ""> marker and the next #<font marker (""> R4Nd0M #<font)

- 2nd: remove the #<font R4Nd0M ""> markers themselves

Note that the markers can contain random text (numbers or letters), and I can't find a solution or function that helps.

Any hint is appreciated, thank you.

m.

Bump,

I'm trying to understand how to use StringRegExpReplace but I'm having some problems...

Can anyone help a bit?

Thank you,

m.

myspacee,

I am confused about what you want to do - can you show us how you want the lines to look after the 2 deletions?

If you want to get rid of the 2 font tags, then this pattern should work:

StringRegExpReplace($sText, "(?U)(#<.*>)", "")

(?U) = Inverse greediness, look for the shortest match (otherwise you lose the text between the tags as well!)

(#<.*>) = Look for #<, followed by any number of characters, followed by >
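
To see what (?U) changes, here is a minimal sketch that runs the pattern with and without it on a shortened sample row from the first post. The variable names $sLazy and $sGreedy are only illustrative and are not part of the original script.

```autoit
; Shortened sample row from the first post (single quotes avoid escaping the "" inside)
Local $sText = '#<font 10 ""> 3d hdtv: ready for primetime? #<font 7 ""> Tablets 2.0'

; With (?U): each tag is matched and removed on its own
Local $sLazy = StringRegExpReplace($sText, "(?U)(#<.*>)", "")

; Without (?U): .* is greedy, so one match runs from the first #< to the last >
Local $sGreedy = StringRegExpReplace($sText, "(#<.*>)", "")

; Brackets make leftover leading spaces visible
ConsoleWrite("Ungreedy: [" & $sLazy & "]" & @CRLF)
ConsoleWrite("Greedy:   [" & $sGreedy & "]" & @CRLF)
```

The ungreedy result should keep both pieces of text, while the greedy one should keep only what follows the last tag.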

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

My UDFs:

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area
Thank you for the reply,

I have

#<font 10 ""> 3d hdtv: ready for primetime? #<font 7 ""> Tablets 2.0 Why 2010 Could (Finally) Be Their Year nexus oneHan
#<font 16 ""> www.storemags.com & www.fantamag.com #<font 8 ""> MARCH 2010 MARCH 2010 vol. 29 no. 3 44CovER SToRY Though
#<font 47 ""> www.storemags.com & www.fantamag.com #<font 26 "">#<font 24 ""> PC Magazine Digital Edition, #<font 8 ""> 
#<font 24 ""> www.storemags.com & www.fantamag.com #<font 21 ""> The iPad: A Must-Have? #<font 17 ""> The New York Times

and I want to get

Tablets 2.0 Why 2010 Could (Finally) Be Their Year nexus oneHan
MARCH 2010 MARCH 2010 vol. 29 no. 3 44CovER SToRY Though
PC Magazine Digital Edition,
The iPad: A Must-Have? The New York Times

I tried a few StringRegExpReplace combinations, but I'm not as smart as I thought :]

m.

myspacee,

I can do it in 2 passes.

First pass - get rid of the initial #<font tag> text #<font tag>:

StringRegExpReplace($sText, "(?U)(?m:^)(#<.+>.+#<.+>)", "")

(?U) = Inverse greediness, look for shortest match

(?m:^) = Start at beginning of a line

(#<.+>.+#<.+>) = Match the first 2 tags and any text between them

Then a second pass to get rid of the remaining #<font tag>:

StringRegExpReplace($sText, "(?U)(#<.+>)", "")

(?U) = Inverse greediness

(#<.+>) = match the remaining tags
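
Put together, the two passes can be sketched as below on two shortened sample rows from the thread. The last two replacements (collapsing doubled spaces and trimming leading spaces) are an extra assumption added here to get closer to the requested output; they are not part of the two patterns above.

```autoit
; Two sample rows, joined with @CRLF as they might appear in the txt file
Local $sText = '#<font 10 ""> 3d hdtv: ready for primetime? #<font 7 ""> Tablets 2.0' & @CRLF & _
        '#<font 24 ""> www.storemags.com & www.fantamag.com #<font 21 ""> The iPad: A Must-Have? #<font 17 ""> The New York Times'

; Pass 1: at each line start, drop the first tag, the text after it, and the second tag
$sText = StringRegExpReplace($sText, "(?U)(?m:^)(#<.+>.+#<.+>)", "")

; Pass 2: drop whatever tags remain
$sText = StringRegExpReplace($sText, "(?U)(#<.+>)", "")

; Assumed cleanup (not in the post above): collapse runs of spaces, then trim line starts
$sText = StringRegExpReplace($sText, " {2,}", " ")
$sText = StringRegExpReplace($sText, "(?m)^ +", "")

ConsoleWrite($sText & @CRLF)
```

Without the cleanup step, each line keeps the space that followed the removed tag, so the text is right but the spacing differs slightly from the wanted result.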

I am sure a SRE guru will come along in a minute and laugh until his sides hurt - but that should get you started! :(

M23

I was thinking of using 2 steps too,

but without discarding the whole #<font tag> text #<font tag> block at once:

$chars = StringRegExpReplace($chars, "(?U)(> .*#<)", "")
$chars = StringRegExpReplace($chars, "(?U)(#<.*>)", "")

Is this a bad idea? [EDIT: after testing, it is a very bad idea!]

m.

(now testing yours...)
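
For anyone wondering why the attempt above goes wrong, here is a small sketch that prints the string after each step for one sample row. It only reuses the two patterns from this post; the ConsoleWrite calls are added for illustration.

```autoit
; Sample row with three tags, copied from the rows posted earlier
Local $chars = '#<font 24 ""> www.storemags.com & www.fantamag.com #<font 21 ""> The iPad: A Must-Have? #<font 17 ""> The New York Times'

; Step 1 removes every "> ... #<" stretch, including the one holding wanted text
$chars = StringRegExpReplace($chars, "(?U)(> .*#<)", "")
ConsoleWrite("After step 1: [" & $chars & "]" & @CRLF)

; Step 2 removes what is left of the now-fused tags
$chars = StringRegExpReplace($chars, "(?U)(#<.*>)", "")
ConsoleWrite("After step 2: [" & $chars & "]" & @CRLF)
```

Step 1 does not stop at the first pair of tags: it also eats the text between the second and third tag, so "The iPad: A Must-Have?" should be gone from the final string.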
