WinWiesel

  1. Thx a lot, Xman! I'll have a look at this parser, though I'm pretty satisfied with my program so far. I get the information I need from the JSON file with a couple of regexes, and the decoding is done by the code from my earlier post. But it's good to know these parsers exist. When I need something more complex, I'll try them out... Best regards, WinWiesel!
  2. Here's my final (?) solution, which should cover the cases I've run into so far: \uXXXX escapes, the named HTML entities in the table, and three-digit decimal references like &#039;. Maybe it's useful for someone:

         $sText = '\u4e3a\u4e86\u9632\u6b62\u5723\u7269\u91d1\u5777\u5783\u843d\u5165\u4fb5\u7565\u8005\u7684\u624b\u4e2d\uff0c\u5723\u5730\u4e9a\u6208\u5927\u9646\u4e0a\u7684\u6218\u58eb\u4eec\u7eb7\u7eb7\u633a\u8eab\u800c\u51fa\u3002' & _
                 'F1\u00ae 2020 allows you to create your F1\u00ae team & whatever @ THX GUYS!!!'
         MsgBox(0, "", _ConvertHTML($sText))

         Func _ConvertHTML($sText)
             Local Const $aisEntities[][2] = [[34, 'quot'], [38, 'amp'], [39, 'apos'], [60, 'lt'], [62, 'gt'], _
                     [160, 'nbsp'], [161, 'iexcl'], [162, 'cent'], [163, 'pound'], [164, 'curren'], [165, 'yen'], [166, 'brvbar'], [167, 'sect'], _
                     [168, 'uml'], [169, 'copy'], [170, 'ordf'], [171, 'laquo'], [172, 'not'], [173, 'shy'], [174, 'reg'], [175, 'macr'], _
                     [176, 'deg'], [177, 'plusmn'], [180, 'acute'], [181, 'micro'], [182, 'para'], [183, 'middot'], [184, 'cedil'], [186, 'ordm'], _
                     [187, 'raquo'], [191, 'iquest'], [192, 'Agrave'], [193, 'Aacute'], [194, 'Acirc'], [195, 'Atilde'], [196, 'Auml'], [197, 'Aring'], _
                     [198, 'AElig'], [199, 'Ccedil'], [200, 'Egrave'], [201, 'Eacute'], [202, 'Ecirc'], [203, 'Euml'], [204, 'Igrave'], [205, 'Iacute'], _
                     [206, 'Icirc'], [207, 'Iuml'], [208, 'ETH'], [209, 'Ntilde'], [210, 'Ograve'], [211, 'Oacute'], [212, 'Ocirc'], [213, 'Otilde'], _
                     [214, 'Ouml'], [215, 'times'], [216, 'Oslash'], [217, 'Ugrave'], [218, 'Uacute'], [219, 'Ucirc'], [220, 'Uuml'], [221, 'Yacute'], _
                     [222, 'THORN'], [223, 'szlig'], [224, 'agrave'], [225, 'aacute'], [226, 'acirc'], [227, 'atilde'], [228, 'auml'], [229, 'aring'], _
                     [230, 'aelig'], [231, 'ccedil'], [232, 'egrave'], [233, 'eacute'], [234, 'ecirc'], [235, 'euml'], [236, 'igrave'], [237, 'iacute'], _
                     [238, 'icirc'], [239, 'iuml'], [240, 'eth'], [241, 'ntilde'], [242, 'ograve'], [243, 'oacute'], [244, 'ocirc'], [245, 'otilde'], _
                     [246, 'ouml'], [247, 'divide'], [248, 'oslash'], [249, 'ugrave'], [250, 'uacute'], [251, 'ucirc'], [252, 'uuml'], [253, 'yacute'], _
                     [254, 'thorn'], [255, 'yuml'], [338, 'OElig'], [339, 'oelig'], [352, 'Scaron'], [353, 'scaron'], [376, 'Yuml'], [402, 'fnof'], _
                     [710, 'circ'], [732, 'tilde'], [913, 'Alpha'], [914, 'Beta'], [915, 'Gamma'], [916, 'Delta'], [917, 'Epsilon'], [918, 'Zeta'], _
                     [919, 'Eta'], [920, 'Theta'], [921, 'Iota'], [922, 'Kappa'], [923, 'Lambda'], [924, 'Mu'], [925, 'Nu'], [926, 'Xi'], _
                     [927, 'Omicron'], [928, 'Pi'], [929, 'Rho'], [931, 'Sigma'], [932, 'Tau'], [933, 'Upsilon'], [934, 'Phi'], [935, 'Chi'], _
                     [936, 'Psi'], [937, 'Omega'], [945, 'alpha'], [946, 'beta'], [947, 'gamma'], [948, 'delta'], [949, 'epsilon'], [950, 'zeta'], _
                     [951, 'eta'], [952, 'theta'], [953, 'iota'], [954, 'kappa'], [955, 'lambda'], [956, 'mu'], [957, 'nu'], [958, 'xi'], _
                     [959, 'omicron'], [960, 'pi'], [961, 'rho'], [962, 'sigmaf'], [963, 'sigma'], [964, 'tau'], [965, 'upsilon'], [966, 'phi'], _
                     [967, 'chi'], [968, 'psi'], [969, 'omega'], [977, 'thetasym'], [978, 'upsih'], [982, 'piv'], [8194, 'ensp'], [8195, 'emsp'], _
                     [8201, 'thinsp'], [8204, 'zwnj'], [8205, 'zwj'], [8206, 'lrm'], [8207, 'rlm'], [8211, 'ndash'], [8212, 'mdash'], [8216, 'lsquo'], _
                     [8217, 'rsquo'], [8218, 'sbquo'], [8220, 'ldquo'], [8221, 'rdquo'], [8222, 'bdquo'], [8224, 'dagger'], [8225, 'Dagger'], [8226, 'bull'], _
                     [8230, 'hellip'], [8240, 'permil'], [8242, 'prime'], [8243, 'Prime'], [8249, 'lsaquo'], [8250, 'rsaquo'], [8254, 'oline'], [8260, 'frasl'], _
                     [8364, 'euro'], [8465, 'image'], [8472, 'weierp'], [8476, 'real'], [8482, 'trade'], [8501, 'alefsym'], [8592, 'larr'], [8593, 'uarr'], _
                     [8594, 'rarr'], [8595, 'darr'], [8596, 'harr'], [8629, 'crarr'], [8656, 'lArr'], [8657, 'uArr'], [8658, 'rArr'], [8659, 'dArr'], _
                     [8660, 'hArr'], [8704, 'forall'], [8706, 'part'], [8707, 'exist'], [8709, 'empty'], [8711, 'nabla'], [8712, 'isin'], [8713, 'notin'], _
                     [8715, 'ni'], [8719, 'prod'], [8721, 'sum'], [8722, 'minus'], [8727, 'lowast'], [8730, 'radic'], [8733, 'prop'], [8734, 'infin'], _
                     [8736, 'ang'], [8743, 'and'], [8744, 'or'], [8745, 'cap'], [8746, 'cup'], [8747, 'int'], [8764, 'sim'], [8773, 'cong'], _
                     [8776, 'asymp'], [8800, 'ne'], [8801, 'equiv'], [8804, 'le'], [8805, 'ge'], [8834, 'sub'], [8835, 'sup'], [8836, 'nsub'], _
                     [8838, 'sube'], [8839, 'supe'], [8853, 'oplus'], [8855, 'otimes'], [8869, 'perp'], [8901, 'sdot'], [8968, 'lceil'], [8969, 'rceil'], _
                     [8970, 'lfloor'], [8971, 'rfloor'], [9001, 'lang'], [9002, 'rang'], [9674, 'loz'], [9824, 'spades'], [9827, 'clubs'], [9829, 'hearts'], _
                     [9830, 'diams']]
             ; 1) Named entities such as &reg; (case-sensitive, so &Uuml; and &uuml; stay distinct)
             For $i = 0 To UBound($aisEntities) - 1
                 $sText = StringReplace($sText, "&" & $aisEntities[$i][1] & ";", ChrW($aisEntities[$i][0]), 0, 1)
             Next
             ; 2) \uXXXX escapes: double every " first so the generated string literal stays valid for Execute
             $sText = Execute('"' & StringRegExpReplace(StringReplace($sText, '"', '""'), '\\u([[:xdigit:]]{4})', '" & ChrW(0x$1) & "') & '"')
             ; 3) Three-digit decimal references such as &#039; (ChrW instead of Chr also covers values above 255)
             $sText = Execute('"' & StringRegExpReplace(StringReplace($sText, '"', '""'), '&#(\d{3});', '" & ChrW($1) & "') & '"')
             Return $sText
         EndFunc

     Since I didn't find a short algorithm to decode HTML entities (I tried something with Execute and an Object dictionary), I "borrowed" the array containing the codes from MrCreatoR (thx m8). Thanks a lot for your help, guys! WinWiesel
  3. Very nice, mikell! That's exactly what I was looking for. I knew it was doable with Execute; I just got confused with the quotation marks. ...and yes, this is simplified Chinese. @MrCreatoR: thanks for your effort, but this is pretty much the approach I wanted to avoid. It's more or less the same as the second solution in my post.
  4. Hi there! My first post here, so please bear with me. Basically, I'm writing a tool that downloads a JSON file with the Steam Web API, and I want to display parts of it in my GUI (I use the WinHTTP UDF to download). Sometimes the JSON file contains escaped Unicode strings like \u4e3a. As long as the string contains only these, this code works fine:

         $sText = "\u4e3a\u4e86\u9632\u6b62\u5723\u7269\u91d1\u5777\u5783\u843d\u5165\u4fb5\u7565\u8005\u7684\u624b\u4e2d\uff0c\u5723\u5730\u4e9a\u6208\u5927\u9646\u4e0a\u7684\u6218\u58eb\u4eec\u7eb7\u7eb7\u633a\u8eab\u800c\u51fa\u3002"
         $aText = StringSplit($sText, "\u", 3)
         $sNew = ""
         For $i = 1 To UBound($aText) - 1
             $sNew &= ChrW(Dec($aText[$i]))
         Next
         MsgBox(0, "", $sNew)

     Unfortunately, quite often there are special characters written as HTML entities, like "&reg;", "&copy;" or "&amp;", mixed with "normal" ASCII letters. Here's my solution for this:

         $sText = "F1\u00ae 2020 allows you to create your F1\u00ae team & whatever"
         Dim $aCodes[][2] = [['&reg;', ChrW(0x00AE)], ['&#32;', ChrW(0x0020)], ['\u2019', ChrW(0x2019)], ['&#039;', ChrW(0x0027)], _
                 ['\u0026', ChrW(0x0026)], ['&amp;', ChrW(0x0026)], ['\u00ae', ChrW(0x00ae)], ['\u2122', ChrW(0x2122)], _
                 ['&Uuml;', ChrW(0x00DC)], ['&uuml;', ChrW(0x00FC)], ['\u00fc', ChrW(0x00fc)], ['\u00e4', ChrW(0x00e4)], _
                 ['\u201c', ChrW(0x201c)], ['\u201d', ChrW(0x201d)], ['&quot;', ChrW(0x0022)], ['&rsquo;', ChrW(0x2019)], _
                 ['\u00e8', ChrW(0x00e8)], ['&trade;', ChrW(0x2122)]]
         For $i = 0 To UBound($aCodes) - 1
             $sText = StringReplace($sText, $aCodes[$i][0], $aCodes[$i][1], 0, 1)
         Next
         MsgBox(0, "", $sText)

     I'm not very pleased with this solution, and I think the best way to handle all of these cases is a regex. Here's what works, but ONLY for "\u4e3a":

         $sText = "\u4e3aand more text\u4e3a"
         $sText = StringRegExpReplace($sText, '\\u(\w{4})', ChrW(Dec("4e3a")))
         MsgBox(0, "", $sText)

     But how can I do this for any "\uXXXX"? I have no idea how to fit in the backreference "$1".
     I was thinking something like this might work, but it doesn't:

         $sText = "\u4e3a\u4e86\u9632\u6b62\u5723\u7269\u91d1\u5777\u5783\u843d\u5165\u4fb5\u7565\u8005\u7684\u624b\u4e2d\uff0c\u5723\u5730\u4e9a\u6208\u5927\u9646\u4e0a\u7684\u6218\u58eb\u4eec\u7eb7\u7eb7\u633a\u8eab\u800c\u51fa\u3002"
         $sReplace = StringRegExpReplace($sText, '\\u(\w{4})', Execute("ChrW(" & Dec("'$1'") & ")"))
         MsgBox(0, "", $sReplace)
         ;~ or this
         $sReplace = StringRegExpReplace($sText, '\\u(\w{4})', Execute("ChrW(Dec(" & "'$1'" & "))"))
         MsgBox(0, "", $sReplace)

     I may have messed up the Execute command. Is there a correct way to do this? If that's not the way at all, I'm grateful for any suggestions... Thanks in advance, WinWiesel!
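The attempts in post 4 can't work as written: AutoIt evaluates every argument before StringRegExpReplace runs, so Dec and Execute see the literal characters '$1' instead of the captured hex digits, and the one value they produce becomes a fixed replacement for every match. A minimal sketch of one way around it, assuming the input is a single JSON string value (no raw line breaks, no escaped backslashes such as \\u0041) whose escapes are all four-digit \uXXXX: let StringRegExpReplace turn each escape into the text of a ChrW() call, then evaluate the whole string once with Execute. Doubling every double quote beforehand keeps the generated string literal valid. _DecodeUnicodeEscapes is a hypothetical name, not an AutoIt or UDF function.

```autoit
; Sketch: decode JSON-style \uXXXX escapes (exactly four hex digits) in one pass.
; _DecodeUnicodeEscapes is a hypothetical helper, not part of AutoIt or any UDF.
; Assumes single-line input without escaped backslashes (\\u...).
Func _DecodeUnicodeEscapes($sText)
    ; Double every " so the text stays valid inside the string literal built below.
    Local $sExpr = StringReplace($sText, '"', '""')
    ; Turn each \uXXXX into: closing quote, ChrW() call, reopening quote.
    ; Example: ab\u00fc  ->  ab" & ChrW(0x00fc) & "
    $sExpr = StringRegExpReplace($sExpr, '\\u([[:xdigit:]]{4})', '" & ChrW(0x$1) & "')
    ; Wrap the result in quotes and evaluate it as one AutoIt expression.
    Local $sResult = Execute('"' & $sExpr & '"')
    If @error Then Return SetError(1, 0, $sText) ; on failure, hand back the raw text
    Return $sResult
EndFunc

ConsoleWrite(_DecodeUnicodeEscapes('F1\u00ae 2020, a \u0022quoted\u0022 word and a "literal" quote') & @CRLF)
```

The regex does all the matching and Execute runs only once per string, which is why the backreference finally reaches ChrW.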
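The final _ConvertHTML in post 2 only recognizes numeric references written with exactly three decimal digits (&#039;, &#064;), so forms like &#39;, &#8217; or the hex form &#x2019; pass through unchanged. A sketch of an Execute-free alternative, assuming references are decimal or hex and that only code points U+0001 to U+FFFF (the range ChrW accepts) should be decoded: one regex splits the text into tokens, and each reference token is converted on its own. _DecodeNumericEntities is a hypothetical name.

```autoit
; Sketch: decode numeric character references (&#39; &#039; &#8217; &#x2019;)
; without Execute. _DecodeNumericEntities is a hypothetical helper name.
Func _DecodeNumericEntities($sText)
    ; Split the text into tokens: a numeric reference, a run of non-& characters,
    ; or a lone "&". Concatenating all tokens gives back the original text.
    Local $aTok = StringRegExp($sText, '(&#(?:[xX][[:xdigit:]]+|\d+);|[^&]+|&)', 3)
    If @error Then Return $sText ; empty input, nothing to decode
    Local $sOut = "", $sNum, $iCode
    For $i = 0 To UBound($aTok) - 1
        If StringLeft($aTok[$i], 2) = "&#" Then
            ; Strip the leading "&#" and the trailing ";"
            $sNum = StringMid($aTok[$i], 3, StringLen($aTok[$i]) - 3)
            If StringLeft($sNum, 1) = "x" Then ; "=" ignores case, so "X" matches too
                $iCode = Dec(StringTrimLeft($sNum, 1))
            Else
                $iCode = Number($sNum) ; leading zeros are fine: "039" -> 39
            EndIf
            ; ChrW only covers U+0000..U+FFFF; &#0; and larger values stay as written
            If $iCode >= 1 And $iCode <= 0xFFFF Then
                $sOut &= ChrW($iCode)
                ContinueLoop
            EndIf
        EndIf
        $sOut &= $aTok[$i]
    Next
    Return $sOut
EndFunc

ConsoleWrite(_DecodeNumericEntities("it&#039;s &#064; home, &#x2019; and &#38;#65;") & @CRLF)
```

Because nothing is evaluated as code, quotes in the text need no escaping, and a reference ChrW can't represent is simply left alone. A double-encoded &#38;#65; is also decoded only once, to &#65;.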
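The final _ConvertHTML calls StringReplace once per table entry, and because &amp; is replaced early in the table, a double-encoded &amp;lt; ends up as "<". As a rough sketch of the Object dictionary idea mentioned in post 2 (not the code that was actually tried), the text can be scanned once, looking each &name; up in a Scripting.Dictionary built from the same kind of [code, name] table, so every entity is decoded exactly once. Assumptions: Windows (Scripting.Dictionary is a COM object), ASCII entity names, and only a short excerpt of the table; _DecodeNamedEntities is a hypothetical name.

```autoit
; Sketch: decode named HTML entities in one left-to-right scan with a
; Scripting.Dictionary lookup (Windows COM). The table is a short excerpt only;
; _DecodeNamedEntities is a hypothetical helper name.
Func _DecodeNamedEntities($sText)
    Local Static $oMap = 0
    If Not IsObj($oMap) Then
        ; Build the lookup once. The default CompareMode is binary, so keys are
        ; case-sensitive and "Uuml" and "uuml" stay distinct.
        $oMap = ObjCreate("Scripting.Dictionary")
        If Not IsObj($oMap) Then Return SetError(1, 0, $sText)
        Local Const $aTable[][2] = [[34, 'quot'], [38, 'amp'], [39, 'apos'], [60, 'lt'], [62, 'gt'], [160, 'nbsp'], _
                [169, 'copy'], [174, 'reg'], [220, 'Uuml'], [252, 'uuml'], [8217, 'rsquo'], [8482, 'trade']]
        For $i = 0 To UBound($aTable) - 1
            $oMap.Add($aTable[$i][1], ChrW($aTable[$i][0]))
        Next
    EndIf
    ; Tokens: "&name;", a run of non-& characters, or a lone "&"
    Local $aTok = StringRegExp($sText, '(&[[:alpha:]][[:alnum:]]*;|[^&]+|&)', 3)
    If @error Then Return $sText ; empty input
    Local $sOut = "", $sName
    For $i = 0 To UBound($aTok) - 1
        If StringLen($aTok[$i]) > 2 And StringLeft($aTok[$i], 1) = "&" Then
            $sName = StringMid($aTok[$i], 2, StringLen($aTok[$i]) - 2) ; drop "&" and ";"
            If $oMap.Exists($sName) Then
                $sOut &= $oMap.Item($sName)
                ContinueLoop
            EndIf
        EndIf
        $sOut &= $aTok[$i] ; plain text or an unknown entity, kept as written
    Next
    Return $sOut
EndFunc

ConsoleWrite(_DecodeNamedEntities("Tom &amp; Jerry&trade;") & @CRLF)
```

Unknown names such as &foo; are left untouched, and the dictionary is filled only on the first call.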