3/17/2023
Github python text cleaner

Your character class (as shown in the traceback) is invalid: } comes after = in ordinal value (} is 125, = is 61), and the - between them means the regex is trying to match any character from }'s ordinal to ='s, inclusive. Since character ranges must go from low ordinal to high ordinal, 125->61 is nonsensical, thus the error.

In a way you got lucky. If the characters around the - had been reversed, e.g. =-}, you'd have silently removed all characters from ordinal 61 to 125 inclusive, which would have included, along with a mess of punctuation, all standard ASCII letters, both lowercase and uppercase.

You could fix this by just removing the second - in your character class (you already included it at the beginning of the class, where it doesn't need to be escaped).

But I'm going to suggest dropping regular expressions here. The risk of mistakes with lots of literal punctuation is high, and there are other methods that don't involve regex at all, work just fine, and don't make you worry about whether you escaped all the important stuff (the alternative is over-escaping, which makes the regex unreadable and is still error-prone).

Instead, replace that line with a simple str.translate call. First off, outside the function, make a translation table of the things to remove: killpunctuation = str.maketrans('', '', <characters to remove>). A redundant - in that string is harmless here, since the result is a dict, which dedupes anyway. Then replace the regex line with: text = text.translate(killpunctuation)

It should run at least as fast as the regex (likely faster), and it's far less error-prone, since no character has special meaning. Translation tables are just mappings from Unicode ordinals to None (meaning delete), to another ordinal (meaning single-character replacement), or to a string (meaning char -> multichar replacement); they don't have a concept of special escapes.
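The str.translate approach described above can be sketched roughly as follows. The original set of characters to remove is not reproduced in this post, so the sketch assumes string.punctuation as a stand-in, and the function name clean_text is hypothetical.

```python
import string

# Assumption: string.punctuation stands in for the original set of
# characters to remove, which is not shown in this post.
# The table is built once, outside the function, and reused on every call.
killpunctuation = str.maketrans('', '', string.punctuation)


def clean_text(text):
    """Hypothetical cleaner: delete every character listed in the table."""
    return text.translate(killpunctuation)


if __name__ == "__main__":
    print(clean_text("Hello, world! a-b = {c}"))
```

Because the table maps each ordinal to None, characters such as - or } need no escaping, and the order in which they are listed does not matter.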
I am using Python 3.6, specifically the Anaconda build Anaconda3-2018.12-Windows-x86_64. The traceback included these lines: p = _parse_sub(source, pattern, flags and raise source.error(msg, len(this) + 1 + len(that))
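The failure can be reproduced in a few lines. This is a minimal sketch that uses a made-up two-character class rather than the original pattern, which is not shown here. It only illustrates the ordinal rule for ranges and what the reversed range would have matched.

```python
import re

# '}' is ordinal 125 and '=' is ordinal 61, so "}-=" runs high-to-low.
print(ord("}"), ord("="))

try:
    re.compile(r"[}-=]")
    error_message = None
except re.error as exc:
    # The regex compiler rejects the pattern before any matching happens.
    error_message = str(exc)

print(error_message)

# The reversed range compiles fine and matches ordinals 61 through 125,
# which covers every ASCII letter as well as some punctuation.
wide = re.compile(r"[=-}]")
stripped = wide.sub("", "Hello, World! 123")
print(repr(stripped))
```

The second half shows why the error was arguably the lucky outcome: the valid range deletes the letters silently instead of failing loudly.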