[python] How would I get everything before a : in a string Python

I am looking for a way to get all of the letters in a string before a : but I have no idea on where to start. Would I use regex? If so how?

string = "Username: How are you today?"

Can someone show me a example on what I could do?

This question is related to python regex string split

The answer is


partition() may be better then split() for this purpose as it has the better predicable results for situations you have no delimiter or more delimiters.


Using index:

>>> string = "Username: How are you today?"
>>> string[:string.index(":")]
'Username'

The index will give you the position of : in string, then you can slice it.

If you want to use regex:

>>> import re
>>> re.match("(.*?):",string).group()
'Username'                       

match matches from the start of the string.

you can also use itertools.takewhile

>>> import itertools
>>> "".join(itertools.takewhile(lambda x: x!=":", string))
'Username'

I have benchmarked these various technics under Python 3.7.0 (IPython).

TLDR

  • fastest (when the split symbol c is known): pre-compiled regex.
  • fastest (otherwise): s.partition(c)[0].
  • safe (i.e., when c may not be in s): partition, split.
  • unsafe: index, regex.

Code

import string, random, re

SYMBOLS = string.ascii_uppercase + string.digits
SIZE = 100

def create_test_set(string_length):
    for _ in range(SIZE):
        random_string = ''.join(random.choices(SYMBOLS, k=string_length))
        yield (random.choice(random_string), random_string)

for string_length in (2**4, 2**8, 2**16, 2**32):
    print("\nString length:", string_length)
    print("  regex (compiled):", end=" ")
    test_set_for_regex = ((re.compile("(.*?)" + c).match, s) for (c, s) in test_set)
    %timeit [re_match(s).group() for (re_match, s) in test_set_for_regex]
    test_set = list(create_test_set(16))
    print("  partition:       ", end=" ")
    %timeit [s.partition(c)[0] for (c, s) in test_set]
    print("  index:           ", end=" ")
    %timeit [s[:s.index(c)] for (c, s) in test_set]
    print("  split (limited): ", end=" ")
    %timeit [s.split(c, 1)[0] for (c, s) in test_set]
    print("  split:           ", end=" ")
    %timeit [s.split(c)[0] for (c, s) in test_set]
    print("  regex:           ", end=" ")
    %timeit [re.match("(.*?)" + c, s).group() for (c, s) in test_set]

Results

String length: 16
  regex (compiled): 156 ns ± 4.41 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
  partition:        19.3 µs ± 430 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
  index:            26.1 µs ± 341 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  split (limited):  26.8 µs ± 1.26 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  split:            26.3 µs ± 835 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  regex:            128 µs ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

String length: 256
  regex (compiled): 167 ns ± 2.7 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
  partition:        20.9 µs ± 694 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  index:            28.6 µs ± 2.73 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  split (limited):  27.4 µs ± 979 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  split:            31.5 µs ± 4.86 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  regex:            148 µs ± 7.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

String length: 65536
  regex (compiled): 173 ns ± 3.95 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
  partition:        20.9 µs ± 613 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
  index:            27.7 µs ± 515 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  split (limited):  27.2 µs ± 796 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  split:            26.5 µs ± 377 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  regex:            128 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

String length: 4294967296
  regex (compiled): 165 ns ± 1.2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
  partition:        19.9 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
  index:            27.7 µs ± 571 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  split (limited):  26.1 µs ± 472 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  split:            28.1 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  regex:            137 µs ± 6.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

You don't need regex for this

>>> s = "Username: How are you today?"

You can use the split method to split the string on the ':' character

>>> s.split(':')
['Username', ' How are you today?']

And slice out element [0] to get the first part of the string

>>> s.split(':')[0]
'Username'

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to split

Parameter "stratify" from method "train_test_split" (scikit Learn) Pandas split DataFrame by column value How to split large text file in windows? Attribute Error: 'list' object has no attribute 'split' Split function in oracle to comma separated values with automatic sequence How would I get everything before a : in a string Python Split String by delimiter position using oracle SQL JavaScript split String with white space Split a String into an array in Swift? Split pandas dataframe in two if it has more than 10 rows