I think you could solve this with .strip()
in gazpacho:
Input:
html = """\
<p>
<strong class="offender">YOB:</strong> 1987<br />
<strong class="offender">RACE:</strong> WHITE<br />
<strong class="offender">GENDER:</strong> FEMALE<br />
<strong class="offender">HEIGHT:</strong> 5'05''<br />
<strong class="offender">WEIGHT:</strong> 118<br />
<strong class="offender">EYE COLOR:</strong> GREEN<br />
<strong class="offender">HAIR COLOR:</strong> BROWN<br />
</p>
"""
Code:
soup = Soup(html)
text = soup.find("p").strip(whitespace=False) # to keep \n characters intact
lines = [
line.strip()
for line in text.split("\n")
if line != ""
]
data = dict([line.split(": ") for line in lines])
Output:
print(data)
# {'YOB': '1987',
# 'RACE': 'WHITE',
# 'GENDER': 'FEMALE',
# 'HEIGHT': "5'05''",
# 'WEIGHT': '118',
# 'EYE COLOR': 'GREEN',
# 'HAIR COLOR': 'BROWN'}