[python] Python : How to parse the Body from a raw email , given that raw email does not have a "Body" tag or anything

To be highly positive you work with the actual email body (yet, still with the possibility you're not parsing the right part), you have to skip attachments, and focus on the plain or html part (depending on your needs) for further processing.

As the before-mentioned attachments can and very often are of text/plain or text/html part, this non-bullet-proof sample skips those by checking the content-disposition header:

b = email.message_from_string(a)
body = ""

if b.is_multipart():
    for part in b.walk():
        ctype = part.get_content_type()
        cdispo = str(part.get('Content-Disposition'))

        # skip any text/plain (txt) attachments
        if ctype == 'text/plain' and 'attachment' not in cdispo:
            body = part.get_payload(decode=True)  # decode
            break
# not multipart - i.e. plain text, no attachments, keeping fingers crossed
else:
    body = b.get_payload(decode=True)

BTW, walk() iterates marvelously on mime parts, and get_payload(decode=True) does the dirty work on decoding base64 etc. for you.

Some background - as I implied, the wonderful world of MIME emails presents a lot of pitfalls of "wrongly" finding the message body. In the simplest case it's in the sole "text/plain" part and get_payload() is very tempting, but we don't live in a simple world - it's often surrounded in multipart/alternative, related, mixed etc. content. Wikipedia describes it tightly - MIME, but considering all these cases below are valid - and common - one has to consider safety nets all around:

Very common - pretty much what you get in normal editor (Gmail,Outlook) sending formatted text with an attachment:

multipart/mixed
 |
 +- multipart/related
 |   |
 |   +- multipart/alternative
 |   |   |
 |   |   +- text/plain
 |   |   +- text/html
 |   |      
 |   +- image/png
 |
 +-- application/msexcel

Relatively simple - just alternative representation:

multipart/alternative
 |
 +- text/plain
 +- text/html

For good or bad, this structure is also valid:

multipart/alternative
 |
 +- text/plain
 +- multipart/related
      |
      +- text/html
      +- image/jpeg

Hope this helps a bit.

P.S. My point is don't approach email lightly - it bites when you least expect it :)

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to email

Monitoring the Full Disclosure mailinglist require(vendor/autoload.php): failed to open stream Failed to authenticate on SMTP server error using gmail Expected response code 220 but got code "", with message "" in Laravel How to to send mail using gmail in Laravel? Laravel Mail::send() sending to multiple to or bcc addresses Getting "The remote certificate is invalid according to the validation procedure" when SMTP server has a valid certificate How to validate an e-mail address in swift? PHP mail function doesn't complete sending of e-mail How to validate email id in angularJs using ng-pattern

Examples related to python-2.7

Numpy, multiply array with scalar Not able to install Python packages [SSL: TLSV1_ALERT_PROTOCOL_VERSION] How to create a new text file using Python Could not find a version that satisfies the requirement tensorflow Python: Pandas pd.read_excel giving ImportError: Install xlrd >= 0.9.0 for Excel support Display/Print one column from a DataFrame of Series in Pandas How to calculate 1st and 3rd quartiles? How can I read pdf in python? How to completely uninstall python 2.7.13 on Ubuntu 16.04 Check key exist in python dict

Examples related to mod-wsgi

Python : How to parse the Body from a raw email , given that raw email does not have a "Body" tag or anything "make_sock: could not bind to address [::]:443" when restarting apache (installing trac and mod_wsgi) Target WSGI script cannot be loaded as Python module How to install python developer package? Difference between socket and websocket?

Examples related to wsgi

Python : How to parse the Body from a raw email , given that raw email does not have a "Body" tag or anything How many concurrent requests does a single Flask process receive? Target WSGI script cannot be loaded as Python module 104, 'Connection reset by peer' socket error, or When does closing a socket result in a RST rather than FIN?