Python: How to parse the body from a raw email, given that the raw email does not have a "Body" tag or anything

Question

It seems easy to get the From, To, Subject, etc. via

    import email
    b = email.message_from_string(a)
    bbb = b['from']
    ccc = b['to']

assuming that "a" is the raw-email string, which looks something like this:

    a = """From root@a1.local.tld Thu Jul 25 19:28:59 2013
    Received: from a1.local.tld (localhost [127.0.0.1])
        by a1.local.tld (8.14.4/8.14.4) with ESMTP id r6Q2SxeQ003866
        for <ooo@a1.local.tld>; Thu, 25 Jul 2013 19:28:59 -0700
    Received: (from root@localhost)
        by a1.local.tld (8.14.4/8.14.4/Submit) id r6Q2Sxbh003865;
        Thu, 25 Jul 2013 19:28:59 -0700
    From: root@a1.local.tld
    Subject: oooooooooooooooo
    To: ooo@a1.local.tld
    Cc:
    X-Originating-IP: 192.168.15.127
    X-Mailer: Webmin 1.420
    Message-Id: <1374805739.3861@a1>
    Date: Thu, 25 Jul 2013 19:28:59 -0700 (PDT)
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="bound1374805739"

    This is a multi-part message in MIME format.

    --bound1374805739
    Content-Type: text/plain
    Content-Transfer-Encoding: 7bit

    ooooooooooooooooooooooooooooooooooooooooooooooo
    ooooooooooooooooooooooooooooooooooooooooooooooo
    ooooooooooooooooooooooooooooooooooooooooooooooo

    --bound1374805739--
    """

THE QUESTION: how do you get the body of this email via Python?

So far this is the only code I am aware of, but I have yet to test it:

    if email.is_multipart():
        for part in email.get_payload():
            print part.get_payload()
    else:
        print email.get_payload()

Is this the correct way?

Or maybe there is something simpler, such as:

    import email
    b = email.message_from_string(a)
    bbb = b['body']

User · Answer

Here's the code that works for me every time (for Outlook emails):

    # Read the subject and body of the emails in a folder (or subfolder)
    import win32com.client

    # Create the MAPI namespace object
    outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")

    # Get to the desired folder (MyEmail@xyz.com is my root folder;
    # 'Inbox' and 'SubFolderName' are the subfolders)
    root_folder = outlook.Folders['MyEmail@xyz.com'].Folders['Inbox'].Folders['SubFolderName']

    messages = root_folder.Items

    for message in messages:
        if message.UnRead:                    # only unread emails
            subject_content = message.Subject  # subject line of the mail
            body_content = message.Body        # body of the mail

            print(subject_content)
            print(body_content)

            message.UnRead = False             # mark the mail as read

User · Answer

There is a very good, well-documented package available for parsing email contents:

    import mailparser

    mail = mailparser.parse_from_file(f)
    mail = mailparser.parse_from_file_obj(fp)
    mail = mailparser.parse_from_string(raw_mail)
    mail = mailparser.parse_from_bytes(byte_mail)

How to use:

    mail.attachments  # list of all attachments
    mail.body
    mail.to

User · Answer

To be reasonably sure you are working with the actual email body (while accepting that you may still pick the wrong part), you have to skip attachments and focus on the plain or HTML part (depending on your needs) for further processing.

As the aforementioned attachments can be, and very often are, text/plain or text/html parts, this non-bullet-proof sample skips them by checking the Content-Disposition header:

    import email

    b = email.message_from_string(a)
    body = ""

    if b.is_multipart():
        for part in b.walk():
            ctype = part.get_content_type()
            cdispo = str(part.get('Content-Disposition'))

            # skip any text/plain (txt) attachments
            if ctype == 'text/plain' and 'attachment' not in cdispo:
                body = part.get_payload(decode=True)  # decode (returns bytes)
                break
    # not multipart - i.e. plain text, no attachments, keeping fingers crossed
    else:
        body = b.get_payload(decode=True)

BTW, walk() iterates marvelously over MIME parts, and get_payload(decode=True) does the dirty work of decoding base64 etc. for you.

Some background - as I implied, the wonderful world of MIME emails presents a lot of pitfalls for "wrongly" finding the message body. In the simplest case it's in the sole "text/plain" part and get_payload() is very tempting, but we don't live in a simple world - the body is often wrapped in multipart/alternative, related, mixed etc. content. Wikipedia describes it tightly (see MIME), but considering that all the cases below are valid - and common - one has to put safety nets all around.

Very common - pretty much what you get from a normal editor (Gmail, Outlook) sending formatted text with an attachment:

    multipart/mixed
     |
     +- multipart/related
     |   |
     |   +- multipart/alternative
     |   |   |
     |   |   +- text/plain
     |   |   +- text/html
     |   |
     |   +- image/png
     |
     +-- application/msexcel

Relatively simple - just an alternative representation:

    multipart/alternative
     |
     +- text/plain
     +- text/html

For good or bad, this structure is also valid:

    multipart/alternative
     |
     +- text/plain
     +- multipart/related
          |
          +- text/html
          +- image/jpeg

Hope this helps a bit.

P.S. My point is: don't approach email lightly - it bites when you least expect it :)
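To see which of these structures a given message actually has, it can help to dump its MIME tree before deciding where the body lives. The following is a minimal sketch using only the standard library; the helper name `mime_tree`, the sample message, and the attachment file name `report.xls` are made up for illustration.

```python
from email import message_from_string
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def mime_tree(msg, depth=0):
    """Return one indented line per MIME part, outermost part first."""
    lines = ["  " * depth + msg.get_content_type()]
    if msg.is_multipart():
        for child in msg.get_payload():
            lines.extend(mime_tree(child, depth + 1))
    return lines


# Hypothetical sample: formatted text plus one attachment.
alternative = MIMEMultipart("alternative")
alternative.attach(MIMEText("hello", "plain"))
alternative.attach(MIMEText("<p>hello</p>", "html"))

related = MIMEMultipart("related")
related.attach(alternative)

mixed = MIMEMultipart("mixed")
mixed.attach(related)
sheet = MIMEApplication(b"dummy", "msexcel")
sheet.add_header("Content-Disposition", "attachment", filename="report.xls")
mixed.attach(sheet)

# Round-trip through a string so the parser sees the raw text form.
parsed = message_from_string(mixed.as_string())
print("\n".join(mime_tree(parsed)))
```

The printed tree has the same shape as the first diagram above, minus the inline image, which is left out to keep the sample short.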

User · Answer

Python 3.6+ provides built-in convenience methods to find and decode the plain text body, as in @Todor Minakov's answer. You can use the EmailMessage.get_body() and get_content() methods:

    import email
    import email.policy

    msg = email.message_from_string(s, policy=email.policy.default)
    body = msg.get_body(('plain',))
    if body:
        body = body.get_content()
    print(body)

Note this will give None if there is no (obvious) plain text body part.

If you are reading from e.g. an mbox file, you can give the mailbox constructor an EmailMessage factory:

    import mailbox

    mbox = mailbox.mbox(
        mboxfile,
        factory=lambda f: email.message_from_binary_file(f, policy=email.policy.default),
        create=False,
    )
    for msg in mbox:
        ...

Note you must pass email.policy.default as the policy, since it's not the default :)
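As a self-contained illustration of the get_body() approach, here is a small sketch that parses a two-part message and picks a body by preference. The sample message, the boundary `b1`, and the helper name `extract_body` are assumptions made for this example, not part of the answer above.

```python
import email
from email import policy

# Hypothetical sample message with a plain and an HTML alternative.
RAW = """\
From: root@a1.local.tld
To: ooo@a1.local.tld
Subject: demo
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain; charset="utf-8"

plain body

--b1
Content-Type: text/html; charset="utf-8"

<p>html body</p>

--b1--
"""


def extract_body(raw, prefer=("plain", "html")):
    """Return the decoded text of the best-matching body part, or None."""
    msg = email.message_from_string(raw, policy=policy.default)
    part = msg.get_body(preferencelist=prefer)
    return part.get_content() if part is not None else None


print(extract_body(RAW))                    # plain text alternative
print(extract_body(RAW, prefer=("html",)))  # HTML alternative
```

Passing a preference list with both entries gives a fallback to HTML when a message has no plain text part, which seems a reasonable default when you only need something readable.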

User · Answer

Use Message.get_payload():

    import email

    b = email.message_from_string(a)
    if b.is_multipart():
        for payload in b.get_payload():
            # if payload.is_multipart(): ...
            print(payload.get_payload())
    else:
        print(b.get_payload())
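The nested case hinted at by the commented-out is_multipart() check can be handled with a little recursion. This is only a sketch: the helper name `iter_leaf_payloads` and the sample message are made up here, and it does not decode transfer encodings or skip attachments.

```python
import email

# Hypothetical sample: a multipart/alternative nested inside multipart/mixed.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain

hello

--inner
Content-Type: text/html

<b>hello</b>

--inner--

--outer
Content-Type: text/plain
Content-Disposition: attachment; filename="note.txt"

attached text

--outer--
"""


def iter_leaf_payloads(msg):
    """Yield (content_type, payload) for every non-multipart part, depth first."""
    if msg.is_multipart():
        for sub in msg.get_payload():
            yield from iter_leaf_payloads(sub)
    else:
        yield msg.get_content_type(), msg.get_payload()


for ctype, payload in iter_leaf_payloads(email.message_from_string(RAW)):
    print(ctype, repr(payload))
```

Note that the attachment is yielded too, so a caller that wants only the body still has to filter on Content-Disposition.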

User · Answer

If emails is the pandas DataFrame and emails.message is the column holding the email text:

    import email

    ## Helper functions
    def get_text_from_email(msg):
        '''To get the content from email objects'''
        parts = []
        for part in msg.walk():
            if part.get_content_type() == 'text/plain':
                parts.append(part.get_payload())
        return ''.join(parts)

    def split_email_addresses(line):
        '''To separate multiple email addresses'''
        if line:
            addrs = line.split(',')
            addrs = frozenset(map(lambda x: x.strip(), addrs))
        else:
            addrs = None
        return addrs

    # Parse the emails into a list of email objects
    messages = list(map(email.message_from_string, emails['message']))
    emails.drop('message', axis=1, inplace=True)
    # Get fields from parsed email objects
    keys = messages[0].keys()
    for key in keys:
        emails[key] = [doc[key] for doc in messages]
    # Parse content from emails
    emails['content'] = list(map(get_text_from_email, messages))
    # Split multiple email addresses
    emails['From'] = emails['From'].map(split_email_addresses)
    emails['To'] = emails['To'].map(split_email_addresses)

    # Extract the root of 'file' as 'user'
    emails['user'] = emails['file'].map(lambda x: x.split('/')[0])
    del messages

    emails.head()

User · Answer

There is no b['body'] in Python. You have to use get_payload():

    if isinstance(mailEntity.get_payload(), list):
        for eachPayload in mailEntity.get_payload():
            # do the things you want here;
            # the real mail body is in eachPayload.get_payload()
            body = eachPayload.get_payload()
    else:
        # there is only a text/plain part;
        # use mailEntity.get_payload() to get the body
        body = mailEntity.get_payload()

Good luck.
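To make the difference concrete: indexing a message looks up headers only, so a missing name such as 'body' yields None rather than the message text. A tiny sketch, with a made-up single-part message:

```python
import email

# Hypothetical single-part message: two headers, a blank line, then the body.
RAW = "From: root@a1.local.tld\nSubject: demo\n\nJust a plain body.\n"

msg = email.message_from_string(RAW)

print(msg["Subject"])     # header lookup
print(msg["body"])        # not a header, so the lookup returns None
print(msg.get_payload())  # the text after the blank line
```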
