Python: Ignore 'Incorrect padding' error when base64 decoding

Question

I have some data that is base64 encoded that I want to convert back to binary even if there is a padding error in it. If I use

    base64.decodestring(b64_string)

it raises an 'Incorrect padding' error. Is there another way?

UPDATE: Thanks for all the feedback. To be honest, all the methods mentioned sounded a bit hit and miss, so I decided to try openssl. The following command worked a treat:

    openssl enc -d -base64 -in b64string -out binary_data

User · Answer

There are two ways to correct the input data described here or, more specifically and in line with the OP, to let the b64decode method of Python's base64 module process the input data without raising an uncaught exception:

1. Append == to the end of the input data and call base64.b64decode(...)
2. If that raises an exception, then

i. Catch it via try/except,

ii. (R?)Strip any = characters from the input data (N.B. this may not be necessary),

iii. Append A== to the input data (A== through P== will work),

iv. Call base64.b64decode(...) with those A==-appended input data

The result from Item 1. or Item 2. above will yield the desired result.
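The two steps above can be sketched in Python 3 roughly as follows. This is a minimal sketch, not the answer author's code: the function name lenient_b64decode is made up for illustration, and it assumes the input is a bytes object and that a failed decode surfaces as binascii.Error (which is what Python 3 raises).

```python
import base64
import binascii


def lenient_b64decode(data: bytes) -> bytes:
    """Decode Base64 that may have lost padding or trailing characters.

    Hypothetical helper: the goal is completion without an exception,
    not a guarantee that the output matches what was originally encoded.
    """
    try:
        # Item 1: surplus '=' characters are ignored, so appending '=='
        # covers inputs that are missing zero, one or two pad characters.
        return base64.b64decode(data + b"==")
    except binascii.Error:
        # Item 2: a single dangling character is left over. Strip any
        # existing '=' and complete the group with 'A' (six zero bits).
        return base64.b64decode(data.rstrip(b"=") + b"A==")
```

For example, lenient_b64decode(b"aGVsbG8") returns b"hello" via the first branch, while an input with one dangling character only decodes via the A== fallback, and its last byte is then a guess.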

Caveats

This does not guarantee the decoded result will be what was originally encoded, but it will (sometimes?) give the OP enough to work with:

"Even with corruption I want to get back to the binary because I can still get some useful info from the ASN.1 stream."

See What we know and Assumptions below.

TL;DR

From some quick tests of base64.b64decode(...):

3. It appears that it ignores non-[A-Za-z0-9+/] characters; that includes ignoring =s unless they are the last character(s) in a parsed group of four, in which case the =s terminate the decoding (a=b=c=d= gives the same result as abc=, and a==b==c== gives the same result as ab==).
4. It also appears that all characters appended are ignored after the point where base64.b64decode(...) terminates decoding, e.g. from an = as the fourth in a group.

As noted in several comments above, there are either zero, or one, or two, =s of padding required at the end of input data for when the [number of parsed characters to that point modulo 4] value is 0, or 3, or 2, respectively. So, from items 3. and 4. above, appending two or more =s to the input data will correct any [Incorrect padding] problems in those cases.

HOWEVER, decoding cannot handle the case where the [total number of parsed characters modulo 4] is 1, because it takes at least two encoded characters to represent the first decoded byte in a group of three decoded bytes. In uncorrupted encoded input data, this [N modulo 4]=1 case never happens, but as the OP stated that characters may be missing, it could happen here. That is why simply appending =s will not always work, and why appending A== will work when appending == does not. N.B. Using [A] is all but arbitrary: it adds only cleared (zero) bits to the decoded data, which may or may not be correct, but then the object here is not correctness but completion by base64.b64decode(...) sans exceptions.
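The modulo-4 rule can be written down as a small helper. This is only an illustrative sketch: the name required_padding is hypothetical, and it assumes the standard alphabet and simply counts every alphabet character in the input, ignoring whitespace and any = already present.

```python
import re

# Characters of the standard Base64 alphabet (padding '=' excluded).
_B64_CHARS = re.compile(rb"[A-Za-z0-9+/]")


def required_padding(data: bytes) -> int:
    """Return how many '=' characters the data needs at its end."""
    remainder = len(_B64_CHARS.findall(data)) % 4
    if remainder == 1:
        # One leftover character carries only 6 bits, less than one byte,
        # so no amount of '=' padding can make the input decodable.
        raise ValueError("one dangling character; '=' padding cannot fix this")
    # remainder 0 -> 0 pads, 2 -> 2 pads, 3 -> 1 pad
    return (4 - remainder) % 4
```

The ValueError branch corresponds to the case where the A== fallback is needed.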

What we know from the OP and especially subsequent comments is

It is suspected that there are missing data (characters) in the Base64-encoded input data
The Base64 encoding uses the standard 64 place-values plus padding: A-Z; a-z; 0-9; +; /; = is padding. This is confirmed, or at least suggested, by the fact that openssl enc ... works.

Assumptions

The input data contain only 7-bit ASCII data
The only kind of corruption is missing encoded input data
The OP does not care about decoded output data at any point after that corresponding to any missing encoded input data

Github

Here is a wrapper to implement this solution:

https://github.com/drbitboy/missing_b64

User · Answer

Simply add additional characters like "=" or any other and make it a multiple of 4 before you try decoding the target string value. Something like:

    if len(value) % 4 != 0:  # check if multiple of 4
        while len(value) % 4 != 0:
            value = value + "="
        req_str = base64.b64decode(value)
    else:
        req_str = base64.b64decode(value)

User · Answer

I got this error without using base64 at all. In my case the error only occurred on localhost; it works fine on 127.0.0.1.

User · Answer

"Incorrect padding" can mean not only "missing padding" but also (believe it or not) "incorrect padding".

If the suggested "adding padding" methods don't work, try removing some trailing bytes:

    lens = len(strg)
    lenx = lens - (lens % 4 if lens % 4 else 4)
    try:
        result = base64.decodestring(strg[:lenx])
    except etc

Update: Any fiddling around adding padding or removing possibly bad bytes from the end should be done AFTER removing any whitespace, otherwise length calculations will be upset.

It would be a good idea if you showed us a (short) sample of the data that you need to recover. Edit your question and copy/paste the result of print(repr(sample)).

Update 2: It is possible that the encoding has been done in a URL-safe manner. If this is the case, you will be able to see minus and underscore characters in your data, and you should be able to decode it by using base64.b64decode(strg, '-_').

If you can't see minus and underscore characters in your data, but can see plus and slash characters, then you have some other problem, and may need the add-padding or remove-cruft tricks.

If you can see none of minus, underscore, plus and slash in your data, then you need to determine the two alternate characters; they'll be the ones that aren't in [A-Za-z0-9]. Then you'll need to experiment to see which order they need to be used in the 2nd arg of base64.b64decode().

Update 3: If your data is "company confidential": (a) you should say so up front; (b) we can explore other avenues in understanding the problem, which is highly likely to be related to what characters are used instead of + and / in the encoding alphabet, or to other formatting or extraneous characters.

One such avenue would be to examine what non-"standard" characters are in your data, e.g.

    from collections import defaultdict
    import string

    d = defaultdict(int)
    s = set(string.ascii_letters + string.digits)
    for c in your_data:
        if c not in s:
            d[c] += 1
    print(d)

User · Answer

Just add padding as required. Heed Michael's warning, however.

    b64_string += "=" * ((4 - len(b64_string) % 4) % 4)  # ugh

User · Answer

In case this error came from a web server: try URL-encoding your POST value. I was POSTing via "curl" and discovered I wasn't URL-encoding my base64 value, so characters like "+" were not escaped, and the web server's URL-decode logic automatically converted "+" to spaces.

"+" is a valid base64 character and perhaps the only character which gets mangled by an unexpected URL-decode.
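The effect described above can be reproduced with the standard library. This is a small illustration only: the payload is a made-up value chosen because it encodes to nothing but "+" characters, and urllib.parse.unquote_plus stands in for whatever form decoding the web server applies.

```python
import base64
from urllib.parse import quote_plus, unquote_plus

# Hypothetical payload whose Base64 form consists only of '+' characters.
payload = base64.b64encode(b"\xfb\xef\xbe").decode("ascii")

# What form decoding does to an unescaped value: every '+' becomes a space.
mangled = unquote_plus(payload)

# Percent-encoding before sending lets the value survive the round trip.
escaped = quote_plus(payload)
restored = unquote_plus(escaped)
```

Here restored equals payload, while mangled no longer contains any "+" at all. With curl, the --data-urlencode option performs the escaping step.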

User · Answer

In my case I faced that error while parsing an email. I got the attachment as a base64 string and extracted it via re.search. Eventually I noticed a strange additional substring at the end:

    dHJhaWxlcgo8PCAvU2l6ZSAxNSAvUm9vdCAxIDAgUiAvSW5mbyAyIDAgUgovSUQgWyhcMDAyXDMz MHtPcFwyNTZbezU VzheXDM0MXFcMzExKShcMDAyXDMzMHtPcFwyNTZbezU VzheXDM0MXFcMzEx KV0KPj4Kc3RhcnR4cmVmCjY3MDEKJSVFT0YK  --  ic0008m4wtZ4TqBFd sXC8--

When I deleted --  ic0008m4wtZ4TqBFd sXC8-- and stripped the string, the parsing was fixed.

So my advice is: make sure that you are decoding a correct base64 string.
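The trailing substring appears to be a MIME boundary line, which a regular expression can easily sweep up along with the attachment body. One way to avoid that, sketched here under the assumption that the raw message is available as bytes, is to let the standard library's email parser split the parts and decode them. The function name attachment_payloads is hypothetical.

```python
from email import message_from_bytes, policy


def attachment_payloads(raw_message: bytes):
    """Return (filename, decoded bytes) for each attachment in a message."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    return [
        # decode=True undoes the part's Content-Transfer-Encoding
        # (base64 or quoted-printable) and returns bytes.
        (part.get_filename(), part.get_payload(decode=True))
        for part in msg.iter_attachments()
    ]
```

Because the parser handles the boundaries itself, nothing beyond the encoded body reaches the Base64 decoder.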

User · Answer

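You can simply use base64.urlsafe_b64decode(data) if you are trying to decode a web image, since such data often uses the URL-safe alphabet (- and _ in place of + and /). Note that it still expects correctly padded input, so pad the data to a multiple of four first if the padding has been stripped.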

User · Answer

Use

    string += '=' * (-len(string) % 4)  # restore stripped '='s

Credit goes to a comment somewhere here.

    >>> import base64
    >>> enc = base64.b64encode('1')
    >>> enc
    'MQ=='
    >>> base64.b64decode(enc)
    '1'
    >>> enc = enc.rstrip('=')
    >>> enc
    'MQ'
    >>> base64.b64decode(enc)
    ...
    TypeError: Incorrect padding
    >>> base64.b64decode(enc + '=' * (-len(enc) % 4))
    '1'

User · Answer

It seems you just need to add padding to your bytes before decoding. There are many other answers on this question, but I want to point out that (at least in Python 3.x) base64.b64decode will truncate any extra padding, provided there is enough in the first place.

So, something like b'abc=' works just as well as b'abc==' (as does b'abc=====').

What this means is that you can just add the maximum number of padding characters that you would ever need, which is three (b'==='), and base64 will truncate any unnecessary ones.

This lets you write:

    base64.b64decode(s + b'===')

which is simpler than:

    base64.b64decode(s + b'=' * (-len(s) % 4))

User · Answer

Adding the padding is rather... fiddly. Here's the function I wrote with the help of the comments in this thread as well as the wiki page for base64 (it's surprisingly helpful): https://en.wikipedia.org/wiki/Base64#Padding

    import logging
    import base64

    def base64_decode(s):
        """Add missing padding to string and return the decoded base64 string."""
        log = logging.getLogger()
        s = str(s).strip()
        try:
            return base64.b64decode(s)
        except TypeError:
            padding = len(s) % 4
            if padding == 1:
                log.error("Invalid base64 string: {}".format(s))
                return ''
            elif padding == 2:
                s += b'=='
            elif padding == 3:
                s += b'='
            return base64.b64decode(s)

User · Answer

As said in other responses, there are various ways in which base64 data could be corrupted.

However, as Wikipedia says, removing the padding (the '=' characters at the end of base64 encoded data) is "lossless":

    From a theoretical point of view, the padding character is not needed, since the number of missing bytes can be calculated from the number of Base64 digits.

So if this is really the only thing "wrong" with your base64 data, the padding can just be added back. I came up with this to be able to parse "data" URLs in WeasyPrint, some of which were base64 without padding:

    import base64
    import re

    def decode_base64(data, altchars=b'+/'):
        """Decode base64, padding being optional.

        :param data: Base64 data as an ASCII byte string
        :returns: The decoded byte string.

        """
        data = re.sub(rb'[^a-zA-Z0-9%s]+' % altchars, b'', data)  # normalize
        missing_padding = len(data) % 4
        if missing_padding:
            data += b'=' * (4 - missing_padding)
        return base64.b64decode(data, altchars)

Tests for this function: weasyprint/tests/test_css.py#L68

User · Answer

You should use base64.b64decode(b64_string, ' /'). By default, the altchars are '+/'.

User · Answer

If there's a padding error it probably means your string is corrupted; base64-encoded strings should have a multiple of four length. You can try adding the padding character (=) yourself to make the string a multiple of four, but it should already have that unless something is wrong.
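To check whether a string is well-formed before trying to repair it, Python 3's base64.b64decode accepts validate=True, which rejects characters outside the alphabet instead of silently discarding them. A minimal sketch; the helper name is_well_formed_b64 is made up for illustration.

```python
import base64
import binascii


def is_well_formed_b64(data: bytes) -> bool:
    """Return True if data is strictly valid, correctly padded Base64."""
    try:
        # validate=True turns stray characters (including whitespace)
        # into an error rather than ignoring them.
        base64.b64decode(data, validate=True)
    except binascii.Error:
        return False
    return True
```

A False result says only that something is wrong, not what: the cause may be missing padding, embedded whitespace, or genuinely missing data.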

User · Answer

I ran into this problem as well and nothing worked. I finally managed to find the solution which works for me. I had zipped content in base64 and this happened to 1 out of a million records...

This is a version of the solution suggested by Simon Sapin.

In case the padding is missing 3, I remove the last 3 characters.

Instead of "0gA1RD5L 9AUGtH9MzAwAAA" we get "0gA1RD5L 9AUGtH9MzAwAA"

    missing_padding = len(data) % 4
    if missing_padding == 3:
        data = data[0:-3]
    elif missing_padding != 0:
        print("Missing padding : " + str(missing_padding))
        data += '=' * (4 - missing_padding)
    data_decoded = base64.b64decode(data)

According to this answer, "Trailing As in base64", the reason is nulls. But I still have no idea why the encoder messes this up...

User · Answer

The "Incorrect padding" error is sometimes caused by metadata that is also present in the encoded string. If your string looks something like 'data:image/png;base64,...base 64 stuff....' then you need to remove the first part before decoding it.

Say you have an image as a base64 encoded string; then try the snippet below:

    from PIL import Image
    from io import BytesIO
    from base64 import b64decode

    imagestr = 'data:image/png;base64,...base 64 stuff....'
    im = Image.open(BytesIO(b64decode(imagestr.split(',')[1])))
    im.save("image.png")
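If only the raw bytes are needed and PIL is not available, the same prefix stripping can be done with the standard library alone. This is a sketch under the assumption that the input is a data URL of the form data:<type>;base64,<payload>; the name decode_data_url is hypothetical.

```python
import base64


def decode_data_url(url: str) -> bytes:
    """Strip the 'data:...;base64,' header and decode the payload."""
    header, sep, payload = url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return base64.b64decode(payload)
```

Checking the header first avoids feeding a string with no comma, or a payload that is not Base64, to the decoder.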

User · Answer

Check the documentation of the data source you're trying to decode. Is it possible that you meant to use base64.urlsafe_b64decode(s) instead of base64.b64decode(s)? That's one reason you might have seen this error message.

    Decode string s using a URL-safe alphabet, which substitutes - instead of + and _ instead of / in the standard Base64 alphabet.

This is for example the case for various Google APIs, like Google's Identity Toolkit and Gmail payloads.
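When it is unclear which alphabet a source uses, and URL-safe producers often drop the padding as well, one option is to normalise both before decoding. The following is a sketch only: decode_any_alphabet is a made-up name, and it assumes the input is an ASCII str with no embedded whitespace.

```python
import base64


def decode_any_alphabet(data: str) -> bytes:
    """Decode standard or URL-safe Base64, with or without padding."""
    # Map the URL-safe characters back to the standard alphabet.
    standard = data.strip().translate(str.maketrans("-_", "+/"))
    # Restore however many '=' were stripped (0, 1 or 2 for intact data).
    return base64.b64decode(standard + "=" * (-len(standard) % 4))
```

For instance, "-_8" and "+/8=" both decode to the same two bytes.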

User · Answer

In my case I faced this error after deleting the venv for the particular project, and it was shown for every field. I tried changing the browser from Chrome to Edge, and that actually worked.
