[python] Python: Ignore 'Incorrect padding' error when base64 decoding

There are two ways to correct the input data described here, or, more specifically and in line with the OP, to make Python module base64's b64decode method able to process the input data to something without raising an un-caught exception:

  1. Append == to the end of the input data and call base64.b64decode(...)
  2. If that raises an exception, then

    i. Catch it via try/except,

    ii. (R?)Strip any = characters from the input data (N.B. this may not be necessary),

    iii. Append A== to the input data (A== through P== will work),

    iv. Call base64.b64decode(...) with those A==-appended input data

The result from Item 1. or Item 2. above will yield the desired result.

Caveats

This does not guarantee the decoded result will be what was originally encoded, but it will (sometimes?) give the OP enough to work with:

Even with corruption I want to get back to the binary because I can still get some useful info from the ASN.1 stream").

See What we know and Assumptions below.

TL;DR

From some quick tests of base64.b64decode(...)

  1. it appears that it ignores non-[A-Za-z0-9+/] characters; that includes ignoring =s unless they are the last character(s) in a parsed group of four, in which case the =s terminate the decoding (a=b=c=d= gives the same result as abc=, and a==b==c== gives the same result as ab==).

  2. It also appears that all characters appended are ignored after the point where base64.b64decode(...) terminates decoding e.g. from an = as the fourth in a group.

As noted in several comments above, there are either zero, or one, or two, =s of padding required at the end of input data for when the [number of parsed characters to that point modulo 4] value is 0, or 3, or 2, respectively. So, from items 3. and 4. above, appending two or more =s to the input data will correct any [Incorrect padding] problems in those cases.

HOWEVER, decoding cannot handle the case where the [total number of parsed characters modulo 4] is 1, because it takes a least two encoded characters to represent the first decoded byte in a group of three decoded bytes. In uncorrupted encoded input data, this [N modulo 4]=1 case never happens, but as the OP stated that characters may be missing, it could happen here. That is why simply appending =s will not always work, and why appending A== will work when appending == does not. N.B. Using [A] is all but arbitrary: it adds only cleared (zero) bits to the decoded, which may or not be correct, but then the object here is not correctness but completion by base64.b64decode(...) sans exceptions.

What we know from the OP and especially subsequent comments is

  • It is suspected that there are missing data (characters) in the Base64-encoded input data
  • The Base64 encoding uses the standard 64 place-values plus padding: A-Z; a-z; 0-9; +; /; = is padding. This is confirmed, or at least suggested, by the fact that openssl enc ... works.

Assumptions

  • The input data contain only 7-bit ASCII data
  • The only kind of corruption is missing encoded input data
  • The OP does not care about decoded output data at any point after that corresponding to any missing encoded input data

Github

Here is a wrapper to implement this solution:

https://github.com/drbitboy/missing_b64