Is there a way in C# to see if a string is Base 64 encoded other than just trying to convert it and see if there is an error? I have code code like this:
// Convert base64-encoded hash value into a byte array.
byte[] HashBytes = Convert.FromBase64String(Value);
I want to avoid the "Invalid character in a Base-64 string" exception that happens if the value is not valid base 64 string. I want to just check and return false instead of handling an exception because I expect that sometimes this value is not going to be a base 64 string. Is there some way to check before using the Convert.FromBase64String function?
Thanks!
Update:
Thanks for all of your answers. Here is an extension method you can all use so far it seems to make sure your string will pass Convert.FromBase64String without an exception. .NET seems to ignore all trailing and ending spaces when converting to base 64 so "1234" is valid and so is " 1234 "
public static bool IsBase64String(this string s)
{
s = s.Trim();
return (s.Length % 4 == 0) && Regex.IsMatch(s, @"^[a-zA-Z0-9\+/]*={0,3}$", RegexOptions.None);
}
For those wondering about performance of testing vs catching and exception, in most cases for this base 64 thing it is faster to check than to catch the exception until you reach a certain length. The smaller the length faster it is
In my very unscientific testing: For 10000 iterations for character length 100,000 - 110000 it was 2.7 times faster to test first.
For 1000 iterations for characters length 1 - 16 characters for total of 16,000 tests it was 10.9 times faster.
I am sure there is a point where it becomes better to test with the exception based method. I just don't know at what point that is.
This question is related to
c#
validation
base64
I believe the regex should be:
Regex.IsMatch(s, @"^[a-zA-Z0-9\+/]*={0,2}$")
Only matching one or two trailing '=' signs, not three.
s
should be the string that will be checked. Regex
is part of the System.Text.RegularExpressions
namespace.
public static bool IsBase64String1(string value)
{
if (string.IsNullOrEmpty(value))
{
return false;
}
try
{
Convert.FromBase64String(value);
if (value.EndsWith("="))
{
value = value.Trim();
int mod4 = value.Length % 4;
if (mod4 != 0)
{
return false;
}
return true;
}
else
{
return false;
}
}
catch (FormatException)
{
return false;
}
}
Knibb High football rules!
This should be relatively fast and accurate but I admit I didn't put it through a thorough test, just a few.
It avoids expensive exceptions, regex, and also avoids looping through a character set, instead using ascii ranges for validation.
public static bool IsBase64String(string s)
{
s = s.Trim();
int mod4 = s.Length % 4;
if(mod4!=0){
return false;
}
int i=0;
bool checkPadding = false;
int paddingCount = 1;//only applies when the first is encountered.
for(i=0;i<s.Length;i++){
char c = s[i];
if (checkPadding)
{
if (c != '=')
{
return false;
}
paddingCount++;
if (paddingCount > 3)
{
return false;
}
continue;
}
if(c>='A' && c<='z' || c>='0' && c<='9'){
continue;
}
switch(c){
case '+':
case '/':
continue;
case '=':
checkPadding = true;
continue;
}
return false;
}
//if here
//, length was correct
//, there were no invalid characters
//, padding was correct
return true;
}
Do decode, re encode and compare the result to original string
public static Boolean IsBase64(this String str)
{
if ((str.Length % 4) != 0)
{
return false;
}
//decode - encode and compare
try
{
string decoded = System.Text.Encoding.UTF8.GetString(System.Convert.FromBase64String(str));
string encoded = System.Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(decoded));
if (str.Equals(encoded, StringComparison.InvariantCultureIgnoreCase))
{
return true;
}
}
catch { }
return false;
}
Sure. Just make sure each character is within a-z
, A-Z
, 0-9
, /
, or +
, and the string ends with ==
. (At least, that's the most common Base64 implementation. You might find some implementations that use characters different from /
or +
for the last two characters.)
1) Use function as below:
string encoded = "WW91ckJhc2U2NHN0cmluZw==";
msgbox("Is string base64=" + IsBase64(encoded));
2) Below is the function:
public bool IsBase64(string base64String)
{
try
{
if (!base64String.Length < 1)
{
if (!base64String.Equals(Convert.ToBase64String(Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(Convert.FromBase64String(base64String)))), StringComparison.InvariantCultureIgnoreCase) & !System.Text.RegularExpressions.Regex.IsMatch(base64String, @"^[a-zA-Z0-9\+/]*={0,2}$"))
{
return false;
}
if ((base64String.Length % 4) != 0 || string.IsNullOrEmpty(base64String) || base64String.Length % 4 != 0 || base64String.Contains(" ") || base64String.Contains(Constants.vbTab) || base64String.Contains(Constants.vbCr) || base64String.Contains(Constants.vbLf))
{
return false;
}
}
else
{
return false;
}
return true;
}
catch (FormatException ex)
{
return false;
}
}
Why not just catch the exception, and return False?
This avoids additional overhead in the common case.
Imho this is not really possible. All posted solutions fails for strings like "test" and so on. If they can be divided through 4, are not null or empty, and if they are a valid base64 character, they will pass all tests. That can be many strings ...
So there is no real solution other than knowing that this is a base 64 encoded string. What I've come up with is this:
if (base64DecodedString.StartsWith("<xml>")
{
// This was really a base64 encoded string I was expecting. Yippie!
}
else
{
// This is gibberish.
}
I expect that the decoded string begins with a certain structure, so I check for that.
Yes, since Base64 encodes binary data into ASCII strings using a limited set of characters, you can simply check it with this regular expression:
/^[A-Za-z0-9\=\+\/\s\n]+$/s
which will assure the string only contains A-Z, a-z, 0-9, '+', '/', '=', and whitespace.
I will use like this so that I don't need to call the convert method again
public static bool IsBase64(this string base64String,out byte[] bytes)
{
bytes = null;
// Credit: oybek http://stackoverflow.com/users/794764/oybek
if (string.IsNullOrEmpty(base64String) || base64String.Length % 4 != 0
|| base64String.Contains(" ") || base64String.Contains("\t") || base64String.Contains("\r") || base64String.Contains("\n"))
return false;
try
{
bytes=Convert.FromBase64String(base64String);
return true;
}
catch (Exception)
{
// Handle the exception
}
return false;
}
I would suggest creating a regex to do the job. You'll have to check for something like this: [a-zA-Z0-9+/=] You'll also have to check the length of the string. I'm not sure on this one, but i'm pretty sure if something gets trimmed (other than the padding "=") it would blow up.
Or better yet check out this stackoverflow question
I prefer this usage:
public static class StringExtensions
{
/// <summary>
/// Check if string is Base64
/// </summary>
/// <param name="base64"></param>
/// <returns></returns>
public static bool IsBase64String(this string base64)
{
//https://stackoverflow.com/questions/6309379/how-to-check-for-a-valid-base64-encoded-string
Span<byte> buffer = new Span<byte>(new byte[base64.Length]);
return Convert.TryFromBase64String(base64, buffer, out int _);
}
}
Then usage
if(myStr.IsBase64String()){
...
}
Use Convert.TryFromBase64String from C# 7.2
public static bool IsBase64String(string base64)
{
Span<byte> buffer = new Span<byte>(new byte[base64.Length]);
return Convert.TryFromBase64String(base64, buffer , out int bytesParsed);
}
Just for the sake of completeness I want to provide some implementation. Generally speaking Regex is an expensive approach, especially if the string is large (which happens when transferring large files). The following approach tries the fastest ways of detection first.
public static class HelperExtensions {
// Characters that are used in base64 strings.
private static Char[] Base64Chars = new[] { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '/' };
/// <summary>
/// Extension method to test whether the value is a base64 string
/// </summary>
/// <param name="value">Value to test</param>
/// <returns>Boolean value, true if the string is base64, otherwise false</returns>
public static Boolean IsBase64String(this String value) {
// The quickest test. If the value is null or is equal to 0 it is not base64
// Base64 string's length is always divisible by four, i.e. 8, 16, 20 etc.
// If it is not you can return false. Quite effective
// Further, if it meets the above criterias, then test for spaces.
// If it contains spaces, it is not base64
if (value == null || value.Length == 0 || value.Length % 4 != 0
|| value.Contains(' ') || value.Contains('\t') || value.Contains('\r') || value.Contains('\n'))
return false;
// 98% of all non base64 values are invalidated by this time.
var index = value.Length - 1;
// if there is padding step back
if (value[index] == '=')
index--;
// if there are two padding chars step back a second time
if (value[index] == '=')
index--;
// Now traverse over characters
// You should note that I'm not creating any copy of the existing strings,
// assuming that they may be quite large
for (var i = 0; i <= index; i++)
// If any of the character is not from the allowed list
if (!Base64Chars.Contains(value[i]))
// return false
return false;
// If we got here, then the value is a valid base64 string
return true;
}
}
EDIT
As suggested by Sam, you can also change the source code slightly. He provides a better performing approach for the last step of tests. The routine
private static Boolean IsInvalid(char value) {
var intValue = (Int32)value;
// 1 - 9
if (intValue >= 48 && intValue <= 57)
return false;
// A - Z
if (intValue >= 65 && intValue <= 90)
return false;
// a - z
if (intValue >= 97 && intValue <= 122)
return false;
// + or /
return intValue != 43 && intValue != 47;
}
can be used to replace if (!Base64Chars.Contains(value[i]))
line with if (IsInvalid(value[i]))
The complete source code with enhancements from Sam will look like this (removed comments for clarity)
public static class HelperExtensions {
public static Boolean IsBase64String(this String value) {
if (value == null || value.Length == 0 || value.Length % 4 != 0
|| value.Contains(' ') || value.Contains('\t') || value.Contains('\r') || value.Contains('\n'))
return false;
var index = value.Length - 1;
if (value[index] == '=')
index--;
if (value[index] == '=')
index--;
for (var i = 0; i <= index; i++)
if (IsInvalid(value[i]))
return false;
return true;
}
// Make it private as there is the name makes no sense for an outside caller
private static Boolean IsInvalid(char value) {
var intValue = (Int32)value;
if (intValue >= 48 && intValue <= 57)
return false;
if (intValue >= 65 && intValue <= 90)
return false;
if (intValue >= 97 && intValue <= 122)
return false;
return intValue != 43 && intValue != 47;
}
}
The answer must depend on the usage of the string. There are many strings that may be "valid base64" according to the syntax suggested by several posters, but that may "correctly" decode, without exception, to junk. Example: the 8char string Portland
is valid Base64. What is the point of stating that this is valid Base64? I guess that at some point you'd want to know that this string should or should not be Base64 decoded.
In my case, I have Oracle connection strings that may be in plain text like:
Data source=mydb/DBNAME;User Id=Roland;Password=.....`
or in base64 like
VXNlciBJZD1sa.....................................==
I just have to check for the presence of a semicolon, because that proves that it is NOT base64, which is of course faster than any above method.
I know you said you didn't want to catch an exception. But, because catching an exception is more reliable, I will go ahead and post this answer.
public static bool IsBase64(this string base64String) {
// Credit: oybek https://stackoverflow.com/users/794764/oybek
if (string.IsNullOrEmpty(base64String) || base64String.Length % 4 != 0
|| base64String.Contains(" ") || base64String.Contains("\t") || base64String.Contains("\r") || base64String.Contains("\n"))
return false;
try{
Convert.FromBase64String(base64String);
return true;
}
catch(Exception exception){
// Handle the exception
}
return false;
}
Update: I've updated the condition thanks to oybek to further improve reliability.
Check Base64 or normal string
public bool IsBase64Encoded(String str)
{
try
{
// If no exception is caught, then it is possibly a base64 encoded string
byte[] data = Convert.FromBase64String(str);
// The part that checks if the string was properly padded to the
// correct length was borrowed from d@anish's solution
return (str.Replace(" ","").Length % 4 == 0);
}
catch
{
// If exception is caught, then it is not a base64 encoded string
return false;
}
}
I have just had a very similar requirement where I am letting the user do some image manipulation in a <canvas>
element and then sending the resulting image retrieved with .toDataURL()
to the backend. I wanted to do some server validation before saving the image and have implemented a ValidationAttribute
using some of the code from other answers:
[AttributeUsage(AttributeTargets.Property, AllowMultiple = false, Inherited = false)]
public class Bae64PngImageAttribute : ValidationAttribute
{
public override bool IsValid(object value)
{
if (value == null || string.IsNullOrWhiteSpace(value as string))
return true; // not concerned with whether or not this field is required
var base64string = (value as string).Trim();
// we are expecting a URL type string
if (!base64string.StartsWith("data:image/png;base64,"))
return false;
base64string = base64string.Substring("data:image/png;base64,".Length);
// match length and regular expression
if (base64string.Length % 4 != 0 || !Regex.IsMatch(base64string, @"^[a-zA-Z0-9\+/]*={0,3}$", RegexOptions.None))
return false;
// finally, try to convert it to a byte array and catch exceptions
try
{
byte[] converted = Convert.FromBase64String(base64string);
return true;
}
catch(Exception)
{
return false;
}
}
}
As you can see I am expecting an image/png type string, which is the default returned by <canvas>
when using .toDataURL()
.
Source: Stackoverflow.com