Challenge 1: Base64 Encoding

Cryptopals

We’ve all used Convert.ToBase64String(), but what is actually happening under the covers? Sure, it takes a value and represents it using only characters from a set of 64, but how exactly does it do that? Up until now, I probably couldn’t have told you.

My favorite example for understanding how Base64 encoding works is actually from Wikipedia:

Base64 adapted from Wikipedia

Source text (ASCII):  M          a          n
Octets:               77 (0x4d)  97 (0x61)  110 (0x6e)
Bits:                 01001101   01100001   01101110
Base64 encoded
  Sextets:            19         22         5          46
  Character:          T          W          F          u
  Octets:             84 (0x54)  87 (0x57)  70 (0x46)  117 (0x75)

To Base64 encode, we take our original string, convert it into bytes (octets), take 6 bits at a time (sextets), and then look up that sextet’s corresponding value in our Base64 lookup table.
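To make the regrouping concrete, here is a minimal sketch of how three octets can be split into four sextets with bit shifts. The class and method names (SextetSketch, SplitThreeBytes) are made up for illustration, and it only handles a full group of exactly three bytes.

```csharp
using System;

public static class SextetSketch {
  // Hypothetical helper: splits exactly three bytes (24 bits) into four 6-bit values.
  public static int[] SplitThreeBytes(byte b0, byte b1, byte b2) {
    // Pack the three octets into a single 24-bit block.
    int block = (b0 << 16) | (b1 << 8) | b2;

    return new[] {
      (block >> 18) & 0x3F, // bits 23..18
      (block >> 12) & 0x3F, // bits 17..12
      (block >> 6) & 0x3F,  // bits 11..6
      block & 0x3F          // bits 5..0
    };
  }
}
```

Calling SplitThreeBytes(77, 97, 110), the bytes for "Man", should give 19, 22, 5 and 46, matching the sextets in the table.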

Hex to Bytes

In the challenge, we’re given a hex string to Base64 encode. Hexadecimal is Base16 notation (0–9, a–f), where each pair of hex digits represents one byte. For example, the byte 77 would be 4d. In C-family languages, you’ll often see this prefixed with 0x, making it 0x4d.
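As a quick illustration of that mapping, the following sketch converts a single byte to its two-digit hex form and back. The wrapper names (HexNotationSketch, ByteToHex, HexToByte) are hypothetical; the underlying calls are the standard "x2" format string and Convert.ToByte with a base of 16.

```csharp
using System;

public static class HexNotationSketch {
  // Formats a byte as two lowercase hex digits, e.g. 77 becomes "4d".
  public static string ByteToHex(byte value) => value.ToString("x2");

  // Parses two hex digits back into a byte, e.g. "4d" becomes 77.
  public static byte HexToByte(string twoHexDigits) => Convert.ToByte(twoHexDigits, 16);
}
```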

We typically don’t work directly with hex strings, and I’ve yet to find simple handling for hex strings in C#, so the first thing I did was turn the hex string into bytes:

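public static byte[] StringToBytes(string hex) {
  // Each byte is written as two hex characters, so the length must be even.
  if (hex.Length % 2 != 0) {
    throw new ArgumentException("Hex string must have an even length.", nameof(hex));
  }

  var hexAsBytes = new byte[hex.Length / 2];

  for (var i = 0; i < hex.Length; i += 2) {
    // Parse each two-character pair as a base-16 number.
    hexAsBytes[i / 2] = Convert.ToByte(hex.Substring(i, 2), 16);
  }

  return hexAsBytes;
}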

Here we walk the hex string two characters at a time and convert each pair to a byte using Convert.ToByte() with a fromBase of 16, the base for hexadecimal (0–f).
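If you are targeting .NET 5 or later, the framework's Convert.FromHexString() should do the same job in one call. The sketch below assumes that runtime; the wrapper names (HexSketch, HexToBytes) are only for illustration.

```csharp
using System;

public static class HexSketch {
  // Assumes .NET 5 or later, where Convert.FromHexString is available.
  // It throws a FormatException for invalid input such as an odd-length string.
  public static byte[] HexToBytes(string hex) => Convert.FromHexString(hex);
}
```

Writing the loop by hand is still worthwhile for the challenge, since it shows what the conversion actually involves.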

Bytes to Binary

Now that we have the bytes, we need their binary representation (the octets), which we can then regroup into sextets.

The simplest way I could find was to parse the byte array into a string representation of the binary. I’m sure there’s a more efficient way to do this, but I found this the clearest way for me to see what was going on.

public static string ToBinaryString(byte[] bytes) {
  var binaryOctets = bytes.Select(x =>
     Convert.ToString(x, 2) // get as 1's & 0's
       .PadLeft(8, '0')); // ensure always 8 chars long, padding with 0's on the left

  return string.Concat(binaryOctets); // concatenate into a single string
}

Octets to Sextets

Now that we have the octets, let’s get the sextets by taking 6 bits at a time and converting them back into integers (to simplify the next step):

public static List<int> OctetsToSextets(string bits) {
  var taken = 0;
  var sextets = new List<int>();

  while (taken < bits.Length) {
    // Take the next 6 characters, or fewer if we are at the end of the string.
    var sextet = bits.Substring(taken, Math.Min(6, bits.Length - taken));

    // Interpret the 1's & 0's as a base-2 number.
    sextets.Add(Convert.ToInt32(sextet, 2));
    taken += 6;
  }

  return sextets;
}
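The string-based approach favours clarity over speed. For comparison, here is a hedged sketch of one possible bit-shifting alternative that goes straight from bytes to sextets without building a binary string. The names (BitBufferSketch, BytesToSextets) are hypothetical, and any leftover bits at the end are padded with zeros on the right.

```csharp
using System.Collections.Generic;

public static class BitBufferSketch {
  // Hypothetical helper: regroups bytes into 6-bit values using a small bit buffer.
  public static List<int> BytesToSextets(byte[] bytes) {
    var sextets = new List<int>();
    int buffer = 0;   // bits read but not yet emitted
    int bitCount = 0; // how many bits in the buffer are valid

    foreach (var b in bytes) {
      buffer = (buffer << 8) | b; // append the next octet
      bitCount += 8;

      while (bitCount >= 6) {
        bitCount -= 6;
        sextets.Add((buffer >> bitCount) & 0x3F); // emit the top 6 bits
      }

      buffer &= (1 << bitCount) - 1; // keep only the leftover bits
    }

    if (bitCount > 0) {
      // Final partial sextet: shift left so the missing bits are zeros.
      sextets.Add((buffer << (6 - bitCount)) & 0x3F);
    }

    return sextets;
  }
}
```

For the bytes 0x4d, 0x61, 0x6e this should produce 19, 22, 5, 46, the same values as the string-based version.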

Base64 Lookup

Now that we have our sextets, we can look up the corresponding Base64 character for each one. So, the value 42 turns into q, and 4 into E.

private static string SextetsToBase64String(IList<int> sextets) {
  var base64String = string.Empty; // start empty rather than null so empty input yields ""

  foreach (var sextet in sextets) {
    base64String += Base64Lookup[sextet]; // each sextet (0-63) indexes into the table
  }

  return base64String;
}

private static readonly char[] Base64Lookup = {
  'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
  'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
  'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
  'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
  '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '/'
};

And if you chain the above together, you should end up with the hex string 4d616e being encoded as: TWFu.
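A simple way to sanity-check a hand-rolled encoder is to compare its output with the framework's own Convert.ToBase64String(). The sketch below is self-contained and its names (CrossCheckSketch, EncodeWithFramework) are hypothetical; it repeats the hex parsing loop so it can run on its own.

```csharp
using System;

public static class CrossCheckSketch {
  // Hypothetical helper: hex string in, framework-produced Base64 out.
  public static string EncodeWithFramework(string hex) {
    var bytes = new byte[hex.Length / 2];

    for (var i = 0; i < bytes.Length; i++) {
      bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
    }

    return Convert.ToBase64String(bytes);
  }
}
```

EncodeWithFramework("4d616e") returns TWFu, so that is the value your own chain of methods should reproduce.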

Padding

So, what happens when the original text length is not neatly divisible by 3? Well, we need to deal with padding.

Going back to that really useful Wikipedia example, we can see that if the final block is not 3 bytes long, then we need to pad both the final sextet and the encoded value itself.

Base64 adapted from Wikipedia

Source text (ASCII):  M          a
Octets:               77 (0x4d)  97 (0x61)
Bits:                 01001101   01100001   00
Base64 encoded
  Sextets:            19         22         4          Padding
  Character:          T          W          E          =
  Octets:             84 (0x54)  87 (0x57)  69 (0x45)  61 (0x3D)

So, the final sextet gets padded with 0s on the right, and any missing sextets become the padding character '='.
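The amount of padding follows directly from the input length: every 3 bytes become 4 characters, so 1 leftover byte needs two '=' and 2 leftover bytes need one. A small sketch of that arithmetic, with hypothetical names (PaddingSketch, PaddingCount, EncodedLength):

```csharp
public static class PaddingSketch {
  // Number of '=' characters needed for an input of byteCount bytes.
  public static int PaddingCount(int byteCount) => (3 - byteCount % 3) % 3;

  // Total length of the padded Base64 output: 4 characters per started 3-byte group.
  public static int EncodedLength(int byteCount) => (byteCount + 2) / 3 * 4;
}
```

For the two-byte input "Ma", PaddingCount(2) is 1 and EncodedLength(2) is 4, which matches the four characters in the table.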

We can update our OctetsToSextets with a check such as:

if (sextet.Length != 6) {
  sextet = sextet.PadRight(6, '0'); // if not 6 in length, pad right with 0's
}

And then, after creating our Base64 value in SextetsToBase64String, check to see if it’s missing any characters, padding with '=' as appropriate.

// A null or empty string needs no padding; otherwise pad up to a multiple of 4.
var remainder = (base64String?.Length ?? 0) % 4;
if (remainder == 2) base64String += "==";
else if (remainder == 3) base64String += "=";

So, testing our code, if we enter the hex string 4d61, then we should end up with TWE=.

There’s a much nicer implementation by David Zych which goes into more detail. I recommend checking it out.

We’ve only dealt with encoding here; however, you’ll need to handle decoding later in the challenges. You can either use Convert.FromBase64String() or implement the reverse of the above yourself.
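If you go the do-it-yourself route, decoding could look something like the sketch below: map each character back to its sextet, collect bits, and emit a byte whenever 8 are available. The names (Base64DecodeSketch, Decode) are hypothetical, and the sketch makes simplifying assumptions: it stops at the first '=' and does not validate that the padding or overall length is well formed.

```csharp
using System;
using System.Collections.Generic;

public static class Base64DecodeSketch {
  private const string Alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

  // Hypothetical helper: reverses the encoding steps for the standard alphabet.
  public static byte[] Decode(string base64) {
    var bytes = new List<byte>();
    int buffer = 0;   // bits read but not yet emitted
    int bitCount = 0; // how many bits in the buffer are valid

    foreach (var c in base64) {
      if (c == '=') break; // padding marks the end of the data

      var sextet = Alphabet.IndexOf(c); // reverse lookup: character to 0-63
      if (sextet < 0) throw new FormatException($"Invalid Base64 character '{c}'.");

      buffer = (buffer << 6) | sextet;
      bitCount += 6;

      if (bitCount >= 8) {
        bitCount -= 8;
        bytes.Add((byte)((buffer >> bitCount) & 0xFF)); // emit the top 8 bits
        buffer &= (1 << bitCount) - 1;                  // keep only the leftover bits
      }
    }

    // Any leftover bits here are the zero padding added during encoding.
    return bytes.ToArray();
  }
}
```

Decoding TWE= with this sketch should give the bytes 77 and 97, the original "Ma".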