Byte latent transformer (BLT) uses a special architecture to learn directly from raw bytes as opposed to tokenized chunks of text.