UNOFFICIAL TAK STREAM FORMAT

INTRODUCTION

The TAK format was created by Thomas Becker. Thomas has to date only released a binary encoder/decoder application and an SDK consisting of a binary decoding library (.dll) and .h interface file. There is no other official documentation or source code related to the format.

This document describes the v1.0.x TAK stream format. Thomas has stated that later versions of TAK will change the stream format.

This document currently covers only the container format. It is the result of examining a variety of test streams encoded by the reference encoder. It is NOT an official specification.

Hopefully in time some decoder source code or documentation on the content of the audio frames will become available, but this document contains enough information for the following manipulations of a TAK stream:

  1. Extracting metadata such as audio format (samplerate, bits/sample, channels) and duration from a TAK stream;
  2. Testing a TAK stream for errors, based on the many embedded CRCs;
  3. Manipulation of the seektable;
  4. Manipulation of the STREAMINFO blocks in frame headers (e.g. to increase or decrease their frequency).

This document borrows terminology (and some small amounts of text) from the FLAC documentation - thanks to Josh Coalson for his codec and documentation.

This document is (C) Dave Chapman 2008. Please email comments/corrections/additions to "dave at dchapman punto com"

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

A copy of the license is available at http://www.gnu.org/copyleft/fdl.html.

Overview

All integers in a TAK bitstream are stored in little-endian format.

Metadata blocks

The stream header consists of a number of metadata blocks, followed by a metadata block of type 0x00 to indicate the end of the stream header. The reference decoder (v1.0.4) will skip unknown metadata blocks.

Each metadata block starts with the following 4 bytes:

Offset  Length  Name          Contents
0       1       Block type    Block type code:
                                00 - ENDOFMETADATA
                                01 - STREAMINFO
                                02 - SEEKTABLE
                                03 - WAVEMETADATA
                                04 - ENCODERINFO
                                05 - PADDING
1       3       Block length  24-bit little-endian integer storing the length of the
                              block in bytes, excluding this 4-byte block header

The reference encoder (v1.0.4) writes the metadata blocks in the order 01, 02, 04, 03, 05, but the decoder will accept the metadata blocks in a different order, as long as 01 is first and 00 is last.
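
As an illustration of the 4-byte block header described above, here is a minimal sketch in C. The type tak_block_header and the function tak_parse_block_header are hypothetical names invented for this example; they are not part of the TAK SDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type for this sketch (not from the TAK SDK). */
typedef struct {
    uint8_t  type;    /* block type code; 0x00 to 0x05 are documented above */
    uint32_t length;  /* payload length in bytes, excluding the 4-byte header */
} tak_block_header;

/* Decode the 4-byte metadata block header at buf.
 * Returns 0 on success, -1 if fewer than 4 bytes are available. */
static int tak_parse_block_header(const uint8_t *buf, size_t avail,
                                  tak_block_header *out)
{
    if (avail < 4)
        return -1;
    out->type = buf[0];
    /* 24-bit little-endian length stored in bytes 1..3 */
    out->length = (uint32_t)buf[1]
                | ((uint32_t)buf[2] << 8)
                | ((uint32_t)buf[3] << 16);
    return 0;
}
```

A reader of the stream header could call this repeatedly, skipping length bytes after each header, until a block of type 0x00 is seen.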

00 - ENDOFMETADATA

This is the last metadata block and always has length zero.

01 - STREAMINFO

Length - 13 bytes (0x0d)

Offset  Length  Name                     Contents
0       1       unknown                  unknown - e.g. 0x80
1       1       num_samples (bits 0-1),  Bits 6-7 contain the two least-significant bits of num_samples
                framesizecode,           Bits 2-5 are framesizecode - (buf[1] >> 2) & 0xf
                unknown                  Bits 0-1 are unknown
2       4       num_samples (bits 2-33)  A 34-bit integer containing the total number of samples per
                                         channel in the stream. (This field contains the high 32 bits -
                                         the low 2 bits are in the previous byte)
6       3       samplerate               (Samplerate - 6000) (probably only 17 bits are used, i.e. 128*1024 values)
9       1       unknown,                 Bits 4-7 are unknown
                channels,                Bit 3 indicates the number of channels (0 = mono, 1 = stereo)
                samplesize,              Bits 1-2 indicate the sample size (00 = 8-bit, 01 = 16-bit and 10 = 24-bit)
                unknown                  Bit 0 is unknown
10      3       CRC                      24-bit CRC of bytes 0 to 9
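
The bit layout above can be sketched in C as follows. This is only an illustration under stated assumptions: the struct and function names are hypothetical, the whole 24-bit samplerate field is used (the table suggests only 17 bits may be significant), and the undocumented sample size code 11 is reported as -1.

```c
#include <stdint.h>

/* Hypothetical decoded form of the first 10 bytes of STREAMINFO. */
typedef struct {
    uint64_t num_samples;      /* 34-bit count of samples per channel */
    unsigned framesizecode;    /* 4-bit frame size code */
    uint32_t samplerate;       /* in Hz */
    unsigned channels;         /* 1 = mono, 2 = stereo */
    int      bits_per_sample;  /* 8, 16, 24, or -1 for the undocumented code */
} tak_streaminfo;

static void tak_parse_streaminfo(const uint8_t buf[10], tak_streaminfo *si)
{
    static const int sample_bits[4] = { 8, 16, 24, -1 };

    /* Bytes 2..5 hold bits 2-33 of num_samples, little-endian. */
    uint32_t high = (uint32_t)buf[2]
                  | ((uint32_t)buf[3] << 8)
                  | ((uint32_t)buf[4] << 16)
                  | ((uint32_t)buf[5] << 24);

    /* Bits 6-7 of byte 1 hold the two least-significant bits. */
    si->num_samples = ((uint64_t)high << 2) | (uint64_t)(buf[1] >> 6);
    si->framesizecode = (buf[1] >> 2) & 0xf;

    /* Assumption: all 24 bits of the field are used; stored value is rate - 6000. */
    si->samplerate = ((uint32_t)buf[6]
                    | ((uint32_t)buf[7] << 8)
                    | ((uint32_t)buf[8] << 16)) + 6000;

    si->channels = ((buf[9] >> 3) & 1) + 1;
    si->bits_per_sample = sample_bits[(buf[9] >> 1) & 3];
}
```

The duration in seconds then follows as num_samples / samplerate.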

Based on the definition of tak_str_FrameSizeType enumeration in the SDK, the frame size code is defined as follows:

Notes:

The first four (ms) frame sizes are approximate. For example, 125ms of a 44.1kHz stream is 5512.5 samples, which is rounded down to give a frame size of 5512 samples (124.9887ms).
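
The rounding in the note above can be reproduced with integer arithmetic. The function name below is hypothetical, and it is an assumption that the encoder simply truncates in this way for every samplerate; only the 44.1kHz case is confirmed by the example in the text.

```c
#include <stdint.h>

/* Samples per frame for a nominal frame duration given in milliseconds.
 * Assumption: the fractional part is discarded (rounded down). */
static uint32_t tak_ms_frame_samples(uint32_t samplerate, uint32_t ms)
{
    /* 64-bit intermediate avoids overflow for large samplerates. */
    return (uint32_t)(((uint64_t)samplerate * ms) / 1000);
}
```

For a 44100Hz stream and a 125ms frame this gives 5512 samples, matching the example.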

02 - SEEKTABLE

Each seekpoint entry is a 40-bit integer pointing to the offset (relative to the first audio frame in the stream) of the first byte of an audio frame.

Offset  Length              Name                        Contents
0       2                   num_seekpoints              Number of seekpoints in the seektable.
2       1                   unknown                     unknown - seems to always be 0xe0
3       1                   seek interval               (Seek interval - 1) e.g. 0x00 represents a 1s seek interval.
4       num_seekpoints * 5  seekpoints[num_seekpoints]  Array of 40-bit seekpoints.
var     3                   CRC                         24-bit CRC

Notes:

The reference encoder appears to use only seek intervals which are a power of two - i.e. 1s, 2s, 4s, 8s, 16s, 32s, 64s, 128s and 256s. When the frame size is not an integer divisor of the samples/second, the reference decoder displays the interval adjusted to a whole number of frames - e.g. 8s in a 44.1kHz stream with "125ms" frames (64 frames of 5512 samples) would be displayed as 7999ms.
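
A minimal sketch of reading the seektable layout described above. All function names are hypothetical. The sketch only checks that the block is long enough for the declared number of seekpoints plus the trailing CRC; it does not verify the CRC itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 40-bit little-endian integer from 5 bytes. */
static uint64_t tak_read_le40(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 4; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Inspect a SEEKTABLE block payload of len bytes.
 * Returns the number of seekpoints and stores the seek interval in seconds,
 * or returns -1 if the block is too short for its declared contents. */
static long tak_seektable_info(const uint8_t *blk, size_t len,
                               unsigned *interval_s)
{
    if (len < 4 + 3)
        return -1;
    unsigned n = (unsigned)blk[0] | ((unsigned)blk[1] << 8);
    if (len < 4 + (size_t)n * 5 + 3)
        return -1;
    *interval_s = (unsigned)blk[3] + 1u;  /* stored as interval - 1 */
    return (long)n;
}

/* Byte offset of seekpoint `index`, relative to the first audio frame.
 * The caller must ensure index is less than the seekpoint count. */
static uint64_t tak_seekpoint(const uint8_t *blk, unsigned index)
{
    return tak_read_le40(blk + 4 + (size_t)index * 5);
}
```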

03 - WAVEMETADATA

Offset                        Length        Name          Contents
0                             3             HeaderLength  24-bit length in bytes of header data
3                             3             FooterLength  24-bit length in bytes of footer data
6                             HeaderLength  HeaderData
6+HeaderLength                FooterLength  FooterData
6+HeaderLength+FooterLength   3             CRC           24-bit CRC
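
The layout above can be split into its header and footer parts as in the following sketch. The names tak_wavemeta, tak_le24 and tak_split_wavemeta are hypothetical, and the returned pointers simply alias the input buffer rather than copying it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view onto a WAVEMETADATA block payload. */
typedef struct {
    const uint8_t *header;
    uint32_t       header_len;
    const uint8_t *footer;
    uint32_t       footer_len;
} tak_wavemeta;

/* Read a 24-bit little-endian integer. */
static uint32_t tak_le24(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}

/* Split a WAVEMETADATA payload of len bytes into header and footer data.
 * Returns 0 on success, -1 if the declared lengths do not fit in the block. */
static int tak_split_wavemeta(const uint8_t *blk, size_t len, tak_wavemeta *out)
{
    if (len < 6 + 3)
        return -1;
    uint32_t hl = tak_le24(blk);
    uint32_t fl = tak_le24(blk + 3);
    /* two length fields + header + footer + 3-byte CRC must fit */
    if ((uint64_t)6 + hl + fl + 3 > len)
        return -1;
    out->header = blk + 6;
    out->header_len = hl;
    out->footer = blk + 6 + hl;
    out->footer_len = fl;
    return 0;
}
```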

04 - ENCODERINFO

This block contains information on the version of the encoder used to create the stream and the preset options selected.

For example, TAK v1.0.4 is major version number 1, minor version 0, and revision 4.

OffsetLengthNameContents
01 Revision 0x04 - revision
11 Minor 0x00 - Minor version
21 Major 0x01 - Major version
31 Preset 0x02 - Preset:
high nibble is "evaluation" - 0 = normal, 1 = extra, 2 = max
low nibble is 0 to 5
e.g -p3e would be coded as 0x13
43 CRC 24-bit CRC of bytes 0-3
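
The ENCODERINFO fields can be unpacked as in the following sketch. The struct and function names are hypothetical, and the evaluation value is returned as the raw nibble (0 = normal, 1 = extra, 2 = max) without validation.

```c
#include <stdint.h>

/* Hypothetical decoded form of the first 4 bytes of ENCODERINFO. */
typedef struct {
    unsigned major;
    unsigned minor;
    unsigned revision;
    unsigned level;       /* low nibble of the preset byte, 0 to 5 */
    unsigned evaluation;  /* high nibble: 0 = normal, 1 = extra, 2 = max */
} tak_encoderinfo;

static void tak_parse_encoderinfo(const uint8_t buf[4], tak_encoderinfo *ei)
{
    /* The version is stored least-significant part first. */
    ei->revision   = buf[0];
    ei->minor      = buf[1];
    ei->major      = buf[2];
    ei->level      = buf[3] & 0x0f;
    ei->evaluation = buf[3] >> 4;
}
```

For the bytes 04 00 01 13 this yields version 1.0.4 with level 3 and evaluation 1, i.e. the -p3e example from the table.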

Notes:

For preset, Tak 1.0.1 allows T/F/N/H/E or 0-4 for Turbo/Fast/Normal(default)/High/Extra. Tak 1.0.4 just uses 0-5.

The -fi option to Tak 1.0.1 displays "Unknown" for -p5 streams.

Tak 1.0.2 added the "Insane" preset, which was later renamed to 5.

If this metadata block is missing, then the reference decoder will display "V 1.0.0, -p2" as the encoder info for the stream.

05 - PADDING

The stream header can contain multiple padding blocks. These are filled with zeros, and do not include a CRC.

AUDIO FRAMES

Every audio frame consists of a frame header with a 24-bit CRC calculated over the header data, followed by the frame data which also ends with a 24-bit CRC.

The frame header is as follows:

Offset  Length  Name                    Contents
0       2       Sync code               0xff 0xa0
2       3       Frame number and flags  The high 21 bits (when these 3 bytes are treated as a little-endian
                                        24-bit int) are the frame number (starting at 0 for the first frame)
                                        Bit 0 == 1 indicates the final frame in a stream; the header then
                                        includes the frame size of this frame (even if it is the same as
                                        the frame size of the preceding frames)
                                        Bit 1 == 1 indicates that STREAMINFO appears in this frame header
                                        Bit 2 is unknown
var     2       Frame size              If Bit 0 in flags == 1, then 2 bytes contain (framesize - 1) where
                                        framesize is the number of samples per channel encoded in this frame.
var     11      Stream info             If Bit 1 in flags == 1, then 10 bytes contain the stream info (same
                                        format as STREAMINFO in stream header), followed by a trailing 0x00
                                        (meaning unknown)
var     3       CRC                     24-bit CRC calculated over whole header, including sync code
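
The variable-length frame header can be walked as in the sketch below. The names are hypothetical, and it is an assumption that the optional fields appear in the order listed in the table (frame size first, then stream info). The header CRC is counted in the length but not verified here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded frame header. */
typedef struct {
    uint32_t frame_number;    /* 21-bit frame number, starting at 0 */
    int      is_last;         /* flag bit 0 */
    int      has_streaminfo;  /* flag bit 1 */
    uint32_t frame_samples;   /* samples per channel; only set when is_last */
    size_t   header_len;      /* total header length including the 3-byte CRC */
} tak_frame_header;

/* Returns 0 on success, -1 on a bad sync code or truncated input. */
static int tak_parse_frame_header(const uint8_t *buf, size_t avail,
                                  tak_frame_header *fh)
{
    if (avail < 5 || buf[0] != 0xff || buf[1] != 0xa0)
        return -1;

    uint32_t v = (uint32_t)buf[2]
               | ((uint32_t)buf[3] << 8)
               | ((uint32_t)buf[4] << 16);
    fh->frame_number   = v >> 3;
    fh->is_last        = (int)(v & 1);
    fh->has_streaminfo = (int)((v >> 1) & 1);
    fh->frame_samples  = 0;

    size_t pos = 5;
    if (fh->is_last) {
        if (avail < pos + 2)
            return -1;
        /* stored as framesize - 1 */
        fh->frame_samples = ((uint32_t)buf[pos] | ((uint32_t)buf[pos + 1] << 8)) + 1;
        pos += 2;
    }
    if (fh->has_streaminfo)
        pos += 11;  /* 10 bytes of stream info plus the trailing 0x00 */
    pos += 3;       /* header CRC */
    if (avail < pos)
        return -1;
    fh->header_len = pos;
    return 0;
}
```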

Notes:

The main contents of the audio frame are unknown:

Offset  Length   Name     Contents
0       unknown  unknown  unknown
var     3        CRC      24-bit CRC calculated over main body of audio frame.

It is not known how to determine the length of an audio frame from its contents. One workaround is to calculate the CRC incrementally, byte by byte, and compare it with the following three bytes in the stream. This method appears to work, but can sometimes give false positives - i.e. apparent CRC matches in the middle of the frame. Therefore, an application performing this test also needs to check that the two bytes after the CRC are the sync code for the next frame (or that it has reached the end of the file).
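
The scanning method described above might be sketched as follows. The function names are hypothetical. The sketch assumes the stored CRC is little-endian, like the other integers in the stream, and that the input begins at the first byte after the frame header CRC. The CRC parameters are those of RFC 2440.

```c
#include <stddef.h>
#include <stdint.h>

#define TAK_CRC24_INIT 0xb704ceUL
#define TAK_CRC24_POLY 0x1864cfbUL

/* Feed one byte into a running CRC-24 (same algorithm as RFC 2440). */
static uint32_t tak_crc24_update(uint32_t crc, uint8_t byte)
{
    crc ^= (uint32_t)byte << 16;
    for (int i = 0; i < 8; i++) {
        crc <<= 1;
        if (crc & 0x1000000UL)
            crc ^= TAK_CRC24_POLY;
    }
    return crc & 0xffffffUL;
}

/* body:  first byte after the frame header CRC
 * avail: number of bytes from body to the end of the file
 * Returns the length of the frame body including its 3-byte CRC,
 * or 0 if no plausible frame end was found. */
static size_t tak_find_frame_end(const uint8_t *body, size_t avail)
{
    uint32_t crc = TAK_CRC24_INIT;

    for (size_t n = 0; n + 3 <= avail; n++) {
        /* Assumption: the CRC is stored least-significant byte first. */
        uint32_t stored = (uint32_t)body[n]
                        | ((uint32_t)body[n + 1] << 8)
                        | ((uint32_t)body[n + 2] << 16);
        if (stored == crc) {
            size_t end = n + 3;
            if (end == avail)
                return end;  /* reached the end of the file */
            if (end + 2 <= avail && body[end] == 0xff && body[end + 1] == 0xa0)
                return end;  /* next frame's sync code follows */
        }
        crc = tak_crc24_update(crc, body[n]);
    }
    return 0;
}
```

Because the CRC is updated one byte at a time, the scan is linear in the frame length. A sync code inside the frame data that happens to follow a matching CRC would still produce a false positive.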

CRC calculation

Every byte (apart from those where the content is fixed) in a TAK stream is protected by a 24-bit CRC. This CRC is calculated in the same way as the CRC specified in RFC2440. To quote that RFC:

6.1. An Implementation of the CRC-24 in "C"

       #define CRC24_INIT 0xb704ceL
       #define CRC24_POLY 0x1864cfbL

       typedef long crc24;
       crc24 crc_octets(unsigned char *octets, size_t len)
       {
           crc24 crc = CRC24_INIT;
           int i;

           while (len--) {
               crc ^= (*octets++) << 16;
               for (i = 0; i < 8; i++) {
                   crc <<= 1;
                   if (crc & 0x1000000)
                       crc ^= CRC24_POLY;
               }
           }
           return crc & 0xffffffL;
       }
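
As a usage illustration, the following sketch checks a block that ends with a 3-byte CRC, such as the metadata blocks described earlier. The function names are hypothetical. It is an assumption that the CRC is stored least-significant byte first, in line with the other integers in the stream. Fixed-width unsigned types are used in place of the RFC's long, which does not change the result.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-24 as in RFC 2440, written with fixed-width unsigned types. */
static uint32_t tak_crc24(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xb704ceUL;
    while (len--) {
        crc ^= (uint32_t)(*data++) << 16;
        for (int i = 0; i < 8; i++) {
            crc <<= 1;
            if (crc & 0x1000000UL)
                crc ^= 0x1864cfbUL;
        }
    }
    return crc & 0xffffffUL;
}

/* Check a block whose last 3 bytes are the CRC of all preceding bytes.
 * Returns 1 if the CRC matches, 0 otherwise. */
static int tak_check_trailing_crc(const uint8_t *blk, size_t len)
{
    if (len < 3)
        return 0;
    size_t n = len - 3;
    /* Assumption: stored little-endian. */
    uint32_t stored = (uint32_t)blk[n]
                    | ((uint32_t)blk[n + 1] << 8)
                    | ((uint32_t)blk[n + 2] << 16);
    return tak_crc24(blk, n) == stored;
}
```

For a STREAMINFO block, for example, this would be called with the 13-byte block payload, so that the CRC covers bytes 0 to 9.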
