UNOFFICIAL TAK STREAM FORMAT
INTRODUCTION
The TAK format was created by Thomas Becker. To date, Thomas has only released a binary encoder/decoder application and an SDK consisting of a binary decoding library (.dll) and a .h interface file. There is no other official documentation or source code related to the format.
This document describes the v1.0.x TAK stream format. Thomas has stated that later versions of TAK will change the stream format.
This document currently covers only the container format and is the result of examining a variety of test streams encoded by the reference encoder. It is NOT an official specification.
Hopefully, decoder source code or documentation on the content of the audio frames will become available in time. Meanwhile, this document contains enough information for the following operations on a TAK stream:
- Extracting metadata such as audio format (samplerate, bits/sample, channels) and duration from a TAK stream;
- Testing a TAK stream for errors, based on the many embedded CRCs;
- Manipulation of the seektable;
- Manipulation of the STREAMINFO blocks in frame headers (e.g. to increase or decrease their frequency).
This document borrows terminology (and some small amounts of text) from the FLAC documentation - thanks to Josh Coalson for his codec and documentation.
This document is (C) Dave Chapman 2008. Please email comments/corrections/additions to "dave at dchapman punto com"
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is available at http://www.gnu.org/copyleft/fdl.html.
Overview
All integers in a TAK bitstream are stored in little-endian format.
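Because every multi-byte field is little-endian, a parser needs only a few byte-assembly helpers. The C sketch below is one way to write them; the function names (tak_le16 and so on) are hypothetical and not part of the SDK. The 40-bit reader returns a 64-bit value because seekpoints do not fit in 32 bits.

```c
#include <stdint.h>

/* Little-endian readers for the integer widths used in TAK streams.
   The caller must ensure enough bytes are available at p. */
static uint32_t tak_le16(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}

static uint32_t tak_le24(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}

static uint32_t tak_le32(const uint8_t *p)
{
    return tak_le24(p) | ((uint32_t)p[3] << 24);
}

/* 40-bit values (e.g. seekpoints) need a 64-bit result. */
static uint64_t tak_le40(const uint8_t *p)
{
    return (uint64_t)tak_le32(p) | ((uint64_t)p[4] << 32);
}
```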
- A TAK bitstream consists of the "tBaK" marker at the beginning of the stream, followed by a mandatory metadata block (called the STREAMINFO block), any number of other metadata blocks, then the audio frames, and then an optional APEv2 tag.
- TAK supports up to 256 metadata block types (the block type is a single byte); v1.0.4 of the reference encoder creates the following:
- ENDOFMETADATA: This block appears as the last metadata block and contains no payload. It marks the end of the metadata blocks.
- STREAMINFO: This block has information about the whole stream, like sample rate, number of channels, total number of samples, etc. It must be present as the first metadata block in the stream. Other metadata blocks may follow; the decoder skips any that it does not understand.
- SEEKTABLE: This block contains a list of offsets to the start of audio frames in the stream, relative to the offset of the first audio frame. By default, the reference encoder creates seekpoints at approximately one-second intervals. If a large stream is encoded via a pipe and the -sts parameter is set too small, a seektable with seekpoints at larger intervals is written into the available space.
- WAVEMETADATA: This block is used to recreate the source WAVE file. It stores the header and, if it existed, footer data from the WAVE file.
- ENCODERINFO: This block has information about the encoder and encoding options used to create the stream.
- PADDING: This block allows for up to 2^24 - 1 bytes of padding (the largest value of the 24-bit block length) and, unlike all other blocks, does not contain a CRC. This block is created when encoding streams with an unknown total duration via a pipe and consists of the space unused by the seektable.
- The audio data is composed of one or more audio frames of fixed duration: each frame (apart from the final frame) contains the same number of samples, although the compressed size of each frame varies.
- Each frame consists of a frame header, which contains a sync code, optional information about the frame like the number of samples (for the final frame), sample rate, number of channels, et cetera, and a 24-bit CRC. The frame header also contains a frame number which can be used to identify the frame's location in the stream.
- Following the frame header is the as yet undocumented audio data, and finally, a second 24-bit CRC, this time calculated over the audio data in the frame.
- Approximately every two seconds, the frame header includes a copy of the STREAMINFO block. This allows a decoder to start decoding in the middle of the stream, without needing access to the information in the stream header.
- The actual payload format of the audio frame is unknown, apart from the fact that it ends with a 24-bit CRC calculated over the frame payload.
Metadata blocks
The stream header consists of a number of metadata blocks, followed by a metadata block of type 0x00 to indicate the end of the stream header. The reference decoder (v1.0.4) will skip unknown metadata blocks.
Each metadata block starts with the following 4 bytes:
Offset | Length | Name | Contents |
---|---|---|---|
0 | 1 | Block type | Block type code: 00 = ENDOFMETADATA, 01 = STREAMINFO, 02 = SEEKTABLE, 03 = WAVEMETADATA, 04 = ENCODERINFO, 05 = PADDING |
1 | 3 | Block length | 24-bit little-endian integer storing the length of the block in bytes, excluding this 4-byte block header |
The reference encoder (v1.0.4) writes the metadata blocks in the order 01, 02, 04, 03, 05, but the decoder will accept the metadata blocks in a different order, as long as 01 is first and 00 is last.
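The block header layout is enough to locate the first audio frame. The following C sketch, assuming the whole stream header is already in memory, checks the "tBaK" marker, walks the 4-byte block headers, rejects a stream whose first block is not STREAMINFO and stops at ENDOFMETADATA. tak_walk_metadata and tak_block_visitor are hypothetical names; the caller can pass a visitor to inspect each block, or NULL to skip them all.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Callback invoked for each metadata block (illustrative signature). */
typedef void (*tak_block_visitor)(unsigned type, size_t payload_offset,
                                  uint32_t payload_len);

/* Checks the "tBaK" marker, then walks the metadata block headers until
   ENDOFMETADATA (type 0x00). Returns the offset of the first audio frame,
   or -1 if the stream header is malformed or truncated. */
static long tak_walk_metadata(const uint8_t *buf, size_t len,
                              tak_block_visitor visit)
{
    size_t pos = 4;
    int first = 1;

    if (len < 4 || memcmp(buf, "tBaK", 4) != 0)
        return -1;                          /* missing stream marker */

    while (pos + 4 <= len) {
        unsigned type = buf[pos];
        uint32_t blen = (uint32_t)buf[pos + 1] | ((uint32_t)buf[pos + 2] << 8)
                      | ((uint32_t)buf[pos + 3] << 16); /* 24-bit LE length */
        size_t payload = pos + 4;

        if (blen > len - payload)
            return -1;                      /* block extends past the buffer */
        if (first && type != 0x01)
            return -1;                      /* STREAMINFO must come first */
        first = 0;
        if (visit)
            visit(type, payload, blen);
        if (type == 0x00)
            return (long)(payload + blen);  /* audio frames start here */
        pos = payload + blen;
    }
    return -1;                              /* no ENDOFMETADATA block */
}
```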
00 - ENDOFMETADATA
This is the last metadata block and always has length zero.
01 - STREAMINFO
Length - 13 bytes (0x0d)
Offset | Length | Name | Contents |
---|---|---|---|
0 | 1 | unknown | unknown - e.g. 0x80 |
1 | 1 | num_samples (bits 0-1), framesizecode, unknown | Bits 6-7: the two least-significant bits of num_samples. Bits 2-5: framesizecode, i.e. (buf[1] >> 2) & 0xf. Bits 0-1: unknown. |
2 | 4 | num_samples (bits 2-33) | A 34-bit integer containing the total number of samples per channel in the stream. (This field contains the high 32 bits; the low 2 bits are in the previous byte.) |
6 | 3 | samplerate | Sample rate minus 6000. (Probably only 17 bits are significant, i.e. stored values below 128*1024.) |
9 | 1 | unknown, samplesize, channels, unknown | Bits 4-7: unknown. Bit 3: number of channels (0 = mono, 1 = stereo). Bits 1-2: sample size (00 = 8-bit, 01 = 16-bit, 10 = 24-bit). Bit 0: unknown. |
10 | 3 | CRC | 24-bit CRC of bytes 0 to 9 |
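A minimal C sketch of decoding the known STREAMINFO fields according to the table above. The unknown bits are ignored, the CRC is not checked, and sample-size code 11, which the table does not define, is reported as 0. The struct and function names are illustrative only.

```c
#include <stdint.h>

struct tak_streaminfo {
    uint64_t num_samples;   /* 34-bit total samples per channel */
    unsigned framesizecode; /* index into the frame-size list below */
    uint32_t samplerate;    /* in Hz */
    unsigned channels;      /* 1 or 2 */
    unsigned bits;          /* 8, 16 or 24; 0 if the code is undefined */
};

/* p points at the 10 data bytes (offsets 0-9) of a STREAMINFO block. */
static void tak_parse_streaminfo(const uint8_t *p, struct tak_streaminfo *si)
{
    static const unsigned bits_by_code[4] = { 8, 16, 24, 0 };
    /* High 32 bits of num_samples, little-endian in bytes 2-5. */
    uint64_t high32 = (uint64_t)p[2] | ((uint64_t)p[3] << 8)
                    | ((uint64_t)p[4] << 16) | ((uint64_t)p[5] << 24);

    si->num_samples   = (high32 << 2) | (p[1] >> 6); /* low 2 bits in byte 1 */
    si->framesizecode = (p[1] >> 2) & 0x0f;
    si->samplerate    = ((uint32_t)p[6] | ((uint32_t)p[7] << 8)
                        | ((uint32_t)p[8] << 16)) + 6000;
    si->channels      = ((p[9] >> 3) & 1) ? 2 : 1;
    si->bits          = bits_by_code[(p[9] >> 1) & 3];
}
```

The stream duration in seconds is then si.num_samples / (double)si.samplerate.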
Based on the definition of the tak_str_FrameSizeType enumeration in the SDK, the frame size code is defined as follows:
- 0 - 94ms
- 1 - 125ms
- 2 - 188ms
- 3 - 250ms
- 4 - 4096 samples
- 5 - 8192 samples
- 6 - 16384 samples
- 7 - 512 samples
- 8 - 1024 samples
- 9 - 2048 samples
- 10 - 6144 samples
- 11 - 1288 samples
Notes:
The first four (ms) frame sizes are approximate. For example, 125ms of a 44.1 kHz stream is 5512.5 samples, which is rounded down to give a frame size of 5512 samples (124.9887ms).
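The frame size in samples can be derived from framesizecode and the sample rate. The sketch below assumes that the nominal 94, 125, 188 and 250ms sizes are exactly 3/32, 4/32, 6/32 and 8/32 of a second, rounded down; this matches the 125ms example above, but the 94ms and 188ms cases (93.75ms and 187.5ms) are an assumption. Codes 4-11 use the sample counts in the order listed. tak_frame_size is a hypothetical name.

```c
#include <stdint.h>

/* Samples per channel in a non-final frame. Returns 0 for an unknown code. */
static uint32_t tak_frame_size(unsigned code, uint32_t samplerate)
{
    /* Codes 0-3 (94/125/188/250ms): ASSUMED numerators over 1/32 second. */
    static const uint32_t num32[4] = { 3, 4, 6, 8 };
    /* Codes 4-11: fixed sample counts, in the order listed above. */
    static const uint32_t fixed[8] = {
        4096, 8192, 16384, 512, 1024, 2048, 6144, 1288
    };

    if (code < 4)   /* integer division rounds the sample count down */
        return (uint32_t)(((uint64_t)samplerate * num32[code]) / 32);
    if (code < 12)
        return fixed[code - 4];
    return 0;
}
```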
02 - SEEKTABLE
Each seekpoint entry is a 40-bit integer pointing to the offset (relative to the first audio frame in the stream) of the first byte of an audio frame.
Offset | Length | Name | Contents |
---|---|---|---|
0 | 2 | num_seekpoints | Number of seekpoints in the seektable. |
2 | 1 | unknown | unknown - seems to always be 0xe0 |
3 | 1 | seek interval | (Seek interval - 1) e.g. 0x00 represents a 1s seek interval. |
4 | num_seekpoints * 5 | seekpoints[num_seekpoints] | Array of 40-bit seekpoints. |
var | 3 | CRC | 24-bit CRC |
Notes:
The reference encoder appears to use only seek intervals that are a power of two, i.e. 1s, 2s, 4s, 8s, 16s, 32s, 64s, 128s and 256s. When the reference decoder displays these values, it adjusts them to a whole number of frames, which matters when the frame size is not an integer divisor of the sample rate. For example, 8s in a 44.1 kHz stream with "125ms" (5512-sample) frames corresponds to 64 whole frames (352768 samples) and would be displayed as 7999ms.
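Reading the seektable follows directly from its table. This C sketch (hypothetical function names) validates the block length against num_seekpoints, assuming the payload holds nothing beyond the 4-byte preamble, the seekpoints and the CRC, and then extracts individual 40-bit entries. The CRC is not checked; to get a file position, the caller adds each value to the offset of the first audio frame.

```c
#include <stddef.h>
#include <stdint.h>

/* payload/len: the SEEKTABLE block payload, including its trailing CRC.
   Returns num_seekpoints, or -1 if the length is inconsistent. */
static long tak_seektable_count(const uint8_t *payload, size_t len,
                                unsigned *interval_seconds)
{
    size_t n;

    if (len < 4 + 3)
        return -1;
    n = (size_t)payload[0] | ((size_t)payload[1] << 8);
    if (len != 4 + n * 5 + 3)
        return -1;                         /* size does not match the count */
    *interval_seconds = payload[3] + 1u;   /* stored as (interval - 1) */
    return (long)n;
}

/* Returns seekpoint i: a 40-bit offset relative to the first audio frame. */
static uint64_t tak_seekpoint(const uint8_t *payload, size_t i)
{
    const uint8_t *e = payload + 4 + i * 5;   /* 5 bytes per entry */

    return (uint64_t)e[0] | ((uint64_t)e[1] << 8) | ((uint64_t)e[2] << 16)
         | ((uint64_t)e[3] << 24) | ((uint64_t)e[4] << 32);
}
```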
03 - WAVEMETADATA
Offset | Length | Name | Contents |
---|---|---|---|
0 | 3 | HeaderLength | 24-bit length in bytes of header data |
3 | 3 | FooterLength | 24-bit length in bytes of footer data |
6 | HeaderLength | HeaderData | |
6+HeaderLength | FooterLength | FooterData | |
6+HeaderLength+FooterLength | 3 | CRC | 24-bit CRC |
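Because the header and footer lengths are stored explicitly, the block can be split without understanding the WAVE data itself. A C sketch with a hypothetical function name, which also checks that 6 + HeaderLength + FooterLength + 3 equals the block length (the CRC is not verified here):

```c
#include <stddef.h>
#include <stdint.h>

/* Sets pointers to the stored WAVE header and footer and returns 0,
   or returns -1 if the lengths do not add up to the block length. */
static int tak_wave_metadata(const uint8_t *payload, size_t len,
                             const uint8_t **hdr, size_t *hdr_len,
                             const uint8_t **ftr, size_t *ftr_len)
{
    size_t h, f;

    if (len < 6 + 3)
        return -1;
    h = (size_t)payload[0] | ((size_t)payload[1] << 8) | ((size_t)payload[2] << 16);
    f = (size_t)payload[3] | ((size_t)payload[4] << 8) | ((size_t)payload[5] << 16);
    if (6 + h + f + 3 != len)
        return -1;
    *hdr = payload + 6;
    *hdr_len = h;
    *ftr = payload + 6 + h;
    *ftr_len = f;
    return 0;
}
```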
04 - ENCODERINFO
This block contains information on the version of the encoder used to create the stream and the preset options selected.
For example, TAK v1.0.4 is major version number 1, minor version 0, and revision 4.
Offset | Length | Name | Contents |
---|---|---|---|
0 | 1 | Revision | 0x04 - revision |
1 | 1 | Minor | 0x00 - Minor version |
2 | 1 | Major | 0x01 - Major version |
3 | 1 | Preset | e.g. 0x02. High nibble: "evaluation" level (0 = normal, 1 = extra, 2 = max). Low nibble: preset 0 to 5. For example, -p3e is coded as 0x13. |
4 | 3 | CRC | 24-bit CRC of bytes 0-3 |
Notes:
For preset, TAK 1.0.1 allows T/F/N/H/E or 0-4 for Turbo/Fast/Normal (default)/High/Extra. TAK 1.0.4 just uses 0-5.
The -fi option to TAK 1.0.1 displays "Unknown" for -p5 streams.
TAK 1.0.2 added the "Insane" preset, which was later renamed to 5.
If this metadata block is missing, then the reference decoder will display "V 1.0.0, -p2" as the encoder info for the stream.
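The four data bytes can be turned into the same kind of string the reference decoder prints. In this sketch (hypothetical function name), the output format is modelled on the "V 1.0.0, -p2" message quoted above, and the "m" suffix for the max evaluation level is an assumption; only "e" for extra is confirmed by the -p3e example.

```c
#include <stdint.h>
#include <stdio.h>

/* p points at the 4 data bytes of an ENCODERINFO block. */
static void tak_format_encoderinfo(const uint8_t *p, char *out, size_t outlen)
{
    /* Evaluation suffixes: normal = none, extra = "e", max = "m" (ASSUMED). */
    static const char *suffix[3] = { "", "e", "m" };
    unsigned level = p[3] & 0x0f;   /* low nibble: preset 0-5 */
    unsigned eval  = p[3] >> 4;     /* high nibble: evaluation level */

    snprintf(out, outlen, "V %u.%u.%u, -p%u%s",
             (unsigned)p[2], (unsigned)p[1], (unsigned)p[0],  /* major.minor.revision */
             level, eval < 3 ? suffix[eval] : "?");
}
```

Passing the bytes 0x00, 0x00, 0x01, 0x02 produces "V 1.0.0, -p2", the default shown when the block is missing.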
05 - PADDING
The stream header can contain multiple padding blocks. These are filled with zeros, and do not include a CRC.
AUDIO FRAMES
Every audio frame consists of a frame header with a 24-bit CRC calculated over the header data, followed by the frame data which also ends with a 24-bit CRC.
The frame header is as follows:
Offset | Length | Name | Contents |
---|---|---|---|
0 | 2 | Sync code | 0xff 0xa0 |
2 | 3 | Frame number and flags | Treated as a little-endian 24-bit integer: the high 21 bits are the frame number (starting at 0 for the first frame). Bit 0 = 1 indicates the final frame in the stream; the header then includes the frame size of this frame (even if it equals the normal frame size). Bit 1 = 1 indicates that STREAMINFO appears in this frame header. Bit 2 is unknown. |
var | 2 | Frame size | If Bit 0 in flags == 1, then 2 bytes contain (framesize - 1) where framesize is the number of samples per channel encoded in this frame. |
var | 11 | Stream info | If bit 1 in flags == 1, then 10 bytes contain the stream info (same format as bytes 0-9 of the STREAMINFO metadata block, i.e. without its CRC), followed by a trailing 0x00 (meaning unknown). |
var | 3 | CRC | 24-bit CRC calculated over whole header, including sync code |
Notes:
- If a stream contains more frames than are needed to hold the number of samples given in the num_samples field of STREAMINFO, the reference decoder ignores the extra frames without warning.
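The frame header length depends only on the two known flag bits, so it can be computed before the CRC is checked. A C sketch following the header table above; the struct and function names are hypothetical, the header CRC is left to a separate check, and it assumes the optional fields appear in the order shown in the table (frame size before stream info).

```c
#include <stddef.h>
#include <stdint.h>

struct tak_frame_header {
    uint32_t frame_number;   /* high 21 bits of the 24-bit field */
    int      is_final;       /* flag bit 0 */
    int      has_streaminfo; /* flag bit 1 */
    uint32_t frame_size;     /* samples per channel; 0 if not present */
    size_t   header_len;     /* total header bytes, including the CRC */
};

/* Returns 0 on success, -1 if the sync code is missing or buf is too short. */
static int tak_parse_frame_header(const uint8_t *buf, size_t len,
                                  struct tak_frame_header *fh)
{
    uint32_t v;
    size_t pos = 5;

    if (len < 5 || buf[0] != 0xff || buf[1] != 0xa0)
        return -1;
    v = (uint32_t)buf[2] | ((uint32_t)buf[3] << 8) | ((uint32_t)buf[4] << 16);
    fh->frame_number   = v >> 3;
    fh->is_final       = v & 1;
    fh->has_streaminfo = (v >> 1) & 1;
    fh->frame_size     = 0;

    if (fh->is_final) {
        if (len < pos + 2)
            return -1;
        fh->frame_size = ((uint32_t)buf[pos] | ((uint32_t)buf[pos + 1] << 8)) + 1;
        pos += 2;             /* stored as (framesize - 1) */
    }
    if (fh->has_streaminfo)
        pos += 11;            /* 10 STREAMINFO bytes + trailing 0x00 */
    pos += 3;                 /* header CRC */
    if (len < pos)
        return -1;
    fh->header_len = pos;
    return 0;
}
```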
The main contents of the audio frame are unknown:
Offset | Length | Name | Contents |
---|---|---|---|
0 | unknown | unknown | unknown |
var | 3 | CRC | 24-bit CRC calculated over main body of audio frame. |
It is not known how to calculate the length of an audio frame from its contents. One method that appears to work is to calculate the CRC incrementally and compare it with the following three bytes in the stream. However, this can sometimes give false positives, i.e. apparent CRC matches in the middle of the frame. An application using this method therefore also needs to check that the two bytes after the CRC are the sync code of the next frame (or that it has reached the end of the file).
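A sketch of the CRC-scanning method just described. It assumes that the frame CRC is stored little-endian, like other TAK integers, and is calculated over the bytes following the frame header; tak_find_frame_data_len and tak_crc24_byte are hypothetical names. The CRC is the RFC 2440 algorithm from the "CRC calculation" section, updated one byte at a time so that each candidate position costs only one more byte of work.

```c
#include <stddef.h>
#include <stdint.h>

#define TAK_CRC24_INIT 0xb704ceUL
#define TAK_CRC24_POLY 0x1864cfbUL

/* Feeds one byte into a running RFC 2440 CRC-24. */
static uint32_t tak_crc24_byte(uint32_t crc, uint8_t b)
{
    int i;

    crc ^= (uint32_t)b << 16;
    for (i = 0; i < 8; i++) {
        crc <<= 1;
        if (crc & 0x1000000UL)
            crc ^= TAK_CRC24_POLY;
    }
    return crc & 0xffffffUL;
}

/* p/len: the bytes following a frame header. Returns the length of the
   audio data (excluding its CRC), or -1 if no candidate is found. A
   candidate needs a matching CRC followed by the next sync code
   (0xff 0xa0) or the end of the buffer. */
static long tak_find_frame_data_len(const uint8_t *p, size_t len)
{
    uint32_t crc = TAK_CRC24_INIT;   /* CRC of the first n bytes */
    size_t n;

    for (n = 0; n + 3 <= len; n++) {
        uint32_t stored = (uint32_t)p[n] | ((uint32_t)p[n + 1] << 8)
                        | ((uint32_t)p[n + 2] << 16);
        if (n > 0 && crc == stored) {
            size_t next = n + 3;
            if (next == len ||
                (next + 2 <= len && p[next] == 0xff && p[next + 1] == 0xa0))
                return (long)n;
        }
        crc = tak_crc24_byte(crc, p[n]);
    }
    return -1;
}
```

Since an APEv2 tag may follow the final frame, a caller scanning the final frame this way should pass a buffer that ends where the tag begins.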
CRC calculation
Every byte (apart from those where the content is fixed) in a TAK stream is protected by a 24-bit CRC. This CRC is calculated in the same way as the CRC specified in RFC2440. To quote that RFC:
6.1. An Implementation of the CRC-24 in "C"

```c
#define CRC24_INIT 0xb704ceL
#define CRC24_POLY 0x1864cfbL

typedef long crc24;
crc24 crc_octets(unsigned char *octets, size_t len)
{
    crc24 crc = CRC24_INIT;
    int i;

    while (len--) {
        crc ^= (*octets++) << 16;
        for (i = 0; i < 8; i++) {
            crc <<= 1;
            if (crc & 0x1000000)
                crc ^= CRC24_POLY;
        }
    }
    return crc & 0xffffffL;
}
```
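The RFC code uses long for the CRC and relies on the caller to declare size_t. A self-contained variant with fixed-width types is sketched below, together with a hypothetical helper that checks a CRC-protected region such as a STREAMINFO payload (10 data bytes followed by the CRC). The byte order of the stored CRC is not stated in this document; little-endian is an assumption based on the rule that all integers in a TAK stream are little-endian.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 2440 CRC-24, same algorithm as the quoted code. */
static uint32_t crc24(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xb704ceUL;
    int i;

    while (len--) {
        crc ^= (uint32_t)*p++ << 16;
        for (i = 0; i < 8; i++) {
            crc <<= 1;
            if (crc & 0x1000000UL)
                crc ^= 0x1864cfbUL;
        }
    }
    return crc & 0xffffffUL;
}

/* Returns 1 if the last 3 bytes of p[0..len) hold the CRC of the bytes
   before them. ASSUMPTION: the stored CRC is little-endian. */
static int tak_crc_ok(const uint8_t *p, size_t len)
{
    uint32_t stored;

    if (len < 3)
        return 0;
    stored = (uint32_t)p[len - 3] | ((uint32_t)p[len - 2] << 8)
           | ((uint32_t)p[len - 1] << 16);
    return crc24(p, len - 3) == stored;
}
```

With the RFC's parameters, crc24 should return the standard CRC-24 check value 0x21CF02 for the ASCII string "123456789".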