Commit 577fbc03 authored by vaclav

revision of Readme.txt file

parent 065d7624
+90 −80
@@ -31,12 +31,13 @@
*******************************************************************************************************/


These files represent a pre-release of a codec candidate to the IVAS
These files represent a codec candidate to the IVAS
Extension to the 3GPP EVS Codec floating-point C simulation. All code is
written in ANSI-C. The system is implemented as two separate programs:
written in C. The system is implemented as three separate programs:

        IVAS_cod   Encoder
        IVAS_dec   Decoder
        IVAS_rend  Renderer

For encoding using the coder program, the input is a binary
audio file (*.8k, *.16k, *.32k, *.48k) and the output is a binary
@@ -62,7 +63,8 @@ such as an HP (HP-UX) or a Sun, then binary files will need to be modified
by swapping the byte order in the files.

The input and output files (*.8k, *.16k, *.32k, *.48k) are 16-bit signed
binary files with 8/16/32/48 kHz sampling rate with no headers.
binary files with 8/16/32/48 kHz sampling rate with no headers. Alternatively, 
the input and output files are WAV files.
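
As an illustration of the byte-order swap mentioned above, the following Python sketch reverses the two bytes of every 16-bit sample in a headerless PCM buffer. It is not part of the codec package: the function names are hypothetical, and it assumes the file contains nothing but 16-bit samples.

```python
def swap_pcm16_byte_order(raw: bytes) -> bytes:
    """Return the same 16-bit samples with the two bytes of each sample swapped."""
    if len(raw) % 2 != 0:
        raise ValueError("16-bit PCM data must contain an even number of bytes")
    swapped = bytearray(len(raw))
    swapped[0::2] = raw[1::2]  # second byte of each sample moves to the front
    swapped[1::2] = raw[0::2]  # first byte of each sample moves to the back
    return bytes(swapped)


def swap_pcm16_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: byte-swap a whole headerless PCM file."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(swap_pcm16_byte_order(data))
```

Applying the swap twice returns the original data, so the same helper converts in either direction. It must not be applied to WAV files, whose header would be corrupted.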

The Encoder produces bitstream files in either ITU G.192 or MIME file
storage format.
@@ -126,10 +128,9 @@ should have the following structure:
    |-- lib_debug
    |-- lib_dec
    |-- lib_enc
	|-- lib_lc3plus
    |-- lib_rend
    |-- lib_util
    |-- scripts
    |-- tests
    |-- readme.txt

The package includes a Makefile for gcc, which has been verified on
@@ -140,9 +141,10 @@ in the c-code directory.

The package also includes a solution-file for Microsoft Visual Studio 2017 (x86). 
To compile the code, please open "Workspace_msvc\Workspace_msvc.sln" and build 
"encoder" for the encoder and "decoder" for the decoder executable. The resulting 
encoder/decoder/renderer executables are named "IVAS_cod.exe", "IVAS_dec.exe", 
and "IVAS_rend.exe". All reside in the c-code directory. 
"encoder" for the encoder, "decoder" for the decoder, and "renderer" for the 
renderer executable. The resulting encoder/decoder/renderer executables are 
"IVAS_cod.exe", "IVAS_dec.exe", and "IVAS_rend.exe". All reside in the c-code 
main directory. 


                       RUNNING THE SOFTWARE
@@ -168,8 +170,8 @@ R : Bitrate in bps,
                                                for 2 ISM, 3 ISM and 4 ISM also 160000, 192000, 256000
                                                for 3 ISM and 4 ISM also 384000
                                                for 4 ISM also 512000
                      for IVAS SBA, MASA, MC R=(13200, 16400, 24400, 32000, 48000, 64000, 80000,
                                                96000, 128000, 160000, 192000, 256000, 384000, 512000)
                      for IVAS SBA, MASA, MC, ISM-MASA, and ISM-SBA R=(13200, 16400, 24400, 32000, 
                      48000, 64000, 80000, 96000, 128000, 160000, 192000, 256000, 384000, 512000)
                      Alternatively, R can be a bitrate switching file which consists of R values
                      indicating the bitrate for each frame in bps. These values are stored in
                      binary format using 4 bytes per value
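
A bitrate switching file of this kind could be produced with a few lines of Python. The sketch below is only an illustration: the text states "4 bytes per value" but not the integer type or byte order, so the signed 32-bit little-endian encoding used here is an assumption, and the function names are hypothetical.

```python
import struct


def pack_bitrate_profile(bitrates_bps, byte_order="<"):
    """Serialize one bitrate (in bps) per frame, 4 bytes per value.

    Assumption: values are signed 32-bit integers; byte_order "<" means
    little-endian and ">" means big-endian.
    """
    return b"".join(struct.pack(byte_order + "i", rate) for rate in bitrates_bps)


def unpack_bitrate_profile(data, byte_order="<"):
    """Read the per-frame bitrates back from the binary representation."""
    if len(data) % 4 != 0:
        raise ValueError("profile length must be a multiple of 4 bytes")
    return [value for (value,) in struct.iter_unpack(byte_order + "i", data)]


# Example: two frames at 13.2 kbps followed by two frames at 24.4 kbps.
profile = pack_bitrate_profile([13200, 13200, 24400, 24400])
```

The resulting bytes would be written to a file whose name is passed in place of R.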
@@ -201,27 +203,24 @@ EVS mono is default, for IVAS choose one of the following: -stereo, -ism, -sba,
                      where InputConf specifies the channel configuration: 5_1, 7_1, 5_1_2, 5_1_4, 7_1_4
                      Loudspeaker positions are assumed to have azimuth and elevation as per
                      ISO/IEC 23091-3:2018 Table 3. Channel order is as per ISO/IEC 23008-3:2015 Table 95.
                      See readme.txt for details.
                      See below for details.
-dtx D              : Activate DTX mode, D = (0, 3-100) is the SID update rate
                      where 0 = adaptive, 3-100 = fixed in number of frames,
                      default is deactivated
                      where 0 = adaptive, 3-100 = fixed in number of frames, default is deactivated
-dtx                : Activate DTX mode with a SID update rate of 8 frames
                      Note: DTX is supported in EVS, stereo, ISM, SBA up to 80kbps and MASA up to 128kbps
-rf p o             : Activate channel-aware mode for WB and SWB signal at 13.2kbps,
                      Note: DTX is supported in EVS, stereo, ISM, MASA, and SBA up to 80kbps
-rf p o             : Activate channel-aware mode in EVS for WB and SWB signal at 13.2kbps,
                      where FEC indicator, p: LO or HI, and FEC offset, o: 2, 3, 5, or 7 in number of frames.
                      Alternatively, p and o can be replaced by an rf configuration file in which each line
                      contains the values of p and o separated by a space,
                      default is deactivated
                      contains the values of p and o separated by a space, default is deactivated
-max_band B         : Activate bandwidth limitation, B = (NB, WB, SWB or FB)
                      alternatively, B can be a text file where each line contains "nb_frames B"
-no_delay_cmp       : Turn off delay compensation
-stereo_dmx_evs     : Activate stereo downmix function for EVS.
-stereo_dmx_evs     : Stereo downmix function for EVS
-mime               : Mime output bitstream file format
                      The encoder produces TS26.445 Annex.2.6 Mime Storage Format, (not RFC4867 Mime Format).
                      default output bitstream file format is G.192
-bypass mode        : SBA PCA by-pass, mode = (1, 2), 1 = PCA off, 2 = signal adaptive, default is 1
-q                  : Quiet mode, no frame counters
                      default is deactivated
-q                  : Quiet mode, limit printouts to terminal, default is deactivated
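
The text-file variants of the -rf and -max_band options can be checked before running the encoder. The following Python sketch validates single lines against the value ranges listed above. It is an illustration only: the function names are hypothetical, and the case-insensitive handling of keywords is an assumption rather than documented encoder behavior.

```python
VALID_FEC_INDICATORS = {"LO", "HI"}
VALID_FEC_OFFSETS = {2, 3, 5, 7}
VALID_BANDWIDTHS = {"NB", "WB", "SWB", "FB"}


def parse_rf_line(line):
    """Parse one rf configuration line of the form 'p o'."""
    fields = line.split()
    if len(fields) != 2:
        raise ValueError("expected 'p o', got: %r" % line)
    indicator, offset = fields[0].upper(), int(fields[1])
    if indicator not in VALID_FEC_INDICATORS:
        raise ValueError("FEC indicator must be LO or HI")
    if offset not in VALID_FEC_OFFSETS:
        raise ValueError("FEC offset must be 2, 3, 5, or 7")
    return indicator, offset


def parse_max_band_line(line):
    """Parse one bandwidth limitation line of the form 'nb_frames B'."""
    fields = line.split()
    if len(fields) != 2:
        raise ValueError("expected 'nb_frames B', got: %r" % line)
    nb_frames, bandwidth = int(fields[0]), fields[1].upper()
    if nb_frames <= 0:
        raise ValueError("nb_frames must be positive")
    if bandwidth not in VALID_BANDWIDTHS:
        raise ValueError("B must be NB, WB, SWB, or FB")
    return nb_frames, bandwidth
```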


The usage of the "IVAS_dec" program is as follows:
@@ -233,7 +232,8 @@ Usage for IVAS: IVAS_dec.exe [Options] OutputConf Fs bitstream_file output_file
Mandatory parameters:
---------------------
OutputConf           : Output configuration: MONO, STEREO, 5_1, 7_1, 5_1_2, 5_1_4, 7_1_4, FOA,
                       HOA2, HOA3, BINAURAL, BINAURAL_ROOM_IR, BINAURAL_ROOM_REVERB, BINAURAL_SPLIT_CODED, BINAURAL_SPLIT_PCM, EXT
                       HOA2, HOA3, BINAURAL, BINAURAL_ROOM_IR, BINAURAL_ROOM_REVERB, 
                       BINAURAL_SPLIT_CODED, BINAURAL_SPLIT_PCM, EXT
                       By default, channel order and loudspeaker positions are equal to the
                       encoder. For loudspeaker outputs, OutputConf can be a custom loudspeaker
                       layout file. See below for details.
@@ -261,7 +261,7 @@ Options:
                      Format files, the magic word in the mime file is used to determine
                      which of the two supported formats is in use.
                      default bitstream file format is G.192
-hrtf File          : HRTF filter File used in ISm format and BINAURAL output configuration
-hrtf File          : HRTF filter File used in BINAURAL rendering
-T File             : Head rotation specified by external trajectory File
-otr tracking_type  : Head orientation tracking type: 'none', 'ref', 'avg', 'ref_vec'
                      or 'ref_vec_lev' (only for binaural rendering)
@@ -269,11 +269,11 @@ Options:
                      works only in combination with '-otr ref' mode
-rvf File           : Reference vector specified by external trajectory file
                      works only in combination with '-otr ref_vec' and 'ref_vec_lev' modes
-render_config File : Renderer configuration option File
-render_config File : Renderer configuration option with parameters specified in File
-om File            : MD output file for BINAURAL_SPLIT_PCM output
-non_diegetic_pan P : panning mono non-diegetic sound to stereo -90<= P <=90,
                      left or l or 90->left, right or r or -90->right, center or c or  0->middle
-q                  : Quiet mode, no frame counter
                      default is deactivated
-q                  : Quiet mode, limit printouts to terminal, default is deactivated


The usage of the "IVAS_rend" program is as follows:
@@ -282,34 +282,36 @@ The usage of the "IVAS_rend" program is as follows:
Usage: IVAS_rend [options]

Valid options:
  --input_file, -i                          Path to the input file (WAV, raw PCM or scene description file)
  --input_format, -if                       Audio format of input file (e.g. 5_1 or HOA3 or META, use -l for a list)
  --input_metadata, -im                     Space-separated list of path to metadata files for ISM or MASA inputs or BINAURAL_SPLIT_PCM input mode
  --output_file, -o                         Path to the output file
  --output_format, -of                      Output format to render.
                                            Alternatively, can be a custom loudspeaker layout file
  --sample_rate, -fs                        Input sampling rate in kHz (16, 32, 48) - required only with raw PCM inputs
  --trajectory_file, -tf                    Head rotation trajectory file for simulation of head tracking (only for binaural outputs)
  --output_metadata, -om                    coded metadata file for BINAURAL_SPLIT_PCM output mode
  --post_rend_bfi_file, -prbfi              Split rendering option: bfi file
  --reference_rotation_file, -rf            Reference rotation trajectory file for simulation of head tracking (only for binaural outputs)
  --custom_hrtf, -hrtf                      Custom HRTF file for binaural rendering (only for binaural outputs)
  --render_config, -rc                      Binaural renderer configuration file (only for binaural outputs)
  --non_diegetic_pan, -ndp                  Panning mono non diegetic sound to stereo -90<= pan <= 90
-i File             : Input audio File (WAV, raw PCM or scene description file)
-if Format          : Audio Format of input file (e.g. 5_1 or HOA3 or META, use -l for a list)
-im Files           : Metadata files for ISM (one file per object) or MASA inputs or BINAURAL_SPLIT_PCM input mode
-o File             : Output audio File
-of Format          : Audio Format of output file
                      Alternatively, it can be a custom loudspeaker layout file
-fs                 : Input sampling rate in kHz (16, 32, 48) - required only with raw PCM inputs
-tf File            : Head rotation trajectory file for simulation of head tracking (only for binaural outputs)
-om File            : Coded metadata File for BINAURAL_SPLIT_PCM output mode
-prbfi File         : Split rendering option: bfi File
-rf File            : Reference rotation trajectory File for simulation of head tracking (only for binaural outputs)
-rvf File           : Reference vector trajectory File for simulation of head tracking (only for binaural outputs)
-hrtf File          : Custom HRTF File for binaural rendering (only for binaural outputs)
-rc File            : Binaural renderer configuration File (only for binaural outputs)
-ndp P              : Panning mono non-diegetic sound to stereo -90<= P <= 90
                      left or l or 90->left, right or r or -90->right, center or c or 0 ->middle
                                            
  --tracking_type, -otr                     Head orientation tracking type: 'none', 'ref', 'avg' or `ref_vec` or `ref_vec_lev` (only for binaural outputs)
  --lfe_position, -lp                       Output LFE position. Comma-delimited triplet of [gain, azimuth, elevation] where gain is linear (like --gain, -g) and azimuth, elevation are in degrees.
-otr tracking_type  : Head orientation tracking type: 'none', 'ref', 'avg', 'ref_vec' or 'ref_vec_lev' (only for binaural outputs)
-lp Position        : Output LFE position. Comma-delimited triplet of [gain, azimuth, elevation] where gain is linear 
                      (like --gain, -g) and azimuth, elevation are in degrees.
                      If specified, overrides the default behavior which attempts to map input to output LFE channel(s)
  --lfe_matrix, -lm                         LFE panning matrix. File (CSV table) containing a matrix of dimensions [ num_input_lfe x num_output_channels ] with elements specifying linear routing gain (like --gain, -g). 
                                            If specified, overrides the output LFE position option and the default behavior which attempts to map input to output LFE channel(s)
  --no_delay_cmp, -ndc                      [flag] Turn off delay compensation
  --quiet, -q                               [flag] Limit printouts to terminal
  --gain, -g                                Input gain (linear, not in dB) to be applied to input audio file
  --list, -l                                List supported audio formats
  --reference_vector_file, -rvf             Reference vector trajectory file for simulation of head tracking (only for binaural outputs)
  --exterior_orientation_file, -exof        External orientation trajectory file for simulation of external orientations
  --sync_md_delay, -smd                     Metadata Synchronization Delay in ms, Default is 0. Quantized by 5ms subframes for TDRenderer (13ms -> 10ms -> 2subframes)
-lm File            : LFE panning matrix File (CSV table) containing a matrix of dimensions [ num_input_lfe x 
                      num_output_channels ] with elements specifying linear routing gain (like --gain, -g). 
                      If specified, overrides the output LFE position option and the default behavior which attempts to map 
                      input to output LFE channel(s)
-ndc                : Turn off delay compensation
-q                  : Quiet mode, limit printouts to terminal, default is deactivated
-g                  : Input gain (linear, not in dB) to be applied to input audio file
-l                  : List supported audio formats
-exof               : External orientation trajectory file for simulation of external orientations
-smd                : Metadata Synchronization Delay in ms, Default is 0. Quantized by 5ms subframes.
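
Two of the renderer options above map user input to a numeric value: -ndp accepts keywords or an angle, and -smd is quantized to 5 ms subframes. The Python sketch below shows one way to read these rules. It does not reproduce the renderer's own parsing code: the function names are hypothetical, and rounding down is inferred from the "13ms -> 10ms -> 2subframes" example in the option listing.

```python
PAN_KEYWORDS = {
    "left": 90.0, "l": 90.0,
    "right": -90.0, "r": -90.0,
    "center": 0.0, "c": 0.0,
}


def parse_non_diegetic_pan(value):
    """Map a -ndp argument (keyword or number) to a pan value in [-90, 90]."""
    key = value.strip().lower()
    if key in PAN_KEYWORDS:
        return PAN_KEYWORDS[key]
    pan = float(key)
    if not -90.0 <= pan <= 90.0:
        raise ValueError("pan must satisfy -90 <= P <= 90")
    return pan


def quantize_sync_delay(delay_ms, subframe_ms=5):
    """Round a metadata sync delay down to whole subframes.

    Returns (quantized delay in ms, number of subframes).
    """
    if delay_ms < 0:
        raise ValueError("delay must not be negative")
    subframes = delay_ms // subframe_ms
    return subframes * subframe_ms, subframes
```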


                       MULTICHANNEL LOUDSPEAKER INPUT / OUTPUT CONFIGURATIONS
@@ -344,10 +346,10 @@ An example custom loudspeaker layout file is available: ls_setup_16ch_8+4+4.txt
                       RUNNING THE SELF TEST
                       =====================

A codec verification script is available in scripts/self_test.py. The
script demonstrates how to use the software at several operating points and
compares the output to a reference version/implementation. Please note:
In order to keep the run-time short it does not cover all operating
A codec verification script is available at https://forge.3gpp.org/rep/ivas-codec-pc/ivas-codec/ 
in scripts/self_test.py. The script demonstrates how to use the software at several operating points 
and compares the output to a reference version/implementation. 
Please note: In order to keep the run-time short it does not cover all operating
points or complete coverage.

Documentation on the self_test.py can be found as a part of scripts/README.md.
@@ -385,13 +387,29 @@ stvST32c.wav - 2 channels, 32000 Hz, 659200 samples per channel, clean spe
stvST32n.wav       - 2 channels, 32000 Hz, 620800 samples per channel, noisy speech
stvST48c.wav       - 2 channels, 48000 Hz, 988800 samples per channel, clean speech/audio
stvST48n.wav       - 2 channels, 48000 Hz, 931200 samples per channel, noisy speech
stv1MASA1TC48c.wav - 1 channel (1 MASA transport channel), 48000 Hz, 48000 Hz, 144000 samples 
stv1MASA1TC48n.wav - 1 channel (1 MASA transport channel), 48000 Hz, 48000 Hz, 963840 samples
stv1MASA2TC48c.wav - 2 channels (2 MASA transport channel), 48000 Hz, 48000 Hz, 288000 samples per channel
stv1MASA2TC48n.wav - 2 channels (2 MASA transport channel), 48000 Hz, 48000 Hz, 963840 samples per channel
stv2MASA1TC48c.wav - 1 channel (1 MASA transport channel), 48000 Hz, 48000 Hz, 288000
stv2MASA2TC48c.wav - 2 channels (2 MASA transport channel), 48000 Hz, 48000 Hz, 144000 samples per channel

stv1MASA1TC48c.wav - 1 channel (1 MASA 1 transport channel), 48000 Hz, 144000 samples
stv1MASA1TC48n.wav - 1 channel (1 MASA 1 transport channel), 48000 Hz, 963840 samples
stv1MASA2TC48c.wav - 2 channels (2 MASA 2 transport channels), 48000 Hz, 288000 samples per channel
stv1MASA2TC48n.wav - 2 channels (2 MASA 2 transport channels), 48000 Hz, 963840 samples per channel
stv2MASA1TC48c.wav - 1 channel (1 MASA 1 transport channel), 48000 Hz, 288000 samples
stv2MASA2TC48c.wav - 2 channels (2 MASA 2 transport channels), 48000 Hz, 144000 samples per channel
stvOMASA_1ISM_1MASA2TC48c.wav - 3 channels (1 discrete audio object and 1 MASA 2 transport channels), 48000 Hz
stvOMASA_1ISM_2MASA1TC32c.wav - 2 channels (1 discrete audio object and 2 MASA 1 transport channel), 32000 Hz 
stvOMASA_1ISM_2MASA2TC48c.wav - 3 channels (1 discrete audio object and 2 MASA 2 transport channels), 48000 Hz
stvOMASA_2ISM_1MASA1TC16c.wav - 3 channels (2 discrete audio object and 1 MASA 1 transport channel), 16000 Hz
stvOMASA_2ISM_1MASA2TC48c.wav - 4 channels (2 discrete audio object and 1 MASA 2 transport channels), 48000 Hz
stvOMASA_2ISM_2MASA2TC48c.wav - 4 channels (2 discrete audio object and 2 MASA 2 transport channels), 48000 Hz
stvOMASA_3ISM_1MASA1TC32c.wav - 4 channels (3 discrete audio object and 1 MASA 1 transport channel), 32000 Hz
stvOMASA_3ISM_1MASA2TC16c.wav - 5 channels (3 discrete audio object and 1 MASA 2 transport channels), 16000 Hz
stvOMASA_3ISM_1MASA2TC32c.wav - 5 channels (3 discrete audio object and 1 MASA 2 transport channels), 32000 Hz
stvOMASA_3ISM_1MASA2TC48c.wav - 5 channels (3 discrete audio object and 1 MASA 2 transport channels), 48000 Hz
stvOMASA_3ISM_2MASA1TC48c.wav - 4 channels (3 discrete audio object and 2 MASA 1 transport channel), 48000 Hz
stvOMASA_3ISM_2MASA2TC32c.wav - 5 channels (3 discrete audio object and 2 MASA 2 transport channels), 32000 Hz
stvOMASA_3ISM_2MASA2TC48c.wav - 5 channels (3 discrete audio object and 2 MASA 2 transport channels), 48000 Hz
stvOMASA_4ISM_1MASA1TC48c.wav - 5 channels (4 discrete audio object and 1 MASA 1 transport channel), 48000 Hz
stvOMASA_4ISM_1MASA2TC48c.wav - 6 channels (4 discrete audio object and 1 MASA 2 transport channels), 48000 Hz
stvOMASA_4ISM_2MASA1TC48c.wav - 5 channels (4 discrete audio object and 2 MASA 1 transport channel), 48000 Hz
stvOMASA_4ISM_2MASA2TC48c.wav - 6 channels (4 discrete audio object and 2 MASA 2 transport channels), 48000 Hz
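
The OMASA test vector names appear to encode the layout of each file: the number of ISM objects, the number preceding "MASA", the number of MASA transport channels, and the sampling rate in kHz. The Python sketch below derives the expected channel count (objects plus transport channels) and sampling rate from a file name. The naming pattern is inferred from the listing above and is not a documented convention; the function name is hypothetical.

```python
import re

# Inferred pattern: stvOMASA_<objects>ISM_<masa>MASA<tc>TC<rate in kHz><c|n>.wav
_OMASA_NAME = re.compile(r"^stvOMASA_(\d)ISM_(\d)MASA(\d)TC(\d+)[cn]\.wav$")


def describe_omasa_vector(filename):
    """Return the layout implied by an OMASA test vector file name."""
    match = _OMASA_NAME.match(filename)
    if match is None:
        raise ValueError("not an OMASA test vector name: %r" % filename)
    objects, masa, transport, rate_khz = (int(g) for g in match.groups())
    return {
        "objects": objects,
        "masa": masa,
        "transport_channels": transport,
        "channels": objects + transport,  # one channel per object plus the TCs
        "sample_rate_hz": rate_khz * 1000,
    }
```

For example, describe_omasa_vector("stvOMASA_3ISM_1MASA2TC32c.wav") yields 5 channels at 32000 Hz.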

For the MASA operation modes, in addition the following metadata files
located in /scripts/testv/ folder are required:
@@ -466,21 +484,13 @@ headrot_case01_3000_q.csv
headrot_case02_3000_q.csv 
headrot_case03_3000_q.csv

For Reference vector specified by external trajectory file, example files are available at 
/scripts/trajectories folder.


For the Renderer configuration option operation modes, external configuration files are available:

rend_config_hospital_patientroom.cfg
config_recreation.cfg
config_renderer.cfg
For the reference vector specified by an external trajectory file, example files are available in the
/scripts/trajectories folder.


                       ADDITIONAL SCRIPTS
                       ==================
For the Renderer configuration option operation modes, external configuration files are available, e.g.:

Additional scripts for item generation and codec testing are available
in the directories scripts and tests. Please refer to scripts/README.md, resp.
tests/README.md for additional documentation.
rend_config_hospital_patientroom.cfg
rend_config_recreation.cfg
rend_config_renderer.cfg