revision of Readme.txt file (577fbc03) · Commits · IVAS Codec Public Collaboration / IVAS Codec

readme.txt

+90 −80

		@@ -31,12 +31,13 @@
		*******************************************************************************************************/


		These files represent a pre-release of a codec candidate to the IVAS
		These files represent a codec candidate to the IVAS
		Extension to the 3GPP EVS Codec floating-point C simulation. All code is
		written in ANSI-C. The system is implemented as two separate programs:
		written in C. The system is implemented as three separate programs:

		IVAS_cod Encoder
		IVAS_dec Decoder
		IVAS_rend Renderer

		For encoding using the coder program, the input is a binary
		audio file (.8k, .16k, .32k, .48k) and the output is a binary
		@@ -62,7 +63,8 @@ such as an HP (HP-UX) or a Sun, then binary files will need to be modified
		by swapping the byte order in the files.

		The input and output files (.8k, .16k, .32k, .48k) are 16-bit signed
		binary files with 8/16/32/48 kHz sampling rate with no headers.
		binary files with 8/16/32/48 kHz sampling rate with no headers. Alternatively,
		the input and output files are WAV files.

		The Encoder produces bitstream files in either ITU G.192 or MIME file
		storage format.
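As a minimal sketch of the byte-order note above, the following Python snippet swaps the byte order of headerless 16-bit PCM data. The function name is hypothetical, and the sketch assumes the whole file fits in memory.

```python
import array


def swap_pcm16_byte_order(raw: bytes) -> bytes:
    """Return raw 16-bit PCM data with the two bytes of every sample swapped."""
    samples = array.array("h")  # "h" = signed short
    if samples.itemsize != 2:
        raise RuntimeError("this sketch assumes a 2-byte short")
    if len(raw) % 2 != 0:
        raise ValueError("16-bit PCM data must have an even number of bytes")
    samples.frombytes(raw)
    samples.byteswap()  # swaps the bytes inside each 16-bit sample
    return samples.tobytes()
```

Applying the function twice returns the original data, so the same helper converts in either direction.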
		@@ -126,10 +128,9 @@ should have the following structure:
		\|-- lib_debug
		\|-- lib_dec
		\|-- lib_enc
		\|-- lib_lc3plus
		\|-- lib_rend
		\|-- lib_util
		\|-- scripts
		\|-- tests
		\|-- readme.txt

		The package includes a Makefile for gcc, which has been verified on
		@@ -140,9 +141,10 @@ in the c-code directory.

		The package also includes a solution-file for Microsoft Visual Studio 2017 (x86).
		To compile the code, please open "Workspace_msvc\Workspace_msvc.sln" and build
		"encoder" for the encoder and "decoder" for the decoder executable. The resulting
		encoder/decoder/renderer executables are named "IVAS_cod.exe", "IVAS_dec.exe",
		and "IVAS_rend.exe". All reside in the c-code directory.
		"encoder" for the encoder, "decoder" for the decoder, and "renderer" for the
		renderer executable. The resulting encoder/decoder/renderer executables are
		"IVAS_cod.exe", "IVAS_dec.exe", and "IVAS_rend.exe". All reside in the c-code
		main directory.


		RUNNING THE SOFTWARE
		@@ -168,8 +170,8 @@ R : Bitrate in bps,
		for 2 ISM, 3 ISM and 4 ISM also 160000, 192000, 256000
		for 3 ISM and 4 ISM also 384000
		for 4 ISM also 512000
		for IVAS SBA, MASA, MC R=(13200, 16400, 24400, 32000, 48000, 64000, 80000,
		96000, 128000, 160000, 192000, 256000, 384000, 512000)
		for IVAS SBA, MASA, MC, ISM-MASA, and ISM-SBA R=(13200, 16400, 24400, 32000,
		48000, 64000, 80000, 96000, 128000, 160000, 192000, 256000, 384000, 512000)
		Alternatively, R can be a bitrate switching file which consists of R values
		indicating the bitrate for each frame in bps. These values are stored in
		binary format using 4 bytes per value
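The bitrate switching file described above could be produced along the following lines. This is only a sketch: the readme states 4 bytes per value but not the byte order or signedness, so little-endian signed 32-bit integers are an assumption here, and the function names are hypothetical.

```python
import struct

# Assumption: each value is a little-endian signed 32-bit integer.
_VALUE = struct.Struct("<i")


def pack_bitrate_profile(bitrates_bps):
    """Serialize one bitrate (in bps) per frame, 4 bytes per value."""
    return b"".join(_VALUE.pack(rate) for rate in bitrates_bps)


def unpack_bitrate_profile(data):
    """Inverse of pack_bitrate_profile; returns a list of bitrates in bps."""
    if len(data) % _VALUE.size != 0:
        raise ValueError("data length is not a multiple of 4 bytes")
    return [value for (value,) in _VALUE.iter_unpack(data)]
```

For example, `pack_bitrate_profile([24400] * 50 + [48000] * 50)` yields 400 bytes describing 50 frames at 24400 bps followed by 50 frames at 48000 bps.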
		@@ -201,27 +203,24 @@ EVS mono is default, for IVAS choose one of the following: -stereo, -ism, -sba,
		where InputConf specifies the channel configuration: 5_1, 7_1, 5_1_2, 5_1_4, 7_1_4
		Loudspeaker positions are assumed to have azimuth and elevation as per
		ISO/IEC 23091-3:2018 Table 3. Channel order is as per ISO/IEC 23008-3:2015 Table 95.
		See readme.txt for details.
		See below for details.
		-dtx D : Activate DTX mode, D = (0, 3-100) is the SID update rate
		where 0 = adaptive, 3-100 = fixed in number of frames,
		default is deactivated
		where 0 = adaptive, 3-100 = fixed in number of frames, default is deactivated
		-dtx : Activate DTX mode with a SID update rate of 8 frames
		Note: DTX is supported in EVS, stereo, ISM, SBA up to 80kbps and MASA up to 128kbps
		-rf p o : Activate channel-aware mode for WB and SWB signal at 13.2kbps,
		Note: DTX is supported in EVS, stereo, ISM, MASA, and SBA up to 80kbps
		-rf p o : Activate channel-aware mode in EVS for WB and SWB signal at 13.2kbps,
		where FEC indicator, p: LO or HI, and FEC offset, o: 2, 3, 5, or 7 in number of frames.
		Alternatively, p and o can be replaced by an rf configuration file in which each line
		contains the values of p and o separated by a space,
		default is deactivated
		contains the values of p and o separated by a space, default is deactivated
		-max_band B : Activate bandwidth limitation, B = (NB, WB, SWB or FB)
		alternatively, B can be a text file where each line contains "nb_frames B"
		-no_delay_cmp : Turn off delay compensation
		-stereo_dmx_evs : Activate stereo downmix function for EVS.
		-stereo_dmx_evs : Stereo downmix function for EVS
		-mime : Mime output bitstream file format
		The encoder produces the TS 26.445 Annex A.2.6 MIME storage format (not the RFC 4867 MIME format).
		default output bitstream file format is G.192
		-bypass mode : SBA PCA by-pass, mode = (1, 2), 1 = PCA off, 2 = signal adaptive, default is 1
		-q : Quiet mode, no frame counters
		default is deactivated
		-q : Quiet mode, limit printouts to terminal, default is deactivated
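The rf configuration file accepted by the -rf option (one "p o" pair per line) can be checked with a short script before encoding. The sketch below is not part of the package; the function name is hypothetical, and skipping blank lines and requiring upper-case LO/HI are assumptions.

```python
VALID_INDICATORS = {"LO", "HI"}  # FEC indicator p
VALID_OFFSETS = {2, 3, 5, 7}     # FEC offset o, in frames


def parse_rf_config(text):
    """Parse rf configuration text into a list of (p, o) tuples."""
    entries = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 'p o'")
        indicator, offset = parts[0], int(parts[1])
        if indicator not in VALID_INDICATORS:
            raise ValueError(f"line {line_no}: p must be LO or HI")
        if offset not in VALID_OFFSETS:
            raise ValueError(f"line {line_no}: o must be 2, 3, 5, or 7")
        entries.append((indicator, offset))
    return entries
```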


		The usage of the "IVAS_dec" program is as follows:
		@@ -233,7 +232,8 @@ Usage for IVAS: IVAS_dec.exe [Options] OutputConf Fs bitstream_file output_file
		Mandatory parameters:
		---------------------
		OutputConf : Output configuration: MONO, STEREO, 5_1, 7_1, 5_1_2, 5_1_4, 7_1_4, FOA,
		HOA2, HOA3, BINAURAL, BINAURAL_ROOM_IR, BINAURAL_ROOM_REVERB, BINAURAL_SPLIT_CODED, BINAURAL_SPLIT_PCM, EXT
		HOA2, HOA3, BINAURAL, BINAURAL_ROOM_IR, BINAURAL_ROOM_REVERB,
		BINAURAL_SPLIT_CODED, BINAURAL_SPLIT_PCM, EXT
		By default, channel order and loudspeaker positions are equal to the
		encoder. For loudspeaker outputs, OutputConf can be a custom loudspeaker
		layout file. See below for details.
		@@ -261,7 +261,7 @@ Options:
		Format files, the magic word in the mime file is used to determine
		which of the two supported formats is in use.
		default bitstream file format is G.192
		-hrtf File : HRTF filter File used in ISm format and BINAURAL output configuration
		-hrtf File : HRTF filter File used in BINAURAL rendering
		-T File : Head rotation specified by external trajectory File
		-otr tracking_type : Head orientation tracking type: 'none', 'ref', 'avg', 'ref_vec'
		or 'ref_vec_lev' (only for binaural rendering)
		@@ -269,11 +269,11 @@ Options:
		works only in combination with '-otr ref' mode
		-rvf File : Reference vector specified by external trajectory file
		works only in combination with '-otr ref_vec' and 'ref_vec_lev' modes
		-render_config File : Renderer configuration option File
		-render_config File : Renderer configuration option with parameters specified in File
		-om File : MD output file for BINAURAL_SPLIT_PCM output
		-non_diegetic_pan P : panning mono non-diegetic sound to stereo -90<= P <=90,
		left or l or 90->left, right or r or -90->right, center or c or 0->middle
		-q : Quiet mode, no frame counter
		default is deactivated
		-q : Quiet mode, limit printouts to terminal, default is deactivated
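The value mapping of -non_diegetic_pan listed above can be illustrated as follows. This is a sketch of the documented mapping only, not the decoder's own parser; the function name is hypothetical, and accepting fractional values and mixed-case keywords are assumptions.

```python
# Keyword aliases as listed in the option description.
_NAMED_PAN = {
    "left": 90.0, "l": 90.0,
    "right": -90.0, "r": -90.0,
    "center": 0.0, "c": 0.0,
}


def parse_non_diegetic_pan(arg):
    """Map a -non_diegetic_pan argument to a pan value in [-90, 90]."""
    key = arg.strip().lower()
    if key in _NAMED_PAN:
        return _NAMED_PAN[key]
    pan = float(key)  # raises ValueError for non-numeric input
    if not -90.0 <= pan <= 90.0:
        raise ValueError("pan must satisfy -90 <= P <= 90")
    return pan
```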


		The usage of the "IVAS_rend" program is as follows:
		@@ -282,34 +282,36 @@ The usage of the "IVAS_rend" program is as follows:
		Usage: IVAS_rend [options]

		Valid options:
		--input_file, -i Path to the input file (WAV, raw PCM or scene description file)
		--input_format, -if Audio format of input file (e.g. 5_1 or HOA3 or META, use -l for a list)
		--input_metadata, -im Space-separated list of path to metadata files for ISM or MASA inputs or BINAURAL_SPLIT_PCM input mode
		--output_file, -o Path to the output file
		--output_format, -of Output format to render.
		Alternatively, can be a custom loudspeaker layout file
		--sample_rate, -fs Input sampling rate in kHz (16, 32, 48) - required only with raw PCM inputs
		--trajectory_file, -tf Head rotation trajectory file for simulation of head tracking (only for binaural outputs)
		--output_metadata, -om coded metadata file for BINAURAL_SPLIT_PCM output mode
		--post_rend_bfi_file, -prbfi Split rendering option: bfi file
		--reference_rotation_file, -rf Reference rotation trajectory file for simulation of head tracking (only for binaural outputs)
		--custom_hrtf, -hrtf Custom HRTF file for binaural rendering (only for binaural outputs)
		--render_config, -rc Binaural renderer configuration file (only for binaural outputs)
		--non_diegetic_pan, -ndp Panning mono non diegetic sound to stereo -90<= pan <= 90
		-i File : Input audio File (WAV, raw PCM or scene description file)
		-if Format : Audio Format of input file (e.g. 5_1 or HOA3 or META, use -l for a list)
		-im Files : Metadata files for ISM (one file per object) or MASA inputs or BINAURAL_SPLIT_PCM input mode
		-o File : Output audio File
		-of Format : Audio Format of output file
		Alternatively, it can be a custom loudspeaker layout file
		-fs : Input sampling rate in kHz (16, 32, 48) - required only with raw PCM inputs
		-tf File : Head rotation trajectory file for simulation of head tracking (only for binaural outputs)
		-om File : Coded metadata File for BINAURAL_SPLIT_PCM output mode
		-prbfi File : Split rendering option: bfi File
		-rf File : Reference rotation trajectory File for simulation of head tracking (only for binaural outputs)
		-rvf File : Reference vector trajectory File for simulation of head tracking (only for binaural outputs)
		-hrtf File : Custom HRTF File for binaural rendering (only for binaural outputs)
		-rc File : Binaural renderer configuration File (only for binaural outputs)
		-ndp P : Panning mono non-diegetic sound to stereo -90<= P <= 90
		left or l or 90->left, right or r or -90->right, center or c or 0->middle

		--tracking_type, -otr Head orientation tracking type: 'none', 'ref', 'avg' or `ref_vec` or `ref_vec_lev` (only for binaural outputs)
		--lfe_position, -lp Output LFE position. Comma-delimited triplet of [gain, azimuth, elevation] where gain is linear (like --gain, -g) and azimuth, elevation are in degrees.
		-otr tracking_type : Head orientation tracking type: 'none', 'ref', 'avg', 'ref_vec' or 'ref_vec_lev' (only for binaural outputs)
		-lp Position : Output LFE position. Comma-delimited triplet of [gain, azimuth, elevation] where gain is linear
		(like --gain, -g) and azimuth, elevation are in degrees.
		If specified, overrides the default behavior which attempts to map input to output LFE channel(s)
		--lfe_matrix, -lm LFE panning matrix. File (CSV table) containing a matrix of dimensions [ num_input_lfe x num_output_channels ] with elements specifying linear routing gain (like --gain, -g).
		If specified, overrides the output LFE position option and the default behavior which attempts to map input to output LFE channel(s)
		--no_delay_cmp, -ndc [flag] Turn off delay compensation
		--quiet, -q [flag] Limit printouts to terminal
		--gain, -g Input gain (linear, not in dB) to be applied to input audio file
		--list, -l List supported audio formats
		--reference_vector_file, -rvf Reference vector trajectory file for simulation of head tracking (only for binaural outputs)
		--exterior_orientation_file, -exof External orientation trajectory file for simulation of external orientations
		--sync_md_delay, -smd Metadata Synchronization Delay in ms, Default is 0. Quantized by 5ms subframes for TDRenderer (13ms -> 10ms -> 2subframes)
		-lm File : LFE panning matrix File (CSV table) containing a matrix of dimensions [ num_input_lfe x
		num_output_channels ] with elements specifying linear routing gain (like --gain, -g).
		If specified, overrides the output LFE position option and the default behavior which attempts to map
		input to output LFE channel(s)
		-ndc : Turn off delay compensation
		-q : Quiet mode, limit printouts to terminal, default is deactivated
		-g : Input gain (linear, not in dB) to be applied to input audio file
		-l : List supported audio formats
		-exof : External orientation trajectory file for simulation of external orientations
		-smd : Metadata Synchronization Delay in ms, Default is 0. Quantized by 5ms subframes.
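The quantization of the metadata synchronization delay to 5 ms subframes can be sketched as below. Rounding down is inferred from the "13ms -> 10ms -> 2subframes" example in the option description; the function name is hypothetical and the renderer's actual implementation may differ.

```python
SUBFRAME_MS = 5  # subframe length stated in the option description


def quantize_sync_delay(delay_ms):
    """Return (quantized delay in ms, number of subframes), rounding down."""
    if delay_ms < 0:
        raise ValueError("delay must be non-negative")
    subframes = delay_ms // SUBFRAME_MS
    return subframes * SUBFRAME_MS, subframes
```

With this sketch, a requested delay of 13 ms maps to 10 ms, that is, 2 subframes.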


		MULTICHANNEL LOUDSPEAKER INPUT / OUTPUT CONFIGURATIONS
		@@ -344,10 +346,10 @@ An example custom loudspeaker layout file is available: ls_setup_16ch_8+4+4.txt
		RUNNING THE SELF TEST
		=====================

		A codec verification script is available in scripts/self_test.py. The
		script demonstrates how to use the software at several operating points and
		compares the output to a reference version/implementation. Please note:
		In order to keep the run-time short it does not cover all operating
		A codec verification script is available at https://forge.3gpp.org/rep/ivas-codec-pc/ivas-codec/
		in scripts/self_test.py. The script demonstrates how to use the software at several operating points
		and compares the output to a reference version/implementation.
		Please note: In order to keep the run-time short it does not cover all operating
		points or provide complete coverage.

		Documentation on the self_test.py can be found as a part of scripts/README.md.
		@@ -385,13 +387,29 @@ stvST32c.wav - 2 channels, 32000 Hz, 659200 samples per channel, clean spe
		stvST32n.wav - 2 channels, 32000 Hz, 620800 samples per channel, noisy speech
		stvST48c.wav - 2 channels, 48000 Hz, 988800 samples per channel, clean speech/audio
		stvST48n.wav - 2 channels, 48000 Hz, 931200 samples per channel, noisy speech
		stv1MASA1TC48c.wav - 1 channel (1 MASA transport channel), 48000 Hz, 144000 samples
		stv1MASA1TC48n.wav - 1 channel (1 MASA transport channel), 48000 Hz, 963840 samples
		stv1MASA2TC48c.wav - 2 channels (2 MASA transport channels), 48000 Hz, 288000 samples per channel
		stv1MASA2TC48n.wav - 2 channels (2 MASA transport channels), 48000 Hz, 963840 samples per channel
		stv2MASA1TC48c.wav - 1 channel (1 MASA transport channel), 48000 Hz, 288000 samples
		stv2MASA2TC48c.wav - 2 channels (2 MASA transport channels), 48000 Hz, 144000 samples per channel

		stv1MASA1TC48c.wav - 1 channel (1 MASA 1 transport channel), 48000 Hz, 144000 samples
		stv1MASA1TC48n.wav - 1 channel (1 MASA 1 transport channel), 48000 Hz, 963840 samples
		stv1MASA2TC48c.wav - 2 channels (2 MASA 2 transport channels), 48000 Hz, 288000 samples per channel
		stv1MASA2TC48n.wav - 2 channels (2 MASA 2 transport channels), 48000 Hz, 963840 samples per channel
		stv2MASA1TC48c.wav - 1 channel (1 MASA 1 transport channel), 48000 Hz, 288000 samples
		stv2MASA2TC48c.wav - 2 channels (2 MASA 2 transport channels), 48000 Hz, 144000 samples per channel
		stvOMASA_1ISM_1MASA2TC48c.wav - 3 channels (1 discrete audio object and 1 MASA 2 transport channels), 48000 Hz
		stvOMASA_1ISM_2MASA1TC32c.wav - 2 channels (1 discrete audio object and 2 MASA 1 transport channel), 32000 Hz
		stvOMASA_1ISM_2MASA2TC48c.wav - 3 channels (1 discrete audio object and 2 MASA 2 transport channels), 48000 Hz
		stvOMASA_2ISM_1MASA1TC16c.wav - 3 channels (2 discrete audio objects and 1 MASA 1 transport channel), 16000 Hz
		stvOMASA_2ISM_1MASA2TC48c.wav - 4 channels (2 discrete audio objects and 1 MASA 2 transport channels), 48000 Hz
		stvOMASA_2ISM_2MASA2TC48c.wav - 4 channels (2 discrete audio objects and 2 MASA 2 transport channels), 48000 Hz
		stvOMASA_3ISM_1MASA1TC32c.wav - 4 channels (3 discrete audio objects and 1 MASA 1 transport channel), 32000 Hz
		stvOMASA_3ISM_1MASA2TC16c.wav - 5 channels (3 discrete audio objects and 1 MASA 2 transport channels), 16000 Hz
		stvOMASA_3ISM_1MASA2TC32c.wav - 5 channels (3 discrete audio objects and 1 MASA 2 transport channels), 32000 Hz
		stvOMASA_3ISM_1MASA2TC48c.wav - 5 channels (3 discrete audio objects and 1 MASA 2 transport channels), 48000 Hz
		stvOMASA_3ISM_2MASA1TC48c.wav - 4 channels (3 discrete audio objects and 2 MASA 1 transport channel), 48000 Hz
		stvOMASA_3ISM_2MASA2TC32c.wav - 5 channels (3 discrete audio objects and 2 MASA 2 transport channels), 32000 Hz
		stvOMASA_3ISM_2MASA2TC48c.wav - 5 channels (3 discrete audio objects and 2 MASA 2 transport channels), 48000 Hz
		stvOMASA_4ISM_1MASA1TC48c.wav - 5 channels (4 discrete audio objects and 1 MASA 1 transport channel), 48000 Hz
		stvOMASA_4ISM_1MASA2TC48c.wav - 6 channels (4 discrete audio objects and 1 MASA 2 transport channels), 48000 Hz
		stvOMASA_4ISM_2MASA1TC48c.wav - 5 channels (4 discrete audio objects and 2 MASA 1 transport channel), 48000 Hz
		stvOMASA_4ISM_2MASA2TC48c.wav - 6 channels (4 discrete audio objects and 2 MASA 2 transport channels), 48000 Hz
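The OMASA test vector names above appear to encode the channel layout: the channel count equals the number of discrete objects (ISM) plus the number of MASA transport channels (TC), and the two digits before the final c/n give the sampling rate in kHz. The sketch below derives both from a file name. This reading of the naming scheme is inferred from the listing rather than documented, and the function name is hypothetical.

```python
import re

# stvOMASA_<objects>ISM_<n>MASA<transport channels>TC<kHz><c|n>.wav
_OMASA_NAME = re.compile(r"^stvOMASA_(\d)ISM_(\d)MASA(\d)TC(\d{2})([cn])\.wav$")


def describe_omasa_test_vector(filename):
    """Return (channel count, sampling rate in Hz) inferred from the name."""
    match = _OMASA_NAME.match(filename)
    if match is None:
        raise ValueError(f"not an OMASA test vector name: {filename}")
    n_objects = int(match.group(1))
    # group(2), the digit before "MASA", is not needed for the channel count
    n_transport = int(match.group(3))
    sample_rate_hz = int(match.group(4)) * 1000
    return n_objects + n_transport, sample_rate_hz
```

For example, `describe_omasa_test_vector("stvOMASA_1ISM_1MASA2TC48c.wav")` returns `(3, 48000)`.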

		For the MASA operation modes, in addition the following metadata files
		located in /scripts/testv/ folder are required:
		@@ -466,21 +484,13 @@ headrot_case01_3000_q.csv
		headrot_case02_3000_q.csv
		headrot_case03_3000_q.csv

		For Reference vector specified by external trajectory file, example files are available at
		/scripts/trajectories folder.


		For the Renderer configuration option operation modes, external configuration files are available:

		rend_config_hospital_patientroom.cfg
		config_recreation.cfg
		config_renderer.cfg
		For Reference vector specified by external trajectory file, example files are available in folder
		/scripts/trajectories.


		ADDITIONAL SCRIPTS
		==================
		For the Renderer configuration option operation modes, external configuration files are available, e.g.:

		Additional scripts for item generation and codec testing are available
		in the directories scripts and tests. Please refer to scripts/README.md, resp.
		tests/README.md for additional documentation.
		rend_config_hospital_patientroom.cfg
		rend_config_recreation.cfg
		rend_config_renderer.cfg