Beam Search
get_steps_params
Get the probability masks and beam widths required for each step of the beam search algorithm.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
src |
Tensor[SEQUENCE_LEN, TOKEN_SIZE]
|
Keyless reading columns that are passed to the TRecover encoder. |
required |
Returns:
Name | Type | Description |
---|---|---|
steps_mask |
Tensor[SEQUENCE_LEN, TOKEN_SIZE]
|
Probability masks that consist of zeros at the positions of the letters allowed for selection in the column (src[i]) and minus infinity at all other positions. |
steps_width |
List[int]
|
The beam width for each step of the algorithm. |
Source code in src/trecover/utils/beam_search.py
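As a rough illustration of what get_steps_params returns, the sketch below builds the masks and widths from plain Python lists instead of tensors. It assumes that nonzero entries in a column of "src" mark the letters allowed at that step; the name `steps_params_sketch` is hypothetical and is not part of the library.

```python
import math
from typing import List, Tuple


def steps_params_sketch(src: List[List[float]]) -> Tuple[List[List[float]], List[int]]:
    """Build per-step masks and widths from columns given as plain lists.

    Assumption: a nonzero entry means the letter is present in the column.
    """
    # Zero where the letter is allowed, minus infinity everywhere else.
    steps_mask = [
        [0.0 if value > 0 else -math.inf for value in column]
        for column in src
    ]
    # The width of a step is the number of letters in its column.
    steps_width = [sum(1 for value in column if value > 0) for column in src]
    return steps_mask, steps_width


# Two columns over a 4-letter alphabet:
# the first allows letters 0 and 2, the second only letter 3.
src = [[1, 0, 1, 0], [0, 0, 0, 1]]
steps_mask, steps_width = steps_params_sketch(src)
```

Adding such a mask to a vector of log-probabilities leaves the allowed letters unchanged and pushes every other letter to minus infinity, so it can never be selected.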
beam_step
Implementation of the beam search algorithm step.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
candidates |
List[Tuple[Tensor[1, STEP_NUMBER, TOKEN_SIZE], float]]
|
List of candidates from the previous step. |
required |
step_mask |
Tensor[TOKEN_SIZE]
|
Column mask that consists of zeros at the positions of the letters allowed for selection in the column and minus infinity at all other positions. Required so that only the letters in the column are selected as candidates. |
required |
step_width |
int
|
Number of candidates that are contained in the step column. |
required |
encoded_src |
Tensor[SEQUENCE_LEN, 1, D_MODEL]
|
Columns for keyless reading that were encoded by TRecover encoder. |
required |
model |
TRecover
|
Trained model for keyless reading. |
required |
beam_width |
int
|
Number of candidates that can be selected at the current step. |
required |
device |
torch.device
|
Device on which to allocate the candidate chains. |
required |
Returns:
Name | Type | Description |
---|---|---|
step_candidates |
List[Tuple[Tensor[1, STEP_NUMBER
|
List of candidates of size "beam_width" for the current step sorted in descending order of their probabilities. |
Notes
For each chain candidate from the previous step:
* A probability distribution over the next symbol is calculated with the trained TRecover model, taking the "step_mask" into account so that only symbols from the current column can be chosen.
* The most probable symbols are selected from this distribution; how many are selected is determined by the "step_width" and "beam_width" parameters.
* For each selected symbol, a new candidate chain with an updated probability is constructed and placed in the "step_candidates" list.

Finally, all candidates are sorted in descending order of probability and the "beam_width" most probable ones are kept.
Source code in src/trecover/utils/beam_search.py
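The step described in the notes for beam_step can be sketched without tensors as follows. This is a simplified illustration, not the library code: chains are lists of token ids, scores are assumed to be accumulated log-probabilities, the hypothetical `next_log_probs` callable stands in for the TRecover model, and the number of symbols kept per chain is assumed to be `min(step_width, beam_width)`.

```python
import math
from typing import Callable, List, Tuple

Candidate = Tuple[List[int], float]  # (chain of token ids, accumulated log-probability)


def beam_step_sketch(
    candidates: List[Candidate],
    step_mask: List[float],
    step_width: int,
    next_log_probs: Callable[[List[int]], List[float]],  # hypothetical model stand-in
    beam_width: int,
) -> List[Candidate]:
    step_candidates: List[Candidate] = []
    for chain, score in candidates:
        # Letters outside the column receive -inf and cannot be selected.
        masked = [lp + m for lp, m in zip(next_log_probs(chain), step_mask)]
        ranked = sorted(range(len(masked)), key=lambda i: masked[i], reverse=True)
        # Assumption: keep at most min(step_width, beam_width) symbols per chain.
        for token in ranked[: min(step_width, beam_width)]:
            step_candidates.append((chain + [token], score + masked[token]))
    # Keep the beam_width best chains overall.
    step_candidates.sort(key=lambda c: c[1], reverse=True)
    return step_candidates[:beam_width]


toy_log_probs = [math.log(0.5), math.log(0.3), math.log(0.2)]
step = beam_step_sketch(
    candidates=[([], 0.0)],
    step_mask=[0.0, -math.inf, 0.0],  # the column contains letters 0 and 2
    step_width=2,
    next_log_probs=lambda chain: toy_log_probs,
    beam_width=2,
)
```

In this toy run letter 1 is the second most probable symbol, but the mask removes it because it is not in the column, so the surviving chains end in letters 0 and 2.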
celery_task_loop
Get a beam search algorithm loop function, which is implemented for the Celery task.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
task |
celery.Task
|
Celery task base class. |
required |
Returns:
Name | Type | Description |
---|---|---|
inner_loop |
Callable[[Tensor, Tensor, TRecover, int, torch.device], List[Tuple[Tensor, float]]]
|
Beam search algorithm loop function for the Celery task. |
Source code in src/trecover/utils/beam_search.py
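celery_task_loop returns a loop function rather than running the loop itself. A minimal sketch of that factory pattern is shown below, under stated assumptions: the hypothetical `report` callback stands in for whatever progress reporting the task performs, the loop signature is reduced to `(columns, width)`, and the step itself is a placeholder without a scoring model.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[List[int], float]
Loop = Callable[[List[List[int]], int], List[Candidate]]


def progress_loop_factory(report: Callable[[int, int], None]) -> Loop:
    """Return a loop function that calls report(done, total) after every step."""

    def inner_loop(columns: List[List[int]], width: int) -> List[Candidate]:
        candidates: List[Candidate] = [([], 0.0)]
        for step, column in enumerate(columns, start=1):
            # Placeholder step: extend each chain with each allowed letter.
            extended = [
                (chain + [letter], score)
                for chain, score in candidates
                for letter in column
            ]
            candidates = extended[:width]
            report(step, len(columns))  # progress hook captured by the closure
        return candidates

    return inner_loop


progress: List[Tuple[int, int]] = []
loop = progress_loop_factory(lambda done, total: progress.append((done, total)))
result = loop([[0, 2], [3]], 2)
```

Because the reporting target is captured by the closure, the returned function keeps the same call signature as any other loop and can be passed wherever a loop function is expected. The same shape would fit a progress bar in place of the callback.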
cli_interactive_loop
Get a beam search algorithm loop function, which is implemented for the CLI.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
label |
str
|
Label for the CLI progress bar. |
'Processing'
|
Returns:
Name | Type | Description |
---|---|---|
inner_loop |
Callable[[Tensor, Tensor, TRecover, int, torch.device], List[Tuple[Tensor, float]]]
|
Beam search algorithm loop function for the CLI. |
Source code in src/trecover/utils/beam_search.py
dashboard_loop
Beam search algorithm loop implementation for the dashboard interface.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
src |
Tensor[SEQUENCE_LEN, TOKEN_SIZE]
|
Keyless reading columns that are passed to the TRecover encoder. |
required |
encoded_src |
Tensor[SEQUENCE_LEN, 1, D_MODEL]
|
Keyless reading columns that were encoded by TRecover encoder. |
required |
model |
TRecover
|
Trained model for keyless reading. |
required |
width |
int
|
Number of candidates that can be selected at each step. |
required |
device |
torch.device
|
Device on which to allocate the candidate chains. |
required |
Returns:
Name | Type | Description |
---|---|---|
candidates |
List[Tuple[Tensor[1, SEQUENCE_LEN
|
List of chains sorted in descending order of probabilities. The number of candidates is set by the "width" parameter. |
Notes
The probability masks and beam width values required for each step of the beam search algorithm are calculated by the "get_steps_params" function from the keyless reading columns ("src") that are passed to the TRecover encoder.
An empty chain (zero token of shape [1, 1, TOKEN_SIZE]) with zero probability is used as the first candidate for the algorithm.
At each step of the algorithm, the progress bar is updated and displayed on the dashboard.
Source code in src/trecover/utils/beam_search.py
standard_loop
Base implementation of the beam search algorithm loop.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
src |
Tensor[SEQUENCE_LEN, TOKEN_SIZE]
|
Keyless reading columns that are passed to the TRecover encoder. |
required |
encoded_src |
Tensor[SEQUENCE_LEN, 1, D_MODEL]
|
Keyless reading columns that were encoded by TRecover encoder. |
required |
model |
TRecover
|
Trained model for keyless reading. |
required |
width |
int
|
Number of candidates that can be selected at each step. |
required |
device |
torch.device
|
Device on which to allocate the candidate chains. |
required |
Returns:
Name | Type | Description |
---|---|---|
candidates |
List[Tuple[Tensor[1, SEQUENCE_LEN
|
List of chains sorted in descending order of probabilities. The number of candidates is set by the "width" parameter. |
Notes
The probability masks and beam width values required for each step of the beam search algorithm are calculated by the "get_steps_params" function from the keyless reading columns ("src") that are passed to the TRecover encoder.
An empty chain (zero token of shape [1, 1, TOKEN_SIZE]) with zero probability is used as the first candidate for the algorithm.
Source code in src/trecover/utils/beam_search.py
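Putting the notes for standard_loop together, the whole loop can be sketched in plain Python as below. The assumptions are the same as for any such simplification: columns are lists with nonzero entries for allowed letters, scores are accumulated log-probabilities, and the hypothetical `score_fn` replaces the model call. The starting point is a single empty chain with a score of zero.

```python
import math
from typing import Callable, List, Tuple

Candidate = Tuple[List[int], float]


def standard_loop_sketch(
    src: List[List[float]],
    score_fn: Callable[[List[int]], List[float]],  # hypothetical model stand-in
    width: int,
) -> List[Candidate]:
    # Per-step masks and widths derived from the columns.
    steps_mask = [[0.0 if v > 0 else -math.inf for v in col] for col in src]
    steps_width = [sum(1 for v in col if v > 0) for col in src]

    candidates: List[Candidate] = [([], 0.0)]  # empty chain, zero score
    for mask, step_width in zip(steps_mask, steps_width):
        expanded: List[Candidate] = []
        for chain, score in candidates:
            masked = [lp + m for lp, m in zip(score_fn(chain), mask)]
            best = sorted(range(len(masked)), key=masked.__getitem__, reverse=True)
            for token in best[: min(step_width, width)]:
                expanded.append((chain + [token], score + masked[token]))
        candidates = sorted(expanded, key=lambda c: c[1], reverse=True)[:width]
    return candidates


toy = [math.log(0.1), math.log(0.6), math.log(0.3)]
chains = standard_loop_sketch(
    src=[[1, 1, 0], [0, 1, 1]],  # column 1: letters 0, 1; column 2: letters 1, 2
    score_fn=lambda chain: toy,
    width=2,
)
```

With width 2, only the two best chains survive each step, so the chain starting with letter 0 is dropped after the second column.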
beam_search
Beam search algorithm implementation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
src |
Tensor[SEQUENCE_LEN, TOKEN_SIZE]
|
Keyless reading columns that are passed to the TRecover encoder. |
required |
model |
TRecover
|
Trained model for keyless reading. |
required |
width |
int
|
Number of candidates that can be selected at each step. |
required |
device |
torch.device
|
Device on which to allocate the candidate chains. |
required |
beam_loop |
Callable[[Tensor, Tensor, TRecover, int, torch.device], List[Tuple[Tensor, float]]]
|
Beam search algorithm loop function. |
standard_loop
|
Returns:
Name | Type | Description |
---|---|---|
candidates |
List[Tuple[Tensor[SEQUENCE_LEN], float]]
|
List of chains sorted in descending order of probabilities. The number of candidates is set by the "width" parameter. |
Notes
First, the keyless reading columns ("src") are encoded once with the TRecover encoder; the encoded columns ("encoded_src") are then reused at each step of the algorithm.
Source code in src/trecover/utils/beam_search.py
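The division of work between beam_search and its "beam_loop" argument can be sketched as follows: the entry point encodes the columns once and delegates everything else to the loop. The names `toy_encode` and `toy_loop` are hypothetical stand-ins, and the loop signature is reduced to `(src, encoded_src, width)` for brevity.

```python
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[List[int], float]
calls: Dict[str, int] = {"encode": 0}


def toy_encode(src: List[List[int]]) -> List[List[float]]:
    """Hypothetical encoder stand-in that counts how often it is called."""
    calls["encode"] += 1
    return [[float(v) for v in col] for col in src]


def toy_loop(src: List[List[int]], encoded_src: List[List[float]], width: int) -> List[Candidate]:
    """Hypothetical loop stand-in: pick the first marked letter of every column."""
    return [([col.index(1) for col in src], 0.0)][:width]


def beam_search_sketch(
    src: List[List[int]],
    encode: Callable[[List[List[int]]], List[List[float]]],
    width: int,
    beam_loop: Callable[..., List[Candidate]] = toy_loop,
) -> List[Candidate]:
    encoded_src = encode(src)  # encoded once, before the loop starts
    return beam_loop(src, encoded_src, width)


result = beam_search_sketch([[0, 1, 0], [1, 0, 0]], toy_encode, 1)
```

Keeping the loop as a parameter is what allows the CLI, dashboard, and Celery variants to reuse the same entry point while reporting progress in their own way.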
async_beam_step
async
Asynchronous implementation of the beam search algorithm step.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
candidates |
List[Tuple[Tensor[1, STEP_NUMBER, TOKEN_SIZE], float]]
|
List of candidates from the previous step. |
required |
step_mask |
Tensor[TOKEN_SIZE]
|
Column mask that consists of zeros at the positions of the letters allowed for selection in the column and minus infinity at all other positions. Required so that only the letters in the column are selected as candidates. |
required |
step_width |
int
|
Number of candidates that are contained in the step column. |
required |
encoded_src |
Tensor[SEQUENCE_LEN, 1, D_MODEL]
|
Columns for keyless reading that were encoded by TRecover encoder. |
required |
model |
TRecover
|
Trained model for keyless reading. |
required |
beam_width |
int
|
Number of candidates that can be selected at the current step. |
required |
device |
torch.device
|
Device on which to allocate the candidate chains. |
required |
Returns:
Name | Type | Description |
---|---|---|
step_candidates |
List[Tuple[Tensor[1, STEP_NUMBER
|
List of candidates of size "beam_width" for the current step sorted in descending order of their probabilities. |
Notes
For each chain candidate from the previous step:
* A probability distribution over the next symbol is calculated with the trained TRecover model, taking the "step_mask" into account so that only symbols from the current column can be chosen.
* The most probable symbols are selected from this distribution; how many are selected is determined by the "step_width" and "beam_width" parameters.
* For each selected symbol, a new candidate chain with an updated probability is constructed and placed in the "step_candidates" list.

Finally, all candidates are sorted in descending order of probability and the "beam_width" most probable ones are kept.
Source code in src/trecover/utils/beam_search.py
api_interactive_loop
Get an asynchronous beam search algorithm loop function, which is implemented for the API interface.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
queue |
asyncio.Queue
|
Asynchronous queue for storing intermediate results. |
required |
delimiter |
str
|
Delimiter for columns visualization. |
''
|
Returns:
Name | Type | Description |
---|---|---|
async_inner_loop |
Callable[[Tensor, Tensor, TRecover, int, torch.device], Awaitable]
|
Asynchronous beam search algorithm loop function for the API interface. |
Source code in src/trecover/utils/beam_search.py
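A loop that stores intermediate results in an asyncio.Queue, as api_interactive_loop is described to do, might look roughly like the sketch below. The details are assumptions made for illustration: the loop signature is reduced to `(columns, width)`, chains are published as strings joined by "delimiter", the step has no scoring model, and the trailing `None` is a hypothetical end-of-stream marker.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

Candidate = Tuple[List[int], float]


def queue_loop_factory(
    queue: asyncio.Queue, delimiter: str = ''
) -> Callable[[List[List[int]], int], Awaitable[None]]:
    async def async_inner_loop(columns: List[List[int]], width: int) -> None:
        candidates: List[Candidate] = [([], 0.0)]
        for column in columns:
            extended = [
                (chain + [letter], score)
                for chain, score in candidates
                for letter in column
            ]
            candidates = extended[:width]
            # Publish the intermediate chains after every step.
            await queue.put([delimiter.join(map(str, c)) for c, _ in candidates])
        await queue.put(None)  # hypothetical end-of-stream marker

    return async_inner_loop


async def demo() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    loop = queue_loop_factory(queue, delimiter='-')
    await loop([[0, 2], [3]], 2)
    items = []
    while not queue.empty():
        items.append(queue.get_nowait())
    return items


items = asyncio.run(demo())
```

A consumer, for example an API endpoint polling for job status, can read from the queue while the search is still running, which is why the loop itself has nothing to return.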
standard_async_loop
async
Base asynchronous implementation of the beam search algorithm loop.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
src |
Tensor[SEQUENCE_LEN, TOKEN_SIZE]
|
Keyless reading columns that are passed to the TRecover encoder. |
required |
encoded_src |
Tensor[SEQUENCE_LEN, 1, D_MODEL]
|
Keyless reading columns that were encoded by TRecover encoder. |
required |
model |
TRecover
|
Trained model for keyless reading. |
required |
width |
int
|
Number of candidates that can be selected at each step. |
required |
device |
torch.device
|
Device on which to allocate the candidate chains. |
required |
Returns:
Name | Type | Description |
---|---|---|
candidates |
List[Tuple[Tensor[1, SEQUENCE_LEN
|
List of chains sorted in descending order of probabilities. The number of candidates is set by the "width" parameter. |
Notes
The probability masks and beam width values required for each step of the beam search algorithm are calculated by the "get_steps_params" function from the keyless reading columns ("src") that are passed to the TRecover encoder.
An empty chain (zero token of shape [1, 1, TOKEN_SIZE]) with zero probability is used as the first candidate for the algorithm.
Source code in src/trecover/utils/beam_search.py
async_beam_search
async
Asynchronous beam search algorithm implementation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
src |
Tensor[SEQUENCE_LEN, TOKEN_SIZE]
|
Keyless reading columns that are passed to the TRecover encoder. |
required |
model |
TRecover
|
Trained model for keyless reading. |
required |
width |
int
|
Number of candidates that can be selected at each step. |
required |
device |
torch.device
|
Device on which to allocate the candidate chains. |
required |
beam_loop |
Callable[[Tensor, Tensor, TRecover, int, torch.device], List[Tuple[Tensor, float]]]
|
Asynchronous beam search algorithm loop function. |
standard_async_loop
|
Returns:
Name | Type | Description |
---|---|---|
candidates |
Optional[List[Tuple[Tensor[SEQUENCE_LEN], float]]]
|
List of chains sorted in descending order of probabilities. The number of candidates is set by the "width" parameter. Returns None if "api_interactive_loop" is used as a beam search loop function. |
Notes
First, the keyless reading columns ("src") are encoded once with the TRecover encoder; the encoded columns ("encoded_src") are then reused at each step of the asynchronous algorithm.
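The optional return value can be illustrated with a small sketch: a loop that collects candidates returns them, while a loop that streams into a queue returns None, and the entry point simply passes the result through. All names here are hypothetical stand-ins, and the loop signature is reduced to `(src, encoded_src, width)`.

```python
import asyncio
from typing import Callable, List, Optional, Tuple

Candidate = Tuple[List[int], float]


async def collecting_loop(src, encoded_src, width) -> List[Candidate]:
    """Stand-in for a loop that returns its candidates."""
    return [([col.index(1) for col in src], 0.0)][:width]


def streaming_loop_factory(queue: asyncio.Queue) -> Callable:
    async def loop(src, encoded_src, width) -> None:
        # Stand-in for a loop that publishes results instead of returning them.
        await queue.put([col.index(1) for col in src])

    return loop


async def async_beam_search_sketch(
    src, encode, width, beam_loop=collecting_loop
) -> Optional[List[Candidate]]:
    encoded_src = encode(src)  # encoded once, before the loop starts
    return await beam_loop(src, encoded_src, width)


async def demo():
    src = [[0, 1, 0], [1, 0, 0]]
    returned = await async_beam_search_sketch(src, lambda s: s, 1)
    queue: asyncio.Queue = asyncio.Queue()
    streamed = await async_beam_search_sketch(
        src, lambda s: s, 1, streaming_loop_factory(queue)
    )
    return returned, streamed, queue.get_nowait()


returned, streamed, queued = asyncio.run(demo())
```

Callers that switch between the two kinds of loop therefore need to handle a None result and read from the queue instead.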