
Error Handling

Errors can occur either because a pattern is invalid or because the input cannot be matched by any rule, e.g.:

import { Tokenizer, TapeInterface as Tape } from "tlex";

const tokenizer = new Tokenizer()
  .add("a")
  .add("b")
  .add("c")
  .add("d+b")
  .add(/\s+/, { skip: true });

With the following input:

console.log(tokenizer.tokenize("a b c ddddb e"));

we would expect an error at the "e", as it is not the start character of any rule. Note that the error is returned as a token at the end of the list.

Output

[
  Token {
    tag: null,
    matchIndex: 0,
    start: 0,
    end: 1,
    id: 0,
    value: 'a',
    groups: {},
    positions: {}
  },
  Token {
    tag: null,
    matchIndex: 1,
    start: 2,
    end: 3,
    id: 2,
    value: 'b',
    groups: {},
    positions: {}
  },
  Token {
    tag: null,
    matchIndex: 2,
    start: 4,
    end: 5,
    id: 4,
    value: 'c',
    groups: {},
    positions: {}
  },
  Token {
    tag: null,
    matchIndex: 3,
    start: 6,
    end: 11,
    id: 6,
    value: 'ddddb',
    groups: {},
    positions: {}
  },
  {
    tag: 'ERROR',
    start: 12,
    end: 13,
    value: 'Unexpected Character: e'
  }
]
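
Because the error is surfaced as an ordinary entry in the returned array, callers can detect it after the fact. Below is a minimal sketch that reuses the tokenizer defined above; the "ERROR" tag is taken from the output shown, while the endedInError helper is our own illustration, not part of the TLEX API:

// Hypothetical helper: the error token, if present, is always the last item.
function endedInError(tokens: { tag: string | null; value: any }[]): boolean {
  const last = tokens[tokens.length - 1];
  return last !== undefined && last.tag === "ERROR";
}

const result = tokenizer.tokenize("a b c ddddb e");
if (endedInError(result)) {
  console.log("Lexing failed:", result[result.length - 1].value);
}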

Similarly, for a slightly different input:

console.log(tokenizer.tokenize("a b c ddddcb e"));

we would expect the failure on the invalid lexeme ddddc, since a run of d's can only be terminated by a b under the d+b rule:

[
  Token {
    tag: null,
    matchIndex: 0,
    start: 0,
    end: 1,
    id: 0,
    value: 'a',
    groups: {},
    positions: {}
  },
  Token {
    tag: null,
    matchIndex: 1,
    start: 2,
    end: 3,
    id: 2,
    value: 'b',
    groups: {},
    positions: {}
  },
  Token {
    tag: null,
    matchIndex: 2,
    start: 4,
    end: 5,
    id: 4,
    value: 'c',
    groups: {},
    positions: {}
  },
  {
    tag: 'ERROR',
    start: 6,
    end: 11,
    value: 'Unexpected Symbol: ddddc'
  }
]

As before, the tokens preceding the error are returned, with the error token as the last item.

Catching errors

Instead of stopping at the first invalid character or lexeme, errors can be caught so that lexing can continue (either via a reset or a user-specified correction).

tokenizer.onError = (err: Error, tape: Tape, index: number) => {
  // Returning null skips past the offending input so that
  // tokenization resumes at the next index.
  return null;
};
console.log(tokenizer.tokenize("a b c ddddcb e a b c"));

Output

[
  Token {
    tag: null,
    matchIndex: 0,
    start: 0,
    end: 1,
    id: 0,
    value: 'a',
    groups: {},
    positions: {}
  },
  Token {
    tag: null,
    matchIndex: 1,
    start: 2,
    end: 3,
    id: 2,
    value: 'b',
    groups: {},
    positions: {}
  },
  Token {
    tag: null,
    matchIndex: 2,
    start: 4,
    end: 5,
    id: 4,
    value: 'c',
    groups: {},
    positions: {}
  },
  Token {
    tag: null,
    matchIndex: 1,
    start: 11,
    end: 12,
    id: 6,
    value: 'b',
    groups: {},
    positions: {}
  },
  Token {
    tag: null,
    matchIndex: 0,
    start: 15,
    end: 16,
    id: 9,
    value: 'a',
    groups: {},
    positions: {}
  },
  Token {
    tag: null,
    matchIndex: 1,
    start: 17,
    end: 18,
    id: 11,
    value: 'b',
    groups: {},
    positions: {}
  },
  Token {
    tag: null,
    matchIndex: 2,
    start: 19,
    end: 20,
    id: 13,
    value: 'c',
    groups: {},
    positions: {}
  }
]

Here, when an error is encountered, it is simply skipped and tokenization continues at the next index. Note that both the invalid lexeme ddddc and the stray e are dropped from the output above.
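
If silently dropping errors is too lossy, the handler can record them for later reporting. Below is a minimal sketch under the same contract shown above (returning null skips the offending input); the errors array is our own bookkeeping, not part of TLEX:

const errors: { index: number; message: string }[] = [];
tokenizer.onError = (err: Error, tape: Tape, index: number) => {
  // Remember where the error occurred, then skip and continue.
  errors.push({ index, message: err.message });
  return null;
};

const tokens = tokenizer.tokenize("a b c ddddcb e a b c");
console.log(`${tokens.length} tokens, ${errors.length} errors recorded`);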