from __init__ import install_dependencies

await install_dependencies()
import ast
from typing import Annotated
from ipywidgets import Text, interact
from pydantic import Field, ValidationError, validate_call

%load_ext divewidgets

Parsing User Input

A parser is a crucial component that analyzes the structure of data, often in the form of text, and converts it into a more meaningful format. For instance, when executing a Python program, the Python interpreter first parses the program’s source code into an Abstract Syntax Tree (AST).

print(ast.dump(ast.parse('q = a/b if b else "undefined"', "_", "exec"), indent=4))

This involves breaking down the source code into individual components and interpreting their roles.
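As a rough illustration of this breakdown, the standard-library tokenize module can list the individual components of the statement parsed above. This is only a sketch: the variable names source and tokens are chosen for this example and are not part of the original notebook.

```python
import io
import tokenize

source = 'q = a/b if b else "undefined"'

# Pair each token's type name with its text, in source order
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.string.strip()  # skip the trailing tokens that carry no visible text
]

for kind, text in tokens:
    print(f"{kind:8}{text}")
```

Each printed row is one component of the source line, such as the name q, the operator = or the string "undefined". Keywords such as if and else are reported with the NAME type at this stage.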

This process is called tokenization, which is performed by the so-called lexer. The parser then composes a hierarchical structure that accurately represents the operations and their execution flow. For example, the Python code

... = .../... if ... else ...

is translated to the tree structure:

...
        Assign(
            targets=...
            value=IfExp(
                test=...
                body=BinOp(
                    left=...
                    op=Div(),
                    right=...
                orelse=...
...

In other words, the division operation Div needs to be completed first so that the conditional expression IfExp can be completed, which then allows the assignment operation Assign to complete. As the language grows, the logic involved in parsing the program becomes more sophisticated. For more details, see the ANTLR video.
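The nesting described above can be made visible by printing only the node types, indented by their depth in the tree. This is a minimal sketch; the helper name outline is made up for this illustration.

```python
import ast

tree = ast.parse('q = a/b if b else "undefined"', "_", "exec")


def outline(node, depth=0):
    """Return (depth, node type name) pairs, visiting each parent before its children."""
    rows = [(depth, type(node).__name__)]
    for child in ast.iter_child_nodes(node):
        rows.extend(outline(child, depth + 1))
    return rows


for depth, name in outline(tree):
    print("    " * depth + name)
```

In the printed outline, Div sits inside BinOp, which sits inside IfExp, which in turn sits inside Assign, matching the excerpt of the tree shown earlier.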

Parsing Boolean Values

The following is a very (very) simple parser that can understand yes/no strings in user input and convert them to boolean values.

@interact(x=Text('yes'))
def parse(x):
    s = x.lower()
    match s:
        case "yes" | "y":
            return True
        case "no" | "n":
            return False
        case _:
            return x

The matching is case-insensitive:

parse("yes"), parse("n"), parse("Y"), parse("No")

Instead of using the if statement, it is more convenient to use the match statement introduced in Python 3.10. The following flowchart shows roughly how the statement is executed:

%%flowchart
st=>start: Start
suite0=>operation: s = x.lower()
cond1=>condition: s == "yes" or s == "y"
cond2=>condition: s == "no" or s == "n"
suite1=>inputoutput: return True
suite2=>inputoutput: return False
suite3=>inputoutput: return x
e=>end

st(right)->suite0(right)->cond1
cond1(yes)->suite1->e
cond1(no)->cond2
cond2(yes)->suite2->e
cond2(no)->suite3->e
  • Initially, s = x.lower() obtains the string in lowercase.
  • In the first case of the match statement, "yes" or "y" is converted to the boolean value True.
  • In the second case of the match statement, "no" or "n" is converted to the boolean value False.
  • In the last case, the wildcard _ matches anything else, and the input x is returned unchanged.
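For comparison, the same logic can be written with an if statement, following the conditions in the flowchart. This is a sketch only; the name parse_with_if is hypothetical and chosen so that it does not replace parse.

```python
def parse_with_if(x):
    """Convert yes/no strings to booleans and return anything else unchanged."""
    s = x.lower()
    if s == "yes" or s == "y":
        return True
    elif s == "no" or s == "n":
        return False
    else:
        return x


print(parse_with_if("Yes"), parse_with_if("N"), parse_with_if("maybe"))
```

The match version states each group of accepted strings once as a pattern, whereas the if version repeats the comparison against s for every alternative.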
@interact(x=Text('T'))
def parse(x):
    # YOUR CODE HERE
    raise NotImplementedError
# tests
assert parse("TRUE") is True
assert parse("Y") is True
assert parse("t") is True
assert parse("False") is False
assert parse("F") is False
assert parse("n") is False
assert parse("TrUE") is True
assert parse("No") is False

Parsing Numbers

It is desirable to parse numbers as well. There is indeed a way to check whether a string consists only of digits:

str.isdigit("1302"), "CS1302".isdigit()

Unfortunately, the method fails to recognize negative integers:

"-12".isdigit()

The following function resolves the issue using the try statement. It even works when x is of type int.

def isint(x):
    """
    Returns True if x can be converted to an integer, and False otherwise.
    """
    try:
        int(x)
    except ValueError:
        return False
    return True


isint("CS1302"), isint("1302"), isint("-1302"), isint(-1302)
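A few more inputs may help to see what isint accepts, since it simply relies on int. The sketch below repeats the function so that it runs on its own; the test strings and the names results and type_error_raised are chosen for illustration.

```python
def isint(x):
    """Return True if int(x) succeeds, and False if it raises ValueError."""
    try:
        int(x)
    except ValueError:
        return False
    return True


# Strings that int() accepts (surrounding spaces, a sign, underscores) or rejects
results = {s: isint(s) for s in [" 42 ", "+7", "1_000", "4.0", "", "1e3"]}
print(results)

# Only ValueError is caught, so other exceptions still propagate to the caller
type_error_raised = False
try:
    isint(None)
except TypeError:
    type_error_raised = True
print(type_error_raised)
```

Note that strings such as "4.0" and "1e3" are rejected even though they look like numbers, because int does not parse decimal points or exponents in strings.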

How does it work? isint(x) would describe its implementation as follows:

I try to convert x to int and return True except when ValueError is raised, in which case I return False[1]

This is illustrated by the following flowchart.

%%flowchart
st=>start: Start
cond1=>condition: ValueError
suite1=>operation: int(x)
suite2=>inputoutput: return True
suite3=>inputoutput: return False
e=>end

st(right)->suite1(right)->cond1
cond1(yes)->suite3->e
cond1(no)->suite2->e
@interact(x=Text("-13+0.2j"))
def parse(x):
    # YOUR CODE HERE
    raise NotImplementedError
# tests
assert (_:=parse("1302")) == 1302 and isinstance(_, int)
assert (_:=parse("-13.02")) == -13.02 and isinstance(_, float)
assert (_:=parse("-13+0.2j")) == -13+0.2j and isinstance(_, complex)
assert parse("yes") is True
assert parse("N") is False
assert (_:=parse("-1302")) == -1302 and isinstance(_, int)
assert (_:=parse("inf")) == float("inf") and isinstance(_, float)

Data Validation

If you have implemented your parser correctly in the last section, the interact function of ipywidgets allows you to play with the function interactively:

@interact(a=Text("3"), b=Text("4"))
def length_of_hypotenuse(a, b):
    a, b = parse(a), parse(b)
    c = (a**2 + b**2) ** (0.5)
    return c

Without the parser, i.e., with the line a, b = parse(a), parse(b) removed, the above code fails because a and b are passed to length_of_hypotenuse as strings, not numbers, and exponentiation such as a ** 2 is not implemented for strings. With the parser, however, you can even call the function with integer arguments:

length_of_hypotenuse(3, 4)
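The failure without the parser can be reproduced in isolation. In this small sketch, the string "3" stands in for what a Text widget passes to the function, and the conversion with float is used only for illustration.

```python
a = "3"  # a Text widget passes a string, not a number

message = None
try:
    a ** 2
except TypeError as error:
    # The ** operator is not defined between a str and an int
    message = str(error)
print(message)

# After conversion to a number, the same expression works
print(float(a) ** 2)
```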

Assertion

Interestingly, the function does not fail even if the input arguments are negative numbers.

a, b = -3, 4
length_of_hypotenuse(a, b)

Instead of allowing the input arguments to be any values of any type, it is often better to validate the arguments and raise an error if the values or types are unexpected. We can achieve this using the assert statement:

%%optlite -h 400
def length_of_hypotenuse(a, b):
    assert a >= 0 and b >= 0
    c = (a**2 + b**2) ** (0.5)
    return c


length_of_hypotenuse(-3, 4)
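The assert statement also accepts an optional second expression, which becomes the message of the AssertionError. The variant below is a sketch with a made-up message text; it catches the error only to show what the message looks like.

```python
def length_of_hypotenuse(a, b):
    # The expression after the comma is used as the error message
    assert a >= 0 and b >= 0, f"edge lengths must be non-negative, got {a} and {b}"
    c = (a**2 + b**2) ** (0.5)
    return c


try:
    length_of_hypotenuse(-3, 4)
except AssertionError as error:
    message = str(error)
    print("AssertionError:", message)
```

Keep in mind that assert statements are skipped when Python is run with the -O option, so they are better suited to catching programming mistakes than to checking input in production code.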

Validation is the process of checking whether a desired condition holds before further processing, to avoid costly mistakes. Indeed, we have been using assert statements for validation all along. For instance, you may validate the notebook before submission to lower the chance of careless mistakes. After the submission, there are also hidden tests to check whether the submitted programs are engineered to work only on the visible test cases.

Our function is still imperfect. For instance, it allows an edge length to be infinite.
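A quick check of this claim, repeating the function with the assertion so that the cell runs on its own. The call with float("inf") is an illustrative input, not one taken from the original notebook.

```python
def length_of_hypotenuse(a, b):
    assert a >= 0 and b >= 0
    c = (a**2 + b**2) ** (0.5)
    return c


# float("inf") >= 0 is True, so the assertion does not stop the call
c = length_of_hypotenuse(float("inf"), 4)
print(c)
```

The call completes without any error and the returned length is itself infinite.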

Note that if an input argument is too large, the exponentiation raises an OverflowError:

%%optlite -h 500
def length_of_hypotenuse(a, b):
    assert a >= 0 and b >= 0
    c = (a**2 + b**2) ** (0.5)
    return c


c = length_of_hypotenuse(3e300, 4)

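However, sometimes no error is raised even if the input is too large. A literal such as 3e400 is beyond the range of float and is already float("inf"), and operations on infinity do not raise an OverflowError: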

def length_of_hypotenuse(a, b):
    assert a >= 0 and b >= 0
    c = (a**2 + b**2) ** (0.5)
    return c


c = length_of_hypotenuse(3e400, 4)
c = length_of_hypotenuse(3, 4e400)
c = length_of_hypotenuse(3e400, 4e400)
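The difference between the last two cells can be examined without the function. The sketch below uses illustrative variable names and compares a literal that is too large, a power that overflows, and a multiplication that overflows.

```python
inf = float("inf")

# A literal beyond the float range is already infinite when it is read
too_large = 3e400
print(too_large == inf)

# Raising a large finite float to a power raises an exception
x = 3e300
overflow_raised = False
try:
    x ** 2
except OverflowError:
    overflow_raised = True
print(overflow_raised)

# Multiplying the same value by itself overflows silently to infinity
product = x * x
print(product == inf)
```

So whether an overflow is reported depends on the operation used, which is one reason to check explicitly that the values are finite rather than rely on an exception.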
def length_of_hypotenuse(a, b):
    # YOUR CODE HERE
    raise NotImplementedError
    return c
# tests
def test_AE(a, b):
    try:
        c = length_of_hypotenuse(a, b)
        return max(a, b, c) < float("inf")
    except AssertionError:
        return True


assert length_of_hypotenuse(3, 4) == 5
assert test_AE(3, 4)
assert test_AE(3e300, 4)
assert test_AE(3e400, 4)
assert test_AE(3, 4e400)
assert test_AE(3e400, 4e400)

Type Hinting and Validation

Instead of manually checking input arguments, you can use the Pydantic package together with the standard typing module:

NonNegative = Annotated[float, Field(ge=0)]


@interact(a=Text("3"), b=Text("4"))
@validate_call(validate_return=True)
def length_of_hypotenuse(a: NonNegative, b: NonNegative) -> NonNegative:
    """
    Return the length of hypotenuse.
    """
    c = (a * a + b * b) ** (0.5)
    return c

Notice that the string inputs a and b are automatically converted to float.

length_of_hypotenuse("3", 4)
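To see the validation at work, the sketch below repeats the function without the interact decorator and tries a few inputs. It assumes Pydantic version 2, where validate_call is available, and the helper name rejects is hypothetical.

```python
from typing import Annotated

from pydantic import Field, ValidationError, validate_call

NonNegative = Annotated[float, Field(ge=0)]


@validate_call(validate_return=True)
def length_of_hypotenuse(a: NonNegative, b: NonNegative) -> NonNegative:
    """Return the length of hypotenuse."""
    c = (a * a + b * b) ** (0.5)
    return c


def rejects(*args):
    """Return True if the call fails validation, and False otherwise."""
    try:
        length_of_hypotenuse(*args)
    except ValidationError:
        return True
    return False


print(length_of_hypotenuse("3", 4))  # the string "3" is converted to a float
print(rejects(-3, 4))  # a negative edge length violates ge=0
print(rejects("abc", 4))  # a string that is not a number cannot be converted
```

Both the constraint on the value and the conversion of the type are declared once in NonNegative, so the function body contains no checking code.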

However, the above code does not raise any ValidationError if the input a or b, or the output length of the hypotenuse overflows to float("inf"):

length_of_hypotenuse(3e400, 4)
# YOUR CODE HERE
raise NotImplementedError


@interact(a=Text("3"), b=Text("4"))
@validate_call(validate_return=True)
def length_of_hypotenuse(
    a: NonNegativeFinite, b: NonNegativeFinite
) -> NonNegativeFinite:
    """
    Return the length of hypotenuse.
    """
    c = (a * a + b * b) ** (0.5)
    return c
# test
def test_VE(a, b):
    try:
        c = length_of_hypotenuse(a, b)
        return max(a, b, c) < float("inf")
    except ValidationError:
        return True

assert test_VE(3, 4)
assert test_VE(3e300, 4)
assert test_VE(3e400, 4)
assert test_VE(3, 4e400)
assert test_VE(3e400, 4e400)
Footnotes
  1. Why use first-person narration? Just to avoid errors like tries (a syntax error in Python) or trys (a grammatical mistake).