from __init__ import install_dependencies
await install_dependencies()
import ast
from typing import Annotated
from ipywidgets import Text, interact
from pydantic import Field, ValidationError, validate_call
%load_ext divewidgets
Parsing User Input¶
A parser is a crucial component that analyzes the structure of data, often in the form of text, and converts it into a more meaningful format. For instance, when executing a Python program, the Python interpreter first parses the program’s source code into an Abstract Syntax Tree (AST).
print(ast.dump(ast.parse('q = a/b if b else "undefined"', "_", "exec"), indent=4))
This involves breaking down the source code into individual components and interpreting their roles. E.g., q is understood as a variable name for storing a value, while a and b are also variable names but for loading values; and "undefined" is regarded as a constant string literal.
The process of breaking the source code into individual components is called tokenization, which is performed by the so-called lexer. The parser then composes a hierarchical structure that accurately represents the operations and their execution flow. E.g., the Python code
... = .../... if ... else ...
is translated to the tree structure:
...
Assign(
    targets=...
    value=IfExp(
        test=...
        body=BinOp(
            left=...
            op=Div(),
            right=...
        orelse=...
...
In other words, the division operation Div needs to be completed first so that the conditional expression IfExp can be completed, which then allows the assignment operation Assign to complete. As the language grows, the logic involved in parsing the program becomes more sophisticated. For more details, see the ANTLR video.
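The two stages described above can be observed separately with the standard library. The following is a minimal sketch: the variable names source, tokens and names are chosen here for illustration only. It lists the tokens produced for the same one-line program, then reads from the AST whether each variable name is stored or loaded.

```python
import ast
import io
import tokenize

source = 'q = a/b if b else "undefined"'

# Lexing: split the source into (token type, text) pairs.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.string  # skip the empty end-of-line and end-of-file tokens
]
print(tokens)

# Parsing: each Name node carries a context (Store or Load).
tree = ast.parse(source)
names = {
    node.id: type(node.ctx).__name__
    for node in ast.walk(tree)
    if isinstance(node, ast.Name)
}
print(names)
```

The context recorded for q should be Store, while a and b should be Load, matching the roles described earlier.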
Parsing Boolean Values¶
The following is a very (very) simple parser that can understand yes/no strings in user input and convert them to boolean values.
@interact(x=Text('yes'))
def parse(x):
    s = x.lower()
    match s:
        case "yes" | "y":
            return True
        case "no" | "n":
            return False
        case _:
            # Not a yes/no string: return the input unchanged.
            return x
The matching is case-insensitive:
parse("yes"), parse("n"), parse("Y"), parse("No")
Instead of using the if statement, it is more convenient to use the match statement introduced in Python 3.10. The following flowchart shows roughly how the statement is executed:
%%flowchart
st=>start: Start
suite0=>operation: s = x.lower()
cond1=>condition: s == "yes" or s == "y"
cond2=>condition: s == "no" or s == "n"
suite1=>inputoutput: return True
suite2=>inputoutput: return False
suite3=>inputoutput: return x
e=>end
st(right)->suite0(right)->cond1
cond1(yes)->suite1->e
cond1(no)->cond2
cond2(yes)->suite2->e
cond2(no)->suite3->e
- Initially, s = x.lower() obtains the string in lowercase.
- In the first case of the match statement, "yes" or "y" is converted to the boolean value True.
- In the second case of the match statement, "no" or "n" is converted to the boolean value False.
- In the last case, the wildcard _ matches anything else, and the input x is returned unchanged.
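For comparison, the same steps can be written with an if statement. This is only a sketch: parse_with_if is a hypothetical name introduced here, not part of the notebook.

```python
def parse_with_if(x):
    """Hypothetical if/elif equivalent of the match-based yes/no parser."""
    s = x.lower()
    if s == "yes" or s == "y":
        return True
    elif s == "no" or s == "n":
        return False
    else:
        # Anything else is returned unchanged.
        return x


print(parse_with_if("Yes"), parse_with_if("N"), parse_with_if("maybe"))
```

Each case of the match statement corresponds to one branch here, but the match version avoids repeating s == in every alternative.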
@interact(x=Text('T'))
def parse(x):
    # YOUR CODE HERE
    raise NotImplementedError
# tests
assert parse("TRUE") is True
assert parse("Y") is True
assert parse("t") is True
assert parse("False") is False
assert parse("F") is False
assert parse("n") is False
assert parse("TrUE") is True
assert parse("No") is False
Parsing Numbers¶
It is desirable to parse numbers as well. There is indeed a way to check whether a string consists only of digits:
str.isdigit("1302"), "CS1302".isdigit()
Unfortunately, the method fails to recognize negative integers:
"-12".isdigit()
The following function resolves the issue using the try statement. It even works when x is of type int.
def isint(x):
    """
    Returns True if x can be converted to an integer, and False otherwise.
    """
    try:
        int(x)
    except ValueError:
        return False
    return True
isint("CS1302"), isint("1302"), isint("-1302"), isint(-1302)
How does it work? isint(x) would describe its implementation as follows:
I try to convert x to int and return True except when ValueError is raised, in which case I return False. [1]
This is illustrated by the following flowchart.
%%flowchart
st=>start: Start
cond1=>condition: ValueError
suite1=>operation: int(x)
suite2=>inputoutput: return True
suite3=>inputoutput: return False
e=>end
st(right)->suite1(right)->cond1
cond1(yes)->suite3->e
cond1(no)->suite2->e
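To watch the two paths of the flowchart, here is a sketch of a variant that reports which branch runs. The name isint_traced and the printed messages are assumptions made for illustration.

```python
def isint_traced(x):
    """Hypothetical variant of isint that reports which branch runs."""
    try:
        int(x)
    except ValueError as err:
        # Reached only when int(x) raises ValueError.
        print(f"except branch for {x!r}: {err}")
        return False
    # Reached only when int(x) completed without raising.
    print(f"no exception for {x!r}")
    return True


results = [isint_traced(s) for s in ("1302", "-1302", "CS1302")]
print(results)
```

Only ValueError is handled. Other exceptions, such as the TypeError raised by int(None), would still propagate to the caller.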
@interact(x=Text("-13+0.2j"))
def parse(x):
    # YOUR CODE HERE
    raise NotImplementedError
# tests
assert (_:=parse("1302")) == 1302 and isinstance(_, int)
assert (_:=parse("-13.02")) == -13.02 and isinstance(_, float)
assert (_:=parse("-13+0.2j")) == -13+0.2j and isinstance(_, complex)
assert parse("yes") is True
assert parse("N") is False
assert (_:=parse("-1302")) == -1302 and isinstance(_, int)
assert (_:=parse("inf")) == float("inf") and isinstance(_, float)
Data Validation¶
If you have implemented your parser correctly in the last section, the interact function of ipywidgets allows you to play with the function interactively:
@interact(a=Text("3"), b=Text("4"))
def length_of_hypotenuse(a, b):
    a, b = parse(a), parse(b)
    c = (a**2 + b**2) ** (0.5)
    return c
Without the parser, i.e., with the line a, b = parse(a), parse(b) removed, the above code will fail because a and b are passed to length_of_hypotenuse as string values, not numbers, and exponentiation such as a ** 2 is not implemented for string values. With the parser, however, you can even call the function with integer arguments:
length_of_hypotenuse(3, 4)
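The failure without the parser can be reproduced outside the widget. The sketch below applies exponentiation to a string directly and captures the resulting error; the variable name message is chosen here for illustration.

```python
# Exponentiation is not defined between a string and an integer.
try:
    "3" ** 2
except TypeError as err:
    message = str(err)

print(message)
```

This is the kind of error a text widget value would trigger if it were passed to the computation without being parsed first.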
How does interact work?
The @interact line is a decorator you will learn later in the course. It automatically creates a user interface with two text inputs, a and b, and continuously passes their updated values as arguments to length_of_hypotenuse.
Assertion¶
Interestingly, the function does not fail even if the input arguments are negative numbers.
a, b = -3, 4
length_of_hypotenuse(a, b)
Instead of allowing the input arguments to be any values of any type, it is often better to validate the arguments and raise an error if the values or types are unexpected. We can achieve this using the assert statement:
%%optlite -h 400
def length_of_hypotenuse(a, b):
    assert a >= 0 and b >= 0
    c = (a**2 + b**2) ** (0.5)
    return c

length_of_hypotenuse(-3, 4)
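An assert statement may also carry a message that is attached to the AssertionError, which makes a failed validation easier to diagnose. The following is a sketch: the function name checked_hypotenuse and the message text are assumptions introduced here.

```python
def checked_hypotenuse(a, b):
    """Hypothetical variant with a descriptive assertion message."""
    assert a >= 0 and b >= 0, "edge lengths must be non-negative"
    return (a**2 + b**2) ** 0.5


try:
    checked_hypotenuse(-3, 4)
except AssertionError as err:
    error_message = str(err)

print(error_message)
print(checked_hypotenuse(3, 4))
```

Catching AssertionError as done here is mainly useful for demonstration and testing. In normal use the error is left to propagate so that the invalid call is noticed.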
Validation is the process of checking whether a desired condition holds before further processing to avoid costly mistakes. Indeed, we have been using assert statements for validation: For instance, you may validate the notebook before submission to lower the chance of careless mistakes. After the submission, there are also hidden tests to check that the submitted programs are not engineered to work only on the visible test cases.
Our function is still imperfect. For instance, it allows an edge length to be infinite. Note that if the input argument is too large, the exponentiation operation will raise an OverflowError:
%%optlite -h 500
def length_of_hypotenuse(a, b):
    assert a >= 0 and b >= 0
    c = (a**2 + b**2) ** (0.5)
    return c

c = length_of_hypotenuse(3e300, 4)
However, sometimes, no error is raised even if the input is too large:
def length_of_hypotenuse(a, b):
    assert a >= 0 and b >= 0
    c = (a**2 + b**2) ** (0.5)
    return c

c = length_of_hypotenuse(3e400, 4)
c = length_of_hypotenuse(3, 4e400)
c = length_of_hypotenuse(3e400, 4e400)
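The difference between the two situations can be examined directly. The sketch below uses only built-in floats and the math module; the variable names are chosen here for illustration. It suggests that a literal such as 3e400 is already infinite before any arithmetic happens, whereas 3e300 is finite and only its square is out of range.

```python
import math

# A literal beyond the float range is read as infinity.
big = 3e400
print(big, math.isinf(big))

# Arithmetic on infinity propagates silently: no exception is raised.
c = (big**2 + 4**2) ** 0.5
print(c)

# A finite float whose square is out of range raises OverflowError with **.
x = 3e300
try:
    x**2
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)

# math.isfinite is one way to tell ordinary numbers from inf (and nan).
print(math.isfinite(x), math.isfinite(big))
```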
def length_of_hypotenuse(a, b):
    # YOUR CODE HERE
    raise NotImplementedError
    return c
# tests
def test_AE(a, b):
    try:
        c = length_of_hypotenuse(a, b)
        return max(a, b, c) < float("inf")
    except AssertionError:
        return True

assert length_of_hypotenuse(3, 4) == 5
assert test_AE(3, 4)
assert test_AE(3e300, 4)
assert test_AE(3e400, 4)
assert test_AE(3, 4e400)
assert test_AE(3e400, 4e400)
Type Hinting and Validation¶
NonNegative = Annotated[float, Field(ge=0)]
@interact(a=Text("3"), b=Text("4"))
@validate_call(validate_return=True)
def length_of_hypotenuse(a: NonNegative, b: NonNegative) -> NonNegative:
    """
    Return the length of hypotenuse.
    """
    c = (a * a + b * b) ** (0.5)
    return c
Notice that the string inputs a and b are automatically converted to float.
length_of_hypotenuse("3", 4)
However, the above code does not raise any ValidationError if the input a or b, or the output length of the hypotenuse, overflows to float("inf"):
length_of_hypotenuse(3e400, 4)
# YOUR CODE HERE
raise NotImplementedError
@interact(a=Text("3"), b=Text("4"))
@validate_call(validate_return=True)
def length_of_hypotenuse(
    a: NonNegativeFinite, b: NonNegativeFinite
) -> NonNegativeFinite:
    """
    Return the length of hypotenuse.
    """
    c = (a * a + b * b) ** (0.5)
    return c
# test
def test_VE(a, b):
    try:
        c = length_of_hypotenuse(a, b)
        return max(a, b, c) < float("inf")
    except ValidationError:
        return True

assert test_VE(3, 4)
assert test_VE(3e300, 4)
assert test_VE(3e400, 4)
assert test_VE(3, 4e400)
assert test_VE(3e400, 4e400)
[1] Why use first-person narration? Just to avoid errors like tries (a syntax error in Python), or trys (a grammatical mistake).