Abstract¶
In this notebook, readers will see how different expressions in a computer program are evaluated to values of different types. With variables, programmers can assign a meaningful name, known as an identifier, to a value of any type without having to worry about where the value is stored in, or moved around, the physical storage unit during computation. At this point, readers may regard identifiers simply as variables that can take different values, although this model will be refined later with the concepts of aliasing and mutation.
from __init__ import install_dependencies
await install_dependencies()
import sys # to access system-specific parameters
from dis import dis # for disassembly of bytecode
from ipywidgets import interact # for interactive user interface controls
# use OPTLite to visualize code execution
%load_ext divewidgets
# Set LLM alias
%load_ext jupyter_ai
%ai update chatgpt dive:chat
Values¶
Programming is the art of instructing a computer to perform tasks by manipulating data, which is represented as values. In mathematical terms, a computer (programming language) is essentially a set of rules for how values are stored and manipulated[1]. In other words, values are the building blocks of computation, and they represent a specific piece of data that can be manipulated in various ways.
Integers¶
While a machine language only uses binary numbers as values, a high-level programming language provides more flexible and expressive ways to represent and manipulate values.
In Python, for instance, we can enter an integer in different number systems as follows.
# in decimal (base 10)
15
# in 'b'inary (base 2)
0b1111
# in 'o'ctal (base 8)
0o17
# in he'x'adecimal (base 16)
0xF
All the above expressions are integer literals, namely, integers written out literally. They have the same numerical value, which gets printed in decimal by default.
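As a quick check of the claim above, the following sketch compares the four literals and converts between bases using the built-in functions bin, oct, hex, and int. The variable names are chosen here purely for illustration.

```python
# The same integer written in four different number systems.
decimal, binary, octal, hexadecimal = 15, 0b1111, 0o17, 0xF

# All four literals evaluate to the same integer value.
same_value = decimal == binary == octal == hexadecimal
print(same_value)

# Built-in functions convert an integer to a string in another base...
print(bin(15), oct(15), hex(15))

# ...and int() parses a string of digits written in a given base.
print(int("1111", 2), int("17", 8), int("F", 16))
```

Note that bin, oct, and hex return strings, whereas the literals themselves are integers.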
Caution
Later in the course, you will learn that almost everything in Python is an object but different objects can have the same value. Indeed, equality in value is a comparison operation that can be defined by the programmer.
There are also expressions that have integer values but are not integer literals:
4 + 5 + 6
pow(2, 4) - 1
### BEGIN SOLUTION
# There is (in principle) no limit on how big an integer can be
10 ** sys.get_int_max_str_digits() # this is okay as long as it is not printed!
10 ** sys.get_int_max_str_digits() - 1
### END SOLUTION
# SPOILER: Printing an integer actually involves converting it to a string!
How to represent all integers?
Wouldn’t it be nice if there were no limit on how many integers a computer can represent? Since $n$ bits can represent at most $2^n$ integers, do we need infinitely many bits to represent an unbounded number of integers? It would then take forever to input an integer into the computer, let alone store it in finite memory! How about using a variable-length code? See the Kraft-McMillan inequality. An example of a variable-length code is the UTF-8 encoding for Unicode.
%%ai chatgpt -f text
Use a simple analogy to explain in a short paragraph how a computer can represent all integers using a variable-length code.
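To make the idea of a variable-length code concrete, here is a minimal sketch of one such scheme: each byte carries 7 bits of the number, and the highest bit flags whether more bytes follow. The function names encode_varint and decode_varint are hypothetical, and this is only an illustration in the spirit of UTF-8, not how Python stores integers internally.

```python
def encode_varint(number):
    """Encode a non-negative integer using 7 data bits per byte.

    The highest bit of each byte is a flag: 1 means more bytes follow.
    """
    if number < 0:
        raise ValueError("this sketch only handles non-negative integers")
    encoded = bytearray()
    while True:
        low_bits = number & 0x7F  # take the lowest 7 bits
        number >>= 7
        if number:
            encoded.append(low_bits | 0x80)  # set the continuation flag
        else:
            encoded.append(low_bits)  # last byte: the flag stays 0
            return bytes(encoded)


def decode_varint(data):
    """Decode bytes produced by encode_varint back into an integer."""
    number = 0
    for position, byte in enumerate(data):
        number |= (byte & 0x7F) << (7 * position)
    return number


# Small integers take one byte; larger ones take as many bytes as needed.
for value in (0, 127, 128, 2**70):
    code = encode_varint(value)
    print(value, "->", len(code), "byte(s)")
    assert decode_varint(code) == value
```

No fixed number of bits is reserved in advance, so any integer that fits in memory can be encoded.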
Strings¶
A string value is a sequence of characters that can be written literally using quotes:
# single quote
print('\U0001f600:\n\tI\'m a string.')
# double quote
print("\N{grinning face}:\n\tI'm a string.")
# triple double quote
print(
"""😀:
\tI'm a string."""
)
Note that all the literals represent the same value:
Escape sequence
- \U0001f600 and \N{grinning face} are escape sequences representing the same grinning face emoji 😀, where 0001f600 is the Unicode code point in hexadecimal, grinning face is the name, and \ is called the escape symbol.
Control code
- \n and \t are control codes that do not represent any symbol.
- \n creates a new line when printing the string.
- \t creates a tab to indent the line.
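The statements about escape sequences can be checked with a short sketch like the one below. It uses the standard unicodedata module to look up a character's name; the variable names are illustrative only.

```python
import unicodedata

by_code_point = "\U0001f600"   # escape by hexadecimal code point
by_name = "\N{grinning face}"  # escape by Unicode character name
literal = "😀"                 # the character typed directly

# All three spellings give the same one-character string.
print(by_code_point == by_name == literal)
print(len(literal), hex(ord(literal)), unicodedata.name(literal))

# \n and \t are single characters, although each is written with two symbols.
print(len("\n"), len("\t"), repr("a\n\tb"))
```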
Benefits of allowing single, double, and triple quotes for string literals
- By using double quotes, we don’t need to escape the single quote in strings such as “I’m”.
- Triple quotes enable a multi-line string literal to include the newline character directly, resulting in a more readable representation of the literal.
%%ai chatgpt -f text
Explain in one line why a string value is often quoted in a computer program?
%%ai chatgpt -f text
Are there programming languages that do not quote their strings? Why?
The following is yet another way to print the same string:
print("\N{grinning face}:", "\tI'm a string.", sep="\n")
It is an elegant one-line code (one-liner) where
- sep="\n" is a keyword argument that specifies the separator of the list of strings.
- The default separator is a single space character, i.e., sep=" ".
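As a small illustration of the sep keyword argument, the sketch below prints the same items with different separators. The name expected_output is introduced here only to spell out what the first print call writes.

```python
words = ["\N{grinning face}:", "\tI'm a string."]

# print(*words, sep="\n") writes the items joined by the separator,
# followed by a newline (the default value of the keyword argument end).
expected_output = "\n".join(words) + "\n"

print(*words, sep="\n")
print("a", "b", "c")          # default separator: a single space
print("a", "b", "c", sep="")  # empty separator: items are glued together
```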
In a notebook, we can get the docstring (documentation string) of a function conveniently using the symbol ?, such as ?print or print?.
We can also use the contextual help by placing the cursor over a function name and
- clicking the menu item Help > Show Contextual Help, or
- pressing the shortcut key Shift + Tab.
### BEGIN SOLUTION
# install art module
%pip install art >/dev/null 2>&1
import art
yyyy, mm = "2024", "09"
art.tprint(
f"""{yyyy}{mm}CS1302
Intro to Comp Progm'g"""
)
### END SOLUTION
Test your message below. Try switching to a more powerful LLM such as dive-azure:gpt4o if you have already installed your API key.
%%ai chatgpt -f text
Explain what you see in the following:
(ง •̀_•́)ง
╰(●’◡’●)╮
(..•˘_˘•..)
(づ ̄ 3 ̄)づ
User Input¶
Instead of entering a value in a program, a programmer can get user input values at runtime, i.e., when a program executes:
print("Your name is", input("Please input your name: ") + ".")
- The input function prints its argument, if any, as a prompt.
- The function takes the user’s input and returns it as a string.
- There is no need to delimit the input string by quotation marks. Simply press enter after typing a string.
print("My name is", print("Python"))
Solution to Exercise 3
- Unlike input, the function print does not return the string it is trying to print. Printing a string is, therefore, different from returning a string.
- print actually returns a None object that gets printed as None.
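The difference between printing and returning can be seen by storing the result of a print call. In this sketch, returned is an illustrative variable name.

```python
returned = print("Python")  # prints Python as a side effect
print(returned)             # prints None
print(returned is None)     # print returns the None object
```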
%%ai chatgpt -f text
Explain in one line whether the Python print function returns the value it prints.
Variables¶
A complicated computation often needs to be broken down into many basic computations, with intermediate values stored and transferred to different memory locations. Keeping track of where a value is written, and allocating free memory locations to write to, are not only burdensome but also error-prone.
This is where the concept of variables comes in—they serve as a logical (as opposed to physical) unit of storage that abstracts away the complexities of memory management.
Assignment¶
To define a variable, we can use the assignment operator = as follows:
x = 15
What does the above code mean?
- $x$ is equal to 15?
- $x$ is defined to be 15?
Let’s discover the truth by executing the following assignments step-by-step using OPTLite:
%%optlite -h 300
x = 15
x = x + 1
In the second assignment, does it mean:
- $x$ is equal to $x+1$?
- $x$ is defined to be $x+1$?
If we say yes to the above questions, the value of $x$ should be $\infty$. (Why?)
%%ai chatgpt -f math
What is the solution to $x+1=x$?
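The following sketch contrasts testing equality with performing an assignment. The name is_equal is illustrative; the point is that an assignment first evaluates the expression on the right and then binds the name on the left to the result.

```python
x = 15
is_equal = (x == x + 1)  # equality test: never true for an integer x
x = x + 1                # assignment: evaluate x + 1 first, then rebind x
print(is_equal, x)
```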
To see how the Python interpreter carries out the assignment operation, the following code compiles the Python code to bytecode (similar to machine code) and displays it in assembly language.[2]
dis(compile('x = x + 1', '_', 'exec'))
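The disassembly can also be inspected programmatically. The sketch below uses dis.Bytecode to list the operation names and checks that the old value of x is loaded before the new value is stored. The exact operation names vary across Python versions, so only the relative order of the load and the store is checked.

```python
from dis import Bytecode

code = compile("x = x + 1", "<example>", "exec")
operations = [instruction.opname for instruction in Bytecode(code)]
print(operations)

# The old value of x is loaded before the new value is stored.
loads_before_store = operations.index("LOAD_NAME") < operations.index("STORE_NAME")
print(loads_before_store)
```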
It is possible to assign different values to multiple variables in one line using the so-called tuple assignment syntax:
%%optlite -l -h 400
x, y, z = "15", "30", 15
One can also assign the same value to different variables in one line using a chained assignment:
%%optlite -l -h 400
x = y = z = 0
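As a small illustration of both forms, a tuple assignment can swap two variables without a temporary one, and variables defined by a chained assignment can later be reassigned independently. The values below are arbitrary.

```python
x, y = "left", "right"
x, y = y, x    # the right-hand side is evaluated first, then unpacked
print(x, y)

a = b = c = 0  # chained assignment
a = a + 1      # reassigning a does not affect b or c
print(a, b, c)
```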
Once defined, a variable can be deleted using the del keyword. Accessing a variable that has not been assigned any value raises an error.
%%optlite -h 350
x = y = 1+1j
del x
x
Caution
You will learn later in the course that deleting a variable does not necessarily delete its value.
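A minimal sketch of this point: after del x, the name x is gone, but the value remains reachable through y. The try statement used here to catch the error is only for illustration.

```python
x = y = 1 + 1j
del x                     # removes the name x only

try:
    x
except NameError as error:
    message = str(error)  # describes the missing name
    print("NameError:", message)

print(y)                  # the value is still reachable through y
```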
Are there constants in Python?
The concepts of variables and values in programming are analogous to those in mathematics. Is there a programming counterpart for mathematical constants such as π? Do you agree with pydoc on the list of constants?
%%ai chatgpt -f text
How would you define constants in Python? Are literals also constants?
Identifiers¶
One reason why Python is expressive is that it affords programmers a significant amount of flexibility when it comes to choosing variable names. For instance, identifiers, such as variable names, are case-sensitive and of unlimited length, unlike older languages such as Pascal and Fortran. This flexibility also makes the program more readable. For instance, consider the following program:
%%optlite -h 400
def name():
    return first.name + last.name
first.name = "John"
last.name = "Smith"
print(name())
Unfortunately, the program fails in the middle, why? Let’s take a closer look at the operations involved:
dis(compile("first.name + last.name", "_", "eval"))
How about the following fix?
dis(compile("first-name + last-name", "_", "eval"))
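Another way to see how the interpreter reads these names is to parse them with the standard ast module. In this sketch, which is offered as a complement to the disassembly rather than a replacement, first.name is parsed as an attribute lookup and first-name as a subtraction.

```python
import ast

attribute = ast.parse("first.name", mode="eval").body
subtraction = ast.parse("first-name", mode="eval").body

print(type(attribute).__name__)       # attribute name looked up on first
print(type(subtraction).__name__)     # binary operation on first and name
print(type(subtraction.op).__name__)  # the operator is a subtraction
```

Neither expression is treated as a single variable name.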
Obviously, not all names are valid identifiers as some names may be misinterpreted by the Python interpreter. Instead of fixing the names by trial-and-error, let’s try to learn the exact syntax for identifiers:
identifier ::= xid_start xid_continue*
...
which is specified in a notation called the Extended Backus-Naur Form (EBNF). Furthermore, some identifiers called keywords are reserved. There are also soft keywords that are reserved under specific contexts.[3]
%%ai chatgpt -f text
Explain in one paragraph the notation of the following in pydoc:
identifier ::= xid_start xid_continue*
@interact
def identifier_syntax(
    assignment=[
        "a-number = 15",
        "a_number = 15",
        "15 = 15",
        "_15 = 15",
        "del = 15",
        "Del = 15",
        "type = print",
        "print = type",
        "input = print",
    ]
):
    exec(assignment)
    print("Ok.")
Solution to Exercise 4
- a-number = 15 violates Rule 1 because - is not allowed; - is interpreted as an operator.
- 15 = 15 violates Rule 1 because 15 starts with a digit instead of a letter or _.
- del = 15 violates Rule 2 because del is a keyword.
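The same checks can be done without executing any assignment, using the string method isidentifier and the standard keyword module. The helper is_valid_variable_name is a hypothetical name introduced for this sketch.

```python
import keyword


def is_valid_variable_name(name):
    """Return True if name is an identifier that is not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


for name in ["a-number", "a_number", "15", "_15", "del", "Del", "type"]:
    print(f"{name!r:12} {is_valid_variable_name(name)}")
```

Note that type passes the check because it is a built-in name rather than a keyword, so assigning to it is allowed, although it hides the built-in function.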
Python Enhancement Proposals
To help make the code more readable, programmers follow additional style guides such as Python Enhancement Proposal (PEP) 8:
- Function names should be lowercase, with words separated by underscores as necessary to improve readability.
- Variable names follow the same convention as function names.
Type Conversion¶
The following program tries to compute the sum of two numbers from user inputs:
num1 = input("Please input an integer: ")
num2 = input("Please input another integer: ")
print(num1, "+", num2, "is equal to", num1 + num2)
Solution to Exercise 5
The two numbers are concatenated instead of added together.
input returns user input as a string. E.g., if the user enters 12, the input is
- not treated as the integer twelve, but rather
- treated as a string containing two characters, one followed by two.
To confirm this, we can use type to return the data type of an expression.
num1 = input("Please input an integer: ")
print("Your input is", num1, "with type", type(num1))
type(15), type(print), type(print()), type(input), type(type), type(type(type))
What happens when we add strings together?
"4" + "5" + "6"
How to fix the bug then?
We can convert a string to an integer using int.
int("4") + int("5") + int("6")
We can also convert an integer to a string using str.
str(4) + str(5) + str(6)
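Since user input may not always be a valid integer, a conversion can fail. The sketch below wraps int in a hypothetical helper, to_integer, that returns None instead of raising an error. The try statement it relies on is used here ahead of its proper introduction, purely for illustration.

```python
def to_integer(text):
    """Convert text to an int, or return None if text is not an integer."""
    try:
        return int(text)
    except ValueError:
        return None


print(to_integer("12") + to_integer("3"))  # numeric addition, not concatenation
print(to_integer(" 42\n"))                 # surrounding whitespace is ignored
print(to_integer("4.5"), to_integer("twelve"))
```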
num1 = input("Please input an integer: ")
num2 = input("Please input another integer: ")
# print(num1, '+', num2, 'is equal to', num1 + num2) # fix this line below
### BEGIN SOLUTION
print(num1, "+", num2, "is equal to", int(num1) + int(num2))
### END SOLUTION
Error Types¶
In addition to writing code, a programmer spends significant time debugging code that contains errors. A natural question is:
Can an error be automatically detected by the computer?
You have just seen an example of a logical error, which is due to an error in the logic. The ability to debug or even detect such an error is, unfortunately, beyond the Python interpreter’s intelligence.
Let’s see if an LLM can debug a logical error:
%%ai chatgpt -f text
What is wrong with the following code?
--
num1 = input("Please input a number: ")
num2 = input("Please input another number: ")
print(num1, "+", num2, "is equal to", num1 + num2)
Can an LLM detect logical errors?
The response depends heavily on the provided information. For example, suppose the above program intended to:
- Add two complex numbers from user input.
Typically, numbers refer to real numbers. Therefore, specifying the requirement to handle complex numbers is crucial.
Refining questions based on the LLM’s responses can yield more meaningful answers. Essentially, we should learn to ask questions to gain knowledge:
"學問" == "學" "問"
Other kinds of error may be detected automatically by Python. As an example, note that equality == is not equal to assignment =:
%%optlite -l -h 400
print("Assignment:")
"學問" = "學" "問"
As another example, juxtaposition is not the same as addition +:
%%optlite -l -h 400
print("Juxtaposition:")
3 == 2 1
The Python interpreter detects the bug and raises a syntax error.
Why can a syntax error be detected automatically?
Note that the print statement is not executed. Why?
The Python interpreter can detect a syntax error even before executing the code because the interpreter simply fails to translate the code to lower-level executable code.
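This behavior can be reproduced with the built-in compile function, which translates source code without running it. In the sketch below, the source string and the name outcome are illustrative. The syntax error is raised during translation, so the print call inside the source is never executed.

```python
source = 'print("Assignment:")\n"a" = "b"\n'

try:
    compile(source, "<example>", "exec")  # translate only; nothing is executed
    outcome = "compiled"
except SyntaxError as error:
    outcome = f"SyntaxError on line {error.lineno}"

print(outcome)
```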
The following code raises a different kind of error.
%%optlite -l -h 500
print("Add integer to string:")
"4" + "5" + 6 # adding string to integer
Why does Python raise a TypeError when evaluating '4' + '5' + 6?
There is no implementation of the + operation on a value of type str and a value of type int.
- Unlike the syntax error, the Python interpreter can only detect a type error at runtime (when executing the code).
- Hence, such an error is called a runtime error.
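To see that a type error only shows up while the program is running, the sketch below records which steps are reached. The list steps and the variable names are illustrative, and the try statement is used ahead of its proper introduction.

```python
steps = []
text, number = "45", 6

try:
    steps.append("before")  # runs: the code was translated without complaint
    text + number           # fails here, while the program is running
    steps.append("after")   # never reached
except TypeError as error:
    steps.append(f"TypeError: {error}")

print(steps)
```

Unlike the syntax error case, the code before the faulty line is executed.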
Python is a strongly-and-dynamically-typed language:
- Dynamically-typed languages check data types only at runtime, after the code has been translated to lower-level code.
- Strongly-typed languages do not force a type conversion to avoid a type error.
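Both properties can be illustrated in a few lines. In this sketch, the same name is bound to values of different types at different times (dynamic typing), and the programmer must convert types explicitly to choose between addition and concatenation (strong typing).

```python
value = 15
first_type = type(value).__name__   # the type belongs to the value, not the name
value = "15"                        # same name, now bound to a string
second_type = type(value).__name__
print(first_type, second_type)

# No implicit conversion: the programmer chooses the meaning explicitly.
print(int("4") + 6)  # numeric addition
print("4" + str(6))  # string concatenation
```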
To understand what the above actually means, let’s consider how other languages work differently.
C/C++ and Java are statically-typed languages that check data types during compilation, so the type error above becomes a compile-time error instead of a runtime error.
try:
    !gcc 456.cpp
except Exception:
    print('Cannot run shell command.')
try:
    !javac 456.java
except Exception:
    print('Cannot run shell command.')
On the opposite extreme, the web programming language JavaScript does not raise an error at all:
%%javascript
let x = '4' + '5' + 6;
element.append(x + ' ' + typeof(x));
// no error because 6 is converted to a string implicitly
%%javascript
let x = '4' * '5' * 6;
element.append(x + ' ' + typeof(x));
// no error because 4 and 5 are converted to numbers implicitly
JavaScript forces a type conversion to make the code run. It is called a weakly-typed language because of this flexibility.
Why not make Python a more flexible weakly-typed language by automatic type conversion?
While weakly-typed languages may seem more robust, they can potentially lead to more logical errors. JavaScript, despite its popularity, is known for its tricky behavior, as demonstrated in wtfjs. To improve readability and avoid logical errors, it is recommended to use the strongly-typed language TypeScript.[4]
%%javascript
element.append(
("b" + "a" +
+ "a" + "a").toUpperCase()
);
Solution to Exercise 8
See the explanation here.
Maybe an LLM knows why?
%%ai chatgpt -f text
Why does the following javascript return BANANA?
element.append(
("b" + "a" +
+ "a" + "a").toUpperCase()
);
[1] Indeed, a program that defines what values to compute is essentially a value itself. More precisely, one cannot distinguish between a value and a program in the context of a Turing Machine.
[2] The conversion from machine code to assembly code is called disassembly, hence the name dis. By the way, why does an interpreted language like Python have a compiler? The interpreter actually compiles the source code to a more compact form called bytecode, which allows the interpreter to run it faster. Indeed, even though Python was originally intended to be an interpreted language, Python 3.13 adds experimental support for JIT compilation, which compiles code Just In Time to machine code while the program runs. Why? See this post.
[3] match and case are recognized as keywords in the context of a match statement, where _ is treated as a wildcard. This is regardless of whether match, case, or _ are used as variables outside this context.
[4] TypeScript is a superset of JavaScript that adds optional static typing and other features to the language. By enforcing stronger typing, TypeScript can detect potential errors at compile-time and improve the overall reliability of your code. Learn more about TypeScript at typescriptlang.org.