Data Files - CS5483

# initialization
import os

if not os.getenv(
    "NBGRADER_EXECUTION"
):  # Skip the code or auto-grading may take too long to complete.
    %load_ext jupyter_ai
    # Set LLM alias
    %ai update chatgpt dive:chat

Load data into Weka Explorer Interface

How to start Weka in JupyterHub?

Open the Launcher (File->New Launcher)
Start a Desktop from the Launcher.
Start a Terminal from the menu on the top left.
Run the command weka and click the Explorer button.
Load data from the folder /data/ under the Linux root directory.
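
Before switching to the desktop, it may help to see which data files are available from the notebook itself. The following is a minimal sketch, assuming the files of interest in /data/ carry the .arff extension (the steps above do not say so explicitly); `list_arff_files` is a hypothetical helper name, not part of the course material.

```python
from pathlib import Path


def list_arff_files(folder="/data"):
    """Return the sorted names of .arff files directly inside `folder`."""
    root = Path(folder)
    if not root.is_dir():  # e.g. when run outside the course JupyterHub
        return []
    return sorted(p.name for p in root.glob("*.arff"))


# Assumption: /data is the folder mentioned in the steps above.
print(list_arff_files())
```

The function returns an empty list when the folder is missing, so the cell can be run safely on any machine.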

%%ai chatgpt
How is Weka implemented and what is its main advantage over other data mining 
tools?

Use Weka to do [Witten11] Exercises 17.1.1 and 17.1.2.

YOUR ANSWER HERE

Create an ARFF file

Exercise 3

Create an ARFF file named AND.arff in the current directory for the AND gate $Y=X_1\cdot X_2$. Use 0 and 1 to represent False and True respectively.

Tip

To generate the file from this notebook directly:

Copy the solution template below to the following solution cell.
Replace each underscore _ by an appropriate value.
Execute the code cell.

content = '''
@RELATION AND
@ATTRIBUTE X1 {0, 1}
@ATTRIBUTE X2 {_, _}
@ATTRIBUTE Y {_, _}
@DATA
_, _, _
_, _, _
_, _, _
_, _, _
'''

Alternatively, you may use other editors in JupyterHub to create AND.arff in the current folder.
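
To see how the pieces of the template fit together, here is an illustrative sketch for a different gate, the NOT gate $Y=\overline{X}$, which has one input attribute and two data rows. The name `not_content` and the relation name NOT are chosen only for this illustration.

```python
# Illustrative ARFF text for a NOT gate (a different gate from the exercise).
not_content = """\
@RELATION NOT
@ATTRIBUTE X {0, 1}
@ATTRIBUTE Y {0, 1}
@DATA
0, 1
1, 0
"""

# Everything after the @DATA line is one comma-separated row per example.
header, _, data = not_content.partition("@DATA\n")
rows = [tuple(int(v) for v in line.split(",")) for line in data.splitlines()]
print(rows)  # [(0, 1), (1, 0)]
```

Each `@ATTRIBUTE` line declares a nominal attribute with its allowed values in braces, and each data row lists one value per attribute in the order of declaration.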

# YOUR CODE HERE
raise NotImplementedError

# Write `content` to the file AND.arff.
try:
    content
except NameError:
    print("AND.arff not generated because `content` is undefined.")
else:
    filename = "AND.arff"
    with open(filename, "w") as f:
        f.write(content)
    print("AND.arff generated.")

Run the following test cell to check whether your file is a valid ARFF file. You may also download the file and load it into Weka to check for syntax errors.

# test
print('Content of AND.arff:')
with open(filename) as f:
    print(f.read())

from scipy.io import arff
import pandas as pd

d = arff.loadarff(filename)
df = pd.DataFrame(d[0]).astype(int)
df.head()
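
If the test cell above fails, a rough line-by-line check may help locate the problem. The sketch below is not a full ARFF parser: it assumes every attribute is nominal (declared with values in braces, as in the template) and only verifies that each data row has one allowed value per attribute. `check_nominal_arff` is a hypothetical helper, demonstrated here on a NOT gate rather than on the exercise file.

```python
def check_nominal_arff(text):
    """Return a list of problems found in ARFF text with nominal attributes only."""
    attributes = []  # list of (name, allowed values)
    problems = []
    in_data = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("%"):  # blank line or ARFF comment
            continue
        upper = line.upper()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            if len(values) != len(attributes):
                problems.append(
                    f"line {number}: expected {len(attributes)} values, got {len(values)}"
                )
                continue
            for (name, allowed), value in zip(attributes, values):
                if value not in allowed:
                    problems.append(f"line {number}: {value!r} not allowed for {name}")
        elif upper.startswith("@ATTRIBUTE"):
            head, _, rest = line.partition("{")
            name = head.split()[1]
            allowed = [v.strip() for v in rest.rstrip("}").split(",")]
            attributes.append((name, allowed))
        elif upper.startswith("@DATA"):
            in_data = True
    if not in_data:
        problems.append("no @DATA section found")
    return problems


# Demonstration on a NOT gate, with one deliberately invalid extra row.
good = "@RELATION NOT\n@ATTRIBUTE X {0, 1}\n@ATTRIBUTE Y {0, 1}\n@DATA\n0, 1\n1, 0\n"
print(check_nominal_arff(good))  # []
print(check_nominal_arff(good + "2, 0\n"))
```

To apply it to your own file, pass the text read from AND.arff, for example `check_nominal_arff(open("AND.arff").read())`.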

# Hidden tests
# Its content is intended to be invisible. Do NOT remove the cell.

%%ai chatgpt
How does ARFF compare to CSV in Weka, and why is one format better or
more popular than the other?