Evaluation using Weka

import os

if not os.getenv(
    "NBGRADER_EXECUTION"
):
    %load_ext jupyter_ai
    %ai update chatgpt dive:chat
    # %ai update chatgpt dive-azure:gpt4o

In this notebook, you will learn to use Weka to complete [Witten11] Exercises 17.1.3 to 17.1.10. You may refer to Chapter 10 for a more detailed introduction to Weka. The following tip should be useful for providing your answers in this tutorial notebook and completing projects later on in the course.

How to add diagrams to notebooks?

To include figures like Figure 1 in your solution, refer to the MyST guide. While you can copy and paste images into a markdown cell, they will be difficult to edit.

For annotating and combining screenshots, use the pre-installed VSCode Draw.io extension. This tool allows you to copy and paste multiple images into a single file and annotate them using SVG elements, as in dataset_editor.dio.svg. To create such a file:

Launch the VSCode interface. (See Learning Materials)
Open the tutorial folder in VSCode.
Create a file with the .dio.svg or .drawio.svg extension.
Double-click the file to open it with the vscode-drawio extension.

Dataset Editor

After loading the data in the preprocess panel, we can inspect or change the data using the dataset editor shown in Figure 1.

YOUR ANSWER HERE

Applying a Filter

We can also modify the data using filters. After selecting a filter,
- left-click the filter to change its configuration, or
- right-click the filter configuration in Weka to copy the configuration to the clipboard, as shown in Figure 2.

Exercise 4 (Ex 17.1.6-7)

Attention

Give the configuration of the filter of interest. E.g., the following is the default configuration for the RemoveWithValues filter:

weka.filters.unsupervised.instance.RemoveWithValues -S 0.0 -C last -L first-last
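A configuration string like the one above consists of the filter's class name followed by its option flags. As a minimal sketch for reading such a string, the following Python snippet splits it into the two parts. The helper names `parse_filter_config` and `_is_flag` are hypothetical and not part of Weka, and the rule used to tell flags from values (a dash followed by a letter) is an assumption that may not hold for every filter.

```python
import shlex


def _is_flag(token: str) -> bool:
    """Assumed rule: a flag is a dash followed by a letter (so -1.0 is a value)."""
    return len(token) > 1 and token[0] == "-" and token[1].isalpha()


def parse_filter_config(config: str):
    """Split a Weka-style configuration string into (class_name, options)."""
    tokens = shlex.split(config)  # respects quoted arguments
    class_name, args = tokens[0], tokens[1:]
    options = {}
    i = 0
    while i < len(args):
        flag = args[i]
        # A flag followed by a non-flag token takes that token as its value;
        # otherwise it is recorded as a boolean switch.
        if i + 1 < len(args) and not _is_flag(args[i + 1]):
            options[flag] = args[i + 1]
            i += 2
        else:
            options[flag] = True
            i += 1
    return class_name, options


config = (
    "weka.filters.unsupervised.instance.RemoveWithValues "
    "-S 0.0 -C last -L first-last"
)
class_name, options = parse_filter_config(config)
print(class_name)  # weka.filters.unsupervised.instance.RemoveWithValues
print(options)     # {'-S': '0.0', '-C': 'last', '-L': 'first-last'}
```

Comparing the parsed options before and after changing a setting in the Weka dialog is one way to see which flag a given setting maps to.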

YOUR ANSWER HERE

Classify Panel

To train a classifier, use the classify panel shown in Figure 3 to select a classification algorithm and start the training.

The default test option is 10-fold cross-validation, but we can choose to
- use the training set for testing,
- supply a separate dataset for testing, or
- use only a specified percentage of the original data for training and hold out the remaining data for testing.

After training, we can right-click the result in the result list to visualize the classifier errors.
For a decision tree classifier, we can sometimes visualize the tree in addition to its text representation in the Classifier output.
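To make the difference between the test options concrete, the following pure-Python sketch shows how cross-validation and a percentage split partition instance indices into training and test sets. It is an illustration only, not Weka's implementation: the function names, the seed, and the 66% figure are assumptions chosen for the example, and Weka's own shuffling (and any stratification it applies) will produce different partitions.

```python
import random


def k_fold_splits(n_instances, k=10, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def percentage_split(n_instances, train_percent=66.0, seed=1):
    """Return (train, test) index lists for a single holdout split."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    n_train = round(n_instances * train_percent / 100)
    return indices[:n_train], indices[n_train:]


# Each instance is tested exactly once across the 10 folds.
for train, test in k_fold_splits(150, k=10):
    pass
print(len(train), len(test))  # 135 15

# With a percentage split, only the held-out part is ever tested.
train, test = percentage_split(150, train_percent=66.0)
print(len(train), len(test))  # 99 51
```

In cross-validation every instance serves as test data exactly once, whereas a percentage split evaluates on a single held-out subset, so its estimate depends more heavily on which instances happen to be held out.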

YOUR ANSWER HERE

%%ai chatgpt -f text
Regardless of the test options, Weka runs the learning algorithm on the full
dataset to obtain the model to deploy. Wouldn't this cause overfitting?

Caution

LLMs can hallucinate, especially on questions that require critical thinking. You should verify their answers with rigorous proofs or reasoning. Try modifying the prompt to make the LLM go through the reasoning step by step. You may also need to clear the chat history so that earlier prompts do not bias the LLM's response:

%ai reset