import logging
import os

if not os.getenv("NBGRADER_EXECUTION"):
    %load_ext jupyter_ai
    %ai update chatgpt dive:chat
    # %ai update chatgpt dive-azure:gpt4o
Terminator (franchise logo)

In this notebook, you will compete with your classmates and your machine by

  1. handcrafting a decision tree using Weka UserClassifier, and
  2. using python-weka-wrapper to build the J48 (C4.5) decision tree as a comparison.

Let’s find out who is the most intelligent!

Interactive Decision Tree Construction

import logging
import os

if not os.getenv("NBGRADER_EXECUTION"):
    import weka.core.jvm as jvm
    import weka.core.packages as packages

    jvm.start(packages=True, logging_level=logging.ERROR)
    pkg, version = "userClassifier", "1.0.2"
    if not packages.is_installed(pkg):
        print(f"Installing {pkg}...")
        packages.install_package(pkg, version=version)
        print("Done.")
    else:
        print(f"Skipping {pkg}, already installed.")

Follow the instructions in [Witten11] Ex 17.2.12 to

  1. install the package UserClassifier,
  2. hand-build a decision tree using segment-challenge.arff as the training set, and
  3. test the performance using segment-test.arff as the test set.

YOUR ANSWER HERE

YOUR ANSWER HERE

Get ready to dive into the thrilling world of decision trees! It’s time to showcase your data science prowess and outshine your classmates. Here’s what you need to do:

YOUR ANSWER HERE

YOUR ANSWER HERE

%%ai chatgpt -f text
I am in a competition to hand build the best decision tree using the 
UserClassifier package of Weka. Can you describe in one paragraph how to use
the scatter plots to find pairs of attributes to split? I cannot do detailed
calculations. How to avoid overfitting?

Python Weka Wrapper

To see whether your hand-built classifier can beat the machine, use J48 (C4.5) to build a decision tree. Instead of using the Weka Explorer interface, you will run Weka directly from the notebook using python-weka-wrapper3.

Because Weka is written in Java, we need to start the Java virtual machine (JVM) first.

import weka.core.jvm as jvm
import logging

jvm.start(logging_level=logging.ERROR)

Loading dataset

To load the dataset, create an ArffLoader as follows:

from weka.core.converters import Loader

loader = Loader(classname="weka.core.converters.ArffLoader")

The loader has the method load_url to load data from the web, such as the Weka GitHub repository:

weka_data_path = (
    "https://raw.githubusercontent.com/Waikato/weka-3.8/master/wekadocs/data/"
)
trainset = loader.load_url(
    weka_data_path + "segment-challenge.arff"
)  # use load_file to load from file instead
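
The file loaded above is in Weka's ARFF format: a header with an `@relation` line and one `@attribute` line per column, followed by `@data` and comma-separated rows. As a rough illustration of that layout (not of how Weka's loader works internally), the sketch below parses a made-up miniature ARFF string in plain Python. The sample text, the `parse_arff` helper, and the toy relation name are all hypothetical, and the parser handles only a small subset of the format.

```python
# Hypothetical miniature ARFF text; the real segment-challenge.arff is much larger.
SAMPLE_ARFF = """\
% toy example
@relation toy-segment
@attribute region-centroid-col numeric
@attribute intensity-mean numeric
@attribute class {sky,grass}
@data
140,37.5,sky
21,4.2,grass
"""


def parse_arff(text):
    """Parse a tiny subset of ARFF: no quoting, no sparse rows, no missing values."""
    relation, attributes, rows = None, [], []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append(line.split(","))
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            attributes.append((name, spec))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE_ARFF)
class_index = len(attributes) - 1  # the last attribute often holds the class label
print(relation, len(attributes), len(rows), attributes[class_index][0])
```

In the notebook itself, the `Loader` object does this work and returns an `Instances` object, so a hand-written parser like this is only useful for understanding the file layout.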

For classification, we have to specify the class attribute. For instance, the method class_is_last mutates trainset to have the last attribute as the class attribute:

trainset.class_is_last()

from weka.core.dataset import Instances

# YOUR CODE HERE
raise NotImplementedError
print(Instances.summary(testset))
# tests
assert testset.relationname == "segment"
assert testset.num_instances == 810
assert testset.num_attributes == 20

Training using J48

To train a decision tree using J48, we create the classifier and then apply the method build_classifier on the training set.

from weka.classifiers import Classifier

J48 = Classifier(classname="weka.classifiers.trees.J48")
J48.build_classifier(trainset)
J48
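
For intuition about what the machine is doing, C4.5 ranks candidate splits by gain ratio: the information gain of a split divided by the entropy of the split proportions. The following is a minimal pure-Python sketch of that criterion for a binary threshold split on one numeric attribute. The function names and toy data are hypothetical, and the sketch leaves out pruning and the other refinements that J48 applies.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` on a numeric attribute."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:  # a split that separates nothing carries no information
        return 0.0
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    info_gain = entropy(labels) - (
        weights[0] * entropy(left) + weights[1] * entropy(right)
    )
    split_info = -sum(w * log2(w) for w in weights)
    return info_gain / split_info


# Hypothetical toy data: one numeric attribute and two classes.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["sky", "sky", "sky", "grass", "grass", "grass"]

perfect = gain_ratio(values, labels, threshold=5.0)  # separates the classes cleanly
weak = gain_ratio(values, labels, threshold=1.5)  # isolates a single instance
print(f"threshold 5.0: {perfect:.3f}, threshold 1.5: {weak:.3f}")
```

The threshold that separates the two classes cleanly scores higher than the one that peels off a single instance, which is the same judgment you make by eye when choosing regions in the UserClassifier scatter plots.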

To visualize the tree, generate an SVG file from its DOT description:

import pygraphviz as pgv
from IPython.display import SVG

# Create a PyGraphviz AGraph object from the DOT data
pgv.AGraph(string=J48.graph).draw('J48tree.svg', prog='dot')

# Display the SVG file
SVG(filename="J48tree.svg")

Evaluation

To evaluate the decision tree on the training set:

from weka.classifiers import Evaluation

J48train = Evaluation(trainset)
J48train.test_model(J48, trainset)
train_accuracy = J48train.percent_correct
print(f"Training accuracy: {train_accuracy:.4g}%")

# YOUR CODE HERE
raise NotImplementedError
print(f"Test accuracy: {test_accuracy:.4g}%")
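
The accuracy reported by `percent_correct` is the percentage of instances whose predicted class matches the actual class. As a sanity check on what that number means, here is a plain-Python sketch of the same calculation on made-up labels. The helper name and the label lists are hypothetical, and the sketch assumes every instance has equal weight.

```python
def percent_correct(predicted, actual):
    """Percentage of instances whose predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one instance")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)


# Hypothetical labels for five instances.
actual = ["sky", "grass", "path", "sky", "window"]
predicted = ["sky", "grass", "sky", "sky", "window"]

accuracy = percent_correct(predicted, actual)
print(f"Accuracy: {accuracy:.4g}%")
```

Accuracy measured on the training set is usually optimistic, because the tree was fitted to those same instances; the test set gives the fairer comparison between your hand-built tree and J48.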

YOUR ANSWER HERE

To stop the Java virtual machine, run the following line. To start the JVM again afterwards, you must restart the kernel.

jvm.stop()