import pandas as pd
import plotly.graph_objects as go
Man vs Machine Rematch¶
Segment Challenge Results¶
def plot_man_vs_machine():
    # Load the data
    rf_data = pd.read_csv("RF.csv")
    human_data = pd.read_csv("human.csv")

    # Combine the datasets, adding a column to distinguish their sources
    rf_data["source"] = "RF"
    human_data["source"] = "Human"
    combined_data = pd.concat([rf_data, human_data])

    # Exclude data points with missing values
    combined_data = combined_data.dropna()

    # Keep only the non-dominated points: a point is kept if no other point
    # achieves strictly higher accuracy at the same or smaller depth
    def filter_max_accuracy_points(data):
        data = data.sort_values(by="depth")
        filtered_data = []
        for _, row in data.iterrows():
            if not any(
                (data["depth"] <= row["depth"]) & (data["accuracy"] > row["accuracy"])
            ):
                filtered_data.append(row)
        return pd.DataFrame(filtered_data)

    # Apply the filtering function for each source
    max_accuracy_points = (
        combined_data.groupby("source")
        .apply(filter_max_accuracy_points, include_groups=False)
        .reset_index(drop=True)
    )

    # Create the scatter plot using go.Scatter
    fig = go.Figure()

    # Add one trace per source
    for source in combined_data["source"].unique():
        source_data = combined_data[combined_data["source"] == source]
        fig.add_trace(
            go.Scatter(
                x=source_data["depth"],
                y=source_data["accuracy"],
                mode="markers+text",
                text=source_data["name"],
                name=source,
                textfont=dict(color="rgba(0,0,0,0)"),  # make the text labels transparent
                marker=dict(size=10),
            )
        )

    # Update layout with labels and title
    fig.update_layout(
        title="Man vs Machine", xaxis_title="Tree Depth", yaxis_title="Accuracy"
    )

    # Add hover information
    fig.update_traces(hovertemplate="<b>%{text}</b><br>Accuracy: %{y}<br>Depth: %{x}")

    # Annotate the highest-accuracy (non-dominated) points
    for _, row in max_accuracy_points.iterrows():
        fig.add_annotation(
            x=row["depth"],
            y=row["accuracy"],
            text=f"{row['name']}, {row['accuracy']}",
            showarrow=True,
            arrowhead=2,
            ax=20,
            ay=-30,
            bgcolor="rgba(255, 255, 255, 0.6)",
            opacity=1,
            font=dict(size=10),
            hovertext=f"{row['name']}, {row['accuracy']}",
        )

    return fig
man_vs_machine_fig = plot_man_vs_machine()
man_vs_machine_fig.show()
Two heads are better than one¶
- The accuracies of ________ and ________ are both ________%. Are they good?
- Can we combine them into a better classifier?
- ________ achieves an accuracy of ________%.
- How does it work in general?
Architecture¶
- The base classifiers are simple but may make only weak preliminary predictions.
- The combined classifier uses a combination rule to merge the preliminary predictions into a good final prediction.
Architecture for probabilistic classifiers¶
- The base classifiers are simple but may produce only weak probability estimates.
- The combined classifier uses a combination rule to merge the probability estimates into a good final prediction.
How to get good performance?¶
- Reduce risk by avoiding underfitting and overfitting.
- For many loss functions (0-1 loss, sum of squared errors, ...), the risk decomposes into a bias term and a variance term, where
  - the expected predictor is the average of the trained predictor over the random training data $W$ ($W$ is a random variable. Why?);
  - the variance measures how strongly the trained predictor depends on the training data, also known as overfitting;
  - the bias measures how far the expected predictor deviates from the target, also known as underfitting.
- See the bias-variance trade-off.
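As a concrete instance (notation is mine: $\hat{f}_W$ is the predictor trained on random data $W$, $\bar{f} := E_W[\hat{f}_W]$ the expected predictor, and $f$ the target; label noise is ignored), the squared-error risk decomposes exactly as

$$
E_W\big[(\hat{f}_W(x) - f(x))^2\big]
= \underbrace{\big(\bar{f}(x) - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{E_W\big[(\hat{f}_W(x) - \bar{f}(x))^2\big]}_{\text{variance}},
$$

so a predictor that tracks its particular training set too closely has high variance (overfitting), while one too rigid to approach $f$ has high bias (underfitting).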
Bias and variance for probabilistic classifiers¶
- For probabilistic classifiers, a similar bias-variance decomposition holds, where
  - the averaged probability estimate is called the m______________;
  - the underlying class distribution, rather than the observed label, is used as the ground truth;
  - the information (or Kullback-Leibler) divergence is used as the loss function; and
  - the variance becomes a mutual information.
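A sketch of the last point in my own notation (assuming that, for a fixed input, $\hat{q}_W$ is the class distribution predicted after training on random data $W$, $\bar{q} := E_W[\hat{q}_W]$ is its average, and $\hat{Y}$ is a label drawn from $\hat{q}_W$): the KL-divergence spread of the predictions around their average is exactly a mutual information,

$$
E_W\big[\, D(\hat{q}_W \,\|\, \bar{q}) \,\big] = I(W; \hat{Y}),
$$

so a prediction that varies little with the training data carries little information about $W$, i.e., has low variance.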
How to reduce variance and bias?¶
- Base classifiers should be diverse, i.e., capture as many different pieces of relevant information as possible to reduce ______.
- The combination rule should reduce variance by smoothing out the noise while aggregating relevant information into the final decision.
Bagging (Bootstrap Aggregation): Base classifiers¶
- Construct bootstrap samples.
- Construct a base classifier for each bootstrap sample.
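A minimal sketch of these two steps, assuming scikit-learn is available; the dataset, the number of bootstrap samples, and all variable names below are illustrative rather than taken from the lecture.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative data; in practice X, y would be the actual training set.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

n_estimators = 11  # number of bootstrap samples / base classifiers
rng = np.random.default_rng(0)
base_classifiers = []

for _ in range(n_estimators):
    # A bootstrap sample: draw n examples with replacement from the training set.
    idx = rng.integers(0, len(X), size=len(X))
    X_boot, y_boot = X[idx], y[idx]

    # A base classifier (here a shallow decision tree) trained on the bootstrap sample.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_boot, y_boot)
    base_classifiers.append(tree)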
Bagging (Bootstrap Aggregation): Majority voting¶
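Continuing the bagging sketch above (same illustrative names), majority voting predicts the class that receives the most votes from the base classifiers:

# Each base classifier casts one vote per example; the most common class wins.
votes = np.stack([clf.predict(X) for clf in base_classifiers])  # (n_estimators, n_samples)
majority_vote = np.apply_along_axis(
    lambda col: np.bincount(col).argmax(), axis=0, arr=votes
)
print("Training accuracy of the majority vote:", (majority_vote == y).mean())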
Example¶
- Accuracy = _________________________%.
Is it always good to follow the majority?¶
- Accuracy = _________________________%.
- It is beneficial to return 0 more often because _________________________.
- How to do this in general?
Sum rule and threshold moving¶
In general, predict class 1 iff the base classifiers' combined support for class 1 exceeds a chosen threshold.
Binary classification: choose class 1 iff the average of the base classifiers' estimated probabilities of class 1 exceeds γ, for some chosen threshold γ.
What about multi-class classification?
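A sketch of the binary case, continuing the bagging example above (the threshold value below is illustrative):

# Sum rule: average the base classifiers' estimated probabilities of class 1.
prob_class1 = np.mean(
    [clf.predict_proba(X)[:, 1] for clf in base_classifiers], axis=0
)

# Threshold moving: predict class 1 iff the averaged probability exceeds gamma.
gamma = 0.3  # illustrative threshold; lowering gamma makes class 1 predicted more often
y_pred = (prob_class1 > gamma).astype(int)

# For multi-class problems, average the full probability vectors instead and
# take the argmax (optionally after per-class weighting or thresholding).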
Bagging (Bootstrap Aggregation): Average of probabilities¶
Other techniques to diversify base classifiers¶
- Random forest: Bagging with modified decision tree induction (see the usage sketch after this list).
  - Forest-RI: For each split, consider random i___________________ s___________________, i.e., only randomly chosen features are considered.
  - Forest-RC: For each split, consider random l___________________ c___________________ of randomly chosen features.
- Voting (weka.classifiers.meta.Vote) and Stacking (weka.classifiers.meta.Stacking): use different classification algorithms.
- Adaptive boosting (Adaboost):
  - Each base classifier tries to _______________________________ made by previous base classifiers.
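A usage sketch of the random forest idea, assuming scikit-learn is available; its RandomForestClassifier grows bagged trees whose splits each consider only a random subset of features (Forest-RI style), controlled by max_features.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Bagged decision trees; each split considers only a random subset of the features.
rf = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",  # size of the random feature subset tried at each split
    random_state=0,
).fit(X, y)
print("Training accuracy:", rf.score(X, y))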
Other techniques to combine decisions¶
- Random forest: Bagging with modified decision tree induction
  - Majority voting
  - Average of probabilities
- Voting
  - Majority voting or median
  - Average/product/minimum/maximum probabilities
- Stacking: Use a meta classifier (see the usage sketch after this list).
- Adaptive boosting (Adaboost): 2003 Gödel Prize winner
  - Weighted majority voting
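A usage sketch of stacking, assuming scikit-learn is available; the choice of base classifiers and meta classifier below is illustrative.

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Stacking: a meta classifier (here logistic regression) learns how to combine
# the predictions of diverse base classifiers built by different algorithms.
stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
        ("nb", GaussianNB()),
    ],
    final_estimator=LogisticRegression(),
).fit(X, y)
print("Training accuracy:", stack.score(X, y))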
What is Adaboost¶
- An ensemble method that learns from mistakes:
  - Combined classifier: majority voting, but with more weight (the "amount of say") given to more accurate base classifiers; a base classifier's amount of say is determined by its error rate on the weighted training sample. (See the precise formula in the sketch below.)
  - Base classifiers: trained sequentially on samples obtained as in bagging, but from example weights that start uniform and are updated after each round so that the previous base classifier's weighted error rate becomes 1/2 (no better than random guessing).
  - In each round, compute the error rate of the new base classifier with respect to the current weights; it determines both its amount of say and the next weight update.
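A minimal sketch of standard binary AdaBoost with decision stumps, using reweighting rather than the resampling variant described above; the names alpha (amount of say) and eps (weighted error rate) are mine.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=300, n_features=10, random_state=0)
y = 2 * y01 - 1  # relabel to {-1, +1}

T = 50
n = len(X)
w = np.full(n, 1 / n)  # start with uniform example weights
stumps, alphas = [], []

for t in range(T):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)

    # Weighted error rate eps_t (weights sum to 1); clipped to avoid division by zero.
    eps = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - eps) / eps)  # amount of say alpha_t

    # Up-weight mistakes and down-weight correct examples; after normalization the
    # current stump's weighted error on the new weights is exactly 1/2.
    w = w * np.exp(-alpha * y * pred)
    w = w / w.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Combined classifier: weighted majority vote, i.e., the sign of the weighted sum of votes.
scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("Training accuracy:", np.mean(np.sign(scores) == y))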
Machine vs Machine¶
def plot_machine_vs_machine():
    # Load the data
    rf_data = pd.read_csv("RF.csv")
    adb_data = pd.read_csv("ADB.csv")

    # Combine the datasets, adding a column to distinguish their sources
    rf_data["source"] = "RF"
    adb_data["source"] = "ADB"
    combined_data = pd.concat([rf_data, adb_data])

    # Exclude data points with missing values
    combined_data = combined_data.dropna()

    # Keep only the non-dominated points: a point is kept if no other point
    # achieves strictly higher accuracy at the same or smaller depth
    def filter_max_accuracy_points(data):
        data = data.sort_values(by="depth")
        filtered_data = []
        for _, row in data.iterrows():
            if not any(
                (data["depth"] <= row["depth"]) & (data["accuracy"] > row["accuracy"])
            ):
                filtered_data.append(row)
        return pd.DataFrame(filtered_data)

    # Apply the filtering function for each source
    max_accuracy_points = (
        combined_data.groupby("source")
        .apply(filter_max_accuracy_points, include_groups=False)
        .reset_index(drop=True)
    )

    # Create the scatter plot using go.Scatter
    fig = go.Figure()

    # Add one trace per source
    for source in combined_data["source"].unique():
        source_data = combined_data[combined_data["source"] == source]
        fig.add_trace(
            go.Scatter(
                x=source_data["depth"],
                y=source_data["accuracy"],
                mode="markers+text",
                text=source_data["name"],
                name=source,
                textfont=dict(color="rgba(0,0,0,0)"),  # make the text labels transparent
                marker=dict(size=10),
            )
        )

    # Update layout with labels and title
    fig.update_layout(
        title="Machine vs Machine", xaxis_title="Tree Depth", yaxis_title="Accuracy"
    )

    # Add hover information
    fig.update_traces(hovertemplate="<b>%{text}</b><br>Accuracy: %{y}<br>Depth: %{x}")

    # Annotate the highest-accuracy (non-dominated) points
    for _, row in max_accuracy_points.iterrows():
        fig.add_annotation(
            x=row["depth"],
            y=row["accuracy"],
            text=f"{row['name']}, {row['accuracy']}",
            showarrow=True,
            arrowhead=2,
            ax=20,
            ay=-30,
            bgcolor="rgba(255, 255, 255, 0.6)",
            opacity=1,
            font=dict(size=10),
            hovertext=f"{row['name']}, {row['accuracy']}",
        )

    return fig
machine_vs_machine_fig = plot_machine_vs_machine()
machine_vs_machine_fig.show()
References¶
- Techniques to improve classification accuracy
- [Witten11] Chapter 8
- Optional:
  - Breiman, L. (1996). “Bagging predictors.” Machine Learning, 24(2), 123-140. doi:10.1007/bf00058655
  - Breiman, L. (2001). “Random forests.” Machine Learning, 45(1), 5-32. doi:10.1023/a:1010933404324
  - Freund, Y., Schapire, R., & Abe, N. (1999). “A short introduction to boosting.” Journal of Japanese Society for Artificial Intelligence, 14(5), 771-780.
  - Zhu, J., Zou, H., Rosset, S., & Hastie, T. (2009). “Multi-class AdaBoost.” Statistics and Its Interface, 2(3), 349-360.