MLfull
PRACTICAL-3
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Load the dataset and inspect it
df_sal = pd.read_csv('/content/drive/MyDrive/Dataset/this1.csv')
df_sal.head()
df_sal.describe()
# Drop the non-numeric 'Phone' column
df_sal.drop(columns=['Phone'], inplace=True)
df_sal.head()
df_sal.describe()
# Convert target variable 'Price ($)' to binary (e.g., high price = 1, low price = 0);
# the median price is assumed here as the threshold
df_sal['Price_Class'] = (df_sal['Price ($)'] > df_sal['Price ($)'].median()).astype(int)
X = df_sal[['Annual Income']]
y = df_sal['Price_Class']
# Split data into training and testing sets (80/20 split assumed)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit logistic regression and predict on the test set
log_reg = LogisticRegression()
log_reg.fit(x_train, y_train)
y_pred_log_reg = log_reg.predict(x_test)
Output:-
PRACTICAL-4
import numpy as np
import matplotlib.pyplot as plt

def plot_svc_decision_function(model, ax=None):
    # Plot the decision boundary and margins of a fitted 2D SVC on the given axes
    if ax is None:
        ax = plt.gca()
    xlim, ylim = ax.get_xlim(), ax.get_ylim()
    x = np.linspace(xlim[0], xlim[1], 30)
    y = np.linspace(ylim[0], ylim[1], 30)
    Y, X = np.meshgrid(y, x)
    xy = np.vstack([X.ravel(), Y.ravel()]).T
    P = model.decision_function(xy).reshape(X.shape)
    ax.contour(X, Y, P, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)

# model (a fitted SVC) and axi (a subplot axis) are assumed from the rest of the practical
plot_svc_decision_function(model, axi)
Output:
PRACTICAL-5
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
X, y = iris.data[:, :2], iris.target  # Only use the first two features for visualization
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)

# Plot the decision boundary over a mesh of the two features
xx, yy = np.meshgrid(np.arange(X[:, 0].min() - 1, X[:, 0].max() + 1, 0.02),
                     np.arange(X[:, 1].min() - 1, X[:, 1].max() + 1, 0.02))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.4)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k')
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.show()
Output:
PRACTICAL-6
K-Fold Cross-Validation:
K-Fold cross-validation splits the dataset into K equal-sized folds; the model is trained on K-1 folds and evaluated on the remaining fold, and this is repeated K times so that each fold is used once for validation. The K scores are then averaged to estimate how well the model generalizes.
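A minimal sketch using scikit-learn's cross_val_score (the decision-tree model and the Iris data are assumed only for illustration):
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Each of the 5 folds serves once as the validation set
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=42))
print(scores, scores.mean())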
ROC:
The ROC (Receiver Operating Characteristic) curve is a graphical plot that illustrates the diagnostic
ability of a binary classification model. It shows the trade-off between the true positive rate
(Sensitivity) and false positive rate (1 - Specificity) for different threshold values.
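For example, a minimal sketch that traces an ROC curve for a simple binary classifier (synthetic data assumed here; the practical code below does the same per cross-validation fold on the weather dataset):
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

X, y = make_classification(n_samples=500, random_state=0)
scores = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]
# TPR and FPR at every decision threshold
fpr, tpr, thresholds = roc_curve(y, scores)
print('AUC =', roc_auc_score(y, scores))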
Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder
from sklearn.impute import SimpleImputer
from sklearn.metrics import roc_curve, auc

data = pd.read_csv('/content/drive/MyDrive/Dataset/weatherAUS.csv')
data.head()
data.describe()
print(data.columns)
label_value_count = data['RainTomorrow'].value_counts()
print(label_value_count)
print(data.info())

# Drop rows with a missing label, encode the target as 1/0, and separate features from the target
data.dropna(subset=['RainTomorrow'], inplace=True)
y = data.loc[:, 'RainTomorrow'].map({'No': 0, 'Yes': 1})
X = data.drop(columns=['RainTomorrow'])

random_state = np.random.RandomState(0)
clf = RandomForestClassifier(random_state=random_state)
cv = StratifiedKFold(n_splits=5, shuffle=False)
tprs = []
aucs = []
i = 1

# Drop unused columns (columns_to_drop assumed here to be the date/location identifiers),
# label-encode the categorical columns, and impute missing values
columns_to_drop = ['Date', 'Location']
X = X.drop(columns_to_drop, axis=1)
label_encoder = LabelEncoder()
for col in X.select_dtypes(include='object').columns:
    X[col] = label_encoder.fit_transform(X[col].astype(str))
imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(X)

# Plot one ROC curve per cross-validation fold
plt.figure(figsize=(10, 6))
for train_idx, test_idx in cv.split(X_imputed, y):
    clf.fit(X_imputed[train_idx], y.iloc[train_idx])
    probas = clf.predict_proba(X_imputed[test_idx])[:, 1]
    fpr, tpr, _ = roc_curve(y.iloc[test_idx], probas)
    tprs.append(tpr)
    roc_auc = auc(fpr, tpr)
    aucs.append(roc_auc)
    plt.plot(fpr, tpr, lw=1, label=f'ROC fold {i} (AUC = {roc_auc:.2f})')
    i += 1
plt.plot([0, 1], [0, 1], linestyle='--', color='r', alpha=1)
plt.legend(loc='lower right')
plt.title('ROC curve per cross-validation fold', fontsize=12)
plt.show()
Output:
PRACTICAL-7
import numpy as np
import matplotlib.pyplot as plt

train_acc = []
test_acc = []
# Track accuracy after each training epoch
# (an incremental classifier clf and a train/test split are assumed to be defined earlier)
for epoch in range(50):
    clf.partial_fit(X_train, y_train, classes=np.unique(y_train))
    train_acc.append(clf.score(X_train, y_train))
    test_acc.append(clf.score(X_test, y_test))
plt.plot(train_acc, label='Train accuracy')
plt.plot(test_acc, label='Test accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)
plt.show()
Output:
PRACTICAL-8
Bagging:
Bagging involves training multiple base models (usually decision trees) independently on
different random subsets of the training data (with replacement) and then averaging their
predictions to reduce variance.
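For example, a minimal sketch with scikit-learn's BaggingClassifier (a train/test split X_train, X_test, y_train, y_test is assumed):
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# 10 decision trees, each trained on a bootstrap sample of the training set
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10, random_state=42)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))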
Boosting:
Boosting trains multiple weak learners sequentially, where each learner focuses on correcting the
mistakes of its predecessors by giving more weight to misclassified instances. Common
algorithms include AdaBoost and Gradient Boosting.
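A corresponding AdaBoost sketch under the same assumed split:
from sklearn.ensemble import AdaBoostClassifier

# Each new weak learner focuses on the samples its predecessors misclassified
ada = AdaBoostClassifier(n_estimators=50, random_state=42)
ada.fit(X_train, y_train)
print(ada.score(X_test, y_test))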
Stacking:
Stacking combines multiple base classifiers with a meta-classifier (usually a linear model) that
learns to combine the predictions of the base classifiers.
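And a minimal stacking sketch, again assuming the same split:
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Out-of-fold predictions of the base learners train a logistic-regression meta-model
stack = StackingClassifier(estimators=[('dt', DecisionTreeClassifier()), ('knn', KNeighborsClassifier())],
                           final_estimator=LogisticRegression(), cv=5)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))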
Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier, StackingClassifier

# Base learners (a train/test split X_train, X_test, y_train, y_test is assumed from earlier)
base_clf1 = DecisionTreeClassifier(random_state=42)
base_clf2 = KNeighborsClassifier()  # second base learner assumed (KNN)
base_clf3 = LogisticRegression(random_state=42)
bagging_clf = BaggingClassifier(base_clf1, n_estimators=10, random_state=42)
adaboost_clf = AdaBoostClassifier(n_estimators=50, random_state=42)
stacking_clf = StackingClassifier(estimators=[('dt', base_clf1), ('knn', base_clf2), ('lr', base_clf3)],
                                  final_estimator=LogisticRegression(), cv=5)
bagging_train_acc = []
bagging_test_acc = []
adaboost_train_acc = []
adaboost_test_acc = []
stacking_train_acc = []
stacking_test_acc = []
# Repeat training for several rounds to trace accuracy over 'epochs' (loop assumed from the plot below)
for epoch in range(10):
    bagging_clf.fit(X_train, y_train)
    bagging_train_acc.append(bagging_clf.score(X_train, y_train))
    bagging_test_acc.append(bagging_clf.score(X_test, y_test))
    adaboost_clf.fit(X_train, y_train)
    adaboost_train_acc.append(adaboost_clf.score(X_train, y_train))
    adaboost_test_acc.append(adaboost_clf.score(X_test, y_test))
    stacking_clf.fit(X_train, y_train)
    stacking_train_acc.append(stacking_clf.score(X_train, y_train))
    stacking_test_acc.append(stacking_clf.score(X_test, y_test))
plt.figure(figsize=(10, 6))
# Plot test accuracy per training round for each ensemble method
plt.plot(bagging_test_acc, label='Bagging (test)')
plt.plot(adaboost_test_acc, label='AdaBoost (test)')
plt.plot(stacking_test_acc, label='Stacking (test)')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)
plt.show()
Output:
PRACTICAL-9
K-Means Clustering:
Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data assumed for illustration
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
kmeans = KMeans(n_clusters=4)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
centers = kmeans.cluster_centers_
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, cmap='viridis')
plt.scatter(centers[:, 0], centers[:, 1], c='red', marker='X', s=200)
plt.title('K-Means Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
Output:
Agglomerative Clustering:
Code:
from sklearn.cluster import AgglomerativeClustering
# Reuse the same X as above and assign each point to one of 4 clusters
agg_clustering = AgglomerativeClustering(n_clusters=4)
agg_clusters = agg_clustering.fit_predict(X)
plt.scatter(X[:, 0], X[:, 1], c=agg_clusters, cmap='viridis')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
Output:
PRACTICAL-10
AIM: Study and implement various dimensionality reduction techniques such as PCA and LDA.
PCA is a dimensionality reduction technique that is commonly used for data preprocessing and
feature extraction. It works by transforming the original features into a new set of uncorrelated
variables called principal components. The main goal of PCA is to reduce the dimensionality of
the dataset while retaining as much variance as possible.
Steps in PCA:
● Standardize the data so that each feature has zero mean and unit variance.
● Compute the covariance matrix of the standardized data.
● Compute the eigenvectors and eigenvalues of the covariance matrix.
● Sort the eigenvectors by decreasing eigenvalues and choose the top k eigenvectors as the principal components.
● Project the original data onto the principal components to obtain the reduced feature space.
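A minimal NumPy sketch of these steps (illustrative only; the scikit-learn code below does the same with the PCA class):
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
# Step 1: standardize the Iris features
Xs = StandardScaler().fit_transform(load_iris().data)
# Steps 2-3: covariance matrix and its eigen-decomposition
cov = np.cov(Xs, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
# Steps 4-5: sort by decreasing eigenvalue, keep the top 2 components, and project
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order[:2]]
X_reduced = Xs @ W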
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load and standardize the Iris data, then project it onto 2 principal components
iris = load_iris()
X = iris.data
y = iris.target
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.colorbar(label='Classes')
plt.show()
Output:
LDA is a supervised dimensionality reduction technique that is used for feature extraction and
classification. Unlike PCA, which focuses on maximizing variance, LDA aims to maximize the
separation between classes in the data. It does this by finding the linear combinations of features
that best discriminate between different classes.
Steps in LDA:
● Compute the mean vectors for each class and the overall mean vector of the data.
● Compute the between-class scatter matrix and within-class scatter matrix.
● Compute the eigenvectors and eigenvalues of the generalized eigenvalue problem.
● Sort the eigenvectors by decreasing eigenvalues and choose the top k eigenvectors as the discriminant directions.
● Project the original data onto the discriminant directions to obtain the new feature space.
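These steps can be sketched directly with NumPy (illustrative only, reusing X_scaled and y from the PCA code above; the scikit-learn code below uses the LDA class instead):
import numpy as np
classes = np.unique(y)
overall_mean = X_scaled.mean(axis=0)
mean_vecs = [X_scaled[y == c].mean(axis=0) for c in classes]
# Within-class and between-class scatter matrices
S_W = sum(np.cov(X_scaled[y == c], rowvar=False) * (np.sum(y == c) - 1) for c in classes)
S_B = sum(np.sum(y == c) * np.outer(m - overall_mean, m - overall_mean)
          for c, m in zip(classes, mean_vecs))
# Solve the generalized eigenvalue problem and keep the top 2 discriminant directions
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:2]].real
X_lda_manual = X_scaled @ W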
Code:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Project the standardized Iris data (X_scaled, y from above) onto 2 linear discriminants
lda = LDA(n_components=2)
X_lda = lda.fit_transform(X_scaled, y)
plt.scatter(X_lda[:, 0], X_lda[:, 1], c=y)
plt.xlabel('LDA Component 1')
plt.ylabel('LDA Component 2')
plt.colorbar(label='Classes')
plt.show()
Output: