ISY503 Intelligent Systems Report Sample

Assignment Brief

This individual assessment provides you with an opportunity to explore the impact of applying various Machine Learning techniques to a dataset in a sandbox environment. You will complete the Programming Exercise from Google that introduces you to modelling in the Machine Learning world. Note that this exercise is limited to exploring the application of Linear Regression in detail; however, the feature engineering, transformations and hyperparameter tuning involved in applying different implementations of the regression algorithm are also investigated. There is an emphasis on understanding the impacts of various feature transformations as well. Although a simple dataset has been provided for this task, there will be opportunity to apply normalisation techniques.

You should follow the task instructions set out in the Google lab to set up and run the various libraries and environments as well as to load the dataset. The instructions will take you through various tasks, including identifying different applicable ML models and exploring appropriate hyperparameters and feature transformations. While writing your own models, think outside the box and see whether a custom ML model can be built. As there is no “one right answer” to this task, the assessment seeks to help you explore the impacts of various possible options to further your own understanding. The task instructions and rubric outline in detail what each grade assigned to students will demonstrate.

The assessment also requires you to write a manual of approximately 500 words, explaining the models and ML techniques utilised and what impact they had on the data exploration and visualisation task, and providing an evaluation of their efficiency. Once again, this is an exploration task, and your analysis and conclusions about the effectiveness of the various models you have investigated will be the subject of the marking criteria.

Task Instructions

You need a Google account to do this assessment. You can create a free Google account here:
https://myaccount.google.com/.

Once created, you need to navigate to the Google-created lab “Intro to Modelling”, available here:

https://colab.research.google.com/github/google/eng-edu/blob/master/ml/fe/exercises/intro_to_modeling.ipynb?utm_source=ss-data-prep&utm_campaign=colab-external&utm_medium=referral&utm_content=intro_to_modeling

In addition to following the instructions outlined in the lab, you must:

• Implement a possible solution to each of the tasks outlined in the lab

• Add appropriate comments to the code you create, following machine learning best practices for clean coding: https://towardsdatascience.com/clean-machine-learning-code-bd32bd0e9212

• Identify different models that would be appropriate as alternatives for the tasks presented by the lab, by varying hyperparameters and features. There is also an opportunity for you to create your own custom model by using different regressor functions within TensorFlow. For more details, see:
https://www.tensorflow.org/tutorials/customization/basics

• Familiarise yourself with the assessment’s rubric to understand how the various assignment grades are assigned.

• Produce a manual of 500 words in length outlining:

o The answers to the questions posed in each of the tasks within the lab.

o The choice of models you made during your assessment, including the various hyperparameters you chose and the feature engineering performed for each task.

o An analysis of the various models created and an evaluation of their efficiency.

Solution

1.0 Introduction

The task requires the prediction of car prices from the cars’ specifications using a dataset-driven approach and machine learning algorithms. The models used include Linear Regression, Logistic Regression (for binary classification) and Random Forest Regressor, with GridSearchCV for hyperparameter optimisation. The aim is to find out which of the specified models works best at predicting car prices and to analyse the consequences of feature transformation and parameter tuning.

2.0 Data Preprocessing

The first step is data loading and preprocessing. Missing values are dealt with by removing the rows that contain them. The following input features are chosen for the model: ‘wheel-base’, ‘length’, ‘width’, ‘height’, ‘curb-weight’, ‘engine-size’, ‘bore’, ‘stroke’, ‘compression-ratio’, ‘horsepower’, ‘peak-rpm’, ‘city-mpg’ and ‘highway-mpg’. The dependent variable to be forecast is the car price.

Figure 1: Data Preprocessing
(Source: Self-Created)
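
A minimal sketch of this step is shown below. The file name automobile.csv and the assumption that missing values are encoded as ‘?’ are illustrative only, not details taken from the original submission.

import pandas as pd

# Load the dataset; '?' as the missing-value marker is an assumption.
df = pd.read_csv('automobile.csv', na_values='?')

features = ['wheel-base', 'length', 'width', 'height', 'curb-weight',
            'engine-size', 'bore', 'stroke', 'compression-ratio',
            'horsepower', 'peak-rpm', 'city-mpg', 'highway-mpg']
target = 'price'

# Drop rows with missing values in any selected column or the target.
df = df.dropna(subset=features + [target])

X = df[features].astype(float)
y = df[target].astype(float)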

The features are then normalised to a comparable scale using StandardScaler (Priyambudi & Nugroho, 2024).
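
A minimal sketch of the split-and-scale step, continuing from the preprocessing sketch above; the 80/20 split ratio and the random_state value are illustrative assumptions.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hold out a test set, then standardise the features (zero mean, unit variance).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # fit on training data only
X_test_scaled = scaler.transform(X_test)         # reuse the training statistics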

3.0 Model Training and Evaluation

3.1 Linear Regression

Figure 2: Linear Regression
(Source: Self-Created)

Linear Regression is implemented as the first, fundamental model in the comparison (Montgomery et al., 2021). The model is trained on the scaled training set and evaluated on the test set. Evaluation is based on two metrics: the Mean Squared Error (MSE) and R-squared (R2). The Linear Regression model yields an MSE of 11,475,859.14 and an R2 score of 0.8132.
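
A minimal sketch of this step, assuming the scaled training and test sets from Section 2.0 are in scope.

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Fit a plain least-squares model on the scaled features.
lin_reg = LinearRegression()
lin_reg.fit(X_train_scaled, y_train)

y_pred = lin_reg.predict(X_test_scaled)
print('MSE:', mean_squared_error(y_test, y_pred))
print('R2 :', r2_score(y_test, y_pred))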

3.2 Logistic Regression

Figure 3: Logistic Regression
(Source: Self-Created)

Logistic Regression is applied after the target variable is transformed into a binary format indicating whether a car’s price is above or below the median (Boateng & Abaye, 2019). The model performs well on this binary classification task, achieving an accuracy of 0.8421; however, it is not suitable for the underlying problem of predicting a continuous price.
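
A minimal sketch of the binarisation and classification step; thresholding at the training-set median (rather than the overall median) is an assumption about how the binary target was derived.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Binarise the target: 1 if the price is above the median, 0 otherwise.
# Using the training median avoids leaking test-set information.
median_price = y_train.median()
y_train_bin = (y_train > median_price).astype(int)
y_test_bin = (y_test > median_price).astype(int)

log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train_scaled, y_train_bin)
print('Accuracy:', accuracy_score(y_test_bin, log_reg.predict(X_test_scaled)))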

3.3 Random Forest Regressor

Figure 4: Random Forest Regressor
(Source: Self-Created)

Next, the Random Forest Regressor is trained; testing gives an MSE of 10,149,997.45 and an R2 score of 0.8369. This model outperforms the Linear Regression model, which demonstrates the effectiveness of ensemble methods for this task.
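
A minimal sketch of the Random Forest step; the choice of 100 estimators for the untuned model is an illustrative assumption.

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Tree ensembles are insensitive to feature scaling, so the raw features would
# also work; the scaled ones are used here for consistency with the other models.
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train_scaled, y_train)

y_pred_rf = rf.predict(X_test_scaled)
print('MSE:', mean_squared_error(y_test, y_pred_rf))
print('R2 :', r2_score(y_test, y_pred_rf))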

3.4 GridSearchCV

Figure 5: GridSearchCV
(Source: Self-Created)

GridSearchCV helps improve the performance of a machine learning model by searching over a grid of candidate hyperparameter values. It uses cross-validation to score each combination of parameters and returns the model built with the best-performing hyperparameters, thereby enhancing the model’s performance and accuracy.

4.0 Hyperparameter Tuning

To get the best performance from the Random Forest model, GridSearchCV is applied with a parameter grid consisting of the number of estimators, the maximum depth and the minimum samples split (Rasheed et al., 2024). The best parameters found are {‘max_depth’: 10, ‘min_samples_split’: 2, ‘n_estimators’: 300}. The tuned model has an MSE of 8,561,620.92 and an R2 of 0.8702, so its predictions are noticeably more accurate than those of the untuned model.
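
A minimal sketch of the tuning step; apart from the reported best combination, the candidate value lists in the grid are illustrative assumptions.

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Candidate values for the three parameters discussed above (illustrative lists).
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [10, 20, None],
    'min_samples_split': [2, 5, 10],
}

grid = GridSearchCV(RandomForestRegressor(random_state=42), param_grid,
                    cv=5, scoring='neg_mean_squared_error', n_jobs=-1)
grid.fit(X_train_scaled, y_train)

print('Best parameters:', grid.best_params_)
best_rf = grid.best_estimator_
y_pred_best = best_rf.predict(X_test_scaled)
print('MSE:', mean_squared_error(y_test, y_pred_best))
print('R2 :', r2_score(y_test, y_pred_best))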

5.0 Visualizations

Several plots are produced to analyse the data and to compare the behaviour of the models (a minimal plotting sketch follows the list below):

Figure 6: Correlation Heatmap
(Source: Self-Created)

• Correlation Heatmap: Shows the pairwise relationships between features, highlighting which features are most strongly correlated with the target column (Zhang et al., 2019).

Figure 7: Feature Importance
(Source: Self-Created)

• Feature Importance: Plots the feature importances from the Random Forest model, where ‘engine-size’ and ‘curb-weight’ emerge as the most influential features.

Figure 8: Actual vs. Predicted Prices
(Source: Self-Created)

• Actual vs. Predicted Prices: A scatter plot of actual versus predicted prices for the Linear Regression model, making it easy to see how closely the predictions track the actual values.

Figure 9: Price Distribution by Fuel Type
(Source: Self-Created)

• Price Distribution by Fuel Type: A boxplot of car prices grouped by fuel type.

Figure 10: Price vs. Horsepower by Fuel Type
(Source: Self-Created)

• Price vs. Horsepower by Fuel Type: A scatter plot of price against horsepower, coloured by fuel type.
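
A minimal plotting sketch for the first three figures, assuming the objects from the earlier sketches (df, features, target, best_rf, y_test, y_pred) are still in scope and that matplotlib and seaborn are available.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Correlation heatmap of the selected features and the target.
plt.figure(figsize=(10, 8))
sns.heatmap(df[features + [target]].corr(), annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()

# Feature importances from the tuned Random Forest model.
importances = pd.Series(best_rf.feature_importances_, index=features).sort_values()
importances.plot(kind='barh', title='Random Forest Feature Importance')
plt.show()

# Actual vs. predicted prices for the Linear Regression model.
plt.scatter(y_test, y_pred)
plt.xlabel('Actual price')
plt.ylabel('Predicted price')
plt.title('Actual vs. Predicted Prices')
plt.show()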

6.0 Conclusion

It is evident that the Random Forest Regressor, especially after the hyperparameter tuning phase, outperforms Linear Regression in predicting car prices. The feature-importance analysis and the accompanying visualisations offer useful insight into the factors that affect car prices. The analysis also shows that feature engineering and normalisation, alongside the choice of model, are crucial components of a good predictive model. Hyperparameter tuning further refines the model and highlights the importance of systematic optimisation in machine learning tasks.

The project illustrates basic machine learning concepts such as regression analysis, feature transformation and ensemble methods in the context of a real-life problem. The findings from this analysis can inform further study in more complex settings and in other areas of application.

References

Boateng, E. Y., & Abaye, D. A. (2019). A review of the logistic regression model with emphasis on medical research. Journal of Data Analysis and Information Processing, 7(4), 190. Retrieved from https://www.scirp.org/html/4-2870278_95655.htm on 25.07.2024.

Montgomery, D. C., Peck, E. A., & Vining, G. G. (2021). Introduction to Linear Regression Analysis. John Wiley & Sons. Retrieved from http://sutlib2.sut.ac.th/sut_contents/H133678.pdf on 25.07.2024.

Priyambudi, Z. S., & Nugroho, Y. S. (2024, January). Which algorithm is better? An implementation of normalization to predict student performance. In AIP Conference Proceedings (Vol. 2926, No. 1). AIP Publishing. Retrieved from https://pubs.aip.org/aip/acp/article-abstract/2926/1/020110/2999314 on 25.07.2024.

Rasheed, S., Kumar, G. K., Rani, D. M., Kantipudi, M. P., & Anila, M. (2024). Heart disease prediction using GridSearchCV and Random Forest. EAI Endorsed Transactions on Pervasive Health and Technology, 10. Retrieved from https://publications.eai.eu/index.php/phat/article/download/5523/3063 on 25.07.2024.

Zhang, J., Kalantidis, Y., Rohrbach, M., Paluri, M., Elgammal, A., & Elhoseiny, M. (2019, July). Large-scale visual relationship understanding. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, No. 1, pp. 9185-9194). Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/4953/4826 on 25.07.2024.
