Using your data to predict the future! Part 2

9 June 2023
Ben Holland

Which Algorithm?

Choosing the right predictive tool in Alteryx Designer can be challenging, as there are many different options available and each one has its own strengths and weaknesses. To make the best choice, it’s important to consider the specific needs of your project and the characteristics of the data you are working with. Some factors to consider when choosing a predictive tool include:

The type of prediction you want to make: Some tools are better suited to making predictions about continuous values, while others are better at predicting discrete classes. Consider the type of outcome you want to predict and choose a tool that is designed for that type of prediction.

The complexity of the data: Some predictive tools are better suited to complex, high-dimensional data, while others work best on simple, low-dimensional data. Consider the characteristics of your data and choose a tool that can handle that level of complexity.

The linearity of the data: Whether the relationships in your data are linear affects which machine learning algorithm you should choose in two ways.
1. First, linear algorithms generally perform best when the underlying relationship is linear. For classification, that means linearly separable data, which is data whose classes can be split by a single straight line (or, with more than two inputs, a flat plane). For regression, it means the target rises or falls roughly in a straight line with the inputs. In those cases, consider a linear model such as logistic regression (for classes) or linear regression (for continuous values).
2. Second, non-linear algorithms can often handle more complex, non-linear data better than linear algorithms. If your data is not linearly separable, or if it has a more complex structure, consider a non-linear model such as a decision tree, random forest, or neural network.
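
One rough way to get a feel for linearity before picking a tool is to look at the Pearson correlation between an input and the target. The Python sketch below sits outside Alteryx and is only an illustration: the data is made up and `pearson_r` is a hypothetical helper name. A value close to 1 or -1 suggests a straight-line relationship, while a value close to 0 means a straight line explains very little, even when the relationship is perfectly predictable but curved.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up inputs: one target that follows a straight line, one that curves.
xs = list(range(-5, 6))
linear_ys = [2 * x + 1 for x in xs]
curved_ys = [x ** 2 for x in xs]

r_linear = pearson_r(xs, linear_ys)
r_curved = pearson_r(xs, curved_ys)

print("straight-line data:", round(r_linear, 3))
print("curved data:", round(r_curved, 3))
```

A single correlation figure is a quick screen only. Plotting the data, or comparing a linear and a non-linear model on held-out data, gives a fuller picture.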

By weighing these factors, you can choose a predictive tool in Alteryx Designer that suits both your project and your data.

So, what did I use?

In my project I used the criteria outlined above to compare several regression algorithms. In the end I chose linear regression, support vector machine (SVM) regression, and count regression. These are all regression algorithms, which predict a numeric value from one or more input variables. In my project, I wanted to predict the number of crashes based on multiple weather variables.

Linear Regression

Linear regression is a simple and widely used technique for modelling the relationship between a dependent variable and one or more independent variables. The model assumes that this relationship is linear: with a single independent variable it can be drawn as a straight line, and with several it becomes a flat plane or hyperplane.
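
To make the idea concrete, here is a minimal Python sketch of ordinary least squares with a single input. It is not what the Alteryx tool runs internally; the helper `fit_line` and the rain and crash figures are invented purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up example data: millimetres of rain and number of crashes per day.
rain_mm = [0, 2, 4, 6, 8, 10]
crashes = [3, 4, 6, 7, 9, 10]

intercept, slope = fit_line(rain_mm, crashes)

# Predict the number of crashes for a day with 5 mm of rain.
predicted = intercept + slope * 5
print("intercept:", round(intercept, 3), "slope:", round(slope, 3))
print("predicted crashes at 5 mm:", round(predicted, 2))
```

The slope reads as the extra crashes expected per additional millimetre of rain, which is what makes linear regression easy to explain.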

Support Vector Machine (SVM)

Support vector machine (SVM) regression takes a different approach to modelling the relationship between the dependent and independent variables. Instead of trying to minimise every error, the model looks for the line or hyperplane that keeps as many data points as possible within a set margin (often called epsilon) of its predictions, and only penalises points that fall outside that margin. Combined with a non-linear kernel function, this can make SVM regression more effective than linear regression at handling complex, non-linear data.
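
SVM regression is usually described in terms of an epsilon-insensitive loss: predictions that land within epsilon of the actual value cost nothing, and only the part of an error beyond epsilon is penalised. The Python sketch below shows just that loss, averaged for readability, on made-up numbers. The function name and the epsilon value are assumptions for illustration; a full SVM regression also adds a regularisation term and, optionally, a kernel.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5):
    """Mean epsilon-insensitive loss: errors within epsilon cost nothing."""
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


# Made-up actual values and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [3.2, 4.9, 8.0, 9.4]

# Only the third prediction misses by more than epsilon, so it is the
# only one that contributes to the loss.
loss = epsilon_insensitive_loss(actual, predicted, epsilon=0.5)
print("mean epsilon-insensitive loss:", loss)
```

A larger epsilon tolerates bigger misses and tends to give a simpler model, while a smaller epsilon follows the training data more closely.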

Count Regression

Count regression is used to model count data, which is data that represents the number of occurrences of some event (such as the number of clicks on a website or the number of purchases made by a customer). Because the dependent variable is a non-negative integer, these models predict the expected number of occurrences based on one or more input variables. Poisson regression is a common example.
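
As an illustration of one common count model, the Python sketch below fits a Poisson regression with a single input using Newton's method. It is a simplified stand-in, not the Alteryx implementation: the helper `fit_poisson`, the fixed number of iterations, and the rain and crash figures are all assumptions made for this example.

```python
import math


def fit_poisson(xs, ys, iterations=25):
    """Fit log(expected count) = b0 + b1 * x by Newton's method."""
    b0 = math.log(sum(ys) / len(ys))  # start from the log of the mean count
    b1 = 0.0
    for _ in range(iterations):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(y - mu for y, mu in zip(ys, mus))
        g1 = sum((y - mu) * x for x, y, mu in zip(xs, ys, mus))
        # Entries of the 2x2 Fisher information matrix.
        h00 = sum(mus)
        h01 = sum(mu * x for x, mu in zip(xs, mus))
        h11 = sum(mu * x * x for x, mu in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system directly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up example data: hours of rain in a day and number of crashes.
rain_hours = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
crashes = [1, 2, 2, 3, 4, 5, 7, 10, 13, 18]

b0, b1 = fit_poisson(rain_hours, crashes)

# The model predicts an expected count, which is always positive.
expected_at_5 = math.exp(b0 + b1 * 5)
print("b0:", round(b0, 3), "b1:", round(b1, 3))
print("expected crashes after 5 hours of rain:", round(expected_at_5, 2))
```

Because the model works on the log scale, its predictions can never go below zero, which is one reason count regression often suits data like crash counts better than a plain linear fit.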

In Alteryx Designer each of these algorithms is available as an individual tool. You train and test a tool by connecting data that has already been cleaned and prepared into the correct format (see part 1 of the series: https://www.theinformationlab.nl/en/2023/02/10/using-your-data-to-predict-the-future-2/). Each tool has specific settings and parameters for customising the behaviour of the model, such as the type of regularisation or, in the case of SVM regression, the kernel function. You can then use the output from these tools to make predictions on new data, or to evaluate the performance of the trained model.

Next time…

Now that we have figured out which tools and algorithms to use, the next step in the methodology is to assess them and determine which is most suitable for the final product. Next time, we will do exactly that using the Score tool. See you then!
