Types of Data
There are two types of variables you’ll find in your data: numerical and categorical. Numerical data can be divided into continuous and discrete values, and categorical data can be broken down into nominal and ordinal values.
Numerical data is measurable information represented as numbers rather than words or text.
Continuous values can take any value within a range, including fractions, so there is no fixed list of possible values. Examples include variables that represent money or height.
Discrete values are the opposite: they can only take specific, countable values, usually whole numbers. Examples include the number of days in a month or the number of bugs logged.
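To make the continuous/discrete split concrete, here is a minimal Python sketch. The helper name `numeric_kind` and the sample values are made up for illustration, and the rule it applies (whole-number values are treated as discrete, anything else as continuous) is only a rough heuristic, not a formal test.

```python
from numbers import Integral, Real


def numeric_kind(values):
    """Return 'discrete' if every value is a whole-number type, else 'continuous'.

    Heuristic only: a float column that happens to hold whole numbers
    (for example 3.0) is still reported as continuous.
    """
    # Reject non-numeric input; bool is excluded because it is a yes/no flag.
    if not all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
        raise TypeError("expected only numeric values")
    if all(isinstance(v, Integral) for v in values):
        return "discrete"
    return "continuous"


heights_cm = [162.5, 170.2, 181.0]  # hypothetical measurements
bugs_logged = [3, 0, 12]            # hypothetical counts

print(numeric_kind(heights_cm))   # continuous
print(numeric_kind(bugs_logged))  # discrete
```

In real data you would usually also rely on what the variable means, since a count stored as a float would fool a type-based check like this one.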
It’s important to note here something you’ve probably heard before: correlation does not imply causation. Even when two variables are correlated, that doesn’t mean a change in one variable causes a change in the other. There may be a relationship between them, but other factors could be the real cause of that relationship.
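A small sketch can show how two variables end up correlated without either one causing the other. All of the numbers below are invented for illustration: both ice cream sales and sunburn cases are generated from a third variable, temperature, and the `pearson` helper is a plain implementation of the Pearson correlation coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up data: temperature drives both of the other variables.
temperature = [20, 22, 25, 28, 30, 33]
ice_cream_sales = [2 * t + 10 for t in temperature]
sunburn_cases = [t - 15 for t in temperature]

# The two derived variables are almost perfectly correlated,
# yet neither one causes the other.
r = pearson(ice_cream_sales, sunburn_cases)
print(round(r, 3))
```

Here the coefficient comes out very close to 1 only because both series were built from temperature, which plays the role of the "other factor" mentioned above.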
Categorical data is any data that isn’t measured as a number, such as a string of text or a date. These variables can be broken down into nominal and ordinal values, though you won’t often see this done.
Ordinal values are values that have a set order to them. Examples include the priority of a bug, such as “Critical” or “Low”, or the ranking in a race, such as “First” or “Third”. Nominal values are the opposite: they have no set order to them. Examples of nominal variables include “Country” or “Marital Status”.
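One way to sketch the ordinal/nominal difference in Python is to give ordinal values an explicit order and leave nominal values as plain labels. The `Priority` enum below is hypothetical, and the “Medium” and “High” levels are assumed in order to fill in the scale between “Low” and “Critical”.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Ordinal: the numeric value encodes the rank of each level."""
    LOW = 1
    MEDIUM = 2    # assumed level, not from the text above
    HIGH = 3      # assumed level, not from the text above
    CRITICAL = 4


# Hypothetical bug list as (title, priority) pairs.
bugs = [
    ("login fails", Priority.LOW),
    ("data loss", Priority.CRITICAL),
    ("typo on page", Priority.MEDIUM),
]

# Ordinal values can be compared and sorted meaningfully.
most_urgent_first = sorted(bugs, key=lambda bug: bug[1], reverse=True)
print([title for title, _ in most_urgent_first])

# Nominal: a set of labels with no ranking, only membership and equality.
countries = {"Canada", "Japan", "Brazil"}
print("Japan" in countries)
```

Sorting by priority makes sense because the levels have a rank; sorting countries would only give alphabetical order, which says nothing about the data.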
In addition to ordinal and nominal values, there is a special type of categorical data called binary. Binary variables have only two possible values, such as yes or no. These can be represented in different ways, such as “True” and “False” or 1 and 0. Binary data is used heavily in classification machine learning models. Examples of binary variables include whether a person has cancelled their subscription service or not, or whether a person bought a car or not.
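As a rough illustration of the 1 and 0 representation, the sketch below converts yes/no answers into numbers. The function name `encode_binary` and the sample answers are hypothetical; real datasets often need extra handling for missing or misspelled values.

```python
def encode_binary(label, positive="yes", negative="no"):
    """Map a two-valued text label to 1 (positive) or 0 (negative)."""
    normalized = label.strip().lower()
    if normalized == positive:
        return 1
    if normalized == negative:
        return 0
    # Anything else is not a valid binary value for this variable.
    raise ValueError(f"unexpected label: {label!r}")


# Hypothetical answers to "Has this person cancelled their subscription?"
cancelled = ["yes", "no", "No", "YES"]
encoded = [encode_binary(answer) for answer in cancelled]
print(encoded)  # [1, 0, 0, 1]
```

Raising an error on unexpected labels is a deliberate choice here: silently mapping unknown values to 0 would hide data quality problems.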