2.10
Consider quantitative data on the price of houses and their corresponding ground area. Such quantitative data with two variables are called bivariate data.
The variable that acts as the cause is called the independent variable, while the other variable, which shows the response, is called the dependent variable.
The dependence of one variable on the other can be visualized with a scatter plot. Here, the independent variable (the ground area) is plotted along the X-axis, and the dependent variable (the price of houses) is plotted along the Y-axis.
Mark a point for the price corresponding to each ground area. Then draw the best-fit line so that roughly equal numbers of points lie above and below it. Together, these points form a pattern that reveals the correlation between the two variables.
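The best-fit line described above is drawn by eye. As a rough sketch of one common way to compute such a line, the example below uses an ordinary least-squares fit, which is a substitute technique and not necessarily what the lesson intends. The data values and the names `areas`, `prices`, and `fit_line` are hypothetical and chosen only for illustration.

```python
# Hypothetical data (not from the lesson): ground area in square metres
# and house price in thousands.
areas = [50, 65, 80, 95, 110, 125, 140]
prices = [150, 185, 210, 260, 275, 330, 350]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(areas, prices)

# Count how many points fall above and below the fitted line.
above = sum(1 for x, y in zip(areas, prices) if y > slope * x + intercept)
below = sum(1 for x, y in zip(areas, prices) if y < slope * x + intercept)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"points above: {above}, points below: {below}")
```

A least-squares line balances the vertical distances of the points rather than their counts, so the numbers above and below it are usually close but need not be equal.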
Notice that an increase in ground area is accompanied by a rise in house prices. Such an increasing trend denotes a positive correlation.
Conversely, if one observes a decreasing trend, it indicates a negative correlation. No trend means no correlation.
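The direction of a trend can also be summarized numerically. The sketch below computes the Pearson correlation coefficient, a standard measure that the lesson itself does not name, and labels its sign. The function names, the sample data, and the 0.1 cutoff for "no clear trend" are assumptions made purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe_trend(r, threshold=0.1):
    """Label the direction of a linear trend; the threshold is an arbitrary choice."""
    if r > threshold:
        return "positive"
    if r < -threshold:
        return "negative"
    return "no clear linear trend"


# Hypothetical examples: one increasing, one decreasing, one with no linear trend.
x = [1, 2, 3, 4]
print(describe_trend(pearson_r(x, [2, 4, 5, 9])))    # increasing values
print(describe_trend(pearson_r(x, [9, 6, 4, 1])))    # decreasing values
print(describe_trend(pearson_r(x, [1, -1, -1, 1])))  # symmetric, no linear trend
```

The coefficient captures only linear trends, so a value near zero does not rule out a curved relationship.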
The most common and simplest way to show the relationship between two variables, x and y, is a scatter plot. A scatter plot visualizes the direction of a statistical relationship between the variables. A clear direction exists when high values of one variable tend to occur with high values of the other (and low with low), or when high values of one variable tend to occur with low values of the other.
The strength of the relationship can be judged by how closely the data points cluster around a line, a power function, an exponential function, or some other type of function. For a linear relationship, there is one exception. Consider a scatter plot in which all the data points lie exactly on a horizontal line, which suggests a "perfect fit". A horizontal line, however, would in fact show no statistical relationship.
When interpreting a scatter plot, it is essential to identify both the overall pattern and any outliers or deviations from it.
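The horizontal-line exception can be checked numerically. In the minimal sketch below, every y value is identical, so a least-squares slope comes out as zero and the Pearson correlation coefficient is undefined, because its denominator contains the zero spread of y. The function name `slope_and_r` and the data are hypothetical, and returning `None` for the undefined case is a design choice made for this illustration.

```python
import math


def slope_and_r(xs, ys):
    """Return (least-squares slope, Pearson r); r is None when y has no spread."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    # With zero variation in y, the correlation formula divides by zero.
    r = None if syy == 0 else sxy / math.sqrt(sxx * syy)
    return slope, r


# Hypothetical data: every point lies on the horizontal line y = 7.
flat_slope, flat_r = slope_and_r([1, 2, 3, 4, 5], [7, 7, 7, 7, 7])
print(flat_slope, flat_r)
```

Every point fits the line exactly, yet changing x tells us nothing about y, which is why this case shows no statistical relationship.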
From Chapter 2: Summarizing and Visualizing Data