Big Data Statistics for Business
Table of contents
-
Exploratory analysis of numerical and categorical variables: summary, visualization, and association
-
A dataset containing 10 variables for 340 diamonds. The variables are as follows:
Carat (weight of the diamond)
Cut (quality of the cut)
Color (diamond colour, from D (best) to J (worst))
Clarity (a measurement of how clear the diamond is: I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
Depth (total depth percentage)
Table (width of top of diamond relative to widest point)
Price (price in US dollars)
x (length in mm)
y (width in mm)
z (depth in mm)
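Because Color and Clarity are ordinal, it can help to encode their levels in order before summarizing or plotting them. The following is a minimal sketch, assuming the intermediate Color grades are the letters E through I and using a hypothetical helper `ordinal_rank`; the rows are toy values for illustration only, not taken from the dataset.

```python
# Clarity levels as listed in the variable description, worst to best.
CLARITY_LEVELS = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]
# Color runs from D (best) to J (worst); listed worst to best here so that
# a higher rank always means a better grade. Letters E..I are an assumption.
COLOR_LEVELS = ["J", "I", "H", "G", "F", "E", "D"]


def ordinal_rank(value, levels):
    """Return the 0-based position of value in levels (higher = better)."""
    return levels.index(value)


# Hypothetical rows for illustration only; not real observations.
toy_rows = [
    {"Clarity": "VS1", "Color": "G"},
    {"Clarity": "I1", "Color": "D"},
    {"Clarity": "IF", "Color": "J"},
]

# Pair of (clarity rank, color rank) for each toy row.
ranked = [
    (ordinal_rank(r["Clarity"], CLARITY_LEVELS),
     ordinal_rank(r["Color"], COLOR_LEVELS))
    for r in toy_rows
]
print(ranked)
```

Ranks like these allow ordinal summaries (for example medians or rank correlations) that plain text labels do not support.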
-
-
Dataset1 contains 3 variables:
1) X (continuous numerical variable)
2) Y (discrete numerical variable)
3) Gender (categorical variable)
-
Dataset2a contains 3 variables:
1) X (continuous numerical variable)
2) Y (discrete numerical variable)
3) Gender (categorical variable)
-
Dataset2b contains 3 variables:
1) X (continuous numerical variable)
2) Y (discrete numerical variable)
3) Gender (categorical variable)
-
Dataset2c contains 3 variables:
1) X (continuous numerical variable)
2) Y (discrete numerical variable)
3) Gender (categorical variable)
-
Dataset2d contains 3 variables:
1) Contract (categorical variable, C = consumption price contract, F = fixed price contract)
2) Components (discrete numerical variable, number of family members)
3) Income (continuous numerical variable, family income)
-
Dataset3 contains 3 variables:
1) X (continuous numerical variable)
2) Y (discrete numerical variable)
3) Gender (categorical variable)
-
-
DatasetHouse contains 2 variables observed for 799 houses:
1) Distance (distance of the house from the center)
2) Price (price of the house)
-
Dataset50Startups contains 4 variables observed for 50 startups:
1) R&D expenditure
2) Administration expenditure
3) Marketing expenditure
4) Profit
-
Dataset50Startups2 contains 5 variables observed for 50 startups:
1) R&D expenditure
2) Administration expenditure
3) Marketing expenditure
4) State
5) Profit
-
DatasetBikeSharing contains the following variables:
1) season (categorical): Winter, Spring, Summer, Fall
2) holiday (categorical): 0 (not a holiday), 1 (a holiday)
3) workingday (categorical): 0 (not working), 1 (working)
4) temp (numerical): temperature
5) atemp (numerical): feels-like (apparent) temperature
6) hum (numerical): humidity
7) windspeed (numerical): wind speed
8) cnt (numerical): count of total rental bikes
-
DatasetMarketing contains the following variables for 171 companies:
1) YouTube advertising expenditures
2) Facebook advertising expenditures
3) Newspaper advertising expenditures
4) Sales
-
DatasetInsuranceClaim contains 6 variables for 4406 insurance claims:
- Amount of compensation
- Coverage
- Deductible
- Location size
- Gender of the claimant
- Fraudulent claim
-
DatasetTelecomChurn contains 8 variables for 3042 customers:
- Churn
- AccountWeeks
- ContractRenewal
- DataPlan
- CustServCalls
- DayMins
- DayCalls
- MonthlyCharge
-
-
DatasetGarment contains the following variables for 974 companies:
1) Number of workers (No. workers)
2) Targeted productivity
3) Overtime (in hours)
4) Presence of an incentive
5) Actual productivity
-
DatasetMall contains the following variables for 284 weeks:
1) Show (categorical variable, NO = no show was organized, YES = a show was organized)
2) Temperature
3) Fuel price
4) Number of products on offer
5) Number of posters
6) Weekly sales
-
-
DatasetTelecom contains 9 variables for 260 customers:
- AccountWeeks
- ContractRenewal (binary variable)
- DataPlan (binary variable)
- DataUsage
- CustServCalls
- DayMins
- DayCalls
- MonthlyCharge
- RoamMins
-
DatasetStartups3 contains 5 variables observed for 50 startups:
1) R&D expenditure
2) Administration expenditure
3) Marketing expenditure
4) State
5) Profit