Last updated: 2024-03-14

Checks: 7 passed, 0 failed

Knit directory: for-future-reference/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20190125) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version efbb8a5. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/linalg-datacamp.Rmd) and HTML (docs/linalg-datacamp.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author         Date        Message
html  16cb14b  John Blischak  2019-04-01  Build site.
Rmd   4a3a317  John Blischak  2019-04-01  Add PCA examples.
html  a35299b  John Blischak  2019-03-31  Build site.
Rmd   c2ae4e5  John Blischak  2019-03-31  Notes from Linear Algebra for Data Science with R.

Notes from the DataCamp course Linear Algebra for Data Science with R.

v <- c(10, 10)
v
[1] 10 10

Stretch the x-component by a factor of 5:

A <- matrix(c(5, 0, 0, 1), nrow = 2, byrow = TRUE)
A
     [,1] [,2]
[1,]    5    0
[2,]    0    1
A %*% v
     [,1]
[1,]   50
[2,]   10

Shrink the y-component by a factor of 5:

A <- matrix(c(1, 0, 0, 1/5), nrow = 2, byrow = TRUE)
A
     [,1] [,2]
[1,]    1  0.0
[2,]    0  0.2
A %*% v
     [,1]
[1,]   10
[2,]    2

Reflect about the x-axis:

A <- matrix(c(1, 0, 0, -1), nrow = 2, byrow = TRUE)
A
     [,1] [,2]
[1,]    1    0
[2,]    0   -1
A %*% v
     [,1]
[1,]   10
[2,]  -10

Reflect about the y-axis:

A <- matrix(c(-1, 0, 0, 1), nrow = 2, byrow = TRUE)
A
     [,1] [,2]
[1,]   -1    0
[2,]    0    1
A %*% v
     [,1]
[1,]  -10
[2,]   10
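
Because each transformation is a matrix, applying one after another corresponds to multiplying the matrices. A minimal sketch, reusing `v` from above; the names `reflect_x`, `reflect_y`, and `both` are illustrative and do not come from the course notes:

```r
v <- c(10, 10)

# Diagonal transformation matrices can be built with diag()
reflect_x <- diag(c(1, -1))  # reflect about the x-axis
reflect_y <- diag(c(-1, 1))  # reflect about the y-axis

# Applying both reflections is the same as multiplying the matrices first
both <- reflect_y %*% reflect_x
both %*% v
```

Both components of `v` change sign, which is the same as a rotation by 180 degrees about the origin.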

Identity matrix:

diag(3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
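
Multiplying by the identity matrix leaves a vector or matrix unchanged, which is why it plays the role of 1 in matrix algebra. A quick check, using a 2 x 2 identity (the name `I2` is illustrative) so that it conforms with the vector `v` from the earlier examples:

```r
v <- c(10, 10)
I2 <- diag(2)

# I2 %*% v returns the entries of v as a 2 x 1 matrix
I2 %*% v

# Multiplying a conformable matrix by the identity returns that matrix
A <- matrix(c(5, 0, 0, 1), nrow = 2, byrow = TRUE)
all.equal(I2 %*% A, A)
```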

Find the inverse with solve(), then confirm that multiplying A by its inverse in either order gives the identity matrix:

A <- matrix(c(0, 5, -2, 0), nrow = 2, byrow = TRUE)
A
     [,1] [,2]
[1,]    0    5
[2,]   -2    0
solve(A)
     [,1] [,2]
[1,]  0.0 -0.5
[2,]  0.2  0.0
A %*% solve(A)
     [,1] [,2]
[1,]    1    0
[2,]    0    1
solve(A) %*% A
     [,1] [,2]
[1,]    1    0
[2,]    0    1
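
`solve()` also accepts a right-hand side, so a linear system \(A\textbf{x} = \textbf{b}\) can be solved without forming the inverse explicitly. A sketch with the same `A`; the vector `b` is an arbitrary value chosen for illustration:

```r
A <- matrix(c(0, 5, -2, 0), nrow = 2, byrow = TRUE)
b <- c(10, 4)  # hypothetical right-hand side, not from the course notes

# Solve A %*% x = b directly
x <- solve(A, b)
x

# Same answer as multiplying b by the inverse
all.equal(x, as.vector(solve(A) %*% b))

# Check: A %*% x reproduces b
A %*% x
```

For larger systems, passing `b` to `solve()` is generally preferable to computing `solve(A) %*% b`, since it avoids the extra work of building the full inverse.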

Compute the determinant. The second matrix has a column of zeros, so it is singular and its determinant is 0:

A <- matrix(c(1, 0, 0, 2), nrow = 2, byrow = TRUE)
A
     [,1] [,2]
[1,]    1    0
[2,]    0    2
det(A)
[1] 2
A <- matrix(c(1, 0, 2, 0), nrow = 2, byrow = TRUE)
A
     [,1] [,2]
[1,]    1    0
[2,]    2    0
det(A)
[1] 0
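
A determinant of 0 means the matrix has no inverse, so `solve()` raises an error. The sketch below catches that error with `tryCatch()` rather than letting it stop the script; the exact wording of the message comes from LAPACK and may differ between R builds:

```r
A <- matrix(c(1, 0, 2, 0), nrow = 2, byrow = TRUE)
det(A)

# solve() fails for a singular matrix; capture the error message instead
result <- tryCatch(
  solve(A),
  error = function(e) conditionMessage(e)
)
result
```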

Compute eigenvalues (\(\lambda\)) and eigenvectors (\(\textbf{x}\)):

\[ A\textbf{x} = \lambda\textbf{x}\]

A <- cbind(c(1,-1), c(-1,1))
(E <- eigen(A))
eigen() decomposition
$values
[1] 2 0

$vectors
           [,1]       [,2]
[1,] -0.7071068 -0.7071068
[2,]  0.7071068 -0.7071068
A %*% E$vectors[, 1]
          [,1]
[1,] -1.414214
[2,]  1.414214
E$values[1] * E$vectors[, 1]
[1] -1.414214  1.414214
A %*% E$vectors[, 2]
     [,1]
[1,]    0
[2,]    0
E$values[2] * E$vectors[, 2]
[1] 0 0
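
For a symmetric matrix such as `A`, the eigenvectors returned by `eigen()` are orthonormal, so the matrix can be rebuilt as \(V \Lambda V^T\), where the columns of \(V\) are the eigenvectors and \(\Lambda\) is the diagonal matrix of eigenvalues. A sketch of that check (the names `V` and `Lambda` are illustrative):

```r
A <- cbind(c(1, -1), c(-1, 1))
E <- eigen(A)

V <- E$vectors
Lambda <- diag(E$values)

# Eigenvectors of a symmetric matrix are orthonormal: t(V) %*% V is the identity
all.equal(t(V) %*% V, diag(2))

# Rebuild A from its eigenvalues and eigenvectors
all.equal(V %*% Lambda %*% t(V), A)

# The determinant equals the product of the eigenvalues (0 here, so A is singular)
all.equal(prod(E$values), det(A))
```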

Principal Component Analysis

combine_url <- "https://assets.datacamp.com/production/repositories/2654/datasets/760dae913f682ba6b2758207280138662ddedc0d/DataCampCombine.csv"
combine <- read.csv(combine_url)
dim(combine)
[1] 2885   13
head(combine)
             player position       school year height weight forty vertical
1   Jaire Alexander       CB   Louisville 2018     71    192  4.38     35.0
2       Brian Allen        C Michigan St. 2018     73    298  5.34     26.5
3      Mark Andrews       TE     Oklahoma 2018     77    256  4.67     31.0
4         Troy Apke        S     Penn St. 2018     74    198  4.34     41.0
5 Dorance Armstrong     EDGE       Kansas 2018     76    257  4.87     30.0
6         Ade Aruna       DE       Tulane 2018     78    262  4.60     38.5
  bench broad_jump three_cone shuttle
1    14        127       6.71    3.98
2    27         99       7.81    4.71
3    17        113       7.34    4.38
4    16        131       6.56    4.03
5    20        118       7.12    4.23
6    18        128       7.53    4.48
                                      drafted
1  Green Bay Packers / 1st / 18th pick / 2018
2  Los Angeles Rams / 4th / 111th pick / 2018
3   Baltimore Ravens / 3rd / 86th pick / 2018
4                                            
5                                            
6 Minnesota Vikings / 6th / 218th pick / 2018
x <- as.matrix(combine[, 5:12])
# Subtract the column means
(column_means <- apply(x, 2, mean))
    height     weight      forty   vertical      bench broad_jump three_cone 
 73.992721 252.087348   4.811584  32.598267  21.091854 113.190295   7.305719 
   shuttle 
  4.408548 
x <- sweep(x, 2, column_means)
apply(x, 2, mean)
       height        weight         forty      vertical         bench 
-1.572879e-15 -9.502468e-15  3.977476e-16 -1.041735e-16  1.217401e-15 
   broad_jump    three_cone       shuttle 
 4.524137e-15  2.807089e-16 -2.469472e-17 
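
The column means on the order of 1e-15 are floating-point rounding, not real offsets. `scale()` with `scale = FALSE` performs the same centering as `sweep()`. A self-contained sketch on simulated data, so that it runs without downloading the combine file; the matrix `m` and its column names are made up for illustration:

```r
set.seed(1)
m <- matrix(rnorm(40, mean = 50, sd = 10), ncol = 4,
            dimnames = list(NULL, c("a", "b", "c", "d")))

# Center with sweep(), as in the notes (the default FUN is "-")
centered_sweep <- sweep(m, 2, colMeans(m))

# Center with scale(); center = TRUE subtracts the column means
centered_scale <- scale(m, center = TRUE, scale = FALSE)

# scale() attaches a "scaled:center" attribute, so ignore attributes when comparing
all.equal(centered_sweep, centered_scale, check.attributes = FALSE)
```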
# Calculate variance-covariance matrix
x_cov <- t(x) %*% x / (nrow(x) - 1)
dim(x_cov)
[1] 8 8
x_cov
                height      weight       forty    vertical       bench
height       7.1597944   90.788084  0.52676257   -5.065512   6.2780614
weight      90.7880840 2105.176834 13.04832553 -132.887401 188.5644427
forty        0.5267626   13.048326  0.10318906   -1.034265   0.9549273
vertical    -5.0655119 -132.887401 -1.03426459   18.232972  -8.7243553
bench        6.2780614  188.564443  0.95492726   -8.724355  40.8989801
broad_jump -11.2472274 -330.417460 -2.55786742   33.083582 -24.1124924
three_cone   0.5955444   15.944407  0.11523591   -1.245058   1.2152061
shuttle      0.3912377    9.644035  0.06894544   -0.811669   0.6913887
            broad_jump  three_cone     shuttle
height      -11.247227  0.59554442  0.39123769
weight     -330.417460 15.94440664  9.64403467
forty        -2.557867  0.11523591  0.06894544
vertical     33.083582 -1.24505770 -0.81166895
bench       -24.112492  1.21520610  0.69138865
broad_jump   90.079934 -3.11504850 -1.89762575
three_cone   -3.115048  0.18612283  0.09866407
shuttle      -1.897626  0.09866407  0.07283010
# The diagonal entries are the variances of the original columns
diag(x_cov)
      height       weight        forty     vertical        bench   broad_jump 
   7.1597944 2105.1768336    0.1031891   18.2329720   40.8989801   90.0799335 
  three_cone      shuttle 
   0.1861228    0.0728301 
apply(x, 2, var)
      height       weight        forty     vertical        bench   broad_jump 
   7.1597944 2105.1768336    0.1031891   18.2329720   40.8989801   90.0799335 
  three_cone      shuttle 
   0.1861228    0.0728301 
# The off-diagonal entries are the covariances between pairs of columns (here, columns 1 and 5)
cov(x[, 1], x[, 5])
[1] 6.278061
x_cov[1, 5]
[1] 6.278061
x_cov[5, 1]
[1] 6.278061
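
The matrix product on the centered columns gives the same result as the built-in `cov()`, which centers the columns internally. A sketch on simulated data (the matrix `m` is hypothetical, used so the example is self-contained):

```r
set.seed(1)
m <- matrix(rnorm(60), ncol = 3, dimnames = list(NULL, c("a", "b", "c")))

# Manual calculation on centered columns
m_centered <- sweep(m, 2, colMeans(m))
manual_cov <- t(m_centered) %*% m_centered / (nrow(m) - 1)

# Built-in equivalent
all.equal(manual_cov, cov(m))

# A covariance matrix is symmetric, which is why x_cov[1, 5] equals x_cov[5, 1]
isSymmetric(manual_cov)
```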
# Calculate eigenvalues of covariance matrix
x_eig <- eigen(x_cov)
x_eig
eigen() decomposition
$values
[1] 2.187628e+03 4.403246e+01 2.219205e+01 5.267129e+00 2.699702e+00
[6] 6.317016e-02 1.480866e-02 1.307283e-02

$vectors
             [,1]         [,2]          [,3]          [,4]         [,5]
[1,] -0.042047079 -0.061885367  0.1454490039 -0.1040556410  0.980792060
[2,] -0.980711529 -0.130912788  0.1270100265  0.0193388930 -0.066908382
[3,] -0.006112061  0.012525260  0.0025260713 -0.0021291637 -0.004096693
[4,]  0.062926466 -0.333556369  0.0398922845  0.9366594549  0.074901137
[5,] -0.088291423 -0.313533433 -0.9363461471 -0.0745692157  0.107188391
[6,]  0.156742686 -0.876925849  0.2904565302 -0.3252903706 -0.126494599
[7,] -0.007468520  0.014691994  0.0009057581  0.0003320888 -0.020902644
[8,] -0.004518826  0.009863931  0.0023111814 -0.0094052914 -0.004010629
             [,6]          [,7]          [,8]
[1,] -0.020679696 -6.155636e-03  0.0008055445
[2,]  0.008423587  6.988341e-04  0.0036087841
[3,] -0.152469227 -2.539868e-01 -0.9549983725
[4,] -0.012214516  7.045063e-03 -0.0070051256
[5,] -0.009167322 -8.604309e-05 -0.0048308793
[6,] -0.013753112 -2.187651e-03 -0.0076907609
[7,] -0.894560357 -3.743559e-01  0.2427137770
[8,] -0.419039274  8.917710e-01 -0.1700673446
# Percentage of variance explained by each principal component
barplot(x_eig$values / sum(x_eig$values) * 100)

[Figure: bar plot of the percentage of variance explained by each principal component. Figure version 16cb14b, John Blischak, 2019-04-01.]
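
Each eigenvalue of the covariance matrix is the variance along the corresponding principal component, so dividing by the sum of the eigenvalues gives that component's share of the total variance. `prcomp()` without scaling reports the same quantities as standard deviations. A sketch on simulated data; the matrix `m`, with deliberately unequal column scales, is an assumption made for illustration:

```r
set.seed(1)
m <- matrix(rnorm(200), ncol = 4) %*% diag(c(5, 2, 1, 0.5))

eig <- eigen(cov(m))
pca <- prcomp(m)  # centers the columns by default, does not scale them

# Eigenvalues equal the squared standard deviations from prcomp()
all.equal(eig$values, pca$sdev^2)

# Percentage of variance explained by each component
pct <- eig$values / sum(eig$values) * 100
round(pct, 1)

# Eigenvectors match the prcomp() rotation up to the sign of each column
all.equal(abs(eig$vectors), abs(unname(pca$rotation)))
```

When one column has a much larger variance than the others, as `weight` does in the combine data, the first component is dominated by that column. That is the motivation for scaling the columns before running PCA.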
# Using prcomp (and also scaling the columns)
scaled <- scale(combine[5:12])
pca <- prcomp(scaled)
summary(pca)
Importance of components:
                          PC1    PC2     PC3     PC4     PC5     PC6     PC7
Standard deviation     2.3679 0.9228 0.78904 0.61348 0.46811 0.37178 0.34834
Proportion of Variance 0.7009 0.1064 0.07782 0.04704 0.02739 0.01728 0.01517
Cumulative Proportion  0.7009 0.8073 0.88514 0.93218 0.95957 0.97685 0.99202
                           PC8
Standard deviation     0.25266
Proportion of Variance 0.00798
Cumulative Proportion  1.00000
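
Scaling every column to unit variance before PCA is equivalent to an eigendecomposition of the correlation matrix, and the "Proportion of Variance" row reported by `summary()` is `sdev^2 / sum(sdev^2)`. A sketch on simulated data (the matrix `m` is hypothetical; `summary()` rounds the proportions, hence the looser tolerance):

```r
set.seed(1)
m <- matrix(rnorm(200), ncol = 4) %*% diag(c(5, 2, 1, 0.5))

pca_scaled <- prcomp(scale(m))

# Component variances equal the eigenvalues of the correlation matrix
all.equal(pca_scaled$sdev^2, eigen(cor(m))$values)

# Proportion of variance, compared with the rounded values from summary()
prop_var <- pca_scaled$sdev^2 / sum(pca_scaled$sdev^2)
all.equal(unname(summary(pca_scaled)$importance["Proportion of Variance", ]),
          prop_var, tolerance = 1e-4)

# After scaling, the total variance equals the number of columns
sum(pca_scaled$sdev^2)
```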

sessionInfo()
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] workflowr_1.7.1

loaded via a namespace (and not attached):
 [1] jsonlite_1.8.8    compiler_4.3.3    highr_0.10        promises_1.2.1   
 [5] Rcpp_1.0.12       stringr_1.5.1     git2r_0.33.0      callr_3.7.5      
 [9] later_1.3.2       jquerylib_0.1.4   yaml_2.3.8        fastmap_1.1.1    
[13] R6_2.5.1          knitr_1.45        tibble_3.2.1      rprojroot_2.0.4  
[17] bslib_0.6.1       pillar_1.9.0      rlang_1.1.3       utf8_1.2.4       
[21] cachem_1.0.8      stringi_1.8.3     httpuv_1.6.14     xfun_0.42        
[25] getPass_0.2-4     fs_1.6.3          sass_0.4.8        cli_3.6.2        
[29] magrittr_2.0.3    ps_1.7.6          digest_0.6.34     processx_3.8.3   
[33] rstudioapi_0.15.0 lifecycle_1.0.4   vctrs_0.6.5       evaluate_0.23    
[37] glue_1.7.0        whisker_0.4.1     fansi_1.0.6       rmarkdown_2.26   
[41] httr_1.4.7        tools_4.3.3       pkgconfig_2.0.3   htmltools_0.5.7