Last updated: 2019-06-18

Checks: 7 0

Knit directory: wflow-tidy-tuesday/

This reproducible R Markdown analysis was created with workflowr (version 1.4.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20190611) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
Rmd d425335 John Blischak 2019-06-18 Clean up pkg loading.
html f04007f John Blischak 2019-06-18 Build site.
Rmd abd8dd4 John Blischak 2019-06-18 Christmas bird counts

Introduction

Setup

library(entropy)
library(tidyverse)
library(cowplot)
theme_set(theme_cowplot())
birds <- read_csv("data/bird_counts.csv")

Exploration

head(birds)
# A tibble: 6 x 6
   year species species_latin how_many_counted total_hours how_many_counte…
  <dbl> <chr>   <chr>                    <dbl>       <dbl>            <dbl>
1  1921 Americ… Botaurus len…                0           8                0
2  1921 Americ… Anas rubripes                0           8                0
3  1921 Americ… Fulica ameri…                0           8                0
4  1921 Americ… Corvus brach…                0           8                0
5  1921 Americ… Spinus trist…                0           8                0
6  1921 Americ… Falco sparve…                0           8                0
str(birds)
Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame':    18706 obs. of  6 variables:
 $ year                    : num  1921 1921 1921 1921 1921 ...
 $ species                 : chr  "American Bittern" "American Black Duck" "American Coot" "American Crow" ...
 $ species_latin           : chr  "Botaurus lentiginosus" "Anas rubripes" "Fulica americana" "Corvus brachyrhynchos" ...
 $ how_many_counted        : num  0 0 0 0 0 0 0 1 0 12 ...
 $ total_hours             : num  8 8 8 8 8 8 8 8 8 8 ...
 $ how_many_counted_by_hour: num  0 0 0 0 0 0 0 0.125 0 1.5 ...
 - attr(*, "spec")=
  .. cols(
  ..   year = col_double(),
  ..   species = col_character(),
  ..   species_latin = col_character(),
  ..   how_many_counted = col_double(),
  ..   total_hours = col_double(),
  ..   how_many_counted_by_hour = col_double()
  .. )

year

head(birds$year)
[1] 1921 1921 1921 1921 1921 1921
summary(birds$year)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1921    1947    1970    1970    1994    2017 
birds %>% count(year) %>%
  ggplot(aes(x = year, y = n)) + geom_point()

Version Author Date
f04007f John Blischak 2019-06-18
stopifnot(table(birds$year) == 199)

species

The species name in English and the scientific name

head(birds$species)
[1] "American Bittern"    "American Black Duck" "American Coot"      
[4] "American Crow"       "American Goldfinch"  "American Kestrel"   
sum(is.na(birds$species))
[1] 0
birds %>% count(species) %>% select(n) %>% unique
# A tibble: 1 x 1
      n
  <int>
1    94
stopifnot(table(birds$species) == 94)

species_latin

The species name in latin

head(birds$species_latin)
[1] "Botaurus lentiginosus" "Anas rubripes"         "Fulica americana"     
[4] "Corvus brachyrhynchos" "Spinus tristis"        "Falco sparverius"     
sum(is.na(birds$species_latin))
[1] 0
birds %>% count(species_latin) %>% select(n) %>% unique
# A tibble: 1 x 1
      n
  <int>
1    94
stopifnot(table(birds$species_latin) == 94)

how_many_counted

Actual raw bird count observed

head(birds$how_many_counted)
[1] 0 0 0 0 0 0
summary(birds$how_many_counted)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    0.0     0.0     0.0   193.5     5.0 73000.0 
hist(birds$how_many_counted)

Version Author Date
f04007f John Blischak 2019-06-18
birds %>% filter(how_many_counted > 50000)
# A tibble: 8 x 6
   year species species_latin how_many_counted total_hours how_many_counte…
  <dbl> <chr>   <chr>                    <dbl>       <dbl>            <dbl>
1  1979 Europe… Sturnus vulg…            65000         171             380.
2  1980 Europe… Sturnus vulg…            73000         186             392.
3  1981 Europe… Sturnus vulg…            65000         167             389.
4  1982 Europe… Sturnus vulg…            72000         164             439.
5  1983 Europe… Sturnus vulg…            70000         168             417.
6  1984 Europe… Sturnus vulg…            71000         178             399.
7  1990 Europe… Sturnus vulg…            66000         164             402.
8  1992 Europe… Sturnus vulg…            59600         167             357.

total_hours

Total hours spent observing

head(birds$total_hours)
[1] 8 8 8 8 8 8
summary(birds$total_hours)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
    8.0   149.5   171.0   170.8   203.8   251.0    3781 
hist(birds$total_hours)

Version Author Date
f04007f John Blischak 2019-06-18
# The missing data points are from the earlier years
table(is.na(birds$total_hours), birds$year)
       
        1921 1922 1924 1925 1926 1928 1930 1931 1932 1933 1934 1935 1936
  FALSE  199    0    0    0    0    0    0    0    0    0    0    0    0
  TRUE     0  199  199  199  199  199  199  199  199  199  199  199  199
       
        1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949
  FALSE    0    0  199  199    0  199  199  199    0  199    0    0    0
  TRUE   199  199    0    0  199    0    0    0  199    0  199  199  199
       
        1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962
  FALSE  199  199  199  199  199  199  199  199  199  199  199  199  199
  TRUE     0    0    0    0    0    0    0    0    0    0    0    0    0
       
        1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975
  FALSE  199  199  199  199  199  199  199  199  199  199  199  199  199
  TRUE     0    0    0    0    0    0    0    0    0    0    0    0    0
       
        1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988
  FALSE  199  199  199  199  199  199  199  199  199  199  199  199  199
  TRUE     0    0    0    0    0    0    0    0    0    0    0    0    0
       
        1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
  FALSE  199  199  199  199  199  199  199  199  199  199  199  199  199
  TRUE     0    0    0    0    0    0    0    0    0    0    0    0    0
       
        2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
  FALSE  199  199  199  199  199  199  199  199  199  199  199  199  199
  TRUE     0    0    0    0    0    0    0    0    0    0    0    0    0
       
        2015 2016 2017
  FALSE  199  199  199
  TRUE     0    0    0

how_many_counted_by_hour

How many birds were counted divided by the number of person-hours that year

head(birds$how_many_counted_by_hour)
[1] 0 0 0 0 0 0
summary(birds$how_many_counted_by_hour)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   0.000   0.000   1.336   0.051 439.024    3781 
hist(birds$how_many_counted_by_hour)

Version Author Date
f04007f John Blischak 2019-06-18
birds %>% filter(!is.na(how_many_counted_by_hour),
                 how_many_counted_by_hour > 300)
# A tibble: 8 x 6
   year species species_latin how_many_counted total_hours how_many_counte…
  <dbl> <chr>   <chr>                    <dbl>       <dbl>            <dbl>
1  1979 Europe… Sturnus vulg…            65000         171             380.
2  1980 Europe… Sturnus vulg…            73000         186             392.
3  1981 Europe… Sturnus vulg…            65000         167             389.
4  1982 Europe… Sturnus vulg…            72000         164             439.
5  1983 Europe… Sturnus vulg…            70000         168             417.
6  1984 Europe… Sturnus vulg…            71000         178             399.
7  1990 Europe… Sturnus vulg…            66000         164             402.
8  1992 Europe… Sturnus vulg…            59600         167             357.
stopifnot(is.na(birds$total_hours) == is.na(birds$how_many_counted_by_hour))

Is there a relationship between the number of birds counted and the total number of person hours?

ggplot(birds, aes(x = total_hours, y = how_many_counted)) +
  geom_point()
Warning: Removed 3781 rows containing missing values (geom_point).

Version Author Date
f04007f John Blischak 2019-06-18

How many hours per year?

birds %>%
  group_by(year) %>%
  summarize(total_hours = unique(total_hours)) %>%
  ggplot(aes(x = year, y = total_hours)) +
  geom_point()
Warning: Removed 19 rows containing missing values (geom_point).

Version Author Date
f04007f John Blischak 2019-06-18

What about the number of distinct species per year?

birds %>%
  group_by(year) %>%
  summarize(n_species = sum(how_many_counted > 0)) %>%
  ggplot(aes(x = year, y = n_species)) +
  geom_point()

Version Author Date
f04007f John Blischak 2019-06-18

Is there a relationship between the number of distinct species counted and the total number of person hours?

birds %>%
  group_by(year) %>%
  summarize(n_species = sum(how_many_counted > 0),
            total_hours = unique(total_hours)) %>%
  ggplot(aes(x = total_hours, y = n_species)) +
  geom_point()
Warning: Removed 19 rows containing missing values (geom_point).

Version Author Date
f04007f John Blischak 2019-06-18
birds %>% na.omit() %>%
  mutate(concordance = how_many_counted / total_hours == how_many_counted_by_hour) %>%
  select(concordance) %>%
  summarize(matched = sum(concordance), n = n())
# A tibble: 1 x 2
  matched     n
    <int> <int>
1   14925 14925

How many years has each species been observed?

birds %>%
  group_by(species) %>%
  summarize(n_years = sum(how_many_counted > 0)) %>%
  ggplot(aes(x = n_years)) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Version Author Date
f04007f John Blischak 2019-06-18
starling <- birds %>%
  filter(species == "European Starling")
ggplot(starling, aes(x = year, y = how_many_counted)) + geom_line()

Version Author Date
f04007f John Blischak 2019-06-18
ggplot(starling, aes(x = year, y = how_many_counted_by_hour)) + geom_line()

Version Author Date
f04007f John Blischak 2019-06-18
b <- birds %>%
  na.omit() %>%
  filter(year >= 1939) %>%
  mutate(observed = how_many_counted > 0,
         genus = str_split_fixed(species_latin, " ", 2)[, 1]) %>%
  select(year, species, species_latin, genus, observed, how_many_counted_by_hour)
head(b)
# A tibble: 6 x 6
   year species       species_latin     genus  observed how_many_counted_b…
  <dbl> <chr>         <chr>             <chr>  <lgl>                  <dbl>
1  1939 American Bit… Botaurus lentigi… Botau… FALSE                 0     
2  1939 American Bla… Anas rubripes     Anas   TRUE                  2.98  
3  1939 American Coot Fulica americana  Fulica FALSE                 0     
4  1939 American Crow Corvus brachyrhy… Corvus FALSE                 0     
5  1939 American Gol… Spinus tristis    Spinus FALSE                 0     
6  1939 American Kes… Falco sparverius  Falco  TRUE                  0.0370

Entropy

By species

p_entropy_species <- birds %>%
  group_by(year) %>%
  summarize(entropy = entropy(how_many_counted)) %>%
  ggplot(aes(x = year, y = entropy)) + geom_line() +
  labs(x = "Year", y = "Shannon diversity",
       title = "Species diversity per year")
p_entropy_species

Version Author Date
f04007f John Blischak 2019-06-18

By genus

length(unique(birds$species_latin))
[1] 199
length(unique(str_split_fixed(birds$species_latin, " ", 2)[, 1]))
[1] 127
p_entropy_genus <- birds %>%
  mutate(genus = str_split_fixed(species_latin, " ", 2)[, 1]) %>%
  group_by(year, genus) %>%
  summarize(n_genus = sum(how_many_counted)) %>%
  ungroup() %>%
  group_by(year) %>%
  summarize(entropy = entropy(n_genus)) %>%
  ggplot(aes(x = year, y = entropy)) + geom_line() +
  labs(x = "Year", y = "Shannon diversity",
       title = "Genus diversity per year")
p_entropy_genus

Version Author Date
f04007f John Blischak 2019-06-18
plot_grid(p_entropy_species + ylim(0.5, 3.5),
          p_entropy_genus + ylim(0.5, 3.5))

Version Author Date
f04007f John Blischak 2019-06-18

sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cowplot_0.9.4   forcats_0.4.0   stringr_1.4.0   dplyr_0.8.1    
 [5] purrr_0.3.2     readr_1.3.1     tidyr_0.8.3     tibble_2.1.3   
 [9] ggplot2_3.1.1   tidyverse_1.2.1 entropy_1.2.1  

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.5 xfun_0.7         haven_2.1.0      lattice_0.20-38 
 [5] vctrs_0.1.0      colorspace_1.4-1 generics_0.0.2   htmltools_0.3.6 
 [9] yaml_2.2.0       utf8_1.1.4       rlang_0.3.4      pillar_1.4.1    
[13] glue_1.3.1       withr_2.1.2      modelr_0.1.4     readxl_1.3.1    
[17] plyr_1.8.4       munsell_0.5.0    gtable_0.3.0     workflowr_1.4.0 
[21] cellranger_1.1.0 rvest_0.3.4      evaluate_0.14    labeling_0.3    
[25] knitr_1.23       fansi_0.4.0      broom_0.5.2      Rcpp_1.0.1      
[29] scales_1.0.0     backports_1.1.4  jsonlite_1.6     fs_1.3.1        
[33] hms_0.4.2        digest_0.6.19    stringi_1.4.3    grid_3.6.0      
[37] rprojroot_1.2    cli_1.1.0        tools_3.6.0      magrittr_1.5    
[41] lazyeval_0.2.2   zeallot_0.1.0    crayon_1.3.4     whisker_0.3-2   
[45] pkgconfig_2.0.2  xml2_1.2.0       lubridate_1.7.4  assertthat_0.2.1
[49] rmarkdown_1.13.4 httr_1.4.0       rstudioapi_0.10  R6_2.4.0        
[53] nlme_3.1-140     git2r_0.25.2     compiler_3.6.0