Last updated: 2019-06-11
Checks: 7 0
Knit directory: wflow-tidy-tuesday/
This reproducible R Markdown analysis was created with workflowr (version 1.4.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20190611)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rproj.user/
Ignored: docs/figure/
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view them.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | b016af6 | John Blischak | 2019-06-11 | Add meteorites analysis. |
library(lubridate)
library(tidyverse)
library(cowplot)
theme_set(theme_cowplot())
m <- read_csv("data/meteorites.csv")
head(m)
# A tibble: 6 x 10
name id name_type class mass fall year lat long geolocation
<chr> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <chr>
1 Aachen 1 Valid L5 21 Fell 1880 50.8 6.08 (50.775, 6…
2 Aarhus 2 Valid H6 720 Fell 1951 56.2 10.2 (56.18333,…
3 Abee 6 Valid EH4 107000 Fell 1952 54.2 -113 (54.21667,…
4 Acapu… 10 Valid Acap… 1914 Fell 1976 16.9 -99.9 (16.88333,…
5 Achir… 370 Valid L6 780 Fell 1902 -33.2 -65.0 (-33.16667…
6 Adhi … 379 Valid EH4 4239 Fell 1919 32.1 71.8 (32.1, 71.…
str(m)
Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 45716 obs. of 10 variables:
$ name : chr "Aachen" "Aarhus" "Abee" "Acapulco" ...
$ id : num 1 2 6 10 370 379 390 392 398 417 ...
$ name_type : chr "Valid" "Valid" "Valid" "Valid" ...
$ class : chr "L5" "H6" "EH4" "Acapulcoite" ...
$ mass : num 21 720 107000 1914 780 ...
$ fall : chr "Fell" "Fell" "Fell" "Fell" ...
$ year : num 1880 1951 1952 1976 1902 ...
$ lat : num 50.8 56.2 54.2 16.9 -33.2 ...
$ long : num 6.08 10.23 -113 -99.9 -64.95 ...
$ geolocation: chr "(50.775, 6.08333)" "(56.18333, 10.23333)" "(54.21667, -113.0)" "(16.88333, -99.9)" ...
- attr(*, "spec")=
.. cols(
.. name = col_character(),
.. id = col_double(),
.. name_type = col_character(),
.. class = col_character(),
.. mass = col_double(),
.. fall = col_character(),
.. year = col_double(),
.. lat = col_double(),
.. long = col_double(),
.. geolocation = col_character()
.. )
Expect name
and id
to be unique.
length(m$name) == length(unique(m$name))
[1] TRUE
length(m$id) == length(unique(m$id))
[1] TRUE
“Relict” meteorites are still assigned a class:
table(m$name_type, useNA = "ifany")
Relict Valid
75 45641
table(m$class[m$name_type == "Relict"])
Chondrite-fusion crust Fusion crust Relict H
4 4 1
Relict iron Relict OC
1 65
There are lots of classes. Many are rare.
length(unique(m$class))
[1] 455
summary(as.numeric(table(m$class)))
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.0 1.0 4.0 100.5 17.0 8339.0
mass
has some missing values, some zeros, and a strong right-skew.
summary(m$mass)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0 7 33 13278 203 60000000 131
sum(m$mass == 0, na.rm = TRUE)
[1] 19
sum(m$mass > 10^5, na.rm = TRUE)
[1] 282
hist(log(m$mass))
The meteorites are more often found rather than observed to fall (source).
table(m$fall, useNA = "ifany")
Fell Found
1107 44609
Most were found after 1900. One was found in the future.
summary(m$year)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
860 1987 1998 1992 2003 2101 291
hist(m$year)
sum(m$year < 1900, na.rm = TRUE)
[1] 728
sum(m$year > year(now()), na.rm = TRUE)
[1] 1
Some meteorites are missing location data.
summary(m$lat)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
-87.37 -76.71 -71.50 -39.12 0.00 81.17 7315
summary(m$long)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
-165.43 0.00 35.67 61.07 157.17 354.47 7315
head(m$geolocation)
[1] "(50.775, 6.08333)" "(56.18333, 10.23333)" "(54.21667, -113.0)"
[4] "(16.88333, -99.9)" "(-33.16667, -64.95)" "(32.1, 71.8)"
sum(is.na(m$geolocation))
[1] 7315
sum(is.na(m$lat) & is.na(m$long) & is.na(m$geolocation))
[1] 7315
Remove difficult-to-classify meteorites and entries with missing values. Only keep meteorites found between 1900 and now.
m_clean <- m %>%
na.omit %>%
filter(name_type == "Valid",
year <= year(now()),
year >= 1900) %>%
select(name, class:geolocation)
nrow(m_clean)
[1] 37387
head(m_clean)
# A tibble: 6 x 8
name class mass fall year lat long geolocation
<chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <chr>
1 Aarhus H6 720 Fell 1951 56.2 10.2 (56.18333, 10.2…
2 Abee EH4 107000 Fell 1952 54.2 -113 (54.21667, -113…
3 Acapulco Acapulco… 1914 Fell 1976 16.9 -99.9 (16.88333, -99.…
4 Achiras L6 780 Fell 1902 -33.2 -65.0 (-33.16667, -64…
5 Adhi Kot EH4 4239 Fell 1919 32.1 71.8 (32.1, 71.8)
6 Adzhi-Bogdo (… LL3-6 910 Fell 1949 44.8 95.2 (44.83333, 95.1…
m_clean %>%
group_by(class) %>%
summarize(n = n()) %>%
arrange(desc(n)) %>%
filter(n > 1000)
# A tibble: 7 x 2
class n
<chr> <int>
1 L6 7457
2 H5 6193
3 H4 3869
4 H6 3852
5 L5 3255
6 LL5 2192
7 LL6 1649
base_fall <- ggplot(m_clean, aes(x = fall)) + geom_boxplot()
base_fall %+% aes(y = year) + labs(title = "Difference in year")
base_fall %+% aes(y = mass) + scale_y_log10() + labs(title = "Difference in size")
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Removed 1 rows containing non-finite values (stat_boxplot).
m_per_year <- m_clean %>%
group_by(year) %>%
summarize(n = n(),
mass_med = median(mass))
head(m_per_year)
# A tibble: 6 x 3
year n mass_med
<dbl> <int> <dbl>
1 1900 16 1350
2 1901 9 8400
3 1902 12 8510
4 1903 22 19140
5 1904 11 5700
6 1905 14 3132.
ggplot(m_per_year, aes(x = year, y = n)) + geom_line() + geom_vline(xintercept = 1970, col = "red")
ggplot(m_per_year, aes(x = year, y = mass_med)) + geom_line() + geom_vline(xintercept = 1970, col = "red")
m_clean$size <- ifelse(m_clean$mass > 10^5, "big", "small")
m_per_year_size <- m_clean %>%
group_by(year, size) %>%
summarize(n = n(),
mass_med = median(mass))
head(m_per_year_size)
# A tibble: 6 x 4
# Groups: year [4]
year size n mass_med
<dbl> <chr> <int> <dbl>
1 1900 small 16 1350
2 1901 small 9 8400
3 1902 big 1 15500000
4 1902 small 11 5720
5 1903 big 4 504500
6 1903 small 18 12150
ggplot(m_per_year_size, aes(x = year, y = n, color = size)) + geom_smooth() + geom_vline(xintercept = 1970, col = "red")
`geom_smooth()` using method = 'loess' and formula 'y ~ x'
plot_grid(
ggplot(m_per_year, aes(x = year, y = n)) +
geom_line() +
# geom_vline(xintercept = 1970, col = "red", linetype = "dashed") +
labs(title = "Number of meteorites found per year since 1900",
x = "Year", y = "Number of meteorites"),
ggplot(m_per_year_size, aes(x = year, y = n, color = size)) +
geom_smooth(method = "loess") +
# geom_vline(xintercept = 1970, col = "red", linetype = "dashed") +
labs(title = "Number of meteorites by size",
x = "Year", y = "Number of meteorites",
subtitle = "(loess smoothing)") +
theme(legend.position = "bottom") +
scale_color_brewer(name = "Mass (grams)",
labels = c("> 100,000", "< 100,000"), palette = "Dark2"),
ncol = 1
)
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] cowplot_0.9.4 forcats_0.4.0 stringr_1.4.0 dplyr_0.8.1
[5] purrr_0.3.2 readr_1.3.1 tidyr_0.8.3 tibble_2.1.3
[9] ggplot2_3.1.1 tidyverse_1.2.1 lubridate_1.7.4
loaded via a namespace (and not attached):
[1] tidyselect_0.2.5 xfun_0.7 haven_2.1.0
[4] lattice_0.20-38 vctrs_0.1.0 colorspace_1.4-1
[7] generics_0.0.2 htmltools_0.3.6 yaml_2.2.0
[10] utf8_1.1.4 rlang_0.3.4 pillar_1.4.1
[13] glue_1.3.1 withr_2.1.2 RColorBrewer_1.1-2
[16] modelr_0.1.4 readxl_1.3.1 plyr_1.8.4
[19] munsell_0.5.0 gtable_0.3.0 workflowr_1.4.0
[22] cellranger_1.1.0 rvest_0.3.4 evaluate_0.14
[25] labeling_0.3 knitr_1.23 fansi_0.4.0
[28] broom_0.5.2 Rcpp_1.0.1 scales_1.0.0
[31] backports_1.1.4 jsonlite_1.6 fs_1.3.1
[34] hms_0.4.2 digest_0.6.19 stringi_1.4.3
[37] grid_3.6.0 rprojroot_1.2 cli_1.1.0
[40] tools_3.6.0 magrittr_1.5 lazyeval_0.2.2
[43] zeallot_0.1.0 crayon_1.3.4 whisker_0.3-2
[46] pkgconfig_2.0.2 xml2_1.2.0 assertthat_0.2.1
[49] rmarkdown_1.13 httr_1.4.0 rstudioapi_0.10
[52] R6_2.4.0 nlme_3.1-140 git2r_0.25.2
[55] compiler_3.6.0