Last updated: 2024-03-14
Checks: 7 0
Knit directory: for-future-reference/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20190125)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version efbb8a5. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/sql.Rmd
) and HTML
(docs/sql.html
) files. If you’ve configured a remote Git
repository (see ?wflow_git_remote
), click on the hyperlinks
in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
html | 640cf95 | John Blischak | 2019-08-20 | Build site. |
Rmd | f2b843a | John Blischak | 2019-08-20 | Add notes on SQL. |
SELECT * FROM movies;
SELECT name, genre
FROM movies;
SELECT name AS 'Titles'
FROM movies;
SELECT DISTINCT genre
from movies;
SELECT *
FROM movies
WHERE imdb_rating < 5;
# _ = single character wildcard
SELECT *
FROM movies
WHERE name LIKE 'Se_en';
# % = zero or more characters wildcard
SELECT *
FROM movies
WHERE name LIKE '%man%';
SELECT name
FROM movies
WHERE imdb_rating IS NOT NULL;
SELECT *
FROM movies
WHERE year BETWEEN 1990 AND 1999;
# Would include movie named 'J' but not 'JAWS'
SELECT *
FROM movies
WHERE name BETWEEN 'A' AND 'J';
SELECT *
FROM movies
WHERE year BETWEEN 1970 and 1979;
SELECT *
FROM movies
WHERE year BETWEEN 1970 AND 1979
AND imdb_rating > 8;
SELECT *
FROM movies
WHERE year > 2014
OR genre = 'action';
Note: ORDER BY always goes after WHERE (if WHERE is present).
SELECT name, year
FROM movies
ORDER BY name;
SELECT name, year, imdb_rating
FROM movies
ORDER BY imdb_rating DESC;
LIMIT always goes at the very end of the query. Also, it is not supported in all SQL databases.
SELECT *
FROM movies
ORDER BY imdb_rating DESC
LIMIT 3;
SELECT name,
CASE
WHEN genre = 'romance' THEN 'Chill'
WHEN genre = 'comedy' THEN 'Chill'
ELSE 'Intense'
END AS 'Mood'
FROM movies;
SELECT
is the clause we use every time we want to query
information from a database.AS
renames a column or table.DISTINCT
return unique values.WHERE
is a popular command that lets you filter the
results of the query based on conditions that you specify.LIKE
and BETWEEN
are special
operators.AND
and OR
combines multiple
conditions.ORDER BY
sorts the result.LIMIT
specifies the maximum number of rows that the
query will return.CASE
creates different outputs.SQL Fundamentals for R Users course
SELECT * from recent_grads LIMIT 10
SELECT Major, ShareWomen from recent_grads WHERE ShareWomen < 0.5
SELECT Major, Major_category, Median, ShareWomen from recent_grads WHERE ShareWomen > 0.5 AND Median > 50000
SELECT Major, Major_category, ShareWomen, Unemployment_rate
FROM recent_grads
WHERE Major_category = 'Engineering'
AND (ShareWomen > 0.5 OR Unemployment_rate < 0.051)
SELECT Major, ShareWomen, Unemployment_rate
FROM recent_grads
WHERE ShareWomen > 0.3 AND Unemployment_rate < 0.1
ORDER BY ShareWomen DESC
SELECT Major_category, Major, Unemployment_rate
FROM recent_grads
WHERE Major_category = 'Engineering' OR Major_category = 'Physical Sciences'
ORDER BY Unemployment_rate
SELECT c.*, f.name AS 'country_name'
FROM facts AS f
INNER JOIN cities as c
ON c.facts_id = f.id
LIMIT 5;
most DBMS require that the SELECT and FROM statements come first followed by any other statements such as WHERE.
Syntax for an inner join:
SELECT [column_names] FROM [table_name_one]
INNER JOIN [table_name_two] ON [join_constraint];
SELECT *
FROM facts
INNER JOIN cities
ON cities.facts_id = facts.id
LIMIT 10;
SELECT f.name AS country, c.name as capital_city
FROM cities c
INNER JOIN facts f ON f.id = c.facts_id
WHERE c.capital = 1;
SELECT f.name AS country, f.population
FROM facts f
LEFT JOIN cities c ON c.facts_id = f.id
WHERE c.name IS NULL;
SELECT c.name AS capital_city, f.name AS country, c.population
FROM facts f
INNER JOIN cities c ON c.facts_id = f.id
WHERE capital = 1
ORDER BY c.population DESC
LIMIT 10;
SELECT c.name capital_city, f.name country, c.population
FROM facts f
INNER JOIN (
SELECT * FROM cities
WHERE capital = 1
AND population > 10000000
) c ON c.facts_id = f.id
ORDER BY c.population DESC
LIMIT 10;
sessionInfo()
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] workflowr_1.7.1
loaded via a namespace (and not attached):
[1] vctrs_0.6.5 httr_1.4.7 cli_3.6.2 knitr_1.45
[5] rlang_1.1.3 xfun_0.42 stringi_1.8.3 processx_3.8.3
[9] promises_1.2.1 jsonlite_1.8.8 glue_1.7.0 rprojroot_2.0.4
[13] git2r_0.33.0 htmltools_0.5.7 httpuv_1.6.14 ps_1.7.6
[17] sass_0.4.8 fansi_1.0.6 rmarkdown_2.26 jquerylib_0.1.4
[21] tibble_3.2.1 evaluate_0.23 fastmap_1.1.1 yaml_2.3.8
[25] lifecycle_1.0.4 whisker_0.4.1 stringr_1.5.1 compiler_4.3.3
[29] fs_1.6.3 pkgconfig_2.0.3 Rcpp_1.0.12 rstudioapi_0.15.0
[33] later_1.3.2 digest_0.6.34 R6_2.5.1 utf8_1.2.4
[37] pillar_1.9.0 callr_3.7.5 magrittr_2.0.3 bslib_0.6.1
[41] tools_4.3.3 cachem_1.0.8 getPass_0.2-4