This function validates that links in markdown documents do not throw an error. This includes links to images, and the check will respect robots.txt for websites.

## Usage

validate_links(yrn)

## Format

• allowed_uri_protocols a character vector of length 23.

• link_tests a character vector of length 9 containing templates that use the output of validate_links() for formatting.

• link_info a character vector of length 9 that gives information and informative links for additional context for failures.

## Arguments

yrn

a tinkr::yarn or Episode object.

lt

the output of make_link_table()

source_list

output of link_source_list

an xml_nodeset of headings

root

the root path to the lesson that contains this file.

## Value

a data frame with parsed information from xml2::url_parse() and columns of logical values indicating the tests that passed.
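As a rough illustration of how a return value of this shape can be used, the sketch below builds a small stand-in data frame and picks out the rows that fail at least one test. The rows are invented for illustration and only three of the test columns are included; this is not output from validate_links().

```r
# Toy stand-in for the output of validate_links(); the values are invented
# for illustration and only a few of the real columns are included.
v <- data.frame(
  orig = c("https://example.org/a.html", "../no-workie.svg", "http://example.org"),
  known_protocol = c(TRUE, TRUE, TRUE),
  enforce_https  = c(TRUE, TRUE, FALSE),
  internal_file  = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# The logical columns hold the test results; a row fails if any test is FALSE.
tests  <- names(v)[vapply(v, is.logical, logical(1))]
failed <- rowSums(!as.matrix(v[tests])) > 0

# Original link text of every row that failed at least one test
failed_links <- v$orig[failed]
failed_links
```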

## Details

All links must resolve to a specific location. If that location does not exist, the link is invalid. At the moment, we can only check local links.

Links that include a protocol must use one that is valid and secure. Allowed protocols are taken from the allowed protocols in WordPress: http, https, ftp, ftps, mailto, news, irc, irc6, ircs, gopher, nntp, feed, telnet, mms, rtsp, sms, svn, tel, fax, xmpp, webcal, urn

Misspellings and unsupported protocols (e.g. javascript: and bitcoin:) will be flagged.

In addition, we will enforce the use of HTTPS over HTTP.
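The two protocol checks can be sketched in base R as follows. This is only an illustration under stated assumptions: the validator itself reports parsed fields from xml2::url_parse(), whereas this sketch uses a regular expression, and get_scheme() is a hypothetical helper, not part of pegboard. Only a subset of the allowed protocols is listed.

```r
# Subset of the allowed protocols listed above; "" stands for links that
# have no protocol at all (e.g. relative links).
allowed <- c("", "http", "https", "ftp", "ftps", "mailto", "tel")

# Hypothetical helper: return the scheme (the text before the first ':'),
# or "" when the link does not start with a scheme.
get_scheme <- function(url) {
  has_scheme <- grepl("^[A-Za-z][A-Za-z0-9+.-]*:", url)
  ifelse(has_scheme, tolower(sub(":.*$", "", url)), "")
}

urls <- c("https://carpentries.org", "http://example.org",
          "javascript:alert(1)", "htps://example.org", "../index.html")
scheme <- get_scheme(urls)

res <- data.frame(
  orig = urls,
  known_protocol = scheme %in% allowed,  # misspelled/unsupported -> FALSE
  enforce_https  = scheme != "http",     # plain http -> FALSE
  stringsAsFactors = FALSE
)
res
```

In this sketch, a plain http link passes known_protocol but fails enforce_https, which mirrors the note in the examples that an additional check is made for the http protocol.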

Links without a protocol should resolve to the HTML version of a page and have the correct capitalisation.

#### Anchors (aka fragments)

Anchors are located at the end of URLs and start with a # sign. They are used to indicate a section of the documentation.
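A minimal base-R sketch of separating the anchor from a URL and checking same-page anchors against a set of heading identifiers. The heading_ids vector and the fourth URL are made up for illustration, and the way pegboard actually matches anchors to headings is not shown here and may differ.

```r
urls <- c("https://docs.python.org/3/library/glob.html#glob.glob",
          "https://docs.python.org/3/library/glob.html",
          "#setup",
          "#no-such-section")

# Everything after the first '#' is the fragment; URLs without one get "".
fragment <- ifelse(grepl("#", urls, fixed = TRUE), sub("^[^#]*#", "", urls), "")

# A link that starts with '#' points to a section of the current page.
same_page <- startsWith(urls, "#")

# Hypothetical identifiers of the headings in the current document
heading_ids <- c("setup", "overview")

# Same-page anchors must match a heading; other links pass this test.
internal_anchor <- !same_page | fragment %in% heading_ids
internal_anchor
```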

### Accessibility (a11y)

Accessibility ensures that your links are accurate and descriptive for people who have slow connections or use screen reader technology.

#### Alt-text (for images)

All images must have associated alt-text. In pandoc, this is achieved by writing the alt attribute in curly braces after the image: ![image caption](link){alt='alt text'}. For more information, see https://webaim.org/techniques/alttext/
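The alt-text rule can be sketched on a toy link table like this. The rows are invented, the type and alt columns mirror the columns of the same name in the validator output, and the exact rule pegboard applies may differ from this simplified version.

```r
# Toy link table with made-up rows
lt <- data.frame(
  orig = c("fig/plot.png", "fig/logo.svg", "https://carpentries.org"),
  type = c("image", "image", "link"),
  alt  = c("Scatter plot of height against weight", NA, NA),
  stringsAsFactors = FALSE
)

is_image <- lt$type %in% c("image", "img")

# Non-image links pass trivially; images need a non-missing, non-blank alt.
img_alt_text <- !is_image | (!is.na(lt$alt) & nzchar(trimws(lt$alt)))

# Images that would be reported as missing alt-text
missing_alt <- lt$orig[!img_alt_text]
missing_alt
```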

## Note

At the moment, we do not test whether all links are reachable. This is a feature planned for the future.

This function is internal. Please use the methods for the Episode and Lesson classes.

See Episode and Lesson for the methods that will throw warnings.

## Examples

l <- Lesson$new(lesson_fragment())
e <- l$episodes[[3]]
# Our link validators run a series of tests on links and images and return a
# data frame with the results of the tests
v <- asNamespace('pegboard')$validate_links(e)
names(v)
#>  [1] "scheme"               "server"               "port"
#>  [4] "user"                 "path"                 "query"
#>  [7] "fragment"             "orig"                 "text"
#> [10] "alt"                  "title"                "type"
#> [13] "rel"                  "anchor"               "sourcepos"
#> [16] "filepath"             "node"                 "known_protocol"
#> [19] "enforce_https"        "internal_anchor"      "internal_file"
#> [22] "internal_well_formed" "all_reachable"        "img_alt_text"
#> [25] "descriptive"          "link_length"
v
#>    scheme            server     port user
#> 1   https   docs.python.org       NA
#> 2   https   docs.python.org       NA
#> 3   https   docs.python.org       NA
#> 4   https pandas.pydata.org       NA
#> 5   https   docs.python.org       NA
#> 9   https   carpentries.org       NA
#> 10                                NA
#> 11  https   carpentries.org       NA
#> 12                                NA
#> 6                              21982
#> 7                           80686392
#> 8                              21982
#> 13                                54
#>                                                              path query
#> 1                                           /3/library/glob.html
#> 2                                           /3/library/glob.html
#> 3                                           /3/library/glob.html
#> 4  /pandas-docs/stable/reference/api/pandas.DataFrame.shape.html
#> 5                                       /3/library/stdtypes.html
#> 9                                 /assets/img/TheCarpentries.svg
#> 10                                              ../no-workie.svg
#> 11                                /assets/img/TheCarpentries.svg
#> 12                                              ../no-workie.svg
#> 6
#> 7
#> 8
#> 13
#>     fragment
#> 1  glob.glob
#> 2
#> 3
#> 4
#> 5  str.split
#> 9
#> 10
#> 11
#> 12
#> 6
#> 7
#> 8
#> 13
#>                                                                                      orig
#> 1                                   https://docs.python.org/3/library/glob.html#glob.glob
#> 2                                             https://docs.python.org/3/library/glob.html
#> 3                                             https://docs.python.org/3/library/glob.html
#> 4  https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html
#> 5                               https://docs.python.org/3/library/stdtypes.html#str.split
#> 9                                   https://carpentries.org/assets/img/TheCarpentries.svg
#> 10                                                                       ../no-workie.svg
#> 11                                  https://carpentries.org/assets/img/TheCarpentries.svg
#> 12                                                                       ../no-workie.svg
#> 6                                                              {{ page.root }}/index.html
#> 7                                                       {{ site.swc_pages }}/shell-novice
#> 8                                                                  {{ page.root }}{% link
#> 13                                                          {{ page.root }}/no-workie.svg
#>                                              text            alt title  type
#> 1                                       glob.glob           <NA>        link
#> 2                                            glob           <NA>        link
#> 3                                            glob           <NA>        link
#> 4                                    shape method           <NA>        link
#> 5                                           split           <NA>        link
#> 9                                                 books as clubs  <NA>   img
#> 10                                                books as clubs  <NA>   img
#> 11                               Carpentries logo           <NA>       image
#> 12                              Non-working image           <NA>       image
#> 6                                            Home           <NA>  <NA>  link
#> 7                                           shell           <NA>  <NA>  link
#> 8  link that isn't parsed correctly by commonmark           <NA>  <NA>  link
#> 13           Non-working image with jekyll syntax           <NA>  <NA> image
#>     rel anchor sourcepos                          filepath         node
#> 1  <NA>  FALSE        51 _episodes/14-looping-data-sets.md <link so....
#> 2  <NA>  FALSE        57 _episodes/14-looping-data-sets.md <link so....
#> 3  <NA>  FALSE        58 _episodes/14-looping-data-sets.md <link so....
#> 4  <NA>  FALSE       140 _episodes/14-looping-data-sets.md <link so....
#> 5  <NA>  FALSE       163 _episodes/14-looping-data-sets.md <link so....
#> 9  <NA>  FALSE       189 _episodes/14-looping-data-sets.md <img src....
#> 10 <NA>  FALSE       191 _episodes/14-looping-data-sets.md <img src....
#> 11 <NA>  FALSE       195 _episodes/14-looping-data-sets.md <image s....
#> 12 <NA>  FALSE       197 _episodes/14-looping-data-sets.md <image s....
#> 6  <NA>  FALSE        NA _episodes/14-looping-data-sets.md <link xm....
#> 7  <NA>  FALSE        NA _episodes/14-looping-data-sets.md <link xm....
#> 8  <NA>  FALSE        NA _episodes/14-looping-data-sets.md <link xm....
#> 13 <NA>  FALSE        NA _episodes/14-looping-data-sets.md <image x....
#>    known_protocol enforce_https internal_anchor internal_file
#> 1            TRUE          TRUE            TRUE          TRUE
#> 2            TRUE          TRUE            TRUE          TRUE
#> 3            TRUE          TRUE            TRUE          TRUE
#> 4            TRUE          TRUE            TRUE          TRUE
#> 5            TRUE          TRUE            TRUE          TRUE
#> 9            TRUE          TRUE            TRUE          TRUE
#> 10           TRUE          TRUE            TRUE         FALSE
#> 11           TRUE          TRUE            TRUE          TRUE
#> 12           TRUE          TRUE            TRUE         FALSE
#> 6            TRUE          TRUE            TRUE          TRUE
#> 7            TRUE          TRUE            TRUE          TRUE
#> 8            TRUE          TRUE            TRUE          TRUE
#> 13           TRUE          TRUE            TRUE          TRUE
#>    internal_well_formed all_reachable img_alt_text descriptive link_length
#> 1                  TRUE          TRUE         TRUE        TRUE        TRUE
#> 2                  TRUE          TRUE         TRUE        TRUE        TRUE
#> 3                  TRUE          TRUE         TRUE        TRUE        TRUE
#> 4                  TRUE          TRUE         TRUE        TRUE        TRUE
#> 5                  TRUE          TRUE         TRUE        TRUE        TRUE
#> 9                  TRUE          TRUE         TRUE        TRUE        TRUE
#> 10                 TRUE          TRUE         TRUE        TRUE        TRUE
#> 11                 TRUE          TRUE        FALSE        TRUE        TRUE
#> 12                 TRUE          TRUE        FALSE        TRUE        TRUE
#> 6                  TRUE          TRUE         TRUE        TRUE        TRUE
#> 7                  TRUE          TRUE         TRUE        TRUE        TRUE
#> 8                  TRUE          TRUE         TRUE        TRUE        TRUE
#> 13                 TRUE          TRUE        FALSE        TRUE        TRUE

# URL protocols -----------------------------------------------------------
# To avoid potentially malicious situations, we have an explicit list of
# allowed URI protocols, which can be found in the allowed_uri_protocols
# character vector:
asNamespace('pegboard')$allowed_uri_protocols
#>  [1] ""       "http"   "https"  "ftp"    "ftps"   "mailto" "news"   "irc"
#>  [9] "irc6"   "ircs"   "gopher" "nntp"   "feed"   "telnet" "mms"    "rtsp"
#> [17] "sms"    "svn"    "tel"    "fax"    "xmpp"   "webcal" "urn"
# note that we make an additional check for the http protocol.

# Creating Warnings from the table ----------------------------------------
# The validator does not produce any warnings or messages, but this data
# frame can be passed on to other functions that will throw them for us. We
# have a function that will throw a warning/message for each link that
# fails the tests. These messages are controlled by link_tests and
# link_info.
asNamespace('pegboard')$link_tests
#>                                                 known_protocol
#>                                "[invalid protocol] ({scheme})"
#>                                                  enforce_https
#>                                         "[needs HTTPS] {orig}"
#>                                                internal_anchor
#>                                      "[missing anchor] {orig}"
#>                                                  internal_file
#>                                        "[missing file] {orig}"
#>                                           internal_well_formed
#> "[incorrect formatting]: [{text}][{orig}] -> [{text}]({orig})"
#>                                                  all_reachable
#>                                                             ""
#>                                                   img_alt_text
#>                              "[image missing alt-text] {orig}"
#>                                                    descriptive
#>                     "[uninformative link text] {sQuote(text)}"
#>                                                    link_length
#>                         "[link text too short] {sQuote(text)}"
asNamespace('pegboard')$link_info
#>                                                                                                                                                                               known_protocol
#> "Links must have a known URL protocol (e.g. https, ftp, mailto). See <https://developer.wordpress.org/reference/functions/wp_allowed_protocols/#return> for a list of acceptable protocols."
#>                                                                                                                                                                                enforce_https
#>                                                                                                                                   "Links must use HTTPS <https://https.cio.gov/everything/>"
#>                                                                                                                                                                              internal_anchor
#>                                                                                                                                                                                internal_file
#>                                                                                                                                                    "Some linked internal files do not exist"
#>                                                                                                                                                                         internal_well_formed
#>                                                                                                                                                      "Some links were incorrectly formatted"
#>                                                                                                                                                                                all_reachable
#>                                                                                                                                                                                           ""
#>                                                                                                                                                                                 img_alt_text
#>                                                                                                                                                                                  descriptive
#> ! There were errors in 4/13 links
#> ◌ Some linked internal files do not exist
#> ◌ Images need alt-text
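The templates in link_tests contain {column} placeholders that refer to columns of the results table. The following base-R sketch shows one way such a template could be filled in for each failing row. It is an assumption-laden illustration, not pegboard's implementation: fill_template() is a hypothetical helper, the table rows are invented, and only two of the templates are copied from the output above.

```r
# Two templates copied from the link_tests output above
templates <- c(
  enforce_https = "[needs HTTPS] {orig}",
  internal_file = "[missing file] {orig}"
)

# Toy results table with invented rows
v <- data.frame(
  orig = c("http://example.org", "../no-workie.svg"),
  enforce_https = c(FALSE, TRUE),
  internal_file = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace each {column} placeholder with the row's value.
fill_template <- function(template, row) {
  for (nm in names(row)) {
    template <- gsub(paste0("{", nm, "}"), as.character(row[[nm]]),
                     template, fixed = TRUE)
  }
  template
}

# Build one message per failed test per row
msgs <- character(0)
for (test in names(templates)) {
  for (i in which(!v[[test]])) {
    msgs <- c(msgs, fill_template(templates[[test]], v[i, ]))
  }
}
msgs
```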