This function validates that links in markdown documents do not throw an error. This includes links to images, and the checks will respect robots.txt for websites.

Usage

validate_links(yrn)

allowed_uri_protocols

link_known_protocol(VAL)

link_enforce_https(VAL)

link_all_reachable(VAL)

link_img_alt_text(VAL)

link_length(VAL)

link_descriptive(VAL)

link_source_list(lt)

link_internal_anchor(VAL, source_list, headings)

link_internal_file(VAL, source_list, root)

link_internal_well_formed(VAL, source_list)

link_tests

link_info

Format

  • allowed_uri_protocols a character vector of length 23

  • link_tests a named character vector of length 9 containing templates that use the output of validate_links() for formatting.

  • link_info a named character vector of length 9 that gives information and informative links for additional context for failures.
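To illustrate how a template such as those in link_tests could be combined with a row of the validate_links() output, here is a minimal sketch in base R. The helper fill_template() is hypothetical and is not part of pegboard; the two templates are copied from the printed link_tests output in the Examples. The real templates also contain expressions such as {sQuote(text)}, which this simple substitution does not handle.

```r
# Two templates copied from the printed `link_tests` output
templates <- c(
  enforce_https = "[needs HTTPS] {orig}",
  internal_file = "[missing file] {orig}"
)

# One made-up row of link information
row <- list(orig = "../no-workie.svg")

# Hypothetical helper (not a pegboard function): replace each {name}
# placeholder with the matching value from `values`
fill_template <- function(template, values) {
  for (name in names(values)) {
    placeholder <- paste0("{", name, "}")
    template <- gsub(placeholder, values[[name]], template, fixed = TRUE)
  }
  template
}

msg <- fill_template(templates[["internal_file"]], row)
msg
```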

Arguments

yrn

a tinkr::yarn or Episode object.

lt

the output of make_link_table()

source_list

output of link_source_list

headings

an xml_nodeset of headings

root

the root path to the lesson that contains this file.

Value

a data frame with parsed information from xml2::url_parse() and columns of logical values indicating the tests that passed.
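As a rough sketch of how such a data frame can be used, the code below builds a small stand-in with made-up values (only a few of the real columns are included) and finds the links that failed at least one test. It assumes that every logical column is a test result, which holds for this toy table but not for the full output (the anchor column is also logical).

```r
# Toy stand-in for the output of validate_links(); the values are made up
v <- data.frame(
  orig           = c("https://example.org/a", "../missing.svg", "http://example.org/b"),
  known_protocol = c(TRUE, TRUE, TRUE),
  enforce_https  = c(TRUE, TRUE, FALSE),
  internal_file  = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Assumption: in this toy table every logical column is a test result
test_cols <- names(v)[vapply(v, is.logical, logical(1))]

# A link passes only if all of its test columns are TRUE
passed <- Reduce(`&`, v[test_cols])
failed <- !passed

v$orig[failed]
```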

Details

All links must resolve to a specific location. If that location does not exist, then the link is invalid. At the moment, we can only check local links.

External links must start with a valid and secure protocol. Allowed protocols are taken from the allowed protocols in WordPress: http, https, ftp, ftps, mailto, news, irc, irc6, ircs, gopher, nntp, feed, telnet, mms, rtsp, sms, svn, tel, fax, xmpp, webcal, urn

Misspellings and unsupported protocols (e.g. javascript: and bitcoin:) will be flagged.

In addition, we will enforce the use of HTTPS over HTTP.
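The two protocol checks can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: get_scheme() is a hypothetical helper, the URLs are made up, and the allowed vector is copied from the list of protocols (with "" added for links that have no protocol).

```r
# Copy of the allowed protocols, plus "" for links without a protocol
allowed <- c("", "http", "https", "ftp", "ftps", "mailto", "news", "irc",
  "irc6", "ircs", "gopher", "nntp", "feed", "telnet", "mms", "rtsp", "sms",
  "svn", "tel", "fax", "xmpp", "webcal", "urn")

# Hypothetical helper: extract the scheme of a URL ("" when none is present)
get_scheme <- function(url) {
  has_scheme <- grepl("^[A-Za-z][A-Za-z0-9+.-]*:", url)
  ifelse(has_scheme, tolower(sub(":.*$", "", url)), "")
}

urls <- c(
  "https://example.org",   # fine
  "http://example.org",    # known protocol, but not HTTPS
  "javascript:alert(1)",   # unsupported protocol
  "htps://example.org",    # misspelled protocol
  "index.html"             # no protocol
)
scheme <- get_scheme(urls)

known_protocol <- scheme %in% allowed
enforce_https  <- scheme != "http"

data.frame(urls, scheme, known_protocol, enforce_https)
```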

Internal links have no protocol, but should resolve to the HTML version of a page and have the correct capitalisation.
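A minimal sketch of a capitalisation-aware existence check is shown below. The helper internal_file_exists() and the file names are hypothetical, and the sketch only covers existence and capitalisation; it does not attempt to map source files to their HTML versions. It compares against list.files() because file.exists() alone may ignore case on some file systems.

```r
# Create a throwaway lesson directory with a single figure
root <- tempfile("lesson-")
dir.create(file.path(root, "fig"), recursive = TRUE)
invisible(file.create(file.path(root, "fig", "Plot.svg")))

# Hypothetical helper (not a pegboard function): TRUE when `path` matches a
# file under `root` exactly, including capitalisation
internal_file_exists <- function(path, root) {
  known <- list.files(root, recursive = TRUE)
  normalised <- sub("^\\./", "", path)  # drop a leading "./"
  normalised %in% known
}

found <- internal_file_exists(
  c("fig/Plot.svg", "./fig/Plot.svg", "fig/plot.svg", "fig/missing.svg"),
  root
)
found
```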

Anchors (aka fragments)

Anchors are the part at the end of a URL that starts with a # sign. These are used to indicate a section of the documentation.
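As an illustration, a same-page anchor can be compared against the heading identifiers of the page. The sketch below uses made-up links and a made-up vector of heading IDs; it is an assumption about how such a check could work, not the package's code.

```r
# Hypothetical heading IDs found in the current page
heading_ids <- c("introduction", "reading-data", "summary")

links <- c(
  "#reading-data",                      # anchor that exists
  "#reeding-data",                      # misspelled anchor
  "https://example.org/page.html#top",  # anchor on another site
  "index.html"                          # no anchor at all
)

# The fragment is the text after the first "#", or "" when there is none
has_hash <- grepl("#", links, fixed = TRUE)
fragment <- ifelse(has_hash, sub("^[^#]*#", "", links), "")

# Only links that consist solely of a fragment point into the current page
same_page <- startsWith(links, "#")

# Other links pass trivially; same-page anchors must match a heading
anchor_ok <- !same_page | fragment %in% heading_ids
anchor_ok
```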

Accessibility (a11y)

Accessibility checks ensure that your links are accurate and descriptive for people who have slow connections or use screen reader technology.

Alt-text (for images)

All images must have associated alt-text. In pandoc, this is achieved by writing the alt attribute in curly braces after the image: ![image caption](link){alt='alt text'}. See https://webaim.org/techniques/alttext/
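A minimal sketch of the alt-text rule, using a made-up table with the type and alt columns that appear in the output of validate_links(). It assumes that a missing alt attribute is recorded as NA and that only rows of type "image" or "img" are subject to the test.

```r
# Made-up link table with the `type` and `alt` columns
links <- data.frame(
  type = c("link", "image", "image", "img"),
  alt  = c(NA, "Carpentries logo", NA, "books as clubs"),
  stringsAsFactors = FALSE
)

is_image <- links$type %in% c("image", "img")
has_alt  <- !is.na(links$alt)

# Non-image links pass trivially; images pass only when alt-text is present
img_alt_text <- !is_image | has_alt
img_alt_text
```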

Descriptive text

All links must have descriptive text associated with them. This helps people who use screen readers to scan the links on a page, so that they do not get a list full of "link", "link", "link": https://webaim.org/techniques/hypertext/link_text#uninformative
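One way to sketch this rule is to compare the link text against a list of uninformative phrases. The phrase list below is a short, illustrative assumption and is not the list used by the package.

```r
text <- c("link", "Click here", "glob documentation", "shape method")

# Assumption: a short, illustrative list of uninformative phrases
uninformative <- c("link", "this link", "here", "click here", "more", "read more")

# Compare case-insensitively and ignore surrounding whitespace
clean <- tolower(trimws(text))
descriptive <- !(clean %in% uninformative)
descriptive
```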

Text length

Link text must be longer than one character: https://webaim.org/techniques/hypertext/link_text#link_length
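The length rule amounts to counting characters in the link text. In this small sketch with made-up link text, trimming whitespace before counting is an assumption.

```r
text <- c("glob", "a", "", " x ")

# Count characters after removing surrounding whitespace
link_length <- nchar(trimws(text)) > 1
link_length
```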

Note

At the moment, we do not test whether all links are reachable. This is a feature planned for the future.

This function is internal. Please use the methods for the Episode and Lesson classes.

See also

Episode and Lesson for the methods that will throw warnings

Examples

l <- Lesson$new(lesson_fragment())
e <- l$episodes[[3]]
# Our link validators run a series of tests on links and images and return a 
# data frame with information about the links (via xml2::url_parse), along 
# with the results of the tests
v <- asNamespace('pegboard')$validate_links(e)
names(v)
#>  [1] "scheme"               "server"               "port"                
#>  [4] "user"                 "path"                 "query"               
#>  [7] "fragment"             "orig"                 "text"                
#> [10] "alt"                  "title"                "type"                
#> [13] "rel"                  "anchor"               "sourcepos"           
#> [16] "filepath"             "node"                 "known_protocol"      
#> [19] "enforce_https"        "internal_anchor"      "internal_file"       
#> [22] "internal_well_formed" "all_reachable"        "img_alt_text"        
#> [25] "descriptive"          "link_length"         
v
#>    scheme            server     port user
#> 1   https   docs.python.org       NA     
#> 2   https   docs.python.org       NA     
#> 3   https   docs.python.org       NA     
#> 4   https pandas.pydata.org       NA     
#> 5   https   docs.python.org       NA     
#> 9   https   carpentries.org       NA     
#> 10                                NA     
#> 11  https   carpentries.org       NA     
#> 12                                NA     
#> 6                              21982     
#> 7                           80686392     
#> 8                              21982     
#> 13                                54     
#>                                                             path query
#> 1                                           /3/library/glob.html      
#> 2                                           /3/library/glob.html      
#> 3                                           /3/library/glob.html      
#> 4  /pandas-docs/stable/reference/api/pandas.DataFrame.shape.html      
#> 5                                       /3/library/stdtypes.html      
#> 9                                 /assets/img/TheCarpentries.svg      
#> 10                                              ../no-workie.svg      
#> 11                                /assets/img/TheCarpentries.svg      
#> 12                                              ../no-workie.svg      
#> 6                                                                     
#> 7                                                                     
#> 8                                                                     
#> 13                                                                    
#>     fragment
#> 1  glob.glob
#> 2           
#> 3           
#> 4           
#> 5  str.split
#> 9           
#> 10          
#> 11          
#> 12          
#> 6           
#> 7           
#> 8           
#> 13          
#>                                                                                      orig
#> 1                                   https://docs.python.org/3/library/glob.html#glob.glob
#> 2                                             https://docs.python.org/3/library/glob.html
#> 3                                             https://docs.python.org/3/library/glob.html
#> 4  https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html
#> 5                               https://docs.python.org/3/library/stdtypes.html#str.split
#> 9                                   https://carpentries.org/assets/img/TheCarpentries.svg
#> 10                                                                       ../no-workie.svg
#> 11                                  https://carpentries.org/assets/img/TheCarpentries.svg
#> 12                                                                       ../no-workie.svg
#> 6                                                              {{ page.root }}/index.html
#> 7                                                       {{ site.swc_pages }}/shell-novice
#> 8                                                                  {{ page.root }}{% link
#> 13                                                          {{ page.root }}/no-workie.svg
#>                                              text            alt title  type
#> 1                                       glob.glob           <NA>        link
#> 2                                            glob           <NA>        link
#> 3                                            glob           <NA>        link
#> 4                                    shape method           <NA>        link
#> 5                                           split           <NA>        link
#> 9                                                 books as clubs  <NA>   img
#> 10                                                books as clubs  <NA>   img
#> 11                               Carpentries logo           <NA>       image
#> 12                              Non-working image           <NA>       image
#> 6                                            Home           <NA>  <NA>  link
#> 7                                           shell           <NA>  <NA>  link
#> 8  link that isn't parsed correctly by commonmark           <NA>  <NA>  link
#> 13           Non-working image with jekyll syntax           <NA>  <NA> image
#>     rel anchor sourcepos                          filepath         node
#> 1  <NA>  FALSE        51 _episodes/14-looping-data-sets.md <link so....
#> 2  <NA>  FALSE        57 _episodes/14-looping-data-sets.md <link so....
#> 3  <NA>  FALSE        58 _episodes/14-looping-data-sets.md <link so....
#> 4  <NA>  FALSE       140 _episodes/14-looping-data-sets.md <link so....
#> 5  <NA>  FALSE       163 _episodes/14-looping-data-sets.md <link so....
#> 9  <NA>  FALSE       189 _episodes/14-looping-data-sets.md <img src....
#> 10 <NA>  FALSE       191 _episodes/14-looping-data-sets.md <img src....
#> 11 <NA>  FALSE       195 _episodes/14-looping-data-sets.md <image s....
#> 12 <NA>  FALSE       197 _episodes/14-looping-data-sets.md <image s....
#> 6  <NA>  FALSE        NA _episodes/14-looping-data-sets.md <link xm....
#> 7  <NA>  FALSE        NA _episodes/14-looping-data-sets.md <link xm....
#> 8  <NA>  FALSE        NA _episodes/14-looping-data-sets.md <link xm....
#> 13 <NA>  FALSE        NA _episodes/14-looping-data-sets.md <image x....
#>    known_protocol enforce_https internal_anchor internal_file
#> 1            TRUE          TRUE            TRUE          TRUE
#> 2            TRUE          TRUE            TRUE          TRUE
#> 3            TRUE          TRUE            TRUE          TRUE
#> 4            TRUE          TRUE            TRUE          TRUE
#> 5            TRUE          TRUE            TRUE          TRUE
#> 9            TRUE          TRUE            TRUE          TRUE
#> 10           TRUE          TRUE            TRUE         FALSE
#> 11           TRUE          TRUE            TRUE          TRUE
#> 12           TRUE          TRUE            TRUE         FALSE
#> 6            TRUE          TRUE            TRUE          TRUE
#> 7            TRUE          TRUE            TRUE          TRUE
#> 8            TRUE          TRUE            TRUE          TRUE
#> 13           TRUE          TRUE            TRUE          TRUE
#>    internal_well_formed all_reachable img_alt_text descriptive link_length
#> 1                  TRUE          TRUE         TRUE        TRUE        TRUE
#> 2                  TRUE          TRUE         TRUE        TRUE        TRUE
#> 3                  TRUE          TRUE         TRUE        TRUE        TRUE
#> 4                  TRUE          TRUE         TRUE        TRUE        TRUE
#> 5                  TRUE          TRUE         TRUE        TRUE        TRUE
#> 9                  TRUE          TRUE         TRUE        TRUE        TRUE
#> 10                 TRUE          TRUE         TRUE        TRUE        TRUE
#> 11                 TRUE          TRUE        FALSE        TRUE        TRUE
#> 12                 TRUE          TRUE        FALSE        TRUE        TRUE
#> 6                  TRUE          TRUE         TRUE        TRUE        TRUE
#> 7                  TRUE          TRUE         TRUE        TRUE        TRUE
#> 8                  TRUE          TRUE         TRUE        TRUE        TRUE
#> 13                 TRUE          TRUE        FALSE        TRUE        TRUE
# URL protocols -----------------------------------------------------------
# To avoid potentially malicious situations, we have an explicit list of
# allowed URI protocols, which can be found in the `allowed_uri_protocols`
# character vector:
asNamespace('pegboard')$allowed_uri_protocols
#>  [1] ""       "http"   "https"  "ftp"    "ftps"   "mailto" "news"   "irc"   
#>  [9] "irc6"   "ircs"   "gopher" "nntp"   "feed"   "telnet" "mms"    "rtsp"  
#> [17] "sms"    "svn"    "tel"    "fax"    "xmpp"   "webcal" "urn"   
# note that we make an additional check for the http protocol.

# Creating Warnings from the table ----------------------------------------
# The validator does not produce any warnings or messages, but this data
# frame can be passed on to other functions that will throw them for us. We
# have a function that will throw a warning/message for each link that
# fails the tests. These messages are controlled by `link_tests` and 
# `link_info`.
asNamespace('pegboard')$link_tests
#>                                                 known_protocol 
#>                                "[invalid protocol] ({scheme})" 
#>                                                  enforce_https 
#>                                         "[needs HTTPS] {orig}" 
#>                                                internal_anchor 
#>                                      "[missing anchor] {orig}" 
#>                                                  internal_file 
#>                                        "[missing file] {orig}" 
#>                                           internal_well_formed 
#> "[incorrect formatting]: [{text}][{orig}] -> [{text}]({orig})" 
#>                                                  all_reachable 
#>                                                             "" 
#>                                                   img_alt_text 
#>                              "[image missing alt-text] {orig}" 
#>                                                    descriptive 
#>                     "[uninformative link text] {sQuote(text)}" 
#>                                                    link_length 
#>                         "[link text too short] {sQuote(text)}" 
asNamespace('pegboard')$link_info
#>                                                                                                                                                                               known_protocol 
#> "Links must have a known URL protocol (e.g. https, ftp, mailto). See <https://developer.wordpress.org/reference/functions/wp_allowed_protocols/#return> for a list of acceptable protocols." 
#>                                                                                                                                                                                enforce_https 
#>                                                                                                                                   "Links must use HTTPS <https://https.cio.gov/everything/>" 
#>                                                                                                                                                                              internal_anchor 
#>                                                                                                                     "Some link anchors for relative links (e.g. [anchor]: link) are missing" 
#>                                                                                                                                                                                internal_file 
#>                                                                                                                                                    "Some linked internal files do not exist" 
#>                                                                                                                                                                         internal_well_formed 
#>                                                                                                                                                      "Some links were incorrectly formatted" 
#>                                                                                                                                                                                all_reachable 
#>                                                                                                                                                                                           "" 
#>                                                                                                                                                                                 img_alt_text 
#>                                                                                                          "Images need alt-text <https://webaim.org/techniques/hypertext/link_text#alt_link>" 
#>                                                                                                                                                                                  descriptive 
#>                                                                                         "Avoid uninformative link phrases <https://webaim.org/techniques/hypertext/link_text#uninformative>" 
#>                                                                                                                                                                                  link_length 
#>                                                                                   "Avoid single-letter or missing link text <https://webaim.org/techniques/hypertext/link_text#link_length>" 
asNamespace('pegboard')$throw_link_warnings(v)
#> ! There were errors in 4/13 links
#> ◌ Some linked internal files do not exist
#> ◌ Images need alt-text
#> <https://webaim.org/techniques/hypertext/link_text#alt_link>
#> 
#> ::warning file=_episodes/14-looping-data-sets.md,line=191:: [missing file]
#> ../no-workie.svg
#> ::warning file=_episodes/14-looping-data-sets.md,line=195:: [image missing
#> alt-text] https://carpentries.org/assets/img/TheCarpentries.svg
#> ::warning file=_episodes/14-looping-data-sets.md,line=197:: [missing file]
#> ../no-workie.svg [image missing alt-text] ../no-workie.svg
#> ::warning file=_episodes/14-looping-data-sets.md,line=NA:: [image missing
#> alt-text] { page.root }/no-workie.svg