Links like [link text]({{ page.root }}/destination.html) are not parsed correctly by our commonmark parser and are output as text. Use this to find these missing links and transform them into link or image elements.

## Usage

fix_links(body)

text_to_links(txt, ns = NULL, type, sourcepos = NULL)

find_between_nodes(a, b, include = TRUE)

## Arguments

body

an XML document.

ns

a namespace object

node

a node determined to be a text representation of a link destination

txt

text derived from xml2::xml_text()

type

sourcepos

defaults to NULL. If this is not NULL, it's the sourcepos attribute of the text node(s) and will be applied to the new nodes.

pattern

a regular expression that is used for splitting the link from the surrounding text.

## Value

• fix_links(): the modified body

• find_broken_links(): a list where each element represents a fragmented link. Inside each element are two elements:

  • parent: the parent paragraph node for the link

  • nodes: the series of four or five nodes that make up the link text

• get_link_fragments(): the preceding three or four nodes, which will be the text of the link or the alt text of the image.

• text_to_links(): if ns is NULL, a character vector of XML text nodes; otherwise, new XML text nodes.

## Details

### Motivation

Jekyll implements the Liquid template language, which can break some syntax expected by commonmark. If this syntax appears in a link context, that link is rendered as text. Carpentries Lessons created before 2023 use Jekyll and have this templating embedded in many links.

To convert a pre-Workbench lesson to use The Workbench, we need to make sure all the links are accurately represented, so that invalid syntax and broken links do not sneak into the lesson.

### Implementation Details

For example, a valid line with a link that looks like [Home](index.html) and other text will appear in XML as:

...
<link destination="index.html" title="">
  <text>Home</text>
</link>
<text> and other text</text>
...

However, if a link uses Liquid templating for a variable, as in [Home]({{ page.root }}/index.html) and other text, it will appear in XML as:

...
<text asis="true">[</text>
<text>Home</text>
<text asis="true">]</text>
<text>({{ page.root }}/index.html) and other text</text>
...

Note: the nodes with asis attributes are from tinkr protecting square brackets. When we run fix_links(), these nodes are collapsed into a link:

...
<link destination="{{ page.root }}/index.html">
  <text>Home</text>
</link>
<text> and other text</text>
...

And with that we can further transform the link to replace the liquid templating with something that makes sense in sandpaper.
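As a minimal sketch of that follow-up transformation, the snippet below strips a leading {{ page.root }} variable from link destinations using base R. The helper name strip_page_root() and the decision to simply drop the variable are assumptions made for illustration; they are not part of pegboard, and the real conversion may treat other Liquid variables differently.

```r
# Hypothetical helper (not part of pegboard): drop a leading
# `{{ page.root }}` variable, plus an optional slash, from a destination.
strip_page_root <- function(dest) {
  sub("^\\{\\{\\s*page\\.root\\s*\\}\\}/?", "", dest, perl = TRUE)
}

dest <- c("{{ page.root }}/index.html", "https://example.com/")

# Only the first destination starts with the Liquid variable, so only it changes.
cleaned <- strip_page_root(dest)
print(cleaned)
```

Destinations that do not start with the variable pass through unchanged, so the helper can be applied to every link in a document.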

find_broken_links() uses the pattern generated by make_link_patterns() to search for potential links.

fix_broken_links() uses the output of find_broken_links() to replace the node fragments with links.

make_link_patterns() is a generator that creates an XPath query that searches for Liquid markup following a closing bracket.
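To see what such a query matches, here is a minimal sketch that runs the XPath expression printed in the Examples section against a hand-written copy of the fragmented XML shown earlier. It assumes the xml2 package is installed; the XML string and the variable names are illustrative and are not pegboard internals.

```r
library(xml2)

# Hand-written commonmark-style XML that mimics a fragmented Liquid link.
xml <- '<document xmlns="http://commonmark.org/xml/1.0">
<paragraph>
<text asis="true">[</text>
<text>Home</text>
<text asis="true">]</text>
<text>({{ page.root }}/index.html) and other text</text>
</paragraph>
</document>'
doc <- read_xml(xml)

# Bind the default namespace to the "md" prefix used in the query.
ns <- xml_ns_rename(xml_ns(doc), d1 = "md")

# The query as printed by make_link_patterns() in the Examples section.
xpath <- ".//md:text[@asis][text()=']']/following-sibling::md:text[(contains(text(), '({{') and contains(text(), '}}'))]"

# Text nodes that follow a protected "]" and contain a Liquid destination.
hits <- xml_find_all(doc, xpath, ns)
print(xml_text(hits))
```

The query returns the node holding the destination, not the whole link; the bracket and link-text nodes that precede it still have to be collected separately.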

get_link_fragment_nodes(): Get the source for the link node fragments

fix_broken_link() takes a set of nodes that comprises a single link and recomposes them into a link or image node.
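The recomposition step can be pictured at the string level. The sketch below is an assumption-laden illustration in base R, not pegboard's implementation: it takes the text of the four fragment nodes from the example above, covers only the plain link case (not images), and does no XML escaping.

```r
# Text content of the four fragment nodes from the example above.
fragments <- c("[", "Home", "]", "({{ page.root }}/index.html) and other text")

# The last fragment holds the destination followed by any trailing text.
last <- fragments[length(fragments)]
dest <- sub("^\\(([^)]*)\\).*$", "\\1", last)  # text inside the parentheses
rest <- sub("^\\([^)]*\\)", "", last)          # whatever follows the ")"

# Everything between the "[" and "]" fragments is the link text.
link_text <- paste(fragments[seq(2, length(fragments) - 2)], collapse = "")

# Recompose as an XML string (illustrative only).
link_xml <- sprintf('<link destination="%s"><text>%s</text></link>', dest, link_text)
print(c(link = link_xml, remaining_text = rest))
```

Working on node objects rather than strings, as the function descriptions suggest pegboard does, would also let attributes such as sourcepos be carried over to the new node.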

links_within_text_regex(): finding different types of links within markdown text can be challenging because it involves characters used in regex for grouping and character classes. In general, I want to do two things with text that I get back from a document:

1. split the links out from the text

2. identify which parts of the resulting vector are links.

This way, I can convert the links to links and the text to text.
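The two steps can be sketched in base R as follows. This is a simplified illustration that uses a single assumed pattern with regmatches() rather than the lookaround-based patterns returned by links_within_text_regex(); the pattern and variable names are not taken from pegboard.

```r
txt <- "text ![image text](a.png) with [a link](b.org) and text"

# Assumed simplified pattern: optional "!", bracketed text, then a
# parenthesised destination. It does not handle nested brackets.
link_rx <- "!?\\[[^]]*\\]\\([^)]*\\)"

# Step 1: split the links out from the text.
m <- gregexpr(link_rx, txt, perl = TRUE)
links <- regmatches(txt, m)[[1]]
between <- regmatches(txt, m, invert = TRUE)[[1]]

# Interleave so that the pieces keep their original order.
pieces <- character(length(links) + length(between))
pieces[seq_along(between) * 2 - 1] <- between
pieces[seq_along(links) * 2] <- links

# Step 2: identify which pieces are links.
is_link <- grepl(paste0("^", link_rx, "$"), pieces, perl = TRUE)

print(data.frame(pieces, is_link))
```

Pasting the pieces back together reproduces the input, so no text is lost when links and plain text are converted to separate nodes.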

text_to_links(): Splits links away from text and returns a nodeset to insert

make_link(): makes a link depending on the link type

## Examples

loop <- fs::path(lesson_fragment(), "_episodes", "14-looping-data-sets.md")
e <- Episode$new(loop, fix_links = FALSE)
e$links  # five links
#> {xml_nodeset (5)}
#> [1] <link sourcepos="36:8-36:75" destination="https://docs.python.org/3/libra ...
#> [2] <link sourcepos="42:25-42:77" destination="https://docs.python.org/3/libr ...
#> [3] <link sourcepos="43:9-43:61" destination="https://docs.python.org/3/libra ...
#> [4] <link sourcepos="125:17-125:118" destination="https://pandas.pydata.org/p ...
#> [5] <link sourcepos="148:62-148:129" destination="https://docs.python.org/3/l ...
e$images # four images
#> {xml_nodeset (4)}
#> [1] <html_block sourcepos="174:1-174:86" xml:space="preserve">&lt;img src="ht ...
#> [2] <html_block sourcepos="176:1-176:49" xml:space="preserve">&lt;img src=".. ...
#> [3] <image sourcepos="180:1-180:74" destination="https://carpentries.org/asse ...
#> [4] <image sourcepos="182:1-182:38" destination="../no-workie.svg" title="">\ ...
# fix_links() ---------------------------------------------------------------
asNamespace("pegboard")$fix_links(e$body)
e$links  # eight links
#> {xml_nodeset (8)}
#> [1] <link sourcepos="36:8-36:75" destination="https://docs.python.org/3/libra ...
#> [2] <link sourcepos="42:25-42:77" destination="https://docs.python.org/3/libr ...
#> [3] <link sourcepos="43:9-43:61" destination="https://docs.python.org/3/libra ...
#> [4] <link sourcepos="125:17-125:118" destination="https://pandas.pydata.org/p ...
#> [5] <link sourcepos="148:62-148:129" destination="https://docs.python.org/3/l ...
#> [6] <link xmlns="http://commonmark.org/xml/1.0" destination="{{ page.root }}/ ...
#> [7] <link xmlns="http://commonmark.org/xml/1.0" destination="{{ site.swc_page ...
#> [8] <link xmlns="http://commonmark.org/xml/1.0" destination="{{ page.root }}{ ...
e$images # five images
#> {xml_nodeset (5)}
#> [1] <html_block sourcepos="174:1-174:86" xml:space="preserve">&lt;img src="ht ...
#> [2] <html_block sourcepos="176:1-176:49" xml:space="preserve">&lt;img src=".. ...
#> [3] <image sourcepos="180:1-180:74" destination="https://carpentries.org/asse ...
#> [4] <image sourcepos="182:1-182:38" destination="../no-workie.svg" title="">\ ...
#> [5] <image xmlns="http://commonmark.org/xml/1.0" destination="{{ page.root }} ...
asNamespace("pegboard")$make_link_patterns()
#> .//md:text[@asis][text()=']']/following-sibling::md:text[(contains(text(), '({{') and contains(text(), '}}'))]

helpers <- asNamespace("pegboard")$links_within_text_regex()
helpers
#>                                         to_split
#> "(?<!(\\]|\\)|\\!))\$|\$(?!(\\]|\$|\$$))|\$$"
#>                                       find_links
#> "(?<!\\[)\$(\\[|\\()"
txt <- "text ![image text](a.png) with [a link](b.org) and text"
res <- strsplit(txt, helpers["to_split"], perl = TRUE)[[1]]
data.frame(res)
#>                        res
#> 1 text ![image text](a.png
#> 2                    with
#> 4                 and text
#> [1]  TRUE FALSE  TRUE FALSE