<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<title>Making your work replicable</title>
<meta name="author" content="James Brusey"/>
<meta name="description" content=""/>
<meta name="keywords" content=""/>
<style type="text/css">
.underline { text-decoration: underline; }
</style>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/reveal.js/dist/reveal.css"/>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/reveal.js/dist/theme/white.css" id="theme"/>
<meta name="description" content="J Brusey - enabling supervision">
</head>
<body>
<div class="reveal">
<div class="slides">
<section id="sec-title-slide"><h1 class="title">Making your work replicable</h1><p class="subtitle"></p>
<h2 class="author">James Brusey</h2><h2 class="date">7 March 2024</h2>
</section>
<section>
<section id="slide-org3315257">
<h2 id="org3315257">Motivation</h2>
<ul>
<li>There is a replication crisis in science today</li>
<li>There are plenty of incentives to publish <b>more</b> papers</li>
<li>There are few incentives to publish <b>better</b> papers</li>
<li>Penalties for negligence, bias, and fraud are minor even if offenders are found out</li>
</ul>
<div id="orgaffd63d" class="figure">
<p><img src="science-fictions-book.jpg" alt="science-fictions-book.jpg" />
</p>
</div>
</section>
</section>
<section>
<section id="slide-orgd9d44b9">
<h2 id="orgd9d44b9">Our institutions are implicated</h2>
<ul>
<li>Publishers charge outrageous fees for essentially running a web / archive service</li>
<li>Universities continue to promote staff based on questionable metrics</li>
<li>Professors teach their PhD students to continue to "game" the system</li>
<li>Reviewers reject replication studies as not "sufficiently novel"</li>
<li>Academics collude in citation cartels to bump up each other's citation rankings</li>
</ul>
</section>
</section>
<section>
<section id="slide-orgc54182a">
<h2 id="orgc54182a">The public are starting not to trust academics</h2>
<ul>
<li>Surprisingly, academics are still respected</li>
<li>Public trust is not a given and should not be taken for granted</li>
<li>We (academics) have a responsibility to fix things
<ul>
<li>let's start by improving replicability of our research</li>
</ul></li>
</ul>
</section>
</section>
<section>
<section id="slide-org6a6d052">
<h2 id="org6a6d052">Ideas for improving replicability</h2>
<p>
These ideas are focused on the <i>analysis</i> rather than the experimental work itself.
</p>
</section>
</section>
<section>
<section id="slide-org8c03e15">
<h2 id="org8c03e15">Please stop using Word and Excel</h2>
<ul>
<li>An old version of Excel caused a statistical analysis error during the Covid pandemic
<ul>
<li>but why were they using Excel?</li>
</ul></li>
<li>An analysis of genomics research shows that many studies have fallen prey to gene names such as MARCH1 being converted to dates by Excel's autocorrect</li>
<li>There are many reasons not to use Word, but the main one is that it stops you from automating parts of your research: you end up cutting and pasting figures and tables rather than auto-generating them. <i>Convert away before it is too late.</i></li>
</ul>
</section>
</section>
<section>
<section id="slide-orgd3a1778">
<h2 id="orgd3a1778">Use the command line and GNU Make</h2>
<ul>
<li>An analysis typically ends up having several steps
<ul>
<li>combining multiple data-sets into one</li>
<li>cleaning up NA entries</li>
<li>removing junk entries</li>
<li>summarising data to produce a table</li>
<li>producing a graph</li>
</ul></li>
</ul>
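<p>
The steps listed above can be sketched as a tiny script. This is only an illustration that uses the Python standard library: the sensor data, the <code>NA</code> marker, and the rule that negative values are junk are all assumptions made for the example.
</p>

```python
import csv
import io
import statistics

# Hypothetical raw inputs; in practice these would be separate CSV files.
SITE_A = "sensor,value\ns1,4.0\ns1,NA\ns2,5.5\n"
SITE_B = "sensor,value\ns2,6.5\ns3,-999\ns1,5.0\n"


def read_rows(text):
    """Parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows, junk_below=0.0):
    """Drop NA entries, then drop values below an assumed junk threshold."""
    kept = []
    for row in rows:
        if row["value"] in ("", "NA"):
            continue  # cleaning up NA entries
        value = float(row["value"])
        if value < junk_below:
            continue  # removing junk entries (here, a -999 sentinel)
        kept.append({"sensor": row["sensor"], "value": value})
    return kept


def summarise(rows):
    """Mean value per sensor, keyed by sensor name in sorted order."""
    groups = {}
    for row in rows:
        groups.setdefault(row["sensor"], []).append(row["value"])
    return {name: statistics.mean(vals) for name, vals in sorted(groups.items())}


combined = read_rows(SITE_A) + read_rows(SITE_B)  # combining data-sets
summary = summarise(clean(combined))              # summarising for a table
print(summary)
```

<p>
Keeping each step in its own function (or its own script) is what later allows the steps to be chained into a pipeline.
</p>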
</section>
</section>
<section>
<section id="slide-org01629db">
<h2 id="org01629db">Method for using Make</h2>
<ul>
<li>Each step should be performed with a command or script (e.g., gnuplot)</li>
<li>Form multiple steps into a pipeline with GNU Make</li>
<li>Alongside the many online sources, also see Data Science at the Command Line <a href="https://datascienceatthecommandline.com/">https://datascienceatthecommandline.com/</a></li>
<li>The Python tabulate library can be used to convert a CSV file to a LaTeX table</li>
<li>In your LaTeX file, use <code>\input</code> to include those files</li>
</ul>
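<p>
To show what "convert a CSV to a LaTeX table" amounts to, here is a hand-rolled sketch that uses only the standard library. It is not how tabulate is implemented, the function name <code>csv_to_latex</code> is made up for this example, and it does not escape special LaTeX characters.
</p>

```python
import csv
import io


def csv_to_latex(csv_text):
    """Convert CSV text (first row is the header) to a LaTeX tabular."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    # One left-aligned column per CSV column.
    lines = ["\\begin{tabular}{" + "l" * len(header) + "}", "\\hline"]
    lines.append(" & ".join(header) + " \\\\")
    lines.append("\\hline")
    for row in body:
        lines.append(" & ".join(row) + " \\\\")
    lines += ["\\hline", "\\end{tabular}"]
    return "\n".join(lines)


table = csv_to_latex("Class,mean,std\na,4.5,2.9\nb,0.1,1.0\n")
print(table)
```

<p>
In a real pipeline the string would be written to a file such as <code>result.tex</code>, which the LaTeX document then pulls in with <code>\input</code>.
</p>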
</section>
</section>
<section>
<section id="slide-org50d64ad">
<h2 id="org50d64ad">Example&#x2014;generating data</h2>
<p>
For example, say we have a script called <code>gen.py</code> that generates some data files <code>a.csv</code>, <code>b.csv</code>, and <code>c.csv</code>
</p>
<div class="org-src-container">
<pre class="src src-python" ><code trim><span style="color: #b6a0ff;">import</span> pandas <span style="color: #b6a0ff;">as</span> pd
<span style="color: #b6a0ff;">import</span> numpy <span style="color: #b6a0ff;">as</span> np
<span style="color: #00d3d0;">SZ</span>=(20,)
<span style="color: #00d3d0;">df</span> = pd.DataFrame(np.random.randint(0, 10, size=SZ), columns=[<span style="color: #79a8ff;">"value"</span>])
df.to_csv(<span style="color: #79a8ff;">"a.csv"</span>, index=<span style="color: #00bcff;">False</span>)
<span style="color: #00d3d0;">df</span> = pd.DataFrame(np.random.normal(0, 1, size=SZ), columns=[<span style="color: #79a8ff;">"value"</span>])
df.to_csv(<span style="color: #79a8ff;">"b.csv"</span>, index=<span style="color: #00bcff;">False</span>)
<span style="color: #00d3d0;">df</span> = pd.DataFrame(np.random.normal(5, 3, size=SZ), columns=[<span style="color: #79a8ff;">"value"</span>])
df.to_csv(<span style="color: #79a8ff;">"c.csv"</span>, index=<span style="color: #00bcff;">False</span>)
</code></pre>
</div>
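<p>
A script like <code>gen.py</code> draws different random numbers on every run. If the generated data should itself be replicable, one option is to fix the seed. The sketch below uses the standard library's <code>random.Random</code> rather than NumPy; the function name <code>generate</code> and the seed values are arbitrary choices for illustration.
</p>

```python
import random


def generate(seed, n=20):
    """Return n pseudo-random values from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.gauss(5, 3) for _ in range(n)]


first = generate(seed=42)
second = generate(seed=42)
different = generate(seed=43)

print(first == second)     # same seed, same data
print(first == different)  # different seed, different data
```

<p>
With NumPy, the analogous approach would be to create a generator with <code>np.random.default_rng(seed)</code> and draw all values from it.
</p>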
</section>
</section>
<section>
<section id="slide-org0965c80">
<h2 id="org0965c80">Example&#x2014;combine data</h2>
<p>
We might then have another script <code>comb.py</code> to combine them.
</p>
<div class="org-src-container">
<pre class="src src-python" ><code trim><span style="color: #b6a0ff;">import</span> pandas <span style="color: #b6a0ff;">as</span> pd
<span style="color: #b6a0ff;">import</span> numpy <span style="color: #b6a0ff;">as</span> np
<span style="color: #00d3d0;">newframe</span> = {}
<span style="color: #b6a0ff;">for</span> f <span style="color: #b6a0ff;">in</span> [<span style="color: #79a8ff;">"a"</span>, <span style="color: #79a8ff;">"b"</span>, <span style="color: #79a8ff;">"c"</span>]:
    <span style="color: #00d3d0;">newframe</span>[f] = pd.read_csv(f<span style="color: #79a8ff;">"</span>{f}<span style="color: #79a8ff;">.csv"</span>)[<span style="color: #79a8ff;">"value"</span>]
<span style="color: #00d3d0;">df</span> = pd.DataFrame(newframe)
df.to_csv(<span style="color: #79a8ff;">"all.csv"</span>, index=<span style="color: #00bcff;">False</span>)
</code></pre>
</div>
</section>
</section>
<section>
<section id="slide-org477553f">
<h2 id="org477553f">Example&#x2014;table</h2>
<p>
We can produce a table using the Python tabulate library in a script called <code>maketable.py</code>
</p>
<div class="org-src-container">
<pre class="src src-python" ><code trim><span style="color: #b6a0ff;">import</span> pandas <span style="color: #b6a0ff;">as</span> pd
<span style="color: #b6a0ff;">import</span> numpy <span style="color: #b6a0ff;">as</span> np
<span style="color: #b6a0ff;">import</span> tabulate
<span style="color: #00d3d0;">df</span> = pd.read_csv(<span style="color: #79a8ff;">"all.csv"</span>)
<span style="color: #00d3d0;">result</span> = pd.melt(df).groupby(<span style="color: #79a8ff;">"variable"</span>).agg([<span style="color: #79a8ff;">"mean"</span>, <span style="color: #79a8ff;">"std"</span>])
<span style="color: #b6a0ff;">with</span> <span style="color: #f78fe7;">open</span>(<span style="color: #79a8ff;">"result.tex"</span>, <span style="color: #79a8ff;">"w"</span>) <span style="color: #b6a0ff;">as</span> f:
    <span style="color: #f78fe7;">print</span>(
        tabulate.tabulate(result, tablefmt=<span style="color: #79a8ff;">"latex"</span>,
                          headers=[<span style="color: #79a8ff;">"Class"</span>, <span style="color: #79a8ff;">"mean"</span>, <span style="color: #79a8ff;">"std"</span>]),
        <span style="color: #f78fe7;">file</span>=f,
    )
</code></pre>
</div>
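<p>
The melt, group-by, and aggregate chain in the script above may be easier to follow when spelled out by hand. The sketch below mimics it with plain Python on a small made-up table; it illustrates the idea and is not how pandas does it internally.
</p>

```python
import statistics

# A small made-up wide table shaped like all.csv: one column per data-set.
wide = {"a": [1, 2, 3], "b": [2.0, 4.0, 6.0]}

# "Melt": turn the wide table into long (variable, value) pairs.
long_rows = [(name, v) for name, values in wide.items() for v in values]

# "Group by variable": collect the values belonging to each variable.
groups = {}
for name, v in long_rows:
    groups.setdefault(name, []).append(v)

# "Aggregate": mean and sample standard deviation per group.
result = {
    name: (statistics.mean(vals), statistics.stdev(vals))
    for name, vals in groups.items()
}
print(result)
```

<p>
<code>statistics.stdev</code> computes the sample standard deviation, which matches the default used by the pandas <code>std</code> aggregation.
</p>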
</section>
</section>
<section>
<section id="slide-org12f4bc7">
<h2 id="org12f4bc7">Example&#x2014;graph</h2>
<p>
Finally, we might use <code>graph.py</code> to plot <code>a</code> versus <code>b</code> (ok, this is not a very meaningful graph!)
</p>
<div class="org-src-container">
<pre class="src src-python" ><code trim><span style="color: #b6a0ff;">import</span> pandas <span style="color: #b6a0ff;">as</span> pd
<span style="color: #b6a0ff;">import</span> numpy <span style="color: #b6a0ff;">as</span> np
<span style="color: #b6a0ff;">import</span> matplotlib.pyplot <span style="color: #b6a0ff;">as</span> plt
<span style="color: #00d3d0;">df</span> = pd.read_csv(<span style="color: #79a8ff;">"all.csv"</span>)
df.plot(x=<span style="color: #79a8ff;">"a"</span>, y=<span style="color: #79a8ff;">"b"</span>, kind=<span style="color: #79a8ff;">"scatter"</span>)
plt.savefig(<span style="color: #79a8ff;">"graph.png"</span>)
</code></pre>
</div>
<div id="org8cb57d0" class="figure">
<p><img src="graph.png" alt="graph.png" />
</p>
</div>
</section>
</section>
<section>
<section id="slide-orgcd1c2d5">
<h2 id="orgcd1c2d5">Example&#x2014;LaTeX doc</h2>
<p>
Naturally, we need a LaTeX document:
</p>
<div class="org-src-container">
<pre class="src src-latex" ><code trim><span style="color: #b6a0ff;">\documentclass</span>{<span style="color: #f78fe7;">article</span>}
<span style="color: #b6a0ff;">\usepackage</span>{<span style="color: #f78fe7;">siunitx</span>}
<span style="color: #b6a0ff;">\usepackage</span>{<span style="color: #f78fe7;">graphicx</span>}
<span style="color: #b6a0ff;">\title</span>{<span style="color: #feacd0;">My great article</span>}
<span style="color: #b6a0ff;">\author</span>{James Brusey}
<span style="color: #b6a0ff;">\begin</span>{<span style="color: #feacd0;">document</span>}
<span style="color: #b6a0ff;">\maketitle</span>
<span style="color: #b6a0ff;">\section</span>{<span style="color: #feacd0;">Introduction</span>}
Blah blah blah.
<span style="color: #b6a0ff;">\section</span>{<span style="color: #feacd0;">Results</span>}
<span style="color: #b6a0ff;">\input</span>{<span style="color: #f78fe7;">result</span>}
<span style="color: #b6a0ff;">\begin</span>{<span style="color: #feacd0;">figure</span>}
<span style="color: #b6a0ff;">\includegraphics</span>{<span style="color: #f78fe7;">graph.png</span>}
<span style="color: #b6a0ff;">\caption</span>{A scatter plot}
<span style="color: #b6a0ff;">\end</span>{<span style="color: #feacd0;">figure</span>}
<span style="color: #b6a0ff;">\end</span>{<span style="color: #feacd0;">document</span>}
</code></pre>
</div>
</section>
</section>
<section>
<section id="slide-orgf3f6d59">
<h2 id="orgf3f6d59">Example&#x2014;Makefile</h2>
<p>
Finally, we tie everything together with a <code>Makefile</code>
</p>
<div class="org-src-container">
<pre class="src src-makefile" ><code trim><span style="color: #feacd0;">article.pdf</span>: article.tex result.tex graph.png
	pdflatex article.tex
<span style="color: #feacd0;">graph.png</span>: all.csv graph.py
	python graph.py
<span style="color: #feacd0;">result.tex</span>: all.csv maketable.py
	python maketable.py
<span style="color: #feacd0;">all.csv</span>: a.csv b.csv c.csv comb.py
	python comb.py
<span style="color: #feacd0;">a.csv b.csv c.csv</span>: gen.py
	python gen.py
</code></pre>
</div>
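<p>
The reason a Makefile saves work is that Make only reruns a step when its output is missing or older than one of its inputs. The following is a simplified model of that check, written in Python for illustration; the function name <code>needs_rebuild</code> is made up, and real Make does much more (for example, it follows chains of rules).
</p>

```python
import os
import tempfile


def needs_rebuild(target, prerequisites):
    """Return True if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_time for p in prerequisites)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "all.csv")
    out = os.path.join(tmp, "graph.png")
    open(src, "w").close()
    missing = needs_rebuild(out, [src])  # target does not exist yet
    open(out, "w").close()
    os.utime(src, (1000, 1000))          # pretend all.csv is old
    os.utime(out, (2000, 2000))          # graph.png was built afterwards
    fresh = needs_rebuild(out, [src])
    os.utime(src, (3000, 3000))          # all.csv was edited later
    stale = needs_rebuild(out, [src])

print(missing, fresh, stale)
```

<p>
In the Makefile above, this is why editing <code>graph.py</code> triggers only the graph and the PDF to be regenerated, while the table is left alone.
</p>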
</section>
</section>
<section>
<section id="slide-orgb407890">
<h2 id="orgb407890">Using RStudio</h2>
<ul>
<li>RStudio allows you to put all the steps into a notebook form</li>
<li>The result can be exported to a LaTeX document</li>
<li>Best for R but difficult to format for a paper</li>
<li>A great resource for R and the tidyverse is R for Data Science <a href="https://r4ds.had.co.nz/">https://r4ds.had.co.nz/</a></li>
<li>You can also use Pandoc separately from RStudio</li>
</ul>
</section>
</section>
<section>
<section id="slide-org77308f4">
<h2 id="org77308f4">RStudio example</h2>
<div class="org-src-container">
<pre class="src src-markdown" ><code trim>---
title: "Example rmarkdown document"
date: "24/03/2022"
author:
  - James Brusey
  - Ann Other Author
documentclass: scrartcl
classoption: twoside
geometry: false
subtitle: false
output:
  pdf_document:
    includes:
      in_header: header.tex
---
# Introduction
This is a sample markdown document.
I can make a new paragraph using a blank line and a numbered list just with:
1. this item
2. this other item
3. and so forth
Math symbols are also easy either inline $K: \Re \times \Re \rightarrow \{0, 1\}$ or as display math,
$$ x = \int _{0} ^{\infty} \frac{1}{y^2}. $$
In addition, you can use the power of R to process your data-set and display results as tables or graphs.
```{r cars, warning=FALSE}
library(knitr)
kable(summary(cars))
```
## Including Plots
You can also embed plots, for example:
```{r pressure, echo=FALSE}
plot(pressure)
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see &lt;http://rmarkdown.rstudio.com&gt;.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
</code></pre>
</div>
</section>
</section>
<section>
<section id="slide-org5de2597">
<h2 id="org5de2597">Use Jupyter Notebook</h2>
<ul>
<li>Jupyter notebook supports Python and several other languages</li>
<li>As with RStudio, it can produce LaTeX by combining code, graphs, and markdown</li>
<li>There are no easy options for changing the document class, so not really a viable option for writing papers</li>
</ul>
</section>
</section>
<section>
<section id="slide-org8cb8d12">
<h2 id="org8cb8d12">Use Emacs org-mode</h2>
<ul>
<li>Org mode is a powerful editing environment that comes with Emacs</li>
<li>Org mode documents are similar to R Markdown (or pandoc) documents, with easy formatting instructions</li>
<li>More flexible than R Markdown</li>
<li>Easy to change document class</li>
<li>Can include many different programming languages within a single document</li>
</ul>
</section>
</section>
<section>
<section id="slide-orgda9e82e">
<h2 id="orgda9e82e">Using a different LaTeX class</h2>
<ol>
<li>Your org mode document will need <code>#+latex_class: IEEEtran</code></li>
<li>You'll need to configure Emacs using something like:</li>
</ol>
<div class="org-src-container">
<pre class="src src-emacs-lisp" ><code trim>(add-to-list 'org-latex-classes
             '(<span style="color: #79a8ff;">"IEEEtran"</span>
               <span style="color: #79a8ff;">"\\documentclass[10pt]{IEEEtran}"</span>
               (<span style="color: #79a8ff;">"\\section{%s}"</span> . <span style="color: #79a8ff;">"\\section*{%s}"</span>)
               (<span style="color: #79a8ff;">"\\subsection{%s}"</span> . <span style="color: #79a8ff;">"\\subsection*{%s}"</span>)
               (<span style="color: #79a8ff;">"\\subsubsection{%s}"</span> . <span style="color: #79a8ff;">"\\subsubsection*{%s}"</span>)
               (<span style="color: #79a8ff;">"\\paragraph{%s}"</span> . <span style="color: #79a8ff;">"\\paragraph*{%s}"</span>)))
</code></pre>
</div>
</section>
</section>
<section>
<section id="slide-orgc0f65bb">
<h2 id="orgc0f65bb">Further reading</h2>
<ol>
<li>I thoroughly recommend Science Fictions (<a href="#citeproc_bib_item_2">Ritchie, 2020</a>)</li>
<li>John Kitchin has a nice article on embedding data into PDFs (<a href="#citeproc_bib_item_1">Kitchin, 2015</a>)</li>
<li>He also has a YouTube video describing org mode for research <a href="https://youtu.be/1-dUkyn_fZA">https://youtu.be/1-dUkyn_fZA</a></li>
</ol>
</section>
<section id="slide-orgc1ab15e">
<h3 id="orgc1ab15e">References</h3>
<div class="csl-bib-body">
<div class="csl-entry"><a id="citeproc_bib_item_1"></a>Kitchin, J.R. (2015) “Examples of Effective Data Sharing in Scientific Publishing,” <i>ACS Catalysis</i>, 5(6), pp. 3894–3899. Available at: <a href="https://doi.org/10.1021/acscatal.5b00538">https://doi.org/10.1021/acscatal.5b00538</a>.</div>
<div class="csl-entry"><a id="citeproc_bib_item_2"></a>Ritchie, S. (2020) <i>Science fictions: Exposing fraud, bias, negligence and hype in science</i>. Random House.</div>
</div>
</section>
</section>
</div>
</div>
<script src="https://cdn.jsdelivr.net/npm/reveal.js/dist/reveal.js"></script>
<script src="https://cdn.jsdelivr.net/npm/reveal.js/plugin/markdown/markdown.js"></script>
<script src="https://cdn.jsdelivr.net/npm/reveal.js/plugin/zoom/zoom.js"></script>
<script src="https://cdn.jsdelivr.net/npm/reveal.js/plugin/notes/notes.js"></script>
<script>
// Full list of configuration options available here:
// https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
plugins: [RevealMarkdown, RevealZoom, RevealNotes],
width:1200, height:1000, margin: 0.1, minScale:0.2, maxScale:2.5, transition:'cube', slideNumber:true
});
</script>
</body>
</html>