Analysing, comparing and communicating the predictive performance of machine learning models is a crucial component of any empirical research effort. Pandas, a staple of the Python data analysis stack, not only helps with the data wrangling itself, but also provides efficient solutions for data presentation. Two of its lesser-known yet incredibly useful features are df.to_markdown() and df.to_latex(), which allow for a seamless transition from DataFrames to publication-ready tables. Here’s how you can use them!
Exporting DataFrames to Markdown
Markdown is widely used for its simplicity and readability, making it a go-to format for your GitHub README or rebuttals on OpenReview. With the df.to_markdown() method, you can turn any DataFrame into a Markdown table with a single line of code.
```python
import pandas as pd

# construct example DataFrame
results = pd.DataFrame(
    {
        "model": [
            "random forest",
            "support vector machine",
            "multi-layer perceptron"
        ],
        "AUC-ROC": [0.83, 0.79, 0.81],
        "AUC-PRC": [0.46, 0.48, 0.49],
        "ECE": [0.04, 0.09, 0.05],
        "runtime": [0.004, 0.003, 0.01],
    }
)

# convert it to Markdown
print(results.to_markdown(index=False))
```
This Markdown table can then be copied into any Markdown editor or platform that supports it (such as this website) and will be rendered as a neat table.
| model | AUC-ROC | AUC-PRC | ECE | runtime |
|---|---|---|---|---|
| random forest | 0.83 | 0.46 | 0.04 | 0.004 |
| support vector machine | 0.79 | 0.48 | 0.09 | 0.003 |
| multi-layer perceptron | 0.81 | 0.49 | 0.05 | 0.01 |
This method relies on the tabulate library (an optional dependency that has to be installed alongside pandas), which additionally allows you to specify a range of different table styles using the tablefmt argument, e.g. a text grid like this:
```text
+------------------------+-----------+-----------+-------+-----------+
| model                  |   AUC-ROC |   AUC-PRC |   ECE |   runtime |
+========================+===========+===========+=======+===========+
| random forest          |      0.83 |      0.46 |  0.04 |     0.004 |
+------------------------+-----------+-----------+-------+-----------+
| support vector machine |      0.79 |      0.48 |  0.09 |     0.003 |
+------------------------+-----------+-----------+-------+-----------+
| multi-layer perceptron |      0.81 |      0.49 |  0.05 |     0.01  |
+------------------------+-----------+-----------+-------+-----------+
```
Exporting DataFrames to LaTeX
LaTeX is the de facto standard for typesetting machine learning papers. The df.to_latex() method converts a DataFrame into a LaTeX tabular environment that can be included directly in your LaTeX documents. Using the same example as above
```python
import pandas as pd

# construct example DataFrame
results = pd.DataFrame(
    {
        "model": [
            "random forest",
            "support vector machine",
            "multi-layer perceptron"
        ],
        "AUC-ROC": [0.83, 0.79, 0.81],
        "AUC-PRC": [0.46, 0.48, 0.49],
        "ECE": [0.04, 0.09, 0.05],
        "runtime": [0.004, 0.003, 0.01],
    }
)

# convert it to LaTeX
print(results.to_latex(index=False))
```
we can generate the LaTeX source for the corresponding table.
Similar to df.to_markdown(), df.to_latex() is quite flexible and allows you to customise the LaTeX table output to a great extent. Here are some of the formatting options you can use with df.to_latex() to, for example, align columns, add captions and labels, and standardise number formatting:
```python
# Custom LaTeX table with specialized formatting
latex_output = results.to_latex(
    index=False,
    column_format='|l|r|r|r|r|',
    caption='Model Performance Metrics.',
    label='tab:model_performance',
    multicolumn_format='c',
    escape=False,
    header=[
        'Model', 'AUC-ROC', 'AUC-PRC', 'ECE', 'Runtime (s)'
    ],
    float_format="%.4f",
)
print(latex_output)
```
resulting in a captioned, labelled table in which every value is printed to four decimal places.
Conclusion
Gone are the days of manually copy-pasting your results into Overleaf! Both df.to_markdown() and df.to_latex() are straightforward yet highly customisable tools that allow you to easily compile and present your results for papers, blog posts and GitHub documentation.