Converting pandas DataFrames into Publication-Ready Tables

Analysing, comparing and communicating the predictive performance of machine learning models is a crucial component of any empirical research effort. Pandas, a staple in the Python data analysis stack, not only helps with the data wrangling itself, but also provides efficient solutions for data presentation. Two of its lesser-known yet incredibly useful features are df.to_markdown() and df.to_latex(), which allow for a seamless transition from DataFrames to publication-ready tables. Here’s how you can use them!

Exporting DataFrames to Markdown

Markdown is widely used for its simplicity and readability, making it a go-to format for rendering your GitHub README or rebuttals on OpenReview. With the df.to_markdown() method, you can turn any DataFrame into a Markdown table with a single line of code.

import pandas as pd

# construct example DataFrame
results = pd.DataFrame(
    {
        "model": [
            "random forest", 
            "support vector machine", 
            "multi-layer perceptron"
            ],
        "AUC-ROC": [0.83, 0.79, 0.81],
        "AUC-PRC": [0.46, 0.48, 0.49],
        "ECE": [0.04, 0.09, 0.05],
        "runtime": [0.004, 0.003, 0.01],
    }
)

# convert it to Markdown
print(results.to_markdown(index=False))

This Markdown table can then be copied into any Markdown editor or platform that supports it (such as this website) and will be rendered as a neat table.

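| model                  |   AUC-ROC |   AUC-PRC |   ECE |   runtime |
|:-----------------------|----------:|----------:|------:|----------:|
| random forest          |      0.83 |      0.46 |  0.04 |     0.004 |
| support vector machine |      0.79 |      0.48 |  0.09 |     0.003 |
| multi-layer perceptron |      0.81 |      0.49 |  0.05 |     0.01  |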

df.to_markdown() relies on the tabulate library, an optional dependency that has to be installed separately. Tabulate also lets you choose between a range of table styles via the tablefmt argument, e.g. tablefmt="grid" for a text grid like this:

+------------------------+-----------+-----------+-------+-----------+
| model                  |   AUC-ROC |   AUC-PRC |   ECE |   runtime |
+========================+===========+===========+=======+===========+
| random forest          |      0.83 |      0.46 |  0.04 |     0.004 |
+------------------------+-----------+-----------+-------+-----------+
| support vector machine |      0.79 |      0.48 |  0.09 |     0.003 |
+------------------------+-----------+-----------+-------+-----------+
| multi-layer perceptron |      0.81 |      0.49 |  0.05 |     0.01  |
+------------------------+-----------+-----------+-------+-----------+

Exporting DataFrames to LaTeX

LaTeX is the de facto standard for typesetting machine learning papers. The df.to_latex() method converts a DataFrame into a LaTeX tabular environment, which can be included directly in your LaTeX documents. Using the same example as above

import pandas as pd

# construct example DataFrame
results = pd.DataFrame(
    {
        "model": [
            "random forest", 
            "support vector machine", 
            "multi-layer perceptron"
            ],
        "AUC-ROC": [0.83, 0.79, 0.81],
        "AUC-PRC": [0.46, 0.48, 0.49],
        "ECE": [0.04, 0.09, 0.05],
        "runtime": [0.004, 0.003, 0.01],
    }
)

# convert it to LaTeX
print(results.to_latex(index=False))

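we can generate a tabular environment that is ready to be pasted into your document. Note that the output uses the booktabs rules \toprule, \midrule and \bottomrule, so your preamble needs \usepackage{booktabs}.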

Like df.to_markdown(), df.to_latex() is quite flexible and lets you customise the LaTeX output to a great extent. Here are some of the formatting options you can use to, for example, align columns, add a caption and label, and standardise number formatting:

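# Custom LaTeX table with specialized formatting
latex_output = results.to_latex(
    index=False,
    column_format="|l|r|r|r|r|",  # left-align names, right-align numbers
    caption="Model Performance Metrics.",
    label="tab:model_performance",
    multicolumn_format="c",  # only takes effect for MultiIndex columns
    escape=False,  # leave LaTeX special characters unescaped
    header=["Model", "AUC-ROC", "AUC-PRC", "ECE", "Runtime (s)"],
    float_format="%.4f",  # print every float with four decimals
)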

print(latex_output)

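resulting in a table environment with the caption and label attached, vertical rules between the columns, and every float printed with four decimal places (0.83 becomes 0.8300).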

Conclusion

Gone are the days of manually copy-pasting your results into Overleaf! Both df.to_markdown() and df.to_latex() are straightforward yet highly customisable tools that let you easily compile and present your results for papers, blog posts and GitHub documentation.
