
Automated Link Preview Image Card

Taking screenshots with Playwright

BinHong Lee

May 31, 2022

A while ago, I saw this tweet thread from Simon Willison about how he added social media preview cards to his TILs. He detailed how he did this through a combination of Puppeteer, Vercel, SQLite, and some other stuff I didn’t understand 😅.

At that time, I was taking these screenshots by hand and including them as part of the commit, which is to say it wasn’t very efficient. This stayed on my backlog of “things to explore” for a really long time until this recent long weekend (where I also took some extra PTO), when I finally had some free time to look into it. The goal was simple: automate creating the preview images for a given list of URLs.

Playwright

I know I just mentioned that Simon used Puppeteer for his implementation, but I recently picked up Playwright for another project (GlobeTrotte) and it seemed like the right tool for this. Playwright also has better TypeScript support in general, so I opted for it instead. (If you didn’t know, Playwright is built by the same team of engineers who built Puppeteer, who somehow all moved over from Google to Microsoft to build it. There were some uhh interesting discussions about this when Playwright was first announced publicly.)

The minimum code needed for a working version is pretty short. It looks something like this:

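import { chromium } from "playwright-core";

async function genRun(): Promise<void> {
  const browser = await chromium.launch();
  const env = await browser.newContext({
    // Playwright resolves goto() paths against baseURL with new URL(), so keep
    // the trailing "/" here and a relative path below to stay under /blog/.
    baseURL: "http://localhost:1313/blog/",
    viewport: {
      width: 1200,
      height: 600,
    },
  });
  const page = await env.newPage();
  await page.goto("2022-05-31-automated-link-preview-image");
  // Wait until there have been no network connections for at least 500 ms
  await page.waitForLoadState("networkidle");
  await page.screenshot({
    animations: "disabled",
    fullPage: false,
    type: "jpeg",
    path: "preview/automated_link_preview.jpg",
  });

  await browser.close();
}

genRun().catch((err) => {
  console.error(err);
  process.exit(1);
});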
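One thing to double-check when using baseURL: Playwright’s docs for the baseURL option say page.goto() builds the final address with the standard URL() constructor. A leading "/" therefore resolves against the host root and drops the /blog part. The snippet below is plain TypeScript and needs no Playwright. It shows where a given baseURL and path pair lands. resolveGotoURL is just an illustrative helper name.

```typescript
// Mirrors how Playwright combines baseURL with the path passed to page.goto():
// the standard WHATWG URL constructor, new URL(path, baseURL).
function resolveGotoURL(baseURL: string, target: string): string {
  return new URL(target, baseURL).href;
}

const slug = "2022-05-31-automated-link-preview-image";

// No trailing slash on the base and a leading "/" on the path: "/blog" is dropped.
console.log(resolveGotoURL("http://localhost:1313/blog", `/${slug}`));
// -> http://localhost:1313/2022-05-31-automated-link-preview-image

// Trailing slash on the base and a relative path: the page stays under /blog/.
console.log(resolveGotoURL("http://localhost:1313/blog/", slug));
// -> http://localhost:1313/blog/2022-05-31-automated-link-preview-image
```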

@globetrotte/altimeter

Since I might want to use this in multiple places, I figured it’s probably a good idea to create an npm package for this so I can reuse it everywhere (with just some slight config change).

The package itself is really just a CLI tool for now. First, create a JSON config file based on this doc (depending on what you want from these screenshots). Then run npm i -g @globetrotte/altimeter && altimeter config.json, and the screenshots should show up once the run is complete.

Note: Unfortunately, calling npx @globetrotte/altimeter config.json directly doesn’t work as of now, since it won’t trigger the postinstall script that installs Playwright. It only works if you already have Playwright installed, which can’t be guaranteed.

Alternatively, you can generate the config file programmatically: add the package as a dependency, import AltimeterConfig from it, set the variables, and write the output to a JSON file. Something like this:

import { writeFileSync } from "fs";
import * as path from "path";
import { AltimeterConfig, AltimeterDestination } from "@globetrotte/altimeter";

function genConfig(urls: string[]): void {
  const config = new AltimeterConfig();
  config.baseURL = "http://localhost:1313/blog";

  config.destURLs = urls.map((url) => {
    // You can also do some modification here to set a better "name"
    return new AltimeterDestination(url, url);
  });

  // This line is to add the index page itself (omit if not needed)
  config.destURLs.push(new AltimeterDestination("index", "/"));

  // Folder the screenshots get written into
  config.dir = "dist/preview";
  writeFileSync(path.join(__dirname, "config.json"), JSON.stringify(config, null, 2));
}

Then you can run it with altimeter config.json and it will start taking the screenshots and saving them into the specified folder.
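To make the config fields above a bit more concrete, here is a rough sketch of how a runner could turn them into a list of screenshot jobs. Each job would then be handed to page.goto() and page.screenshot() as in the first snippet. This is not altimeter’s actual source. The Destination and ScreenshotConfig shapes and the planScreenshots helper are hypothetical. The sketch also assumes that the first AltimeterDestination argument is the name and the second is the URL, as the "index" example suggests.

```typescript
import * as path from "path";

// Hypothetical shapes modelled on the fields used in genConfig (baseURL,
// destURLs, dir); these are NOT altimeter's published types.
interface Destination {
  name: string; // used as the output file name
  url: string; // page path relative to baseURL
}

interface ScreenshotConfig {
  baseURL: string;
  destURLs: Destination[];
  dir: string;
}

interface ScreenshotJob {
  pageURL: string; // absolute URL to visit
  outputPath: string; // where the JPEG should be written
}

export function planScreenshots(config: ScreenshotConfig): ScreenshotJob[] {
  // A trailing slash makes relative paths resolve *under* baseURL.
  const base = config.baseURL.endsWith("/") ? config.baseURL : `${config.baseURL}/`;
  return config.destURLs.map((dest) => ({
    // Strip leading slashes so "/" and "/foo" stay under baseURL, not the host root.
    pageURL: new URL(dest.url.replace(/^\/+/, ""), base).href,
    outputPath: path.join(config.dir, `${dest.name}.jpg`),
  }));
}
```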

<meta /> tags

To actually make the images show up when your web page is linked, you need to add the appropriate meta tags for each social media platform that you want to fetch and display the image.

Open Graph Protocol

The Open Graph Protocol is used by many web services (including Facebook, LinkedIn, and Mastodon) to fetch the preview image, so you need to add the following to your site within the <head> element. You can also (optionally) specify the image width and height. In my experience, it doesn’t matter whether the image path is absolute or relative; both seem to work.

<head>
  ...
  <meta property="og:image" content="preview/automated_link_preview.jpg" />
  <meta property="og:image:height" content="600" />
  <meta property="og:image:width" content="1200" />
  ...
</head>
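If your site generator doesn’t already template these tags, a small helper can emit them for each page. Here is a minimal sketch, assuming you pass in the image path plus optional dimensions. openGraphImageTags is a hypothetical name and is not part of altimeter. The helper escapes attribute-unsafe characters so an odd file name can’t break the markup.

```typescript
// Escape characters that would break out of a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build the og:image tags; width/height are optional, as in the example above.
export function openGraphImageTags(image: string, width?: number, height?: number): string {
  const tags = [`<meta property="og:image" content="${escapeAttr(image)}" />`];
  if (height !== undefined) tags.push(`<meta property="og:image:height" content="${height}" />`);
  if (width !== undefined) tags.push(`<meta property="og:image:width" content="${width}" />`);
  return tags.join("\n");
}

console.log(openGraphImageTags("preview/automated_link_preview.jpg", 1200, 600));
```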

Twitter

Twitter has its own documentation for the display card. At the time of writing, it seems like they only accept absolute URLs; relative paths don’t work as intended.

<head>
  ...
  <meta name="twitter:card" content="summary_large_image">
  <meta name="twitter:image" content="https://binhong.me/blog/preview/automated_link_preview.jpg" />
  ...
</head>
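Since Twitter wants an absolute URL, one option is to keep storing relative paths and resolve them against your site’s base URL when rendering the tags. Here is a sketch, assuming https://binhong.me/blog/ as the base (note the trailing slash) and a hypothetical twitterImageTags helper. new URL() leaves an already-absolute image URL untouched, so both forms can be passed in.

```typescript
// Resolve a (possibly relative) image path against the site's base URL and
// emit the Twitter card tags with the absolute result.
export function twitterImageTags(imagePath: string, siteBase: string): string {
  const absolute = new URL(imagePath, siteBase).href;
  return [
    `<meta name="twitter:card" content="summary_large_image" />`,
    `<meta name="twitter:image" content="${absolute}" />`,
  ].join("\n");
}

console.log(twitterImageTags("preview/automated_link_preview.jpg", "https://binhong.me/blog/"));
```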

You can use this card validator built by Twitter to check whether your newly set <meta /> tags are getting picked up. Unlike simply tweeting the link from your profile to see how it looks, the validator always force-fetches the given link and overwrites whatever cache they already have for it.

Google

Honestly, I couldn’t find good documentation on how to set the image properly. I only found this about setting max-image-preview, but nowhere does it say how to tell Google which image to use. Instead, I looked into how some news websites do it. NYTimes uses the image tag as below. The Verge seems to use a Schema.org setup, but inside a <script /> tag (instead of <meta />) labeled with the application/ld+json type. I also checked GitHub (since they do this screenshot-as-preview-image thing really well), but I didn’t see any <meta /> tag that stood out.

<head>
  ...
  <meta name="robots" content="max-image-preview:large">
  <meta name="image" content="https://binhong.me/threads/preview/index.jpg" />
  ...
</head>
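The Schema.org route that The Verge seems to take puts the image into a JSON-LD object inside a <script type="application/ld+json"> tag. Here is a minimal sketch, assuming a BlogPosting type and a hypothetical articleJsonLd helper. The experiments above don’t confirm whether Google actually uses this for its preview image, so treat it as something to try rather than a known fix.

```typescript
// Build a Schema.org BlogPosting as JSON-LD, wrapped in its <script> tag.
export function articleJsonLd(headline: string, imageURL: string): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline,
    image: [imageURL],
  };
  // Escape "<" as \u003c (still valid JSON) so a value containing
  // "</script>" cannot close the tag early.
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

console.log(
  articleJsonLd(
    "Automated Link Preview Image Card",
    "https://binhong.me/blog/preview/automated_link_preview.jpg",
  ),
);
```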

GitHub Action

I currently use the above package as a CLI on my own blog (specifically, the one you’re reading right now). I didn’t want to run a bunch of things manually every time I write a new post before deploying, so instead I “delegated” this part of the job to GitHub Actions.

  1. Add the generated directory to .gitignore
  2. Use Node.js in the GitHub Action (- uses: actions/setup-node@v1)
  3. Make sure the dev server is up and running (in my case, hugo serve -D)
  4. Run npm install
  5. Run npx altimeter config.json
  6. Run build if needed (hugo --buildFuture)
  7. Deploy! 🎉
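Step 3 is the fiddly one in CI. The dev server has to be started in the background, and it must actually be ready before altimeter starts taking screenshots. Below is a minimal sketch of a readiness check that uses only Node’s http module, assuming the site is served over plain HTTP on localhost. waitForServer is a hypothetical helper and is not part of altimeter or GitHub Actions.

```typescript
import * as http from "http";

// Poll `url` until it answers with any HTTP response (resolve true), or give
// up after `timeoutMs` (resolve false). Connection errors trigger a retry.
export function waitForServer(url: string, timeoutMs = 30000, intervalMs = 500): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve) => {
    const attempt = (): void => {
      const req = http.get(url, (res) => {
        res.resume(); // drain the body so the socket is released
        resolve(true);
      });
      req.on("error", () => {
        if (Date.now() >= deadline) resolve(false);
        else setTimeout(attempt, intervalMs);
      });
    };
    attempt();
  });
}
```

For example, running waitForServer("http://localhost:1313/blog/").then((ok) => process.exit(ok ? 0 : 1)) between steps 3 and 5 lets the job fail fast instead of screenshotting an error page.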

Note: If you are generating previews for static content, you might need to set up a server to serve that static content on a localhost address. I recommend following this guide to set up a server (instead of using expressjs).
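If you’d rather not pull in a framework, Node’s built-in http module is enough to serve a folder of static files on localhost for the screenshots. This is a minimal sketch and not necessarily what the guide above does. serveStatic and resolveStaticPath are hypothetical names, and the MIME table only covers a few common types. Directory URLs ending in "/" are mapped to index.html, the way Hugo-style output expects.

```typescript
import * as fs from "fs";
import * as http from "http";
import * as path from "path";

// A small, non-exhaustive extension-to-Content-Type table.
const MIME: Record<string, string> = {
  ".html": "text/html; charset=utf-8",
  ".css": "text/css",
  ".js": "text/javascript",
  ".jpg": "image/jpeg",
  ".png": "image/png",
  ".svg": "image/svg+xml",
};

// Map a request path onto a file inside `root`; null if it escapes `root`
// (e.g. "/../secret") or has malformed percent-encoding.
export function resolveStaticPath(root: string, urlPath: string): string | null {
  let decoded: string;
  try {
    decoded = decodeURIComponent(urlPath.split("?")[0] ?? "");
  } catch {
    return null;
  }
  const rootResolved = path.resolve(root);
  const resolved = path.resolve(rootResolved, "." + decoded);
  if (resolved !== rootResolved && !resolved.startsWith(rootResolved + path.sep)) return null;
  // Pretty URLs: "/foo/" -> "/foo/index.html"
  return decoded.endsWith("/") ? path.join(resolved, "index.html") : resolved;
}

// Serve files under `root` on the given port.
export function serveStatic(root: string, port: number): http.Server {
  return http
    .createServer((req, res) => {
      const file = resolveStaticPath(root, req.url ?? "/");
      if (!file) {
        res.writeHead(403).end();
        return;
      }
      fs.readFile(file, (err, data) => {
        if (err) {
          res.writeHead(404).end();
          return;
        }
        res.writeHead(200, { "Content-Type": MIME[path.extname(file)] ?? "application/octet-stream" });
        res.end(data);
      });
    })
    .listen(port);
}
```

For example, serveStatic("public", 1313) would serve Hugo’s default public output folder at http://localhost:1313/ while altimeter runs.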

Wrap up

This was a fun ride. I did run into a few problems here and there (mainly setting up a monorepo in GlobeTrotte and creating a CLI with TypeScript), but otherwise it was relatively straightforward. The actual coding took less than a day, while writing the documentation (and this blog post) probably took close to double that.