
How to Own the Data of Your Research Repository – and Why It Matters

Daniel Becker

One of Dinghy’s most important resources is our research repository. It holds years of insights. We use it to inform and improve our design work on every project.

That information was at risk, and we didn’t realize it until recently. As companies grow more reliant on third-party platforms to store valuable information, ensuring long-term access to that data becomes a critical concern. This became concrete for Dinghy when Dovetail, the software-as-a-service provider we use for our research repository, announced a significant price increase.

We found a way to ensure that we have (and keep!) access to the insights we accumulated over the years. In this post, we walk you through the steps we took and share a script that will help you do the same.

Discovering Dovetail’s Data Export Process

Dovetail lets you download all account data in bulk. Getting there was less straightforward than we had hoped, but after exploring the settings and searching the help desk, we had a download on our server in little time.


To download the data from Dovetail, check every project that should be included, then select the Export option from the “meatball menu” in the action bar.

The download went surprisingly fast. A bit too fast for hours’ worth of interviews and other static assets. A quick review of the data surfaced open, well-structured text data:

Dovetail Project Export/
  README.txt
  FlatFilesystemStructure.jsonl
  [project hash]/
    Fields.jsonl
    FieldSets.jsonl
    Files.jsonl
    …
    

The export is a set of files that hold one JSON string per line. What was missing, however, were the actual assets of recent interviews. Files.jsonl only holds download links to these files.

Attention: the download URLs expire after seven days.
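To make the one-JSON-string-per-line layout concrete, here is a minimal sketch of how such a file can be read. The function name parseJsonLines and the sample values are our own illustration. The field names originalFileName and downloadUrl are the ones our script relies on, and real entries may contain further fields.

```javascript
// Parse JSON Lines text: one JSON document per line, blank lines ignored.
function parseJsonLines(text) {
  const records = [];
  for (const [index, line] of text.split(/\r?\n/).entries()) {
    if (line.trim() === "") continue;
    try {
      records.push(JSON.parse(line));
    } catch (error) {
      // Report the 1-based line number and keep going with the next line.
      console.warn(`Skipping malformed line ${index + 1}: ${error.message}`);
    }
  }
  return records;
}

// Hypothetical sample data; the URLs are placeholders, not real Dovetail links.
const sample = [
  '{"originalFileName":"interview-01.mp4","downloadUrl":"https://example.com/a"}',
  "",
  '{"originalFileName":"notes.pdf","downloadUrl":"https://example.com/b"}',
].join("\n");

const records = parseJsonLines(sample);
console.log(records.map((record) => record.originalFileName));
```

Skipping blank lines matters in practice, because a trailing newline at the end of a file would otherwise produce an empty, unparsable entry.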

A Simple Node.js Script to Automate the Download

Given the amount of data in our research repository, going through every Files.jsonl file in the download and fetching the assets by hand would have been time-consuming and error-prone. So we wrote a Node.js script that parses the files in the Dovetail export and downloads all assets, allowing us to store them on our own server.

// download-dovetail-assets.mjs

import { mkdir, readdir, readFile } from "node:fs/promises";
import { createWriteStream, existsSync } from "node:fs";
import { pipeline } from "node:stream/promises";
import path from "node:path";

async function main() {
  const directories = await getDirectories("./");
  console.log(`Found ${directories.length} directories.`);

  const files = [];
  for (let directory of directories) {
    const directoryFiles = await getFiles(directory);
    files.push(...directoryFiles.map((file) => ({ ...file, directory })));
  }

  // Wait for all downloads so that main() only resolves once they are done.
  await downloadFiles(files);
}

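// The parameter is not called "path" to avoid shadowing the imported path module.
async function getDirectories(directoryPath) {
  try {
    const entries = await readdir(directoryPath, { withFileTypes: true });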
    const directories = entries.filter((dirent) => dirent.isDirectory());
    return directories;
  } catch (error) {
    console.error("Error occurred while reading directory!", error);
    return [];
  }
}

function parseFileInformation(fileInformationString) {
  if (!fileInformationString) {
    console.warn("File information is empty.");
    return null;
  }
  try {
    const information = JSON.parse(fileInformationString);
    return information;
  } catch (error) {
    console.warn(
      "Error parsing file information.",
      error,
      `"${fileInformationString}"`,
    );
    return null;
  }
}

function getDirectoryPath(directory) {
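  // Dirent.path is deprecated; prefer parentPath where the Node.js version provides it.
  return path.join(directory.parentPath ?? directory.path, directory.name);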
}

async function getFiles(directory) {
  const directoryPath = getDirectoryPath(directory);
  const fileListPath = path.join(directoryPath, "Files.jsonl");

  if (!existsSync(fileListPath)) {
    console.warn(`No file list found in ${directoryPath}.`);
    return [];
  }

  try {
    const fileListContents = await readFile(fileListPath, { encoding: "utf8" });
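    // One JSON string per line: split on real newlines and skip blank lines.
    const fileList = fileListContents
      .split(/\r?\n/)
      .filter((line) => line.trim() !== "");
    // parseFileInformation returns null for unparsable lines; drop those.
    const files = fileList.map(parseFileInformation).filter(Boolean);
    return files;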
  } catch (error) {
    console.warn("Error reading file list.", error);
    return [];
  }
}

async function downloadAsset(url, filepath) {
  try {
    const res = await fetch(url);

    if (!res.ok) {
      throw new Error(`Request Failed With a Status Code: ${res.status}`);
    }

    const fileStream = createWriteStream(filepath);
    await pipeline(res.body, fileStream);

    return filepath;
  } catch (error) {
    throw new Error(`Failed to download asset: ${error.message}`);
  }
}

async function downloadFiles(files) {
  for (let file of files) {
    try {
      const directoryPath = getDirectoryPath(file.directory);

      const assetsPath = path.join(directoryPath, "assets");
      await mkdir(assetsPath, { recursive: true });

      const fileName = file.originalFileName;
      const filePath = path.join(assetsPath, fileName);

      if (existsSync(filePath)) {
        console.log("Skip existing asset", fileName);
        continue;
      }

      console.log("Download asset …", fileName);
      await downloadAsset(file.downloadUrl, filePath);
      console.log("… done.", fileName);
    } catch (error) {
      console.warn("Could not download asset.", error, file);
    }
  }
}

main().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});

Before running the script, make sure you have Node.js (v20+) installed on your machine. The script expects to be located inside the Dovetail export folder and reads the project directories relative to the current working directory, so run it from inside that folder with the following shell command:

node download-dovetail-assets.mjs

Depending on the amount of data in the repository, this will take a while. Once the script has finished, the assets should be located in a directory called assets inside each of the project directories.
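Since the download links expire, it can be worth checking that nothing was skipped while they are still valid. The following is a minimal sketch of such a check and not part of the script above. The name findMissingAssets is hypothetical, and the helper assumes, like the script, that every entry in Files.jsonl carries an originalFileName.

```javascript
import { existsSync, readFileSync } from "node:fs";
import path from "node:path";

// Return the file names listed in Files.jsonl that have no counterpart
// in the project's assets directory.
function findMissingAssets(projectDirectory) {
  const fileListPath = path.join(projectDirectory, "Files.jsonl");
  if (!existsSync(fileListPath)) {
    return [];
  }

  return readFileSync(fileListPath, "utf8")
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line).originalFileName)
    // Ignore entries without a usable file name (assumption: these have no asset).
    .filter((fileName) => typeof fileName === "string")
    .filter(
      (fileName) =>
        !existsSync(path.join(projectDirectory, "assets", fileName)),
    );
}
```

Calling findMissingAssets with the path of one project directory returns an array of file names. An empty array means every listed asset is present on disk.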

Note: this script hasn’t been thoroughly tested, though it did what it was supposed to do for us without issues. Feel free to get in touch if you want to use the script yourself and run into problems, or if you have ideas on how to improve it (there’s a lot that can be done).
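One possible improvement, sketched here under our own assumptions: because the script skips any asset that already exists on disk, an interrupted download could leave a truncated file that is never fetched again. Writing to a temporary file first and renaming it only after the download has completed avoids this. The function name downloadAssetAtomically and the .part suffix are hypothetical choices, not something Dovetail or Node.js prescribes.

```javascript
import { rename, rm } from "node:fs/promises";
import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";

// Download to "<filepath>.part" and rename to the final name only on success,
// so a file under its final name is always a complete download.
async function downloadAssetAtomically(url, filepath) {
  const partialPath = `${filepath}.part`;
  try {
    const res = await fetch(url);

    if (!res.ok) {
      throw new Error(`Request failed with status code ${res.status}`);
    }

    await pipeline(res.body, createWriteStream(partialPath));
    await rename(partialPath, filepath);

    return filepath;
  } catch (error) {
    // Remove the incomplete file; force: true ignores a missing file.
    await rm(partialPath, { force: true });
    throw new Error(`Failed to download asset: ${error.message}`);
  }
}
```

It takes the same arguments as downloadAsset in the script, so it could be swapped in for that call in downloadFiles.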

Takeaways

A business’s information usually stays relevant for a long time, especially if it is the foundation for product and strategy decisions. Whether it’s usability research, customer information or marketing content, businesses should make sure they own their data.

The Dovetail download serves as a reminder that, even with a machine-readable bulk download, data ownership is more complex than it should be.

Author

Daniel Becker

Co-Founder, Head of Tech
