Architecting a modern publication for young journalists with AWS

Next.js

React

TypeScript

Jest

AWS

GraphQL

Node.js

Tailwind CSS

SEO

TL;DR


Empowering young writers.

Launched as a side project in 2021, The Fledger has provided a platform for aspiring writers to develop their skills and gain practical journalism experience, empowering young people by giving them a voice and paving their way towards professional success.

Streamlined with AWS.

By harnessing the power of AWS, we were able to focus on developing our application's core features and highlighting what sets us apart, instead of dedicating substantial resources to building and managing the backend infrastructure from scratch. This allowed us to prioritise our unique strengths and accelerate our progress.

Optimised performance.

Extensive optimisations have been implemented to reduce bundle size, enhance network and rendering performance, and prioritise fast page load times, delivering a seamless user experience, fostering user engagement, and improving SEO.


What is The Fledger?

As a side project, The Fledger is an empowering platform that amplifies the voices and perspectives of young adults, providing a space for upcoming thought-leaders to shape narratives, challenge conventions, and ignite change. The platform is unbiased and takes no sides, offering a space for debate. Anyone can write at The Fledger, regardless of their background or level of experience. The platform features personal stories and independent voices, nurturing aspiring writers and offering them the opportunity to gain real-world experience and build a portfolio that opens doors to the world of journalism.

I joined the team in mid-2021 as the principal engineer on the project. With an exhaustive list of requirements, my role was to oversee the technical aspects of The Fledger, ensuring seamless functionality, scalability, and user experience. Collaborating closely with a talented team of designers and content creators, we worked diligently to bring the platform to life, empowering young voices and creating a dynamic space for expression and engagement.

The home page of The Fledger
The home page of The Fledger. Each article is displayed in the style of a poster on a wall.

A discussion taking part on The Fledger
Readers can also engage with authors in the comment section of each article.

Technical design

Requirements

Before jumping to a solution, we needed to understand our requirements. As engineers, we are responsible for analysing the project goals, gathering insights from stakeholders, assessing the technical landscape, and asking the right questions. What is the goal of the platform? Who will be using it, and on what devices? How will users find us? What's our budget? What's our MVP? How quickly do we want to release the first version?

With this in mind, we developed a list of requirements as a guiding compass throughout the development process. These included but were not limited to:

  • Articles will be uploaded by administrators and scheduled for visibility on the website.
  • Visitors will access the site from desktop and mobile devices.
  • Visitors will access the site both directly and organically.
  • Clicking an article from any page will navigate visitors to it immediately.
  • Visitors can sign up for the weekly newsletter.
  • Visitors can apply to write from the website.
  • The UI will be a custom design.
  • Only engineers will update the website copy.
  • Max budget is £20 a month.
  • Deliver the first version in two months.

At this stage, it's important not to rush to solutions. As engineers, it can be tempting to run to our IDEs and start typing. But our goal should never be to code for the sake of coding. We're out to build a product and should make decisions that serve the requirements, even if that means using a third-party tool to do all the heavy lifting. The code is not the product. Users don't care how the product is put together; they only care if it helps them and their businesses grow whilst putting a smile on their faces.

That being said, for The Fledger, maybe there was a solution where no development was required. Maybe we could use a website builder like Squarespace or WordPress? Ultimately, these solutions wouldn't have offered the required flexibility and performance. But I believe we should always be on the lookout for how we can write less code.

Data

Before designing our API, we needed an idea of how our data would be structured and the relationships between our models. This involved defining the different types of data entities and how they were related to each other. Ultimately, we would have three entities: article, author and topic. For the sake of simplicity, let's include as few properties as possible. Therefore, their client-side representations would be:

article.json
{
  "id": "some-article-id",
  "title": "Some article title",
  "content": "Some article content."
}
author.json
{
  "id": "some-author-id",
  "name": "John Smith"
}
topic.json
{
  "id": "some-topic-id",
  "name": "News"
}

Their relationships with each other would be:

  • An article has one author and many topics.
  • An author has many articles.
  • A topic has many articles.
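
To make these shapes and relationships concrete, here is a minimal TypeScript sketch of the client-side types (illustrative only; the real models carry far more properties, and the file name is made up):

types.ts
// Minimal client-side representations of the three entities.
export interface Author {
  id: string;
  name: string;
  articles: Article[]; // an author has many articles
}

export interface Topic {
  id: string;
  name: string;
  articles: Article[]; // a topic has many articles
}

export interface Article {
  id: string;
  title: string;
  content: string;
  author: Author; // an article has one author
  topics: Topic[]; // an article has many topics
}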

With these representations and relationships defined, we had a clear understanding of how our data entities would interact with each other, helping us make more informed decisions when determining the design of our API.

API

The REST API has traditionally been the standard and go-to for FE ↔ BE communication. It's clearly structured and easy to understand. However, let's say we wanted to display an article with its author and topics, plus the author's other articles, all on one page. Typically, we'd have to create several endpoints and make multiple HTTP GET requests: one to fetch the article (GET to /article/:id) and one to fetch the author (GET to /author/:id). Alternatively, we could include all the required data in a single response for the article, but with a frequently changing schema, how often would we need to update that response to add and remove fields required by the frontend? How do we keep this project easily maintainable?

With this in mind, I used a GraphQL API. One of the significant advantages of using GraphQL is its flexibility in handling frontend data requirements. If we need to include additional fields in the response, we can simply include them in the query without changing the backend controllers, serialisers, or other components. This dynamic nature of GraphQL reduces the maintenance burden and allows us to adapt quickly to frontend changes.
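
As a rough sketch of what that flexibility looks like in practice, the page described above (an article with its author, topics and the author's other articles) could be served by a single query along these lines; the field names here are illustrative rather than taken from our final schema:

query GetArticlePage($id: ID!) {
  getArticle(id: $id) {
    id
    title
    content
    author {
      id
      name
      # The author's other articles, fetched in the same request.
      articles {
        id
        title
      }
    }
    topics {
      name
    }
  }
}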

Client-side architecture

Given our requirements, namely to provide a seamless user experience with immediate page navigation and to maximise the visibility of our articles for organic traffic via Search Engine Optimisation (SEO), Server-Side Rendering (SSR) proved to be the most suitable choice.

With JavaScript frameworks such as Vue and React, the DOM is generated on the client side. This means a minimal HTML skeleton is served on the initial page request, and then additional data is fetched to populate and render the page on the client side. This poses problems for two reasons:

  1. SEO — To comprehend and rank web pages, search engines rely on crawling and indexing HTML, including its metadata. Although modern search engines can successfully index client-side rendered pages, indexing bots may struggle to navigate and understand content generated dynamically by JavaScript, leading to suboptimal indexing and potentially lower visibility in search results. Even when indexing succeeds, client-side rendered applications frequently lack crucial metadata, such as Open Graph tags and JSON-LD structured data, impacting SEO and link-sharing capabilities.
  2. Performance - Every time the page is requested, the user must wait for the network requests to resolve and then wait for the page to render. Not only is this a poor experience for the visitor, but we make the same network requests repeatedly, even though the page content stays the same. This costs money and wastes users' time.

This is where Server-Side Rendering (SSR) comes into play. By using an SSR framework, such as Next.js or Nuxt.js, we ensure that complete HTML representations of our web pages are readily available to search engines, improving the discoverability and indexability of the content and increasing the chances of our pages ranking higher in search results. It also improves perceived performance for visitors and, combined with caching, avoids the repeated client-side data fetching described above.

When comparing the available SSR frameworks, Next.js and Nuxt.js, we found that, at the time, Next.js offered some distinct advantages and determined it to be the better choice. Not only did it provide robust SSR capabilities, but it also supported Incremental Static Regeneration (ISR). This powerful feature allows certain pages to be pre-rendered and cached at build time while still being dynamically updated when necessary. This meant we could generate the static pages for all of The Fledger's current articles during the build process, but when new articles are published, generate and cache them on-the-fly.
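
As a rough sketch of how ISR looks with Next.js's Pages Router, the article page could be set up along these lines. The data helpers and paths here are hypothetical stand-ins, not The Fledger's actual code:

pages/articles/[id].tsx
import { GetStaticPaths, GetStaticProps } from "next";

// Hypothetical data helpers standing in for the real data layer.
import { Article, fetchAllArticleIds, fetchArticle } from "@/lib/articles";

export const getStaticPaths: GetStaticPaths = async () => ({
  // Pre-render every article that exists at build time…
  paths: (await fetchAllArticleIds()).map((id) => ({ params: { id } })),
  // …and render any article published afterwards on its first request.
  fallback: "blocking",
});

export const getStaticProps: GetStaticProps = async ({ params }) => ({
  props: { article: await fetchArticle(String(params?.id)) },
  // Keep serving the cached page, re-generating it in the background at most once an hour.
  revalidate: 3600,
});

export default function ArticlePage({ article }: { article: Article }) {
  return <article>{article.title}</article>;
}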

Using AWS

Considering the simplicity of The Fledger and the fact that we wanted to release a version within two months, how could we simplify and accelerate the development process? I had used cloud development platforms commercially at ServiceAdminPro, where we built on Google's Firebase, and that experience highlighted how transformative these services can be.

For those unaware, cloud development platforms such as Firebase and AWS Amplify provide us with a comprehensive set of tools to build, deploy and manage our applications in the cloud. These ready-to-use backend services include hosting, database management, storage, authentication and serverless functions. So ultimately, they allow us to focus on building our applications' core features and what makes us unique, rather than spending significant effort on building and managing the backend infrastructure from scratch.

In the case of The Fledger, after comparing the available services on the market, AWS Amplify emerged as the ideal choice. One of the main points of contention was how Amplify and Firebase handle data storage. Firebase's database solution, Firestore, utilises a document-oriented approach where data is stored in flexible, schemaless documents. However, for The Fledger's interconnected data entities, such as articles, authors, and topics, we required a more structured and relational data model. This is where AWS Amplify's integration with DynamoDB came into play. Although DynamoDB, like Firestore, is NoSQL-based, we could set up relationships between our data models and, with Amplify's support for GraphQL, easily query the data the frontend requires without experiencing the limitations of Firestore's document-oriented structure.

After the initial setup of AWS Amplify, we could take our three entities and easily define our models and the relationships between them via the @hasOne, @belongsTo, @hasMany and @manyToMany directives. We could also determine which users had access to which documents. As per our requirements, I needed to provide a method for administrators to upload new articles to the site. With Amplify's simple @auth directive, I could specify that only users within the Admin user group were able to manage articles, whereas all other users had read-only access.

schema.graphql
type Article @model @auth(rules: [
  { allow: public, provider: iam, operations: [read] }
  { allow: private, provider: iam, operations: [read] }
  { allow: groups, provider: userPools, groups: ["Admin"], operations: [create, read, update, delete] }
]) {
  id: ID!
  title: String
  content: String
  authorId: ID! @index(name: "byArticleAuthor")
  author: Author @belongsTo(fields: ["authorId"])
  topics: [Topic] @manyToMany(relationName: "ArticleTopics")
}

type Author @model @auth(rules: [
  { allow: public, provider: iam, operations: [read] }
  { allow: private, provider: iam, operations: [read] }
]) {
  id: ID!
  name: String
  articles: [Article] @hasMany(indexName: "byArticleAuthor", fields: ["id"])
}

type Topic @model @auth(rules: [
  { allow: public, provider: iam, operations: [read] }
  { allow: private, provider: iam, operations: [read] }
]) {
  id: ID!
  name: String
  articles: [Article] @manyToMany(relationName: "ArticleTopics")
}

And then, via Amplify's support for relational databases and GraphQL, we could easily query the article with its assigned author and topics, all in a single HTTP request.

API.ts
import { GraphQLAPI } from "@aws-amplify/api-graphql";

const getArticle = /* GraphQL */ `
  query GetArticle($id: ID!) {
    getArticle(id: $id) {
      id
      title
      author {
        id
        name
      }
      topics {
        items {
          id
          name
        }
      }
    }
  }
`;

// Amplify provides an API to construct
// and execute the HTTP request.
const { data } = await GraphQLAPI.graphql({
  query: getArticle,
  variables: {
    id: "some-article-id",
  },
});

Don't get me wrong, like all new technologies in their infancy, AWS Amplify has a lot of problems, even more so in 2021 when I first started on this side project. From hidden complexities to seemingly random deployment failures, its stability can sometimes be questionable.

But despite the headaches along the way, Amplify abstracts away much of the backend complexity and has provided a streamlined development experience, enabling us to iterate rapidly, deploy new features with ease, and focus on the core product itself.

Optimisations

How do we ensure the fastest possible page load without compromising the visitor's experience?

During my career, I've gained extensive experience in frontend optimisation, particularly in enhancing JavaScript performance through efficient functional operations, minimising main thread work, and optimising network requests, to name a few. Historically, that work focused on software functionality rather than website performance. This side project gave me valuable insight into web page optimisations that prioritise factors such as reducing Time to Interactive and eliminating unnecessary data transfers, greatly accelerating my knowledge and leaving me well-versed in both functional software and web-specific optimisation techniques.

One of the main tools employed to help diagnose issues was Google's PageSpeed Insights, powered by Lighthouse. Although not perfectly accurate or exhaustive, it gives a good indication of how well an application is optimised across Performance, Accessibility, Best Practices, and SEO.

Page speed insights results page
The results of PageSpeed Insights run on The Fledger's home page. Although Desktop is well optimised, the Mobile version needs additional work.

The following are just a few of the optimisations introduced that cover Network, Rendering and JavaScript performance. The big takeaway is that we should aim to do less stuff.

Reduce bundle size

Bundles are formed by combining our application code and its dependencies into JavaScript files, which are split into chunks via code-splitting. By minimising the amount of code we ship, clients receive leaner bundles, enabling faster download and parsing times and reducing main thread work. This means users can access and interact with our application faster. This was particularly important for The Fledger, given the critical role of speed on article-based websites and readers' expectations. In an era where attention spans are shorter than ever, waiting for a page to load after clicking on an article is frustrating and increases the risk of abandonment.

A valuable tool utilised for bundle diagnosis was webpack-bundle-analyzer. If you're unaware of what this tool does, essentially, it allows you to view all your project's bundle content as an interactive treemap.
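
For reference, one common way to hook it into a Next.js build is via the @next/bundle-analyzer wrapper, shown here as a general illustration rather than our exact configuration:

next.config.js
// Generates the interactive treemap when the build is run with ANALYZE=true.
const withBundleAnalyzer = require("@next/bundle-analyzer")({
  enabled: process.env.ANALYZE === "true",
});

module.exports = withBundleAnalyzer({
  reactStrictMode: true,
});

Running the build with ANALYZE=true then produces reports for both the client and server bundles.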

webpack-bundle-analyzer
After running `webpack-bundle-analyzer` on The Fledger's build, we can gain insights into the code requested by the client and server. The tool provides a visual representation of how the code is divided into chunks, with the `_app-[…].js` file being shared across all pages.

This analysis helps us understand and optimise the code structure for improved performance. I've seen some common mishaps over the years where entire libraries of functions or components are included in the bundle, even though they're never actually used. For this project, this came in the form of date-fns and @fortawesome. Generally, a process named tree-shaking occurs during a production build, removing all dead, unused code so that only code actually used by the application is shipped. However, some packages don't implement this correctly, and every export that comes with a module ends up in the bundle. Therefore, to resolve this, instead of importing our functions and icons like so:

import { startOfWeek } from "date-fns";
import { faCoffee } from "@fortawesome/free-solid-svg-icons"

We can use deep imports:

import startOfWeek from "date-fns/startOfWeek";
import { faCoffee } from "@fortawesome/free-solid-svg-icons/faCoffee"

Another issue I encountered was how large @fortawesome/react-fontawesome's JavaScript files were. If you look at the image below, you can see that index.es.js takes up a bulky 22% of the app.js chunk's size (even with tree-shaking). But what does this file do, and is it necessary? It turns out it can be replaced with a custom icon implementation, removing the need to include the NPM packages that cause this bundle bloat.

Unnecessary FontAwesome JavaScript files
`@fortawesome/react-fontawesome` takes up over 22% of chunk size.

We can remove the following modules:

{
  "@fortawesome/fontawesome-svg-core": "…",
  "@fortawesome/react-fontawesome": "…"
}

And replace with our own implementation:

VFontAwesomeIcon/index.tsx
import React, { Fragment, createElement, SVGAttributes } from "react";
import { IconDefinition } from "@fortawesome/fontawesome-common-types";
import classNames from "classnames";

export interface VFontAwesomeIconDefinition extends Omit<IconDefinition, "icon"> {
  icon: [
    number,
    number,
    string[],
    string,
    string | string[],
    SVGAttributes<SVGSVGElement>["fillRule"]?,
  ];
}

export type VFontAwesomeIconProps = {
  icon: VFontAwesomeIconDefinition
  className?: string;
  onClick?: () => void;
}

export const VFontAwesomeIcon: React.FC<VFontAwesomeIconProps> = ({
  icon,
  className,
  onClick,
}) => {
  const { icon: [width, height, , , vectorData, fillRule] } = icon;

  const propsData: SVGAttributes<SVGSVGElement> = {
    xmlns: "http://www.w3.org/2000/svg",
    viewBox: `0 0 ${width} ${height}`,
    role: "img",
    fillRule,
    className: classNames([
      "inline-block overflow-visible box-content h-[1.125em] align-[-0.125em]",
      className
    ]),
    onClick,
  };

  const children = createElement(
    Fragment,
    null,
    createElement("path", {
      fill: "currentColor",
      d: vectorData.toString(),
    })
  );

  return createElement("svg", propsData, children);
};
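
Usage stays much the same as with the original component; icons are still pulled in via deep imports, so only the glyphs we actually use end up in the bundle (the component's import path below is illustrative):

import { faCoffee } from "@fortawesome/free-solid-svg-icons/faCoffee";
import { VFontAwesomeIcon } from "@/components/atoms/VFontAwesomeIcon";

export const CoffeeButton = () => (
  <button type="button">
    <VFontAwesomeIcon icon={faCoffee} className="mr-2" />
    Order coffee
  </button>
);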

Defer resources

In the same spirit, we can defer the loading of unessential scripts and components to when they're needed — a Just In Time philosophy. Again, this reduces our bundle size, enabling faster download and parsing times and decreasing the strain on the main thread. A few examples include:

  • Using Next.js's dynamic import:
    This allows us to defer loading components and libraries and only include them in the client bundle when required. For example, the loading of VLogoutModal and VEditProfileModal is deferred until the user clicks to open them.
    const VLogoutModal = dynamic(() => {
      return import("@/components/molecules/VLogoutModal")
        .then((mod) => mod.VLogoutModal);
    }, { ssr: false });
    
    const VEditProfileModal = dynamic(() => {
      return import("@/components/molecules/VEditProfileModal")
        .then((mod) => mod.VEditProfileModal);
    }, { ssr: false });
  • Dynamically importing libraries with import:
    Although The Fledger allows users to register for an account, most visitors come to read an article and have no interest in signing up. So why include dependencies that are solely there for that purpose? For example, the @aws-amplify/auth module is not included in the main bundle and is instead dynamically imported when a user wants to do something authentication-related.
    const handleAuthLogout = async () => {
      try {
        setLoggingOut(true);
    
        // Dynamically import Auth module when required to reduce bundle size.
        const { Auth } = await import("@aws-amplify/auth");
        await Auth.signOut();
      } catch (error) {
        setLoggingOut(false);
        captureException(error);
      }
    };
  • Render in viewport with IntersectionObserver:
    When a user clicks on an article that includes embedded content such as Tweets, Instagram Posts, or YouTube videos, additional third-party scripts and HTTP requests are needed to retrieve and display them. Instead of loading this content immediately when the page loads, we can defer its loading until it's about to enter the user's view. To achieve this, we utilised the IntersectionObserver API. Initially, we display a loading skeleton for the element, and as it approaches the viewport, we trigger the loading of the necessary dependencies. This approach improves performance and enhances the user experience by optimising the loading of embedded content.
    const observerElementRef = useRef<HTMLDivElement>(null);

    useEffect(() => {
      // Capture the node once so the cleanup references the same element.
      const element = observerElementRef.current;

      const observer = new IntersectionObserver((entries) => {
        if (entries[0].isIntersecting) {
          // Do something.
        }
      }, { threshold: 0 });

      if (element) {
        observer.observe(element);
      }

      return () => {
        observer.disconnect();
      };
    }, []);

Image optimisation

One aspect of performance that is frequently overlooked is image optimisation. Images can present various challenges, including large file sizes, high resolution, and improper compression. These issues can lead to slower page load times, increased bandwidth usage, and suboptimal user experiences. To address these concerns, the following techniques were employed:

  • Compress files: Images are optimised by ensuring compression before uploading them to Cloudinary, our cloud-based image and video management service, resulting in reduced file sizes and improved loading times.
  • Use next-generation formats: Modern image formats, such as WebP, are utilised to take advantage of superior compression and improved image quality, providing a better visual experience while minimising file sizes. Cloudinary supports file type transformations, allowing for easy conversion of .jpg to .webp, where applicable, across the application.
  • Load at the correct size: The srcset and sizes attributes, in combination with Cloudinary, allow for the responsive loading of images based on different viewport sizes and device capabilities. This approach delivers the appropriately sized image, reducing unnecessary bandwidth usage and processing requirements (see the sketch after this list).
  • Lazy load where appropriate: Images are lazily loaded using the loading="lazy" attribute for native images. This technique defers the loading of images until they're about to enter the viewport, improving initial page load times and prioritising the loading of visible images.
  • Use CDN: Content Delivery Networks (CDNs) are leveraged via Cloudinary to efficiently distribute images worldwide. This ensures faster delivery to users regardless of location, reducing latency and improving the overall user experience.
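
As a simplified sketch of the srcset/sizes approach mentioned above, with the Cloudinary cloud name, widths and breakpoints as placeholder values:

// Builds a Cloudinary URL with width (w_), automatic format (f_auto)
// and automatic quality (q_auto) transformations.
const cloudinaryUrl = (publicId: string, width: number) =>
  `https://res.cloudinary.com/demo/image/upload/w_${width},f_auto,q_auto/${publicId}`;

export const ArticleCover = ({ publicId, alt }: { publicId: string; alt: string }) => (
  <img
    src={cloudinaryUrl(publicId, 800)}
    srcSet={[400, 800, 1200]
      .map((width) => `${cloudinaryUrl(publicId, width)} ${width}w`)
      .join(", ")}
    // Full width on small screens, roughly half the viewport on larger ones.
    sizes="(max-width: 768px) 100vw, 50vw"
    alt={alt}
    loading="lazy"
  />
);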

Accessibility

Accessibility was a fundamental consideration throughout the site's development, ensuring that the platform was inclusive and usable by all, regardless of any disabilities visitors may have. As a result, the measures implemented included:

  • Semantic HTML: As is expected, we utilised semantic HTML tags including <header>, <nav>, <main>, and <article>, providing meaningful structure and context for assistive technologies such as screen readers.
  • ARIA roles and attributes: Where applicable, the appropriate ARIA roles and attributes were used. For the most part, using the correct HTML element keeps the need for ARIA roles minimal. However, attributes such as aria-hidden ensure that purely decorative content, such as SVGs, is not interpreted by assistive technology (see the example after this list).
  • ALT attributes: All images have alt attributes, providing text descriptions so that visually impaired users can still understand the content.
  • Hitbox size: By ensuring that interactive elements like buttons and links had an adequate size, we made it easier for users to tap on them, avoiding the frustration of trying to target small elements. This improvement greatly benefited users with motor impairments or those accessing the platform on touch devices.
  • Colour contrast: We adhered to WCAG guidelines for colour contrast, ensuring that text and important elements had sufficient contrast against the background. This improved readability and accessibility for users with visual impairments.
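
As a small, made-up example combining semantic HTML, alt text and aria-hidden from the list above:

export const ArticlePreview = () => (
  <article>
    <header>
      <h2>Why local journalism still matters</h2>
    </header>
    {/* Informative image: described for screen-reader users via alt text. */}
    <img src="/images/newsroom.webp" alt="Young journalists collaborating in a newsroom" />
    {/* Purely decorative flourish: hidden from assistive technology. */}
    <svg aria-hidden="true" viewBox="0 0 100 4" className="w-full">
      <rect width="100" height="4" fill="currentColor" />
    </svg>
    <p>A short excerpt of the article goes here.</p>
  </article>
);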

Testing

As mentioned in my values, tests play a crucial role in the development process by enabling us to iterate quickly with confidence. For this project, I employed the following testing approaches:

  • Unit Testing: We prioritised unit testing to validate the functionality of our Lambdas. For example, when working with the Notion API to extract data from a page and create DynamoDB records, our tests provided assurance that any changes or updates could be implemented without impacting existing functionality. This approach reduced the need for extensive manual testing and increased our confidence in the application's stability.
  • E2E Testing: End-to-end (E2E) testing plays a critical role in validating complex user flows that involve multiple components and API interactions. For instance, when an unregistered user visits an article and tries to add a comment, they encounter a registration modal. By simulating this interaction with Cypress tests, we ensure the user is smoothly guided through the sign-up process and seamlessly returned to the site, where the comment is successfully created. A simplified version of this flow is sketched below.
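
A trimmed-down version of that flow as a Cypress test might look like the following; the route, selectors and copy are illustrative rather than The Fledger's actual test code:

comment-signup.cy.ts
describe("commenting as an unregistered visitor", () => {
  it("prompts sign-up, then posts the comment after registration", () => {
    cy.visit("/articles/some-article-id");

    // Submitting a comment while logged out should open the registration modal.
    cy.get("[data-testid=comment-input]").type("Great piece!");
    cy.get("[data-testid=comment-submit]").click();
    cy.get("[data-testid=registration-modal]").should("be.visible");

    // Complete the sign-up form inside the modal.
    cy.get("[data-testid=registration-email]").type("reader@example.com");
    cy.get("[data-testid=registration-password]").type("a-strong-password");
    cy.get("[data-testid=registration-submit]").click();

    // Back on the article, the pending comment has been created.
    cy.get("[data-testid=registration-modal]").should("not.exist");
    cy.contains("Great piece!").should("be.visible");
  });
});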

Monitoring and analytics

Monitoring and analytics play a vital role in understanding user behaviour, diagnosing bugs, and assessing goals. At The Fledger, we leverage a combination of tools, including Sentry, Amplitude, and Google Analytics, to gain valuable insights and make data-driven decisions that improve the platform.

For instance, through event logging and analysis, we discovered an alarming churn rate of over 50% when users encountered the registration blocker while attempting to type a comment on an article. Consequently, we made a simple but impactful change to the blocker's behaviour: instead of appearing immediately, it now shows up only after users have finished typing and pressed the submit button. This adjustment resulted in a remarkable improvement, with over 90% of users successfully submitting their comments.
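
For the curious, the instrumentation behind that kind of insight is straightforward. A sketch with the amplitude-js SDK, with the API key, event names and properties as placeholders:

import amplitude from "amplitude-js";

amplitude.getInstance().init("AMPLITUDE_API_KEY");

// Fired when the registration blocker appears for an unregistered visitor.
amplitude.getInstance().logEvent("registration_blocker_shown", {
  articleId: "some-article-id",
  trigger: "comment_submit",
});

// Fired once the comment has been successfully created.
amplitude.getInstance().logEvent("comment_created", {
  articleId: "some-article-id",
});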