Tag: Website

Improve Largest Contentful Paint (LCP) on Your Website With Ease

(This is a sponsored post.)

Optimizing the user experience you offer on your website is essential for the success of any online business. Google uses several user experience-related metrics to rank web pages in search results and has continued to provide tools to measure and improve web performance.

In its recent attempt to simplify the measurement and understanding of what qualifies as a good user experience, Google standardized the page’s user experience metrics.

These standardized metrics are called Core Web Vitals and help evaluate the real-world user experience on your web page.

Largest Contentful Paint or LCP is one of the Core Web Vitals metrics, which measures when the largest content element in the viewport becomes visible. While other metrics like TTFB and First Contentful Paint also help measure the page experience, they do not represent when the page has become “meaningful” for the user.

Usually, unless the largest element on the page becomes completely visible, the page may not provide much context for the user. LCP is, therefore, more representative of the user’s expectations. As a Core Web Vitals metric, LCP accounts for 25% of the Lighthouse Performance Score, making it one of the most important metrics to optimize.

Checking your LCP time

As per Google, the types of elements considered for Largest Contentful Paint are:

  • <img> elements
  • <image> elements inside an <svg> element
  • <video> elements (the poster image is used)
  • An element with a background image loaded via the url() function (as opposed to a CSS gradient)
  • Block-level elements containing text nodes or other inline-level text element children

Now, there are multiple ways to measure the LCP of your page.

The easiest ways to measure it are PageSpeed Insights, Lighthouse, Search Console (Core Web Vitals Report), and the Chrome User Experience Report.

For example, Google PageSpeed Insights indicates in its report which element was considered for calculating the LCP.
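If you want to check LCP directly in the browser, the standard PerformanceObserver API exposes largest-contentful-paint entries. Here is a minimal sketch you could paste into the DevTools console or your page scripts; the last entry reported before the user interacts with the page is the final LCP candidate.

// Log Largest Contentful Paint candidates as the browser reports them
const lcpObserver = new PerformanceObserver((entryList) => {
  for (const entry of entryList.getEntries()) {
    // entry.element is the DOM node considered for LCP,
    // entry.startTime is its render time in milliseconds
    console.log('LCP candidate:', Math.round(entry.startTime), 'ms', entry.element);
  }
});

// buffered: true also reports entries that occurred before the observer was created
lcpObserver.observe({ type: 'largest-contentful-paint', buffered: true });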

What is a good LCP time?

To provide a good user experience, you should strive to have a Largest Contentful Paint of 2.5 seconds or less on your website. Google recommends measuring this at the 75th percentile of page loads, so the vast majority of your page loads should happen under this threshold.

Now that we know what LCP is and what our target should be, let’s look at ways to improve LCP on our website.

How to optimize Largest Contentful Paint (LCP)

The underlying principle behind all of the techniques mentioned below is the same: reduce the amount of data downloaded to the user’s device, and reduce the time it takes to deliver and render that content.

1. Optimize your images

On most websites, the above-the-fold content usually contains a large image which gets considered for LCP. It could either be a hero image, a banner, or a carousel. It is, therefore, crucial that you optimize these images for a better LCP.

To optimize your images, you should use a third-party image CDN like ImageKit.io. The advantage of using a third-party image CDN is that you can focus on your actual business and leave image optimization to the image CDN.

The image CDN stays on top of evolving image technology, so you always get the best possible features with minimal ongoing investment.

ImageKit is a complete real-time image CDN that integrates with any existing cloud storage like AWS S3, Azure, Google Cloud Storage, etc. It even comes with its integrated image storage and manager called the Media Library.

Here is how ImageKit can help you improve your LCP score.

1. Deliver your images in lighter formats

ImageKit detects if the user’s browser supports modern lighter formats like WebP or AVIF and automatically delivers the image in the lightest possible format in real-time. Formats like WebP are over 30% lighter compared to their JPEG equivalents.

2. Automatically compress your images

Besides converting the image to the correct format, ImageKit also compresses your image to a smaller size. In doing so, it balances the image’s visual quality and the output size.

You get the option to alter the compression level (or quality) in real-time by just changing a URL parameter, thereby balancing your business requirements of visual quality and load time.

3. Provide real-time transformations for responsive images

Google uses mobile-first indexing for almost all websites. It is therefore essential to optimize LCP for mobile even more than for desktop. Every image needs to be scaled down to match the layout’s requirements.

For example, you would need the image in a smaller size on the product listing page and a larger size on the product detail page. This resizing ensures that you are not sending any additional bytes than what is required for that particular page.

ImageKit allows you to transform responsive images in real-time just by adding the corresponding transformation in the image URL. For example, an image can be resized to a width of 200px and a height of 300px by adding the height and width transformation parameters to its URL, as shown below.
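As a rough illustration, an ImageKit URL with those parameters could look like the following; the URL endpoint (ik.imagekit.io/demo) and image name are placeholders for your own.

<!-- Original image -->
<img src="https://ik.imagekit.io/demo/default-image.jpg" />

<!-- Same image resized to 200x300 with width and height transformation parameters -->
<img src="https://ik.imagekit.io/demo/default-image.jpg?tr=w-200,h-300" />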

4. Cache images and improve delivery time

Image CDNs use a global Content Delivery Network (CDN) to deliver the images. Using a CDN ensures that images load from a location closer to the user instead of your server, which could be halfway across the globe.

ImageKit, for example, uses AWS CloudFront as its CDN, which has over 220 delivery nodes globally. A vast majority of the images get loaded in less than 50ms. Additionally, it uses the proper caching directives to cache the images on the user’s device, CDN nodes, and even its processing network for a faster load time.

This helps to improve LCP on your website.

2. Preload critical resources

There are certain cases where the browser may not prioritize loading a visually important resource that impacts LCP. For example, a banner image above the fold could be specified as a background image inside a CSS file. Since the browser would never know about this image until the CSS file is downloaded and parsed along with the DOM tree, it will not prioritize loading it.

For such resources, you can preload them by adding a <link> tag with a rel="preload" attribute to the head section of your HTML document.

<!-- Example of preloading -->
<link rel="preload" href="banner_image.jpg" as="image" />

While you can preload multiple resources in a document, you should always restrict it to above-the-fold images or videos, page-wide font files, or critical CSS and JS files.
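For instance, a head section that preloads both the hero image and a page-wide web font might look like this (the file names are placeholders); note that font preloads need the crossorigin attribute even for same-origin files.

<head>
  <!-- Above-the-fold image referenced from CSS -->
  <link rel="preload" href="banner_image.jpg" as="image" />
  <!-- Page-wide web font -->
  <link rel="preload" href="/fonts/site-font.woff2" as="font" type="font/woff2" crossorigin />
</head>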

3. Reduce server response times

If your server takes a long time to respond to a request, then the time it takes to render the page on the screen also goes up. It, therefore, negatively affects every page speed metric, including LCP. To improve your server response times, here is what you should do.

1. Analyze and optimize your servers

A lot of computation, DB queries, and page construction happens on the server. You should analyze the requests going to your servers and identify the possible bottlenecks for responding to the requests. It could be a DB query slowing things down or the building of the page on your server.

You can apply best practices like caching of DB responses, pre-rendering of pages, amongst others, to reduce the time it takes for your server to respond to requests.

Of course, if the above does not improve the response time, you might need to increase your server capacity to handle the number of requests coming in.

2. Use a Content Delivery Network

We have already seen above that using an image CDN like ImageKit improves the loading time for your images. Your users get the content delivered from a CDN node close to their location in milliseconds.

You should extend the same to other content on your website. Using a CDN for your static content like JS, CSS, and font files will significantly speed up their load time. ImageKit does support the delivery of static content through its systems.

You can also try to use a CDN for your HTML and APIs to cache those responses on the CDN nodes. Given the dynamic nature of such content, using a CDN for HTML or APIs can be a lot more complex than using a CDN for static content.

3. Preconnect to third-party origins

If you use third-party domains to deliver critical above-the-fold content like JS, CSS, or images, then you would benefit by indicating to the browser that a connection to that third-party domain needs to be made as soon as possible. This is done using the rel="preconnect" attribute of the <link> tag.

<link rel="preconnect" href="https://static.example.com" />

With preconnect in place, the browser can save the domain connection time when it downloads the actual resource later.

Subdomains of your main website domain, like static.example.com for example.com, are also third-party domains in this context.

You can also use dns-prefetch as a fallback in browsers that don’t support preconnect. This hint instructs the browser to at least complete the DNS resolution for the third-party domain, even if it cannot establish a full connection.
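A common pattern is to pair the two hints so that browsers use whichever one they support (static.example.com is just an illustrative domain):

<link rel="preconnect" href="https://static.example.com" />
<link rel="dns-prefetch" href="https://static.example.com" />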

4. Serve content cache-first using a Service Worker

Service workers can intercept requests originating from the user’s browser and serve cached responses for them. This allows us to cache static assets and HTML responses on the user’s device and serve them without going to the network.

While the service worker cache serves the same purpose as the HTTP or browser cache, it offers fine-grained control and can work even if the user is offline. You can also use service workers to serve precached content from the cache to users on slow network speeds, thereby bringing down LCP time.
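Here is a minimal cache-first sketch to illustrate the idea, not a production-ready service worker; the cache name and precached URLs are placeholders, and a real setup would also need cache versioning and invalidation.

// sw.js: a minimal cache-first service worker sketch
const CACHE_NAME = 'static-v1';
const PRECACHE_URLS = ['/', '/styles.css', '/app.js'];

// Precache a few static assets at install time
self.addEventListener('install', (event) => {
  event.waitUntil(
    caches.open(CACHE_NAME).then((cache) => cache.addAll(PRECACHE_URLS))
  );
});

// Serve from the cache first, fall back to the network,
// and cache successful network responses for next time
self.addEventListener('fetch', (event) => {
  // Only handle GET requests; let everything else hit the network as usual
  if (event.request.method !== 'GET') return;

  event.respondWith(
    caches.match(event.request).then((cached) => {
      if (cached) return cached;
      return fetch(event.request).then((response) => {
        const copy = response.clone();
        caches.open(CACHE_NAME).then((cache) => cache.put(event.request, copy));
        return response;
      });
    })
  );
});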

5. Compress text files

Any text-based data you load on your webpage should be compressed when transferred over the network using a compression algorithm like gzip or Brotli. SVGs, JSONs, API responses, JS and CSS files, and your main page’s HTML are good candidates for compression using these algorithms. This compression significantly reduces the amount of data that will get downloaded on page load, therefore bringing down the LCP.
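Compression is usually switched on at the web server or CDN level. As one illustration, assuming a Node.js/Express server and the widely used compression npm package, gzipping text responses takes a single middleware call; Brotli would typically be enabled in your web server or CDN configuration instead.

// Assumes Express and the "compression" package are installed:
// npm install express compression
const express = require('express');
const compression = require('compression');

const app = express();

// Gzip all compressible text responses (HTML, CSS, JS, JSON, SVG, ...)
app.use(compression());

app.get('/', (req, res) => {
  res.send('<h1>Hello, compressed world!</h1>');
});

app.listen(3000);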

4. Remove render-blocking resources

When the browser receives the HTML page from your server, it parses the DOM tree. If there is any external stylesheet or synchronous JS file in the DOM, the browser has to pause and process it before moving ahead with parsing the remaining DOM tree.

These JS and CSS files are called render-blocking resources and delay the LCP time. Here are some ways to reduce the blocking time for JS and CSS files:

1. Do not load unnecessary bundles

Avoid shipping huge bundles of JS and CSS files to the browser if they are not needed. If the CSS can be downloaded a lot later, or a JS functionality is not needed on a particular page, there is no reason to load it up front and block the render in the browser.

Suppose you cannot split a particular file into smaller bundles, but it is not critical to the functioning of the page either. In that case, you can use the defer attribute of the script tag to indicate to the browser that it can go ahead with the DOM parsing and continue to execute the JS file at a later stage. Adding the defer attribute removes any blocker for DOM parsing. The LCP, therefore, goes down.
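For example, a script that isn’t critical to the first render (the file name here is only illustrative) can be deferred like this:

<!-- Downloaded in parallel, executed only after the document has been parsed -->
<script src="analytics-widget.js" defer></script>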

2. Inline critical CSS

Critical CSS comprises the style definitions needed for the DOM that appears in the first fold of your page. If the style definitions for this part of the page are inlined in the HTML document itself, i.e., in a <style> block in the head (or directly in the elements’ style attributes), the browser has no dependency on the external CSS to style these elements. Therefore, it can render the page quickly, and the LCP goes down.
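A sketch of what that can look like: the critical rules live in a <style> block in the head, while the full stylesheet loads without blocking the first render. The class names and file name are placeholders, and the preload/onload pattern for the non-critical stylesheet is just one common approach.

<head>
  <!-- Critical, above-the-fold styles inlined in the document -->
  <style>
    .hero { min-height: 60vh; background: #f5f5f5; }
    .hero h1 { font-size: 2rem; margin: 0; }
  </style>

  <!-- Full stylesheet loaded without blocking rendering -->
  <link rel="preload" href="styles.css" as="style" onload="this.rel='stylesheet'" />
  <noscript><link rel="stylesheet" href="styles.css" /></noscript>
</head>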

3. Minify and compress the content

You should always minify the CSS and JS files before loading them in the browser. CSS and JS files contain whitespace to make them legible, but they are unnecessary for code execution. So, you can remove them, which reduces the file size on production. Smaller file size means that the files can load quickly, thereby reducing your LCP time.

Compression techniques, as discussed earlier, use data compression algorithms to bring down the file size delivered over the network. Gzip and Brotli are two compression algorithms. Brotli compression offers a superior compression ratio compared to Gzip and is now supported on all major browsers, servers, and CDNs.

5. Optimize LCP for client-side rendering

Any client-side rendered website requires a considerable amount of JavaScript to load in the browser. If you do not optimize the JavaScript sent to the browser, then the user may not see or be able to interact with any content on the page until the JavaScript has been downloaded and executed.

We discussed a few JS-related optimizations above, like optimizing the bundles sent to the browser and compressing the content. There are a couple more things you can do to optimize the rendering on client devices.

1. Using server-side rendering

Instead of shipping the entire JS to the client-side and doing all the rendering there, you can generate the page dynamically on the server and then send it to the client’s device. This increases the time it takes for the server to respond, but it decreases the time it takes for meaningful content to appear in the browser.

However, maintaining both client-side and server-side frameworks for the same page can be time-consuming.

2. Using pre-rendering

Pre-rendering is a different technique where a headless browser mimics a regular user’s request and gets the server to render the page. This rendered page is stored during the build cycle once, and then every subsequent request uses that pre-rendered page without any computation on the server, resulting in a fast load time.

This improves the TTFB compared to server-side rendering because the page is prepared beforehand. But the time to interactive might still take a hit as it has to wait for the JS to download for the page to become interactive. Also, since this technique requires pre-rendering of pages, it may not be scalable if you have a large number of pages.

Conclusion

Core Web Vitals, which include LCP, have become a significant search ranking factor and strongly correlate with the user experience. Therefore, if you run an online business, you should optimize these vitals to ensure its success.

The above techniques have a significant impact on optimizing LCP. Using ImageKit as your image CDN will give you a quick head start.

Sign up for a forever-free account, upload your images to the ImageKit storage, or connect your origin, and start delivering optimized images in minutes.


The post Improve Largest Contentful Paint (LCP) on Your Website With Ease appeared first on CSS-Tricks.

Scaling Organizations Should Consider Building a Website Backed by a CRM Platform

To make some terminology clear here:

  • CMS = Content Management System
  • CRM = Customer Relationship Management

Both are essentially database-backed systems for managing data. HubSpot is both, and much more. Where a CMS might be very focused on content and the metadata around making content useful, a CRM is focused on leads and making communicating with current and potential customers easier.

They can be brothers-in-arms. We’ll get to that.

Say a CRM is set up for people. You run a Lexus dealership. There is a quote form on the website. People fill it out and enter the CRM. That lead goes to your sales team so they can take care of that customer.

But a CRM could be based on other things. Say instead of people it’s based on real estate listings. Each main entry is a property, with essentially metadata like photos, address, square footage, # of bedrooms/baths, etc. Leads can be associated with properties.

That would be a nice CRM setup for a real estate agency, but the data that is in that CRM might be awfully nice for literally building a website around those property listings. Why not tap into that CRM data as literal data to build website pages from?

That’s what I mean by a CRM and CMS being brothers-in-arms. Use them both! That’s why HubSpot can be an ideal home for websites like this.

To keep that tornado of synergy going, HubSpot can also help with marketing, customer service, and integrations. So there is a lot of power packed into one platform.

And with that power, also a lot of comfort and flexibility.

  • You’re still developing locally.
  • You’re still using Git.
  • You can use whatever framework or site-building tools you want.
  • You’ve got a CLI to control things.
  • There is a VS Code Extension for super useful auto-complete of your data.
  • There is a staging environment.

And the features just keep coming. HubSpot really has a robust set of tools to make sure you can do what you need to do.

As developer-rich as this all is, it doesn’t mean that it’s developer-only. There are loads of tools for working with the website you build that require no coding at all: dashboards for content management, data wrangling, style control, and even literal drag-and-drop page builders.

It’s all part of a very learnable system.

Themes, templates, modules, and fields are the objects you’ll work with most in HubSpot CMS as a developer. Using these different objects effectively lets you give content creators the freedom to work and iterate on websites independently while staying inside style and layout guardrails you set.


The post Scaling Organizations Should Consider Building a Website Backed by a CRM Platform appeared first on CSS-Tricks.

Securing Your Website With Subresource Integrity

When you load a file from an external server, you’re trusting that the content you request is what you expect it to be. Since you don’t manage the server yourself, you’re relying on the security of yet another third party and increasing the attack surface. Trusting a third party is not inherently bad, but it should certainly be taken into consideration in the context of your website’s security.

A real-world example

This isn’t a purely theoretical danger. Ignoring potential security issues can and has already resulted in serious consequences. On June 4th, 2019, Malwarebytes announced their discovery of a malicious skimmer on the website NBA.com. Due to a compromised Amazon S3 bucket, attackers were able to alter a JavaScript library to steal credit card information from customers.

It’s not only JavaScript that’s worth worrying about, either. CSS is another resource capable of performing dangerous actions such as password stealing, and all it takes is a single compromised third-party server for disaster to strike. But they can provide invaluable services that we can’t simply go without, such as CDNs that reduce the total bandwidth usage of a site and serve files to the end-user much faster due to location-based caching. So it’s established that we need to sometimes rely on a host that we have no control over, but we also need to ensure that the content we receive from it is safe. What can we do?

Solution: Subresource Integrity (SRI)

SRI is a security policy that prevents the loading of resources that don’t match an expected hash. With SRI in place, if an attacker were to gain access to a file and modify its contents to contain malicious code, the file wouldn’t match the hash we were expecting and would not execute at all.

Doesn’t HTTPS do that already?

HTTPS is great for security and a must-have for any website, and while it does prevent similar problems (and much more), it only protects against tampering with data-in-transit. If a file were to be tampered with on the host itself, the malicious file would still be sent over HTTPS, doing nothing to prevent the attack.

How does hashing work?

A hashing function takes data of any size as input and returns data of a fixed size as output. Hashing functions would ideally have a uniform distribution. This means that for any input, x, the probability that the output, y, will be any specific possible value is similar to the probability of it being any other value within the range of outputs.

Here’s a metaphor:

Suppose you have a 6-sided die and a list of names. The names, in this case, would be the hash function’s “input” and the number rolled would be the function’s “output.” For each name in the list, you’ll roll the die and keep track of which number each name corresponds to by writing the number next to the name. If a name is used as input more than once, its corresponding output will always be what it was the first time. For the first name, Alice, you roll 4. For the next, John, you roll 6. Then for Bob, Mary, William, Susan, and Joseph, you get 2, 2, 5, 1, and 1, respectively. If you use “John” as input again, the output will once again be 6. This metaphor describes how hash functions work in essence.

Name (input) Number rolled (output)
Alice 4
John 6
Bob 2
Mary 2
William 5
Susan 1
Joseph 1

You may have noticed that, for example, Bob and Mary have the same output. For hashing functions, this is called a “collision.” For our example scenario, it inevitably happens. Since we have seven names as inputs and only six possible outputs, we’re guaranteed at least one collision.

A notable difference between this example and a hash function in practice is that practical hash functions are typically deterministic, meaning they don’t make use of randomness like our example does. Rather, a given input always maps to the same output, while the outputs are still spread as evenly as possible across the range of possible values.

SRI uses a family of hashing functions called the secure hash algorithm (SHA). This is a family of cryptographic hash functions that includes SHA-1 (160-bit) and the SHA-2 variants (224, 256, 384, and 512-bit). A cryptographic hash function is a more specific kind of hash function with additional properties: it is effectively impossible to reverse to find the original input (without already having the corresponding input or brute-forcing), it is collision-resistant, and it is designed so that a small change in the input alters the entire output. SRI supports the 256, 384, and 512-bit variants of the SHA family.

Here’s an example with SHA-256:

The output for hello is:

2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

And the output for hell0 (with a zero instead of the letter o) is:

bdeddd433637173928fe7202b663157c9e1881c3e4da1d45e8fff8fb944a4868

You’ll notice that the slightest change in the input will produce an output that is completely different. This is one of the properties of cryptographic hashes listed earlier.
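You can reproduce hashes like these from the command line on systems that ship sha256sum (printf avoids appending a trailing newline, which would change the hash):

printf '%s' 'hello' | sha256sum
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824  -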

The format you’ll see most frequently for hashes is hexadecimal, which consists of all the decimal digits (0-9) and the letters A through F. One of the benefits of this format is that every two characters represent a byte, and the evenness can be useful for purposes such as color formatting, where each byte represents one color channel. This means a color without an alpha channel can be represented with only six characters (e.g., red = ff0000).

This space efficiency is also why we use hashing instead of comparing the entirety of a file to the data we’re expecting each time. While a 256-bit hash cannot represent all of the data in a file larger than 256 bits, the collision resistance of SHA-256 (and 384, 512) ensures that it’s virtually impossible to find two differing inputs that produce the same hash. And as for SHA-1, it’s no longer secure, as a collision has been found.

Interestingly, the appeal of compactness is likely one of the reasons that SRI hashes don’t use the hexadecimal format, and instead use base64. This may seem like a strange decision at first, but when we take into consideration the fact that these hashes will be included in the code and that base64 is capable of conveying the same amount of data as hexadecimal while being 33% shorter, it makes sense. A single character of base64 can be in 64 different states, which is 6 bits worth of data, whereas hex can only represent 16 states, or 4 bits worth of data. So if, for example, we want to represent 32 bytes of data (256 bits), we would need 64 characters in hex, but only 44 characters in base64. When we use longer hashes, such as SHA-384/512, base64 saves a great deal of space.

Why does hashing work for SRI?

So let’s imagine there was a JavaScript file hosted on a third-party server that we included in our webpage and we had subresource integrity enabled for it. Now, if an attacker were to modify the file’s data with malicious code, the hash of it would no longer match the expected hash and the file would not execute. Recall that any small change in a file completely changes its corresponding SHA hash, and that hash collisions with SHA-256 and higher are, at the time of this writing, virtually impossible.

Our first SRI hash

So, there are a few methods you can use to compute the SRI hash of a file. One way (and perhaps the simplest) is to use srihash.org, but if you prefer a more programmatic way, you can use:

sha384sum [filename here] | head -c 96 | xxd -r -p | base64
  • sha384sum Computes the SHA-384 hash of a file
  • head -c 96 Trims all but the first 96 characters of the string that is piped into it
    • -c 96 Indicates to trim all but the first 96 characters. We use 96, as it’s the character length of an SHA-384 hash in hexadecimal format
  • xxd -r -p Takes hex input piped into it and converts it into binary
    • -r Tells xxd to receive hex and convert it to binary
    • -p Removes the extra output formatting
  • base64 Simply converts the binary output from xxd to base64

If you decide to use this method, check the table below to see the lengths of each SHA hash.

Hash algorithm Bits Bytes Hex Characters
SHA-256 256 32 64
SHA-384 384 48 96
SHA-512 512 64 128

For the head -c [x] command, x will be the number of hex characters for the corresponding algorithm.

MDN also mentions a command to compute the SRI hash:

shasum -b -a 384 FILENAME.js | awk '{ print $1 }' | xxd -r -p | base64

awk '{ print $1 }' Finds the first section of a string (separated by tab or space) and passes it to xxd. $1 represents the first segment of the string passed into it.

And if you’re running Windows:

@echo off
set bits=384
openssl dgst -sha%bits% -binary %1% | openssl base64 -A > tmp
set /p a= < tmp
del tmp
echo sha%bits%-%a%
pause
  • @echo off prevents the commands that are running from being displayed. This is particularly helpful for ensuring the terminal doesn’t become cluttered.
  • set bits=384 sets a variable called bits to 384. This will be used a bit later in the script.
  • openssl dgst -sha%bits% -binary %1% | openssl base64 -A > tmp is more complex, so let’s break it down into parts.
    • openssl dgst computes a digest of an input file.
    • -sha%bits% uses the variable, bits, and combines it with the rest of the string to be one of the possible flag values, sha256, sha384, or sha512.
    • -binary outputs the hash as binary data instead of a string format, such as hexadecimal.
    • %1% is the first argument passed to the script when it’s run.
    • The first part of the command hashes the file provided as an argument to the script.
    • | openssl base64 -A > tmp converts the binary output piping through it into base64 and writes it to a file called tmp. -A outputs the base64 onto a single line.
    • set /p a= <tmp stores the contents of the file, tmp, in a variable, a.
    • del tmp deletes the tmp file.
    • echo sha%bits%-%a% will print out the type of SHA hash type, along with the base64 of the input file.
    • pause Prevents the terminal from closing.

SRI in action

Now that we understand how hashing and SRI hashes work, let’s try a concrete example. We’ll create two files:

// file1.js
alert('Hello, world!');

and:

// file2.js
alert('Hi, world!');

Then we’ll compute the SHA-384 SRI hashes for both:

Filename SHA-384 hash (base64)
file1.js 3frxDlOvLa6GGEUwMh9AowcepHRx/rwFT9VW9yL1wv/OcerR39FEfAUHZRrqaOy2
file2.js htr1LmWx3PQJIPw5bM9kZKq/FL0jMBuJDxhwdsMHULKybAG5dGURvJIXR9bh5xJ9

Then, let’s create a file named index.html:

<!DOCTYPE html>
<html>
  <head>
    <script type="text/javascript" src="./file1.js" integrity="sha384-3frxDlOvLa6GGEUwMh9AowcepHRx/rwFT9VW9yL1wv/OcerR39FEfAUHZRrqaOy2" crossorigin="anonymous"></script>
    <script type="text/javascript" src="./file2.js" integrity="sha384-htr1LmWx3PQJIPw5bM9kZKq/FL0jMBuJDxhwdsMHULKybAG5dGURvJIXR9bh5xJ9" crossorigin="anonymous"></script>
  </head>
</html>

Place all of these files in the same folder and start a server within that folder (for example, run npx http-server inside the folder containing the files and then open one of the addresses provided by http-server or the server of your choice, such as 127.0.0.1:8080). You should get two alert dialog boxes. The first should say “Hello, world!” and the second, “Hi, world!”

If you modify the contents of the scripts, you’ll notice that they no longer execute. This is subresource integrity in effect. The browser notices that the hash of the requested file does not match the expected hash and refuses to run it.

We can also include multiple hashes for a resource and the strongest hash will be chosen, like so:

<!DOCTYPE html>
<html>
  <head>
    <script
      type="text/javascript"
      src="./file1.js"
      integrity="sha384-3frxDlOvLa6GGEUwMh9AowcepHRx/rwFT9VW9yL1wv/OcerR39FEfAUHZRrqaOy2 sha512-cJpKabWnJLEvkNDvnvX+QcR4ucmGlZjCdkAG4b9n+M16Hd/3MWIhFhJ70RNo7cbzSBcLm1MIMItw9qks2AU+Tg=="
      crossorigin="anonymous"></script>
    <script
      type="text/javascript"
      src="./file2.js"
      integrity="sha384-htr1LmWx3PQJIPw5bM9kZKq/FL0jMBuJDxhwdsMHULKybAG5dGURvJIXR9bh5xJ9 sha512-+4U2wdug3VfnGpLL9xju90A+kVEaK2bxCxnyZnd2PYskyl/BTpHnao1FrMONThoWxLmguExF7vNVWR3BRSzb4g=="
      crossorigin="anonymous"></script>
  </head>
</html>

The browser will choose the hash that is considered to be the strongest and check the file’s hash against it.

Why is there a “crossorigin” attribute?

The crossorigin attribute tells the browser when to send the user credentials with the request for the resource. There are two options to choose from:

Value (crossorigin=) Description
anonymous The request will have its credentials mode set to same-origin and its mode set to cors.
use-credentials The request will have its credentials mode set to include and its mode set to cors.

Request credentials modes mentioned

Credentials mode Description
same-origin Credentials will be sent with requests sent to same-origin domains and credentials that are sent from same-origin domains will be used.
include Credentials will be sent to cross-origin domains as well and credentials sent from cross-origin domains will be used.

Request modes mentioned

Request mode Description
cors The request will be a CORS request, which will require the server to have a defined CORS policy. If not, the request will throw an error.

Why is the “crossorigin” attribute required with subresource integrity?

By default, scripts and stylesheets can be loaded cross-origin, and since subresource integrity prevents the loading of a file if the hash of the loaded resource doesn’t match the expected hash, an attacker could load cross-origin resources en masse and test if the loading fails with specific hashes, thereby inferring information about a user that they otherwise wouldn’t be able to.

When you include the crossorigin attribute, the cross-origin domain must choose to allow requests from the origin the request is being sent from in order for the request to be successful. This prevents cross-origin attacks with subresource integrity.

Using subresource integrity with webpack

It probably sounds like a lot of work to recalculate the SRI hashes of each file every time they are updated, but luckily, there’s a way to automate it. Let’s walk through an example together. You’ll need a few things before you get started.

Node.js and npm

Node.js is a JavaScript runtime that, along with npm (its package manager), will allow us to use webpack. To install it, visit the Node.js website and choose the download that corresponds to your operating system.

Setting up the project

Create a folder and give it any name with mkdir [name of folder]. Then type cd [name of folder] to navigate into it. Now we need to set up the directory as a Node project, so type npm init. It will ask you a few questions, but you can press Enter to skip them since they’re not relevant to our example.

webpack

webpack is a library that allows you to automatically combine your files into one or more bundles. With webpack, we will no longer need to manually update the hashes. Instead, webpack will inject the resources into the HTML with integrity and crossorigin attributes included.

Installing webpack

You’ll need to install webpack and webpack-cli:

npm i --save-dev webpack webpack-cli 

The difference between the two is that webpack contains the core functionalities whereas webpack-cli is for the command line interface.

We’ll edit our package.json to add a scripts section like so:

{
  // ... rest of package.json ...
  "scripts": {
    "dev": "webpack --mode=development"
  }
  // ... rest of package.json ...
}

This enables us to run npm run dev and build our bundle.

Setting up webpack configuration

Next, let’s set up the webpack configuration. This is necessary to tell webpack what files it needs to deal with and how.

First, we’ll need to install a few packages: html-webpack-plugin, webpack-subresource-integrity, style-loader, and css-loader:

npm i --save-dev html-webpack-plugin webpack-subresource-integrity style-loader css-loader 
Package name Description
html-webpack-plugin Creates an HTML file that resources can be injected into
webpack-subresource-integrity Computes and inserts subresource integrity information into resources such as <script> and <link rel=…>
style-loader Applies the CSS styles that we import
css-loader Enables us to import css files into our JavaScript

Setting up the configuration:

const path              = require('path'),
      HTMLWebpackPlugin = require('html-webpack-plugin'),
      SriPlugin         = require('webpack-subresource-integrity');

module.exports = {
  output: {
    // The output file's name
    filename: 'bundle.js',
    // Where the output file will be placed. Resolves to
    // the "dist" folder in the directory of the project
    path: path.resolve(__dirname, 'dist'),
    // Configures the "crossorigin" attribute for resources
    // with subresource integrity injected
    crossOriginLoading: 'anonymous'
  },
  // Used for configuring how various modules (files that
  // are imported) will be treated
  module: {
    // Configures how specific module types are handled
    rules: [
      {
        // Regular expression to test for the file extension.
        // These loaders will only be activated if they match
        // this expression.
        test: /\.css$/,
        // An array of loaders that will be applied to the file
        use: ['style-loader', 'css-loader'],
        // Prevents the accidental loading of files within the
        // "node_modules" folder
        exclude: /node_modules/
      }
    ]
  },
  // webpack plugins alter the function of webpack itself
  plugins: [
    // Plugin that will inject integrity hashes into index.html
    new SriPlugin({
      // The hash functions used (e.g.
      // <script integrity="sha256- ... sha384- ..." ...
      hashFuncNames: ['sha384']
    }),
    // Creates an HTML file along with the bundle. We will
    // inject the subresource integrity information into
    // the resources using webpack-subresource-integrity
    new HTMLWebpackPlugin({
      // The file that will be injected into. We can use
      // EJS templating within this file, too
      template: path.resolve(__dirname, 'src', 'index.ejs'),
      // Whether or not to insert scripts and other resources
      // into the file dynamically. For our example, we will
      // enable this.
      inject: true
    })
  ]
};

Creating the template

We need to create a template to tell webpack which file to inject the bundle and subresource integrity information into. Create a file named index.ejs inside a src folder (to match the template path in the webpack configuration above):

<!DOCTYPE html>
<html>
  <body></body>
</html>

Now, create an index.js in the src folder (webpack’s default entry point) with the following script, along with a styles.css file for it to import:

// Imports the CSS stylesheet
import './styles.css'

alert('Hello, world!');

Building the bundle

Type npm run dev in the terminal. You’ll notice that a folder called dist is created, and inside of it, a file called index.html that looks something like this:

<!DOCTYPE HTML>
<html>
  <head>
    <script defer src="bundle.js" integrity="sha384-lb0VJ1IzJzMv+OKd0vumouFgE6NzonQeVbRaTYjum4ql38TdmOYfyJ0czw/X1a9b" crossorigin="anonymous"></script>
  </head>
  <body>
  </body>
</html>

The CSS will be included as part of the bundle.js file.

This will not work for files loaded from external servers, nor should it, as cross-origin files that need to constantly update would break with subresource integrity enabled.

Thanks for reading!

That’s all for this one. Subresource integrity is a simple and effective addition to ensure you’re loading only what you expect and protecting your users; and remember, security is more than just one solution, so always be on the lookout for more ways to keep your website safe.


The post Securing Your Website With Subresource Integrity appeared first on CSS-Tricks.


What Google’s New Page Experience Update Means for Images on Your Website

It’s easy to forget that, as a search engine, Google doesn’t just compare keywords to generate search results. Google knows that if people don’t enjoy their experience on a web page, they won’t stay on the page long enough to consume the content — no matter how relevant it is.

As a result, Google has been experimenting with ways to analyze the user experience of web pages using quantifiable metrics. Factoring these into its search engine rankings, it’s hoped to provide users not only with great, relevant content but with awesome user experiences as well.

Google’s soon-to-be-launched page experience update is a major step in this direction. Website owners with image-heavy websites need to be particularly vigilant to adapt to these changes or risk falling by the wayside. In this article, we’ll talk about everything you need to know regarding this update, and how you can take full advantage of it.

Note: Google introduced its plans for Page Experience in May 2020 and announced in November 2020 that the update would begin rolling out in May 2021. However, Google has since pushed this back to a gradual rollout starting in mid-June 2021. This was done in order to give website admins time to deal with the shifting conditions brought about by the COVID-19 pandemic first.

Some Background

Before we get into the latest iteration of changes to how Google factors user experience metrics into search engine rankings, let’s get some context. In April 2020, Google made its most pivotal move in this direction yet by introducing a new initiative: core web vitals.

Core web vitals (CWV) were introduced to help web developers deal with the challenges of trying to optimize for search engine rankings using testable metrics – something that’s difficult to do with a highly subjective thing like user experience.

To do this, Google identified three key metrics (what it calls “user-centric performance metrics”). These are:

  1. LCP (Largest Contentful Paint): The largest element above the fold when a web page initially loads. Typically, this is a large featured image or header. How fast the largest content element loads plays a huge role in how fast the user perceives the overall loading speed of the page.
  2. FID (First Input Delay): The time it takes between when a user first interacts with the page and when the main thread is free for the browser to process the event. This can be clicking/tapping a button, link, or interacting with any other dynamic element. Delays when interacting with a page can obviously be frustrating to users which is why keeping FID low is crucial.
  3. Cumulative Layout Shift (CLS): This calculates the visual stability of a page when it first loads. The algorithm takes the size of the elements and the distance they move relative to the viewport into account. Pages that load with high instability can cause miscues by users, also leading to frustrating situations.

These metrics have evolved from more rudimentary ones that have been in use for some time, such as SI (Speed Index), FCP (First Contentful Paint), TTI (Time-to-interactive), etc.

The reason this is important is that images can play a significant role in how your website’s CWVs score. For example, the LCP element is more often than not an above-the-fold image or, at the very least, will have to compete with an image to be loaded first. Images that aren’t correctly used can also negatively impact CLS. Slow-loading images can also impact the FID by adding further delays to the overall rendering of the page.

What’s more, this came on the back of Google’s renewed focus on mobile-first indexing. So, not only are these metrics important for your website, but you have to ensure that your pages score well on mobile devices as well.

It’s clear that, in general, Google is increasingly prioritizing user experience when it comes to search engine rankings. Which brings us to the latest update – Google now plans to incorporate page experience as a ranking factor, starting with an early rollout in mid-June 2021.

So, what is page experience? In short, it’s a ranking signal that combines data from a number of metrics to try and determine how good or bad the user experience of a web page is. It consists of the following factors:

  • Core Web Vitals: Using the same, unchanged, core web vitals. Google has established guidelines and recommended rankings that you can find here. You need an overall “good” CWV rating to qualify for a “good” page experience score.
  • Mobile Usability: A URL must have no mobile usability errors to qualify for a “good” page experience score.
  • Security Issues: Any flagged security issues will disqualify websites.
  • HTTPS: Pages must be served via HTTPS to qualify.
  • Ad Experience: Measures to what degree ads negatively affect the user experience on your web page, for example, by being intrusive, distracting, etc.

As part of this change, Google announced its intention to include a visual indicator, or badge, that highlights web pages that have passed its page experience criteria. This will be similar to previous badges the search engine has used to promote AMP (Accelerated Mobile Pages) or mobile-friendly pages.

This official recognition will give high-performing web pages a massive advantage in the highly competitive arena that is Google’s SERPs. This visual cue will undoubtedly boost CTRs and organic traffic, especially for sites that already rank well. This feature may drop as soon as May if it passes its current trial phase.

Another bit of good news for non-AMP users is that all pages will now become eligible for Top Stories in both the browser and Google News app. Although Google states that pages can qualify for Top Stories “irrespective of its Core Web Vitals score or page experience status,” it’s hard to imagine this not playing a role for eligibility now or down the line.

Key Takeaway: What Does This Mean For Images on Your Website?

Google noted a 70% surge in consumer usage of their Lighthouse and PageSpeed Insight tools, showing that website owners are catching up on the importance of optimizing their pages. This means that standards will only become higher and higher when competing with other websites for search engine rankings.

Google has reaffirmed that, despite these changes, content is still king. However, content is more than just the text on your pages, and truly engaging and user-friendly content also consists of thoughtfully used media, the majority of which will likely be images.

With the proposed page experience badges and Top Stories eligibility up for grabs, the stakes have never been higher to rank highly with the Google Search algorithm. You need every advantage that you can get. And, as I’m about to show, optimizing your image assets can have a tangible effect on scoring well according to these metrics.

What Can You Do To Keep Up?

Before I propose my solution to help you optimize image assets for core web vitals, let’s look at why images are often detrimental to performance:

  • Images bloat the overall size of your website pages, especially if the images are unoptimized (i.e. uncompressed, not properly sized, etc.)
  • Images need to be responsive to different devices. You need much smaller image sizes to maintain the same visual quality on smaller screens.
  • Different contexts (browsers, OSs, etc.) have different formats for optimally rendering images. However, most images are still used in .JPG/.PNG format.
  • Website owners don’t always know about the best practices associated with using images on website pages, such as always explicitly specifying width/height attributes.

Using conventional methods, it can take a lot of blood, sweat, and tears to tackle these issues. Most solutions, such as manually editing images and hard-coding responsive syntax, have inherent issues with scalability and with easily adjusting to changes, and they bloat your development pipeline.
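For instance, handling responsive images by hand typically means generating several sizes of every image and writing markup like this for each one; the file names and breakpoints below are only illustrative.

<img
  src="hero-800.jpg"
  srcset="hero-480.jpg 480w, hero-800.jpg 800w, hero-1200.jpg 1200w"
  sizes="(max-width: 600px) 100vw, 800px"
  width="800"
  height="450"
  alt="Hero banner" />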

To optimize your image assets, particularly with a focus on improving CWVs, you need to:

  • Reduce image payloads
  • Implement effective caching
  • Speed up delivery
  • Transform images into optimal next-gen formats
  • Ensure images are responsive
  • Implement run-time logic to apply the optimal setting in different contexts

Luckily, there is a class of tools designed specifically to solve these challenges and provide these solutions — image CDNs. Particularly, I want to focus on ImageEngine which has consistently outperformed other CDNs on page performance tests I’ve conducted.

ImageEngine is an intelligent, device-aware image CDN that you can use to serve your website images (including GIFs). ImageEngine uses WURFL device detection to analyze the context your website pages are accessed from (device, screen size, DPI, OS, browser, etc.) and optimize your image assets accordingly. Based on these criteria, it can optimize images by intelligently resizing, reformatting, and compressing them.

It’s a completely automatic, set-it-and-forget-it solution that requires little to no intervention once it’s set up. The CDN has over 20 global PoPs with the logic built into the edge servers for faster delivery across different regions. ImageEngine claims to achieve cache-hit ratios as high as 98%+ as well as reduce image payloads by 75%+.

Step-by-Step Test + How to Use ImageEngine to Improve Page Experience

To illustrate the difference using an image CDN like ImageEngine can make, I’ll show you a practical test.

First, let’s take a look at how a page with a massive amount of image content scores using PageSpeed Insights. It’s a simple page, but consists of a large number of high-quality images with some interactive elements, such as buttons and links as well as text.

FID is unique because it relies on data from real-world interactions users have with your website. As a result, FID can only be collected “in the field.” If you have a website with enough traffic, you can get the FID by generating a Page Experience Report in the Google Console.

However, for lab results, from tools like Lighthouse or PageSpeed Insights, we can surmise the impact of blocking resources by looking at TTI and TBT.

Oh, yes, and I’ll also be focusing on the results of a mobile audit for a number of reasons:

  1. Google themselves are prioritizing mobile signals and mobile-first indexing
  2. Optimizing web pages and images assets are often most challenging for mobile devices/general responsiveness
  3. It provides the opportunity to show the maximum improvement an image CDN can provide

With that in mind, here are the results for our page:

So, as you can see, we have some issues. Helpfully, PageSpeed Insights flags the two CWVs present, LCP and CLS. As you can see, because of the huge image payload (roughly 35 MB), we have a ridiculous LCP of nearly 1 minute.

Because of the straightforward layout and the fact that I did explicitly give images width and height attributes, our page happened to be stable with a 0 CLS. However, it’s important to realize that slow loading images can also impact the perceived stability of your pages. So, even if you can’t directly improve on CLS, the faster sizable elements such as images load, the better the overall experience for real-world users.

TTI and TBT were also sub-par. It will take at least two seconds from the moment the first element appears on the page until the page can start to become interactive.

As you can see from the opportunities for improvement and diagnostics, almost all issues were image-related:

Setting Up ImageEngine and Testing the Results

Ok, so now it’s time to add ImageEngine into the mix and see how it improves performance metrics on the same page.

Setting up ImageEngine on nearly any website is relatively straightforward. First, go to ImageEngine.io and sign up for a free trial. Just follow the simple 3-step signup process where you will need to:

  1. provide the website you want to optimize, 
  2. the web location where your images are stored, and then 
  3. copy the delivery address ImageEngine assigns to you.

The latter will be in the format of {random string}.cdn.imgeng.in but can be updated from within the ImageEngine dashboard.

To serve images via this domain, simply go back to your website markup and update the <img> src attributes. For example:

From:

<img src="mywebsite.com/images/header-image.jpg"/>

To:

<img src="myimages.cdn.imgeng.in/images/header-image.jpg"/>

That’s all you need to do. ImageEngine will now automatically pull your images and dynamically optimize them for best results when visitors view your website pages. You can check the official integration guides in the documentation on how to use ImageEngine with Magento, Shopify, Drupal, and more. There is also an official WordPress plugin.

Here’s the results for my ImageEngine test page once it’s set up:

As you can see, the results are nearly flawless. All metrics were improved, scoring in the green – even Speed Index and LCP which were significantly affected by the large images.

As a result, there were no more opportunities for improvement. And, as you can see, ImageEngine reduced the total page payload to 968 kB, cutting down image content by roughly 90%:

Conclusion

To some extent, it’s more of the same from Google who has consistently been moving in a mobile direction and employing a growing list of metrics to hone in on the best possible “page experience” for its search engine users. Along with reaffirming their move in this direction, Google stated that they will continue to test and revise their list of signals.

Other metrics that can be surfaced in their tools, such as TTFB, TTI, FCP, TBT, or possibly entirely new metrics may play even larger roles in future updates.

Finding solutions that help you score highly for these metrics now and in the future is key to staying ahead in this highly competitive environment. While image optimization is just one facet, it can have major implications, especially for image-heavy sites.

An image CDN like ImageEngine can solve almost all issues related to image content with minimal time and effort, as well as future-proof your website against upcoming updates.


The post What Google’s New Page Experience Update Means for Images on Your Website appeared first on CSS-Tricks.


A Whole Website in a Single HTML File

I can’t stop thinking about this site. It looks like pretty standard fare: a website with links to different pages. Nothing to write home about except that… the whole website is contained within a single HTML file.

What about clicking the navigation links, you ask? Each link merely shows and hides certain parts of the HTML.

<section id="home">
  <!-- home content goes here -->
</section>

<section id="about">
  <!-- about page goes here -->
</section>

Each <section> is hidden with CSS:

section { display: none; }

Each link in the main navigation points to an anchor on the page:

<a href="#home">Home</a> <a href="#about">About</a>

And once you click a link, the <section> for that particular link is displayed via:

section:target { display: block; }

See that :target pseudo selector? That’s the magic! Sure, it’s been around for years, but this is a clever way to use it for sure. Most times, it’s used to highlight the anchor on the page once an anchor link to it has been clicked. That’s a handy way to help the user know where they’ve just jumped to.

Anyway, using :target like this is super smart stuff! It ends up looking like just a regular website when you click around:



The post A Whole Website in a Single HTML File appeared first on CSS-Tricks.


Tech Stacks and Website Longevity

Steren Giannini in “My stack will outlive yours”:

My stack requires no maintenance, has perfect Lighthouse scores, will never have any security vulnerability, is based on open standards, is portable, has an instant dev loop, has no build step and… will outlive any other stack.

Jeremy Keith in “npm ruin dev”:

Instead of reaching for all-singing all-dancing toolchain by default, I’m going to start with a boring baseline. If and when that becomes too painful or unwieldy, then I’ll throw in a task manager. But every time I add a dependency, I’ll be limiting the lifespan of the project.

I like both of those sentiments.

Steren’s “stack” is HTML and CSS only. Will HTML and CSS “last” in the sense of that website being online and working for a long time? I’d say certainly yes. HTML and CSS were around before I got here, are actively developed, and no other technologies are really even trying to unseat them. The closest threats are native platforms, but those are so fractured and closed, while lacking the worldwide utility of the URL and open standards, that it doesn’t look like any native platform will unseat the web. It’s more likely (and we see this happening, even if it’s slow and fraught) that native platforms embrace the web.

Will an HTML and CSS website be perfectly functional in, say, 2041? I’d say certainly. I’ll bet ya a dollar.

Steren doesn’t mean that HTML and CSS are just the output of some other tooling that gets them there. There is no build process either. No templating. Need to update the navigation?

So… if I don’t use any templating system, how do I update my header, footer or nav? Well, simply by using the “Replace in files” feature of any good text editor. They don’t need frequent updates anyway. The benefits of using a templating system is not worth the cost of introducing the tooling it requires.

I admit this is drawing the line further back than I would. This feels just like trading one kind of technical debt for another. Now you’ll need to write scripts or an elaborate find-and-replace RegEx to do what you want to do, rather than reach for some form of HTML include, which there are a ton of ways to handle lightly.

But I get it. Especially since once you do add that one templating language (or whatever), the temptation is strong to keep adding to the system, introducing more and more liabilities with less consideration on how they may be “limiting the lifespan” of the project.

I don’t actually think the stack matters that much.

In thinking about sites I work on (and have worked on), the longevity of the site doesn’t feel particularly related to the stack. Like, at all. The sites with the longest lifespans (like this one) have long lifespans because I care about them, and they have all sorts of moving parts in the stack.

I pick technology to help with what I want to do. If my needs change, I change the technology. I don’t just say, ooops, my stack is off, I guess I’ll shut down the website forever.

If we’re talking about website longevity, I think the breakdown of how much things matter is more like this:

  • 80% How much I care about the website
  • 10% The website isn’t a financial burden
  • 5% The website isn’t a mental burden (“the stack” being some small part of this)
  • 5% I have access to the registrar and didn’t forget to renew the domain name before a squatter nabs it

The post Tech Stacks and Website Longevity appeared first on CSS-Tricks.


Chapter 3: The Website

Previously in web history…

Berners-Lee, motivated by his own curiosity, creates the World Wide Web at CERN. He releases its technologies to the public domain, which enables the development of several new browsers for every operating system. Mosaic proves to be the most popular, and its introduction of color images displayed inline with content fundamentally changes the way people think about the web.

The very first website was about the web. That kind of thing is not all that unusual. The first email sent to another person was about email. As technology has progressed, we may have lost a bit of the theatrics. The first telegraph, for instance, read “WHAT HATH GOD WROUGHT.” However, in most cases, telecommunication firsts follow this meta template.

Anyway, the first website was instructive for a reason. If you were a brand new web user, it is the first thing you would see. If that page didn’t manage to convince you the web was worth sinking a bit of time into, then that was the end of the story. You’d go and check out Gopher instead. So, as a starting point for new web users, the first website was critical.

The URL was info.cern.ch. Its existence on the CERN server should be of no surprise. The first website was created by the web’s inventor, Tim Berners-Lee, while he was still working there.

It was a simple page. A list of headers and links — to download web browser code, find out more info about the web, and get all of the technical details — was divided only by short descriptions of each section. One link brought you to a list of websites. Berners-Lee collected a list of links that were sent to him, or plucked them from mailing lists whenever he found them. Every time he found a link he added it to the CERN website, loosely organized by category. It was a short list. In July of 1993, there were still only about 130 websites in the world.

(A few years back, some enterprising folks took it upon themselves to re-create the first website at CERN. So you can go and browse it now, just as it was then.)

As far as websites go, it was nothing spectacular. The language was plain enough, though a bit technical. The instructions were clear, as long as you had some background in programming or computers. The web before the web was difficult to explain. The primary goal of the website was to prompt a bit of exploration from those who visited it. By that measure, it was successful.

But Berners-Lee never meant for the CERN website to be the most important page on the web. It was just there to serve as an example for others to recreate in their own image.

Tim Berners-Lee also created the first browser. It gave users the ability to both read — and crucially to publish — websites. In his conception, each consumer of the web would have their own personal homepage. The homepage could be anything. For most people, he thought, it would likely be a private place to store personal bookmarks or jot down notes. Others might choose to publish their site for the public, using it as an opportunity to introduce themselves, or explore some passion (similar to what services like Geocities would offer later). Berners-Lee imagined that when you opened your browser, any browser, your own homepage would be the first thing that you saw.

By the time other browsers hit the market, the publishing capabilities faded away. People were left to simply surf, and not to author, the web. For the earliest of web users, the CERN website remained a popular destination. With usage still growing, it was the best place to find a concise list of websites. But if the web was going to succeed — truly succeed — it was going to have to be more than links. The web was going to need to find its utility.

Fortunately Berners-Lee had created the URL. Anyone could create a website. Heck, he’d even post a link to it.


“Louise saw the web as a godsend,” Berners-Lee wrote in his personal retelling of the web’s history. The Louise in question is Louise Addis, librarian at SLAC for over 40 years before she retired in the mid-90s. Along with Paul Kunz, Tony Johnson, and several others, she helped create the first web server in the United States and one of the most influential websites of the early web. She would later put it a bit differently. “The Web was a revolution!” That may be true, but it wouldn’t have been a revolution if not for what she helped create.

As we found in the first chapter, Berners-Lee’s curiosity led him on a path to set information free. Louise Addis was also curious. Her curiosity led her to try to connect people to that information. She studied International Relations at Stanford University only to bounce around at a few jobs and land herself back at her alma mater working for a secret research lab known simply as Project M in 1960. Though she had no experience in the field, she worked there as a librarian, eventually moving up to head librarian. After a couple of years, the lab would go public and become formally known as the Stanford Linear Accelerator Center, or SLAC.

SLAC’s primary mission was to advance the research of American scientists in the wake of World War II. It houses a two-mile long linear accelerator, the longest in the world. SLAC recruits scientists across a broad set of fields, but its primary focus is particle physics. It has produced a number of Nobel prizes and has shared groundbreaking new discoveries across the world.

Research is at the center of the work done at SLAC. While she was there, Addis was relentless in her quest to connect her peers with research. When she learned that there wasn’t a good system for keeping track of the multitude of authors attributed to particle physics papers (some had over 1,000 authors on a single paper), she picked up a bit of programming with no formal training. “If I needed to know something, I asked someone to show me how to do a particular task. Then I went back to the Library and tried it on my own.”

A couple of years after she discovered the web, Addis would start the first unofficial tech support group for web newcomers known as the WWW Wizards. The Wizards worked — mostly in their spare time — to help new web users come online. They were a profoundly important resource for the early web. Addis continually made it her mission to help people find the information they needed.

She used her ad-hoc programming experience in the late 1960s to create the SPIRES-HEP database, a digital library with hundreds of thousands of bibliographic records for particle physics papers. It is still in use today, though its newest iteration is called INSPIRE-HEP. The SPIRES-HEP database was a foundational resource. If you were a particle physics researcher anywhere in the world, you would be accessing it frequently. It ran on an IBM mainframe that looked like this:

An IBM mainframe console from the 70's

The mainframe used a very specific programming language also developed by IBM, which has since gone into disuse. Locked inside was a very well organized bibliography of research papers. Accessing it was another thing entirely. There were a few ways to do that.

The first required a bit of programming knowledge. If you were savvy enough, you could log directly into the SPIRES-HEP database remotely and, using the database-specific SPIRES query language, pull the records you needed directly from the mainframe. This was the quickest option, but required the most technical know-how and a healthy dose of tenacity. Let’s consider this method the high bar.

The middle bar was an interface built by SLAC researcher Paul Kunz that let you email the server to pull out the records you needed. You still needed to know the SPIRES query language, but it solved the remote access part of the equation.

The low bar was to email or message a librarian at SLAC so they could pull the record for you and send it back. The easiest bar to clear, this was the method that most people used. Which meant that the most widely accessed particle physics database in the world was beset by a bottleneck of librarians at SLAC who needed to ferry bibliographic records back and forth from researchers.

The SPIRES-HEP database was invaluable, but widespread access remained its largest obstacle.


For a second time in the web’s history, the NeXT computer played an important role in its fate. For a computer that was short-lived, and largely unheard of, it is a key piece of the web’s history.

Like Tim Berners-Lee, SLAC physicist Paul Kunz, creator of the SPIRES-HEP instant messaging and email service, used a NeXT computer. On one of Kunz’s visits to CERN, Berners-Lee invited him into his office. The only reason Kunz agreed to go was to see how somebody else was using a NeXT computer. While he was there, Berners-Lee showed Kunz the web. And then Kunz went back to SLAC and showed the web to Addis.

Kunz and Addis were both enthusiastic purveyors of research at SLAC. They each played their part in advancing information discovery. When Kunz told Addis about the web, they both had the same idea about what to do with it. SLAC was going to need a website. Kunz built a web server at Stanford — the first in the United States. Addis, meanwhile, wrangled a few colleagues to help her build the SLAC website. The site launched on December 12, 1991, a year after Berners-Lee first published his own website at CERN.

Most of the programmers and researchers that began tinkering on the web in the early days were drawn by a nerdy fascination. They liked to play around with browsers, mess around with some code. The website was, in some cases, the mere after-effect of a technological experiment. That wasn’t the case for Addis. The draw of the web wasn’t its technology. It was what it enabled her to do.

The SLAC website started out with two links. The first one let you search through a list of phone numbers at SLAC. That link wasn’t all that interesting. (But it was a nice nod to the web’s origin. The most practical early use of the web was as an Internet-enabled phonebook at CERN.) The second link was far more interesting. It was labeled “HEP.” Clicking on it brought you to a simple page with a single text field. Type a query into that field, click Enter and you got live results of records directly from the SPIRES-HEP database. And that was the SLAC website. Its primary purpose was to act as an interface in front of the SPIRES-HEP database and pull down queried results.

When Berners-Lee demoed the SLAC website a couple of months later at a conference, it was met with wild applause, practically a standing ovation.

The importance was obviously not lost on that audience. No longer would researchers be forced to wrestle with complicated programming languages, or emails to SLAC librarians. The SLAC website took the low bar of access for the SPIRES-HEP database and dropped it all the way to the floor. It made searching the database easy (and within a couple of years, it would even add links to downloadable PDFs).

The SLAC website, nothing more than a searchable bibliography, was the beginning of something on the web. Physicists began using it, and it rebounded from one research lab to the next. The web’s first micro-explosion happened the day Berners-Lee demoed the site. It began reverberating around the physics community, and then outside of it.

SLAC was the website that showed what the web could do. GNN was going to be the first that made the web look good doing it.


Global Network Navigator was going to be exciting. A bold experiment on and with the web. The web was a wall of research notes and scientific diagrams; plain black text on stark white backgrounds as far as the eye could see. GNN would change that. It would be fun. Lively. Interactive.

That was the pitch made to designer Jennifer Robbins by O’Reilly co-founder Dale Dougherty in 1993. Robbins’ mind immediately jumped to the possibilities of this incredible, new, digital medium.

She met with another O’Reilly employee, Rob Raisch. A couple of years after that pitch, Raisch would propose one of the first examples of a stylesheet. At the time, he was just the person at the company who happened to know the most about the web, which had only recently cracked a hundred total sites. When Robbins walked into his office, the first thing he said to her was: “You know, you probably can’t do what you want.” He had a point. The language of the web was limiting. But the GNN team was going to find a way around that.

GNN was the brainchild of Dale Dougherty. By the early 90s, Dougherty had become a minor celebrity for experiments just like this one. From the early days of O’Reilly media, the book publisher he co-founded, he was always cooking up some project or another.

Wherever technology is going, Dougherty has a knack for being there first. At one conference early on in O’Reilly’s history, he sold self-printed copies of a Unix manual for $5 apiece just before Unix exploded on the scene. After spending decades in book publishing, he’s recently turned his attention to the maker culture. He has been called a godfather of the Maker movement.

That was no less true for the web. He became one of the web’s earliest adopters and its most prolific early champion. He brought together Tim Berners-Lee and the developers of NCSA Mosaic, including Marc Andreessen, for the first time in a meeting in Cambridge. That meeting would eventually lead to the creation of the W3C. He’d be responsible for early experiments with web advertising, basically on the first day advertising was allowed. He would later coin the term Web 2.0, in the wake of transformation after the dot-com boom. Dougherty loved the web.

But staring at the web for the first time in the early 90s, he didn’t exactly know what to do with it. His first thought was to put a book on the web. After all, O’Reilly had a gigantic back catalog, and the web was mostly text. But Dougherty knew that the web’s greatest asset was the hyperlink. He needed a book that could act as a springboard to bring people to different parts of the web. He found it in the newly-published bestseller by author Ed Krol, The Whole Internet User’s Guide and Catalog. The book was a guided tour through the technologies of the Internet. It had a paragraph on the web. Not exactly a lot, but enough for Dougherty to make the connection.

Dougherty had recruited Pei-Yuan Wei, creator of the popular ViolaWWW browser, to make an earlier version of an interactive Internet guide. But he pulled together a production team — led by managing editor Gina Blaber — of writers, designers, programmers, and sales staff. They launched GNN, the web’s first true commercial website, in early 1993.

GNN was created before any other commercial website, before blogs and online magazines. Digital publishing was something new altogether. As a result, GNN didn’t quite know what it wanted to be. It operated somewhere between a portal and a magazine. Navigating the site was an exercise in tumbling down one rabbit hole after another.

In one section, the site included the Whole Internet Catalog repurposed and ported to the web. Contained within were pages upon pages of best-of lists; collections of popular websites sorted into categories like finance, literature and cooking.

Another section, labeled GNN Magazine, jumped to a different group of sortable webpages known as metacenters. These were, in the website’s own description, “special-interest magazines that gather together the best Internet resources on topics such as travel, music, education, and computers. Each metacenter contains articles, columns, reference guides, and discussion groups.” Though conceptually similar to modern day media portals, the nickname “metacenter” never truly caught on. The site’s content and design was produced and maintained by the GNN staff. Not to be outdone by their print predecessors, GNN magazine contained interviews, features, biographies, and explainers. One hyperlink after another.

Over time, GNN would expand to affiliated publications. When the Mosaic team got too busy working on the web’s most popular browser, they handed off their browser homepage to the GNN team. The page was called What’s New, and it featured the most interesting links around the web for the day. The GNN team seized the opportunity to expand their platform even further.

Explaining what GNN was to someone who had never heard of the web, let alone a website, was an onerous task. Blaber explained GNN as giving “users a way to navigate through the information highway by providing insightful editorial content, easy point-and-click commands, and direct electronic links to information resources.” That’s a meaningful description of the site. It was a way into the web, one that wasn’t as fractured or unorganized as jumping in blind. It was also, however, the kind of thing you needed to see to understand.

And it was something to see. Years before stylesheets and armed with nothing but a handful of HTML tags, the GNN team set about creating the most ambitious project with the web medium yet. Browsers had only just begun allowing inline graphics, and GNN took full advantage. The homepage in particular featured big colorful graphics, including the hot air balloon that would endure for years as the GNN logo. They laid out their pages meticulously — most pages had a unique design. They used images as headers to break up the page. Most pages featured large graphics, and colored text and backgrounds. Wherever the envelope was, they’d push it a little further.

The result: a brand new kind of interactive experience. The web was a sea of plain websites with no design mostly coming from research institutions and colleges. Before Mosaic, bold graphics and colors weren’t even possible. And even after Mosaic’s release, the web was mostly filled with dense websites of scrolling text with nothing more than scientific diagrams to break it up, or sparse websites with a link, an email and a phone number. Most sites had nothing in the way of hierarchy or interactivity. Content was difficult to follow unless it was exactly what you were looking for. There was a ton of information on the web, but no one had thought to organize it to any meaningful degree. Imagine seeing all of that, day after day, and then one day you click a link and come to this:

Screenshot of what GNN looked like when it launched in 1993, with its famous hot air balloon logo

It looks dated now, but a splash page with bold colors and big graphics, organized into sections and layered with interesting content… that was something to see.

The GNN team was creating the rules of web design, a field that had yet to be invented. In the first few years of the web, there were some experiments. The Vatican had scanned a number of materials from its archives and put them on a website. The Exploratorium took that one step further, creating the first online museum, with downloadable sounds and pictures. But they were still very much constrained by the simplicity of the web experience. Click this link, download this file, and that was it. GNN began to take things further. Dale Dougherty recalls that their goal was to “shift from the Internet as command line retrieval to the internet as this more digital interface… like a book.” A perfectly reasonable goal for a book publisher but a tall order for the web.

To accomplish their goal, GNN’s staff used the rules of graphic design as a roadmap (as philosopher Marshall McLuhan once said, “the content of any medium is always another medium”). But the team was also writing a brand new rulebook, on the fly, as they went. There were open questions about how to handle web graphics, new patterns for designing user interfaces, and best practices for writing HTML. Once the team closed one loop, they moved on to the next one. It was as if they were writing the manual for flying a rocketship — while strapped to the wings and hurtling towards space.

As browsers got better, GNN evolved to take advantage of the latest design possibilities. They began to use image maps to make more complex navigation. They added font tags and frames. GNN was also the first site on the web with a sponsored link, and even that was careful and considered. Before the popup would plague our browsing experience, GNN created simple, unobtrusive, informational adverts inserted in between their other listings.

GNN provided a template for the commercial web. As soon as they launched, dozens of copycats quickly followed. Many adopted a similar style and tone. Within a few years, web portals and online magazines would become so common they were considered trite and uninteresting. But very few sites that followed it had the lasting impact GNN did on a new generation of digital designers.


Ranjit Bhatnagar has an offbeat sort of humor. He’s a philosopher and a musician. He’s smart. He’s a fan of the weird and the banal. He’s anti-consumerist, or at the very least, opposed to consumerist culture. I won’t go as far as to say he’s pedantic, but he certainly revels in the most minute of details. He enjoys lively debates and engaged discourse. He’s fascinated by dreams, and once had a dream where he was flying through the air with his mother taking in the sights.

I’ve never met Bhatnagar. I know all of this because I read it on his website. Anyone can. And his website started with lunch.

Bhatnagar’s website was called Ranjit’s HTTP Playground. Playground describes it rather well; hyperlinks are scattered across the homepage like so many children’s toys. One link takes you to a half-finished web experiment. Another takes you to a list of his favorite bookmarks arranged by category. Yet another might contain a rant about the web, or a long-winded tribute to Kinder eggs. If you’re in the mood for a debate you can post your own thoughts to a page devoted to the single question: Are nuts wood? There’s still no consensus on that one.

Browsing Ranjit’s HTTP Playground is like peeling back the layers of Bhatnagar’s brain. He added new entries to his site pretty regularly, never more than a sentence or two, arranged in a series of dated bullet points. Pages were laid out on garish backgrounds, scalding bright green on jet black, or surrounded by a dizzying dance of animated GIFs. Each page was littered with links to more pages, seemingly at random. Every time you think you’ve reached the end of a thread, there’s another link to click. And every once in a while, you’ll find yourself back on the homepage wondering how you got there and how much time has passed in the meantime. This was the magic of the early web.

Bhatnagar first published his website in late 1993, just a few months after the GNN website went up. The very first thing Bhatnagar posted to his website was what he ordered for lunch every day. It was arranged in reverse chronological order, his most recent lunch order right at the top.

SLAC captured the utility of the web. GNN realized its popular appeal. Bhatnagar, and others like him, made the web personal.

Claudio Pinhanez began adding daily entries to the MIT Media Lab website in 1994. He posted movie and book reviews, personal musings, and shared his favorite links. He followed the same format as Bhatnagar’s Lunch Server. Entries were arranged on the page in reverse chronological order. Each entry was short and to the point — no longer than a sentence or two. This movie was good. This meal was bad. Isn’t it interesting that… and so on.

In early 1995, Carolyn Burke began posting daily entries to her website in one of the earliest examples of an online diary. Each one was a small slice from her life. The posts were longer than the short bursts of Pinhanez and Bhatnagar. Burke took her time with narrative anecdotes and meandering asides. She was loquacious and insightful. Her writing was conversational, and she promised readers that she would be honest. “I notice now that I have held back in being frank. My academic analysis skills come out, and I write with them things that I’ve known for a long time,” she wrote in an entry from the first few months, “But this is therapy for me… honesty and freedom therapy. Wow, that’s a loaded word. freedom.”

Perhaps no site was more honest, or more free as Burke puts it, than Links from the Underground. Its creator, Swarthmore undergraduate Justin Hall, had transformed inviting others into his life into an art form. What began as a simple link dump quickly transformed into a network of short stories and poems, diary entries, and personal details from his own life. The layout of the site matched that of Bhatnagar, scattered and unorganized. But his tone was closer to Burke’s, long and deeply, deeply personal. Just about every day, Hall would post to his website. It was his daily inner monologue made public.

Sometimes, he would cross a line. If you were a friend of Justin’s, he might share a secret that you told him in confidence, or disparage you on a fully public post. But he also shared the most intimate details from his own life, from dorm room drama to his greatest fears and inadequacies. He told stories from his troubled past, and publicly tried to come to terms with an alcoholic father. His good humor was often tinged with tragedy. He was clearly working through something emotional and personally profound, and he was using the web to do it out in the open.

But for Hall, this was all in the service of something far greater than himself. Describing the web to newcomers in a documentary about his experience on the web, Hall’s primary message was about its ability to create — not to tear down — connections.

What’s so great about the web is I was able to go out there and talk about what I care about, what I feel strongly about and people responded to it. Because every high school’s got a poet, whether it’s a rich high school or a poor high school, you know, they got somebody that’s in to writing, that’s in to getting people to tell their stories. You give them access to this technology and all of a sudden they’re telling stories to people in Israel, to people in Japan, to people in their own town that they never would have been able to talk to. And that’s, you know, that’s a revolution.

There’s that word again. Revolution. Though coming at the web from very different places, Addis and Hall agreed on at least one thing. I would venture to guess that they agreed on a whole lot more.

Justin Hall became a presence on the web not soon forgotten by those that came across him. He’s had two documentaries made about him (one of which he made himself). He’s appeared on talk shows. He’s toured the country. He’s had very public mental breakdowns. But he believed deeply that the web meant nothing at all unless it was a place for people to share their own stories.

When Tim Berners-Lee first imagined the web, he believed that everybody would have their own homepage. He designed his first browser with authoring capabilities for just that reason. That dream never came true. But Hall and Burke and Bhatnagar channeled a similar idea when they decided to make the web personal. They created their own homepages, even if it meant having to spend a few hours, or a few weeks, learning HTML.

Within a couple of years, the web filled up with these homepages. There were some notable breakthrough websites, like when David Farley began posting daily webcomics to Doctor Fun or VJ Adam Curry co-opted the MTV website to post his own personal brand of music entertainment. There were extreme examples. In 1996, Jennifer Ringley stuck a webcam in her room and beamed images every few seconds, so anyone could watch her entire life in real time. She called it Jennicam, a name that would ultimately lead to the moniker cam girl. Ringley appeared on talk shows and became an overnight sensation for her strange website that let others peer directly into her world.

But mostly, homepages acted as a creative outlet — short biographies, photo albums of families and pets, short stories, status updates. There were a lot of diaries. People posted their art, their “hot takes” and their deepest secrets and greatest passions. There were fan pages dedicated to discontinued television shows and boy bands. A dizzying array of style and personality with no purpose other than to simply exist.

Then came the links. At the bottom of a homepage: a list of links to other homepages. Scattered in diary posts, links to other websites. In one entry, Hall might post a link to Bhatnagar’s site, musing about the influence it had on his own website. Bhatnagar’s own site had his own chaotic list of his favorites. Eventually, so did Burke’s. Half the fun of a homepage was obsessing over which others to share.

As the web turned on a moment of connection, the process of discovery became its greatest asset. The fantastic intrigue of clicking on a link and being transported into the world and mind of another person was — in the end — the defining feature of the web. There would be plenty of opportunities to use the web to find something you want or need. The lesson of the homepage is that what people really wanted to find was each other. The web does that better than any technology that has come before it.


At the end of 1993, there were just over 600 websites. One year later, at the end of 1994, there were over 10,000. They no longer fit on a single page on the CERN website maintained by the web’s creator.

The personal website would become the cornerstone of the web. The web would be filled with more applications, like SLAC. And more businesses, like GNN. But it would mostly be filled with people. When the web’s next wave came crashing down, it would become truly social.


The post Chapter 3: The Website appeared first on CSS-Tricks.


Every Website is an Essay

Every website that’s made me oooo and aaahhh lately has been of a special kind; they’re written and designed like essays. There’s an argument, a playfulness in the way that they’re not so much selling me something as they are trying to convince me of the thing. They use words and type and color in a way that makes me sit up and listen.

And I think that framing our work in this way lets us web designers explore exciting new possibilities. Instead of throwing a big carousel on the page and being done with it, thinking about making a website like an essay encourages us to focus on the tough questions. We need an introduction, we need to provide evidence for our statements, we need a conclusion, etc. This way we don’t have to get so caught up in the same old patterns that we’ve tried again and again in our work.

And by treating web design like an essay, we can be weird with the design. We can establish a distinct voice and make it sound like an honest-to-goodness human being wrote it, too.

One example of a website-as-an-essay is the Analogue Pocket site which uses real paragraphs to market their fancy new device.

Another example is the new email app Hey in which the website is nothing but paragraphs — no screenshots, no fancy product information. It almost feels like a political manifesto hammered onto a giant wooden door.

Apple’s marketing sites are little essays, too. Take this one section from the iPad Pro all about the LiDAR Scanner. It’s not so much trying to sell you an iPad at this point so much as it is trying to argue the case for LiDAR. And as with all good essays it answers the who, what, why, when, and how.

Another example is Stripe’s recent beautiful redesign. What I love more than the outrageously gorgeous animated gradients is the argument that the website is making. What is Stripe? How can I trust them? How easy is it to get set up? Who, what, why, when, how.

To be my own devil’s advocate for a bit though, we’re all familiar with this line of reasoning: Why care about the writing so much when people don’t read? Folks skim through a website. They don’t persevere with the text, they don’t engage with the writing, and you only have half a millisecond to hit them with something flashy before they leave. They can’t handle complex words or sentences. They can’t grasp complex ideas. So keep those paragraphs short! Remove all text from the page!

The implication here is that users are dumb. They can’t focus and they don’t care. You have to shout at them. And I kinda sorta hate that.

Instead, I think the opposite is true. They’ve seen the same boring websites for years. Everyone is tired of lifeless, humorless copywriting. They’ve seen all the animations, witnessed all the cool fonts, and in the face of all that stuff, they yawn. They yawn because it supports a bad argument, or more precisely, a bad essay; one that doesn’t charm the reader, or give them a reason to care.

So what if we made our websites more like essays and less like billboards that dot the freeways? What would that look like?


The post Every Website is an Essay appeared first on CSS-Tricks.


In Defense of a Fussy Website

The other day I was doom-scrolling twitter, and I saw a delightful article titled “The Case for Fussy Breakfasts.” I love food and especially breakfast, and since the pandemic hit I’ve been using my breaks in between meetings (or sometimes on meetings, shh) to make a full bacon, poached egg, vegetable plate, so I really got into the article. This small joy of creating a bit of space for myself for the most important meal of the day has been meaningful to me — while everything else feels out of control, indulging in some ceremony has done a tiny part to offset the intensity of our collective situation.

It caused me to think of this “fussiness” as applied to other inconsequential joys. A walk. A bath. What about programming?

While we’re all laser-focused on shipping the newest feature with the hottest software and the best Lighthouse scores, I’ve been missing a bit of the joy on the web. Apps are currently conveying little care for UX, guidance, richness, and — well, for humans trying to communicate through a computer, we’re certainly bending a lot to… the computer.

I’m getting a little tired of the web being seen as a mere document reader, and though I do love me a healthy Lighthouse score, some of these point matrices seem to live and die more by developer ego and gamification than by actually considering what we can do without incurring much weight. SVGs can be very small while still being impactful. Some effects are tiny bits of CSS. JS animations can be lazy-loaded. You can even dazzle with words, color, and layout if you’re willing to be a bit adventurous, at no weight at all!

A few of my favorite developer sites lately have been Josh Comeau, Johnson Ogwuru and Cassie Evans. The small delights and touches, the little a-ha moments, make me STAY. I wander around the site, exploring, learning, feeling actually more connected to each of these humans rather than as if I’m glancing at a PDF of their resume. They flex their muscles, show me the pride they have in building things, and it intrigues me! These small bits are more than the fluff that many dismiss as “excess”: they do the job the web is meant to do. We are communicating using this tool, the computer, as an extension of ourselves.

Nuance can be challenging. It’s easy as programmers to get stuck in absolutes, and one of these of late has been that if you’re having any bit of fun, any bit of style, that must mean it’s “not useful.” Honestly, I’d make the case that the opposite is true. Emotions attach to the limbic system, making memories easier to recall. If your site is a flat bit of text, how will anyone remember it?

Don’t you want to build the site that teams in companies the world over remember and cite as an inspiration? I’ve been at four different companies where people have mentioned Stripe as a site they would aspire to be like. Stripe took chances. Stripe told stories. Stripe engaged the imagination of developers, spoke directly to us.

I’m sad acknowledging the irony that after thinking about how spot on Stripe was, most of those companies ignored much of what they learned while exploring it. Any creativity, risk, and intention was slowly, piece by piece, chipped away by the drumbeat of “usefulness,” missing the forest for the trees.

When a site is done with care and excitement you can tell. You feel it as you visit, the hum of intention. The craft, the cohesiveness, the attention to detail is obvious. And in turn, you meet them halfway. These are the sites with the low bounce rates, the best engagement metrics, the ones where they get questions like “can I contribute?” No gimmicks needed.

What if you don’t have the time? Of course, we all have to get things over the line. Perhaps a challenge: what small thing can you incorporate that someone might notice? Can you start with a single detail? I didn’t start with a poached egg in my breakfast; one day I made a goofy scrambled one, and it went on from there. Can you challenge yourself to learn one small new technique? Can you outsource one graphic? Can you introduce a tiny easter egg? Say something just a little differently from the typical corporate lingo?

If something is meaningful to you, the audience you’ll gather will likely be the folks that find it meaningful, too.

The post In Defense of a Fussy Website appeared first on CSS-Tricks.


How to Add Lunr Search to your Gatsby Website

The Jamstack way of thinking and building websites is becoming more and more popular.

Have you already tried Gatsby, Nuxt, or Gridsome (to cite only a few)? Chances are that your first contact was a “Wow!” moment — so many things are automatically set up and ready to use. 

There are some challenges, though, one of which is search functionality. If you’re working on any sort of content-driven site, you’ll likely run into search and how to handle it. Can it be done without any external server-side technology? 

Search is not one of those things that come out of the box with Jamstack. Some extra decisions and implementation are required.

Fortunately, we have a bunch of options that might be more or less adapted to a project. We could use Algolia’s powerful search-as-service API. It comes with a free plan that is restricted to non-commercial projects with  a limited capacity. If we were to use WordPress with WPGraphQL as a data source, we could take advantage of WordPress native search functionality and Apollo Client. Raymond Camden recently explored a few Jamstack search options, including pointing a search form directly at Google.

In this article, we will build a search index and add search functionality to a Gatsby website with Lunr, a lightweight JavaScript library providing an extensible and customizable search without the need for external, server-side services. We used it recently to add “Search by Tartan Name” to our Gatsby project tartanify.com. We absolutely wanted persistent search as-you-type functionality, which brought some extra challenges. But that’s what makes it interesting, right? I’ll discuss some of the difficulties we faced and how we dealt with them in the second half of this article.

Getting started

For the sake of simplicity, let’s use the official Gatsby blog starter. Using a generic starter lets us abstract many aspects of building a static website. If you’re following along, make sure to install and run it:

gatsby new gatsby-starter-blog https://github.com/gatsbyjs/gatsby-starter-blog
cd gatsby-starter-blog
gatsby develop

It’s a tiny blog with three posts, and we can explore its data layer by opening up http://localhost:8000/___graphql in the browser.

Showing the GraphQL page on the localhost installation in the browser.

Inverting index with Lunr.js 🙃

Lunr uses a record-level inverted index as its data structure. The inverted index stores the mapping for each word found within a website to its location (basically a set of page paths). It’s on us to decide which fields (e.g. title, content, description, etc.) provide the keys (words) for the index.
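To make that concrete, the mapping conceptually boils down to a term-to-pages map, something like this. This is a hand-rolled illustration with made-up terms and slugs, not the exact structure Lunr serializes:

// Conceptual shape of an inverted index (illustrative only).
// Each (stemmed) term points to the set of pages that contain it.
const invertedIndex = {
  gatsby: ["/first-post/", "/second-post/"],
  search: ["/first-post/"],
  tartan: ["/third-post/"],
}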

For our blog example, I decided to include all titles and the content of each article. Dealing with titles is straightforward since they are composed uniquely of words. Indexing content is a little more complex. My first try was to use the rawMarkdownBody field. Unfortunately, rawMarkdownBody introduces some unwanted keys resulting from the markdown syntax.

Showing an attempt at using markdown syntax for links.

I obtained a “clean” index using the html field in conjunction with the striptags package (which, as the name suggests, strips out the HTML tags). Before we get into the details, let’s look into the Lunr documentation.

Here’s how we create and populate the Lunr index. We will use this snippet in a moment, specifically in our gatsby-node.js file.

const index = lunr(function () {
  this.ref('slug')
  this.field('title')
  this.field('content')
  for (const doc of documents) {
    this.add(doc)
  }
})

 documents is an array of objects, each with a slug, title and content property:

{
  slug: '/post-slug/',
  title: 'Post Title',
  content: 'Post content with all HTML tags stripped out.'
}

We will define a unique document key (the slug) and two fields (the title and content, or the key providers). Finally, we will add all of the documents, one by one.
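Once populated, the index can already be queried. To my knowledge, index.search returns an array of hits carrying the ref, a relevance score, and match metadata; the values below are made up:

// Querying the index we just built (all values are illustrative)
const hits = index.search("post")
// hits looks roughly like:
// [
//   {
//     ref: "/post-slug/",   // the key we registered with this.ref('slug')
//     score: 0.61,          // relevance score
//     matchData: { ... },   // which terms matched, and in which fields
//   },
// ]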

Let’s get started.

Creating an index in gatsby-node.js 

Let’s start by installing the libraries that we are going to use.

yarn add lunr graphql-type-json striptags

Next, we need to edit the gatsby-node.js file. The code from this file runs once in the process of building a site, and our aim is to add index creation to the tasks that Gatsby executes on build. 

createResolvers is one of the Gatsby APIs controlling the GraphQL data layer. In this particular case, we will use it to create a new root field; let’s call it LunrIndex.

Gatsby’s internal data store and query capabilities are exposed to GraphQL field resolvers on context.nodeModel. With getAllNodes, we can get all nodes of a specified type:

/* gatsby-node.js */
const { GraphQLJSONObject } = require(`graphql-type-json`)
const striptags = require(`striptags`)
const lunr = require(`lunr`)

exports.createResolvers = ({ cache, createResolvers }) => {
  createResolvers({
    Query: {
      LunrIndex: {
        type: GraphQLJSONObject,
        resolve: (source, args, context, info) => {
          const blogNodes = context.nodeModel.getAllNodes({
            type: `MarkdownRemark`,
          })
          const type = info.schema.getType(`MarkdownRemark`)
          return createIndex(blogNodes, type, cache)
        },
      },
    },
  })
}

Now let’s focus on the createIndex function. That’s where we will use the Lunr snippet we mentioned in the last section. 

/* gatsby-node.js */
const createIndex = async (blogNodes, type, cache) => {
  const documents = []
  // Iterate over all posts
  for (const node of blogNodes) {
    const html = await type.getFields().html.resolve(node)
    // Once html is resolved, add a slug-title-content object to the documents array
    documents.push({
      slug: node.fields.slug,
      title: node.frontmatter.title,
      content: striptags(html),
    })
  }
  const index = lunr(function() {
    this.ref(`slug`)
    this.field(`title`)
    this.field(`content`)
    for (const doc of documents) {
      this.add(doc)
    }
  })
  return index.toJSON()
}

Have you noticed that instead of accessing the HTML element directly with  const html = node.html, we’re using an  await expression? That’s because node.html isn’t available yet. The gatsby-transformer-remark plugin (used by our starter to parse Markdown files) does not generate HTML from markdown immediately when creating the MarkdownRemark nodes. Instead,  html is generated lazily when the html field resolver is called in a query. The same actually applies to the excerpt that we will need in just a bit.

Let’s look ahead and think about how we are going to display search results. Users expect to obtain a link to the matching post, with its title as the anchor text. Very likely, they wouldn’t mind a short excerpt as well.

Lunr’s search returns an array of objects representing matching documents by the ref property (which is the unique document key slug in our example). This array does not contain the document title nor the content. Therefore, we need to store somewhere the post title and excerpt corresponding to each slug. We can do that within our LunrIndex as below:

/* gatsby-node.js */
const createIndex = async (blogNodes, type, cache) => {
  const documents = []
  const store = {}
  for (const node of blogNodes) {
    const { slug } = node.fields
    const title = node.frontmatter.title
    const [html, excerpt] = await Promise.all([
      type.getFields().html.resolve(node),
      type.getFields().excerpt.resolve(node, { pruneLength: 40 }),
    ])
    documents.push({
      // unchanged
    })
    store[slug] = {
      title,
      excerpt,
    }
  }
  const index = lunr(function() {
    // unchanged
  })
  return { index: index.toJSON(), store }
}

Our search index changes only if one of the posts is modified or a new post is added. We don’t need to rebuild the index each time we run gatsby develop. To avoid unnecessary builds, let’s take advantage of the cache API:

/* gatsby-node.js */
const createIndex = async (blogNodes, type, cache) => {
  const cacheKey = `IndexLunr`
  const cached = await cache.get(cacheKey)
  if (cached) {
    return cached
  }
  // unchanged
  const json = { index: index.toJSON(), store }
  await cache.set(cacheKey, json)
  return json
}

Enhancing pages with the search form component

We can now move on to the front end of our implementation. Let’s start by building a search form component.

touch src/components/search-form.js 

I opt for a straightforward solution: an input of type="search", coupled with a label and accompanied by a submit button, all wrapped within a form tag with the search landmark role.

We will add two event handlers, handleSubmit on form submit and handleChange on changes to the search input.

/* src/components/search-form.js */
import React, { useState, useRef } from "react"
import { navigate } from "@reach/router"

const SearchForm = ({ initialQuery = "" }) => {
  // Create a piece of state, and initialize it to initialQuery
  // query will hold the current value of the state,
  // and setQuery will let us change it
  const [query, setQuery] = useState(initialQuery)

  // We need to get a reference to the search input element
  const inputEl = useRef(null)

  // On input change use the current value of the input field (e.target.value)
  // to update the state's query value
  const handleChange = e => {
    setQuery(e.target.value)
  }

  // When the form is submitted navigate to /search
  // with a query q parameter equal to the value within the input search
  const handleSubmit = e => {
    e.preventDefault()
    // `inputEl.current` points to the mounted search input element
    const q = inputEl.current.value
    navigate(`/search?q=${q}`)
  }
  return (
    <form role="search" onSubmit={handleSubmit}>
      <label htmlFor="search-input" style={{ display: "block" }}>
        Search for:
      </label>
      <input
        ref={inputEl}
        id="search-input"
        type="search"
        value={query}
        placeholder="e.g. duck"
        onChange={handleChange}
      />
      <button type="submit">Go</button>
    </form>
  )
}
export default SearchForm

Have you noticed that we’re importing navigate from the @reach/router package? That is necessary since neither Gatsby’s <Link/> nor navigate provide in-route navigation with a query parameter. Instead, we can import @reach/router — there’s no need to install it since Gatsby already includes it — and use its navigate function.

Now that we’ve built our component, let’s add it to our home page (as below) and 404 page.

/* src/pages/index.js */
// unchanged
import SearchForm from "../components/search-form"

const BlogIndex = ({ data, location }) => {
  // unchanged
  return (
    <Layout location={location} title={siteTitle}>
      <SEO title="All posts" />
      <Bio />
      <SearchForm />
      // unchanged

Search results page

Our SearchForm component navigates to the /search route when the form is submitted, but for the moment, there is nothing behind this URL. That means we need to add a new page:

touch src/pages/search.js 

I proceeded by copying and adapting the content of the index.js page. One of the essential modifications concerns the page query (see the very bottom of the file). We will replace allMarkdownRemark with the LunrIndex field. 

/* src/pages/search.js */
import React from "react"
import { Link, graphql } from "gatsby"
import { Index } from "lunr"
import Layout from "../components/layout"
import SEO from "../components/seo"
import SearchForm from "../components/search-form"

// We can access the results of the page GraphQL query via the data props
const SearchPage = ({ data, location }) => {
  const siteTitle = data.site.siteMetadata.title

  // We can read what follows the ?q= here
  // URLSearchParams provides a native way to get URL params
  // location.search.slice(1) gets rid of the "?"
  const params = new URLSearchParams(location.search.slice(1))
  const q = params.get("q") || ""

  // LunrIndex is available via page query
  const { store } = data.LunrIndex
  // Lunr in action here
  const index = Index.load(data.LunrIndex.index)
  let results = []
  try {
    // Search is a lunr method
    results = index.search(q).map(({ ref }) => {
      // Map search results to an array of {slug, title, excerpt} objects
      return {
        slug: ref,
        ...store[ref],
      }
    })
  } catch (error) {
    console.log(error)
  }
  return (
    // We will take care of this part in a moment
  )
}
export default SearchPage

export const pageQuery = graphql`
  query {
    site {
      siteMetadata {
        title
      }
    }
    LunrIndex
  }
`

Now that we know how to retrieve the query value and the matching posts, let’s display the content of the page. Notice that on the search page we pass the query value to the <SearchForm /> component via the initialQuery prop. When the user arrives at the search results page, their search query should remain in the input field. 

return (
  <Layout location={location} title={siteTitle}>
    <SEO title="Search results" />
    {q ? <h1>Search results</h1> : <h1>What are you looking for?</h1>}
    <SearchForm initialQuery={q} />
    {results.length ? (
      results.map(result => {
        return (
          <article key={result.slug}>
            <h2>
              <Link to={result.slug}>
                {result.title || result.slug}
              </Link>
            </h2>
            <p>{result.excerpt}</p>
          </article>
        )
      })
    ) : (
      <p>Nothing found.</p>
    )}
  </Layout>
)

You can find the complete code in this gatsby-starter-blog fork and the live demo deployed on Netlify.

Instant search widget

Finding the most “logical” and user-friendly way of implementing search may be a challenge in and of itself. Let’s now switch to the real-life example of tartanify.com — a Gatsby-powered website gathering 5,000+ tartan patterns. Since tartans are often associated with clans or organizations, the possibility to search a tartan by name seems to make sense. 

We built tartanify.com as a side project where we feel absolutely free to experiment with things. We didn’t want a classic search results page but an instant search “widget.” Often, a given search keyword corresponds with a number of results — for example, “Ramsay” comes in six variations.  We imagined the search widget would be persistent, meaning it should stay in place when a user navigates from one matching tartan to another.

Let me show you how we made it work with Lunr.  The first step of building the index is very similar to the gatsby-starter-blog example, only simpler:

/* gatsby-node.js */
exports.createResolvers = ({ cache, createResolvers }) => {
  createResolvers({
    Query: {
      LunrIndex: {
        type: GraphQLJSONObject,
        resolve(source, args, context) {
          const siteNodes = context.nodeModel.getAllNodes({
            type: `TartansCsv`,
          })
          return createIndex(siteNodes, cache)
        },
      },
    },
  })
}

const createIndex = async (nodes, cache) => {
  const cacheKey = `LunrIndex`
  const cached = await cache.get(cacheKey)
  if (cached) {
    return cached
  }
  const store = {}
  const index = lunr(function() {
    this.ref(`slug`)
    this.field(`title`)
    for (const node of nodes) {
      const { slug } = node.fields
      const doc = {
        slug,
        title: node.fields.Unique_Name,
      }
      store[slug] = {
        title: doc.title,
      }
      this.add(doc)
    }
  })
  const json = { index: index.toJSON(), store }
  cache.set(cacheKey, json)
  return json
}

We opted for instant search, which means that search is triggered by any change in the search input instead of a form submission.

/* src/components/searchwidget.js */
import React, { useState } from "react"
import lunr, { Index } from "lunr"
import { graphql, useStaticQuery } from "gatsby"
import SearchResults from "./searchresults"

const SearchWidget = () => {
  const [value, setValue] = useState("")
  // results is now a state variable
  const [results, setResults] = useState([])

  // Since it's not a page component, useStaticQuery for querying data
  // https://www.gatsbyjs.org/docs/use-static-query/
  const { LunrIndex } = useStaticQuery(graphql`
    query {
      LunrIndex
    }
  `)
  const index = Index.load(LunrIndex.index)
  const { store } = LunrIndex
  const handleChange = e => {
    const query = e.target.value
    setValue(query)
    try {
      const search = index.search(query).map(({ ref }) => {
        return {
          slug: ref,
          ...store[ref],
        }
      })
      setResults(search)
    } catch (error) {
      console.log(error)
    }
  }
  return (
    <div className="search-wrapper">
      {/* You can use a form tag as well, as long as we prevent the default submit behavior */}
      <div role="search">
        <label htmlFor="search-input" className="visually-hidden">
          Search Tartans by Name
        </label>
        <input
          id="search-input"
          type="search"
          value={value}
          onChange={handleChange}
          placeholder="Search Tartans by Name"
        />
      </div>
      <SearchResults results={results} />
    </div>
  )
}
export default SearchWidget

The SearchResults are structured like this:

/* src/components/searchresults.js */
import React from "react"
import { Link } from "gatsby"

const SearchResults = ({ results }) => (
  <div>
    {results.length ? (
      <>
        <h2>{results.length} tartan(s) matched your query</h2>
        <ul>
          {results.map(result => (
            <li key={result.slug}>
              <Link to={`/tartan/${result.slug}`}>{result.title}</Link>
            </li>
          ))}
        </ul>
      </>
    ) : (
      <p>Sorry, no matches found.</p>
    )}
  </div>
)
export default SearchResults

Making it persistent

Where should we use this component? We could add it to the Layout component. The problem is that our search form will get unmounted on page changes that way. If a user wants to browse all tartans associated with the “Ramsay” clan, they will have to retype their query several times. That’s not ideal.

Thomas Weibenfalk has written a great article on keeping state between pages with local state in Gatsby.js. We will use the same technique, where the wrapPageElement browser API sets persistent UI elements around pages. 

Let’s add the following code to the gatsby-browser.js. You might need to add this file to the root of your project.

/* gatsby-browser.js */
import React from "react"
import SearchWrapper from "./src/components/searchwrapper"

export const wrapPageElement = ({ element, props }) => (
  <SearchWrapper {...props}>{element}</SearchWrapper>
)

Now let’s add a new component file:

touch src/components/searchwrapper.js

Instead of adding the SearchWidget component to the Layout, we will add it to SearchWrapper, and the magic happens. ✨

/* src/components/searchwrapper.js */
import React from "react"
import SearchWidget from "./searchwidget"

const SearchWrapper = ({ children }) => (
  <>
    {children}
    <SearchWidget />
  </>
)
export default SearchWrapper

Creating a custom search query

At this point, I started to try different keywords but very quickly realized that Lunr’s default search query might not be the best solution when used for instant search.

Why? Imagine that we are looking for tartans associated with the name MacCallum. While typing “MacCallum” letter-by-letter, this is the evolution of the results:

  • m – 2 matches (Lyon, Jeffrey M, Lyon, Jeffrey M (Hunting))
  • ma – no matches
  • mac – 1 match (Brighton Mac Dermotte)
  • macc – no matches
  • macca – no matches
  • maccal – 1 match (MacCall)
  • maccall – 1 match (MacCall)
  • maccallu – no matches
  • maccallum – 3 matches (MacCallum, MacCallum #2, MacCallum of Berwick)

Users will probably type the full name and hit the button if we make a button available. But with instant search, a user is likely to abandon early because they may expect the results to only narrow down as more letters are added to the query.

 That’s not the only problem. Here’s what we get with “Callum”:

  • c – 3 unrelated matches
  • ca – no matches
  • cal – no matches
  • call – no matches
  • callu – no matches
  • callum – 1 match

You can see the trouble if someone gives up halfway into typing the full query.

Fortunately, Lunr supports more complex queries, including fuzzy matches, wildcards and boolean logic (e.g. AND, OR, NOT) for multiple terms. All of these are available via a special query syntax, for example:

index.search("+*callum mac*")
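For reference, here are a few more query-string examples of my own (not from the original article), assuming Lunr 2.x’s standard syntax for fuzziness and term presence:

// Hypothetical query-string examples, assuming Lunr 2.x syntax
index.search("callum~1")      // fuzzy match with an edit distance of 1
index.search("+mac +callum")  // both terms required (AND-like presence)
index.search("mac -hunting")  // "mac" required, "hunting" excluded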

We could also reach for the index’s query method to handle it programmatically.

The first solution is not satisfying since it requires more effort from the user. I used the index.query method instead:

/* src/components/searchwidget.js */
const search = index
  .query(function(q) {
    // full term matching
    q.term(el)
    // OR (default)
    // trailing or leading wildcard
    q.term(el, {
      wildcard:
        lunr.Query.wildcard.LEADING | lunr.Query.wildcard.TRAILING,
    })
  })
  .map(({ ref }) => {
    return {
      slug: ref,
      ...store[ref],
    }
  })

Why use full term matching with wildcard matching? That’s necessary for all keywords that “benefit” from the stemming process. For example, the stem of “different” is “differ.” As a consequence, queries with wildcards — such as differe*, differen* or different* — all result in no matches, while the full term queries differe, differen and different return matches.
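If you want to see the stemming effect for yourself, Lunr exposes its stemmer, so a quick check in a Node console (my own sketch, not part of the article’s code, assuming lunr 2.x) could look like this:

// Quick sanity check of Lunr's stemming (a sketch; assumes lunr 2.x)
const lunr = require("lunr")
const stemmed = lunr.stemmer(new lunr.Token("different"))
console.log(stemmed.toString()) // "differ"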

Fuzzy matches can be used as well. In our case, they are only allowed for terms longer than five characters:

q.term(el, { editDistance: el.length > 5 ? 1 : 0 })
q.term(el, {
  wildcard:
    lunr.Query.wildcard.LEADING | lunr.Query.wildcard.TRAILING,
})

The handleChange function also “cleans up” user inputs and ignores single-character terms:

/* src/components/searchwidget.js */
const handleChange = e => {
  const query = e.target.value || ""
  setValue(query)
  if (!query.length) {
    setResults([])
  }
  const keywords = query
    .trim() // remove trailing and leading spaces
    .replace(/\*/g, "") // remove user's wildcards
    .toLowerCase()
    .split(/\s+/) // split by whitespace
  // do nothing if the last typed keyword is shorter than 2 characters
  if (keywords[keywords.length - 1].length < 2) {
    return
  }
  try {
    const search = index
      .query(function(q) {
        keywords
          // filter out keywords shorter than 2 characters
          .filter(el => el.length > 1)
          // loop over keywords
          .forEach(el => {
            q.term(el, { editDistance: el.length > 5 ? 1 : 0 })
            q.term(el, {
              wildcard:
                lunr.Query.wildcard.LEADING | lunr.Query.wildcard.TRAILING,
            })
          })
      })
      .map(({ ref }) => {
        return {
          slug: ref,
          ...store[ref],
        }
      })
    setResults(search)
  } catch (error) {
    console.log(error)
  }
}

Let’s check it in action:

  • m – skipped (single-character terms are ignored)
  • ma – 861 matches
  • mac – 600 matches
  • macc – 35 matches
  • macca – 12 matches
  • maccal – 9 matches
  • maccall – 9 matches
  • maccallu – 3 matches
  • maccallum – 3 matches

Searching for “Callum” works as well, resulting in four matches: Callum, MacCallum, MacCallum #2, and MacCallum of Berwick.

There is one more problem, though: multi-term queries. Say you’re looking for “Loch Ness.” There are two tartans associated with that term, but with the default OR logic, you get a grand total of 96 results. (There are plenty of other lakes in Scotland.)

I wound up deciding that an AND search would work better for this project. Unfortunately, Lunr does not support nested queries, and what we actually need is (keyword1 OR *keyword1*) AND (keyword2 OR *keyword2*).

To overcome this, I ended up moving the terms loop outside the query method and intersecting the results per term. (By intersecting, I mean finding all slugs that appear in all of the per-single-keyword results.)

/* src/components/searchwidget.js */
try {
  // andSearch stores the intersection of all per-term results
  let andSearch = []
  keywords
    .filter(el => el.length > 1)
    // loop over keywords
    .forEach((el, i) => {
      // per-single-keyword results
      const keywordSearch = index
        .query(function(q) {
          q.term(el, { editDistance: el.length > 5 ? 1 : 0 })
          q.term(el, {
            wildcard:
              lunr.Query.wildcard.LEADING | lunr.Query.wildcard.TRAILING,
          })
        })
        .map(({ ref }) => {
          return {
            slug: ref,
            ...store[ref],
          }
        })
      // intersect the current keywordSearch with andSearch
      andSearch =
        i > 0
          ? andSearch.filter(x => keywordSearch.some(el => el.slug === x.slug))
          : keywordSearch
    })
  setResults(andSearch)
} catch (error) {
  console.log(error)
}

The source code for tartanify.com is published on GitHub. You can see the complete implementation of the Lunr search there.

Final thoughts

Search is often a non-negotiable feature for finding content on a site. How important the search functionality actually is may vary from one project to another. Nevertheless, there is no reason to abandon it under the pretext that it does not tally with the static character of Jamstack websites. There are many possibilities. We’ve just discussed one of them.

And, paradoxically, in this specific example the result was a better all-around user experience, precisely because implementing search was not an obvious task and required a good deal of deliberation. We might not have been able to say the same about an off-the-shelf solution.

The post How to Add Lunr Search to your Gatsby Website appeared first on CSS-Tricks.

CSS-Tricks
