webpack has a lot of features to help you achieve smaller bundles and control the loading priority of resources. The most compelling of them is code splitting, which provides a way to split your code into various bundles that can then be loaded on demand or in parallel. Another one is performance hints which indicates when emitted bundle sizes cross a specified threshold at build time so that you can make optimizations or remove unnecessary code.
The default behavior for production builds in webpack is to show a warning when an asset size or entry point is over 250KB (244KiB) in size, but you can configure how performance hints are shown and size thresholds through the performance object in your webpack.config.js file.
We will walk through this feature and how to leverage it as a first line of defense against performance regressions.
First, we need to set a custom budget
The default size threshold for assets and entry points (where webpack looks to start building the bundle) may not always fit your requirements, but they can be configured to.
For example, my blog is pretty minimal and my budget size is a modest 50KB (48.8KiB) for both assets and entry points. Here’s the relevant setting in my webpack.config.js:
Let’s show an error if thresholds are exceeded
webpack’s default warning emits when budget thresholds are exceeded. It’s good enough for development environments but insufficient when building for production. We can trigger an error instead by adding the hints property to the performance object and setting it to 'error':
There are other valid values for the hints property, including 'warning' and false, where false completely disables warnings, even when the specified limits are encroached. I wouldn’t recommend using false in production mode.
We can exclude certain assets from the budget
webpack enforces size thresholds for every type of asset that it emits. This isn’t always a good thing because an error will be thrown if any of the emitted assets go above the specified limit. For example, if we set webpack to process images, we’ll get an error if just one of them crosses the threshold.
The assetFilter property can be used to control the files used to calculate performance hints:
This tells webpack to exclude any file that ends with a .jpg extension when it runs the calculations for performance hints. It’s capable of more complex logic to meet all kinds of conditions for environments, file types, and other resources.
That said, there is an open pull request that should remove this limitation but it is not merged yet. Definitely something to keep an eye on.
It’s so useful to set a performance budget and enforcing one with webpack is something worth considering at the start of any project. It will draw attention to the size of your dependencies and encourage you to look for lighter alternatives where possible to avoid exceeding the budget.
That said, performance budgeting does not end here! Asset size is just one thing of many that affect performance, so there’s still more work to be done to ensure you are delivering an optimal experience. Running a Lighthouse test is a great first step to learn about other metrics you can use as well as suggestions for improvements.
A straightforward post with some perf data from Tomas Pustelnik. It’s a good reminder that CSS is a crucial part of thinking web performance, and for a huge reason:
Any time [the browser] encounters any external resource (CSS, JS, images, etc.) it will assign it a download priority and initiate its download. Priorities are important because some resources are critical to render a page (eg. main stylesheet and JS files) while others may be less important (like images or stylesheets for other media types).
In the case of CSS, this priority is usually high because stylesheets are necessary to create CSSOM (CSS Object Model). To render a webpage browser has to construct both DOM and CSSOM.
That’s why CSS is often referred to as a “blocking” resource. That’s desirable to some degree: we wouldn’t want to see flash-of-unstyled-websites. But we get real perf gains when we make CSS smaller because it’s quicker to download, parse, and apply.
So, you know how many emoji have different skin tones? Emoji skin tones are extremely popular, especially over text and on social media. The raised black fist emoji (✊🏿) was voted “The Most 2020 Emoji” by Emojipedia’s World Emoji Awards.
Each tone is a modifier and many emoji are made up of modifiers and base encodings that map to specific characters. Unfortunately, not every emoji library supports modifiers. But, given their popularity, emoji skin tone modifiers are more than a “nice to have” feature. Plus, they’re a smart way to work because they allow us to write more modular and efficient code.
So that’s what we’re doing in this article: figure out how to work with emoji modifiers programmatically. This way, if you’re ever stuck without skin tone support — or want to create custom variations of other emoji — then you’ll know how!
Meet the Fitzpatrick scale
Skin tone modifiers were officially added to emoji in 2015 as part of Unicode 8.0. They are based on the Fitzpatrick scale, which is a formal classification of human skin tones. The following chart shows how the emoji characters match to Fitzpatrick types:
Skin tone character
In the simplest use case, when one of these characters is appended to an emoji that supports skin tone modifiers, it will change the skin tone of the emoji.
Another way to say that: 👶 +🏽 = 👶🏽
Applying skin tone modifiers with CSS
To swap between emoji skin tones using CSS, we would start with the base emoji character (👶) and then append the skin tone using the ::after pseudo-selector.
In addition to using the rendered emoji characters, we could use the Unicode hex codes instead:
What’s going on here? First, we start with a baby emoji with Fitzpatrick Type 4. We then pass it into the function removeModifier, which searches for any of the skin tone modifiers and removes it from the string. Now that we have the emoji without a modifier, we can add whichever modifier we like.
While this approach works with many emoji, we run into problems when other modifiers are introduced. That’s why we now need to talk about…
ZWJ sequences are most commonly used to add gender modifiers to emoji. For example, a person lifting weights, plus ZWJ, plus the female sign, equals a woman lifting weights (️🏋️ + ♀︎ = 🏋️♀️).
There’s a few important things to need to keep in mind when working with ZWJ sequences:
The sequences are only recommendations. They come from the Unicode Consortium and are not guaranteed to be supported on every platform. If they are not supported by a platform, then a fallback sequence of regular emoji will be displayed instead.
Skin tone modifiers, if present, must be included after the emoji but before the ZWJ.
Some ZWJ sequences include multiple emoji that each have different skin tone modifiers.
Given this information, we need to make the following changes to the previous code example:
The skin tone modifiers need to be inserted immediately after any base emoji rather than simply being appended to the end of the emoji.
If there are multiple emoji in a ZWJ sequence that have skin tone modifiers, then the modifiers will need to be replaced for each of those emoji.
From this example, you may notice the limitation of consistency. The editor view shows each of the characters in a ZWJ sequence separately, with exception to the skin tone modifiers, which are immediately applied to their corresponding emoji. The console or results views, on the other hand, will attempt render the character for the entire sequence.
Support for this will vary by platform. Some editors may attempt to render ZWJ sequences, and not all browsers will support the same sets of ZWJ sequences.
Additionally, adding skin tones in a ZWJ sequence requires knowing what’s being used as the base emoji. While this would be relatively simple in a situation where the emoji are provided by a known collection, things become more difficult if we want to be able to handle arbitrary input from a user.
Also, be aware that the CSS solutions in this post are not compatible with ZWJ sequences.
Questions to guide development
I put some questions together you may want to ask yourself when you’re designing a system that needs to handle emoji skin tone modifiers:
Do I have control over which emoji my system will interact with?
Does my emoji library have information about which emoji support skin tone modifiers?
Does my system need to add, remove, or change the modifiers?
Does my platform support ZWJ sequences? If so, which ones?
The irony is that while the concept is simple, that simplicity can be the cause of complexity.
Netlify, the company largely behind Jamstack, knows this. They know that without a back-end server with back-end languages, something like a basic contact form gets complicated. Instead of being in no-brainer solved-problem territory, we have to figure out another way to process that form. So, they solve that problem for you (among others, like auth and serverless functions). But there are tons of other companies that want to be that cog in your machine.
That’s just one potential complication. What do you use for a CMS or other data storage? What is your build process like? How do you see previews of content changes? How do you do auth? What if you need some fancy calendar widget? What if you want to sell something? Anything a website can do, Jamstack has an answer for — it’s just that combining all those answers can feel disjointed and potentially confusing.
So my little mashup, which was supposed to be just 3 technologies ended up exposing me to ~20 different technologies and had me digging into nth-level dependency source code after midnight. If there’s an allegory for what I don’t like about modern-day web development, this is it. You want to use three tools, but you have to know how to use twenty tools instead. If modules and components are like LEGO, then this is dumping out the entire bin on the floor just to find one tiny piece you need.
“The tangled webs we weave,” indeed.
In a conversation between Richard MacManus and Matt Mullenweg¹, Richard quotes Matt:
You can patch together a dozen services, each with its own account and billing, for hundreds of dollars a month, to get a similar result you’d have for a few dollars a month using WordPress on shared hosting,” he said. “And it would be more fragile, because the chain is only as strong as its weakest link. You are chaining together different toolsets, logins, billing, hosting… any part of it going down can break the entire flow.
If I was considering Jamstack for a particular project, and the grand total really was twelve services, I probably would reconsider, particularly if I could reach for a tool like WordPress and bring it down to one. There are plenty of other fair criticisms of Jamstack, particularly since it is early-days. The story of “CMS with Preview” isn’t particularly great, for example, which is a feature you don’t even think about with WordPress because, duh, obviously it has that.
And Jamstack can do some things that are very ahead of the game that I cherish. Git-based deployment? All websites should have that. Previews of my pull requests? Hot damn. Sub -100-millisecond first requests? Yes please. Not having to diddle with cache? Sweet. Catch up, other stacks.
I’m saying there are baby bear choices to be made here. You get there by doing what you’re probably already doing anyway: putting your adult pants on, thinking about what your project needs, and choosing the best option.
I have production WordPress sites. Like this one! It’s great!
I have production Jamstack sites. Like this one! It’s not a complicated web of services. It’s a static site generator with content in the GitHub repo deployed with Netlify. While CSS-Tricks can do about 100 things that this site can’t, it has a few tricks up its sleeve that CSS-Tricks can’t do, like accept pull requests on content.
I feel like I’ve chosen pretty well in all my cases.
Here’s a seven minute video from Caleb Porzio that focuses on some of Emmet‘s HTML editing features. You might think of Emmet as that thing that expands abbreviations like table.stats>tr*3>td*3 into glorious, expanded, and perfect HTML. But Emmet has other HTML editing trickery up its sleeve. My favorite is “wrap with abbreviation” (which happens to be Cmd/Ctrl + Shift + A on CodePen), but there are more, like expanding your selection inward and outward and tag changing.
If you haven’t seen it, the Emmet 2 preview on CodePen is pretty neeeeat. It shows you what you’re about to expand into before you do it:
Caution: Terrible sense of humor ahead. We’ll talk about practical stuff, but the examples pretty much all involve zombies and silly jokes. You have been warned.
I’ll be linking to individual Pens as I discuss the lessons I learned, but if you’d like to get a sense of the entire project, check out 60 days of Animation on Undead Institute. I started this project to end on August 1st, 2020, coinciding with the publication of a book I wrote featuring CSS animation, humor, and zombies — because, obviously, zombies will destroy the world if you don’t brandish your web skills and stop the apocalypse. Nothing puts the hurt on the horde like a HTML element on the move!
I had a few rules for myself throughout the project.
I would hand-code all CSS. (I’m a masochist.)
The user would initiate all of the animation. (I hate coming upon an animation that’s already halfway through.)
Lesson 1: Eighty days is a long time.
Uh, doesn’t the title say “sixty” days? Yes, but my original goal was to do eighty days and as day one approached with less than twenty animations prepared and a three day average for each production, I freaked out and switched to sixty days. That gave me both twenty more days till the beginning date and twenty fewer pieces to do.
Lesson 1A: Sixty days is still a long time.
That’s a lot of animation to do with a limited amount of time, ideas, and even more limited artistic skills. And while I thought of dropping to thirty days, I’m glad I didn’t. Sixty days stretched me and forced me to go deeper into how CSS animation — and by extension, CSS itself — works. I’m also proudest of many of the later pieces I did as my skills increased, and I had to be more innovative and think harder about how to make things interesting. Once you’ve used all the easy options, the actual work and best results begin. (And yes, it ended up being sixty-two days because I started on June 1 and wanted to do a final animation on August 1. Starting June 3 just felt icky and wrong.)
So, the real Lesson 1: stretch yourself.
Lesson 2: Interactive animations are hard, and even harder to make responsive.
If you want something to fly across the screen and connect with another element or appear to start another element’s move, you must use either all standard, inflexible units or all flexible units.
Three variables determine when and where an animated element will be during any animation: duration, velocity, and distance. The duration of the animation is set in the animation property and cannot be changed in relation to screen size. The animation timing function determines the velocity; screen size can’t change that either. Thus, if the distance varies with the screen size, the timing will be off everywhere except a specific screen width and height.
Look at Tank!. Run the animation at wide and narrow screen sizes. While I got the timing close, if you compare the two, you’ll see that the tank is in a different place relative to the zombies when the last zombies fall.
To avoid these timing issues, you can use fixed units and a large number, like 2000 or 5000 pixels or more, so that the animation will cover the width (or height) of the screen for all but the largest monitors.
Lesson 3: If you want a responsive animation, put everything in (one of the) viewport units.
Going halfsies on unit proportions (e.g. setting width and height in pixels, but location and movement with viewport units) will lead to unpredictable results. Don’t use both vw and vh either but one or the other; whichever will be the dominant orientation. Mixing vh and vw units will make your animation go “wonky” which I believe is the technical term.
Take Superbly Zomborrific, for instance. It mixes pixel, vw, and vh units. The premise is that the Super Zombie is flying upward as the “camera” follows. Super Zombie smashes into a ledge and falls as the camera continues, but you wouldn’t understand that if your screen was sufficiently tall.
That also means that if you need something to come in from the top — like I did in Nobody Here But Us Humans —you must set the vw height high enough to ensure that the ninja zombie isn’t visible at most aspect ratios.
Lesson 3A: Use pixel units for movements within an SVG element.
All that said, transforming elements within an SVG element should not use viewport units. SVG tags are their own proportional universe. The SVG “pixel” will stay proportional within the SVG element to all the other SVG element children while viewport units will not. So transform with pixel units within an SVG element, but use viewport units everywhere else.
Lesson 4: SVGs scale horribly at runtime.
For animations, like Oops…, I made the SVG image of the zombie scale up to five times his size, but that makes the edges fuzzy. [Shakes fist at “scalable” vector graphics.]
I learned to set their dimensions to the final dimensions that would be in effect at the end of the animation, then use a scale transform to shrink them down to the size for the start of the animation.
In short, the revised code moves from a scaled-down version of the image up to the full width and height. The browser always renders at 1, making the edges crisp and clean at a scale of 1. So instead of scaling from 1 to 5, I scaled from 0.2 to 1.
Lesson 5: The axis Isn’t a universal truth.
An element’s axes stay in sync with the element, not the page. A 90-degree rotation before a translateX will change the direction of the translateX from horizontal to vertical. In Nobody Here But Us Humans… 2, I flipped the zombies using a 180-degree rotation. But positive Y values move the ninjas towards the top, and negative ones move them towards the bottom (the opposite of normal). Beware of how a rotation may affect transforms further down the line.
Lesson 6. Separate complex animations into concentric elements to make easier adjustments.
When creating a complex animation that moves in multiple directions, adding wrapper divs, or rather parent elements, and animating each one individually will cut down on conflicting transforms, and prevent you from becoming a weepy mess.
For instance, in Space Cadet, I had three different transforms going on. The first is the zomb-o-naut’s moving in an up and down motion. The second is a movement across the screen. The third is a rotation. Rather than trying to do everything in a single transform, I added two wrapping elements and did one animation on each element (I also saved my hair… at least some of it.) This helped avoid the axis issues discussed in the last lesson because I performed the rotation on the innermost element, leaving its parent’s and grandparent’s axes in place.
Lesson 7: SVG and CSS transforms are the same.
Some paths and groups and other SVG elements will already have transforms defined on them. It could be from an optimization algorithm, or perhaps it’s just how the illustration software generates the code. If a path, group, or whatever element in an SVG already has an SVG transform on it, removing that transform will reset the element, often to a bizarre location or size compared to the rest of the drawing.
Since SVG and CSS transforms are the same, any CSS transform you do replaces the SVG transform, meaning your CSS transform will start from that bizarre location or size rather than the location or size that is set in the the SVG.
You can copy the transform from the SVG element to your CSS and set it as the starting position in CSS (updating it to the CSS syntax first, of course). You can then modify it in your CSS animation.
For instance, in Uhhh, Yeah…, my tribute to Office Space, Undead Lumbergh’s right upper arm (the #arm2 element) had a transform on it in the original SVG code.
This process is harder when the tool generating the SVG code attempts to “simplify” the transform into a matrix. While you can recreate the matrix transform by copying it into the CSS, it is a difficult task to do. You’re a better developer than me — which might be true anyway — if you can take a matrix transform and manipulate it to scale, rotate, or translate in the exact way you want.
Alternatively, you can recreate the matrix transform using translation, rotation, and scaling, but if the path is complex, the likelihood that you can recreate it in a timely manner without finding yourself in a straight jacket is low.
The last and probably easiest option is to wrap the element in a group (<g>) tag. Add a class or ID to it for easy CSS access and transform the group itself, thus separating out the transforms as discussed in the last lesson.
Lesson 8: Keep your sanity by using transform-origin when transforming part of an SVG
The CSS transform-origin property moves the point around which the transform happens. If you’re trying to rotate an arm — like I did in Clubbin’ It — your animation will look more natural if you rotate the arm from the center of the shoulder, but that path’s natural transform origin is in the upper-left. Use transform-origin to fix this for smoother, more natural feel… you know that really natural pixel art look…
Transforming the origin can also be useful when scaling, like I did in Mustachioed Oops, or when rotating mouth movements, such as the dinosaur’s jaw in Super Tasty. If you don’t change the origin, the transforms will use an origin point at the upper left corner of the SVG element.
Lesson 9: Sprite animations can be responsive
I ended up doing a lot of sprite animations for this project (i.e., where you use multiple, incremental frames and switch between them fast enough that the characters seem to move). I created the images in one wide file, added them as a background image to an element the size of a single frame, used background-size to set the background image to the width of the image, and hid the overflow. Then I used background-position and the animation timing function, step(), to walk through the images; for example: Post-Apocalyptic Celebrations.
Before the project, I always used inflexible images. I’d scale things down a little so that there would be at least a little responsive give, but I didn’t think you could make it a fully flexible width. However, if you use SVG as the background image you can then use viewport units to scale the element along with the changing screen size. The only problem is the background position. However, if you use viewport units for that, it will stay in sync. Check that out in Finally, Alone with my Sandwich…
Lesson 9A: Use viewport units to set the background size of an image when creating responsive sprite animation
As I’ve learned throughout this project, using a single type of unit is almost always the way to go. Initially, I’d set my sprite’s background size using percentages. The math was easy (100% * (number of steps + 1)) and it worked fine in most cases. In longer animations, however, the exact frame tracking could be off and parts of the wrong sprite frame might display. The problem grows as more frames are added to the sprite.
I’m not sure the exact reason this causes an issue, but I believe it’s because of rounding errors that compound over the length of the sprite sheet (the amount of the shift increases with the number of frames).
For my final animation, It Ain’t Over Till the Zombie Sings, I had a dinosaur open his mouth to reveal a zombie Viking singing (while lasers fired in the background plus there was dancing, accordions playing and zombies fired from cannons, of course). Yeah, I know how to throw a party… a nerd party.
The dinosaur and viking was one of the longest sprite animations I did for the project. But when I used percentages to set the background size, the tracking would be off at certain sizes in Safari. By the end of the animation, part of the dinosaur’s nose from a different frame would appear to the right and a similar part of the nose would be missing on the left.
This was super frustrating to diagnose because it seemed to work fine in Chrome and I’d think I fixed it in Safari only to look at a slightly different screen size and see the frame off again. However, if I used consistent units — i.e. vw for background-size, frame width, and background-position — everything worked fine. Again, it comes down to working with consistent units!
Lesson 10: Invite people into the project.
While I learned tons of things during this process, I beat my head against the wall for most of it (often until the wall broke or my head did… I can’t tell). While that’s one way to do it, even if you’re hard-headed, you’ll still end up with a headache. Invite others into your project, be it for advice, to point out an obvious blind spot you missed, provide feedback, help with the project, or simply to encourage you to keep going when the scope is stupidly and arbitrarily large.
So let me put this lesson into practice. What are your thoughts? How will you stop the zombie hordes with CSS animation? What stupidly and arbitrarily large project will you take on to stretch yourself?
eBay had had enough of these spiders. They were fending them off by the thousands. Their servers buzzed with nonstop activity; a relentless stream of trespassers. One aggressor, however, towered above the rest. Bidder’s Edge, which billed itself as an auction aggregator, would routinely crawl the pages of eBay to extract its content and list it on its own site alongside other auction listings.
The famed auction site had unsuccessfully tried blocking Bidder’s Edge in the past. Like an elaborate game of Whac-A-Mole, they would restrict the IP address of a Bidder’s Edge server, only to be breached once again by a proxy server with a new one. Technology had failed. Litigation was next.
eBay filed suit against Bidder’s Edge in December of 1999, citing a handful of causes. That included “an ancient trespass theory known to legal scholars as trespass to chattels, basically a trespass or interference with real property — objects, animals, or, in this case, servers.” eBay, in other words, was arguing that Bidder’s Edge was trespassing — in the most medieval sense of that word — on their servers. In order for it to constitute trespass to chattels, eBay had to prove that the trespassers were causing harm. That their servers were buckling under the load, they argued, was evidence of that harm.
Judge Ronald M. Whyte found that last bit compelling. Quite a bit of back and forth followed, in one of the strangest lawsuits of a new era that included the phrase “rude robots” entering the official court record. These robots — as opposed to the “polite” ones — ignored eBay’s requests to block spidering on their sites, and made every attempt to circumvent counter measures. They were, by the judge’s estimation, trespassing. Whyte granted an injunction to stop Bidder’s Edge from crawling eBay until it was all sorted out.
Several appeals and countersuits and counter-appeals later, the matter was settled. Bidder’s Edge paid eBay an undisclosed amount and promptly shut their doors. eBay had won this particular battle. They had gotten rid of the robots. But the actual war was already lost. The robots — rude or otherwise — were already here.
If not for Stanford University, web search may have been lost. It is the birthplace of Yahoo!, Google and Excite. It ran the servers that ran the code that ran the first search engines. The founders of both Yahoo! and Google are alumni. But many of the most prominent players in search were not in the computer science department. They were in the symbolic systems program.
Symbolic systems was created at Stanford in 1985 as a study of the “relationship between natural and artificial systems that represent, process, and act on information.” Its interdisciplinary approach is rooted at the intersection of several fields: linguistics, mathematics, semiotics, psychology, philosophy, and computer science.
These are the same fields of study one would find at the heart of artificial intelligence research in the second half of the 20ᵗʰ century. But this isn’t the A.I. in its modern smart home manifestation, but in the more classical notion conceived by computer scientists as a roadmap to the future of computing technology. It is the understanding of machines as a way to augment the human mind. That parallel is not by accident. One of the most important areas of study at the symbolics systems program is artificial intelligence.
Numbered among the alumni of the program are several of the founders of Excite and Srinija Srinivasan, the fourth employee at Yahoo!. Her work in artificial intelligence led to a position at the ambitious A.I. research lab Cyc right out of college.
Marisa Mayer, an early employee at Google and, later, Yahoo!’s CEO, also drew on A.I. research during her time in the symbolic systems program. Her groundbreaking thesis project used natural language processing to help its users find the best flights through a simple conversation with a computer. “You look at how people learn, how people reason, and ask a computer to do the same things. It’s like studying the brain without the gore,” she would later say of the program.
Search on the web stems from this one program at one institution at one brief moment in time. Not everyone involved in search engines studied that program — the founders of both Yahoo! and Google, for instance, were graduate students of computer science. But the ideology of search is deeply rooted in the tradition of artificial intelligence. The goal of search, after all, is to extract from the brain a question, and use machines to provide a suitable answer.
At Yahoo!, the principles of artificial intelligence acted as a guide, but it would be aided by human perspective. Web crawlers, like Excite, would bear the burden of users’ queries and attempt to map websites programmatically to provide intelligent results.
The difference would be a matter of approach. A tension that would come to dominate search for half a decade. The directory versus the crawler. The precision of human influence versus the completeness of machines. Surfers would be on one side and, on the other, spiders. Only one would survive.
The first spiders were crude. They felt around in the dark until they found the edge of the web. Then they returned home. Sometimes they gathered little bits of information about the websites they crawled. In the beginning, they gathered nothing at all.
One of the earliest web crawlers was developed at MIT by Matthew Gray. He used his World Wide Wanderer to go and find every website on the web. He wasn’t interested in the content of those sites, he merely wanted to count them up. In the summer of 1993, the first time he sent his crawler out, it got to 130. A year later, it would count 3,000. By 1995, that number grew to just shy of 30,000.
Like many of his peers in the search engine business, Gray was a disciple of information retrieval, a subset of computer science dedicated to knowledge sharing. In practice, information retrieval often involves a robot (also known as “spiders, crawlers, wanderers, and worms”) that crawls through digital documents and programmatically collects their contents. They are then parsed and stored in a centralized “index,” a shortcut that eliminates the need to go and crawl every document each time a search is made. Keeping that index up to date is a constant struggle, and robots need to be vigilant; going back out and re-crawling information on a near constant basis.
The World Wide Web posed a problematic puzzle. Rather than a predictable set of documents, a theoretically infinite number of websites could live on the web. These needed to be stored in a central index —which would somehow be kept up to date. And most importantly, the content of those sites needed to be connected to whatever somebody wanted to search, on the fly and in seconds. The challenge proved irresistible for some information retrieval researchers and academics. People like Jonathan Fletcher.
He built Jumpstation in 1993, one of the earliest examples of a searchable index. His crawler would go out, following as many links as it could, and bring them back to a searchable, centralized database. Then it would start over. To solve for the issue of the web’s limitless vastness, Fletcher began by crawling only the titles and some metadata from each webpage. That kept his index relatively small, but but it also restricted search to the titles of pages.
Fletcher was not alone. After tinkering for several months, WebCrawler launched in April of 1994 out of the University of Washington. It holds the distinction of being the first search engine to crawl entire webpages and make them searchable. By November of that year, WebCrawler had served 1 million queries. At Carnegie Mellon, Michael Maudlin released his own spider-based search engine variant named for the Latin translation of wolf spider, Lycos. By 1995, it had indexed over a million webpages.
Search didn’t stay in universities long. Search engines had a unique utility for wayward web users on the hunt for the perfect site. Many users started their web sessions on a search engine. Netscape Navigator — the number one browser for new web users — connected users directly to search engines on their homepage. Getting listed by Netscape meant eyeballs. And eyeballs meant lucrative advertising deals.
In the second half of the 1990’s, a number of major players entered the search engine market. InfoSeek, initially a paid search option, was picked up by Disney, and soon became the default search engine for Netscape. AOL swooped in and purchased WebCrawler as part of a bold strategy to remain competitive on the web. Lycos was purchased by a venture capitalist who transformed it into a fully commercial enterprise.
Excite.com, another crawler started by Stanford alumni and a rising star in the search engine game for its depth and accuracy of results, was offered three million dollars not long after they launched. Its six co-founders lined up two couches, one across from another, and talked it out all night. They decided to stick with the product and bring in a new CEO. There would be many more millions to be made.
AltaVista, already a bit late to the game at the end of 1995, was created by the Digital Equipment Corporation. It was initially built to demonstrate the processing power of DEC computers. They quickly realized that their multithreaded crawler was able to index websites at a far quicker rate than their competitors. AltaVista would routinely deploy its crawlers — what one researcher referred to as a “brood of spiders” — to index thousands of sites at a time.
As a result, AltaVista was able to index virtually the entire web, nearly 10 million webpages at launch. By the following year, in 1996, they’d be indexing over 100 million. Because of the efficiency and performance of their machines, AltaVista was able to solve the scalability problem. Unlike some of their predecessors, they were able to make the full content of websites searchable, and they re-crawled sites every few weeks, a much more rapid pace than early competitors, who could take months to update their index. They set the standard for the depth and scope of web crawlers.
Never fully at rest, AltaVista used its search engine as a tool for innovation, experimenting with natural language processing, translation tools, and multi-lingual search. They were often ahead of their time, offering video and image search years before that would come to be an expected feature.
Those spiders that had not been swept up in the fervor couldn’t keep up. The universities hosting the first search engines were not at all pleased to see their internet connections bloated with traffic that wasn’t even related to the university. Most universities forced the first experimental search engines, like Jumpstation, to shut down. Except, that is, at Stanford.
Stanford’s history with technological innovation begins in the second half of the 20th century. The university was, at that point, teetering on the edge of becoming a second-tier institution. They had been losing ground and lucrative contracts to their competitors on the East Coast. Harvard and MIT became the sites of a groundswell of research in the wake of World War II. Stanford was being left behind.
In 1951, in a bid to reverse course on their downward trajectory, Dean of Engineering Frederick Terman brokered a deal with the city of Palo Alto. Stanford University agreed to annex 700 acres of land for a new industrial park that upstart companies in California could use. Stanford would get proximity to energetic innovation. The businesses that chose to move there would gain unique access to the Stanford student body for use on their product development. And the city of Palo Alto would get an influx of new taxes.
Hewlett-Packard was one of the first companies to move in. They ushered in a new era of computing-focused industry that would soon be known as Silicon Valley. The Stanford Research Park (later renamed Stanford Industrial Park) would eventually host Xerox during a time of rapid success and experimentation. Facebook would spend their nascent years there, growing into the behemoth it would become. At the center of it all was Stanford.
The research park transformed the university from one of stagnation to a site of entrepreneurship and cutting-edge technology. It put them at the heart of the tech industry. Stanford would embed itself — both logistically and financially — in the crucial technological developments of the second half of the 20ᵗʰ century, including the internet and the World Wide Web.
The potential success of Yahoo!, therefore, did not go unnoticed.
Jerry Yang and David Filo were not supposed to be working on Yahoo!. They were, however, supposed to be working together. They had met years ago, when David was Jerry’s teaching assistant in the Stanford computer science program. Yang eventually joined Filo as a graduate student and — after building a strong rapport — they soon found themselves working on a project together.
As they crammed themselves into a university trailer to begin working through their doctoral project, their relationship become what Yang has often described as perfectly balanced. “We’re both extremely tolerant of each other, but extremely critical of everything else. We’re both extremely stubborn, but very unstubborn when it comes to just understanding where we need to go. We give each other the space we need, but also help each other when we need it.”
In 1994, Filo showed Yang the web. In just a single moment, their focus shifted. They pushed their intended computer science thesis to the side, procrastinating on it by immersing themselves into the depths of the World Wide Web. Days turned into weeks which turned into months of surfing the web and trading links. The two eventually decided to combine their lists in a single place, a website hosted on their Stanford internet connection. It was called Jerry and David’s Guide to the World Wide Web, launched first to Stanford students in 1993 and then to the world in January of 1994. As catchy as that name wasn’t, the idea (and traffic) took off as friends shared with other friends.
Jerry and David’s Guide was a directory. Like the virtual library started at CERN, Yang and Filo organized websites into various categories that they made up on the fly. Some of these categories had strange or salacious names. Others were exactly what you might expect. When one category got too big, they split it apart. It was ad-hoc and clumsy, but not without charm. Through their classifications, Yang and Filo had given their site a personality. Their personality. In later years, Yang would commonly refer to this as the “voice of Yahoo!”
That voice became a guide — as the site’s original name suggested — for new users of the web. Their web crawling competitors were far more adept at the art of indexing millions of sites at a time. Yang and Filo’s site featured only a small subset of the web. But it was, at least by their estimation, the best of what the web had to offer. It was the cool web. It was also a web far easier to navigate than ever before.
At the end of 1994, Yang and Filo renamed their site to Yahoo! (an awkward forced acronym for Yet Another Hierarchical Officious Oracle). By then, they were getting almost a hundred thousand hits a day, sometimes temporarily taking down Stanford’s internet in the process. Most other universities would have closed down the site and told them to get back to work. But not Stanford. Stanford had spent decades preparing for on-campus businesses just like this one. They kept the server running, and encouraged its creators to stake their own path in Silicon Valley.
Throughout 1994, Netscape had included Yahoo! in their browser. There was a button in the toolbar labeled “Net Directory” that linked directly to Yahoo!. Marc Andreessen, believing in the site’s future, agreed to host their website on Netscape’s servers until they were able to get on steady ground.
Yang and Filo rolled up their sleeves, and began talking to investors. It wouldn’t take long. By the spring of 1996, they would have a new CEO and hold their own record-setting IPO, outstripping even their gracious host, Netscape. By then, they became the most popular destination on the web by a wide margin.
In the meantime, the web had grown far beyond the grasp of two friends swapping links. They had managed to categorize tens of thousands of sites, but there were hundreds of thousands more to crawl. “I picture Jerry Yang as Charlie Chaplin in Modern Times,” one journalist described, “confronted with an endless stream of new work that is only increasing in speed.” The task of organizing sites would have to go to somebody else. Yang and Filo found help in a fellow Stanford alumni, someone they had met years ago while studying abroad together in Japan, Srinija Srinivasan, a graduate of the symbolic systems program. Many of the earliest hires at Yahoo! were given slightly absurd titles that always ended in “Yahoo.” Yang and Filo went by Chief Yahoos. Srinivasan’s job title was Ontological Yahoo.
That is a deliberate and precise job title, and it was not selected by accident. Ontology is the study of being, an attempt to break the world into its component parts. It has manifested in many traditions throughout history and the world, but it is most closely associated with the followers of Socrates, in the work of Plato, and later in the groundbreaking text Metaphysics, written by Aristotle. Ontology asks the question “What exists?”and uses it as a thought experiment to construct an ideology of being and essence.
As computers blinked into existence, ontology found a new meaning in the emerging field of artificial intelligence. It was adapted to fit the more formal hierarchical categorizations required for a machine to see the world; to think about the world. Ontology became a fundamental way to describe the way intelligent machines break things down into categories and share knowledge.
The dueling definitions of the ontology of metaphysics and computer science would have been familiar to Srinija Srinivasan from her time at Stanford. The combination of philosophy and artificial intelligence in her studies gave her a unique perspective on hierarchical classifications. It was this experience that she brought to her first job after college at the Cyc Project, an artificial intelligence research lab with a bold project: to teach a computer common sense.
At Yahoo!, her task was no less bold. When someone looked for something on the site, they didn’t want back a random list of relevant results. They wanted the result they were actually thinking about, but didn’t quite know how to describe. Yahoo! had to — in a manner of seconds — figure out what its users really wanted. Much like her work in artificial intelligence, Srinivasan needed to teach Yahoo! how to think about a query and infer the right results.
To do that, she would need to expand the voice of Yahoo! to thousands of more websites in dozens of categories and sub-categories without losing the point of view established by Jerry and David. She would need to scale that perspective. “This is not a perfunctory file-keeping exercise. This is defining the nature of being,” she once said of her project. “Categories and classifications are the basis for each of our worldviews.”
At a steady pace, she mapped an ontology of human experience onto the site. She began breaking up the makeshift categories she inherited from the site’s creators, re-constituting them into more concrete and findable indexes. She created new categories and destroyed old ones. She sub-divided existing subjects into new, more precise ones. She began cross-linking results so that they could live within multiple categories. Within a few months she had overhauled the site with a fresh hierarchy.
That hierarchical ontology, however, was merely a guideline. The strength of Yahoo!’s expansion lay in the 50 or so content managers she had hired in the meantime. They were known as surfers. Their job was to surf the web — and organize it.
Each surfer was coached in the methodology of Yahoo! but were left with a surprising amount of editorial freedom. They cultivated the directory with their own interests, meticulously deliberating over websites and where they belong. Each decision could be strenuous, and there were missteps and incorrectly categorized items along the way. But by allowing individual personality to dictate hierarchal choices, Yahoo! retained its voice.
They gathered as many sites as they could, adding hundreds each day. Yahoo! surfers did not reveal everything on the web to their site’s visitors. They showed them what was cool. And that meant everything to users grasping for the very first time what the web could do.
At the end of 1995, the Yahoo! staff was watching their traffic closely. Huddled around consoles, employees would check their logs again and again, looking for a drop in visitors. Yahoo! had been the destination for the “Internet Directory” button on Netscape for years. It had been the source of their growth and traffic. Netscape had made the decision, at the last minute (and seemingly at random), to drop Yahoo!, replacing them with the new kids on the block, Excite.com. Best case scenario: a manageable drop. Worst case: the demise of Yahoo!.
But the drop never came. A day went by, and then another. And then a week. And then a few weeks. And Yahoo! remained the most popular website. Tim Brady, one of Yahoo!’s first employees, describes the moment with earnest surprise. “It was like the floor was pulled out in a matter of two days, and we were still standing. We were looking around, waiting for things to collapse in a lot of ways. And we were just like, I guess we’re on our own now.”
Netscape wouldn’t keep their directory button exclusive for long. By 1996, they would begin allowing other search engines to be listed on their browser’s “search” feature. A user could click a button and a drop-down of options would appear, for a fee. Yahoo! bought themselves back in to the drop-down. They were joined by four other search engines, Lycos, InfoSeek, Excite, and AltaVista.
By that time, Yahoo! was the unrivaled leader. It had transformed its first mover advantage into a new strategy, one bolstered by a successful IPO and an influx of new investment. Yahoo! wanted to be much more than a simple search engine. Their site’s transformation would eventually be called a portal. It was a central location for every possible need on the web. Through a number of product expansions and aggressive acquisitions, Yahoo! released a new suite of branded digital products. Need to send an email? Try Yahoo! Mail. Looking to create website? There’s Yahoo! Geocities. Want to track your schedule? Use Yahoo! Calendar. And on and on the list went.
Competitors rushed the fill the vacuum of the #2 slot. In April of 1996, Yahoo!, Lycos and Excite all went public to soaring stock prices. Infoseek had their initial offering only a few months later. Big deals collided with bold blueprints for the future. Excite began positioning itself as a more vibrant alternative to Yahoo! with more accurate search results from a larger slice of the web. Lycos, meanwhile, all but abounded the search engine that had brought them initial success to chase after the portal-based game plan that had been a windfall for Yahoo!.
The media dubbed the competition the “portal wars,” a fleeting moment in web history when millions of dollars poured into a single strategy. To be the biggest, best, centralized portal for web surfers. Any service that offered users a destination on the web was thrown into the arena. Nothing short of the future of the web (and a billion dollar advertising industry) was at stake.
In some ways, though, the portal wars were over before they started. When Excite announced a gigantic merger with @Home, an Internet Service Provider, to combine their services, not everyone thought it was a wise move. “AOL and Yahoo! were already in the lead,” one investor and cable industry veteran noted, “and there was no room for a number three portal.” AOL had just enough muscle and influence to elbow their way into the #2 slot, nipping at the heels of Yahoo!. Everyone else would have to go toe-to-toe with Goliath. None were ever able to pull it off.
Battling their way to market dominance, most search engines had simply lost track of search. Buried somewhere next to your email and stock ticker and sports feed was, in most cases, a second rate search engine you could use to find things — only not often and not well. That’s is why it was so refreshing when another search engine out of Stanford launched with just a single search box and two buttons, its bright and multicolored logo plastered across the top.
A few short years after it launched, Google was on the shortlist of most popular sites. In an interview with PBS Newshour in 2002, co-founder Larry Page described their long-term vision. “And, actually, the ultimate search engine, which would understand, you know, exactly what you wanted when you typed in a query, and it would give you the exact right thing back, in computer science we call that artificial intelligence.”
Google could have started anywhere. It could have started with anything. One employee recalls an early conversation with the site’s founders where he was told “we are not really interested in search. We are making an A.I.” Larry Page and Sergey Brin, the creators of Google, were not trying to create the web’s greatest search engine. They were trying to create the web’s most intelligent website. Search was only their most logical starting point.
Imprecise and clumsy, the spider-based search engines of 1996 faced an uphill battle. AltaVista had proved that the entirety of the web, tens of millions of webpages, could be indexed. But unless you knew your way around a few boolean logic commands, it was hard to get the computer to return the right results. The robots were not yet ready to infer, in Page’s words, “exactly what you wanted.”
Yahoo! had filled in these cracks of technology with their surfers. The surfers were able to course-correct the computers, designing their directory piece by piece rather than relying on an algorithm. Yahoo! became an arbiter of a certain kind of online chic; tastemakers reimagined for the information age. The surfers of Yahoo! set trends that would last for years. Your site would live or die by their hand. Machines couldn’t do that work on their own. If you wanted your machines to be intelligent, you needed people to guide them.
Page and Brin disagreed. They believed that computers could handle the problem just fine. And they aimed to prove it.
That unflappable confidence would come to define Google far more than their “don’t be evil” motto. In the beginning, their laser-focus on designing a different future for the web would leave them blind to the day-to-day grind of the present. On not one, but two occasions, checks made out to the company for hundreds of thousands of dollars were left in desk drawers or car trunks until somebody finally made the time to deposit them. And they often did things different. Google’s offices, for instances, were built to simulate a college dorm, an environment the founders felt most conducive to big ideas.
Google would eventually build a literal empire on top of a sophisticated, world-class infrastructure of their own design, fueled by the most elaborate and complex (and arguably invasive) advertising mechanism ever built. There are few companies that loom as large as Google. This one, like others, started at Stanford.
Even among the most renowned artificial intelligence experts, Terry Winograd, a computer scientist and Stanford professor, stands out in the crowd. He was also Larry Page’s advisor and mentor when he was a graduate student in the computer science department. Winograd has often recalled the unorthodox and unique proposals he would receive from Page for his thesis project, some of which involved “space tethers or solar kites.” “It was science fiction more than computer science,” he would later remark.
But for all of his fanciful flights of imagination, Page always returned to the World Wide Web. He found its hyperlink structure mesmerizing. Its one-way links — a crucial ingredient in the web’s success — had led to a colossal proliferation of new websites. In 1996, when Page first began looking at the web, there were tens of thousands of sites being added every week. The master stroke of the web was to enable links that only traveled in one direction. That allowed the web to be decentralized, but without a central database tracking links, it was nearly impossible to collect a list of all of the sites that linked to a particular webpage. Page wanted to build a graph of who was linking to who; an index he could use to cross-reference related websites.
Page understood that the hyperlink was a digital analog to academic citations. A key indicator of the value of a particular academic paper is the amount of times it has been cited. If a paper is cited often (by other high quality papers), it is easier to vouch for its reliability. The web works the same way. The more often your site is linked to (what’s known as a backlink), the more dependable and accurate it is likely to be.
Theoretically, you can determine the value of a website by adding up all of the other websites that link to it. That’s only one layer though. If 100 sites link back to you, but each of them has only ever been linked to one time, that’s far less valuable than if five sites that each have been linked to 100 times link back to you. So it’s not simply how many links you have, but the quality of those links. If you take both of those dimensions and aggregate sites using backlinks as a criteria, you can very quickly start to assemble a list of sites ordered by quality.
John Battelle describes the technical challenge facing Page in his own retelling of the Google story, The Search.
Page realized that a raw count of links to a page would be a useful guide to that page’s rank. He also saw that each link needed its own ranking, based on the link count of its originating page. But such an approach creates a difficult and recursive mathematical challenge — you not only have to count a particular page’s links, you also have to count the links attached to the links. The math gets complicated rather quickly.
Fortunately, Page already knew a math prodigy. Sergey Brin had proven his brilliance to the world a number of times before he began a doctoral program in the Stanford computer science department. Brin and Page had crossed paths on several occasions, a relationship that began on rocky ground but grew towards mutual respect. The mathematical puzzle at the center of Page’s idea was far too enticing for Brin to pass up.
He got to work on a solution. “Basically we convert the entire Web into a big equation, with several hundred million variables,” he would later explain, “which are the page ranks of all the Web pages, and billions of terms, which are the links. And we’re able to solve that equation.” Scott Hassan, the seldom talked about third co-founder of Google who developed their first web crawler, summed it up a bit more concisely, describing Google’s algorithm as an attempt to “surf the web backward!”
The result was PageRank — as in Larry Page, not webpage. Brin, Page, and Hassan developed an algorithm that could trace backlinks of a site to determine the quality of a particular webpage. The higher value of a site’s backlinks, the higher up the rankings it climbed. They had discovered what so many others had missed. If you trained a machine on the right source — backlinks — you could get remarkable results.
It was only after that that they began matching their rankings to search queries when they realized PageRank fit best in a search engine. They called their search engine Google. It was launched on Stanford’s internet connection in August of 1996.
Google solved the relevancy problem that had plagued online search since its earliest days. Crawlers like Lycos, AltaVista and Excite were able to provide a list of webpages that matched a particular search. They just weren’t able to sort them right, so you had to go digging to find the result you wanted. Google’s rankings were immediately relevant. The first page of your search usually had what you needed. They were so confident in their results they added an “I’m Feeling Lucky” button which took users directly to the first result for their search.
Google’s growth in their early days was not unlike Yahoo!’s in theirs. They spread through word of mouth, from friends to friends of friends. By 1997, they had grown big enough to put a strain on the Stanford network, something Yang and Filo had done only a couple of years earlier. Stanford once again recognized possibility. It did not push Google off their servers. Instead, Stanford’s advisors pushed Page and Brin in a commercial direction.
Initially, the founders sought to sell or license their algorithm to other search engines. They took meetings with Yahoo!, Infoseek and Excite. No one could see the value. They were focused on portals. In a move that would soon sound absurd, they each passed up the opportunity to buy Google for a million dollars or less, and Page and Brin could not find a partner that recognized their vision.
One Stanford faculty member was able to connect them with a few investors, including Jeff Bezos and David Cheriton (which got them those first few checks that sat in a desk drawer for weeks). They formally incorporated in September of 1998, moving into a friend’s garage, bringing a few early employees along, including symbolics systems alumni Marissa Mayer.
Even backed by a million dollar investment, the Google founders maintained a philosophy of frugality, simplicity, and swiftness. Despite occasional urging from their investors, they resisted the portal strategy and remained focused on search. They continued tweaking their algorithm and working on the accuracy of their results. They focused on their machines. They wanted to take the words that someone searched for and turn them into something actually meaningful. If you weren’t able to find the thing you were looking for in the top three results, Google had failed.
Google was followed by a cloud of hype and positive buzz in the press. Writing in Newsweek, Steven Levy described Google as a “high-tech version of the Oracle of Delphi, positioning everyone a mouse click away from the answers to the most arcane questions — and delivering simple answers so efficiently that the process becomes addictive.” It was around this time that “googling” — a verb form of the site synonymous with search — entered the common vernacular. The portal wars were still waging, but Google was poking its head up as a calm, precise alternative to the noise.
At the end of 1998, they were serving up ten thousand searches a day. A year later, that would jump to seven million a day. But quietly, behind the scenes, they began assembling the pieces of an empire.
As the web grew, technologists and journalists predicted the end of Google; they would never be able to keep up. But they did, outlasting a dying roster of competitors. In 2001, Excite went bankrupt, Lycos closed down, and Disney suspended Infoseek. Google climbed up and replaced them. It wouldn’t be until 2006 that Google would finally overtake Yahoo! as the number one website. But by then, the company would transform into something else entirely.
After securing another round of investment in 1999, Google moved into their new headquarters and brought on an army of new employees. The list of fresh recruits included former engineers at AltaVista, and leading artificial intelligence expert Peter Norving. Google put an unprecedented focus on advancements in technology. Better servers. Faster spiders. Bigger indexes. The engineers inside Google invented a web infrastructure that had, up to that point, been only theoretical.
They trained their machines on new things, and new products. But regardless of the application, translation or email or pay-per-click advertising, they rested on the same premise. Machines can augment and re-imagine human intelligence, and they can do it at limitless scale. Google took the value proposition of artificial intelligence and brought it into the mainstream.
In 2001, Page and Brin brought in Silicon Valley veteran Eric Schmidt to run things as their CEO, a role he would occupy for a decade. He would oversee the company during its time of greatest growth and innovation. Google employee #4 Heather Cairns recalls his first days on the job. “He did this sort of public address with the company and he said, ‘I want you to know who your real competition is.’ He said, ‘It’s Microsoft.’ And everyone went, What?“
Bill Gates would later say, “In the search engine business, Google blew away the early innovators, just blew them away.” There would come a time when Google and Microsoft would come face to face. Eric Schmidt was correct about where Google was going. But it would take years for Microsoft to recognize Google as a threat. In the second half of the 1990’s, they were too busy looking in their rearview mirror at another Silicon Valley company upstart that had swept the digital world. Microsoft’s coming war with Netscape would subsume the web for over half a decade.
I really like Dave’s take on the matter. The diversity of browser engines makes web tech slow. Frustratingly slow, to many, but that slowness can bring value.
There’s a lot of value in slow thinking. You use the non-lizard side of your brain. You make more deliberate decisions. You prioritize design over instant gratification. You can check your gut instincts and validate your hypothesis before incurring mountains of technical debt.
I’d bet you a dollar that the less engines we have, the faster things get. Fast can be satisfying in the moment, but doesn’t make for the best brisket.
If we do see a major reduction in browser diversity, I think we lose the intentional slowness and the cooperation mechanisms we have in place. Who knows what will happen, but my hope is that just like iron can sharpen iron, maybe chromium can sharpen chromium.