Tag: Audio

Let’s Create a Custom Audio Player

HTML has a built-in native audio player interface that we get simply using the <audio> element. Point it to a sound file and that’s all there is to it. We even get to specify multiple files for better browser support, as well as a little CSS flexibility to style things up, like giving the audio player a border, some rounded corners, and maybe a little padding and margin.

But even with all that… the rendered audio player itself can look a little, you know, plain.

Did you know it’s possible to create a custom audio player? Of course we can! While the default <audio> player is great in many cases, having a custom player might suit you better, like if you run a podcast and an audio player is the key element on a website for the podcast. Check out the sweet custom player Chris and Dave set up over at the ShopTalk Show website.

Showing a black audio player with muted orange controls, including options to jump or rewind 30 seconds on each side of a giant play button, which sits on top of a timeline showing the audio current time and duration at both ends of the timeline, and an option to set the audio speed in the middle.
The audio player fits in seamlessly with other elements on the page, sporting controls that complement the overall design.

We’re going to take a stab at making our own player in this post. So, put on your headphones, crank up some music, and let’s get to work!

The elements of an audio player

First, let’s examine the default HTML audio players that some of the popular browsers provide.

  • Blink-based browsers (Google Chrome, Opera, and Microsoft Edge)
  • Mozilla Firefox
  • Internet Explorer

If our goal is to match the functionality of these examples, then we need to make sure our player has:

  • a play/pause button,
  • a seek slider,
  • the current time indicator,
  • the duration of the sound file,
  • a way to mute the audio, and
  • a volume control slider.

Let’s say this is the design we’re aiming for:

We’re not going for anything too fancy here: just a proof of concept sorta thing that we can use to demonstrate how to make something different than what default HTML provides.

Basic markup, styling and scripts for each element

We should first go through the semantic HTML elements of the player before we start building features and styling things. We have plenty of elements to work with here based on the elements we just listed above.

Play/pause button

I think the HTML element appropriate for this button is the <button> element. It will contain the play icon, but the pause icon should also be in this button. That way, we’re toggling between the two rather than taking up space by displaying both at the same time.

Something like this in the markup:

<div id="audio-player-container">   <p>Audio Player</p>   <!-- swaps with pause icon -->   <button id="play-icon"></button> </div>

So, the question becomes: how do we swap between the two icons, both visually and functionally? The pause icon replaces the play icon when the play action is triggered. In other words, the play icon should display when the audio is paused and the pause icon should display when the audio is playing.

Of course, a little animation could take place as the icon transitions from the play to pause. What would help us accomplish that is Lottie, a library that renders Adobe After Effects animations natively. We don’t have to create the animation on After Effects though. The animated icon we’re going to use is provided for free by Icons8.

New to Lottie? I wrote up a thorough overview that covers how it works.

In the meantime, permit me to describe the following Pen:

The HTML section contains the following:

  • a container for the player,
  • text that briefly describes the container, and
  • a <button> element for the play and pause actions.

The CSS section includes some light styling. The JavaScript is what we need to break down a bit because it’s doing several things:

// imports the Lottie library via Skypack
import lottieWeb from 'https://cdn.skypack.dev/lottie-web';

// variable for the button that will contain both icons
const playIconContainer = document.getElementById('play-icon');
// variable that will store the button’s current state (play or pause)
let playState = 'play';

// loads the animation that transitions the play icon into the pause icon
// into the referenced button, using Lottie’s loadAnimation() method
const playAnimation = lottieWeb.loadAnimation({
  container: playIconContainer,
  path: 'https://maxst.icons8.com/vue-static/landings/animated-icons/icons/pause/pause.json',
  renderer: 'svg',
  loop: false,
  autoplay: false,
  name: "Demo Animation",
});

// displays the play icon on load since the audio is initially paused
playAnimation.goToAndStop(14, true);

// adds an event listener to the button so that when it is clicked,
// the player toggles between play and pause
playIconContainer.addEventListener('click', () => {
  if(playState === 'play') {
    playAnimation.playSegments([14, 27], true);
    playState = 'pause';
  } else {
    playAnimation.playSegments([0, 14], true);
    playState = 'play';
  }
});

Here’s what the script is doing, minus the code:

  • It imports the Lottie library via Skypack.
  • It references the button that will contain both icons in a variable.
  • It defines a variable that will store the button’s current state (play or pause).
  • It loads the animation that transitions the play icon into the pause icon into the referenced button, using Lottie’s loadAnimation() method.
  • It displays the play icon on load since the audio is initially paused.
  • It adds an event listener to the button so that when it is clicked, the player toggles between play and pause.

Current time and duration

The current time is like a progress indicator that shows you how much time has elapsed from the start of the audio file. The duration? That’s just how long the sound file is.

A <span> element is okay for displaying these. The <span> element for the current time, which is updated every second, has a default text content of 0:00. The one for the duration displays the length of the audio in mm:ss format.

<div id="audio-player-container">   <p>Audio Player</p>   <button id="play-icon"></button>   <span id="current-time" class="time">0:00</span>   <span id="duration" class="time">0:00</span> </div>

Seek slider and volume control slider

We need a way to move to any point in time in the sound file. So, if I want to skip ahead to the halfway point of the file, I can simply click and drag a slider to that spot in the timeline.

We also need a way to control the sound volume. That, too, can be some sort of click-and-drag slider thingy.

I would say <input type="range"> is the right HTML element for both of these features.

<div id="audio-player-container">   <p>Audio Player</p>   <button id="play-icon"></button>   <span id="current-time" class="time">0:00</span>   <input type="range" id="seek-slider" max="100" value="0">   <span id="duration" class="time">0:00</span>   <input type="range" id="volume-slider" max="100" value="100"> </div>

Styling range inputs with CSS is totally possible, but I’ll tell you what: it is difficult for me to wrap my head around. This article will help. Mad respect to you, Ana. Handling browser support with all of those vendor prefixes is a CSS trick in and of itself. Look at all the code needed on input[type="range"] to get a consistent experience:

input[type="range"] {   position: relative;   -webkit-appearance: none;   width: 48%;   margin: 0;   padding: 0;   height: 19px;   margin: 30px 2.5% 20px 2.5%;   float: left;   outline: none; } input[type="range"]::-webkit-slider-runnable-track {   width: 100%;   height: 3px;   cursor: pointer;   background: linear-gradient(to right, rgba(0, 125, 181, 0.6) var(--buffered-width), rgba(0, 125, 181, 0.2) var(--buffered-width)); } input[type="range"]::before {   position: absolute;   content: "";   top: 8px;   left: 0;   width: var(--seek-before-width);   height: 3px;   background-color: #007db5;   cursor: pointer; } input[type="range"]::-webkit-slider-thumb {   position: relative;   -webkit-appearance: none;   box-sizing: content-box;   border: 1px solid #007db5;   height: 15px;   width: 15px;   border-radius: 50%;   background-color: #fff;   cursor: pointer;   margin: -7px 0 0 0; } input[type="range"]:active::-webkit-slider-thumb {   transform: scale(1.2);   background: #007db5; } input[type="range"]::-moz-range-track {   width: 100%;   height: 3px;   cursor: pointer;   background: linear-gradient(to right, rgba(0, 125, 181, 0.6) var(--buffered-width), rgba(0, 125, 181, 0.2) var(--buffered-width)); } input[type="range"]::-moz-range-progress {   background-color: #007db5; } input[type="range"]::-moz-focus-outer {   border: 0; } input[type="range"]::-moz-range-thumb {   box-sizing: content-box;   border: 1px solid #007db5;   height: 15px;   width: 15px;   border-radius: 50%;   background-color: #fff;   cursor: pointer; } input[type="range"]:active::-moz-range-thumb {   transform: scale(1.2);   background: #007db5; } input[type="range"]::-ms-track {   width: 100%;   height: 3px;   cursor: pointer;   background: transparent;   border: solid transparent;   color: transparent; } input[type="range"]::-ms-fill-lower {   background-color: #007db5; } input[type="range"]::-ms-fill-upper {   background: linear-gradient(to right, rgba(0, 125, 181, 0.6) var(--buffered-width), rgba(0, 125, 181, 0.2) var(--buffered-width)); } input[type="range"]::-ms-thumb {   box-sizing: content-box;   border: 1px solid #007db5;   height: 15px;   width: 15px;   border-radius: 50%;   background-color: #fff;   cursor: pointer; } input[type="range"]:active::-ms-thumb {   transform: scale(1.2);   background: #007db5; }

Whoa! What does that even mean, right?

Styling the progress section of range inputs is a tricky endeavor. Firefox provides the ::-moz-range-progress pseudo-element while Internet Explorer provides ::-ms-fill-lower. As WebKit browsers do not provide any similar pseudo-element, we have to use the ::before pseudo-element to improvise the progress. That explains why, if you noticed, I added event listeners in the JavaScript section to set custom CSS properties (e.g. --seek-before-width) that update when the input event is fired on each of the sliders.
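
For reference, those listeners might look something like the sketch below. The names here are assumptions on my part: showRangeProgress() matches the helper referenced later in this post, seekSlider and volumeSlider reference the two range inputs, and --volume-before-width is a hypothetical counterpart to --seek-before-width for the volume slider.

// Sketch of the presentation listeners (assumed names; the real code lives in the Pens)
const audioPlayerContainer = document.getElementById('audio-player-container');

const showRangeProgress = (rangeInput) => {
  if (rangeInput === seekSlider) {
    audioPlayerContainer.style.setProperty('--seek-before-width', rangeInput.value / rangeInput.max * 100 + '%');
  } else {
    audioPlayerContainer.style.setProperty('--volume-before-width', rangeInput.value / rangeInput.max * 100 + '%');
  }
}

seekSlider.addEventListener('input', (e) => {
  showRangeProgress(e.target);
});
volumeSlider.addEventListener('input', (e) => {
  showRangeProgress(e.target);
});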

One of the native HTML <audio> examples we looked at earlier shows the buffered amount of the audio. The --buffered-width property specifies the percentage of the audio that the user can play through without having to wait for the browser to download more. I imitated this feature with the linear-gradient() function on the track of the seek slider, using the rgba() function for the color stops to get transparency. The buffered width has a much deeper color compared to the rest of the track. However, we’ll get to the actual implementation of this feature later.

Volume percentage

This displays the volume level as a percentage. Its text content is updated as the user changes the volume through the slider. Since it represents a value based on user input, I think this should be the <output> element.

<div id="audio-player-container">   <p>Audio Player</p>   <button id="play-icon"></button>   <span id="current-time" class="time">0:00</span>   <input type="range" id="seek-slider" max="100" value="0">   <span id="duration" class="time">0:00</span>   <output id="volume-output">100</output>   <input type="range" id="volume-slider" max="100" value="100"> </div>

Mute button

Like for the play and pause actions, this should be in a <button> element. Luckily for us, Icons8 also has an animated mute icon. So we would use the Lottie library here just as we did for the play/pause button.

<div id="audio-player-container">   <p>Audio Player</p>   <button id="play-icon"></button>   <span id="current-time" class="time">0:00</span>   <input type="range" id="seek-slider" max="100" value="0">   <span id="duration" class="time">0:00</span>   <output id="volume-output">100</output>   <input type="range" id="volume-slider" max="100" value="100">   <button id="mute-icon"></button> </div>

That’s all of the basic markup, styling and scripting we need at the moment!

Working on the functionality

The HTML <audio> element has a preload attribute. This attribute gives the browser instructions for how to load the audio file. It accepts one of three values:

  • none – indicates that the browser should not load the audio at all (unless the user initiates the play action)
  • metadata – indicates that only the metadata (like length) of the audio should be loaded
  • auto – loads the complete audio file

An empty string is equivalent to the auto value. Note, however, that these values are merely hints to the browser; it isn’t obligated to honor them. For example, if a user is on a cellular network on iOS, Safari does not load any part of the audio, regardless of the preload attribute, unless the user triggers the play action. For this player, we’ll use the metadata value since it doesn’t require much overhead and we want to display the length of the audio.

What would help us accomplish the features our audio player should have is the JavaScript HTMLMediaElement interface, which the HTMLAudioElement interface inherits. For our audio player code to be as self-explanatory as possible, I’d divide the JavaScript into two sections: presentation and functionality.

First off, we should create an <audio> element in the audio player that has the basic features we want:

<div id="audio-player-container">
  <audio src="my-favourite-song.mp3" preload="metadata" loop></audio>
  <button id="play-icon"></button>
  <!-- ... -->
</div>

Display the audio duration

The first thing we want to display on the browser is the duration of the audio, when it is available. The HTMLAudioElement interface has a duration property, which returns the duration of the audio in seconds. If it is unavailable, it returns NaN.

We’ve set preload to metadata in this example, so the browser should provide us that information up front on load… assuming it respects preload. Since we’d be certain that the duration will be available when the browser has downloaded the metadata of the audio, we display it in our handler for the loadedmetadata event, which the interface also provides:

const audio = document.querySelector('audio');

audio.addEventListener('loadedmetadata', () => {
  displayAudioDuration(audio.duration);
});

That’s great but, again, we get the duration in seconds. We probably should convert that to a mm:ss format:

const calculateTime = (secs) => {
  const minutes = Math.floor(secs / 60);
  const seconds = Math.floor(secs % 60);
  const returnedSeconds = seconds < 10 ? `0${seconds}` : `${seconds}`;
  return `${minutes}:${returnedSeconds}`;
}

We’re using Math.floor() because the seconds returned by the duration property typically come in decimals.

The third variable, returnedSeconds, is necessary for situations where the duration is something like 4 minutes and 8 seconds. We would want to return 4:08, not 4:8.

Sometimes the browser loads the audio metadata before our script runs. When that happens, the loadedmetadata event fires before its listener can be added to the <audio> element, so the audio duration never gets displayed. Fortunately, there’s a hack. The HTMLMediaElement has a property called readyState. It returns a number that, according to MDN Web Docs, indicates the readiness state of the media. The following describes the values:

  • 0 – no data about the media is available.
  • 1 – the metadata attributes of the media are available.
  • 2 – data is available, but not enough to play more than a frame.
  • 3 – data is available for a small number of frames beyond the current playback position.
  • 4 – data is available, such that the media can be played through to the end without interruption.

We want to focus on the metadata. So our approach is to display the duration if the metadata of the audio is available. If it is not available, we add the event listener. That way, the duration is always displayed.

const audio = document.querySelector('audio');
const durationContainer = document.getElementById('duration');

const calculateTime = (secs) => {
  const minutes = Math.floor(secs / 60);
  const seconds = Math.floor(secs % 60);
  const returnedSeconds = seconds < 10 ? `0${seconds}` : `${seconds}`;
  return `${minutes}:${returnedSeconds}`;
}

const displayDuration = () => {
  durationContainer.textContent = calculateTime(audio.duration);
}

if (audio.readyState > 0) {
  displayDuration();
} else {
  audio.addEventListener('loadedmetadata', () => {
    displayDuration();
  });
}

Seek slider

The default value of the range slider’s max property is 100. The general idea is that when the audio is playing, the thumb is supposed to be “sliding.” Also, it is supposed to move every second, such that it gets to the end of the slider when the audio ends.

However, if the audio duration is 150 seconds and the value of the slider’s max property is 100, the thumb will reach the end of the slider before the audio ends. This is why it is necessary to set the value of the slider’s max property to the audio duration in seconds; that way, the thumb reaches the end of the slider when the audio ends. Recall that this should happen once the audio duration is available, i.e. when the browser has downloaded the audio metadata, as in the following:

const seekSlider = document.getElementById('seek-slider');

const setSliderMax = () => {
  seekSlider.max = Math.floor(audio.duration);
}

if (audio.readyState > 0) {
  displayDuration();
  setSliderMax();
} else {
  audio.addEventListener('loadedmetadata', () => {
    displayDuration();
    setSliderMax();
  });
}

Buffered amount

As the browser downloads the audio, it would be nice for the user to know how much of it they can seek to without delay. The HTMLMediaElement interface provides the buffered and seekable properties. The buffered property returns a TimeRanges object, which indicates the chunks of media that the browser has downloaded. According to MDN Web Docs, a TimeRanges object is a series of non-overlapping ranges of time, with start and stop times. The chunks are usually contiguous, unless the user seeks to another part in the media. The seekable property returns a TimeRanges object, which indicates “seekable” parts of the media, irrespective of whether they’ve been downloaded or not.

Recall that the preload="metadata" attribute is present in our <audio> element. If, for example, the audio duration is 100 seconds, the buffered property returns a TimeRanges object similar to the following:

Displays a line from 0 to 100, with a marker at 20. A range from 0 to 20 is highlighted on the line.

When the audio has started playing, the seekable property would return a TimeRanges object similar to the following:

Another timeline, but with four different ranges highlighted on the line, 0 to 20, 30 to 40, 60 to 80, and 90 to 100.

It returns multiple chunks of media because, more often than not, byte-range requests are enabled on the server, which means multiple parts of the media can be downloaded simultaneously. However, we want to display the buffered amount closest to the current playback position. In the first image, that’s the first (and only) chunk, the 0 to 20 range. As the audio starts playing, the browser downloads more chunks, and we want to display the one closest to the current playback position, which is the last chunk returned by the buffered property. The following snippet stores, in the variable bufferedAmount, the time at the end of the last range in the TimeRanges object returned by the buffered property:

const audio = document.querySelector('audio');
const bufferedAmount = audio.buffered.end(audio.buffered.length - 1);

This would be 20, the end of the 0 to 20 range in the first image. The following snippet stores, in the variable seekableAmount, the time at the end of the last range in the TimeRanges object returned by the seekable property:

const audio = document.querySelector('audio');
const seekableAmount = audio.seekable.end(audio.seekable.length - 1);

This time, however, the value would be 100 (the end of the 90 to 100 range in the second image), which is the entire audio duration. But note that there are holes in the TimeRanges object, since the browser has only downloaded some parts of the audio. If we used this value, the entire duration would be displayed to the user as the buffered amount even though some parts of the audio are not available yet. Because that won’t provide the best user experience, the first snippet (based on the buffered property) is what we should use.

As the browser downloads the audio, the user should expect that the buffered amount on the slider increases in width. The HTMLMediaElement provides an event, the progress event, which fires as the browser loads the media. Of course, I’m thinking what you’re thinking! The buffered amount should be incremented in the handler for the audio’s progress event.

Finally, we should actually display the buffered amount on the seek slider. We do that by setting the property we talked about earlier, --buffered-width, to a percentage of the value of the slider’s max property. Yes, in the handler for the progress event too. Also, because the browser may load the audio metadata before our script runs, we should update the property in the loadedmetadata event handler and in the preceding conditional block that checks the readiness state of the audio. The following Pen combines all that we’ve covered so far:
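
As a rough sketch, the progress handler could look like the snippet below. The displayBufferedAmount() name and the audioPlayerContainer variable are assumptions on my part; the complete version lives in that Pen.

// Sketch: update --buffered-width as the browser downloads more of the audio
const displayBufferedAmount = () => {
  const bufferedAmount = Math.floor(audio.buffered.end(audio.buffered.length - 1));
  audioPlayerContainer.style.setProperty('--buffered-width', `${(bufferedAmount / seekSlider.max) * 100}%`);
}

// the progress event fires repeatedly while the browser loads the media
audio.addEventListener('progress', displayBufferedAmount);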

Current time

As the user slides the thumb along the range input, the range value should be reflected in the <span> element containing the current time of the audio. This tells the user the current playback position of the audio. We do this in the handler of the slider’s input event listener.

If you think the correct event to listen to should be the change event, I beg to differ. Say the user moved the thumb from value 0 to 20. The input event fires at values 1 through to 20. However, the change event will fire only at value 20. If we use the change event, it will not reflect the playback position from values 1 to 19. So, I think the input event is appropriate. Then, in the handler for the event, we pass the slider’s value to the calculateTime() function we defined earlier.

We created the function to take time in seconds and return it in a mm:ss format. If you’re thinking, Oh, but the slider’s value is not time in seconds, let me explain. Actually, it is. Recall that we set the value of the slider’s max property to the audio duration, when it is available. Let’s say the audio duration is 100 seconds. If the user slides the thumb to the middle of the slider, the slider’s value will be 50. We wouldn’t want 50 to appear in the current time box because it is not in accordance with the mm:ss format. When we pass 50 to the function, the function returns 0:50 and that would be a better representation of the playback position.

I added the snippet below to our JavaScript.

const currentTimeContainer = document.getElementById('current-time');

seekSlider.addEventListener('input', () => {
  currentTimeContainer.textContent = calculateTime(seekSlider.value);
});

To see it in action, you can move the seek slider’s thumb back and forth in the following Pen:

Play/pause

Now we’re going to set the audio to play or pause according to the respective action triggered by the user. If you recall, we created a variable, playState, to store the state of the button. That variable is what will help us know when to play or pause the audio. If its value is play and the button is clicked, our script is expected to perform the following actions:

  • play the audio
  • change the icon from play to pause
  • change the playState value to pause

We already implemented the second and third actions in the handler for the button’s click event. What we need to do is to add the statements to play and pause the audio in the event handler:

playIconContainer.addEventListener('click', () => {
  if(playState === 'play') {
    audio.play();
    playAnimation.playSegments([14, 27], true);
    playState = 'pause';
  } else {
    audio.pause();
    playAnimation.playSegments([0, 14], true);
    playState = 'play';
  }
});

It is possible that the user will want to seek to a specific part in the audio. In that case, we set the value of the audio’s currentTime property to the seek slider’s value. The slider’s change event will come in handy here. If we use the input event, various parts of the audio will play in a very short amount of time.

Recall our scenario of 1 to 20 values. Now imagine the user slides the thumb from 1 to 20 in, say, two seconds. That’s 20 seconds of audio playing in two seconds. It’s like listening to Busta Rhymes on 3× speed. I’d suggest we use the change event. The audio will only play after the user is done seeking. This is what I’m talking about:

seekSlider.addEventListener('change', () => {
  audio.currentTime = seekSlider.value;
});

With that out of the way, something needs to happen while the audio is playing: we need to set the slider’s value to the current time of the audio, or, put another way, move the slider’s thumb by one tick every second. Since the audio duration and the slider’s max value are the same, the thumb reaches the end of the slider when the audio ends. The timeupdate event of the HTMLMediaElement interface should be the appropriate event for this. It fires as the value of the media’s currentTime property is updated, which is approximately four times per second. So, in the handler for this event, we could set the slider’s value to the audio’s current time. This should work just fine:

audio.addEventListener('timeupdate', () => {
  seekSlider.value = Math.floor(audio.currentTime);
});

However, there are some things to take note of here:

  1. As the audio is playing and the seek slider’s value is being updated, the user is unable to interact with the slider. Unless the audio is paused, the slider won’t be able to receive input from the user because its value is constantly being overwritten.
  2. In the handler, we update the value of the slider but its input event does not fire. This is because the event only fires when a user updates the slider’s value on the browser, and not when it is updated programmatically.

Let’s consider the first issue.

To be able to interact with the slider while the audio is playing, we would have to pause the process of updating its value when it receives input. Then, when the slider loses focus, we resume the process. But we don’t have access to this process directly. My hack would be to use the requestAnimationFrame() global method for the process. This time, though, we won’t kick it off from the timeupdate event, because that still wouldn’t work: the updates would keep running until the audio is paused, and that’s not what we want. Therefore, we use the play/pause button’s click event.

To use the requestAnimationFrame() method for this feature, we have to accomplish these steps:

  1. Create a function to keep our “update slider value” statement.
  2. Initialize a variable, outside that function, to store the request ID returned by requestAnimationFrame() (it will be used to cancel the update process).
  3. Add the statements in the play/pause button click event handler to start and pause the process in the respective blocks.

This is illustrated in the following snippet:

let raf = null;

const whilePlaying = () => {
  seekSlider.value = Math.floor(audio.currentTime);
  raf = requestAnimationFrame(whilePlaying);
}

playIconContainer.addEventListener('click', () => {
  if(playState === 'play') {
    audio.play();
    playAnimation.playSegments([14, 27], true);
    requestAnimationFrame(whilePlaying);
    playState = 'pause';
  } else {
    audio.pause();
    playAnimation.playSegments([0, 14], true);
    cancelAnimationFrame(raf);
    playState = 'play';
  }
});

But this doesn’t exactly solve our problem. The process is only paused when the audio is paused. We also need to pause the process, if it is in execution (i.e. if the audio is playing), when the user wants to interact with the slider. Then, after the slider loses focus, if the process was ongoing before (i.e. if the audio was playing), we start the process again. For this, we would use the slider’s input event handler to pause the process. To start the process again, we would use the change event because it is fired after the user is done sliding the thumb. Here is the implementation:

seekSlider.addEventListener('input', () => {
  currentTimeContainer.textContent = calculateTime(seekSlider.value);
  if(!audio.paused) {
    cancelAnimationFrame(raf);
  }
});

seekSlider.addEventListener('change', () => {
  audio.currentTime = seekSlider.value;
  if(!audio.paused) {
    requestAnimationFrame(whilePlaying);
  }
});

I was able to come up with something for the second issue. I added the statements in the seek slider’s input event handlers to the whilePlaying() function. Recall that there are two event listeners for the slider’s input event: one for the presentation, and the other for the functionality. After adding the two statements from the handlers, this is how our whilePlaying() function looks:

const whilePlaying = () => {
  seekSlider.value = Math.floor(audio.currentTime);
  currentTimeContainer.textContent = calculateTime(seekSlider.value);
  audioPlayerContainer.style.setProperty('--seek-before-width', `${seekSlider.value / seekSlider.max * 100}%`);
  raf = requestAnimationFrame(whilePlaying);
}

Note that the statement on the fourth line is the seek slider’s statement from the showRangeProgress() function we created earlier in the presentation section.

Now we’re left with the volume-control functionality. Whew! But before we begin working on that, here’s a Pen covering all we’ve done so far:

Volume control

For volume control, we’re utilizing the second slider, #volume-slider. When the user interacts with the slider, the slider’s value is reflected in the volume of the audio and the <output> element we created earlier.

The slider’s max property has a default value of 100. This makes it easy to display its value in the <output> element when it is updated. We could implement this in the input event handler of the slider. However, to implement this in the volume of the audio, we’re going to have to do some math.

The HTMLMediaElement interface provides a volume property, which returns a value between 0 and 1, where 1 is the loudest value. What this means is that if the user sets the slider’s value to 50, we would have to set the volume property to 0.5. Since 0.5 is a hundredth of 50, we could set the volume to a hundredth of the slider’s value.

const volumeSlider = document.getElementById('volume-slider');
const outputContainer = document.getElementById('volume-output');

volumeSlider.addEventListener('input', (e) => {
  const value = e.target.value;

  outputContainer.textContent = value;
  audio.volume = value / 100;
});

Not bad, right?

Muting audio

Next up is the speaker icon, which is clicked to mute and unmute the audio. To mute the audio, we would use its muted property, which is also available via HTMLMediaElement as a boolean type. Its default value is false, which is unmuted. To mute the audio, we set the property to true. If you recall, we added a click event listener to the speaker icon for the presentation (the Lottie animation). To mute and unmute the audio, we should add the statements to the respective conditional blocks in that handler, as in the following:

const muteIconContainer = document.getElementById('mute-icon');

muteIconContainer.addEventListener('click', () => {
  if(muteState === 'unmute') {
    muteAnimation.playSegments([0, 15], true);
    audio.muted = true;
    muteState = 'mute';
  } else {
    muteAnimation.playSegments([15, 25], true);
    audio.muted = false;
    muteState = 'unmute';
  }
});

Full demo

Here’s the full demo of our custom audio player in all its glory!

But before we call it quits, I’d like to introduce something — something that will give our user access to the media playback outside of the browser tab where our custom audio player lives.

Permit me to introduce to you, drumroll, please…

The Media Session API

Basically, this API lets the user pause, play, and/or perform other media playback actions, but without using our audio player’s interface directly. Depending on the device or the browser, the user initiates these actions through the notification area, media hubs, or any other interface provided by their browser or OS. I have another article on the Media Session API if you’d like more context.
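
To give you a sense of its shape, here’s a minimal sketch. The track metadata below is made up, and in the real demo you’d route these handlers through the same toggle logic as the custom play/pause button so the icon state stays in sync.

if ('mediaSession' in navigator) {
  // The metadata here is hypothetical; swap in your own track info and artwork
  navigator.mediaSession.metadata = new MediaMetadata({
    title: 'My Favourite Song',
    artist: 'Some Artist',
    album: 'Some Album',
    artwork: [{ src: 'cover-512.png', sizes: '512x512', type: 'image/png' }]
  });

  // Wire the OS/browser media controls to the same logic as our custom buttons
  navigator.mediaSession.setActionHandler('play', () => { audio.play(); });
  navigator.mediaSession.setActionHandler('pause', () => { audio.pause(); });
  navigator.mediaSession.setActionHandler('seekbackward', (details) => {
    audio.currentTime = Math.max(audio.currentTime - (details.seekOffset || 10), 0);
  });
  navigator.mediaSession.setActionHandler('seekforward', (details) => {
    audio.currentTime = Math.min(audio.currentTime + (details.seekOffset || 10), audio.duration);
  });
}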

The following Pen contains the implementation of the Media Session API:

If you view this Pen on your mobile, take a sneak peek at the notification area. If you’re on Chrome on your computer, check the media hub. If your smartwatch is paired, I’d suggest you look at it. You could also tell your voice assistant to perform some of the actions on the audio. Ten bucks says it’ll make you smile. 🤓

One more thing…

If you need an audio player on a webpage, there’s a high chance that the page contains other stuff. That’s why I think it’s smart to group the audio player and all the code needed for it into a web component. This way, the webpage possesses a form of separation of concerns. I transferred everything we’ve done into a web component and came up with the following:
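
The actual component lives in that demo, but here is a bare-bones sketch of the general shape. The tag name, attribute, and internal markup are assumptions on my part:

// Sketch only: not the actual component from the demo
class AudioPlayer extends HTMLElement {
  constructor() {
    super();
    // Shadow DOM keeps the player's markup and styles out of the page's way
    this.attachShadow({ mode: 'open' });
    this.shadowRoot.innerHTML = `
      <div id="audio-player-container">
        <audio src="${this.getAttribute('data-src')}" preload="metadata" loop></audio>
        <button id="play-icon"></button>
        <!-- sliders, time displays, volume output, mute button... -->
      </div>
    `;
    // ...then wire up all the event listeners from this post against this.shadowRoot
  }
}

customElements.define('audio-player', AudioPlayer);

It would then drop into a page as something like <audio-player data-src="my-favourite-song.mp3"></audio-player>, keeping all of the player’s markup, styles, and scripts encapsulated.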

Wrapping up, I’d say the possibilities for creating a media player are endless with the HTMLMediaElement interface. There are so many properties and methods covering all kinds of functionality. Then there’s the Media Session API for an enhanced experience.

What’s the saying? With great power comes great responsibility, right? Think of all the various controls, elements, and edge cases we had to consider for what ultimately amounts to a modest custom audio player. Just goes to show that audio players are more than hitting play and pause. Properly speccing out functional requirements will definitely help plan your code in advance and save you lots of time.


The post Let’s Create a Custom Audio Player appeared first on CSS-Tricks.


Making an Audio Waveform Visualizer with Vanilla JavaScript

As a UI designer, I’m constantly reminded of the value of knowing how to code. I pride myself on thinking of the developers on my team while designing user interfaces. But sometimes, I step on a technical landmine.

A few years ago, as the design director of wsj.com, I was helping to re-design the Wall Street Journal’s podcast directory. One of the designers on the project was working on the podcast player, and I came upon Megaphone’s embedded player.

I previously worked at SoundCloud and knew that these kinds of visualizations were useful for users who skip through audio. I wondered if we could achieve a similar look for the player on the Wall Street Journal’s site.

The answer from engineering: definitely not. Given timelines and restraints, it wasn’t a possibility for that project. We eventually shipped the redesigned pages with a much simpler podcast player.

But I was hooked on the problem. Over nights and weekends, I hacked away trying to achieve this effect. I learned a lot about how audio works on the web, and ultimately was able to achieve the look with less than 100 lines of JavaScript!

It turns out that this example is a perfect way to get acquainted with the Web Audio API, and how to visualize audio data using the Canvas API.

But first, a lesson in how digital audio works

In the real, analog world, sound is a wave. As sound travels from a source (like a speaker) to your ears, it compresses and decompresses air in a pattern that your ears and brain hear as music, or speech, or a dog’s bark, etc. etc.

An analog sound wave is a smooth, continuous function.

But in a computer’s world of electronic signals, sound isn’t a wave. To turn a smooth, continuous wave into data that it can store, computers do something called sampling. Sampling means measuring the sound waves hitting a microphone thousands of times every second, then storing those data points. When playing back audio, your computer reverses the process: it recreates the sound, one tiny split-second of audio at a time.

A digital sound file is made up of tiny slices of the original audio, roughly re-creating the smooth continuous wave.

The number of data points in a sound file depends on its sample rate. You might have seen this number before; the typical sample rate for mp3 files is 44.1 kHz. This means that, for every second of audio, there are 44,100 individual data points. For stereo files, there are 88,200 every second — 44,100 for the left channel, and 44,100 for the right. That means a 30-minute podcast has 158,760,000 individual data points describing the audio!
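
If you’d like to sanity-check that math:

// Back-of-the-envelope check of the numbers above
const sampleRate = 44100;   // samples per second, per channel
const channels = 2;         // stereo
const seconds = 30 * 60;    // a 30-minute podcast
console.log(sampleRate * channels * seconds); // 158760000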

How can a web page read an mp3?

Over the past nine years, the W3C (the folks who help maintain web standards) have developed the Web Audio API to help web developers work with audio. The Web Audio API is a very deep topic; we’ll hardly crack the surface in this essay. But it all starts with something called the AudioContext.

Think of the AudioContext like a sandbox for working with audio. We can initialize it with a few lines of JavaScript:

// Set up audio context
window.AudioContext = window.AudioContext || window.webkitAudioContext;
const audioContext = new AudioContext();
let currentBuffer = null;

The first line after the comment is necessary because Safari has implemented AudioContext as webkitAudioContext.

Next, we need to give our new audioContext the mp3 file we’d like to visualize. Let’s fetch it using… fetch()!

const visualizeAudio = url => {
  fetch(url)
    .then(response => response.arrayBuffer())
    .then(arrayBuffer => audioContext.decodeAudioData(arrayBuffer))
    .then(audioBuffer => visualize(audioBuffer));
};

This function takes a URL, fetches it, then transforms the Response object a few times.

  • First, it calls the arrayBuffer() method, which returns — you guessed it — an ArrayBuffer! An ArrayBuffer is just a container for binary data; it’s an efficient way to move lots of data around in JavaScript.
  • We then send the ArrayBuffer to our audioContext via the decodeAudioData() method. decodeAudioData() takes an ArrayBuffer and returns an AudioBuffer, which is a specialized ArrayBuffer for reading audio data. Did you know that browsers came with all these convenient objects? I definitely did not when I started this project.
  • Finally, we send our AudioBuffer off to be visualized.

Filtering the data

To visualize our AudioBuffer, we need to reduce the amount of data we’re working with. Like I mentioned before, we started off with millions of data points, but we’ll have far fewer in our final visualization.

First, let’s limit the channels we are working with. A channel represents the audio sent to an individual speaker. In stereo sound, there are two channels; in 5.1 surround sound, there are six. AudioBuffer has a built-in method to do this: getChannelData(). Call audioBuffer.getChannelData(0), and we’ll be left with one channel’s worth of data.

Next, the hard part: loop through the channel’s data, and select a smaller set of data points. There are a few ways we could go about this. Let’s say I want my final visualization to have 70 bars; I can divide up the audio data into 70 equal parts, and look at a data point from each one.

const filterData = audioBuffer => {
  const rawData = audioBuffer.getChannelData(0); // We only need to work with one channel of data
  const samples = 70; // Number of samples we want to have in our final data set
  const blockSize = Math.floor(rawData.length / samples); // Number of samples in each subdivision
  const filteredData = [];
  for (let i = 0; i < samples; i++) {
    filteredData.push(rawData[i * blockSize]);
  }
  return filteredData;
}

This was the first approach I took. To get an idea of what the filtered data looks like, I put the result into a spreadsheet and charted it.

The output caught me off guard! It doesn’t look like the visualization we’re emulating at all. There are lots of data points that are close to, or at zero. But that makes a lot of sense: in a podcast, there is a lot of silence between words and sentences. By only looking at the first sample in each of our blocks, it’s highly likely that we’ll catch a very quiet moment.

Let’s modify the algorithm to find the average of the samples. And while we’re at it, we should take the absolute value of our data, so that it’s all positive.

const filterData = audioBuffer => {
  const rawData = audioBuffer.getChannelData(0); // We only need to work with one channel of data
  const samples = 70; // Number of samples we want to have in our final data set
  const blockSize = Math.floor(rawData.length / samples); // the number of samples in each subdivision
  const filteredData = [];
  for (let i = 0; i < samples; i++) {
    let blockStart = blockSize * i; // the location of the first sample in the block
    let sum = 0;
    for (let j = 0; j < blockSize; j++) {
      sum = sum + Math.abs(rawData[blockStart + j]); // find the sum of all the samples in the block
    }
    filteredData.push(sum / blockSize); // divide the sum by the block size to get the average
  }
  return filteredData;
}

Let’s see what that data looks like.

This is great. There’s only one thing left to do: because we have so much silence in the audio file, the resulting averages of the data points are very small. To make sure this visualization works for all audio files, we need to normalize the data; that is, change the scale of the data so that the loudest samples measure as 1.

const normalizeData = filteredData => {
  const multiplier = Math.pow(Math.max(...filteredData), -1);
  return filteredData.map(n => n * multiplier);
}

This function finds the largest data point in the array with Math.max(), takes its inverse with Math.pow(n, -1), and multiplies each value in the array by that number. This guarantees that the largest data point will be set to 1, and the rest of the data will scale proportionally.

Now that we have the right data, let’s write the function that will visualize it.

Visualizing the data

To create the visualization, we’ll be using the JavaScript Canvas API. This API draws graphics into an HTML <canvas> element. The first step to using the Canvas API is similar to the Web Audio API.

const draw = normalizedData => {
  // Set up the canvas
  const canvas = document.querySelector("canvas");
  const dpr = window.devicePixelRatio || 1;
  const padding = 20;
  canvas.width = canvas.offsetWidth * dpr;
  canvas.height = (canvas.offsetHeight + padding * 2) * dpr;
  const ctx = canvas.getContext("2d");
  ctx.scale(dpr, dpr);
  ctx.translate(0, canvas.offsetHeight / 2 + padding); // Set Y = 0 to be in the middle of the canvas
};

This code finds the <canvas> element on the page, and checks the browser’s pixel ratio (essentially the screen’s resolution) to make sure our graphic will be drawn at the right size. We then get the context of the canvas (its individual set of methods and values). We calculate the pixel dimensions of the canvas, factoring in the pixel ratio and adding in some padding. Lastly, we change the coordinate system of the <canvas>; by default, (0,0) is in the top-left of the box, but we can save ourselves a lot of math by setting (0, 0) to be in the middle of the left edge.

Now let’s draw some lines! First, we’ll create a function that will draw an individual segment.

const drawLineSegment = (ctx, x, y, width, isEven) => {
  ctx.lineWidth = 1; // how thick the line is
  ctx.strokeStyle = "#fff"; // what color our line is
  ctx.beginPath();
  y = isEven ? y : -y;
  ctx.moveTo(x, 0);
  ctx.lineTo(x, y);
  ctx.arc(x + width / 2, y, width / 2, Math.PI, 0, isEven);
  ctx.lineTo(x + width, 0);
  ctx.stroke();
};

The Canvas API uses a concept called “turtle graphics.” Imagine that the code is a set of instructions being given to a turtle with a marker. In basic terms, the drawLineSegment() function works as follows:

  1. Start at the center line, x = 0.
  2. Draw a vertical line. Make the height of the line relative to the data.
  3. Draw a half-circle the width of the segment.
  4. Draw a vertical line back to the center line.

Most of the commands are straightforward: ctx.moveTo() and ctx.lineTo() move the turtle to the specified coordinate, without drawing or while drawing, respectively.

Line 5, y = isEven ? y : -y, tells our turtle whether to draw down or up from the center line. The segments alternate between being above and below the center line so that they form a smooth wave. In the world of the Canvas API, negative y values are further up than positive ones. This is a bit counter-intuitive, so keep it in mind as a possible source of bugs.

On line 8, we draw a half-circle. ctx.arc() takes six parameters:

  • The x and y coordinates of the center of the circle
  • The radius of the circle
  • The place in the circle to start drawing (Math.PI or π is the location, in radians, of 9 o’clock)
  • The place in the circle to finish drawing (0 in radians represents 3 o’clock)
  • A boolean value telling our turtle to draw either counterclockwise (if true) or clockwise (if false). Using isEven in this last argument means that we’ll draw the top half of a circle — clockwise from 9 o’clock to 3 o’clock — for even-numbered segments, and the bottom half for odd-numbered segments.

OK, back to the draw() function.

const draw = normalizedData => {
  // Set up the canvas
  const canvas = document.querySelector("canvas");
  const dpr = window.devicePixelRatio || 1;
  const padding = 20;
  canvas.width = canvas.offsetWidth * dpr;
  canvas.height = (canvas.offsetHeight + padding * 2) * dpr;
  const ctx = canvas.getContext("2d");
  ctx.scale(dpr, dpr);
  ctx.translate(0, canvas.offsetHeight / 2 + padding); // Set Y = 0 to be in the middle of the canvas

  // draw the line segments
  const width = canvas.offsetWidth / normalizedData.length;
  for (let i = 0; i < normalizedData.length; i++) {
    const x = width * i;
    let height = normalizedData[i] * canvas.offsetHeight - padding;
    if (height < 0) {
      height = 0;
    } else if (height > canvas.offsetHeight / 2) {
      height = canvas.offsetHeight / 2; // cap the height so the segment stays inside the canvas
    }
    drawLineSegment(ctx, x, height, width, (i + 1) % 2);
  }
};

After our previous setup code, we need to calculate the pixel width of each line segment. This is the canvas’s on-screen width, divided by the number of segments we’d like to display.

Then, a for-loop goes through each entry in the array, and draws a line segment using the function we defined earlier. We set the x value to the current iteration’s index, times the segment width. height, the desired height of the segment, comes from multiplying our normalized data by the canvas’s height, minus the padding we set earlier. We check a few cases: subtracting the padding might have pushed height into the negative, so we re-set that to zero. If the height of the segment will result in a line being drawn off the top of the canvas, we re-set the height to a maximum value.

We pass in the segment width, and for the isEven value, we use a neat trick: (i + 1) % 2 means “find the remainder of i + 1 divided by 2.” We check i + 1 because our counter starts at 0. If i + 1 is even, its remainder will be zero (falsy). If i + 1 is odd, its remainder will be 1 (truthy).

And that’s all she wrote. Let’s put it all together. Here’s the whole script, in all its glory.

See the Pen Audio waveform visualizer by Matthew Ström (@matthewstrom) on CodePen.

In the drawAudio() function, we’ve added a few functions to the final call: draw(normalizeData(filterData(audioBuffer))). This chain filters, normalizes, and finally draws the audio we get back from the server.
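
For clarity, that final function is just the fetch-based loader from earlier with the new chain in its last .then(). The drawAudio name comes from the Pen; earlier in this post the same loader was called visualizeAudio:

const drawAudio = url => {
  fetch(url)
    .then(response => response.arrayBuffer())
    .then(arrayBuffer => audioContext.decodeAudioData(arrayBuffer))
    .then(audioBuffer => draw(normalizeData(filterData(audioBuffer))));
};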

If everything has gone according to plan, your page should look like this:

Notes on performance

Even with optimizations, this script is still likely running hundreds of thousands of operations in the browser. Depending on the browser’s implementation, this can take many seconds to finish, and will have a negative impact on other computations happening on the page. It also downloads the whole audio file before drawing the visualization, which consumes a lot of data. There are a few ways that we could improve the script to resolve these issues:

  1. Analyze the audio on the server side. Since audio files don’t change often, we can take advantage of server-side computing resources to filter and normalize the data. Then, we only have to transmit the smaller data set; no need to download the mp3 to draw the visualization!
  2. Only draw the visualization when a user needs it. No matter how we analyze the audio, it makes sense to defer the process until well after page load. We could either wait until the element is in view using an intersection observer (there’s a sketch of this after the list), or delay even longer until a user interacts with the podcast player.
  3. Progressive enhancement. While exploring Megaphone’s podcast player, I discovered that their visualization is just a facade — it’s the same waveform for every podcast. This could serve as a great default to our (vastly superior) design. Using the principles of progressive enhancement, we could load a default image as a placeholder. Then, we can check to see if it makes sense to load the real waveform before initiating our script. If the user has JavaScript disabled, their browser doesn’t support the Web Audio API, or they have the save-data header set, nothing is broken.
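
As a quick sketch of option 2, an Intersection Observer could defer the work until the player scrolls into view. The selector and file name below are hypothetical, and drawAudio() is the function from the demo above.

// Sketch: defer drawing until the player is visible
const playerElement = document.querySelector('.podcast-player'); // hypothetical selector

const observer = new IntersectionObserver((entries, obs) => {
  entries.forEach(entry => {
    if (entry.isIntersecting) {
      drawAudio('episode.mp3'); // hypothetical file
      obs.disconnect();         // we only need to draw once
    }
  });
});

observer.observe(playerElement);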

I’d love to hear any thoughts y’all have on optimization, too.

Some closing thoughts

This is a very, very impractical way of visualizing audio. It runs on the client side, processing millions of data points into a fairly straightforward visualization.

But it’s cool! I learned a lot in writing this code, and even more in writing this article. I refactored a lot of the original project and trimmed the whole thing in half. Projects like this might not ever go on to see a production codebase, but they are unique opportunities to develop new skills and a deeper understanding of some of the neat APIs modern browsers support.

I hope this was a helpful tutorial. If you have ideas of how to improve it, or any cool variations on the theme, please reach out! I’m @ilikescience on Twitter.

The post Making an Audio Waveform Visualizer with Vanilla JavaScript appeared first on CSS-Tricks.
