How to validate a URL in JavaScript

Preface

When we developers need to process URLs in various forms for different purposes, such as browser history navigation, anchor targets, and query parameters, we often turn to JavaScript. However, its widespread use also gives attackers an incentive to exploit its vulnerabilities. This risk of exploitation is why we must implement URL validation in our JavaScript applications.

URL validation checks whether a URL follows correct URL syntax, which is the structure that every URL must have. URL validation protects our applications from URL-based vulnerabilities such as malicious script injection and server-side request forgery (SSRF). Malicious actors can carry out SSRF attacks when we fail to apply secure coding practices to validate user-supplied URLs before fetching remote resources.

URL validation

URL validation exists to strengthen security, prevent possible vulnerabilities, and reduce the chance of errors when running your code. But when should we use URL validation, and what should we check during this process? We should implement URL validation in all software that must identify and verify resources such as web pages, images, GIFs, and videos.

A typical URL includes multiple components, such as the protocol, domain name, hostname, resource name, origin, and port. Together, these tell the browser how to locate the specified resource. We can validate URLs in different ways:

  • Regex literals and constructors
  • URL constructor
  • isValidURL method
  • Input element
  • Anchor tag method

A typical URL validation scheme receives input from the user and then parses it to identify its individual components. The scheme then ensures that all URL components comply with Internet standards. For example, if needed, it can check whether the URL uses a secure protocol.
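The parse-then-check flow described above can be sketched with the built-in URL constructor. This is a minimal sketch that assumes "secure" means the https: scheme. The helper name parseComponents and its requireSecure option are hypothetical and do not belong to any library.

```javascript
// Hypothetical helper: parse user input into its components and,
// optionally, insist on a secure (https:) scheme.
function parseComponents(input, { requireSecure = false } = {}) {
  let url;
  try {
    url = new URL(input); // throws a TypeError for strings it cannot parse
  } catch {
    return null;
  }
  if (requireSecure && url.protocol !== "https:") {
    return null; // parsed successfully, but not over a secure protocol
  }
  return {
    protocol: url.protocol, // e.g. "https:"
    hostname: url.hostname, // e.g. "snyk.io"
    port: url.port,         // "" when omitted or equal to the scheme default
    pathname: url.pathname, // e.g. "/en-US/docs"
    search: url.search,     // query string including "?", or ""
  };
}

console.log(parseComponents("https://snyk.io:8443/en-US/docs?q=1"));
console.log(parseComponents("http://snyk.io/", { requireSecure: true })); // null
```

Letting the standard parser split the URL avoids hand-written string slicing. The later checks in this article can then operate on the individual properties.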

Hostname validation begins by splitting the hostname into separate labels to ensure that they comply with domain name rules. A typical hostname consists of at least two labels separated by dots. For example, www.snyk.com has the labels “www”, “snyk”, and “com”. Each label can consist only of alphanumeric characters and hyphens, regardless of case. The validation scheme can then check the hostname against an allowlist, so that only specified URLs are accepted and allowed URLs are not incorrectly rejected.
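Below is a sketch of label-by-label hostname checking. It assumes RFC 1123-style rules: each label is 1 to 63 letters, digits, or hyphens and does not start or end with a hyphen. It also requires at least two labels, as the paragraph above describes. It expects an ASCII hostname, such as the hostname property of a parsed http(s) URL. The name isValidHostname is hypothetical.

```javascript
// Hypothetical helper: check hostname syntax label by label.
// Assumes RFC 1123-style rules: 1-63 characters per label,
// letters, digits, and hyphens only, no leading or trailing hyphen.
const LABEL = /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/i; // "i": case-insensitive

function isValidHostname(hostname) {
  if (hostname.length > 253) return false; // overall DNS name length limit
  const labels = hostname.split(".");
  if (labels.length < 2) return false; // at least two labels, e.g. "snyk.com"
  return labels.every((label) => LABEL.test(label)); // empty labels fail too
}

console.log(isValidHostname("www.snyk.com"));   // true
console.log(isValidHostname("-bad-.snyk.com")); // false
console.log(isValidHostname("localhost"));      // false (only one label)
```

Requiring two labels is a policy choice taken from the text above. It rejects single-label names such as localhost, which may or may not be what your application wants.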

By default, most resource paths in URLs are allowed. However, the port must be in the range 0 to 65535, and anything outside this range should throw an error. We can also check whether a numeric IP address is an IPv4 or IPv6 address.
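The following sketch relies on the URL object's port and hostname properties. It assumes http(s) URLs. For those schemes the WHATWG parser rejects ports above 65535. It also rewrites IPv4 hosts into dotted-decimal form and keeps the square brackets around IPv6 literals, so simple pattern checks suffice afterwards. The name inspectHost and its return shape are hypothetical.

```javascript
// Hypothetical helper: report the explicit port and the IP-address form of a URL's host.
// Assumes http(s) URLs; for non-special schemes the parser does not
// normalize IPv4 hosts, so the IPv4 check below may not apply.
function inspectHost(input) {
  let url;
  try {
    url = new URL(input); // throws a TypeError for ports above 65535
  } catch {
    return null;
  }
  // url.port is a string; it is "" when the port is omitted or is the scheme's default.
  const port = url.port === "" ? null : Number(url.port);
  let ipVersion = 0; // 0 means the host is not an IP literal (a domain name)
  if (url.hostname.startsWith("[")) {
    ipVersion = 6; // IPv6 literals keep their square brackets in hostname
  } else if (/^\d{1,3}(\.\d{1,3}){3}$/.test(url.hostname)) {
    ipVersion = 4; // IPv4 hosts are serialized in dotted-decimal form
  }
  return { port, ipVersion };
}

console.log(inspectHost("http://127.0.0.1:8080/")); // { port: 8080, ipVersion: 4 }
console.log(inspectHost("http://example.com:65536/")); // null
```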

Finally, we can also check the username and password embedded in the URL. This helps enforce company policies and protect credentials.
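One possible policy is to reject any URL that carries credentials, because the URL object exposes them through its username and password properties. This is a minimal sketch; the name hasNoCredentials and the reject-all policy are assumptions for illustration.

```javascript
// Hypothetical policy check: reject URLs that embed a username or password,
// such as "https://user:secret@example.com/".
function hasNoCredentials(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return false; // unparsable input fails the policy as well
  }
  // Both properties are "" when the URL carries no credentials.
  return url.username === "" && url.password === "";
}

console.log(hasNoCredentials("https://example.com/"));             // true
console.log(hasNoCredentials("https://user:secret@example.com/")); // false
```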

Now that you have the basics, let’s look at URL validation using JavaScript.

How to validate URLs

In JavaScript, the easiest way to validate a URL is to use the URL constructor (new URL). It is supported by the Node.js runtime and most browsers.

The basic syntax is as follows:

new URL(url)
new URL(url, base)

The base argument is only required when url is a relative URL. If base is not provided, it defaults to undefined. Additionally, if url is an absolute URL, JavaScript ignores the base argument.
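The three cases in the paragraph above can be seen directly. The URLs here are arbitrary examples chosen for illustration.

```javascript
// A relative URL without a base cannot be parsed, so the constructor throws.
try {
  new URL("/en-US/docs");
} catch (error) {
  console.log(error instanceof TypeError); // true
}

// With a base, the relative URL is resolved against it.
console.log(new URL("/en-US/docs", "https://snyk.io").href); // https://snyk.io/en-US/docs

// When the first argument is already absolute, the base is ignored.
console.log(new URL("https://example.com/a", "https://snyk.io").href); // https://example.com/a
```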

To validate a URL, you can use the following code:

function checkUrl(string) {
  let givenURL;
  try {
    // The constructor throws a TypeError if the string is not a valid absolute URL.
    givenURL = new URL(string);
  } catch (error) {
    console.log("error is", error);
    return false;
  }
  return true;
}

This function checks whether the URL is valid. It returns true if the URL is valid and false otherwise.

  • If you pass www.urlcheck.com to this function, it returns false, because the string has no scheme and is therefore not a valid URL. The correct version would be https://urlcheck.com.
  • Another example is mailto:John.Doe@example.com. This is a valid URL, but if the colon is removed, JavaScript no longer considers it a URL.
  • The third example is ftp://. This is not a valid URL because it does not include a hostname. If you add two dots (ftp://..), it becomes a valid URL, because the dots are treated as the hostname.
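Most of the examples from the list can be checked with a compact, self-contained variant of checkUrl. The name isParsableUrl is hypothetical, and this variant simply drops the logging.

```javascript
// Hypothetical variant of checkUrl without logging: true if new URL() accepts the string.
function isParsableUrl(string) {
  try {
    new URL(string);
    return true;
  } catch {
    return false;
  }
}

console.log(isParsableUrl("www.urlcheck.com"));            // false: no scheme
console.log(isParsableUrl("https://urlcheck.com"));        // true
console.log(isParsableUrl("mailto:John.Doe@example.com")); // true
console.log(isParsableUrl("mailtoJohn.Doe@example.com"));  // false: no colon, so no scheme
console.log(isParsableUrl("ftp://"));                      // false: missing hostname
```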

It’s important to remember that unconventional but perfectly valid URLs do exist! They may surprise the developers working with them, but they are otherwise perfectly appropriate. For example, checkUrl returns true for both of the following:

  • new URL("youtube://a.b.c.d");
  • new URL("a://1.2.3.4@1.2.3.4");

These examples remind us that developers should rely on URL validation principles rather than focusing on conventions.
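Inspecting the parsed properties shows why the two unconventional URLs above are accepted. The scheme can be any syntactically valid name. In the second URL, the part before "@" is read as a username rather than as the host.

```javascript
// Inspect how the parser reads the two unconventional URLs.
const youtube = new URL("youtube://a.b.c.d");
console.log(youtube.protocol, youtube.hostname); // prints: youtube: a.b.c.d

const odd = new URL("a://1.2.3.4@1.2.3.4");
// The part before "@" is userinfo (the username), not the host.
console.log(odd.protocol, odd.username, odd.hostname); // prints: a: 1.2.3.4 1.2.3.4
```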

If you want to ensure that valid URLs use a specific URL scheme, you can use the following function:

function checkHttpUrl(string) {
  let givenURL;
  try {
    givenURL = new URL(string);
  } catch (error) {
    console.log("error is", error);
    return false;
  }
  // protocol includes the trailing colon, e.g. "https:"
  return givenURL.protocol === "http:" || givenURL.protocol === "https:";
}

This function validates the URL and then checks whether it uses HTTP or HTTPS. Here, ftp://.. is considered invalid because it uses neither HTTP nor HTTPS, but http://.. is still valid.
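The same idea generalizes to an explicit scheme allowlist, which also screens out dangerous schemes such as javascript:. This is a sketch; the name hasAllowedScheme and its default list are assumptions. Because the parser lowercases the scheme, mixed-case input like "JavaScript:" is normalized before the comparison.

```javascript
// Hypothetical generalization of checkHttpUrl: accept only URLs whose
// scheme appears in an explicit allowlist (http: and https: by default).
function hasAllowedScheme(string, allowed = ["http:", "https:"]) {
  let url;
  try {
    url = new URL(string);
  } catch {
    return false;
  }
  return allowed.includes(url.protocol); // protocol is lowercase and ends with ":"
}

console.log(hasAllowedScheme("https://snyk.io"));          // true
console.log(hasAllowedScheme("ftp://snyk.io"));            // false
console.log(hasAllowedScheme("javascript:alert(1)"));      // false
console.log(hasAllowedScheme("ftp://snyk.io", ["ftp:"]));  // true
```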

Some other ways to use the URL constructor include:

let m = 'https://snyk.io';
let a = new URL('/', m);

The above example uses the base argument. If we log a.href, we get https://snyk.io/.

To create a URL object without specifying the base parameter, the syntax is:

let b = new URL(m);

To add a pathname to the host, we can pass b as the base:

let d = new URL('/en-US/docs', b);

The URL stored in the variable d is https://snyk.io/en-US/docs.

Another feature of the URL class is that it implements the WHATWG URL API, which follows the WHATWG URL Standard used by browsers:

let adr = new URL("https://snyk.io/en-US/docs");
let host = adr.host;
let path = adr.pathname;

In the above example, we created a URL object named adr. Next, the code gets the host and pathname of the URL, which are snyk.io and /en-US/docs. Finally, we can compare the URL against an allowlist or blocklist to ensure that only specific URLs are allowed.
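An allowlist comparison can be sketched by checking the parsed hostname against a fixed set. The set contents (snyk.io, docs.snyk.io) and the name isAllowedHost are hypothetical. The sketch uses hostname rather than host so that an explicit port does not affect the match.

```javascript
// Hypothetical allowlist check: compare the parsed hostname, not the raw
// string, against a fixed set of trusted hosts.
const ALLOWED_HOSTS = new Set(["snyk.io", "docs.snyk.io"]);

function isAllowedHost(string) {
  let url;
  try {
    url = new URL(string);
  } catch {
    return false;
  }
  return ALLOWED_HOSTS.has(url.hostname);
}

console.log(isAllowedHost("https://snyk.io/en-US/docs"));    // true
console.log(isAllowedHost("https://snyk.io.attacker.com/")); // false
console.log(isAllowedHost("https://snyk.io@attacker.com/")); // false: the real host is attacker.com
```

Comparing the parsed hostname blocks tricks such as snyk.io.attacker.com or snyk.io@attacker.com. A naive startsWith or includes check on the raw string would let both through.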

How to validate URLs with regular expressions

Another way to validate URLs is to use regular expressions (regex). We can use a regex to check whether a URL is valid.

A JavaScript function that validates URLs using a regex looks like this:

function isValidURL(string) {
  const res = string.match(/(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})/gi);
  return res !== null;
}

Let’s test some URLs:

var tc1 = "http://helloworld.com";
console.log(isValidURL(tc1));

The regex checks whether the URL starts with http://, https://, or the www subdomain and contains a domain name. The console prints true because the string follows the URL syntax defined by the regex. In contrast, the following statement prints false because the string does not start with any allowed scheme or subdomain, nor does it contain a domain name:

var tc4 = "helloWorld";
console.log(isValidURL(tc4));

The regular expression above is relatively simple, but still difficult to read and maintain. It is also an error-prone approach, because a regular expression cannot adequately capture the rules for validating URLs. The most it can do is match strings that look like valid URLs. Additionally, when a regular expression contains complex validation logic or receives long input strings, validation checks become time-consuming.
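A side-by-side comparison makes the error-prone behavior concrete. The sketch below copies the regex-based isValidURL and compares it with a parser-based check; the helper name parses and the sample strings are chosen for illustration. The pattern is unanchored, so it accepts a sentence that merely contains a URL-like substring. It also rejects a valid URL whose host has no dot.

```javascript
// The regex-based check from above, copied so this snippet runs on its own.
function isValidURL(string) {
  const res = string.match(/(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})/gi);
  return res !== null;
}

// Hypothetical parser-based check for comparison.
function parses(string) {
  try {
    new URL(string);
    return true;
  } catch {
    return false;
  }
}

// False positive: an unanchored pattern matches any string containing a URL-like substring.
const sentence = "not a URL, but it mentions www.example.com";
console.log(isValidURL(sentence), parses(sentence)); // true false

// False negative: a valid URL without a dot in the host is rejected.
const local = "http://localhost:3000";
console.log(isValidURL(local), parses(local)); // false true
```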

To evaluate a complex regex against certain inputs, the regex engine may have to backtrack through the input string millions of times. This can lead to “catastrophic backtracking,” a phenomenon where complex regular expressions freeze the browser or max out a CPU core.

Use JavaScript safely

URL validation has become increasingly critical to the security of JavaScript applications, as shown by the addition of SSRF to the 2021 OWASP Top 10. Fortunately, we can help mitigate this type of attack by validating URLs on the server side. Additionally, the URL constructor can be a useful building block for whatever approach you prefer for validating and processing URLs.

After seeing some use cases for the URL constructor, we learned how to validate a URL with regular expressions and saw why this approach is cumbersome and error-prone.

The security risk of a URL is less about its validity and more about dangerous URL schemes. Therefore, we need to make sure the server-side application performs the validation. An attacker can bypass client-side validation, so relying on it alone is not a solution.
