What is a Uniform Resource Locator (URL)?

This lecture is designed to give you some background on where URLs come from and what the definition of a URL is. At a high level, the first thing we want to cover is the idea that what we call a web link can more precisely be described as a URL. URL is an acronym for Uniform Resource Locator. URLs are a convention we use for locating files globally on the internet, across the network of computers. A URL has several components that we need to discuss, and although URLs conceptually seem to refer to files, they may not actually refer to files that exist on a hard drive.

So what's the motivation?
Originally, we wanted to be able to share things that lived on individual computers. A friend might say, "Hey, can you send me a link to that hilarious cat video?" You have the cat video playing in your browser, so you grab the link from the top of the browser, copy it, paste it into an email or text message or whatever you've got, and send it off. By sending that link, you pass on a resource on the internet, in this case a GIF file, and the person you send it to can use that link to find the same resource on the internet as well. That is the motivation for URLs: being able to identify the same thing with a name, a path, a locator that works no matter where on the internet you start from.

The link to that GIF, which I just grabbed from a search engine, has a lot of pieces in it. You see a lot of slashes, some numbers, dots, and some names, and you see .com, which seems familiar. Let's break this down and look at what the individual pieces are more generally, starting with the idea that what we casually call a web link, or just a link, is more formally called a URL, which stands for Uniform Resource Locator.

In computer science there's a bit of discussion about the difference between a URL and a URI. Sometimes you'll see "web link" and "URL" used interchangeably, and sometimes you'll see the acronyms URL and URI used interchangeably. URI stands for Uniform Resource Identifier. It's slightly more general and can refer to things that aren't files, although URLs aren't really used just to identify files anymore either. So just know that there are different terms: URI is generally considered more general than URL, and URL is more formal and more precise than saying "web link."

For some context, consider how files were located back in the day.
Before there was a standardized internet that anyone could connect to, computers had path names on their hard drives that you used to access a file. These still exist in the computers we use today, on every platform. A Windows computer, for example, uses a path name like C:\Users\djp3\Desktop\info.docx, and if you look in File Explorer or elsewhere in Windows, you'll see file paths that look something like that. Macintoshes as we know them today didn't exist before the internet, but a Mac file path looks similar: /Users/djp3/Desktop/info.docx. Both of these refer to a .docx file. Notice that the Mac path doesn't include any information about the hard drive, the C: that Windows uses, and that it uses a different kind of slash to separate the directories in the file hierarchy. Finally, Linux looks a little more like the Mac, since they're based on similar underlying operating system technology, but instead of starting with a Users directory, the top directory that holds user files on most Linux systems is called home: /home/djp3/Desktop/info.docx. Those are three different file paths on three different operating systems, possibly referencing the same file, but stored on three different hard drives.
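To make the three conventions concrete, here is a small Python sketch using the standard library's pathlib. The "pure" path classes only manipulate strings, so it runs the same on any operating system; the paths are the ones from the text.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The three paths from the text. Pure paths never touch the disk,
# so Windows-style and POSIX-style paths can be compared on any machine.
windows = PureWindowsPath(r"C:\Users\djp3\Desktop\info.docx")
mac = PurePosixPath("/Users/djp3/Desktop/info.docx")
linux = PurePosixPath("/home/djp3/Desktop/info.docx")

for label, path in [("Windows", windows), ("Mac", mac), ("Linux", linux)]:
    # drive is "C:" on Windows and empty on POSIX systems;
    # parts shows how each convention splits the hierarchy.
    print(f"{label:8} drive={path.drive!r:6} parts={path.parts}")
```

The drive, the separator, and the top-level directory differ, while the final parts (djp3, Desktop, info.docx) agree.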
When it came time to connect all these computers, it became necessary to identify a file on any of these machines using one standard way of specifying the path, without confusing the different kinds of slashes or the different ways of designating a hard drive. The format for describing a file path was different in each operating system, so once the computers were connected, a standard way of finding files across the network was needed. As a result, URLs make the internet look like one big file system, one enormous hard drive, so that we can access files, a cat GIF or a Word document, anywhere on the internet using the same kind of path identifier.

Before the computers were connected, files sat on separate machines; users had to keep track of which machine each file was on and had no real way of bringing them together. As the computers became connected, it became easier to find the computers themselves, and as URLs were standardized, the particular computer a file was on became less important. The file's position within the network became the dominant way of thinking about it.

There are a lot of details in how we specify a Uniform Resource Locator.
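One standardized form that irons out these differences is the file: URL. The following is a minimal sketch, assuming absolute paths as input; to_file_url is a hypothetical helper written for illustration (pathlib's own as_uri() method performs a similar conversion), and it shows each of the three paths rewritten with forward slashes under a single scheme.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath
from urllib.parse import quote


def to_file_url(path: PurePath) -> str:
    """Hypothetical helper: build a file: URL from an absolute path."""
    # as_posix() replaces Windows backslashes with forward slashes
    posix = path.as_posix()
    if not posix.startswith("/"):
        # A Windows drive path "C:/..." becomes "/C:/..." inside the URL
        posix = "/" + posix
    # Percent-encode unsafe characters such as spaces, keeping "/" and ":"
    return "file://" + quote(posix, safe="/:")


paths = [
    PureWindowsPath(r"C:\Users\djp3\Desktop\info.docx"),
    PurePosixPath("/Users/djp3/Desktop/info.docx"),
    PurePosixPath("/home/djp3/Desktop/info.docx"),
]
for p in paths:
    print(to_file_url(p))
# file:///C:/Users/djp3/Desktop/info.docx
# file:///Users/djp3/Desktop/info.docx
# file:///home/djp3/Desktop/info.docx
```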
Let's look at a few of them, taking as an example a book by Paul Dourish that is about to be published, at least at the time this is being filmed: The Stuff of Bits. If we go onto the internet, we'd like to find the page on amazon.com that describes this book; not the book itself, but the web page that describes it. To do that, we have to identify a bunch of different components and put them together into a URL.

The first one we need to identify is called the host. The host is the name of a computer, or a group of computers, that holds the resource you are referencing or would like to obtain. In this case the host is amazon.com. Today we often associate the host with the company itself, and the two are sometimes interchangeable or indistinguishable. Formally, though, in a URL the host name refers to the organization, or the group of computers on the internet, that holds the file you're looking for.

The next piece is the subdomain. On the front of amazon.com, many web addresses prepend www, which stands for World Wide Web, as a subdomain, but we can prepend other subdomains to the host name as well and still form a valid URL. A subdomain is just a subgroup of computers within the host organization, and the host is free to pick any name it wishes, as long as it configures its systems to respond to that name: www.amazon.com.
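As a rough sketch of the host and subdomain split, the helper below (split_host is hypothetical) pulls the host name out of a URL with urllib.parse.urlsplit and treats the last two dot-separated labels as the host organization's domain. That assumption fits amazon.com but is naive: it fails for suffixes like co.uk, which real code would handle with the Public Suffix List.

```python
from urllib.parse import urlsplit


def split_host(url: str) -> tuple:
    """Hypothetical helper: return (subdomain, domain) for a URL.

    Naive assumption: the last two labels (e.g. "amazon.com") are the domain.
    """
    # .hostname is the host part of the URL, lowercased, without port or credentials
    hostname = urlsplit(url).hostname or ""
    labels = hostname.split(".")
    return ".".join(labels[:-2]), ".".join(labels[-2:])


print(split_host("https://www.amazon.com/"))              # ('www', 'amazon.com')
print(split_host("https://amazon.com/"))                  # ('', 'amazon.com')
print(split_host("https://images.cdn.example.com/x.gif")) # ('images.cdn', 'example.com')
```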
Following the host name and subdomain, we use a slash in the style of UNIX or Macintosh systems, and we call what follows the path. Conceptually, this is a file path that arranges all the documents available on that host into a hierarchical tree of resources separated by slashes. In this example, the hierarchy has three levels. The first level appears to be a directory named with words from the book's title (Stuff, Bits, Materialities, Information). That appears to be followed by a subdirectory called dp, followed by another subdirectory named with the number 0262036207. Finally, we have the resource itself that we're interested in obtaining, a final segment beginning with ref=, which is apparently the file holding all the information about this book that we might be interested in.
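The path can be pulled apart the same way the text walks through it. This is a minimal sketch on a hypothetical URL modeled loosely on the book example; the host is example.com and the segment names are illustrative, not Amazon's actual address.

```python
from urllib.parse import urlsplit

# Hypothetical URL shaped like the book example (segments are made up)
url = "https://www.example.com/Stuff-of-Bits/dp/0262036207/ref=example"

path = urlsplit(url).path
# Drop the leading "/" and split the hierarchy on the remaining slashes
segments = path.strip("/").split("/")
# Everything but the last segment acts like a directory; the last is the resource
*directories, resource = segments
print(directories)  # ['Stuff-of-Bits', 'dp', '0262036207']
print(resource)     # ref=example
```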
At the very beginning of the URL we put https://, and that's called the scheme. The scheme describes how the internet needs to be traversed in order to get the document. Several are very common: http:// and https:// are the usual ways of identifying resources on the internet. HTTP stands for Hypertext Transfer Protocol, and HTTPS stands for Hypertext Transfer Protocol Secure. Another one you'll see fairly often in a browser is file:, which means that rather than going out on the internet to find the resource, we look on the local hard drive. There are many other schemes you might run across, but they're much rarer: ssh:, ftp:, mailto:, or irc:, all different ways of traversing the internet to get some kind of resource that's out there.
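The scheme is simply the text before the first colon, and not every scheme is followed by two slashes. This short sketch runs a few illustrative URLs (the example.com addresses are made up) through urllib.parse.urlsplit to show which parts each one has.

```python
from urllib.parse import urlsplit

examples = [
    "https://www.amazon.com/",
    "http://www.example.com/index.html",
    "file:///home/djp3/Desktop/info.docx",
    "ftp://ftp.example.com/pub/readme.txt",
    "mailto:someone@example.com",
]
for url in examples:
    parts = urlsplit(url)
    # netloc (the host part) is empty when there is no "//host":
    # file: URLs on the local machine and mailto: addresses have none
    print(f"{parts.scheme:6} netloc={parts.netloc!r:22} path={parts.path!r}")
```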
All right, the next part you might have comes at the very end of your resource: something that starts with a question mark. The question mark indicates that whatever follows is a query, a query in the style of a search engine, where you enter a query to get information back. In this case, we're specifying the file's location and adding a query that says we want some subset of that file, or some arrangement of it, that the server will provide for us. We're requesting part of the resource and describing how we want it set up. What follows the question mark are the query parameters. Each component of the query is usually written with an equals sign: the left side is a parameter name and the right side is a parameter value. Here we're passing a query to amazon.com about this resource saying we want f equal to book. I don't know for sure what that means; it probably means we're looking for the versions of The Stuff of Bits that come in book formats, such as hardcover, paperback, or maybe Kindle. There are other parameters too. In fact, you can have as many as you'd like, separated by ampersands (&). This example has four parameters: the first is called f, the second ie, the third qid, and the fourth sr. Each of them has a value that is passed to the query.
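Here is a minimal sketch of reading and writing a query string with Python's urllib.parse. The parameter names f, ie, qid, and sr come from the example above, but the URL and all the values are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical URL; the values are invented for illustration
url = "https://www.example.com/dp/0262036207?f=book&ie=UTF8&qid=1500000000&sr=1-1"

query = urlsplit(url).query
# parse_qsl splits on "&" and then on "=", keeping the original order
params = parse_qsl(query)
print(params)
# [('f', 'book'), ('ie', 'UTF8'), ('qid', '1500000000'), ('sr', '1-1')]

# Going the other way: urlencode joins name=value pairs with "&"
# and escapes characters that need it (a space becomes "+")
print(urlencode({"f": "book", "keywords": "stuff of bits"}))
# f=book&keywords=stuff+of+bits
```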
There's a little more you can add as well. In the old days of telephone communication, a human operator manually patched connections from one location to another by connecting two ports with a wire. Similarly, you can specify which port on the far-side computer you want to connect to: from the browser on your computer, you can say you'd like to connect to port number 80 on the remote computer. These ports are no longer physical; they're not separate wires going into the computer, but conceptually you can imagine they work like different wires. A port can be any number from 0 to 65,535, so it's as if there were about 65,000 different wires going into the computer. By convention, certain numbers are assigned to certain protocols. For example, if you add :80 to the end of your host name, you're saying you specifically want to connect to port 80 on the remote machine. You don't often see that in web addresses, because by default an HTTP request goes to port 80 and an HTTPS request goes to port 443, so if you don't specify a port, you get the default associated with the scheme. In this example we've mixed up the defaults: we've used HTTPS but asked for port 80. Usually, this won't work.
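The default-port rule can be sketched in a few lines. effective_port is a hypothetical helper, and the table only covers the two web schemes discussed here; any other scheme returns None in this sketch.

```python
from typing import Optional
from urllib.parse import urlsplit

# Well-known default ports for the two web schemes discussed above
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> Optional[int]:
    """Hypothetical helper: the explicit port if present, else the scheme's default."""
    parts = urlsplit(url)
    # .port is None when no ":number" follows the host name
    port = parts.port
    if port is not None:
        return port
    return DEFAULT_PORTS.get(parts.scheme)


print(effective_port("http://www.amazon.com/"))      # 80
print(effective_port("https://www.amazon.com/"))     # 443
print(effective_port("https://www.amazon.com:80/"))  # 80 (the mixed-up case)

# Ports are 16-bit numbers, so urllib rejects anything above 65535
try:
    urlsplit("http://www.example.com:65536/").port
except ValueError as err:
    print("invalid:", err)
```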
Finally, some protocols let you add a username and password to the beginning of the URL. This is for resources on the internet that you'd like to access but that require credentials, or authentication. With a protocol that supports it, you'd write the username, a colon, the password, and then the @ symbol, followed by all the other details we've talked about so far. This user@domain shape is the same one email addresses use: a username, and the domain at which resources are located.
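urlsplit also exposes the credentials section. The URL below is hypothetical (the user, password, and host are all made up). Note that RFC 3986 deprecates the user:password form, since URLs are often logged or displayed, so this only illustrates the syntax.

```python
from urllib.parse import urlsplit

# Hypothetical FTP URL with embedded credentials (all names made up).
# Putting a password in a URL is deprecated by RFC 3986; shown for syntax only.
url = "ftp://alice:secret@files.example.com/reports/q3.txt"

parts = urlsplit(url)
print(parts.username)  # alice
print(parts.password)  # secret
print(parts.hostname)  # files.example.com
# netloc is everything between "//" and the path, credentials included
print(parts.netloc)    # alice:secret@files.example.com
```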
The last piece of a URL comes at the end of all of this: a hash symbol (#) followed by what we call a fragment. The fragment says that once we've requested the resource and passed any query parameters, we'd like to focus on a particular section of the document or resource that comes back; in this example, the section named product details. In general, it can be any name that's referenced in the document, the particular piece of the resource you're interested in, prefaced with a hash symbol.

If we take all of those pieces, leaving out the username and password and the port (the default is fine here), put them together, and request the resource, we get the web page on amazon.com that describes this book, and the browser immediately jumps down to the product details, because the Amazon page has a section called product details that can be referenced directly from the URL. Many different parts of the URL come together to find this specific resource on the internet.
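A browser requests the resource without the fragment and then uses the fragment to jump to the matching section. The sketch below separates the two with urllib.parse.urldefrag; the URL is hypothetical, and productDetails is a stand-in for the section name mentioned above, not taken from Amazon's actual page.

```python
from urllib.parse import urldefrag

# Hypothetical URL; "productDetails" stands in for the section name
url = "https://www.example.com/dp/0262036207?ie=UTF8#productDetails"

# urldefrag splits off everything after "#" and returns (url, fragment)
base, fragment = urldefrag(url)
print(base)      # https://www.example.com/dp/0262036207?ie=UTF8
print(fragment)  # productDetails
```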
Our general form, then, follows a pattern like this:

scheme://username:password@subdomain.host:port/path?key1=value1&key2=value2#fragment

There's a scheme at the beginning of the URL, followed by a colon and, depending on the scheme, two slashes; then a username, a colon, a password, and an @ symbol; then the host name, possibly with a subdomain, which is where www.amazon.com would go; then a colon and a port, if we need to specify which connection on the remote computer to use; then a path, which is the directory location of the resource we want; then, possibly, a ? followed by a query made of keys and values separated by ampersands; and at the end, possibly a fragment. That's the most general format of a URL. Often you'll find URLs are simpler and don't have every one of these pieces, because not all of them are needed for the particular resource you're interested in.

That is the concept: a single, unified file system that exists across the internet.
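Going the other way, the general pattern can be assembled from its pieces. This is a minimal sketch: build_url is a hypothetical helper, and the hosts, credentials, and values in the usage lines are made up. It leaves out whichever optional pieces are missing, just as real URLs often do.

```python
from typing import Optional
from urllib.parse import quote, urlencode, urlsplit, urlunsplit


def build_url(scheme: str, host: str, path: str = "/", query: Optional[dict] = None,
              fragment: str = "", username: Optional[str] = None,
              password: Optional[str] = None, port: Optional[int] = None) -> str:
    """Hypothetical helper: scheme://username:password@host:port/path?query#fragment."""
    netloc = host
    if port is not None:
        netloc = f"{netloc}:{port}"
    if username is not None:
        # Escape credentials so characters like "@" or ":" can't break the URL
        userinfo = quote(username, safe="")
        if password is not None:
            userinfo += ":" + quote(password, safe="")
        netloc = f"{userinfo}@{netloc}"
    # urlencode joins key=value pairs with "&"
    query_string = urlencode(query or {})
    # urlunsplit inserts "//", "?" and "#" only for the parts that are present
    return urlunsplit((scheme, netloc, path, query_string, fragment))


print(build_url("https", "www.example.com", "/dp/0262036207",
                {"ie": "UTF8", "sr": "1-1"}, "productDetails"))
# https://www.example.com/dp/0262036207?ie=UTF8&sr=1-1#productDetails
print(build_url("ftp", "files.example.com", "/reports/q3.txt",
                username="alice", password="secret", port=2121))
# ftp://alice:secret@files.example.com:2121/reports/q3.txt
```

Splitting the result again with urlsplit recovers each component, which is a quick way to check a hand-built URL.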
In reality, however, the resource you're requesting may not be a file that exists on a hard drive anywhere. It might be constructed on the fly: when you ask for a resource, the remote computer may build it with software and deliver it to you as if a web page like that already existed. For example, imagine you're checking the status of a train or an airplane. You might have a URL that references the flight number or train number you're interested in. When you request that resource from the remote computer, there isn't a file that tells you the status. Instead, there's a database kept up to date with information from sensors on the track and possibly input from the train itself. When you request the timetable, or the estimated arrival time of the flight, the resource is constructed for you from pieces of that database and presented in your browser as if it were just a file stored on some train company's or airline's web server.
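A toy version of "constructed on the fly" might look like the sketch below. Everything here is hypothetical: FLIGHT_DB stands in for the continuously updated database, the flight numbers and times are invented, and handle_request is not any real server's API. The point is that the "page" is assembled from data at request time rather than read from a file.

```python
from urllib.parse import parse_qs, urlsplit

# Stand-in for a continuously updated database (values invented)
FLIGHT_DB = {
    "UA123": {"status": "on time", "eta": "14:05"},
    "DL456": {"status": "delayed", "eta": "16:40"},
}


def handle_request(url: str) -> str:
    """Hypothetical handler: build a status page on demand instead of reading a file."""
    params = parse_qs(urlsplit(url).query)
    flight = params.get("flight", [""])[0]
    record = FLIGHT_DB.get(flight)
    if record is None:
        return f"No status available for flight {flight!r}"
    # The "page" exists only as this freshly assembled string
    return f"Flight {flight}: {record['status']}, estimated arrival {record['eta']}"


print(handle_request("https://status.example.com/flight?flight=UA123"))
# Flight UA123: on time, estimated arrival 14:05
```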
Another tricky part of the difference between the concept and the reality is that sometimes the path is actually part of the query. Just as some resources are constructed on the fly, the hierarchy that seems to be present in the path doesn't necessarily reflect a hierarchy on a remote hard drive, even though you can think of it as if it does. Instead, the path may be specifying parts of the query in a different style, acting as parameters. You'll see this in the conventions of various web servers and web server frameworks.
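To see how a path segment can really be a parameter, here is a sketch of a tiny hypothetical route. The /flights/<flight> pattern and extract_flight are invented for illustration, in the spirit of the path patterns many web frameworks let you declare; no specific framework's API is implied. Both URLs below carry the same information, once in the path and once in the query.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical route: the second path segment is really a parameter
ROUTE = re.compile(r"^/flights/(?P<flight>[A-Z0-9]+)$")


def extract_flight(url: str) -> Optional[str]:
    """Return the flight number from either the path or the query string."""
    parts = urlsplit(url)
    match = ROUTE.match(parts.path)
    if match:
        # The "directory" name is read as a value, not looked up on disk
        return match.group("flight")
    values = parse_qs(parts.query).get("flight")
    return values[0] if values else None


print(extract_flight("https://status.example.com/flights/UA123"))        # UA123
print(extract_flight("https://status.example.com/flights?flight=UA123"))  # UA123
```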
One last thing about the difference between the concept and the reality: when you go to amazon.com, it's not just one computer answering your request. It's a whole fleet of computers working together and acting, from your perspective as a user with a web browser, as if they were one computer. Those computers act together to construct a web page on the fly from the parameters you passed in the URL, and deliver it back to you as if it were a single file delivered by a single computer. It's all part of the magic of making the internet seem very simple to the user.

In summary, web links, as we talk about them colloquially, are more precisely described as URLs, or Uniform Resource Locators. URLs are a convention for locating files globally across the internet, and they have several components that we've looked at. URLs don't always refer to actual files on a hard drive.

Thanks for your attention. I hope that after this lecture, when you look at a web link, you'll see it with new eyes, appreciate the complexity behind it, and understand its different components, what they are, and why they're there. Thank you.
