Python for Informatics Chapter 13 – Web Services (Part 1/3)


Hello. Welcome to our lecture
on web services. These slides are copyright
Creative Commons Attribution. What we’ve been playing
with in chapter 12 was the basic idea of using HTTP
or the request response cycle to send a request to a server
and get something back. When we looked at
HTML, we parsed it, we took out tags, so we
started to treat this as data. In web services, we
really are switching to producing this data
as data, with the intent to have it consumed by an
application as data. To do
this, we have to come up with a format for
the document that comes back when we ask for the data so that
we can parse it and make sense of that data. There are two commonly
used formats for that data, and we’ll take a
look at both of them. If you imagine the problem of
exchanging data between two applications, we have
to deal with the fact that these applications may
not be written in the same language. One might be Python. You might have some data
in a Python dictionary and we might want to
send it to a Java program. Java doesn’t have Python
dictionaries, it has HashMaps. We have to agree on a format. That is the format that
we convert the data from the Python dictionary into. We do some kind of conversion. We send it across the
network and then we parse it and
interpret it and then you come up with an
internal structure within the other system. We call this a wire format, a
format that is on the wire. It’s not always a wire, but
we call it the wire format. We can agree. One of the formats that’s
commonly used that we’ll talk about is called XML. XML consists of less
thans and greater thans. It looks a lot like HTML, that’s
because they both were inspired by an earlier thing called SGML. We call the act
of taking something from an internal format
and making it into a wire format
serialization, and then reading a wire
format and getting it back into some internal format
at some destination system in some destination language, we call that deserialization. XML is one of the two
formats that we’re going to talk about today. The other is JSON,
JavaScript Object Notation. The difference with
JSON is its choice of how to represent
the data on the wire: it uses curly braces, colons,
and square brackets, which are not in this particular example. These are just two
techniques for serializing and deserializing data. They’re the two
that are most common and we will talk about
both of them in class. I’ll start talking about XML. The first thing to
observe about XML is that these tags,
much like in HTML, have start tags and end
tags, so people and slash people. This is called an element. We might also call it a node. Within a node there
are other nodes. Within the people node,
here’s one person, and here’s another person.
This one starts with person and ends with slash person. Within the person
element there’s a name element and a
phone number element; so elements within
elements within elements. The terms that we’ll use
for this are simple element and complex element. The basic difference
between them is a simple element is one
that has no sub-elements. It is just like name;
it’s just the name Chuck. There are no other
elements inside of it. A complex element is like
people or like person that has more elements within it. A complex element has elements
within it, as well as data. Let’s talk a little
bit about XML. There’s a debate as to which
is better, XML or JSON. The answer is they’re
probably better for different applications. XML is really good
at representing hierarchical,
structured data that needs a lot of description. It started from
this thing called SGML, the Standard Generalized
Markup Language, using less thans and
greater thans, but it was intended to be a
little more easily legible. It’s commonly used to do
things like word processing documents or whatever. As I mentioned, XML has
start tags and end tags. Name and slash name is also
a start and an end tag. Then it has some text content. The text is that which is
between the start and end tag that’s not, itself,
another element. This phone number
is a text element. In addition to the
text element, which is between the
start and end tags, there is also the
notion of an attribute. An attribute is
on the start tag. In the case of email
here, it’s a self-closing tag. There’s a
set of key-value pairs: type equals, and then the
value in double quotes. Hide equals yes and
type equals international, those are called attributes. You have start tags, end
tags, content, attributes, and self-closing
tags are the ones that don’t have slash email. They just end in slash and
they’re totally self-contained, but they can have
attributes on them. White space doesn’t matter. In general, we tend
to format these with little bits of indentation
to make our lives easier. These two representations I
have here are roughly the same. The fact that I’ve got
these nicely indented makes no difference. Line ends don’t matter,
and white space is generally discarded on text elements. We indent only to be
readable, and it’s very common to indent to be readable. Here’s just a bit of
XML from an example. We have a recipe
tag and everything’s going to be closed. The recipe tag has a
number of attributes on it. Again, they’re key value pairs. Name equals bread. Prep time equals 5 minutes. Cook time three
hours, et cetera. A title tag, an
ingredient tag flour is the text bit of
this ingredient tag and then it has some attributes,
some more ingredients, some instructions, a step
tag and an end step tag. You get the picture. We can represent lots of things. XML has advantages
and disadvantages. The disadvantage
is that it’s kind of wordy. The advantage is it’s
a little more self-describing than JSON is. JSON is simpler and
more direct, but XML is in some ways more
self-describing because we look at this, and based on the names,
ingredients, instructions, step, they can make some sense to us. Tags are the basic
less than greater than bits that
indicate the beginning and ending of elements. Attributes are key value
pairs on open tags. Serializing and
deserializing is this act of taking an internal structure
inside of a Python program and producing the less
thans and greater thans in the
proper format so they can be sent across the
internet to the destination, and reading them back there. One of the ways we
can think of XML, where we have these complex
elements that contain more complex or
simple elements, is we can think of them
as nodes in a tree. Another name for this little
B guy, B slash B, is a node in a tree. We can think of B as
this node in the tree. Its parent node is
A, it’s part of A. Its immediate containing
element is A. C, its immediate containing
element is also A. It’s a node, and C has two child nodes. When we think about
a node like C, think of A as the
parent node, and then the child node and
the child node. This is like a tree. These are moving
down toward leaves and this is the trunk, so it’s
a bit of an upside-down tree, if you think about how
trees actually grow. We often think of
the text bits that are sitting in here as
the children of a node. D is part of C. Its
immediate parent is C. Its child is that text
bit Y. That’s one way to think about XML, as a tree. As we start pulling stuff out
of XML, we’ll go grab a node, and then we’ll say oh let’s go
through the immediate children of that particular node. Or I’ll grab a
node and I’ll find the text child of that node. We tend to sort of work our
way through these trees thinking:
I’m at a node, and I’m looking
down from that node. That’s the tree terminology,
the node terminology. Another way to think
about this– I’m sorry. The attributes are
also best thought of as
children of the node. This W attribute is like
a child of the B node. The B node is this whole thing. It has a child of the
text bits and a child that is the attribute. Another way to think
about this is as paths. The way the paths
work is you just ask how to find this text
X. It’s really the child of B, which is the child of A. Then
we use a slash notation, like we might use for
folders on a file system. Slash A slash B is
where in this tree? Slash A slash B is where we
would find X. Slash A slash C slash D is where we would
find Y. Slash A slash C slash E is where we find the text inside E. These
are the paths to pieces of the document. That’s another way that
we think about them, starting with this outer node
A and then working in as far as we have to go. That’s basic XML. Another thing that we often
use with XML is a technology called XML Schema. XML Schema defines a
contract that tells us what legal XML really is. It itself turns out to
be XML, but its purpose is to describe a
set of XML documents that can pass the schema. It’s a set of constraints
on the structure: what the names of the tags are, how many of the
tags you can have, what tag lives inside of
what other tag, et cetera. The goal of a schema is to
use a schema to validate, to look at some XML and
say that is legal XML or that is not legal XML, based
on the schema that we’ve got. In the validation step,
we take an XML document and we’re wondering if it
complies with the schema. We take the document and a schema and we
hand them to this piece of software called a validator. And the validator either
says yes, it’s validated, or no, it’s not validated. The real value in this is if
we have two applications that are going to exchange
data, they should be able to come up with
a contract as to what is valid and invalid. An XML schema is a good way to
describe valid and invalid XML. Here is a very simple example
of XML schema in action. Here’s an XML document. It’s got a person, the last name
and the age and the date born. Here’s an XML schema contract,
and I mentioned it was XML, so it’s got less thans
and greater thans, and it’s got tags. It’s got attributes. What it’s really saying here
is that the outer part of the XML is supposed to be a tag
by the name of person, so that’s the outer
thing, and then within that there
is a sequence, and there’s supposed
to be an element that’s last name with a type of
string, an element age, that’s a type of integer, and an
element called date born, which is a type date. We can know that these are the
proper names for these things. This is supposed to
be a string, this is supposed to be a number, and
that’s supposed to be a date. So we have
two documents: a possible bit of XML that
either complies or doesn’t comply, and a contract that
tells us what the rules
are, and then a validator that mechanically
checks to see if the XML meets the contract or not. There are a number of
different XML schema definition languages. We’re talking about one
called the W3C XML Schema, which often ends up with the file
suffix XSD on your file system. I won’t talk about
the other ones. I’m talking about the
most common one and probably the easiest one to
understand, and that
is probably the reason why it’s
the most common one. We’re going to focus
on the schema that came from the World Wide
Web Consortium. Like I said, filenames
tend to end in XSD. This one we went through before. Person is a complex
type, so we say it’s a complex type
with tag name of person. Then within that there’s
a sequence of tags that XS sequence says you
can expect a series of tags. A simple element, a non complex
element, is just an XS element. Then we have the name
and the expected type for the three elements. This particular one
validates nicely. There’s a couple
other things that we can put on as the XSD
starts to become richer. There are more things
that you could describe. In this example, we are
seeing the use of min occurs and max occurs. That basically is a
constraint on the cardinality of these things. What this is saying is that
we have a tag called fullname. It is a string
and it’s required, meaning that the minimum
number of times it occurs is one and the maximum number of
times it can occur is one. That means it’s exactly one. We have that. If we look at this
child name tag, like this one here,
we have four of them. It says it’s a string
and the minimum number of times we should have it is zero,
and the maximum number of times we’re allowed to have it is 10. We’re allowed to have this
tag repeated between 0 and 10 times. In this particular example, it
is repeated exactly four times. That validates. That is a happy validation. It looks like a mean
validation, so let’s change it to be a happy validation. It reads this. It reads this. It reads those two
things, and it’s happy because it
meets the validation. I’m having trouble
drawing happy faces. Just a few more data types to
talk about in this XML Schema. My goal is not to have you be
able to write XML Schema. I’m just kind of showing
you a little bit of it so that you can
understand how it works. We can look at a simple one
and understand that it makes sense, and ask questions like
does this XML meet it or not. We’ve talked about string. I will talk about the
date format in a second. Date format is generally
year, month, day. There is a date time, which is
year, month, day, the letter T,
hour, minute, second,
and then an optional time zone. You can have decimal numbers,
which means they have digits after the decimal point. You can even say integers. These are some
types of things that we could put in a schema
to constrain the data that we see in XML. I mentioned the
date time format. There’s a special
standard called ISO 8601 that talks about
this date time format. I like this format because
it is easily sortable. The first part is the year, and
it’s always the same number of digits; you pad with zeros. Year, month, day, then the letter
T, then hour, minute, second, and then time zone. The most common time
zone that we tend to use is the time zone
called Z. Normally this would be like GMT or
EDT for Eastern Daylight Time or EST. Most computers don’t
like using that. Most computers want
to use a time that is the same around the world. They tend to use Greenwich
Mean Time, otherwise known as Zulu time. You might have a local time on
the east coast– I don’t even know what these numbers are, but
let’s say it’s 10:00 at night. That’s a bad time. Let’s pick like 2:00 PM
in the US East coast. In England I think
it’s six hours later. It’s actually 8:00 PM in the UK. This is Zulu time. Greenwich Mean Time,
universal time, Zulu time are all
the same thing. They are the time in the UK. Again, if you want
to see something that happened an hour
ago or two hours ago, you don’t want to have to
calculate back and forth between lots of time
zones, so we really prefer to use this Zulu time
and map the stuff that we store as we send
data from servers which might be in different time zones. We tend to use Zulu
time, otherwise known as Greenwich Mean Time. Here’s another example
of some XML Schema. Most of this is
pretty much the same. Yada, yada. We got some min occurs here. We’ve got strings. String, string, and now we
have this thing called country. It is a simple type,
and it’s a string, but this XS enumeration
gives us the legal values. It’s not just any string. It’s gotta be FR,
DE, ES, UK or US. If you’re validating
this XML for country, you look at the string
and check to see if it’s a member of that set. Again, that’s
another kind of thing that you can do with XML schema. Here’s another example
of some XML Schema. Let’s see we’ve seen most
of this XML complex type, XS sequence, string, complex
type sequence, string, string, string, string, string. This one, max occurs unbounded. That means you can
have an infinite number of these things. String min occurs zero. XS positive integers. This means that negative
14 would not be allowed. That’s what we’ve talked
about– oh, use equals required. That means it must be there. This just gives you
a sense. I’m not going to expect you
to know all these, but I’ll just give you
a couple of questions that are relatively
straightforward on these. Just a sense of: take
a look at some XML and see if that XML
meets it or not. We’re not going to spend
a lot of time in Python
doing too much with XML. We’re going to do most of the
stuff in Python with JSON, but this is just a
little bit of XML code that you can download
from the website. There is an XML
parser built into Python. It’s called ElementTree. There are actually several
ones that you could use, and I’m just going to use
the one called ElementTree. I’m making the data a
triple-quoted string, so here’s my little XML bit. It’s just some
well-formed XML, the stuff we’ve been playing with before. This import statement gives
us the ElementTree library and to parse this data
we do ET dot fromstring and pass in the string. This is a string. Now let’s talk about
what tree looks like. Tree is– again we
could think of this as either nodes or paths. There’s a person node. Then there is a name. Then there’s a phone. Then there’s an email. Name has Chuck underneath it. Phone has the number and
an attribute underneath it. Email has an attribute under it. These are nodes. I probably should have made
a better slide for this. What we do is we find
our way to a node. Tree is all of this. Tree is person on down. I can say tree dot
find the name thing. What that does is that
goes and finds this guy. Tree dot find, so
go in the whole tree and find the thing
called name and give me the text element in it. That is going to
print out Chuck. Tree dot find email
finds this little guy.
Then it says get
the attribute hide. This is going to print
out the string yes. Tree is all of that. Find the name thing,
so that’s the name guy. Find name dot text. That is this text right here. Then tree dot find email,
that’s this whole thing. Then within that, to
get the attributes use dot get and then the
name of the attribute. That’s going to give
you back the string yes. This will print
out Chuck and yes. You basically go down
this tree, go find pieces, and then you pull pieces
out of those pieces so you can parse
this from a tree. Here’s another example
of a little bit of XML. The difference between this
XML and the previous XML is we have another
tag called stuff, and then there is a tag
called– this is not indented all so well. There is a users tag and then
within that are two users. The difference from
the previous one is that there we just went
down a set of nodes. Now what we have is we have
a stuff, users, and then we have a series of users. There could be several of these. You could think of this as in
here there’s dot dot dot dot. There could be
hundreds of users. Now what we’re
going to do is we’re going to say I would like to do
find all, find all these users, not just find one, but
find all the users. Stuff is what we get when we parse it. Then we do a find
all of users slash user. That’s find me all
the user nodes. Let me change color here. Find me all the user
nodes underneath users. That gets me all of these. I get this back as a list. In the list is
each of the nodes. In my example, I’m only going
to get a list of two nodes, but if there were
hundreds I would get hundreds. The thing I get
back from stuff dot find all is a list of nodes. There’s stuff
under here as well. Each of these nodes has
things underneath it. Stuff dot find all
of users slash user gets me a list of all
of the user objects. Because
LST is a list, I can see how many
things I’ve got and then I write a little
loop, for item in list. That’s going to make an iteration
variable, item, that is going to go through successive
elements of this list, and each item is a node. Each item– getting a
little complex here, each item– let me
switch over here. Another way to say
this is list, find all, gets a list of all
of the user objects. There turns out
to be two of them. Here’s the sub zero, and
here’s the sub one in the list. We get a list of user objects. Then we’re going to have
this iteration variable item iterate through each
of those things. We can pull out item dot
find name, get me the thing, and then
find the text within that. That’s going to be Chuck. We can say item dot
find ID dot text, and that will be the
zero zero one bit. Then item dot get X
will pull out
that value right there. Let me draw that again. Item dot find name
will get [INAUDIBLE]. That’s giving me
a better picture. Item is that. Item dot find name
dot text is Chuck. Item dot find ID dot
text is zero zero one. Item dot get X, that’s
find the attribute X on the item, the tag
that we’re looking at, and that’s going to get the two. You’re looking at these things
and pulling the bits out. You can, if there’s
more than one of them, you can write a for
loop to go through them. That’s XML. We’re not going to
do much with XML. We’re going to do
more with JSON. That’s one form of serializing
data to move back and forth. Another is JavaScript
Object Notation. JavaScript Object
Notation is a notation that’s really the constant
syntax, the syntax to make JavaScript constants,
is what it turns out to be. It was named JSON by this
fellow Douglas Crockford. It is really exactly how
you represent objects. What I’m going to
do is I’m going to stop now and have you take a
look at the video from Douglas Crockford.
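
For reference, the two ElementTree walkthroughs described earlier in the lecture (finding a single node and reading its text or attribute, then looping over a list of user nodes) can be sketched roughly as follows. The transcript does not show the XML from the slides, so the sample documents here are assumed reconstructions: the phone number, the type value, and the second user are placeholders, not the lecture's actual data.

```python
import xml.etree.ElementTree as ET

# Assumed sample data: reconstructed from the spoken description,
# the phone number and type value are placeholders.
person_xml = '''
<person>
  <name>Chuck</name>
  <phone type="intl">+1 555 0100</phone>
  <email hide="yes"/>
</person>'''

# Parse the string into a tree rooted at the person element.
tree = ET.fromstring(person_xml)
name = tree.find('name').text          # text child of the name node
hide = tree.find('email').get('hide')  # attribute on the email node
print('Name:', name)
print('Attr:', hide)

# Assumed sample data: the second user is a placeholder.
users_xml = '''
<stuff>
  <users>
    <user x="2">
      <id>001</id>
      <name>Chuck</name>
    </user>
    <user x="7">
      <id>009</id>
      <name>Brent</name>
    </user>
  </users>
</stuff>'''

stuff = ET.fromstring(users_xml)
# findall returns a list of every user node underneath users.
lst = stuff.findall('users/user')
print('User count:', len(lst))

for item in lst:
    # Each item is a user node; look down from it for children and attributes.
    print('Name', item.find('name').text)
    print('Id', item.find('id').text)
    print('Attribute', item.get('x'))
```

The path given to findall is relative to the node it is called on, which is why it is users slash user and not stuff slash users slash user.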

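
The serialization and deserialization idea from the start of the lecture can also be sketched in a few lines. This is only an illustration under stated assumptions: the record contents and the helper names to_xml and from_xml are hypothetical, and the XML layout (one child element per dictionary key) is one possible wire format, not the one from the slides.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical internal structure inside a Python program.
record = {'name': 'Chuck', 'phone': '+1 555 0100'}

def to_xml(data):
    """Serialize a flat dict into XML, one child element per key (assumed layout)."""
    root = ET.Element('person')
    for key, value in data.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding='unicode')

def from_xml(text):
    """Deserialize the XML produced by to_xml back into a dict."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}

# Serialize: internal structure -> wire format.
xml_wire = to_xml(record)
json_wire = json.dumps(record)
print(xml_wire)
print(json_wire)

# Deserialize: wire format -> internal structure at the destination.
restored_from_xml = from_xml(xml_wire)
restored_from_json = json.loads(json_wire)
print(restored_from_xml == record, restored_from_json == record)
```

Either wire format carries the same data; the receiving program, in whatever language, only needs to agree on the format to rebuild its own internal structure.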