Good afternoon everyone, we will go ahead and get started.
On the screen you can see details about what we will be doing today.
This is an update to a webinar that we did a few weeks ago about API keys, for getting
better data from E-utilities and EDirect.
If there are questions that we do not get to answer when I am done talking, at the end
we will be certain to post answers to those a few days after the webinar is over also.
The materials, the PowerPoint that I will be showing along with a PDF version of the
PowerPoint, are available at the go.usa.gov link that you can access any time (go.usa.gov/xnBHR).
Including right now.
Without further ado we will begin the webinar today.
This is an update, this is a part two of a discussion that we started a few weeks ago
on November 8 about API keys for the Eutilities.
This is our main API to the Entrez databases.
EDirect, which is a UNIX command-line version, is also affected by these new keys.
This is essentially an example of community involvement with webinars, which is exciting.
We had a lot of questions from our November 8 webinar, and we learned a lot of different things.
One was there's a lot of community interest which is great, it was great to hear people
paying attention to this, and also that there are a number of interesting side topics to
the API keys that we had not included in the November 8 webinar that people were interested
in.
We've included some of that today.
It was also clear to us that we needed to be clearer about a few aspects of how these
keys will work. In particular, I will provide you with more details today about exactly
how they work, and some scenarios that might occur out there, when you are beginning to
set these up.
And what some likely consequences of the various scenarios are.
I probably will not cover everything that might be in everyone's heads out there so
if you have questions, please let us know.
We're happy to answer those as we can.
For any of you who were not at the earlier webinar on the eighth, I will go through some
basic things to review what is going on.
A quick review of what the E-utilities are, so we know which APIs will be affected
by this.
A little bit about what the keys are, and then I will expand the discussion about what
happens without a key, and what happens with the key.
And for fun what happens when you mix, when you have a key and not a key and what kind
of things can happen in those situations.
Finally to finish up how do you get the key, how do you use it, and when does all of this
take effect and when do I have to have things done by.
What we're doing is introducing new API keys for the E-utilities API.
Keys are popular with other API providers, we are looking at these as a way to register
activity and also trying to manage the activity of these APIs.
These APIs are free and open to the public, but with absolutely no regulation in how they
are used they are vulnerable to attack
and to service problems because of high usage peaks.
We want to make this a fair use resource for everyone, and so on May 1, 2018, we will begin
limiting access to the E-utilities on the basis of IP address,
to a rate of no more than three requests per second.
I will have much more to say about that in a few slides.
That's for IP's.
So one of the key takeaways is that if you do not have a key what we will do is look
at the IP address.
If you do have a key then you get a higher rate limit,
10 requests per second by default, regardless of what IP you are using.
We only look at the key if the key is present.
Then the IP does not matter: the traffic could be coming from multiple IPs, and it is the
same key and the same regulation, 10 per second by default.
We can negotiate higher rate limits than 10 if you would like and if you can demonstrate
a need.
If you are in that situation and you know that you will be needing more than 10 requests
per second, then please contact us.
I will have more information on how to do that at the end of the webinar.
That is some basics, what are the e-utilities?
I'm sure that many of you are familiar with the E-utilities,
but it's important to talk about scope.
These keys apply only to the classic E-utilities: esearch, esummary, efetch
and elink,
and also espell and egquery.
They do not apply to the BLAST API or any other APIs that we offer, only to these E-utilities at
this point in time.
That might change, as this matures but for now the only thing that we're talking about
are the classic E utilities.
The key itself is a string, a unique string that you can acquire, and it identifies you
to our servers.
It should be included every time you make an API request.
It is simply a URL parameter, I will talk about that in a moment. It gets included with
every request and it is attached to an NCBI account.
Many of you may be familiar with My NCBI accounts or submission accounts, they're all the same,
they are all NCBI accounts.
If you have an account you can create a key today, and it will be attached to that account;
if you do not have an account it is very easy to make one and then you can create a key
and it will be attached to that account.
Let's talk a little bit about what happens, what will happen in May if you do not have
a key.
You just go on the way that you have been going.
What we will be doing is we will be looking at the IP addresses from whence the requests
come.
So I have on the left three users all at different IP addresses.
So what will happen to them on May 1, when the switch gets thrown and the API keys are
now in effect?
This guy here is coming in at about two requests per second without a key.
That is no problem.
He will not notice anything.
As long as he does not go above that there is no problem.
This person is coming through from a different IP at five requests per second, and what will
happen is they will receive an error message.
I will show you an example of what those messages look like, those requests for all intents
and purposes will fail.
That person will need to reduce the rate below 3 per second and then he can resume activity,
and one thing to be clear, this is not a block in the sense that NCBI sometimes blocks IPs
because of abuse, and then they have to be unblocked.
Every second we're looking and checking what the rate is, so all that you have to do is
reduce the rate and then you will have access.
It is a dynamic rate control.
If you are above the three per second limit you are going to essentially be temporarily
blocked and receive an error message.
That is the unfortunate fate of the person in the middle.
The person on the bottom is averaging 1 1/2 requests per second, so there is no problem.
They can continue on as they have been.
Let's talk about a slightly different scenario. Let's talk about two users who are sharing
the same IP, they may be behind a firewall, they may be using the same type of software
interface, whatever it is they are coming from the same IP address.
The guy that was doing five requests per second is still out of luck, we see that IP address,
he is not getting through.
However and this is important, the first user who is now at the same IP as the second guy,
is also getting blocked, because that IP is exceeding the limit.
In fact it is exceeding it by more now because that IP is now sending seven requests per
second.
So both users are blocked.
So while this person was not actually exceeding the limits on his or her own, they get blocked
because someone else is using the same IP and they are exceeding the limit.
I could have made a lot of slides here, but I'm trying to keep this brief. Let's say there are
several people there, all doing one request per second, all at the same time. If there
are five of them, that is five requests per second and they all get blocked.
It is the IP address.
However that works for you, the IP address is what we're looking at if the key is not
present.
Here's an IP address down here. Only one person is using this address, they are still doing
one and a half requests per second and there is no problem.
The IP address has not exceeded the rate.
That is the important thing to remember: it is about IP addresses. Come May 1st, that is
how we will look at it.
How do we measure the rates?
There have been several questions about this.
What I've done is to put a time bar at the top, each one of these weird little H like
things, is a request.
Request start, execution and request stop when the data is delivered back to you.
These are the execution times you can imagine on our servers.
I am not including Internet traffic and all of that.
This is what the NCBI servers would see.
Your request comes in at a particular time, it executes for a particular time and then
it finishes.
The rates that we're looking at are the rates at which requests are received by our servers.
They're time-stamped at the request start. It does not matter how long the request takes
to execute, it is only when we see the initial request. So let's take this first case.
This could be a single IP address, or it could be a key, it is an activity stream that we
are seeing.
There is no case where more than three of these request starts occur within the same
second.
That is what is important.
It is a moving window: not that we count every second, but it is a moving one-second window. If
you look at the second case, there will be a problem here because there are one, two, three,
four requests that begin in that one-second window.
So that will throw the error.
That is when the rate is exceeded.
Just to prove another point, here is a case where the requests are taking a long time,
these requests are taking a few hundred milliseconds, these requests might be taking over one second,
and this is certainly the case with efetch or elink, where the request can take many seconds.
So what happens in that case?
This is all fine, this person will not receive any errors because there is no case when the
request starts are more than three per second.
It is absolutely true that during this period there are four concurrent requests that our
servers are executing from the same IP address, or the same key, that is not what we're counting,
we're not counting simultaneous execution we are looking at the request start times.
You can have as many of these things going at the same time as you want as long as you
do not start requests more than three times a second.
I hope that that is clear that is what we're doing in terms of the rates.
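The moving-window rule just described can be sketched in a few lines of Python. This is only an illustration of the idea, not NCBI's server code: the function name `first_violation` and the way the exact one-second boundary is handled are assumptions made for the example.

```python
from collections import deque


def first_violation(start_times, limit=3, window=1.0):
    """Return the start time of the first request that pushes the number of
    request starts inside a moving window above `limit`, or None.

    Only request *start* times are counted. How long each request runs,
    and how many run concurrently, plays no part in the calculation.
    """
    recent = deque()  # start times that are still inside the moving window
    for t in sorted(start_times):
        # Forget starts that are a full window (or more) older than this one.
        while recent and t - recent[0] >= window:
            recent.popleft()
        recent.append(t)
        if len(recent) > limit:
            return t
    return None
```

With `limit=3`, starts at 0.0, 0.4, 0.8, 1.2 and 1.6 seconds never put more than three starts in one window, while starts at 0.0, 0.2, 0.5 and 0.9 do, so the request at 0.9 is the one that would trigger the error.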
Now we understand what is happening if you do not have a key, and how we calculate
the rates.
What happens if you have a key?
Okay, so I have my key, I'm going to start using it.
After May 1st, what can I expect to happen?
Here is the first guy with the IP address, now I can do up to 10, I'm doing seven requests
per second here is my key.
Notice that api_key is the parameter that I use; that is what you put in the E-utility
requests in the URL, and here is my key.
Yours will be significantly longer than that, but this is just to fit it on the slide.
This is your little key.
They can go on as much as they want at seven requests per second.
Here is another guy with 15 requests per second, we told you that 10 requests per second is
a limit so you get an error message.
Just like someone without a key over three, the same error messages.
You are temporarily blocked.
What this person would need to do with their requests is calm that down a bit, below 10,
and then they will have access restored again.
Immediately.
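One way a client could keep itself under the limit is to space out its own request starts. The sketch below is an assumption about how you might do that on your side, not something NCBI provides; the class name `StartRateLimiter` is hypothetical, and in practice you would probably pass a value a little under your real limit to leave some headroom.

```python
import time


class StartRateLimiter:
    """Space out request starts so that at most `per_second` begin each second."""

    def __init__(self, per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / per_second
        self._clock = clock    # injectable so the limiter can be tested
        self._sleep = sleep
        self._next_start = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request may start, then reserve that slot."""
        now = self._clock()
        if self._next_start is not None and now < self._next_start:
            self._sleep(self._next_start - now)
            now = self._next_start
        self._next_start = now + self.min_interval
```

You would call `limiter.wait()` immediately before sending each request, for example with `StartRateLimiter(3)` without a key or `StartRateLimiter(10)` with one. Note that if several programs or machines share the same key or IP, each one limiting itself separately does not keep the combined rate under the limit.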
Here's another guy, he is doing four requests per second and that is great, he has his own
key, all is hunky-dory.
So let's think about some other situations.
What if these people share the same IP?
That is what we were talking about before, these users are behind the firewall, at the
same company, whatever it might be, they are coming to NCBI with the same IP.
It does not matter anymore.
Because they have separate keys.
The only thing that we're looking at is the key.
So this key has seven requests per second and it is totally fine and he goes through.
This key has 15 and that exceeds the rate.
It gets blocked.
Previously, people from the same IP would both have gotten blocked by one person at
the IP exceeding the rate, or if they collectively exceeded the rate.
If everybody has a key you are independent.
You are looking at the individual keys and it does not matter how many keys are behind
the same IP it is key by key.
Again this guy is not affected: he still has his own IP, he has his key, he is under 10
per second, he is good.
Let's look at the case, when someone is sharing a key.
Now all the IPs are different but the first two cases are sharing the same key.
Maybe you decided that different users of your software share the same key.
Or you have shared the key with a colleague, what happens?
This guy thinks he is great because he's doing seven per second and everything is great.
But down the hall or wherever, someone else is using the same key at 15 so they are both
blocked.
It is the key that is seeing 22 requests per second, that is way over 10, everything with
that key will be blocked until that rate gets below 10.
That is the key thing to remember: once you have keys, if you share them you become vulnerable,
because any person that has the key can block everybody with the key.
And this person is still fine, he has his own key and is cruising at four per second, no
problem.
Let's think of another scenario.
Let's think of a big organization.
Everyone has the same IP.
We're all behind the same firewall or gateway, two of us have a key and this guy does not.
So what is happening?
I have my own key, I told you it would be better if you have your individual key, I'm
cruising at seven per second, great, no problem.
Because your key is unique, and no one else is using it and you are beneath ten requests
per second.
This guy, though, has a key and he is still insisting on 15 requests per second. It will
not work; he gets blocked.
He gets error messages, because his key is generating more than 10 requests per second.
Regardless of the IP.
However here is this guy, some guy down the hall on a different floor maybe over in Europe
somewhere did not get the message, didn't get the memo about the API keys, is not using
one.
He has no key, so what happens?
If he has no key, then we look at the IP address. We look at the IP address and say, what is
coming from that IP address?
Goodness, there are 24 requests per second coming from this IP address from all three of these
people, blocked.
It exceeds three requests per second.
So this is a problem.
You have some people sharing an IP address, and some do not have keys. The people without keys
will be affected by the people who do have keys and are running at a higher rate.
So that is something to be aware of for cases where you are sharing an IP address.
How could this not have been as bad?
If he had a different IP address this would be solved.
Because then we would look at his IP or her IP and we would only be seeing two requests
per second from that IP.
These people's activity would not be on the same IP address.
So we would not notice it.
It is only the case if you do not have the key.
To review, these are the principles.
Any IP that posts more than three requests per second will receive errors if the key
is not present.
If the key is not present we look at the IP address.
And three is the maximum.
If the key is present, we ignore the IP address and we look at any activity that is coming
from that key.
Some people ask, well let's say that I have my laptop at home and my computer at work,
and my computer at my consultant's office and they're all different things and I'm using
the same key everywhere?
All of those activity streams sum, from our point of view into one activity from the same
key.
If the sum of all that activity is more than 10 there will be a problem.
You will get error messages.
When you are thinking about your actual activity you want to think about these principles and
think about how many keys are you going to share, if you will share them at all, it is
risky right?
If you share a key, anyone that goes above the limit will cause problems for everyone with
the key.
You want to think about what the limits are and what they might be, again if ten is not
going to work for you, and we understand that, we are perfectly happy to negotiate a higher
rate if you need that.
We need you to work with us to demonstrate why you need it and we can help you get to
a rate that will work for you.
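To make these principles concrete, here is a small Python sketch that applies them to the scenarios from the slides. It is a simplification under stated assumptions: it works with average rates rather than the moving window, the function name `blocked_users` and the example addresses and keys are made up for illustration, and the limits are the defaults mentioned in the talk.

```python
from collections import Counter

IP_LIMIT = 3    # requests per second for an IP when no key is sent
KEY_LIMIT = 10  # default requests per second for a key


def blocked_users(users):
    """users: iterable of (name, ip, key_or_None, requests_per_second).

    Returns the set of names whose requests would receive errors.
    """
    users = list(users)
    per_ip = Counter()
    per_key = Counter()
    for _name, ip, key, rate in users:
        per_ip[ip] += rate           # everything arriving from this IP
        if key is not None:
            per_key[key] += rate     # everything sent with this key, from any IP

    blocked = set()
    for name, ip, key, _rate in users:
        if key is not None:
            # Key present: the IP is ignored, only the key's total matters.
            if per_key[key] > KEY_LIMIT:
                blocked.add(name)
        elif per_ip[ip] > IP_LIMIT:
            # No key: the total activity seen from the IP is what counts.
            blocked.add(name)
    return blocked
```

For the big-organization slide, `blocked_users([("A", "192.0.2.1", "key-a", 7), ("B", "192.0.2.1", "key-b", 15), ("C", "192.0.2.1", None, 2)])` gives B (over 10 on his own key) and C (no key, and the shared IP is seeing 24 per second), while A is unaffected.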
All right.
So a few more details that came out of some of the questions.
We will cover these quickly.
There may be more that you are curious about.
This is the error message, it will look like this.
We're thinking about providing a service in the near future where you can actually send
a request and get the error message, it is not live yet but we are considering doing
something like that because we understand that many people would like to have their
software get it directly and then parse it or detect it in some way.
This is essentially what this will look like.
If the request produces an error it is like a bounce.
The request does not execute.
So you would need to resubmit that request.
It is not a delay it is not a time of execution penalty at this point it just bounces off.
It is dead.
You will need to resubmit any request that received an error message.
At least you get the error message back so that you know which requests received that
message. Any request that did not receive the message is fine; once you get the error
you will need to resubmit.
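Since a bounced request is simply dead, client code needs some kind of resubmit loop. Here is a minimal sketch in Python. The names `submit_with_retry`, `send` and `is_rate_error` are hypothetical: `send` stands for whatever function issues your request, and `is_rate_error` for however you detect the error message shown on the slide, which this transcript does not reproduce. The one-second pause is also just an assumption.

```python
import time


def submit_with_retry(send, is_rate_error, max_attempts=5, pause=1.0,
                      sleep=time.sleep):
    """Call send() until it returns something that is not a rate-limit error.

    send          -- callable that issues the request and returns the response
    is_rate_error -- callable that returns True if the response is the
                     rate-limit error (detection is left to the caller)
    """
    for attempt in range(1, max_attempts + 1):
        response = send()
        if not is_rate_error(response):
            return response
        if attempt < max_attempts:
            # The bounced request never executed, so wait and send it again.
            sleep(pause)
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```

Pausing before the resubmit matters because resubmitting immediately adds to the very rate that caused the error.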
A few questions about encryption.
All E-utility traffic now should be coming over HTTPS, which encrypts all of the parameters
in the URL including the key, but if you are passing the key around any other way before
you get it into the request object, that encryption is up to you, the developer.
All we're saying is that HTTPS will take care of the actual communication request and the
encryption of that request. Anything else you do to manipulate the key before or after the
request, that is up to you in terms of how you want to think about security.
And just to make the point again.
The only way to prevent multiple users from interfering with each other is for them each
to have their own key.
That is the ideal.
It is their own fault if they go over the limit.
Once you start sharing, and I'm not saying that you should not share there may be reasonable
cases where sharing is reasonable, but just be aware that you could have people surprised
and getting affected by other users that they have no idea about.
So we talked about this last time, just in case you have not heard, all you have to do
to create a key is to get an NCBI account.
You can find the account login in the upper right-hand side of any page very easy to create
an account.
You will find on the main page of the account, there is an API key management section and
click the button.
You click the button, get the key, copy it and use it.
That is it.
One key per NCBI account.
If you need additional keys you will need to make additional accounts to have those
keys.
Again, I have alluded to this: all you do to use the key is take the value and assign
it to this parameter, api_key, in any E-utility request.
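As a minimal sketch of what that looks like in code, here is one way to build a request URL in Python. The base URL is the public E-utilities address, which is not stated in this talk; the helper name `eutils_url` and the key value `ABCD1234` are placeholders made up for the example, and a real key is much longer.

```python
from urllib.parse import urlencode

# Public E-utilities base URL (an assumption here; it is not given in the talk).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(utility, api_key=None, **params):
    """Build the URL for one E-utility call, adding api_key when one is given."""
    query = dict(params)
    if api_key:
        query["api_key"] = api_key  # just one more URL parameter
    return f"{EUTILS_BASE}{utility}.fcgi?{urlencode(query)}"


# "ABCD1234" is a made-up placeholder, not a real key.
url = eutils_url("esearch", api_key="ABCD1234", db="pubmed", term="asthma")
```

Because the key travels in the URL, the request should go over HTTPS as described above.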
If you are using EDirect, there is an even simpler way, an environment variable, NCBI_API_KEY;
you just set that environment variable to your key and you are done.
You can do that in a config file or however you would like to do that.
Once the environment variable is set, EDirect will look at that, and provide the key to
all the requests automatically.
A question about this: if you do not like your key anymore, you think it got stolen,
you are concerned about the key, you can always get rid of it. Go back to your NCBI account,
create a new key, and then the old one will immediately be defunct, and the new one will be immediately
activated.
You can only have one key per account, you can replace at any time, just remember when
you replace it the other one is immediately dead.
This all happens on May first of next year.
We would encourage you to go ahead and get a key, start playing with that, it is not
going to do anything.
There are no restrictions in effect currently.
We would still encourage you to get a key and start thinking about how you would like to
strategize.
If you are going to share it, provide users, ask for users to create a key, whatever strategy
you want to think about, please start thinking about it now.
We would also encourage you as developers, to continue using the tool and email parameters
they are very helpful.
The value of the tool parameter really should be the name of the
software package; you can potentially have many keys associated with that one tool value.
The key does not necessarily help us understand what the software package is, and that can
really help us help you in terms of monitoring activity and getting back to you about concerns
with what is going on with the activity.
If someone is abusing your tool we can help you understand how that is happening.
We cannot do that just from the key we need the tool value to help you ideally.
The email is very helpful to be a developer contact because if you are distributing something
the end-users will have keys but we do not need their email addresses we need your email
address as a developer as to who to talk to about how to solve the problem with the software.
So we continue to encourage you to provide us with those contacts, so that we can help
you in case there are issues.
The tool and email parameters are not required and have nothing to do with the keys, but we would
encourage you to continue using them to allow us to have that direct contact.
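The split between the developer's identity and the end user's key can be sketched like this in Python. The values `my_pubmed_tool` and `dev@example.org` and the function name `request_params` are placeholders invented for the example; only the parameter names tool, email and api_key come from the talk.

```python
# Fixed by the developer and shipped with the software (placeholder values).
TOOL_NAME = "my_pubmed_tool"
DEVELOPER_EMAIL = "dev@example.org"


def request_params(user_api_key=None, **query):
    """Combine the developer's tool/email with the end user's own key."""
    params = {"tool": TOOL_NAME, "email": DEVELOPER_EMAIL}
    params.update(query)
    if user_api_key:
        # Each end user supplies the key from their own NCBI account.
        params["api_key"] = user_api_key
    return params
```

Every copy of the software then sends the same tool and email values, while the api_key differs from user to user.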
One thing I want to announce, in early 2018 we will be conducting a few test periods when
the API keys turn on.
This has been a request from several people, they would like to test them.
They would like to test their code with the keys working.
We do not have dates or times planned yet, but we will not be doing these until January 2018
at the earliest. Very likely, for a few hours on a particular day we will turn them on,
and then you can have at it and test your code and see if the keys are working as you
would expect them to.
We obviously are very interested in how that works out for you, because it is a test for
us too.
So that will be happening at least a few times throughout early 2018.
Well before the May 1st date.
We will be publicizing those on the blog and social media and on the E-utilities announcement list.
There is a great email list, and I encourage any of you who would like to keep abreast of those
things to subscribe to it: utilities-announce.
We will very likely be doing more webinars more posts like this as the date approaches.
If you have additional questions about that please let us know.
Write to us, info@ncbi.nlm.nih.gov if you have any questions after this webinar.
I think at this point I will be quiet and see if we have any questions.
There were a couple of questions during the talk.
They circulate around the idea of everyone having his or her own key.
I want to emphasize that, if you are distributing software you should have some mechanism for
your end user to enter the key that they have obtained themselves from NCBI.
Someone suggested that we put that in the documentation, I think we should do that.
If it is not in the E-utilities documentation I think we should add that.
Right now that is it, I think you have answered the other question that this person had about
the IP versus what happens when you have a key I think you covered that fairly thoroughly.
If you have additional questions we will stay open a couple more minutes and we can answer
them.
I see another question.
Does this change the overnight window for large jobs?
So the answer to that is essentially no.
What we have been doing in the past, when we have not had keys, it has been more unregulated,
we have encouraged people if you have large jobs to do those in the overnight hours or
on the weekend simply because you are likely to get better response times, and you are
likely to not interfere with other people as much.
When you are doing that.
I think that advice would still hold going forward.
What we would also anticipate is that once the keys go live you should have fewer
cases of high volume traffic that would impede you because the keys will be limiting that.
It will not be as much of a problem as to when you actually submit the jobs. The keys
are active 24/7; it does not matter when the requests come in, they will be limited
the same way regardless of the time.
It is likely, the way the traffic patterns work, that you will probably get better service and
to a certain extent more freedom on the servers overnight and on the weekends.
There is no reason for you to change that behavior.
A couple of more questions.
Will there be a way for us to obtain multiple keys and provide them to our users?
That is a great question at this point we do not have a mechanism in place for a particular
automatic call to generate multiple keys.
It is something that we have noted from one or two people, it is on the list of considerations.
For right now, the only way to get a key is to establish a NCBI account, and then get
the key that way.
So, if we have an update on that we will certainly let you know. It has been something that a
few people have asked about, but at the moment there is no automatic way to get a key.
Another one, any suggestions for server-based software?
For our software there is not a good user to use for a key.
So I suppose for that, it might be interesting to hear more about your specific situation.
In that case.
There have been, one thing I guess I should say, it is important to know what your frequency
of load is.
To have some understanding of that.
It may be that in many cases the three per second limit will be rare for you to hit from
a particular IP.
I do not know how distributed your package is.
Or how many IP's we would be seeing from the activity from your package.
If it is really just one IP, then you need to be much more concerned about the rate obviously.
If it is a distributed thing, and you have thousands of users, all over the place, then
each user would have to exceed that 3 per second rate on their own to have an effect
at all.
So that's one question: you may not need the keys.
Otherwise it is something where each of the individual distributed packages would need
its own key in some way to be able to get over those limits or to have the limits be
higher.
I'm not sure I'm answering your question well if I am not I would be happy to follow up
with you after this with more particulars about your specific situation.
Another question, do we have documentation on how to put the key in the code?
It simply goes in the URL with however you were passing the URL to us.
Right.
Thanks for the question.
We have not put anything in the documentation specifically about this, we can put some guidance
in, as Peter was saying, the thing that is necessary is for the key to go into the URL.
Or the post.
Whatever object you are creating to send the request.
It needs to go in there.
How it goes in there we do not care.
There are any number of ways it can be put in there.
Like what EDirect does, this is one straightforward way to do this: the key gets stored in a configuration
entity. It is an environment variable, it could be a configuration file, or anything
else that is stored in the software as a preference, on the disk as a config, whatever it might
be, the key is stored somewhere the software reads it and puts in the requests.
That is one approach.
I'm sure there are many approaches that I will not think about or think of because there
are many creative people out there in terms of how you deal with keys.
Effectively, any way that you would like to do this is fine as long as the key
gets into the request.
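As one hedged example of the approach described here, the Python sketch below looks for the key in an environment variable first and then in a configuration file. The variable name NCBI_API_KEY is the one EDirect uses; reusing it in your own software, the function name `load_api_key`, and the `[ncbi]` section with an `api_key` option are all assumptions made for illustration.

```python
import configparser
import os


def load_api_key(config_path=None, environ=os.environ):
    """Return the user's API key, or None if none has been configured.

    Looks in the NCBI_API_KEY environment variable first, then in an
    INI-style file with a hypothetical [ncbi] section and api_key option.
    """
    key = environ.get("NCBI_API_KEY")
    if key and key.strip():
        return key.strip()
    if config_path and os.path.exists(config_path):
        parser = configparser.ConfigParser()
        parser.read(config_path)
        return parser.get("ncbi", "api_key", fallback=None)
    return None
```

The software would call this once at startup and, if a key comes back, add it as the api_key parameter on every request; if nothing comes back, requests go out without a key and are limited by IP address.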
We may be updating the documentation to add examples, of how to do that, but I think that
is all that I can really say right now.
Do you think that it would be a good idea for the person who asked about the server
software, should they write to you?
Yes, absolutely write to me.
It's sayers@ncbi.nlm.nih.gov, which is the same suffix that you see. You can also
just write to info and it will get to me or someone who can answer your question,
no problem.
We would love to talk to you about that.
We're absolutely confident that there are many situations we have not thought about as much
as we need to so if you do have a situation that we are not covering, please let us know.
I'm sure there are other people in your situation as well.
Thank you Eric, and thank you everybody for coming today.
I think we will end the webinar at this point.