Welcome to this demonstration of WPS. Today, I am going to show you how to
use WPS Workbench to analyse the evolution of property
prices in London, combining languages of SAS, R and SQL. We will use the publicity
available dataset of HM land registry that records property
transactions in the UK since 1995. I am going to start by pointing to the libname
where I have loaded the datasets. The house prices dataset has around 23
million transactions recording information such as price, date, type of
property and postcode. We use the macro language to parametrize the report so we
can modify the parameters and get reports for different years. In addition,
these four macro variables control the left, bottom, right and top Cartesian
coordinates of the interest area, so we can easily change our analysis region.
After loading the macro variables, we use the data step to create a smaller
version of the prices dataset focusing on the Greater London region and the
year defined by our macro variable, 2012. Using the SQL language we will join that
dataset with another that links the postcodes with their respective
Cartesian coordinates, enabling us to create the table named "miniprice" that
includes all the information required for doing spatial analyics.
Once executed, we use PROC FREQ to generate a report with the percentage
of types of property sold in each area in 2012. For example, in Greenwich a 48.7%
of sold properties in 2012 were flats Other languages can be used to extend
the WPS spatial analysis capabilities. Using PROC R combined
with the SAS macro language we are going to create a map with the density of
properties sold to highlight were the market activity is. With the "assign"
statement we pass SAS macro variables as parameters in R, so without modifying
the R code we can analyse different regions and
years, we simply need to go to the beginning of the script and replace the
macro variables value to change the report. Executing this code will download
the maps corresponding to our selected region and display the density of
properties sold for the requested year. When we open the report, we can see the
map of London with the high density of sold properties highlighted in orange.
This is the Thames River, that is the City of London, here is Canary Wharf and
there is Waterloo, which along with Whitechapel were the areas with greater
market movement in 2012. To make a detailed report of univariate statistics of prices
per region we are going to create a new dataset using macro language and a data
step to select only data within the target area. Then, we calculate the
statistics with proc means, and reshape the data using proc transpose to finally
make a report using proc print. The output report validates the intuition that
the mean prices in the City of London are much higher than the ones in less
affluent areas such as Lewisham or Greenwich. The last analysis will be a Kriging
regression using the R library automap. Kriging is a type of regression
that uses a given set of points as input and interpolates values over a spatial
grid, providing estimated prices and uncertainty metrics for each element of
that grid. We will export our dataset along with the macro parameters. The
first R code loads several required libraries, including automap to set the
foundations of our analysis. The second call to R uses the coordinates of the
defined boundaries to create a spatial grid of 100x100 meters to use as
the target for the Kriging process and then calls automap to generate the variogram
and the Kriging regression. The results show the Kriging interpolation output,
the variogram fit and the standard error. Note how the areas where we had less
input points have higher standard errors. The final map shows using a colour gradient
the prices overlaid in the London map, highlighting those areas,
such as the City and Canary Wharf have much higher prices than the rest. We can
also use proc GLM analysis to, through non spatial methods, see how variables such
as region, type of dwelling and whether is new or old have an impact on the price.
That allows you to make complex reports end to end. As you can see
we are combining some SAS language, SQL code, R code, however the most important
point is that when you are within WPS you can use macro language
so you can replace the year by 2016, 17 or any other year of interest, or modify the
boundary box and rerun again the whole script to be able to compare the results
of one year to another for example. Not only that, you can use WPS outline
Explorer to compare results from several years. For example, we can focus on the
table of dwellings by area to check how in 2012, 48% of property sold in Greenwich
were flats, while in 2016, flats represented almost
60%. Or compare how the areas of major market activity in 2012 were the city
and Canary Wharf, while in 2016 the market movement seems to have shifted
more towards east London, or the new Greenwich developments. This way using
WPS Workbench you can parametrize the scripts combining SAS, R and SQL
languages to generate a spacial analysis reports of property prices, accidents,
cost of living or any other user case that has a geographical dimension.
Không có nhận xét nào:
Đăng nhận xét