Archive for January, 2008

An Introduction to GIS

The motivation for this post is simply the growing importance of Location-Based Services (LBS) in the mobile environment. To provide such services, knowledge of location and topography is needed. Services also benefit from knowing their proximity to complementary services, routes and obstacles. All such information comes from GIS. Anyone attempting to enter the LBS space, whether as a developer or a provider, needs to understand GIS first. This post is a brief introduction to GIS as I have understood it. I am by no means an expert in this area, and readers may wish to do further research on their own. Feel free to add your comments to this post.

There are two definitions of GIS: Geographical Information System and Geographical Information Science. As a system, the focus is on methods and processes. It relates to tools, software and hardware, and defines how data is input, stored, analyzed and displayed. As a science, it deals with the theory behind the system. It asks and answers why something should be done in a certain way. It looks for solutions to technical and theoretical problems, and it challenges existing methodologies and models. For example, as a system there may be a defined way to transform data, but as a science one must justify why such a transformation is correct and applicable.

Historically, there has been no consensus within the GIS community, but it is increasingly accepted that GIS is both a science and a system. One cannot survive without the other, and both have their place. Moreover, the understanding and use of GIS has evolved over time. In the early days it was no more than an automated mapping system: the use of computers to make maps cost-effectively. Perceptions changed quickly as people realized that maps reveal more than the raw data they are drawn from. Visualization brought new possibilities, and the idea of spatial analysis was born. Spatial analysis described how data is to be combined, what questions need to be asked for specific problems and what solutions could be sought by means of spatial relationships. The human eye and brain can perceive patterns that are not obvious from data laid out in tables and spreadsheets.

As data became pervasive, the quantitative revolution came to the fore. Number crunching, or data-intensive processing as it came to be known in computing lingo, became popular. GIS may not have given impetus to quantitative analysis, but it surely made it important; in turn, GIS rode on improvements in quantitative analysis. Nonetheless, students of GIS are sometimes criticized for seeing GIS as nothing more than quantitative analysis. In fact, qualitative analysis is an important aspect of GIS, and this is what separates the notions of system and science. Intuition and spatial analysis are still the primary drivers of GIS, and GIS research is much more than numbers and quantitative analysis. Figure 1 gives a snapshot of the breadth of analysis in GIS research [1].

Figure 1: Content analysis of GIS journals from 1995-2001 (based on 566 papers)

At this point it is worth looking at the applications that use GIS. The real value for most people is not GIS itself, but how and where it is used. As already mentioned, in mobile communications GIS enables LBS. Traditionally, GIS was used only within geography: to identify, record or classify urban areas, rural areas, forest cover, soil types and the courses of rivers, to mention a few. These days it is used in many other fields. City planners can use it to aid decision making. For example, what is the best route for a road between two points through an urban landscape without going underground? GIS can answer such a question. GIS can also help in social analysis. If women in a certain area have a higher incidence of breast cancer, GIS data on various contributing factors can be combined and analyzed to arrive at an answer or a set of possible answers. For transportation and service delivery, GIS is an important tool for planning routes, profiling markets and determining pricing models. E-governance uses GIS for property survey lines and tax assessments.
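The road-routing question above is the kind of problem often approached with least-cost path analysis over a cost raster. The sketch below is a minimal illustration under stated assumptions, not how any particular GIS package does it: the grid, its costs and the function name `least_cost_route` are hypothetical, each cell holds the cost of building road through it, and `None` marks cells (such as existing buildings) that must be avoided because the road may not go underground. It uses Dijkstra's algorithm over 4-connected cells.

```python
import heapq


def least_cost_route(cost, start, goal):
    """Cheapest 4-connected route across a cost raster using Dijkstra's algorithm.

    cost[r][c] is the (hypothetical) cost of building road through that cell;
    None marks cells the road must avoid, e.g. existing buildings, because the
    road may not go underground. Returns (total_cost, path) or (None, []).
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: cost[start[0]][start[1]]}  # cheapest known cost to reach each cell
    prev = {}                                  # back-pointers for rebuilding the path
    queue = [(best[start], start)]
    while queue:
        total, cell = heapq.heappop(queue)
        if cell == goal:
            path = [cell]
            while cell in prev:
                cell = prev[cell]
                path.append(cell)
            return total, path[::-1]
        if total > best[cell]:
            continue  # stale queue entry; a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and cost[nr][nc] is not None:
                candidate = total + cost[nr][nc]
                if candidate < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = candidate
                    prev[(nr, nc)] = cell
                    heapq.heappush(queue, (candidate, (nr, nc)))
    return None, []


# Hypothetical 4 x 4 urban grid: None = buildings, larger numbers = costlier ground.
grid = [
    [1, 1, 1, 1],
    [1, None, None, 1],
    [3, 5, 1, 1],
    [1, 1, 1, 1],
]
total, path = least_cost_route(grid, (0, 0), (3, 3))
```

Here the cheapest route skirts the buildings along the top and right edges. A real analysis would add diagonal moves, terrain and land-value costs, but the principle of minimizing accumulated cost across cells stays the same.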

I will give an example of where GIS could be useful, with reference to Tesco stores in the UK. I have consistently noticed that Tesco stocks a variety of Indian consumer products in outlets close to Indian communities. Two Tesco outlets in different locations often carry quite different items. I don't know how this came about, but my guess is that Tesco learnt by experience. Sometimes this intelligent differentiation is missing in outlets. If GIS had been used, however, such stores could have stocked goods aimed at ethnic community groups from the outset, with no learning period needed. Provided decision makers ask the right questions and know how best to use GIS data, Tesco could predict consumer behaviour patterns in a specific area even before opening an outlet there.

Approaches and Identities
It will be apparent from the diversity of applications that GIS does not mean the same thing to any two people. Cartographers, sociologists, city planners, environmentalists, geologists and scientists could all look at GIS differently. Take the example of mapping a forest area. Wildlife enthusiasts would map the forest cover with an emphasis on habitat and conservation; they would consider how much light reaches the forest floor for the undergrowth to survive. Forest officials, on the other hand, being more concerned with the health of the trees, would focus on their height and width, and would consider different types of trees and the forest canopy. If it is a commercial forest, loggers would be more concerned with factors associated with their business.

The point about GIS is that data is just a representation of reality. The same reality can be seen differently by different people, and all their views can be true at the same time. This is somewhat like painting. Two painters can see the same scene in quite different ways. It is said that painting is all about making choices: what details to include, what to leave out and what to emphasize. No one painted olive trees like Van Gogh, yet his trees are every bit as real as the post-Romantic Realism of Courbet.

The terms used to describe this are epistemology and ontology. Epistemology is the perspective through which reality is seen. It is a sort of lens that notices some things and filters out the rest. Ontology refers to reality itself: things that exist on their own but are interpreted through an epistemology. The reality one person sees can differ from what another sees simply because their perspectives differ. Without going into detail, the literature discusses several epistemologies, including social constructivism, positivism, realism and pragmatism. For example, positivism requires observations that can be repeated before a theory is derived from them. Realism places more emphasis on specific conditions and particular events. Ultimately, these approaches straddle the divide between GIS as a system and GIS as a science.

Governments, for example, may apply a certain epistemology to suit their purposes. The resulting ontology may not match the way people see themselves. Thus, Public Participation in GIS (PPGIS) has become important for communities seeking to challenge governments by defining ontologies that they believe are real, or at least more real.

For computers, ontology is simply the way data is stored, objects are represented and relationships between objects are described. For example, a crossing over a road can be built underground or above the road. Such relationships are defined explicitly, and this represents reality for a computer. Data is not everything, but it is a key component of GIS.
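To make "ontology for computers" concrete, here is a minimal sketch, assuming a tiny hypothetical schema: features with a kind, and a fixed vocabulary of vertical relationships (`passes_over`, `passes_under`). All class names, relation names and feature names are illustrative, not drawn from any GIS standard. The point is that the computer's "reality" is exactly what the schema can express; anything else is rejected.

```python
from dataclasses import dataclass

# Hypothetical vocabulary of vertical relationships and their inverses.
# For the computer, a relationship that cannot be expressed here does not exist.
INVERSE = {"passes_over": "passes_under", "passes_under": "passes_over"}


@dataclass(frozen=True)
class Feature:
    name: str
    kind: str  # e.g. "road", "bridge", "tunnel"


@dataclass(frozen=True)
class Relation:
    subject: Feature
    predicate: str
    target: Feature

    def __post_init__(self):
        # Reject relationships outside the defined ontology.
        if self.predicate not in INVERSE:
            raise ValueError(f"unknown relation: {self.predicate}")


def relations_of(feature, relations):
    """List (predicate, other feature name) pairs for `feature`, deriving inverses."""
    found = []
    for rel in relations:
        if rel.subject == feature:
            found.append((rel.predicate, rel.target.name))
        elif rel.target == feature:
            found.append((INVERSE[rel.predicate], rel.subject.name))
    return found


# Hypothetical features and stored facts.
ring_road = Feature("Ring Road", "road")
flyover = Feature("Station Flyover", "bridge")
underpass = Feature("Market Underpass", "tunnel")
facts = [
    Relation(flyover, "passes_over", ring_road),
    Relation(underpass, "passes_under", ring_road),
]
```

Asking `relations_of(ring_road, facts)` yields the road's view of the same facts: it passes under the flyover and over the underpass, derived from inverses rather than stored twice.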

Handling Data
This is a complex area in itself, covering data collection, classification, interpretation and representation. Broadly, there is raster data and vector data. With raster data, the geographical area is divided into units or cells, and attributes are set for each unit. Raster data can be easily transformed and combined, and handling it is simple. This is not the case with vector data, whose basic components are points, lines and polygons from which a geographical area is described. Both are means of describing an entire area without any gaps, and they are generally called field models or layer models. There are also object models, in which individual objects within an area are represented but the area as a whole is not mapped. Object models may therefore have many gaps, which may not matter for the purpose for which the maps were generated.
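The contrast between the two models can be sketched in a few lines. This is an illustration under assumptions: the 3 x 3 land-cover grid, its 10 m cells and the L-shaped forest polygon are all hypothetical. In the raster, area per class is just a cell count; in the vector model, even area and "is this point in the forest?" need geometry (the standard shoelace formula and ray-casting test).

```python
from collections import Counter

# Raster (field) model: every cell of a regular grid carries an attribute.
# Hypothetical 10 m x 10 m cells coded F = forest, U = urban, W = water.
CELL_AREA_M2 = 100.0
land_cover = [
    ["F", "F", "U"],
    ["F", "F", "U"],
    ["W", "W", "U"],
]


def raster_area_by_class(grid, cell_area=CELL_AREA_M2):
    """Area per class is just a cell count times the cell area."""
    counts = Counter(value for row in grid for value in row)
    return {cls: n * cell_area for cls, n in counts.items()}


# Vector model: a hypothetical forest stand as a polygon of (x, y) vertices in metres.
forest_stand = [(0, 0), (30, 0), (30, 10), (10, 10), (10, 20), (0, 20)]


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def point_in_polygon(x, y, vertices):
    """Ray casting: a point is inside if a ray to its right crosses an odd number of edges."""
    inside = False
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

For this data, the raster gives 400 m² of forest, 300 m² urban and 200 m² water, and the polygon's area also comes out at 400 m²; only the vector version needs any geometry to get there.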

Scaling data is regarded as a difficult activity that involves many decisions. At 1:25,000, roads, bridges and towers may be clear in an urban area. At 1:75,000, such fine details may be lost. The problem is how to aggregate the data, classify it correctly and represent it at 1:75,000. It all depends on the context. A contractor tasked with maintaining bridges, for instance, should probably still see bridges on such a map, even at 1:75,000.
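Going from 1:25,000 to 1:75,000 shrinks linear detail by a factor of three, so on a raster one coarse cell stands in for a 3 x 3 block of fine cells. The sketch below assumes one simple, hypothetical generalization rule: each block takes its majority class, unless a class the map user cares about (here "B" for bridge) appears anywhere in the block. Real cartographic generalization involves many more decisions; this only shows how context changes the result.

```python
from collections import Counter


def generalize(grid, factor=3, keep=frozenset()):
    """Aggregate factor x factor blocks of a class raster into single cells.

    Each output cell takes the most common class in its block, unless a class
    listed in `keep` appears anywhere in the block, in which case it wins.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            block = [grid[r][c]
                     for r in range(r0, min(r0 + factor, rows))
                     for c in range(c0, min(c0 + factor, cols))]
            kept = [v for v in block if v in keep]
            candidates = kept if kept else block
            out_row.append(Counter(candidates).most_common(1)[0][0])
        out.append(out_row)
    return out


# Hypothetical fine-scale map: U = urban, R = rural land, B = bridge.
fine = [
    ["U", "U", "U", "R", "R", "R"],
    ["U", "B", "U", "R", "R", "R"],
    ["U", "U", "U", "R", "R", "R"],
]
```

A general-purpose map (`generalize(fine)`) loses the bridge to the surrounding urban cells, while the bridge contractor's map (`generalize(fine, keep={"B"})`) keeps it.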

Collecting data for a specific purpose is expensive, so it becomes necessary to share and combine data from multiple sources. The problem with combining data is that each source collected it for its own purpose. One source of tree data may classify all trees taller than 50 meters as tall, while another may use a different criterion. If the actual heights were not recorded, it becomes difficult to combine the two data sets into a consistent ontology of tall trees in a forest area. Conversely, different data sets may represent the same objects using different terminology. For example, a "limited access road" in one set may be the same as a "secondary road" in another. Only with local knowledge would we realize that they describe the same roads; then the two data sets can be usefully combined. Data semantics vary and need to be understood well to make the best use of data. We ought to realize that data offer particular points of view, not an absolute reality. In this context, primary data is data collected for the specific purpose at hand, whereas secondary data is used in GIS but was collected for a different purpose.
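Both problems in the paragraph above can be sketched as small harmonization steps. Everything here is hypothetical: the source names `survey_a` and `survey_b`, the crosswalk table (which stands for the local knowledge that the two road terms mean the same thing) and the choice of 50 m as the shared definition of "tall". The key behaviour is that a tree can only be reclassified if its raw height was recorded; a source's own "tall" label is not trusted.

```python
# Hypothetical crosswalk, built from local knowledge, mapping each source's
# road terms onto one shared vocabulary.
ROAD_CROSSWALK = {
    ("survey_a", "limited access road"): "secondary road",
    ("survey_b", "secondary road"): "secondary road",
}

# The shared definition of "tall" we choose to apply to every source (assumption).
TALL_THRESHOLD_M = 50.0


def harmonize_road(source, term):
    """Translate a source-specific road term, or return None if no mapping is known."""
    return ROAD_CROSSWALK.get((source, term))


def harmonize_tree(record):
    """Reclassify a tree as tall (True) or not (False) under the shared threshold.

    A source's own "tall" label may rest on a different criterion, so only the
    recorded height can be reclassified; without it, return None.
    """
    height = record.get("height_m")
    if height is None:
        return None
    return height > TALL_THRESHOLD_M
```

Returning `None` rather than guessing keeps unreconcilable records visible, which matches the warning that data offer particular points of view rather than an absolute reality.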

Attempts have been made to standardize data so that data sets can be merged more consistently, and metadata is used to facilitate this. Metadata are descriptors of data: they record how the data was collected, when it was collected, at what scale it is applicable, what classification was used and much more. Metadata is a good step forward, but it does not entirely solve the problem of dissimilar data collected differently for different purposes. With metadata, we at least know more about the available data and can use it appropriately. This is a critical part of GIS today, as data is widely shared. Combining data without understanding the science behind it could lead to inaccurate analysis and conclusions that diverge from reality.
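One way metadata helps in practice is by flagging conflicts before two data sets are merged. The sketch below assumes a hypothetical, deliberately minimal metadata record (real metadata standards define far more fields) and an arbitrary five-year tolerance on collection dates; it does not solve the dissimilar-data problem, it only makes the differences visible.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Metadata:
    """Hypothetical minimal metadata record; real metadata standards define far more."""
    source: str
    collected_on: date
    scale_denominator: int  # 25000 means a scale of 1:25,000
    classification: str     # name of the classification scheme used
    purpose: str


def merge_warnings(a, b, max_years_apart=5):
    """List reasons two data sets may not combine cleanly (empty = no obvious conflict)."""
    warnings = []
    if a.scale_denominator != b.scale_denominator:
        warnings.append(
            f"different scales: 1:{a.scale_denominator:,} vs 1:{b.scale_denominator:,}"
        )
    if a.classification != b.classification:
        warnings.append(
            f"different classifications: {a.classification!r} vs {b.classification!r}"
        )
    years = abs(a.collected_on.year - b.collected_on.year)
    if years > max_years_apart:
        warnings.append(f"collected {years} years apart")
    if a.purpose != b.purpose:
        warnings.append(f"collected for different purposes: {a.purpose!r} vs {b.purpose!r}")
    return warnings


# Hypothetical records for two tree surveys of the same forest.
forest_dept = Metadata("forest department", date(2005, 6, 1), 25000,
                       "tall = height > 50 m", "forest health")
wildlife = Metadata("wildlife trust", date(2007, 3, 1), 75000,
                    "tall = height > 40 m", "habitat survey")
```

Comparing these two records reports differences in scale, classification and purpose, which is exactly the information needed to decide whether, and how, the surveys can be combined.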

Modelling and Analysis
With so much data available, models help us build a picture of the world and its realities, and analysis follows as a necessary step in understanding that reality. Overlay is a fundamental technique of analysis: an area is seen as many layers, each overlaid on top of another. Bringing together spatially different data sets can help solve problems and answer questions. Schuurman [1] quotes the example of identifying populated areas at risk of fire in Southern California. Population is on one layer. Rivers, which help break the spread of fire, are on another. The road network, which relates to accessibility and user location, is on a third. Tree or forest cover, which relates to the spread of fire, is yet another layer. This can get further complicated if local weather patterns and wind directions are brought in on additional layers. Overlay is easily done with raster data but much more complex with polygon data. Computationally, overlay relies on set theory.
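The set-theory view of overlay can be sketched directly with Python sets, assuming every layer is a hypothetical raster on the same grid so that each layer reduces to the set of cells where its condition holds. Treating a river cell as fully shielding a cell from fire is a deliberate oversimplification of the example above. Sharing one grid is what makes raster overlay easy; with polygons, each intersection would require computing new geometry.

```python
# Each hypothetical layer is the set of raster cells (row, col) where a condition holds.
populated = {(0, 0), (0, 1), (1, 1), (2, 2), (3, 3)}
forest = {(0, 1), (1, 1), (1, 2), (2, 2), (3, 0)}
river_firebreak = {(1, 0), (1, 1)}  # cells assumed shielded by a river
road_access = {(0, 1), (3, 3)}      # cells reachable by road


def at_risk_cells(populated, forest, firebreak):
    """Overlay as set algebra: populated AND forested AND NOT shielded by a river."""
    return (populated & forest) - firebreak


at_risk = at_risk_cells(populated, forest, river_firebreak)
# At-risk cells without road access are the hardest to reach in an emergency.
hard_to_reach = at_risk - road_access
```

Intersection plays the role of AND, difference the role of AND NOT; adding a wind or weather layer would simply be one more set in the expression.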

Another example is environmental modelling, which could be useful for studying levels of pollution and identifying areas at risk. Air emissions and noise are each modelled, based on factors that might be available as GIS data. Contours generated from these models highlight patterns of noise or air pollution, and these contours are then converted into GIS data. The next step is to overlay all this GIS data and visualize the overall impact on the environment in a particular area. Such use of GIS aids decision making. Thus, GIS today combines visualization with quantitative approaches to problem solving.
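A rough raster analogue of this workflow: band each modelled surface into classes (a stand-in for contour lines), then overlay the banded layers by adding them cell by cell to get a combined impact score. The surfaces, the units and the thresholds below are purely illustrative assumptions, not regulatory limits, and simple addition is only one possible way to combine layers.

```python
def band(value, thresholds):
    """Index of the band a modelled value falls in (0 = below the first threshold).

    Banding a continuous surface like this is a raster stand-in for contour lines.
    Thresholds must be in ascending order.
    """
    return sum(value >= t for t in thresholds)


def banded(grid, thresholds):
    """Apply `band` to every cell of a modelled surface."""
    return [[band(v, thresholds) for v in row] for row in grid]


def combined_impact(*layers):
    """Overlay banded layers by adding them cell by cell."""
    return [[sum(values) for values in zip(*rows)] for rows in zip(*layers)]


# Hypothetical modelled surfaces and purely illustrative thresholds.
noise_db = [[50, 58], [66, 70]]
no2_ugm3 = [[30, 45], [35, 65]]
impact = combined_impact(banded(noise_db, [55, 65]), banded(no2_ugm3, [40, 60]))
```

The bottom-right cell scores highest because it falls in the top band of both models, which is the kind of pattern the overlay is meant to surface for decision makers.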

Decision making exists even in the process of using GIS data. Often some areas are incompletely mapped while others are complete, so representing all the data on a single map as if it were uniformly complete would be inaccurate. A decision therefore has to be made to bring all the data to a common basis of comparison, and data reduction enables this. Likewise, modelling and analyzing something at an accuracy of 50 meters may not be possible for reasons of privacy, for example when working with individual health data. In such cases some form of data averaging over a wider area must be used. Spatial boundary definitions present their own problems. GIS likes crisp boundaries, but these are never achievable in reality. Scales differ, and classification criteria differ. National data is collected for a different purpose, and at a different scale, than taluk data, so combining the two is not a trivial job. This is known as the modifiable areal unit problem (MAUP). MAUP deals with the aggregation of data from different scales, or the redrawing of the same map at a different scale.
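A toy demonstration of why MAUP matters, using the kind of aggregation that privacy rules force on individual health data. The six cells, their populations and case counts, and both zoning schemes are hypothetical. The same underlying data, aggregated into two different sets of zones, gives two different pictures: one zoning shows a uniform rate, the other a clear hotspot.

```python
def zone_rates(cases, population, zones):
    """Aggregate per-cell counts into zones and return each zone's rate.

    Each zone is a list of cell indices; publishing only zone rates is the kind
    of averaging used when individual records cannot be released.
    """
    rates = []
    for zone in zones:
        zone_cases = sum(cases[i] for i in zone)
        zone_population = sum(population[i] for i in zone)
        rates.append(zone_cases / zone_population)
    return rates


# Hypothetical strip of six cells with identical populations.
cases = [0, 0, 4, 4, 0, 0]
population = [10] * 6

zoning_a = [[0, 1, 2], [3, 4, 5]]     # two large zones
zoning_b = [[0, 1], [2, 3], [4, 5]]   # three smaller zones

rates_a = zone_rates(cases, population, zoning_a)
rates_b = zone_rates(cases, population, zoning_b)
```

Under `zoning_a` both zones have the same rate (4 in 30), so no hotspot appears; under `zoning_b` the middle zone's rate is 0.4 and its neighbours' are zero. Nothing about the underlying cases changed, only the boundaries.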

GIS is an interesting field that has many practical uses. It is more than just data collected for a particular location. It is a science. It is a system. In a future post, I will look at the use of GIS specifically for LBS. From what I have learnt of GIS, a truly powerful LBS application will do a lot more than just feeding users with data based on their location.


[1] Nadine Schuurman, GIS: A Short Introduction, Blackwell Publishing, 2004.
