Collection of Data

Chapter Notes

Data: Facts that can be represented in numerical form are called data.

Statistical Enquiry: A statistical enquiry is an investigation in which relevant information is collected, analysed and interpreted by applying statistical methods.

The purpose of data collection is to understand, explain and analyse a problem and the causes behind it.

For studying a problem statistically, the relevant data must be collected. The interpretation of the ultimate conclusion and the decisions depend upon the accuracy with which the data are collected. Unless the data are collected with sufficient care and are as accurate as is necessary for the purposes of the inquiry, the result obtained cannot be expected to be valid or reliable.

Before starting the collection of the data, it is necessary to know the sources from which the data are to be collected.

Sources of Data

The primary data is the data which is originally collected by an investigator for the first time for some specific purpose.

The original compiler of the data is the primary source or enumerator. For example, the office of the Registrar General will be the primary source of the decennial population census figures.

The data which is not directly collected by the investigator, but obtained from other published or unpublished sources, is known as secondary data.

A secondary source is one that furnishes data that were originally compiled by someone else. For example, the population census figures issued by the office of the Registrar General are published in the Indian Year Book. This publication will be a secondary source of the population data.

Difference between Primary and Secondary Data

(i) Primary data is original data collected by the investigator while secondary data is already existing and not original.

(ii) Primary data is more reliable and better suited to the purpose of the investigator than secondary data.

(iii) Primary data is always collected for a specific purpose while secondary data has already been collected for some other purpose.

(iv) Collection of primary data requires an elaborate organizational setup and involves a considerable amount of money, whereas secondary data does not need any organizational setup and is much cheaper than primary data.

Choice between Primary and Secondary Data

An investigator has to decide whether to collect fresh (primary) data or to compile data from published sources. Primary data is generally reliable, but secondary data can be relied upon only after examining the following factors:

(i) source from which they have been obtained;

(ii) their true significance;

(iii) completeness and

(iv) method of collection.

In addition to the above factors, there are other factors to be considered while choosing between primary and secondary data:

(i) Nature and scope of enquiry.

(ii) Availability of time and money.

(iii) Degree of accuracy required and

(iv) The status of the investigator i.e., individual, private company or Government.

In certain investigations both primary and secondary data may need to be used, so that one supplements the other.

Methods of Collection of Primary Data

The main methods of collecting primary statistical information are the following:

1. Direct Personal Observation,

2. Indirect Personal Observation or Oral Observation,

3. Mailing Questionnaire,

4. Telephone Interviews,

5. Information from Correspondents, and

6. Schedules to be filled in by informants.

The method adopted depends on the nature of the enquiry and on the time, money and other facilities available for the investigation.

1. Direct Personal Observation

In this method, the investigator obtains the data by personal observation.

This method is suitable when:

• The field of inquiry is small.

• Detailed information is required.

• The nature of information is confidential.

• A high degree of accuracy is required.

• Originality is important.

Merits of Direct Personal Observation:

• Since the investigator is closely connected with the collection of data, the data are likely to be more accurate.

• The data collected is original.

• It gives the highest response rate.

• It allows the use of all types of questions.

• It is better suited to open-ended questions.

• It allows clarification of ambiguous questions.

For example, if an inquiry is to be conducted into the family budgets and living conditions of industrial labour, the investigator lives in the industrial area as one of the industrial workers, mixes with other residents and makes patient and careful personal observations of how they spend, work and live.

Demerits of Direct Personal Observation:

• Not suitable for wide areas or when the number of respondents is very large.

• It is very expensive and time consuming.

• Personal Prejudice: The bias or personal prejudice of the investigator can affect the accuracy of the data.

• The enumerators involved have to be trained and unbiased.

2. Indirect Personal Observation

In this method, the investigator interviews several persons who are either directly or indirectly in possession of the information sought. It differs from the first method, in which information is collected directly from the persons who are involved in the inquiry. In the case of indirect personal observation, the persons from whom the information is collected are known as witnesses or informants. However, it should be made sure that the informants really possess the knowledge and that they are not prejudiced in favour of or against a particular viewpoint. This method is adopted in the following situations:

• Where the information to be collected is of a complex nature.

• When investigation has to be made over a wide area.

• Where the persons involved in the inquiry would be reluctant to part with the information.

• This method is generally adopted by enquiry committees or commissions appointed by the government.

3. Mailing Questionnaire

When the data in a survey are collected by mail, the questionnaire is sent to each individual by mail with a request to complete and return it by a given date. These days online surveys or surveys through short messaging service i.e. SMS have become popular.

The advantages of a mail survey are:

• It is less expensive.

• It allows the researcher to have access to people in remote areas too, who might be difficult to reach in person or by telephone.

• It does not allow influencing of the respondents by the interviewer.

• It also permits the respondents to take sufficient time to give thoughtful answers to the questions.

• It maintains the anonymity of respondents.

• It is best for sensitive questions.

The disadvantages of a mail survey are:

• There is less opportunity to provide assistance in clarifying instructions, so there is a possibility of misinterpretation of questions.

• Mailing is also likely to produce low response rates due to factors such as respondents returning the questionnaire without completing it or not returning it at all, loss of the questionnaire in the mail itself, etc.

4. Telephone Interviews

In a telephone interview, the investigator asks questions over the telephone.

The advantages of telephone interviews are:

• They are cheaper than personal interviews and can be conducted in a shorter time.

• There is relatively less influence on respondents.

• They allow the researcher to assist the respondent by clarifying the questions.

• The response rate is relatively high.

• A telephone interview is better in cases where the respondents are reluctant to answer certain questions in personal interviews.

The disadvantages of telephone interviews are:

• Not all people own a telephone, so reaching everyone may be difficult.

• Telephone interviews also prevent the interviewer from seeing the visual reactions of the respondents, which are helpful in obtaining information on sensitive issues.

• There is a possibility of influencing respondents.

5. Information from Correspondents

In this method certain correspondents are appointed in different parts of the field of enquiry, who submit their reports to the central office in their own manner. For example, estimates of agricultural wages may be periodically furnished to the Government by village school teachers.

The local correspondents being on the spot of the enquiry are capable of giving reliable information.

However, it is not always advisable to place much reliance on correspondents, who often have their own personal prejudices. Even so, this method yields a rough estimate at a very low cost.

This method is also adopted by various departments of the government, in cases where regular information is to be collected from a wide area.

6. Schedules to be filled in by the informants

In this method, properly drawn up schedules or blank forms are distributed among the persons from whom the necessary figures are to be obtained. The informants fill in the forms and return them to the officer in charge of the investigation. The Government of India issued slips for enumeration at the time of the census. These slips are good examples of schedules to be filled in by the informants.

The merit of this method is its simplicity and lesser degree of trouble and pain for the investigator. Its greatest drawback is that the informants may not send back the schedules duly filled in.

Drafting the Questionnaire

The success of the questionnaire method of collecting information depends on the proper drafting of the questionnaire. It is a highly specialized job and requires a great deal of skill and experience. However, the following general principles may be helpful in framing a questionnaire:

(i) The questionnaire should not be too long. The number of questions should be as small as possible. Long questionnaires discourage people from completing them.

(ii) The series of questions should move from general to specific. The questionnaire should start from general questions and proceed to more specific ones. This helps the respondents feel comfortable.

(iii) The questions should be short, simple and easy to understand, and each should convey only one meaning.

(iv) As far as possible the questions should be such that they can be answered briefly in ‘Yes’ or ‘No’, or in terms of numbers, place, date, etc. When there is a possibility of more than two options of answers, multiple choice questions are more appropriate.

(v) The question should not use double negatives. Questions starting with “Wouldn’t you” or “Don’t you” should be avoided, as they may lead to biased responses.

(vi) The question should not be a leading question, which gives a clue about how the respondent should answer.

(vii) The question should not indicate alternatives to the answer. The questionnaire may consist of closed ended (or structured) questions or open ended (or unstructured) questions. Closed ended or structured questions can either be a two-way question or a multiple choice question.

(viii) The questionnaire should provide necessary instructions to the informants.

Sources of Secondary Data

There are a number of sources from which secondary data may be obtained. They may be classified as follows:

(i) Published sources, and

(ii) Unpublished sources.

(i) Published Sources

The various sources of published data are:

1. Reports and official publications of:

a) International bodies such as the International Monetary Fund, International Finance Corporation, and United Nations Organisation.

b) Central and State Governments, such as the Report of the Patel Committee, etc.

2. Semi-official publications of various local bodies such as Municipal Corporations and District Boards.

3. Private publications of:

a) Trade and professional bodies such as the Federation of Indian Chambers of Commerce and Industry and the Institute of Chartered Accountants of India.

b) Financial and economic journals such as “Commerce”, “Capital”, etc.

c) Annual Reports of Joint Stock Companies.

d) Publications brought out by research agencies, research scholars, etc.

(ii) Unpublished Sources

There are various sources of unpublished data, such as records maintained by various government and private offices and studies made by research institutions, scholars, etc. Such sources can also be used where necessary.

Pilot, Census, and Sample Surveys

Population or Universe

The Population or the Universe is the group to which the results of the study are intended to apply. A population always consists of all the individuals or items that possess certain characteristics (or a set of characteristics), according to the purpose of the survey.

Pilot Survey

Trying out a survey with a limited number of people is called a pilot survey or pre-testing of the questionnaire.

• The pilot survey helps in providing a preliminary idea about the survey. It helps in pre-testing of the questionnaire, so as to know the shortcomings and drawbacks of the questions.

• The pilot survey also helps in assessing the suitability of questions, clarity of instructions, performance of enumerators and the cost and time involved in the actual survey.

Census or Complete Enumeration

A survey, which includes every element of the population, is known as Census or the Method of Complete Enumeration.

For example, the Census of India is carried out every ten years and covers each individual in rural and urban India.

A house-to-house enquiry is carried out, covering all households in India. Demographic data on birth and death rates, literacy, workforce, life expectancy, size and composition of population, etc. are collected and published by the Registrar General of India.

Merits of the Census Method:

(i) As the entire ‘population’ is studied, the results obtained are the most accurate.

(ii) In a census, information is available for each individual item of the population which is not possible in the case of a sample. Thus no information is sacrificed under the census method.

(iii) If data are to be secured only from a small fraction of the aggregate, their completeness and accuracy can be ensured only by the census method, since greater attention thereby is given to each item.

(iv) Since the entire mass of data is taken into consideration, all the characteristics of the ‘population’ are preserved in their original form.

Demerits of the Census Method:

(i) The cost of conducting an enquiry by the census method is very high, as the whole universe is to be investigated.

(ii) The census method is not practicable in very big enquiries due to the inconvenience of individual enumeration.

(iii) In the case of very big enquiries, the census method can be resorted to by government agencies only. The application of this method is limited to those who have adequate financial resources and other facilities at their disposal.

(iv) As all the items in the universe are to be enumerated, there is a need for training of staff and investigators. It sometimes becomes very difficult to maintain uniformity of standards when many investigators are involved. Individual preferences and prejudices come into play, and it becomes very difficult to avoid bias in enquiries of this type.

Sample Survey

A sample refers to a group or section of the population from which information is to be obtained. A good sample (representative sample) is generally smaller than the population and is capable of providing reasonably accurate information about the population at a much lower cost and shorter time.

Methods of Sampling

(i) Random Sampling

Random sampling is one where the individual units from the population (samples) are selected at random.

Example: The government wants to determine the impact of the rise in petrol price on the household budget of a particular locality. Let us take a locality with 300 households.

The names of all the 300 households of that area are written on pieces of paper and mixed well, and then the 30 households to be interviewed are selected one by one.

In random sampling, every individual has an equal chance of being selected, and the individuals who are selected are just like the ones who are not selected.

This is also called the lottery method. The same could also be done using a Random Number Table. Random number tables guarantee equal probability of selection of every individual unit. They are available either in a published form or can be generated.
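
The lottery method described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the household labels, the function name lottery_sample and the seed value are made up for the example, and Python's pseudo-random generator stands in for the slips of paper or the random number table.

```python
import random

# Hypothetical labels standing in for the 300 household names on paper slips.
households = [f"Household {i}" for i in range(1, 301)]


def lottery_sample(units, k, seed=None):
    """Draw k distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)  # a fixed seed only makes the draw repeatable
    return rng.sample(units, k)  # sampling without replacement


# Select 30 of the 300 households, as in the petrol price example.
selected = lottery_sample(households, 30, seed=2024)
print(len(selected), "households selected")
```

Because the draw is made without replacement, no household can be picked twice, which mirrors pulling slips out of the mixed pile one by one.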

Exit Polls

Exit polls take a random sample of voters who exit the polling booths and are asked whom they voted for. From the data of the sample of voters, the prediction is made.

(ii) Non-Random Sampling

In a non-random sampling method all the units of the population do not have an equal chance of being selected, and the convenience or judgement of the investigator plays an important role in the selection of the sample. The units are mainly selected on the basis of judgement, purpose, convenience or quota.

a) Purposive or Judgemental or Authoritative Sampling

Judgmental sampling is a non-probability sampling technique where the researcher selects units to be sampled based on his knowledge and professional judgment.

Purposive sampling aims at a representative sample by carefully analysing the universe of enquiry and selecting only those units which seem to be most representative of the characteristics of the universe.

This method suffers from the danger of personal prejudice. There is also a possibility of certain wrong cases being included in the data under collection, consciously or unconsciously.

However, this method gives a very representative sample, provided neither bias nor prejudice influences the process of selection.

b) Quota Sampling

Quota sampling is a non-probability sampling technique wherein the assembled sample has the same proportions of individuals as the entire population with respect to known characteristics, traits or focused phenomenon.
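
As a rough illustration of how quotas follow from population proportions, the sketch below splits a sample size across groups in proportion to their share of the population. The function name quotas, the group labels and the counts are assumptions invented for the example. The code only fixes how many units each group should contribute; who fills each quota is still left to the investigator, which is what makes the method non-random.

```python
def quotas(population_counts, sample_size):
    """Split sample_size across groups in proportion to their population share."""
    total = sum(population_counts.values())
    exact = {g: c * sample_size / total for g, c in population_counts.items()}
    result = {g: int(v) for g, v in exact.items()}  # whole units per group
    leftover = sample_size - sum(result.values())
    # Give any remaining places to the groups with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda g: exact[g] - result[g], reverse=True)
    for g in by_fraction[:leftover]:
        result[g] += 1
    return result


# Hypothetical population of 300 people and a sample of 30.
print(quotas({"men": 180, "women": 120}, 30))
```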

c) Stratified Sampling

Under this method, the population is first sub-divided into groups or “strata” before the selection of the samples is made. This is done to achieve homogeneity within each group or “stratum”. A stratified sample is nothing but a set of random samples of a number of sub-populations, each representing a single group. The major advantage of such a stratification is that the several sub-divisions of the population which are relevant for purpose of inquiry are adequately represented.
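
A minimal sketch of this idea, assuming a made-up population of 300 households grouped by income level: a simple random sample of the same fraction is drawn from every stratum, so each group is represented in proportion to its size. The function name stratified_sample, the strata and the 10 per cent fraction are illustrative assumptions, not part of the notes.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample of the given fraction from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        k = max(1, round(len(units) * fraction))  # at least one unit per stratum
        sample[name] = rng.sample(units, k)
    return sample


# Hypothetical strata: 300 households grouped by income level.
strata = {
    "low income": [f"L{i}" for i in range(150)],
    "middle income": [f"M{i}" for i in range(100)],
    "high income": [f"H{i}" for i in range(50)],
}

chosen = stratified_sample(strata, 0.10, seed=1)
sizes = {name: len(units) for name, units in chosen.items()}
print(sizes)
```

Drawing the same fraction from each stratum is only one possible allocation; an investigator could equally decide to sample small but important strata more heavily.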

Objectives of Sampling

a) To get as much information as possible of the whole universe by examining only a part of it.

b) To determine the reliability of the estimates. This can be done by drawing successive samples from the same parent universe and comparing the results obtained from different samples.

Merits of Sampling:

(i) The sample method is less costly, since the sample is a small fraction of the total population.

(ii) Data can be collected and summarized more quickly. This is a vital consideration when the information is urgently needed.

(iii) A sample produces more accurate results than are ordinarily practicable on a complete enumeration.

(iv) Personnel of high quality can be employed and given intensive training as the number of personnel would not be very large.

(v) A sample method is not restricted to the Government agencies. Even private agencies can use this method as the financial burden is not heavy. It is much more economical than the census method.

Demerits of Sampling:

(i) In a census, information is available for each individual item of the population, which is not possible in the case of a sample. Some information has to be sacrificed.

(ii) If data are to be secured only from a small fraction of the aggregate, their completeness and accuracy can be ensured only through the census method, since greater attention thereby is given to each item.

(iii) In using the technique of sampling, the investigator may not choose a representative sample. The aim of sampling is that it should afford a sufficiently accurate picture of a large group without the need for a complete enumeration of all the units of the group. If the sample chosen is not representative of the group, the very object of sampling is defeated.

(iv) The sampling technique is based upon the fundamental assumption that the population to be sampled is homogeneous. If it is not, the sampling method should not be adopted unless the population is first divided into groups or “strata” before the selection of the sample is made.

Principles of Sampling

There are two important principles on which the theory of sampling is based. Both are derived from the theory of probability:

(i) Principle of Statistical Regularity, and

(ii) Principle of ‘Inertia of Large Numbers’

(i) Principle of Statistical Regularity

Statistical regularity is a notion in statistics and probability theory that random events exhibit regularity when repeated enough times or that enough sufficiently similar random events exhibit regularity.

This principle points out that if a sample is taken at random from a population, it is likely to possess almost the same characteristics as the population.

For example, if one intends to make a study of the average weight of the students of Delhi University, it is not necessary to take the weight of each and every student. A few students may be selected at random from every college, their weights taken and the average weight of the University students in general may be inferred.

(ii) Principle of Inertia of Large Numbers

According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.
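
The statement above can be tried out with a small simulation. The sketch assumes a fair six-sided die, whose expected value is 3.5; the function name mean_of_rolls, the numbers of trials and the seed are arbitrary choices made for the illustration.

```python
import random

EXPECTED = 3.5  # expected value of one roll of a fair six-sided die


def mean_of_rolls(n, seed=None):
    """Average of n simulated rolls of a fair die."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n)) / n


# Averages over more trials tend to lie closer to the expected value.
for n in (10, 1_000, 100_000):
    average = mean_of_rolls(n, seed=7)
    print(n, round(average, 3), "gap:", round(abs(average - EXPECTED), 3))
```

With only 10 rolls the average can be well away from 3.5, while with 100,000 rolls it is typically very close. The tendency holds on the whole, not necessarily at every single step.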

Errors

Sampling Errors

The difference between the actual value of a parameter of the population (which is not known) and its estimate (from the sample) is the sampling error.

It is the error that occurs when we make an observation from the sample taken from the population.

It is possible to reduce the magnitude of sampling error by taking a larger sample.
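
The effect of sample size on sampling error can be illustrated with a simulated population. Everything in the sketch is an assumption made for the example: the population of 5,000 weights is generated artificially, and the names average_sampling_error, small and large are hypothetical. In a real enquiry the true population mean is not known, so the error can only be estimated, not computed directly as it is here.

```python
import random
import statistics

rng = random.Random(42)

# Made-up population: weights (in kg) of 5,000 students.
population = [rng.gauss(60, 10) for _ in range(5000)]
true_mean = statistics.mean(population)  # known here only because we built it


def average_sampling_error(n, repeats=200):
    """Average absolute gap between sample mean and population mean."""
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, n)  # simple random sample of size n
        errors.append(abs(statistics.mean(sample) - true_mean))
    return statistics.mean(errors)


small = average_sampling_error(10)
large = average_sampling_error(400)
print("n = 10 :", round(small, 2))
print("n = 400:", round(large, 2))
```

Averaging over many repeated samples is used because a single small sample can, by chance, land very close to the true mean.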

Non-Sampling Errors

Some of the non-sampling errors are:

a) Errors in Data Acquisition

• This type of error arises from the recording of incorrect responses.

• Differences may arise because of differences in instruments.

• Errors may arise from the carelessness of the enumerator.

• Errors may arise when there are several sources of data. For example, suppose we want to collect data on the prices of oranges. The prices vary from shop to shop and from market to market. Prices also vary according to quality. Therefore, we can only consider the average prices.

• Recording mistakes can also take place, as the enumerators or the respondents may commit errors in recording or transcribing the data. For example, he or she may record 13 instead of 31.

b) Non-Response Errors

Non-response occurs if an interviewer is unable to contact a person listed in the sample or a person from the sample refuses to respond. In this case, the sample observation may not be representative.

c) Sampling Bias

Sampling bias occurs when the sampling plan is such that some members of the target population could not possibly be included in the sample.

Census of India and NSSO

There are agencies at both the national and state levels which collect, process and tabulate statistical data. Some of the major agencies at the national level are:

(i) Census of India

(ii) National Sample Survey Organisation (NSSO)

(iii) Central Statistical Organisation (CSO)

(iv) Registrar General of India (RGI)

(v) Directorate General of Commercial Intelligence and Statistics (DGCIS)

(vi) Labour Bureau

Census of India

The Census of India provides the most complete and continuous demographic record of population. The Census has been conducted regularly every ten years since 1881. The first Census after Independence was held in 1951. The Census collects information on various aspects of population such as size, density, sex ratio, literacy, migration, rural-urban distribution, etc.

The Census in India is not merely a statistical operation; the data is also interpreted and analysed in an interesting manner.

National Sample Survey Organisation (NSSO)

The NSSO was established by the Government of India to conduct nation-wide surveys on socio-economic issues.

The NSSO conducts continuous surveys in successive rounds. The data collected by NSSO surveys on different socio-economic subjects are released through reports and its quarterly journal Sarvekshana.

NSSO provides periodic estimates of literacy, school enrolment, utilisation of educational services, employment, unemployment, manufacturing and service sector enterprises, morbidity, maternity, child care, utilisation of the public distribution system etc.

The NSS 59th round survey (January–December 2003) was on land and livestock holdings, debt and investment. The NSS 60th round survey (January–June 2004) was on morbidity and health care.

The NSSO also undertakes the fieldwork of the Annual Survey of Industries, conducts crop estimation surveys and collects rural and urban retail prices for the compilation of consumer price index numbers.