## Cell Weighting

Cell weighting is the most standard weighting scheme: the weights are computed so that the sample totals match the target totals on a cell-by-cell basis. When weighting by multiple demographics, this means the joint distribution of the target population must be known for every target cell. Note that in this example the target grand total (1,500) differs from the sample grand total (1,000); this causes no problem, and the same procedure applies when the total must be kept equal to the sample size.

In the above example, the weighting is done across two demographics, A and B, where A has 4 categories and B has 3. For cell weighting, the weight for each cell of the distribution is computed by dividing the target size by the sample size. The outcome is given below. The disadvantage of cell weighting is that it can introduce large variability into the weighting adjustments, inflating the standard deviations of the survey estimates. A further practical disadvantage is that the entire joint target distribution of all the weighting variables must be known. For example, if a US survey is weighted on 3 demographic variables such as age (5 levels), gender (2 levels), and marital status (4 levels), then there are 5×2×4 = 40 cells. If the US state of residence is added, this becomes 40×50 = 2,000 cells. Weighting thus becomes extremely difficult as more variables are added.
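As a minimal sketch of the cell-by-cell computation (using a small hypothetical 2×2 table rather than the 4×3 example above, whose cell-level counts are not reproduced here):

```python
import numpy as np

# Illustrative only: a small 2x2 joint distribution of two demographics.
sample = np.array([[120.0,  80.0],
                   [300.0, 500.0]])   # sample counts per cell (n = 1,000)
target = np.array([[150.0, 150.0],
                   [450.0, 750.0]])   # target counts per cell (N = 1,500)

# Cell weighting: one weight per cell, target count / sample count.
weights = target / sample
print(weights)

# Every respondent in cell (i, j) receives weights[i, j]; the weighted
# sample totals then equal the target in every cell.
```

Note that this requires a non-empty sample cell and a known target count for every single cell, which is exactly the practical limitation described above.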

Rim weighting overcomes these disadvantages. For rim weighting, only the marginal distribution of the target needs to be known. The complete joint distribution of the weighting factors is not necessary. The weights computed in this scheme are more reasonable, reducing the possibility of requiring any weight capping. We will describe it further below.

## Rim/Rake Weighting

An iterative proportional fitting procedure estimates the individual weights. The first iteration computes weights to match the totals of the first dimension (weighting variable), the second iteration matches the totals of the second dimension, and so on. These steps are repeated over all the dimensions until convergence is achieved within an acceptable margin of error.

In our previous example, the marginal distributions of the weighting variables A and B are as below. In the first iteration, we find the weight factors for A (the first dimension): the row of A1 is multiplied by 175/100, the row of A2 by 550/500, the row of A3 by 430/200, and the row of A4 by 345/200. The outcome is given below.

When weighting only on variable A, the counts for B will of course not match. So, in the next step, the weights for B (the second dimension) are computed: the column of B1 is multiplied by 365/356.75, the column of B2 by 415/504, and the column of B3 by 720/639.25. Now the counts for A no longer match, and here the iterative procedure comes in: the weighting is recalculated for A, then again for B, and so on. Once the counts have converged within an acceptable margin of error, the final weights are assigned to the respondents.
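The iterative procedure can be sketched as follows. The cell-level sample table is not shown in the text, so the joint counts below are hypothetical, chosen only to be consistent with the quoted marginals (sample A: 100/500/200/200; target A: 175/550/430/345; target B: 365/415/720) and the quoted first-iteration column totals (356.75/504/639.25):

```python
import numpy as np

# Hypothetical joint sample counts for A (rows, 4 levels) x B (columns, 3 levels),
# consistent with the marginals quoted in the text.
sample = np.array([[ 30.0,  40.0,  30.0],
                   [120.0, 230.0, 150.0],
                   [ 40.0,  20.0, 140.0],
                   [ 50.0,  80.0,  70.0]])

target_A = np.array([175.0, 550.0, 430.0, 345.0])   # target row marginals
target_B = np.array([365.0, 415.0, 720.0])          # target column marginals

weighted = sample.copy()
for iteration in range(1000):
    # Step 1: scale each row so the row totals match the A targets.
    weighted *= (target_A / weighted.sum(axis=1))[:, None]
    # Step 2: scale each column so the column totals match the B targets.
    weighted *= target_B / weighted.sum(axis=0)
    # Converged when both margins match within a small tolerance
    # (columns match exactly after Step 2, so only rows need checking).
    if np.allclose(weighted.sum(axis=1), target_A, atol=1e-6):
        break

# The final rim weight applied to each respondent in a given cell:
weights = weighted / sample
print(np.round(weights, 3))
```

After convergence both marginal totals match the targets, even though the full joint target distribution was never required.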

The final weights from this procedure are given below:

## b³’s Proprietary Tool

The proprietary tool by b³ follows the same rim weighting procedure described above. After the weights are built, they are extracted along with the respondent IDs and merged with the original dataset for further processing, such as tabulation in Wincross/UNCLE.

## Weight Capping

If the computed weights become too high or too low, they are sometimes forced to stay within certain limits. In our current practice, we use a lower cap of 0.2 and an upper cap of 5.0.
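A capping step of this kind can be sketched as:

```python
import numpy as np

# Example raw weights, including values outside the accepted range.
raw_weights = np.array([0.05, 0.4, 1.0, 2.7, 8.3])

# Cap weights at the limits used in practice (lower 0.2, upper 5.0).
capped = np.clip(raw_weights, 0.2, 5.0)
print(capped)   # capped to [0.2, 0.4, 1.0, 2.7, 5.0]
```

Note that capping makes the weighted totals deviate slightly from the targets; that is the trade-off accepted for keeping individual weights within bounds.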

## Number of Acceptable Dimensions for Weighting

If the number of weighting dimensions is too high, any weighting procedure becomes computationally expensive, and the results may also be unstable. Uncle and WinCross limit weighting to 10 variables for this reason; in b³’s proprietary software, the limit is removed. It should be noted that although the current tool has no hard limit, using too many variables can still drive weights to the caps or make the procedure slow to converge. In some rare extreme cases the b³ tabulation team has used up to 18 variables for weighting. As a general guideline, around 10 variables can be considered the maximum number of weighting variables. If there are more than this, a two-step approach can be taken:

1. Check the weighting variables for correlations. If multiple variables are highly correlated, only one of them needs to be used.
2. Order the variables according to their importance, guided by the researchers, and use the top 10 variables for weighting.
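Step 1 can be sketched with a rough correlation screen. The variable names and the 0.8 cutoff below are illustrative, not from the text; for nominal variables, an association measure such as Cramér's V would be more appropriate than Pearson correlation on coded levels:

```python
import numpy as np

# Hypothetical coded responses for three candidate weighting variables.
rng = np.random.default_rng(0)
age = rng.integers(1, 6, size=500)       # 5 levels
region = rng.integers(1, 5, size=500)    # 4 levels, independent of age
age_band = (age + 1) // 2                # derived from age, so highly correlated

# Correlation matrix across the candidate variables (rows = variables).
corr = np.corrcoef(np.vstack([age, region, age_band]))
print(np.round(corr, 2))

# If |corr| between two variables is high (e.g. > 0.8), keep only one
# of them as a weighting dimension.
```

Here `age` and `age_band` would be flagged as near-duplicates, and only one of them would be retained.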

## Using Quotas with Weighting

If a survey has pre-determined quotas on some variables and has some other variable to perform the weighting on, then the quota variables can be used with the other weighting variables during the procedure. Since the iterative procedure adjusts the data to match the target proportions for all weighting variables, it can maintain the quota proportions within an acceptable margin of error.

## Sample Balance Weighting

In cases where sample sizes across segments or geographies need to be balanced, population weights are generated. Every record/respondent in the data set starts with a weight (Px) based on the formula below. This factor is inserted into rim weighting as another variable, ensuring not only that the demographics are balanced but also that the sample sizes line up with the requirements.

The most common methodology for balancing a sample across segments/geographies is to use a measure of central tendency (the average). This keeps the weight factors at an acceptable level, so the adjustments are not extreme.

## Weighting Efficiency Score (WES)

The weighting efficiency score is a metric used to determine the efficacy of a weighting algorithm. It is inversely correlated with the variance between the actual and target proportions, and is a numeric score between 0 and 100.

The score is calculated from the individual weights, where Wi denotes the weight of case i. The higher the efficiency score, the smaller the variance between target and actual proportions, and hence the smaller the weight factors. A higher score also indicates that the majority of weight factors are likely to fall in the optimum weighting range (0.8 to 1.2).

Weighting solutions that generate a low efficiency score (i.e. heavy weight factors) are generally not ideal and should be revisited. A score below 50 is a good indicator of extreme skews in the proportions and signals that the weighting setup needs to be revisited.
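The exact formula is not reproduced in the text; a commonly used definition of weighting efficiency (Kish's formula, based on the dispersion of the weights) is assumed in this sketch:

```python
import numpy as np

# Assumed definition (Kish's weighting efficiency, not confirmed by the text):
#   WES = 100 * (sum(W_i))^2 / (n * sum(W_i^2))
# where W_i is the weight of case i and n is the number of cases.
# Equal weights give 100; more dispersed weights give lower scores.
def weighting_efficiency(weights):
    w = np.asarray(weights, dtype=float)
    return 100.0 * w.sum() ** 2 / (w.size * (w ** 2).sum())

print(weighting_efficiency([1.0, 1.0, 1.0, 1.0]))        # 100.0
print(weighting_efficiency([0.2, 0.5, 1.0, 2.0, 5.0]))   # much lower score
```

Under this definition the score behaves as described: it falls as the weights become more dispersed, which happens when actual and target proportions diverge sharply.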

## Weighting Report

A report that shows the variance between the actual, target, and weighted proportions, along with the weighting efficiency score and the weight spread (minimum weight to maximum weight).

This report helps showcase the effectiveness of the weighting solution and serves as proof that the weighting scheme achieved the targets.