Data Mining Effective for Casing-Failure Prediction and Prevention


Recent casing failures in the Granite Wash play in the western Anadarko Basin have sparked deep concerns for operators in North Texas and Oklahoma. Hydrostatic tests made in the field show that current API standards do not assure adequate joint and bursting strength to meet deep-well requirements. This paper is part of an ongoing effort to minimize the likelihood of failure using data-mining and machine-learning algorithms.


Casing failure has long presented a challenge to the industry. The combined effects of design, dynamic borehole conditions, metallurgy, and handling have been difficult to quantify and predict accurately. Additionally, most casing-string problems have been handled reactively instead of proactively, and the total number of failures has been underreported and overlooked.

The authors focus on the effects of poor cement as a primary factor; this translates into the absence of cement in a case study presented in the complete paper. Additional factors are the pumping of corrosive acids and poor standardized casing design that does not account for varied formations along with cyclical temperatures.

Casing with partial cementation and sheaths with voids can contribute to excessive buckling-related collapse and tensile failures. Large pressure loadings, along with significant change in temperature, contribute to significant stresses in the intercasing annuli. In fragmentally cemented casing, tensile loading can show a great discrepancy between compression and high tension, with instances of failures in both the outer and inner strings. Additionally, cement thickening by downward flow could lack uniformity and could be prone to channeling. Air entrapment might occur, establishing bridges that hinder the process. Some authors in the literature related cementing failures with hole enlargements and washouts in long cement depths. The lack of cement support in those significant intervals exposed the casing to movement during drillpipe rotation, which triggered wear and ultimate buckling.


The data were descriptively visualized using methods such as box plots, mosaic plots, and trellis charts, while predictive techniques included artificial neural networks (ANNs) and boosted-ensemble trees. A statistical software package was used along with Python coding to implement the models and choose the most-significant factors contributing to failure.

Data-preprocessing techniques were implemented. The process began with data cleaning to account for missing data, remove the bias incurred by noise, and remove outliers. For missing values, multivariate normal imputation on the basis of all samples belonging to the same class was used. Then, several parameters from different databases were integrated. Data transformation involved standardizing each feature by subtracting its mean value and then dividing by its standard deviation. Categorical variables were converted to numerical values because models such as neural nets, regression, and nearest-neighbor accept only numeric inputs. The compiled data set comprised 78 wells. Because the sample is small, caution should be taken when assessing statistical significance.
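The standardization and categorical-encoding steps described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names, the feature names, and the sample values are hypothetical, the population standard deviation is an assumption (the article does not say which form was used), and the multivariate normal imputation step is not shown.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score a numeric column: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Convert categorical labels to 0/1 indicator columns, one per category."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


# Hypothetical values for illustration only, not taken from the paper's data set.
casing_depth_ft = [9500.0, 10200.0, 11000.0, 12300.0]
season = ["winter", "spring", "winter", "fall"]

z_scores = standardize(casing_depth_ft)
categories, encoded = one_hot(season)
```

After this transformation each numeric feature has zero mean and unit standard deviation, so features measured in different units contribute on a comparable scale to distance-based and gradient-based models.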

Fig. 1: Box plots for failed vs. offset wells depending on the fracturing start date. The first 6 months show a higher probability of failure.


Descriptive Statistics. Box Plot (One-Way). Box plots graph continuous variables and are a standardized method of showing a data distribution through a five-number summary: minimum, first quartile (25%), median, third quartile (75%), and maximum (within three standard deviations). The central rectangle spans the first quartile to the third quartile, a range known as the interquartile range. A line inside the rectangle marks the median, as illustrated in Fig. 1, while the “whiskers” above and below the box extend to the minimum and maximum values. The four quartiles are of varied sizes in both box plots. The lower whisker and the lower quartile for both plots are approximately the same size.
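The quantities a box plot draws can be computed directly. The sketch below uses Python's standard `statistics.quantiles`; the month values are hypothetical and the choice of the "inclusive" quartile method is an assumption, because the article does not say how the quartiles were calculated.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return the values a box plot displays, plus the interquartile range."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,  # height of the box
    }


# Hypothetical fracturing start months (1 = January) for a group of wells.
start_months = [1, 2, 2, 3, 4, 5, 6, 9, 11]
summary = five_number_summary(start_months)
```

Computing the summary separately for failed and offset wells gives the two boxes that a plot such as Fig. 1 compares side by side.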

Mosaic Plots. A mosaic plot is a visual representation of the cell frequencies in a contingency table. The area of each tile is proportional to its cell frequency and characterizes a conditional relative frequency. The diagram is a square partitioned into rectangular tiles and presents the association between two features. Each tile is color-coded to show how far its frequency deviates from the expected frequency (the residual), and the magnitude of the residual is represented by the color intensity of the tile. When a mosaic plot is read, each new feature category divides the boxes vertically or horizontally while the area of each box remains proportional to its frequency. If the two factors are independent, the gaps between the matching sets of boxes line up.

The mosaic plot indicates that the use of acid pumping contributed to a higher rate of casing failure. Wells that underwent acid-stimulation jobs, regardless of the type or quantity of acid, showed a failure rate of 35%, whereas a failure rate of only 22.41% was shown in acid-absent counterparts. The data show that the failure rate is highest in the winter season (41.67%), followed by spring (33.33%) and then summer (26.09%). Only 14.29% of failures occur in the fall. The data show an equal probability of failure regardless of cement use. Both cemented and uncemented hole sections provided a 12.82% failure rate. This finding is in disagreement with ANN and boosted-ensemble methods, which pinpoint cement absence as a major contributing factor.
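The numbers behind a mosaic plot come from a two-way contingency table. The sketch below computes the conditional failure rate for each category and the Pearson residual, (observed - expected) / sqrt(expected), which is one common choice for the color scale; the article does not state which residual its software used, so that is an assumption. The well counts are hypothetical toy values, not the paper's data.

```python
from collections import Counter
from math import sqrt


def failure_rates(counts):
    """Conditional failure rate per category from (category, failed) counts."""
    totals, failures = Counter(), Counter()
    for (category, failed), n in counts.items():
        totals[category] += n
        if failed:
            failures[category] += n
    return {category: failures[category] / totals[category] for category in totals}


def pearson_residuals(counts):
    """Residual of each cell against the frequency expected under independence."""
    row_totals, col_totals = Counter(), Counter()
    for (category, failed), n in counts.items():
        row_totals[category] += n
        col_totals[failed] += n
    grand_total = sum(counts.values())
    residuals = {}
    for category in row_totals:
        for failed in col_totals:
            expected = row_totals[category] * col_totals[failed] / grand_total
            observed = counts[(category, failed)]
            residuals[(category, failed)] = (observed - expected) / sqrt(expected)
    return residuals


# Hypothetical toy counts: (acid treatment, failed?) -> number of wells.
counts = Counter({
    ("acid", True): 4,
    ("acid", False): 6,
    ("no_acid", True): 3,
    ("no_acid", False): 12,
})

rates = failure_rates(counts)
residuals = pearson_residuals(counts)
```

A positive residual in the (acid, failed) cell means more failures were observed than independence would predict, which is the pattern a darker tile signals in the plot.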

Trellis Chart. A chart is created for each level of both categorical and numerical variables. Trellis charts are extremely effective in discovering relationships with multivariate data. They reveal that higher casing-failure probability occurs with lower amounts of proppant mass. Also, they indicate that there are more failures during the cooler months.

Predictive Analytics. Artificial Neural Networks (ANN). ANNs are considered among the most widespread and efficient artificial-intelligence tools. The concept was inspired by biological neurons such as those in the human brain. An ANN learns by processing samples of data and evaluating input/output relationships. During training, the network output is repeatedly compared with the desired output. The difference between the two is fed back to the neurons, which are adjusted to minimize the error. This process continues iteratively until the network output converges to the desired output.
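The feedback loop described above can be illustrated with the simplest possible case, a single sigmoid neuron trained by stochastic gradient descent. This is a deliberately reduced sketch and not the network architecture used in the paper, which the article does not specify. The two binary inputs (acid pumped, cement present), the training samples, the learning rate, and the epoch count are all hypothetical.

```python
from math import exp


def sigmoid(x):
    """Logistic activation that maps any real number into (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


def predict(weights, bias, features):
    """Predicted failure probability for one sample."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_neuron(samples, targets, learning_rate=0.5, epochs=2000):
    """Fit one sigmoid neuron by feeding the output error back into the weights."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in zip(samples, targets):
            output = predict(weights, bias, features)
            # Gradient of the cross-entropy loss with respect to the pre-activation.
            error = output - target
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical samples: [acid_pumped, cement_present] -> failed (1) or not (0).
samples = [[1, 0], [1, 1], [0, 0], [0, 1]]
targets = [1, 0, 0, 0]

weights, bias = train_neuron(samples, targets)
probability = predict(weights, bias, [1, 0])
```

A practical network adds hidden layers and propagates the same error signal backward through them, but the compare-and-correct cycle is the same.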

The neural network is used for prediction. Candidate models are compared, and the one that predicts the best outcome is then used. The bases for making these comparisons are the confusion matrix, receiver-operating-characteristic and lift curves, and various measures of forecast accuracy for continuous variables.
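As an illustration of the first of these comparison tools, the sketch below builds a binary confusion matrix and derives accuracy, sensitivity, and specificity from it. The labels are hypothetical (1 = failed well, 0 = intact well) and the function names are illustrative rather than taken from the paper.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary labels."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == 1 and p == 1),
        "fn": sum(1 for a, p in pairs if a == 1 and p == 0),
        "fp": sum(1 for a, p in pairs if a == 0 and p == 1),
        "tn": sum(1 for a, p in pairs if a == 0 and p == 0),
    }


def summarize(cm):
    """Derive common accuracy measures from the confusion-matrix counts."""
    total = cm["tp"] + cm["fn"] + cm["fp"] + cm["tn"]
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total,
        # Share of actual failures that the model caught (true-positive rate).
        "sensitivity": cm["tp"] / (cm["tp"] + cm["fn"]),
        # Share of intact wells that the model correctly left unflagged.
        "specificity": cm["tn"] / (cm["tn"] + cm["fp"]),
    }


# Hypothetical labels for eight wells.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1, 0]

cm = confusion_matrix(actual, predicted)
scores = summarize(cm)
```

Sensitivity and one minus specificity, evaluated at a range of probability thresholds, give the points of a receiver-operating-characteristic curve.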

Boosted Ensemble. A boosted ensemble is a supervised-learning method that can be applied to both regression and classification problems. The outputs are based on addition rather than on averaging techniques. The individual trees do not attempt to forecast the response directly. Instead, each tree is fitted to the gradient of the objective function to correct the errors of the preceding iterations. The model begins by assigning preliminary values to this function and then generates a predictive gradient for refining the results. The succeeding iteration reflects both the corrections and the initial values, then searches for the next gradient to improve the prediction further. The algorithm stops when the predicted output matches the actual values or when an iteration limit is reached. The final prediction is obtained by summing the weighted contributions of the individual models.
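The additive, error-correcting behavior described above can be sketched with one-split trees (stumps) and a squared-error objective, for which the negative gradient is simply the residual. This is a simplified regression variant on a single feature, not the boosted-tree model from the paper; the data, the learning rate, the round limit, and the tolerance are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error on the residuals."""
    best = None
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=50, learning_rate=0.1, tolerance=1e-6):
    """Build an additive model in which each stump corrects the previous error."""
    base = sum(ys) / len(ys)  # preliminary value: the mean response
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):  # iteration limit
        # Negative gradient of the squared-error objective.
        residuals = [y - p for y, p in zip(ys, predictions)]
        if max(abs(r) for r in residuals) < tolerance:
            break  # predictions already match the actual values
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps


def boosted_predict(model, x):
    """Sum the base value and the weighted contribution of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


# Hypothetical data: fracturing start month -> failed (1) or not (0).
months = [1, 2, 3, 4, 5, 6]
failed = [1, 1, 1, 0, 0, 0]

model = boost(months, failed)
early_score = boosted_predict(model, 2)
late_score = boosted_predict(model, 5)
```

The learning rate plays the role of the weighting factor: each stump contributes only a fraction of its correction, so the ensemble approaches the target gradually over many rounds.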


  • Box plots showed that fracturing during the first 6 months of a year carries a higher probability of failure.
  • Mosaic plots showed that failure is most common in winter. Also, the data displayed an equal probability of failure regardless of cement use. The highest failure rate occurred in the Cottage Grove, Hogshooter, Checkerboard, and Cleveland formations. Results indicated that the use of acid pumping contributed to a higher rate of casing failure.
  • According to the ANN model, 12 3D contour plots displayed the following:
    • Smaller amounts of proppant, combined with an increased time from drilling to fracturing operations, had higher chances of casing failure.
    • Sudden failure occurred at very low proppant concentrations.
    • Reduced proppant mass, accompanied by a lower casing setting depth, increased the likelihood of casing failure.
    • A shallower casing setting depth caused a high likelihood of casing failure.
    • The probability of failure is higher when acid is pumped, while the probability of failure decreases with decreasing hole size.
    • A higher mean temperature and bottomhole temperature (BHT) increased the probability of success.
    • The probability of failure is high when the pipe shrinkage length is between 4 and 7 ft.
    • The probability of success is highest when the well is cemented with no acid pumping.
    • The highest likelihood of failure exists with acid presence and cement absence.
    • The probability of failure is highest in winter, followed by spring, and high BHT contributed to higher failure chances in summer and fall.
    • The probability of success decreased with increased hole size and decreased BHT.
    • The probability of failure decreased as base water increased and BHT was not a controlling parameter in that case.
    • The probability of failure decreased in hole sizes between 6 and 6.5 in. The probability of failure was reduced with cement presence and a higher BHT.
  • Boosted-tree approaches suggested that six features had the highest contribution to failure, from highest to lowest ranking:
    • Acid presence
    • Fracturing start month
    • Cement presence
    • Fracturing season
    • Maximum inclination
    • Cumulative dogleg severity in lateral plus build section.
  • For field casing-pipe racks, apply a series of pressure tests before running the casing.
This article, written by JPT Technology Editor Chris Carpenter, contains highlights of paper IPTC 19311, “Data-Mining Approaches for Casing-Failure Prediction and Prevention,” by Christine Noshi, SPE, Samuel Noynaert, SPE, and Jerome Schubert, SPE, Texas A&M University, prepared for the 2019 International Petroleum Technology Conference, Beijing, 26–28 March. The paper has not been peer reviewed. Copyright 2019 International Petroleum Technology Conference. Reproduced by permission.


01 July 2019

Volume: 71 | Issue: 7
