| Feedback | Index to site | Removing postcodes from addresses (Access) |

An Excel user defined function to extract characters from text

If you import a text file of data into an Excel worksheet you may wish to extract a particular range of characters from each record. For example if each data record (or row) looks something like 'A100 John 23.5 South' you may wish to extract the number (23.5) which is the third out of four items in the string. As always, there are several ways of doing this.

The most obvious method might be to parse the data. The ribbon option [Data] Data Tools | Text to Columns starts a wizard which will guide you through the process of splitting up a range of data into columns. The wizard prompts you to describe the character which separates each piece of data within the string - in the example above that would be a space. The data is then rewritten on your worksheet, broken up over as many columns as required.

- A B
31 Example 1  
32 Data Extract
33 A100 ABP 0017 F ABP
34 B231 CIM 0041 M CIM
35 C031 HCC 1032 F HCC

[B33] = Mid(A33, 6, 3)

You may however choose to retain the data in its original joined format. If the data you want to extract is always in exactly the same place in each row you can simple use the MID() function to obtain it. The function takes the format =MID(TextRef, StartPosition, NumberOfCharacters) where TextRef is a cell reference containing the text; StartPosition is the character number at which you wish to start extracting; NumberOfCharacters is the length of the element you wish to read from the text.

- A B
37 Example 2  
38 Data Extract
39 A100;AB;17;F 17
40 B231;CIM;241;M 241
41 C031;HCC;1032;F 1032

[B39] =MID(A39, FIND(";", A39, 7)+1,
FIND(";", A39, FIND(";", A39,
7)+1)-FIND(";", A39, 7)-1)

If the item is not always in exactly the same character position within each text string you can incorporate the FIND() function as parameters within the MID() function. This method works if you have a reasonable idea about where the required text is found but it quickly becomes a long and complicated formula.

The FIND() function takes the format =FIND(SearchItem, TextString, StartPos) where SearchItem is the character(s) you wish to find within the TextString, starting your search at the character in position StartPos. It returns a number representing the position at which the SearchItem is found.

Would you like to learn more?


Excel 2010 Formulas by John Walkenbach (with CD)

Note that FIND() is case sensitive which means that searching for 'abc' will not recognise 'ABC'. To perform the same task without case sensitivity, use the similar SEARCH() function.

You may find however that the word or item you wish to extract is not in a regular position and not of a uniform length. Also the positioning may need to be counted from the end rather than the beginning (e.g. the penultimate word in a string containing a varied number of words). In these circumstances the built in Excel functions probably can not cope and some programming is required.

The final example applies a User Defined function (a visual basic macro) to extract a particular word or series of characters from a text string. It takes the form: =MYEXTRACT(TextString, WordNumber, FrontOrBack, Separator) where the TextString is a phrase of words or items; WordNumber is a count of words or items from the front or end of the phrase; FrontOrBack is the character 'F' or 'B' to signify which end of the phrase you are counting from; the optional Separator represents the character which separates each word or item. In this example it is a space but it could be defined as a comma or semi colon.

- A B
43 Example 3  
44 Data Extract
45 Abby Monday 8 Surgery 8
46 Mrs Plumstead Friday 30.5 Pain 30.5
47 Dr Jones Wednesday 103 Trauma 103

[B45] =MyExtract(A45, 2, "B", " ")

In example 3 the function takes the phrase in column A and extracts the second word from the end, where spaces separate each item. You may wish to apply the VALUE() function to convert the text result to a number.

The VBA macro is shown below. This must be copied into a visual basic module.

free download
Function MyExtract(MyText As String, ItemNo As Integer, FrontOrBack As String, Optional MySeparator As String) As String

' This function will scan a string containing at least TWO words, each separated
' by a specified character and extract one of the words. It can extract the word
' (or SubString) counting from either the Back or Front of the text.
' []
Dim LenText As Integer, n As Integer, CountSpaces As Integer
Dim MySt As Integer, MyFin As Integer, MyStep As Integer, Mk1 As Integer, Mk2 As Integer

' MySeparator was an optional parameter
If Len(MySeparator) = 0 Then MySeparator = " "

LenText = Len(MyText)
' You cannot extract a word if length is LT 3 chars
If LenText < 3 Then
MyExtract = "*"
GoTo MyEndBit
End If

' set the direction in which the text is examined
If UCase(FrontOrBack) = "F" Then
MySt = 2
MyFin = LenText - 1
MyStep = 1
MyFin = 2
MySt = LenText - 1
MyStep = -1
End If

' identify the position of characters matching the separator
For n = MySt To MyFin Step MyStep
If Mid(MyText, n, 1) = MySeparator Then
CountSpaces = CountSpaces + 1
If CountSpaces = ItemNo - 1 Then Mk1 = n
If CountSpaces = ItemNo Then Mk2 = n
End If
Next n

If CountSpaces = 0 Then
MyExtract = "*"
GoTo MyEndBit
End If

If UCase(FrontOrBack) = "B" Then
n = Mk1
Mk1 = Mk2
Mk2 = n
End If

If Mk2 = 0 Then Mk2 = LenText + 1
Mk1 = Mk1 + 1

MyExtract = Mid(MyText, Mk1, Mk2 - Mk1)
End Function

There is also a visual basic function for splitting a long string of text into several shorter lengths with appropriate breaks at the ends of words. Link to MakeStr()

excel Get this information as a document
accompanied by Excel worksheets
Document is in PDF format Click here for details about obtaining this
file. It has been rewritten for Excel 2010.

file: XLExtractText.htm 2014 Page last updated Jan14 CMIDX S3 P6 Y