I need a regular expression that I can use in VBScript and .NET that will return only the numbers that are found in a string.
For example, any of the following strings should return only 1231231234:
- 123 123 1234
- (123) 123-1234
- 123-123-1234
- (123)123-1234
- 123.123.1234
- 123 123 1234
- 1 2 3 1 2 3 1 2 3 4
This will be used in an email parser to find telephone numbers that customers may provide in the email and do a database search.
I may have missed a similar regex but I did search on regexlib.com.
[EDIT] - Added code generated by RegexBuddy after setting up musicfreak's answer
VBScript Code
Dim myRegExp, ResultString
Set myRegExp = New RegExp
myRegExp.Global = True
myRegExp.Pattern = "[^\d]"
ResultString = myRegExp.Replace(SubjectString, "")
VB.NET
Dim ResultString As String
Try
    Dim RegexObj As New Regex("[^\d]")
    ResultString = RegexObj.Replace(SubjectString, "")
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try
C#
string resultString = null;
try {
    Regex regexObj = new Regex(@"[^\d]");
    resultString = regexObj.Replace(subjectString, "");
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
-
Have you gone through the phone number category on regexlib? Quite a few of the patterns there seem to do what you need.
-
By the looks of things, you're trying to catch any 10-digit phone number.
Why not first do a string replace on the text to remove any of the following characters?
<SPACE> , . ( ) - [ ]
Afterwards, you can just do a regex search for a 10-digit number:
\d{10}
Brian Boatright : That is what's in place, but I wanted to make it match a wider range of input strings.
-
I don't know if VBScript has some kind of a "regular expression replace" function, but if it does, then you could do something like this pseudocode:
reg_replace(/\D+/g, '', your_string)
I don't know VBScript, so I can't give you the exact code, but this would remove anything that is not a digit.
EDIT: Make sure to have the global flag (the "g" at the end of the regexp), otherwise it will only match the first non-number in your string.
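For illustration only, here is the same replace-all-non-digits idea as a small runnable Python sketch. Python is not one of the languages asked about, and the name digits_only is made up for this example. Python's re.sub replaces every match by default, so there is no separate global flag to set. re.ASCII is passed so that \D means "anything other than 0-9"; without it, Python's \d would also accept other Unicode decimal digits.

```python
import re

# Compiled once at import time so repeated calls reuse the same pattern object.
# re.ASCII restricts \d / \D to the characters 0-9.
NON_DIGITS = re.compile(r"\D+", re.ASCII)


def digits_only(text):
    """Return text with every run of non-digit characters removed."""
    return NON_DIGITS.sub("", text)


if __name__ == "__main__":
    for sample in ["(123) 123-1234", "123.123.1234", "1 2 3 1 2 3 1 2 3 4"]:
        print(digits_only(sample))
```

Every sample string from the question should come out as 1231231234 with this approach.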
Brian Boatright : Thanks! That's exactly what I was looking to do. I knew it had to be somewhat simple. I'm using RegExBuddy and will try to test it and then post the VBScript code. I believe VBScript will do a replace.
Matthew Flaschen : If you want to do it with .NET classes, it's basically re = Regex("\D"); re.Replace("123 123 1234", ""). Remember to cache your Regex objects (don't compile them every time the method is called).
-
Just installing C# Express so I can test this code, but in .NET, couldn't you simply extract just the digits from the string? Something like this:
string justNumbers = new String(text.Where(Char.IsDigit).ToArray());
Brian Boatright : That's very cool.
Matt Hamilton : PS: I know I've answered a VB question with C#, but since it's .NET I figured it's worth putting the idea out there. RegEx seems like overkill for something this simple.
Brian Boatright : I actually needed VBScript to use in a Classic ASP page, but I appreciate your answer.
Matthew Flaschen : I was about to post a comment along the lines of, "/Clearly/, regex would be faster for this", but I ran an (unscientific) benchmark in Mono, and Linq won (about half the duration the regex took). :) So my hat is off to you.
Mohamed : That is an elegant piece of code.
-
Note: you've only solved half the problem here.
For US phone numbers entered "in the wild", you may have:
- Phone numbers with or without the "1" prefix
- Phone numbers with or without the area code
- Phone numbers with extension numbers (if you blindly remove all non-digits, you'll miss the "x" or "Ext." or whatever also on the line).
- Possibly, numbers encoded with mnemonic letters (800-BUY-THIS or whatever)
You'll need to add some smarts to your code to conform the resulting list of digits to a single standard that you actually search against in your database.
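As one example of such "smarts", the mnemonic case listed above can be handled by mapping letters to digits using the standard telephone keypad groups before stripping non-digits. This is only a rough Python sketch; the names KEYPAD_GROUPS, LETTER_TO_DIGIT and mnemonic_to_digits are invented for illustration. Note that it converts every letter it sees, so it would also turn the "x" of an extension marker into a 9; it should only be applied once extensions have been dealt with, or when the string is known to be a mnemonic number.

```python
# Standard telephone keypad letter groups.
KEYPAD_GROUPS = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}

# Flatten the groups into a letter -> digit lookup table.
LETTER_TO_DIGIT = {
    letter: digit
    for digit, letters in KEYPAD_GROUPS.items()
    for letter in letters
}


def mnemonic_to_digits(text):
    """Replace keypad letters (either case) with digits; leave other characters alone."""
    return "".join(LETTER_TO_DIGIT.get(ch.upper(), ch) for ch in text)


if __name__ == "__main__":
    print(mnemonic_to_digits("800-BUY-THIS"))  # 800-289-8447
```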
Some simple things you could do to fix this:
- Before the RegEx removal of non-digits, see if there's an "x" in the string. If there is, chop off everything after it (this will handle most ways of writing an extension number).
- For any number with 10+ digits beginning with a "1", chop off the 1. It's not part of the area code; US area codes start in the 2xx range.
- For any number still exceeding 10 digits, assume the remainder is an extension of some sort, and chop it off.
- Do your database search using an "ends-with" pattern search (SELECT * FROM mytable WHERE phonenumber LIKE '%blah'). This will handle situations (although with the possibility of error) where the area code is not provided, but your database has the number with the area code.
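The steps above can be sketched roughly as follows in Python. Several things here are assumptions made for illustration: the function names normalize_us_phone and find_by_suffix are hypothetical, sqlite3 stands in for whatever database is actually used, and "10+ digits" is read as "more than 10 digits" so that a bare 10-digit number is left alone. The table and column names come from the example query. An ends-with search puts the wildcard before the digits (LIKE '%...').

```python
import re
import sqlite3


def normalize_us_phone(raw):
    """Reduce a free-form US phone string to at most 10 digits (illustrative only)."""
    # Step 1: cut at the first "x" (covers "x55", "Ext. 55", "ext 55").
    cut = raw.lower().find("x")
    if cut != -1:
        raw = raw[:cut]
    # Strip everything that is not 0-9.
    digits = re.sub(r"[^0-9]", "", raw)
    # Step 2: drop a leading "1" country prefix (assumption: only when > 10 digits).
    if len(digits) > 10 and digits.startswith("1"):
        digits = digits[1:]
    # Step 3: anything past 10 digits is treated as an extension and dropped.
    return digits[:10]


def find_by_suffix(conn, digits):
    """Return stored numbers that end with the given digits."""
    if not digits:
        return []  # an empty suffix would match every row
    # The "%" goes BEFORE the digits so LIKE matches on the ending.
    cur = conn.execute(
        "SELECT phonenumber FROM mytable WHERE phonenumber LIKE ?",
        ("%" + digits,),
    )
    return [row[0] for row in cur.fetchall()]
```

With a table holding 1231231234, a customer who writes only "123-1234" would still match, with the risk already noted that the same seven digits exist under more than one area code.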
Brian Boatright : True. I did add something after the regex that returned the entire string if it was 10 digits, or right(string,10) if it was longer. Your last suggestion is a good one and something I will add. Thanks! +1