Thursday, April 28, 2011

Regular expression to find and replace a string in an XML file

I'm looking for a single regular expression that matches a string in three specific cases in an XML file:

  1. The string is surrounded by double quotes.
  2. The string is preceded by > and followed by <.
  3. The string is preceded by ; and followed by &.

Example:

  • "MyString" - Valid match
  • >MyString< - Valid match
  • ;MyString& - Valid match

Any other combination of delimiters is an invalid match.

  • "MyString< - Invalid match
  • ;MyString" - Invalid match

    From stackoverflow
    • Try this: ("MyString")|(>MyString<)|(;MyString&)

      victor hugo : I think he means any string between " ", > < or ; &
      Brian : He can replace "MyString" with whatever he wants, even another regular expression. He used MyString, so I did too.
    • You cannot use regex to parse XML; it is not a regular grammar. Use an XML parser, seriously.

      When you're using your parser to inspect text node values, then and only then might you want to use (\".*?\")|(>.*?<)|(;.*?&), but I doubt you'll find the problem is framed the same way. >MyString< is very suspicious.

      Cerebrus : I understand that this is a common refrain, but nevertheless you should change the "You cannot" part to "You should not"! ;-)
      annakata : I see what you're saying but on what technical grounds? You *cannot* do this with any reliability. "should not" implies that actually sometimes it's ok if you throw an unhandled exception.
      patjbs : In some situations a "quick and dirty" solution outweighs a "clean and polished" one. Regex is perfectly acceptable for processing text (XML formatted or not) in some situations.
      Brian : Especially if the xml might be malformed. XML parsers hate malformed XML. Regular expressions don't care.
      Francis B. : I agree with you on this point, and when I need to process an XML file I use an XML parser. In this case, I'm receiving a bunch of differently formatted XML files (I don't know the format) and I need to convert them for another project. The only thing I know is that the strings to convert can be in an attribute or in the inner text of an element. For example, every Plant.Unit1.Current needs to be changed to Plant.Unit2.Current.
      Francis B. : So, I'm not parsing any XML in this case; I'm just doing a simple search and replace.
      Brian : Regex may be overkill and slower compared to just using the Replace member function of string.
      Francis B. : Thanks Brian, I will check that.
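
The two approaches discussed in the thread can be compared with a minimal sketch. It is written in Python purely for illustration, since the thread does not say which language the asker uses. The names OLD, NEW, PATTERN and replace_delimited are hypothetical; only the identifiers Plant.Unit1.Current and Plant.Unit2.Current and the three delimiter pairs come from the posts above.

```python
import re

# Example identifiers taken from the thread; the constant names are hypothetical.
OLD = "Plant.Unit1.Current"
NEW = "Plant.Unit2.Current"

# One alternative per allowed delimiter pair: "..."  >...<  ;...&
# re.escape keeps the dots in OLD literal instead of "any character".
PATTERN = re.compile(r'"{0}"|>{0}<|;{0}&'.format(re.escape(OLD)))


def replace_delimited(text):
    """Swap OLD for NEW only where it sits between a matching delimiter pair."""
    def swap(match):
        found = match.group(0)
        # Keep the first and last characters (the delimiters) untouched.
        return found[0] + NEW + found[-1]
    return PATTERN.sub(swap, text)


if __name__ == "__main__":
    sample = ('<tag name="Plant.Unit1.Current">Plant.Unit1.Current</tag> '
              '"Plant.Unit1.Current<')
    # Delimiter-aware replacement: the mismatched trailing occurrence is left alone.
    print(replace_delimited(sample))
    # Plain string replacement, as suggested in the last comments:
    # every occurrence changes, whatever surrounds it.
    print(sample.replace(OLD, NEW))
```

If every occurrence should change regardless of what surrounds it, the plain replace call is the simpler choice. The regular expression is only worth its cost when mismatched delimiters such as "MyString< must be left untouched.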
