In my previous post I mentioned several ways of defining regular expressions in Groovy. Here I want to show how we can use Groovy regexes to find data in files.
Parsing a properties file (simplified) [1]
Data: each line in the file has the same structure; the entire line can be matched by a single regex.
Task: transform each line into an object.
Solution: construct a regex with capturing parentheses, apply it to each line, and extract the captured data.
Demonstrates: File.eachLine method, matrix syntax of the Matcher object.
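A minimal sketch of this approach might look like the following. The sample data, the temp file, and the names `entry` and `props` are illustrative assumptions, and the pattern deliberately ignores the escapes and line continuations that real properties files allow.

```groovy
// Hypothetical sample data, written to a temp file so the sketch is self-contained.
def file = File.createTempFile('sample', '.properties')
file.deleteOnExit()
file.text = '''\
name = Groovy
version=1.6
# a comment line that the regex does not match
home : /opt/groovy
'''

// Simplified pattern: key, then '=' or ':', then value. Two capturing groups.
def entry = ~/^\s*([^#=:\s]+)\s*[=:]\s*(.*?)\s*$/

def props = [:]
file.eachLine { line ->
    def m = entry.matcher(line)
    if (m.matches()) {
        // Matrix syntax: m[0] is the first match as a list
        // [whole match, group 1, group 2], so m[0][1] is the key.
        props[m[0][1]] = m[0][2]
    }
}

println props
```

Lines that do not match the pattern, such as the comment line, are simply skipped.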
Parsing CSV files (simplified) [2]
Data: each line in the file has the same structure; the line consists of blocks separated by some character sequence.
Task: transform each line into a list of objects.
Solution: construct a regex with capturing parentheses, then parse each line with the regex in a loop, extracting the captured data.
Demonstrates: ~// Pattern definition, \G regex meta-sequence.
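The loop could be sketched roughly as below. The sample rows and the names `field`, `parseLine`, and `rows` are assumptions made for illustration, and the pattern is a simplified one that ignores escaped quotes and other edge cases.

```groovy
// Hypothetical sample rows; in practice they would come from File.eachLine.
def data = '''\
1,"Smith, John",admin
2,Ann,
'''

// ~/.../ defines a Pattern. \G anchors each match where the previous one ended,
// so fields are consumed left to right with no gaps between them.
def field = ~/\G(?:^|,)(?:"([^"]*)"|([^",]*))/

def parseLine = { String line ->
    def fields = []
    def m = field.matcher(line)
    while (m.find()) {
        // Group 1 holds a quoted field, group 2 an unquoted one.
        fields << (m.group(1) != null ? m.group(1) : m.group(2))
    }
    fields
}

def rows = data.readLines().collect(parseLine)
println rows
```

The explicit null check is used instead of the Elvis operator because an empty quoted field is a valid value, and an empty string is falsy in Groovy.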
Finding snapshot dependencies in the POM (simplified) [3]
Data: the file contains blocks with known boundaries (possibly spanning multiple lines).
Task: extract the blocks satisfying some criteria.
Solution: read the entire file into a string, construct a regex with capturing parentheses, and apply the regex to the string in a loop.
Demonstrates: File.text property, list syntax of the Matcher object, named capture, global (?x) regex modifier, local (?s) regex modifier.
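One way to sketch this is shown below. The POM fragment, the temp file, and the group names `artifact` and `version` are made up for illustration. The sketch assumes Java 7 or later for named groups, and it uses a negative lookahead (my addition, not necessarily the original technique) to keep a match from running past the end of a dependency block.

```groovy
// Hypothetical POM fragment, written to a temp file so File.text can be shown.
def pomFile = File.createTempFile('pom', '.xml')
pomFile.deleteOnExit()
pomFile.text = '''\
<project>
  <dependencies>
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>core</artifactId>
      <version>1.0</version>
    </dependency>
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>web</artifactId>
      <version>2.1-SNAPSHOT</version>
    </dependency>
  </dependencies>
</project>
'''

// Read the whole file into one string.
def pom = pomFile.text

def dependency = ~'''(?x)                     # global modifier: free-spacing mode with comments
    <dependency>
    (?s: (?: (?! </dependency> ) . )*? )      # local dotall: skip text without leaving the block
    <artifactId> (?<artifact> [^<]+ ) </artifactId>
    (?s: (?: (?! </dependency> ) . )*? )
    <version> (?<version> [^<]* -SNAPSHOT ) </version>
'''

def m = pom =~ dependency
def snapshots = []
while (m.find()) {
    // Named groups via the Java API; m.group(1) and m.group(2) refer to the same groups.
    snapshots << (m.group('artifact') + ':' + m.group('version'))
}

println snapshots
```

Because the dotall flag is scoped with `(?s: ... )`, the dot crosses line breaks only inside those groups; everywhere else the pattern stays line-oriented.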
Finding stacktraces in the log
Data: the file contains entries, each of which starts with the same pattern and can span multiple lines. A typical example is a log4j log file:
2009-10-16 15:32:12,157 DEBUG [com.ndpar.web.RequestProcessor] Loading user
2009-10-16 15:32:13,258 ERROR [com.ndpar.web.UserController] id to load is required for loading
java.lang.IllegalArgumentException: id to load is required for loading
    at org.hibernate.event.LoadEvent.<init>(LoadEvent.java:74)
    at org.hibernate.event.LoadEvent.<init>(LoadEvent.java:56)
    at org.hibernate.impl.SessionImpl.get(SessionImpl.java:839)
    at org.hibernate.impl.SessionImpl.get(SessionImpl.java:835)
    at org.springframework.orm.hibernate3.HibernateTemplate$1.doInHibernate(HibernateTemplate.java:531)
    at org.springframework.orm.hibernate3.HibernateTemplate.doExecute(HibernateTemplate.java:419)
    at org.springframework.orm.hibernate3.HibernateTemplate.executeWithNativeSession(HibernateTemplate.java:374)
    at org.springframework.orm.hibernate3.HibernateTemplate.get(HibernateTemplate.java:525)
    at org.springframework.orm.hibernate3.HibernateTemplate.get(HibernateTemplate.java:519)
    at com.ndpar.dao.UserManager.getUser(UserManager.java:90)
    ... 62 more
2009-10-16 15:32:14,659 DEBUG [com.ndpar.jms.MessageListener] Received message: ... multi-line message ...
2009-10-16 15:32:15,169 INFO [com.ndpar.dao.UserManager] User: ...
Task: find entries satisfying some criteria.
Solution: read the entire file into a string [4], construct a regex with capturing parentheses and lookahead, split the string into entries, then loop through the result and apply the criteria to each entry.
Demonstrates: regex interpolation, combined global regex modifiers
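A rough sketch of the split-and-filter idea follows. The shortened log sample and the names `timestamp`, `entries`, `withStacktrace`, and `failingClasses` are assumptions for illustration. The criterion used here, an ERROR entry that contains a stack frame line, is just one possible choice.

```groovy
// Shortened sample log; in practice this would be new File(path).text.
def log = '''\
2009-10-16 15:32:12,157 DEBUG [com.ndpar.web.RequestProcessor] Loading user
2009-10-16 15:32:13,258 ERROR [com.ndpar.web.UserController] id to load is required for loading
java.lang.IllegalArgumentException: id to load is required for loading
\tat org.hibernate.event.LoadEvent.<init>(LoadEvent.java:74)
\t... 62 more
2009-10-16 15:32:15,169 INFO [com.ndpar.dao.UserManager] User: ...
'''

// Reusable regex fragment, interpolated into the patterns below.
def timestamp = /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}/

// Split before every line that starts with a timestamp (zero-width lookahead).
// findAll drops the empty leading element that older JVMs may produce.
def entries = log.split(/(?m)^(?=$timestamp)/).findAll { it }

// Combined modifiers: (?m) anchors ^ at line starts, (?s) lets .* span lines.
def withStacktrace = ~/(?ms)^$timestamp\s+ERROR\s+\[(.+?)\].*^\s+at\s/

def failingClasses = []
entries.each { entry ->
    def m = withStacktrace.matcher(entry)
    // Group 1 captures the logger name inside the square brackets.
    if (m.find()) failingClasses << m.group(1)
}

println failingClasses
```

Splitting on a lookahead keeps the timestamp as part of each entry, since the match itself consumes no characters.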
1. This example is for demonstration purposes only. In a real program you would just use
2. The regex is simplified. If you want the real one, take a look at Jeffrey Friedl’s example.
3. Again, in reality you would find snapshots using the mvn dependency:resolve | grep SNAPSHOT command.
4. This approach won’t work for big files. Take a look at this script for a practical solution.