I am currently implementing a sqlite dll extension for having the regexp replace...
it is so useful that I decided to share it with you all.
first of all, I went with boost + icu because I met some issues with Qt. I still didn't find any workaround for using Qt...
I am working on a mac os x 10.6.3 and I don't know if it works on other platform, maybe yes, maybe not... Let me know :-)
I made a c++ file that I named sqliteextensions.cpp
#include "/usr/local/include/sqlite3ext.h"
SQLITE_EXTENSION_INIT1
#include <unicode/unistr.h>
#include <boost/regex/icu.hpp>
using namespace std;
extern "C" void sqlite3_regexp_replace(sqlite3_context *context, int argc, sqlite3_value **argv){
UnicodeString column_value_uft8 = UnicodeString::fromUTF8((const char*)sqlite3_value_text(argv[0]));
const unsigned char* pattern_utf8 = sqlite3_value_text(argv[1]);
UnicodeString pattern_replace_utf8 = UnicodeString::fromUTF8((const char*)sqlite3_value_text(argv[2]));
const boost::u32regex e = boost::make_u32regex(pattern_utf8); //from docs: UTF-8 when char size = 1
UnicodeString us = boost::u32regex_replace(column_value_uft8, e, pattern_replace_utf8);
string cs;
us.toUTF8String(cs);
sqlite3_result_text( context, cs.c_str(), -1, SQLITE_TRANSIENT );
}
extern "C" int sqlite3_extension_init(
sqlite3 *db,
char **pzErrMsg,
const sqlite3_api_routines *pApi
){
SQLITE_EXTENSION_INIT2(pApi)
sqlite3_create_function(db, "regexp_replace", 3, SQLITE_UTF8, 0, sqlite3_regexp_replace, 0, 0);
return 0;
}
then you compile this file using the command line below:
g++ -O3 -shared src/sqliteextensions.cpp \
-arch x86_64 -Xarch_x86_64 \
-framework CoreFoundation \
-L/usr/local/lib/ -lboost_regex \
-licudata -licui18n -licuio -licule -liculx -licutu -licuuc \
-o ../libsqliteextensions.dylib
now run sqlite3 (for me located in /usr/local/bin/ to have the latest version that I installed)
and type this command:
.load ../../libsqliteextensions.dylib
now you can use the regexp_replace function like:
update mytable set myfield = regexp_replace(myfield,'^[ \t]+','')
... and much more ... That's really great, isn't it?
it made my day :-)
I forgot to mention that I used ICU 4.2 and boost 1.43.
boost has been compiled with ICU, to make the regex lib aware of unicode.
Write to you soon.
Sylvain