Saturday, November 8, 2008

Multiattribute querying with Boost.MultiIndex

We saw in a prior entry how to define database indexes on a set of columns so that all possible queries involving clauses of the form columni = valuei are accelerated by some index. We see now how to model this situation for an in-memory C++ container, which gives us the opportunity to use Boost.MultiIndex and show some template and preprocessor metaprogramming in action.

So, suppose we have the following struct:

‎struct element
‎{
‎ int x1,x2,x3,x4;
‎};
and we are assigned the task of designing a container of elementss along with efficient querying functions of the form:
‎typedef ... container;

‎template<int N>
‎struct field
‎{
‎ field(int x):x(x){}
‎ int x;
‎};

‎template<int N0>
‎const element* find(const container& c,const field<N0>& f0);

‎...

‎template<int N0,int N1,int N2,int N3>
‎const element* find(
‎ const container& c,
‎ const field<N0>& f0,const field<N1>& f1,
‎ const field<N2>& f2,const field<N3>& f2);
so that operations like the following are possible:
‎container c;
‎...
‎// x3==10 && x1==20
‎find(c,field<3>(10),field<1>(20));

‎// x1==5 && x4==40 && x2==11
‎find(c,field<1>(5),field<4>(40),field<2>(11));
Boost.MultiIndex is a natural choice to implement the container. As we already know, we need as many as six indices to cover all the possible variations:
‎typedef member<element,int,&element::x1> key1;
‎typedef member<element,int,&element::x2> key2;
‎typedef member<element,int,&element::x3> key3;
‎typedef member<element,int,&element::x4> key4;

‎‎typedef multi_index_container<
‎ element,
‎ indexed_by<
‎ ordered_non_unique<composite_key<element,key1,key3,key2,key4> >,
‎ ordered_non_unique<composite_key<element,key4,key1,key3,key2> >,
‎ ordered_non_unique<composite_key<element,key2,key1,key3,key4> >,
‎ ordered_non_unique<composite_key<element,key4,key2,key1,key3> >,
‎ ordered_non_unique<composite_key<element,key3,key2,key1,key4> >,
‎ ordered_non_unique<composite_key<element,key4,key3,key2,key1> >
‎ >
‎> container;

Actually, the combined composite keys are redundant, and we can shorten some of them. Also, we add a little instrumentation to the type for future use, so the final definition of container is:

‎struct key1:member<element,int,&element::x1>,boost::mpl::int_<1>{};
‎struct key2:member<element,int,&element::x2>,boost::mpl::int_<2>{};
‎struct key3:member<element,int,&element::x3>,boost::mpl::int_<3>{};
‎struct key4:member<element,int,&element::x4>,boost::mpl::int_<4>{};

‎template<int>struct cover;

‎typedef multi_index_container<
‎ element,
‎ indexed_by<
‎ ordered_non_unique<
‎ tag<cover<1>,cover<13>,cover<123>,cover<1234> >,
‎ composite_key<element,key1,key3,key2,key4>
‎ >,
‎ ordered_non_unique<
‎ tag<cover<4>,cover<14>,cover<134> >,
‎ composite_key<element,key4,key1,key3>
‎ >,
‎ ordered_non_unique<
‎ tag<cover<2>,cover<12> >,
‎ composite_key<element,key2,key1>
‎ >,
‎ ordered_non_unique<
‎ tag<cover<24>,cover<124> >,
‎ composite_key<element,key4,key2,key1>
‎ >,
‎ ordered_non_unique<
‎ tag<cover<3>,cover<23> >,
‎ composite_key<element,key3,key2>
‎ >,
‎ ordered_non_unique<
‎ tag<cover<34>,cover<234> >,
‎ composite_key<element,key4,key3,key2>
‎ >
‎ >
‎> container;
The keyn extractor derives from boost::mpl::int_<n>, which makes these types easily indexable. As for the cover tags, they simply indicate the different query formulas supported by each index using a straightforward coding technique: for instance, the second index covers the querying formulas
‎x4==v4 --> cover<4>
‎x1==v1 && x4==v4 -->
cover<14>
‎x1==v1 && x3==v3 && x4==v4 -->
cover<134>
To avoid ambiguities, cover numbers are coded assuming the attribute names appear in the order x1, x2, x3, x4, even if the tagged index does not respect this order in the specification of the composite key; also, when two different types cover the same formula, we assign the tag to only one of them. So, there are a total of 15 different tags, exactly as many as the number of different query formulas with up to 4 attributes. cover_number does the metaprogrammatic work of calculating a cover number from an MPL sequence of field<n> types:
‎template<typename FieldSequence>
‎struct cover_number:
‎ boost::mpl::fold<
‎ typename boost::mpl::sort<FieldSequence>::type,
‎ boost::mpl::int_<0>,
‎ boost::mpl::plus<
‎ boost::mpl::times<boost::mpl::_1,boost::mpl::int_<10> >,
‎ boost::mpl::_2
‎ >
‎ >::type
‎{};
The following is a possible realization of the overload of find with three fields:
‎template<int N0,int N1,int N2>
‎const element* find(
‎ const container& c,
‎ const field<N0>& f0,const field<N1>& f1,const field<N2>& f2)
‎{
‎ // find the tag associated to the index conering our fields
‎ typedef cover<
‎ cover_number<
‎ boost::mpl::vector_c<int,N0,N1,N2>
‎ >::value
‎ > tag;

‎ // retrieve the index type
‎ typedef typename container::index<tag>::type index_type;

‎ // retrieve the type of the associated composite key extractor
‎ typedef typename index_type::key_from_value composite_key_type;

‎ // an MPL sequence containing the field types passed
‎ typedef boost::mpl::vector<
‎ field<N0>,field<N1>,field<N2>
‎ > fields_type;

‎ // get the index
‎ const index_type& i=c.get<tag>();

‎ // pack the field values into a tuple...
‎ boost::tuple<int,int,int> t=boost::make_tuple(f0.x,f1.x,f2.x);

‎ // ...and use it with i.find(), rearranging the tuple
‎ // so that each field<n> matches its key<n> in the composite key
‎ typename index_type::iterator it=i.find(
‎ boost::make_tuple(
‎ get<composite_key_type,fields_type,boost::mpl::int_<0> >(t),
‎ get<composite_key_type,fields_type,boost::mpl::int_<1> >(t),
‎ get<composite_key_type,fields_type,boost::mpl::int_<2> >(t)
‎ )
‎ );
‎ if(it!=i.end())return &*it;
‎ else return 0;
‎}
The only missing part is the get template function, that accepts a composite key extractor type, an MPL sequence of fields, a position and a tuple of field values and returns the value associated to the field<n> with the same n as the key<n> at the indicated position in the composite key extractor:
‎template<
‎ typename CompositeKey,typename FieldSequence,
‎ typename Pos,typename Tuple
‎>
‎int get(const Tuple& t)
‎{
‎ typedef typename boost::tuples::element<
‎ Pos::value,
‎ typename CompositeKey::key_extractor_tuple
‎ >::type key_at_pos;
‎ const int M=
‎ boost::mpl::distance<
‎ typename boost::mpl::begin<FieldSequence>::type,
‎ typename boost::mpl::find<
‎ FieldSequence,
‎ field<key_at_pos::value>
‎ >::type
‎ >::value;
‎ return t.template get<M>();
‎}

The code is simpler than it might appear at first glance: if the key at the position Pos of CompositeKey is of the form key<n>, M is simply the distance from the beginning of FieldSequence to the type key<n>. The expression key_at_pos::value takes advantage of the fact that each key<n> is derived from boost::mpl::int_<n>.

The different overloads of find are entirely similar to the one we have just seen. Boost.Preprocessor allows us to generate the overloads without code repetition:

‎#define FIND_PARAM_MACRO(z,n,data) \
‎const field<BOOST_PP_CAT(N,n)>& BOOST_PP_CAT(f,n)

‎#define FIND_FIELD_MACRO(z,n,data) field<BOOST_PP_CAT(N,n)>

‎#define FIND_VALUE_MACRO(z,n,data) BOOST_PP_CAT(f,n).x

‎#define FIND_GET_MACRO(z,n,data) \
‎get<composite_key_type,fields_type,boost::mpl::int_<n> >(t)

‎#define DEFINE_FIND(num_fields) \
‎template<BOOST_PP_ENUM_PARAMS(num_fields,int N)> \
‎const element* find( \
‎ const container& c, \
‎ BOOST_PP_ENUM(num_fields,FIND_PARAM_MACRO,~)) \
‎{ \
‎ typedef cover< \
‎ cover_number< \
‎ boost::mpl::vector_c<int, \
‎ BOOST_PP_ENUM_PARAMS(num_fields,N) \
‎ > \
‎ >::value \
‎ > tag; \
‎ typedef typename container::index<tag>::type index_type; \
‎ typedef typename index_type::key_from_value composite_key_type; \
‎ typedef boost::mpl::vector< \
‎ BOOST_PP_ENUM(num_fields,FIND_FIELD_MACRO,~) \
‎ > fields_type; \
‎ \
‎ const index_type& i=c.get<tag>(); \
‎ boost::tuple< \
‎ BOOST_PP_ENUM_PARAMS(num_fields,int BOOST_PP_INTERCEPT) \
‎ > t=boost::make_tuple( \
‎ BOOST_PP_ENUM(num_fields,FIND_VALUE_MACRO,~) \
‎ ); \
‎ typename index_type::iterator it=i.find( \
‎ boost::make_tuple( \
‎ BOOST_PP_ENUM(num_fields,FIND_GET_MACRO,~) \
‎ ) \
‎ ); \
‎ if(it!=i.end())return &*it; \
‎ else return 0; \
‎}

‎DEFINE_FIND(1)
‎DEFINE_FIND(2)
‎DEFINE_FIND(3)
‎DEFINE_FIND(4)
A complete implementation program is provided.

Although we have metaprogrammed the code for selecting and using the indices of an n-attribute container, still the definition of container was done manually. The following is a very tough challenge for the reader: program a metafunction that accepts an integer n and produces a suitable multi_index_container instantiation providing full query coverage for a struct with attributes x1,...,xn. Note that solving this challenge implies metaprogramming the permutation cover algorithm we devised some entries ago.

1 comment :