This paper addresses efficient optimization of mass substructure-search queries over a large database of organic compounds. The optimization method is based on so-called fingerprints: compact bit arrays that represent graph structure in packed form. Fingerprints allow cheap (but incomplete) screening of non-matching cases, so the subgraph isomorphism algorithm can be avoided most of the time. Fingerprints, originally proposed by Daylight, are built in three independent, sequential phases: (i) determining the characteristic features of a graph, (ii) hashing these features, and (iii) packing the hashes into a bit array. Our approach is novel in the first phase, where we use edge-subgraph enumeration, and in the second, where we use a new graph hashing algorithm.
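As a rough illustration of the hash, pack, and screen steps described above, here is a minimal Python sketch. It is not the method of the paper: the feature strings are hand-written placeholders standing in for whatever the feature-determination phase produces, `zlib.crc32` stands in for the graph hashing algorithm, and the names `fingerprint`, `may_contain`, `screen`, and the width `FP_BITS` are hypothetical.

```python
import zlib

FP_BITS = 256  # hypothetical fingerprint width, chosen only for illustration


def fingerprint(features, n_bits=FP_BITS):
    """Phases (ii) and (iii): hash each feature and set one bit per hash.

    `features` is an iterable of strings standing in for the output of
    phase (i); CRC32 is a stand-in hash, not the paper's algorithm.
    """
    fp = 0
    for feature in features:
        bit = zlib.crc32(feature.encode("utf-8")) % n_bits
        fp |= 1 << bit
    return fp


def may_contain(target_fp, query_fp):
    """Screening test: every bit set in the query must be set in the target.

    False means the target cannot contain the query. True is only a
    "maybe" (hash collisions are possible), so a full subgraph
    isomorphism check is still required for those candidates.
    """
    return (target_fp & query_fp) == query_fp


def screen(database, query_features):
    """Return the names of database entries that survive the screening."""
    query_fp = fingerprint(query_features)
    return [name for name, fp in database.items() if may_contain(fp, query_fp)]


# Toy data: feature sets are made up for the example.
database = {
    "ethanol": fingerprint({"C-C", "C-O", "C-C-O"}),
    "ethylamine": fingerprint({"C-C", "C-N", "C-C-N"}),
}
candidates = screen(database, {"C-C", "C-O"})
```

Because the query features are a subset of the features listed for `ethanol`, that entry always survives the screening. Whether `ethylamine` is rejected depends on the hash values, which is exactly why the screening is cheap but not complete.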