How to search with diacritics?

pisisler Posts: 106 Questions: 21 Answers: 1
edited March 2021 in Free community support

I am using the diacritics-neutralise plugin so that search works for letters like " İ ", but it doesn't seem to work as intended. Here is an example:

http://live.datatables.net/damawuca/1/edit?html,css,js,output

See the Position column. Try searching for " İ ": it finds nothing. Without the plugin the search does find it, but only when the letter is typed as a capital, so searching for " i " finds nothing (in standard Latin casing, " i " is capitalised to " I ", not " İ "). The same applies to other diacritics: for example, searching for " ç " finds nothing although it is there.

I checked the Unicode representations in the plugin code, and they are all correct. So how can I resolve this issue?

Edit: Something very strange happened. I clicked the test case I posted and it worked this time. I closed and re-opened it, and it stopped working again.

Answers

  • allan Posts: 61,686 Questions: 1 Answers: 10,100 Site admin

    The search plug-in replaces characters such as İ in the table data with their closest Roman alphabet counterpart, which is why you can't then use İ as a search character.

    Yup, it's wrong like that, I fully accept that. We need to find a better way of handling this, so that you can search with either the diacritic character or its Roman alphabet counterpart.

    I don't have a solution for that at the moment, I'm sorry to say, but it is something I want to address in the future.

    Allan

  • pisisler Posts: 106 Questions: 21 Answers: 1
    edited March 2021

    Thank you @allan. Yes, I know how it works. Isn't it possible to apply the same neutralisation to the search keyword too? That way, whatever you type into the search box would find the text, because the characters in the keyword would be replaced as well.

    This, of course, has a negative side effect too: for example, it will match both ı and i at the same time. In my opinion that is still much better than the current behaviour.

    By the way, some people say they have solved this issue by modifying the core datatables.js file like this:

    /**
     * Escape a string such that it can be used in a regular expression
     *
     * @param {string} val string to escape
     * @returns {string} escaped string
     */
    escapeRegex: function ( val ) {
        var letters = { "İ": "[İi]", "I": "[Iı]", "Ş": "[Şş]", "Ğ": "[Ğğ]", "Ü": "[Üü]", "Ö": "[Öö]", "Ç": "[Çç]", "i": "[İi]", "ı": "[Iı]", "ş": "[Şş]", "ğ": "[Ğğ]", "ü": "[Üü]", "ö": "[Öö]", "ç": "[Çç]" };
        var acEscape = [ '/', '.', '*', '+', '?', '|', '(', ')', '[', ']', '{', '}', '\\', '$', '^', '-' ];
        var reReplace = new RegExp( '(\\' + acEscape.join('|\\') + ')', 'g' );
        val = val.replace(reReplace, '\\$1');

        return val.replace(/(([İIŞĞÜÇÖiışğüçö]))/g, function (letter) { return letters[letter]; });
    }
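
The idea raised in this post, running the search keyword through the same neutralisation as the data, can be sketched without touching the core file. This is only a rough illustration: `foldDiacritics` and `matches` are hypothetical helper names, not DataTables functions, and the folding rules (Unicode NFD decomposition plus an explicit rule for the dotless ı) are an assumption about what "neutralisation" should cover.

```javascript
// Hypothetical helper (not part of DataTables): fold accented letters,
// including the Turkish ones discussed here, to plain lowercase ASCII.
function foldDiacritics(text) {
    return String(text)
        // The dotless "ı" has no canonical decomposition, so map it by hand.
        .replace(/ı/g, 'i')
        // NFD splits e.g. "ç" into "c" + U+0327 and "İ" into "I" + U+0307.
        .normalize('NFD')
        // Drop the combining marks left behind by the decomposition.
        .replace(/[\u0300-\u036f]/g, '')
        // Lowercase last, once only plain letters remain.
        .toLowerCase();
}

// Hypothetical helper: the cell text and the keyword go through the same fold,
// so either spelling of a letter finds the other.
function matches(cellText, keyword) {
    return foldDiacritics(cellText).includes(foldDiacritics(keyword));
}
```

With this, `matches('İstanbul', 'i')` and `matches('İstanbul', 'İ')` are both true, at the cost already mentioned: ı and i become indistinguishable. In DataTables the data side would presumably be registered through the `ext.type.search.string` hook, and the keyword side would be folded before the value is handed to `search()`; how the input box is wired up is left open here.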
    
  • pisisler Posts: 106 Questions: 21 Answers: 1

    The comment I posted just disappeared :o

    Thank you @allan. I know how it works. Isn't it possible to apply the same neutralisation to the search keyword too?

    This has a side effect too, as it will match ı and i at the same time, but it would still be far better than the current behaviour.

    By the way, some people say they have solved this problem by editing the core datatables.js file like this:

    /**
     * Escape a string such that it can be used in a regular expression
     *
     * @param {string} val string to escape
     * @returns {string} escaped string
     */
    escapeRegex: function ( val ) {
        var letters = { "İ": "[İi]", "I": "[Iı]", "Ş": "[Şş]", "Ğ": "[Ğğ]", "Ü": "[Üü]", "Ö": "[Öö]", "Ç": "[Çç]", "i": "[İi]", "ı": "[Iı]", "ş": "[Şş]", "ğ": "[Ğğ]", "ü": "[Üü]", "ö": "[Öö]", "ç": "[Çç]" };
        var acEscape = [ '/', '.', '*', '+', '?', '|', '(', ')', '[', ']', '{', '}', '\\', '$', '^', '-' ];
        var reReplace = new RegExp( '(\\' + acEscape.join('|\\') + ')', 'g' );
        val = val.replace(reReplace, '\\$1');

        return val.replace(/(([İIŞĞÜÇÖiışğüçö]))/g, function (letter) { return letters[letter]; });
    }
    
  • colin Posts: 15,143 Questions: 1 Answers: 2,586

    Sorry, your comment (and another) got caught by the spam filter; both have been released now.

    Colin

  • pisisler Posts: 106 Questions: 21 Answers: 1

    I tried the method I mentioned in my previous post: edit the jquery.dataTables.min.js file and find this block:

    escapeRegex:function(a){return a.replace(uc,"\\$1")}
    

    Replace it with:

    escapeRegex: function ( val ) {
        var letters = { "İ": "[İi]", "I": "[Iı]", "Ş": "[Şş]", "Ğ": "[Ğğ]", "Ü": "[Üü]", "Ö": "[Öö]", "Ç": "[Çç]", "i": "[İi]", "ı": "[Iı]", "ş": "[Şş]", "ğ": "[Ğğ]", "ü": "[Üü]", "ö": "[Öö]", "ç": "[Çç]" };
        var acEscape = [ '/', '.', '*', '+', '?', '|', '(', ')', '[', ']', '{', '}', '\\', '$', '^', '-' ];
        var reReplace = new RegExp( '(\\' + acEscape.join('|\\') + ')', 'g' );
        val = val.replace(reReplace, '\\$1');

        return val.replace(/(([İIŞĞÜÇÖiışğüçö]))/g, function (letter) { return letters[letter]; });
    }
    

    This way it works perfectly. But I don't think editing the original files is a good idea, so I tried to turn the code into a plugin:

    jQuery.fn.DataTable.ext.type.search.string = function (data) {
        return !data ? '' : (typeof data === 'string' ? tr_fix(data) : data);
    }
    function tr_fix (data) {
        var letters = { "İ": "[İi]", "I": "[Iı]", "Ş": "[Şş]", "Ğ": "[Ğğ]", "Ü": "[Üü]", "Ö": "[Öö]", "Ç": "[Çç]", "i": "[İi]", "ı": "[Iı]", "ş": "[Şş]", "ğ": "[Ğğ]", "ü": "[Üü]", "ö": "[Öö]", "ç": "[Çç]" };
        var acEscape = [ '/', '.', '*', '+', '?', '|', '(', ')', '[', ']', '{', '}', '\\', '$', '^', '-' ];
        var reReplace = new RegExp( '(\\' + acEscape.join('|\\') + ')', 'g' );
        data=data.replace(reReplace, '\\$1');
        return data.replace(/(([İIŞĞÜÇÖiışğüçö]))/g, function (letter) { return letters[letter]; });
    }
    

    Loading the same code as a plugin certainly changed the search behaviour, but not in the expected way: it finds some of the data but not other entries, even though the content is exactly the same.
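
A likely explanation for the inconsistent results: the `ext.type.search.string` hook transforms the cell data, whereas the patched `escapeRegex` transformed the keyword. Run on data, `tr_fix` stores literal bracket text in place of each Turkish letter. The sketch below copies `tr_fix` from the post (with only the redundant capture groups dropped) and uses a plain substring test as a simplified stand-in for DataTables' own matching, which is an assumption made for illustration.

```javascript
// tr_fix as posted above; behaviour unchanged.
function tr_fix(data) {
    var letters = { "İ": "[İi]", "I": "[Iı]", "Ş": "[Şş]", "Ğ": "[Ğğ]", "Ü": "[Üü]", "Ö": "[Öö]", "Ç": "[Çç]", "i": "[İi]", "ı": "[Iı]", "ş": "[Şş]", "ğ": "[Ğğ]", "ü": "[Üü]", "ö": "[Öö]", "ç": "[Çç]" };
    var acEscape = [ '/', '.', '*', '+', '?', '|', '(', ')', '[', ']', '{', '}', '\\', '$', '^', '-' ];
    var reReplace = new RegExp('(\\' + acEscape.join('|\\') + ')', 'g');
    data = data.replace(reReplace, '\\$1');
    return data.replace(/[İIŞĞÜÇÖiışğüçö]/g, function (letter) { return letters[letter]; });
}

// Data side (the plugin attempt): the searchable text becomes bracket soup.
var stored = tr_fix('Çağrı');              // "[Çç]a[Ğğ]r[Iı]"
var singleLetterFound = stored.includes('ç');  // true: "ç" sits inside "[Çç]"
var twoLettersFound = stored.includes('ça');   // false: "]" separates "ç" and "a"

// Keyword side (the core-file patch): the same function builds a usable pattern.
var keywordPattern = new RegExp(tr_fix('ça'), 'i');  // /[Çç]a/i
var patchedFound = keywordPattern.test('Çağrı');     // true
```

Under that reading, keywords with no Turkish letters, or a single Turkish letter, still hit, while keywords that span a Turkish letter and its neighbour miss, which would fit "finds some of the data but not other entries".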

  • allan Posts: 61,686 Questions: 1 Answers: 10,100 Site admin

    Thanks for posting that. We probably need to look into using locale compare methods provided by the browser to truly fix this. That is on the cards for DataTables 2.

    Allan
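
For readers wondering what browser-provided locale comparison might look like, here is a minimal sketch. It assumes that "locale compare methods" refers to `Intl.Collator` (or `String.prototype.localeCompare`); the thread does not name a specific API, and this is not how DataTables 2 is known to implement it. `containsLoosely` is a hypothetical helper, and it assumes precomposed characters so that a keyword and its match have the same length.

```javascript
// Assumption: an English collator with "base" sensitivity, which ignores
// case and accent differences when comparing strings.
var collator = new Intl.Collator('en', { sensitivity: 'base' });

// Hypothetical helper: slide a keyword-sized window over the text and let
// the collator decide whether each slice is equivalent to the keyword.
function containsLoosely(text, keyword) {
    var size = keyword.length;
    for (var start = 0; start + size <= text.length; start++) {
        if (collator.compare(text.slice(start, start + size), keyword) === 0) {
            return true;
        }
    }
    return false;
}

var accentIgnored = containsLoosely('Çağrı', 'cag');  // Ç~c, ğ~g under "base"
var unrelatedMiss = containsLoosely('Ankara', 'ç');
```

The locale matters: a Turkish (`'tr'`) collator is expected to treat ç and c as different letters, so the choice of locale decides what counts as a match. Letters without a base-letter equivalent in the chosen collation, such as the dotless ı, may still need separate handling.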

This discussion has been closed.