

news 2025/11/14 13:20:45

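R language: the stringr package in detail (string processing)

All usage described here defers to the official documentation. To understand the package properly, consult its manual or read the underlying implementation.

Contents

R language
1. Installing and loading the package
2. Function overview
3. Function reference
   3.1 str_c: concatenate strings
   3.2 str_trim: remove leading and trailing whitespace (spaces and tabs, \t)
   3.3 str_pad: pad a string to a given width with a single character
   3.4 str_dup: duplicate strings
   3.5 str_wrap: control how strings are wrapped for output
   3.6 str_sub: extract substrings
   3.7 str_subset: return the strings that match
   3.8 word: extract words from text
   3.9 str_count: count matches
   3.10 str_length: string length
   3.11 str_sort: sort strings by value
   3.12 str_order: sort order as indices
   3.13 str_split / str_split_fixed: split strings
   3.14 str_detect: detect a pattern in strings
   3.15 str_match / str_match_all: extract matched groups
   3.16 str_replace / str_replace_all: replace matches
   3.17 str_replace_na: replace NA with a given string
   3.18 str_locate / str_locate_all: locate matches
   3.19 str_extract / str_extract_all: extract matching text
   3.20 str_conv: convert character encoding
   3.21 str_to_upper / str_to_lower: convert to upper/lower case
   3.22 str_to_title: convert to title case
   3.23 str_to_sentence: convert to sentence case
   3.24 str_glue / str_glue_data: interpolate variables into strings
   3.25 str_remove / str_remove_all: remove matches
4. Pattern modifier functions
   4.1 boundary: match boundaries
   4.2 coll: compare strings using standard Unicode collation rules
   4.3 fixed: compare literal bytes
   4.4 Difference between coll and fixed
   4.5 regex: define a regular expression
5. Other packages for string processing

stringr is a widely used string-processing package for R. It provides a set of functions for processing and manipulating strings. The common functions all start with str_, so a function's purpose can be read from its name. Knowing this package, together with regular expressions, is enough to handle most string-processing problems.

1. Installing and loading the package

    install.packages("stringr")    # install the package
    library(stringr)               # load the package
    packageVersion("stringr")      # show the version of the loaded package
    help(package = "stringr")      #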
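    # view detailed information about the package

2. Function overview

    Function          Purpose
    str_c             Concatenate strings
    str_trim          Remove leading and trailing whitespace (spaces and tabs, \t)
    str_pad           Pad a string to a given width
    str_dup           Duplicate strings
    str_wrap          Control how strings are wrapped for output
    str_sub           Extract substrings
    str_subset        Return the strings that match
    word              Extract words from text
    str_count         Count matches
    str_length        String length
    str_sort          Sort strings by value
    str_order         Sort order as indices (same rules as str_sort)
    str_split         Split strings
    str_split_fixed   Split strings (same as str_split)
    str_detect        Detect a pattern in strings
    str_match         Extract matched groups from a string
    str_match_all     Extract matched groups (same as str_match)
    str_replace       Replace matches
    str_replace_all   Replace matches (same as str_replace)
    str_replace_na    Replace NA with a given string
    str_locate        Locate matches
    str_locate_all    Locate matches (same as str_locate)
    str_extract       Extract matching text
    str_extract_all   Extract matching text (same as str_extract)
    str_conv          Convert character encoding
    str_to_upper      Convert to upper case
    str_to_lower      Convert to lower case (same rules as str_to_upper)
    str_to_title      Convert to title case (same rules as str_to_upper)
    str_to_sentence   Convert to sentence case
    str_glue          Interpolate variables into a string
    str_remove        Remove matches
    str_remove_all    Remove matches (same rules as str_remove)

3. Function reference

3.1 str_c: concatenate strings

Overview: Concatenates strings and vectors. It works much like base R's paste and paste0.

Usage:

    str_c(..., sep = "", collapse = NULL)

Arguments:
    ...: one or more inputs.
    sep: separator placed between the inputs when they are joined element by element.
    collapse: separator used to combine the resulting vector into a single string.

Examples:

    # default: joined with no separator
    str_c("a", "b")
    #> [1] "ab"

    # with a separator between the inputs
    str_c("a", "b", sep = "_")
    #> [1] "a_b"

    # collapse a vector into one string
    str_c(c("a", "b", "c"), collapse = "_")
    #> [1] "a_b_c"

    # both together
    str_c(c("a", "b"), c("c", "d"), sep = "/", collapse = "_")
    #> [1] "a/c_b/d"

How str_c compares with paste:

    # Same: collapse behaves identically. It joins all elements of a
    # vector into one string, with collapse as the separator.
    str_c(c("a", "b", "c"), collapse = "")
    #> [1] "abc"
    paste(c("a", "b", "c"), collapse = "")
    #> [1] "abc"

    # Different: the default sep when joining several strings
    str_c("a", "b")
    #> [1] "ab"
    paste("a", "b")
    #> [1] "a b"

    # Different: handling of NA in a character vector
    str_c(c("a", NA, "b"), "-d")      # NA is propagated
    #> [1] "a-d" NA    "b-d"
    paste(c("a", NA, "b"), "-d")      # NA is pasted as the text "NA"
    #> [1] "a -d"  "NA -d" "b -d"
    # str_replace_na turns the value NA into the string "NA" first
    str_c(str_replace_na(c("a", NA, "b")), "-d")
    #> [1] "a-d"  "NA-d" "b-d"

3.2 str_trim: remove leading and trailing whitespace (spaces and tabs, \t)

Overview: Removes whitespace, such as spaces and tabs (\t), from the ends of a string.

Usage:

    str_trim(string, side = c("both", "left", "right"))

Arguments:
    string: a string or character vector.
    side: which side to trim: "both", "left", or "right".

Examples:

    # trim both sides
    str_trim(" a ", side = "both")
    #> [1] "a"

    # trim the left side only
    str_trim(" a ", side = "left")
    #> [1] "a "

    # trim the right side only
    str_trim(" a ", side = "right")
    #> [1] " a"

3.3 str_pad: pad a string to a given width with a single character

Overview: The padding function str_pad adds a single repeated character to a string, on the side you choose.

Usage:

    str_pad(string, width, side = c("left", "right", "both"), pad = " ", use_width = TRUE)

Arguments:
    string: a string or character vector.
    width: 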
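    the width of the string after padding; if width is smaller than the string's length, nothing is added.
    side: where to pad: "both", "left", or "right".
    pad: the padding character; it must be a single character.
    use_width: if FALSE, the string's length (str_length) is used instead of its display width (str_width) when working out how much to pad.

Examples:

    # width shorter than the string: nothing is added
    str_pad("aaaaa", 3)
    #> [1] "aaaaa"

    # width longer than the string: pads on the left by default
    str_pad("aaaaa", 10)
    #> [1] "     aaaaa"

    # change the side and the padding character; with side = "both" and an
    # odd number of padding characters, the extra one goes on the right
    str_pad("aaaaa", 10, side = "both", pad = "*")
    #> [1] "**aaaaa***"

3.4 str_dup: duplicate strings

Overview: Repeats a string.

Usage:

    str_dup(string, times)

Arguments:
    string: the string to repeat.
    times: the number of repetitions.

Examples:

    # repeat a single string
    str_dup("a", 2)
    #> [1] "aa"

    # vectorised over both arguments
    str_dup(c("a", "b", "c"), 1:3)
    #> [1] "a"   "bb"  "ccc"

    # combined with str_c
    str_c(c("a", "b", "c"), str_dup(c(1, 2, 3), 1:3), sep = "_", collapse = "/")
    #> [1] "a_1/b_22/c_333"

3.5 str_wrap: control how strings are wrapped for output

Overview: Wraps a long string at a given width, which makes long strings easier to read when printed or displayed.

Usage:

    str_wrap(string, width = 80, indent = 0, exdent = 0, whitespace_only = TRUE)

Arguments:
    string: a string or character vector.
    width: the width of one line.
    indent: indentation of the first line of each paragraph; the indentation is not counted towards width.
    exdent: indentation of the second and later lines; the indentation is not counted towards width.
    whitespace_only: if TRUE, lines break only at whitespace; if FALSE, lines may also break at non-word characters such as / and -.

Examples:

    text <- "This is a-long-string that needs to be wrapped to fit within a specified width."

    # no indent on the first line, two characters on every later line;
    # the indentation is not counted towards width
    str_wrap(text, width = 14, indent = 0, exdent = 2)
    #> [1] "This is\n  a-long-string\n  that needs\n  to be\n  wrapped to\n  fit within\n  a specified\n  width."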
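    # line breaks may also fall at non-word characters
    str_wrap(text, width = 14, indent = 0, exdent = 2, whitespace_only = FALSE)
    #> [1] "This is a-\n  long-\n  string that\n  needs to be\n  wrapped to\n  fit within\n  a specified\n  width."

3.6 str_sub: extract substrings

Overview: str_sub and str_subset are the filtering functions. str_sub extracts part of a string given a start and an end position.

Usage:

    str_sub(string, start = 1L, end = -1L)

Arguments:
    string: a string or character vector.
    start: start position.
    end: end position.

Examples:

    # positive positions count from the start
    str_sub(string = "banana", start = 1, end = 3)
    #> [1] "ban"

    # negative positions count from the end
    str_sub(string = "banana", start = -2, end = -1)
    #> [1] "na"

    # extract and assign
    x <- "banana"
    str_sub(x, start = 1, end = 1) <- "A"
    print(x)
    #> [1] "Aanana"

    # extract two ranges at once
    str_sub("banana", c(1, 2), c(3, -2))
    #> [1] "ban"  "anan"

3.7 str_subset: return the strings that match

Overview: str_subset keeps the strings that match a pattern.

Usage:

    str_subset(string, pattern)

Arguments:
    string: a string or character vector.
    pattern: the pattern to match.

Examples:

    fruit <- c("apple", "banana", "pear", "pinapple")

    # strings containing "ap"
    str_subset(fruit, "ap")
    #> [1] "apple"    "pinapple"

    # regular expressions allow more precise matching
    # strings starting with "a"
    str_subset(fruit, "^a")
    #> [1] "apple"

    # strings ending with "a"
    str_subset(fruit, "a$")
    #> [1] "banana"

    # strings containing any of a, e, i, o, u
    str_subset(fruit, "[aeiou]")
    #> [1] "apple"    "banana"   "pear"     "pinapple"

    # matching any character drops missing values
    str_subset(c("a", NA, "b"), ".")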
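    #> [1] "a" "b"

3.8 word: extract words from text

Overview: Extracts whole words. It is less flexible than str_sub and str_subset, but in some situations it is more convenient than either.

Usage:

    word(string, start = 1L, end = start, sep = fixed(" "))

Arguments:
    string: a string or character vector.
    start: position of the first word to extract.
    end: position of the last word to extract.
    sep: the separator between words.

Examples:

    sentences <- c("I saw a cat, it sat down", "Maybe you-were-right")

    # from the second word to the last word
    word(sentences, 2, -1)
    #> [1] "saw a cat, it sat down" "you-were-right"

    # first sentence: from the 1st, 2nd, and 3rd word to the last word
    word(sentences[1], 1:3, -1)
    #> [1] "I saw a cat, it sat down" "saw a cat, it sat down"
    #> [3] "a cat, it sat down"

    # with a different separator
    word(sentences, 2, -1, sep = ",")
    #> [1] " it sat down" NA
    word(sentences, 2, -1, sep = "-")
    #> [1] NA           "were-right"

3.9 str_count: count matches

Overview: Counts how many times a pattern occurs in each string.

Usage:

    str_count(string, pattern = "")

Arguments:
    string: a string or character vector.
    pattern: the pattern to match.

Examples:

    # count one target character
    str_count(string = c("sql", "json", "java"), pattern = "s")
    #> [1] 1 1 0

    # one pattern per string
    str_count(string = c("sql", "json", "java"), pattern = c("s", "j", "a"))
    #> [1] 1 1 2

    # with the default pattern, counts the characters in each string
    str_count(string = c("sql", "json", "java"))
    #> [1] 3 4 4

3.10 str_length: string length

Overview: Returns the length of each string; in effect a cut-down str_count.

Usage:

    str_length(string)

Arguments:
    string: a string or character vector.

Examples:

    str_length(c("I", "am", "福旺旺", NA))
    #> [1]  1  2  3 NA

3.11 str_sort: sort strings by value

Overview: Sorts a character vector.

Usage:

    str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...)

Arguments:
    x: a string or character vector.
    decreasing: sort direction.
    na_last: where NA values go. Three values are allowed: TRUE puts them last, FALSE puts them first, NA drops them.
    locale: the language whose conventions are used for sorting; the default is "en" (English).
    numeric: if TRUE, digits are sorted as numbers; otherwise they are sorted as characters.

Examples:

    # ascending sort, returns the sorted values
    str_sort(c("sql", "json", "python", NA))
    #> [1] "json"   "python" "sql"    NA

    # descending sort, NA values dropped
    str_sort(c("sql", "json", "python", NA), decreasing = TRUE, na_last = NA)
    #> [1] "sql"    "python" "json"

    # ascending sort, NA values first
    str_sort(c("sql", "json", "python", NA), na_last = FALSE)
    #> [1] NA       "json"   "python" "sql"

3.12 str_order: sort order as indices

Overview: Follows the same rules as str_sort. The difference is that str_order returns the indices of the sorted elements, while str_sort returns the sorted values themselves.

Usage:

    str_order(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...)

Arguments:
    x: a string or character vector.
    decreasing: sort direction.
    na_last: where NA values go. Three values are allowed: TRUE puts them last, FALSE puts them first, NA drops them.
    locale: the language whose conventions are used for sorting; the default is "en" (English).
    numeric: if TRUE, digits are sorted as numbers; otherwise they are sorted as characters.

Examples:

    # ascending order, returns the index vector
    str_order(c("sql", "json", "python", NA))
    #> [1] 2 3 1 4

    # descending order, NA values dropped
    str_order(c("sql", "json", "python", NA), decreasing = TRUE, na_last = NA)
    #> [1] 1 3 2

    #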
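    # ascending order, returns the index vector, NA values first
    str_order(c("sql", "json", "python", NA), na_last = FALSE)
    #> [1] 4 2 3 1

3.13 str_split / str_split_fixed: split strings

Overview: Splits strings into pieces. str_split returns a list; str_split_fixed returns a matrix.

Usage:

    str_split(string, pattern, n = Inf, simplify = FALSE)
    str_split_fixed(string, pattern, n)

Arguments:
    string: a string or character vector.
    pattern: the pattern to split on.
    n: the maximum number of pieces; the last piece is not split any further.
    simplify: FALSE returns a list, TRUE returns a matrix. With this argument available, str_split_fixed is largely a leftover from earlier versions.

Examples:

    # split into single characters, returns a list
    str_split(string = "ba-na-na", pattern = "")
    #> [[1]]
    #> [1] "b" "a" "-" "n" "a" "-" "n" "a"

    # split into at most 3 pieces, returns a list
    str_split(string = "ba-na-na", pattern = "", n = 3)
    #> [[1]]
    #> [1] "b"      "a"      "-na-na"

    # split, returns a matrix
    str_split(string = "ba-na-na", pattern = "-", simplify = TRUE)
    #>      [,1] [,2] [,3]
    #> [1,] "ba" "na" "na"

    # str_split_fixed requires the number of pieces
    str_split_fixed(string = "ba-na-na", pattern = "-", n = Inf)
    #>      [,1] [,2] [,3]
    #> [1,] "ba" "na" "na"

3.14 str_detect: detect a pattern in strings

Overview: Checks whether each string contains the pattern and returns a logical vector.

Usage:

    str_detect(string, pattern)

Arguments:
    string: a string or character vector.
    pattern: the pattern to match.

Examples:

    # does the string contain "s"?
    str_detect(string = c("sql", "json", "java"), pattern = "s")
    #> [1]  TRUE  TRUE FALSE

    # does the string start with "s"?
    str_detect(string = c("sql", "json", "java"), pattern = "^s")
    #> [1]  TRUE FALSE FALSE

3.15 str_match / str_match_all: extract matched groups

Overview: Similar to the extraction function str_extract in that it returns the matched text; the difference is the format of the result. str_match returns a matrix and only the first match in each string; str_match_all returns a list of matrices with every match.

Usage:

    str_match(string, pattern)
    str_match_all(string, pattern)

Arguments:
    string: a string or character vector.
    pattern: the pattern to match.

Examples:

    val <- c("aabbcc", "123", "1ab")

    # first match of "a" in each string
    str_match(val, "a")
    #>      [,1]
    #> [1,] "a"
    #> [2,] NA
    #> [3,] "a"

    # every match of "a", returned as a list of character matrices
    str_match_all(val, "a")
    #> [[1]]
    #>      [,1]
    #> [1,] "a"
    #> [2,] "a"
    #>
    #> [[2]]
    #>      [,1]
    #>
    #> [[3]]
    #>      [,1]
    #> [1,] "a"

    # first single digit 0-9
    str_match(val, "[0-9]")
    #>      [,1]
    #> [1,] NA
    #> [2,] "1"
    #> [3,] "1"

    # any number of digits 0-9 (possibly none)
    str_match(val, "[0-9]*")
    #>      [,1]
    #> [1,] ""
    #> [2,] "123"
    #> [3,] "1"

    # every single digit 0-9
    str_match_all(val, "[0-9]")
    #> [[1]]
    #>      [,1]
    #>
    #> [[2]]
    #>      [,1]
    #> [1,] "1"
    #> [2,] "2"
    #> [3,] "3"
    #>
    #> [[3]]
    #>      [,1]
    #> [1,] "1"

3.16 str_replace / str_replace_all: replace matches

Overview: Replaces part of a string. str_replace replaces only the first match; str_replace_all replaces every match.

Usage:

    str_replace(string, pattern, replacement)
    str_replace_all(string, pattern, replacement)

Arguments:
    string: a string or character vector.
    pattern: the pattern to match.
    replacement: the replacement text.

Examples:

    # input inferred from the outputs shown; the original definition is not in the text
    val <- c("abc", "123", "cba")

    # replace the first a or b in each string with "-"
    str_replace(val, "[ab]", "-")
    #> [1] "-bc" "123" "c-a"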
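    # replace every a or b in each string with "-"
    # (here val is c("abc", "123", "cba"))
    str_replace_all(val, "[ab]", "-")
    #> [1] "--c" "123" "c--"

3.17 str_replace_na: replace NA with a given string

Overview: Replaces NA with a given string.

Usage:

    str_replace_na(string, replacement = "NA")

Arguments:
    string: a string or character vector.
    replacement: the replacement text.

Examples:

    # the missing value is replaced; the string "NA" is left alone
    str_replace_na(c(NA, "NA", "abc"), "x")
    #> [1] "x"   "NA"  "abc"

3.18 str_locate / str_locate_all: locate matches

Overview: The position functions str_locate and str_locate_all return where the pattern matched. str_locate() reports only the first match in each string; str_locate_all() reports every match.

Usage:

    str_locate(string, pattern)
    str_locate_all(string, pattern)

Arguments:
    string: a string or character vector.
    pattern: the pattern to match.

Examples:

    val <- c("aabbcc", "123", "bacabc")

    # a single pattern
    str_locate(val, "a")
    #>      start end
    #> [1,]     1   1
    #> [2,]    NA  NA
    #> [3,]     2   2

    # one pattern per string
    str_locate(val, c("a", 12, "b"))
    #>      start end
    #> [1,]     1   1
    #> [2,]     1   2
    #> [3,]     1   1

    # every match, returned as a list of matrices
    str_locate_all(val, "a")
    #> [[1]]
    #>      start end
    #> [1,]     1   1
    #> [2,]     2   2
    #>
    #> [[2]]
    #>      start end
    #>
    #> [[3]]
    #>      start end
    #> [1,]     2   2
    #> [2,]     4   4

    # every a or b, returned as a list of matrices
    str_locate_all(val, "[ab]")
    #> [[1]]
    #>      start end
    #> [1,]     1   1
    #> [2,]     2   2
    #> [3,]     3   3
    #> [4,]     4   4
    #>
    #> [[2]]
    #>      start end
    #>
    #> [[3]]
    #>      start end
    #> [1,]     1   1
    #> [2,]     2   2
    #> [3,]     4   4
    #> [4,]     5   5

3.19 str_extract / str_extract_all: extract matching text

Overview: The extraction functions str_extract and str_extract_all pull matching text out of strings. str_extract returns the first match; str_extract_all returns every match.

Usage:

    str_extract(string, pattern, group = NULL)
    str_extract_all(string, pattern, simplify = FALSE)

Arguments:
    string: a string or character vector.
    pattern: the pattern to match.
    group: if supplied, the text of the given capture group is returned instead of the whole match.
    simplify: TRUE returns a character matrix; FALSE returns a list of character vectors.

Examples:

    shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")

    # first digit: the backslash is escaped, and \d is equivalent to [0-9]
    str_extract(shopping_list, "\\d")
    #> [1] "4" NA  NA  "2"

    # lower-case letters: + matches the preceding expression one or more times
    str_extract(shopping_list, "[a-z]+")
    #> [1] "apples" "bag"    "bag"    "milk"

    # lower-case letters: {1,4} matches the preceding expression 1 to 4 times
    str_extract(shopping_list, "[a-z]{1,4}")
    #> [1] "appl" "bag"  "bag"  "milk"

    # \b matches a word boundary, i.e. the position between a word character
    # and a non-word character. With \b on both sides, only a complete word
    # of 1 to 4 letters matches.
    str_extract(shopping_list, "\\b[a-z]{1,4}\\b")
    #> [1] NA     "bag"  "bag"  "milk"

    # () marks the start and end of a sub-expression. Together with the
    # group argument it picks out exactly the sub-expression wanted.
    str_extract(shopping_list, "([a-z]+) of ([a-z]+)")
    #> [1] NA             "bag of flour" "bag of sugar" NA
    str_extract(shopping_list, "([a-z]+) of ([a-z]+)", group = 1)
    #> [1] NA    "bag" "bag" NA
    str_extract(shopping_list, "([a-z]+) of ([a-z]+)", group = 2)
    #> [1] NA      "flour"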
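    #> [3] "sugar" NA

    # extract every match, returned as a list
    str_extract_all(shopping_list, "[a-z]+")
    #> [[1]]
    #> [1] "apples" "x"
    #>
    #> [[2]]
    #> [1] "bag"   "of"    "flour"
    #>
    #> [[3]]
    #> [1] "bag"   "of"    "sugar"
    #>
    #> [[4]]
    #> [1] "milk" "x"

    str_extract_all(shopping_list, "\\b[a-z]+\\b")
    #> [[1]]
    #> [1] "apples"
    #>
    #> [[2]]
    #> [1] "bag"   "of"    "flour"
    #>
    #> [[3]]
    #> [1] "bag"   "of"    "sugar"
    #>
    #> [[4]]
    #> [1] "milk"

    str_extract_all(shopping_list, "\\d")
    #> [[1]]
    #> [1] "4"
    #>
    #> [[2]]
    #> character(0)
    #>
    #> [[3]]
    #> character(0)
    #>
    #> [[4]]
    #> [1] "2"

    # simplify = TRUE turns the result into a matrix
    str_extract_all(shopping_list, "\\b[a-z]+\\b", simplify = TRUE)
    #>      [,1]     [,2] [,3]
    #> [1,] "apples" ""   ""
    #> [2,] "bag"    "of" "flour"
    #> [3,] "bag"    "of" "sugar"
    #> [4,] "milk"   ""   ""
    str_extract_all(shopping_list, "\\d", simplify = TRUE)
    #>      [,1]
    #> [1,] "4"
    #> [2,] ""
    #> [3,] ""
    #> [4,] "2"

    # extract every word, dropping punctuation and other non-word characters
    str_extract_all("This is, suprisingly, a sentence.", boundary("word"))
    #> [[1]]
    #> [1] "This"        "is"          "suprisingly" "a"           "sentence"

3.20 str_conv: convert character encoding

Overview: Converts the encoding of a string.

Usage:

    str_conv(string, encoding)

Arguments:
    string: a string or character vector.
    encoding: the name of the encoding.

Examples:

    x <- rawToChar(as.raw(177))
    x
    #> [1] "\xb1"
    str_conv(x, "ISO-8859-2")   # Polish "a with ogonek"
    #> [1] "ą"
    str_conv(x, "ISO-8859-1")   # plus-minus
    #> [1] "±"

3.21 str_to_upper / str_to_lower: convert to upper/lower case

Overview: Converts a string to upper or lower case.

Usage:

    str_to_upper(string, locale = "en")
    str_to_lower(string, locale = "en")

Arguments:
    string: a string or character vector.
    locale: the language whose case-conversion rules are used; the default is "en" (English).

Examples:

    val <- "This is a dog. It is so cute."

    # all upper case
    str_to_upper(val)
    #> [1] "THIS IS A DOG. IT IS SO CUTE."

    # all lower case
    str_to_lower(val)
    #> [1] "this is a dog. it is so cute."

3.22 str_to_title: convert to title case

Overview: Capitalises the first letter of every word.

Usage:

    str_to_title(string, locale = "en")

Arguments:
    string: a string or character vector.
    locale: the language whose case-conversion rules are used; the default is "en" (English).

Examples:

    val <- "This is a dog. It is so cute."

    # first letter of every word in upper case
    str_to_title(val)
    #> [1] "This Is A Dog. It Is So Cute."

3.23 str_to_sentence: convert to sentence case

Overview: Only the first letter of the first word is capitalised.

Usage:

    str_to_sentence(string, locale = "en")

Arguments:
    string: a string or character vector.
    locale: the language whose case-conversion rules are used; the default is "en" (English).

Examples:

    val <- "This is a dog. It is so cute."

    # only the first letter of the first word in upper case
    str_to_sentence(val)
    #> [1] "This is a dog.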
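    #>      It is so cute."

3.24 str_glue / str_glue_data: interpolate variables into strings

Overview: str_glue is a string-formatting function. Braces {} mark placeholders; the expression inside the braces is evaluated and replaced by its value. str_glue and str_glue_data differ in where those values come from. Use str_glue_data when the values live in a data frame, and str_glue when you refer to variables directly. In detail:

    Passing values: str_glue takes the strings to format through ..., and any named arguments in ... act as temporary variables. str_glue_data takes a data object as its first argument (.x), typically a data frame holding the variables to insert.
    Referring to variables: in both functions a variable is referenced by its bare name in braces, for example "{var}". In str_glue_data the name can be a column of .x; no data-frame prefix is needed.
    Environment: str_glue looks values up in the current environment by default. str_glue_data looks them up in .x first, which is why the columns of a data frame can be used directly by name.

Usage:

    str_glue(..., .sep = "", .envir = parent.frame())
    str_glue_data(.x, ..., .sep = "", .envir = parent.frame(), .na = "NA")

Arguments:
    ...: the strings to format; there may be one or more, separated by commas. Named arguments are treated as temporary variables for the placeholders.
    .sep: the separator placed between the strings; the default is the empty string "". With .sep = "-", for example, the strings are joined with "-".
    .envir: the environment in which values are looked up; the default is the calling environment.
    .na: the text used when a value is NA; the default is "NA".

Examples:

    name <- "Fred"
    age <- 50
    anniversary <- as.Date("1991-10-12")

    # using variables from the global environment
    # (the last line of this call is truncated in the source)
    str_glue(My name is {name}, ,my age next year is {age + 1}, ,and my anniversary is {format(anniversary, %A, %B %d, % ... ... [TRUNCATED]
    #> My name is Fred, my age next year is 51, and my anniversary is 星期六, 十月 12, 1991.

    # doubled braces {{ }} are not interpolated; they print as literal braces
    str_glue("My name is {name}, not {{name}}.")
    #> My name is Fred, not {name}.

    # using local variables passed as named arguments
    str_glue("My name is {name}, ", "and my age next year is {age + 1}.", name = "Joe", age = 40)
    #> My name is Joe, and my age next year is 41.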
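The difference between the two lookup rules can be sketched with a small data frame. This is only an illustration: the data frame `pets` and its columns are invented for the example, and the same name (`name`) is deliberately defined both in the calling environment and as a column.

```r
library(stringr)

# hypothetical data frame, made up for this illustration
pets <- data.frame(name = c("Rex", "Tom"), age = c(3, 5))

# a variable with the same name in the calling environment
name <- "Fred"

# str_glue(): {name} is looked up in the calling environment
from_env <- str_glue("Hello, {name}!")

# str_glue_data(): {name} and {age} are looked up in `pets` first,
# referenced by bare column name, one result per row
from_data <- str_glue_data(pets, "{name} is {age} years old")

print(from_env)
print(from_data)
```

Because the columns are found before the environment is consulted, the `name` defined outside the data frame does not leak into the second call.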
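    # using a data frame
    mtcars %>% str_glue_data("{rownames(.)} has {hp} hp")
    #> Mazda RX4 has 110 hp
    #> Mazda RX4 Wag has 110 hp
    #> Datsun 710 has 93 hp
    #> Hornet 4 Drive has 110 hp
    #> Hornet Sportabout has 175 hp
    #> Valiant has 105 hp
    #> ...

3.25 str_remove / str_remove_all: remove matches

Overview: The removal functions str_remove and str_remove_all delete part of a string.

Usage:

    str_remove(string, pattern)
    str_remove_all(string, pattern)

Arguments:
    string: a string or character vector.
    pattern: the pattern to match.

Examples:

    # remove the first match
    str_remove(string = c("abc", "123", "bac"), pattern = "[ab]")
    #> [1] "bc"  "123" "ac"

    # remove every match
    str_remove_all(string = c("abc", "123", "bac"), pattern = "[ab]")
    #> [1] "c"   "123" "c"

4. Pattern modifier functions

Overview: The modifier functions only build the pattern argument of the functions above; they are not used on their own.

    boundary: match boundaries
    coll: compare strings using standard Unicode collation rules
    fixed: compare literal bytes
    regex: define a regular expression

4.1 boundary: match boundaries

Overview: Matches boundaries between characters, lines, sentences, or words.

Usage:

    boundary(type = c("character", "line_break", "sentence", "word"), skip_word_none = NA, ...)

Arguments:
    type: the kind of boundary to detect:
        character: every character
        line_break: positions where a line may be broken
        sentence: a sentence, ending with "." and separated from its neighbours by whitespace
        word: a word, separated from its neighbours by whitespace
    skip_word_none: ignore "words" that contain no letters or digits, for example punctuation. With the default NA, such "words" are skipped only when splitting on word boundaries.

Examples:

    words <- c("These are some words.")
    str_count(words, boundary("word"))
    #> [1] 4
    str_split(words, " ")[[1]]
    #> [1] "These"  "are"    "some"   "words."
    str_split(words, " ")
    #> [[1]]
    #> [1] "These"  "are"    "some"   "words."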
    str_split(words, boundary("word"))[[1]]
    #> [1] "These" "are"   "some"  "words"

4.2 coll: compare strings using standard Unicode collation rules

Overview: Compares strings using standard Unicode collation rules.

Usage:

    coll(pattern, ignore_case = FALSE, locale = "en", ...)

Arguments:
    pattern: the pattern to match.
    ignore_case: TRUE ignores differences in case; FALSE respects them.
    locale: the language whose conventions are used for comparison; the default is "en" (English).

Examples:

    pattern <- "a.b"
    strings <- c("abb", "a.b")
    str_detect(strings, pattern)
    #> [1] TRUE TRUE
    str_detect(strings, fixed(pattern))
    #> [1] FALSE  TRUE
    str_detect(strings, coll(pattern))
    #> [1] FALSE  TRUE

    # coll() is useful for locale-aware case-insensitive matching
    i <- c("I", "\u0130", "i")
    i
    #> [1] "I" "İ" "i"
    str_detect(i, fixed("i", TRUE))
    #> [1]  TRUE FALSE  TRUE
    str_detect(i, fixed("i", FALSE))
    #> [1] FALSE FALSE  TRUE
    str_detect(i, coll("i", TRUE))
    #> [1]  TRUE FALSE  TRUE
    str_detect(i, coll("i", TRUE, locale = "tr"))
    #> [1] FALSE  TRUE  TRUE

4.3 fixed: compare literal bytes

Overview: Compares the literal bytes of the pattern.

Usage:

    fixed(pattern, ignore_case = FALSE)

Arguments:
    pattern: the pattern to match.
    ignore_case: TRUE ignores differences in case; FALSE respects them.

Examples: the example in 4.2 exercises fixed() alongside coll(); the str_detect(strings, fixed(pattern)) and str_detect(i, fixed("i", ...)) calls there show its behaviour.

4.4 Difference between coll and fixed

In stringr, coll and fixed both build patterns for matching and replacing literal text. Neither interprets the pattern as a regular expression, so characters such as "." have no special meaning (see str_detect(strings, fixed(pattern)) and str_detect(strings, coll(pattern)) in 4.2, which both reject "abb"). They differ in how characters are compared.

    fixed: compares the raw bytes of the pattern and the string. It is fast and suits simple matching, but it takes no account of locale and can miss text that looks the same yet is encoded differently.
    coll: compares using Unicode collation rules for the given locale. It is slower, but it respects language-specific rules, as the Turkish locale = "tr" example in 4.2 shows.

In short, use fixed for simple, fast literal matching and coll when the comparison has to follow the rules of a particular language. When the pattern itself needs regular-expression features, use regex (4.5). Which one to choose depends on the task and on how complex the string processing is.

4.5 regex: define a regular expression

Overview: Defines a regular expression and its matching options.

Usage:

    regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...
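    )

Arguments:
    pattern: the pattern to match.
    ignore_case: TRUE ignores differences in case; FALSE respects them.
    multiline: if TRUE, $ and ^ match the end and start of every line. If FALSE (the default), they match only the end and start of the input.
    comments: if TRUE, whitespace and comments beginning with # are ignored. Escape a literal space with "\\ ".
    dotall: if TRUE, . also matches line terminators, so a newline \n is treated like any other character.

Examples:

    # regular expression variations
    str_extract_all("The Cat in the Hat", "[a-z]+")
    #> [[1]]
    #> [1] "he"  "at"  "in"  "the" "at"

    # ignore_case = TRUE ignores case
    str_extract_all("The Cat in the Hat", regex("[a-z]+", TRUE))
    #> [[1]]
    #> [1] "The" "Cat" "in"  "the" "Hat"

    # multiline = TRUE matches on every line
    str_extract_all("a\nb\nc", "^.")
    #> [[1]]
    #> [1] "a"
    str_extract_all("a\nb\nc", regex("^.", multiline = TRUE))
    #> [[1]]
    #> [1] "a" "b" "c"

    # dotall = TRUE lets . match a newline
    str_extract_all("a\nb\nc", "a.")
    #> [[1]]
    #> character(0)
    str_extract_all("a\nb\nc", regex("a.", dotall = TRUE))
    #> [[1]]
    #> [1] "a\n"

5. Other packages for string processing

Besides stringr, R has other commonly used options for string processing, including:

    stringi: a powerful string-processing package with a large set of functions for matching, replacing, splitting, and extracting strings. It supports many languages and character encodings and performs well.
    stringdist: provides functions that compute distances between strings, such as edit distance, Hamming distance, and Jaro-Winkler distance. It is useful for string matching, clustering, and classification.
    base R: the base package itself provides basic string functions for matching, replacing, and splitting. It ships with R, so nothing needs to be installed.
    stringdistroy: listed in the original as an extension of stringdist offering further distance measures, such as Smith-Waterman distance; check that it is actually available before relying on it.

These packages offer a rich set of functions for processing and manipulating strings; which one to choose depends on your needs and preferences. Add-on packages are installed with install.packages() and loaded with library(), and ? in R gives more detailed help.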
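To tie the modifier functions from section 4 together, the sketch below runs the same pattern text through the default regex interpretation, fixed(), and coll(). The input vector `strings` is made up for the illustration; only functions described above are used.

```r
library(stringr)

# made-up inputs: a regex-only match, a literal match, and a
# literal match that differs only in case
strings <- c("abb", "a.b", "A.B")

# default (regex): "." matches any single character
regex_hits <- str_detect(strings, "a.b")

# fixed(): "." is a literal dot, compared byte by byte
fixed_hits <- str_detect(strings, fixed("a.b"))

# coll(): also literal, compared with collation rules;
# ignore_case = TRUE lets "A.B" match as well
coll_hits <- str_detect(strings, coll("a.b", ignore_case = TRUE))

print(regex_hits)
print(fixed_hits)
print(coll_hits)
```

Only the regex version accepts "abb", because only there does "." act as a wildcard; the two literal modifiers differ from each other in how they compare characters, not in how they read the pattern.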

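As a closing sketch, several of the functions covered in section 3 can be chained to clean up messy labels. The input vector `raw` and the fallback text "unknown" are invented for this example; each step uses one function from the reference above.

```r
library(stringr)

# made-up messy input, including a missing value
raw <- c("  Apple-Pie ", "banana_split", NA, " Cherry  tart")

trimmed <- str_trim(raw)                               # strip outer whitespace
lowered <- str_to_lower(trimmed)                       # normalise case
unified <- str_replace_all(lowered, "[-_ ]+", " ")     # collapse separators to one space
no_na   <- str_replace_na(unified, "unknown")          # make the missing value explicit

# zero-padded running number, then "id: label"
ids    <- str_pad(as.character(seq_along(no_na)), width = 3, pad = "0")
labels <- str_c(ids, no_na, sep = ": ")

print(labels)
```

The NA is carried through the first three steps untouched and is only given a visible label by str_replace_na, which is why that call comes before str_c (str_c would otherwise propagate the NA).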

http://www.pierceye.com/news/352658/