<span id="3dn8r"></span>
    1. <span id="3dn8r"><optgroup id="3dn8r"></optgroup></span><li id="3dn8r"><meter id="3dn8r"></meter></li>


        原項目鏈接如下:

        mio/amadeus


        ESPnet2 TTS model


        mio/amadeus

        This model was trained by mio using amadeus recipe in espnet.


        Demo: How to use in ESPnet2

        Follow the ESPnet installation instructions
        if you haven’t done that already.
        cd espnet
        git checkout d5b5ec7b2e77bd3e10707141818b7e6c57ac6b3f
        pip install -e .
        cd egs2/amadeus/tts1
        ./run.sh --skip_data_prep false --skip_train true --download_model mio/amadeus


        TTS config

        expand

        config: conf/tuning/finetune_vits.yaml
        print_config: false
        log_level: INFO
        dry_run: false
        iterator_type: sequence
        output_dir: exp/tts_amadeus_vits_finetune_from_jsut_32_sentence
        ngpu: 1
        seed: 777
        num_workers: 4
        num_att_plot: 3
        dist_backend: nccl
        dist_init_method: env://
        dist_world_size: null
        dist_rank: null
        local_rank: 0
        dist_master_addr: null
        dist_master_port: null
        dist_launcher: null
        multiprocessing_distributed: false
        unused_parameters: true
        sharded_ddp: false
        cudnn_enabled: true
        cudnn_benchmark: false
        cudnn_deterministic: false
        collect_stats: false
        write_collected_feats: false
        max_epoch: 2000
        patience: null
        val_scheduler_criterion:
        - valid
        - loss
        early_stopping_criterion:
        - valid
        - loss
        - min
        best_model_criterion:
        - - train
        - total_count
        - max
        keep_nbest_models: 3
        nbest_averaging_interval: 0
        grad_clip: -1
        grad_clip_type: 2.0
        grad_noise: false
        accum_grad: 1
        no_forward_run: false
        resume: true
        train_dtype: float32
        use_amp: false
        log_interval: 50
        use_matplotlib: true
        use_tensorboard: true
        create_graph_in_tensorboard: false
        use_wandb: true
        wandb_project: amadeus
        wandb_id: null
        wandb_entity: null
        wandb_name: null
        wandb_model_log_interval: -1
        detect_anomaly: false
        pretrain_path: null
        init_param:
        - downloads/f3698edf589206588f58f5ec837fa516/exp/tts_train_vits_raw_phn_jaconv_pyopenjtalk_accent_with_pause/train.total_count.ave_10best.pth:tts:tts
        ignore_init_mismatch: false
        freeze_param: []
        num_iters_per_epoch: null
        batch_size: 20
        valid_batch_size: null
        batch_bins: 5000000
        valid_batch_bins: null
        train_shape_file:
        - exp/tts_stats_raw_linear_spectrogram_phn_jaconv_pyopenjtalk_accent_with_pause/train/text_shape.phn
        - exp/tts_stats_raw_linear_spectrogram_phn_jaconv_pyopenjtalk_accent_with_pause/train/speech_shape
        valid_shape_file:
        - exp/tts_stats_raw_linear_spectrogram_phn_jaconv_pyopenjtalk_accent_with_pause/valid/text_shape.phn
        - exp/tts_stats_raw_linear_spectrogram_phn_jaconv_pyopenjtalk_accent_with_pause/valid/speech_shape
        batch_type: numel
        valid_batch_type: null
        fold_length:
        - 150
        - 204800
        sort_in_batch: descending
        sort_batch: descending
        multiple_iterator: false
        chunk_length: 500
        chunk_shift_ratio: 0.5
        num_cache_chunks: 1024
        train_data_path_and_name_and_type:
        - - dump/22k/raw/train/text
        - text
        - text
        - - dump/22k/raw/train/wav.scp
        - speech
        - sound
        valid_data_path_and_name_and_type:
        - - dump/22k/raw/dev/text
        - text
        - text
        - - dump/22k/raw/dev/wav.scp
        - speech
        - sound
        allow_variable_data_keys: false
        max_cache_size: 0.0
        max_cache_fd: 32
        valid_max_cache_size: null
        optim: adamw
        optim_conf:
        lr: 0.0001
        betas:
        - 0.8
        - 0.99
        eps: 1.0e-09
        weight_decay: 0.0
        scheduler: exponentiallr
        scheduler_conf:
        gamma: 0.999875
        optim2: adamw
        optim2_conf:
        lr: 0.0001
        betas:
        - 0.8
        - 0.99
        eps: 1.0e-09
        weight_decay: 0.0
        scheduler2: exponentiallr
        scheduler2_conf:
        gamma: 0.999875
        generator_first: false
        token_list:
        - <blank>
        - <unk>
        - '1'
        - '2'
        - '0'
        - '3'
        - '4'
        - '-1'
        - '5'
        - a
        - o
        - '-2'
        - i
        - '-3'
        - u
        - e
        - k
        - n
        - t
        - '6'
        - r
        - '-4'
        - s
        - N
        - m
        - pau
        - '7'
        - sh
        - d
        - g
        - w
        - '8'
        - U
        - '-5'
        - I
        - cl
        - h
        - y
        - b
        - '9'
        - j
        - ts
        - ch
        - '-6'
        - z
        - p
        - '-7'
        - f
        - ky
        - ry
        - '-8'
        - gy
        - '-9'
        - hy
        - ny
        - '-10'
        - by
        - my
        - '-11'
        - '-12'
        - '-13'
        - py
        - '-14'
        - '-15'
        - v
        - '10'
        - '-16'
        - '-17'
        - '11'
        - '-21'
        - '-20'
        - '12'
        - '-19'
        - '13'
        - '-18'
        - '14'
        - dy
        - '15'
        - ty
        - '-22'
        - '16'
        - '18'
        - '19'
        - '17'
        - <sos/eos>
        odim: null
        model_conf: {}
        use_preprocessor: true
        token_type: phn
        bpemodel: null
        non_linguistic_symbols: null
        cleaner: jaconv
        g2p: pyopenjtalk_accent_with_pause
        feats_extract: linear_spectrogram
        feats_extract_conf:
        n_fft: 1024
        hop_length: 256
        win_length: null
        normalize: null
        normalize_conf: {}
        tts: vits
        tts_conf:
        generator_type: vits_generator
        generator_params:
        hidden_channels: 192
        spks: -1
        global_channels: -1
        segment_size: 32
        text_encoder_attention_heads: 2
        text_encoder_ffn_expand: 4
        text_encoder_blocks: 6
        text_encoder_positionwise_layer_type: conv1d
        text_encoder_positionwise_conv_kernel_size: 3
        text_encoder_positional_encoding_layer_type: rel_pos
        text_encoder_self_attention_layer_type: rel_selfattn
        text_encoder_activation_type: swish
        text_encoder_normalize_before: true
        text_encoder_dropout_rate: 0.1
        text_encoder_positional_dropout_rate: 0.0
        text_encoder_attention_dropout_rate: 0.1
        use_macaron_style_in_text_encoder: true
        use_conformer_conv_in_text_encoder: false
        text_encoder_conformer_kernel_size: -1
        decoder_kernel_size: 7
        decoder_channels: 512
        decoder_upsample_scales:
        - 8
        - 8
        - 2
        - 2
        decoder_upsample_kernel_sizes:
        - 16
        - 16
        - 4
        - 4
        decoder_resblock_kernel_sizes:
        - 3
        - 7
        - 11
        decoder_resblock_dilations:
        - - 1
        - 3
        - 5
        - - 1
        - 3
        - 5
        - - 1
        - 3
        - 5
        use_weight_norm_in_decoder: true
        posterior_encoder_kernel_size: 5
        posterior_encoder_layers: 16
        posterior_encoder_stacks: 1
        posterior_encoder_base_dilation: 1
        posterior_encoder_dropout_rate: 0.0
        use_weight_norm_in_posterior_encoder: true
        flow_flows: 4
        flow_kernel_size: 5
        flow_base_dilation: 1
        flow_layers: 4
        flow_dropout_rate: 0.0
        use_weight_norm_in_flow: true
        use_only_mean_in_flow: true
        stochastic_duration_predictor_kernel_size: 3
        stochastic_duration_predictor_dropout_rate: 0.5
        stochastic_duration_predictor_flows: 4
        stochastic_duration_predictor_dds_conv_layers: 3
        vocabs: 85
        aux_channels: 513
        discriminator_type: hifigan_multi_scale_multi_period_discriminator
        discriminator_params:
        scales: 1
        scale_downsample_pooling: AvgPool1d
        scale_downsample_pooling_params:
        kernel_size: 4
        stride: 2
        padding: 2
        scale_discriminator_params:
        in_channels: 1
        out_channels: 1
        kernel_sizes:
        - 15
        - 41
        - 5
        - 3
        channels: 128
        max_downsample_channels: 1024
        max_groups: 16
        bias: true
        downsample_scales:
        - 2
        - 2
        - 4
        - 4
        - 1
        nonlinear_activation: LeakyReLU
        nonlinear_activation_params:
        negative_slope: 0.1
        use_weight_norm: true
        use_spectral_norm: false
        follow_official_norm: false
        periods:
        - 2
        - 3
        - 5
        - 7
        - 11
        period_discriminator_params:
        in_channels: 1
        out_channels: 1
        kernel_sizes:
        - 5
        - 3
        channels: 32
        downsample_scales:
        - 3
        - 3
        - 3
        - 3
        - 1
        max_downsample_channels: 1024
        bias: true
        nonlinear_activation: LeakyReLU
        nonlinear_activation_params:
        negative_slope: 0.1
        use_weight_norm: true
        use_spectral_norm: false
        generator_adv_loss_params:
        average_by_discriminators: false
        loss_type: mse
        discriminator_adv_loss_params:
        average_by_discriminators: false
        loss_type: mse
        feat_match_loss_params:
        average_by_discriminators: false
        average_by_layers: false
        include_final_outputs: true
        mel_loss_params:
        fs: 22050
        n_fft: 1024
        hop_length: 256
        win_length: null
        window: hann
        n_mels: 80
        fmin: 0
        fmax: null
        log_base: null
        lambda_adv: 1.0
        lambda_mel: 45.0
        lambda_feat_match: 2.0
        lambda_dur: 1.0
        lambda_kl: 1.0
        sampling_rate: 22050
        cache_generator_outputs: true
        pitch_extract: null
        pitch_extract_conf: {}
        pitch_normalize: null
        pitch_normalize_conf: {}
        energy_extract: null
        energy_extract_conf: {}
        energy_normalize: null
        energy_normalize_conf: {}
        required:
        - output_dir
        - token_list
        version: '202207'
        distributed: false


        Citing ESPnet

        @inproceedings{watanabe2018espnet,
        author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
        title={{ESPnet}: End-to-End Speech Processing Toolkit},
        year={2018},
        booktitle={Proceedings of Interspeech},
        pages={2207--2211},
        doi={10.21437/Interspeech.2018-1456},
        url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
        }
        @inproceedings{hayashi2020espnet,
        title={{Espnet-TTS}: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit},
        author={Hayashi, Tomoki and Yamamoto, Ryuichi and Inoue, Katsuki and Yoshimura, Takenori and Watanabe, Shinji and Toda, Tomoki and Takeda, Kazuya and Zhang, Yu and Tan, Xu},
        booktitle={Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
        pages={7654--7658},
        year={2020},
        organization={IEEE}
        }

        or arXiv:
        @misc{watanabe2018espnet,
        title={ESPnet: End-to-End Speech Processing Toolkit},
        author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
        year={2018},
        eprint={1804.00015},
        archivePrefix={arXiv},
        primaryClass={cs.CL}
        }

        數(shù)據(jù)統(tǒng)計

        數(shù)據(jù)評估

        kazusam/kt瀏覽人數(shù)已經(jīng)達(dá)到412,如你需要查詢該站的相關(guān)權(quán)重信息,可以點(diǎn)擊"5118數(shù)據(jù)""愛站數(shù)據(jù)""Chinaz數(shù)據(jù)"進(jìn)入;以目前的網(wǎng)站數(shù)據(jù)參考,建議大家請以愛站數(shù)據(jù)為準(zhǔn),更多網(wǎng)站價值評估因素如:kazusam/kt的訪問速度、搜索引擎收錄以及索引量、用戶體驗等;當(dāng)然要評估一個站的價值,最主要還是需要根據(jù)您自身的需求以及需要,一些確切的數(shù)據(jù)則需要找kazusam/kt的站長進(jìn)行洽談提供。如該站的IP、PV、跳出率等!

        關(guān)于kazusam/kt特別聲明

        本站OpenI提供的kazusam/kt都來源于網(wǎng)絡(luò),不保證外部鏈接的準(zhǔn)確性和完整性,同時,對于該外部鏈接的指向,不由OpenI實(shí)際控制,在2023年 5月 26日 下午6:13收錄時,該網(wǎng)頁上的內(nèi)容,都屬于合規(guī)合法,后期網(wǎng)頁的內(nèi)容如出現(xiàn)違規(guī),可以直接聯(lián)系網(wǎng)站管理員進(jìn)行刪除,OpenI不承擔(dān)任何責(zé)任。

        相關(guān)導(dǎo)航

        Trae官網(wǎng)

        暫無評論

        暫無評論...
        主站蜘蛛池模板: 亚洲国产一区二区三区| 免费黄色毛片视频| 亚洲AV无一区二区三区久久| 一级毛片完整版免费播放一区| 免费国产综合视频在线看| 高潮内射免费看片| 亚洲成年人啊啊aa在线观看| 天堂亚洲免费视频| 国产国拍精品亚洲AV片 | 羞羞视频免费网站入口| 国产成人aaa在线视频免费观看| 亚洲成av人在线观看网站| 国产乱色精品成人免费视频 | 无码人妻精品中文字幕免费东京热| 亚洲人成电影福利在线播放| 最近中文字幕无免费| 亚洲一区免费视频| 青草草在线视频永久免费| 免费精品国产自产拍在线观看| 久久精品国产亚洲AV不卡| 国产精品免费AV片在线观看| 亚洲欧洲免费视频| 最近的中文字幕大全免费版| 老湿机一区午夜精品免费福利| 亚洲中文字幕无码一久久区| 一级毛片全部免费播放| 亚洲精品456人成在线| 可以免费观看的一级毛片| a毛片免费全部在线播放**| 亚洲六月丁香六月婷婷色伊人| 国产又黄又爽又刺激的免费网址| 国产免费久久精品丫丫| 国产亚洲婷婷香蕉久久精品 | 91精品成人免费国产| 亚洲最新在线视频| 亚洲av高清在线观看一区二区| 永久免费av无码网站yy| 亚洲偷自拍另类图片二区| 亚洲午夜福利AV一区二区无码| 国产在线jyzzjyzz免费麻豆 | 九九久久精品国产免费看小说|